SECURE AGGREGATION WITH INTEGRITY VERIFICATION

FIELD

The present invention relates to a method, system and computer-readable medium for secure aggregation.

BACKGROUND

Secure aggregation can allow a server to compute the sum of client-provided inputs, without revealing individual client inputs to any party. Secure aggregation protocols are widely used for privacy-preserving federated learning applications.

Secure aggregation can allow a set of clients to compute the sum of their inputs while keeping each party's input private. In single-server secure aggregation protocols, clients may not have direct communication channels with each other, and the aggregation protocol may be orchestrated by a server that communicates with each client and forwards relevant messages. Single-server secure aggregation was introduced as a privacy-friendly mechanism for federated learning applications to prevent the server from learning individual inputs provided by clients. However, many secure aggregation solutions target strong privacy guarantees but offer little or no integrity protection against a possibly malicious server that tampers with the sum of clients' inputs.

SUMMARY

In an embodiment, the present disclosure provides a computer-implemented method for secure aggregation, by a server of a distributed client-server application, of client-provided inputs in a manner allowing for integrity verification. The method includes receiving, from each of a plurality of clients, a respective client input, for which a commitment to the respective client input is published. The commitments have been computed using randomness, and the commitments are aggregated by at least two super-clients and a sum of the aggregated commitments is published by each of the at least two super-clients. The method also includes obtaining a sum of the received client inputs. The method also includes publishing the sum of the received client inputs such that a validity of the sum of the received client inputs is checkable, by one or more of the clients, by comparing the sum of the received client inputs to a result of a verification algorithm that uses a sum of additive shares that were computed by the clients using the randomness, and by verifying that the published sum of the aggregated commitments is the same for each of the at least two super-clients. The method can be applied to use cases, for example, in digital medicine using medical data or smartcity applications to support decision-making.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the present invention will be described in even greater detail below based on the exemplary figures. The present invention is not limited to the exemplary embodiments. All features described and/or illustrated herein can be used alone or combined in different combinations in embodiments of the present invention. The features and advantages of various embodiments of the present invention will become apparent by reading the following detailed description with reference to the attached drawings which illustrate the following:

FIG. 1 schematically illustrates an exemplary environment of entities involved in a secure aggregation protocol;

FIG. 2 illustrates an example of a functionality of a commitment scheme;

FIG. 3 illustrates an example of a functionality of a homomorphic commitment scheme;

FIG. 4 illustrates an example process of a secure aggregation protocol with integrity;

FIG. 5 is a block diagram of an exemplary processing system, which can be configured to perform any and all operations disclosed herein; and

FIG. 6 illustrates a process for secure aggregation with public randomness according to an embodiment of the present disclosure.

DETAILED DESCRIPTION

Embodiments of the present invention provide improvements to an efficient secure aggregation protocol and framework that leverages a random beacon service to select a subset of clients that aid the server to aggregate inputs. Embodiments of the present invention provide for improvements to secure aggregation protocols and frameworks by providing input integrity, e.g., assurance that the sum output by the protocol was computed correctly.

Embodiments of the present invention can use linearly-homomorphic commitments to extend the secure aggregation protocol and framework with the ability for clients to verify whether the sum of client-provided inputs, as published by the server, has been computed correctly. Embodiments of the present invention also provide a method for enhancing secure aggregation protocols (e.g., single-server protocols) with integrity protection. Embodiments of the present invention can further enable each client who participates in the aggregation process to verify that the output of the secure aggregation (SA) protocol, as declared by the server, matches the sum of the client-provided inputs.

In an example secure aggregation protocol, each client provides a private input so that the server can learn the sum of all client-provided inputs without learning individual inputs. The sum as computed by the server is then shared with the clients. Embodiments of the present invention can enable clients to verify that the sum published by the server is the actual sum of the inputs provided by the clients.

In a first aspect, the present disclosure provides a computer-implemented method for secure aggregation, by a server of a distributed client-server application, of client-provided inputs in a manner allowing for integrity verification. The method includes receiving, from each of a plurality of clients, a respective client input, for which a commitment to the respective client input is published. The commitments have been computed using randomness, and the commitments are aggregated by at least two super-clients and a sum of the aggregated commitments is published by each of the at least two super-clients. The method also includes obtaining a sum of the received client inputs. The method also includes publishing the sum of the received client inputs such that a validity of the sum of the received client inputs is checkable, by one or more of the clients, by comparing the sum of the received client inputs to a result of a verification algorithm that uses a sum of additive shares that were computed by the clients using the randomness, and by verifying that the published sum of the aggregated commitments is the same for each of the at least two super-clients.

In a second aspect, the present disclosure provides the method according to the first aspect, wherein the at least two super-clients, and/or at least two other super-clients, receive and publish different additive shares of a same client along with additional additive shares from other clients such that no super-client obtains all additive shares of a same client, and such that the sum of all additive shares is obtainable by summing the additive shares published by the at least two super-clients and/or the at least two other super-clients.

In a third aspect, the present disclosure provides the method according to the first or second aspect, wherein a number of the super-clients to be used for the secure aggregation is determined from a random beacon, and wherein a number of the additive shares computed per client corresponds to the number of the super-clients such that each of the super-clients receives one additive share per client.

In a fourth aspect, the present disclosure provides the method according to any of the first to third aspects, wherein the number of the super-clients to be used for the secure aggregation is determined using a pseudo-random function which takes as input a random seed generated by the random beacon and public keys of the clients.

In a fifth aspect, the present disclosure provides the method according to any of the first to fourth aspects, wherein the randomness is different for different clients, and wherein the randomness can only be reconstructed if all of the additive shares for a respective one of the clients are available.

In a sixth aspect, the present disclosure provides the method according to any of the first to fifth aspects, wherein at least one backup client is selected for each of the at least two super-clients. The backup client in each case has a secret key of a corresponding one of the super-clients.

In a seventh aspect, the present disclosure provides the method according to any of the first to sixth aspects, wherein the server receives the secret key from the corresponding one of the super-clients and sends the secret key to a corresponding one of the backup clients.

In an eighth aspect, the present disclosure provides the method according to any of the first to seventh aspects, wherein a set of backup clients is selected for each of the at least two super-clients, and wherein a secret sharing reconstruction threshold is less than a number of backup clients in the set so as to allow for a dropout of at least one of the at least two super-clients.

In a ninth aspect, the present disclosure provides the method according to any of the first to eighth aspects, wherein the client inputs are blinded using a mask that is computed using a key shared with one of the at least two super-clients.

In a tenth aspect, the present disclosure provides the method according to any of the first to ninth aspects, further comprising receiving, from each of the at least two super-clients, a partial blinding that was computed by a respective one of the super-clients by summing the masks received by the respective super-client from different ones of the clients. Obtaining the sum of the received client inputs includes aggregating the blinded client inputs and subtracting a sum of the partial blindings.

In an eleventh aspect, the present disclosure provides the method according to any of the first to tenth aspects, wherein the commitments are determined using a linearly-homomorphic commitment scheme.

In a twelfth aspect, the present disclosure provides the method according to any of the first to eleventh aspects, wherein the client inputs are from medical records and/or independent healthcare facilities, and wherein the sum of the received client inputs is used for training a model for medical diagnostics or for monitoring health status of patients.

In a thirteenth aspect, the present disclosure provides the method according to any of the first to twelfth aspects, wherein the client inputs are from sensors of a smartcity application, and wherein the sum of the client inputs is used to determine a condition in the smartcity application.

In a fourteenth aspect, the present disclosure provides a computer system for secure aggregation, by a server of a distributed client-server application, of client-provided inputs in a manner allowing for integrity verification, the system comprising one or more hardware processors which, alone or in combination, are configured to provide for execution of the method according to any of the first to thirteenth aspects.

In a fifteenth aspect, the present disclosure provides a tangible, non-transitory computer-readable medium having instructions thereon for secure aggregation, by a server of a distributed client-server application, of client-provided inputs in a manner allowing for integrity verification, the instructions, upon being executed by one or more hardware processors, alone or in combination, providing for execution of the method according to any of the first to thirteenth aspects.

Embodiments of the present invention can use a set of k super-clients. These are regular clients chosen randomly from the set of n clients. Their task is to help in the verification of the sum as published by the server. Super-clients can be chosen, for example, using a random beacon service 106, as depicted in the environment 100 of FIG. 1. For instance, a set of n clients 102 can communicate with a server 104. The random beacon service 106 can be used to obtain a random permutation of the client IDs, and the first k clients are selected as super-clients 108.

Referring to FIG. 1, the system 100 shows the entities involved in the secure aggregation protocols with public randomness. Embodiments of the present invention can use a random beacon service 106 to generate uniform randomness for selecting the super-clients 108 from the clients 102. In some instances, the backup clients of the super-clients 108 can be selected using the uniform randomness from the random beacon service 106 as well. For example, according to some embodiments, a server 104 determines a set of super-clients 108 from the clients 102 by evaluating a pseudo-random function (PRF) on a random seed, generated by the random beacon service 106, and the public keys of all the clients 102. Similarly, each super-client 108 can be assigned a pseudo-randomly chosen set of backup clients. For instance, client 102A can also be a super-client 108 with one or more backup clients (e.g., other clients such as client 102B and/or 102C).
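The PRF-based selection described above can be sketched as follows. This is a minimal illustration under stated assumptions: HMAC-SHA256 stands in for the PRF, clients are ranked by the PRF value of their public key (yielding a pseudo-random permutation whose first k entries are the super-clients), and the helper names, public keys and the all-zero seed are hypothetical placeholders for the values obtained from the clients and the random beacon service 106.

```python
import hashlib
import hmac


def prf(seed: bytes, data: bytes) -> bytes:
    # HMAC-SHA256 stands in for the PRF (assumption: the text does not fix one).
    return hmac.new(seed, data, hashlib.sha256).digest()


def select_super_clients(seed: bytes, public_keys: list[bytes], k: int) -> list[int]:
    """Rank all clients by PRF(seed, pk_i), i.e. a pseudo-random permutation, and keep the first k."""
    order = sorted(range(len(public_keys)), key=lambda i: prf(seed, public_keys[i]))
    return order[:k]


def select_backups(seed: bytes, public_keys: list[bytes], super_idx: int, b: int) -> list[int]:
    """Pseudo-randomly choose b backup clients for super-client `super_idx` (excluding itself)."""
    others = [i for i in range(len(public_keys)) if i != super_idx]
    tag = public_keys[super_idx] + b"|backup|"  # hypothetical domain-separation tag
    others.sort(key=lambda i: prf(seed, tag + public_keys[i]))
    return others[:b]


seed = bytes(32)  # stand-in for the seed published by the random beacon service
pks = [f"pk-{i}".encode() for i in range(10)]  # hypothetical client public keys
supers = select_super_clients(seed, pks, k=3)
backups = {j: select_backups(seed, pks, j, b=2) for j in supers}
print(supers, backups)
```

Because the seed is public and the PRF is deterministic, any client can recompute the same selection and thereby check that the server did not hand-pick the super-clients.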

Super-clients 108 and backup clients aid the server 104 in computing the aggregation of client inputs throughout the protocol execution, while the rest of the clients 102 that are not also super-clients 108 only have to be active once to provide their inputs to the server 104. At a high level, each client 102 can provide a masked input to the server 104 and each super-client 108 can provide a partial output to the server 104. The server 104 can use these partial outputs to remove the masks of the masked inputs and recover the sum of client inputs. This method can guarantee that the inputs of each client remain private, in the sense that neither the server 104 nor any other client 102 can learn individual inputs (e.g., which input belongs to which of the clients 102). Moreover, should some of the super-clients drop out before being able to send their partial outputs to the server 104, their backup clients allow the server 104 to reconstruct the missing partial outputs and complete the protocol.
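The masking and unmasking described above can be illustrated with a toy sketch. It assumes that inputs and masks are integers modulo 2^32, that each client shares a key with exactly one super-client (assigned here by the hypothetical rule i mod k), and that masks are derived from that key with HMAC-SHA256; none of these choices is prescribed by the text, and key agreement, encryption and dropout handling are omitted.

```python
import hashlib
import hmac
import secrets

MOD = 2**32  # inputs and masks live in Z_MOD (assumed encoding)


def mask_from_key(key: bytes, round_id: bytes) -> int:
    # Mask derived from the key shared by a client and its super-client
    # (HMAC-SHA256 as the derivation function is an assumption).
    return int.from_bytes(hmac.new(key, round_id, hashlib.sha256).digest()[:4], "big")


round_id = b"round-1"
k = 2
inputs = [5, 17, 8, 30, 11]  # private client inputs x_i
# Client i shares keys[i] with super-client i % k (hypothetical assignment).
keys = [secrets.token_bytes(16) for _ in inputs]

# Clients: blinded input c_i = x_i + mask_i (mod MOD), sent to the server.
blinded = [(x + mask_from_key(key, round_id)) % MOD for x, key in zip(inputs, keys)]

# Super-clients: partial blinding = sum of the masks of the clients assigned to them.
partials = [0] * k
for i, key in enumerate(keys):
    partials[i % k] = (partials[i % k] + mask_from_key(key, round_id)) % MOD

# Server: aggregate the blinded inputs and subtract the sum of the partial blindings.
y = (sum(blinded) - sum(partials)) % MOD
print(y)  # 71
```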

Although certain entities within system 100 are described herein in the FIGs. as being singular entities, it will be appreciated that the entities and functionalities discussed herein can be implemented by and/or include one or more entities (e.g., one or more servers). The entities within the system 100 are in communication with other devices and/or systems within the system 100 via a network. The network can be a global area network (GAN) such as the Internet, a wide area network (WAN), a local area network (LAN), or any other type of network or combination of networks. The network can provide a wireline, wireless, or a combination of wireline and wireless communication between the entities within the system 100.

The clients 102 can each include computing devices or sensors that are operated by a client, customer, and/or other individual that is associated with the server 104. Each client 102 of system 100 is and/or includes, but is not limited to, a desktop, laptop, tablet, mobile device (e.g., smartphone device, or other mobile device), computing system, sensor, IoT device and/or other types of computing entities that generally comprise one or more communication components, one or more processing components, and one or more memory components.

The server 104 is a computing system that performs one or more functions described herein. For instance, the server 104 can include, execute, operate, and/or otherwise be configured to perform secure aggregation using public randomness. For instance, in operation, the server 104 obtains information (e.g., randomness such as a seed) from the random beacon service 106 and/or information from the users 102 (e.g., public keys). The server 104 determines super-clients and/or backup clients from the users 102 based on the information from the random beacon service 106 and/or the information from the users 102. Subsequently, based on determining the super-clients 108 and/or backup clients, the server 104 can perform secure aggregation. For instance, the super-clients 108 and/or the back-up neighbors can assist the server 104 in computing the aggregation of the inputs from the users 102 as well as additional users within the system 100. For example, the super-clients 108 are a subset of the clients 102 from system 100 that aid the server 104 in computing the aggregation of the inputs provided by the users of the system 100. Each super-client can include one or more back-up neighbors, and the back-up neighbors allow the server 104 to reconstruct missing partial outputs and complete the aggregation protocol if some super-clients drop out before being able to send their partial outputs to the server 104. This will be explained in further detail below.
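One possible way to realize the backup mechanism with a reconstruction threshold smaller than the number of backup clients is Shamir secret sharing of a super-client's secret key among its backup clients. The sketch below assumes this technique, a prime field of size 2^127 − 1, and hypothetical helper names; it only shows that any t of the n backup shares recover the key, so that the missing partial output of a dropped super-client can be recomputed.

```python
import secrets

PRIME = 2**127 - 1  # Mersenne prime used as the Shamir field modulus (assumption)


def share_secret(secret: int, t: int, n: int) -> list[tuple[int, int]]:
    """Split `secret` into n Shamir shares; any t of them reconstruct it."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(t - 1)]

    def poly(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, poly(x)) for x in range(1, n + 1)]


def reconstruct(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation of the sharing polynomial at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


sk = secrets.randbelow(PRIME)         # stand-in for a super-client's secret key
shares = share_secret(sk, t=3, n=5)   # threshold 3 < 5 backup clients
print(reconstruct(shares[:3]) == sk)  # True
print(reconstruct(shares[2:]) == sk)  # True, even if the first two backups are offline
```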

The server 104 includes and/or is implemented using one or more computing devices, computing platforms, cloud computing platforms, systems, servers, and/or other apparatuses. In some instances, the server 104 can be implemented as engines, software functions, and/or applications. For example, the functionalities of the server 104 can be implemented as software instructions stored in storage (e.g., memory) and executed by one or more processors.

The random beacon service 106 can be a computing system that performs one or more functions described herein. For instance, the random beacon service 106 can provide trusted public randomness (e.g., a random seed for a PRF) to the server 104 and/or the users within the system 100. The random beacon service 106 includes and/or is implemented using one or more computing devices, computing platforms, cloud computing platforms, systems, servers, and/or other apparatuses. The random beacon service 106 may be a third party entity (e.g., an entity that is not associated with the server 104 and/or the users of the system 100). In some instances, the random beacon service 106 can be implemented as engines, software functions, and/or applications. For example, the functionalities of the random beacon service 106 can be implemented as software instructions stored in storage (e.g., memory) and executed by one or more processors.

Embodiments of the present invention use linearly-homomorphic commitments. A commitment scheme COMM, as shown in FIG. 2, allows a party (e.g., a sender) to commit to a message m. Formally, a commitment scheme COMM comprises two algorithms, COMM.Commit and COMM.Verify. In order to commit to a message m, the sender can run algorithm COMM.Commit on input m and a uniformly chosen random value r (chosen by the sender), obtaining a commitment C, denoted by C := COMM.Commit(m; r). Later on, the sender can open the commitment by revealing the input message m and the randomness r. Given values C, m, and r, any party can check that C is a valid commitment for message m by running algorithm COMM.Verify(C, m, r). A commitment C derived in this way is hiding, i.e., it does not reveal any information about the message m, and binding, i.e., it is not possible to open commitment C to a message different from m. The specific format of the input values (i.e., message m and randomness r) and output values (i.e., commitment C), and how the values are generated and/or chosen, can vary depending on the specific instantiation of the commitment scheme.

In the example functionality 200 of FIG. 2, a commitment scheme 202 (i.e., function, algorithm) can receive, as inputs, a message m 204 and a randomness r 206. The commitment scheme 202 can process the message m 204 and randomness r 206 to output (i.e., determine) a commitment C 208. A verification scheme 210 (i.e., function, algorithm) can then receive, as inputs, the message m 204, the randomness r 206, and the commitment C 208. The verification scheme 210 can then verify that the commitment C 208 was properly computed using the message m 204 and the randomness r 206. Additionally, most commitment schemes that provide linear homomorphism can be used to perform the COMM functionalities in different instantiations. For example, depending on the instantiation, COMM.Verify may output “yes,” “1,” “TRUE,” or an equivalent value to indicate successful verification.

In an example embodiment of a commitment scheme (e.g., the Pedersen commitment scheme), let p, q ∈ ℕ, where q is a large prime and p = 2q + 1 is a safe prime, let 𝔾 denote a group of order q, and let g, h be two generators of 𝔾. To commit to a message m ∈ ℤ_q, the sender can choose a value r ∈ ℤ_q uniformly at random and compute C := g^m·h^r. Given a commitment C, a message m and a random value r, a party can check whether C is a valid commitment with respect to message m by evaluating the predicate C ≟ g^m·h^r. Then, COMM.Vrfy(C, y, r) = TRUE if and only if C = g^y·h^r, where r is the randomness chosen by the sender. In the specification of the verifiable aggregation protocol, however, the randomness passed to COMM.Vrfy is an aggregated value rather than the randomness of a single sender: multiple commitments can be combined, and the clients can verify the combined commitment using as randomness the sum of the random values ρ_j provided by the super-clients.
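The Pedersen instantiation can be sketched in a few lines. The parameters below are deliberately tiny and insecure (q = 1019, p = 2q + 1 = 2039), and g = 4, h = 9 are squares modulo p, so they lie in the subgroup of order q; a real deployment would use a large group in which nobody knows the discrete logarithm of h to the base g. The function names are illustrative.

```python
import secrets

# Toy parameters (insecure, for illustration only): P = 2Q + 1 is a safe prime.
Q = 1019
P = 2 * Q + 1  # 2039
# G and H are squares mod P, hence generators of the order-Q subgroup.
# In a real deployment nobody may know log_G(H); here H is chosen arbitrarily.
G = 4  # 2^2 mod P
H = 9  # 3^2 mod P


def commit(m: int, r: int) -> int:
    """COMM.Commit(m; r) = g^m * h^r mod P, with m and r taken modulo Q."""
    return (pow(G, m % Q, P) * pow(H, r % Q, P)) % P


def verify(c: int, m: int, r: int) -> bool:
    """COMM.Vrfy(C, m, r) is TRUE iff C == g^m * h^r mod P."""
    return c == commit(m, r)


r = secrets.randbelow(Q)  # sender's uniformly random value r in Z_q
c = commit(42, r)
print(verify(c, 42, r))  # True
print(verify(c, 43, r))  # False
```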

A commitment scheme is linearly homomorphic if, given a commitment C₁ to message m₁ (with randomness r₁) and a commitment C₂ to message m₂ (with randomness r₂), it is possible to aggregate C₁ and C₂ into a commitment C that can be opened to message m₁+m₂ (with randomness r₁+r₂). This is shown in FIG. 3.
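For the Pedersen instantiation described above, the homomorphic property follows directly from the group law, with exponents taken modulo q:

```latex
C_1 \cdot C_2 = \left(g^{m_1} h^{r_1}\right) \cdot \left(g^{m_2} h^{r_2}\right)
             = g^{m_1 + m_2} \, h^{r_1 + r_2}
             = \mathrm{COMM.Commit}(m_1 + m_2;\, r_1 + r_2)
```

Hence COMM.Vrfy(C₁·C₂, m₁+m₂, r₁+r₂) returns TRUE, and for this scheme the commitment-aggregation operation is multiplication in the group.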

In the following embodiments of the present invention, comm_i := COMM.Commit(x_i; r_i) is used to denote the computation, performed by client i, of a commitment comm_i for input message x_i and randomness r_i.

In the example functionality 300 of FIG. 3, multiple commitment schemes can be employed, e.g., a different commitment C can be output for each individual message m and randomness r. For instance, a commitment scheme 302 can receive a message m₁ 304 and a randomness r₁ 306, which can be provided by a first client (e.g., a client i or a super-client j). The commitment scheme 302 can then output a commitment C₁ 308 by applying the randomness r₁ 306 to the message m₁ 304. A commitment scheme 310 (which is preferably the same commitment scheme as commitment scheme 302) can receive a message m₂ 312 and a randomness r₂ 314 provided by a second client. The commitment scheme 310 can then output a commitment C₂ 316. The messages 304, 312 can be summed into a message m 318, the randomnesses 306, 314 can be summed into a randomness r 320, and the commitments 308, 316 can be aggregated into a commitment C 322. A verification scheme 324 (e.g., function, algorithm) can then receive, as inputs, the message m 318, the randomness r 320, and the commitment C 322. The verification scheme 324 can run a further commitment scheme 326 (which is preferably the same commitment scheme as commitment schemes 302, 310) on the message m 318 and the randomness r 320, and verify whether the output of the further commitment scheme 326 is or is not equal to the commitment C 322.

In embodiments of the present invention, before clients share their inputs with the server, client i can compute the commitment to its input x_i as comm_i := COMM.Commit(x_i; r_i). At the same time, client i can compute k random shares of the randomness r_i so that r_i can be reconstructed only if all of the k shares are available. This could be done, for example, by selecting k−1 shares r_{i,2}, . . . , r_{i,j}, . . . , r_{i,k} at random and then computing r_{i,1} = r_i − (r_{i,2} + . . . + r_{i,j} + . . . + r_{i,k}). Next, client i can send comm_i to all the super-clients, while it sends r_{i,j} to super-client j.
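The k-out-of-k additive sharing of r_i described above can be sketched as follows, assuming the shares live in ℤ_q for the toy order q = 1019 (so that they can serve as Pedersen exponents); the function names are hypothetical.

```python
import secrets

Q = 1019  # toy group order q (assumption, matching a toy Pedersen setup)


def split_randomness(r_i: int, k: int) -> list[int]:
    """Split r_i into k additive shares modulo Q, one per super-client."""
    rest = [secrets.randbelow(Q) for _ in range(k - 1)]  # r_{i,2}, ..., r_{i,k}
    first = (r_i - sum(rest)) % Q                          # r_{i,1} = r_i - (r_{i,2} + ... + r_{i,k})
    return [first] + rest


def reconstruct(shares: list[int]) -> int:
    """Only the sum of all k shares yields r_i; any k-1 shares are uniformly random."""
    return sum(shares) % Q


shares = split_randomness(777, k=3)
print(reconstruct(shares))  # 777
```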

Super-client j can aggregate all received commitments comm₁, . . . , comm_n as C_j and publish the result (e.g., send C_j to all clients when the server is sending the message, and/or send it to the server with a request to forward it to all clients when any entity other than the server is sending the message). At the same time, super-client j can aggregate all received random shares r_{1,j}, . . . , r_{n,j} as z_j and publish the result. Thus, as used herein, publishing means sending or making available to all clients (or super-clients, depending on the situation), possibly using the server as an intermediary. As mentioned above, all communications between clients will typically go through the server, as there is typically no communication channel between clients.
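The super-client side of this step can be sketched as follows, using toy, insecure Pedersen parameters (q = 1019, p = 2039, g = 4, h = 9) and hypothetical helper names. It checks that the aggregate C_j opens to the sum of the inputs under the sum of the randomness, and that the published partial sums z_j add up to that randomness.

```python
import secrets

# Toy Pedersen parameters (insecure): P = 2Q + 1, G and H generate the order-Q subgroup.
Q, P, G, H = 1019, 2039, 4, 9


def commit(m: int, r: int) -> int:
    return (pow(G, m % Q, P) * pow(H, r % Q, P)) % P


def split(r: int, k: int) -> list[int]:
    rest = [secrets.randbelow(Q) for _ in range(k - 1)]
    return [(r - sum(rest)) % Q] + rest


def aggregate_commitments(comms: list[int]) -> int:
    """C_j := comm_1 * comm_2 * ... * comm_n; '*' is multiplication mod P for Pedersen."""
    c = 1
    for comm in comms:
        c = (c * comm) % P
    return c


def aggregate_shares(column: list[int]) -> int:
    """z_j := r_{1,j} + r_{2,j} + ... + r_{n,j} mod Q."""
    return sum(column) % Q


k = 3
xs = [12, 40, 7, 25]                            # client inputs x_i
rs = [secrets.randbelow(Q) for _ in xs]         # commitment randomness r_i
comms = [commit(x, r) for x, r in zip(xs, rs)]  # published comm_i
share_rows = [split(r, k) for r in rs]          # row i holds r_{i,1}, ..., r_{i,k}

C = [aggregate_commitments(comms) for _ in range(k)]  # honest super-clients agree on C_j
z = [aggregate_shares([row[j] for row in share_rows]) for j in range(k)]
print(C[0] == commit(sum(xs), sum(rs)))  # True: C_j opens to sum(x_i) with sum(r_i)
print(sum(z) % Q == sum(rs) % Q)         # True
```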

After the server publishes the sum y of client-provided inputs, each client can accept the sum y as valid when (a) all super-clients j have published the same value C_j, and (b) COMM.Verify(C_j; y; z) outputs “yes” (or 1, TRUE, or the equivalent), where z = z₁ + z₂ + . . . + z_k.
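The client-side acceptance rule can be sketched as below, with toy, insecure Pedersen parameters (q = 1019, p = 2039, g = 4, h = 9) and a hypothetical `accept` helper. The input values and the two partial sums z₁ = 15 and z₂ = 45 are illustrative numbers chosen so that z₁ + z₂ equals the sum of the commitment randomness.

```python
# Toy Pedersen parameters (insecure).
Q, P, G, H = 1019, 2039, 4, 9


def commit(m: int, r: int) -> int:
    return (pow(G, m % Q, P) * pow(H, r % Q, P)) % P


def verify(c: int, m: int, r: int) -> bool:
    return c == commit(m, r)


def accept(published_c: list[int], y: int, published_z: list[int]) -> bool:
    """Client-side check of the sum y published by the server."""
    # (a) all super-clients must have published the same aggregate commitment C_j
    if len(set(published_c)) != 1:
        return False
    # (b) C_j must open to y under randomness z = z_1 + ... + z_k
    z = sum(published_z) % Q
    return verify(published_c[0], y, z)


xs, rs = [3, 5, 7], [10, 20, 30]
c_agg = 1
for x, r in zip(xs, rs):
    c_agg = (c_agg * commit(x, r)) % P
z_parts = [15, 45]  # partial sums of k = 2 super-clients; they add up to sum(rs) = 60

print(accept([c_agg, c_agg], 15, z_parts))            # True: honest sum
print(accept([c_agg, c_agg], 16, z_parts))            # False: tampered sum
print(accept([c_agg, (c_agg * G) % P], 15, z_parts))  # False: super-clients disagree
```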

In an embodiment, the present invention can provide for computing, by a server of a distributed client-server application, the sum of client-provided inputs while assuring clients that the computed sum as published by the server is the actual sum of the client-provided inputs. For example, a process 400 for verifying a server output by a client is shown in FIG. 4. Although the steps of process 400 described below are numbered, they are not necessarily performed in numerical order and can be performed in a different order unless one step is required to provide information for a subsequent step. For example, step S7 can be performed before step S1, and the server can perform step S8 upon receiving the client inputs, or at a later time. Moreover, steps can be performed simultaneously, such as steps S1 and S4.

At step S1, a client i with input x_i can compute and publish a commitment to its input comm_i := COMM.Commit(x_i; r_i), where r_i is the randomness used to obtain the commitment.

At step S2, client i sends its commitment comm_i to the super-clients (e.g., via the server), so that each super-client receives the commitment to input x_i. Since, in most cases, the clients do not communicate with each other, the commitments are sent to the server, which forwards them to the chosen super-clients.

At step S3, super-client j can aggregate all client commitments as C_j := comm₁ * comm₂ * . . . * comm_n, where * denotes the commitment-aggregation operation, and publish the result of the aggregation.

At step S4, client i can compute k additive shares r_{i,1}, . . . , r_{i,k} of the randomness r_i.

At step S5, client i can send each computed additive share to one of the k super-clients j. Since, in most cases, the clients do not communicate with each other, the additive shares are sent to the server, which forwards them to the chosen super-clients. Client i sends exactly one additive share to each of the k super-clients, such that each super-client receives only one additive share per client.

At step S6, super-client j can sum all additive shares it receives as z_j = r_{1,j} + r_{2,j} + . . . + r_{n,j} and publish the result.

At step S7, client i can send its blinded input c_i to the server, protected according to the secure aggregation protocol.

At step S7, the server can follow the steps of the secure aggregation protocol to obtain the sum of all client inputs y := x₁ + x₂ + . . . + x_n and publish the result.

At step S8, client i can verify that super-client j has published, in step S3, the same value C_j as all other super-clients.

At step S9, client i can verify that y is valid based on the aggregation at step S3, the summation at step S6, and the aggregation at step S7 (e.g., COMM.Vrfy(C_j; y; z) == 1, where z = z₁ + z₂ + . . . + z_k). Client i can accept the sum y published by the server as valid if the verifications of steps S8 and S9 succeed (e.g., the verification algorithms return TRUE). In other words, each of the k super-clients can compute a partial sum of all of the additive shares it receives and publish this partial sum. The verification algorithm can then sum up these published partial sums: upon receiving k partial sums z₁, . . . , z_k, client i can compute the overall sum z = Σ_j z_j and then use the so-derived z as randomness in the COMM.Verify algorithm. Accordingly, the verification procedure at step S9 of FIG. 4 can comprise an aggregation step to derive z and a commitment verification step to verify that C_j is a valid commitment for message y with randomness z.

Single-server secure aggregation protocols were introduced as a privacy-enhancing mechanism for federated learning, a recent paradigm that enables the training of machine-learning models using data obtained from distributed data sources without the need to collect the data in a centralized location. The federated learning paradigm has enabled many practically relevant applications of artificial intelligence (AI).

In an embodiment of the present invention, federated learning can be used to train a global model for medical diagnostics based on patients' data provided by independent healthcare facilities. Relying on data from many different sources can allow for the derivation of a more accurate model; however, keeping the individual models private is advantageous for protecting the privacy of patients.

In another embodiment of the present invention, federated learning can be used in healthcare, leveraging wearable devices (e.g., smartwatches) to train a machine learning (ML) model for monitoring the health status of patients. Again, the prediction capabilities of the resulting ML model may rely on large amounts of sensitive data from different patients, which can raise privacy concerns.

In another embodiment of the present invention, privacy-preserving federated learning can also support business-oriented use cases. For instance, different financial institutions may benefit from training prediction models based on the data of all of their customers, without disclosing individual data.

Finally, secure aggregation protocols find applicability beyond federated learning. For instance, smart-city sensor data may be combined to compute information such as average temperature in a certain area, traffic information, etc.

In an embodiment, the present invention provides a method for computing, by a server of a distributed client-server application, the sum of client-provided inputs while assuring clients that the computed sum as published by the server is the actual sum of the client-provided inputs, the method comprising the steps of:

- 1) Client i with input x_i computes and publishes a commitment to its input comm_i := COMM.Commit(x_i; r_i), where r_i is the randomness used to obtain the commitment; client i also sends its input x_i to the server, protected according to the secure aggregation protocol.
- 2) Client i computes k additive shares r_{i,1}, . . . , r_{i,k} of r_i and sends each of them to one of the k super-clients.
- 3) Super-client j aggregates all client commitments as C_j := comm₁ * comm₂ * . . . * comm_n and publishes the result.
- 4) Super-client j sums all additive shares it receives as z_j = r_{1,j} + r_{2,j} + . . . + r_{n,j} and publishes the result.
- 5) The server follows the steps of the secure aggregation protocol to obtain the sum of all client inputs y := x₁ + x₂ + . . . + x_n and publishes the result.
- 6) Clients accept the sum y published by the server as valid if (a) all super-clients have published the same value in step 3, and (b) COMM.Vrfy(C_j; y; z) == 1, where z = z₁ + z₂ + . . . + z_k.
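Steps 1) to 6) can be combined into a single illustrative round. In this sketch, the underlying secure aggregation protocol is abstracted away (the server's sum is computed directly from the inputs), the Pedersen parameters are toy values, and the hypothetical `tamper` parameter models a malicious server that shifts the published sum.

```python
import secrets

# Toy Pedersen parameters (insecure): P = 2Q + 1, G and H generate the order-Q subgroup.
Q, P, G, H = 1019, 2039, 4, 9


def commit(m: int, r: int) -> int:
    return (pow(G, m % Q, P) * pow(H, r % Q, P)) % P


def run_round(xs: list[int], k: int, tamper: int = 0) -> bool:
    """Run steps 1)-6) once and return whether the clients accept the server's sum."""
    n = len(xs)
    # Step 1: client i commits to x_i with fresh randomness r_i.
    rs = [secrets.randbelow(Q) for _ in range(n)]
    comms = [commit(x, r) for x, r in zip(xs, rs)]
    # Step 2: client i splits r_i into k additive shares; share j goes to super-client j.
    shares = []
    for r in rs:
        rest = [secrets.randbelow(Q) for _ in range(k - 1)]
        shares.append([(r - sum(rest)) % Q] + rest)
    # Step 3: each super-client j aggregates all commitments into C_j.
    published_c = []
    for _ in range(k):
        c = 1
        for comm in comms:
            c = (c * comm) % P
        published_c.append(c)
    # Step 4: each super-client j sums the shares it received into z_j.
    published_z = [sum(row[j] for row in shares) % Q for j in range(k)]
    # Step 5: the server publishes y (plain sum standing in for the SA protocol).
    y = sum(xs) + tamper
    # Step 6: accept iff (a) all C_j agree and (b) C_j opens to y with z = sum of z_j.
    z = sum(published_z) % Q
    return len(set(published_c)) == 1 and published_c[0] == commit(y, z)


inputs = [12, 40, 7, 25]
print(run_round(inputs, k=3))            # True: honest server
print(run_round(inputs, k=3, tamper=1))  # False: tampered sum is rejected
```

Since the Pedersen commitment binds y only modulo q, inputs and their sum must be encoded in a range smaller than q; in this toy setting, a shift by exactly q = 1019 would go undetected.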

Embodiments of the present invention can provide for the following improvements over existing technology:

- 1) Publishing, by a client of a distributed client-server application, a commitment to its input and sharing, by the same client, additive shares of the randomness used to compute the commitment with a set of super-clients.
- 2) Publishing, by each of the super-clients, the sum C_j of all client commitments and the sum z_j of all shares it has received from the clients.
- 3) Checking, by a client, that all super-clients have published the same C_j and that a value y provided by the server, with randomness z = z₁ + z₂ + . . . + z_k, is a valid opening of commitment C_j.
- 4) In contrast to existing integrity mechanisms for secure aggregation, which have communication complexity that is quadratic in the number of clients, embodiments of the present invention can advantageously provide that a client has to send or receive only O(k) messages, with k << n.

Referring to FIG. 5, a processing system 500 can include one or more processors 502, memory 504, one or more input/output devices 506, one or more sensors 508, one or more user interfaces 510, and one or more actuators 512. Processing system 500 can be representative of each computing system disclosed herein.

Processors 502 can include one or more distinct processors, each having one or more cores. Each of the distinct processors can have the same or different structure. Processors 502 can include one or more central processing units (CPUs), one or more graphics processing units (GPUs), circuitry (e.g., application specific integrated circuits (ASICs)), digital signal processors (DSPs), and the like. Processors 502 can be mounted to a common substrate or to multiple different substrates.

Processors 502 are configured to perform a certain function, method, or operation (e.g., are configured to provide for performance of a function, method, or operation) at least when one or more of the distinct processors is capable of performing operations embodying the function, method, or operation. Processors 502 can perform operations embodying the function, method, or operation by, for example, executing code (e.g., interpreting scripts) stored on memory 504 and/or trafficking data through one or more ASICs. Processors 502, and thus processing system 500, can be configured to perform, automatically, any and all functions, methods, and operations disclosed herein. Therefore, processing system 500 can be configured to implement any of (e.g., all of) the protocols, devices, mechanisms, systems, and methods described herein.

For example, when the present disclosure states that a method or device performs task “X” (or that task “X” is performed), such a statement should be understood to disclose that processing system 500 can be configured to perform task “X”. Processing system 500 is configured to perform a function, method, or operation at least when processors 502 are configured to do the same.

Memory 504 can include volatile memory, non-volatile memory, and any other medium capable of storing data. Each of the volatile memory, non-volatile memory, and any other type of memory can include multiple different memory devices, located at multiple distinct locations and each having a different structure. Memory 504 can include remotely hosted (e.g., cloud) storage.

Examples of memory 504 include non-transitory computer-readable media such as RAM, ROM, flash memory, EEPROM, any kind of optical storage disk such as a DVD, a Blu-Ray® disc, magnetic storage, holographic storage, a HDD, a SSD, any medium that can be used to store program code in the form of instructions or data structures, and the like. Any and all of the methods, functions, and operations described herein can be fully embodied in the form of tangible and/or non-transitory machine-readable code (e.g., interpretable scripts) saved in memory 504.

Input-output devices 506 can include any component for trafficking data such as ports, antennas (i.e., transceivers), printed conductive paths, and the like. Input-output devices 506 can enable wired communication via USB®, DisplayPort®, HDMI®, Ethernet, and the like. Input-output devices 506 can enable electronic, optical, magnetic, and holographic communication with suitable memory 504. Input-output devices 506 can enable wireless communication via WiFi®, Bluetooth®, cellular (e.g., LTE®, CDMA®, GSM®, WiMax®, NFC®), GPS, and the like. Input-output devices 506 can include wired and/or wireless communication pathways.

Sensors 508 can capture physical measurements of the environment and report the same to processors 502. User interface 510 can include displays, physical buttons, speakers, microphones, keyboards, and the like. Actuators 512 can enable processors 502 to control mechanical forces.

Processing system 500 can be distributed. For example, some components of processing system 500 can reside in a remote hosted network service (e.g., a cloud computing environment) while other components of processing system 500 can reside in a local computing system. Processing system 500 can have a modular design where certain modules include a plurality of the features/functions shown in FIG. 5. For example, I/O modules can include volatile memory and one or more processors. As another example, individual processor modules can include read-only-memory and/or local caches.

FIG. 6 illustrates a process for privacy preservation according to an embodiment of the present disclosure. For instance, at block 602, a computing entity (e.g., the server 104) determines (e.g., selects), based on public randomness (e.g., from a random beacon service 106), a set of super-clients (e.g., super-clients 108).

At block 604, the computing entity obtains a plurality of masked inputs from a plurality of client devices (e.g., clients 102). For instance, each client (e.g., each client device) can send random values to the set of super-clients determined at block 602 and use those random values to mask its private input. The clients can then send their masked inputs to the computing entity.

At block 606, the computing entity obtains, from each of the set of super-clients, aggregated random values received by the super-client from the other clients.

At block 608, the computing entity aggregates the plurality of masked inputs received from the plurality of client devices and the aggregated random values from the set of super-clients.

In the following, several embodiments of the present invention using the secure aggregation protocols with public randomness are discussed. For instance, in some embodiments, blinding (e.g., secret-sharing) is used. In other embodiments, homomorphic encryption is used. In some variations, the secure aggregation protocols can take several rounds and clients can go offline at any time.

For blinding, a random beacon service (e.g., random beacon service 106) is provided that broadcasts unbiased random seeds either on demand or at pre-defined time intervals. A Public Key Infrastructure (PKI) is also provided that distributes authentic public keys of clients (e.g., the users of system 100) and/or the server 104.

U₁ denotes the initial set of clients, and |U₁| = n. For example, U₁ denotes the clients such as all of the users of system 100. The size of U₁, i.e., the total number of clients in system 100, is denoted by n.

In a first step, the server 104 obtains the public keys of all clients from the PKI and receives the random seed Q from the random beacon service 106. For instance, the server 104 obtains the public keys of the users (e.g., users 104A-104E) of system 100 and public randomness (e.g., the random seed Q) from the random beacon service 106. The server 104 computes the set of super-clients C ← {i : F(Q; pk_i) < τ} by evaluating a PRF F keyed with the random seed Q, where τ is a protocol parameter. In other words, the server 104 determines (e.g., calculates, computes, and/or obtains) the set of super-clients C by evaluating the PRF F, keyed with the random seed Q, on the public key pk_i of each client, and selects as super-clients those clients whose PRF output falls below the threshold τ.

For example, if the range of F is [0,1] (i.e., the output of the PRF is between 0 and 1), the server 104 can set τ to 0.1 so that each client is appointed super-client with probability 0.1. For instance, if there are 100 clients in the system 100 and the protocol parameter is 0.1, then the server 104 selects about 10 clients as super-clients in expectation.
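A minimal Python sketch of the threshold-based selection C ← {i : F(Q; pk_i) < τ} follows. It assumes, purely for illustration, that the PRF F is HMAC-SHA256 keyed with the beacon seed Q and that its output is mapped into [0, 1); the helper names prf_unit and select_super_clients are hypothetical.

```python
import hashlib
import hmac


def prf_unit(seed: bytes, pk: bytes) -> float:
    """Hypothetical PRF F(Q; pk_i): HMAC-SHA256 keyed with Q, mapped into [0, 1)."""
    digest = hmac.new(seed, pk, hashlib.sha256).digest()
    # Keep the top 53 bits so the float division is exact and strictly below 1.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def select_super_clients(seed: bytes, public_keys: dict[int, bytes], tau: float) -> set[int]:
    """C <- {i : F(Q; pk_i) < tau}; each client is selected with probability about tau."""
    return {i for i, pk in public_keys.items() if prf_unit(seed, pk) < tau}
```

Because the selection depends only on Q and the public keys, the server and every client can recompute C independently and arrive at the same set.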

For each super-client i ∈ C (i.e., each super-client i in the set of super-clients C), the server 104 can also compute the list of backup clients L_i = {G_B(Q, pk_i, j)}_{1≤j≤l} using a pseudo-random generator (PRG) G_B and encoding the PRG output as a list of l clients. The range of the PRG is [n], i.e., the client set itself, {1, 2, . . . , n−1, n}. For instance, the server 104 determines the list of backup clients L_i by inputting the random seed Q, the public key pk_i, and the index j into the PRG G_B. As used herein, “i” and “j” refer to users (e.g., the clients 102), which can include a regular user, a super-client, and/or a backup client. Further, as used herein, it should be understood that the user “i” and/or the user “j” might not refer to the same user from a preceding section and/or embodiment (e.g., the user “i” may refer to multiple different users in different sections such as a super-client in one section and a backup client in another). In most embodiments, the first user may be designated “i” and additional users may be designated “j”. In other embodiments, this may be reversed.

In a second step, the super-clients obtain the public keys of every user from the PKI. For instance, the server 104 can notify the users of the system 100 that have been selected as super-clients. Then, the super-clients can request and obtain the public keys of the users of the system 100. Further, each super-client i in the set of super-clients C determines (e.g., computes and/or obtains) its list of backup clients L_i. In this way, both the server 104 and each super-client become aware of the list of backup clients for that super-client. In some instances, the server 104 computes this list and sends it to the super-client. In other instances, the server 104 and the super-client compute this list independently; because the list is determined by the randomness Q, both parties end up with the same list. Each super-client i secret-shares its secret key sk_i in a t_s-out-of-l scheme, creating shares {sk_ij}_{j∈L_i}, where l = |L_i| and t_s is a tunable parameter determining the number of shares required to reconstruct sk_i. For instance, the shares may be computed using Shamir secret sharing: the entity (e.g., the super-client) computing the shares of a secret s draws a random polynomial f(x) of degree t_s−1 such that f(0) = s, and computes the i-th share as f(i). Given t_s or more shares, f(0), that is, the secret s, can be recovered via interpolation. In operation, each super-client i thereby shares its secret key sk_i with its backup clients. For example, for five backup clients (i.e., l = 5 for the list L_i), the super-client can set t_s = 3 and provide the shares of its key to its backup clients such that any three of them suffice to reconstruct the secret key sk_i.
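The t_s-out-of-l Shamir sharing described above can be sketched in Python as follows. The field modulus (the Mersenne prime 2^127 − 1) and the function names are assumptions for illustration; in practice the secret key sk_i would first be encoded as a field element, and each share (i, f(i)) would then be encrypted for the corresponding backup client.

```python
import secrets

PRIME = 2**127 - 1  # assumed field modulus (a Mersenne prime)


def share_secret(secret: int, t: int, l: int) -> list[tuple[int, int]]:
    """Split secret into l shares such that any t of them reconstruct it."""
    # Random polynomial f of degree t - 1 with f(0) = secret.
    coeffs = [secret % PRIME] + [secrets.randbelow(PRIME) for _ in range(t - 1)]

    def f(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    # The share for backup client number i is the point (i, f(i)), i = 1..l.
    return [(i, f(i)) for i in range(1, l + 1)]


def reconstruct(shares: list[tuple[int, int]]) -> int:
    """Recover f(0) by Lagrange interpolation from at least t shares."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * (-xm) % PRIME
                den = den * (xj - xm) % PRIME
        secret = (secret + yj * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

For the example above, share_secret(sk, 3, 5) yields five shares, and any three of them passed to reconstruct return sk.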

For instance, for each backup client j, the super-client i can encrypt the share sk_ij under the public key of backup client j and send the ciphertext to backup client j. For example, using the public key of the backup client, the super-client encrypts the share sk_ij of its secret key and provides the encrypted share (i.e., the ciphertext) to the backup client.

In a third step, each client i generates k masking seeds {b_ij}_{j∈C}, one for each super-client j ∈ C. The masking seeds may be drawn at random (e.g., each masking seed may be a random value in the domain of a PRG G). For instance, each client generates k masking seeds, where k is the number of super-clients and b_ij denotes the seed that client i generates for super-client j. Then, for each super-client j, client i encrypts seed b_ij under the public key of super-client j and sends the ciphertext to super-client j. For instance, the client encrypts its masking seed for a particular super-client with the public key of that super-client and provides the encrypted masking seed (i.e., the ciphertext) to that super-client. Next, client i computes y_i = x_i + B_i, where B_i = Σ_{j∈C} G(b_ij) and G is a PRG, and sends y_i to the server. In other words, B_i is the sum of the outputs of G evaluated at all b_ij. For instance, if the entity computing this sum is a first client and clients 3, 7, and 8 are super-clients, then client 1 draws three masking seeds, b_13, b_17, and b_18. Hence, B_1 = G(b_13) + G(b_17) + G(b_18). In some variations, the client determines the masked input y_i based on the private input x_i and randomly drawn information B_i, which it obtains by evaluating the PRG G on the masking seed generated for each super-client in C and summing the results.
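Client-side masking in the third step can be sketched in Python as follows. The PRG G is instantiated here, as an assumption, by SHA-256 in counter mode; inputs are vectors over Z_{2^32}; and the encryption of each seed b_ij for super-client j is omitted. G, MOD, and mask_input are hypothetical names.

```python
import hashlib
import secrets

MOD = 2**32  # input vectors and masks live in Z_{2^32} (an assumption)


def G(seed: bytes, length: int) -> list[int]:
    """Hypothetical PRG: expands a seed into `length` values in Z_MOD (SHA-256, counter mode)."""
    return [
        int.from_bytes(hashlib.sha256(seed + ctr.to_bytes(4, "big")).digest()[:4], "big")
        for ctr in range(length)
    ]


def mask_input(x: list[int], super_clients: list[int]) -> tuple[list[int], dict[int, bytes]]:
    """Draw one seed b_ij per super-client j and return (y_i, seeds) with y_i = x_i + B_i."""
    seeds = {j: secrets.token_bytes(16) for j in super_clients}
    y = [v % MOD for v in x]
    for b in seeds.values():
        # Add G(b_ij) element-wise; summing over all j yields B_i = sum_j G(b_ij).
        y = [(v + g) % MOD for v, g in zip(y, G(b, len(x)))]
    # y is sent to the server; seeds[j] is sent (encrypted) to super-client j.
    return y, seeds
```

Mirroring the example in the text, mask_input(x, [3, 7, 8]) draws the three seeds b_13, b_17, and b_18 for client 1.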

In a fourth step, which is performed after the third step and accounts for clients that may have dropped out during it, denote by U₂ ⊆ U₁ the set of clients that completed step three above. That is, U₂ is the subset of U₁ (the set of all clients in system 100) consisting of the clients that completed the third step. The super-clients receive the masking seeds from the clients in this subset. Then, each super-client j computes B_j^sum = Σ_{i∈U₂} G(b_ij) and sends B_j^sum to the server 104. B_j^sum is computed similarly to B_i, described above, but sums over clients rather than over super-clients: it is the aggregated random value that super-client j received from the clients. Each super-client sends its aggregated random value to the server 104, so that the server obtains a plurality of aggregated random values.

Next, denote by C₁ ⊆ C the set of super-clients that completed step four above, and let r = |C₁|. That is, C₁ is the subset of super-clients that completed step four, and r is the number of super-clients in that subset.

In some examples, if r < k, then a fifth step is carried out. That is, if fewer than k super-clients completed step four, a fifth step is carried out, where k is the size of the set of super-clients and is a tunable parameter. In the fifth step, the server 104 sends the list of dropped-out super-clients C \ C₁ to the backup clients of all super-clients. Here, “\” denotes set difference, so C \ C₁ consists of all super-clients that are not in C₁. The server 104 sends this list to the backup clients L_i of the super-clients. Each backup client j of a dropped super-client i decrypts its share sk_ij of super-client i's secret key and sends it to the server. For each dropped super-client i, if the server 104 receives at least t_s shares of the secret key, the server 104 reconstructs the secret key sk_i of the dropped-out super-client. Denote by C_rec the set of super-clients whose secret key has been reconstructed, and let r_d = |C_rec|. Using the reconstructed keys, the server 104 then decrypts and obtains the blinding seeds {b_ji}_{j∈U₂} sent to each dropped super-client i, and thereby obtains B_i^sum for each super-client i ∈ C_rec.

In some instances, if r ≥ k or r + r_d ≥ k, then a sixth step is carried out. In the sixth step, the server 104 computes the sum of blindings as Σ_{i∈U₂} B_i = Σ_{j∈C} B_j^sum and outputs Σ_{i∈U₂} x_i = Σ_{i∈U₂} y_i − Σ_{i∈U₂} B_i.
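The following Python sketch runs steps three through six end to end with scalar inputs and shows that the masks cancel even when a super-client drops out. It assumes a SHA-256-based PRG G over Z_{2^32}, omits all encryption, and simplifies step five: for a dropped super-client, the server is simply assumed to have recovered the seeds b_ij sent to it (in the protocol, by reconstructing that super-client's secret key from at least t_s backup shares). All names are hypothetical.

```python
import hashlib
import secrets

MOD = 2**32  # inputs, masks, and sums live in Z_{2^32} (an assumption)


def G(seed: bytes) -> int:
    """Hypothetical PRG with a single output in Z_MOD, built from SHA-256."""
    return int.from_bytes(hashlib.sha256(seed).digest()[:4], "big")


def aggregate(inputs: dict[int, int], super_clients: list[int], dropped: set[int]) -> int:
    # Step three: client i draws a seed b_ij per super-client j and sends y_i = x_i + B_i.
    seeds = {i: {j: secrets.token_bytes(16) for j in super_clients} for i in inputs}
    y = {i: (x + sum(G(b) for b in seeds[i].values())) % MOD for i, x in inputs.items()}
    # Step four: each online super-client j returns B_j^sum = sum over i in U_2 of G(b_ij).
    b_sum = {j: sum(G(seeds[i][j]) for i in inputs) % MOD
             for j in super_clients if j not in dropped}
    # Step five (simplified): for each dropped super-client j, the server is assumed to
    # have recovered the seeds b_ij sent to j and recomputes B_j^sum itself.
    for j in dropped:
        b_sum[j] = sum(G(seeds[i][j]) for i in inputs) % MOD
    # Step six: sum_i y_i - sum_j B_j^sum = sum_i x_i (mod MOD).
    return (sum(y.values()) - sum(b_sum.values())) % MOD
```

The server never sees an individual B_i; it only learns Σ_{j∈C} B_j^sum, which equals Σ_{i∈U₂} B_i, so only the aggregate of the inputs is revealed.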

In some embodiments of the present invention, homomorphic encryption is used. For instance, in contrast to the embodiments using blinding, an additively homomorphic encryption scheme and a public key PK for the set of super-clients are provided such that each super-client holds a share of the corresponding decryption key. In these embodiments, the first two steps (i.e., steps one and two) described above for blinding are the same and are not repeated for brevity.

At a third step, each client i encrypts its input x_i under the super-clients' public key PK and sends the ciphertext, denoted y_i, to the server 104. That is, the server 104 obtains from each client a ciphertext that encrypts the client's private input x_i under the super-clients' public key PK.

Next, denote by U₂ ⊆ U₁ the set of clients that completed step three above. That is, U₂ is the subset of U₁ (the set of all clients in system 100) consisting of the clients that completed the third step. In step four, the server 104 aggregates all ciphertexts received from clients in U₂ as y_sum = Σ_{i∈U₂} y_i, where the sum denotes the homomorphic addition of ciphertexts, and provides y_sum to the super-clients. Because the encryption scheme is additively homomorphic, y_sum is an encryption of Σ_{i∈U₂} x_i.

At step five, each super-client uses its share of the decryption key to partially decrypt y_sum and sends the partially decrypted ciphertext to the server 104. The decryption key might not be public, but is tied to (i.e., associated with) the public key PK.

Next, denote by C₁ ⊆ C the set of super-clients that completed step five above, and let r = |C₁|. That is, C₁ is the subset of super-clients that completed step five, and r is the number of super-clients in that subset.

If r < k, then step six is performed. That is, if fewer than k super-clients completed step five, a sixth step is carried out. In step six, the server 104 sends the list of dropped-out super-clients C \ C₁ to the backup clients of all super-clients. Here, “\” denotes set difference, so C \ C₁ consists of all super-clients that are not in C₁. The server 104 sends this list to the backup clients L_i of the super-clients. Each backup client j of a dropped super-client i decrypts its share sk_ij of super-client i's secret key and sends it to the server 104. For each dropped super-client i, if the server receives at least t_s shares of the secret key, the server 104 reconstructs the secret key sk_i. Denote by C_rec the set of super-clients whose secret key has been reconstructed, and let r_d = |C_rec|. The server can thus carry out the partial decryption of y_sum (i.e., the aggregation of the encrypted inputs) on behalf of each super-client in C_rec.

If r + r_d = k (with r_d = 0 if step six was not performed), then step seven is performed. In step seven, the server 104 decrypts y_sum and obtains Σ_{i∈U₂} x_i, i.e., the sum of the individual inputs of all clients in U₂.
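The homomorphic-encryption variant can be illustrated with exponential ElGamal, an additively homomorphic scheme chosen here as an assumption (the text above does not name a scheme). In this Python sketch the decryption key is split additively among k super-clients, so all k partial decryptions are needed, including those the server computes on behalf of super-clients in C_rec. The toy group, the function names, and the brute-force recovery of a small sum are for illustration only and are not secure.

```python
import secrets

# Toy, INSECURE group: p = 2q + 1 is a safe prime and g generates the order-q subgroup.
P, Q, G = 2039, 1019, 4


def keygen(k: int) -> tuple[int, list[int]]:
    """Joint public key PK = g^(sk_1 + ... + sk_k); super-client j holds share sk_j."""
    shares = [secrets.randbelow(Q) for _ in range(k)]
    return pow(G, sum(shares) % Q, P), shares


def encrypt(pk: int, m: int) -> tuple[int, int]:
    """Exponential ElGamal: Enc(m) = (g^r, g^m * PK^r), additively homomorphic in m."""
    r = secrets.randbelow(Q)
    return pow(G, r, P), pow(G, m, P) * pow(pk, r, P) % P


def add(c1: tuple[int, int], c2: tuple[int, int]) -> tuple[int, int]:
    """Homomorphic addition: the component-wise product encrypts the sum of plaintexts."""
    return c1[0] * c2[0] % P, c1[1] * c2[1] % P


def partial_decrypt(ct: tuple[int, int], sk_j: int) -> int:
    """Partial decryption by super-client j (or by the server using a recovered key)."""
    return pow(ct[0], sk_j, P)


def combine(ct: tuple[int, int], partials: list[int], max_sum: int) -> int:
    """Remove all k partial decryptions to get g^m, then brute-force the small exponent m."""
    denom = 1
    for d in partials:
        denom = denom * d % P
    g_m = ct[1] * pow(denom, -1, P) % P
    for m in range(max_sum + 1):
        if pow(G, m, P) == g_m:
            return m
    raise ValueError("sum exceeds max_sum")
```

The server folds the client ciphertexts y_i into y_sum with add, collects one partial_decrypt result per super-client, and calls combine to obtain Σ_{i∈U₂} x_i.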

Embodiments of the present invention provide for the following improvements and advantages over existing computer systems and computer networks specially adapted and programmed for aggregation:

- 1. Masking, by a client of a distributed application, a client input with a set of client-drawn random values and sending, by the client, the masked input to the server, and sending, by the client, each of the client-drawn random values to a set of selected super-clients, chosen by means of public randomness.
- 2. Adding, by each of the selected super-clients, the random values received from the other clients and sending the result to the server.
- 3. Adding, by the server, the masked inputs received from clients and the sum of random values received from the selected super-clients so as to compute the sum of the client inputs.
- 4. Providing for low communication overhead in terms of total number of messages and in terms of average number of messages per client, thereby saving computational power, computation time, and/or computational resources for performing the aggregations, while also decreasing the load on the computer network.

In an embodiment, the present invention provides a method for single-server secure aggregation comprising the steps of:

- 1. Selecting, by means of public randomness, a set of super-clients.
- 2. Sending, by each client, random values to the set of selected super-clients of step 1 and using the random values to mask the client private input.
- 3. Sending, by each client, the masked input to the server.
- 4. Sending, by each selected super-client of step 1, the sum of the random values received from the other clients to the server. For instance, super-client j may send a message to the server. For the blinding embodiment(s) described above, the message is B_j^sum. For the homomorphic encryption embodiment(s), the message is a partial decryption of y_sum.
- 5. Adding, by the server, the masked inputs received from the clients and the aggregated random values received from the selected super-clients of step 1. For instance, note that Σ_{i∈U₂} B_i = Σ_{j∈C} B_j^sum.

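Hence,

$$\sum_{i\in U_2} y_i - \sum_{j\in C} B_j^{\mathrm{sum}} = \sum_{i\in U_2} x_i + \sum_{i\in U_2} B_i - \sum_{j\in C} B_j^{\mathrm{sum}} = \sum_{i\in U_2} x_i + \sum_{i\in U_2} B_i - \sum_{i\in U_2} B_i = \sum_{i\in U_2} x_i.$$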

For instance, each client may mask its input with one random value per super-client and share each of those random values with the corresponding super-client. The sum of the masked user inputs thus includes one random value per client per super-client. Each super-client adds the random values it gets from the clients and sends the result to the server. The sum of these aggregated random values (e.g., of the B_j^sum values) at the server 104 is likewise the sum of one random value per client per super-client. Hence, the two quantities cancel out and the server 104 determines (e.g., learns) the sum of the individual inputs.

Embodiments of the present invention thus provide for general improvements to computers in machine learning systems to provide for integrity verification for secure aggregation by a server. Moreover, embodiments of the present invention can be practically applied to use cases to effect further improvements in a number of technical fields including, but not limited to, medical (e.g., digital medicine, personalized healthcare, AI-assisted drug or vaccine development (AIDD or Oncoimmunity), etc.) and smart cities (e.g., automated traffic or vehicle control, smart districts, smart buildings, smart industrial plants, smart agriculture, energy management, etc.), forecasting, and in particular any application that would benefit from secure aggregation.

In the following, further background and description of exemplary embodiments of the present invention, which may overlap with some of the information provided above, are provided in further detail. To the extent the terminology used to describe the following embodiments may differ from the terminology used to describe the preceding embodiments, a person having skill in the art would understand that certain terms correspond to one another in the different embodiments. For example, herein, clients may also be referred to as users, super-clients can be referred to as committee members, and the private inputs of the clients provided to the server may also be referred to as hidden, noisy, blinded or masked. Features described below can be combined with features described above in various embodiments.

Secure aggregation (SA) can be a key component of privacy-friendly federated learning applications, where the server learns the sum of many user-supplied gradients while individual gradients are kept private. State-of-the-art SA protocols can protect individual inputs with zero-sum random shares that are distributed across users, have a per-user overhead that is logarithmic in the number of users, and take more than 5 rounds of interaction.

An example embodiment of the present invention provides LIghtweight Secure Aggregation (LiSA), an SA protocol that leverages a source of public randomness to minimize per-user overhead and the number of rounds. In particular, LiSA can use two rounds and, for most of the users, has a communication overhead that is asymptotically equal to that of a non-private protocol (one where inputs are provided to the server in the clear). The example embodiment of LiSA can use the public randomness to select a subset of the users, a committee of super-clients, that aids the server in recovering the aggregated input. Users blind their individual inputs with random values shared with each of the super-clients, and each super-client provides the server with an aggregate of the randomness shared with each user. As long as at least one super-client is honest, the server cannot learn individual inputs but only the sum of threshold-many inputs.

Secure aggregation allows a set of parties to compute a linear function (e.g., the sum) of their inputs while keeping each party's input private. Recently, secure-aggregation protocols have been proposed as a privacy-preserving mechanism for federated learning, where a large number of users and a central server train a joint model by leveraging user-private gradients.

Secure aggregation requires users to provide a noisy version of their inputs to the server so that, when all noisy inputs are aggregated, the noise cancels out and the server learns the aggregated gradient. An SA protocol should minimize the overhead at the user and the number of rounds, since in many federated learning applications users are devices with limited computing power and erratic online behavior. This is, for example, the case for federated learning used to train voice recognition or text prediction models on mobile phones.

For instance, assuming the availability of a random beacon service, i.e., a source of public randomness, efficient secure aggregation protocols for federated learning can be designed. LiSA is a secure aggregation protocol that leverages a public random beacon to minimize overhead. For example, LiSA can protect user inputs with random noise. To this end, LiSA uses the random beacon at each learning epoch to select a small subset of the users, namely a committee of super-clients, that hold the seeds necessary to remove the noise from the inputs provided by the users. The technique used to add and remove noise from the inputs can be such that, as long as one super-client is honest, a compromised server cannot learn individual user inputs but only the sum of threshold-many inputs. The random choice of super-clients ensures, with high probability, that at least one super-client is honest, even though a fraction of the users may be compromised. Users of LiSA can add noise to their inputs with randomness derived from a non-interactive key agreement with each of the super-clients. Hence, super-clients can provide the server with the same aggregated randomness added by the clients, so that the aggregated inputs can be de-noised. In order to protect individual user inputs, honest super-clients must not share with the server the noise used to blind an individual input, but rather provide the server with aggregated noise, so that the server can de-noise only aggregated inputs. LiSA further uses the random beacon to select, for each super-client, a set of backup clients uniformly at random. The backup clients of a super-client allow the server to recover the aggregated randomness, shared by the users with the super-client, in the event that the super-client goes offline (e.g., due to network outages, system failure, or adversarial compromise).
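How likely the random committee is to contain at least one honest super-client can be estimated with a short calculation. Assuming, for illustration, that exactly k super-clients are drawn uniformly without replacement from n users of which γn are corrupt (and ignoring dropouts), the probability that every super-client is corrupt is hypergeometric: binom(γn, k) / binom(n, k). The function name below is hypothetical.

```python
from math import comb


def prob_all_corrupt(n: int, corrupt: int, k: int) -> float:
    """Probability that a uniformly random k-subset of n users contains only corrupt
    users, when `corrupt` of the n users are corrupt (hypergeometric)."""
    return comb(corrupt, k) / comb(n, k)
```

For example, with n = 1000 users, γ = 0.1 (100 corrupt users), and k = 10 super-clients, prob_all_corrupt(1000, 100, 10) is below 10^-10, and the probability shrinks further as k grows.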

Formally, consider a single server S and a set of n users U. Without loss of generality, each user can be identified with a member of [n], so that U = [n]. User i ∈ U holds private input x_i. The protocol should allow the server to learn y = Σ_{i∈U} x_i. Users have no direct communication channels, as all messages are routed through the server. A PKI distributes the genuine user public keys. Hence, any pair of users can establish a confidential and authenticated channel. Further, each user has a confidential and authenticated channel with the server. Protocols should be robust, that is, they should provide the aggregation output in the presence of a δ fraction of users that go offline at any time during the protocol execution.

Semi-honest as well as malicious settings can be accounted for. In the semi-honest setting, corrupted parties follow the protocol as specified; in the malicious setting, corrupted parties can act arbitrarily. In both cases, the adversary can statically corrupt the server and a γ fraction of the users. The security of the protocols can guarantee that the input of an honest user is aggregated with at least an α fraction of all other inputs before it is revealed to the server in the clear. In other words, the server can learn y = Σ_{i∈U′} x_i only for sets U′ ⊆ U with |U′| ≥ αn. Denial-of-service attacks and attacks on the integrity of the computation may be considered via other protocols. The notation is summarized in Table 1 below.

TABLE 1

Frequently used notations.

Symbol | Description
System model
U | Set of clients
U_drop | Set of dropout users
U_corr | Set of corrupt clients
n | Overall number of clients
δ | Fraction of dropped clients
γ | Fraction of corrupt clients
Q | Public randomness
Protocol parameters
C | Committee of super-clients
L_j | Backup clients set for super-client j
k | Size of the set of super-clients, k ≤ n
l | Size of backup-clients sets, l ≤ n
t | Secret sharing reconstruction threshold, t ≤ l
c̃ | Estimated number of corruptions in C
α | Minimum fraction of aggregated inputs

A specification of a semi-honest protocol is provided in process 1; changes that can be made to the semi-honest protocol so as to withstand a malicious server are discussed below. In the protocol, each user can assume up to three roles: regular user, super-client, and backup client. Each user i ∈ U can have two key pairs, (sk_i, pk_i) and (SK_i, PK_i), used for non-interactive key agreement. Key pair (SK_i, PK_i) can be used by user i when she acts as a super-client, whereas key pair (sk_i, pk_i) can be used for all other purposes.

During round 1, all users and the server can use the random beacon to select a committee of super-clients, denoted C, at random. In particular, C ← Select(Q, U, k) denotes a function that selects k users from U by means of a PRG seeded with randomness Q.

During round 2, each user i can add noise to her input x_i by using k random masks, each shared with one of the super-clients. In particular, user i can agree on a shared key with super-client j by running k_ij = KA.Agree(sk_i, PK_j) and then obtain a mask as F(k_ij), where F is a PRG. The noisy input, denoted c_i, is sent to the server.

The server can aggregate the noisy inputs as c_agg and ask each of the super-clients for the aggregated randomness required to de-noise c_agg (round 3). Letting U′₁ denote the set of users from whom the server receives a noisy input, each super-client j runs a non-interactive key agreement with each user i ∈ U′₁ to obtain {k_ji = KA.Agree(SK_j, pk_i)}_{i∈U′₁}, where k_ji equals k_ij by the correctness of the key agreement. Next, super-client j can use each of the agreed keys to obtain the shared masks by means of F, aggregate all the shared masks as ∂_j, and send ∂_j to the server. Finally, the server can obtain the aggregated masks from each super-client and subtract them from c_agg to remove the noise and obtain the aggregated output.
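The key-agreement-based masking of rounds 2 and 3 can be sketched in Python as follows. It assumes, for illustration only, Diffie-Hellman over a toy group as KA.Agree and a SHA-256-based F that maps agreed keys to masks in Z_{2^32}; the group parameters and helper names are hypothetical and insecure, and the super-clients' key pairs (SK_j, PK_j) are modeled separately from the users' key pairs (sk_i, pk_i).

```python
import hashlib
import secrets

# Toy, INSECURE Diffie-Hellman group: p = 2q + 1 is a safe prime and g has order q.
P, Q, G = 2039, 1019, 4
MOD = 2**32


def keypair() -> tuple[int, int]:
    sk = secrets.randbelow(Q - 1) + 1
    return sk, pow(G, sk, P)


def agree(sk: int, other_pk: int) -> int:
    """KA.Agree: user i and super-client j both obtain g^(sk_i * SK_j) mod p."""
    return pow(other_pk, sk, P)


def F(key: int) -> int:
    """Hypothetical F: derives a mask in Z_MOD from an agreed key."""
    return int.from_bytes(hashlib.sha256(key.to_bytes(2, "big")).digest()[:4], "big")


def run_lisa_round(inputs: dict[int, int], k: int) -> int:
    users = {i: keypair() for i in inputs}     # (sk_i, pk_i) per user
    committee = [keypair() for _ in range(k)]  # (SK_j, PK_j) per super-client
    # Round 2: c_i = x_i + sum_j F(KA.Agree(sk_i, PK_j)).
    c = {i: (x + sum(F(agree(users[i][0], PK)) for _, PK in committee)) % MOD
         for i, x in inputs.items()}
    c_agg = sum(c.values()) % MOD
    # Round 3: super-client j returns the sum over i in U'_1 of F(KA.Agree(SK_j, pk_i)).
    noise_shares = [sum(F(agree(SK, users[i][1])) for i in c) % MOD for SK, _ in committee]
    # The server removes the aggregated noise from c_agg.
    return (c_agg - sum(noise_shares)) % MOD
```

Each ∂_j (noise_shares[j] here) aggregates masks over all users, so an honest super-client never hands the server the mask of an individual user.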

The above protocol might not be robust against dropouts of super-clients. For example, when a user j ∈ C goes offline before she sends her aggregated noise share ∂_j to the server (round 3, step 6), the aggregated noisy input can no longer be de-noised. Robustness can be added by selecting a set L_j of backup clients for each super-client j, and by asking j to secret-share her secret key SK_j among them, with reconstruction threshold t ≤ l (round 2).

Similar to the committee selection, the selection of a backup set can use the randomness provided by the random beacon service. In this case, each super-client j runs ℒ_j←Select(Q∥j, 𝒰\{j}, ℓ) to select ℓ backup clients. The PRG of the Select function can be seeded with Q∥j (the random beacon output Q concatenated with the index of super-client j) so that different super-clients obtain independent sets of backup clients.
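The backup-set selection differs from the committee selection only in its seed and candidate set. The following sketch assumes that j is appended to Q as a 4-byte big-endian integer and, as before, uses `random.Random` as a non-cryptographic stand-in for the PRG; `select` and `backup_set` are hypothetical names.

```python
import hashlib
import random


def select(seed_bytes: bytes, candidates: set[int], size: int) -> list[int]:
    """Stand-in for Select: sample `size` elements with a PRG seeded by seed_bytes."""
    seed = int.from_bytes(hashlib.sha256(seed_bytes).digest(), "big")
    return sorted(random.Random(seed).sample(sorted(candidates), size))


def backup_set(q: bytes, j: int, users: set[int], ell: int) -> list[int]:
    """Hypothetical helper computing L_j <- Select(Q || j, U minus {j}, ell)."""
    # Q || j: the beacon output followed by a fixed-width encoding of j
    # (the 4-byte big-endian encoding is an assumption of this sketch).
    seed_bytes = q + j.to_bytes(4, "big")
    return select(seed_bytes, users - {j}, ell)


beacon_q = b"beacon output for this aggregation"
all_users = set(range(1, 21))
backups = {j: backup_set(beacon_q, j, all_users, 4) for j in (3, 7, 11)}
print(backups)
```

Excluding j from the candidate set ensures that a super-client never acts as one of her own backup clients.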

If a super-client j goes offline before she can send her aggregated noise share, the server can recover j's secret key by asking her backup clients for their shares (round 4). If at least t users in j's backup set ℒ_j send a share of SK_j to the server (round 5), the server can recover the key and compute the aggregated noise share (round 6), despite j being offline.
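SS.Share and SS.Rec can be instantiated, for example, with Shamir secret sharing. The sketch below is one possible instantiation, assuming SK_j is an integer smaller than the field modulus 2^127−1 and that the members of ℒ_j are numbered 1 to ℓ; the names `ss_share` and `ss_rec` are hypothetical.

```python
import secrets

PRIME = 2**127 - 1  # field modulus for the toy sharing; assumes 0 <= secret < PRIME


def ss_share(ell: int, t: int, secret: int) -> dict[int, int]:
    """Shamir t-out-of-ell sharing: evaluate a random degree-(t-1) polynomial."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(t - 1)]
    shares = {}
    for x in range(1, ell + 1):      # backup client number x gets the point (x, f(x))
        y = 0
        for c in reversed(coeffs):   # Horner evaluation of f(x)
            y = (y * x + c) % PRIME
        shares[x] = y
    return shares


def ss_rec(shares: dict[int, int]) -> int:
    """Lagrange interpolation of f(0) from at least t shares."""
    secret = 0
    for xi, yi in shares.items():
        num, den = 1, 1
        for xj in shares:
            if xj != xi:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


sk_j = secrets.randbelow(PRIME)               # stand-in for SK_j
shares = ss_share(ell=5, t=3, secret=sk_j)    # one share per backup client in L_j
# Any t = 3 shares suffice, e.g. from backup clients 1, 3 and 5.
recovered = ss_rec({x: shares[x] for x in (1, 3, 5)})
print(recovered == sk_j)                      # True
```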

Backup clients can help the server recover the secret key of a super-client if the number of dropped-out super-clients |𝒦_drop| is below k−c̃, where c̃ is the expected number of corrupted super-clients. This ensures that, even by corrupting some super-clients and recovering some secret keys via backup clients, the server is still missing at least one secret key of a super-client and therefore cannot de-noise the individual noisy inputs of victim users.

Regular users may need to be online until round 2, after which they can go offline. Super-clients and backup clients may be required to remain online until round 3 and until round 5, respectively.

For readability, each user in the example of process 1 blinds (encrypts) a single value. If users encrypt vectors of size m, each element can be encrypted separately, and aggregation can be carried out element-wise.
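Element-wise aggregation of vectors can be sketched as follows. The sketch assumes that each shared key is expanded into m masks with SHA-256 in counter mode (one possible PRG construction, not taken from the protocol), uses random 256-bit integers as placeholders for the agreed keys k*_i,j, and uses a hypothetical modulus M=2^32.

```python
import hashlib
import secrets

M = 2**32  # hypothetical modulus for every vector element


def prg_vector(key: int, m: int) -> list[int]:
    """Expand one shared key into m masks, one per element (SHA-256 in counter mode)."""
    masks = []
    for idx in range(m):
        digest = hashlib.sha256(key.to_bytes(32, "big") + idx.to_bytes(4, "big")).digest()
        masks.append(int.from_bytes(digest, "big") % M)
    return masks


n, k, m = 4, 3, 5
# shared[i][j] is a placeholder for the agreed key k*_{i,j} = k*_{j,i}.
shared = [[secrets.randbits(256) for _ in range(k)] for _ in range(n)]
inputs = [[10 * i + e for e in range(m)] for i in range(n)]

# Users: add the j-th mask vector element-wise, for every super-client j.
blinded = []
for i in range(n):
    vec = inputs[i]
    for j in range(k):
        vec = [(a + b) % M for a, b in zip(vec, prg_vector(shared[i][j], m))]
    blinded.append(vec)

# Server and super-clients: aggregate and de-noise element by element.
c_agg = [sum(column) % M for column in zip(*blinded)]
partials = [
    [sum(column) % M for column in zip(*(prg_vector(shared[i][j], m) for i in range(n)))]
    for j in range(k)
]
y = [(c - sum(p[e] for p in partials)) % M for e, c in enumerate(c_agg)]
print(y)  # [60, 64, 68, 72, 76]
```

Expanding one key per user and super-client pair keeps the number of key agreements independent of m.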

The malicious protocol can follow the same blueprint as the semi-honest one. Differences between the two protocols stem from the ability of a corrupted server to act arbitrarily. Steps 3 and 4 of round 4 of process 1, and steps 1-4 of round 5 of process 1, provide changes that can be made to the semi-honest protocol so that it withstands a malicious server.

A malicious server can declare different sets of dropped-out super-clients, each of size less than k−c̃, to different backup sets, so as to obtain their key shares. If the server manages to obtain the secret keys of all honest super-clients, it can remove the noise from any single noisy input, thereby breaking security. Round 4 and round 5 can therefore be used to run a consistency check among backup sets so that they agree on a single set of dropped-out super-clients; if the size of this set is not below the threshold k−c̃, or if the backup sets do not agree on it, the backup clients abort.
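The backup clients' checks can be sketched as two predicates. This sketch abstracts away the digital signatures: it assumes each signer's verified signature is represented by the drop set it covers, so that DS.Verif succeeding is modelled as equality with the client's own 𝒦_drop. It also assumes the check runs over the backup set ℒ_j of every super-client j∈𝒦, uses the abort rule |𝒦_drop|≥k−c̃, and the function names are hypothetical.

```python
def drop_set_acceptable(k_drop: frozenset[int], k: int, c_tilde: int) -> bool:
    """Round 4, step 2: continue only if fewer than k - c_tilde super-clients dropped."""
    return len(k_drop) < k - c_tilde


def backup_sets_consistent(
    k_drop: frozenset[int],
    signed_drop_sets: dict[int, frozenset[int]],
    backup_sets: dict[int, set[int]],
    t: int,
) -> bool:
    """Round 5, steps 3-4 with signature checks abstracted away.

    signed_drop_sets maps each signer l to the drop set covered by l's
    signature; DS.Verif(pk_l^s; sigma_l; K_drop) = TRUE is modelled as equality.
    """
    acked = {l for l, signed in signed_drop_sets.items() if signed == k_drop}
    # Every backup set L_j must contain at least t users that acknowledged K_drop.
    return all(len(members & acked) >= t for members in backup_sets.values())


backup_sets = {0: {10, 11, 12}, 1: {12, 13, 14}, 2: {15, 16, 17}, 3: {18, 19, 10}}
everyone = set().union(*backup_sets.values())

# Honest run: every backup client was told that super-client 2 dropped out.
honest = {l: frozenset({2}) for l in everyone}
print(drop_set_acceptable(frozenset({2}), k=4, c_tilde=1))             # True
print(backup_sets_consistent(frozenset({2}), honest, backup_sets, t=2))  # True

# Equivocating server: L_2 is told {2}, all other backup clients are told {3}.
split = {l: frozenset({2}) if l in backup_sets[2] else frozenset({3}) for l in everyone}
print(backup_sets_consistent(frozenset({2}), split, backup_sets, t=2))   # False
```

In the equivocation case, a client in ℒ_2 sees no acknowledgements from ℒ_0, ℒ_1 or ℒ_3 for its drop set and therefore refuses to release its share.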

Process 1: LiSA secure aggregation protocol with input privacy. Parties: Server and users 𝒰=[n]. Public parameters: input domain, fraction of drop-outs δ, fraction of corruptions γ, security parameter for cryptographic primitives λ, committee size k, backup-set size ℓ, secret sharing reconstruction threshold t, minimum fraction of aggregated inputs α, maximum number of corrupt super-clients c̃. Prerequisites: Each user i∈𝒰 has key pairs computed as (sk_i, pk_i)←KA.Gen(1^λ), (SK_i, PK_i)←KA.Gen(1^λ) and (sk_i^s, pk_i^s)←DS.Gen(1^λ); public keys (pk_i, PK_i, pk_i^s) are registered with the PKI.

With 𝒰_0:=𝒰 and r∈[6], the set of users that complete the execution of round r without dropping out is denoted by 𝒰_r, and the set of users the server knows to have completed round r is denoted by 𝒰′_r. It holds that 𝒰′_r⊆𝒰_r⊆𝒰_{r−1} for all r∈[6].

Round 1: each party (1) receives random seed Q; and (2) selects committee 𝒦←Select(Q, 𝒰, k).

Round 2: a super-client j∈𝒦 (1) selects backup set ℒ_j←Select(Q∥j, 𝒰\{j}, ℓ); (2) fetches public keys of backup clients {pk_i}_{i∈ℒ_j} from the PKI; (3) derives symmetric keys {k_j,i^e←KA(“ENC”; sk_j, pk_i)}_{i∈ℒ_j}; (4) secret-shares key SK_j: {S_j,i}_{i∈ℒ_j}←SS.Share(ℓ; t; SK_j); (5) encrypts the shares of SK_j: {E_j,i←AE.Enc(k_j,i^e, S_j,i)}_{i∈ℒ_j}; and (6) sends {(j; i; E_j,i)}_{i∈ℒ_j} to the server. A user i∈𝒰 (1) fetches public keys {PK_j}_{j∈𝒦} from the PKI; (2) derives symmetric keys {k*_i,j←KA(“PRG”; sk_i, PK_j)}_{j∈𝒦}; (3) computes blinded input c_i←x_i+Σ_{j∈𝒦} F(k*_i,j); and (4) sends c_i to the server.

Round 3: a server (1) receives encrypted key shares {(j; i; E_j,i)}_{i∈ℒ_j} from each j∈𝒦∩𝒰′₂ and sends each of them to the corresponding backup client; (2) receives blinded inputs c_i from users i∈𝒰′₂; (3) aggregates the inputs c_agg←Σ_{i∈𝒰′₂} c_i; and (4) sends 𝒰′₂ to super-clients j∈𝒦∩𝒰′₂. A super-client j∈𝒦∩𝒰′₂ (1) receives 𝒰′₂ from the server; (2) if |𝒰′₂|<αn, aborts; (3) fetches public keys {pk_i}_{i∈𝒰′₂} from the PKI; (4) derives symmetric keys {k*_j,i←KA(“PRG”; SK_j, pk_i)}_{i∈𝒰′₂}; (5) computes partial blinding ∂_j←Σ_{i∈𝒰′₂} F(k*_j,i); and (6) sends ∂_j to the server.

Round 4: a server, where 𝒦_alive:=𝒦∩𝒰′₃, 𝒦_drop:=𝒦\𝒦_alive and ℒ:=∪_{j∈𝒦} ℒ_j, (1) receives {∂_j}_{j∈𝒦_alive} from the super-clients in 𝒦_alive; (2) if |𝒦_alive|=k, jumps to step (2) of Round 6; and (3) sends 𝒦_drop to all users in ℒ. A backup client i∈ℒ (1) receives 𝒦_drop from the server; (2) if |𝒦_drop|≥k−c̃, aborts; (3) computes signature σ_i←DS.Sign(sk_i^s; 𝒦_drop); and (4) sends σ_i to the server.

Round 5: a server (1) receives {σ_i} from users in ℒ∩𝒰′₃ and forwards them to all users in ℒ∩𝒰′₃. A backup client i∈ℒ∩𝒰′₃ (1) receives the forwarded signatures {σ_l}; (2) fetches {pk_l^s} from the PKI; (3) computes 𝒰_ack={l∈ℒ∩𝒰′₄ : DS.Verif(pk_l^s; σ_l; 𝒦_drop)=TRUE}; (4) if |ℒ_j∩𝒰_ack|<t for any j∈𝒦, aborts; and (5) for any j∈𝒦_drop such that i∈ℒ_j: (a) fetches pk_j from the PKI; (b) derives symmetric key k_i,j^e←KA(“ENC”; sk_i, pk_j); (c) decrypts S_j,i←AE.Dec(k_i,j^e; E_j,i); and (d) sends S_j,i to the server.

Round 6: a server (1) for each j∈𝒦_drop: (a) collects shares {S_j,i}_{i∈ℒ_j} and aborts if it receives fewer than t shares; (b) reconstructs secret key SK_j←SS.Rec({S_j,i}); (c) derives symmetric keys {k*_j,i←KA(“PRG”; SK_j; pk_i)}_{i∈𝒰′₂}; and (d) computes the missing partial blinding ∂_j←Σ_{i∈𝒰′₂} F(k*_j,i); and (2) given {∂_j}_{j∈𝒦}={∂_j}_{j∈𝒦_alive}∪{∂_j}_{j∈𝒦_drop}, computes the output y←c_agg−Σ_{j∈𝒦} ∂_j.

When |𝒦_drop|≥k−c̃, it is possible that no honest backup client will help the server recover the missing keys.

Adding integrity protection to LiSA: LiSA can guarantee input privacy, and may treat attacks against the integrity of the aggregation as out of scope. The following example (process 2) adds an integrity protection mechanism to LiSA.

A commitment scheme (COMM.Gen, COMM.Vfy) can be hiding and binding. In particular, given a message m and randomness r, comm←COMM.Gen(m, r) is a commitment to message m; opening the commitment requires revealing both m and r. To verify whether (m, r) is a valid opening for comm, one can run COMM.Vfy(comm, m, r), which outputs TRUE or FALSE. The commitment scheme can be linearly homomorphic. That is, given comm₁←COMM.Gen(m₁, r₁) and comm₂←COMM.Gen(m₂, r₂), there is a “multiplication” operation ⊙ such that comm₁⊙comm₂=COMM.Gen(m₁+m₂, r₁+r₂). In other words, COMM.Vfy(comm₁⊙comm₂, m₁+m₂, r₁+r₂) outputs TRUE. Pedersen commitments, for example, fulfill the above properties. The details of an embodiment of the protocol are shown in process 2 through six example rounds and their steps. For instance, each user i can commit to her input x_i with randomness r_i and send the commitment comm_i to all of the super-clients. Each super-client j can then multiply the commitments received from the server so as to obtain a commitment to the sum of the inputs, C_j=⊙_{i∈𝒰′₂} comm_i. Each super-client can publish C_j, and the protocol can abort as soon as one of the published C_j differs from the others. As long as one super-client is honest, it can be ensured that (i) the server provides the same set of commitments to all super-clients, and (ii) all super-clients publish the same C_j; if either condition does not hold, the protocol can be aborted. Note that C_j is a commitment to y=Σ_{i∈𝒰′₂} x_i with randomness Σ_{i∈𝒰′₂} r_i.
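The linear homomorphism of Pedersen commitments can be checked with a toy example. The sketch assumes a tiny safe prime p=2039=2·1019+1 and the generators g=4 and h=9 of the order-1019 subgroup; these sizes are insecure, and binding would additionally require that nobody knows log_g h, which is trivially computable here. The function names mirror COMM.Gen and COMM.Vfy but are hypothetical.

```python
import secrets

# Toy Pedersen parameters (INSECURE sizes): p = 2q + 1 with q prime; g = 2^2 and
# h = 3^2 are quadratic residues, hence generators of the order-q subgroup.
P, Q = 2039, 1019
G, H = 4, 9


def comm_gen(m: int, r: int) -> int:
    """COMM.Gen(m, r) = g^m * h^r mod p, with m and r taken mod q."""
    return pow(G, m % Q, P) * pow(H, r % Q, P) % P


def comm_vfy(comm: int, m: int, r: int) -> bool:
    """COMM.Vfy: recompute the commitment from the opening (m, r)."""
    return comm == comm_gen(m, r)


def comm_mul(c1: int, c2: int) -> int:
    """The homomorphic operation: multiply commitments in the group."""
    return c1 * c2 % P


m1, m2 = 5, 11
r1, r2 = secrets.randbelow(Q), secrets.randbelow(Q)
combined = comm_mul(comm_gen(m1, r1), comm_gen(m2, r2))
print(comm_vfy(combined, m1 + m2, r1 + r2))  # True
```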

At the same time, each client i can secret-share the randomness r_i used to create comm_i across all of the super-clients. Hence, super-client j receives r_i,j such that r_i=Σ_{j∈𝒦} r_i,j; as long as one super-client is honest, r_i is not leaked to the adversary and commitment comm_i does not leak x_i.

Further, each super-client can compute and publish the sum of the random shares it received. That is, super-client j publishes ρ_j=Σ_{i∈𝒰′₂} r_i,j, so that Σ_{j∈𝒦} ρ_j=Σ_{i∈𝒰′₂} r_i. Hence, when the server publishes the aggregate sum y, a client can accept it as valid if COMM.Vfy(C_j, y, Σ_{j∈𝒦} ρ_j) outputs TRUE.
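The integrity mechanism of the preceding paragraphs can be simulated end to end with the same toy Pedersen parameters (p=2039, q=1019, g=4, h=9; insecure). The sketch assumes that additive k-out-of-k sharing over Z_q instantiates SS.Share(k, k; r_i), that all messages are delivered honestly, and that inputs are aggregated modulo q; helper names are hypothetical.

```python
import secrets
from functools import reduce

# Toy Pedersen parameters (INSECURE): p = 2q + 1, g and h of prime order q.
P, Q, G, H = 2039, 1019, 4, 9


def comm_gen(m: int, r: int) -> int:
    return pow(G, m % Q, P) * pow(H, r % Q, P) % P


def comm_vfy(comm: int, m: int, r: int) -> bool:
    return comm == comm_gen(m, r)


def additive_shares(value: int, k: int) -> list[int]:
    """k-out-of-k additive sharing over Z_q: the shares sum to value mod q."""
    parts = [secrets.randbelow(Q) for _ in range(k - 1)]
    return parts + [(value - sum(parts)) % Q]


k = 3
inputs = [12, 30, 7, 51]                          # x_i of the users in U'_2
rand = [secrets.randbelow(Q) for _ in inputs]     # r_i
comms = [comm_gen(x, r) for x, r in zip(inputs, rand)]
shares = [additive_shares(r, k) for r in rand]    # shares[i][j] = r_{i,j}

# Each super-client j multiplies all commitments and sums the shares it holds.
C = [reduce(lambda a, b: a * b % P, comms, 1) for _ in range(k)]
rho = [sum(row[j] for row in shares) % Q for j in range(k)]

y = sum(inputs) % Q                               # aggregate published by the server
print(len(set(C)) == 1)                           # True: all C_j agree
print(comm_vfy(C[0], y, sum(rho)))                # True: y is accepted
print(comm_vfy(C[0], y + 1, sum(rho)))            # False: a tampered sum is rejected
```

No single ρ_j reveals any r_i, yet the sum of all ρ_j opens the aggregate commitment, which is what lets clients detect a server that reports a wrong y.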

Process 2: LiSA secure aggregation protocol with integrity. Parties: Server and users 𝒰=[n]. Public parameters: input domain, fraction of drop-outs δ, fraction of corruptions γ, security parameter for cryptographic primitives λ, committee size k, backup-set size ℓ, secret sharing reconstruction threshold t, minimum fraction of aggregated inputs α, maximum number of corrupt super-clients c̃. Prerequisites: Each user i∈𝒰 has key pairs computed as (sk_i, pk_i)←KA.Gen(1^λ), (SK_i, PK_i)←KA.Gen(1^λ) and (sk_i^s, pk_i^s)←DS.Gen(1^λ); public keys (pk_i, PK_i, pk_i^s) are registered with the PKI.

With 𝒰_0:=𝒰 and r∈[6], we denote by 𝒰_r the set of users that complete the execution of round r without dropping out, and by 𝒰′_r the set of users the server knows to have completed round r. It holds that 𝒰′_r⊆𝒰_r⊆𝒰_{r−1} for all r∈[6].

Round 1: each party (1) receives random seed Q; and (2) selects committee of super-clients 𝒦←Select(Q, 𝒰, k).

Round 2: a super-client j∈𝒦 (1) selects backup set ℒ_j←Select(Q∥j, 𝒰\{j}, ℓ); (2) fetches public keys of backup clients {pk_i}_{i∈ℒ_j} from the PKI; (3) derives symmetric keys {k_j,i^e←KA(“ENC”; sk_j, pk_i)}_{i∈ℒ_j}; (4) secret-shares key SK_j: {S_j,i}_{i∈ℒ_j}←SS.Share(ℓ; t; SK_j); (5) encrypts the shares of SK_j: {E_j,i←AE.Enc(k_j,i^e, S_j,i)}_{i∈ℒ_j}; and (6) sends {(j; i; E_j,i)}_{i∈ℒ_j} to the server. A user i∈𝒰 (1) fetches public keys {PK_j}_{j∈𝒦} from the PKI; (2) derives symmetric keys {k*_i,j←KA(“PRG”; sk_i, PK_j)}_{j∈𝒦}; (3) computes blinded input c_i←x_i+Σ_{j∈𝒦} F(k*_i,j); (4) sends c_i to the server; (5) computes a commitment to x_i: comm_i←COMM.Gen(x_i, r_i), where r_i is a random value; (6) computes a signature on comm_i: σ_i←DS.Sign(sk_i^s; comm_i); (7) computes k-out-of-k shares of r_i: {r_i,j}_{j∈𝒦}←SS.Share(k, k; r_i); (8) derives symmetric keys {k_i,j^s←KA(“ENC”; sk_i, PK_j)}_{j∈𝒦}; (9) encrypts the random shares of r_i: {E_i,j^s←AE.Enc(k_i,j^s, r_i,j)}_{j∈𝒦}; and (10) sends {(i; j; E_i,j^s)}_{j∈𝒦} and (comm_i, σ_i) to the server.

Round 3: a server (1) receives encrypted key shares {(j; i; E_j,i)}_{i∈ℒ_j} from each j∈𝒦∩𝒰′₂ and sends each of them to the corresponding backup client; (2) receives blinded inputs c_i from users i∈𝒰′₂; (3) aggregates the inputs c_agg←Σ_{i∈𝒰′₂} c_i; (4) sends 𝒰′₂ to super-clients j∈𝒦∩𝒰′₂; (5) receives encrypted random shares {(i; j; E_i,j^s)}_{j∈𝒦} from each i∈𝒰′₂ and sends each of them to the corresponding super-client; and (6) receives commitments and signatures {(comm_i, σ_i)}_{i∈𝒰′₂} and sends all of them to all super-clients. A super-client j∈𝒦∩𝒰′₂ (1) receives 𝒰′₂ from the server; (2) if |𝒰′₂|<αn, aborts; (3) receives commitments and signatures {(comm_i, σ_i)}_{i∈𝒰′₂}; (4) asserts that the signatures are valid: if |{i∈𝒰′₂ : DS.Verif(pk_i^s; σ_i; comm_i)=TRUE}|<|𝒰′₂|, aborts; (5) aggregates the commitments: C_j←⊙_{i∈𝒰′₂} comm_i; (6) signs the aggregate commitment and the set 𝒰′₂: σ_j^C←DS.Sign(sk_j^s; C_j∥𝒰′₂); (7) receives encrypted random shares {(i; j; E_i,j^s)}_{i∈𝒰′₂}; (8) derives symmetric keys {k_j,i^s←KA(“ENC”; SK_j, pk_i)}_{i∈𝒰′₂}; (9) decrypts the random shares {r_i,j←AE.Dec(k_j,i^s; E_i,j^s)}_{i∈𝒰′₂}; (10) computes the sum of random shares ρ_j←Σ_{i∈𝒰′₂} r_i,j; (11) signs the sum of random shares: σ_j^ρ←DS.Sign(sk_j^s; ρ_j); (12) sends ρ_j, σ_j^ρ, C_j, σ_j^C to the server; (13) fetches public keys {pk_i}_{i∈𝒰′₂} from the PKI; (14) derives symmetric keys {k*_j,i←KA(“PRG”; SK_j, pk_i)}_{i∈𝒰′₂}; (15) computes partial blinding ∂_j←Σ_{i∈𝒰′₂} F(k*_j,i); and (16) sends ∂_j to the server.

Round 4: a server, where 𝒦_alive:=𝒦∩𝒰′₃, 𝒦_drop:=𝒦\𝒦_alive and ℒ:=∪_{j∈𝒦} ℒ_j, (1) receives {(ρ_j, σ_j^ρ, C_j, σ_j^C)}_{j∈𝒦_alive} from the super-clients in 𝒦_alive; (2) receives {∂_j}_{j∈𝒦_alive} from the super-clients in 𝒦_alive; (3) if |𝒦_alive|=k, jumps to step (2) of Round 6; and (4) sends 𝒦_drop to all users in ℒ. A backup client i∈ℒ (1) receives 𝒦_drop from the server; (2) if |𝒦_drop|≥k−c̃, aborts; (3) computes signature σ_i←DS.Sign(sk_i^s; 𝒦_drop); and (4) sends σ_i to the server.

Round 5: a server (1) receives {σ_i} from users in ℒ∩𝒰′₃ and forwards them to all users in ℒ∩𝒰′₃. A backup client i∈ℒ∩𝒰′₃ (1) receives the forwarded signatures {σ_l}; (2) fetches {pk_l^s} from the PKI; (3) computes 𝒰_ack={l∈ℒ∩𝒰′₄ : DS.Verif(pk_l^s; σ_l; 𝒦_drop)=TRUE}; (4) if |ℒ_j∩𝒰_ack|<t for any j∈𝒦, aborts; and (5) for any j∈𝒦_drop such that i∈ℒ_j: (a) fetches pk_j from the PKI; (b) derives symmetric key k_i,j^e←KA(“ENC”; sk_i, pk_j); (c) decrypts S_j,i←AE.Dec(k_i,j^e; E_j,i); and (d) sends S_j,i to the server.

Round 6: a server (1) for each j∈𝒦_drop: (a) collects shares {S_j,i}_{i∈ℒ_j} and aborts if it receives fewer than t shares; (b) reconstructs secret key SK_j←SS.Rec({S_j,i}); (c) derives symmetric keys {k*_j,i←KA(“PRG”; SK_j; pk_i)}_{i∈𝒰′₂}; (d) computes the missing partial blinding ∂_j←Σ_{i∈𝒰′₂} F(k*_j,i); (e) derives symmetric keys {k_j,i^s←KA(“ENC”; SK_j; pk_i)}_{i∈𝒰′₂}; (f) decrypts the random shares {r_i,j←AE.Dec(k_j,i^s; E_i,j^s)}_{i∈𝒰′₂}; and (g) computes the sum of random shares ρ_j←Σ_{i∈𝒰′₂} r_i,j; (2) given {∂_j}_{j∈𝒦}={∂_j}_{j∈𝒦_alive}∪{∂_j}_{j∈𝒦_drop}, computes the output y←c_agg−Σ_{j∈𝒦} ∂_j; and (3) publishes y, 𝒰′₂, {(ρ_j, σ_j^ρ, C_j, σ_j^C)}_{j∈𝒦_alive} and {ρ_j}_{j∈𝒦_drop}. A user i (1) fetches {pk_j^s}_{j∈𝒦_alive} from the PKI; and (2) asserts the following: (a) |𝒦_alive|≥k−c̃; (b) C_j₁=C_j₂ for all pairs of indices j₁, j₂∈𝒦_alive; (c) DS.Verif(pk_j^s; σ_j^ρ; ρ_j)=TRUE for all j∈𝒦_alive; (d) DS.Verif(pk_j^s; σ_j^C; C_j∥𝒰′₂)=TRUE for all j∈𝒦_alive; and (e) COMM.Vfy(C_j, y, Σ_{j∈𝒦} ρ_j)=TRUE.
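The user-side assertions (a) to (e) of round 6 can be sketched as a single predicate. The sketch reuses the toy Pedersen parameters (p=2039, q=1019, g=4, h=9; insecure) and replaces DS with a hypothetical tuple-based stand-in (`toy_sign`, `toy_verify`) that offers no security; `accept_output` and the scenario values are likewise illustrative assumptions.

```python
# Toy Pedersen parameters (INSECURE): p = 2q + 1, g and h of prime order q.
P, Q, G, H = 2039, 1019, 4, 9


def comm_gen(m: int, r: int) -> int:
    return pow(G, m % Q, P) * pow(H, r % Q, P) % P


def accept_output(y, u2, k, c_tilde, alive, rho_drop, verify) -> bool:
    """User-side checks (a)-(e) of round 6.

    alive maps j in K_alive to (rho_j, sig_rho_j, C_j, sig_C_j);
    rho_drop maps j in K_drop to the rho_j recomputed by the server;
    verify(j, sig, msg) stands in for DS.Verif(pk_j^s; sig; msg).
    """
    if len(alive) < k - c_tilde:                                   # (a)
        return False
    commitments = {c_j for (_, _, c_j, _) in alive.values()}
    if len(commitments) != 1:                                      # (b)
        return False
    for j, (rho_j, sig_rho, c_j, sig_c) in alive.items():
        if not verify(j, sig_rho, rho_j):                          # (c)
            return False
        if not verify(j, sig_c, (c_j, u2)):                        # (d) C_j || U'_2
            return False
    total_rho = sum(v[0] for v in alive.values()) + sum(rho_drop.values())
    (c,) = commitments
    return c == comm_gen(y, total_rho)                             # (e)


def toy_sign(j, msg):
    return ("sig", j, msg)


def toy_verify(j, sig, msg):
    return sig == ("sig", j, msg)


# Two users with x = 5, 6 and r = 100, 200; k = 3 super-clients, c_tilde = 1.
u2 = frozenset({1, 2})
c = comm_gen(5, 100) * comm_gen(6, 200) % P
alive = {j: (rho, toy_sign(j, rho), c, toy_sign(j, (c, u2)))
         for j, rho in ((0, 50), (1, 150))}
rho_drop = {2: 100}  # rho values sum to 300 = r_1 + r_2
print(accept_output(11, u2, 3, 1, alive, rho_drop, toy_verify))  # True
print(accept_output(12, u2, 3, 1, alive, rho_drop, toy_verify))  # False
```

Signing C_j together with 𝒰′₂ in step (d) binds each published commitment to the set of users whose inputs it covers, so the server cannot silently change that set.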

While subject matter of the present disclosure has been illustrated and described in detail in the drawings and foregoing description, such illustration and description are to be considered illustrative or exemplary and not restrictive. Any statement made herein characterizing the invention is also to be considered illustrative or exemplary and not restrictive as the invention is defined by the claims. It will be understood that changes and modifications may be made, by those of ordinary skill in the art, within the scope of the following claims, which may include any combination of features from different embodiments described above.

The terms used in the claims should be construed to have the broadest reasonable interpretation consistent with the foregoing description. For example, the use of the article “a” or “the” in introducing an element should not be interpreted as being exclusive of a plurality of elements. Likewise, the recitation of “or” should be interpreted as being inclusive, such that the recitation of “A or B” is not exclusive of “A and B,” unless it is clear from the context or the foregoing description that only one of A and B is intended. Further, the recitation of “at least one of A, B and C” should be interpreted as one or more of a group of elements consisting of A, B and C, and should not be interpreted as requiring at least one of each of the listed elements A, B and C, regardless of whether A, B and C are related as categories or otherwise. Moreover, the recitation of “A, B and/or C” or “at least one of A, B or C” should be interpreted as including any singular entity from the listed elements, e.g., A, any subset from the listed elements, e.g., A and B, or the entire list of elements A, B and C.


CROSS-REFERENCE TO RELATED APPLICATION

Provisional Applications (1)