The discussion below is merely provided for general background information and is not intended to be used as an aid in determining the scope of the claimed subject matter.
Aspects of the invention relate to providing security against user collusion in data analytics using random group selection, and, in particular, though not exclusively, to methods and systems for random group selection, methods and systems for secure computation of inputs using random group selection, client devices and server devices adapted to execute such methods and a computer program product adapted to execute such methods.
Currently the number of data processing schemes that require the processing of large sets of privacy-sensitive data is growing exponentially. Commercial, governmental and academic organizations make decisions or verify or disprove models, theories or hypotheses using big data analytics. These techniques generally include collecting, organizing and analysing large sets of privacy-sensitive data in order to discover descriptive, predictive or even prescriptive insights about a group or groups of people, organisations or companies. This information may be of substantial commercial value. For example, a recommender system may include one or more network servers that run an algorithm to generate personal suggestions for individual users on the basis of data of a large group of users. The data is collected by client applications running on (mobile) devices of the users and transmitted to the server using a suitable client-server protocol.
Data processing schemes used in big data analytics are based on multiple sets of privacy-sensitive user inputs, wherein each set of user-inputs may be provided by a selected group of users, wherein the data processing may include computations, e.g. one or more arithmetical operations such as addition, multiplication, comparison, etc., of the collected data. Privacy-preserving protocols are used in order to eliminate or at least limit information leakage during the processing and transmission of the data. Typically, these techniques include obfuscating the privacy-sensitive data prior to the data processing and communication, which is typically performed in an untrustworthy environment.
WO2013066176 and WO2013066177 describe a privacy-preserving protocol for use in a recommender system wherein inputs from selected groups of users may be aggregated and processed in the encrypted domain using a cryptographic privacy-preserving protocol which may include the use of a homomorphic cryptographic system. A problem with such an approach, however, is that despite the use of cryptographic techniques, information leakage is still possible. For example, Kononchuk et al. (“Privacy-preserving user data oriented services for groups with dynamic participation”, in Computer Security (ESORICS), Lecture Notes in Computer Science, 8134, Springer, Berlin, 2013) have shown that repeating such a protocol several times with groups of colluding users may cause information to leak away.
Kononchuk et al. suggested the so-called random user selection approach to cope with the problem of user collusion leading to information leakage. In this approach, a random subset of users is generated for the computation of an output function (e.g. computing suggestions in a recommendation system) based on a random bit vector, wherein each element of the vector indicates whether a particular user is selected or not for computing a current group service. This selection vector is kept hidden from all parties, including the server that is computing the aggregation. The use of the random user selection approach is further described in the article by Veugen et al., “Improved privacy of dynamic group services”, EURASIP journal on information security, March 2017.
In Kononchuk et al. the random bit generation is solved by having each user randomly generate a permutation of length N in cleartext and encrypt the thus generated permutation. Then, the N encrypted permutations are concatenated in the encrypted domain, forming one joint encrypted random permutation, which is used to permute an initial vector containing exactly t bits set to one. One problem associated with this solution is that when the number of permutations N becomes large (e.g. a random selection from a set of a million users or more) the permutation process as described by Kononchuk et al. becomes communicationally and computationally complex. Random bit generation by means of permutation matrices requires N matrix multiplications, wherein each matrix multiplication requires exchange of information between the participating client devices. In practical implementations, however, client devices are heterogeneous devices, typically mobile devices having limited resources. Hence, the communication and computational complexity of the protocol should be minimized without compromising the security of the protocol.
Generally, there is a need in the art for improved methods and systems for privacy-preserving random group selection. In particular, there is a need in the art for methods and systems that enable secure random selection of a group of client devices and secure aggregation of privacy sensitive data of a randomly selected group of client devices. This efficiency is especially important for data analytics applications, as more efficient algorithms allow for e.g. recommender systems to be more responsive to changes in user behavior; and packaging of more information, such as user behavior history, leads to more accurate predictions.
This Summary and the Abstract herein are provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary and the Abstract are not intended to identify key features or essential features of the claimed subject matter, nor are they intended to be used as an aid in determining the scope of the claimed subject matter. The claimed subject matter is not limited to implementations that solve any or all disadvantages noted in the Background.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit”, “module” or “system.” Functions described in this disclosure may be implemented as an algorithm executed by a microprocessor of a computer. Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied, e.g., stored, thereon.
Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by, or in connection with, an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by, or in connection with, an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including, but not limited to, wireless, wireline, optical fiber, cable, RF, etc., or any suitable combination of the foregoing. Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java™, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
Aspects of the present invention are described below with reference to exemplary flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor, in particular a microprocessor or central processing unit (CPU), of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer, other programmable data processing apparatus, or other devices create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In an aspect, the invention may relate to a method for secure computation of inputs of t client devices which are randomly selected from a set of N client devices.
In an embodiment, the method may comprise: client devices i (i=1, . . . , N) of the set of N client devices generating, preferably jointly, a random binary vector b of weight t in an obfuscated domain, the generating including using random numbers to randomly swap bit values bi at different positions i (i=1, . . . , N) in the binary vector; each client device i (i=1, . . . , N) transforming an input value xi into the obfuscated domain and determining the product bi·xi of bit value bi at position i of the random binary vector and the input value xi in the obfuscated domain; and, performing a secure computation in the obfuscated domain on the basis of the products bi·xi (i=1, . . . , N).
In an embodiment, bit value bi at position i in the random binary vector b may signal if client device i is selected or not. For example, a bit value one at position i in the binary vector may indicate that client device i has been selected.
Hence, the method steps are executed in the obfuscated domain, which means that the operations are executed based on obfuscated values. For example, obfuscated random numbers (e.g. encrypted or secret-shared random numbers) are used to randomly swap obfuscated bit values bi (e.g. encrypted or secret-shared bit values) at different positions i (i=1, . . . , N) in the obfuscated random binary vector. Similarly, obfuscated input values xi and obfuscated bit values bi are used to determine the obfuscated product bi·xi, which is used to perform secure computations.
In an embodiment, the obfuscated domain may be an encrypted domain based on e.g. homomorphic encryption. In another embodiment, the obfuscated domain may be a secret-sharing domain based on a secret sharing scheme.
Hence, the method provides a computationally and communicationally efficient scheme for secure selection of a random group of client devices from a set of N client devices. The method relies on randomly swapping bits in a binary vector in the obfuscated domain. Additionally, the method provides a computationally and communicationally efficient scheme for performing secure computations based on the input values of the randomly selected client devices. Both the random selection and the computations are executed in the obfuscated domain so that neither the client devices, nor devices external to the client devices, e.g. a server, have knowledge about the group that is selected for the computation, or about the computation itself (e.g. the input values and/or the outcome of the computation).
The secure random bit generation protocol takes not more than 3tN secure multiplications. Compared to Kononchuk et al., this is a more efficient way of securing the privacy of users against collusion by malicious users.
Embodiments of the invention are particularly useful for big data analytics applications that involve multi-party computations in which the privacy of users needs to be secured. These applications include (but are not limited to) online or cloud-based processing of privacy-sensitive data for recommender systems, processing metrics in television and multimedia applications (e.g. audience ratings of channel usage in television broadcast or streaming applications), processing of financial data of commercial and/or institutional organisations, etc.
In an embodiment, generating a random binary vector may include: determining an initial binary vector b of weight t by setting the first t bits to one: bi=1, 1≤i≤t, and all further bits to zero: bi=0, t<i≤N; generating a random binary vector on the basis of the initial binary vector, the generating including each of the client devices determining, preferably jointly, a position n in the binary vector and a random number r in the range {n, n+1, . . . , N}; and, using the random number r to swap binary values at position n and position r of the binary vector b. In an embodiment, for each position n (1≤n≤t) in the binary vector: generating a random number r and swapping the binary value at position n and position r in the binary vector b.
In an embodiment, the swapping of the bits may be based on a delta function δin, wherein δin=1, if a random value rn=i, and δin=0, for all other i positions in the vector.
In an embodiment, the delta function δin may be defined on the basis of a polynomial function Din(x):
$$D_i^n(x) = \prod_{j=n,\, j \neq i}^{N} \frac{x-j}{i-j} = \sum_{j=0}^{N-n} d_j\, x^j$$
wherein dj represent coefficients of the Lagrange polynomial Din(x). Thus, a Lagrange interpolation function representing the delta function δin may be used to manipulate, e.g. swap, pairs of values in the binary vector in the obfuscated domain.
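As a rough illustration of the interpolation idea, the following Python sketch evaluates, in cleartext and modulo a small prime, the Lagrange basis polynomial that is one at x=i and zero at every other point of {n, . . . , N}, and so behaves as the delta function. The function name `lagrange_delta` and the modulus `P = 101` are assumptions made for this example only; an actual protocol would evaluate the polynomial on an obfuscated value of r rather than on a cleartext one.

```python
P = 101  # illustrative prime modulus; must exceed N so that all (i - j) are invertible


def lagrange_delta(i, n, N, r, p=P):
    """Evaluate the Lagrange basis polynomial for point i over the points n..N at x = r.

    Returns 1 if r == i and 0 for any other r in {n, ..., N}, all modulo p.
    """
    num, den = 1, 1
    for j in range(n, N + 1):
        if j != i:
            num = num * (r - j) % p   # numerator: product of (x - j)
            den = den * (i - j) % p   # denominator: product of (i - j)
    # Division modulo p is multiplication by the modular inverse (Python 3.8+).
    return num * pow(den, -1, p) % p


if __name__ == "__main__":
    n, N, r = 3, 8, 5
    print([lagrange_delta(i, n, N, r) for i in range(n, N + 1)])
```

For n=3 and N=8, evaluating the function at r=5 for every i in 3, . . . , 8 gives a vector that is one only at i=5.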
In an embodiment, the obfuscated domain may be based on a homomorphic cryptosystem, preferably an additively homomorphic cryptosystem.
In an embodiment, the random generated binary vector may include encrypted random bits [bi], 1≤i≤N, such that Σibi=t; wherein the client device i processes its input xi on the basis of the binary vector of encrypted bits by computing $[x_i \cdot b_i] = [b_i]^{x_i}$.
In an embodiment, the method may further include: each of the client devices i transmitting the computed value [xi·bi] to a server, preferably an aggregation server; and, the server performing the secure computation on the basis of the computed values [xi·bi] (i=1, . . . , N).
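The following Python sketch illustrates how the client-side product and the server-side aggregation could look with an additively homomorphic cryptosystem. It uses textbook Paillier encryption with tiny, insecure primes and a single decryption key, which is an assumption for illustration only: the threshold (shared-key) decryption mentioned elsewhere in this description is not modelled, and the names `keygen`, `encrypt`, `decrypt`, `client_product` and `aggregate` are hypothetical.

```python
import math
import secrets


def keygen(p=1009, q=1013):
    """Toy Paillier key pair with generator g = n + 1 (primes far too small for real use)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)                               # inverse of lam modulo n
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt m (0 <= m < n) as (1 + m*n) * r^n mod n^2."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (1 + m * n) * pow(r, n, n2) % n2


def decrypt(n, sk, c):
    lam, mu = sk
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n


def client_product(n, enc_bit, x):
    """Client i: [x_i * b_i] = [b_i]^x_i, computed without learning b_i.

    Note: for x = 0 the result is the constant 1; re-randomisation is omitted here.
    """
    return pow(enc_bit, x, n * n)


def aggregate(n, ciphertexts):
    """Server: multiplying ciphertexts adds the underlying plaintexts."""
    acc = 1
    for c in ciphertexts:
        acc = acc * c % (n * n)
    return acc


if __name__ == "__main__":
    n, sk = keygen()
    bits, xs = [1, 0, 1, 0], [5, 7, 11, 13]
    enc_bits = [encrypt(n, b) for b in bits]
    total = aggregate(n, [client_product(n, c, x) for c, x in zip(enc_bits, xs)])
    print(decrypt(n, sk, total))  # sum of the inputs whose selection bit is one
```

Only the inputs of clients whose selection bit is one contribute to the decrypted sum, while the server handles ciphertexts only.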
In an embodiment, the method may further include: each of the client devices transmitting the computed value [xi·bi] to one or more client devices; and, the one or more client devices performing the secure computation on the basis of [xi·bi] (i=1, . . . , N).
In an embodiment, the obfuscated domain may be based on a secret-sharing system, preferably the secret-sharing system being based on a modulo computation using a fixed prime p.
In an embodiment, the random generated binary vector may include secret-shared random bits bi, 1≤i≤N, such that Σibi=t; wherein the client device i determines a secret share xi using the fixed prime p, and the clients jointly compute bi·xi; and wherein the secure computation in the obfuscated domain is based on the computed values bi·xi (i=1, . . . , N).
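The text does not state which multiplication protocol the clients use to jointly compute bi·xi. As one possible illustration, the Python sketch below multiplies two additively secret-shared values modulo a fixed prime using a Beaver multiplication triple. The trusted dealer that produces the triple, the prime `P` and all function names are assumptions of this example, not features taken from the description.

```python
import secrets

P = 2_147_483_647  # fixed prime modulus (2^31 - 1), chosen for illustration


def share(value, parties):
    """Split value into additive shares that sum to value modulo P."""
    shares = [secrets.randbelow(P) for _ in range(parties - 1)]
    shares.append((value - sum(shares)) % P)
    return shares


def reconstruct(shares):
    return sum(shares) % P


def beaver_multiply(x_sh, y_sh):
    """Return shares of x*y given shares of x and y.

    A hypothetical trusted dealer supplies shares of a random triple (a, b, a*b).
    """
    k = len(x_sh)
    a, b = secrets.randbelow(P), secrets.randbelow(P)
    a_sh, b_sh, c_sh = share(a, k), share(b, k), share(a * b % P, k)
    # The parties open the masked values d = x - a and e = y - b.
    d = reconstruct([(x - ai) % P for x, ai in zip(x_sh, a_sh)])
    e = reconstruct([(y - bi) % P for y, bi in zip(y_sh, b_sh)])
    # x*y = a*b + d*b + e*a + d*e; every term is linear in the shares except d*e.
    z_sh = [(c + d * bi + e * ai) % P for c, ai, bi in zip(c_sh, a_sh, b_sh)]
    z_sh[0] = (z_sh[0] + d * e) % P  # one party adds the public term d*e
    return z_sh


if __name__ == "__main__":
    b_shares = share(1, 3)    # selection bit b_i = 1, shared among 3 parties
    x_shares = share(42, 3)   # private input x_i = 42
    print(reconstruct(beaver_multiply(b_shares, x_shares)))
```

With a selection bit of zero the reconstructed product is zero, so an unselected input does not contribute to a later sum.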
In a further aspect, the invention may relate to a system for secure computation of inputs of t client devices which are randomly selected from a set of N client devices. In an embodiment, the system may comprise: a set of client devices i (i=1, . . . , N), each client device comprising a computer readable storage medium having computer readable program code embodied therewith, the program code including an obfuscation function, preferably a homomorphic encryption function or a secret-sharing function, for performing computations in an obfuscated domain, and a processor, preferably a microprocessor, coupled to the computer readable storage medium, wherein, responsive to executing the computer readable program code, the processor is configured to perform executable operations comprising: generating a random binary vector b of weight t in an obfuscated domain, preferably an encrypted domain or a secret-sharing domain, the generating including using random numbers to randomly swap bit values bi at different positions i (i=1, . . . , N) in the binary vector; transforming an input value xi into the obfuscated domain, and determining the product bi·xi of bit value bi at position i of the random binary vector and the input value xi in the obfuscated domain; and, transmitting the computed product bi·xi to a server system or to a further client device, wherein the server or the further client device is configured to perform a secure computation in the obfuscated domain on the basis of the products bi·xi (i=1, . . . , N) computed by each client device i (i=1, . . . , N).
In a further aspect, embodiments may relate to client apparatuses. Such a client apparatus may be used in a system for secure computation. Such a system may perform secure computations based on inputs of t client apparatuses, which are randomly selected from a set of N client apparatuses. In an embodiment, the client apparatus may comprise: a computer readable storage medium having computer readable program code embodied therewith, the program code including an obfuscation function, preferably a homomorphic encryption function or a secret-sharing function, for performing computations in an obfuscated domain, and a processor, preferably a microprocessor, coupled to the computer readable storage medium, wherein, responsive to executing the computer readable program code, the processor is configured to perform executable operations.
In an embodiment, the executable operations may comprise: generating a random binary vector b of weight t in the obfuscated domain, the generating including using random numbers to randomly swap bit values bi at different positions i (i=1, . . . , N) in the binary vector; and, transforming an input value xi into the obfuscated domain and determining the product bi·xi of bit value bi at position i of the random binary vector and the input value xi in the obfuscated domain.
In an embodiment, the executable operations may further comprise: transmitting the computed product bi·xi to a server system or to a further client device, wherein the server or the further client device is configured to perform a secure computation in the obfuscated domain on the basis of the products bi·xi (i=1, . . . , N) computed by each client device i (i=1, . . . , N).
In a further aspect, embodiments may relate to methods for secure random selection of t client devices from a set of N client devices. In an embodiment, the method may comprise: determining an initial binary vector b of weight t by setting the first t bits to one: bi=1, 1≤i≤t, and all further bits to zero: bi=0, t<i≤N; each client device i (i=1, . . . , N) of the set of N client devices jointly generating a random binary vector b of weight t in an obfuscated domain on the basis of the initial binary vector b, a bit i in the random binary vector b signalling if client device i is selected or not, the generation of the random binary vector including: determining a position n in the binary vector; determining a random number r in {n, n+1, . . . , N}; and, using the random number to swap binary values at positions n and r of the binary vector b.
The secure random selection process may be used in any multi-party computation scheme that requires secure selection of t users (client devices) from a group of N users (client devices).
In an embodiment, the swapping of the binary values at positions n and r may be based on a delta function δin, wherein δin=1, if a random value rn=i, and δin=0, for all other i positions in the vector. In an embodiment, the delta function δin may be defined on the basis of a polynomial function Din(x):
$$D_i^n(x) = \prod_{j=n,\, j \neq i}^{N} \frac{x-j}{i-j} = \sum_{j=0}^{N-n} d_j\, x^j$$
wherein dj represent coefficients of the Lagrange polynomial Din(x).
In an embodiment, the obfuscated domain may be based on a homomorphic cryptosystem, preferably an additive homomorphic cryptosystem.
In an embodiment, determining a random number r may further include: each client device i generating l random bits αij, 0≤j<l, where l is such that $2^{l-1} \le N-n < 2^{l}$, and encrypting the random bits αij using the homomorphic cryptosystem; each client device computing l encrypted random bits [αj], 0≤j<l, wherein [αj]=[α1j⊕α2j⊕ . . . ⊕αNj]; and each client device i computing an encrypted random number [r] on the basis of the encrypted random bits [αj].
In an embodiment, the computation of the encrypted random number may include $[r] = \left[\sum_{j=0}^{l-1} \alpha_j \cdot 2^{j}\right] = \prod_{j=0}^{l-1} [\alpha_j]^{2^{j}}$.
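The bit-level construction of the random number can be sketched in cleartext as follows. The Python code below only shows the arithmetic (choosing the bit length l, XOR-combining the clients' bits, and composing the bits into a number); it performs no encryption, and the function names are assumptions for this example. How the resulting l-bit value is mapped into the set {n, . . . , N} is not specified in this passage and is therefore left out.

```python
import secrets
from functools import reduce
from operator import xor


def bit_length_for(N, n):
    """Return l such that 2**(l-1) <= N - n < 2**l (requires N > n)."""
    return (N - n).bit_length()


def joint_random_bits(num_clients, l):
    """Each client draws l private bits; joint bit j is the XOR of all clients' bit j."""
    contributions = [[secrets.randbelow(2) for _ in range(l)] for _ in range(num_clients)]
    joint = [reduce(xor, (c[j] for c in contributions)) for j in range(l)]
    return contributions, joint


def compose(bits):
    """Combine bits into the number sum_j bits[j] * 2**j (bits[0] is least significant)."""
    return sum(a << j for j, a in enumerate(bits))


if __name__ == "__main__":
    l = bit_length_for(N=10, n=3)           # N - n = 7, so l = 3
    _, joint = joint_random_bits(num_clients=4, l=l)
    print(l, joint, compose(joint))
```

Because each joint bit is the XOR of all contributions, it remains uniformly random as long as at least one client draws its bits honestly.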
In an embodiment, the obfuscated domain may be based on a secret-sharing system, preferably the secret-sharing system being based on a modulo computation using a fixed prime p.
In an embodiment, the determining of a random number r may further include: each client device i generating l random bits αij, 0≤j<l, where l is such that $2^{l-1} \le N-n < 2^{l}$, and transforming the random bits to the secretly-shared domain; the client devices jointly computing l secretly-shared random bits αj, 0≤j<l, wherein αj=α1j⊕α2j⊕ . . . ⊕αNj; and each client device i computing the secretly-shared random number r on the basis of the secretly-shared random bits αj. In an embodiment, the computation of the secretly-shared random number may include $r = \sum_{j=0}^{l-1} \alpha_j \cdot 2^{j}$.
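Because the composition of r from its bits is a linear function, each party can compute its share of r locally from its shares of the bits. The Python sketch below illustrates this under the assumption of additive secret sharing modulo a fixed prime; the prime `P` and the function names are hypothetical, and the joint XOR of the shared bits (which needs secure multiplications) is not shown.

```python
import secrets

P = 2_147_483_647  # fixed prime modulus, chosen for illustration


def share(value, parties):
    """Split value into additive shares that sum to value modulo P."""
    shares = [secrets.randbelow(P) for _ in range(parties - 1)]
    shares.append((value - sum(shares)) % P)
    return shares


def reconstruct(shares):
    return sum(shares) % P


def compose_shared(bit_shares):
    """bit_shares[j][k] is party k's share of bit alpha_j.

    Returns one share of r = sum_j alpha_j * 2**j per party, computed without communication.
    """
    parties = len(bit_shares[0])
    return [
        sum(bit_shares[j][k] * (1 << j) for j in range(len(bit_shares))) % P
        for k in range(parties)
    ]


if __name__ == "__main__":
    bits = [1, 1, 0, 1]                         # alpha_0 .. alpha_3, i.e. r = 1 + 2 + 8
    shared_bits = [share(b, 3) for b in bits]   # three parties
    print(reconstruct(compose_shared(shared_bits)))
```

Reconstruction is included here only to check the result; in a protocol run the shares of r would stay distributed.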
Embodiments may also relate to a program product comprising software code portions configured for, when run in the memory of a computer, executing any of the method steps described above.
Aspects of the invention will be further illustrated with reference to the attached drawings, which schematically will show embodiments. It will be understood that the invention is not in any way restricted to these specific embodiments.
The embodiments in this disclosure generally relate to privacy-preserving random group selection schemes and to multi-party computation schemes, such as those used in data analytics, that use such privacy-preserving random group selection, in particular privacy-preserving multi-party computation schemes wherein a randomly and securely selected group of client devices may jointly compute a function on the basis of privacy-sensitive user data.
A client device may be implemented on a user apparatus, e.g. a smart phone, a computer, a server, a (smart) television, a vehicle or any consumer or industrial electronic apparatus that is configured to process and/or generate private data. Client devices may typically comprise a wireless or a hardwired communication interface allowing client devices to communicate with each other or with network elements, e.g. servers, in the network. The client devices and the server may use a secure protocol, in particular a secure multi-party computation protocol based on an obfuscation scheme, e.g. an encryption scheme or secret sharing scheme, which ensures secure communication between the client devices and the aggregation server 112. The server may comprise a processor 116 adapted to process the private data in the obfuscated domain. Communication between the client devices and the server device in the network may be based on a suitable client-server protocol 120₁,₂, e.g. an HTTP protocol or the like. Further, during the execution of the secure multi-party computation protocols described in this application, client devices may exchange information with each other using a central model, e.g. a trusted server, or using a decentralized communication model on the basis of a suitable peer-to-peer protocol 121, e.g. the well-known BitTorrent protocol or a derivative thereof.
In an embodiment, when the system of
In an embodiment, the homomorphic cryptosystem may be configured as a so-called homomorphic threshold cryptosystem. Examples of homomorphic threshold cryptosystems are described in the article by Y. Desmedt and Y. Frankel, titled “Threshold cryptosystems”, CRYPTO 1989, pp. 308-315, Springer-Verlag. The threshold homomorphic cryptosystem is associated with key information 110₁,₂, including public key information, e.g. a public encryption key e that is shared by the client devices and the aggregator server, and secret key information including a secret decryption key d, which may be secret-shared among the client devices. Each client device i may be provided with a decryption share di of the secret decryption key d, wherein the decryption operation needs to be jointly performed by sufficiently many clients.
An example of a privacy-preserving computing system that is based on a threshold homomorphic cryptosystem is described in a related European patent application EP 17177653.7 with title “Privacy preserving computation protocol for data analytics”, which is herewith incorporated by reference into the description of this application.
In another embodiment, when the system of
In the system of
The secure protocol is configured as a privacy-preserving protocol allowing a server processor 116 to securely process, e.g. aggregate, private data mi of a group of (at least) t client devices randomly selected from N client devices without the risk that the server and/or client devices learn the individual private data mi and without leaking information of user inputs to sources. Selection of the client devices by the aggregation server may be realized by a selector 114.
In many big data applications, the aggregation process is repeated multiple times wherein each time the selector 114 may select a different group of client devices from the N client devices. However, as shown by Kononchuk et al., when the computing system repeats the aggregation process a number of times for different groups of users, information about the private data may leak away. For example, when the aggregator server would collude with t−1 users, it is possible to obtain the private input of any other user in this way.
This information leakage problem may be addressed by implementing the selector 114 as a secure random group selection processor. This selector may trigger the N client devices to execute a random group selection protocol. As shown in
Prior art solutions include a random group selection process that involves the execution of permutations in the encrypted domain resulting in a computation-heavy protocol, which renders the protocol not, or at least less, suitable for practical applications. It is an aim of the embodiments in this application to provide an efficient privacy-preserving random group selection process which eliminates, or at least reduces, the risk of leakage of privacy sensitive information in multi-party computation schemes, which are often used in data analytics applications. In an embodiment, such random group selection process may be used in a privacy-preserving computing system as described with reference to
The random group selection processes used in the embodiments of this application may be based on a set of bits, e.g. a binary vector bi, 1≤i≤N, which is used to identify which client devices in a set of N client devices are selected. For example, setting the k-th bit of the vector to one (i.e. bk=1) may indicate that the client device with client index i=k has been selected to participate in a multi-party computation scheme, e.g. a secure data aggregation process. Hence, randomly selecting a group of client devices includes securely and randomly setting t bits in the vector to one. The random group selection process according to the invention thus includes the generation of a binary vector of weight t, using a random bit generation method in a domain in which the bits are obfuscated, e.g. the encrypted domain or the secret-sharing domain. Here, the (Hamming) weight t of a binary vector may be defined as the sum of the bit values of the binary vector.
The random binary vector generation may include a swapping process, wherein random numbers are used to swap two bits in the binary vector. This process may include an initial step in which a binary vector of weight t is initialized by setting the first t bits to one: bi=1, 1≤i≤t, and all further bits to zero: bi=0, t<i≤N. Thereafter, a bit swapping process may be repeated t times. An example of such bit swapping protocol may look as follows.
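For n=1 to t do:
1. generate a random number r uniformly from the set {n, n+1, . . . , N};
2. swap the bit values bn and br.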
After execution of this bit swapping protocol, each bit bi is one with probability t/N. The protocol only requires the generation of t random numbers and the execution of t swaps, instead of generating N random permutations in cleartext and permuting all items N times in the encrypted domain as known from the prior art. The randomly generated vector represents a randomly selected group of client devices, wherein the bit at the k-th position in the vector indicates whether the client device with index k is part of the selected group. In order to avoid leakage of the selection process to unauthorized users, the swapping protocol is executed in an obfuscated domain. Transforming the swapping protocol to such an obfuscated domain, e.g. the encrypted domain or the secret-sharing domain, is however not a trivial exercise. This will be illustrated in more detail in the embodiments hereunder.
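The cleartext version of the swapping process can be sketched as follows. This is a minimal illustration, assuming that each iteration draws r uniformly from {n, ..., N} and swaps the bits at positions n and r, as the surrounding description suggests; the function name `random_group_vector` and the 0-based indexing are choices made for this sketch only.

```python
import random


def random_group_vector(num_clients, t, rng=random):
    """Return a 0/1 list of length num_clients containing exactly t ones."""
    # Initialisation: the first t bits are one, all further bits are zero.
    b = [1] * t + [0] * (num_clients - t)
    for n in range(t):  # 0-based counterpart of n = 1..t
        r = rng.randrange(n, num_clients)  # uniform over positions n..N-1
        b[n], b[r] = b[r], b[n]  # swap the two bits
    return b


vec = random_group_vector(10, 3)
# 1-based indices of the client devices whose bit is set.
selected = [k + 1 for k, bit in enumerate(vec) if bit]
print(vec, selected)
```

Only t random numbers and t swaps are used, and the weight of the vector stays equal to t throughout.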
In an embodiment, the bit swapping protocol may be executed in the encrypted domain using e.g. a homomorphic encryption system. In that case, a binary vector b of encrypted bits may be generated in which the first t encrypted bits may be set to one: [bi]=[1], 1≤i≤t and the rest of the encrypted bits may be set to zero: [bi]=[0], t<i≤N. Further, t encrypted random numbers [rn], 1≤n≤t, need to be generated by the system such that rn is uniformly drawn from the set {n, n+1, . . . , N}.
Thereafter, a bit swapping process may be executed in which the encrypted random numbers [rn] are used to swap encrypted bits in the binary vector.
In order to process bits of the binary vector, a suitable bit manipulation function may be used. In an embodiment, a delta function δin may be used for processing bits in the binary vector, wherein δin=1, if rn=i, and 0, for all other i positions in the vector. Hence, the delta function sets the bit at position rn in the binary vector to one, and to 0 for all other positions.
In order to transform the delta function to an obfuscated domain, the function may be defined on the basis of a Lagrange interpolation scheme wherein a function Din(x) may define the following polynomial:
Here, the coefficients dj represent coefficients of the Lagrange polynomial Din(x). Based on this function, δin can be constructed as follows: δin=Din(rn), n≤i≤N, which represents a bit on position rn in the binary vector. This function can be transformed into the encrypted domain using an additively homomorphic cryptosystem and the Lagrange coefficients dj:
$$[\delta_i^n] = \prod_{j=0}^{N-n} [r_n^{j}]^{d_j}$$
As shown by this expression, the homomorphic encryption transforms the summation into a product. This encrypted delta function may be used to process, e.g. swap, bits in the encrypted domain. This way, a bit swapping process may be executed in the encrypted domain, which includes swapping t encrypted bits of the binary vector based on encrypted random numbers.
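The polynomial itself is not reproduced in the text above. The sketch below assumes that Din(x) is the standard Lagrange basis polynomial over the nodes n, ..., N, i.e. the product of (x−j)/(i−j) over all j≠i, and evaluates it under a toy Paillier cryptosystem. The tiny primes, all function names, and the direct encryption of the powers of r (a stand-in for the secure multiplications a real protocol would need) are assumptions made for illustration only. Python 3.8 or later is assumed for modular inverses via `pow`.

```python
import math
import random

# Toy Paillier parameters (hypothetical and far too small for real use).
P, Q = 1009, 1013
N_MOD = P * Q                # plaintext modulus
N_SQ = N_MOD * N_MOD         # ciphertext modulus
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N_MOD)     # valid because the generator is N_MOD + 1


def encrypt(m):
    """Paillier encryption of m with generator g = N_MOD + 1."""
    while True:
        rnd = random.randrange(1, N_MOD)
        if math.gcd(rnd, N_MOD) == 1:
            break
    return pow(N_MOD + 1, m % N_MOD, N_SQ) * pow(rnd, N_MOD, N_SQ) % N_SQ


def decrypt(c):
    """Paillier decryption: L(c^lambda mod n^2) * mu mod n."""
    u = pow(c, LAM, N_SQ)
    return (u - 1) // N_MOD * MU % N_MOD


def lagrange_coeffs(i, n, big_n):
    """Coefficients d_0..d_{N-n} (mod N_MOD) of prod_{j!=i} (x-j)/(i-j)."""
    coeffs = [1]  # the constant polynomial 1, lowest degree first
    for j in range(n, big_n + 1):
        if j == i:
            continue
        inv = pow((i - j) % N_MOD, -1, N_MOD)
        nxt = [0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):  # multiply by (x - j) / (i - j)
            nxt[k] = (nxt[k] - c * j * inv) % N_MOD
            nxt[k + 1] = (nxt[k + 1] + c * inv) % N_MOD
        coeffs = nxt
    return coeffs


def encrypted_delta(enc_powers, coeffs):
    """Compute [delta] as the product of [r^j]^(d_j) modulo N_SQ."""
    acc = 1  # a trivial encryption of zero
    for c_pow, d in zip(enc_powers, coeffs):
        acc = acc * pow(c_pow, d, N_SQ) % N_SQ
    return acc


n, big_n, r = 2, 6, 4
enc_powers = [encrypt(pow(r, j, N_MOD)) for j in range(big_n - n + 1)]
deltas = [decrypt(encrypted_delta(enc_powers, lagrange_coeffs(i, n, big_n)))
          for i in range(n, big_n + 1)]
print(deltas)  # one entry per position i = n..N
```

The summation over j in the cleartext polynomial turns into a product of ciphertext powers, which is the transformation the expression describes.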
A process of securely generating random bits, e.g. in the form of a random binary vector, according to an embodiment of the invention may look as follows:
For n=1 to t do
In step 2.d of the protocol above, the new value of bi becomes δin+bi−δin·bi, which equals 1 if δin=1, and bi if δin=0. Since bn=1, and δin=1 exactly when i=r, we get br←bn, whereas the other bi remain unaltered. This secure random bit generation protocol takes no more than 3tN secure multiplications and t random number generations. In contrast, the generation by means of permutation matrices as described by Kononchuk et al. requires N matrix multiplications, which sums up to N^4 secure multiplications.
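The arithmetic behind this update can be checked in cleartext. The individual protocol steps are not reproduced in the text above, so the sketch below is an assumed reconstruction: it reads the former value of br as the sum of δin·bi, applies the update δin+bi−δin·bi to every position, and then writes the former br into bn. The names `one_hot`, `swap_step` and `selection_vector` are hypothetical, and plain integers stand in for encrypted bits.

```python
import random


def one_hot(r, n, num_clients):
    """Cleartext stand-in for the delta values: 1 at i == r, else 0."""
    return {i: int(i == r) for i in range(n, num_clients + 1)}


def swap_step(b, n, r):
    """One iteration using only additions and multiplications on bits.

    b is a dict with 1-based keys 1..N; it is updated in place.
    """
    delta = one_hot(r, n, len(b))
    # Select the former value of b_r without branching on r.
    old_br = sum(delta[i] * b[i] for i in delta)
    for i in delta:
        # Becomes 1 where delta is 1, stays b_i where delta is 0.
        b[i] = delta[i] + b[i] - delta[i] * b[i]
    # Assumed final step: b_n receives the former value of b_r.
    b[n] = old_br
    return b


def selection_vector(num_clients, t, rng=random):
    """Run t swap steps on the initial vector of weight t."""
    b = {i: int(i <= t) for i in range(1, num_clients + 1)}
    for n in range(1, t + 1):
        swap_step(b, n, rng.randint(n, num_clients))
    return [b[i] for i in range(1, num_clients + 1)]


print(selection_vector(8, 3))
```

Because every operation is a sum or a product of bits, each line of `swap_step` has a direct counterpart in an additively homomorphic or secret-shared setting, with the products requiring a secure multiplication.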
In the above-described random bit generation protocol, the N client devices need to jointly generate an encrypted random number r, r∈{n, n+1, . . . , N}.
In an embodiment, such encrypted random number may be generated by the client devices as follows:
If γ=1, this protocol securely generates an encrypted random number r from the set {n, n+1, . . . , N}. The computation of the comparison is known per se. An example of such computation is described in B. Schoenmakers and P. Tuyls, Practical Two-Party Computation based on the Conditional Gate, ASIACRYPT 2004, Lecture Notes in Computer Science 3329 (2004) 119-136, Springer-Verlag. Typically, when many bounded random numbers are to be generated, four executions of the protocol above per random number are sufficient.
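The retry logic implied by the condition on γ can be illustrated in cleartext. The steps of the protocol are not reproduced in the text above, so the following is a sketch under the assumption that random bits are combined into a candidate, that γ is the outcome of comparing the candidate with the size of the target range, and that the attempt is repeated when γ is not 1. The function name `bounded_random` is hypothetical.

```python
import random


def bounded_random(n, big_n, rng=random):
    """Draw r uniformly from {n, ..., N} by rejection sampling on bits.

    Returns the accepted value and the number of attempts it took.
    """
    span = big_n - n + 1
    # Smallest bit length whose range covers all span candidates.
    num_bits = max(1, (span - 1).bit_length())
    attempts = 0
    while True:
        attempts += 1
        bits = [rng.getrandbits(1) for _ in range(num_bits)]
        candidate = sum(bit << k for k, bit in enumerate(bits))
        gamma = int(candidate < span)  # outcome of the comparison
        if gamma == 1:
            return n + candidate, attempts


value, tries = bounded_random(3, 10)
print(value, tries)
```

With this choice of bit length, each attempt is accepted with probability at least one half, so only a small number of repetitions is expected per random number.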
In the above-described protocol, the client devices use a secure multiplication protocol, which is described hereunder in greater detail. In this scheme, the client devices initially hold additively homomorphic encryptions [x] and [y] of two numbers x and y, and would like to securely compute an encryption [x·y] of the product. To that end, the client devices may execute the following steps:
In the scheme above, the integers x+rx and y+ry may be safely decrypted, since the values of x and y are additively blinded by a large random number, which is unknown to each party. It can be shown that z=x·y, since $(x+r_x)\cdot(y+r_y) = xy + x\,r_y + r_x\,y + r_x r_y = xy + \sum_i z_i$.
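The blinding identity can be verified numerically. The sketch below works on plain integers modulo a hypothetical plaintext modulus and assumes one possible split of the correction terms, namely that party i contributes z_i = x·ry_i + rx_i·y + rx_i·ry, where rx and ry are the sums of the parties' random contributions. In an actual protocol these terms would be computed on ciphertexts; the function name `blinded_product` and the modulus are illustrative.

```python
import random

MODULUS = 2**61 - 1  # hypothetical plaintext modulus for this sketch


def blinded_product(x, y, num_parties=3, rng=random):
    """Recover x*y from the blinded values and the correction terms."""
    rx_parts = [rng.randrange(MODULUS) for _ in range(num_parties)]
    ry_parts = [rng.randrange(MODULUS) for _ in range(num_parties)]
    rx = sum(rx_parts) % MODULUS
    ry = sum(ry_parts) % MODULUS
    # These two values are the ones that may be decrypted safely.
    x_blind = (x + rx) % MODULUS
    y_blind = (y + ry) % MODULUS
    # Correction term of each party (assumed split, see lead-in).
    z_parts = [(x * ry_i + rx_i * y + rx_i * ry) % MODULUS
               for rx_i, ry_i in zip(rx_parts, ry_parts)]
    return (x_blind * y_blind - sum(z_parts)) % MODULUS


print(blinded_product(6, 7))
```

Summing the correction terms gives x·ry + rx·y + rx·ry, which is exactly what has to be subtracted from the product of the blinded values to leave x·y.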
In an embodiment, the above-described privacy-preserving random group selection process may be used for secure aggregation of private data.
The method allows secure computation of user inputs of t client devices, which are randomly selected from a group of N client devices. The process includes a secure selection of client devices in the obfuscated domain, so that neither the client devices, nor the server, know which inputs are used in the computation. Here, the number t may be selected based on various considerations including e.g. the computation and communication time that is needed in order to process data of the selected client devices, the trust in the network, etc. The selection includes an efficient bit swapping process. The process may be implemented using different obfuscation schemes.
In case threshold decryption is used, the aggregator may request t users to decrypt [y]. In an embodiment, this may be combined with verifiable aggregation to assure that the aggregator added all inputs. Such scheme is described in a related European patent application EP 17177653.7 with title “Privacy preserving computation protocol for data analytics”. This way, the aggregator obtains y=Σi∈Hxi, where the set H={i|bi=1} is randomly chosen, and hidden from all parties.
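The aggregate over the hidden set H can be written as a sum of products bi·xi, so that no step has to branch on which devices were selected. A minimal cleartext sketch of that arithmetic follows; the function name `aggregate_selected` and the sample values are illustrative assumptions, and in an obfuscated domain each product would be computed on encrypted or secret-shared values.

```python
def aggregate_selected(bits, inputs):
    """Sum the inputs whose selection bit is one, without branching."""
    if len(bits) != len(inputs):
        raise ValueError("bits and inputs must have the same length")
    return sum(b * x for b, x in zip(bits, inputs))


bits = [0, 1, 1, 0, 1]
inputs = [10, 20, 30, 40, 50]
print(aggregate_selected(bits, inputs))  # 100
```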
In a further embodiment, a secret-sharing based secure multi-party computation system may be used to generate the random bits. The random bits bi may be computed and are secret-shared between N client devices. If desired, it is possible to transform such a shared secret to a homomorphically encrypted bit [bi]. This way, an aggregation server may add up encrypted inputs (as described hereunder in more detail).
When using secret-sharing, as e.g. described in the article by Damgard et al., “Unconditionally secure constant-rounds multi-party computation for equality, comparison, bits and exponentiation”, each secret value x is secret-shared, which is indicated by the notation ⟨x⟩.
Similar to the above-described embodiment related to homomorphic encryption, the main problem is how to securely generate N secret-shared bits ⟨bi⟩, such that Σibi=t. Also in this case, prior art solutions such as Kononchuk et al. suggest having the client devices generate N random permutations of length N, encrypting these permutations and concatenating these encrypted permutations into one random permutation in the encrypted domain, which leads to very inefficient and computationally heavy processes. Translating the random bit generation process of the invention into the secret-sharing domain provides a very efficient and secure random group selection process.
The protocol for securely multiplying two secret-shared numbers however differs from a secure multiplication based on (threshold) homomorphic encryption. In particular, in a secret-sharing scheme, values are computed modulo a fixed prime p. Hence, having a secret-sharing ⟨x⟩ means that each party i has a (meaningless) share xi, such that Σixi mod p=x. Adding two secret-shared values simply amounts to each client device locally adding its shares. Multiplying two secret-shared values ⟨x⟩ and ⟨y⟩, however, requires a protocol and communication between client devices. As a prerequisite, the client devices should hold secret-sharings of three random numbers a, b, and c, such that c=(a·b) mod p. Then, a multiplication may look as follows:
Given the secure multiplication protocol for secret-sharings, the secure random bit generation protocol in the encrypted domain can be easily translated into the secret-share domain by using ⟨.⟩ instead of [.].
The secure random bit generation protocol in the secret-share domain may include an initialization process wherein a binary vector of secret-shared bits may be generated, in which the first t secret-shared bits may be set to one: ⟨bi⟩=⟨1⟩, 1≤i≤t and the rest of the secret-shared bits may be set to zero: ⟨bi⟩=⟨0⟩, t<i≤N.
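A multiplication with a pre-shared triple (a, b, c) is commonly realised with the technique known as Beaver multiplication. The individual steps are not reproduced in the text above, so the sketch below shows one standard variant and is not necessarily the exact sequence intended here: the values d=x−a and e=y−b are opened, and the shares of the product are derived locally from them. The prime, the helper names and the three-party setup are illustrative assumptions.

```python
import random

P = 2**61 - 1  # a Mersenne prime, chosen here only for illustration


def share(value, parties, rng=random):
    """Split value into additive shares that sum to value modulo P."""
    shares = [rng.randrange(P) for _ in range(parties - 1)]
    shares.append((value - sum(shares)) % P)
    return shares


def reconstruct(shares):
    """Recombine additive shares."""
    return sum(shares) % P


def beaver_multiply(x_sh, y_sh, a_sh, b_sh, c_sh):
    """Return shares of x*y, given shares of a triple with c = a*b."""
    # Each party publishes its share of x - a and y - b; since a and b
    # are uniformly random, the opened values reveal nothing about x, y.
    d = reconstruct([(x - a) % P for x, a in zip(x_sh, a_sh)])
    e = reconstruct([(y - b) % P for y, b in zip(y_sh, b_sh)])
    # Local computation: z = c + d*b + e*a + d*e.
    z_sh = [(c + d * b + e * a) % P for a, b, c in zip(a_sh, b_sh, c_sh)]
    z_sh[0] = (z_sh[0] + d * e) % P  # the public term is added once
    return z_sh


parties = 3
a, b = random.randrange(P), random.randrange(P)
triple = (share(a, parties), share(b, parties), share(a * b % P, parties))
z_shares = beaver_multiply(share(12, parties), share(34, parties), *triple)
print(reconstruct(z_shares))  # 408
```

Expanding c + d·b + e·a + d·e with c=a·b, d=x−a and e=y−b leaves exactly x·y, which is why only the two opened values need to be communicated.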
Thereafter, a bit swapping process may be executed in the secret-shared domain, which includes swapping t secret-shared bits, e.g. the t secret shared bits of bit value one, of the binary vector in a similar way as described above with reference to the swapping process in the encrypted domain. In an embodiment, the process may look as follows:
For n=1 to t do
In the above-described random bit generation protocol, the client devices need to jointly generate a secret-shared random number ⟨r⟩, r∈{n, n+1, . . . , N}. In an embodiment, such secret-shared random number may be generated by the client devices as follows:
If γ=1, this protocol securely generates a secret-shared random number r from the set {n, n+1, . . . , N}. Typically, when many bounded random numbers are to be generated, four executions of the protocol above per random number should suffice. In general, an addition of two encrypted values [x] and [y] is computed by [x+y]=[x]·[y], whereas secret-shared values are just (locally) added: ⟨x+y⟩=⟨x⟩+⟨y⟩. Similarly, [c·x]=[x]^c becomes ⟨c·x⟩=⟨x⟩·c, which amounts to locally multiplying (modulo p) each share with the known number c.
An advantage of such scheme is that it is no longer necessary to verify the aggregation step performed by the aggregator server. Hence, this way, computations on the secret (and randomised) user data, which are usually done by a central server, may be performed by the client devices. This enables schemes performing all kinds of computations on the user data, without being restricted to the homomorphic property of the encryption system.
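The two local operations on secret-shared values can be sketched as follows, assuming additive sharing modulo a prime p as described earlier. The prime and the helper names are illustrative choices for this sketch.

```python
import random

P = 2**61 - 1  # hypothetical prime modulus


def share(value, parties, rng=random):
    """Split value into additive shares that sum to value modulo P."""
    shares = [rng.randrange(P) for _ in range(parties - 1)]
    shares.append((value - sum(shares)) % P)
    return shares


def reconstruct(shares):
    """Recombine additive shares."""
    return sum(shares) % P


def add_shares(x_sh, y_sh):
    """Each party adds its own two shares; no communication is needed."""
    return [(x + y) % P for x, y in zip(x_sh, y_sh)]


def scale_shares(x_sh, c):
    """Each party multiplies its share by the public constant c."""
    return [(c * x) % P for x in x_sh]


x_sh, y_sh = share(5, 4), share(9, 4)
print(reconstruct(add_shares(x_sh, y_sh)))   # 14
print(reconstruct(scale_shares(x_sh, 3)))    # 15
```

Both operations act share by share, which is why they are free of interaction, in contrast to the multiplication of two secret-shared values.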
The systems and methods for privacy-preserving computation of private data as described in this application are particularly useful for implementation in data analytics applications, which require the processing of large amounts of privacy sensitive data. An example of such data analytics application may be a secure privacy-preserving recommender system, wherein a group of participants (voters) may register with the platform, and install an electronic voting application on their computer or mobile device.
The problem of preserving data privacy of users, who repeatedly join the execution of a service with different groups, is present in many data analytics applications requiring multi-party computation, including for example recommendation systems (see Z Erkin et al., Generating private recommendations efficiently using homomorphic encryption and data packing, IEEE Trans. Inform. Forensics Secur. 7(3), 1053-1066, 2012), aggregating data in smart grids, reputation systems (see e.g. J Kacprzyk, Group decision making with a fuzzy linguistic majority, Fuzzy Sets Syst. 18, 105-118, 1986), unsupervised machine learning (see e.g. F Anselmi et al., Unsupervised learning of invariant representations with low sample complexity: the magic of sensory cortex or a new framework for machine learning?, J. Theor. Comput. Sci. 633(C), 112-121, 2016, CBMM, MIT, arXiv:1311.4158v5), collective decision-making (see e.g. A Mathes, Folksonomies - cooperative classification and communication through shared metadata, Comput. Mediated Commun. 47(10), 1-28, 2004), data clustering (see e.g. Z Erkin et al., in IEEE International Workshop on Information Forensics and Security, Privacy-preserving user clustering in a social network, IEEE, Washington, 2009), and social classification (see e.g. H Kargupta et al., in ICDM, IEEE Computer Society, On the privacy preserving properties of random data perturbation techniques, IEEE, Washington, 2003, pp. 99-106). The embodiments in this disclosure can be advantageously used in such data analytics applications to eliminate or at least reduce undesired information leakage and to secure the privacy of users against collusion by malicious users during the execution of such services.
The systems and methods described in this application may be used in any data analytics application that requires an aggregated result (e.g. a sum and/or product) of private data of individual users. For example, possible data analytics applications may include the processing of medical data, e.g. online processing of clinical data or medical records of patients, processing data of voting and secret ballot schemes, processing metrics in television and multimedia applications (e.g. audience ratings of channel usage in television broadcast or streaming applications), processing of financial data of commercial and/or institutional organisations, etc.
Memory elements 604 may include one or more physical memory devices such as, for example, local memory 608 and one or more bulk storage devices 610. Local memory may refer to random access memory, or other non-persistent memory device(s) generally used during actual execution of the program code. A bulk storage device may be implemented as a hard drive or other persistent data storage device. The processing system 600 may also include one or more cache memories (not shown) that provide temporary storage of at least some program code in order to reduce the number of times program code must be retrieved from bulk storage device 610 during execution.
Input/output (I/O) devices depicted as input device 612 and output device 614 optionally can be coupled to the data processing system. Examples of input device may include, but are not limited to, for example, a keyboard, a pointing device such as a mouse, or the like. Examples of output device may include, but are not limited to, for example, a monitor or display, speakers, or the like. Input device and/or output device may be coupled to data processing system either directly or through intervening I/O controllers. A network adapter 616 may also be coupled to data processing system to enable it to become coupled to other systems, computer systems, remote network devices, and/or remote storage devices through intervening private or public networks. The network adapter may comprise a data receiver for receiving data that is transmitted by said systems, devices and/or networks to said data processing system and a data transmitter for transmitting data to said systems, devices and/or networks. Modems, cable modems, and Ethernet cards are examples of different types of network adapter that may be used with data processing system 600.
As pictured in
In one aspect, for example, data processing system 600 may represent a client data processing system. In that case, application 618 may represent a client application that, when executed, configures data processing system 600 to perform the various functions described herein with reference to a “client”. Examples of a client can include, but are not limited to, a personal computer, a portable computer, a mobile phone, or the like.
In another aspect, data processing system may represent a server. For example, data processing system may represent an (HTTP) server in which case application 618, when executed, may configure data processing system to perform (HTTP) server operations. In another aspect, data processing system may represent a module, unit or function as referred to in this specification.
The terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an”, and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising”, when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art, without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
Number | Date | Country | Kind |
---|---|---|---|
17210886.2 | Dec 2017 | EP | regional |
Entry |
---|
Erkin, Z., Veugen, T., Toft, T. et al. Privacy-preserving distributed clustering. EURASIP J. on Info. Security 2013, 4 (2013). https://doi.org/10.1186/1687-417X-2013-4. |
Z. Erkin, T. Veugen, T. Toft and R. L. Lagendijk, “Generating Private Recommendations Efficiently Using Homomorphic Encryption and Data Packing,” in IEEE Transactions on Information Forensics and Security, vol. 7, No. 3, pp. 1053-1066, Jun. 2012, doi: 10.1109/TIFS.2012.2190726. |
Z. Erkin, T. Veugen, T. Toft and R. L. Lagendijk, “Privacy-preserving user clustering in a social network,” 2009 First IEEE International Workshop on Information Forensics and Security (WIFS), London, 2009, pp. 96-100, doi: 10.1109/WIFS.2009.5386476. |
T. Veugen, R. de Haan, R. Cramer and F. Muller, “A Framework for Secure Computations With Two Non-Colluding Servers and Multiple Clients, Applied to Recommendations,” in IEEE Transactions on Information Forensics and Security, vol. 10, No. 3, pp. 445-457, Mar. 2015, doi: 10.1109/TIFS.2014.2370255. |
Z. Erkin, T. Veugen and R. L. Lagendijk, “Privacy-preserving recommender systems in dynamic environments,” 2013 IEEE International Workshop on Information Forensics and Security (WIFS), Guangzhou, 2013, pp. 61-66, doi: 10.1109/WIFS.2013.6707795. |
D Kononchuk, Z Erkin, JCA van der Lubbe, RL Lagendijk, Privacy-preserving user data oriented services for groups with dynamic participation, in Computer Security—ESORICS 2013, Lecture Notes in Computer Science, vol. 8134 (Springer, Berlin, 2013), pp. 418-442. |
Anselmi et al., Unsupervised learning of invariant representations with low sample complexity: the magic of sensory cortex or a new framework for machine learning. |
Damgard et al., “Unconditionally secure constant-rounds multi-party computation for equality, comparison, bits and exponentiation”, In: Halevi, S., Rabin, T. (eds.) TCC 2006, LNCS, vol. 3876, pp. 285-304. Springer, Heidelberg. |
Desmedt and Y. Frankel, “Threshold cryptosystems”, CRYPTO 1989, p. 308-315, Springer-Verlag. |
Erkin, et al, “Generating private recommendations efficiently using homomorphic encryption and data packing”, IEEE Trans. Inform. Forensics Secur. 7(3), 1053-1066, 2012. |
Erkin et al., in IEEE International Workshop on Information Forensics and Security, “Privacy-preserving user clustering in a social network”, IEEE, Washington, 2009. |
Extended European Search Report issued in corresponding European Application No. 17210886.2, dated Sep. 13, 2018. |
HEAP, “Permutations by Interchanges”, Computer Journal., vol. 6, No. 3, Nov. 1, 1963 (Nov. 1, 1963), pp. 293-298. |
Kargupta et al., in ICDM, IEEE Computer Society, “On the privacy preserving properties of random data perturbation techniques”, IEEE, Washington, 2003, pp. 99-106. |
Kacprzyk, “Group decision making with a fuzzy linguistic majority”, Fuzzy Sets Syst, 18, 105-118, 1986. |
Kononchuk Dmitry et al., “Privacy-Preserving User Data Oriented Services for Groups with Dynamic Participation”, (Sep. 9, 2013), Medical Image Computing and ComputerAssisted Intervention—MIGCAI 2015 : 18th International Conference, Munich, Germany, Oct. 5-9, 2015; Proceedings [Lecture Notes in Computer Science; Lect.Notes Computer], Springer International Publishing, CH. |
Veugen et al., “Improved privacy of dynamic group services”, EURASIP Journal on Information Security, Biomed Central Ltd, London, Uk, vol. 2017, No. 1, (Feb. 1, 2017), pp. 1-9. |
Mathes, “Folksonomies - Cooperative classification and communication through shared metadata”, Comput. Mediated Commun. 47(10), 1-28, 2004. |
F Anselmi et al., “Unsupervised learning of invariant representations with low sample complexity: the magic of sensory cortex or a new framework for machine learning?” Theor. Comput. Sci. 633(C), 112-121 (2016). CBMM, MIT, arXiv:1311.4158v5. |
B. Schoenmakers and P. Tuyls, “Practical Two-Party Computation based on the Conditional Gate,” ASIACRYPT 2004, Lecture Notes in Computer Science 3329 (2004) 119-136. |
Ivan Damgard et al., “A Generalisation, a Simplification and Some Applications of Paillier's Probabilistic Public-Key System” In: “Lecture Notes in Computer Science”, Jun. 5, 2001 (Jun. 5, 2001), Springer Berlin Heidelberg, Berlin, Heidelberg, XP055148869, ISSN: 0302-97-42; ISBN: 978-3-54-045234-0, vol. 1992, pp. 119-136. |
European Search Report and Written Opinion for European Patent Application No. EP 17177653, filed Jun. 23, 2017, dated Jan. 9, 2018. |
Fuyou Miao et al., “Randomized Component and Its Application to (t, m, n)—Group Oriented Secret Sharing”, IEEE Transactions on Information Forensics and Security, IEEE, Piscataway, NJ, US, vol. 10, No. 5, May 1, 2015 (May 1, 2015), pp. 889-899, XP011577142, ISSN: 1556-6013. |
Desmedt Y et al.: “Threshold Cryptosystems”, Advances in Cryptology. Santa Barbara, Aug. 20-24, 1989 [Proceedings of the Conference on Theory and Applications of Cryptology], New York, Springer, US, vol. 435, Aug. 20, 1989 (Aug. 20, 1989), pp. 307-315, XP002295910. |
Parno et al., “Pinocchio: Nearly Practical Verifiable Computation”, Communications of the ACM (2016) vol. 59 Issue 2: 103-112. |
Catalano et al., “Homomorphic Signatures with Efficient Verification for Polynomial Functions”, J.A. Garay and R. Gennaro (Eds): CRYPTO 2014, Part I, LNCS 8616: 371-389, International Association for Cryptologic Research, 2014. |
Number | Date | Country | |
---|---|---|---|
20190205568 A1 | Jul 2019 | US |