The disclosure relates to a device performing statistical operation on a homomorphic ciphertext and a method thereof. More particularly, the disclosure relates to an electronic device capable of effectively performing statistical operation on a homomorphic ciphertext and a method thereof.
With the development of electronic and communication technology, a variety of services are supported by utilizing data which is transmitted and received between various devices. As one example, a user may keep private information or the like stored in a server, and actively use cloud computing services which use the information in the server.
In this environment, security technology is essential for preventing data leakage. Accordingly, the server is configured to store encrypted data. In this case, because the server must decrypt the encrypted data each time it searches the stored data or performs a series of operations based on the data, resources and time may be wasted.
In addition, if a third party hacks the server while the data is in a temporarily decrypted state for an operation, there is the problem that private information may be easily leaked to the third party.
In order to solve the above-described problems and disadvantages, homomorphic encryption methods are being researched. With a homomorphic encryption scheme, even if an operation is performed on the ciphertext itself without decrypting the encrypted information, the result is the same as the value obtained by performing the operation on the plaintext and then encrypting it. Accordingly, various operations may be performed on the ciphertext without performing any decryption.
However, because an operation in the homomorphic ciphertext state requires a greater amount of computation than the same operation in the plaintext state, processing is slower than with a plaintext operation scheme of the related art. In particular, because a statistical operation on data requires a large amount of computation even in the plaintext state, a method which may more effectively perform a statistical operation on a homomorphic ciphertext is required.
Aspects of the disclosure are to address at least the above-mentioned problems and/or disadvantages and to provide at least the advantages described below. Accordingly, an aspect of the disclosure is to provide an electronic device capable of effectively performing a statistical operation on a homomorphic ciphertext and a method thereof.
According to an example embodiment, an electronic device includes a memory configured to store at least one instruction, and store homomorphic ciphertexts storing a plurality of variable data in an encrypted state in plurality, and a processor configured to execute at least one instruction, and the processor is configured to generate, by executing the at least one instruction, number data corresponding to a variable combination by using a bin mask having different variable data classified for each of the homomorphic ciphertexts based on an operation instruction on the plurality of homomorphic ciphertexts being received.
The homomorphic ciphertext may include a plurality of slots, and each of the plurality of slots may include one variable data.
The bin mask may include a plurality of slots, and each of the plurality of slots may include data on whether one variable value is present, and the processor may be configured to generate a plurality of bin masks for each variable data included in the homomorphic ciphertext with respect to each of the homomorphic ciphertexts, select a bin mask corresponding to the variable combination from among the plurality of generated bin masks, and generate number data with the variable combination by using multiplication between the selected bin masks.
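As an illustration only, the counting step described above can be simulated on plaintext slot vectors. In this sketch the masks are ordinary Python lists, whereas in the disclosure they would be held in homomorphic ciphertexts and the slot-wise product would be a homomorphic multiplication. The function names `bin_masks` and `count_combination` and the sample data are hypothetical.

```python
# Plaintext simulation of bin-mask counting (assumption: real masks would be
# encrypted and "*" would be a homomorphic multiplication).

def bin_masks(slots):
    """Return {value: mask}, where mask[i] is 1 if slots[i] == value, else 0."""
    return {v: [1 if x == v else 0 for x in slots] for v in sorted(set(slots))}

def count_combination(*masks):
    """Multiply the selected masks slot by slot and sum the resulting slots."""
    product = [1] * len(masks[0])
    for mask in masks:
        product = [p * m for p, m in zip(product, mask)]
    return sum(product)

# Hypothetical data: two features, one slot per record.
gender = ["F", "M", "F", "F", "M", "F"]
smoker = ["Y", "N", "N", "Y", "Y", "N"]

gender_masks = bin_masks(gender)
smoker_masks = bin_masks(smoker)

# Number of records with gender == "F" and smoker == "Y".
n_f_y = count_combination(gender_masks["F"], smoker_masks["Y"])
print(n_f_y)
```

Selecting a different pair of masks yields the count for a different variable combination without regenerating the masks.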
The bin mask may include a plurality of slots, and each of the plurality of slots may include a plurality of sub slots including data on whether one variable value is present, and the processor may be configured to generate one bin mask on each of the homomorphic ciphertexts, and generate number data with the variable combination by using sub slots in the bin mask which correspond to the variable combination from among the plurality of bin masks.
The plurality of sub slots may be configured to be disposed in one slot with a preset bit distance.
The processor may be configured to join a first homomorphic ciphertext and a second homomorphic ciphertext including a plurality of data on a same feature to one homomorphic ciphertext.
The processor may be configured to use a first position data in the first homomorphic ciphertext and a second position data in the second homomorphic ciphertext on common data in the first homomorphic ciphertext and the second homomorphic ciphertext to join the first homomorphic ciphertext and the second homomorphic ciphertext as one.
The processor may be configured to compare, based on data encrypted with a one direction encryption scheme using a preset common key with respect to each of the plurality of data comprised in the first and second homomorphic ciphertexts and position data in a homomorphic ciphertext on the encrypted data being input, encrypted data on the first homomorphic ciphertext with encrypted data on the second homomorphic ciphertext, and check the first position data and the second position data which include common data between the two homomorphic ciphertexts.
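The position matching described above can be sketched as follows. The sketch assumes that the one direction encryption scheme is a keyed hash (HMAC-SHA-256 from the Python standard library); the disclosure does not name a specific scheme, and the key, identifiers, and function names here are hypothetical.

```python
import hmac
import hashlib

def tagged_positions(values, common_key):
    """Map the keyed one-way tag of each value to its slot position."""
    tags = {}
    for position, value in enumerate(values):
        tag = hmac.new(common_key, str(value).encode(), hashlib.sha256).hexdigest()
        tags[tag] = position
    return tags

def common_positions(tags_1, tags_2):
    """Return (first position, second position) pairs for tags present in both."""
    return sorted((tags_1[t], tags_2[t]) for t in tags_1.keys() & tags_2.keys())

common_key = b"preset-common-key"          # hypothetical shared key
ids_1 = ["alice", "bob", "carol", "dave"]  # join column of the first data set
ids_2 = ["carol", "erin", "alice"]         # join column of the second data set

pairs = common_positions(tagged_positions(ids_1, common_key),
                         tagged_positions(ids_2, common_key))
print(pairs)
```

Only the tags and positions are compared, so the party performing the join does not need the underlying values; the resulting position pairs indicate which slots of the two ciphertexts hold common data.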
According to an example embodiment, a method of processing ciphertext on a homomorphic ciphertext includes storing homomorphic ciphertexts, which stores a plurality of variable data in an encrypted state, in plurality, and receiving an operation instruction on the plurality of homomorphic ciphertexts, generating a bin mask having different variable data classified for each of the plurality of homomorphic ciphertexts, generating number data corresponding to a variable combination by using the bin mask, and outputting the generated number data.
The homomorphic ciphertext may include a plurality of slots, and each of the plurality of slots may include one variable data.
The bin mask may include a plurality of slots, and each of the plurality of slots may include data on whether one variable value is present, and the generating the bin mask may include generating a plurality of bin masks for each variable data included in the homomorphic ciphertext with respect to each homomorphic ciphertext, and the generating number data may include selecting a bin mask corresponding to the variable combination from among the plurality of generated bin masks, and using multiplication between the selected bin masks to generate number data with the variable combination.
The bin mask may include a plurality of slots, and each of the plurality of slots may include a plurality of sub slots including data on whether one variable value is present, the generating the bin mask may include generating one bin mask with respect to each of the homomorphic ciphertexts, and the generating number data may include using sub slots in the bin mask corresponding to the variable combination from among the plurality of bin masks to generate number data with the variable combination.
The plurality of sub slots may be configured to be disposed in one slot with a preset bit distance.
The encryption processing method may further include joining a first homomorphic ciphertext and a second homomorphic ciphertext including a plurality of data on a same feature to one homomorphic ciphertext.
The joining may include using a first position data in the first homomorphic ciphertext and a second position data in the second homomorphic ciphertext on common data in the first homomorphic ciphertext and the second homomorphic ciphertext, and joining the first homomorphic ciphertext and the second homomorphic ciphertext as one.
The joining may include comparing, based on data encrypted with a one direction encryption scheme using a preset common key with respect to each of the plurality of data included in the first and second homomorphic ciphertexts and position data in a homomorphic ciphertext on the encrypted data being input, encrypted data on the first homomorphic ciphertext with encrypted data on the second homomorphic ciphertext, and checking the first position data and the second position data which include common data between the two homomorphic ciphertexts.
According to an example embodiment, a computer readable recording medium including a program for executing a ciphertext processing method includes storing homomorphic ciphertexts, which stores a plurality of variable data in an encrypted state, in plurality, and receiving an operation instruction on the plurality of homomorphic ciphertexts, generating a bin mask having different variable data classified for each of the plurality of homomorphic ciphertexts, generating number data corresponding to a variable combination by using the bin mask, and outputting the generated number data.
According to various example embodiments of the disclosure as described above, various statistical processing is possible by using a homomorphic ciphertext, and statistical processing is possible by merging with respect to a homomorphic ciphertext having data structures of different schemes.
The disclosure will be described in detail below with reference to accompanying drawings. Encryption/decryption may be applied, if necessary, to a data transmitting process performed in the disclosure, and all expressions describing the data transmitting process in the disclosure and in the claims should be interpreted to include encryption/decryption even if it is not specifically mentioned. Expressions in forms such as “transmit (transfer) from A to B” or “receive A from B” in the disclosure may include transmitting (transferring) or receiving with another medium included therebetween, and do not necessarily describe transmitting (transferring) or receiving directly from A to B only.
In describing the disclosure, an order of each step is to be understood as non-limiting unless a preceding step must be performed logically and temporally prior to a following step. That is, except for exceptional cases as described above, even if a process described as the following step is performed before a process described as the preceding step, it does not influence the nature of the disclosure, and the scope of protection should also be defined regardless of the order of the steps. Further, in the disclosure, expressions such as “A or B” not only refer to any one of A and B selectively, but also may be defined as including both A and B. In addition, the term “include” may have a comprehensive meaning as further including another element in addition to the elements listed as included.
In the disclosure, only the essential elements necessary in describing the disclosure have been described, and elements not related to the nature of the disclosure have been omitted. Further, the disclosure is not to be construed in an exclusive sense including only the recited elements, but to be interpreted in a non-exclusive sense where other elements may be included.
Further, in the disclosure, the term “value” may be defined as not only including a scalar value, but also a vector and a polynomial form.
Mathematical operations and calculations of each step in the disclosure described below may be realized with computer operations by a coding method known for performing a relevant operation or calculation and/or coding appropriately designed in the disclosure.
Specific equations described below are described as an example from among several possible alternatives, and the scope of protection of the disclosure should not be interpreted as being limited by the recited equations.
For convenience of description, the following notations will be used in the disclosure.
a←D: select element a according to distribution D
s1, s2∈R: each of s1 and s2 is an element belonging to a set R
mod(q): perform a modular operation with element q
⌊·⌉: round off the internal value
The various example embodiments of the disclosure will be described in detail below using the accompanying drawings.
Referring to
The network 10 may be realized through various forms of a wired/wireless communication network, a broadcast communication network, an optical communication network, a cloud network, or the like, and each device may also be connected by methods such as Wi-Fi, Bluetooth, near field communication (NFC), or the like without a separate medium.
In
The user may input various data through the electronic devices 100-1 to 100-n that the user uses. The input data may be stored in the electronic devices 100-1 to 100-n themselves, but may also be transmitted to and stored in an external device for reasons such as storage capacity and security. In
Each of the electronic devices 100-1 to 100-n may homomorphically encrypt the input data, and transmit a homomorphic ciphertext to the first server device 200.
Each of the electronic devices 100-1 to 100-n may include, in the ciphertext, encryption noise calculated in the process of performing homomorphic encryption, that is, an error. For example, the homomorphic ciphertext generated in each of the electronic devices 100-1 to 100-n may be generated in a form such that a result value including a message and an error value is restored when the ciphertext is later decrypted using a secret key.
In an example, the homomorphic ciphertext generated in the electronic devices 100-1 to 100-n may be generated in a form satisfying the following property when decrypting using a secret key.
Dec(ct,sk)=<ct,sk>=M+e(mod q) [Equation 1]
Here, < and > represent a usual inner product, ct represents a ciphertext, sk represents a secret key, M represents a plaintext message, e represents an encryption error value, and q represents the modulus of the ciphertext. q may be selected to be greater than the result value M obtained by multiplying the message by a scaling factor (Δ). If the absolute value of the error value e is sufficiently small compared to M, the decryption value M+e of the ciphertext may substitute for the original message with the same degree of precision in significant-figure operations. In the decrypted data, the error may be disposed at the least significant bit (LSB) side, and M may be disposed at the next least significant bit side.
If the size of the message is too small or too large, the size may be adjusted by using a scaling factor. If the scaling factor is used, not only a message in integer form but also a message in real-number form may be encrypted, so usability may be greatly increased. In addition, by adjusting the size of the message using the scaling factor, the area in which messages are present in the ciphertext after an operation is performed, that is, the size of an effective area, may be adjusted.
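The role of the scaling factor can be illustrated with plain integers, leaving out the encryption itself. In the following minimal sketch, the values of `DELTA` and `Q`, the error value, and the function names are assumptions chosen for illustration; they are not parameters from the disclosure.

```python
DELTA = 2 ** 20   # hypothetical scaling factor
Q = 2 ** 40       # hypothetical ciphertext modulus, larger than DELTA * message

def encode(message):
    """Scale a real-valued message to an integer M = round(DELTA * message)."""
    return round(DELTA * message)

def decode(value):
    """Map a decrypted value M + e (mod Q) back to an approximate real number."""
    value %= Q
    if value > Q // 2:   # interpret residues as centered around zero
        value -= Q
    return value / DELTA

m = 3.14159
M = encode(m)
e = -7                    # small error occupying the least significant bits
approx = decode((M + e) % Q)
print(approx)
```

Because the error occupies only the low-order bits of M+e, dividing by the scaling factor recovers the message to roughly six decimal places in this example; a larger scaling factor would leave more precision above the error.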
According to an example embodiment, the ciphertext modulus q may be set in various forms and used. In an example, the modulus of the ciphertext may be set in the form of a power q=Δ^L of the scaling factor Δ. If Δ is 2, it may be set to a value such as q=2^10.
In another example, the ciphertext modulus may be set to a value obtained by multiplying a plurality of different scaling factors. Each factor may be set to a value within a similar range, that is, the factors may have sizes similar to one another. For example, the modulus may be set to q=q1 q2 q3 . . . qx, where each of q1, q2, q3, . . . , qx may have a size similar to the scaling factor Δ, and the factors may be set to values that are coprime with one another.
If the scaling factors are set in this manner, the whole operation can be carried out by separating it into a plurality of modular operations according to the Chinese Remainder Theorem (CRT), so the operational burden may be reduced.
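The CRT decomposition can be sketched with small numbers. The three moduli below are hypothetical and far smaller than those used in practice; the point is only that a multiplication modulo q can be carried out independently modulo each coprime factor and then recombined.

```python
from math import prod

MODULI = [97, 101, 103]   # hypothetical small pairwise-coprime factors
Q = prod(MODULI)          # q = q1 * q2 * q3

def to_residues(x):
    """Split x into its residues modulo each factor."""
    return [x % qi for qi in MODULI]

def from_residues(residues):
    """Recombine residues into the unique value modulo Q (CRT reconstruction)."""
    total = 0
    for r, qi in zip(residues, MODULI):
        partial = Q // qi
        total += r * partial * pow(partial, -1, qi)   # modular inverse, Python 3.8+
    return total % Q

a, b = 123456, 654321
# Multiply independently in each small modulus, then recombine.
product_residues = [(x * y) % qi
                    for x, y, qi in zip(to_residues(a), to_residues(b), MODULI)]
recombined = from_residues(product_residues)
print(recombined == (a * b) % Q)
```

Each per-factor multiplication works on numbers the size of one factor rather than the size of q, which is where the reduction in operational burden comes from.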
In addition, by using factors of similar sizes with one another, when performing rounding in a step to be described below, a value nearly similar with the result value of the previous example may be obtained.
The first server device 200 may store the received homomorphic ciphertext in the ciphertext state without decrypting it.
The second server device 300 may be configured to request, from the first server device 200, a specific processing result on the homomorphic ciphertext. The first server device 200 may be configured to perform a specific operation according to the request of the second server device 300, and then transmit the result to the second server device 300. Here, the specific operation may be not only a general operation such as addition or homomorphic multiplication on a plurality of homomorphic ciphertexts, but also a statistical operation, for example, an average, a frequency distribution, a linear regression, a covariance, or the like.
At this time, the second server device 300 may be configured to perform a joining operation on the plurality of homomorphic ciphertexts.
In an example, based on ciphertexts ct1 and ct2 transmitted by the two electronic devices 100-1 and 100-2 being stored in the first server device 200, the second server device 300 may be configured to request, from the first server device 200, a value aggregating the data provided from the two electronic devices 100-1 and 100-2. The first server device 200 may be configured to perform an operation of aggregating the two ciphertexts according to the request, and then transmit the result value (ct1+ct2) to the second server device 300.
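The aggregation ct1+ct2 can be illustrated with a toy scheme that satisfies the decryption property of Equation 1. The sketch below is a simplified, insecure LWE-style construction that encrypts with the secret key for brevity, whereas the devices in the disclosure encrypt with a public key; all parameters and function names are assumptions made for illustration.

```python
import random

Q = 2 ** 40        # hypothetical ciphertext modulus
DELTA = 2 ** 20    # hypothetical scaling factor
N = 16             # hypothetical secret dimension

def keygen(rng):
    """sk = (1, s) with small entries in {-1, 0, 1}."""
    return [1] + [rng.choice([-1, 0, 1]) for _ in range(N)]

def encrypt(message, sk, rng):
    """Return ct such that <ct, sk> = DELTA * message + e (mod Q)."""
    a = [rng.randrange(Q) for _ in range(N)]
    e = rng.randint(-4, 4)                      # small encryption error
    mask = sum(x * s for x, s in zip(a, sk[1:]))
    b = (round(DELTA * message) + e - mask) % Q
    return [b] + a

def add(ct1, ct2):
    """Component-wise addition; no key is needed."""
    return [(x + y) % Q for x, y in zip(ct1, ct2)]

def decrypt(ct, sk):
    """Inner product <ct, sk> mod Q, centered and divided by the scaling factor."""
    value = sum(c * s for c, s in zip(ct, sk)) % Q
    if value > Q // 2:
        value -= Q
    return value / DELTA

rng = random.Random(0)
sk = keygen(rng)
ct1 = encrypt(1.25, sk, rng)   # data from the first device
ct2 = encrypt(2.5, sk, rng)    # data from the second device
total = decrypt(add(ct1, ct2), sk)
print(total)
```

The `add` function never touches the key, which mirrors the first server device aggregating ciphertexts without decrypting them; the errors of the two ciphertexts add up but remain far below the scaling factor.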
Based on the properties of the homomorphic ciphertext, the first server device 200 may be configured to perform operation in a state not having performed decryption, and the result value thereof may be in ciphertext form. At this time, the first server device 200 may be configured to perform bootstrapping on the operation result.
The first server device 200 may be configured to transmit an operation result ciphertext to the second server device 300. The second server device 300 may be configured to decrypt the received operation result ciphertext and obtain the operation result value of the data included in each homomorphic ciphertext. Further, the first server device 200 may be configured to perform operations numerous times according to user requests.
In
Referring to
The memory 110 may be configured to store at least one instruction on the electronic device 100. For example, the memory 110 may be stored with various programs (or software) for the electronic device 100 to operate according to the various example embodiments of the disclosure.
The memory 110 as described above may be realized in various forms such as a random access memory (RAM), a read only memory (ROM), a buffer, a cache, a flash memory, a hard disk drive (HDD), an external memory, a memory card, or the like, and is not limited to any one.
The memory 110 may be configured to store a message to be encrypted. Here, the message may be various credit data, private data, or the like variously cited by the user, and may be data associated with use history, or the like such as position data and internet use time data used in the electronic device 100.
Further, the memory 110 may be configured to store a public key, and store, based on the electronic device 100 generating the public key directly, not only a secret key, but also various parameters required in generating the public key and the secret key.
Further, the memory 110 may be configured to store the homomorphic ciphertext generated in a process described below. Further, the memory 110 may be configured to store the homomorphic ciphertext transmitted from the external device. In addition, the memory 110 may be configured to store the operation result ciphertext which is a result product of an operation process described below.
The communication device 130 may be formed to connect the electronic device 100 with the external device (not shown), and may be formed not only in a form connecting to the external device through a local area network (LAN) and an internet network, but also in a form connecting through a universal serial bus (USB) port or a wireless communication (e.g., WiFi 802.11a/b/g/n, NFC, Bluetooth) port. The communication device 130 may be referred to as a transceiver.
The communication device 130 may be configured to receive the public key from the external device, and transmit the public key generated on its own by the electronic device 100 to the external device.
Further, the communication device 130 may be configured to receive a message from the external device, and transmit the generated homomorphic ciphertext or the operation result to the external device.
In addition, the communication device 130 may be configured to receive various parameters required in generating the ciphertext from the external device. The various parameters upon realization may be received directly from the user through the operation input device 150 which will be described below.
In addition, the communication device 130 may be configured to receive a request for an operation on the homomorphic ciphertext from the external device, and transmit the result calculated accordingly to the external device. The requested operation may be an operation such as addition, subtraction, or multiplication (e.g., a modular multiplication operation), or may be a statistical operation. Here, the modular multiplication operation may refer to a modular operation with the element q.
The display 140 may be configured to display a user interface window for selecting a function supported by the electronic device 100. For example, the display 140 may be configured to display the user interface window for selecting various functions provided by the electronic device 100. The display 140 may be a monitor such as a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like, and may be realized as a touch screen capable of simultaneously performing a function of the operation input device 150 which will be described below.
The display 140 may be configured to display a message requesting input of a parameter required in generating a secret key or a public key. Further, the display 140 may be configured to display a message having the subject of encryption to select the message. The subject of encryption upon implementation may be selected directly by the user, or selected automatically. That is, private data and the like required in encryption may be set automatically even if the message is not directly selected by the user.
The operation input device 150 may be configured to receive input of a function selection of the electronic device 100 and a control command on a relevant function from the user. For example, the operation input device 150 may be configured to receive a parameter required in generating the secret key and the public key from the user. In addition, the operation input device 150 may be configured to receive, from the user, the setting of the message to be encrypted.
The processor 120 may be configured to control the overall operation of the electronic device 100. For example, the processor 120 may be configured to control, by executing at least one instruction stored in the memory 110, the overall operation of the electronic device 100. The processor 120 may be configured as a single device such as a central processing unit (CPU) or an application-specific integrated circuit (ASIC), or as a plurality of devices such as a CPU and a graphics processing unit (GPU).
When the message to be transmitted is input, the processor 120 may be configured to store the message in the memory 110. Further, the processor 120 may be configured to use the various setting values and programs stored in the memory 110 to homomorphically encrypt the message. In this case, the public key may be used.
The processor 120 may be configured to generate the public key required for performing encryption on its own and use it, or to receive the public key from the external device and use it. In an example, the second server device 300, which performs decryption, may be configured to distribute the public key to other devices.
When generating the key on its own, the processor 120 may be configured to generate the public key by using a Ring-LWE technique. For example, the processor 120 may be configured to first set various parameters and a ring, and store them in the memory 110. Examples of the parameters may be a bit length of a plaintext message, sizes of the public key and the secret key, and the like.
The ring may be represented with Equation 2 as below.
R=Zq[x]/(f(x)) [Equation 2]
Here, R represents the ring, Zq represents the coefficient domain, that is, the integers modulo q, and f(x) represents a polynomial of degree n.
The ring, as a set of polynomials having predetermined coefficients, may refer to a set in which addition and multiplication between the elements are defined and which is closed with respect to the addition and multiplication.
In an example, the ring may refer to a set of polynomials of degree less than n whose coefficients are in Zq. For example, if n is Φ(N), f(x) may be the N-th cyclotomic polynomial. (f(x)) may represent the ideal of Zq[x] generated by f(x). The Euler totient function Φ(N) may refer to the number of natural numbers that are coprime to N and smaller than N. When ΦN(x) is defined as the N-th cyclotomic polynomial, the ring may be represented with Equation 3 as below.
R=Zq[x]/(ΦN(x)) [Equation 3]
Here, 2^17 may be used for N.
The secret key (sk) may be represented as below.
The ring of Equation 3 described above may include complex numbers in the plaintext space. In order to increase operational speed on the homomorphic ciphertext, only the subset of the above-described ring whose plaintext space consists of real numbers may be used.
When the ring as described above is set, the processor 120 may be configured to calculate the secret key (sk) from the ring.
sk←(1,s(x)),s(x)∈R [Equation 4]
Here, s(x) may refer to a randomly generated polynomial with small coefficients.
Further, the processor 120 may be configured to calculate a first random polynomial(a(x)) from the ring. The first random polynomial may be represented as below.
a(x)←R [Equation 5]
In addition, the processor 120 may be configured to calculate an error. For example, the processor 120 may be configured to calculate an error from a discrete Gaussian distribution or a distribution having a close statistical distance therefrom. The error may be represented as below.
e(x)←D^n_{αq} [Equation 6]
When the error has also been calculated, the processor 120 may be configured to calculate a second random polynomial by combining the error with the product of the first random polynomial and the secret key under a modular operation. The second random polynomial may be represented as below.
b(x)=−a(x)s(x)+e(x)(mod q) [Equation 7]
Finally, the public key (pk) may be set as below in a form which includes the first random polynomial and the second random polynomial.
pk=(b(x),a(x)) [Equation 8]
Because the above-described key generation method is merely one example, the embodiment is not necessarily limited thereto, and the public key and the secret key may be generated in other methods in addition to the above.
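The key generation of Equations 4 to 8 can be sketched with toy parameters as follows. The sketch assumes the ring Zq[x]/(x^8+1), which corresponds to the 16th cyclotomic polynomial, and approximates the error distribution with a rounded continuous Gaussian; the degree, modulus, standard deviation, and function names are hypothetical and much too small or simple for real use.

```python
import random

DEGREE = 8     # hypothetical ring degree; R = Z_Q[x] / (x^DEGREE + 1)
Q = 2 ** 30    # hypothetical modulus

def poly_mul(f, g):
    """Multiply two polynomials in Z_Q[x] / (x^DEGREE + 1)."""
    result = [0] * DEGREE
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            k = i + j
            if k < DEGREE:
                result[k] = (result[k] + fi * gj) % Q
            else:                       # x^DEGREE = -1 in this ring
                result[k - DEGREE] = (result[k - DEGREE] - fi * gj) % Q
    return result

def keygen(rng):
    s = [rng.choice([-1, 0, 1]) for _ in range(DEGREE)]       # Equation 4: small s(x)
    a = [rng.randrange(Q) for _ in range(DEGREE)]             # Equation 5: a(x) from R
    e = [round(rng.gauss(0, 3.2)) for _ in range(DEGREE)]     # Equation 6 (approximated)
    a_s = poly_mul(a, s)
    b = [(-x + y) % Q for x, y in zip(a_s, e)]                # Equation 7
    return (1, s), (b, a), e                                  # sk, pk (Equation 8), error

def centered(x):
    """Represent a residue modulo Q in the range around zero."""
    x %= Q
    return x - Q if x > Q // 2 else x

rng = random.Random(1)
sk, pk, e = keygen(rng)
b, a = pk
# <pk, sk> = b(x) + a(x) s(x) should equal the small error e(x) modulo Q.
check = [centered(x + y) for x, y in zip(b, poly_mul(a, sk[1]))]
print(check == e)
```

The final check shows why the public key hides the secret key only through the error term: without e(x), the relation b(x)+a(x)s(x)=0 would be an exactly solvable linear system.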
The processor 120 may be configured to control, based on the public key being generated, the communication device 130 to transmit the public key to other devices.
Further, the processor 120 may be configured to generate the homomorphic ciphertext on the message. For example, the processor 120 may be configured to apply the public key generated previously on the message to generate the homomorphic ciphertext.
The message to be encrypted may be received from an external source, or may be input from an input device directly included in or connected to the electronic device 100. For example, based on the electronic device 100 including a touch screen or a keypad, the processor 120 may be configured to store data input by the user through the touch screen or the keypad in the memory 110, and then encrypt the input data. The generated homomorphic ciphertext may be in a form which, when decrypted, is restored to a result value obtained by adding the error to a value in which the scaling factor is reflected in the message. A value which is previously input and set may be used as the scaling factor as is.
Alternatively, the processor 120 may be configured to perform encryption by using the public key immediately while multiplying the message and the scaling factor. In this case, the error calculated in the encryption process may be added to the result value of multiplying the message and the scaling factor.
In addition, the processor 120 may be configured to generate a length of the ciphertext to correspond to a size of the scaling factor.
Further, based on the homomorphic ciphertext being generated, the processor 120 may be configured to store the homomorphic ciphertext in the memory 110, or control the communication device 130 to transmit the homomorphic ciphertext to another device according to a user request or a pre-set default instruction.
According to an example embodiment of the disclosure, packing may be performed. When packing is used in homomorphic encryption, it may be possible to encrypt multiple messages into one ciphertext. In this case, when an operation is performed between ciphertexts in the electronic device 100, the operations on the multiple messages are consequently processed in parallel, so the operational burden is greatly reduced.
For example, the processor 120 may be configured to convert, based on the message being formed of a plurality of message vectors, the plurality of message vectors to a polynomial of a form which may be encrypted in parallel, and then perform homomorphic encryption by multiplying the polynomial by the scaling factor and using the public key. Accordingly, the processor 120 may be configured to generate a ciphertext in which the plurality of message vectors are packed.
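One way such a conversion can work is sketched below, with the encryption step omitted. The sketch assumes a polynomial ring modulo x^8+1 and places each message at one complex root of that polynomial, so that multiplying two polynomials multiplies the packed messages slot by slot. The slot ordering, the parameters, and the function names are assumptions and may differ from the encoding actually used by a given scheme.

```python
import cmath

DEGREE = 8                 # hypothetical ring degree; polynomials modulo x^8 + 1
SLOTS = DEGREE // 2        # number of message slots
DELTA = 2 ** 20            # hypothetical scaling factor
# Roots of x^DEGREE + 1; ROOTS[DEGREE - 1 - j] is the conjugate of ROOTS[j].
ROOTS = [cmath.exp(1j * cmath.pi * (2 * j + 1) / DEGREE) for j in range(DEGREE)]

def encode(vector):
    """Pack SLOTS values into an integer polynomial p with p(ROOTS[j]) ~ DELTA * vector[j]."""
    z = list(vector) + [complex(v).conjugate() for v in reversed(vector)]
    coeffs = []
    for k in range(DEGREE):
        c = sum(zj * (rj.conjugate() ** k) for zj, rj in zip(z, ROOTS)) / DEGREE
        coeffs.append(round(DELTA * c.real))   # imaginary part vanishes by symmetry
    return coeffs

def decode(coeffs, scale=DELTA):
    """Evaluate the polynomial at the first SLOTS roots and remove the scale."""
    return [sum(c * ROOTS[j] ** k for k, c in enumerate(coeffs)) / scale
            for j in range(SLOTS)]

def poly_mul(f, g):
    """Multiply modulo x^DEGREE + 1 (the modulus q is omitted for clarity)."""
    result = [0] * DEGREE
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            if i + j < DEGREE:
                result[i + j] += fi * gj
            else:                       # x^DEGREE = -1
                result[i + j - DEGREE] -= fi * gj
    return result

p1 = encode([1.0, 2.0, 3.0, 4.0])
p2 = encode([0.5, -1.0, 2.0, 0.25])
slots = decode(poly_mul(p1, p2), scale=DELTA ** 2)
print([round(s.real, 3) for s in slots])
```

A single polynomial multiplication here produces all four slot products at once, which is the parallelism that packing provides; note that the product carries the scale DELTA squared, so it is decoded with that scale.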
Based on the data stored by the electronic device 100 being a statistical table, the processor 120 may be configured to generate, in the generating process of the homomorphic ciphertext, the homomorphic ciphertext including variable data in a plurality of slots in the ciphertext. In addition, the processor 120 may be configured to generate, in the generating process of the homomorphic ciphertext, a bin mask on the relevant homomorphic ciphertext. The specific bin mask generating operation will be described below with reference to
Further, the processor 120 may be configured to apply, based on decryption being required on the homomorphic ciphertext, the secret key to the homomorphic ciphertext to generate a decryption text of a polynomial form, and generate a message by decoding the decryption text of the polynomial form. The message generated at this time may include an error as described in Equation 1 described above.
Further, the processor 120 may be configured to perform an operation on the ciphertext. For example, the processor 120 may be configured to not only perform an operation of addition, subtraction, multiplication, or the like while maintaining the encrypted state on the homomorphic ciphertext, but also perform various statistical operations such as an average and frequency distribution on a plurality of data. The specific statistical operation method will be described below with reference to
The electronic device 100 may be configured to detect, based on the operation being completed, data of an effective area from the operation result data. For example, the electronic device 100 may be configured to perform a rounding process of the operation result data to detect data of the effective area.
Here, the rounding process may mean rounding off the message in the encrypted state, and may otherwise be referred to as rescaling. For example, the electronic device 100 may be configured to remove a noise area by multiplying each component of the ciphertext by the reciprocal Δ^-1 of the scaling factor and rounding off. The noise area may be set to correspond to the size of the scaling factor. Consequently, a message of an effective area with the noise area excluded may be detected. Because the process is performed in the encrypted state, an additional error may be generated, but because its size is sufficiently small, it may be disregarded.
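The effect of rescaling can be shown on plain scaled integers. In an actual scheme the same division and rounding would be applied to each ciphertext component rather than to the decrypted value; the scaling factor, messages, and errors below are hypothetical.

```python
DELTA = 2 ** 20   # hypothetical scaling factor

def rescale(value):
    """Multiply by the reciprocal of DELTA and round off, in exact integer arithmetic."""
    return (2 * value + DELTA) // (2 * DELTA)

m1, m2 = 1.5, 2.25
v1 = round(DELTA * m1) + 3     # scaled message plus a small error
v2 = round(DELTA * m2) - 2
product = v1 * v2              # after multiplication the scale is DELTA ** 2
rescaled = rescale(product)    # after rescaling the scale is DELTA again
print(rescaled / DELTA)
```

Without this step the scale would square with every multiplication and quickly exceed the modulus; rescaling discards the low-order bits, which contain mostly noise, and keeps the effective area.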
Further, in the above-described rounding process, the modular multiplication operation as described above may be used.
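The effect of the rounding process can be illustrated with a minimal sketch on plain scaled integers. This is not an operation on an actual ciphertext; the scaling factor, the messages, and the error value below are hypothetical values chosen for illustration only.

```python
def rescale(value: int, scale: int) -> int:
    """Multiply by 1/scale and round to the nearest integer (non-negative input)."""
    return (value + scale // 2) // scale

SCALE = 2 ** 20                    # hypothetical scaling factor
a = round(3.25 * SCALE)            # message 3.25 encoded with the scaling factor
b = round(1.5 * SCALE)             # message 1.5 encoded with the scaling factor
noisy_product = a * b + 37         # product carries SCALE**2 plus a small error
result = rescale(noisy_product, SCALE)   # noise area removed, scale back to SCALE
decoded = result / SCALE
```

In this sketch the small error term falls entirely inside the discarded noise area, so the decoded value equals the product of the two messages.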
In addition, the electronic device 100 may be configured to expand, based on a weight of an approximate message in the operation result ciphertext exceeding a threshold value, the plaintext space of the operation result ciphertext. For example, if q is smaller than M in the above-described Equation 1, M+e(mod q) has a different value from M+e, and decryption may therefore not be possible. Accordingly, the value of q is to be maintained greater than M at all times. However, the value of q gradually decreases as operations proceed. The expansion of the plaintext space may refer to changing the ciphertext ct to a ciphertext having a greater modulus. The operation of expanding the plaintext space may otherwise be referred to as rebooting. By performing rebooting, the ciphertext may return to a state in which operation is possible once again.
The electronic device 100 according to the disclosure as described above may effectively perform not only a basic operation on the homomorphic ciphertext, but also a complex statistical operation. In addition, the electronic device 100 may be configured to manage homomorphic ciphertexts provided from multiple devices in one database (DB).
The specific operation of statistical operation on the homomorphic ciphertext will be described below.
First, for an effective statistical operation on the homomorphic ciphertext, the homomorphic ciphertext may be generated to include the data structure as described below.
The following describes how table data of the plaintext, that is, multiple record data formed of various features, may be stored in homomorphic encryption which provides a Single Instruction Multiple Data (SIMD) function. To this end, data may be gathered for each feature and stored in the ciphertext. That is, one ciphertext may store only data belonging to one feature. Here, this means that one ciphertext may store a plurality of variable values of one feature, rather than one ciphertext including only one data item.
The homomorphic ciphertext may include multiple slots, and each slot may store multiple data. Accordingly, using the above, values on one feature (i.e., same column data in a table) may be stored in each of the multiple slots.
Specifically, table data may be stored and managed in the form as below. For example, if the size of the plaintext table is n by m (here, n is the number of data rows, and m is the number of data columns (=number of features)), and the number of data items which one fully homomorphic ciphertext may include is M(=N/2), the encryption table including the encrypted data may include the items described below.
1. ciphertexts c0,0, c0,1, . . . , c0,┌n/M┐−1, c1,0, . . . , cm-1,0, cm-1,1, . . . , cm-1,┌n/M┐−1 (here, ciphertext ci,j is the (j+1)-th ciphertext of the (i+1)-th feature, and the number of ciphertexts is ┌n/M┐ for each feature),
2. the total number of features m and the number of data rows n,
3. other additional metadata (e.g., the name of each feature, the number of data items comprised in one ciphertext (also referred to as 1 block), and the table name).
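The column-wise layout listed above may be sketched in plaintext as follows. This is a minimal illustration only: NumPy arrays stand in for ciphertexts, and the function name `pack_columns` and the dictionary keys are hypothetical names not taken from the disclosure.

```python
import math
import numpy as np

def pack_columns(table: np.ndarray, M: int) -> dict:
    """Split each column (feature) of an n-by-m table into blocks of M slots."""
    n, m = table.shape
    n_blocks = math.ceil(n / M)
    blocks = {}
    for i in range(m):
        column = np.zeros(n_blocks * M, dtype=table.dtype)
        column[:n] = table[:, i]                 # unused trailing slots stay 0
        for j in range(n_blocks):
            # blocks[(i, j)] stands in for ciphertext c_{i,j}
            blocks[(i, j)] = column[j * M:(j + 1) * M]
    return {"blocks": blocks, "n_rows": n, "n_features": m, "slots_per_block": M}

# A 5-row, 2-feature table packed into blocks of 4 slots each
enc = pack_columns(np.arange(10).reshape(5, 2), M=4)
```

Each feature occupies its own sequence of blocks, so a slot-wise operation on one block acts on M rows of a single feature at once.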
Based on using the method as described above, a more effective calculation may be possible when calculating the statistical value for each floating feature in the encrypted table data. In addition, table joining may be effectively performed without decryption with respect to encryption tables of different methods. The table joining operation will be described below with reference to
In the following, each homomorphic ciphertext is assumed to include data on one feature, and statistical operations on such homomorphic ciphertexts will be described.
First, an operation may be performed on the homomorphic ciphertext while in an encrypted state, but because operations in the encrypted state consume much time, an efficient operation method is required.
In terms of statistical processing, processing such as an average and a variance is performed on variables which coincide with a specific condition, and it is therefore first necessary to find the variables which coincide with the specific condition.
In this aspect, a bin count operation for finding a variable which satisfies the specific condition will be first described below with reference to
Referring to
After generating intermediate data as described above, the result data 330 representing possible combinations of the three variables and a number on the relevant combinations may be generated by using the intermediate data with a count value.
The detailed operation of generating the intermediate data described above will be described below with reference to
Referring to
Each of the variables 410, 430 and 450 may include a plurality of slots. Although the illustrated example has eight slots, more or fewer than eight slots may be included at realization.
A bin mask on each variable value may be generated with respect to each of the variables. For example, in case of variable A 410, because four values are included, bin masks 420 corresponding to each of the four values may be formed. Further, in case of variable B 430, because four values are included, bin masks 440 on each of the four values may be formed, and in case of variable C 450, because two values are included, bin masks 460 on each of the two values may be formed.
The bin mask as described above may be formed at the time of encryption, or may be formed by calculation after encryption by using a look-up table function. A more detailed operation of generating the bin mask will be described with reference to
Then, when checking a bin count relevant to a specific combination, the bin masks corresponding to the relevant combination may be selected, and a result may be checked through the multiplication of the selected bin masks. For example, when the (1, 1, 1) combination is to be checked, an operation of multiplying a bin mask 471 of value 1 from among the bin masks 420 on variable A, a bin mask 472 of value 1 from among the bin masks 440 on variable B, and a bin mask 473 of value 1 from among the bin masks 460 on variable C with one another may be performed.
An output mask 480 generated by the operation as described above may include a value of 1 at each position relevant to the relevant combination. Through the above, the positions in each variable which form the relevant combination and the number of relevant combinations may be checked.
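The mask selection and multiplication described above may be sketched in plaintext as follows. NumPy arrays stand in for the slots of a ciphertext, and the eight-slot values of A, B and C are hypothetical data, not the values of the illustrated example.

```python
import numpy as np

def bin_masks(column: np.ndarray, max_value: int) -> dict:
    """Return one 0/1 mask per bin value 1..max_value."""
    return {v: (column == v).astype(np.int64) for v in range(1, max_value + 1)}

# Hypothetical eight-slot variables A, B and C
A = np.array([1, 2, 1, 3, 4, 1, 2, 1])
B = np.array([1, 1, 2, 3, 1, 4, 1, 1])
C = np.array([1, 2, 1, 1, 2, 1, 1, 1])

mA, mB, mC = bin_masks(A, 4), bin_masks(B, 4), bin_masks(C, 2)

# Slot-wise product of the three selected masks for the (1, 1, 1) combination
output_mask = mA[1] * mB[1] * mC[1]
count = int(output_mask.sum())
```

The positions holding 1 in `output_mask` are the rows forming the combination, and their sum is the bin count.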
When expanding the operation as described above, a bin average and a bin variance may be calculated. The bin average may be an operation of obtaining an average on another variable of data having the combination of the specific bin variable. Further, the bin variance may be an operation of obtaining a variance on data. A method of performing the relevant operation may include multiplying the bin mask value on the intermediate data 320 in
The bin count described above may be used in the classification process, and may be used in methods such as association rule mining. However, in order to raise accuracy in classification, a large number of possible cases is required, and a data analyst may perform the bin count by combining many variables having a greater range of bin values.
Data of various continuous values may be represented as quantile data for convenience of analysis, and as an example, there may be fifty variable values with respect to one variable.
If there are fifty variable values with respect to one variable, the number of possible cases is significant. When the number of possible cases is significantly increased, if bin count is performed by generating the bin mask for each variable value, the total number of operations may be (n−1)×w×└u/M┘ (here, n is the number of variables, w is the number of combinations, M is the number of slots of the ciphertext, and u is the number of rows of the whole data). Here, if w and u are in the millions, n is less than or equal to 10, and M is in the tens of thousands, the number of multiplications required reaches billions, and the time required may be dozens of days or more even if the time for performing one multiplication is set as several milliseconds.
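The order of magnitude stated above can be checked with a short calculation. The concrete numbers below (10 variables, two million combinations, two million rows, 32,768 slots, 5 ms per multiplication) are hypothetical values chosen only to fall within the stated ranges.

```python
def bin_count_multiplications(n_vars: int, n_combinations: int,
                              n_rows: int, slots: int) -> int:
    """(n - 1) * w * floor(u / M), following the expression in the text."""
    return (n_vars - 1) * n_combinations * (n_rows // slots)

mults = bin_count_multiplications(10, 2_000_000, 2_000_000, 32_768)
seconds_per_multiplication = 0.005           # assumed: 5 ms per multiplication
days = mults * seconds_per_multiplication / 86_400
```

Under these assumptions about 1.1 billion multiplications are needed, which corresponds to roughly two months of sequential computation.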
Accordingly, a modified method of bin count in which types of variables are varied will be described below.
The bin mask described below represents the bin value in an encoded form. For example, when there is data having a bin value in the range [1,10], the bin value may be represented as 10 bytes. If bin value i(∈[1,10]) is to be represented, a method of setting the i-th byte from among the ten bytes to 1 and the rest to 0 may be used.
The bin masks set in this method may be added between one another. The above will be described with reference to
Referring to
In the previous process, individual bin masks had to be generated for each value of each variable, whereas when utilizing the power bin mask as described above, one power bin mask may suffice for each variable.
Further, the operation method may be the same as the bin mask method, and if a specific combination is required, output data 520 may be generated by using the sub slot of the power bin mask corresponding to the relevant combination. Then, the output data 520 may be decoded, and the value of each combination may be calculated (530).
If the method as shown in
Advantages of utilizing the power bin mask as described above will be described with reference to
Referring to
Because, by the nature of the proposed method, expanded bin masks are not multiplied with one another, only one expanded bin mask may participate in the calculation of the bin count. In addition, because the number of bins which may be included in one bin mask is 50 at maximum, the number of multiplications may be reduced to as little as 1/50 by using the proposed method. However, because the multiplication which uses the expanded bin mask includes multiple bin data in one slot, the bit length to be managed per slot increases, and this may also influence the multiplication time. The number of bins and the reduction ratio described above are merely exemplary, and the numerical values may be changed according to the homomorphic encryption scheme applied.
For the statistical operation, it may be necessary to use not only a table of one owner, but also tables of different owners.
Accordingly, a method of joining tables owned by different devices to one table will be described below.
Different databases (DBs) may store homomorphically encrypted data in different methods. However, when each of the different DBs stores its table divided by feature, it is possible to easily join the two tables.
Specifically, it may be assumed that a first electronic device 100-1 and a second electronic device 100-2, which are different from each other, each own an encrypted table, and that key data for joining is shared therebetween. If there is a third electronic device 100-3 which honestly performs the protocol, the two encrypted tables may be joined with the assistance of the third electronic device 100-3. The method as described above may be performed effectively in that data joining is possible without any additional homomorphic encryption operation other than table encryption. In addition, from a security aspect, there is the advantage that no data is exposed in the process of joining the plurality of tables other than the number of common data (inner-join) or the number of data owned by a counterpart but not by oneself (outer-join).
To this end, column values relevant to the key for joining may be subjected to one-way encryption, for which decryption is impossible, by using a separate hash-based message authentication code (HMAC) function rather than homomorphic encryption. At this time, because the electronic devices 100-1 and 100-2 share the same HMAC key, the HMAC result on the same key value may be the same. However, because the HMAC result also relies on the shared HMAC key, a different HMAC value may be obtained even for the same key value when another HMAC key is used.
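The two properties stated above may be illustrated with the standard `hmac` module of Python. The choice of SHA-256 as the underlying hash, the placeholder keys, and the join key string are assumptions made for illustration; the disclosure does not fix them here.

```python
import hashlib
import hmac

def mac_join_key(mac_key: bytes, join_key: str) -> str:
    """One-way tag of a join key value under a shared HMAC key."""
    return hmac.new(mac_key, join_key.encode("utf-8"), hashlib.sha256).hexdigest()

shared_key = bytes(range(32))        # placeholder 256-bit key shared by both owners
other_key = bytes(range(1, 33))      # a different, unrelated key

tag_d1 = mac_join_key(shared_key, "user-0001")    # computed by the first device
tag_d2 = mac_join_key(shared_key, "user-0001")    # computed by the second device
tag_other = mac_join_key(other_key, "user-0001")  # same value, another HMAC key
```

The tags of the two devices coincide, so they can be matched without revealing the join key itself, while a party holding a different key obtains an unrelated tag.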
The HMAC value and a position value of the row in the table of the original join key used for forming the relevant HMAC value may be combined to form a pair of data, and this data set may be arranged based on the HMAC value and then transmitted to the third electronic device 100-3. Here, the position value may refer to a row number of the join key value and the other data connected therewith.
The joiner (e.g., the third electronic device 100-3) may be configured to identify matching keys by using the HMAC values and to send the collected row position data to each electronic device. Based thereon, the first electronic device and the second electronic device may be configured to transmit data in which the relevant rows are encrypted to the third electronic device 100-3, and the third electronic device 100-3 may be configured to perform joining by combining the metadata of the data sent from the two devices to form the metadata of one joined table.
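The key matching performed by the joiner may be sketched as follows. This is a simplified illustration that assumes each join key occurs at most once per table; the function name `match_rows` and the short tag strings are hypothetical, and the random permutation and padding steps of the full protocol are omitted.

```python
def match_rows(l1, l2):
    """l1, l2: lists of (row_position, hmac_value) pairs from the two owners."""
    pos2 = {mac: row for row, mac in l2}
    macs1 = {mac for _, mac in l1}
    # Rows whose HMAC values appear in both lists (inner-join candidates)
    inner = [(row1, pos2[mac]) for row1, mac in l1 if mac in pos2]
    # Rows present on one side only (needed additionally for an outer-join)
    only1 = [row for row, mac in l1 if mac not in pos2]
    only2 = [row for row, mac in l2 if mac not in macs1]
    return inner, only1, only2

L1 = [(0, "aa"), (1, "bb"), (2, "cc")]
L2 = [(0, "bb"), (1, "dd"), (2, "aa")]
inner, only1, only2 = match_rows(L1, L2)
```

The joiner sees only HMAC values and row positions, so it learns which rows match and how many do not, but not the join key values themselves.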
A more detailed joining operation will be described below with reference to
Referring to
Participants of the protocol are as follows. Data owning companies D1 and D2 may be present, a data joiner F, and Z which will own the finally encrypted joining table may be present. Here, D1 and D2 may be the electronic device or server illustrated in
The data owning companies D1 and D2 may have the same homomorphic encryption key instance (pkh), and may also share the same MAC key (symmetric key skMAC: a 256-bit random bit string). The data analyzing company Z may have a calculation key evkh with which calculation may be performed on data encrypted with pkh.
A parameter and algorithm on homomorphic encryption may be shared by D1, D2, and Z, and the MAC algorithm may be shared by D1 and D2.
Data owned by the data owning companies D1 and D2 may be described as below.
The data owning companies D1 and D2 may have data of mD
The features in the remaining columns may be represented as
In addition, fX
Accordingly, a data tuple on an arbitrary user idD1, owned by D1 company may be defined as
An operation may be divided into an inner-join of performing joining with respect to the common key present in the data owning companies D1 and D2, and an outer-join of performing joining with respect to all data present in the two companies. The detailed operation is described as below.
(Initial environment) Companies D1, D2, F, and Z are present, and the data owned by the respective companies are the same as described above.
1. Each company DX(∈{1,2}) may add a new column and perform MACsk
2. All data may be arranged by a row unit based on macid
(i∈[0,nX−1]) form. That is, the row including maci may be recorded in the i-th row of an input table. The table may be referred to as TX. TX may be owned by DX.
3. Each company DX may transfer LX=(i,maciX):i∈[0,mD
4. F may use the received L1, L2 to calculate
and then define as
such that
In addition, define as
If function indexX:SD
First, the outer-join will be described.
a) Data with order PD
b) RD
Here, πs(R) may mean a safe permutation which mixes the order of R value using randomness provided by s, and if s is unknown, data on data R with the original order may not be learned. In addition, if s is known, π−1
d) F may transfer UID1 to D1, and transfer UID2 to D2. In addition, F may transfer |SD
e) The DX which received UIDX may arrange data matching the order of values in the relevant sequence. That is is
be arranged in
data may
order. Here, it may mean nUID
f) After, ⊥ sequence of length mD
g) X may use the method in 1) to encrypt DX by column to form an encryption table object Ctb
h) F may use π−1S
to restore the order of Ctb
i) If the result of h) is C′1, C′2 respectively, the two tables may be combined and one encrypted table C′ may be formed. At this time, metadata of the joined encryption table may be generated by joining the metadata of each of C′1, C′2.
j) F may transfer the joined encryption table CR to Z.
The inner-join may be similar to the outer-join above, but differs in processes a), b) and d) as below. In addition, process f) may not be performed.
b) RD
d) F may transfer UID1 to D1, and UID2 to D2.
An actual statistical operation using the bin mask described above will be described below.
Referring to
If the number of people in their 60s or older living in a metropolitan city is to be checked in the data as described above, the count may be carried out by using the bin mask of value 3 for age and the bin mask of value 2 for region.
In addition, if an average credit rating of persons from Seoul is to be obtained, the relevant slots may be detected by using the bin mask of value 1 for the region, and the average may be calculated by performing homomorphic addition on the encrypted credit rating data corresponding to the detected slots. A more detailed conditional statistical operation will be described below.
A more detailed operation on the statistic calculation described above using the previously described bin mask and expanded bin mask will be described below.
For convenience of description, a categorical variable may be referred to as a ‘bin variable’ (or, bin feature), and for convenience of operation, the values of the bin feature will be represented as consecutive positive integers starting from 1. For example, in one column of a bin variable with three categories, the value of each row is 1, 2, or 3.
The system according to the disclosure may provide the statistical operation on data of which a certain bin variable has a specific value, that is, data belonging to a specific category. This is because the average, variance and standard deviation operations above are operations on all data, and the statistical features of data in a specific category may differ from those of all data.
In addition, the statistical operation may be possible on not only the condition on the one bin variable, but also on data having the condition on several bin variables of an arbitrary number.
The statistical operation as described above is the same as realizing the operation formula as below in the encrypted state.
Average:
Variance:
Correlation Coefficient:
It may be assumed that index f0, f1, . . . , fm-1 of the bin features are present, and integer c0, c1, . . . cm-1 is present. The operation below may perform the statistical operation with the condition on x(f
Bin Count: function counting a number of possible cases satisfying the condition in sum(x(f
Average: total of x(f
Bin Average: E(x(f
Bin Variance: Var(x(f
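The conditional statistics named above may be sketched in plaintext as follows. NumPy arrays stand in for encrypted columns, the population form of the variance is assumed, and the credit, region and age data are hypothetical. In an encrypted setting the sums would be computed homomorphically; where the final division is carried out is left open in this sketch.

```python
import numpy as np

def bin_statistics(values, masks):
    """values: target column; masks: list of 0/1 bin masks, one per condition."""
    selector = np.prod(np.stack(masks), axis=0)   # 1 only where all conditions hold
    count = selector.sum()
    mean = (values * selector).sum() / count
    variance = (values ** 2 * selector).sum() / count - mean ** 2
    return int(count), float(mean), float(variance)

credit = np.array([3.0, 5.0, 4.0, 1.0, 2.0, 6.0])
region_is_1 = np.array([1, 0, 1, 1, 0, 1])   # bin mask of value 1 for region
age_is_3 = np.array([1, 1, 0, 1, 0, 1])      # bin mask of value 3 for age
count, mean, variance = bin_statistics(credit, [region_is_1, age_is_3])
```

Only multiplications by masks and additions over slots are needed to obtain the three sums, which is why precomputed bin masks make these statistics practical on ciphertexts.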
A more detailed statistical operation method will be described with reference to
A comparison operation between the bin variable column and the specific integer value is necessary for the process described above.
However, because the comparison operation using the homomorphic operation involves significant cost, the comparison results may be stored in advance in the bin mask and utilized in the operation.
A certain bin feature f0 may include s0 as a maximum bin value. That is, each row of one column of the data may include one value from among {1, 2, . . . s0}. The bin mask generated at this time may be encrypted columns of a total s0 number, and may be represented as bi(f
The operation of generating the bin mask used in the statistical operation will be described in greater detail below.
The bin mask may be generated at the encryption step, and may be generated even after the encryption step. First, the operation of generating the bin mask at the encryption step will be described below with reference to
Referring to
The data table in the plaintext state may be represented as x, and the v-bit table may be represented as v. The output value may be b={b(f
The generating operation of the bin mask after the encryption operation will be described below.
The generating of the bin mask after the encryption operation may be used when data in the plaintext state cannot be accessed, in which case the bin mask is generated by using the homomorphic operation in the encrypted state.
First, an integer comparing operation as below needs to be realized in the encrypted state for the bin mask generation.
This operation may be represented as (a≡b)=δ(a−b) with respect to a function as with Equation 11 below.
Because the categorical variable takes only integer values, it suffices that the δ(x) function be satisfied with respect to integer inputs. Because addition and multiplication are provided in the homomorphic encryption system, the function is to be satisfied within the integer range by approximating the δ(x) function with a polynomial.
At this time, a sinc function as below may be used as an approximation function
The above takes the same value as δ(x) at all integer values, and a polynomial approximation thereof may be possible. However, when approximating over a wide range, the amount of calculation may increase as the degree of the approximation formula becomes higher. In this case, approximation over a wide range may be possible with only an approximation over a narrow range by using a multiple-angle formula of a trigonometric function.
First, using the sin 2ϕ=2 sin ϕ cos ϕ equation, the sinc function may be changed like in Equation 13 as below.
If
value is known, because the value of
can all be known by using the cos 2ϕ=2 cos²ϕ−1 formula, the sinc(x) value may be obtained if the values of
are known. Accordingly, the sinc(x) value may be known in [−μ,μ] with only approximating sinc(x) and cos(x) in
range.
A Chebyshev approximation may be used as below for the approximation of the f(x)=sinc(x) or f(x)=cos(x) function. However, other approximation methods may be used at realization.
Here, cj is the same as in Equation 15.
Here, Tj(x) is the j-th Chebyshev polynomial.
The polynomial generated according to the approximation method described above may be a polynomial of degree K.
When using the method as described above, approximation may be carried out with a small error in the [−1, 1] range, and if K is an even number, the approximation formula may take a value of exactly 1 when x=0.
If an error is present, the error is continuously amplified when calculating sinc(x) using the multiple-angle formula, and the value of the bin mask may then not be exactly 1. Accordingly, K may be set as an even number.
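The combination of a narrow-range Chebyshev approximation with the double-angle formulas may be sketched in plaintext as follows. The sketch assumes the normalized form sinc(x)=sin(πx)/(πx), which vanishes at every nonzero integer, and uses NumPy's `Chebyshev.interpolate` as a stand-in for the approximation step; the function name `approx_delta` and the parameters `mu_bits` and `K` are illustrative and not taken from the disclosure.

```python
import numpy as np
from numpy.polynomial import Chebyshev

def approx_delta(x, mu_bits=4, K=16):
    """Approximate delta(x) for integers x in [-2**mu_bits, 2**mu_bits]."""
    # Degree-K fits on the narrow range [-1, 1] only (K even)
    sinc_p = Chebyshev.interpolate(np.sinc, K, domain=[-1, 1])
    cos_p = Chebyshev.interpolate(lambda t: np.cos(np.pi * t), K, domain=[-1, 1])

    t = np.asarray(x, dtype=float) / 2 ** mu_bits   # scale input into [-1, 1]
    s, c = sinc_p(t), cos_p(t)
    for _ in range(mu_bits):
        s = s * c              # sinc(2t) = sinc(t) * cos(pi * t)
        c = 2 * c * c - 1      # cos(2 * pi * t) = 2 * cos(pi * t)**2 - 1
    return s

xs = np.arange(-16, 17)
approx = approx_delta(xs)
```

Each doubling step uses only additions and multiplications, so the same recurrence can be evaluated with homomorphic operations once the two low-degree polynomials have been evaluated.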
Referring to
is calculated in advance.
Each slot of x with respect to μ of the input may be a value in [−μ, μ] range, and K may be an even number as a degree of approximation polynomial. M of the process may be the number of slots in one ciphertext.
At this time, the number of multiplication operations may be O(K+3┌log₂ μ┐).
The bin mask generating operation which uses the integer comparing operation by using the method described above will be described with reference to
Referring to
Encrypted data table X, and encrypted v-bit table V may be received as input.
The output value may be b={b(f
A detailed method of calculating the bin count by using the generated bin mask will be described below.
Here, the bin count operation may be a function which counts the number of rows satisfying the condition on the several bin features.
When receiving, as input, a vector f=(f0,f1, . . . fm-1) of bin features and a vector c=(c0,c1, . . . cm-1) formed of integer values, the bin count operation may include counting the number of valid data which satisfy the m conditions of the f0-th feature value being c0, . . . , and the fm-1-th feature value being cm-1.
When using the previously generated bin mask, because v-bit was taken into reference in the previous bin mask process, a separate consideration on v-bit is not necessary.
When all the bin masks corresponding to the conditions are multiplied row-wise, the value of a row may be 1 if all the conditions are satisfied, and 0 if not.
Accordingly, the bin count operation may add the value of all slots of the multiplication result.
In
Further, when nb is the number of blocks in one column, m is the number of conditions, and M is the number of slots in one ciphertext, the process above may perform O(nb*m) multiplications and O(log₂ M) rotation operations.
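The operation counts above may be illustrated with a plaintext sketch in which `np.roll` stands in for a homomorphic slot rotation. The function names and the two-block example data are hypothetical, and the number of slots is assumed to be a power of two.

```python
import numpy as np

def rotate_and_sum(slots):
    """Sum all slots with log2(M) rotations; M must be a power of two."""
    acc, shift, rotations = slots.copy(), 1, 0
    while shift < len(slots):
        acc = acc + np.roll(acc, -shift)   # stand-in for a homomorphic rotation
        shift *= 2
        rotations += 1
    return acc, rotations                  # every slot now holds the total

def bin_count(masks_per_block):
    """masks_per_block[k] lists the m selected bin masks of block k."""
    total = None
    for block_masks in masks_per_block:
        prod = block_masks[0]
        for mask in block_masks[1:]:
            prod = prod * mask             # m - 1 multiplications per block
        total = prod if total is None else total + prod
    summed, rotations = rotate_and_sum(total)
    return int(summed[0]), rotations

block0 = [np.array([1, 0, 1, 1, 0, 1, 0, 1]), np.array([1, 1, 1, 0, 0, 1, 0, 0])]
block1 = [np.array([0, 1, 1, 0, 1, 1, 0, 0]), np.array([1, 1, 0, 0, 1, 0, 0, 0])]
count, rotations = bin_count([block0, block1])
```

Adding the block products before rotating means the rotations are paid once in total rather than once per block.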
A modification of the above-described operation of the bin count operation will be described below.
First, the modified bin count method (hereinbelow, referred to as a large bin count) may obtain the counts of all possible cases formed by several bin features of certain data and represent the result in table form.
The specific operation will be described with reference to
Referring to
If a large bin count is performed on m number of bin features having s0, s1, . . . sm-1 as the maximum bin value respectively, a result table representing all s0×s1× . . . ×sm-1 counts of the number of possible cases may be obtained.
The bin count method may use the bin mask which was previously formed as described above. Bin mask b bi(f
As illustrated in
Referring to
Accordingly, the number of possible cases sought may be obtained by adding the value of all rows of the multiplication result.
However, only the number of one possible case from among all combinations may be obtained through the process of multiplying and adding the bin masks as described above.
Accordingly, when bin features f0, f1 . . . fm-1 of m number include s0, s1, . . . sm-1 as maximum bin value, the process above is to be repeated s0×s1× . . . ×sm-1 times in order to obtain the number of possible cases of all combinations formed by the features.
The data table may include n rows, and when each ciphertext includes M slots, each column may include n/M blocks. Accordingly, a total of (m−1)×(n/M)×s0×s1× . . . ×sm-1 multiplications is required.
The number of multiplication operations required may increase as the number of bin features for which the number of possible cases is sought increases, or as the maximum bin value of each feature increases.
Accordingly, a method for calculating the number of possible cases with multiplication of a small number will be described below.
In order to achieve the object as described above, the power bin mask may be used in place of the bin mask.
Each row of the power bin mask may include a value of 2^((i−1)×Δ+δ) when the value of the corresponding bin feature is i. That is, a margin of about Δ bits may be provided for each bin value.
Referring to
Based on the power bin mask 1620 being generated as described above, p(A), b1(B), b1(C), b1(D), b1(E) 1630 may be multiplied as in the previous method.
The value of each row in the multiplication result may be 0 unless features B, C, D and E are all 1. Then, the values of all rows may be added. If it can be ensured that the number of rows being added at this time is less than 2^Δ, the addition results may be stored separately in respective areas of Δ bits each.
The example in
The result value may include the five counts of the number of possible cases in which B, C, D and E are all 1 and A=1 to 5. Accordingly, although the method performs the same number of multiplication operations as the previous bin count method, it may be equivalent to obtaining five times as many counts of possible cases.
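The packing of several counts into one value may be sketched with plain Python integers. The field width `DELTA`, the offset `OFFSET`, and the example columns are hypothetical, and only two ordinary bin masks are multiplied instead of four to keep the example short.

```python
DELTA = 8     # bits reserved per bin value (hypothetical)
OFFSET = 4    # lower offset delta kept free for low-bit error (hypothetical)

def power_bin_mask(column):
    """Row value 2**((i - 1) * DELTA + OFFSET) when the bin feature equals i."""
    return [1 << ((i - 1) * DELTA + OFFSET) for i in column]

def unpack_counts(total, max_bin):
    """Read back one DELTA-bit field per bin value 1..max_bin."""
    field = (1 << DELTA) - 1
    return [(total >> ((i - 1) * DELTA + OFFSET)) & field
            for i in range(1, max_bin + 1)]

A = [1, 2, 5, 2, 3, 2, 1, 4]          # feature expanded into the power bin mask
B_is_1 = [1, 1, 0, 1, 1, 1, 0, 1]     # ordinary 0/1 bin mask b1(B)
C_is_1 = [1, 1, 1, 0, 1, 1, 1, 1]     # ordinary 0/1 bin mask b1(C)

products = [p * b * c for p, b, c in zip(power_bin_mask(A), B_is_1, C_is_1)]
counts = unpack_counts(sum(products), max_bin=5)
```

A single pass of multiplications and one sum yields the counts for A=1 to 5 at once, provided the number of added rows stays below 2**DELTA so that no field overflows into its neighbor.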
Accordingly, when bin features f0, f1 . . . fm-1 of m number include s0, s1, . . . sm-1 as the maximum bin value and the number of possible cases of all combinations thereof is to be obtained, the number of multiplication times may in theory be reduced to 1/s0 when a bin mask expanded with f0 is generated.
In order to apply the power bin mask method as described above, each row of the bin mask is to be represented as 0 or 2^Λ (here, Λ is a positive integer) rather than 0 or 1, considering an error term of the homomorphic encryption method.
When m−1 bin masks are multiplied, the value of each row may increase up to a maximum of 2^((m−1)Λ), and this value is not to exceed the modulus bit of the homomorphic ciphertext.
To this end, a process of representing bin features of m−1 number as new bin features of k number may be included (m−1>k).
This is because if bin masks are formed and multiplied with one another in the method described above with respect to the new bin features of k number, the maximum value of each row may be reduced to 2^(kΛ).
The whole bin count operation which considers the error term as described above is the same as in
Referring to
To summarize the bin count operation as described above, first the power bin mask and the big bin mask may be generated in advance. The bin mask generation as described above may be performed in the encryption process. Alternatively, it may be performed after the encryption process.
When the power bin mask and the big bin mask are generated, the multiplication operation between the relevant masks may be performed. Then, a decryption process for checking the operation result may be performed.
Specifically, an expanded bin mask may be generated with one feature (f0) for the large bin count operation.
Then, k big bin masks may be generated from the remaining m−1 features (f1, f2, . . . fm-1) (k<m−1). This is a process of representing the m−1 columns as a smaller number k of columns and forming bin masks of the new columns. For convenience of description, a case of k=2 will be described below. The big bin masks generated as described above may be referred to as big bin mask 1 and big bin mask 2, respectively.
At this time, the generated expanded bin mask, big bin mask 1, and big bin mask 2 may be represented as p, q(i), r(i) (i=1, 2, . . . Q), respectively. (Each of p, q(i), r(i) (i=1, 2, . . . Q) may be equivalent to one encrypted column. Accordingly, each may be comprised of as many ciphertexts as the number of blocks (n/M) for one column.)
$$q=\{q^{(1)},q^{(2)},\ldots,q^{(Q)}\},\quad r=\{r^{(1)},r^{(2)},\ldots,r^{(Q)}\}$$ [Equation 16]
a value relevant to j+1-th slot of k+1-th ciphertext in each mask (i=0, 1, . . . m−1, k=0, 1, . . . n/M−1, j=0, 1, . . . M−1)
Referring to
Because an error is generated at a lower bit in a specific homomorphic encryption system, an offset of lower δ bit is provided to each slot value of the power bin mask.
The inputs x, v, and f0 may be a data table in the plaintext state, a v-bit table in the plaintext state, and the index of the bin feature for forming the power bin mask, respectively.
Further, n may be a number of data rows, and M may be a number of slots of one ciphertext.
Referring to
A typical bin mask represents whether a row is relevant to a specific value as 0 or 1, but in this case it may be represented as 0 or 2^Λ.
If a feature index of m−1 number is f1, f2, . . . fm-1 respectively, and the maximum bin value of each feature is s1, s2, . . . sm-1, the new two variables may each include
as the maximum bin variable. Here, x, v may be a data table in plaintext state and v-bit table in plaintext state, respectively.
As a result of this process, bin masks on the two newly assumed columns may be generated. That is, it may be equivalent to 2×Q columns, each having n rows, being generated.
This process ensures that the value stored in each slot in the multiplication process of the large bin count does not reach or exceed the modulus bit.
As described above, when the power bin mask and the big bin masks are prepared, the multiplication operation may be performed.
Specifically, all pk·qk(i)·rk(j), (i,j=1, 2, . . . Q, k=0, 1, . . . n/M−1) may be calculated. A GPU may be used in the homomorphic multiplication operation of this process, and each GPU may perform the multiplication operation on a block basis. The specific operation will be described below with reference to
Referring to
After processing one block, Q² result ciphertexts may be generated.
When the above is stored with respect to all blocks, the number becomes (n/M)×Q², which may require greater storage space as the number of blocks increases. Accordingly, if the number of blocks is greater than the number (Ng) of GPUs, the result ciphertext of a previously processed block may be retrieved, added to the new result ciphertexts, and stored again. Accordingly, the number of ciphertexts to be stored may be limited to a maximum of Ng×Q².
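The storage bound described above may be sketched in plaintext as follows. NumPy arrays stand in for ciphertext blocks, and the round-robin assignment of blocks to GPUs is an assumption of this sketch rather than something stated in the disclosure; the function name `accumulate_products` is likewise hypothetical.

```python
import numpy as np

def accumulate_products(p_blocks, q_blocks, r_blocks, n_gpu):
    """p_blocks[k], q_blocks[i][k], r_blocks[j][k] are the slots of block k."""
    Q = len(q_blocks)
    acc = {}                                   # holds at most n_gpu * Q * Q entries
    for k, p in enumerate(p_blocks):
        g = k % n_gpu                          # assumed: block k goes to GPU k mod Ng
        for i in range(Q):
            for j in range(Q):
                prod = p * q_blocks[i][k] * r_blocks[j][k]
                key = (g, i, j)
                # Add to the stored result of an earlier block instead of
                # storing a new result per block
                acc[key] = prod if key not in acc else acc[key] + prod
    return acc

ones, zeros = np.ones(2, dtype=np.int64), np.zeros(2, dtype=np.int64)
p_blocks = [np.array([1, 2]), np.array([3, 4]), np.array([5, 6])]
q_blocks = [[ones] * 3, [zeros] * 3]           # Q = 2 masks, three blocks each
r_blocks = [[ones] * 3, [2 * ones] * 3]
acc = accumulate_products(p_blocks, q_blocks, r_blocks, n_gpu=2)
```

With three blocks and two workers, the number of stored results stays at Ng×Q² = 8 rather than growing with the number of blocks.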
When the GPU is able to load one expanded bin mask, c ciphertexts of big bin mask 1, and c ciphertexts of big bin mask 2 all at once during the operation process, the algorithm may be as described in
In
Because the mask generated through the process described above is in an encrypted state, it may be difficult to use the data as is. Accordingly, the decryption operation performed after the multiplication operation will be described below.
Referring to
Because BigBin1 and BigBin2 represent the m−1 features as two features, each (i, j) pair may be mapped to one possible case from among the combinations of the m−1 bin features. There may be Ng ciphertexts for which the value of BigBin1 is i and the value of BigBin2 is j.
Accordingly, after decrypting all Ng ciphertexts, by cutting the value of each slot into bit units and adding only the values of the Δ bits from (2·Λ+(l−1)·Δ+δ) to (2·Λ+l·Δ+δ), the number of cases in which the value of the bin feature f0 that formed the expanded bin mask is l may be obtained.
As illustrated above, the relevant algorithm requires the result of the previous bin mask multiplication process as its input value. As a result of this process, a table of size Q^2·s0 containing the counts of all possible cases may be obtained.
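The bit-field extraction described above may be illustrated, by way of example only, with a noise-free integer model. In this sketch each slot holds an exact Python integer rather than a decrypted approximate value, and the helper names `encode_row` and `extract_count` as well as the parameter names `Lam`, `Delta`, and `delta` (for Λ, Δ, δ) are assumptions of the example.

```python
def encode_row(f0_value, a1, a2, Lam, Delta, delta):
    """Product of one power bin mask slot and two big bin mask slots.

    The power bin mask contributes 2^((f0_value-1)*Delta + delta);
    each big bin mask contributes a * 2^Lam with a in {0, 1}.
    """
    power = 1 << ((f0_value - 1) * Delta + delta)
    return power * (a1 << Lam) * (a2 << Lam)


def extract_count(slot_value, l, Lam, Delta, delta):
    """Read the Delta-bit field that counts rows whose f0 value equals l."""
    shift = 2 * Lam + (l - 1) * Delta + delta
    return (slot_value >> shift) & ((1 << Delta) - 1)


# Rows accumulated into one slot: (f0 value, a1, a2).
rows = [(1, 1, 1), (2, 1, 1), (2, 1, 1), (3, 1, 1), (2, 1, 1), (3, 0, 1)]
Lam, Delta, delta = 4, 3, 2
total = sum(encode_row(f, a1, a2, Lam, Delta, delta) for f, a1, a2 in rows)
counts = [extract_count(total, l, Lam, Delta, delta) for l in (1, 2, 3)]
```

Each field can hold a count only up to 2^Δ − 1, which is why the number of values added into one slot has to stay below 2^Δ.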
Referring to
Referring to
If the description illustrated in
What is to be obtained through this operation is whether a row corresponds to a specific bin value, which may be identified through the value from the lower (2Λ+δ)-th bit to the (2Λ+s0·Δ+δ)-th bit of an arbitrary slot of the multiplication result.
The multiplication result of an arbitrary slot may be represented as in Equation 17 below.
{2^{(i−1)·Δ+δ} + e}·{a_1·2^Λ + e}·{a_2·2^Λ + e} ≈ a_1a_2·2^{(i−1)·Δ+δ+2Λ} + {2^{2Λ} + 2·2^{(i−1)·Δ+δ+2Λ}}·e (a_1, a_2 ∈ {0,1}) [Equation 17]
At this time, the error of the multiplication result must not reach the lowest bit of the desired value, and the upper bit of the value must not exceed the modulus bit. That is, the error term (22Λ+2((s
Considering that results generated in other blocks are added after the multiplication, the conditions above are to be adjusted to leave a margin of about Δ bits. The result is as follows. It may be assumed that b_m is the modulus bit, and that the log upper bound of the error term is b_e (|e| < 2^{b_e}, b_e < 0). Equation 18 below may be a condition on the error term, and Equation 19 may be a condition on the modulus bit.
(22Λ+2((s
(2Λ + s0·Δ + δ) < b_m [Equation 19]
Accordingly, Λ, δ, and Δ may be set to satisfy the condition n/M/Ng < 2^Δ and the two inequalities above.
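By way of example only, two of the conditions above may be checked for a candidate parameter set as sketched below. The function name `check_parameters` and its argument names are assumptions of the example; only the count condition n/M/Ng < 2^Δ and the modulus condition of Equation 19 are checked, and the error-term condition of Equation 18 is not included in this sketch.

```python
def check_parameters(n, M, Ng, s0, Lam, Delta, delta, bm):
    """Return True if the count and modulus-bit conditions both hold.

    n, M, Ng : number of rows, slots per ciphertext, number of GPUs
    s0       : maximum bin value of the feature in the power bin mask
    Lam, Delta, delta : bit-layout parameters (Λ, Δ, δ)
    bm       : modulus bit
    """
    # Number of block results added into one accumulator slot.
    blocks_per_accumulator = n / M / Ng
    count_fits = blocks_per_accumulator < 2 ** Delta
    # Equation 19: the highest used bit must stay below the modulus bit.
    modulus_fits = (2 * Lam + s0 * Delta + delta) < bm
    return count_fits and modulus_fits
```

For instance, with n = 2^20, M = 2^13, and Ng = 4, each accumulator receives 32 block results, so Δ = 6 satisfies the count condition whereas Δ = 5 does not.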
Although the big bin mask has been formed by representing the m−1 features as two columns in the above-described process, it may also be possible to represent them as k columns, where k is an arbitrary natural number.
When representing the features as k columns, it is to be set as
and, in the big bin mask generating process, the value is to be divided by Q repeatedly, k times, with each remainder being set as the bin value of a new column.
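The repeated division described above may be sketched, as a non-limiting illustration, as follows. Several choices here are assumptions of the example and are not taken from the disclosure: bin values are treated as zero-based, the m−1 bin values are first combined into a single index by mixed-radix encoding (`combine`), and Q is chosen by `choose_q` as the smallest integer whose k-th power is at least the number of combinations.

```python
import math


def choose_q(max_bins, k):
    """Smallest Q with Q**k >= number of bin combinations (assumed rule)."""
    total = math.prod(max_bins)
    q = 1
    while q ** k < total:
        q += 1
    return q


def combine(bins, max_bins):
    """Mixed-radix encoding of zero-based bin values into one index."""
    idx = 0
    for b, s in zip(bins, max_bins):
        idx = idx * s + b
    return idx


def split_columns(idx, q, k):
    """Divide by Q k times; each remainder is the bin value of a new column."""
    digits = []
    for _ in range(k):
        idx, rem = divmod(idx, q)
        digits.append(rem)
    return digits
```

With maximum bin values 3, 4, and 5 and k = 2, `choose_q` returns 8, and every combination of the three features maps to a distinct pair of new bin values below 8.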
In this case, the result of multiplication may be
and the conditions on the error term and the modulus bit above may be described as below.
(2kΛ+k2((s
In the large bin count process described above, (n/M)·(Q^2+Q) homomorphic multiplication operations may be used.
In order to perform a statistical calculation, a comparison of values in the homomorphic ciphertext may be necessary. That is, it may be necessary to check whether a ciphertext value in the homomorphic ciphertext corresponds to a specific variable value.
Referring to
Specifically,
Referring to
The whole process is as follows. X and V may respectively be the encrypted data table and the encrypted v-bit table, and b may be a set of BinMask. \vec{f} and \vec{c} may represent the condition on the bin variables, f_t may represent the index of the column whose average is sought, and f may represent the index of a column temporarily generated in the process of seeking the average. k may be the iteration number of ApproxIn in the average operation.
The process above may be equivalent to adding O(m) multiplications to the Average process. Accordingly, O(nb+2k+m) multiplications and O(2·log_2(M)) rotations may be required. (nb is the number of blocks in one column, k is the iteration number in the inverse process of the average operation, and M is the number of slots per ciphertext.)
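As a non-limiting illustration of where the rotation count may come from, a conditional average can be modeled in plaintext with rotate-and-add slot sums. This sketch assumes a single block whose slot count M is a power of two, represents ciphertexts as Python lists, and uses an ordinary division in place of the approximate inverse; the function names are hypothetical.

```python
def rotate(v, r):
    """Cyclic left rotation of the slot vector by r positions."""
    return v[r:] + v[:r]


def slot_sum(v):
    """Rotate-and-add: after log2(M) rotations every slot holds the total."""
    step = 1
    while step < len(v):
        v = [a + b for a, b in zip(v, rotate(v, step))]
        step *= 2
    return v


def masked_average(x, v, mask):
    """Average of x over rows that are valid (v) and match the condition (mask)."""
    selected = [vi * mi for vi, mi in zip(v, mask)]
    numerator = slot_sum([xi * si for xi, si in zip(x, selected)])
    denominator = slot_sum(selected)
    # In the encrypted setting this division would be replaced by an
    # iterative approximate inverse followed by a multiplication.
    return numerator[0] / denominator[0]
```

The numerator and the denominator each take one slot sum of log_2(M) rotations, which is one way to arrive at a total of about 2·log_2(M) rotations.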
Further,
Referring to
The process above may be equivalent to adding about O(m) multiplications to the previous variance process. Accordingly, O(3nb+2k+m) multiplications and O(3·log_2(M)) rotations may be necessary. (nb is the number of blocks in one column, k is the iteration number in the inverse process of the average operation, and M is the number of slots per ciphertext.)
A Pearson correlation coefficient between two features f0 and f1 of the data table may be computed. At this time, the operation may be performed only on rows in which the values of both features are valid, with reference to the v-bit table. The correlation coefficient formula
may be used on the two variables X, Y. The algorithm for obtaining the correlation coefficient of the two features f0, f1 from the encrypted data table and the encrypted v-bit table under the homomorphic encryption method is as follows. The iteration number of ApproxIn in the operation process may be k2, and the iteration number of ApproxSqtInv may be k1. Likewise, when the values of all slots are guaranteed to be valid, the operation may also be performed without using the v-bit table.
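By way of example only, the restriction to rows in which both features are valid may be sketched in plaintext as follows. The sketch uses the standard computational form of the Pearson coefficient, r = (nΣxy − ΣxΣy) / sqrt((nΣx² − (Σx)²)(nΣy² − (Σy)²)), and an exact square root; in the encrypted setting the inverse square root would instead be approximated iteratively. The function name `masked_pearson` and its arguments are assumptions of the example.

```python
import math


def masked_pearson(x, y, vx, vy):
    """Pearson correlation over rows where both validity bits are 1."""
    w = [a * b for a, b in zip(vx, vy)]  # 1 only if both values are valid
    n = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    syy = sum(wi * yi * yi for wi, yi in zip(w, y))
    cov = n * sxy - sx * sy
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy
    return cov / math.sqrt(var_x * var_y)
```

Multiplying the two validity vectors plays the same role as a bin mask: invalid rows contribute zero to every sum, so no branching on the data is needed.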
The difficult part of the above-described operations is finding the reciprocal of a number involved in the operation. Finding the reciprocal is difficult because the range of values required for the inverse calculation has to be set, a parameter has to be set so that the result does not diverge within that range, and the approximation algorithm is mainly an iterative algorithm. Accordingly, the number of iterations has to be increased for the accuracy of the result, but because the calculation cost increases with the number of iterations, an appropriate number of iterations has to be chosen. Because a homomorphic ciphertext includes errors, a rebooting operation is to be performed after a certain number of operations.
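The trade-off described above may be illustrated with a commonly used iterative reciprocal that needs only additions and multiplications. This is a minimal plaintext sketch and is not asserted to be the ApproxIn routine of the disclosure; the function name `approx_inv` and the `upper_bound` parameter are assumptions of the example. The iteration converges only when the scaled input x / upper_bound lies in the interval (0, 2), which is why a value range has to be fixed in advance.

```python
def approx_inv(x, k, upper_bound=1.0):
    """Approximate 1/x with k iterations, for 0 < x < 2 * upper_bound.

    Uses 1/t = prod_{i>=0} (1 + (1 - t)^(2^i)) for 0 < t < 2,
    truncated after k squarings. Each iteration costs two multiplications.
    """
    t = x / upper_bound  # scaling by a known constant
    a = 2.0 - t
    b = 1.0 - t
    for _ in range(k):
        b = b * b          # b = (1 - t)^(2^i)
        a = a * (1.0 + b)  # accumulate the partial product
    return a / upper_bound
```

For an input near the middle of the range a handful of iterations suffices, whereas an input close to zero relative to `upper_bound` needs many more: with x = 0.01 and k = 2 the result is still far from the true value of 100.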
Referring to
Referring to
In the above, only the operation of calculating the maximum value has been described, but it is also possible to calculate a minimum value by selecting the smaller value in the comparison process.
Referring to
Referring to
Then, the plurality of homomorphic ciphertexts may be stored, with the plurality of variable data remaining in an encrypted state.
Then, a bin mask in which different variable data is classified may be generated for each of the homomorphic ciphertexts (S3220). The bin mask described above may be generated in the generation process of the homomorphic ciphertext as previously described, or may be generated in the homomorphic ciphertext state. Further, the bin mask may be a bin mask including only one variable data per slot, or may be the expanded bin mask indicating whether a plurality of variable values is present, the power bin mask previously described, the big bin mask, or the like.
Then, number data corresponding to the variable combination may be generated by using the bin mask (S3230). Specifically, a count value matching a specific condition may be calculated by using the multiplication of the generated bin mask.
Then, the calculated number data may be output. The number data may be output in the encrypted state, or the relevant data may be decrypted and output as a decrypted result.
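As a non-limiting illustration, counting rows that match a variable combination by multiplying bin masks may be sketched in plaintext as follows. Each column is a Python list standing in for an encrypted column, and the function names `bin_mask` and `count_combination` are assumptions of the example.

```python
def bin_mask(column, value):
    """1 in every slot whose entry equals the given value, 0 elsewhere."""
    return [1 if c == value else 0 for c in column]


def count_combination(columns, values):
    """Count rows where each column takes its requested value."""
    product = [1] * len(columns[0])
    for column, value in zip(columns, values):
        mask = bin_mask(column, value)
        # Slot-wise multiplication keeps a 1 only where all masks agree.
        product = [p * m for p, m in zip(product, mask)]
    return sum(product)


group = [0, 1, 1, 0, 1]
level = [2, 2, 3, 2, 2]
matches = count_combination([group, level], [1, 2])
```

Here two rows have group 1 and level 2, so the sum of the multiplied masks is 2.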
Accordingly, the encryption processing method according to an example embodiment may perform an efficient statistical operation on the homomorphic ciphertext. The ciphertext processing method as in
In addition, the ciphertext processing method as described above may be realized with a program including an algorithm executable in a computer, and the above-described program may be stored in a non-transitory computer readable medium and provided.
The non-transitory computer readable medium may refer to a medium that stores data semi-permanently rather than storing data for a very short time, such as a register, a cache, a memory, or the like, and is readable by a device. Specifically, programs for performing the various methods described above may be stored in the non-transitory computer readable medium such as, for example, and without limitation, a compact disc (CD), a digital versatile disc (DVD), a hard disc, a Blu-ray disc, a USB, a memory card, a ROM, and the like, and provided.
In addition, while the disclosure has been shown and described with reference to the example embodiments thereof, the disclosure is not limited to the specific example embodiments described above, and various modifications may be made therein by those skilled in the art to which this disclosure pertains without departing from the spirit and scope of the disclosure, and such modifications shall not be understood as separate from the technical concept or outlook of the present disclosure.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/KR2021/007517 | 6/15/2021 | WO |
Number | Date | Country | |
---|---|---|---|
63039086 | Jun 2020 | US |