The present invention relates to statistical processing of encrypted data in a computer environment.
To reduce the cost of developing and operating information systems, it is becoming common to process data not only on an organization's own information system but also on a cloud server provided by another organization. Using the cloud means entrusting one's own data to a server managed by another organization. Cryptographic technology is therefore attracting attention as a means of preventing information leakage.
As a technique for preventing information leakage while entrusting the management of data to a server of another organization, an invention utilizing homomorphic encryption, which enables operations on cipher text, is known. For example, Patent Literature 1 discloses a method that utilizes homomorphic encryption capable of computing addition and multiplication on cipher text and entrusts specific processing, such as totaling, of the entrusted data to a server.
Patent literature 1: Japanese Patent No. 5679018
There is demand for requesting a server on a cloud to analyze data. Data analysis is the work of searching for similarities, trends and the like in data, and basic statistics such as an average and variance are required for it. However, when individual data is encrypted so that it cannot be decrypted on the cloud, in order to prevent information leakage, it becomes difficult to compute basic statistics. The reason is that division of cipher text is difficult. Therefore, to acquire basic statistics that require division, such as an average and variance, it has heretofore been necessary to decrypt the cipher text and acquire the plain text.
For example, when data analysis is performed in a server to which data is entrusted using the technique disclosed in Patent Literature 1, the cipher text must be returned from the server to a client, the client must decrypt the cipher text and apply division to the resulting plain text, and the client must encrypt the quotient again and transmit it to the server. That is, when computation of basic statistics is attempted in the server to which the data is entrusted, processing becomes intricate and a heavier load is placed on computing resources.
To solve the abovementioned problem, a key generator according to the present invention transmits to a server the sum of the random numbers used as keys when individual plaintext data is encrypted, as a key for decrypting the sum of ciphertext data, and the sum of ciphertext data is decrypted using that key.
On the server side, only the value of the sum of data is decrypted into plain text. That is, basic statistics can be computed in the server without disclosing the plain text of individual data to the server.
Referring to the drawings, embodiments of the present invention will be described in detail below. However, the present invention is not limited to the embodiments. In the embodiments, the same reference numeral is allocated to the same member in principle and repeated description is omitted. First, terminology used in the embodiments of the present invention will be defined.
A divisor denotes the value by which a number is divided in a modulo operation (the computation that yields the remainder of a division).
“mod t” denotes a modulo operation. Unless otherwise noted, the result of a modulo operation with divisor t is represented in this embodiment as one of the integers [0, 1, 2, …, t−1]; values below 0 (zero) and values equal to or larger than t are not used.
A modulo operation may be applied to the result of adding integers. For example, (4+5) mod 7=2.
A modulo operation may be applied to the result of subtracting integers. For example, (4−5) mod 7=6.
A modulo operation may be applied to the square of an integer. For example, 4² mod 7=2.
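The three examples above can be checked with a short Python sketch. The variable names are illustrative only. Python's `%` operator returns a value in [0, t−1] for a positive divisor t, which matches the residue convention defined above.

```python
# Residue convention: results always lie in [0, t-1] for a positive divisor t.
t = 7

add_result = (4 + 5) % t       # 9 mod 7
sub_result = (4 - 5) % t       # -1 mod 7, mapped into [0, t-1]
square_result = (4 ** 2) % t   # 16 mod 7

print(add_result, sub_result, square_result)
```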
In the present invention, basic statistics denote any of the sum, an average, variance and standard deviation which are representative statistics.
A database is one type of a management mode of data and in the present invention, may also be abbreviated as DB. In addition, when cipher text is stored as data, the database may also be abbreviated as encryption DB.
Registration PC 100 denotes any or all of registration PCs 100a, 100b, …, 100n.
An aggregated value decryption key is a generic name of a specific aggregated value decryption key for an average and an aggregated value decryption key for variance.
An encryption key is a generic name of an encryption key for an average and an encryption key for variance.
Data to be encrypted is called plain text. After the data is encrypted, it is called cipher text.
Plaintext space denotes the set of values which plain text can take. For example, when the plaintext space is equal to or larger than 0 and below t, plain text can take any of the integers [0, 1, 2, …, t−1].
Variance is a statistical index showing the spread of a data distribution. The variance Var of plain text Mi can be computed from the average of the plain text Mi and the average of its square Mi². In general, Var(Mi)=Avg(Mi²)−Avg(Mi)².
Like variance, standard deviation is a statistical index showing the spread of a data distribution. The standard deviation Std is the square root of the variance. That is, Std=√Var.
A round function to an integer is a function that returns the integer closest to an input real number. When x is the input value to the round function and y is the output value (an integer), the round function is represented as “⌊x⌉”. At this time, y=⌊x⌉ and |y−x|≤½.
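The definitions of variance, standard deviation and the round function can be illustrated with a minimal Python sketch. The function names and the sample values are hypothetical and are not part of the embodiments. The sketch shows that the sum of the values and the sum of their squares are sufficient to obtain all four basic statistics. Ties in the round function are rounded upward here, which is one choice satisfying |y−x|≤½; Python's built-in `round` instead rounds ties to the nearest even integer.

```python
import math

def round_to_integer(x: float) -> int:
    """Return an integer y closest to x, so that |y - x| <= 1/2 (ties go up)."""
    return math.floor(x + 0.5)

def basic_statistics(values):
    """Compute Sum, Avg, Var and Std using only the sum and the sum of squares."""
    t = len(values)
    total = sum(values)                      # M1 + ... + Mt
    total_sq = sum(v * v for v in values)    # M1^2 + ... + Mt^2
    avg = total / t
    var = total_sq / t - avg ** 2            # Var = Avg(M^2) - Avg(M)^2
    std = math.sqrt(max(var, 0.0))           # guard against tiny negative rounding error
    return total, avg, var, std

ages = [30, 40, 50, 60]                      # hypothetical plain texts
total, avg, var, std = basic_statistics(ages)
print(total, avg, var, std)
```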
This embodiment will be described using an example in which it is applied to the health care business. In the following, suppose that the third-party institution is a certification authority, the business entities A (1 to n) are medical checkup providers, the business entity B is a cloud service provider and the business entity C is a hospital.
The medical checkup provider entrusts its own information system to the management server 300 owned by the cloud service provider. In other words, the medical checkup provider entrusts medical checkup data acquired from a medical checkup examinee to the management server 300. The medical checkup data is individual information of the examinee and is required to be cautiously handled to prevent the data from being leaked to other people. The details of the medical checkup data will be described later referring to
Therefore, the medical checkup provider entrusts specified items in medical checkup data acquired from examinees and input to the registration PC 100a to the management server 300 of the cloud service provider after the medical checkup provider encrypts the specified items using an encryption key issued by the key distribution PC 000 of the certification authority. That is, no contents of the encrypted items in the individual medical checkup data are disclosed to the cloud service provider.
Since the hospital requires basic statistics data of disease based upon the medical checkup data, it requests the cloud service provider that stores the medical checkup data to transmit the basic statistics data. The details of the statistics data will be described later referring to
The management server of the cloud service provider computes the sum of cipher text of medical checkup data stored inside, decrypts the sum using an aggregated value decryption key issued by the key distribution PC 000 of the certification authority, and the management server acquires plaintext data of the sum. The details of the plaintext data of the sum will be described later referring to
Since division can be applied to plain text, a value acquired by dividing plain text of the sum by the number of medical checkup data becomes an average value which is representative basic statistics data. Afterward, the average value is disclosed to the hospital side, but individual medical checkup data is not disclosed. That is, the hospital has acquired basic statistics data without disclosing individual medical checkup data owned by the medical checkup provider to other people. As described above, computation of the sum of cipher text can be processed by the related art.
Data in each line shown in
Respective values stored in a “sum of ages” column and in a “sum of hospitalization periods” column in
Respective values stored in an “average age” column and in an “average hospitalization period (days)” column respectively shown in
The abovementioned medical checkup data are one example. If necessary, plural cipher texts may also be combined. For example, another item managed in the form of plain text may also be encrypted using block cipher, searchable cipher, public key cryptography and the like.
The registration PC 100, the read PC 200, and the management server 300 are also provided with a similar hardware configuration. Their details are omitted to avoid repetition.
S110 is the processing of the registration PC for specifying a range or items of data to which statistical processing is to be applied. As described referring to
S120 is the processing for the registration PC to request the key distribution PC to transmit an encryption key as to the range or the items specified in S110.
S010 is the processing for generating a parameter for encryption statistical computation on the basis of the request received in S120. The details will be described later referring to
S020 is the processing for generating an encryption key D010 and an aggregated value decryption key D020 using the parameter for the encryption statistical computation generated in S010. The details will be described later referring to
S030 is the processing for transmitting the encryption key D010 generated in S020 to the registration PC 100 and for transmitting the aggregated value decryption key D020 generated in S020 to the management server.
S130 is the processing for generating cipher text D110 using the encryption key D010 transmitted in S030. The details will be described later referring to
S140 is the processing for transmitting the cipher text D110 generated in S130 to the management server.
S310 is the processing for transferring a state of the cipher text D110 transmitted in S140. The details will be described later referring to
S320 is the processing for computing basic statistics D210 of the cipher text D110 the state of which is transferred in S310. The details will be described later referring to
S210 is the processing for the read PC to request to transmit the basic statistics D210 computed in S320.
S330 is the processing for receiving the request in S210 and transmitting the basic statistics D210 to the read PC.
As described above, the management server 300 can acquire plain text of basic statistics useful for analysis without acquiring plain text of individual data.
The abovementioned procedure is merely one example, and the sequence and contents of processing may also be changed if necessary. For example, when the registration PC and the read PC are located in the same entity (in this example, the hospital is supposed), the processing in S330 may also be executed without waiting for the request processing in S210. In this case, it is also conceivable that the registration PC 100 and the read PC 200 are the same terminal and in that case, S210 is also executed by the registration PC 100.
In addition, as the processing in S320 is processing for dividing the plain text of the sum of object data which is output of S310 by the number of the data, the processing is not necessarily required to be executed in the management server. For example, the output in S310 is transmitted to the read PC 200 and processing equivalent to S320 may also be executed in the read PC 200.
Further, basic statistics of medical checkup data collected from plural medical checkup providers (the registration PCs 100a to 100n) can also be computed.
The management server 300 adds the t cipher texts Ci modulo P, subtracts S modulo P from the added value, and generates intermediate data SUM in which the encrypted state is released. That is, SUM=(C1+C2+…+Ct)−S mod P (S211a). In this case, since the plaintext space of the plain text Mi consists of integers equal to or larger than 0 and below P/t, the sum of the t plain texts is below P and is not reduced by the modulo operation, so SUM is equal to the sum of the t plain texts Mi. That is, SUM=M1+M2+…+Mt. This means that the management server 300 can decrypt only the value of the sum of the object data using the aggregated value decryption key S for an average.
The management server 300 adds the t cipher texts Di modulo P, subtracts L modulo P from the added value, and generates intermediate data tmp in which the encrypted state is released. That is, tmp=(D1+D2+…+Dt)−L mod P (S211b). In this case, since the plaintext space of the plain text Mi² consists of integers equal to or larger than 0 and below P/t, tmp is equal to the sum of the t plain texts Mi². That is, tmp=M1²+M2²+…+Mt². This means that the management server 300 can decrypt only the value of the sum of the object data using the aggregated value decryption key L for variance.
By the abovementioned processing, the management server 300 can acquire the sum SUM, the average Avg, the variance Var and the standard deviation Std, which are the basic statistics of the plain text Mi, without acquiring the values of the plain texts Mi, Mi² from the cipher text Ci and the cipher text Di received from the registration PC 100, without acquiring an encryption key of individual data, and further without intricate exchanges with the totaling PC 200.
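The decryption of the aggregated values in S211a and S211b may be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the encryption step is assumed here to be additive masking, that is, Ci=Mi+Si mod P and Di=Mi²+Li mod P, which is consistent with the decryption expressions above but is an assumption of this sketch. The function names, the value of P and the sample plain texts are likewise hypothetical.

```python
import math
import secrets

def generate_keys(P: int, t: int):
    """Key distribution PC: t random keys per statistic and their sums modulo P."""
    s_keys = [secrets.randbelow(P) for _ in range(t)]  # encryption keys for an average
    l_keys = [secrets.randbelow(P) for _ in range(t)]  # encryption keys for variance
    S = sum(s_keys) % P   # aggregated value decryption key for an average
    L = sum(l_keys) % P   # aggregated value decryption key for variance
    return s_keys, l_keys, S, L

def encrypt(value: int, key: int, P: int) -> int:
    """Registration PC: ASSUMED additive masking, cipher = value + key mod P."""
    return (value + key) % P

def decrypt_sum(ciphers, aggregated_key: int, P: int) -> int:
    """Management server: (C1 + ... + Ct) - S mod P, as in S211a and S211b."""
    return (sum(ciphers) - aggregated_key) % P

P = 2 ** 61 - 1                  # hypothetical divisor
plain = [30, 40, 50, 60]         # hypothetical plain texts Mi
t = len(plain)
# Plaintext space requirement: every Mi and Mi^2 must be below P/t.
assert all(m * m < P // t for m in plain)

s_keys, l_keys, S, L = generate_keys(P, t)
C = [encrypt(m, k, P) for m, k in zip(plain, s_keys)]
D = [encrypt(m * m, k, P) for m, k in zip(plain, l_keys)]

# The server holds only C, D, S and L; it never sees Mi or the individual keys.
SUM = decrypt_sum(C, S, P)
tmp = decrypt_sum(D, L, P)
avg = SUM / t
var = tmp / t - avg ** 2
std = math.sqrt(max(var, 0.0))
print(SUM, avg, var, std)
```

In this sketch the individual keys Si and Li stay with the key distribution PC and the registration PCs, while the server receives only their sums S and L, so it can remove the masking from the total but not from any single cipher text.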
The present invention is not limited to the abovementioned embodiment and variations are possible in a scope of its purport. For example, statistical computation is not limited to an average and variance and a statistic except them may also be computed. In addition, a value of data is not limited to an integer and may also be a real number.
In a second embodiment, as in the first embodiment, a method by which a management server 300 can acquire only basic statistics in the form of plain text from received individual cipher text will be described. The encryption in this embodiment is based upon a problem called LWE (Learning with Errors), which is considered difficult to solve from the viewpoint of the amount of calculation.
A schematic diagram showing a system in the second embodiment is similar to that in the first embodiment. In addition, a configuration of key distribution PC 000, one or more registration PCs 100a to 100n, totaling PC 200 and the management server 300 is also similar to that of the first embodiment. Further, data transmission/reception and a process flow of a program in the key distribution PC 000, the registration PCs 100a to 100n, the management server 300, and the totaling PC 200 are also similar to those of the first embodiment.
Next, the difference in a process flow for generating a parameter for encryption statistical computation (S010) in the key distribution PC 000 from the first embodiment will be described.
The key distribution PC 000 generates a positive integer P, a positive integer N, a positive integer t and a positive integer m. However, t shall be a divisor of n. In addition, the key distribution PC 000 generates an N-dimensional vector A whose elements are random numbers equal to or larger than 0 and below P (S011).
Next, the difference in a process flow of encryption key generation in the key distribution PC 000 (S020) from the first embodiment will be described.
The key distribution PC 000 generates N-dimensional vectors S1, S2, …, St and L1, L2, …, Lt, each of whose elements is a random number equal to or larger than 0 and below P (S021). Next, let S be the sum modulo P of the vectors S1, S2, …, St and let L be the sum modulo P of the vectors L1, L2, …, Lt. That is, S=S1+S2+…+St mod P and L=L1+L2+…+Lt mod P (S022). Finally, the vectors S1, S2, …, St are output as encryption keys for an average, the vectors L1, L2, …, Lt are output as encryption keys for variance, S is output as the aggregated value decryption key for an average, and L is output as the aggregated value decryption key for variance (S023).
A database which the key distribution PC 000 uses in transmitting the aggregated value decryption key and the encryption key is similar to that in the first embodiment.
The registration PC 100 computes the inner product <A, Si> of the N-dimensional vector A and the N-dimensional vector Si modulo P. Further, the registration PC 100 creates a random number Ei that meets |Ei|<P/(2mt). Finally, the registration PC 100 adds, modulo P, the inner product <A, Si>, the random number Ei, and the plain text Mi multiplied by P/m so as to acquire cipher text Ci. That is, Ci=Mi·(P/m)+<A, Si>+Ei mod P (S111a). At this time, the plaintext space of the plain text Mi consists of integers equal to or larger than 0 and below m/t. Next, the cipher text Ci is output (S112a).
Because |Ei|<P/(2mt), the sum of the t random numbers satisfies |E1+E2+…+Et|<P/(2m), and therefore |(m/P)(E1+E2+…+Et)|<½, so the sum of the random numbers Ei is eliminated by the round function to an integer. Accordingly, SUM is the total value of the plain texts Mi.
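The key generation, encryption and aggregated decryption of the second embodiment may be sketched in Python as follows. This is a minimal sketch under stated assumptions: the decryption is assumed to subtract <A, S> from the sum of the cipher texts modulo P, multiply the result by m/P, and apply the round function, which is consistent with the encryption expression in S111a. The parameter values, the function names, the choice of P as a multiple of m, and the final reduction modulo m (which handles the case where a negative error sum wraps around P) are assumptions of this sketch. Only the key for an average is shown; the key for variance would be handled in the same manner.

```python
import secrets

def inner_product(a, b, P):
    """Inner product <a, b> modulo P."""
    return sum(x * y for x, y in zip(a, b)) % P

def random_vector(N, P):
    """N-dimensional vector of random numbers in [0, P)."""
    return [secrets.randbelow(P) for _ in range(N)]

def generate_keys(N, t, P):
    """Encryption keys S1..St and aggregated key S = S1 + ... + St mod P."""
    keys = [random_vector(N, P) for _ in range(t)]
    S = [sum(column) % P for column in zip(*keys)]
    return keys, S

def encrypt(M, A, S_i, P, m, t):
    """Ci = Mi * (P/m) + <A, Si> + Ei mod P with |Ei| < P / (2mt)."""
    bound = P // (2 * m * t)
    E = secrets.randbelow(2 * bound - 1) - (bound - 1)   # uniform in (-bound, bound)
    return (M * (P // m) + inner_product(A, S_i, P) + E) % P

def decrypt_sum(ciphers, A, S, P, m):
    """ASSUMED decryption: round((m/P) * ((C1 + ... + Ct) - <A, S> mod P))."""
    X = (sum(ciphers) - inner_product(A, S, P)) % P
    rounded = (m * X + P // 2) // P    # integer form of floor(m*X/P + 1/2)
    return rounded % m                 # maps the wrap-around value m back to 0

m, t, N = 1024, 4, 16                  # hypothetical parameters
P = m * 2 ** 20                        # chosen so that P/m is an integer
A = random_vector(N, P)
keys, S = generate_keys(N, t, P)

plain = [30, 40, 50, 60]               # each Mi is below m/t = 256
ciphers = [encrypt(M, A, k, P, m, t) for M, k in zip(plain, keys)]
SUM = decrypt_sum(ciphers, A, S, P, m)
print(SUM)
```

The rounding step removes the accumulated error because the error sum, after scaling by m/P, has absolute value below ½.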
A process flow when the management server 300 computes an average value is similar to that in the first embodiment.
A process flow when the management server 300 computes variance or standard deviation is similar to that in the first embodiment.
Owing to the abovementioned processes, the management server 300 can acquire the sum SUM, the average Avg, the variance Var, and the standard deviation Std, which are the basic statistics of the plain text Mi, without acquiring the values of each plain text Mi, Mi² from the cipher text Ci and the cipher text Di received from the registration PC 100 and without intricate exchanges with the totaling PC 200.
In the present embodiment, as in the first embodiment, variations are possible within the scope of its purport. For example, in generating random numbers, a specific distribution such as a Gaussian distribution or a uniform distribution may also be used as the distribution of the random numbers.
In the second embodiment, the management server 300 acquires basic statistics of the plain text Mi as plain text. However, the basic statistics may also be computed in an encrypted state and concealed from the management server 300. For example, the key distribution PC 000 may transmit a decryption key to the totaling PC 200 beforehand, and the totaling PC 200, which receives encrypted basic statistics from the management server 300, may decrypt them. In addition, the management server 300 may also perform specific processing so that the statistics are only partially disclosed, for example, disclosing an average to the totaling PC 200 without disclosing the sum.
For example, to conceal the denominator t used for average computation, the management server 300 computes numbers r and R that meet 1/t≈r/R, multiplies the sum of the cipher texts by r, and transmits the result together with R to the totaling PC 200. The totaling PC 200 decrypts the received cipher text to acquire the sum of the plain texts multiplied by r, and divides it by R to compute the average value. At this time, since r is unknown to the totaling PC 200, the totaling PC 200 cannot acquire the sum of the original plain texts from the sum multiplied by r. In the meantime, the totaling PC 200 can acquire the average, whose denominator is t, by dividing by R.
For example, the management server 300 may also change the value of the corresponding divisor from P to P/t and transmit cipher text converted for the new divisor P/t to the totaling PC 200. At this time, the totaling PC 200 decrypts the cipher text using the divisor P/t and can acquire the average, that is, the sum divided by t.
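The arithmetic behind the approximation 1/t≈r/R may be illustrated with a short Python sketch. The sketch operates on plaintext values only and does not model how the cipher text multiplied by r is decrypted, which is not specified here. The function name, the value of R and the sample numbers are hypothetical.

```python
def choose_scaling(t: int, R: int = 10 ** 6):
    """Pick integers r and R with r/R approximately equal to 1/t."""
    r = (R + t // 2) // t          # nearest integer to R/t
    return r, R

# Management server side (values shown in plain text for illustration only).
t = 7                              # number of data items, kept from the totaling PC
total = 315                        # sum of the plain texts, true average is 45
r, R = choose_scaling(t)
scaled_sum = r * total             # the totaling PC obtains this value after decryption

# Totaling PC side: knows scaled_sum and R, but neither r nor t.
approx_avg = scaled_sum / R
print(r, scaled_sum, approx_avg)
```

A larger R gives a closer approximation of the average, at the cost of larger intermediate values.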
000: Key distribution PC
100, 100a, 100b, 100n: Registration PC
200: Totaling PC
300: Management server
400: Network
001: CPU (Central Processing Unit)
002: Memory
003: Storage
004: Internal communication line
005: Input device
006: Output device
007: Reader/Writer
008: Communication device
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2016/056322 | 3/2/2016 | WO | 00 |