SECURE RANDOM NUMBER CALCULATION SYSTEM, SECURE RANDOM NUMBER CALCULATION APPARATUS, SECURE RANDOM NUMBER CALCULATION METHOD, SECURE CLUSTER CALCULATION SYSTEM, SECURE CLUSTER CALCULATION APPARATUS, SECURE CLUSTER CALCULATION METHOD, AND PROGRAM

Description

TECHNICAL FIELD

The present invention relates to a secure computation technology, and more particularly to a technology for secure computation of a random number that can be used in the k-means++ method.

BACKGROUND ART

As a technology for classifying a plurality of pieces of data, there is a technology (hereinafter, referred to as a clustering technology) for classifying similar pieces of data into one cluster. Examples of the clustering technology include the k-means method described in Non Patent Literature 1 and the k-means++ method described in Non Patent Literature 2.

The secure computation is a method of obtaining an operation result of a designated operation without restoring an encrypted numerical value (for example, see Reference Non Patent Literature 1). In the method of Reference Non Patent Literature 1, encryption of distributing a plurality of pieces of information capable of restoring numerical values to three secure computation devices is performed, and thus the results of addition/subtraction, constant addition, multiplication, constant multiplication, logical operation (NOT, logical product, logical sum, exclusive logical sum), data format conversion (integer and binary number), and the like can be held in a state of being distributed to the three secure computation devices without restoring the numerical values, that is, in an encrypted state. In general, the distribution number is not limited to three and may be W (W is a predetermined constant of three or more), and a protocol that achieves secure computation by cooperative calculation by W secure computation devices is called a multi-party protocol.

(Reference Non Patent Literature 1: Koji Chida, Koki Hamada, Dai Igarashi, Katsumi Takahashi, “Keiryo kensho kano 3 party hitoku kansu keisan no saiko (in Japanese) (A Three-Party Secure Function Evaluation with Lightweight Verifiability Revisited)”, In CSS, 2010.)

CITATION LIST
Non Patent Literature

Non Patent Literature 1: John A Hartigan and Manchek A Wong, “Algorithm AS 136: A K-Means Clustering Algorithm”, Journal of the Royal Statistical Society. Series C (Applied Statistics), Vol. 28, No. 1, pp. 100-108, 1979.

Non Patent Literature 2: David Arthur and Sergei Vassilvitskii, “k-means++: the advantages of careful seeding”, Technical report, Stanford University, 2006.

SUMMARY OF INVENTION
Technical Problem

The k-means method is known to be a method in which the generated clusters depend greatly on how the initial values are given. Consequently, when clusters are generated from completely random initial values, a preferable result may not be obtained. For this reason, in the plaintext k-means method, the initial values are generated using the k-means++ method, which tends to yield a better result than completely random initial values.

However, no secure computation method for the k-means++ method has been proposed so far. This is because a random number generation method using weighted probability distribution that can be used in the k-means++ method has not been proposed so far.

Therefore, an object of the present invention is to provide a technology for performing secure computation of a random number generation method using weighted probability distribution with high accuracy while keeping data secure.

Solution to Problem

An aspect of the present invention is a secure random number computation system, with L and S being integers of 1 or more, including three or more secure random number computation devices, configured to compute a share ([[r₁]], . . . , [[r_S]]) of a vector (r₁, . . . , r_S) (where r_i (i=1, . . . , S) is equal to one of output possibility values x₁, . . . , x_L) having an output value as an element, from a share ([[x₁]], . . . , [[x_L]]) of a vector (x₁, . . . , x_L) having an output possibility value as an element and a share ([[p₁]], . . . , [[p_L]]) of a vector (p₁, . . . , p_L) (where p_i (i=1, . . . , L) is a probability that the output possibility value x_i is output, and satisfies Σp_i=1) having an output probability as an element, the secure random number computation system including: first vector computation means that computes a share ([[p′₁]], . . . , [[p′_L]]) of a vector (p′₁, . . . , p′_L) from the share ([[p₁]], . . . , [[p_L]]) of the vector (p₁, . . . , p_L) by ([[p′₁]], . . . , [[p′_L]])=prefix_sum(([[p₁]], . . . , [[p_L]])); uniform random number generation means that generates a share ([[q₁]], . . . , [[q_S]]) of a vector (q₁, . . . , q_S) (where q_i (i=1, . . . , S) is a uniform random number, and satisfies 0≤q_i≤1) having a uniform random number as an element; and random number computation means that computes a share ([[r₁]], . . . , [[r_S]]) of a vector (r₁, . . . , r_S) having an output value as an element from the share ([[p′₁]], . . . , [[p′_L]]) of the vector (p′₁, . . . , p′_L), the share ([[x₁]], . . . , [[x_L]]) of the vector (x₁, . . . , x_L), and the share ([[q₁]], . . . , [[q_S]]) of the vector (q₁, . . . , q_S) by ([[r₁]], . . . , [[r_S]])=map(([[p′₁]], . . . , [[p′_L]]), ([[x₁]], . . . , [[x_L]]), ([[q₁]], . . . , [[q_S]])).

Advantageous Effects of Invention

According to the present invention, it is possible to perform secure computation of a random number generation method using weighted probability distribution with high accuracy while keeping data secure.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a diagram illustrating an example of a groupBySum operation.

FIG. 2 is a diagram illustrating an example of a groupByCount operation.

FIG. 3 is a diagram illustrating an example of a cluster ID table.

FIG. 4 is a diagram illustrating an example of a data table.

FIG. 5 is a diagram illustrating an example of a centroid table.

FIG. 6 is a diagram illustrating an example of a distance table.

FIG. 7 is a block diagram illustrating a configuration of a secure random number computation system 10.

FIG. 8 is a block diagram illustrating a configuration of a secure random number computation device 100_i.

FIG. 9 is a flowchart illustrating operation of the secure random number computation system 10.

FIG. 10 is a block diagram illustrating a configuration of a secure cluster computation system 20.

FIG. 11 is a block diagram illustrating a configuration of a secure cluster computation device 200_i.

FIG. 12 is a flowchart illustrating operation of the secure cluster computation system 20.

FIG. 13 is a block diagram illustrating a configuration of a centroid table initialization unit 210_i.

FIG. 14 is a flowchart illustrating operation of centroid table initialization means 210.

FIG. 15 is a diagram illustrating an example of a functional configuration of a computer that implements each device according to an embodiment of the present invention.

DESCRIPTION OF EMBODIMENTS

Hereinafter, an embodiment of the present invention will be described in detail. Note that components having the same functions are denoted by the same reference numerals, and redundant description will be omitted.

Prior to the description of each embodiment, a notation method in the present specification will be described.

^ (caret) represents a superscript. For example, x^{y^z} represents that y^z is a superscript for x, and x_{y^z} represents that y^z is a subscript for x. Furthermore, _ (underscore) represents a subscript. For example, x^{y_z} represents that y_z is a superscript for x, and x_{y_z} represents that y_z is a subscript for x.

A superscript “^” or “~” such as ^x or ~x for a certain letter x would normally be placed directly above “x”, but is written as ^x or ~x due to restrictions on notation in the specification.

TECHNICAL BACKGROUND
<<Secure Computation>>

The secure computation in the invention of the present application is constructed by a combination of existing secure computation operations. The operations necessary for the secure computation are concealment, addition, subtraction, multiplication, division, logical operation (NOT, logical product, logical sum, exclusive logical sum), comparison operation (=, <, >, ≤, ≥), secure sorting, secure unique check, Group-by operation, prefix_sum (prefix sum), and secure collective mapping. Hereinafter, some operations including the notation will be described.

[Concealment]

It is assumed that [[x]] is a value (hereinafter, referred to as a share of x) obtained by concealing x by secure distribution. Any known method can be used as the secure distribution method. For example, Shamir's secure distribution on GF(2⁶¹−1) and replicated secure distribution on Z₂ can be used.

A plurality of secure distribution methods may be used in combination in one algorithm. In this case, mutual conversion is appropriately performed.

For the N-dimensional vector ^→x=(x₁, . . . , x_N), [[^→x]]=([[x₁]], . . . , [[x_N]]) is set. That is, [[^→x]] is a vector having the share [[x_n]] of the n-th element x_n of ^→x as the n-th element. Similarly, for the M×N matrix A=(a_{m,n}) (1≤m≤M, 1≤n≤N), [[A]] is set as a matrix having the share [[a_{m,n}]] of the (m, n)-th element a_{m,n} of A as the (m, n)-th element.

x is referred to as a plaintext of [[x]].

As a method of obtaining [[x]] from x (concealment) and a method of obtaining x from [[x]] (restoration), specifically, there are methods described in Reference Non Patent Literature 1 and Reference Non Patent Literature 2.

(Reference Non Patent Literature 2: Shamir, A., “How to share a secret”, Communications of the ACM, Vol. 22, No. 11, pp. 612-613, 1979.)

[Addition, Subtraction, Multiplication, and Division]

The addition [[x]]+[[y]] by secure computation uses [[x]], [[y]] as inputs, and outputs [[x+y]]. The subtraction [[x]]−[[y]] by secure computation uses [[x]], [[y]] as inputs, and outputs [[x−y]]. The multiplication [[x]]×[[y]] (sometimes represented as mul([[x]], [[y]])) by secure computation uses [[x]], [[y]] as inputs, and outputs [[x×y]]. The division [[x]]/[[y]] (sometimes represented as div([[x]], [[y]])) by secure computation uses [[x]], [[y]] as inputs, and outputs [[x/y]].

As specific methods of addition, subtraction, multiplication, and division, there are methods described in Reference Non Patent Literature 3 and Reference Non Patent Literature 4.

(Reference Non Patent Literature 3: Ben-Or, M., Goldwasser, S. and Wigderson, A., “Completeness theorems for non-cryptographic fault-tolerant distributed computation”, Proceedings of the twentieth annual ACM symposium on Theory of computing, ACM, pp. 1-10, 1988.)
(Reference Non Patent Literature 4: Gennaro, R., Rabin, M. O. and Rabin, T., “Simplified VSS and fast-track multiparty computations with applications to threshold cryptography”, Proceedings of the seventeenth annual ACM symposium on Principles of distributed computing, ACM, pp. 101-111, 1998.)

For the share [[^→x]]=([[x₁]], . . . , [[x_N]]) of the N-dimensional vector ^→x=(x₁, . . . , x_N) and the share [[^→y]]=([[y₁]], . . . , [[y_N]]) of the N-dimensional vector ^→y=(y₁, . . . , y_N), the division [[^→x]]/[[^→y]] by secure computation is set as [[^→x/^→y]]=([[x₁/y₁]], . . . , [[x_N/y_N]]). [[^→x]]/[[^→y]] is also referred to as the value obtained by dividing [[^→x]] by [[^→y]].
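The share notation and the addition on shares can be illustrated with a toy sketch. Note that the sketch uses plain additive sharing among three parties, which is a simpler scheme swapped in for illustration and is not the Shamir or replicated secure distribution mentioned above; the names share, reconstruct, and add_shares are hypothetical.

```python
import secrets

# Modulus borrowed from the GF(2^61 - 1) example; additive sharing itself is
# an illustrative stand-in, not the scheme the specification relies on.
P = 2**61 - 1

def share(x, parties=3):
    """Conceal x: split it into `parties` random values that sum to x mod P."""
    parts = [secrets.randbelow(P) for _ in range(parties - 1)]
    parts.append((x - sum(parts)) % P)
    return parts

def reconstruct(parts):
    """Restore the plaintext from all shares."""
    return sum(parts) % P

def add_shares(a, b):
    """[[x]] + [[y]]: each party adds its own two shares locally."""
    return [(u + v) % P for u, v in zip(a, b)]

if __name__ == "__main__":
    sx, sy = share(20), share(22)
    print(reconstruct(add_shares(sx, sy)))
```

In this toy scheme, addition needs no communication between parties, whereas multiplication and division require interactive protocols such as those in the cited literature.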

[Logical Operation]

The negation not([[x]]) by secure computation uses [[x]] as an input, and outputs [[not(x)]]. The logical product and([[x]], [[y]]) by secure computation uses [[x]], [[y]] as inputs, and outputs [[and(x, y)]]. The logical sum or([[x]], [[y]]) by secure computation uses [[x]], [[y]] as inputs, and outputs [[or(x, y)]]. The exclusive logical sum xor([[x]], [[y]]) by secure computation uses [[x]], [[y]] as inputs, and outputs [[xor(x, y)]].

The logical operation can be easily configured by combining addition, subtraction, multiplication, and division.
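As a minimal plaintext sketch of how logical operations on bits 0 and 1 reduce to arithmetic, the following uses the standard identities; in the secure setting each +, − and × would be replaced by the corresponding operation on shares. The function names are hypothetical.

```python
def not_(x):
    """NOT as subtraction from the constant 1."""
    return 1 - x

def and_(x, y):
    """Logical product as multiplication."""
    return x * y

def or_(x, y):
    """Logical sum: x + y - xy."""
    return x + y - x * y

def xor(x, y):
    """Exclusive logical sum: x + y - 2xy."""
    return x + y - 2 * x * y

if __name__ == "__main__":
    for x in (0, 1):
        for y in (0, 1):
            print(x, y, not_(x), and_(x, y), or_(x, y), xor(x, y))
```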

[Comparison Operation]

The equality determination =([[x]], [[y]]) (sometimes represented as equal([[x]], [[y]])) by secure computation uses [[x]], [[y]] as inputs, and outputs [[1]] when x=y, and outputs [[0]] otherwise. The comparison <([[x]], [[y]]) by secure computation uses [[x]], [[y]] as inputs, and outputs [[1]] when x<y, and outputs [[0]] otherwise. The comparison >([[x]], [[y]]) by secure computation uses [[x]], [[y]] as inputs, and outputs [[1]] when x>y, and outputs [[0]] otherwise. The comparison ≤([[x]], [[y]]) by secure computation uses [[x]], [[y]] as inputs, and outputs [[1]] when x≤y, and outputs [[0]] otherwise. The comparison ≥([[x]], [[y]]) by secure computation uses [[x]], [[y]] as inputs, and outputs [[1]] when x≥y, and outputs [[0]] otherwise.

The comparison operation can be easily configured by combining logical operations.

[Secure Sorting]

The secure sorting uses the share [[^→x]] of the N-dimensional vector ^→x=(x₁, . . . , x_N) as an input, and outputs a vector sort([[^→x]]):=([[x_{i_1}]], . . . , [[x_{i_N}]]) (where x_{i_1}, . . . , x_{i_N} satisfy x_{i_1}≤x_{i_2}≤ . . . ≤x_{i_N}) obtained by sorting the elements [[x₁]], . . . , [[x_N]] of [[^→x]] in ascending order. For a table T in which each element of any attribute is concealed, the table obtained by subjecting the table T to secure sorting with the attribute α of the table T as a key is a table whose records are rearranged, record by record, so that the values of the elements of the attribute α are in ascending order from the first record.

As a specific method of secure sorting, there is a method described in Reference Non Patent Literature 5.

(Reference Non Patent Literature 5: Dai Igarashi, Koki Hamada, Ryo Kikuchi, Koji Chida, “A Design and an Implementation of Super-High-Speed Multi-Party Sorting: The Day When Multi-Party Computation Reaches Scripting Languages”, Computer Security Symposium (CSS), 2017.)

[Secure Unique Check]

The secure unique check uses the share [[^→x]] of the N-dimensional vector ^→x=(x₁, . . . , x_N) as an input, and outputs a vector unique_check([[^→x]]):=([[x_{i_1}]], . . . , [[x_{i_N}]]) (where x_{i_1}, . . . , x_{i_N} are each either 1 or 0) in which each element among [[x₁]], . . . , [[x_N]] of [[^→x]] is replaced with [[1]] when its value appears for the first time and with [[0]] when its value appears for the second or a subsequent time. For example, executing the secure unique check on the vector ([[1]], [[2]], [[2]], [[3]], [[3]], [[3]]) results in ([[1]], [[1]], [[0]], [[1]], [[0]], [[0]]).

The secure unique check can be easily configured by combining secure sorting and comparison operation.
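A plaintext sketch of the unique check built from a sort and an adjacent comparison is given below. It operates on plain values rather than shares, and the function name and the index-based bookkeeping are assumptions made for illustration only.

```python
def unique_check(xs):
    """Return 1 for the first occurrence of each value and 0 for repeats."""
    # Stable sort of positions by value: equal values keep their input order,
    # so the earliest occurrence comes first within each run.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    flags = [0] * len(xs)
    for rank, i in enumerate(order):
        # Compare with the previous element in sorted order.
        is_first = rank == 0 or xs[i] != xs[order[rank - 1]]
        flags[i] = 1 if is_first else 0
    return flags

if __name__ == "__main__":
    print(unique_check([1, 2, 2, 3, 3, 3]))
```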

[Group-by Operation]

The Group-by operation is an operation for obtaining a statistical value (for example, average height for each gender) grouped for each value of an element of a key attribute in a case where a table has a key attribute (for example, gender) and a value attribute (for example, height) and each element of any attribute of the table is concealed. A case in which the computed statistical value is a sum is a groupBySum operation, and a case in which the computed statistical value is a frequency is a groupByCount operation.

FIG. 1 is a diagram illustrating an example of groupBySum operation. As can be seen from the example of FIG. 1, in the table representing the result of the groupBySum operation, the values of the elements of the key attribute are arranged in ascending order from the first record, and the sum of the elements of the value attribute 1 and the sum of the elements of the value attribute 2 are obtained for each value of the element of the key attribute. FIG. 2 is a diagram illustrating an example of groupByCount operation. As can be seen from the example of FIG. 2, in the table representing the result of the groupByCount operation, the values of the key attribute are arranged in ascending order from the first record, and the number of the elements of the value attribute 1 and the number of the elements of the value attribute 2 are obtained for each value of the element of the key attribute.
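The two Group-by variants can be sketched on plaintext as follows. The table layout (a list of keys and a parallel list of values) and the sample figures are hypothetical and are not taken from FIG. 1 or FIG. 2; a secure implementation would perform the same aggregation on concealed elements.

```python
from collections import defaultdict

def group_by_sum(keys, values):
    """Sum of the value attribute for each key, in ascending key order."""
    sums = defaultdict(int)
    for k, v in zip(keys, values):
        sums[k] += v
    return sorted(sums.items())

def group_by_count(keys):
    """Number of records for each key, in ascending key order."""
    counts = defaultdict(int)
    for k in keys:
        counts[k] += 1
    return sorted(counts.items())

if __name__ == "__main__":
    keys = [2, 1, 2, 1, 3]
    values = [10, 20, 30, 40, 50]
    print(group_by_sum(keys, values))
    print(group_by_count(keys))
```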

As a specific method of Group-by operation, there is a method described in Reference Non Patent Literature 6.

(Reference Non Patent Literature 6: Ryo Kikuchi, Koki Hamada, Dai Ikarashi, Gen Takahashi, Katsumi Takahashi, “Oudanteki dousen bunseki wo himitsu keisan de yattemiyou (Let's Conduct Cross-Sectional Traffic Line Analysis by Secure Computation)”, 2020 Symposium on Cryptography and Information Security, 3C2-1, 2020.)

In the Group-by operation disclosed in Reference Non Patent Literature 6, various Group-by operations are efficiently performed by using a groupByCommon operation.

[prefix_sum (Prefix Sum)]

prefix_sum uses the share [[^→x]] of the N-dimensional vector ^→x=(x₁, . . . , x_N) as an input, and outputs the share [[^→y]]=([[y₁]], . . . , [[y_N]]) (where y_i=Σ_{j=1}^{i} x_j) of the N-dimensional vector ^→y.
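On plaintext, the prefix sum is a running total. The sketch below uses the standard library; in the secure setting each addition would be an addition on shares.

```python
from itertools import accumulate

def prefix_sum(xs):
    """y_i = x_1 + ... + x_i for every i."""
    return list(accumulate(xs))

if __name__ == "__main__":
    print(prefix_sum([1, 2, 3, 4]))
```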

[Secure Collective Mapping]

The secure collective mapping is a function of calculating a lookup table, and can arbitrarily determine two vectors used to define the lookup table. Since secure collective mapping performs processing in units of vectors, there is a property that efficiency is high when the same processing is performed on a plurality of inputs. The secure collective mapping is a function map defined as follows.

The secure collective mapping uses the share [[^→a]]=([[a₁]], . . . , [[a_K]]) of the vector ^→a=(a₁, . . . , a_K), the share [[^→b]]=([[b₁]], . . . , [[b_K]]) of the vector ^→b=(b₁, . . . , b_K) (where a₁, . . . , a_K, b₁, . . . , b_K are real numbers, and satisfy a₁< . . . <a_K), and the share [[^→x]]=([[x₁]], . . . , [[x_N]]) of the vector ^→x=(x₁, . . . , x_N) as inputs, and outputs [[^→y]]:=([[y₁]], . . . , [[y_N]]) such that a_{p−1}<x_n≤a_p (where a₀ is regarded as −∞) and y_n=b_p are satisfied for 1≤n≤N, that is, the share obtained by mapping the share of each element of the vector ^→x. This is expressed as [[^→y]]=map([[^→a]], [[^→b]], [[^→x]]). The vector ^→a and the vector ^→b are the two vectors used to define the lookup table. For example, when the shares of the two vectors ^→a and ^→b used for defining the lookup table are ([[50]], [[80]], [[100]]) and ([[1]], [[2]], [[3]]), respectively, map(([[50]], [[80]], [[100]]), ([[1]], [[2]], [[3]]), ([[5]], [[68]], [[91]]))=([[1]], [[2]], [[3]]) is obtained.
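The lookup performed by map can be sketched on plaintext as below. The sketch assumes the reading that reproduces the worked example, namely that each input x_n is mapped to b_p for the smallest p with x_n ≤ a_p; the function name batch_map and the error raised for out-of-range inputs are hypothetical choices.

```python
from bisect import bisect_left

def batch_map(a, b, xs):
    """Map each x to b[p], where p is the first index with x <= a[p].

    `a` must be strictly increasing and the same length as `b`.
    """
    out = []
    for x in xs:
        p = bisect_left(a, x)  # first position whose threshold is >= x
        if p == len(a):
            raise ValueError(f"{x} exceeds the largest threshold {a[-1]}")
        out.append(b[p])
    return out

if __name__ == "__main__":
    print(batch_map([50, 80, 100], [1, 2, 3], [5, 68, 91]))
```

Processing the whole input vector in one call mirrors the vector-wise nature of the secure operation, although the efficiency argument in the text concerns the secure protocol and not this plaintext loop.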

As the secure collective mapping, for example, the algorithm described in Reference Non Patent Literature 7 can be used.

(Reference Non Patent Literature 7: Koki Hamada, Dai Igarashi, Koji Chida, “A Batch Mapping Algorithm for Secure Function Evaluation”, IEICE A, Vol. J96-A, No. 4, pp. 157-165, 2013.)

<<k-means Method>>

The k-means method is one of machine learning methods classified as unsupervised learning. In supervised learning such as regression analysis and class classification, desired output (called training data) is prepared, and its object is to construct a model that reproduces the output with high accuracy, whereas in unsupervised learning such as clustering, desired output is not determined in advance.

In clustering, distances between a plurality of pieces of given data are computed, and pieces of data having close distances are put together into clusters as similar data. The clustering includes a non-hierarchical method in which the number of clusters to be generated is determined in advance such as the k-means method, and a hierarchical method in which the number of clusters to be generated is not determined in advance, and clusters are sequentially formed from the most similar pieces of data (that is, pieces of data having a minimum distance). The k-means method has a feature that the amount of calculation is less likely to increase even when large-scale data is clustered as compared with the hierarchical method, and thus, is often used for data of a scale that cannot be handled by the hierarchical method.

The flow of processing in the k-means method is as follows.

- (1) The number K of clusters to be generated and M pieces of data to be clustered are input. Each data is represented by an N-dimensional vector. Each dimension represents a feature amount of data.
- (2) The initial values of the K centroids are set. Here, a centroid is the center of gravity of the data included in a cluster. As a method of setting an initial value, there are a Forgy method, in which K pieces of data randomly selected from M pieces of data are set as centroids, and a k-means++ method.
- (3) For each pair of data and centroid, the distance is computed. As the distance, for example, a Euclidean distance is used.
- (4) A cluster having a minimum distance to the centroid is assigned to each data as a cluster to which the data belongs.
- (5) For each cluster, a centroid is computed.
- (6) When a predetermined end condition is satisfied, the cluster assignment result with respect to the data obtained in (4), that is, the information representing the cluster to which the data belongs is output, and in other cases, the processing of (3) to (5) is repeated. The predetermined end condition is, for example, a condition of whether each of the K centroids has converged to a certain position, or a condition of whether the number of executions of the processing of (3) to (5) has reached a predetermined number of times.
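Steps (3) to (6) above can be sketched on plaintext as follows. The initial centroids are passed in by the caller, squared Euclidean distance is used because it gives the same nearest centroid as Euclidean distance, and keeping the old centroid for an empty cluster is an assumption the text does not address. All names are hypothetical.

```python
def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def k_means(data, initial_centroids, max_iters=100):
    """Return (cluster index per data point, final centroids)."""
    centroids = [tuple(c) for c in initial_centroids]
    assignment = []
    for _ in range(max_iters):
        # (3)-(4): assign each point to the cluster with the nearest centroid.
        assignment = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(x, centroids[k]))
            for x in data
        ]
        # (5): recompute each centroid as the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [x for x, c in zip(data, assignment) if c == k]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(old)  # assumption: keep an empty cluster's centroid
        # (6): stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return assignment, centroids

if __name__ == "__main__":
    points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(k_means(points, [(0.0, 0.0), (10.0, 10.0)]))
```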
  
<<k-means++ Method>>

In the k-means++ method, the initial value of the centroid is set on the basis of the idea that the centroids selected from the data as the initial value are preferably separated from each other as much as possible.

The flow of processing in the k-means++ method is as follows.

- (1) One piece of data randomly selected from the M pieces of data is set as a first centroid.
- (2) For data that is not selected as a centroid, a distance to each centroid is computed, and the square of the distance to the centroid at which the distance to the data is minimum is computed.
- (3) With respect to the square of the distance computed in (2), a value obtained by dividing the square of the distance by the sum of the squares of the distance is obtained, and a vector having the divided value as an element is generated.
- (4) Using a random number generation method using weighted probability distribution, one data ID of data not selected as a centroid is selected from a vector having a data ID of data not selected as a centroid as an element and the vector generated in (3), and data of the selected data ID is set as the centroid.
- (5) The processing of (2) to (4) is repeated until K centroids are selected.
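The seeding procedure in (1) to (5) can be sketched on plaintext as below. The sketch returns 0-based list indices instead of data IDs, uses random.Random.choices for the weighted selection in (4), and falls back to a uniform choice when all remaining distances are zero; these choices and all names are assumptions made for illustration.

```python
import random

def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans_pp_init(data, k, rng):
    """Return the indices of k data points chosen as initial centroids."""
    chosen = [rng.randrange(len(data))]                      # (1)
    while len(chosen) < k:                                   # (5)
        candidates = [i for i in range(len(data)) if i not in chosen]
        # (2): squared distance from each candidate to its nearest centroid.
        d2 = [min(squared_distance(data[i], data[c]) for c in chosen)
              for i in candidates]
        total = sum(d2)
        # (3): normalise into a probability vector.
        if total == 0:
            probs = [1.0 / len(candidates)] * len(candidates)  # assumption: uniform fallback
        else:
            probs = [d / total for d in d2]
        # (4): weighted random selection of one candidate.
        chosen.append(rng.choices(candidates, weights=probs, k=1)[0])
    return chosen

if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 0.0), (5.0, 5.0)]
    print(kmeans_pp_init(pts, 2, random.Random(0)))
```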

<<Random Number Generation Method Using Weighted Probability Distribution>>

In the random number generation method using the weighted probability distribution, a vector (x₁, . . . , x_L) having an output possibility value as an element and a vector (p₁, . . . , p_L) (where p_i (i=1, . . . , L) is a probability that the output possibility value x_i is output, and Σp_i=1 is satisfied) having an output probability as an element are input, and a vector (r₁, . . . , r_S) (where r_i (i=1, . . . , S) is equal to one of the output possibility values x₁, . . . , x_L) having an output value as an element is output. Here, r_i (i=1, . . . , S) is selected according to the distribution of the output probability. For example, when a random number is generated using a vector (1, 2, 3) having an output possibility value as an element and a vector (0.5, 0.3, 0.2) having an output probability as an element, 1 is output with a probability of 0.5, 2 is output with a probability of 0.3, and 3 is output with a probability of 0.2.

<<Secure k-Means Method>>

The secure k-means method is a method of performing secure computation of the k-means method, and the method is secure except for the number M of data and the number K of clusters. In a case where the convergence determination condition as described above is used as the end condition, it is necessary to decode one-bit information indicating “whether each of the K centroids has converged to a certain position” for each convergence determination, but all the other information is processed while being concealed. For example, the M pieces of data, the K pieces of centroids, the distance between the data and the centroid, the information indicating the cluster to which the data belongs, and the number of pieces of data included in the cluster are processed while being concealed.

Next, the tables handled by the secure k-means method will be described. In the k-means method, it is necessary to handle information indicating the cluster to which each piece of data belongs. Here, as illustrated in FIG. 3, management is performed by using a table (hereinafter, referred to as a cluster ID table) in which the data ID of each of the M pieces of data is associated with the cluster ID of one of the K clusters. It is assumed that all IDs are assigned sequentially starting from 1. The M pieces of data are managed using a table (hereinafter, referred to as a data table) in which a data ID and data of the data ID are associated with each other as illustrated in FIG. 4. The K centroids are managed using a table (hereinafter, referred to as a centroid table) in which a cluster ID and a centroid of a cluster of the cluster ID are associated with each other as illustrated in FIG. 5. Among the three tables, the data table is not rewritten in the middle of the processing, while the cluster ID table and the centroid table are rewritten as appropriate.

The processing flow in the secure k-means method is the same as the processing flow in the k-means method, and is different only in whether secure computation is performed. The way of performing the processing of (3) to (5) using the above three tables will be described below.

[Calculation of Distance Between Data and Centroid]

Here, the processing of (3) will be described. In this processing, the distance between the data of the data ID and the centroid of the cluster of the cluster ID is computed for all combinations of the data ID and the cluster ID using the data table and the centroid table. Note that, in order to facilitate the subsequent processing, the calculation results are summarized in a distance table as illustrated in FIG. 6.
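A plaintext sketch of building the distance table is shown below. Representing the data table and the centroid table as dictionaries keyed by ID, and each distance-table record as a (data ID, cluster ID, distance) tuple, are assumptions for illustration; the actual layout is the one shown in FIG. 6.

```python
import math

def build_distance_table(data_table, centroid_table):
    """One record per (data ID, cluster ID) pair with the Euclidean distance."""
    return [
        (data_id, cluster_id, math.dist(vec, centroid))
        for data_id, vec in sorted(data_table.items())
        for cluster_id, centroid in sorted(centroid_table.items())
    ]

if __name__ == "__main__":
    data = {1: (0.0, 0.0), 2: (3.0, 4.0)}
    centroids = {1: (0.0, 0.0), 2: (3.0, 0.0)}
    for record in build_distance_table(data, centroids):
        print(record)
```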

[Assignment of Clusters]

Here, the processing of (4) will be described. In the processing, the cluster ID table is updated using the distance table obtained in the processing of (3) such that a cluster having the minimum distance to the centroid is paired with each data. That is, in the cluster ID table, the cluster ID paired with the data ID is the cluster ID of the cluster including the centroid having the minimum distance from the data of the data ID. For this purpose, a cluster ID of a cluster having the minimum distance to the data of the data ID for each data ID is extracted from the distance table. Specifically, the following processing is performed.

- (4-1) The distance table is secure sorted by using the distance attribute as a key.
- (4-2) The secure unique check is performed on the element column of the data ID attribute of the table generated in (4-1), and a table, in which the element column obtained as a result of the secure unique check is added to the table generated in (4-1) as the checked data ID attribute, is generated.
- (4-3) Only records in which the value of the element of the checked data ID attribute of the table generated in (4-2) is equal to [[1]] are extracted, and a table including these records is generated.
- (4-4) The table generated in (4-3) is subjected to secure sorting with the data ID attribute as a key.

A table obtained by extracting the element column of the data ID attribute and the element column of the cluster ID attribute from the table generated in (4-4) is a cluster ID table in which a cluster having the minimum distance to the centroid for each data is paired.
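Steps (4-1) to (4-4) can be traced on plaintext with the sketch below. Records are assumed to be (data ID, cluster ID, distance) tuples, and ordinary sorting and a set-based first-occurrence test stand in for secure sorting and the secure unique check; the function name is hypothetical.

```python
def assign_clusters(distance_table):
    """Return the cluster ID table as a list of (data ID, cluster ID) pairs."""
    # (4-1) Sort by the distance attribute.
    by_distance = sorted(distance_table, key=lambda r: r[2])
    # (4-2) Unique check on the data ID column: 1 for the first occurrence.
    seen = set()
    checked = []
    for data_id, cluster_id, dist in by_distance:
        checked.append((data_id, cluster_id, dist, 0 if data_id in seen else 1))
        seen.add(data_id)
    # (4-3) Keep only the records whose checked flag is 1.
    nearest = [r for r in checked if r[3] == 1]
    # (4-4) Sort by the data ID attribute.
    nearest.sort(key=lambda r: r[0])
    # Extract the data ID and cluster ID columns.
    return [(r[0], r[1]) for r in nearest]

if __name__ == "__main__":
    table = [(1, 1, 2.0), (1, 2, 1.0), (2, 1, 0.5), (2, 2, 3.0)]
    print(assign_clusters(table))
```

Because the table is first ordered by distance, the first occurrence of each data ID is its record with the minimum distance, which is why the unique check selects the nearest cluster.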

[Calculation of Centroid]

Here, the processing of (5) will be described. In this process, the centroid table is updated using the cluster ID table obtained in the processing of (4). The centroid is an average of data included in each cluster. Therefore, if the sum of data included in the cluster and the number of data included in the cluster can be obtained for each cluster, the centroid can be obtained. Since each data is expressed as a vector, the centroid can be efficiently obtained by using the groupBySum operation and the groupByCount operation. Specifically, the following processing is performed.

- (5-1) The element column of the cluster ID attribute of the cluster ID table is extracted, and the element column of the data ID attribute of the data table is replaced with the element column.
- (5-2) The groupBySum operation is performed on the table generated in (5-1) using the data ID attribute as a key.
- (5-3) The groupByCount operation is performed on the table generated in (5-1) using the data ID attribute as a key.
- (5-4) A value obtained by dividing the value of the element of the data attribute of each record of the table generated in (5-2) by the value of the element of the data attribute of each record of the table generated in (5-3) is obtained.

The value obtained in (5-4) is the centroid of each cluster.
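Steps (5-1) to (5-4) can be sketched on plaintext as follows. Both tables are assumed to be lists of pairs ordered by data ID, and dictionaries stand in for the groupBySum and groupByCount operations; the function name is hypothetical.

```python
def update_centroids(cluster_id_table, data_table):
    """Return the centroid table as a list of (cluster ID, centroid) pairs."""
    # (5-1) Replace the data ID column of the data table with cluster IDs.
    cluster_ids = [cluster_id for _, cluster_id in cluster_id_table]
    keyed = [(c, vec) for c, (_, vec) in zip(cluster_ids, data_table)]
    sums, counts = {}, {}
    for c, vec in keyed:
        if c not in sums:
            sums[c] = [0.0] * len(vec)
            counts[c] = 0
        # (5-2) Per-cluster sum of each coordinate.
        sums[c] = [s + v for s, v in zip(sums[c], vec)]
        # (5-3) Per-cluster number of records.
        counts[c] += 1
    # (5-4) Divide each sum by the corresponding count.
    return [(c, tuple(s / counts[c] for s in sums[c])) for c in sorted(sums)]

if __name__ == "__main__":
    cluster_ids = [(1, 1), (2, 1), (3, 2)]
    data = [(1, (0.0, 0.0)), (2, (2.0, 4.0)), (3, (5.0, 5.0))]
    print(update_centroids(cluster_ids, data))
```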

As can be seen from the above description, by performing cluster assignment using secure sorting and secure unique check and performing centroid calculation using groupBySum operation and groupByCount operation, it is possible to safely and efficiently perform secure computation of the k-means method.

<<Secure Computation of Random Number Generation Method Using Weighted Probability Distribution>>

The input and output in the secure computation of the random number generation method using the weighted probability distribution are as follows.

(Input)

- (1) A share ([[x₁]], . . . , [[x_L]]) of a vector (x₁, . . . , x_L) having an output possibility value as an element
- (2) A share ([[p₁]], . . . , [[p_L]]) of a vector (p₁, . . . , p_L) (where p_i (i=1, . . . , L) is a probability that the output possibility value x_i is output, and Σp_i=1 is satisfied) having an output probability as an element
- (3) Number of output values S

(Output)

- (1) A share ([[r₁]], . . . , [[r_S]]) of a vector (r₁, . . . , r_S) (where r_i (i=1, . . . , S) is equal to one of the output possibility values x₁, . . . , x_L) having an output value as an element

Here, the number S of output values may be the plaintext.

The processing structure of this secure computation closely resembles that of the secure collective mapping, which is computed using the two vectors that define the lookup table. Therefore, the secure computation is achieved using the secure collective mapping. A flow of the processing will be described below.

- (1) From the share ([[p₁]], . . . , [[p_L]]) of the vector (p₁, . . . , p_L), the share ([[p′₁]], . . . , [[p′_L]]) of the vector (p′₁, . . . , p′_L) is computed by ([[p′₁]], . . . , [[p′_L]])=prefix_sum(([[p₁]], . . . , [[p_L]])). The share ([[p′₁]], . . . , [[p′_L]]) corresponds to the share [[^→a]] in the secure collective mapping.
- (2) A share ([[q₁]], . . . , [[q_S]]) of a vector (q₁, . . . , q_S) (where q_i(i=1, . . . , S) is a uniform random number, and satisfies 0≤q_i≤1) having a uniform random number as an element is generated.
- (3) From the share ([[p′₁]], . . . , [[p′_L]]) of the vector (p′₁, . . . , p′_L), the share ([[x₁]], . . . , [[x_L]]) of the vector (x₁, . . . , x_L), and the share ([[q₁]], . . . , [[q_S]]) of the vector (q₁, . . . , q_S), the share ([[r₁]], . . . , [[r_S]]) of the vector (r₁, . . . , r_S) with the output value as an element is computed by ([[r₁]], . . . , [[r_S]])=map (([[p′₁]], . . . , [[p′_L]]), ([[x₁]], . . . , [[x_L]]), ([[q₁]], . . . , [[q_S]])).

The output values r₁, . . . , r_S are random numbers generated using the weighted probability distribution.

When ([[x₁]], [[x₂]], [[x₃]])=([[1]], [[2]], [[3]]) and ([[p₁]], [[p₂]], [[p₃]])=([[0.5]], [[0.3]], [[0.2]]), ([[p′₁]], [[p′₂]], [[p′₃]])=([[0.5]], [[0.8]], [[1.0]]) is obtained. Since q_i is a uniform random number satisfying 0≤q_i≤1, the probability that the uniform random number q_i is included in the section [0, 0.5] is 0.5, the probability that the uniform random number q_i is included in the section [0.5, 0.8] is 0.3, and the probability that the uniform random number q_i is included in the section [0.8, 1.0] is 0.2. Therefore, the share ([[r₁]], . . . , [[r_S]]) of the random numbers according to the weighted probability distribution is obtained by map (([[p′₁]], [[p′₂]], [[p′₃]]), ([[x₁]], [[x₂]], [[x₃]]), ([[q₁]], . . . , [[q_S]]))=map (([[0.5]], [[0.8]], [[1.0]]), ([[1]], [[2]], [[3]]), ([[q₁]], . . . , [[q_S]])).
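The flow of (1) to (3) can be sketched on plaintext values as follows. This is only an illustration of the arithmetic, not the secure protocol: no shares are involved, and the function name weighted_sample is hypothetical. It also assumes that map returns the output possibility value x_j for the first cumulative value p′_j that is greater than or equal to q_i, which matches the sections used in the example above.

```python
from bisect import bisect_left
from itertools import accumulate
import random


def weighted_sample(values, probs, uniforms):
    """Plaintext analogue of prefix_sum followed by map (hypothetical helper)."""
    # Step (1): p' = prefix_sum(p), e.g. (0.5, 0.3, 0.2) -> (0.5, 0.8, 1.0).
    cumulative = list(accumulate(probs))
    outputs = []
    for q in uniforms:
        # Step (3): assumed lookup rule, first index j with p'_j >= q.
        # The clamp guards against floating-point sums slightly below 1.
        j = min(bisect_left(cumulative, q), len(values) - 1)
        outputs.append(values[j])
    return outputs


xs = [1, 2, 3]
ps = [0.5, 0.3, 0.2]
# Step (2): S = 5 uniform random numbers in [0, 1).
qs = [random.random() for _ in range(5)]
rs = weighted_sample(xs, ps, qs)
```

Each element of rs is one of 1, 2, and 3, drawn with probabilities of approximately 0.5, 0.3, and 0.2.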

In general, each element of the vector (p₁, . . . , p_L) is a decimal, and thus the calculation cost of the secure computation is very large. Therefore, for example, if the secure computation is performed after each element of the vector (p₁, . . . , p_L) and of the vector (q₁, . . . , q_S) is converted into an integer by multiplying the element by 100, the calculation cost of the secure computation can be suppressed.
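The integer conversion can be illustrated on plaintext values as follows. The helper to_scaled_ints and the constant SCALE are hypothetical names. Rounding to the nearest integer is an assumption, since the text only states that each element is multiplied by 100. With this scale, probabilities finer than 0.01 lose precision.

```python
from bisect import bisect_left
from itertools import accumulate

SCALE = 100  # multiplier taken from the example in the text


def to_scaled_ints(values, scale=SCALE):
    """Convert decimals to integers by scaling (rounding is an assumption)."""
    return [round(v * scale) for v in values]


probs = [0.5, 0.3, 0.2]
# Integer prefix sum: (50, 30, 20) -> (50, 80, 100).
cum_int = list(accumulate(to_scaled_ints(probs)))
# A uniform random number 0.65 becomes the integer 65.
q_int = to_scaled_ints([0.65])[0]
# Integer lookup: first index whose cumulative value is >= q_int.
index = bisect_left(cum_int, q_int)
```

The lookup then compares integers only, and index 1 selects the second output possibility value.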

As can be seen from the above description, it is possible to safely and efficiently perform secure computation of the random number generation method using the weighted probability distribution by using the secure collective mapping.

<<Secure k-means++ Method>>

In the k-means++ method, the weighted probability distribution is used in the processing of (4), and the other processing can be achieved by basic operations such as addition, subtraction, multiplication, and division. Therefore, here, the processing of (4) will be briefly described.

A share ([[k₁]], . . . , [[k_M−j]]) of a vector (k₁, . . . , k_M−j) having a data ID of data not selected as a centroid as an element is set as the share of the vector having an output possibility value as an element. A share ([[d_{k_1}²/Σd_{k_m}²]], . . . , [[d_{k_(M−j)}²/Σd_{k_m}²]]) of a vector (d_{k_1}²/Σd_{k_m}², . . . , d_{k_(M−j)}²/Σd_{k_m}²) (where d_{k_m} is the smallest of the distances from the distance between the centroid specified by the first record of the centroid table and the data of the data ID k_m to the distance between the centroid specified by the j-th record of the centroid table and the data of the data ID k_m) is set as the share of the vector having an output probability as an element. Furthermore, the number of output values is one.

By performing secure computation of the random number generation method using the weighted probability distribution using these values as inputs, a share [[i_j+1]] of the output value i_j+1 equal to any of the data IDs of the data not selected as the centroid is obtained. That is, the share ([[x_{i_(j+1)1}]], . . . , [[x_{i_(j+1)N}]]) of the data (x_{i_(j+1)1}, . . . , x_{i_(j+1)N}) having the data ID i_j+1 is the share of the newly obtained centroid.
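As a plaintext illustration of how the inputs of the random number generation are formed in this step, the following sketch computes the output possibility values (data IDs not yet selected) and the output probabilities d²/Σd². The function names, the dictionary layout of the data, and the use of the squared Euclidean distance are assumptions made for illustration. The sketch also assumes that at least one unselected data point differs from every selected centroid, so that the sum is nonzero.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points (assumed metric)."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeanspp_inputs(data, chosen_ids):
    """Return (candidate data IDs, output probabilities) for one selection.

    data: dict mapping data ID -> point; chosen_ids: IDs already selected
    as centroids. Both names are hypothetical.
    """
    candidates = [i for i in data if i not in chosen_ids]
    # d_{k_m}^2: squared distance to the nearest already selected centroid.
    d2 = [min(squared_distance(data[i], data[c]) for c in chosen_ids)
          for i in candidates]
    total = sum(d2)  # assumed nonzero
    return candidates, [v / total for v in d2]


data = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (3.0, 0.0)}
ids, probs = kmeanspp_inputs(data, chosen_ids=[1])
# One output value (S = 1): the data ID of the next centroid.
next_id = random.choices(ids, weights=probs, k=1)[0]
```

Here the data ID 3, which is farther from the selected centroid, is chosen with a higher probability than the data ID 2.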

As can be seen from the above description, it is possible to safely and efficiently perform secure computation of the k-means++ method by using the random number generation method using the weighted probability distribution.

First Embodiment

The secure random number computation system 10 will be described below with reference to FIGS. 7 to 9. FIG. 7 is a block diagram illustrating a configuration of the secure random number computation system 10. The secure random number computation system 10 includes W (W is a predetermined integer of 3 or more) secure random number computation devices 100₁, . . . , 100_W. The secure random number computation devices 100₁, . . . , 100_W are connected to the network 800 and can communicate with each other. The network 800 may be, for example, a communication network such as the Internet, a broadcasting channel, or the like. FIG. 8 is a block diagram illustrating a configuration of a secure random number computation device 100_i (1≤i≤W). FIG. 9 is a flowchart illustrating operation of the secure random number computation system 10.

As illustrated in FIG. 8, the secure random number computation device 100_i includes a first vector computation unit 110_i, a uniform random number generation unit 120_i, a random number computation unit 130_i, and a recording unit 190_i. Each component of the secure random number computation device 100_i except for the recording unit 190_i is configured to be able to execute operation required for secure computation, that is, operation required for achieving the function of each component among at least concealment, addition, subtraction, multiplication, division, prefix_sum (prefix sum), and secure collective mapping. As a specific functional configuration for achieving each operation in the present invention, for example, a configuration capable of executing an existing algorithm including the algorithms disclosed in each of Reference Non Patent Literatures 1 to 4 and 7 is sufficient, and since these are conventional configurations, a detailed description thereof will be omitted. The recording unit 190_i is a component that records information necessary for the processing of the secure random number computation device 100_i. For example, the recording unit 190_i may record in advance a share ([[x₁]], . . . , [[x_L]]) of a vector (x₁, . . . , x_L) having an output possibility value as an element and a share ([[p₁]], . . . , [[p_L]]) of a vector (p₁, . . . , p_L) (where p_i (i=1, . . . , L) is a probability that the output possibility value x_i is output, and Σp_i=1 is satisfied) having an output probability as an element.

Through cooperative calculation by the W secure random number computation devices 100_i, the secure random number computation system 10 achieves secure computation of a random number generation method using weighted probability distribution that is a multi-party protocol. Therefore, the first vector computation means 110 (not illustrated) of the secure random number computation system 10 includes the first vector computation units 110₁, . . . , 110_W, the uniform random number generation means 120 (not illustrated) includes the uniform random number generation units 120₁, . . . , 120_W, and the random number computation means 130 (not illustrated) includes the random number computation units 130₁, . . . , 130_W.

The secure random number computation system 10 sets L and S to integers of 1 or more, and calculates a share ([[r₁]], . . . , [[r_S]]) of a vector (r₁, . . . , r_S) (where r_i (i=1, . . . , S) is equal to one of output possibility values x₁, . . . , x_L) having an output value as an element from a share ([[x₁]], . . . , [[x_L]]) of a vector (x₁, . . . , x_L) having an output possibility value as an element and a share ([[p₁]], . . . , [[p_L]]) of a vector (p₁, . . . , p_L) (where p_i (i=1, . . . , L) is a probability that the output possibility value x_i is output, and Σp_i=1 is satisfied) having an output probability as an element.

The operation of the secure random number computation system 10 will be described with reference to FIG. 9.

In S110, the first vector computation means 110 computes a share ([[p′₁]], . . . , [[p′_L]]) of a vector (p′₁, . . . , p′_L) from the share ([[p₁]], . . . , [[p_L]]) of the vector (p₁, . . . , p_L) by ([[p′₁]], . . . , [[p′_L]])=prefix_sum (([[p₁]], . . . , [[p_L]])).

In S120, the uniform random number generation means 120 generates a share ([[q₁]], . . . , [[q_S]]) of a vector (q₁, . . . , q_S) (where q_i(i=1, . . . , S) is a uniform random number, and satisfies 0≤q_i≤1) having a uniform random number as an element.

In S130, the random number computation means 130 computes a share ([[r₁]], . . . , [[r_S]]) of a vector (r₁, . . . , r_S) having an output value as an element from the share ([[p′₁]], . . . , [[p′_L]]) of the vector (p′₁, . . . , p′_L) computed in S110, the share ([[x₁]], . . . , [[x_L]]) of the vector (x₁, . . . , x_L), and the share ([[q₁]], . . . , [[q_S]]) of the vector (q₁, . . . , q_S) generated in S120 by ([[r₁]], . . . , [[r_S]])=map (([[p′₁]], . . . , [[p′_L]]), ([[x₁]], . . . , [[x_L]]), ([[q₁]], . . . , [[q_S]])).

According to the embodiment of the present invention, it is possible to perform secure computation of a random number generation method using weighted probability distribution with high accuracy while keeping data secure.

Second Embodiment

The secure cluster computation system 20 will be described below with reference to FIGS. 10 to 12. FIG. 10 is a block diagram illustrating a configuration of the secure cluster computation system 20. The secure cluster computation system 20 includes W (W is a predetermined integer equal to or greater than 3) secure cluster computation devices 200₁, . . . , 200_W. The secure cluster computation devices 200₁, . . . , 200_W are connected to the network 800 and can communicate with each other. The network 800 may be, for example, a communication network such as the Internet, a broadcasting channel, or the like. FIG. 11 is a block diagram illustrating a configuration of a secure cluster computation device 200_i (1≤i≤W). FIG. 12 is a flowchart illustrating operation of the secure cluster computation system 20.

As illustrated in FIG. 11, the secure cluster computation device 200_i includes a centroid table initialization unit 210_i, a distance table computation unit 220_i, a cluster ID table computation unit 230_i, a centroid table computation unit 240_i, an end condition determination unit 250_i, and a recording unit 290_i. Each component of the secure cluster computation device 200_i except for the recording unit 290_i is configured to be able to execute operation required for secure computation, that is, operation required for achieving the function of each component among at least concealment, addition, subtraction, multiplication, division, logical operation (NOT, logical product, logical sum, exclusive logical sum), comparison operation (=, <, >, ≤, ≥), secure sorting, secure unique check, Group-by operation, prefix_sum (prefix sum), and secure collective mapping. As a specific functional configuration for achieving each operation in the present invention, for example, a configuration capable of executing an existing algorithm including the algorithms disclosed in each of Reference Non Patent Literatures 1 to 7 is sufficient, and since these are conventional configurations, a detailed description thereof will be omitted. The recording unit 290_i is a component that records information necessary for the processing of the secure cluster computation device 200_i. For example, the recording unit 290_i records a data table representing data to be clustered in advance.

Through cooperative calculation by the W secure cluster computation devices 200_i, the secure cluster computation system 20 achieves secure computation of the k-means method and the k-means++ method that is a multi-party protocol. Therefore, the centroid table initialization means 210 (not illustrated) of the secure cluster computation system 20 is configured by the centroid table initialization units 210₁, . . . , 210_W, the distance table computation means 220 (not illustrated) is configured by the distance table computation units 220₁, . . . , 220_W, the cluster ID table computation means 230 (not illustrated) is configured by the cluster ID table computation units 230₁, . . . , 230_W, the centroid table computation means 240 (not illustrated) is configured by the centroid table computation units 240₁, . . . , 240_W, and the end condition determination means 250 (not illustrated) is configured by the end condition determination units 250₁, . . . , 250_W.

The secure cluster computation system 20 sets M (M is an integer of 1 or more) as the number of data, K (K is an integer of 1 or more) as the number of clusters, N (N is an integer of 1 or more) as a dimension of data, (x_i1, . . . , x_iN) (i=1, . . . , M) as data of the data ID i, and calculates a share [[k(i)]] of a cluster ID k(i) (where k(i) satisfies 1≤k(i)≤K) of a cluster to which the data of the data ID i belongs from shares ([[x_i1]], . . . , [[x_iN]]) (i=1, . . . , M) of M pieces of data (x_i1, . . . , x_iN). Here, it is assumed that a cluster ID table includes a data ID and a cluster ID of a cluster to which data of the data ID belongs as attributes (hereinafter, referred to as a data ID attribute and a cluster ID attribute), a data table includes a data ID and data of the data ID as attributes (hereinafter, referred to as a data ID attribute and a data attribute), a centroid table includes a cluster ID and a centroid of a cluster of the cluster ID as attributes (hereinafter, referred to as a cluster ID attribute and a centroid attribute), a distance table includes a data ID, a cluster ID, and a distance between data of the data ID and a centroid of a cluster of the cluster ID as attributes (hereinafter, referred to as a data ID attribute, a cluster ID attribute, and a distance attribute), and the data table includes a set of a share [[i]] of the data ID i and the share ([[x_i1]], . . . , [[x_iN]]) of the data (x_i1, . . . , x_iN) of the data ID i as the i-th record (i=1, . . . , M).

The operation of the secure cluster computation system 20 will be described with reference to FIG. 12.

In S210, the centroid table initialization means 210 sets a table including a set of a share [[j]] of a cluster ID j and a share ([[c_j1]], . . . , [[c_jN]]) of a centroid (c_j1, . . . , c_jN) of the cluster ID j (where the shares are computed by a predetermined method) as the j-th record (j=1, . . . , K) as an initial value of the centroid table. For example, the centroid table initialization means 210 randomly selects shares [[i₁]], . . . , [[i_K]] of the data IDs from among the shares [[1]], . . . , [[M]], and sets a set of the share [[j]] of the cluster ID j and the share ([[x_{i_j1}]], . . . , [[x_{i_jN}]]) of the data (x_{i_j1}, . . . , x_{i_jN}) of the data ID i_j as an initial value of the j-th record (j=1, . . . , K) of the centroid table.

In S220, using the data table and the centroid table, the distance table computation means 220 computes a distance table including a set of the share [[i]] of the data ID i, the share [[j]] of the cluster ID j, and a share [[d_ij]] of a distance d_ij between the data (x_i1, . . . , x_iN) of the data ID i and the centroid (c_j1, . . . , c_jN) of the cluster ID j as an M(j−1)+i-th record (i=1, . . . , M, j=1, . . . , K). The centroid table computed in S210 is used at the time of the first execution of S220, and the centroid table computed in S240 is used at the time of the second and subsequent executions.
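The record order of the distance table in S220 can be illustrated on plaintext values as follows. The function name distance_table, the tuple layout of a record, and the use of the Euclidean distance are assumptions made for illustration, and the secure protocol would operate on shares instead.

```python
import math


def distance_table(data, centroids):
    """Build plaintext records (data ID, cluster ID, distance).

    data: list of M points (data ID i = position + 1);
    centroids: list of K points (cluster ID j = position + 1).
    The record for (i, j) is the M(j-1)+i-th record (1-based).
    """
    table = []
    for j, c in enumerate(centroids, start=1):
        for i, x in enumerate(data, start=1):
            # Euclidean distance is an assumed choice of metric.
            table.append((i, j, math.dist(x, c)))
    return table


data = [(0.0, 0.0), (3.0, 4.0)]       # M = 2
centroids = [(0.0, 0.0), (3.0, 4.0)]  # K = 2
table = distance_table(data, centroids)
M = len(data)
i, j = 1, 2
record = table[M * (j - 1) + i - 1]   # 0-based list index of the 3rd record
```

In this example, record holds the distance between the data of the data ID 1 and the centroid of the cluster ID 2.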

In S230, the cluster ID table computation means 230 computes, using the distance table, a cluster ID table including a set of the share [[i]] of the data ID i and the share [[k(i)]] of the cluster ID k(i) of the cluster to which the data of the data ID i belongs as the i-th record (i=1, . . . , M). For example, the cluster ID table computation means 230 computes a first intermediate table from the distance table by performing secure sorting on the distance table using the distance attribute of the distance table as a key; computes a second intermediate table from the first intermediate table by adding, to the first intermediate table as a checked data ID attribute, a column obtained by performing secure unique check on the element column of the data ID attribute of the first intermediate table; computes, from the second intermediate table, a third intermediate table including the records in which the value of the element of the checked data ID attribute of the second intermediate table is [[1]]; and computes the cluster ID table from the third intermediate table by performing secure sorting on the third intermediate table using the data ID attribute of the third intermediate table as a key and using the element column of the data ID attribute and the element column of the cluster ID attribute of the resulting table as the element column of the data ID attribute and the element column of the cluster ID attribute of the cluster ID table, respectively. Here, the first intermediate table is a table including the data ID attribute, the cluster ID attribute, and the distance attribute, the second intermediate table is a table including the checked data ID attribute, the data ID attribute, the cluster ID attribute, and the distance attribute, and the third intermediate table is a table including the checked data ID attribute, the data ID attribute, the cluster ID attribute, and the distance attribute. The first intermediate table includes MK records, the second intermediate table includes MK records, and the third intermediate table includes M records.
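The example of S230 can be sketched on plaintext values as follows. The function name assign_clusters is hypothetical. The sketch assumes that the unique check marks with 1 the first record in which each data ID appears and marks every later record with 0, which is an interpretation and not a definition taken from the text.

```python
def assign_clusters(distance_table):
    """Plaintext analogue of sort, unique check, filter, and sort.

    distance_table: list of (data ID, cluster ID, distance) records.
    Returns the cluster ID table as a list of (data ID, cluster ID).
    """
    # First intermediate table: records sorted by the distance attribute.
    first = sorted(distance_table, key=lambda r: r[2])

    # Second intermediate table: add the checked data ID attribute
    # (assumed: 1 for the first occurrence of a data ID, otherwise 0).
    seen = set()
    second = []
    for data_id, cluster_id, dist in first:
        flag = 0 if data_id in seen else 1
        seen.add(data_id)
        second.append((flag, data_id, cluster_id, dist))

    # Third intermediate table: only the records whose flag is 1 (M records).
    third = [r for r in second if r[0] == 1]

    # Sort by data ID and keep the data ID and cluster ID columns.
    third.sort(key=lambda r: r[1])
    return [(data_id, cluster_id) for _, data_id, cluster_id, _ in third]


table = [(1, 1, 0.0), (2, 1, 5.0), (1, 2, 5.0), (2, 2, 0.0)]
cluster_ids = assign_clusters(table)
```

Because the records are sorted by distance first, the first occurrence of each data ID is the record of its nearest centroid.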

In S240, the centroid table computation means 240 computes the centroid table using the data table and the cluster ID table. For example, the centroid table computation means 240 computes a fifth intermediate table by replacing the element column of the data ID attribute of the data table with the element of the cluster ID attribute of the cluster ID table, using the data table and the cluster ID table, computes a sixth intermediate table by a groupBySum operation using the data ID attribute of the fifth intermediate table as a key, using the fifth intermediate table, computes a seventh intermediate table by a groupByCount operation using the data ID attribute of the fifth intermediate table as a key, using the fifth intermediate table, and computes a centroid table by setting a value obtained by dividing the value of the element of the data attribute of the j-th record of the sixth intermediate table by the value of the element of the data attribute of the j-th record of the seventh intermediate table as a value of an element of the centroid attribute of the j-th record of the centroid table, using the sixth intermediate table and the seventh intermediate table. Here, the fifth intermediate table is a table including the data ID attribute and the data attribute, the sixth intermediate table is a table including the data ID attribute and the data attribute, and the seventh intermediate table is a table including the data ID attribute and the data attribute. The fifth intermediate table includes M records, the sixth intermediate table includes K records, and the seventh intermediate table includes K records.
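The example of S240 can be sketched on plaintext values as follows. The function name compute_centroids and the data layout are hypothetical, and the dictionaries stand in for the groupBySum and groupByCount operations. A cluster to which no data is assigned does not appear in the result of this sketch.

```python
from collections import defaultdict


def compute_centroids(data_table, cluster_id_table):
    """Plaintext analogue of groupBySum divided by groupByCount.

    data_table: list of (data ID, point);
    cluster_id_table: list of (data ID, cluster ID).
    Returns the centroid table as a list of (cluster ID, centroid).
    """
    cluster_of = dict(cluster_id_table)
    # Fifth intermediate table: the data ID is replaced with the cluster ID.
    fifth = [(cluster_of[data_id], point) for data_id, point in data_table]

    sums = {}                  # stands in for the sixth table (groupBySum)
    counts = defaultdict(int)  # stands in for the seventh table (groupByCount)
    for cluster_id, point in fifth:
        if cluster_id not in sums:
            sums[cluster_id] = list(point)
        else:
            sums[cluster_id] = [s + v for s, v in zip(sums[cluster_id], point)]
        counts[cluster_id] += 1

    # Centroid: element-wise sum divided by the number of records.
    return [(cid, tuple(s / counts[cid] for s in sums[cid]))
            for cid in sorted(sums)]


data_table = [(1, (0.0, 0.0)), (2, (2.0, 2.0)), (3, (10.0, 0.0))]
cluster_id_table = [(1, 1), (2, 1), (3, 2)]
centroid_table = compute_centroids(data_table, cluster_id_table)
```

In this example, the data of the data IDs 1 and 2 are averaged into the centroid of the cluster ID 1.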

In S250, the end condition determination means 250 ends the process in a case where a predetermined end condition is satisfied, and returns to the processing of S220 in other cases. That is, the secure cluster computation system 20 repeats the processing of S220 to S240. When the predetermined end condition is the number of executions of the processing of S220 to S240, it is assumed that the share [[T]] of the number of executions T is given in advance, and the centroid table initialization means 210 initializes the value of the counter t with the share [[0]]. Then, the end condition determination means 250 may update the value of the counter t with t+[[1]], and in a case where the value of =([[t>T]], [[1]]) is [[1]], the process may be ended, and in other cases, the process may return to the processing of S220.

<<Setting of Initial Value Using Secure k-Means++ Method>>

Hereinafter, the centroid table initialization means 210 will be described with reference to FIGS. 13 and 14. FIG. 13 is a block diagram illustrating a configuration of the centroid table initialization unit 210_i (1≤i≤W). FIG. 14 is a flowchart illustrating operation of the centroid table initialization means 210. As illustrated in FIG. 13, the centroid table initialization unit 210_i includes a first initial value setting unit 211_i, a first vector computation unit 212_i, a second initial value setting unit 213_i, and an end condition determination unit 214_i.

The first initial value setting means 211 (not illustrated) of the centroid table initialization means 210 is configured by the first initial value setting units 211₁, . . . , and 211_W, the first vector computation means 212 (not illustrated) is configured by the first vector computation units 212₁, . . . , 212_W, the second initial value setting means 213 (not illustrated) is configured by the second initial value setting units 213₁, . . . , 213_W, and the end condition determination means 214 (not illustrated) is configured by the end condition determination units 214₁, . . . , 214_W.

Hereinafter, the operation of the centroid table initialization means 210 will be described with reference to FIG. 14.

In S211, the first initial value setting means 211 randomly selects a share [[i₁]] of a data ID from among shares [[1]], . . . , [[M]], and sets a set of the share [[1]] of the cluster ID 1 and the share ([[x_{i_11}]], . . . , [[x_{i_1N}]]) of the data (x_{i_11}, . . . , x_{i_1N}) of the data ID i₁ as an initial value of the first record of the centroid table. The first initial value setting means 211 initializes the value of the counter t with the share [[0]].

In S212, the first vector computation means 212 computes a share ([[d_{k_1}²/Σd_{k_m}²]], . . . , [[d_{k_(M−j)}²/Σd_{k_m}²]]) of a vector (d_{k_1}²/Σd_{k_m}², . . . , d_{k_(M−j)}²/Σd_{k_m}²) (where d_{k_m} is the smallest of the distances from the distance between the centroid specified by the first record of the centroid table and the data of the data ID k_m to the distance between the centroid specified by the j-th record of the centroid table and the data of the data ID k_m), using a share ([[x_{k_11}]], . . . , [[x_{k_(M−j)N}]]) of data (x_{k_11}, . . . , x_{k_(M−j)N}) (where k_m (m=1, . . . , M−j, j satisfies 1≤j<K) is a data ID of data that is not selected as the centroid) of the data ID k_m. For example, the first vector computation means 212 may perform processing similar to the processing of S220 and S230 to compute the share ([[d_{k_1}²/Σd_{k_m}²]], . . . , [[d_{k_(M−j)}²/Σd_{k_m}²]]).

In S213, using the secure random number computation system of the first embodiment, the second initial value setting means 213 computes a share [[i_j+1]] of one output value i_j+1 (i_j+1 is equal to one of the data IDs k₁, . . . , k_M−j of the data not selected as the centroid) from the share ([[k₁]], . . . , [[k_M−j]]) of the vector (k₁, . . . , k_M−j) having a data ID of data not selected as the centroid as an element and the share ([[d_{k_1}²/Σd_{k_m}²]], . . . , [[d_{k_(M−j)}²/Σd_{k_m}²]]) of the vector (d_{k_1}²/Σd_{k_m}², . . . , d_{k_(M−j)}²/Σd_{k_m}²) computed in S212, and sets a set of the share [[j+1]] of the cluster ID j+1 and the share ([[x_{i_(j+1)1}]], . . . , [[x_{i_(j+1)N}]]) of the data (x_{i_(j+1)1}, . . . , x_{i_(j+1)N}) of the data ID i_j+1 as an initial value of the j+1-th record of the centroid table.

In S214, the end condition determination means 214 may update the value of the counter t with t+[[1]], and in a case where the value of =([[t>K]], [[1]]) is [[1]], the process ends, and in other cases, the process returns to the processing of S212. That is, the centroid table initialization means 210 repeats the processing of S212 to S213.

According to the embodiment of the present invention, it is possible to perform secure computation of the k-means method and the k-means++ method with high accuracy while keeping data secure.

FIG. 15 is a diagram illustrating an example of a functional configuration of a computer that implements each device described above. It is possible to perform the processes in the respective devices described above, by causing a recording unit 2020 to read a program for causing a computer to function as the respective devices described above and causing a control unit 2010, an input unit 2030, an output unit 2040, and the like to operate.

The device according to the present invention includes, as a single hardware entity for example, an input unit that can be connected to a keyboard or the like, an output unit that can be connected to a liquid crystal display or the like, a communication unit that can be connected to a communication device (e.g., a communication cable) capable of communicating with the outside of the hardware entity, a CPU (Central Processing Unit, which may include a cache memory or a register), a RAM or a ROM which is a memory, an external storage device as a hard disk, and a bus that connects the input unit, the output unit, the communication unit, the CPU, the RAM, the ROM, and the external storage device so that data can be exchanged therebetween. A device (drive) or the like that can write and read data in and from a recording medium such as a CD-ROM may be provided in the hardware entity as necessary. Examples of a physical entity including such a hardware resource include a general-purpose computer.

The external storage device of the hardware entity stores a program required to implement the above-described functions, data required to process the program, and the like (the present invention is not limited to the external storage device and the program may be stored, for example, in a ROM which is a read-only storage device). Data or the like obtained by processing the program is appropriately stored in a RAM, an external storage device, or the like.

In the hardware entity, each program stored in the external storage device (or ROM or the like) and data required to process each program are read to a memory as necessary and are appropriately interpreted and processed by the CPU. As a result, the CPU implements a predetermined function (each component represented as unit, . . . means, or the like).

The present invention is not limited to the above-described embodiment and can be appropriately modified without departing from the gist of the present invention. The processes described in the foregoing embodiment may be executed not only chronologically in accordance with the described order, but also in parallel or individually in accordance with the processing capability of a device that executes the processes or as necessary.

As described above, when the processing function of the hardware entity (the device according to the present invention) described in the foregoing embodiment is implemented by a computer, processing content of the function of the hardware entity is described by a program. In addition, as the computer executes the program, the processing function of the hardware entity is implemented on the computer.

The program in which the processing content is written may be recorded on a computer-readable recording medium. The computer-readable recording medium may be, for example, any recording medium such as a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory. Specifically, for example, a hard disk device, a flexible disk, a magnetic tape, or the like can be used as a magnetic recording device; a DVD (Digital Versatile Disc), a DVD-RAM (Random Access Memory), a CD-ROM (Compact Disc Read Only Memory), a CD-R (Recordable)/RW (ReWritable), or the like can be used as an optical disc; an MO (Magneto-Optical disc) or the like can be used as a magneto-optical recording medium; and an EEP-ROM (Electronically Erasable and Programmable-Read Only Memory) or the like can be used as a semiconductor memory.

In addition, the program is distributed by, for example, selling, transferring, or renting a portable recording medium such as a DVD or a CD-ROM on which the program is recorded. Further, a configuration may also be employed in which the program is stored in a storage device of a server computer and the program is distributed by transferring the program from the server computer to other computers via a network.

For example, a computer that executes such a program first temporarily stores a program recorded on a portable recording medium or a program transferred from the server computer in a storage device of the computer. Then, to perform the processing, the computer reads the program stored in the storage device of the computer, and executes the processing in accordance with the read program. Also, in other modes of execution of the program, the computer may read the program directly from a portable recording medium and perform processing in accordance with the program, or alternatively, the computer may sequentially perform processing in accordance with a received program every time a program is transferred from the server computer to the computer. In addition, the above-described processing may be executed by a so-called ASP (Application Service Provider) type service that implements a processing function only by an execution instruction and result acquisition without transferring the program from the server computer to the computer. Note that the program in the present embodiment includes information that is used for processing by an electronic computer and is equivalent to the program (data or the like that is not a direct command to the computer but has a property that defines processing performed by the computer).

Although the hardware entity is configured by causing a computer to execute a predetermined program in the present embodiment, at least some of the processing content may be implemented by hardware.

Claims

1. A secure random number computation system having L and S being integers of 1 or more, including three or more secure random number computation devices, and configured to compute a share (((r1)), . . . , ((rS))) of a vector (r1, . . . , rS) (where ri (i=1, . . . , S) is equal to one of output possibility values x1, . . . , xL) having an output value as an element, from a share (((x1)), . . . , ((xL))) of a vector (x1, . . . , xL) having an output possibility value as an element and a share (((p1)), . . . , ((pL))) of a vector (p1, . . . , pL) (where pi (i=1, . . . , L) is a probability that the output possibility value xi is output, and satisfies Σpi=1) having an output probability as an element, the secure random number computation system comprising: first vector computation circuitry configured to compute a share (((p′1)), . . . , ((p′L))) of a vector (p′1, . . . , p′L) from the share (((p1)), . . . , ((pL))) of the vector (p1, . . . , pL) by (((p′1)), . . . , ((p′L)))=prefix_sum ((((p1)), . . . , ((pL)))); uniform random number generation circuitry configured to generate a share (((q1)), . . . , ((qS))) of a vector (q1, . . . , qS) (where qi (i=1, . . . , S) is a uniform random number, and satisfies 0≤qi≤1) having a uniform random number as an element; and random number computation circuitry configured to compute a share (((r1)), . . . , ((rS))) of a vector (r1, . . . , rS) having an output value as an element from the share (((p′1)), . . . , ((p′L))) of the vector (p′1, . . . , p′L), the share (((x1)), . . . , ((xL))) of the vector (x1, . . . , xL), and the share (((q1)), . . . , ((qS))) of the vector (q1, . . . , qS) by (((r1)), . . . , ((rS)))=map ((((p′1)), . . . , ((p′L))), (((x1)), . . . , ((xL))), (((q1)), . . . , ((qS)))).
2. A secure random number computation device having L and S being integers of 1 or more and included in a secure random number computation system including three or more secure random number computation devices that computes a share (((r1)), . . . , ((rS))) of a vector (r1, . . . , rS) (where ri (i=1, . . . , S) is equal to one of output possibility values x1, . . . , xL) having an output value as an element, from a share (((x1)), . . . , ((xL))) of a vector (x1, . . . , xL) having an output possibility value as an element and a share (((p1)), . . . , ((pL))) of a vector (p1, . . . , pL) (where pi (i=1, . . . , L) is a probability that the output possibility value xi is output, and satisfies Σpi=1) having an output probability as an element, the secure random number computation device comprising: a first vector computation circuitry configured to compute a share (((p′1)), . . . , ((p′L))) of a vector (p′1, . . . , p′L) from the share (((p1)), . . . , ((pL))) of the vector (p1, . . . , pL) by (((p′1)), . . . , ((p′L)))=prefix_sum((((p1)), . . . , ((pL)))); a uniform random number generation circuitry configured to generate a share (((q1)), . . . , ((qS))) of a vector (q1, . . . , qS) (where qi (i=1, . . . , S) is a uniform random number, and satisfies 0≤qi≤1) having a uniform random number as an element; and a random number computation circuitry configured to compute a share (((r1)), . . . , ((rS))) of a vector (r1, . . . , rS) having an output value as an element from the share (((p′1)), . . . , ((p′L))) of the vector (p′1, . . . , p′L), the share (((x1)), . . . , ((xL))) of the vector (x1, . . . , xL), and the share (((q1)), . . . , ((qS))) of the vector (q1, . . . , qS) by (((r1)), . . . , ((rS)))=map((((p′1)), . . . , ((p′L))), (((x1)), . . . , ((xL))), (((q1)), . . . , ((qS)))).
3. A secure random number computation method, by a secure random number computation system including three or more secure random number computation devices, of computing a share (((r1)), . . . , ((rS))) of a vector (r1, . . . , rS) (where ri (i=1, . . . , S) is equal to one of output possibility values x1, . . . , xL) having an output value as an element, from a share (((x1)), . . . , ((xL))) of a vector (x1, . . . , xL) having an output possibility value as an element and a share (((p1)), . . . , ((pL))) of a vector (p1, . . . , pL) (where pi (i=1, . . . , L) is a probability that the output possibility value xi is output, and satisfies Σpi=1) having an output probability as an element, the secure random number computation method comprising:
a first vector computation step of computing a share (((p′1)), . . . , ((p′L))) of a vector (p′1, . . . , p′L) from the share (((p1)), . . . , ((pL))) of the vector (p1, . . . , pL) by (((p′1)), . . . , ((p′L)))=prefix_sum((((p1)), . . . , ((pL)))), by the secure random number computation system;
a uniform random number generation step of generating a share (((q1)), . . . , ((qS))) of a vector (q1, . . . , qS) (where qi (i=1, . . . , S) is a uniform random number, and satisfies 0≤qi≤1) having a uniform random number as an element, by the secure random number computation system; and
a random number computation step of computing a share (((r1)), . . . , ((rS))) of a vector (r1, . . . , rS) having an output value as an element from the share (((p′1)), . . . , ((p′L))) of the vector (p′1, . . . , p′L), the share (((x1)), . . . , ((xL))) of the vector (x1, . . . , xL), and the share (((q1)), . . . , ((qS))) of the vector (q1, . . . , qS) by (((r1)), . . . , ((rS)))=map((((p′1)), . . . , ((p′L))), (((x1)), . . . , ((xL))), (((q1)), . . . , ((qS)))), by the secure random number computation system.
4. A secure cluster computation system having M (M is an integer of 1 or more) as the number of data, K (K is an integer of 1 or more) as the number of clusters, N (N is an integer of 1 or more) as a dimension of data, and (xi1, . . . , xiN) (i=1, . . . , M) as data of a data ID i, including three or more secure cluster computation devices, and configured to compute a share ((k(i))) of a cluster ID k(i) (where k(i) satisfies 1≤k(i)≤K) of a cluster to which the data of the data ID i belongs from shares (((xi1)), . . . , ((xiN))) (i=1, . . . , M) of M pieces of data (xi1, . . . , xiN),
wherein a table that includes a data ID and a cluster ID of a cluster to which data of the data ID belongs as attributes (hereinafter, referred to as a data ID attribute and a cluster ID attribute) is set as a cluster ID table, a table that includes a data ID and data of the data ID as attributes (hereinafter, referred to as a data ID attribute and a data attribute) is set as a data table, a table that includes a cluster ID and a centroid of a cluster of the cluster ID as attributes (hereinafter, referred to as a cluster ID attribute and a centroid attribute) is set as a centroid table, and a table that includes a data ID, a cluster ID, and a distance between data of the data ID and a centroid of a cluster of the cluster ID as attributes (hereinafter, referred to as a data ID attribute, a cluster ID attribute, and a distance attribute) is set as a distance table,
the data table includes a set of a share ((i)) of the data ID i and the share (((xi1)), . . . , ((xiN))) of the data (xi1, . . . , xiN) of the data ID i as an i-th record (i=1, . . . , M),
the secure cluster computation system comprises:
centroid table initialization circuitry configured to set a table including a set of a share ((j)) of a cluster ID j and a share (((cj1)), . . . , ((cjN))) of a centroid (cj1, . . . , cjN) of the cluster ID j (where the shares are computed by a predetermined method) as a j-th record (j=1, . . . , K) as an initial value of the centroid table;
distance table computation circuitry configured to use the data table and the centroid table to compute a distance table including a set of the share ((i)) of the data ID i, the share ((j)) of the cluster ID j, and a share ((dij)) of a distance dij between the data (xi1, . . . , xiN) of the data ID i and the centroid (cj1, . . . , cjN) of the cluster ID j as an M(j−1)+i-th record (i=1, . . . , M, j=1, . . . , K);
cluster ID table computation circuitry configured to compute a cluster ID table including a set of the share ((i)) of the data ID i and the share ((k(i))) of the cluster ID k(i) of the cluster to which the data of the data ID i belongs as the i-th record (i=1, . . . , M) using the distance table; and
centroid table computation circuitry configured to compute the centroid table using the data table and the cluster ID table, and
the centroid table initialization circuitry includes
first initial value setting circuitry configured to randomly select a share ((i1)) of a data ID from among shares ((1)), . . . , ((M)), and set a set of the share ((1)) of the cluster ID 1 and the share (((xi_11)), . . . , ((xi_1N))) of the data (xi_11, . . . , xi_1N) of the data ID i1 as an initial value of a first record of the centroid table,
first vector computation circuitry configured to compute a share (((dk_12/Σdk_m2)), . . . , ((dk_(M−j)2/Σdk_m2))) of a vector (dk_12/Σdk_m2, . . . , dk_(M−j)2/Σdk_m2) (where dk_m is the smallest among the distances, from the distance between the centroid specified by the first record of the centroid table and the data of the data ID km to the distance between the centroid specified by the j-th record of the centroid table and the data of the data ID km), using a share (((xk_11)), . . . , ((xk_(M−j)N))) of data (xk_11, . . . , xk_(M−j)N) (where km (m=1, . . . , M−j, j satisfies 1≤j<K) is a data ID of data that is not selected as the centroid) of the data ID km, and
second initial value setting circuitry configured to use the secure random number computation system according to claim 1 to compute a share ((ij+1)) of one output value ij+1 (ij+1 is equal to one of the data IDs k1, . . . , kM−j of data not selected as the centroid) from the share (((k1)), . . . , ((kM−j))) of the vector (k1, . . . , kM−j) having a data ID of data not selected as the centroid as an element and the share (((dk_12/Σdk_m2)), . . . , ((dk_(M−j)2/Σdk_m2))) of the vector (dk_12/Σdk_m2, . . . , dk_(M−j)2/Σdk_m2), and set a set of the share ((j+1)) of the cluster ID j+1 and the share (((xi_(j+1)1)), . . . , ((xi_(j+1)N))) of the data (xi_(j+1)1, . . . , xi_(j+1)N) of the data ID ij+1 as an initial value of a (j+1)-th record of the centroid table.
5. A secure cluster computation device included in a secure cluster computation system including three or more secure cluster computation devices, having M (M is an integer of 1 or more) as the number of data, K (K is an integer of 1 or more) as the number of clusters, N (N is an integer of 1 or more) as a dimension of data, and (xi1, . . . , xiN) (i=1, . . . , M) as data of a data ID i, and configured to compute a share ((k(i))) of a cluster ID k(i) (where k(i) satisfies 1≤k(i)≤K) of a cluster to which the data of the data ID i belongs from shares (((xi1)), . . . , ((xiN))) (i=1, . . . , M) of M pieces of data (xi1, . . . , xiN),
wherein a table that includes a data ID and a cluster ID of a cluster to which data of the data ID belongs as attributes (hereinafter, referred to as a data ID attribute and a cluster ID attribute) is set as a cluster ID table, a table that includes a data ID and data of the data ID as attributes (hereinafter, referred to as a data ID attribute and a data attribute) is set as a data table, a table that includes a cluster ID and a centroid of a cluster of the cluster ID as attributes (hereinafter, referred to as a cluster ID attribute and a centroid attribute) is set as a centroid table, and a table that includes a data ID, a cluster ID, and a distance between data of the data ID and a centroid of a cluster of the cluster ID as attributes (hereinafter, referred to as a data ID attribute, a cluster ID attribute, and a distance attribute) is set as a distance table,
the data table includes a set of a share ((i)) of the data ID i and the share (((xi1)), . . . , ((xiN))) of the data (xi1, . . . , xiN) of the data ID i as an i-th record (i=1, . . . , M),
the secure cluster computation device comprises:
centroid table initialization circuitry configured to set a table including a set of a share ((j)) of a cluster ID j and a share (((cj1)), . . . , ((cjN))) of a centroid (cj1, . . . , cjN) of the cluster ID j (where the shares are computed by a predetermined method) as a j-th record (j=1, . . . , K) as an initial value of the centroid table;
distance table computation circuitry configured to use the data table and the centroid table to compute a distance table including a set of the share ((i)) of the data ID i, the share ((j)) of the cluster ID j, and a share ((dij)) of a distance dij between the data (xi1, . . . , xiN) of the data ID i and the centroid (cj1, . . . , cjN) of the cluster ID j as an M(j−1)+i-th record (i=1, . . . , M, j=1, . . . , K);
cluster ID table computation circuitry configured to compute a cluster ID table including a set of the share ((i)) of the data ID i and the share ((k(i))) of the cluster ID k(i) of the cluster to which the data of the data ID i belongs as the i-th record (i=1, . . . , M) using the distance table; and
centroid table computation circuitry configured to compute the centroid table using the data table and the cluster ID table, and
the centroid table initialization circuitry includes
first initial value setting circuitry configured to randomly select a share ((i1)) of a data ID from among shares ((1)), . . . , ((M)), and set a set of the share ((1)) of the cluster ID 1 and the share (((xi_11)), . . . , ((xi_1N))) of the data (xi_11, . . . , xi_1N) of the data ID i1 as an initial value of a first record of the centroid table,
first vector computation circuitry configured to compute a share (((dk_12/Σdk_m2)), . . . , ((dk_(M−j)2/Σdk_m2))) of a vector (dk_12/Σdk_m2, . . . , dk_(M−j)2/Σdk_m2) (where dk_m is the smallest among the distances, from the distance between the centroid specified by the first record of the centroid table and the data of the data ID km to the distance between the centroid specified by the j-th record of the centroid table and the data of the data ID km), using a share (((xk_11)), . . . , ((xk_(M−j)N))) of data (xk_11, . . . , xk_(M−j)N) (where km (m=1, . . . , M−j, j satisfies 1≤j<K) is a data ID of data that is not selected as the centroid) of the data ID km, and
second initial value setting circuitry configured to use the secure random number computation device according to claim 2 to compute a share ((ij+1)) of one output value ij+1 (ij+1 is equal to one of the data IDs k1, . . . , kM−j of the data not selected as the centroid) from the share (((k1)), . . . , ((kM−j))) of the vector (k1, . . . , kM−j) having a data ID of data not selected as the centroid as an element and the share (((dk_12/Σdk_m2)), . . . , ((dk_(M−j)2/Σdk_m2))) of the vector (dk_12/Σdk_m2, . . . , dk_(M−j)2/Σdk_m2), and set a set of the share ((j+1)) of the cluster ID j+1 and the share (((xi_(j+1)1)), . . . , ((xi_(j+1)N))) of the data (xi_(j+1)1, . . . , xi_(j+1)N) of the data ID ij+1 as an initial value of a (j+1)-th record of the centroid table.
6. A secure cluster computation method, by a secure cluster computation system having M (M is an integer of 1 or more) as the number of data, K (K is an integer of 1 or more) as the number of clusters, N (N is an integer of 1 or more) as a dimension of data, and (xi1, . . . , xiN) (i=1, . . . , M) as data of a data ID i, and including three or more secure cluster computation devices, of computing a share ((k(i))) of a cluster ID k(i) (where k(i) satisfies 1≤k(i)≤K) of a cluster to which the data of the data ID i belongs from shares (((xi1)), . . . , ((xiN))) (i=1, . . . , M) of M pieces of data (xi1, . . . , xiN),
wherein a table that includes a data ID and a cluster ID of a cluster to which data of the data ID belongs as attributes (hereinafter, referred to as a data ID attribute and a cluster ID attribute) is set as a cluster ID table, a table that includes a data ID and data of the data ID as attributes (hereinafter, referred to as a data ID attribute and a data attribute) is set as a data table, a table that includes a cluster ID and a centroid of a cluster of the cluster ID as attributes (hereinafter, referred to as a cluster ID attribute and a centroid attribute) is set as a centroid table, and a table that includes a data ID, a cluster ID, and a distance between data of the data ID and a centroid of a cluster of the cluster ID as attributes (hereinafter, referred to as a data ID attribute, a cluster ID attribute, and a distance attribute) is set as a distance table,
the data table includes a set of a share ((i)) of the data ID i and the share (((xi1)), . . . , ((xiN))) of the data (xi1, . . . , xiN) of the data ID i as an i-th record (i=1, . . . , M),
the secure cluster computation method comprises:
a centroid table initialization step of setting a table including a set of a share ((j)) of a cluster ID j and a share (((cj1)), . . . , ((cjN))) of a centroid (cj1, . . . , cjN) of the cluster ID j (where the shares are computed by a predetermined method) as a j-th record (j=1, . . . , K) as an initial value of the centroid table, by the secure cluster computation system;
a distance table computation step of using the data table and the centroid table to compute a distance table including a set of the share ((i)) of the data ID i, the share ((j)) of the cluster ID j, and a share ((dij)) of a distance dij between the data (xi1, . . . , xiN) of the data ID i and the centroid (cj1, . . . , cjN) of the cluster ID j as an M(j−1)+i-th record (i=1, . . . , M, j=1, . . . , K), by the secure cluster computation system;
a cluster ID table computation step of computing a cluster ID table including a set of the share ((i)) of the data ID i and the share ((k(i))) of the cluster ID k(i) of the cluster to which the data of the data ID i belongs as the i-th record (i=1, . . . , M) using the distance table, by the secure cluster computation system; and
a centroid table computation step of computing the centroid table using the data table and the cluster ID table, by the secure cluster computation system, and
the centroid table initialization step includes
a first initial value setting step of randomly selecting a share ((i1)) of a data ID from among shares ((1)), . . . , ((M)), and setting a set of the share ((1)) of the cluster ID 1 and the share (((xi_11)), . . . , ((xi_1N))) of the data (xi_11, . . . , xi_1N) of the data ID i1 as an initial value of a first record of the centroid table,
a first vector computation step of computing a share (((dk_12/Σdk_m2)), . . . , ((dk_(M−j)2/Σdk_m2))) of a vector (dk_12/Σdk_m2, . . . , dk_(M−j)2/Σdk_m2) (where dk_m is the smallest among the distances, from the distance between the centroid specified by the first record of the centroid table and the data of the data ID km to the distance between the centroid specified by the j-th record of the centroid table and the data of the data ID km), using a share (((xk_11)), . . . , ((xk_(M−j)N))) of data (xk_11, . . . , xk_(M−j)N) (where km (m=1, . . . , M−j, j satisfies 1≤j<K) is a data ID of data that is not selected as the centroid) of the data ID km, and
a second initial value setting step of using the secure random number computation method according to claim 3 to compute a share ((ij+1)) of one output value ij+1 (ij+1 is equal to one of the data IDs k1, . . . , kM−j of data not selected as the centroid) from the share (((k1)), . . . , ((kM−j))) of the vector (k1, . . . , kM−j) having a data ID of data not selected as the centroid as an element and the share (((dk_12/Σdk_m2)), . . . , ((dk_(M−j)2/Σdk_m2))) of the vector (dk_12/Σdk_m2, . . . , dk_(M−j)2/Σdk_m2), and setting a set of the share ((j+1)) of the cluster ID j+1 and the share (((xi_(j+1)1)), . . . , ((xi_(j+1)N))) of the data (xi_(j+1)1, . . . , xi_(j+1)N) of the data ID ij+1 as an initial value of a (j+1)-th record of the centroid table.
7. A non-transitory computer-readable storage medium which stores a program for causing a computer to function as the secure random number computation device according to claim 2.
8. A non-transitory computer-readable storage medium which stores a program for causing a computer to function as the secure cluster computation device according to claim 5.
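
For orientation only, the computation recited in claims 1 to 3 (prefix sum of the output probabilities, uniform random numbers, then a mapping to output values) and its use for centroid initialization in claims 4 to 6 can be sketched on plaintext values as follows. This is a minimal illustration and not the claimed secure protocol: nothing here is secret-shared. The function names `sample_by_prefix_sum`, `sq_dist`, and `kmeanspp_seed` are hypothetical, and the boundary convention used for the mapping (the first index whose prefix sum exceeds q) is an assumption, since the claims do not fix it.

```python
import bisect
import itertools
import random


def sample_by_prefix_sum(xs, ps, s, rng):
    """Plaintext analogue of claims 1-3: draw s values from xs with probabilities ps."""
    # prefix_sum step: p'_i = p_1 + ... + p_i
    cumulative = list(itertools.accumulate(ps))
    # uniform random number step: rng.random() yields values in [0, 1)
    qs = [rng.random() for _ in range(s)]
    out = []
    for q in qs:
        # Assumed "map" convention: first index whose prefix sum exceeds q.
        idx = bisect.bisect_right(cumulative, q)
        # Guard against rounding that leaves the last prefix sum slightly below 1.
        out.append(xs[min(idx, len(xs) - 1)])
    return out


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeanspp_seed(data, k, rng):
    """Plaintext analogue of the centroid table initialization in claims 4-6."""
    # First record: a data ID chosen uniformly at random.
    chosen = [rng.randrange(len(data))]
    while len(chosen) < k:
        # Data IDs k_1, ..., k_(M-j) not yet selected as a centroid.
        remaining = [i for i in range(len(data)) if i not in chosen]
        # d_(k_m)^2: squared distance to the nearest centroid selected so far.
        d2 = [min(sq_dist(data[i], data[c]) for c in chosen) for i in remaining]
        total = sum(d2)
        if total > 0:
            probs = [v / total for v in d2]
        else:
            # Assumption: fall back to uniform weights when all distances are zero.
            probs = [1.0 / len(remaining)] * len(remaining)
        # One output value (S = 1) gives the next centroid's data ID.
        chosen.append(sample_by_prefix_sum(remaining, probs, 1, rng)[0])
    return [data[i] for i in chosen]
```

For example, `kmeanspp_seed(points, 3, random.Random(0))` returns three of the input points as initial centroids. In the claimed systems each of these operations would instead act on shares held by three or more devices, so no single device sees the data, the probabilities, or the selected IDs.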

PCT Information

Filing Document	Filing Date	Country	Kind
PCT/JP2022/000519	1/11/2022	WO
