The present invention relates to a secret division device (secure grouping apparatus), a secret division method (secure grouping method), and a program.
As a method for obtaining a specific calculation result without restoring an encrypted numerical value, a method called secret calculation (secure calculation) (or hidden calculation) is known (for example, Non Patent Literature 1). In the method described in Non Patent Literature 1, encryption of distributing fragments of numerical values to three secret calculation devices is performed, and the three secret calculation devices perform cooperative calculation, and thus the results of addition/subtraction, constant addition, multiplication, constant multiplication, logical operation (NOT, logical product, logical sum, exclusive logical sum), data format conversion (integer and binary number), and the like can be obtained in a state of being distributed to the three secret calculation devices without restoring the numerical values.
Here, in a case where learning of a decision tree is performed by secret calculation, learning is generally performed while recursively dividing given training data (that is, learning is performed while recursively repeating further grouping of grouped training data).
For example, it is assumed that training data including n samples with an explanatory variable having m attributes (m≥1) is given as a vector z of an objective variable having a size n and m vectors wj (j∈[1, m]) of explanatory variables each having a size n. Furthermore, the vectors obtained by rearranging the elements of the vector z and the elements of the vectors wj, while maintaining the correspondence relationship between the objective variable and the explanatory variables, such that the samples of the training data are grouped and the samples of the same group are continuous, are defined as vectors y and xj, respectively. That is, there exists a certain permutation π of a size n such that y[π(i)]=z[i] and xj[π(i)]=wj[i] for all i∈[1, n] and all j∈[1, m], and if the i-th sample and the i′-th sample (where i<i′) after rearrangement are in the same group, the k-th sample is also in the same group for any k∈[i, i′]. Note that y[i], z[i], xj[i], and wj[i] represent the i-th elements of the vectors y, z, xj, and wj, respectively.
At this time, in the learning of the decision tree by secret calculation, calculation of the vector y′, the vector vj′, and the permutation σj′ is repeatedly performed using the vector y, the vector vj, the permutation σj, and the vector b as inputs. Here, vj is a vector in which the elements of xj are rearranged in ascending order in each group, σj is a permutation by which the elements of xj are rearranged into vj, and b is a vector representing a division result when each sample is divided into groups under a certain condition. In addition, y′ and xj′ are vectors obtained by rearranging the elements of z and wj while maintaining the correspondence relationship between the objective variable and each explanatory variable such that samples of the same group are adjacent under the new grouping obtained by further dividing each group according to b, vj′ is a vector obtained by rearranging the elements of xj′ in ascending order in each group, and σj′ is a permutation by which the elements of xj′ are rearranged into vj′.
In addition, when the vector y′, the vector vj′, and the permutation σj′ are calculated, a method is used in which the vector xj is calculated, the elements of the vector y and the elements of the vector xj are rearranged into a new group according to the vector b to obtain the vectors y′ and xj′, then the elements of the vector xj′ are sorted in each group to obtain the permutation σj′, and the elements of the vector xj′ are rearranged according to the permutation σj′ to obtain the vector vj′.
However, in the above-described conventional method, it is necessary to sort the elements of the vector xj′ when calculating the permutation σj′, and thus, there is a problem that the calculation cost becomes higher.
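The conventional flow described above can be sketched on plaintext values as follows. This is only an illustration under stated assumptions: the helper names `apply` and `stable_sort_perm`, the plaintext group-id vector `g`, and the sample values are all hypothetical and do not come from the source, and an actual system would run every step on hidden values. Here `perm[i]` is the new position of element `i` (0-indexed).

```python
def apply(perm, v):
    """Rearrange v so that out[perm[i]] = v[i]."""
    out = [None] * len(v)
    for i, p in enumerate(perm):
        out[p] = v[i]
    return out

def stable_sort_perm(keys):
    """Permutation that stably sorts keys in ascending order."""
    order = sorted(range(len(keys)), key=lambda i: keys[i])  # sorted() is stable
    perm = [0] * len(keys)
    for rank, i in enumerate(order):
        perm[i] = rank
    return perm

def conventional_regroup(y, x, g, b):
    """Regroup y and x by b, then sort x again inside every new group.

    g is a hypothetical plaintext vector of current group ids (non-decreasing).
    """
    pi = stable_sort_perm(b)
    y_new = apply(pi, y)
    x_new = apply(pi, x)
    g_new = apply(pi, list(zip(b, g)))  # (b, old group) labels each new group
    # The additional sort that the conventional method needs:
    sigma_new = stable_sort_perm(list(zip(g_new, x_new)))
    v_new = apply(sigma_new, x_new)
    return y_new, x_new, sigma_new, v_new

# Illustrative values only (not the example of the source).
y = [1, 0, 1, 1, 0, 0, 1, 0]
x = [5, 3, 8, 1, 4, 7, 2, 6]
g = [0, 0, 0, 0, 0, 1, 1, 1]
b = [0, 1, 0, 1, 1, 1, 0, 1]
y_new, x_new, sigma_new, v_new = conventional_regroup(y, x, g, b)
```

The second call to `stable_sort_perm` operates on the full-length vector, which is the sorting cost that the paragraph above identifies as the problem.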
An embodiment of the present invention has been made in view of the above points, and an object thereof is to reduce calculation costs in a case where data grouped by secret calculation is further grouped.
To achieve the above-described object, a secure grouping apparatus (secret division device) according to an embodiment is a secure grouping apparatus that is given training data including a plurality of samples having m explanatory variables as a hidden value of an objective variable vector z having a value of an objective variable of each sample as an element and an explanatory variable vector wj (j∈[1, m]) having a value of a j-th explanatory variable of each sample as an element, groups the samples included in the training data into groups, and executes grouping to arrange samples in the same group to be continuous in secure calculation, the secure grouping apparatus including an input unit that inputs a hidden value of an objective variable vector y of the grouped training data, a hidden value of a permutation σj for stably sorting a j-th explanatory variable vector of the grouped training data in a group, a hidden value of an explanatory variable vector vj obtained by stably sorting the j-th explanatory variable vector of the grouped training data in the group, and a hidden value of a grouping result vector b representing a grouping result when the training data is grouped into groups under a predetermined condition; and a secure grouping unit that calculates, in secure calculation, a hidden value of an objective variable vector y′ of the training data newly grouped according to a grouping result represented by the grouping result vector, a hidden value of a permutation σj′ for stably sorting the j-th explanatory variable vector of the training data newly grouped according to the grouping result in a group, and a hidden value of an explanatory variable vector vj′ obtained by stably sorting the j-th explanatory variable vector of the training data newly grouped according to the grouping result in a group.
It is possible to reduce calculation costs when the data grouped by secret calculation is further grouped.
Hereinafter, an embodiment of the present invention will be described. In the present embodiment, a description will be given about a secret division device 10 (secure grouping apparatus) capable of reducing calculation costs required when data grouped by secret calculation (secure calculation) is further grouped.
Hereinafter, notation, definitions, and the like used in the present embodiment will be described.
A value obtained by hiding a value a by encryption, secret sharing, or the like is referred to as a hidden value of a, and is denoted as [[a]]. In a case where a is hidden by secret sharing, the set of fragments of the secret sharing held by the respective secret calculation devices is referred to as [[a]].
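As a rough illustration of a hidden value held as fragments, the following sketch uses simple additive secret sharing among three parties. The modulus `MOD` and the function names are assumptions made for this example, and the scheme is not necessarily the one used in Non Patent Literature 1.

```python
import secrets

MOD = 2 ** 32  # assumed share modulus, chosen only for illustration

def share(a, parties=3):
    """Split a into additive fragments that sum to a modulo MOD."""
    fragments = [secrets.randbelow(MOD) for _ in range(parties - 1)]
    fragments.append((a - sum(fragments)) % MOD)
    return fragments

def reconstruct(fragments):
    """Restore the value from all fragments."""
    return sum(fragments) % MOD

def add_shared(fa, fb):
    """Each party adds its own fragments locally; no value is restored."""
    return [(p + q) % MOD for p, q in zip(fa, fb)]

a_fragments = share(20)
b_fragments = share(22)
total = reconstruct(add_shared(a_fragments, b_fragments))
```

In this sketch, the list returned by `share(a)` plays the role of [[a]], and addition is performed without any party learning the inputs.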
The i-th element of the vector v is referred to by v[i]. That is, when the vector v has a size n, v=(v[1], v[2], . . . , v[n]).
A permutation having a size n is a bijection of {1, 2, . . . , n}→{1, 2, . . . , n}, and when the permutation α of a size n satisfies α(i)=b_i for each i∈[1, n], the permutation is expressed as follows.

$$\alpha = \begin{pmatrix} 1 & 2 & \cdots & n \\ b_1 & b_2 & \cdots & b_n \end{pmatrix}$$
An operation of creating a vector v′ in which elements of the vector v having the size n are rearranged such that v′[α(i)]=v[i] by the permutation α of the size n is referred to as “application”, and is denoted by v′=αv. In addition, in the following description, as long as no confusion occurs, rearranging the elements of the vector v may be simply described as “rearranging the vectors v” or the like.
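The application v′=αv can be sketched on plaintext values as follows. The function name `apply` is an assumption of this example, and indices are 0-based here, whereas the text counts from 1.

```python
def apply(perm, v):
    """Return v' with v'[perm[i]] = v[i] (perm[i] is the new position of v[i])."""
    out = [None] * len(v)
    for i, p in enumerate(perm):
        out[p] = v[i]
    return out

alpha = [2, 0, 1]
v = ["a", "b", "c"]
v_prime = apply(alpha, v)  # "a" moves to position 2, "b" to 0, "c" to 1
```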
A permutation ρ of the size n that satisfies ρ(i)=π(σ(i)) for the permutations π and σ of the size n and each i∈[1, n] is called the synthesis of the permutation π and the permutation σ, and is expressed as ρ=π∘σ.
The inverse mapping of the permutation π is referred to as an inverse permutation and expressed as π−1.
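Synthesis and inversion can be sketched on plaintext permutations as follows. The function names are assumptions of this example, and indices are 0-based.

```python
def compose(pi, sigma):
    """Synthesis rho with rho[i] = pi[sigma[i]]."""
    return [pi[s] for s in sigma]

def inverse(pi):
    """Inverse permutation: inv[pi[i]] = i."""
    inv = [0] * len(pi)
    for i, p in enumerate(pi):
        inv[p] = i
    return inv

pi = [2, 0, 1]
pi_inv = inverse(pi)
identity = compose(pi, pi_inv)  # pi composed with its inverse
```

With this convention, applying the synthesis of π and σ to a vector gives the same result as applying σ first and then π.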
Examples of the input device 101 include a keyboard, a mouse, a touch panel, a physical button, and the like. The display device 102 is, for example, a display, a display panel, or the like. Note that the secret division device 10 may not include, for example, at least one of the input device 101 and the display device 102.
The external I/F 103 is an interface with an external device such as a recording medium 103a. The secret division device 10 can perform reading and writing in the recording medium 103a via the external I/F 103. Note that the recording medium 103a includes, for example, a compact disc (CD), a digital versatile disk (DVD), a secure digital memory card (SD memory card), a universal serial bus (USB) memory card, and the like.
The communication I/F 104 is an interface for connecting the secret division device 10 to a communication network. The processor 105 is, for example, any of various arithmetic devices such as a central processing unit (CPU) and a graphics processing unit (GPU). The memory device 106 is, for example, any of various storage devices such as a hard disk drive (HDD), a solid state drive (SSD), a random access memory (RAM), a read only memory (ROM), and a flash memory.
The secret division device 10 according to the present embodiment has the hardware configuration illustrated in
The input unit 201 inputs a hidden value of grouped data. The secret division unit 202 calculates a hidden value of data obtained by further grouping hidden values input by the input unit 201 in secret calculation. The output unit 203 outputs the hidden value calculated by the secret division unit 202. The storage unit 204 stores data such as a hidden value input by the input unit 201, a hidden value output by the output unit 203, and a hidden value of a calculation result (including an intermediate calculation result) of the secret division unit 202.
An example is described below. Hereinafter, although it is assumed that all the data including input, output, and intermediate calculation results with respect to the secret division device 10 are hidden values, the description of “the hidden value of” may be omitted in some cases. For example, the hidden value [[π]] of the permutation π may be expressed as “permutation [[π]]”.
In addition, the number of attributes of the explanatory variable is m≥1, and j∈[1, m]. Furthermore, some processes are defined below.
It is assumed that [[π]]←StableSort ([[x]]) represents a process of calculating the hidden value [[π]] of the permutation π that stably sorts the vector x from the hidden value [[x]] of the vector x. Note that it is assumed that sorting means rearrangement in ascending order.
It is assumed that [[y]]←Apply ([[π]], [[x]]) represents a process of calculating the hidden value [[y]] of the vector y=πx from the hidden value [[x]] of the vector x and the hidden value [[π]] of the permutation π.
It is assumed that [[σ]]←[[π]]−1 represents a process of calculating the hidden value [[σ]] of the permutation σ=π−1 from the hidden value [[π]] of the permutation π.
It is assumed that [[ρ]]←[[π]]∘[[σ]] represents a process in which the hidden value [[π]] of the permutation π and the hidden value [[σ]] of the permutation σ are used to calculate the hidden value [[ρ]] of the permutation ρ=π∘σ obtained by synthesizing the permutation π and the permutation σ.
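Among the processes defined above, StableSort is the one whose output convention is easiest to misread, so a plaintext sketch may help. The names `stable_sort_perm` and `apply` are assumptions of this example; the functions operate on plain lists rather than hidden values, and indices are 0-based.

```python
def apply(perm, v):
    """Return v' with v'[perm[i]] = v[i]."""
    out = [None] * len(v)
    for i, p in enumerate(perm):
        out[p] = v[i]
    return out

def stable_sort_perm(x):
    """Permutation pi such that apply(pi, x) is x in ascending order.

    Equal elements keep their original relative order (stability).
    """
    order = sorted(range(len(x)), key=lambda i: x[i])  # Python's sort is stable
    perm = [0] * len(x)
    for rank, i in enumerate(order):
        perm[i] = rank
    return perm

x = [3, 1, 3, 2]
pi = stable_sort_perm(x)
sorted_x = apply(pi, x)
```

Note that `pi[i]` is the destination of element `i`, which matches the definition of application, rather than the list of source indices that a typical argsort returns.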
In the present example, it is assumed that training data including n samples with an explanatory variable having m attributes (m≥1) is given to the secret division device 10 as a vector z of an objective variable having a size n and m vectors wj (j∈[1, m]) of explanatory variables each having a size n. Furthermore, the vectors obtained by rearranging the elements of the vector z and the elements of the vectors wj, while maintaining the correspondence relationship between the objective variable and the explanatory variables, such that the samples of the training data are grouped and the samples of the same group are continuous, are defined as vectors y and xj, respectively. That is, there exists a certain permutation π of a size n such that y[π(i)]=z[i] and xj[π(i)]=wj[i] for all i∈[1, n] and all j∈[1, m], and if the i-th sample and the i′-th sample (where i<i′) after rearrangement are in the same group, the k-th sample is also in the same group for any k∈[i, i′].
At this time, the secret division device 10 according to the present example receives the vector y, the vector vj, the permutation σj, and the vector b as inputs, calculates the vector y′, the vector vj′, and the permutation σj′ by secret calculation (that is, performs the calculation while hiding the values of the input and output and the intermediate calculation results), and thereby executes the secret division process of outputting the calculation results. Here, vj is a vector in which the elements of xj are rearranged in ascending order in each group, σj is a permutation by which the elements of xj are rearranged into vj, and b is a vector representing a division result when each sample is divided into groups under a certain condition. In addition, y′ and xj′ are vectors obtained by rearranging the elements of z and wj while maintaining the correspondence relationship between the objective variable and each explanatory variable such that samples of the same group are adjacent under the new grouping obtained by further dividing each group according to b, vj′ is a vector obtained by rearranging the elements of xj′ in ascending order in each group, and σj′ is a permutation by which the elements of xj′ are rearranged into vj′.
That is, the input and output of the secret division process executed by the secret division device 10 according to the present example are as follows.
Input: the vector [[y]] of the grouped objective variables, the permutation [[σj]] for stably sorting the grouped explanatory variables in the group, the vector [[vj]] obtained by stably sorting the grouped explanatory variables in the group, and the vector [[b]] representing a division result
Output: the vector [[y′]] of the objective variables newly grouped according to the division result, the permutation [[σj′]] for stably sorting the explanatory variables newly grouped according to the division result in the group, and the vector [[vj′]] obtained by stably sorting the explanatory variables newly grouped according to the division result in the group
By repeatedly performing the secret division process having the above input and output, the decision tree can be learned by secret calculation.
The secret division process according to the present example will be described below with reference to
Step S101: The input unit 201 inputs the vector [[y]] of the grouped objective variables, the permutation [[σj]] for stably sorting the grouped explanatory variables in the group, the vector [[vj]] obtained by stably sorting the grouped explanatory variables in the group, and the vector [[b]] representing a division result.
Step S102: The secret division unit 202 calculates the permutation [[π]] for stably sorting the vector [[b]] representing the division result by [[π]]←StableSort ([[b]]).
Step S103: The secret division unit 202 applies the permutation [[σj]] for stably sorting the grouped explanatory variables in the group to the vector [[b]] by [[bj]]←Apply ([[σj]], [[b]]) to calculate a vector [[bj]] in which the elements of the division result are arranged in the same order as the vector [[vj]] obtained by stably sorting the grouped explanatory variables in the group.
Step S104: The secret division unit 202 calculates the permutation [[ρj]] for stably sorting the vector [[bj]] by [[ρj]]←StableSort ([[bj]]).
Step S105: The secret division unit 202 applies the permutation [[ρj]] to the vector [[vj]] by [[vj′]]←Apply ([[ρj]], [[vj]]) to calculate the vector [[vj′]] of the explanatory variables obtained by stably sorting in the group after division.
Step S106: The secret division unit 202 calculates a permutation [[σj′]] for stably sorting the explanatory variables in the group after division by [[σj′]]←[[ρj]]∘[[σj]]∘[[π]]−1.
Step S107: The secret division unit 202 applies the permutation [[π]] to the vector [[y]] by [[y′]]←Apply ([[π]], [[y]]) to calculate the vector of the grouped objective variables after division (that is, the vector of the objective variables obtained by further grouping the grouped objective variables) [[y′]].
Step S108: The output unit 203 outputs the vector [[y′]] of the objective variables newly grouped according to the division result, the permutation [[σj′]] for stably sorting the explanatory variables newly grouped according to the division result in the group, and the vector [[vj′]] obtained by stably sorting the explanatory variables newly grouped according to the division result in the group to a predetermined output destination (e.g., the storage unit 204, etc.).
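Steps S101 to S108 can be traced on plaintext values with the following sketch. It is only an illustration under stated assumptions: all function names and the sample values are hypothetical, a single explanatory variable (m=1) is used, and a real implementation would carry out each step on hidden values. The composition used for Step S106 is taken here to be ρj∘σj∘π−1, which is an assumption derived from the definitions of application and synthesis.

```python
def apply(perm, v):
    """Return v' with v'[perm[i]] = v[i]."""
    out = [None] * len(v)
    for i, p in enumerate(perm):
        out[p] = v[i]
    return out

def compose(pi, sigma):
    """Synthesis rho with rho[i] = pi[sigma[i]]."""
    return [pi[s] for s in sigma]

def inverse(pi):
    """Inverse permutation: inv[pi[i]] = i."""
    inv = [0] * len(pi)
    for i, p in enumerate(pi):
        inv[p] = i
    return inv

def stable_sort_perm(keys):
    """Permutation that stably sorts keys in ascending order."""
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    perm = [0] * len(keys)
    for rank, i in enumerate(order):
        perm[i] = rank
    return perm

def divide(y, sigmas, vs, b):
    """Plaintext trace of the secret division process (S102 to S107)."""
    pi = stable_sort_perm(b)                       # S102
    pi_inv = inverse(pi)
    new_sigmas, new_vs = [], []
    for sigma_j, v_j in zip(sigmas, vs):
        b_j = apply(sigma_j, b)                    # S103: b in the order of v_j
        rho_j = stable_sort_perm(b_j)              # S104
        new_vs.append(apply(rho_j, v_j))           # S105
        # S106 (assumed composition): sigma_j' = rho_j o sigma_j o pi^-1
        new_sigmas.append(compose(rho_j, compose(sigma_j, pi_inv)))
    y_new = apply(pi, y)                           # S107
    return y_new, new_sigmas, new_vs

# Illustrative values only: two current groups of sizes 5 and 3.
x1 = [5, 3, 8, 1, 4, 7, 2, 6]        # grouped, unsorted explanatory variable
y = [1, 0, 1, 1, 0, 0, 1, 0]
sigma1 = [3, 1, 4, 0, 2, 7, 5, 6]    # sorts x1 inside each current group
v1 = apply(sigma1, x1)               # [1, 3, 4, 5, 8 | 2, 6, 7]
b = [0, 1, 0, 1, 1, 1, 0, 1]         # division result

y_new, (sigma1_new,), (v1_new,) = divide(y, [sigma1], [v1], b)
x1_new = apply(stable_sort_perm(b), x1)  # computed only to check the result
```

Only the 0/1 vector b and its rearranged copy are sorted in this trace; the values of the explanatory variable are never sorted again, yet `v1_new` is in ascending order inside each of the four new groups.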
As described above, the explanatory variables grouped by the secret calculation can be further grouped. Moreover, at this time, in the present example, the hidden values of y′, vj′, and σj′ can be calculated without sorting xj. This is based on the property that, in a case where new grouping is performed by stable sorting using the division result, the composition of each group does not change even if the order of the elements in each group has been changed beforehand. That is, by rearranging the explanatory variables already sorted in each group before division by stable sorting using the division result, it is ensured, due to the stability of the sorting, that the explanatory variables remain sorted in each new group after the division. Therefore, in the present example, when the explanatory variables grouped in secret calculation are further grouped, the calculation costs can be reduced as compared with the method of the related art.
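The permutation σj′ can be related to the quantities computed in the steps above by a short derivation. The following is a sketch that assumes xj denotes the grouped but unsorted explanatory variable vector, and that uses the identity α(βv)=(α∘β)v, which follows from the definitions of application and synthesis.

```latex
\begin{align*}
x_j' &= \pi x_j \quad\Longrightarrow\quad x_j = \pi^{-1} x_j', \\
v_j  &= \sigma_j x_j, \qquad v_j' = \rho_j v_j, \\
v_j' &= \rho_j\bigl(\sigma_j(\pi^{-1} x_j')\bigr)
      = (\rho_j \circ \sigma_j \circ \pi^{-1})\, x_j'.
\end{align*}
```

Under these assumptions, a permutation that rearranges xj′ into vj′ is ρj∘σj∘π−1, so it can be obtained by one inversion and two syntheses rather than by sorting xj′.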
Hereinafter, a specific division example will be described. Note that, in the following, a vertical bar “|” in each expression represents a boundary between groups.
Assuming that m=2 and n=8, vectors z and wj (j=1, 2) are as follows.
In addition, it is assumed that the second, third, fifth, seventh, and eighth samples and the first, fourth, and sixth samples are grouped and rearranged in this order to obtain the following vectors y, x1, and x2.
At this time, the permutation σ1, the vector v1, the permutation σ2, and the vector v2 are expressed as follows.
A vector b representing a division result when each sample is divided into groups under a certain predetermined condition is as follows.
Here, the vector b is assumed to be a vector indicating that, when the i-th sample and the i′-th sample are currently included in the same group and b[i]=b[i′], these samples are also included in the same group after division.
Assuming that a vector xj′ is obtained by rearranging the elements of the vector xj in the same arrangement as the vector y′, the following calculation results are obtained in the above-described secret division process.
In a case where the decision tree is learned by secret calculation, it is necessary to divide the grouped training data and create training data aligned by new grouping, and to this end, it is necessary in the related art to sort the explanatory variables in ascending order in each new group. In contrast, according to the present embodiment, it is not necessary to sort the explanatory variables in ascending order in each new group; it is sufficient to perform sorting of division results, application of a permutation, synthesis of permutations, and calculation of an inverse permutation, and as a result, the calculation costs are reduced.
Note that, although the secret division process in which data grouped by secret calculation are further grouped on the assumption that the decision tree is learned by the secret calculation has been described in the present embodiment, the secret division device 10 according to the present embodiment may execute a process of learning the decision tree by repeating the above-described secret division process. Furthermore, the secret division device 10 according to the present embodiment may make use of a learned decision tree to execute processes such as various data analysis, data classification, and device control based on the analysis or classification result (for example, control of stopping of the device based on an abnormality detection result, etc.).
The present invention is not limited to the foregoing specifically disclosed embodiment, and various modifications and changes, combinations with known technologies, and the like can be made without departing from the scope of the claims.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2021/038294 | 10/15/2021 | WO | |