The present invention relates to a technique for generating random numbers.
Patent Literature 1 is known as a noise addition method conventionally used in anonymization, which is one of the privacy protection techniques (for example, Non-patent literatures 1, 2, 3, 4 and 5).
A subject in the above patent literature is to generate random numbers according to an arbitrary probability distribution of an arbitrary dimension.
One of the methods for protecting the privacy of personal data in a database is anonymization.
In order to anonymize numerical attributes, an anonymization method of causing each attribute value to transition to another attribute value using a random number according to a probability distribution called Laplace distribution (Non-patent literatures 2, 3 and 4) may be used. As an example of numerical attribute personal data, an example of position information (latitude/longitude) about a person will be considered. Though there is almost no possibility that a person exists in an area of the sea other than routes, it may seem as if a person existed in the area due to anonymization.
In order to anonymize a discrete attribute called a category attribute, which is not a numerical attribute, an anonymization method of causing the discrete attribute value to transition to another attribute value with a probability ρ may be used (Non-patent literatures 1 and 5). As an example of the category attribute, purchase information about a person will be considered. Though there cannot be a person who purchases both a movie ticket at a high school student price and an alcoholic drink, anonymization may make it seem as if a person belonging to such a category area existed.
A reason why such state transition of a numerical attribute and a category attribute occurs is that a random number included in the impossible area is generated.
In order to prevent such impossible state transition, it is necessary to generate random numbers according to a multidimensional probability distribution of an arbitrary dimension such that a probability in an impossible area is zero. A method for generating such random numbers, however, has not been proposed yet.
The present invention is intended to provide a random number generation apparatus, method and program for generating random numbers according to a multidimensional probability distribution such that a probability in a predetermined area is zero.
A random number generation apparatus according to one aspect of the present invention comprises:
a first random number generating part generating a random number u=(u1, . . . ,uD)T∈[−∞,∞]D;
a second random number generating part generating a random number v∈[0,f′max]; and
a determining part determining whether f′(x1=u1, . . . ,xD=uD)≥v or not, and, if f′(x1=u1, . . . ,xD=uD)≥v, adopting u as a random number according to f′(x1, . . . ,xD), wherein
D is a predetermined positive integer, for i=1, . . . ,D, [hi] is a predetermined possible range for a random variable xi, a hole [h] is [h]=([h1], . . . ,[hD])T, H is a probability of a predetermined basic distribution function f(x1, . . . ,xD) in the hole [h], α=1/(1−H), a corrected distribution function f′(x1, . . . ,xD) is defined by Expressions (1) and (2):
[Formula 1]
$$H=\int_{x_1\in[h_1]}\cdots\int_{x_D\in[h_D]} f(x_1,\ldots,x_D)\,dx_1\cdots dx_D$$
f′(x1, . . . ,xD)=0 if x1∈[h1]∧ . . . ∧xD∈[hD] (1)
f′(x1, . . . ,xD)=α·f(x1, . . . ,xD) otherwise (2)
and f′max is a maximum value of f′(x1, . . . ,xD).
A random number generation apparatus according to another aspect of the present invention comprises:
an arithmetic operation part obtaining r=(r1, . . . ,rD)T in the end, by performing, for each of i=1, . . . ,D, a process of generating a random number ui∈[0,1], obtaining a random number ri=F−1(ui) according to f′(xi) using the generated ui, and setting xi=ri, wherein
D is a predetermined positive integer, for i=1, . . . ,D, [hi] is a predetermined possible range for a random variable xi, a hole [h] is [h]=([h1], . . . ,[hD])T, H is a probability of a predetermined basic distribution function f(x1, . . . ,xD) in the hole [h], α=1/(1−H), a corrected distribution function f′(x1, . . . ,xD) is defined by Expressions (1) and (2):
[Formula 2]
$$H=\int_{x_1\in[h_1]}\cdots\int_{x_D\in[h_D]} f(x_1,\ldots,x_D)\,dx_1\cdots dx_D$$
f′(x1, . . . ,xD)=0 if x1∈[h1]∧ . . . ∧xD∈[hD] (1)
f′(x1, . . . ,xD)=α·f(x1, . . . ,xD) otherwise (2)
f′(xi) is a marginal distribution of f′(x1, . . . ,xD) with respect to xi, F(ti) is a function defined by Expression (2′):
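[Formula 3]
$$f'(x_i)=\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} f'(x_1,\ldots,x_D)\,dx_1\cdots dx_{i-1}\,dx_{i+1}\cdots dx_D$$
$$F(t_i)=\int_{-\infty}^{t_i} f'(x_i)\,dx_i \qquad (2')$$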
and F−1(ti) is an inverse function of F(ti).
It is possible to generate random numbers according to a multidimensional distribution such that a probability in a predetermined area is zero.
Embodiments of the present invention will be described below with reference to drawings.
Hereinafter, a distribution function that generates random numbers included in an impossible area is called a basic distribution function f(x1, . . . ,xD), and a distribution function that does not generate random numbers included in an impossible area is called a corrected distribution function f′(x1, . . . ,xD), where D∈N+; in other words, D is a predetermined positive integer.
An impossible area of a random variable is called “a hole”, and the area of the hole is given as a parameter, for example, as shown below.
A hole [h]=([h1], . . . ,[hD])T is a hypercube in D-dimensional space, and each [hi] indicates a range [si,ei] for a random variable xi.
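As a minimal sketch of this parameterization, a hole can be held as a list of D closed ranges [si,ei] and tested coordinate by coordinate. The names `Interval` and `in_hole` are illustrative assumptions and do not appear in this specification.

```python
from typing import Sequence, Tuple

# One range [s_i, e_i] of the hole, with s_i <= e_i (hypothetical type alias).
Interval = Tuple[float, float]


def in_hole(x: Sequence[float], hole: Sequence[Interval]) -> bool:
    """Return True when every coordinate x_i lies in its range [s_i, e_i]."""
    if len(x) != len(hole):
        raise ValueError("x and hole must have the same dimension D")
    return all(s <= xi <= e for xi, (s, e) in zip(x, hole))
```

For D=2, `in_hole((1.5, 2.5), [(1.0, 2.0), (2.0, 3.0)])` is true, while a point with only one coordinate inside its range is outside the hole.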
The present invention is to generate a random number r∈RD according to an arbitrary D-dimensional probability distribution function f(x1, . . . ,xD). Intuitively, the process of the present invention is to express the function f(x1, . . . ,xD) as D one-dimensional probability distribution functions, sequentially generate a random number ri∈R D times, and obtain r=(r1, . . . ,rD)T, a set of ri, in the end.
For example, a random number generation apparatus of a first embodiment is provided with a first random number generating part 2, a second random number generating part 3 and a determining part 4 as shown in
For example, a random number generation method of the first embodiment is realized by each part of the random number generation apparatus performing processes from step S2 to step S4 illustrated in
A procedure of the first embodiment for generating a random number r∈RD when the basic distribution function f is given is shown below. This procedure uses the rejection method. Hereinafter, the maximum value of the corrected distribution function f′ is denoted by f′max.
The random number generation apparatus outputs a D-dimensional random number r=(r1, . . . ,rD)T with the basic distribution function f and the hole [h] as input.
A probability H of a basic distribution function f(x1, . . . , xD) in the hole [h] is defined as below.
[Formula 4]
$$H=\int_{x_1\in[h_1]}\cdots\int_{x_D\in[h_D]} f(x_1,\ldots,x_D)\,dx_1\cdots dx_D$$
When α=1/(1−H), a corrected distribution function f′(x1, . . . , xD) is defined by Expressions (1) and (2).
[Formula 5]
f′(x1, . . . ,xD)=0 if x1∈[h1]∧ . . . ∧xD∈[hD] (1)
f′(x1, . . . ,xD)=α·f(x1, . . . ,xD) otherwise (2)
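The calculation of H and f′ in advance can be sketched as follows for D=2. This is only an illustration under stated assumptions: H is approximated with a midpoint rule rather than in closed form, the basic distribution function is taken to be a standard bivariate normal density purely as an example, and the names `hole_probability`, `corrected_density` and `std_normal_2d` are hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple

Interval = Tuple[float, float]
Density = Callable[[float, float], float]


def hole_probability(f: Density, hole: Sequence[Interval], n: int = 200) -> float:
    """Approximate H, the integral of f over a 2-D hole, with the midpoint rule."""
    (s1, e1), (s2, e2) = hole
    d1, d2 = (e1 - s1) / n, (e2 - s2) / n
    total = 0.0
    for i in range(n):
        x1 = s1 + (i + 0.5) * d1
        for j in range(n):
            x2 = s2 + (j + 0.5) * d2
            total += f(x1, x2)
    return total * d1 * d2


def corrected_density(f: Density, hole: Sequence[Interval], H: float) -> Density:
    """Build f' from f: zero inside the hole, alpha * f elsewhere."""
    alpha = 1.0 / (1.0 - H)
    (s1, e1), (s2, e2) = hole

    def f_prime(x1: float, x2: float) -> float:
        if s1 <= x1 <= e1 and s2 <= x2 <= e2:
            return 0.0               # Expression (1)
        return alpha * f(x1, x2)     # Expression (2)

    return f_prime


def std_normal_2d(x1: float, x2: float) -> float:
    """Example basic distribution function (an assumption of this sketch)."""
    return math.exp(-0.5 * (x1 * x1 + x2 * x2)) / (2.0 * math.pi)


hole = [(0.0, 1.0), (0.0, 1.0)]
H = hole_probability(std_normal_2d, hole)
f_prime = corrected_density(std_normal_2d, hole, H)
```

Because f′ equals α·f outside the hole and α=1/(1−H), the probability mass removed from the hole is redistributed proportionally over the remaining area, so f′ still integrates to 1.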
For example, the random number generation apparatus is provided with a function generating part 1 that calculates the probability H and the corrected distribution function f′(x1, . . . ,xD), and the function generating part 1 calculates the probability H and the corrected distribution function f′(x1, . . . ,xD) in advance, before the subsequent processes are performed (step S1). The random number generation apparatus need not be provided with the function generating part 1. In this case, in the subsequent processes, the random number generation apparatus uses the probability H and the corrected distribution function f′(x1, . . . ,xD) calculated in advance by another apparatus.
The first random number generating part 2 generates a random number u=(u1, . . . ,uD)T∈[−∞,∞]D (step S2). The first random number generating part 2 generates, for example, a uniform random number u=(u1, . . . ,uD)T∈[−∞,∞]D as the random number u. The generated random number u is outputted to the determining part 4.
The second random number generating part 3 generates a random number v∈[0,f′max] (step S3). The second random number generating part 3 generates a uniform random number v∈[0,f′max] as the random number v. The generated random number v is outputted to the determining part 4.
The determining part 4 determines whether f′(x1=u1, . . . ,xD=uD)≥v or not. If f′(x1=u1, . . . ,xD=uD)≥v, the determining part 4 adopts the u as a random number according to f′(x1, . . . ,xD) and outputs the random number u (step S4).
If f′(x1=u1, . . . ,xD=uD)<v, the determining part 4 causes the first random number generating part 2 and the second random number generating part 3 to perform the processes of steps S2 and S3, respectively, again. In other words, the determining part 4 causes the first random number generating part 2 and the second random number generating part 3 to perform the processes of steps S2 and S3, respectively, until f′(x1=u1, . . . ,xD=uD)≥v.
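Steps S2 to S4 can be sketched as the loop below. It is a sketch under assumptions, not the apparatus itself: a uniform candidate cannot be drawn from an unbounded range on a computer, so the candidate u is drawn from a bounded box `BOX` that is assumed to contain practically all of the probability mass. The example density (a product of two unit Laplace densities with the hole [1,2]×[1,2] removed) and all names are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Interval = Tuple[float, float]


def rejection_sample(
    f_prime: Callable[..., float],
    f_prime_max: float,
    box: Sequence[Interval],
    rng: random.Random,
    max_tries: int = 1_000_000,
) -> List[float]:
    """Draw one point distributed according to f_prime restricted to box."""
    for _ in range(max_tries):
        u = [rng.uniform(lo, hi) for lo, hi in box]  # step S2: candidate point u
        v = rng.uniform(0.0, f_prime_max)            # step S3: height v
        # Step S4: adopt u when f'(u) >= v. Inside the hole f'(u) is 0, so u is
        # adopted there only if v is exactly 0.0, which is a measure-zero event.
        if f_prime(*u) >= v:
            return u
        # Otherwise steps S2 and S3 are repeated.
    raise RuntimeError("no candidate accepted within max_tries")


# Assumed example: unit Laplace product density with the hole [1,2] x [1,2].
_P = 0.5 * (math.exp(-1.0) - math.exp(-2.0))  # P(1 <= x <= 2) for one coordinate
ALPHA = 1.0 / (1.0 - _P * _P)                 # alpha = 1 / (1 - H)


def f_prime(x1: float, x2: float) -> float:
    if 1.0 <= x1 <= 2.0 and 1.0 <= x2 <= 2.0:
        return 0.0
    return ALPHA * 0.25 * math.exp(-abs(x1) - abs(x2))


F_PRIME_MAX = ALPHA * 0.25            # attained at (0, 0), which is outside the hole
BOX = [(-6.0, 6.0), (-6.0, 6.0)]      # truncation box (assumption of this sketch)

rng = random.Random(0)
samples = [rejection_sample(f_prime, F_PRIME_MAX, BOX, rng) for _ in range(200)]
```

The acceptance rate is roughly the probability mass in the box divided by the product of the box volume and f′max, so a wide box or a sharply peaked density makes the loop repeat many times before a candidate is adopted.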
By the above processes, a multidimensional random number according to a multidimensional probability distribution such that a probability in the hole [h], which is a predetermined area, is zero can be generated. If an expression of a multidimensional probability distribution can be obtained, it is possible to generate random numbers not only from a multidimensional probability distribution such that a probability in an impossible area is zero but from an arbitrary probability distribution.
For example, by using this random number generation method, personal data that is impossible in a real society is not generated when anonymization of a database that includes personal data having a plurality of attributes is performed.
Attributes in a database to be anonymized are not necessarily independent but may be dependent on one another. In Non-patent literature 1, anonymization is applicable only when attributes in a database are independent and cannot be trivially extended to a multidimensional probability distribution such that there are attributes dependent on one another.
A procedure for generating a random number r∈RD that runs faster than the procedure of the first embodiment is shown below. This procedure uses the inverse function method.
For example, a random number generation apparatus of the second embodiment is provided with an arithmetic operation part 5 as shown in
For example, a random number generation method of the second embodiment is realized by each part of the random number generation apparatus performing a process of step S5 illustrated in
The random number generation apparatus outputs a D-dimensional random number r=(r1, . . . ,rD)T with the basic distribution function f and the hole [h] as input.
A probability H of a basic distribution function f(x1, . . . , xD) in the hole [h] is defined as below.
[Formula 6]
$$H=\int_{x_1\in[h_1]}\cdots\int_{x_D\in[h_D]} f(x_1,\ldots,x_D)\,dx_1\cdots dx_D$$
When α=1/(1−H), a corrected distribution function f′(x1, . . . , xD) is defined by Expressions (1) and (2).
[Formula 7]
f′(x1, . . . ,xD)=0 if x1∈[h1]∧ . . . ∧xD∈[hD] (1)
f′(x1, . . . ,xD)=α·f(x1, . . . ,xD) otherwise (2)
For example, the random number generation apparatus is provided with a function generating part 1 that calculates the probability H and the corrected distribution function f′(x1, . . . ,xD), and the function generating part 1 calculates the probability H and the corrected distribution function f′(x1, . . . ,xD) in advance, before the subsequent processes are performed (step S1). The random number generation apparatus need not be provided with the function generating part 1. In this case, in the subsequent processes, the random number generation apparatus uses the probability H and the corrected distribution function f′(x1, . . . ,xD) calculated in advance by another apparatus.
By performing, for each of i=1, . . . ,D, a process of generating a random number ui∈[0,1], obtaining a random number ri=F−1(ui) according to f′(xi) using the generated ui, and setting xi=ri, the arithmetic operation part 5 obtains r=(r1, . . . ,rD)T in the end (step S5), wherein f′(xi) is a marginal distribution of f′(x1, . . . ,xD) with respect to xi, F(ti) is a function defined by Expression (2′):
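[Formula 8]
$$f'(x_i)=\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} f'(x_1,\ldots,x_D)\,dx_1\cdots dx_{i-1}\,dx_{i+1}\cdots dx_D$$
$$F(t_i)=\int_{-\infty}^{t_i} f'(x_i)\,dx_i \qquad (2')$$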
and F−1(ti) is an inverse function of F(ti).
The arithmetic operation part 5 may perform processes from step S50 to step S57 below to perform step S5.
For example, first, for i=1 (step S50), the arithmetic operation part 5 derives a marginal distribution f′(x1) of f′(x1,x2, . . . ,xD) with respect to x1 (step S51).
[Formula 9]
$$f'(x_i)=\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} f'(x_1,\ldots,x_D)\,dx_1\cdots dx_{i-1}\,dx_{i+1}\cdots dx_D$$
Then, the arithmetic operation part 5 derives a cumulative distribution function F(t1) (step S52).
[Formula 10]
$$F(t_i)=\int_{-\infty}^{t_i} f'(x_i)\,dx_i$$
Then, the arithmetic operation part 5 derives F−1(t1) which is an inverse function of F(t1) (step S53).
Then, the arithmetic operation part 5 generates a random number u1∈[0,1] (step S54). The arithmetic operation part 5 generates, for example, a uniform random number u1∈[0,1] as the random number u1.
Then, the arithmetic operation part 5 obtains a random number r1=F−1(u1) according to f′(x1) using the generated u1 (step S55). That is, the arithmetic operation part 5 calculates the output value obtained when the generated u1 is input to F−1(t1) and sets the output value as r1.
Then, the arithmetic operation part 5 substitutes r1 for x1 of f′(x1, . . . ,xD) to obtain f′(x2, . . . ,xD|x1=r1) (step S56).
Then, for integers ∀i∈[2,D], the arithmetic operation part 5 performs an operation similar to steps S51 to S56 described above for f′(xi, . . . ,xD|x1=r1, . . . ,xi−1=ri−1) to obtain r=(r1, . . . ,rD)T (step S57).
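The flow of steps S50 to S57 can be sketched numerically for D=2 as below. This is an approximation under assumptions, not the analytic derivation of the embodiment: the marginal and conditional distributions are discretised on a grid, the cumulative distribution is a running sum, and its inverse is found by binary search. The example density and the names `draw_from_grid` and `make_sampler` are hypothetical.

```python
import bisect
import math
import random
from itertools import accumulate
from typing import Callable, List, Tuple


def draw_from_grid(weights: List[float], grid: List[float], u: float) -> float:
    """Invert a discretised CDF built from non-negative weights at u in [0, 1)."""
    cdf = list(accumulate(weights))             # S52: running sum, unnormalised F
    k = bisect.bisect_right(cdf, u * cdf[-1])   # S53, S55: first k with F_k > u
    return grid[min(k, len(grid) - 1)]          # zero-weight points are never hit


def make_sampler(
    f_prime: Callable[[float, float], float],
    grid1: List[float],
    grid2: List[float],
) -> Callable[[random.Random], Tuple[float, float]]:
    # S51: marginal of x1, with x2 summed out on the grid. The constant cell
    # width cancels when the running sum is normalised by its last value.
    marginal = [sum(f_prime(x1, x2) for x2 in grid2) for x1 in grid1]

    def sample(rng: random.Random) -> Tuple[float, float]:
        r1 = draw_from_grid(marginal, grid1, rng.random())      # S54, S55
        conditional = [f_prime(r1, x2) for x2 in grid2]         # S56: fix x1 = r1
        r2 = draw_from_grid(conditional, grid2, rng.random())   # S57: repeat for x2
        return r1, r2

    return sample


# Assumed example: unit Laplace product density with the hole [1,2] x [1,2].
# The normalising constant alpha is omitted because it cancels in the CDFs.
def f_prime(x1: float, x2: float) -> float:
    if 1.0 <= x1 <= 2.0 and 1.0 <= x2 <= 2.0:
        return 0.0
    return 0.25 * math.exp(-abs(x1) - abs(x2))


grid = [-6.0 + 0.1 * k for k in range(121)]
sampler = make_sampler(f_prime, grid, grid)
rng = random.Random(0)
samples = [sampler(rng) for _ in range(200)]
```

Unlike the rejection method, every pair of uniform random numbers yields one output, which is one way to see why this procedure can be faster; the cost moves into deriving the marginal, cumulative and inverse functions.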
As described above, a multidimensional probability distribution may be converted to one-dimensional marginal distributions to sequentially generate random numbers using an inverse function method.
As in the first embodiment, the second embodiment makes it possible to generate random numbers according to a multidimensional probability distribution such that a probability in the hole [h], which is a predetermined area, is zero, although the procedure for generating random numbers differs from that of the first embodiment.
An example of the corrected distribution function f′ in the case of D=2 will be described below.
Here, N is a predetermined positive integer, and there are N predetermined possible ranges [hi] for the random variable xi. The N predetermined possible ranges are $\{[h]_i\}_{i=1}^{N}=\{([h_1]_i,[h_2]_i)^T\}_{i=1}^{N}=\{([(s_1)_i,(e_1)_i],[(s_2)_i,(e_2)_i])^T\}_{i=1}^{N}$. That is, the number of holes is N.
Further, $(\mu_{x_1},2\sigma_{x_1}^2)$ and $(\mu_{x_2},2\sigma_{x_2}^2)$ are predetermined parameters, and the basic distribution function f(x1,x2) is Laplace distribution Lap(x1,x2) defined by Expression (3). The Laplace distribution Lap(x1,x2) is shown in
When sign(a) is a function that outputs the sign {+,−} of an input a, a function g(s,e) is defined by Expressions (5) and (6) for s≤e, and H is defined by Expression (4), the corrected distribution function f′ is Lap′(x1,x2) defined by Expressions (7) and (8).
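For illustration only, H and Lap′ for D=2 can be computed in closed form under one assumption that this text does not confirm: that Lap(x1,x2) is the product of two independent one-dimensional Laplace densities with locations μx1, μx2 and scales σx1, σx2 (so that each variance is 2σ²), and that the N holes do not overlap. The function names below are hypothetical and are not claimed to correspond to g(s,e) or to Expressions (3) to (8).

```python
import math
from typing import Callable, Sequence, Tuple

Interval = Tuple[float, float]
Hole = Tuple[Interval, Interval]


def laplace_pdf(x: float, mu: float, sigma: float) -> float:
    return math.exp(-abs(x - mu) / sigma) / (2.0 * sigma)


def laplace_cdf(t: float, mu: float, sigma: float) -> float:
    z = (t - mu) / sigma
    return 0.5 * math.exp(z) if z < 0.0 else 1.0 - 0.5 * math.exp(-z)


def interval_prob(s: float, e: float, mu: float, sigma: float) -> float:
    """Probability that a 1-D Laplace variable falls in [s, e], with s <= e."""
    return laplace_cdf(e, mu, sigma) - laplace_cdf(s, mu, sigma)


def hole_probability(holes: Sequence[Hole], mu1, sigma1, mu2, sigma2) -> float:
    """H for N non-overlapping rectangular holes under the product assumption."""
    return sum(
        interval_prob(s1, e1, mu1, sigma1) * interval_prob(s2, e2, mu2, sigma2)
        for (s1, e1), (s2, e2) in holes
    )


def make_lap_corrected(holes, mu1, sigma1, mu2, sigma2) -> Callable[[float, float], float]:
    alpha = 1.0 / (1.0 - hole_probability(holes, mu1, sigma1, mu2, sigma2))

    def lap_corrected(x1: float, x2: float) -> float:
        for (s1, e1), (s2, e2) in holes:
            if s1 <= x1 <= e1 and s2 <= x2 <= e2:
                return 0.0                                   # inside a hole
        return alpha * laplace_pdf(x1, mu1, sigma1) * laplace_pdf(x2, mu2, sigma2)

    return lap_corrected


holes = [((1.0, 2.0), (1.0, 2.0))]
H = hole_probability(holes, 0.0, 1.0, 0.0, 1.0)
lap_corrected = make_lap_corrected(holes, 0.0, 1.0, 0.0, 1.0)
```

If the actual Expression (3) couples x1 and x2, the product in `hole_probability` no longer applies and H has to be obtained from the joint density instead.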
Here, an example of the processes from step S50 to step S56 of the second embodiment will be described.
The corrected distribution function f′ is Lap′(x1,x2) with one hole [h] that is shown in
Here, a first area S1 is (x1≥μx1)∧(x2≥μx2), a second area S2 is (x1≤μx1)∧(x2≥μx2), a third area S3 is (x1≥μx1)∧(x2≤μx2) and a fourth area S4 is (x1≤μx1)∧(x2≤μx2).
First, a peripheral distribution of x1 will be considered in order to obtain a random number r1.
1. If the hole [h] is in the first area S1 or the second area S2, the corrected distribution function f′ is Lap′(x1) defined by Expressions (9) and (10).
2. If the hole [h] is in the third area S3 or the fourth area S4, the corrected distribution function f′ is Lap′(x1) defined by Expressions (11) and (12).
If a hole extends across areas, cases 1 and 2 above can be combined. Next, in order to generate a random number r1, inverse functions shown by Expressions (13), (14), (15) and (16) are derived from the cumulative distribution function:
[Formula 17]
$$F(t_1)=\mathrm{Lap}'(x_1\le t_1)=\int_{-\infty}^{t_1}\mathrm{Lap}'(x_1)\,dx_1$$
Since inverse functions can be derived in other areas by using a similar procedure, description will be made here on only a case where there is a hole in the first area.
Hereinafter,
i) When −∞≤t1≤μx1:
ii) When μx1≤t1≤a:
iii) When a≤t1≤b:
iv) When b≤t1:
By substituting u1∈[0,1] according to a uniform distribution U[0,1] for F(t1) of Expressions (13), (14), (15) and (16), the random number r1=t1 according to Expressions (9) and (10) is generated. A probability density function Lap′(x2|x1=r1) in the case of x1=r1 is as follows:
Here, γ is defined by the following expression:
At this time, in order to generate a random number r2, inverse functions shown in Expressions (19), (20) and (21) are derived from the cumulative distribution function:
[Formula 25]
$$F(t_2)=\mathrm{Lap}'(x_2\le t_2)=\int_{-\infty}^{t_2}\mathrm{Lap}'(x_2\mid x_1=r_1)\,dx_2$$
i) When −∞≤t2≤μx2:
ii) When μx2≤t2≤c:
iii) When c≤t2≤d, F(t2) takes a constant value regardless of t2. Therefore, an inverse function does not exist in this range. That is, random numbers within this range are not generated.
iv) When d≤t2:
By substituting u2∈[0,1] according to a uniform distribution U[0,1] for F(t2) of Expressions (19), (20) and (21), the random number r2=t2 according to Expressions (17) and (18) is generated. In this way, r=(r1,r2)T is generated.
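The effect of case iii) can be illustrated with a one-dimensional sketch. It assumes that c and d are the ends of the excluded x2 range, that the slice x1=r1 passes through the hole, and that the conditional density is a one-dimensional Laplace density set to zero on [c,d] and renormalised. The normalising factor here is written out directly and is not claimed to equal γ; all function names are hypothetical.

```python
import math


def laplace_cdf(t: float, mu: float, sigma: float) -> float:
    z = (t - mu) / sigma
    return 0.5 * math.exp(z) if z < 0.0 else 1.0 - 0.5 * math.exp(-z)


def laplace_inv_cdf(p: float, mu: float, sigma: float) -> float:
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if p < 0.5:
        return mu + sigma * math.log(2.0 * p)
    return mu - sigma * math.log(2.0 * (1.0 - p))


def cdf_with_gap(t: float, c: float, d: float, mu: float, sigma: float) -> float:
    """CDF of a Laplace density zeroed on [c, d] and renormalised."""
    f_c, f_d = laplace_cdf(c, mu, sigma), laplace_cdf(d, mu, sigma)
    gap = f_d - f_c
    if t <= c:
        mass = laplace_cdf(t, mu, sigma)
    elif t < d:
        mass = f_c                                  # constant on the gap (case iii)
    else:
        mass = laplace_cdf(t, mu, sigma) - gap
    return mass / (1.0 - gap)


def inv_cdf_with_gap(u: float, c: float, d: float, mu: float, sigma: float) -> float:
    """Inverse of cdf_with_gap for u strictly between 0 and 1."""
    f_c, f_d = laplace_cdf(c, mu, sigma), laplace_cdf(d, mu, sigma)
    gap = f_d - f_c                   # probability mass removed from [c, d]
    p = u * (1.0 - gap)               # undo the renormalisation
    if p <= f_c:
        return laplace_inv_cdf(p, mu, sigma)        # lands at or below c
    return laplace_inv_cdf(p + gap, mu, sigma)      # jumps over the gap, above d
```

Because the cumulative distribution is flat on [c,d], no value of u maps into that range: the inverse jumps directly from c to d, which is how the inverse function method keeps the probability in the hole at zero.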
[Program and Recording Medium]
When each process of the random number generation apparatus is realized by a computer, the processing content of the functions that the random number generation apparatus should have is described by a program. By executing the program on the computer, the processes of the random number generation apparatus are realized on the computer.
The program in which the processing content is described can be recorded on a computer-readable recording medium. Any computer-readable recording medium may be used, for example, a magnetic recording device, an optical disk, a magneto-optical recording medium or a semiconductor memory.
Each processing part may be configured by causing a predetermined program to be executed on the computer, or at least a part of processing content of the processing part may be realized as hardware.
[Modification]
It goes without saying that the present invention can be appropriately modified within a range not departing from the spirit of the invention.
Number | Date | Country | Kind
---|---|---|---
2017-007827 | Jan 2017 | JP | national

Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/JP2018/001472 | 1/12/2018 | WO | 00