The present invention relates to an anonymization technology.
Statistical data derived from data that includes personal information, such as age, gender or address, is widely utilized. Technologies are known for anonymizing such data by abstraction so that individuals cannot be identified from the data when it is disclosed. Anonymization is a technology which processes data so that no record in a set of personal information can be traced back to a specific individual. One index of anonymization is “k-anonymity”. k-anonymity guarantees that each individual's data cannot be narrowed down to fewer than k records. A group of attributes which can identify an individual when taken in combination is called a “quasi-identifier”. Basically, k-anonymity guarantees anonymity by generalizing the attribute values of these quasi-identifiers so that the number of records sharing the same quasi-identifier values is larger than or equal to k.
For example, patent literature 1 discloses an information processing device which can judge the anonymity of the items as a whole, based on a comparison between a minimum value and a threshold value, when collected data is grouped for each item.
In the information processing device disclosed in patent literature 1, an anonymization item storage unit stores anonymization classifiers for each item.
An anonymization processing unit applies the anonymization classifier designated for each item to data recorded in a first database. The anonymization processing unit then groups the data on the basis of the anonymization classifiers, calculates the minimum number of data items after grouping for each item, and performs anonymization on the basis of the result of the calculation. Finally, the anonymization processing unit records the result of the anonymization process in a second database.
An anonymization judgment unit judges whether or not there is an item for which the number of data items is less than a predetermined threshold value in the result of the anonymization process recorded in the second database.
[Patent Literature 1] Japanese Patent Application Laid-Open No. 2010-086179
However, with the technology disclosed in patent literature 1, there is a possibility that personal information provided by another provision source can be identified by comparing the data held by one provision source with the anonymized data. That is, the technology disclosed in patent literature 1 has a problem in that anonymity is not always preserved.
The reason is as follows. A provision source of data can identify the data it provided within the anonymized data. Accordingly, by removing its own identified records, a provision source can lower the anonymity of the data of other provision sources below a predetermined index.
One object of the present invention is to provide an anonymization device and an anonymization method which can preserve the anonymity of data against any one of the provision sources which provide the data.
To achieve the above-mentioned object, an anonymization device according to the present invention includes: a judgment unit for judging, for data combined from records acquired from plural providers, whether or not the anonymity of the data is preserved against any one of the providers which provide a record that is a part of the data; and an anonymization unit for anonymizing the data on the basis of the judgment result of said judgment unit.
To achieve the above-mentioned object, an anonymization method according to the present invention includes: judging, for data combined from records acquired from plural providers, whether or not the anonymity of the data is preserved against any one of the providers which provide a record that is a part of the data; and anonymizing the data on the basis of the judgment result.
To achieve the above-mentioned object, a program according to the present invention causes a computer to execute: a process of judging, for data combined from records acquired from plural providers, whether or not the anonymity of the data is preserved against any one of the providers which provide a record that is a part of the data; and a process of anonymizing the data on the basis of the judgment result.
As an example of the effects of the present invention, the anonymity of data can be preserved against any one of the provision sources which provide the data.
First, in order to facilitate understanding of the exemplary embodiments of the present invention, the background of the present invention will be described.
As shown in the figure, data targeted for an anonymization process generally includes an ID (identification) which identifies a user, sensitive information and a quasi-identifier.
The sensitive information is information which should not become known to others in a state where it is associated with an individual.
The quasi-identifier is information which cannot identify an individual on its own, but which may enable identification of an individual when combined with other information.
From the viewpoint of preventing identification of an individual, it is preferable that the values of the quasi-identifier are abstracted in a unified manner across all records. On the other hand, from the viewpoint of using the combination data, it is preferable that each value of the quasi-identifier remains concrete.
The anonymization process is a process of reconciling the purpose of “preventing identification of an individual” with the purpose of “using the combination data”. Anonymization processes include a top-down process and a bottom-up process. Here, the top-down anonymization process is “a process of dividing data”, and the bottom-up anonymization process is “a process of integrating data”.
Hereinafter, the background will be described more concretely.
Provider Z collects the personal information held respectively by two different hospitals, Hospital X and Hospital Y, and combines both sets of data while preserving anonymity.
Here, as an example for description, it is supposed that the personal information held by Hospital X and Hospital Y includes “No.”, “age” and “disease code”.
The “No.” corresponds to an ID of each user.
Then, it is supposed that the “disease code”, which identifies an individual's disease, is the sensitive information. The sensitive information is information which should not be changed by the abstraction process, because it is used in analyses of the disclosed data.
The abstraction process is a process of converting an attribute or an attribute value of data into an attribute or attribute value with a wider range. Here, an attribute is, for example, a classification such as age, gender or address, and an attribute value is the concrete content or value of an attribute. When the abstraction target is a concrete value, one example of the abstraction process is a process which converts the value into numerical-range data (ambiguous data) that includes the value.
It is supposed that personal information other than the sensitive information is a quasi-identifier. Here, the “age” is a quasi-identifier.
An anonymization technology in relation to the present invention judges whether or not anonymity is preserved on the basis of whether or not a predetermined index of k-anonymity is satisfied. k-anonymity is an index which requires that there be k or more records whose quasi-identifiers have the same values. In the description below, it is supposed that 2-anonymity is required, and that the anonymization process uses the bottom-up process.
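For reference, the following is a minimal Python sketch of a k-anonymity check under this definition. The record layout, the function name and the sample values are assumptions made for illustration only and are not part of the technology in relation to the present invention.

    from collections import defaultdict

    def satisfies_k_anonymity(records, qi_key, k):
        # Group the records by the value of the quasi-identifier (e.g. "age").
        groups = defaultdict(list)
        for record in records:
            groups[record[qi_key]].append(record)
        # k-anonymity requires every group to contain k or more records.
        return all(len(group) >= k for group in groups.values())

    # Example: 2-anonymity with "age" as the quasi-identifier.
    records = [
        {"No.": 1, "age": 20, "disease code": "A"},
        {"No.": 2, "age": 20, "disease code": "B"},
        {"No.": 3, "age": 24, "disease code": "C"},
    ]
    print(satisfies_k_anonymity(records, "age", 2))  # False: only one record has age 24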
Next, anonymization based on an anonymization technology in relation to the present invention will be described.
The anonymization technology in relation to the present invention divides the combination data shown in the figure into groups and anonymizes it.
However, there is a case where a provision source of the data can identify personal information existing at another provision source by comparing the data it holds with the anonymized data. That is, it cannot necessarily be said that the anonymity of the data shown in the figure is preserved.
A reason for this is as follows.
A provider (Hospital X or Hospital Y) which provides data can identify the data it provided within the anonymized data. For this reason, a provision source of data can lower the anonymity of the data below a predetermined index.
A more concrete description is as follows.
For example, Hospital X compares the anonymized data shown in the figure with the data which Hospital X itself holds.
In this way, the anonymization technology in relation to the present invention has a problem in that it does not always satisfy the anonymization index.
A first exemplary embodiment of the present invention which will be described below solves the above-mentioned problem.
The first exemplary embodiment of the present invention will be described with reference to drawings.
First, a functional configuration of an anonymization device 10 according to the first exemplary embodiment of the present invention will be described with reference to the drawings.
As shown in the figure, the anonymization device 10 includes a judgment unit 11, an anonymization unit 12 and a storage unit 13.
In addition, the description of this exemplary embodiment uses the example having been shown in the figure referred to above.
The anonymization process performed by the anonymization unit 12 of the anonymization device 10 may be an existing method, and may be either a top-down process or a bottom-up process. Therefore, in the following description of this exemplary embodiment, it is assumed as an example that the anonymization unit 12 performs a bottom-up anonymization process.
The anonymization device 10 stores combination data in the storage unit 13 in advance. The combination data is data which combines the data that the anonymization device 10 acquires from plural provision sources. The combination data is a set of records in which user attribute information, which is attribute information relating to a user, is associated with provision source information, which is information indicating the provision source of the user attribute information. An example of the combination data is shown in the figure.
For example, the anonymization device 10 receives an instruction from a user of the anonymization device 10 and starts anonymizing the combination data. Alternatively, the anonymization device 10 may be configured such that the user instructs the judgment unit 11 of the anonymization device 10 to start the anonymization process.
The judgment unit 11 acquires the combination data from the storage unit 13 when receiving the start instruction from the user.
The judgment unit 11 judges, for the combination data acquired from the storage unit 13, whether or not the anonymity of the data is preserved against any one of the provision sources of the data. In this description, “any one of provision sources” indicates Hospital X and Hospital Y. Concretely, therefore, the judgment unit 11 judges whether or not anonymity is preserved even when Hospital X or Hospital Y compares the data it holds with the combination data. In addition, as will be described below, the judgment unit 11 also judges whether or not the anonymity of the data outputted from the anonymization unit 12 is preserved when viewed from any one of the provision sources of the data.
When the judgment unit 11 judges that there exists a group in which anonymity is not preserved (for example, k-anonymity is not satisfied), it outputs the combination data to the anonymization unit 12.
When the anonymization unit 12 receives the combination data from the judgment unit 11, it anonymizes the groups in the received combination data in which the anonymity is not preserved. Because the anonymization process of this exemplary embodiment is a bottom-up process, the anonymization unit 12 integrates the groups in the combination data in which the anonymity is not preserved.
When there is still a group in which the anonymity is not preserved in the combination data which the anonymization unit 12 anonymized, the judgment unit 11 outputs the combination data to the anonymization unit 12 again, and the anonymization unit 12 receives it and anonymizes it. That is, the judgment unit 11 and the anonymization unit 12 repeat the anonymization process of the anonymization unit 12 until the judgment unit 11 judges that there is no group in which the anonymity is not preserved.
When the judgment unit 11 judges that the anonymity of all groups of the combination data is preserved, it outputs the anonymized combination data to the outside. The outside is, for example, Provider V shown in the figure.
Next, operation of the anonymization device 10 according to the first exemplary embodiment will be described with reference to the drawings.
As shown in the figure, the judgment unit 11 first acquires the combination data from the storage unit 13 (step S1).
The judgment unit 11 divides the acquired combination data into plural groups such that records having the same quasi-identifier value are gathered into one group (step S2).
The judgment unit 11 judges, for the combination data acquired from the storage unit 13, whether or not the anonymity of the data is preserved against any one of the provision sources of the data (for example, “Hospital X” and “Hospital Y”) (step S3).
More concretely, the judgment unit 11 judges as follows.
The judgment unit 11 selects one group from the groups whose quasi-identifier values (for example, “age”) are the same, and supposes a group from which the records including one type of provision source information (for example, “Hospital X”) have been removed. Then, the judgment unit 11 judges whether or not the number of records included in that group is larger than or equal to a threshold value (for example, two) which is the index of anonymity (for example, “2-anonymity”).
The judgment unit 11 performs similar judgments for all groups.
Moreover, the judgment unit 11 performs similar judgments for all types of provision source information (for example, “Hospital X” and “Hospital Y”).
Then, the judgment unit 11 judges whether or not the anonymity of the combination data is preserved on the basis of all judgments.
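A minimal Python sketch of this judgment is shown below. The record layout, key names and function name are hypothetical assumptions for illustration; the sketch merely encodes the steps just described: for each group and each type of provision source information, the records of that provision source are removed and the remaining count is compared with the threshold k.

    from collections import defaultdict

    def anonymity_preserved_against_providers(records, qi_key, source_key, k):
        # Divide the combination data into groups of equal quasi-identifier
        # values (step S2), e.g. qi_key="age", source_key="source".
        groups = defaultdict(list)
        for record in records:
            groups[record[qi_key]].append(record)
        sources = {record[source_key] for record in records}
        for group in groups.values():
            for source in sources:
                # Suppose this provision source removes its own records:
                # the remaining records must still number k or more.
                remaining = [r for r in group if r[source_key] != source]
                if len(remaining) < k:
                    return False
        return True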
Detailed description of the judgment process of the judgment unit 11 will be made below.
The judgment unit 11 selects a next process on the basis of the judgment in step S3 (step S4).
When the number of records is larger than or equal to the threshold value which is the index of anonymity for all groups (all groups preserve the anonymity) (Yes in step S4), the judgment unit 11 outputs the combination data which was the target of the judgment process as the anonymized combination data.
On the other hand, when there is a group for which the number of records is not larger than or equal to the threshold value (there is a group which does not preserve the anonymity) (No in step S4), the judgment unit 11 instructs the anonymization unit 12 to integrate groups. The anonymization unit 12 integrates the groups which do not preserve the anonymity (step S5).
The group integration process of the anonymization unit 12 is not particularly limited. For example, the anonymization unit 12 may focus on an arbitrary quasi-identifier of the groups which do not preserve the anonymity, and may abstract them by integrating the groups whose center-of-gravity distance on the data space is the nearest.
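One possible sketch of such an integration step follows, assuming a single numeric quasi-identifier and using the arithmetic mean as the center of gravity; the function and key names are hypothetical, and a full implementation would also rewrite the quasi-identifier values of the merged records.

    def integrate_nearest_group(groups, failing_key, qi_key):
        # groups: dict mapping a group label to its list of records.
        # Center of gravity of a group on the numeric quasi-identifier axis.
        def centroid(records):
            return sum(r[qi_key] for r in records) / len(records)

        failing_centroid = centroid(groups[failing_key])
        # Find the group nearest to the failing group in centroid distance.
        nearest = min(
            (key for key in groups if key != failing_key),
            key=lambda key: abs(centroid(groups[key]) - failing_centroid),
        )
        # Integrate: merge the two groups; here only the group label is
        # abstracted to the numeric range covered by the merged records
        # (e.g. "20-21").
        merged = groups.pop(failing_key) + groups.pop(nearest)
        values = [r[qi_key] for r in merged]
        groups[f"{min(values)}-{max(values)}"] = merged
        return groups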
After execution of the process in step S5, the judgment unit 11 judges whether or not the anonymity is preserved against any one of the provision sources for the groups integrated by the anonymization unit 12, in the same way as in step S3 (step S6). More concretely, for each piece of provision source information of an integrated group, the judgment unit 11 judges whether or not the number of records remaining after subtracting the records of that provision source is larger than or equal to the threshold value which is the index of the anonymity.
The judgment unit 11 selects a next process on the basis of the judgment result (step S7).
When all integrated groups are larger than or equal to the threshold value (Yes in step S7), the judgment unit 11 outputs the combination data which is a target for the judgment process as the anonymized combination data.
On the other hand, when there is a group in which the number of records is not larger than or equal to the threshold value (No in step S7), the judgment unit 11 instructs the anonymization unit 12 to integrate groups. The anonymization unit 12 integrates the groups which do not preserve the anonymity again (step S5).
The judgment unit 11 and the anonymization unit 12 repeat steps S5 to S7 until all groups are larger than or equal to the threshold value.
Next, each step shown in the figure will be described concretely.
In step S1 shown in the figure, the judgment unit 11 acquires the combination data from the storage unit 13. In step S2 shown in the figure, the judgment unit 11 divides the acquired combination data into plural groups of records having the same quasi-identifier value. The divided groups are as shown in the figure.
Here, the process for judging whether or not each group preserves the anonymity when viewed from any one of the provision sources of the data will be described in detail.
First, the judgment unit 11 removes the records including one certain type of provision source information from the records included in a certain group whose quasi-identifier values are the same. For example, the judgment unit 11 removes the records of user 1, user 2 and user 3, whose provision source information is “Hospital X”, from the records whose “age” is “20”. The judgment unit 11 then judges the anonymity of the group whose “age” is “20” after removing these three records. The number of records whose “age” is “20” after the removal is one (the record of user 8). Accordingly, the judgment unit 11 judges that this group does not satisfy 2-anonymity (the number of records is not larger than or equal to two). That is, the judgment unit 11 judges that the group whose “age” is “20” does not preserve the anonymity.
The judgment unit 11 makes this judgment for every group and for all types of provision source information.
In the data shown in the figure, as described above, the group whose “age” is “20” does not preserve the anonymity when viewed from a provision source.
On the other hand, for the group whose “age” is “24”, the number of records is two both when the records of “Hospital X” are removed and when the records of “Hospital Y” are removed. Accordingly, the judgment unit 11 judges that the anonymity is preserved against any one of the provision sources for the group whose “age” is “24”.
In this way, in this description, “2”, which is the index of 2-anonymity, serves as the threshold value.
When the judgment unit 11 judges that there is a group whose number of records is not larger than or equal to two (a group which does not preserve the anonymity) (No in step S4), it instructs the anonymization unit 12 to integrate groups.
In step S5 shown in the figure, the anonymization unit 12 integrates the groups which do not preserve the anonymity.
As shown in the figure, the anonymization unit 12 integrates the group which does not preserve the anonymity with the group nearest to it, and abstracts the value of the quasi-identifier of the integrated group. In the case of the data shown in the figure, the integrated group preserves the anonymity against any one of the provision sources, and the judgment unit 11 therefore outputs the combination data shown in the figure as the anonymized combination data.
As described above, the anonymization device 10 according to the first exemplary embodiment can preserve the anonymity of data against any one of the data provision sources.
A reason for this is as follows.
For each provision source, the judgment unit 11 removes the data which that provision source holds and judges whether or not the remaining data, held by the other provision sources, satisfies the anonymity. When the anonymity is not satisfied, the anonymization unit 12 anonymizes the data until the anonymity is satisfied.
In addition, although the anonymization process of the anonymization unit 12 is described in this exemplary embodiment as a bottom-up method, the anonymization unit 12 may anonymize by using a top-down process.
When anonymizing by the top-down process, the anonymization unit 12 does not integrate the data but divides the data.
Concretely, the anonymization unit 12 first gathers the data into one group, then decides a division point of the group and divides the data into plural groups.
An example of the division operation is as follows.
First, the judgment unit 11 judges, for all divided groups and for all types of provision source information, whether or not the number of records remaining after removing the data of each provision source is larger than or equal to the threshold value which is the index of anonymity. When the number is larger than or equal to the threshold value for all groups, the judgment unit 11 requests the anonymization unit 12 to divide further, and the anonymization unit 12 performs top-down anonymization (a division of the data). The judgment unit 11 repeats this operation as long as all groups satisfy the anonymity. When at least one group which does not satisfy the anonymity exists after an anonymization by the anonymization unit 12, the judgment unit 11 cancels the last data division, that is, returns the data to the groups before the latest anonymization of the anonymization unit 12, and outputs the data as the anonymized combination data.
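The following Python sketch outlines this top-down loop under the stated assumptions. Here split_group is a placeholder supplied by the caller for any concrete division method (for example, a split at the median), and all names are hypothetical.

    def top_down_anonymize(records, source_key, k, split_group):
        # Start from a single group which gathers all of the data.
        groups = [records]
        sources = {r[source_key] for r in records}
        while True:
            # Tentatively divide every group one step further.
            candidate = []
            for group in groups:
                candidate.extend(split_group(group))
            # Check anonymity against every provision source: after removing
            # one source's records, every divided group must keep k records.
            ok = all(
                len([r for r in g if r[source_key] != s]) >= k
                for g in candidate
                for s in sources
            )
            if ok and len(candidate) > len(groups):
                groups = candidate  # adopt the division and continue
            else:
                # Cancel the last division; output the previous grouping.
                return groups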
In addition, in the top-down anonymization process, the anonymization unit 12 may make the median of each group of the combination data the division point, or may determine the division point by another method. For example, the anonymization unit 12 may determine the division point by considering an entropy amount. More concretely, on the basis of entropy, the anonymization unit 12 may select as the division point a point at which the deviation of provision sources (for example, Hospital X and Hospital Y) in the data belonging to the divided groups is small.
For example, the anonymization unit 12 may calculate the entropy of a divided group by using the following formula.
Entropy=Σ{−1×P(Class)×log(P(Class))}
Here, when “Class” is “Hospital X” or “Hospital Y”, P(Class) is given as follows, respectively.
P(Hospital X)=(the number of “Hospital X” in the group after division)/(the sum of the number of “Hospital X” and “Hospital Y” in the group after division)
P(Hospital Y)=(the number of “Hospital Y” in the group after division)/(the sum of the number of “Hospital X” and “Hospital Y” in the group after division).
That is, the anonymization unit 12 calculates the entropy of a group after division by using the following formula.
Entropy={−1×P(Hospital X)×log(P(Hospital X))}+{−1×P(Hospital Y)×log(P(Hospital Y))}
For example, the anonymization unit 12 calculates the above-mentioned entropy for each of the two groups resulting from a division at a given division candidate point. The anonymization unit 12 may determine the division candidate points in accordance with a predetermined rule (algorithm), or in accordance with a well-known method. Then, the anonymization unit 12 may select as the division point the division candidate point at which the value S, the sum of the entropies of the two groups, becomes the maximum.
A large value of S means that the data of the two provision sources (data of “Hospital X” and data of “Hospital Y”) is well mixed within each of the two groups, and that the deviation of data between the two groups is small.
Alternatively, the anonymization unit 12 may select as the division point the division candidate point which includes the group taking the maximum entropy value among all division candidate points. The method for deciding the division point using entropy is not limited to the above-mentioned methods; a different method may be used.
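A sketch of this entropy-based choice of division point, directly encoding the formulas above, might look as follows; the candidate points, key names and function names are assumptions for illustration.

    import math

    def entropy(group, source_key):
        # Entropy = sum over classes of -1 * P(Class) * log(P(Class)),
        # where a class is a provision source such as "Hospital X".
        total = len(group)
        result = 0.0
        for source in {r[source_key] for r in group}:
            p = sum(1 for r in group if r[source_key] == source) / total
            result += -1 * p * math.log(p)
        return result

    def best_division_point(records, qi_key, source_key, candidates):
        # Choose the candidate point at which the sum S of the entropies of
        # the two resulting groups is maximum (small deviation of sources).
        def score(point):
            left = [r for r in records if r[qi_key] <= point]
            right = [r for r in records if r[qi_key] > point]
            if not left or not right:
                return float("-inf")
            return entropy(left, source_key) + entropy(right, source_key)
        return max(candidates, key=score)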
In the description so far, the judgment unit 11 judges the anonymity by using k-anonymity as the index. However, the judgment unit 11 may also judge using another index, for example, l-diversity. The l-diversity is an index which requires l or more types of sensitive information within a group.
For example, when the records including one type of provision source information are removed from a group whose quasi-identifier values are the same, the judgment unit 11 may judge, for all groups and for each type of provision source information, whether or not the number of types of sensitive information remaining in the group is larger than or equal to the threshold value which is a predetermined index of l-diversity.
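A minimal sketch of this l-diversity judgment follows; the names are hypothetical and a group is assumed to be a list of records.

    def l_diversity_preserved(groups, sensitive_key, source_key, l):
        for group in groups:
            for source in {r[source_key] for r in group}:
                # Remove the records of one provision source and count the
                # distinct sensitive values (e.g. "disease code") that remain.
                remaining = {r[sensitive_key] for r in group if r[source_key] != source}
                if len(remaining) < l:
                    return False
        return True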
As a concrete example, a case where 3-diversity is required in the combination data is considered.
For example, in the data shown in the figure, the judgment unit 11 judges whether or not three or more types of “disease code” remain in each group after the records of one provision source are removed.
The anonymization unit 12 anonymizes the data on the basis of the above-mentioned judgment results of the judgment unit 11 regarding the anonymity and the diversity. In addition, the anonymization unit 12 may repeat the anonymization process. Alternatively, the judgment unit 11 may judge whether or not another index (for example, t-closeness) is satisfied. The t-closeness is an index which requires that the distance between the distribution of the sensitive data within a group and the distribution over all records be equal to or smaller than t.
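For reference, a minimal sketch of a t-closeness check is shown below. The total variation distance used here is an assumption for illustration only, since the concrete distance measure is not specified in this description.

    def t_closeness_preserved(records, groups, sensitive_key, t):
        # Distribution of the sensitive attribute over a set of records.
        def distribution(rs):
            total = len(rs)
            values = {r[sensitive_key] for r in rs}
            return {v: sum(1 for r in rs if r[sensitive_key] == v) / total
                    for v in values}

        overall = distribution(records)
        for group in groups:
            d = distribution(group)
            # Total variation distance between the group's distribution
            # and the overall distribution of the sensitive attribute.
            distance = 0.5 * sum(
                abs(d.get(v, 0.0) - overall.get(v, 0.0))
                for v in set(overall) | set(d)
            )
            if distance > t:
                return False
        return True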
In this exemplary embodiment, an example is described in which each group includes both “Hospital X” and “Hospital Y” as the provision source information; however, the anonymization device 10 may generate a group containing only data of “Hospital X” or only data of “Hospital Y”.
For example, such a case is shown in the figure.
Next, an anonymization device 20 according to a second exemplary embodiment of the present invention will be described.
The anonymization device 20 is different from the anonymization device 10 in that it operates to preserve the anonymity even when plural provision sources conspire.
As shown in the figure, the anonymization device 20 includes a judgment unit 21, the anonymization unit 12 and a storage unit 23.
The storage unit 23 stores data associated with three or more types of provision source information. For example, the anonymization device 20 is provided with data from Hospital W in addition to Hospital X and Hospital Y. In this case, the storage unit 23 stores combination data associated with three types of provision source information.
For a group which includes three or more types of provision source information, the judgment unit 21 gathers predetermined two or more types of provision source information into one type of provision source information, and judges the anonymity for each resulting type of provision source information.
Next, operation of the anonymization device 20 according to the second exemplary embodiment of the present invention will be described with reference to the drawings.
In step S8, the judgment unit 21 basically operates in the same way as the judgment unit 11. For a group which includes three or more types of provision source information (for example, Hospital X, Hospital Y and Hospital W), the judgment unit 21 treats information combining predetermined two or more types of provision source information (for example, “Hospital Y” and “Hospital W”) as one type of provision source information. Then, the judgment unit 21 judges the anonymity for each type of the provision source information (“Hospital X” as one type, and the combination of “Hospital Y” and “Hospital W” as one type).
For example, when the reliability of Hospital Y and Hospital W is considered low, a conspiracy of Hospital Y and Hospital W is assumed. Here, the conspiracy means that Hospital Y and Hospital W lower the anonymity by sharing their data with each other. Therefore, the judgment unit 21 judges whether or not the anonymity is preserved even when Hospital Y and Hospital W conspire and share the data which each of them holds.
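A sketch of how such a conspiracy can be folded into the ordinary judgment by relabeling the provision source information follows; the coalition set, the merged label and the function names are assumptions of this example.

    def fold_conspiracy(records, source_key, coalition, merged_label):
        # Treat a coalition of provision sources (e.g. Hospital Y and
        # Hospital W, whose reliability is considered low) as one type of
        # provision source information, then reuse the ordinary judgment.
        folded = []
        for record in records:
            copy = dict(record)
            if copy[source_key] in coalition:
                copy[source_key] = merged_label
            folded.append(copy)
        return folded

    # Example: judge the anonymity as if Hospital Y and Hospital W share
    # data, reusing the judgment sketched in the first exemplary embodiment.
    # folded = fold_conspiracy(records, "source", {"Hospital Y", "Hospital W"}, "Y+W")
    # anonymity_preserved_against_providers(folded, "age", "source", k=2)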
In step S9, the judgment unit 21 judges the anonymity for the groups which the anonymization unit 12 integrated in step S5, by treating predetermined two or more types of provision source information as one type of provision source, just as in step S8.
Next, each step shown in the figure will be described concretely.
In step S1 shown in the figure, the judgment unit 21 acquires the combination data from the storage unit 23. In step S2 shown in the figure, the judgment unit 21 divides the acquired combination data into plural groups of records having the same quasi-identifier value, as shown in the figure.
Here, the process in which the judgment unit 21 judges whether or not each group preserves the anonymity when viewed from any one of the provision sources of the data, in the case where two or more hospitals conspire, will be described in detail.
In this exemplary embodiment, when judging against conspiracy, the judgment unit 21 makes the groups which include three or more types of provision source information the judgment targets. It is supposed that the reliability of Hospital Y and Hospital W is low, and the judgment unit 21 judges whether or not the anonymity is satisfied by treating “Hospital Y” and “Hospital W” as one type of provision source.
In step S8 shown in the figure, the judgment unit 21 judges whether or not the anonymity is preserved when “Hospital Y” and “Hospital W” are treated as one type of provision source information. When the groups shown in the figure are confirmed, there is a group which does not preserve the anonymity under this assumption.
In step S5 shown in the figure, the anonymization unit 12 integrates the groups which do not preserve the anonymity. The combination data after the integration is as shown in the figure.
In step S9 shown in the figure, the judgment unit 21 judges the anonymity of the integrated groups in the same way as in step S8.
In addition, although the case where “Hospital Y” and “Hospital W” conspire has been considered so far, the conspiracy patterns to be considered are not limited to this. For example, the judgment unit 21 may judge that the anonymity is preserved only when all combinations of provision source information satisfy the anonymity. Concretely, for example, in the case of the figure, the judgment unit 21 may perform the judgment for every combination of two of the three types of provision source information.
In the description of this exemplary embodiment, the case where the provision source information in the data targeted for the anonymization process is of three types, and where two types of provision source information are treated as one type, has been described. However, the present invention is not limited to this. The exemplary embodiment may handle data whose provision source information is of three or more types, and may treat any two or more types of provision source information as one type of provision source information.
As described above, the anonymization device 20 according to the second exemplary embodiment can preserve the anonymity of data even when plural provision sources providing the data conspire.
A reason for this is as follows.
This is because the judgment unit 21 judges whether or not the anonymity is satisfied by treating plural pieces of provision source information as one type of provision source information, and because the judgment unit 21 instructs the anonymization unit 12 to anonymize when the anonymity is not satisfied.
Next, an anonymization device 30 according to a third exemplary embodiment of the present invention will be described. The anonymization device 30 is different from the anonymization device 10 and the anonymization device 20 in that anonymization levels which differ in accordance with the provision sources are set.
As shown in the figure, the anonymization device 30 includes a judgment unit 31, the anonymization unit 12, the storage unit 23 and a setting unit 34.
The setting unit 34 sets threshold values of anonymity levels which differ in accordance with each type of provision source information for the combination data which the storage unit 23 stores. The setting unit 34 may set the anonymity levels, for example, in accordance with the reliability of the provision sources. The setting unit 34 outputs the combination data, with the anonymity levels set differently in accordance with the types of the provision source information, to the judgment unit 31.
In this exemplary embodiment, as shown in the figure, it is supposed that the anonymity level of “Hospital X” is “1”, the anonymity level of “Hospital Y” is “2”, and the anonymity level of “Hospital W” is “3”.
The judgment unit 31 judges whether or not the number of records remaining after removing the records including one type of provision source information is larger than or equal to the threshold value (the index of anonymization) which differs in accordance with the type of the provision source information.
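A sketch of the judgment with per-source anonymity levels might look as follows; the threshold mapping reflects the example of this exemplary embodiment, and the names are hypothetical.

    def anonymity_preserved_with_levels(groups, source_key, levels):
        # levels: anonymity level (threshold) per provision source, set
        # for example in accordance with reliability, e.g.
        # {"Hospital X": 1, "Hospital Y": 2, "Hospital W": 3}.
        for group in groups:
            for source, k in levels.items():
                # Remove one source's records; the remainder must reach
                # the anonymity level set for that source.
                remaining = [r for r in group if r[source_key] != source]
                if len(remaining) < k:
                    return False
        return True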
Next, operation of the anonymization device 30 according to the third exemplary embodiment of the present invention will be described with reference to the drawings.
As shown in the figure, the operation of the anonymization device 30 differs in that step S10 is added and the anonymity is judged in steps S11 and S12.
Because the other steps are similar, their detailed description will be omitted as appropriate.
In step S10, the setting unit 34 sets the threshold value of the anonymity level for each type of provision source information for the combination data which the storage unit 23 stores. The setting unit 34 may set a different anonymity level for each type of the provision source information, or may set the same threshold value of the anonymity level for plural types of the provision source information.
In steps S11 and S12, the judgment unit 31 judges, for each group and for each type of provision source information, whether or not the number of records remaining after removing the records of that provision source is larger than or equal to the threshold value of the anonymity level of that type of provision source information.
Next, the individual steps shown in the figure will be described concretely.
In this exemplary embodiment, it is supposed that the storage unit 23 stores the combination data shown in the figure.
In step S10 shown in the figure, the setting unit 34 sets the threshold value of the anonymity level for each type of provision source information, as shown in the figure. In step S2 shown in the figure, the judgment unit 31 divides the combination data into plural groups of records having the same quasi-identifier value.
Here, the process in which the judgment unit 31 judges whether or not each group satisfies the anonymity level for each type of the provision source information, when viewed from any one of the provision sources of the data, will be described in detail.
In step S11 shown in the figure, the judgment unit 31 judges, for each group, whether or not the number of records remaining after removing the records of each provision source is larger than or equal to the anonymity level of that provision source.
For example, for the group whose “age” is “20”, one record of “Hospital Y” remains when the records of “Hospital X” are removed. Hospital X has high reliability, and the “anonymity level” of “Hospital X” is “1”. Accordingly, the judgment unit 31 judges that the group whose “age” is “20” satisfies the anonymity. Similarly, when the records of “Hospital Y” are removed, three records of “Hospital X” remain. The “anonymity level” of “Hospital Y” is “2”. Accordingly, the judgment unit 31 judges that the group whose “age” is “20” preserves the anonymity.
On the other hand, the groups whose “age” is “21” and “22” include “Hospital W”, whose reliability is low and whose anonymity level is “3”. For each of these groups, only one record remains when the records of “Hospital W” are removed. Accordingly, the judgment unit 31 judges that neither the group whose “age” is “21” nor the group whose “age” is “22” satisfies the anonymity.
The judgment unit 31 judges for all groups similarly.
In step S5 shown in the figure, the anonymization unit 12 integrates the groups which do not satisfy the anonymity. First, the anonymization unit 12 of this exemplary embodiment integrates the groups whose “age” is “21” and “22”. However, as shown in the figure, the group integrated from the groups whose “age” is “21” and “22” still does not satisfy the anonymity level of “Hospital W”. Accordingly, in step S5 shown in the figure, the anonymization unit 12 performs the integration again. For the group of “21-23”, which integrates the group whose “age” is “21-22” with the group whose “age” is “23” as shown in the figure, the number of records remaining after removing the records of each provision source is larger than or equal to the corresponding anonymity level, and the judgment unit 31 therefore judges that the anonymity is preserved.
As described above, the anonymization device 30 according to the third exemplary embodiment can preserve the anonymity of data in accordance with the reliability of the plural provision sources which provide the data.
A reason for this is as follows.
This is because the setting unit 34 sets the threshold value of the anonymity level for each type of provision source information for the combination data which the storage unit 23 stores, and because the judgment unit 31 instructs the anonymization unit 12 to anonymize on the basis of the reliability of the provision sources.
In addition, this exemplary embodiment has been described on the assumption that the setting unit 34 sets the anonymity levels for the data which the storage unit 23 stores. However, the present invention is not limited to this. For example, the storage unit 23 may store combination data for which the anonymity levels are set in accordance with the provision sources in advance; in this case, the setting unit 34 is not needed. Alternatively, the judgment unit 31 may set the anonymity levels in accordance with the provision sources before dividing the data into plural groups.
In the top-down anonymization process, when the division point is determined by considering entropy, the anonymization unit 12 may use entropy weighted in accordance with the reliability.
For example, the anonymization unit 12 may calculate the entropy of the groups after division by using the following formula.
Entropy=Σ{−W_Class×P(Class)×log(P(Class))}
Except for the operation of multiplying by W_Class, the formula may be the same as the one shown in the first exemplary embodiment, and the method for determining the division point on the basis of the value of the above-mentioned entropy may also be the same as the method shown in the first exemplary embodiment. W_Class is a weighting coefficient in accordance with the reliability of each “Class” (for example, each of Hospital X, Hospital Y and Hospital W). In the above-mentioned example, “W_Class” is “1” when “Class” is “Hospital X”, “2” when “Class” is “Hospital Y”, and “3” when “Class” is “Hospital W”.
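Under these assumptions, the weighted entropy might be computed as follows; the weights correspond to the example above, and the names are hypothetical.

    import math

    def weighted_entropy(group, source_key, weights):
        # Entropy = sum over classes of -W_Class * P(Class) * log(P(Class)),
        # where W_Class is a weighting coefficient set in accordance with
        # the reliability of each provision source.
        total = len(group)
        result = 0.0
        for source in {r[source_key] for r in group}:
            p = sum(1 for r in group if r[source_key] == source) / total
            result += -weights.get(source, 1.0) * p * math.log(p)
        return result

    # Example weights from the description above.
    weights = {"Hospital X": 1.0, "Hospital Y": 2.0, "Hospital W": 3.0}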
Next, an anonymization device 40 according to a fourth exemplary embodiment of the present invention will be described.
The anonymization device 40 is different from the anonymization device 10, the anonymization device 20 and the anonymization device 30 in that data is input directly from the outside to a judgment unit 41.
As shown in the figure, the anonymization device 40 includes a judgment unit 41 and an anonymization unit 42.
The judgment unit 41 judges, for data combined from plural records acquired from the plural provision sources, whether or not the anonymity of the data is preserved when viewed from any one of the provision sources having a record which is a part of the combined data.
The anonymization unit 42 repeats the anonymization process of the data on the basis of the judgment result of the judgment unit 41.
When the judgment unit 41 judges that the anonymity of the combination data is preserved against any one of the provision sources, it outputs the combination data to the outside as the anonymized combination data.
Next, operation of the anonymization device 40 according to the fourth exemplary embodiment will be described with reference to the drawings.
As shown in the figure, the judgment unit 41 first receives the combination data directly from the outside.
Subsequently, the anonymization device 40 operates in the same way as the anonymization device 10 according to the first exemplary embodiment.
As described above, the anonymization device 40 according to the fourth exemplary embodiment can preserve the anonymity of the data for any one of provision sources which provide data.
A reason for this is as follows.
The judgment unit 41 of the anonymization device 40 judges the anonymity in the same way as the anonymization device 10 of the first exemplary embodiment, and instructs the anonymization unit 42 to anonymize the groups which do not satisfy the threshold value.
While the invention has been particularly shown and described with reference to exemplary embodiments thereof, the invention is not limited to these embodiments. It will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present invention as defined by the claims.
As shown in the figure, the anonymization device 10 may be realized by a computer device including a CPU 1, a communication IF 2 (communication interface 2), a memory 3, a storage device 4, an input device 5 and an output device 6.
For example, the CPU 1 executes an operating system, and reads out programs and data from a recording medium, not shown in the figure, attached to the storage device 4 into the memory 3. Then, the CPU 1 controls the whole anonymization device 10 in accordance with the read-out programs and executes the various processes of the judgment unit 11 and the anonymization unit 12.
The communication IF 2 connects the anonymization device 10 with other devices, not shown in the figure, via a network. For example, the anonymization device 10 may receive the data of Hospital X and Hospital Y from external devices, not shown in the figures, via the communication IF 2, and may store it in the storage unit 13. Alternatively, the CPU 1 may download a computer program via the communication IF 2 from an external computer, not shown in the figure, connected to a communication network.
The memory 3 is, for example, a DRAM (dynamic random access memory), and temporarily stores programs and data.
The storage device 4 is, for example, an optical disc, a flexible disc, a magneto-optical disc, an external hard disk or a semiconductor memory, and stores a computer program in a computer-readable form.
For example, the storage unit 13 may be realized by using the storage device 4.
The input device 5 is, for example, a mouse, a keyboard or the like, and receives input from users.
The output device 6 is, for example, a display device, such as a display.
The anonymization devices 20, 30 and 40 according to the second to fourth exemplary embodiments may be configured by using the computer device including the CPU 1 and the storage device 4 storing programs.
In addition, the block diagrams used in the description of each exemplary embodiment show blocks of functional units rather than configurations of hardware units.
A program according to the present invention may be a program which causes a computer to execute each operation described in each of the above-mentioned exemplary embodiments.
This application is based upon and claims the benefit of priority from Japanese patent application No. 2012-032992, filed on Feb. 17, 2012, the disclosure of which is incorporated herein in its entirety by reference.