The present invention relates to a data disturbance device that disturbs and outputs the acquired data, and a data disturbance system that uses the data disturbance device.
The recent development of IoT (Internet of Things) technologies has allowed companies to collect and analyze vast amounts of detailed data. In particular, companies increasingly collect and analyze data relating to individuals, so-called personal data, feed the results of the analyses back to the individuals concerned, and use the personal data internally to improve services or to develop new products.
Since personal data may contain information concerning the privacy of each individual, various techniques for privacy protection have been proposed.
PTL 1, for example, discloses a method for protecting privacy by de-identifying personal information using a technique called “k-anonymization” which prevents identification of a certain number of individuals or more by generalizing attribute values of the personal information.
PTL 2, on the other hand, discloses a method for protecting privacy through de-identification of personal information by disturbing data by means of a wavelet transform in order to meet the safety standards called “differential privacy” which prevents one from distinguishing whether or not specific personal information is contained in a large volume of personal information.
PTL 3 discloses a method for protecting privacy by processing power data by using a signal generator and a battery so as to maximize the Kullback-Leibler divergences in distributions of unprocessed and processed power data while keeping statistical information such as average values and variance values with respect to power data acquired with a smart meter.
[PTL 1] WO 2011/145401
[PTL 2] Japanese Patent Application Publication No. 2016-12074
[PTL 3] Japanese Patent Application Publication No. 2013-153424
However, the k-anonymization technique and the differential privacy technique disclosed in PTL 1 and PTL 2 generalize data more than necessary, degrading data usability. In addition, these techniques cannot protect privacy according to the needs of each individual, despite the fact that each person defines privacy differently.
The privacy protection technique disclosed in PTL 3 also faces such problems as degrading of data usability and not being able to protect privacy according to the needs of each individual.
The present invention was contrived in order to solve the foregoing problems, and an object thereof is to provide a data disturbance device capable of protecting privacy according to the needs of each individual while maintaining data usability.
The data disturbance device according to the present invention has: a disturbance object setting unit that calculates a disturbance parameter necessary to disturb information, which is set as information to be disturbed, out of items of information contained in acquired data; and a data disturbance unit that generates disturbed data by irreversibly converting the acquired data using the disturbance parameter.
According to the data disturbance device of the present invention, the disturbance object setting unit calculates a disturbance parameter necessary to disturb the information, which is set as information to be disturbed, out of items of information contained in the acquired data, and the data disturbance unit generates disturbed data by irreversibly converting the acquired data using the disturbance parameter.
Therefore, the data disturbance device can protect privacy according to the needs of each individual while maintaining data usability.
Preferred embodiments of a data disturbance device according to the present invention are described hereinafter with reference to the drawings, wherein the same reference numerals denote identical or corresponding parts in each drawing.
Embodiment 1 describes a technique for a system consisting of two members, a data provider and a data user, such as a smart meter system or a HEMS (Home Energy Management System). The technique enables the data user to perform effective data analysis while protecting the privacy information of the data provider contained in the data.
The data provider 101 is provided with a data acquisition device 111, a data disturbance device 112, and a data providing device 113. The data user 102 is provided with a data utilization device 114. The data providing device 113 and the data utilization device 114 are connected to each other by a network 115.
The data acquisition device 111 is a device for acquiring data. The data disturbance device 112 is a device for disturbing the acquired data by means of a method described hereinafter. The acquired data are referred to as “raw data” hereinafter.
The data providing device 113 is a device for providing disturbed data from the data provider 101 to the data user 102. The disturbed data are referred to as “processed data” hereinafter. The data utilization device 114 is a device for utilizing the processed data provided from the data provider 101.
In a smart meter system, for example, the data provider 101 is the head of a household, and the data user 102 is a power company. The data acquisition device 111 is a power sensor. The data disturbance device 112 is a personal computer or a dedicated device.
The data providing device 113 is a personal computer or a broadband router. The network 115 is the Internet. The data utilization device 114 is a personal computer or a main frame.
The data disturbance device 112 is configured with a data receiver 221 for receiving raw data from the data acquisition device 111, a disturbance object setting unit 222 for enabling the data provider 101 to set information to be disturbed out of items of information contained in the raw data, a parameter storage unit 223 for storing a parameter used for disturbance, a data disturbance unit 224 for actually disturbing the raw data, and a data transmitter 225 for transmitting processed data to the data providing device 113.
The data providing device 113 is configured with a data receiver 231 for receiving processed data from the data disturbance device 112, and a data transmitter 232 for transmitting the processed data to the network 115.
The data utilization device 114 is configured with a data receiver 241 for receiving processed data from the network 115, and a data analysis unit 242 for performing a variety of analyses on the received processed data.
The data disturbance device 112 is configured with a serial bus 327 for realizing mainly the functions of the data receiver 221, a display 321, a keyboard 322, and a mouse 323 that realize mainly the functions of the disturbance object setting unit 222, an HDD 326 for realizing mainly the functions of the parameter storage unit 223, a processor 324 and a memory 325 that realize mainly the functions of the data disturbance unit 224, and a USB host 328 for realizing mainly the functions of the data transmitter 225.
The data providing device 113 is configured with a processor 331, a memory 332, and a USB host 333 that realize mainly the functions of the data receiver 231, and a network interface 334 for realizing mainly the functions of the data transmitter 232.
The data utilization device 114 is configured with a network interface 347 for realizing mainly the functions of the data receiver 241, and a display 341, a keyboard 342, a mouse 343, a processor 344, a memory 345, and an HDD 346 that realize mainly the functions of the data analysis unit 242.
The relationship between the functional composition shown in
The data transmitter 212 reads the power data from the memory 314 and transmits the read power data to the data disturbance device 112 through the serial bus 315. This series of operations is performed through the execution of a program by the processor 313.
In the data disturbance device 112 shown in
The data disturbance unit 224 first reads the parameter from the HDD 326 which is the parameter storage unit 223, disturbs the raw data stored in the memory 325 to generate processed data based on this parameter, and stores the processed data in the memory 325. The data transmitter 225 reads the processed data from the memory 325 and transmits the processed data to the data providing device 113 through the USB host 328. This series of operations is performed through the execution of a program by the processor 324.
In the data providing device 113 shown in
In the data utilization device 114 shown in
The data analysis unit 242 executes analysis on the memory 345 in accordance with the analysis details that are input by the data user 102 using the keyboard 342 or mouse 343, and then displays the result of the analysis on the display 341. This series of operations is performed through the execution of a program by the processor 344.
Operations of the data disturbance system having the foregoing configuration are described next. First of all, a method for enabling the data provider 101 to set information to be disturbed out of items of information contained in raw data by using the data disturbance device 112 is described with reference to
As shown in
Next, an extraction parameter necessary to extract information from the raw data is learned (step S402). This step is realized by a method known in the fields of pattern recognition and machine learning.
More specifically, when a hidden Markov model is used to extract the operation history of each household electrical appliance from the power data, this step is realized by learning a transition probability or an output distribution, each of which is a parameter necessary to extract the operation history of each household electrical appliance, by using a part of an EM (Expectation-Maximization) algorithm. Note that the operation history is a record of the operations of each household electrical appliance showing, for example, that the bathroom light was turned ON at 6:10, that the bathroom light was turned OFF at 6:12, and that the rice cooker was set to “warm” at 7:34.
Next, a disturbance parameter necessary to disturb the information is calculated (step S403). This step is realized by solving a convex optimization problem. More specifically, this step is carried out as follows.
The information to be disturbed, which is the operating status of a specific household electrical appliance selected by the data provider 101, such as the ON/OFF state of the bathroom light, is taken as a random variable X, data such as power data is taken as a random variable Y, and processed data such as disturbed power data is taken as a random variable Z.
In this case, the problem of obtaining a conditional probability $P_{Z|Y}$ that minimizes the amount of mutual information $I(X;Z)$ between X and Z, under the condition that the expected value of the distance between Y and Z is less than a predetermined value, namely that the expected value of the Euclidean distance satisfies $E[\|Y-Z\|] < d$, is known to be a convex optimization problem.
The disturbance parameter $P_{Z|Y}$ is calculated by solving this problem. Note that a more detailed method was invented by the same inventor as the inventor of the present invention and described in U.S. patent application Ser. No. 15/224,742 filed in the U.S. on Aug. 1, 2016.
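The objective and the constraint of this problem can be evaluated for any candidate conditional probability of Z given Y. The following is a minimal sketch of that evaluation only, not of the solver described in the cited application. It assumes discrete X, Y and Z, a scalar distance |y − z|, and a joint distribution of X and Y supplied as a table; the function names and the data layout are hypothetical.

```python
import math

def mutual_information(p_xy, p_z_given_y):
    """Return I(X;Z) in bits, assuming Z depends on X only through Y.

    p_xy[x][y] is the joint probability of X = x and Y = y.
    p_z_given_y[y][z] is the candidate disturbance parameter P(Z = z | Y = y).
    """
    nx, ny, nz = len(p_xy), len(p_xy[0]), len(p_z_given_y[0])
    # Joint distribution P(x, z) = sum over y of P(x, y) * P(z | y).
    p_xz = [[sum(p_xy[x][y] * p_z_given_y[y][z] for y in range(ny))
             for z in range(nz)] for x in range(nx)]
    p_x = [sum(row) for row in p_xz]
    p_z = [sum(p_xz[x][z] for x in range(nx)) for z in range(nz)]
    info = 0.0
    for x in range(nx):
        for z in range(nz):
            if p_xz[x][z] > 0:
                info += p_xz[x][z] * math.log2(p_xz[x][z] / (p_x[x] * p_z[z]))
    return info

def expected_distance(p_xy, p_z_given_y, y_values, z_values):
    """Return E[|Y - Z|] for the candidate disturbance parameter."""
    p_y = [sum(p_xy[x][y] for x in range(len(p_xy)))
           for y in range(len(y_values))]
    return sum(p_y[y] * p_z_given_y[y][z] * abs(y_values[y] - z_values[z])
               for y in range(len(y_values)) for z in range(len(z_values)))
```

Under these assumptions, a parameter that copies Y to Z gives an expected distance of 0 but leaves I(X;Z) equal to I(X;Y), whereas a parameter that always outputs the same Z gives I(X;Z) = 0 at the cost of a larger expected distance. The disturbance parameter sought in step S403 lies between these two extremes for the chosen d.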
Next, the disturbance parameter is stored (step S404), ending the procedure shown in
In each of the tables of transition probabilities, the rows show the current status, the columns show the next status, and the values represent the transition probabilities. For example, in the transition probability 501 of the living room light, when the current status is ON, the next status is ON with a probability of 0.8 and OFF with a probability of 0.2.
Each of the tables of output distributions shows statuses and power values corresponding thereto. For example, the output distribution 504 of the rice cooker shows that the power value corresponding to the “cooking” status is 325.1 watts, that the power value corresponding to the “warm” status is 107.2 watts, and that the power value corresponding to the “OFF” status is 0.3 watts.
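The two kinds of tables described above could be held in memory as follows. This is only an illustrative sketch: the ON row of the living room light and the rice cooker power values come from the description, while the OFF row, the variable names, and the helper functions are hypothetical. The nearest-value lookup is a deliberate simplification and is not the hidden Markov model decoding itself.

```python
# Transition probabilities of the living room light.
# Outer key: current status. Inner key: next status.
LIVING_ROOM_LIGHT_TRANSITION = {
    "ON": {"ON": 0.8, "OFF": 0.2},   # values given in the description
    "OFF": {"ON": 0.1, "OFF": 0.9},  # hypothetical placeholder values
}

# Output distribution of the rice cooker: status -> power value in watts.
RICE_COOKER_OUTPUT_WATTS = {"cooking": 325.1, "warm": 107.2, "OFF": 0.3}

def rows_sum_to_one(table, tol=1e-9):
    """Check that every row of a transition table is a probability distribution."""
    return all(abs(sum(row.values()) - 1.0) <= tol for row in table.values())

def nearest_status(output_watts, measured_watts):
    """Return the status whose power value is closest to the measured value."""
    return min(output_watts, key=lambda s: abs(output_watts[s] - measured_watts))

print(rows_sum_to_one(LIVING_ROOM_LIGHT_TRANSITION))
print(nearest_status(RICE_COOKER_OUTPUT_WATTS, 110.0))
```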
In
For ease of explanation, this embodiment assumes that the power data that are continuous values are quantized appropriately beforehand so as to be treated as discrete values such as “0 to 10 watts.”
Those skilled in the art can easily understand that even when the power data are to be treated as continuous values without being quantized, the power data can be processed in the same manner by appropriately setting probability density functions corresponding to the conditional probabilities $P_{Z|Y}$.
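The quantization assumed in this embodiment can be sketched as a mapping from a measured power value to a range label. The 10-watt width matches the “0 to 10 watts” example; the function name and the choice of half-open ranges (the lower bound included, the upper bound excluded) are assumptions made for illustration.

```python
def quantize(watts, bin_width=10):
    """Map a non-negative power value to a label such as "20 to 30 watts".

    Ranges are treated as half-open: lower bound included, upper excluded.
    """
    if watts < 0:
        raise ValueError("power value must be non-negative")
    lower = int(watts // bin_width) * bin_width
    return f"{lower} to {lower + bin_width} watts"

print(quantize(29))
```

With these assumptions, a reading of 29 watts falls in the “20 to 30 watts” range.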
Next, a method for disturbing the raw data to generate processed data by using the data disturbance device 112 is described with reference to
As shown in
Next, processed data corresponding to the raw data are selected in accordance with the conditional probabilities of the disturbance parameter (step S603). More specifically, this step is carried out as follows.
For example, in a case where the received data is 29 watts, as shown by the conditional probabilities of the disturbance parameter in
Then, the processed data is selected according to these conditional probabilities. In this case, the processed data is “20 to 30 watts” with an extremely high probability and, in some cases, “10 to 20 watts” and the like.
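The selection in step S603 amounts to drawing one value at random from the row of the disturbance parameter that corresponds to the received data. The sketch below assumes that row is available as a mapping from range labels to probabilities. The probabilities shown are hypothetical placeholders chosen only to reflect the qualitative behavior described above, and the function name is not taken from the description.

```python
import random

# Hypothetical row of the disturbance parameter for received data in the
# "20 to 30 watts" range: P(Z = label | Y = "20 to 30 watts").
ROW_20_TO_30 = {
    "10 to 20 watts": 0.05,
    "20 to 30 watts": 0.90,
    "30 to 40 watts": 0.05,
}

def select_processed(row, rng=random):
    """Draw one processed-data label according to the conditional probabilities."""
    labels = list(row)
    weights = [row[label] for label in labels]
    return rng.choices(labels, weights=weights, k=1)[0]

print(select_processed(ROW_20_TO_30))
```

Because the output is drawn at random, the same raw value can yield different processed values on different occasions, so the raw value cannot be recovered with certainty from the processed value.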
Next, the processed data is transmitted from the data transmitter 225 (step S604). The process from step S602 to step S604 is executed on each received data, but the whole process shown in
A more detailed method of step S603 was invented by the same inventor as the inventor of the present invention and described in U.S. patent application Ser. No. 15/224,742 filed in the U.S. on Aug. 1, 2016.
It should be noted that such processing is so-called irreversible conversion. In other words, it is impossible to completely restore the raw data from the processed data. Such irreversible conversion helps reduce the amount of mutual information between the processed data and the information to be disturbed.
On the other hand, in a so-called invertible transformation that generates processed data by adding a random number to the raw data, the amount of mutual information between the processed data and the information to be disturbed cannot be minimized because the information of the raw data remains as-is in the processed data.
As explained above, one of the main points of the present invention is to generate processed data by performing irreversible conversion on the raw data.
A disturbance parameter used in irreversible conversion is a conditional probability that is obtained by solving the convex optimization problem mentioned above. Therefore, under a given condition, minimization of the amount of mutual information between the processed data and the information to be disturbed can be guaranteed.
Meanwhile, technologies such as k-anonymization and differential privacy that de-identify personal information can quantitatively control personal specificity but cannot quantitatively control the amount of information about the information to be disturbed, which is privacy information.
As explained above, being able to quantitatively control the amount of information about the information to be disturbed is another main point of the present invention.
A method for transmitting the processed data to the network 115 using the data providing device 113, and a method for receiving the processed data from the network 115 using the data utilization device 114 and analyzing the received processed data, are executed by means of known network and data analysis techniques; thus, descriptions thereof are omitted herein.
The operations according to Embodiment 1 have been described above. According to Embodiment 1, by using a system consisting of the two members, the data provider 101 and the data user 102, as in a smart meter system, a HEMS and the like, privacy can be protected according to the needs of each individual while maintaining data usability.
The reasons are as follows. In other words, as shown by step S403 in
Therefore, by selecting the processed data according to such a conditional probability, the privacy information of the data provider 101 obtained from the processed data can be minimized while maintaining data usability.
In addition, because the privacy information obtained from the processed data is minimized, Embodiment 1 enables the data user 102 to perform a variety of data analyses safely, without concern about privacy infringement.
Embodiment 1 discloses the technique of the system consisting of the two members: the data provider 101 and the data user 102. Embodiment 2 describes a technique concerning a system consisting of three or more members including a data intermediate user, the technique enabling the data intermediate user or data user to perform effective data analysis while protecting privacy information of a data provider contained in data.
Embodiment 2 describes particularly a technique to balance privacy protection with effective data analysis while fulfilling a constraint condition on data usability, if there is one.
In
For example, in a smart meter system, the data intermediate user 701 serves as a demand response business operator who provides a demand response service to the data provider 101.
In some cases, a constraint condition different from that for the data user 102 arises for the data intermediate user 701. For instance, a demand response business operator needs to have a correct understanding of a tight power condition. For this reason, a constraint condition is set according to which high-power data of, for example, 500 watts or higher need to be used without being disturbed.
Therefore, unlike Embodiment 1, not all the power data of 0 watts to 1000 watts can be disturbed, allowing only the low-power data of 0 watts to 500 watts to be disturbed. In other words, the data intermediate user 701 can only partially protect the privacy of the data provider 101.
However, once the high-power data can be used, this constraint condition becomes no longer necessary. Therefore, when transmitting data from the data intermediate user 701 to the data user 102, reprocessing the high-power data of 500 watts or higher can allow the data user 102 to protect the privacy of the data provider 101.
Operations of the data disturbance system having the foregoing configuration are described next. Specifically, the following describes, with reference to
Step S401 and step S402 shown in
Subsequently, a first disturbance parameter and a second disturbance parameter necessary to disturb information are calculated (step S801). The first disturbance parameter is a parameter used to generate processed data by disturbing raw data using the data disturbance device 112 of the data provider 101. The second disturbance parameter is a parameter used to generate reprocessed data by disturbing the processed data using the data disturbance device 112 of the data intermediate user 701.
As in step S403 shown in
For simplicity of explanation, the following assumes that a constraint condition is set according to which high-power data of 500 watts or higher need to be used without being disturbed.
First, when calculating the first disturbance parameter, a Euclidean distance is used for the power data of less than 500 watts, distance 0 is used for the power data of 500 watts or higher when the raw data Y and the processed data Z share the same value, and an infinite distance is used in other cases.
When calculating the second disturbance parameter, on the other hand, a Euclidean distance is used for the power data of 500 watts or higher, distance 0 is used for the power data of less than 500 watts when the raw data Y and the processed data Z share the same value, and an infinite distance is used in other cases.
In step S801, a convex optimization problem is solved using these distorted Euclidean distances, to calculate the first disturbance parameter and the second disturbance parameter. Subsequently, the first disturbance parameter is stored (step S802).
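The two distorted distances used in step S801 can be written down directly from the description above. The sketch below is illustrative only: it assumes scalar power values, and it assumes that the range (below 500 watts, or 500 watts or higher) is decided by the input value y of each disturbance stage, a detail the description leaves open. The function names are hypothetical.

```python
import math

THRESHOLD_WATTS = 500  # borderline between low-power and high-power data

def first_distance(y, z, threshold=THRESHOLD_WATTS):
    """Distance for the first disturbance parameter.

    Low-power inputs may be disturbed; high-power inputs must pass unchanged.
    """
    if y < threshold:
        return abs(y - z)
    return 0.0 if y == z else math.inf

def second_distance(y, z, threshold=THRESHOLD_WATTS):
    """Distance for the second disturbance parameter.

    High-power inputs may be disturbed; low-power inputs must pass unchanged.
    """
    if y >= threshold:
        return abs(y - z)
    return 0.0 if y == z else math.inf

print(first_distance(100, 130), first_distance(600, 610))
print(second_distance(600, 610), second_distance(100, 130))
```

An infinite distance makes the expected distance infinite whenever a protected value is changed with nonzero probability, so any disturbance parameter satisfying the finite bound d necessarily leaves the protected range undisturbed.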
Next, the second disturbance parameter is transmitted to the data disturbance device 112 of the data intermediate user 701 (step S803). In so doing, the data disturbance device 112 of the data intermediate user 701 receives and stores the second disturbance parameter.
As shown in
Therefore, as to the processed data generated by the data provider 101, the high-power data of 500 watts or higher are in the same state as the raw data. As to the reprocessed data generated by the data intermediate user 701, the low-power data of less than 500 watts are in the same state as the processed data.
This embodiment has described the example where the power range is divided by the 500 watts borderline into two ranges—low-power range and high-power range. However, those skilled in the art can understand that even when the power range is divided into three or more ranges, the disturbance parameters can be calculated in the same manner.
Those skilled in the art can also understand that even when the power range cannot be divided clearly, the disturbance parameters can be calculated by appropriately setting the distance function.
Note that the data disturbance method using the data disturbance device 112 of the data provider 101 and the data disturbance method using the data disturbance device 112 of the data intermediate user 701 are the same as the method shown in
The operations of Embodiment 2 have been described above. According to Embodiment 2, in the system consisting of three or more members such as the data provider 101, the data intermediate user 701 and the data user 102 as in a demand response system or the like, privacy can be protected according to the needs of each individual while maintaining data usability.
Moreover, according to Embodiment 2, because the privacy information obtained from the processed data and the reprocessed data is minimized under the constraint condition, the data intermediate user 701 and the data user 102 can perform a variety of data analyses safely, without concern about privacy infringement.
Embodiments 1 and 2 each take power data as an example, but those skilled in the art can easily understand that the details of the present invention are not limited to power data and can be applied to various types of data.
Various types of data mentioned above include, for example, vital data such as blood pressure, blood flow and heart rate, base station information, movement history data of a GPS sensor and the like, a date/time of purchase of a product, purchase history data on the contents of the product, and the like. Application of the present invention to these types of data can realize protection of privacy according to the needs of each individual while maintaining data usability.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2016/053066 | 9/22/2016 | WO | 00 |