The present invention relates to a data processing system, a model generation device, a data processing method, a model generation method, and a program.
In recent years, models generated by machine learning have been used for various purposes. For example, Patent Document 1 describes using learning with a neural network to estimate a state of a secondary battery.
Patent Document 1: Japanese Patent Application Publication No. 2008-232758
A model used for an analysis of data, for example, a model generated by machine learning, has high value. Meanwhile, as the use of such a model widens, there are more cases in which a third person can access the model, for example, when a device using the model is connected to a network. In such a case, the third person can transmit input data to the model and receive output data from the model. The third person can then acquire a plurality of combinations of the input data and the output data, and may thus be able to estimate a structure of the model.
One example of an object of the present invention is to increase concealment of a model used for an analysis of data.
The present invention provides a data processing system including:
The present invention provides a model generation device that generates a model used by an intermediate data generation unit included in the data processing system described above, and the model generation device includes:
The present invention provides a data processing method including,
The present invention provides a model generation method for generating a model used by an intermediate data generation unit included in the data processing system described above, and the model generation method causes a computer to perform:
The present invention provides a program causing a computer to include:
The present invention provides a program causing a computer to include
an output data generation function of generating output data by selecting at least one value located in a predetermined position after a predetermined arithmetic operation is performed on the intermediate data generated by the intermediate data generation function described above.
The present invention provides a program causing a computer to include a function of generating a model used by an intermediate data generation unit included in the data processing system described above, and the program causes the computer to include:
According to the present invention, concealment of a model used for an analysis of data is increased.
An embodiment of the present invention will be described below by using drawings. Note that, in all of the drawings, a similar component has a similar reference sign, and description thereof will not be repeated as appropriate.
In the example illustrated in
The storage battery 30 supplies power to an apparatus 40. As one example, the apparatus 40 is a vehicle such as an electric vehicle. However, when the storage battery 30 is a storage battery for home use, the apparatus 40 is an electric apparatus used at home. In this case, the storage battery 30 is located outside the apparatus 40. Further, the storage battery 30 may be connected to a system power network. In this case, the storage battery 30 is used for leveling supplied power. Specifically, the storage battery 30 stores electric power when there is surplus electric power, and supplies electric power when electric power is running short.
The data processing system 20 estimates a state of the storage battery 30. The state estimated herein is, for example, at least one of a residual capacity (unit is Ah), a state of charge (SOC), and a state of health (SOH) of the storage battery 30. The SOH is, for example, “current full charge capacity (Ah)/initial full charge capacity (Ah) × 100(%)”. Note that the state of the storage battery 30 is not limited to these. In the estimation, as described above, the data processing system 20 uses the model. The model generation device 10 generates and updates at least one of the models used by the data processing system 20 by using machine learning, for example, a neural network.
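As a minimal sketch of the SOH expression above, the following Python function computes the ratio as a percentage. The function name and the sample capacities are illustrative assumptions and are not values taken from the embodiment.

```python
def state_of_health(current_full_charge_ah: float, initial_full_charge_ah: float) -> float:
    """Return SOH in percent: current full charge capacity / initial full charge capacity x 100."""
    if initial_full_charge_ah <= 0:
        raise ValueError("initial full charge capacity must be positive")
    return current_full_charge_ah / initial_full_charge_ah * 100.0


# Hypothetical capacities in Ah, chosen only for illustration.
soh = state_of_health(current_full_charge_ah=45.0, initial_full_charge_ah=50.0)
print(f"SOH = {soh:.1f} %")
```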
The model generation device 10 may acquire measurement values of data (hereinafter described as performance data) related to a state of the storage battery 30 from a plurality of storage batteries 30. In this case, a part of a plurality of pieces of the performance data may be used as training data of machine learning, and at least a part of the rest of the performance data may be used for verifying a model.
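One possible way to divide the performance data into a training part and a verification part is sketched below. The shuffling, the 80 % training fraction, and the record fields are assumptions made for illustration; the embodiment does not specify how the division is made.

```python
import random


def split_performance_data(records, training_fraction=0.8, seed=0):
    """Shuffle the records and split them into a training part and a verification part."""
    if not 0.0 < training_fraction < 1.0:
        raise ValueError("training_fraction must be between 0 and 1")
    shuffled = list(records)               # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    cut = int(len(shuffled) * training_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical performance data records.
records = [{"battery_id": i, "soc": 50 + i} for i in range(10)]
training, verification = split_performance_data(records)
```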
The performance data preferably include information that identifies a kind (for example, a product name and a model number) of the storage battery 30. In this way, the model generation device 10 can generate a model for each kind of the storage battery 30. Then, the data processing system 20 can acquire, from the model generation device 10, a model associated with the kind of the storage battery 30 connected to the data processing system 20, and use the acquired model. Therefore, estimation accuracy of a state of the storage battery 30 by the data processing system 20 increases.
Note that at least a part of the performance data may be generated by a storage battery 30 that is used mainly for collecting the performance data.
The model generation unit 140 may generate a plurality of models by using a plurality of machine learning algorithms (for example, a long short-term memory (LSTM), a deep neural network (DNN), linear regression (LR), and the like).
In the example illustrated in
The model generation device 10 further includes a pre-processing unit 130. The pre-processing unit 130 converts at least the second training data according to a predetermined conversion rule. The model generation unit 140 generates a model by using the second training data after the conversion. Herein, the pre-processing unit 130 may further convert the first training data according to a predetermined conversion rule. In this case, the model generation unit 140 generates a model by using the first training data after the conversion.
One example of the conversion rule described above is processing of increasing, by adding dummy data to input data of the training data, the number of values constituting the input data, and also increasing, by also adding dummy data to output data of the training data, the number of values constituting the output data. For example, when input data are formed of three values, the input data after the conversion include the three values originally included in the input data, and dummy data formed of at least one value, and thus the input data are formed of four or more values. Note that a detailed example of the conversion rule will be described below.
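The conversion rule of adding dummy data can be sketched as follows. The helper name, the positions, and the numeric values are hypothetical; the sketch only shows that three input values become four values and that the output data are extended in the same manner.

```python
def add_dummy_values(values, dummies_by_position):
    """Return a copy of `values` with dummy entries at the given 0-based positions of the result."""
    converted = list(values)
    # Inserting in ascending order of position keeps each dummy at its final index.
    for position in sorted(dummies_by_position):
        if position > len(converted):
            raise ValueError("dummy position is beyond the end of the converted data")
        converted.insert(position, dummies_by_position[position])
    return converted


original_input = [0.5, 3.7, 25.0]    # hypothetical current, voltage, temperature
converted_input = add_dummy_values(original_input, {1: 0.0})

original_output = [80.0]             # hypothetical SOC in percent
converted_output = add_dummy_values(original_output, {0: 55.0, 2: 70.0})
```

The same helper would have to be applied with the same positions on the model generation side and on the data processing side, because both sides use the same conversion rule.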
The model generated by the model generation unit 140 is stored in a model storage unit 150. Then, the model stored in the model storage unit 150 is transmitted to the data processing system 20 by a model transmission unit 160. In the example illustrated in
The intermediate data generation unit 240 and the output data generation unit 250 are provided in devices different from each other. As one example, the output data generation unit 250 of the data processing system 20 is provided on a cloud server, and the other functions of the data processing system 20 are provided on a terminal. However, a storage processing unit 210, a model storage unit 220, and a display processing unit 260 described below may also be provided on the cloud server.
The input data processing unit 230 acquires input data, and converts the input data according to a conversion rule. A configuration of the input data is the same as that of the input data of the training data used by the model generation device 10. For example, the input data processing unit 230 acquires the input data from a sensor (for example, an ammeter, a voltmeter, and a thermometer) that detects a state of the storage battery 30. Further, the conversion rule used by the input data processing unit 230 is the same as the conversion rule used by the pre-processing unit 130 of the model generation device 10. As described above, one example of the conversion rule is to increase the number of values included in the input data by adding dummy data to the input data.
The intermediate data generation unit 240 generates intermediate data by processing the input data after the conversion. The intermediate data are, for example, a plurality of rows and/or a plurality of columns of data formed of a plurality of values. The intermediate data generation unit 240 generates the intermediate data by using the model generated by the model generation device 10. As described above, when the model generation device 10 generates the model, the model generation device 10 adds dummy data to the output data of the training data. Thus, the generated intermediate data include a significant value (i.e., a value desired to be acquired) and a dummy value.
The output data generation unit 250 generates output data by selecting at least one value (i.e., the significant value of the intermediate data) located in a predetermined position after a predetermined arithmetic operation is performed on the intermediate data generated by the intermediate data generation unit 240. The predetermined arithmetic operation performed herein is, for example, addition, but may be subtraction, multiplication, or division, or may be an arithmetic operation acquired by combining at least two of addition, subtraction, multiplication, and division.
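The processing of the output data generation unit 250 can be sketched as an element-wise arithmetic operation followed by selection of values in predetermined positions. The function name, the operand vector, and the intermediate data below are assumptions for illustration; addition is used because it is the example operation named above.

```python
def generate_output(intermediate, operands, significant_positions):
    """Add `operands` element-wise to `intermediate`, then keep only the significant positions."""
    if len(intermediate) != len(operands):
        raise ValueError("intermediate data and operands must have the same length")
    processed = [value + operand for value, operand in zip(intermediate, operands)]
    return [processed[position] for position in significant_positions]


# Hypothetical intermediate data: the first and fifth values are dummies.
intermediate = [2.0, 1.0, 2.0, 1.0, 3.0]
operands = [0.0, 0.0, 1.0, 0.0, 0.0]   # add 1 to the third value only
output = generate_output(intermediate, operands, significant_positions=[1, 2, 3])
print(output)
```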
In the example illustrated in
The storage processing unit 210 acquires the model from the model generation device 10, and stores the model in the model storage unit 220. When the storage processing unit 210 acquires, from the model generation device 10, data (for example, a parameter of the model) for updating the model, the storage processing unit 210 updates the model stored in the model storage unit 220 by using the data. The update processing is preferably repeatedly performed. In the example illustrated in
The display processing unit 260 displays the output data generated by the output data generation unit 250 on the display 270. The display 270 is disposed in a position that can be visually recognized by a user of the apparatus 40. For example, when the apparatus 40 is a vehicle, the display 270 is provided inside the vehicle (for example, in front of or obliquely in front of a driver’s seat).
In the example illustrated in
Then, in the example illustrated in
Note that the pre-processing unit 130 of the model generation device 10 adds, in a predetermined position (for example, a second column), dummy data (for example, 0) of a predetermined value to input data (for example, [1, 1, 1]) included in training data. Further, the pre-processing unit 130 adds, in predetermined positions (for example, a first column and a fifth column), predetermined dummy data to output data (for example, [1, 3, 1]) included in the training data. A value of the predetermined dummy data added to the output data herein preferably falls within a range of values that a value included in the original output data can take. In this way, a dummy value (i.e., a value that is not selected as the output data) included in intermediate data generated by the intermediate data generation unit 240 falls within the range of values that the original output data can take. Therefore, even when a third person views the intermediate data, the third person cannot decide which of the plurality of values included in the intermediate data are significant data (i.e., the original output data). Thus, the third person cannot determine a combination of the input data and the output data, and, as a result, concealment of a model generated by the model generation device 10 increases.
Note that, in order to set the model in such a manner, in the second training data used by the model generation device 10, a dummy value included in output data may fall within the range of values that the original output data can take.
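The following sketch adds dummy values to the output data [1, 3, 1] in the first and fifth columns while keeping each dummy within the range that the original output data can take. Drawing the dummy values from a seeded random generator and the range (1, 3) are assumptions made here; the embodiment only states that the dummy data are predetermined.

```python
import random


def add_output_dummies(original_output, dummy_positions, value_range, rng):
    """Insert dummies at 0-based result positions, each drawn from within `value_range`."""
    low, high = value_range
    converted = list(original_output)
    for position in sorted(dummy_positions):
        converted.insert(position, rng.uniform(low, high))
    return converted


rng = random.Random(0)   # seeded so that the same dummies are reproduced
converted = add_output_dummies([1, 3, 1], dummy_positions=[0, 4],
                               value_range=(1, 3), rng=rng)
print(converted)         # five values; positions 1 to 3 hold the original output
```

Because every value lies within the same range, the positions of the dummies cannot be identified from the magnitudes of the values alone.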
Processing illustrated in
Herein, when the difference between the “value that satisfies the predetermined condition” and the “another value” that replaces it is made small, then even in a case where the input data include the “value that satisfies the predetermined condition”, the difference between the value of the output data output from the output data generation unit 250 and the output data that would be acquired without the replacement (the value of the original output data) is also small. Thus, the error caused by the value replacement in the conversion rule is reduced.
Note that the replacement of a value described above need not be performed in the pre-processing unit 130 of the model generation device 10.
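A value replacement of this kind can be sketched as follows. The predetermined condition is not specified in this passage, so the condition used here (a value equal to 0) and the replacement value 0.001 are purely hypothetical; the point of the sketch is that a replacement value close to the original keeps the change to the input small.

```python
def replace_values(values, condition, replacement):
    """Return a copy of `values` in which every value satisfying `condition` is replaced."""
    return [replacement if condition(value) else value for value in values]


original = [0.0, 3.7, 25.0]
# Hypothetical rule: replace a value equal to 0 with a nearby non-zero value.
converted = replace_values(original, condition=lambda v: v == 0.0, replacement=0.001)

# The largest change introduced by the replacement stays small.
largest_change = max(abs(a - b) for a, b in zip(original, converted))
```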
Processing illustrated in
In order to do this, for example, in training data used by the model generation unit 140, an inverse arithmetic operation (for example, subtracting a constant (for example, 1) from a value in a third column) of an arithmetic operation performed by the intermediate data generation unit 240 (or the output data generation unit 250) may be performed on output data, and a model used by the intermediate data generation unit 240 may be generated by using the training data after the inverse arithmetic operation is performed.
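The relationship between the inverse arithmetic operation applied to the training data and the arithmetic operation applied at the time of use can be sketched as below. The function names and the five-value output vector are assumptions; the constant 1 and the third column follow the example given above.

```python
def apply_inverse_to_training_output(output, column, constant):
    """Subtract `constant` from one column of the training output (inverse operation)."""
    adjusted = list(output)
    adjusted[column] -= constant
    return adjusted


def apply_forward(intermediate, column, constant):
    """Add `constant` back to the same column (operation performed when the model is used)."""
    restored = list(intermediate)
    restored[column] += constant
    return restored


training_output = [2, 1, 3, 1, 2]   # hypothetical output data after dummy data are added
target_for_model = apply_inverse_to_training_output(training_output, column=2, constant=1)
restored = apply_forward(target_for_model, column=2, constant=1)
```

A model trained on `target_for_model` would output the shifted value in the third column, so the intermediate data alone would not show the true value until the addition is performed.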
According to the example illustrated in
The bus 1010 is a data transmission path for allowing the processor 1020, the memory 1030, the storage device 1040, the input/output interface 1050, and the network interface 1060 to transmit and receive data with one another. However, a method for connecting the processor 1020 and the like to one another is not limited to bus connection.
The processor 1020 is a processor achieved by a central processing unit (CPU), a graphics processing unit (GPU), and the like.
The memory 1030 is a main storage device achieved by a random access memory (RAM) and the like.
The storage device 1040 is an auxiliary storage device achieved by a hard disk drive (HDD), a solid state drive (SSD), a memory card, a read only memory (ROM), or the like. The storage device 1040 stores a program module that achieves each function (for example, the training data acquisition unit 120, the pre-processing unit 130, the model generation unit 140, and the model transmission unit 160) of the model generation device 10. The processor 1020 reads each program module onto the memory 1030 and executes the program module, and each function associated with the program module is achieved. Further, the storage device 1040 also functions as the training data storage unit 110 and the model storage unit 150.
The input/output interface 1050 is an interface for connecting the model generation device 10 and various types of input/output equipment.
The network interface 1060 is an interface for connecting the model generation device 10 to a network. The network is, for example, a local area network (LAN) or a wide area network (WAN). A method of connection to the network by the network interface 1060 may be wireless connection or wired connection. The model generation device 10 may communicate with the data processing system 20 via the network interface 1060.
Note that, a hardware configuration of the data processing system 20 is also similar to the example illustrated in
First, the training data acquisition unit 120 of the model generation device 10 reads training data from the training data storage unit 110 (step S10). The training data include first training data and second training data. In this state, a conversion rule is not applied to the first training data and the second training data, and thus dummy data are not added.
Next, the model generation unit 140 trains a model by using the first training data (step S20). Hereinafter, the model generated by the training is referred to as a base model. The number of values of input data to the base model, and the number of values of output data from the base model are each an original number.
Next, the model generation unit 140 generates a temporary model by processing the base model (step S30). In the temporary model, each of the number of values of the input data and the number of values of the output data is the number after the conversion rule is applied.
Herein, a generation method for the temporary model will be described. The temporary model is generated by, for example, the following methods. Note that the conversion rule of the first training data and the conversion rule from intermediate data to the output data described above are determined according to the generation method for the temporary model.
(1) In a matrix indicating a parameter of the base model, dummy data are added to a specific column, a specific row, or any position (associated with the examples illustrated in
(2) In a matrix indicating a parameter of the base model, specific columns, specific rows, or any components are replaced with one another (see
(3) (1) and (2) are combined.
Herein, in (1) or (3), a value of the dummy data in the temporary model is determined by performing training of the temporary model.
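The generation methods listed above can be sketched on a small parameter matrix as follows. The helper names, the 2 x 2 base matrix, and the initial dummy value 0.0 are assumptions for illustration; as stated above, the actual dummy values are determined by the subsequent training of the temporary model.

```python
def add_dummy_column(matrix, index, fill=0.0):
    """Method (1): insert a dummy column at `index`."""
    return [row[:index] + [fill] + row[index:] for row in matrix]


def add_dummy_row(matrix, index, fill=0.0):
    """Method (1): insert a dummy row at `index`."""
    width = len(matrix[0])
    rows = [list(row) for row in matrix]
    return rows[:index] + [[fill] * width] + rows[index:]


def swap_columns(matrix, first, second):
    """Method (2): replace two columns with one another."""
    swapped = [list(row) for row in matrix]
    for row in swapped:
        row[first], row[second] = row[second], row[first]
    return swapped


base = [[1.0, 2.0], [3.0, 4.0]]                           # hypothetical base model parameter
method_1 = add_dummy_row(add_dummy_column(base, 1), 0)    # dummy data added
method_2 = swap_columns(base, 0, 1)                       # columns replaced
method_3 = swap_columns(method_1, 0, 2)                   # (1) and (2) combined
```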
Next, the pre-processing unit 130 converts the first training data and the second training data according to the conversion rule (step S40). Then, the model generation unit 140 trains the temporary model by using the first training data and the second training data after the conversion (step S50). Subsequently, whether the temporary model after the training satisfies a reference is decided (step S60). For example, the reference includes both of the following two references.
A first reference is that, when performance data that are not used in the training are selected from the training data storage unit 110 and input data of the selected performance data are input to the temporary model, a difference between the resulting output data and data corresponding to output data of the selected performance data is equal to or less than a reference value.
A second reference is that, when input data of abnormal data (corresponding to the second training data) that are not used in the training are input to the temporary model, all values constituting the resulting output data fall within a range of values that the original output data can take.
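The two references can be sketched as a single check, as below. The model is treated as a plain callable, and the function name, the element-wise comparison, the tolerance, and the toy models are all assumptions made for illustration rather than the decision procedure of the embodiment.

```python
def satisfies_references(model, normal_samples, abnormal_inputs, max_difference, output_range):
    """Return True only when both the first and the second reference are satisfied."""
    low, high = output_range
    # First reference: outputs for unused normal data are close to the expected data.
    for converted_input, expected in normal_samples:
        predicted = model(converted_input)
        if len(predicted) != len(expected):
            return False
        if any(abs(p - e) > max_difference for p, e in zip(predicted, expected)):
            return False
    # Second reference: outputs for unused abnormal data stay within the allowed range.
    for converted_input in abnormal_inputs:
        if any(not (low <= value <= high) for value in model(converted_input)):
            return False
    return True


def clamped_model(values):
    """Toy stand-in for a temporary model whose outputs stay within 0 to 100."""
    return [min(max(value, 0.0), 100.0) for value in values]


def unclamped_model(values):
    """Toy stand-in for a temporary model that passes abnormal values through."""
    return list(values)


normal_samples = [([10.0, 50.0], [10.0, 50.0])]
abnormal_inputs = [[-500.0, 900.0]]
accepted = satisfies_references(clamped_model, normal_samples, abnormal_inputs, 0.5, (0.0, 100.0))
rejected = satisfies_references(unclamped_model, normal_samples, abnormal_inputs, 0.5, (0.0, 100.0))
```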
When the temporary model after the training does not satisfy the reference (step S60: No), the processing returns to step S20. In other words, the model generation device 10 repeats the processing illustrated in step S20 to step S50 until the temporary model satisfies the reference.
When the temporary model after the training satisfies the reference (step S60: Yes), the model generation unit 140 stores the temporary model as a formal model in the model storage unit 150 (step S70).
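The flow from step S10 to step S70 can be sketched as the following loop. Every callable passed to the function is a hypothetical stand-in for the corresponding unit, and the upper limit on the number of repetitions is an assumption added so that the sketch always terminates; the embodiment simply repeats until the reference is satisfied.

```python
def generate_formal_model(read_training_data, train_base_model, make_temporary_model,
                          convert, train_temporary_model, satisfies_reference, max_rounds=10):
    first, second = read_training_data()                    # step S10
    for _ in range(max_rounds):
        base = train_base_model(first)                      # step S20
        temporary = make_temporary_model(base)              # step S30
        converted_first = convert(first)                    # step S40
        converted_second = convert(second)
        temporary = train_temporary_model(                  # step S50
            temporary, converted_first, converted_second)
        if satisfies_reference(temporary):                  # step S60
            return temporary                                # step S70: caller stores the model
    raise RuntimeError("no temporary model satisfied the reference")


# Stub callables, used only to exercise the control flow.
attempts = []


def fake_train_temporary(model, first, second):
    attempts.append(1)
    return {"round": len(attempts)}


formal = generate_formal_model(
    read_training_data=lambda: ([[1, 1, 1]], [[9, 9, 9]]),
    train_base_model=lambda first: "base",
    make_temporary_model=lambda base: "temporary",
    convert=lambda data: [row[:1] + [0] + row[1:] for row in data],
    train_temporary_model=fake_train_temporary,
    satisfies_reference=lambda model: model["round"] >= 3,
)
```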
As described above, according to the present embodiment, output data of a model used by the data processing system 20 are different from output data of the data processing system 20. Thus, even when a third person acquires a plurality of combinations of input data and output data of the data processing system 20, it is difficult to estimate a model used by the data processing system 20 from the combinations. Therefore, concealment of the model is high.
While the example embodiments of the present invention have been described above with reference to the drawings, the example embodiments are exemplifications of the present invention, and various configurations other than those described above can also be employed.
Further, the plurality of steps (pieces of processing) are described in order in the plurality of flowcharts used in the above-described description, but an execution order of steps performed in each of the example embodiments is not limited to the described order. In each of the example embodiments, an order of the illustrated steps may be changed to the extent that the change does not impair the content. Further, the example embodiments described above can be combined to the extent that their contents do not conflict.
This application claims priority based on Japanese Patent Application No. 2020-171188 filed on Oct. 9, 2020, the disclosure of which is incorporated herein in its entirety.
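10 Model generation device
20 Data processing system
30 Storage battery
40 Apparatus
110 Training data storage unit
120 Training data acquisition unit
130 Pre-processing unit
140 Model generation unit
150 Model storage unit
160 Model transmission unit
210 Storage processing unit
220 Model storage unit
230 Input data processing unit
240 Intermediate data generation unit
250 Output data generation unit
260 Display processing unit
270 Display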
Number | Date | Country | Kind |
---|---|---|---|
2020-171188 | Oct 2020 | JP | national |
This application is a National Stage Entry of PCT/JP2021/036407 filed on Oct. 1, 2021, which claims priority from Japanese Patent Application 2020-171188 filed on Oct. 9, 2020, the contents of all of which are incorporated herein by reference, in their entirety.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2021/036407 | 10/1/2021 | WO |