The present application claims priority from Japanese Patent Application No. 2023-017144, filed on Feb. 7, 2023, the contents of which are incorporated herein by reference.
The present disclosure relates to a method for supporting evaluation of a learning process of a prediction model, an information processing device, and a recording medium storing instructions.
Conventionally, methods for performing prediction related to physical and chemical phenomena such as chemical reactions have been developed (for example, PTL 1).
The technique described in PTL 1 uses modeling techniques such as neural networks, partial least squares, and principal component regression to optimize the control of reactor systems. However, PTL 1 does not consider a method for evaluating a learning process of a prediction model, and there has been room for improvement in evaluation support technology for prediction models.
One or more embodiments of the present disclosure made in view of such circumstances improve the evaluation support technology of the prediction model.
(1) A method according to one or more embodiments of the present disclosure is a method executed by an information processing device for supporting evaluation of a learning process of a prediction model, the method comprising:
(2) A method for supporting evaluation of a learning process of a prediction model according to one or more embodiments of the present disclosure is the method according to (1), in which the statistical information includes a first frequency distribution of one of the input values input to the intermediate layer, and a second frequency distribution of another of the input values input to the output layer.
(3) A method for supporting evaluation of a learning process of a prediction model according to one or more embodiments of the present disclosure is the method according to (2), further comprising:
(4) A method for supporting evaluation of the learning process of a prediction model according to one or more embodiments of the present disclosure is the method according to (2) or (3), further comprising:
(5) A method for supporting evaluation of the learning process of a prediction model according to one or more embodiments of the present disclosure is the method according to (4), in which the first predetermined range is either a range where an output value of the activation function of the intermediate layer is 0.01 or more and less than 0.99 or a range where a differential value of the activation function of the intermediate layer is larger than 0, and the second predetermined range is either a range where an output value of the activation function of the output layer is 0.01 or more and less than 0.99 or a range where a differential value of the activation function of the output layer is larger than 0.
(6) A method for supporting evaluation of the learning process of a prediction model according to one or more embodiments of the present disclosure is the method according to (5), in which the activation function of the intermediate layer and the activation function of the output layer are sigmoid functions.
(7) A method for supporting evaluation of the learning process of a prediction model according to one or more embodiments of the present disclosure is the method according to any one of (1) to (3), in which an initial value of a weight coefficient of the neural network model is determined based on a first predetermined range determined based on an activation function of the intermediate layer and based on a second predetermined range determined based on an activation function of the output layer.
(8) An information processing apparatus according to one or more embodiments of the present disclosure is an information processing device supporting evaluation of a learning process of a prediction model, the information processing device comprising a processor that:
(9) A non-transitory computer-readable recording medium according to one or more embodiments of the present disclosure is a non-transitory computer-readable recording medium storing instructions executed by an information processing device that comprises a processor and supports evaluation of a learning process of a prediction model, the instructions causing the processor to execute:
With the method, the information processing device, and the recording medium storing instructions according to one or more embodiments of the present disclosure, the evaluation support technology of the prediction model can be improved.
Hereinafter, a method, an information processing device, and a recording medium storing instructions for supporting the evaluation of the learning process of the prediction model according to embodiments of the present disclosure will be described with reference to the drawings. A prediction target of a prediction model according to one or more embodiments may be arbitrary. In other words, the technology according to one or more embodiments can be applied to modeling in general using neural networks and the like. The prediction target of the prediction model according to one or more embodiments includes, for example, chemical reactions of synthetic resins. Examples of the chemical reactions of the synthetic resins include polycondensation reactions and addition polymerization reactions. The polycondensation reactions include dehydration-condensation reactions. The chemical reactions targeted by the prediction model according to one or more embodiments are not limited to these, and the prediction target may be any chemical reaction other than those of the synthetic resins. Hereinafter, in one or more embodiments, the case where the prediction target of the prediction model is a chemical reaction of a synthetic resin or the like will be described. The present invention, however, is not limited thereto.
In each of the drawings, identical or equivalent parts are assigned the same symbols. In the description of one or more embodiments, descriptions of the identical or equivalent parts are omitted or simplified as appropriate.
First, the overview of one or more embodiments will be described. The method according to one or more embodiments is executed by an information processing device 10. The information processing device 10 trains a neural network model including an input layer, an intermediate layer, and an output layer based on actual data including an explanatory factor and an objective factor related to the chemical reaction. In other words, the prediction model to be evaluated by the method according to one or more embodiments is the neural network model. In the method according to one or more embodiments, the information processing device 10 also outputs statistical information on input values input to the intermediate layer and the output layer of the neural network model.
As described above, the method according to one or more embodiments is characterized in that the statistical information on the input values input to the intermediate layer and the output layer of the neural network model is output. As described later, outputting the statistical information enables the user to objectively evaluate the validity of a learning process of the prediction model predicting the chemical reaction. The evaluation support technology according to one or more embodiments can therefore serve as an evaluation method that secures robustness of the prediction result even for unknown future data, which cannot be assessed by cross-validation alone, a common model evaluation method. Therefore, according to one or more embodiments, the evaluation support technology of the prediction model predicting the chemical reactions can be improved.
Subsequently, referring to
As illustrated in
The control unit 11 includes at least one processor, at least one dedicated circuit, or a combination thereof. The processor is a general-purpose processor such as a central processing unit (CPU) or a graphics processing unit (GPU), or a dedicated processor specialized for specific processing. The dedicated circuit is, for example, a field-programmable gate array (FPGA) or an application specific integrated circuit (ASIC). The control unit 11 executes processes associated with the operation of the information processing device 10 while controlling each part of the information processing device 10.
The storage unit 12 includes at least one semiconductor memory, at least one magnetic memory, at least one optical memory, or a combination of at least two of these memories. The semiconductor memory is, for example, a random-access memory (RAM) or a read-only memory (ROM). The RAM is, for example, a static random-access memory (SRAM) or a dynamic random-access memory (DRAM). The ROM is, for example, an electrically erasable programmable read-only memory (EEPROM). The storage unit 12 functions, for example, as a main memory device, an auxiliary memory device, or a cache memory. In the storage unit 12, data used in the operation of the information processing device 10 and data obtained by the operation of the information processing device 10 are stored.
The input unit 13 includes at least one interface for input. The interface for input is, for example, physical keys, capacitive keys, pointing devices, or touch screens integrated with displays. The interface for input may be, for example, a microphone that accepts voice input or a camera that accepts gesture input. The input unit 13 accepts operations to input data used in the operation of the information processing device 10. The input unit 13 may be connected to the information processing device 10 as an external input device instead of being provided in the information processing device 10. For example, any method such as universal serial bus (USB), high-definition multimedia interface (HDMI) (registered trademark), or Bluetooth (registered trademark) can be used as the connection method.
The output unit 14 includes at least one interface for output. The interface for output is, for example, a display that outputs information in the form of images. The display is, for example, a liquid crystal display (LCD) or an electroluminescence (EL) display. The output unit 14 displays and outputs data obtained by the operation of the information processing device 10. The output unit 14 may be connected to the information processing device 10 as an external output device instead of being provided in the information processing device 10. For example, any method such as USB, HDMI (registered trademark), or Bluetooth (registered trademark) can be used as the connection method.
The functions of the information processing device 10 are achieved by executing a program or instructions according to one or more embodiments on a processor of the information processing device 10. In other words, the functions of the information processing device 10 are achieved by software. The instructions cause a computer to function as the information processing device 10 by causing the computer to execute the operations of the information processing device 10. In other words, the computer functions as the information processing device 10 by executing the operations of the information processing device 10 in accordance with the instructions.
In one or more embodiments, the instructions can be recorded on a computer-readable recording medium. Such computer-readable recording media include non-transitory computer-readable media, for example, magnetic recording devices, optical discs, magneto-optical recording media, or semiconductor memories.
Some or all of the functions of the information processing device 10 may be achieved by a dedicated circuit corresponding to the control unit 11. In other words, some or all of the functions of the information processing device 10 may be achieved by hardware.
In one or more embodiments, the storage unit 12 stores therein, for example, actual data and prediction models. The actual data and the prediction model may be stored in an external device separate from the information processing device 10. In this case, the information processing device 10 may be equipped with an interface for external communication. The interface for communication may be either a wired communication interface or a wireless communication interface. In the case of wired communication, the interface for communication is, for example, a LAN interface or USB. In the case of wireless communication, the interface for communication is, for example, an interface compliant with mobile communication standards such as LTE, 4G, or 5G, or an interface compliant with short-range wireless communication such as Bluetooth (registered trademark). The interface for communication can receive data used in the operation of the information processing device 10 and can transmit data obtained by the operation of the information processing device 10.
Subsequently, with reference to
Step S101: The control unit 11 of the information processing device 10 trains a neural network model based on actual data on a chemical reaction of a synthetic resin. The actual data include an explanatory factor and an objective factor related to the chemical reaction. The explanatory and objective factors are selected appropriately depending on the target chemical reaction to be predicted. For example, in the case where prediction related to the dehydration-condensation reaction is performed, the actual data include the explanatory factors and the objective factor related to the dehydration-condensation reaction. For example, the explanatory factors may include feature values and the like related to the dehydration and temperature rising processes. The objective factor may include a hydroxyl value, an acid value, and the like. In other words, the control unit 11 trains the neural network model using the explanatory and objective factors included in the actual data as learning data.
Any method can be employed to acquire the actual data. For example, the control unit 11 acquires the actual data from the storage unit 12. The control unit 11 may also acquire the actual data by accepting input of the actual data from the user by the input unit 13. Alternatively, the control unit 11 may acquire such actual data from an external device that stores the actual data through an interface for communication.
The neural network model trained based on the learning data is cross-validated. In the case where the accuracy resulting from such cross-validation is within a practical range, the prediction related to the chemical reaction is performed using the neural network model.
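The training in step S101 can be illustrated by the following minimal sketch. It is an assumption-laden illustration, not the implementation described in the present disclosure: the data, the 4-14-1 layer sizes, the learning rate, and the plain batch gradient descent are all hypothetical stand-ins chosen to match the later figure description.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(u, alpha=1.0):
    # Coefficient-scaled sigmoid activation (alpha = 1 assumed here).
    return 1.0 / (1.0 + np.exp(-alpha * u))

# Hypothetical actual data: 100 samples of four explanatory factors
# scaled to [0, 1), and one synthetic stand-in objective factor.
X = rng.random((100, 4))
y = sigmoid(X.sum(axis=1) - 2.0)

W1 = rng.random((4, 14)) * 0.1   # weights, input -> intermediate layer
W2 = rng.random((14, 1)) * 0.2   # weights, intermediate -> output layer

def mse():
    p = sigmoid(sigmoid(X @ W1) @ W2)[:, 0]
    return float(np.mean((p - y) ** 2))

mse_before = mse()
for _ in range(2000):            # plain batch gradient descent
    h = sigmoid(X @ W1)          # intermediate-layer outputs
    p = sigmoid(h @ W2)          # output-layer prediction
    d2 = (p - y[:, None]) * p * (1.0 - p)   # output-layer error signal
    d1 = (d2 @ W2.T) * h * (1.0 - h)        # backpropagated error
    W2 -= 0.5 * h.T @ d2 / len(X)
    W1 -= 0.5 * X.T @ d1 / len(X)
mse_after = mse()
```

In an actual embodiment the learning data would be the measured explanatory and objective factors, and any standard training algorithm could replace the hand-rolled loop above.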
Step S102: The control unit 11 outputs the statistical information on the input values input to the intermediate layer and the output layer of the neural network model by the output unit 14.
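The statistical information of step S102 can be sketched as histograms of the pre-activation input values entering each layer. The weights and data below are hypothetical placeholders; only the idea of binning the inputs to the intermediate and output layers into first and second frequency distributions is taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda u: 1.0 / (1.0 + np.exp(-u))

X = rng.random((100, 4))          # hypothetical explanatory factors
W1 = rng.random((4, 14)) * 0.1    # hypothetical trained weights
W2 = rng.random((14, 1)) * 0.2

u1 = X @ W1                       # inputs to the intermediate elements
u2 = sigmoid(u1) @ W2             # inputs to the output element

# First and second frequency distributions over fixed bins.
counts1, edges1 = np.histogram(u1.ravel(), bins=20)
counts2, edges2 = np.histogram(u2.ravel(), bins=20)
```

A display routine of the output unit 14 could then plot these counts alongside the activation-function curves, as described for the output information below.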
Step S103: The control unit 11 predicts the objective factor related to the chemical reaction based on the explanatory factors related to the chemical reaction. For example, the control unit 11 may acquire the explanatory factors by accepting input of the explanatory factors from the user by the input unit 13. The control unit 11 may output the predicted objective factor as a prediction result by the output unit 14.
The input layer 100 includes a plurality of elements 101 to 104 (also referred to as input elements 101 to 104). In the neural network model illustrated in
The intermediate layer 200 includes a plurality of elements 201 to 214 (also referred to as intermediate elements 201 to 214). In the neural network model illustrated in
The output layer 300 includes an element 301 (an output element 301). In the neural network model illustrated in
The values input from the input elements 101 to 104 of the input layer 100 to the intermediate elements 201 to 214 of the intermediate layer 200 are converted in the intermediate layer 200 based on the activation function of the intermediate layer 200. The converted values are output to the element 301 of the output layer 300. The activation function of the intermediate layer 200 is, for example, a sigmoid function. The values input from the intermediate elements 201 to 214 of the intermediate layer 200 to the output element 301 of the output layer 300 are converted in the output layer 300 based on the activation function of the output layer 300 and output. The activation function of the output layer 300 is, for example, the sigmoid function. Specifically, the activation functions of the intermediate layer and the output layer are, for example, the respective sigmoid functions determined by the following Mathematical Formulas (1) and (2).
Here, f1(uj1) is the activation function of the intermediate layer 200, α1 is the coefficient of the activation function of the intermediate layer 200, and uj1 is the input value input to the j-th element of the intermediate layer 200. f2(uj2) is the activation function of the output layer 300, α2 is the coefficient of the activation function of the output layer 300, and uj2 is the input value input to the j-th element of the output layer. α1 and α2 may be the same value or may be different values.
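Mathematical Formulas (1) and (2) themselves do not survive in this text. Assuming the standard coefficient-scaled sigmoid (an assumption consistent with, but not confirmed by, the symbol definitions above), they plausibly take the form:

```latex
f_1(u_j^1) = \frac{1}{1 + \exp\!\left(-\alpha_1 u_j^1\right)} \tag{1}

f_2(u_j^2) = \frac{1}{1 + \exp\!\left(-\alpha_2 u_j^2\right)} \tag{2}
```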
Here, the output information may include information on the activation function of the intermediate layer. In other words, for example, the first frequency distribution 401 may be displayed together with the activation function of the intermediate layer. The output information in
The output information may include information on a first predetermined range. The first predetermined range is a range where the output value of the activation function of the intermediate layer is 0.01 or more and less than 0.99. The first predetermined range may be a range where the differential value of the activation function of the intermediate layer is larger than 0. Alternatively, the first predetermined range may be a range where the differential value of the activation function of the intermediate layer is larger than 0.001. The output information in
Here, the output information may include information on the activation function of the output layer. In other words, for example, the second frequency distribution 501 may be displayed together with the activation function of the output layer. The output information in
The output information may also include information on a second predetermined range. The second predetermined range is a range where the output value of the activation function of the output layer is 0.01 or more and less than 0.99. The second predetermined range may be a range where the differential value of the activation function of the output layer is larger than 0. Alternatively, the second predetermined range may be a range where the differential value of the activation function of the output layer is larger than 0.001. The output information in
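For a coefficient-scaled sigmoid, the first and second predetermined ranges on the input axis can be obtained by inverting the activation function. The sketch below assumes the sigmoid form f(u) = 1/(1 + exp(-alpha*u)); the function name and defaults are illustrative, not taken from the text.

```python
import math

def predetermined_range(alpha=1.0, lo=0.01, hi=0.99):
    # Invert f(u) = 1/(1 + exp(-alpha*u)): f(u) in [lo, hi)
    # exactly when u is in [logit(lo)/alpha, logit(hi)/alpha).
    logit = lambda p: math.log(p / (1.0 - p))
    return (logit(lo) / alpha, logit(hi) / alpha)

lower, upper = predetermined_range(alpha=1.0)
# For alpha = 1 the range is roughly [-4.595, 4.595).
```

A larger coefficient alpha narrows the range proportionally, which is why the ranges are described as determined based on the activation function of each layer.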
As described above, with the method according to one or more embodiments, the validity of the learning process of the prediction model can be objectively evaluated by outputting the statistical information in relation to the learning process. Specifically, for example, displaying the first and second frequency distributions together with the corresponding activation functions visualizes their relationship, whereby the validity of the learning process can be evaluated. Therefore, according to one or more embodiments, the evaluation support technology of the prediction model can be improved.
Here, in one or more embodiments, hyperparameters may be adjusted in order to further optimize the learning process. For example, in one or more embodiments, the setting of the initial values of weight coefficients may be adjusted. For example, the initial values of the weight coefficients may be determined based on a first predetermined range determined based on the activation function of the intermediate layer and a second predetermined range determined based on the activation function of the output layer. As described above, the first predetermined range is, for example, a range where the output value of the activation function of the intermediate layer is 0.01 or more and less than 0.99. The second predetermined range is, for example, a range where the output value of the activation function of the output layer is 0.01 or more and less than 0.99. Specifically, for example, the weight coefficient between the input layer and the intermediate layer may be set to a random number of 0 or more and less than 0.1. The weight coefficient between the intermediate layer and the output layer may be set to a random number of 0 or more and less than 0.2. Setting the weight coefficients as described above allows the respective starting points of learning to be adjusted so as to be in the first predetermined range and the second predetermined range. If a starting point is out of the range, it is desirable that the ranges of random numbers serving as the initial values of the weight coefficients (e.g., 0 or more and less than 0.1 between the input layer and the intermediate layer) be adjusted and the initial values be drawn again.
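The initialization and re-draw procedure above can be sketched as follows. The layer sizes, the alpha = 1 sigmoid, and the numeric bounds of the predetermined ranges are assumptions for illustration; only the uniform [0, 0.1) and [0, 0.2) draws and the out-of-range check come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_weights(n_in=4, n_mid=14, n_out=1):
    # Uniform random initial weights: [0, 0.1) and [0, 0.2), as described.
    return rng.random((n_in, n_mid)) * 0.1, rng.random((n_mid, n_out)) * 0.2

def starting_points_ok(X, W1, W2, lo=-4.595, hi=4.595):
    # lo/hi assume alpha = 1 sigmoids whose output lies in [0.01, 0.99).
    u1 = X @ W1                              # inputs to the intermediate layer
    u2 = (1.0 / (1.0 + np.exp(-u1))) @ W2    # inputs to the output layer
    in_range = lambda u: bool(np.all((u >= lo) & (u < hi)))
    return in_range(u1) and in_range(u2)

X = rng.random((100, 4))                     # hypothetical scaled inputs
W1, W2 = init_weights()
while not starting_points_ok(X, W1, W2):
    W1, W2 = init_weights()                  # re-draw if out of range
```

In practice the re-draw branch could also shrink the random-number ranges rather than simply sampling again, per the adjustment described above.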
In one or more embodiments, the case where the intermediate layer is one layer is described. However, in the case where intermediate layers are two or more layers, the statistical information on each intermediate layer may be output.
In
The determination of the validity of the learning process based on the first frequency distribution may be automated. For example, the control unit 11 may determine whether the first peak determined based on the first frequency distribution is within the first predetermined range. The control unit 11 may also output such a determination result from the output unit 14. The first peak is appropriately determined based on the first frequency distribution. For example, the first peak may be an average value, a median, or the like of the peaks in the frequency distribution of the input values of the intermediate elements. Alternatively, the first peak may be a set of peaks in the frequency distribution of the input values of all intermediate elements. In other words, whether the peak in the frequency distribution of the input values of all the intermediate elements is within the first predetermined range may be determined.
Similarly, the determination of the validity of the learning process based on the second frequency distribution may be automated. For example, the control unit 11 may determine whether the second peak determined based on the second frequency distribution is within the second predetermined range. The control unit 11 may also output such a determination result from the output unit 14. The second peak is appropriately determined based on the second frequency distribution. For example, the second peak may be an average value, a median, or the like of the peaks in the frequency distribution of the input values of the output elements. Alternatively, the second peak may be a set of peaks in the frequency distribution of the input values of all output elements. In other words, whether the peak in the frequency distribution of the input values of all the output elements is within the second predetermined range may be determined.
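The automated peak-in-range determination can be sketched as below. Taking the modal histogram bin as "the peak" is one plausible reading; the bin count, the synthetic input values, and the alpha = 1 range bounds are all hypothetical.

```python
import numpy as np

def peak_of(values, bins=20):
    # Peak = center of the bin with the highest frequency.
    counts, edges = np.histogram(values, bins=bins)
    i = int(np.argmax(counts))
    return 0.5 * (edges[i] + edges[i + 1])

def peak_in_range(values, lo=-4.595, hi=4.595):
    # lo/hi assume an alpha = 1 sigmoid with output in [0.01, 0.99).
    p = peak_of(values)
    return lo <= p < hi

rng = np.random.default_rng(0)
u1 = rng.normal(0.0, 1.0, 1400)   # hypothetical intermediate-layer inputs
u2 = rng.normal(6.0, 0.1, 100)    # hypothetical saturated output-layer inputs

# peak_in_range(u1) -> True; peak_in_range(u2) -> False (saturated)
```

A peak falling outside the range indicates that the corresponding elements operate in the saturated region of the sigmoid, which is the situation the determination result is intended to flag.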
The case where the activation functions of the intermediate layer and the output layer are sigmoid functions is described in one or more embodiments. However, the activation functions are not limited to sigmoid functions. For example, the activation functions of the intermediate layer and the output layer may be functions such as a hyperbolic tangent function (tanh function) or a ramp function (rectified linear unit: ReLU).
In one or more embodiments, the case where the prediction target is a chemical reaction of a synthetic resin or the like is described as one example. The prediction target, however, is not limited to this. The prediction target may be, for example, a physical or chemical phenomenon such as a chemical reaction of any substance. The prediction target need not even be a physical or chemical phenomenon. In other words, the technology according to one or more embodiments can be applied to modeling in general using neural networks and the like.
Although the present disclosure has been described based on the drawings and examples, it should be noted that those skilled in the art can easily make changes and modifications based on the present disclosure. Therefore, it should be noted that these changes and modifications are included within the scope of the present disclosure. For example, the functions and the like included in the units, steps, or the like can be rearranged so as not to be logically inconsistent, and a plurality of units, steps, or the like can be combined to one or divided.
Number | Date | Country | Kind |
---|---|---|---|
2023-017144 | Feb 2023 | JP | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2023/019012 | 5/22/2023 | WO |