The present invention relates to a technology that supports experiments in materials science among others.
Along with development of a statistical processing technology regarding data analysis, there is a rising demand for carrying out data analysis in materials science as well. Particularly, in a field of materials science, a method called screening is known in which a selection of candidates for a next experiment is made based on known data to perform development of new materials efficiently.
In Patent Literature (PTL) 1, a design support method is described in which knowledge in the nanoscale domain is linked and structured in a common conceptual scheme independently of material types and is applied to new material design regardless of material type.
PTL 2 describes the following: using quantum statistic values obtained through statistical processing of quantum thermodynamic state quantities specific to the elements constituting a reaction system, only those substances that have the same physical property value are selected from among substances that have the same number of elements but differ in the number or percentage of elements constituting the reaction system; by deriving simultaneous linear equations, at least as many as the number of elements constituting each of those substances, and solving them, it becomes possible to design a metallic or non-metallic material having targeted physical and chemical properties and functionality.
As a screening method, various sorts of experimental data are input to an information system, a model is built for predicting an experiment result through machine learning, and screening is performed based on prediction through the model. For this prediction, a method that takes various parameters regarding material design as arguments and evaluates a function that returns a material property by a regression analysis is well known.
PTL 1: Japanese Patent Application Laid-Open No. 2003-178102
PTL 2: Japanese Patent Application Laid-Open No. 2004-086892
In material development, increasing the accuracy of material property prediction makes it possible to identify promising candidates for new materials more reliably and, by dispensing with unnecessary experiments, more efficient material development can be expected.
In the regression analysis, variables that correspond to arguments of a function are called explanatory variables and a value that corresponds to a return value of the function is called an objective variable. In predicting a property of a material, the material property is taken as the objective variable and explanatory variables representing the features of the material are selected so that the material property can be predicted. Increase or decrease in the accuracy of the prediction depends on how to select the explanatory variables and, therefore, it is important to prepare a variation of explanatory variables to be adaptable for prediction of a wide range of material properties.
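As a minimal illustration of these terms, the sketch below fits a straight line by ordinary least squares with a single explanatory variable. The variable names and the numbers (an additive fraction and a hardness value) are hypothetical and are not taken from the embodiment.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with one explanatory variable."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = sxy / sxx              # slope
    b = y_bar - a * x_bar      # intercept
    return a, b

# Hypothetical data: explanatory variable = additive fraction,
# objective variable = measured hardness.
additive = [0.0, 0.1, 0.2, 0.3]
hardness = [1.0, 1.2, 1.4, 1.6]

a, b = fit_line(additive, hardness)
predicted = a * 0.4 + b        # prediction for an untested additive fraction
```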
An attempt to predict material properties using past data is disclosed in PTL 1 and PTL 2. However, a general process for material development starts development with certain compositions and a manufacturing process and, for a material found to have an effective property, further takes measures with its related composition and manufacturing process.
In practice, however, only a very small amount of data is available in the initial phase of development for a task that has just begun. When past data is to be used, in most cases the only data in which the targeted material property is adequately recorded is the data of the task at hand, because the targeted material properties differ from task to task. In addition, even experiments that aim at finding similar properties sometimes use different measurement methods, and it is often hard to repurpose the resulting data directly.
A problem that is addressed by the present invention is to provide a method that improves the accuracy of predicting material properties by making effective use of past data.
One preferred aspect of the present invention resides in a system to carry out prediction of material properties by processing task data including a plurality of records, each including a material composition, an experimental condition, and a material property. This system includes a material property prediction presenting unit, a cross-task compatible feature value generating unit, and a material property predicting unit. The material property prediction presenting unit accepts a specification of first task data that includes a record in which a material property is unknown and is to be a target of material property prediction through a first predictive model. The cross-task compatible feature value generating unit predicts feature values from material compositions in the first task data by using a second predictive model. The material property predicting unit generates the first predictive model by using the material compositions, experimental condition, feature values, and the known material property in the first task data. Also, the material property predicting unit inputs the material composition, the experimental condition, and the feature value in a record in which the material property is unknown in the first task data to the first predictive model and predicts the unknown material property.
Material composition is at least information about the composition of a material and, more preferably, information about the structure of a material, e.g., its structural formula.
Another preferred aspect of the present invention resides in a method for predicting material properties by an information processing device including an input device, a storage device, and a processor. When generating a first predictive model for predicting a first material property from first data including first feature values, the method executes the following steps: a first step of preparing a second predictive model that is to predict, from the first feature values, a second material property defined differently from the first material property; a second step of predicting the second material property by applying the first data to the second predictive model; and a third step of generating the first predictive model, taking the first feature values as a first explanatory variable, the second material property as a second explanatory variable, and the first material property as an objective variable.
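The three steps can be written compactly as follows. The symbols are introduced here only for illustration and do not appear in the original text: x denotes the first feature values, z the second material property, y the first material property, g the second predictive model, and f the first predictive model. The squared-error criterion in the third line is an assumption; any supervised learning criterion could take its place.

```latex
\begin{align}
  g &: x \mapsto z
    && \text{(first step: prepare the second predictive model)} \\
  \hat{z}_i &= g(x_i)
    && \text{(second step: apply the first data to } g\text{)} \\
  f &= \operatorname*{arg\,min}_{f'} \sum_{i \in \text{known}}
       \bigl( y_i - f'(x_i, \hat{z}_i) \bigr)^2
    && \text{(third step: learn the first predictive model)}
\end{align}
```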
It is possible to improve the accuracy of predicting material properties by making effective use of past data.
An embodiment is now described in detail with the aid of the drawings. However, the present invention should not be construed to be limited to the following description of the embodiment. Those skilled in the art will easily appreciate that a concrete configuration of the present invention may be modified without departing from the idea or spirit of the present invention.
In a configuration of the invention which will be described hereinafter, for identical parts or parts having like functions, identical reference numerals are used in common across different drawings and duplicated description of those parts may be omitted.
Multiple elements having the same or like functions, if any, may be assigned the same reference numeral with different subscripts and described. However, when it is not necessary to individualize those multiple elements, the subscripts may be omitted in describing them.
Notation of “first”, “second”, “third”, etc. herein is prefixed to identify components, but it is not necessarily intended to confine the components to a certain number, sequence, or contents. In addition, numbers to identify components are used on a per-context basis; a number used in one context does not always denote the same component in another context. Additionally, it is not precluded that a component identified by a number also functions as a component identified by another number.
In some cases, the position, size, shape, range, etc. of each component depicted in a drawing or the like may not represent its actual position, size, shape, range, etc. with the intention to facilitate understanding of the invention. Hence, the present invention is not necessarily to be limited to a position, size, shape, range, etc. disclosed in a drawing or the like.
The material property prediction device (101) also includes a material property predicting unit (113), which generates a material property predictive model and uses it to predict unmeasured material properties, and a material property predictive model DB (114), which stores material property predictive models.
The material property predicting unit (113) generates a material property predictive model by using feature values obtained from data of measured values of a material property from the material DB (112) and feature values obtained from a cross-task compatible feature value generating unit (115) and predicts an unknown property. The cross-task compatible feature value generating unit (115) generates new feature values from data in the material DB (112) and the material property predictive model DB (114). A material property prediction presenting unit (116) presents a result of a prediction made by the material property predicting unit (113) to the user (102).
In the present example, the material property prediction device (101) is assumed to be configured as an information processing device, such as a server, including an input device, an output device, a storage device, and a processing device. Computation, control, and other functions are implemented by having the processing device execute a program stored in the storage device, so that a defined process is carried out in cooperation with other hardware elements.
In
The configuration of
Material data inputting (S310) is a procedure of inputting experimental data (600) to the material property prediction device (101). The experimental data (600) is a data set storing data of materials for which an experiment was conducted and data of materials for which an experiment is going to be conducted. In response to this data, the material property prediction device executes a material DB update process (S311), thereby updating internally stored information.
In the prediction result viewing (S320), the material property prediction device executes a material property prediction presenting process (S321) in response to a request of the user (102) and presents a material property prediction display (322) which is a screen in which a result of predicted material properties is visualized.
In
The first step (S401) of the material DB update process (S311) of
A task ID (700) is an identification number that uniquely identifies a task. In Example 1, one file is handled as one task and, therefore, a task ID corresponds to the filename of a real data file. A task ID (700) should be assigned in a serial numbering scheme when registering in the material DB (112). If the correspondence between a file and a task is not fixed, registration may be made in the following manner: when registering in the material DB (112), the user is asked which task the file about to be uploaded corresponds to and inputs the correspondence. The format of the experimental data table is required to be the same for registered data and added data. The user can define the material property (702) and the experimental condition (704) as desired and can also freely set the number of material properties and experimental conditions.
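The registration rule described above can be sketched as follows. This is a minimal in-memory stand-in, assuming one uploaded file per task; the class name `MaterialDB`, its fields, and the column names are hypothetical and do not describe the actual material DB (112).

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class MaterialDB:
    """Hypothetical in-memory stand-in for the material DB (112)."""
    tables: dict = field(default_factory=dict)     # task ID -> list of records
    filenames: dict = field(default_factory=dict)  # task ID -> source filename
    columns: Optional[tuple] = None                # table format shared by all tasks
    next_task_id: int = 1

    def register(self, filename, records):
        """Store one file as one task and return its serial task ID."""
        formats = {tuple(r) for r in records}
        if len(formats) != 1:
            raise ValueError("records in one file must share one format")
        (cols,) = formats
        if self.columns is not None and cols != self.columns:
            raise ValueError("table format differs from registered data")
        self.columns = cols
        task_id = self.next_task_id
        self.tables[task_id] = [dict(r, task_id=task_id) for r in records]
        self.filenames[task_id] = filename
        self.next_task_id += 1
        return task_id

db = MaterialDB()
t1 = db.register("task_a.csv", [{"structure": "CCO", "humidity": 40.0, "property": 1.2}])
t2 = db.register("task_b.csv", [{"structure": "CC", "humidity": 30.0, "property": None}])
```

Rejecting a file whose columns differ from the registered format reflects the requirement that registered and added data share the same table format.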
A feature of the present example is improving the accuracy of predicting material properties by using data of existing tasks even in a situation where there are few data pieces. In an initial phase of a material development process, the amount of available data is very small. Before explaining a concrete example, a concept of the present example is described.
In the example of
A process that uses information about past tasks as “information for creating feature values” is described with
The process then predicts the material property A by applying the structural formulas in the data of the past task B (903) to the predictive model (902). The process adds the material property A to the data of the past task B, thus generating a new data set (904). If the same structural formula as in the past task B is included in the past task A, its material property in the past task A may be added as is to the new data set. This material property A corresponds to cross-task compatible feature values.
Upon having obtained the new data set (904), the process generates a predictive model (905) to predict a material property B, taking known data of the material property B (item Nos. 1, 2, and 3) in the data set as teacher data. At this time, the explanatory variables are the structural formulas, the experimental condition (humidity), and the material property A, and the objective variable is the material property B. The predictive model (905) can be generated through known supervised machine learning.
The process inputs data (item No. 4) for which the material property B should be predicted to the generated predictive model (905) and obtains the material property B. By adding the material property A as new feature values (cross-task compatible feature values), it can be expected to improve the prediction accuracy in comparison with when the past task B data is used as it is. This is considered as effective particularly when there is a correlation between the material properties A and B.
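The concept can be sketched end to end in a few lines. This is an illustrative sketch under several assumptions, not the embodiment itself: the data values are invented, `featurize` is a hypothetical descriptor that merely counts characters in a SMILES-like string, and a simple k-nearest-neighbor average stands in for whatever supervised learner is actually used.

```python
def featurize(formula):
    """Hypothetical descriptor: counts of 'C' and 'O' in a SMILES-like string."""
    return [formula.count("C"), formula.count("O")]

def fit_knn(X, y, k):
    """Return a predictor averaging the targets of the k nearest training rows."""
    def predict(x):
        ranked = sorted(zip(X, y),
                        key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))
        nearest = ranked[:k]
        return sum(t for _, t in nearest) / len(nearest)
    return predict

# Past task A: structural formula -> material property A (invented values).
task_a = [("CCO", 1.0), ("CCCO", 1.5), ("CCCCO", 2.0), ("CC", 0.5)]
model_902 = fit_knn([featurize(s) for s, _ in task_a],
                    [p for _, p in task_a], k=1)

# Past task B: item No. 4 has an unknown material property B.
task_b = [
    {"item": 1, "formula": "CCO",   "humidity": 30.0, "property_b": 10.0},
    {"item": 2, "formula": "CCCCO", "humidity": 30.0, "property_b": 20.0},
    {"item": 3, "formula": "CC",    "humidity": 50.0, "property_b": 5.0},
    {"item": 4, "formula": "CCCO",  "humidity": 30.0, "property_b": None},
]

# New data set (904): task B plus predicted property A as an extra feature.
dataset_904 = [dict(r, property_a=model_902(featurize(r["formula"])))
               for r in task_b]

def row(r):
    """Explanatory variables: structure descriptor, humidity, property A."""
    return featurize(r["formula"]) + [r["humidity"], r["property_a"]]

known = [r for r in dataset_904 if r["property_b"] is not None]
model_905 = fit_knn([row(r) for r in known],
                    [r["property_b"] for r in known], k=2)

target = next(r for r in dataset_904 if r["property_b"] is None)
predicted_b = model_905(row(target))
```

The features are left unscaled here for brevity, so humidity dominates the distance; a real learner would normally standardize the inputs or use a scale-insensitive method.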
With the understanding of the concept discussed above, a flow of a concrete process for prediction result viewing is described.
The material property prediction presenting process (S321) for prediction result viewing (S320) is described with
First, the material property prediction presenting unit (116) presents the material property prediction display (322) to the user (102) and receives the specification of an experimental data table as a target of property prediction (S1001). At this time, a task ID is used to specify the designation of an experimental data table stored in the material DB (112). Here, it is assumed that experimental data has already been stored in the material DB (112).
In a drop-down box (1101) in the figure, the designation of an experimental data table is displayed as a candidate. When the user specifies a task ID and presses the predicted value update button (1102), the material property prediction presenting unit (116) sends a command to execute interpolation by a predicted value for blank data of material property (702) in the records of the experimental data table (
Upon receiving the above command to execute interpolation from the material property prediction presenting unit (116), the material property predicting unit (113) retrieves the data of the experimental data table specified by the task ID (700) from the material DB (112) (S1002). Also, in the screen (1104) in
Data retrieved in the processing step (S1002) as described with the flowchart of
In the above description, it is assumed that the predictive model (902) has already been created and is called by the task ID (700) from the material property predictive model DB (114). If the corresponding predictive model (902) does not exist in the material property predictive model DB (114), learning and creating the predictive model (902) should be executed, assuming the material structural formulas in the data of the past task A as the explanatory variables and the material property of a known material as the objective variable, as illustrated in
Then, the material property predicting unit (113) generates data for predicting material properties (S1004). This processing corresponds to predicting the material property A by applying the structural formulas in the data of the past task B (903) to the predictive model (902) and adding the material property A to the data of the past task B, thus generating a new data set (904). At this time, the cross-task compatible feature value generating unit (115) executes prediction of the material property A (cross-task compatible feature values) by using the predictive model (902) retrieved in the processing step (S1003).
The data for predicting material properties includes feature values (1202, 1203) created through the predictive model (902) related to any other task, i.e., cross-task compatible feature values. The description with regard to
From the data for predicting material properties, excluding records in which the material property (702) is unmeasured, i.e., blank, the material property predicting unit (113) assigns the items other than task ID (700), experiment ID (701), and material property (702) to the explanatory variables and the material property (702) to the objective variable, executes a publicly known regression analysis, obtains a prediction function, and learns a predictive model (905) (S1005). The created predictive model (905) is stored into the material property predictive model DB (114) together with the task ID of the data from which the predictive model (905) was generated.
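The assignment of columns in this step can be sketched as below. The column names are hypothetical stand-ins for task ID (700), experiment ID (701), and material property (702), and the records are invented; the sketch only shows how measured records are separated from blank ones and how the explanatory and objective variables are picked out before any regression algorithm is applied.

```python
# Hypothetical column names standing in for task ID (700),
# experiment ID (701), and material property (702).
ID_COLUMNS = ("task_id", "experiment_id")
TARGET = "material_property"

def split_for_learning(records):
    """Separate measured records (for learning) from blank ones (to predict)."""
    feature_names = [c for c in records[0]
                     if c not in ID_COLUMNS and c != TARGET]
    measured = [r for r in records if r[TARGET] is not None]
    blank = [r for r in records if r[TARGET] is None]
    X = [[r[c] for c in feature_names] for r in measured]  # explanatory variables
    y = [r[TARGET] for r in measured]                       # objective variable
    return feature_names, X, y, blank

records = [
    {"task_id": 2, "experiment_id": 1, "humidity": 30.0,
     "feature_a": 1.0, "material_property": 10.0},
    {"task_id": 2, "experiment_id": 2, "humidity": 50.0,
     "feature_a": 0.5, "material_property": 5.0},
    {"task_id": 2, "experiment_id": 3, "humidity": 30.0,
     "feature_a": 1.5, "material_property": None},
]

feature_names, X, y, blank = split_for_learning(records)
```

The `blank` records returned here are the ones whose material property would later be filled in by the learned prediction function.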
Given that the prediction function is written as y=f (x1, x2, . . . ), where y is the objective variable and x1, x2, . . . are the explanatory variables, this procedure means defining the function form of f, i.e., defining x1, x2, . . . so that y can be predicted. In the case of the present example, supposing the use of the data for predicting material properties in
This learning corresponds to generating the predictive model (905) in the bottom row of
Algorithms for the regression analysis may be those that are publicly known; regression trees, LASSO, random forests, support vector regression, Gaussian process regression, neural networks, etc. can be used. Note that the number of explanatory variables is increased in the present example, and regression trees and random forests are better suited than support vector regression to an increased number of explanatory variables. In particular, high prediction accuracy can be expected with random forests, which are nonlinear.
After thus generating the predictive model (905), the material property predicting unit (113) selects a record in which material property (702) is unmeasured, i.e., blank and computes a predicted value of the material property (702) using the foregoing prediction function y=f (x1, x2, . . . ) (S1006).
The computed predictive value is displayed by the material property prediction presenting unit (116) in the screen on the monitor (205), as illustrated in
Although structural formulas are used when creating feature values about any other task in the example discussed hereinbefore, composition data or other data may be used as long as the data is common across the task data. Additionally, a method in which prediction can be made using structural formulas as such is also publicly known, and the scheme is the same in that case as well.
According to the example described hereinbefore, using data stored when material properties were predicted in any other past task, a model is created that is compatible with a prediction that is executed currently and the accuracy is improved by increasing the number of explanatory variables through the model. Although, e.g., a task (the past task B in
Number | Date | Country | Kind |
---|---|---|---|
2019-169651 | Sep 2019 | JP | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2020/031267 | 8/19/2020 | WO |