The present invention relates to a technique for supporting an experiment in materials science and the like.
With the development of statistical processing technologies relating to data analysis, there is an increasing demand for data analysis in materials science as well. In particular, in the field of materials science, a method that is called screening is known in which a candidate for a next experiment is selected based on known data in order to efficiently develop a new material.
As a screening method, various experimental data are input to an information system, machine learning is performed to build a model that predicts experimental results, and screening is performed based on the predictions of the model. One known prediction method performs regression analysis using various parameters relating to material design as arguments to obtain a function that returns material properties.
PTL 1: Japanese Patent Application Laid-Open No. 2004-086892
PTL 2: Japanese Unexamined Patent Application Publication (Translation of PCT Application) No. 2017-520868
PTL 3: Japanese Patent Application Laid-Open No. 2008-081435
In material development, improving the accuracy of material properties prediction is expected to make it possible to identify promising candidates for a new material more accurately and to develop materials efficiently by omitting unnecessary experiments.
In regression analysis, a variable corresponding to an argument of a function is called an explanatory variable, and a value corresponding to a return value of the function is called an objective variable. In material properties prediction, properties of a material are used as an objective variable, and an explanatory variable that indicates a feature of the material is selected such that the material properties can be predicted. The accuracy of the prediction varies depending on the selection of the explanatory variable. Therefore, it is important to prepare a variety of methods for generating explanatory variables such that the methods can be used to predict various material properties.
In Patent Literature 1, material properties are predicted using a mixing ratio of components as an explanatory variable. This method can be used to predict properties of a material obtained by mixing a plurality of substances. However, this method is not suitable for predicting properties of a single substance.
Patent Literature 2 discloses a method for dividing a space around a molecule into spatial lattices (voxels), expressing the three-dimensional structure of the molecule using the number of atoms in each voxel, and using it as an explanatory variable. According to this method, it is possible to predict material properties based on the three-dimensional shape of a single molecule.
However, the method using voxels leaves freedom in how the coordinate system is determined. That is, there is no established method for determining where in a molecule the origin is to be placed, in which direction the x axis is to point, and the like. In other words, many different voxel representations may exist for the same substance.
The invention described in Patent Literature 2 attempts to incorporate this freedom into the regression analysis by generating a large amount of data with different origins and different angles. However, a large amount of duplicate data is then input, and the calculation time and the like increase significantly. In addition, depending on the technique used for the regression analysis, it is not clear whether the regression analysis algorithm can appropriately incorporate the freedom into a model, and there remains a concern that the accuracy of prediction will instead be reduced. Furthermore, even when the prediction can be performed well, there is a problem that reverse calculation cannot be performed. For example, when a condition under which the highest predicted value of properties of a material is obtained needs to be found, it may be sufficient to search for the maximum value of a function that returns the properties of the material. However, even when the explanatory variables of the voxels at that maximum are obtained, the structure of the corresponding molecule cannot be easily inferred.
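The dependence of a voxel representation on the coordinate system can be illustrated with a minimal sketch. The helper names `voxelize` and `shift`, the voxel size, and the two-atom toy molecule are assumptions made only for this illustration and do not come from Patent Literature 2.

```python
import math
from collections import Counter

def voxelize(coords, size=1.0):
    """Count atoms per voxel; a voxel is indexed by floor(coordinate / size)."""
    return Counter(tuple(math.floor(c / size) for c in p) for p in coords)

def shift(coords, offset):
    """Move the origin by translating every atom by the same offset."""
    return [tuple(c + o for c, o in zip(p, offset)) for p in coords]

# Hypothetical two-atom molecule (arbitrary units).
molecule = [(0.5, 0.5, 0.5), (1.5, 0.5, 0.5)]
moved = shift(molecule, (0.6, 0.0, 0.0))

# The same molecule yields different voxel counts after the origin moves ...
same_voxels = voxelize(molecule) == voxelize(moved)
# ... while the interatomic distance is unaffected.
same_distance = math.isclose(math.dist(*molecule), math.dist(*moved))
```

In this toy case the voxel occupancy changes although the molecule itself is unchanged, whereas a quantity built from interatomic distances stays the same.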
As a screening method based on the three-dimensional structure of a molecule, a method for evaluating a similarity with a known molecule as disclosed in Patent Literature 3 is also known. Since this method is based on another molecule, the effect of the freedom of the coordinate system of a molecule alone is small, but there is a problem that inverse calculation is still difficult and the method cannot be applied unless a sufficient number of molecules are known.
Therefore, it is desirable to define the spatial structure of a molecule without allowing freedom with respect to the selection of a coordinate system, and to predict material properties based on the three-dimensional structure of the molecule.
According to a preferable aspect of the present invention, a material properties prediction system that predicts properties of a material includes a three-dimensional molecular structure calculation unit having a function of calculating positional coordinates of atoms constituting a molecule from a structural formula of the material; a spatial structure feature quantity calculation unit having a function of selecting three atoms to form a triangle based on the positional coordinates of the atoms calculated by the three-dimensional molecular structure calculation unit and calculating, as a spatial structure feature quantity, distances between the three atoms and another atom; and a material properties prediction unit that predicts the material properties using, as an explanatory variable, the spatial structure feature quantity generated by the spatial structure feature quantity calculation unit.
According to another preferable aspect of the present invention, an information processing method includes performing a three-dimensional molecular structure calculation process of receiving a structural formula of a material and calculating positions of atoms constituting a molecule from the structural formula of the material; and performing a spatial structure feature quantity calculation process of selecting three atoms to form a triangle based on the calculated positions of the atoms, and calculating distances between the three atoms and another atom to obtain a spatial structure feature quantity.
It is possible to define the spatial structure of a molecule without allowing freedom with respect to the selection of a coordinate system and to predict material properties based on the three-dimensional structure of the molecule.
Embodiments are described in detail with reference to the drawings. However, the present invention is not construed as being limited to the following description of the embodiments. It is easily understood by those skilled in the art that a specific configuration thereof can be modified without departing from the idea or gist of the present invention.
For configurations according to the present invention described below, the same reference signs are used in common for the same components or components having the same functions between different drawings, and a duplicate description may be omitted.
When multiple elements having the same function or having similar functions are present, they may be explained by adding different subscripts to the same reference sign. However, when it is not necessary to distinguish between the multiple elements, the subscripts may be omitted for explanation.
Notations such as “first”, “second”, and “third” in the present specification and the like are added to identify components, and do not necessarily limit the number, order, or contents thereof. In addition, numbers for identifying components are used for each context, and numbers used in one context do not necessarily indicate the same configuration in other contexts. Furthermore, this does not prevent a component identified by a certain number from functioning as a component identified by another number.
Positions, sizes, shapes, ranges, and the like of configurations illustrated in the drawings and the like may not represent the actual positions, sizes, shapes, ranges, and the like in order to facilitate understanding of the present invention. Therefore, the present invention is not necessarily limited to the positions, the sizes, the shapes, the ranges, and the like disclosed in the drawings and the like.
In the present example, the material properties prediction device (101) is constituted by an information processing device such as a server including an input device, an output device, a storage device, and a processing device. Functions such as calculation and control are implemented by executing a program stored in the storage device by the processing device to cause a predetermined process and other hardware to collaborate with each other.
The experimental data reception unit (111), the three-dimensional molecular structure calculation unit (113), the spatial structure feature quantity calculation unit (114), the material properties prediction unit (116), and the material properties prediction presentation unit (118), which are illustrated in
The configuration illustrated in
The material data entry (S310) is a procedure for entering, in the material properties prediction device (101), experimental data (600) that is a data set storing data of a material for which an experiment has been conducted and data of a material for which an experiment will be conducted. The material properties prediction device (101) performs a material DB update process (S311) based on the data to update information stored in the material DB (112).
In the prediction result viewing (S320), the material properties prediction device (101) executes a material properties prediction presentation process (S321) in accordance with a request from the user (102) to present a material properties prediction display (322) that is a screen obtained by visualizing the result of the material properties prediction.
In the first step (S401) of the material DB update process (S311) illustrated in
However, for example, the coordinate values (1.0, 1.2, 5.0) of the carbon atom (901) and the coordinate values (7.0, 3.7, 5.0) of a carbon atom (902) are not described and thus need to be calculated. In one known method, each atom is first placed at a provisional position based on the van der Waals radius or the like, and the positions are then optimized such that bonding angles and the like take appropriate values. Various methods are known for this calculation, and any of them may be used as long as a certain degree of accuracy can be obtained.
The coordinate values obtained as results of this calculation are relative coordinates, and the coordinate system varies depending on the molecule. For this, there is a method of creating a certain unified standard by using the center of gravity of the molecule or the like. However, the present example has an advantage that this standard for a coordinate system may not be required and any standard may be used.
For each material structural formula (703) of the experimental data table, the positions of atoms are calculated and three-dimensional molecular structure data (800) is obtained as a result in which the positions are described in appropriate order. In this case, a corresponding experiment ID (701) needs to be associated with the experimental data (600).
The third step of the material DB update process (S311) is the spatial structure feature quantity generation process (S403) of calculating a feature quantity from the three-dimensional molecular structure data (800).
In the present example, carbon atoms are prioritized as the atoms forming the triangle. This is due to the fact that, when the material is organic, the basis of the structure is carbon atoms. It is not essential to select a carbon atom, and atoms that cause the accuracy to be high may be selected as appropriate. In fact, since it is inferred that the atoms to be used vary depending on properties to be predicted, it is desirable that the user can configure a setting as appropriate.
A circulation direction is defined by adding reference atomic numbers (reference numbers) or the like for the three atoms forming the triangle serving as the standard in the three-dimensional structure, as described later with reference to
As described above, since a method for selecting atoms forming a triangle may vary depending on material properties to be predicted, atoms that cause the highest accuracy may be selected via calculation of a plurality of combinations.
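One possible way to select the reference atoms is sketched below, assuming that the triangle with the largest area among triples of the preferred element is chosen. The function names, the `(element, (x, y, z))` input layout, and the fallback to all atoms when fewer than three preferred atoms exist are assumptions for illustration only.

```python
import math
from itertools import combinations

def triangle_area(a, b, c):
    """Area of the triangle spanned by three 3D points (half the cross-product norm)."""
    ab = [q - p for p, q in zip(a, b)]
    ac = [q - p for p, q in zip(a, c)]
    cross = (ab[1] * ac[2] - ab[2] * ac[1],
             ab[2] * ac[0] - ab[0] * ac[2],
             ab[0] * ac[1] - ab[1] * ac[0])
    return 0.5 * math.sqrt(sum(x * x for x in cross))

def select_reference_atoms(atoms, preferred="C"):
    """Return the indices of the three atoms forming the largest-area triangle.

    atoms: list of (element, (x, y, z)). Atoms of the preferred element are
    tried first; falling back to all atoms is an assumption of this sketch.
    """
    pool = [i for i, (element, _) in enumerate(atoms) if element == preferred]
    if len(pool) < 3:
        pool = list(range(len(atoms)))
    return max(combinations(pool, 3),
               key=lambda t: triangle_area(*(atoms[i][1] for i in t)))

# Hypothetical molecule: four carbon atoms and one oxygen atom.
atoms = [("C", (0.0, 0.0, 0.0)), ("C", (2.0, 0.0, 0.0)),
         ("C", (0.0, 2.0, 0.0)), ("C", (1.0, 0.0, 0.0)),
         ("O", (5.0, 5.0, 5.0))]
reference = select_reference_atoms(atoms)
```

The `preferred` argument corresponds to the user-configurable setting mentioned above; an exhaustive search over triples is used here only because it is simple, and it becomes costly for large molecules.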
Next, other atoms are rearranged in accordance with a predetermined standard (S1002). In this case, as this standard, the shortest linear distances from the center of gravity of the triangle formed by the reference atoms are calculated and the atoms are arranged in the order from the shortest distance. Alternatively, the atoms can be arbitrarily arranged based on the order determined based on only relative distances between the atoms. Identification numbers are assigned to the other atoms based on the arrangement order.
Next, the linear distances between each of the other atoms and the three reference atoms are calculated and used as a feature quantity (S1003). As described above, using atoms that form a triangle with the largest area as the reference atoms makes these linear distances long, which reduces the error.
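Steps S1002 and S1003 can be sketched as follows, assuming the ordering standard is the distance from the center of gravity of the reference triangle. The function names and the sample coordinates are hypothetical.

```python
import math

def centroid(points):
    """Center of gravity of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))

def distance_features(coords, ref_idx):
    """Order the non-reference atoms and compute their distances to the references.

    coords: list of (x, y, z); ref_idx: indices of the three reference atoms.
    Returns a list of (atom_index, (d1, d2, d3)), nearest to the triangle
    centroid first.
    """
    refs = [coords[i] for i in ref_idx]
    g = centroid(refs)
    others = [i for i in range(len(coords)) if i not in ref_idx]
    others.sort(key=lambda i: math.dist(coords[i], g))                  # S1002
    return [(i, tuple(math.dist(coords[i], r) for r in refs))          # S1003
            for i in others]

# Hypothetical coordinates: atoms 0 to 2 are the reference atoms.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 3.0, 0.0),
          (1.0, 1.0, 4.0), (1.0, 1.0, 1.0)]
rows = distance_features(coords, (0, 1, 2))
```

Here atom 4 is listed before atom 3 because it lies closer to the centroid (1, 1, 0) of the reference triangle.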
When signs of d1, d2, and d3 are determined to be positive in the case where the circulation direction of the triangle formed by the three reference atoms (1101), (1102), and (1103) is the clockwise direction as viewed from the target atom (1104) side, and are determined to be negative in the case where the circulation direction is the counterclockwise direction as in
In the present example, these values are arranged in a row to indicate a spatial structure feature quantity indicating the spatial structure of the molecule. These values do not have dependency on the orientation of the coordinate system and the position of the origin that relate to the coordinate values of the atoms within the molecule. In addition, these values have a feature suitable for material properties prediction in which the molecular structure can be reversely calculated from the values when the values are determined.
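The independence from the coordinate system can be checked numerically with a minimal sketch. It assumes that the circulation direction is evaluated with a scalar triple product and that an atom lying in the plane of the triangle is given a positive sign; both are assumptions of this illustration, as are the function names.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def signed_distances(refs, p):
    """Distances from p to the three reference atoms, with a common sign.

    A positive triple product means that 1 -> 2 -> 3 appears counterclockwise
    from the side of p, which is given a negative sign; otherwise the sign is
    positive (the coplanar case is an assumption of this sketch).
    """
    r1, r2, r3 = refs
    triple = dot(cross(sub(r2, r1), sub(r3, r1)), sub(p, r1))
    sign = -1.0 if triple > 0 else 1.0
    return tuple(sign * math.dist(p, r) for r in refs)

def rigid_motion(p, angle, shift):
    """Rotate p about the z axis by angle, then translate it by shift."""
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = p
    return (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])

refs = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
above, below = (0.2, 0.2, 1.0), (0.2, 0.2, -1.0)

# Express the same molecule in a different coordinate system.
move = lambda q: rigid_motion(q, 0.7, (3.0, -2.0, 5.0))
before = signed_distances(refs, above)
after = signed_distances([move(r) for r in refs], move(above))
unchanged = all(math.isclose(a, b) for a, b in zip(before, after))
```

The two atoms `above` and `below` have the same three distances to the reference atoms and are distinguished only by the sign, which is why the sign is needed when the structure is to be recovered from the feature quantity.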
After that, as the foregoing spatial structure feature quantity, the distances between each atom and the reference atoms are described (1207), (1208), and (1209). In this case, the number of items is set based on the molecule with the largest number of atoms among the cases stored in the material DB (112), and for a molecule that has fewer atoms, 0 is entered in each item for which no corresponding atom is present. The distances may be expressed in any unit; angstroms are used in this example.
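The zero filling described above can be sketched as follows. The column order (d1, d2, d3 for the first atom, then for the second atom, and so on), the function name, and the sample values are assumptions for illustration.

```python
def pad_feature_rows(molecules):
    """Build fixed-width feature rows, filling absent atoms with 0.

    molecules: dict mapping an experiment ID to a list of (d1, d2, d3)
    tuples, one per non-reference atom. Every row is padded to the width
    required by the molecule with the most atoms.
    """
    width = max(len(rows) for rows in molecules.values())
    table = {}
    for exp_id, rows in molecules.items():
        flat = [d for row in rows for d in row]
        flat += [0.0] * (3 * (width - len(rows)))
        table[exp_id] = flat
    return table

# Hypothetical distances in angstroms.
molecules = {
    "E1": [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)],
    "E2": [(1.5, 2.5, 3.5)],
}
table = pad_feature_rows(molecules)
```

With these inputs, the row for "E2" is filled with three zeros so that both rows have six items.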
By the foregoing process, new experimental data can be added to the material DB (112). That is, the procedure for the material data entry (S310) is completed.
The material properties prediction presentation process (S321) in the prediction result viewing (S320) is described using
Upon receiving the instruction to perform the interpolation from the material identification prediction presentation unit (117), the material properties prediction unit (116) acquires data of the designated experimental data table from the material DB (112) (S1302) and uses the experiment ID (701) thereof to acquire a corresponding record from the spatial structure feature quantity table (1200) (S1303). The material properties prediction unit (116) associates the data with the record, thereby generating data to be used for material properties prediction (S1304).
The material properties prediction unit (116) removes, from the data for material properties prediction, any record in which the material properties (702) have not been measured or are blank, sets the items excluding the experiment ID (701) and the material properties (702) as explanatory variables, sets the material properties (702) as an objective variable, and performs known regression analysis to obtain a prediction function (S1305). This procedure means that, when the prediction function is expressed as y=f(x), y is an objective variable, x is an explanatory variable, and the form of the function f is defined such that y can be predicted when x is determined. After generating the regression model, the material properties prediction unit (116) selects the data in which the material properties (702) have not been measured or are blank, and uses the foregoing prediction function y=f(x) to calculate a predicted value of the material properties (702) (S1306).
As a method for building the prediction function f, a known multivariate regression analysis method can be used. For example, a known high-precision nonlinear regression method, such as a regression tree, random forests, support vector regression, Gaussian process regression, or a neural network, can be used as long as the method is a regression analysis method that takes multiple variables as arguments. As described above, this prediction result is reflected in the screen (1403) by the material identification prediction presentation unit (117) (S1307). In the present example, only the spatial structure feature quantity and the experimental condition are used as the explanatory variables, but other quantities (for example, a molecular weight or an electric charge) may also be calculated and used. In addition, when a technique capable of performing prediction using sequential information, such as a known recurrent neural network, is used, it may be possible to perform the prediction without using the data in which the distance d1 (1207) between an atom and the reference atom 1 is 0, and high accuracy may be obtained.
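The flow of S1305 and S1306 can be sketched as below. A nearest-neighbor regressor is used here only as a simple stand-in for the "known regression analysis"; it is not the method of the present example, and the record layout and function names are hypothetical.

```python
import math

def split_records(records):
    """Separate records with measured properties from unmeasured ones."""
    measured = [r for r in records if r["property"] is not None]
    unmeasured = [r for r in records if r["property"] is None]
    return measured, unmeasured

def fit_knn(measured, k=1):
    """Return a prediction function y = f(x) (stand-in regressor).

    f averages the measured property of the k records whose explanatory
    variables are nearest to x.
    """
    def f(x):
        nearest = sorted(measured, key=lambda r: math.dist(r["x"], x))[:k]
        return sum(r["property"] for r in nearest) / len(nearest)
    return f

# Hypothetical records: x holds the explanatory variables of one experiment.
records = [
    {"id": "E1", "x": [1.0, 1.5, 2.0], "property": 10.0},
    {"id": "E2", "x": [3.0, 3.5, 4.0], "property": 20.0},
    {"id": "E3", "x": [1.1, 1.4, 2.1], "property": None},
]
measured, unmeasured = split_records(records)             # S1305: training data
f = fit_knn(measured)                                     # S1305: prediction function
predictions = {r["id"]: f(r["x"]) for r in unmeasured}    # S1306
```

In practice, `fit_knn` would be replaced by whichever multivariate regression method gives sufficient accuracy; the split into measured and unmeasured records stays the same.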
In the foregoing example, it is possible to incorporate the spatial structure of a molecule into prediction to perform evaluation without performing special post-processing on the spatial structure for screening of experimental design. Therefore, the accuracy of prediction is expected to improve.
Example 2 has a feature in which not only is a predicted value calculated for material properties that have not been measured, but a condition under which optimal material properties are predicted is also searched for, displayed on a screen, and used for experimental design.
The result of this search is displayed on a material properties prediction result screen (S1702).
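The search in Example 2 can be illustrated with a minimal sketch that ranks a finite set of candidates by their predicted value. The candidate list, the toy prediction function, and the function names are assumptions; how the search space is actually generated is not shown here.

```python
def search_best_candidates(predict, candidates, top_n=1):
    """Return the top_n candidates with the highest predicted property."""
    ranked = sorted(candidates, key=lambda c: predict(c["x"]), reverse=True)
    return ranked[:top_n]

def toy_predict(x):
    """Hypothetical stand-in for the prediction function y = f(x)."""
    return -sum((v - 2.0) ** 2 for v in x)

# Hypothetical candidates described by their explanatory variables.
candidates = [
    {"id": "C1", "x": [1.0, 1.0, 1.0]},
    {"id": "C2", "x": [2.0, 2.1, 1.9]},
    {"id": "C3", "x": [3.5, 2.0, 2.0]},
]
best = search_best_candidates(toy_predict, candidates)
```

Because the explanatory variables are distances to the reference atoms, the feature vector of the selected candidate can be related back to a molecular structure, which is the property relied on for experimental design.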
According to Example 2, a candidate other than the candidate compounds given by the user (102) can be selected, which is expected to increase the possibility of finding a compound that the user has not noticed.
According to the examples described above, to perform prediction based on the spatial structure of a molecule, a feature quantity that has a one-to-one correspondence with the spatial structure of the single molecule is used without allowing freedom with respect to the selection of a coordinate system such that inverse calculation is possible. Therefore, it is possible to predict material properties by incorporating the three-dimensional structure of the molecule into the prediction, which leads to more appropriate screening.
That is, in prediction evaluation for screening of experimental design, it will be possible to perform more accurate prediction by incorporating the three-dimensional structure of a molecule into the prediction. In addition, since the three-dimensional structure of a molecule having a specific predicted value can be inversely calculated, it is possible to estimate the shape of a molecule having desirable properties. As a result, it will be easier to make experimental design, and it will be possible to develop a good material by conducting a small number of experiments.
As described above in the examples, the inventors have paid attention to a problem that, when a feature quantity based on the spatial structure of a molecule is used in order to improve the accuracy of predicting material properties, a coordinate system in the molecule is not uniquely determined, the shape of the molecule cannot be inversely calculated from the feature quantity, and thus the molecule corresponding to an optimal solution is difficult to understand. Therefore, the examples present methods for selecting the three most important atoms in a molecule and using linear distances from those atoms as the feature quantity representing the three-dimensional structure of the molecule. As a result, it is possible to define the spatial structure of a molecule without allowing freedom with respect to the selection of a coordinate system and to predict material properties based on the three-dimensional structure of the molecule.
Number | Date | Country | Kind |
---|---|---|---|
2019-162137 | Sep 2019 | JP | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2020/031426 | 8/20/2020 | WO | 00 |