This application claims the benefit of priority of the prior Japanese Patent Application No. 2023-177805, filed on Oct. 13, 2023, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein are related to a storage medium stored with a molecular dynamics computation program, a molecular dynamics computation method, and a molecular dynamics computation device.
Hitherto, simulations have been performed of the movements of atoms and molecules using molecular dynamics. For example, there is a proposal for an analysis method to analyze composite materials with a computer using molecular dynamics and to appropriately reproduce the behavior of a particle model inside an analysis model. In this analysis method, a machine learning model including a particle model for training modeling a substance in a composite material is employed to perform computation of behavior using molecular dynamics, and machine learning is performed on a prediction module using training data including at least information about positions in the particle model for training before and after lapse of a specific period of time. Such a prediction module takes a particle model for use in analysis from out of analysis models for composite materials as a particles-of-interest model, and predicts information about positions in the particles-of-interest model after lapse of a specific period of time using information about the relative positions in a neighboring-particles model, at the periphery of the particles-of-interest model, with respect to the particles-of-interest model. Information about the positions in the particles-of-interest model is found thereby. In the prediction by the prediction module, a prediction module is prepared separately for a first analysis particles model and a second analysis particles model that respectively model first and second substances in the composite materials.
There is also a proposal, for example, for a polymer material interaction potential determination method for deciding an interaction potential allowing a high accuracy simulation to be performed. In such a determination method, an all-atom filler model that models a filler with an all-atom model is set, and an all-atom polymer model that models molecule chains with an all-atom model is set so as to adjoin the all-atom filler model across a first interface. In this method, a structural relaxation of the all-atom filler model and the all-atom polymer model is computed based on molecular dynamics computation and a first interaction energy is found at the first interface between the all-atom filler model and the all-atom polymer model. Then based on the first interaction energy, an interaction parameter is determined that indicates strength of interaction potential at the interface between the filler and a molecule chain.
Japanese Patent Application Laid-Open (JP-A) No. 2021-71803
JP-A No. 2020-42434
According to an aspect of the embodiments, a non-transitory recording medium stores a program that is executable by a computer to perform a molecular dynamics computation process, the process including: executing machine learning of a first force field for predicting energy for an input structure using, as training data, a structure of a coarse-grained model resulting from coarse-graining an all-atom model sampled by molecular dynamics computation based on an all-atom force field and an energy corresponding to the all-atom model structure; and computing molecular dynamics of a coarse-grained model based on a second force field that combines the first force field, after the first force field has been subjected to machine learning, with an energy based on a coarse-grained model structure.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
Explanation follows regarding an example of an exemplary embodiment according to technology disclosed herein, with reference to the drawings.
First, prior to giving a detailed description of the present exemplary embodiment, general issues that arise with all-atom molecular dynamics (AA-MD) and coarse-grained molecular dynamics (CG-MD) will be described.
Molecular dynamics is the computation of various forces acting between atoms, such as bonding forces, intermolecular forces, electrostatic interactions, and the like, followed by computation (simulation) of the movement of atoms and molecules based on these computed forces and on equations of motion, by repeating processing to update the position of each atom in a time evolution.
For example, in the field of drug discovery, there is a need to analyze a functional expression mechanism, structure change dynamics, and the like of a target protein in vivo. These analyses are accordingly performed by executing AA-MD that takes into consideration all of the atoms configuring a biomolecule. However, there are a huge number of degrees of freedom in AA-MD, and the computation cost to execute AA-MD is high, making it difficult to execute AA-MD for a long period of time (in the order of milliseconds, for example). Moreover, as illustrated in
Thus, an all-atom model, in which a particle represents a single atom, is expressed by a coarse-grained model in which plural atoms are united as specific units represented by a single particle. Molecular dynamics computation is able to be executed over a long time at low computational cost using CG-MD, in which molecular dynamics are computed using a coarse-grained model. Moreover, although the computed energy profile (potential) of a coarse-grain force field is, for example, as illustrated in
For example, as illustrated in
However, there is an issue with CG-MD in that reference structures obtained by experimentation or the like are strongly stabilized and sampled structures are constrained to the reference structures. More specifically, there is an issue that dynamics obtained with CG-MD are based on a force field having a high degree of arbitrariness originating from the reference structures, and are inferior compared to those of AA-MD. For example, in the example of
Thus, an object of the present exemplary embodiment is to reproduce global structural changes with good efficiency by efficient sampling of diverse structures, by executing CG-MD based on a force field that incorporates an AI force field that has been subjected to machine learning based on an all-atom model. Detailed description follows regarding the present exemplary embodiment.
As illustrated in
The generation section 12 generates training data including a structure of a coarse-grained model resulting from coarse-graining an all-atom model sampled by computing molecular dynamics based on an all-atom force field, and including an energy corresponding to the structure of the all-atom model.
Specifically, the generation section 12 executes AA-MD and takes plural samplings of structures of an all-atom model and energies corresponding to these structures. Examples of AA-MD methods that may be employed include, for example, multicanonical molecular dynamics (McMD), replica exchange molecular dynamics (REMD), metadynamics, and the like, either employed singly or in a combination thereof.
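By way of a non-limiting illustration of one of the enhanced sampling methods named above, the following Python sketch shows the standard Metropolis criterion used in REMD to decide whether two replicas exchange temperatures. The function names and the choice of kcal/mol units are assumptions made for this illustration and are not part of the exemplary embodiment.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def exchange_probability(energy_i, temp_i, energy_j, temp_j):
    """Metropolis acceptance probability for swapping replicas i and j.

    p = min(1, exp[(beta_i - beta_j) * (E_i - E_j)]), beta = 1 / (k_B * T).
    """
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_exchange(energy_i, temp_i, energy_j, temp_j, rng=random):
    """Return True when the swap is accepted (hypothetical helper)."""
    p = exchange_probability(energy_i, temp_i, energy_j, temp_j)
    return rng.random() < p
```

A swap is always accepted when the lower-temperature replica currently holds the higher energy, and is otherwise accepted with a probability that decays exponentially with the energy gap.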
Moreover, the generation section 12 may perform, on the structures and energies obtained by AA-MD, at least one correction selected from the group consisting of correction to stabilize energy, correction to minimize energy, and correction to average energy.
Description follows regarding an example of correction to stabilize energy. Structural ensembles sampled at high temperature using an enhanced sampling method such as REMD, McMD, or the like are diverse. However, local structural features that a protein should ordinarily maintain, for example bond lengths, are sometimes stochastically broken in such ensembles, making the energy unstable. In order to address this, the generation section 12 may execute short MD equilibration processing, which is processing with the objective of energy stabilization, to settle, or to relax, the main chain of a sampled structure.
For example, the generation section 12 executes short MD equilibration processing of 500 ps under room temperature conditions (T = 300 K). Moreover, the generation section 12 may use software such as Amber or the like to execute the short MD equilibration processing based on a water-free (implicit-solvent) Generalized Born (GB) force field, with the objective of suppressing computational cost and relaxing the structure.
Next, description follows regarding an example of correction to minimize energy. The generation section 12 performs energy minimization processing with the main chain fixed. For example, as illustrated in
In consideration of the need to perform such energy minimization, a coarse-grained model that has been coarse-grained to Cα atom units for each residue is applied in the present exemplary embodiment. Feature values extracted from the coarse-grained model structure are input during machine learning of the AI force field. In other words, training is performed while focusing only on the Cα atoms of the coarse-grained model structure.
When this is being performed, sometimes side chain positions differ greatly even if the main chain (Cα positions) is the same within a group of training structures. In such cases, a teacher energy (all-atom energy) held by a target protein is different for each of the structures. However, in the all-atom model, even if the side chain positions are different, as long as the main chain is the same, then this results in the same coarse-grained model structure. This means that there is a lack of consistency as training data in cases in which associated energies are different even though there is the same structure after being coarse-grained, and this lack of consistency is difficult to handle. Thus, energy minimization is performed such as described above in order to secure consistency of training data.
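By way of a non-limiting illustration, the consistency issue described above can be reproduced with a minimal Python sketch of coarse-graining to Cα atom units. The `Atom` record and the function name are hypothetical and are introduced only for this illustration.

```python
from typing import List, NamedTuple, Tuple


class Atom(NamedTuple):
    """Hypothetical all-atom record: atom name, residue name, coordinates."""
    name: str
    residue_name: str
    xyz: Tuple[float, float, float]


def coarse_grain_to_calpha(atoms: List[Atom]):
    """Keep one particle per residue, located at the C-alpha atom."""
    return [(a.residue_name, a.xyz) for a in atoms if a.name == "CA"]


# Two all-atom structures with the same main chain but different
# side chain (CB) positions.
structure_a = [
    Atom("N", "ALA", (0.0, 0.0, 0.0)),
    Atom("CA", "ALA", (1.5, 0.0, 0.0)),
    Atom("CB", "ALA", (2.0, 1.4, 0.0)),
]
structure_b = [
    Atom("N", "ALA", (0.0, 0.0, 0.0)),
    Atom("CA", "ALA", (1.5, 0.0, 0.0)),
    Atom("CB", "ALA", (2.0, -1.4, 0.3)),
]

cg_a = coarse_grain_to_calpha(structure_a)
cg_b = coarse_grain_to_calpha(structure_b)
```

Here `cg_a` and `cg_b` are identical even though the underlying all-atom energies would generally differ, which is why the energies are brought into agreement before being used as training data.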
Next, description follows regarding an example of correction to average energy. As described above, the present exemplary embodiment focusses only on the Cα atoms of the coarse-grained model structure. Thus, with the objective of suppressing variation in energies generated by side chain structures, the generation section 12 averages training data so as to facilitate machine learning of correspondence relationships between Cα atom coordinates and energy values. Sufficient accuracy is thereby secured in MD even with a coarse-grained force field.
Specifically, the generation section 12 averages such that energies between like structures approach each other. For example as illustrated in
Moreover, the generation section 12 computes an energy value EAI as training data according to the following Equation (1).
EGB of Equation (1) is an all-atom potential energy based on a Generalized Born (GB) model, and is the energy after the above correction. Estab is a structure stabilization energy and, for example, includes part or all of the following four types of energy.
Note that structure stabilization energy other than the above four types may be defined and included in the Estab.
Equation (2) is an example of an interaction energy related to bond length. In Equation (2), bi is the bond length between residue i and residue i+1, bi(0) is an average bond length, and Kb,i is a constant (for example, 110.4 kcal/mol/Å²). Equation (3) is an example of an interaction energy Eba related to bond angle. In Equation (3), θi is the ith bond angle formed between two consecutive covalent bonds, θi(0) is an average bond angle, and Kθ,i is a constant (for example, 22.08 kcal/mol/Å²). Equation (4) is an example of an interaction energy Eda related to dihedral angle. In Equation (4), φi is the ith dihedral angle defined by three consecutive covalent bonds, and φi(0) is an average dihedral angle. Moreover, Kφi(1) is a constant (for example, 1.104 kcal/mol/Å²), and Kφi(3) is a constant (for example, 0.552 kcal/mol/Å²). Equation (5) is an example of an interaction energy Eev related to excluded volume. In Equation (5), dij is an excluded volume dependent on residue i and residue j, rij is the Euclidean distance between the Cα atoms of residue i and residue j, εev is a constant (for example, 0.6 kcal/mol), and β is a real number satisfying β ≥ 1.
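By way of a non-limiting illustration, the four energy terms described above may be sketched in Python as follows. Equations (2) to (5) themselves are not reproduced in this text, so the functional forms used here (harmonic bond and angle terms, a cosine dihedral term with one-fold and three-fold components, and a twelfth-power repulsion) are assumptions based on forms commonly used for Cα coarse-grained models. Treating β as a cutoff multiplier, using a single scalar excluded-volume distance `d_ev`, and skipping pairs closer than `min_sep` residues along the chain are likewise assumptions.

```python
import numpy as np


def bond_lengths(coords):
    """Distances between consecutive C-alpha particles."""
    return np.linalg.norm(coords[1:] - coords[:-1], axis=1)


def bond_angles(coords):
    """Angles (radians) between two consecutive bonds."""
    v1 = coords[:-2] - coords[1:-1]
    v2 = coords[2:] - coords[1:-1]
    cos_t = np.sum(v1 * v2, axis=1) / (
        np.linalg.norm(v1, axis=1) * np.linalg.norm(v2, axis=1))
    return np.arccos(np.clip(cos_t, -1.0, 1.0))


def dihedral_angles(coords):
    """Dihedral angles (radians) defined by three consecutive bonds."""
    b1 = coords[1:-2] - coords[:-3]
    b2 = coords[2:-1] - coords[1:-2]
    b3 = coords[3:] - coords[2:-1]
    n1 = np.cross(b1, b2)
    n2 = np.cross(b2, b3)
    b2_unit = b2 / np.linalg.norm(b2, axis=1, keepdims=True)
    m1 = np.cross(n1, b2_unit)
    return np.arctan2(np.sum(m1 * n2, axis=1), np.sum(n1 * n2, axis=1))


def stabilization_energy(coords, b_avg, theta_avg, phi_avg, d_ev=4.0,
                         k_b=110.4, k_theta=22.08, k_phi1=1.104,
                         k_phi3=0.552, eps_ev=0.6, beta=2.0, min_sep=3):
    """Sum of assumed bond, angle, dihedral and excluded-volume terms.

    phi_avg must use the same sign convention as dihedral_angles().
    """
    coords = np.asarray(coords, dtype=float)
    e_bond = np.sum(k_b * (bond_lengths(coords) - b_avg) ** 2)
    e_ba = np.sum(k_theta * (bond_angles(coords) - theta_avg) ** 2)
    dphi = dihedral_angles(coords) - phi_avg
    e_da = np.sum(k_phi1 * (1.0 - np.cos(dphi))
                  + k_phi3 * (1.0 - np.cos(3.0 * dphi)))
    # Excluded volume: pairs at least min_sep apart along the chain,
    # counted only when closer than beta * d_ev (assumed role of beta).
    i, j = np.triu_indices(len(coords), k=min_sep)
    r = np.linalg.norm(coords[i] - coords[j], axis=1)
    close = r < beta * d_ev
    e_ev = np.sum(eps_ev * (d_ev / r[close]) ** 12)
    return float(e_bond + e_ba + e_da + e_ev)
```

The default constants are the example values given in the description. The average values `b_avg`, `theta_avg`, and `phi_avg` may be scalars or per-term arrays.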
Moreover, as structure information for use as training data, the generation section 12 generates a coarse-grained model by coarse-graining a structure after the above correction. For example, as described above, the generation section 12 generates a coarse-grained model of a post-correction all-atom model coarse-grained at the Cα atom unit of each residue. Moreover, the generation section 12 extracts feature values related to the generated coarse-grained model, and reduces the dimensionality of explanatory variables. For example, the generation section 12 may extract feature values such as weighted atom-centered symmetry functions (wACSF), smooth overlap of atomic positions (SOAP), and the like. Moreover, the generation section 12 may employ the three-dimensional coordinates of each residue (Cα atom) configuring a protein directly as feature values. Appropriate feature values may be extracted corresponding to the configuration of a machine learning model expressing an AI force field subjected to machine learning by the machine learning section 14, described later.
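By way of a non-limiting illustration of feature value extraction, the following Python sketch computes a simplified radial atom-centered symmetry function for each Cα particle. It is an unweighted, radial-only stand-in for wACSF, and the function names, the cutoff radius, and the choice of width parameters are assumptions made for this illustration.

```python
import numpy as np


def cosine_cutoff(r, r_c):
    """Smooth cutoff that falls to zero at distance r_c."""
    return np.where(r < r_c, 0.5 * (np.cos(np.pi * r / r_c) + 1.0), 0.0)


def radial_symmetry_features(coords, etas, r_c=12.0):
    """Return an (N, len(etas)) array of radial descriptors.

    Feature k of particle i is sum_j exp(-eta_k * r_ij**2) * f_c(r_ij),
    taken over all other particles j.
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)            # (N, N) distances
    not_self = ~np.eye(len(coords), dtype=bool)     # drop the i == j terms
    weight = cosine_cutoff(dist, r_c) * not_self
    feats = [np.sum(np.exp(-eta * dist ** 2) * weight, axis=1)
             for eta in etas]
    return np.stack(feats, axis=1)
```

Because the descriptor depends only on inter-particle distances, it is unchanged by translation or rotation of the whole structure, which is one reason such feature values are preferred over raw coordinates for some model configurations.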
The machine learning section 14 employs the training data generated by the generation section 12 to execute machine learning of the AI force field for predicting an energy for an input structure. The AI force field is an example of a “first force field” of technology disclosed herein. The AI force field may, for example, be configured as a machine learning model such as high-dimensional neural network potentials (HDNNP), graph neural networks, or the like.
HDNNP are multiple neural networks for preparing molecular descriptors for each particle. For example, in cases in which HDNNP are applied in the present exemplary embodiment, as illustrated in
Energy values are output for each residue from each neural network. For example, energy values [EALA1, EALA2] are output from the ALA NN when input with respective feature values for two alanine residues [ALA1, ALA2]. Similarly, energy values [EPRO1, EPRO2, EPRO3, EPRO4] are output from the PRO NN when input with respective feature values for four proline residues [PRO1, PRO2, PRO3, PRO4]. An energy value Eall resulting from summing the energy values of each residue is output from the final output layer of the AI force field.
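By way of a non-limiting illustration, the per-residue-type structure described above may be sketched in Python as follows. The single hidden layer, the tanh activation, and the function names are assumptions made for this illustration; the exemplary embodiment does not fix the network architecture.

```python
import numpy as np


def make_network(n_features, n_hidden, rng):
    """Randomly initialised one-hidden-layer network (hypothetical)."""
    return {
        "W1": rng.normal(0.0, 0.1, size=(n_features, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, size=n_hidden),
        "b2": 0.0,
    }


def residue_energies(net, features):
    """Energy value for each row of features, shape (n_residues,)."""
    hidden = np.tanh(features @ net["W1"] + net["b1"])
    return hidden @ net["W2"] + net["b2"]


def total_energy(networks, residue_types, features):
    """Route each residue to the network of its type and sum the outputs."""
    features = np.asarray(features, dtype=float)
    total = 0.0
    for res_type in sorted(set(residue_types)):
        if res_type not in networks:
            raise KeyError(f"no network for residue type {res_type}")
        idx = [i for i, t in enumerate(residue_types) if t == res_type]
        total += float(residue_energies(networks[res_type], features[idx]).sum())
    return total
```

For a chain containing two alanine residues and four proline residues, the `"ALA"` network is evaluated on two feature rows and the `"PRO"` network on four, and the six resulting energy values are summed to give Eall.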
The machine learning section 14 compares the energy value Eall output from the AI force field against the energy value EAI of the training data, and executes machine learning on the AI force field by updating parameters of the machine learning models configuring the AI force field to get the Eall to approach the EAI.
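By way of a non-limiting illustration, the parameter update described above may be sketched in Python as follows. To keep the gradient explicit, each residue type is given a linear model in place of a neural network, and full-batch gradient descent on the mean squared error is used; both choices, together with the synthetic data and all names, are assumptions made for this illustration.

```python
import numpy as np


def predict_total(weights, residue_types, features):
    """Eall: sum of per-residue energies from per-type linear models."""
    return sum(float(weights[t] @ x) for t, x in zip(residue_types, features))


def train(weights, dataset, lr=0.01, epochs=200):
    """Update weights in place so that Eall approaches the target EAI."""
    history = []
    for _ in range(epochs):
        grads = {t: np.zeros_like(w) for t, w in weights.items()}
        loss = 0.0
        for residue_types, features, e_target in dataset:
            err = predict_total(weights, residue_types, features) - e_target
            loss += err ** 2
            for t, x in zip(residue_types, features):
                grads[t] += 2.0 * err * x      # d(err**2)/d(weights[t])
        for t in weights:
            weights[t] -= lr * grads[t] / len(dataset)
        history.append(loss / len(dataset))
    return history


# Synthetic training data generated from known weights (illustration only).
rng = np.random.default_rng(0)
true_w = {"ALA": np.array([1.0, -2.0, 0.5]), "PRO": np.array([0.3, 0.8, -1.0])}
dataset = []
for _ in range(20):
    types = ["ALA", "PRO", "PRO", "ALA"]
    feats = rng.uniform(0.0, 1.0, size=(4, 3))
    dataset.append((types, feats, predict_total(true_w, types, feats)))

weights = {t: np.zeros(3) for t in true_w}
history = train(weights, dataset)
```

The list `history` holds the mean squared error between Eall and the target energy after each pass, and decreases as the parameters are updated.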
The computation section 16 computes molecular dynamics of a coarse-grained model based on a hybrid force field that combines the AI force field, after machine learning by the machine learning section 14, with an energy based on the coarse-grained model structure. The hybrid force field is an example of a “second force field” of technology disclosed herein. The hybrid force field Ehybrid is expressed by the following Equation (6).
The computation section 16 inputs a coarse-grained model structure (input structure r) to the AI force field and predicts an energy value EAI (r). The computation section 16 computes Estab based on the coarse-grained model structure. The computation section 16 then computes the hybrid force field Ehybrid according to Equation (6).
As described above, Estab based on the all-atom model structure is excluded from the AI force field during machine learning. On the other hand, due to the force field during MD execution being a coarse-grained force field, Estab based on the coarse-grained model structure is added to the energy value EAI predicted by the AI force field. A difference between the energy based on the all-atom model used during machine learning and the energy based on the coarse-grained model used during MD execution is absorbed thereby, time evolution of the CG-MD is stabilized, and also unstable high energy states can be prevented.
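By way of a non-limiting illustration, the relationship described above between the energy used during machine learning and the energy used during MD execution may be written as follows. Equations (1) and (6) themselves are not reproduced in this text, so these forms are reconstructions from the surrounding description and should be read as assumptions. The superscripts AA and CG, indicating the structure on which Estab is evaluated, are labels introduced only for this illustration.

```latex
% Assumed form of Equation (1): training target with the all-atom
% stabilization energy excluded
E_{AI} = E_{GB} - E_{stab}^{AA}

% Assumed form of Equation (6): hybrid force field used during CG-MD
E_{hybrid}(r) = E_{AI}(r) + E_{stab}^{CG}(r)
```

Under these forms, the stabilization contribution removed before training is supplied again at MD execution time from the coarse-grained structure, so that the AI force field is left to learn only the remainder of the all-atom energy.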
The computation section 16 computes an acting force on each particle included in the coarse-grained model based on the hybrid force field Ehybrid, computes motion in unit time for each particle in response to the computed force based on the equations of motion, and updates the position of each particle after unit time. The computation section 16 simulates structural change of the coarse-grained model by repeatedly performing the above computation for each unit time with the coarse-grained model in which the positions of each particle have been updated.
The molecular dynamics computation device 10 may, for example, be implemented by a computer 40 as illustrated in
The storage device 44 is, for example, a hard disk drive (HDD), solid state drive (SSD), flash memory, or the like. A molecular dynamics computation program 50 that causes the computer 40 to function as the molecular dynamics computation device 10 is stored on the storage device 44 serving as a storage medium. The molecular dynamics computation program 50 includes a generation process control command 52, a machine learning process control command 54, and a computation process control command 56.
The CPU 41 reads the molecular dynamics computation program 50 from the storage device 44, expands the molecular dynamics computation program 50 into the memory 43, and sequentially executes the control commands contained in the molecular dynamics computation program 50. The CPU 41 operates as the generation section 12 illustrated in
Note that functions implemented by the molecular dynamics computation program 50 may, for example, be implemented by a semiconductor integrated circuit, or more specifically by an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), or the like.
Next, description follows regarding operation of the molecular dynamics computation device 10 according to the present exemplary embodiment. The molecular dynamics computation device 10 executes the molecular dynamics computation processing illustrated in
First, description follows regarding the machine learning phase.
At step S10, the generation section 12 executes AA-MD and samples plural all-atom model structures and energies corresponding to these structures. Next, at step S12, the generation section 12 performs, on the structures and energies obtained at step S10, at least one correction selected from the group consisting of correction to stabilize energy, correction to minimize energy, and correction to average energy.
Next, at step S14, the generation section 12 computes the energy value EAI according to Equation (1), and also extracts feature values related to the coarse-grained model structure resulting from coarse-graining the post-correction all-atom model, and generates training data including the EAI and the feature values. Next, at step S16, the machine learning section 14 uses the training data generated at step S14 to execute machine learning on the machine learning model expressing an AI force field for predicting energy for an input structure.
Next, description follows regarding the MD execution phase.
At step S20, the computation section 16 inputs a coarse-grained model structure (input structure r) to the AI force field and predicts the energy value EAI (r), and also computes Estab based on the coarse-grained model structure. The computation section 16 then uses the EAI (r) and the Estab to compute a hybrid force field Ehybrid according to Equation (6), and computes an acting force on each of the particles contained in the coarse-grained model based on the computed hybrid force field Ehybrid.
Next, at step S22, based on the equations of motion, the computation section 16 computes motion in unit time for each particle in response to the force computed at step S20, and updates the position of each particle after unit time. Next, at step S24, the computation section 16 determines whether or not to end MD execution such as by determining whether or not a predetermined simulation duration has been reached. Processing returns to step S20 when determined not to end, and the molecular dynamics computation processing is ended when determined to end MD execution.
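By way of a non-limiting illustration, steps S20 to S24 may be sketched in Python as follows. The description does not specify an integration scheme, so velocity Verlet is used here as an assumption, forces are obtained by central finite differences of the energy rather than analytically, and the two toy energy functions are unitless placeholders standing in for the trained AI force field and for Estab.

```python
import numpy as np


def numerical_forces(energy_fn, coords, h=1e-4):
    """Force on each particle as the negative gradient of the energy."""
    forces = np.zeros_like(coords)
    for idx in np.ndindex(coords.shape):
        plus, minus = coords.copy(), coords.copy()
        plus[idx] += h
        minus[idx] -= h
        forces[idx] = -(energy_fn(plus) - energy_fn(minus)) / (2.0 * h)
    return forces


def run_md(energy_fn, coords, velocities, masses, dt, n_steps):
    """Velocity Verlet time evolution for a fixed number of steps."""
    coords = np.array(coords, dtype=float)
    velocities = np.array(velocities, dtype=float)
    inv_m = 1.0 / np.asarray(masses, dtype=float)[:, None]
    forces = numerical_forces(energy_fn, coords)          # step S20
    step = 0
    while step < n_steps:                                  # step S24
        velocities += 0.5 * dt * forces * inv_m
        coords += dt * velocities                          # step S22
        forces = numerical_forces(energy_fn, coords)      # step S20 again
        velocities += 0.5 * dt * forces * inv_m
        step += 1
    return coords, velocities


def toy_e_ai(coords):
    """Placeholder for the energy predicted by the AI force field."""
    return 0.1 * float(np.sum(coords ** 2))


def toy_e_stab(coords, k=1.0, b0=1.0):
    """Placeholder stabilization energy: one harmonic bond."""
    return k * (float(np.linalg.norm(coords[1] - coords[0])) - b0) ** 2


def e_hybrid(coords):
    """Hybrid energy: predicted energy plus stabilization energy."""
    return toy_e_ai(coords) + toy_e_stab(coords)


start = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
masses = np.array([1.0, 1.0])
end, vel = run_md(e_hybrid, start, np.zeros((2, 3)), masses, dt=0.01, n_steps=200)
```

In this sketch the end condition of step S24 is a fixed step count; a predetermined simulation duration corresponds to `n_steps * dt`.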
As described above, the molecular dynamics computation device according to the present exemplary embodiment generates training data including a coarse-grained model structure resulting from coarse-graining an all-atom model sampled by molecular dynamics computation based on an all-atom force field, and including an energy corresponding to the all-atom model structure. The molecular dynamics computation device uses the generated training data to execute machine learning on the AI force field for predicting the energy of an input structure. Moreover, the molecular dynamics computation device computes molecular dynamics of the coarse-grained model based on the hybrid force field resulting from combining the AI force field that has been subjected to machine learning with the energy based on the coarse-grained model structure. The coarse-grained model MD is thereby executed based on the hybrid force field incorporating the AI force field based on the high accuracy all-atom model and the coarse-grained force field. In other words, the molecular dynamics computation device according to the present exemplary embodiment builds a coarse-grained force field that excludes the arbitrary terms arising from stabilization of a reference structure. In this manner, the force field applied during MD execution in the present exemplary embodiment differs from related structure-based coarse-grained force fields in that it is not derived from a reference structure (no particular reference structure is needed). This means that, on the smooth force field that accompanies coarse-graining, molecular structures can be sampled from a wide structure space without being constrained to reference structures. Therefore, efficient sampling of diverse molecular structures can be performed.
Moreover, in the present exemplary embodiment, as described above, due to employing the AI force field based on a high accuracy all-atom model, a higher prediction accuracy can be implemented than in related CG-MD.
Note that in the exemplary embodiment described above, although a case has been described in which AI force field machine learning and execution of MD based on the hybrid force field are both executed on the same computer, there is no limitation thereto. For example, a machine learning device including the generation section and the machine learning section, and a molecular dynamics computation device including the computation section, may be implemented by different computers.
Moreover, although in the above exemplary embodiment the molecular dynamics computation program was pre-stored (installed) on the storage device, there is no limitation thereto. The program according to technology disclosed herein may be provided in a format stored on a storage medium such as a CD-ROM, DVD-ROM, USB memory, or the like.
An issue with all-atom molecular dynamics is the significantly high computation cost thereof. Coarse-grained molecular dynamics is accordingly proposed that computes molecular dynamics using a model resulting from coarse-graining an all-atom model. However, in cases in which simulation is executed by coarse-grained molecular dynamics and structure data is sampled, reference structures are given for structures held, and sampling of structures is performed based on a force field that includes arbitrary terms coming from these reference structures. There is accordingly an issue in that the reference structures are strongly stabilized, and it is difficult to sample diverse structures.
The technology disclosed herein enables arbitrary terms arising from a reference structure being stabilized to be excluded from the force field.
All examples and conditional language provided herein are intended for the pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although one or more embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2023-177805 | Oct 2023 | JP | national |