Some embodiments described herein relate to computer-implemented methods and systems of determining material properties of molecular systems. In particular, but not by way of limitation, some embodiments described herein relate to computer-implemented methods and systems of determining material properties of molecular systems from an ab-initio parameterized force-field.
Molecular simulation is used to predict experimental observables of molecular systems. A force field model of a molecular system is a representation of the energy of the system (i.e., the Hamiltonian) which can be used to determine properties of a physical system. However, devising interaction models or force field (FF) models for arbitrary molecules with little or no reliance on experimental data has presented challenges. In addition, some known molecular force field models and molecular simulation methods either do not extend to a wide coverage of molecular systems or fail to determine properties within an acceptable degree of chemical accuracy.
Accordingly, a need exists for a wide coverage force field model and a molecular simulation method that are transferable and extendable to previously un-benchmarked chemical species, have minimum reliance on experimental data, and predict experimentally observable properties within an acceptable degree of chemical accuracy.
In some embodiments, a method includes determining, by a processor, an atom type from a set of atom types of each atom from a set of atoms in a functional group of a molecular system. The method includes calculating, by the processor and based on the set of atom types, a first set of ab initio molecular properties of a monomer having the functional group and a second set of ab initio molecular properties of a multimer having the functional group. The method further includes determining, by the processor, a set of parameters of a force field model by fitting the force field model to the first set of ab initio molecular properties and the second set of ab initio molecular properties until at least an agreement reaches a pre-determined threshold. The agreement can include one of (1) a first difference between a first set of predicted molecular properties of the monomer and the first set of ab initio molecular properties of the monomer or (2) a second difference between a second set of predicted molecular properties of the multimer and the second set of ab initio molecular properties of the multimer. The method includes determining, by the processor and based on the set of parameters and the force field model, a molecular property of the molecular system and sending, by the processor, a signal to present the molecular property of the molecular system.
Embodiments described herein include a parameterized force field model and a parameterized force field molecular simulation system and/or method using the parameterized force field model. The parameterized force field model can be transferable, have wide coverage, and have the property of extending interactions to previously un-benchmarked chemical species. The parameterization of the force field molecular simulation method discussed herein can be based on first principles that have little or no reliance on experimental data. Embodiments described herein can predict experimentally observable properties of molecular systems within an acceptable degree of chemical accuracy.
A force field is a method that can be used to estimate the forces between atoms within molecules and also between molecules. The force field refers to the functional form and parameter sets used to calculate the potential energy of a system of atoms or coarse-grained particles in molecular mechanics, molecular dynamics, or Monte Carlo simulations. The parameters for a chosen energy function can be derived or adjusted from experiments in physics and chemistry, calculations in quantum mechanics, or both.
The force field parameters and interaction rules in applications to molecular systems can define the energy landscape from which a number of results of interest can be derived. For example, the acting forces on particles can be derived as a gradient of the potential energy with respect to the particle coordinates. For another example, the resulting equations of motion can be propagated forward in time and time averages can predict experimental observables.
Prediction of experimentally observable ensemble and non-ensemble properties of a material system may be obtained from a molecular force field model of the material system. An ensemble property may be obtained via molecular simulation of the model, for example, molecular dynamics (MD), path integral molecular dynamics (PIMD), or Monte Carlo (MC). The way to determine the ensemble properties from the Hamiltonian at non-zero temperature can be, for example, via various thermodynamic sampling methods such as molecular dynamics (MD), path integral molecular dynamics (PIMD), or Monte Carlo (MC).
The molecular interaction models can be roughly separated into two categories: scope limited models that benchmark their interaction energies with Quantum Mechanics (QM) calculations, and wide coverage models that fit at least some of their parameters to selected sets of experimental data. Some known QM parametrized models can employ high functional complexity to reproduce QM energies. The functional complexity of these known QM-parameterized models, however, can make transferability and extending coverage to other chemical species and hetero-chemical ensembles difficult.
Some known wide-coverage molecular force field models derive some or all of the parameters by fitting to empirical observables. There are at least two drawbacks to this approach. First, even available experimental data (e.g., densities, heats of vaporization) can be insufficient to produce models that describe existing compounds precisely; and there can be molecules (for example, molecules that have not been synthesized) that require more precise description than available data from existing inference. Second, if an empirical model's prediction is erroneous, it can be difficult to decide which parameter(s) of the model to remove, add, correct or adjust.
In a wide coverage force field, a chemical group can be considered described when both its self-interactions and its interactions with other described group(s) are determined. A Force Field with wide coverage can, in some examples, fully describe the interactions between most and/or all chemical groups. In the case of a protein, because of the importance of biology, wide coverage force fields can include the interaction of amino acids and backbone, both between themselves, water, and non-polar solvents at the very minimum. Furthermore, in some instances, the force field can describe the interactions between a significant number of major chemical groups, e.g., alkanes aromatics amides amines sulfides esters, in both homo (self) and hetero (between any distinct number of) interactions. More specifically, in some instances, the force field can describe the behavior of these groups in a variety of solvents: H2O, CHEX (cyclohexane), OCT (octanol), Benzene, etc.
Finally the simulation results produced with the force-field can correctly describe the entropic behavior of major solvents—H2O, CHEX, OCT, benzene, in order to capture the entropic component of solvation. Moreover the results can correctly describe the entropic behavior of arbitrary molecular systems.
Some known fixed charge force field models lack flexibilities in the functional form to fit the experimental results well and frequently the errors are uncontrollable. The parameters of these known fixed charge force field models that are determined by QM energies are not well defined, as these models do not have polarization, and are, therefore, unable to transfer the results obtained by QM (in the gas phase) into the liquid phase or the solid phase (condensed phase). Furthermore, when fitting to experimental results, these known models do not have clear guidance on which parameters should be corrected, and therefore errors also arise.
The known Merck Molecular Force Field (MMF94) model, for example, also lacks polarization and includes ad-hoc factors. The functional form of the MMF94 model does not permit transferability from the available QM calculations to larger collections of molecules, e.g., to the condensed phase. As a result, the known MMF94 model produces unacceptable low quality predictions and does not enable accurate molecular modelling predictions from first principles.
Some known machine learning models lack the precision, transferability or coverage for describing arbitrary molecular ensembles. In addition, these known machine learning models have not been incorporated into a fully functional simulation stack with, e.g., long distance truncation, nuclear quantum effects, or thermodynamic property outputs.
Embodiments described herein include a QM-parameterized force field model and a molecular simulation method, which include the benefits of: (1) interaction energy calculated from first principles and have little, minimum or no reliance on experimental data, (2) transferable and extendable to other molecular systems, and (3) predicts experimentally observable properties within an acceptable degree of chemical accuracy. The QM-parameterized force field models and the molecular simulation methods described herein can accurately predict experimental observables of molecular systems with models for arbitrary molecules with little, minimum or no reliance on experimental data.
Example embodiments are illustrated in the drawings. It is to be understood that the embodiments are described only to facilitate better understanding of the embodiments, rather than to limit the scope of the embodiments in any manner.
An advantage of the QM-parameterized force field models described herein is that in some embodiments, QM calculations can be obtained for arbitrary molecules. In some embodiments, a limitation may exist on QM computations of the size of the molecular system. Another advantage of the QM-parameterized force field models described herein is that prediction errors can be traced to, for example, the imprecise description of the interaction energies and rectified in the model. In some implementations, embodiments described herein include methods to generate models parameterized entirely from first-principles (ab-initio) Quantum Mechanics calculations.
The QM-parameterized force field models described herein can be multipolar, polarizable, and the dynamics of the QM-parameterized force field models can include nuclear quantum effects. The QM-parameterized force field models and the molecular simulation methods described herein can maintain a high computational efficiency. The construction of a wide-coverage molecular model from first principles and its excellent predictive ability in the liquid phase, as described in embodiments herein, is an advance in molecular simulation.
Embodiments described herein include an ab-initio-based molecular interaction model with chemical accuracy that enables the quantitative in-silico determination of properties of molecular ensembles.
Embodiments described herein include QM-parameterized force field models applicable to arbitrary molecules. The QM-parameterized force field models can predict solvation free energies of molecular systems within chemical accuracy (in some implementations, better than thermal noise at room temperature (˜0.59 kcal/mol)). In some implementations, the predictions in the liquid phase can be satisfyingly accurate while the model is created from ab-initio computational methods without fitting to any experimental data. The value of 0.5 kcal/mol for the desired (‘chemical’) accuracy of free energy predictions arises from several considerations. First, 0.59 kcal/mol is the thermal noise at ambient conditions (room temperature and pressure). This is the inherent fuzziness of our everyday biological world. Additionally, for example, in ligand-protein lead optimization the definition of ‘incremental lead improvement’ is about 0.5 kcal/mol or ˜2-3 fold increase binding affinity.
In some implementations, the QM-parameterized force field models described herein can include a set of features that enables the final energies of the QM-parameterized force-field models to agree with the final energies of a small collection of molecules within an average chemical accuracy below 0.5 kcal/mol. In some implementations, for neural-neutral chemical groups in a molecular system, such chemical accuracy can be improved to around 0.3 kcal/mol. In some implementations, for some neural-charged proper ions in a molecular system, such chemical accuracy can be worsened to be higher than 0.5 kcal/mol. The set of features can include, for example, (1) fully multipolar which can provide accurate fitting, (2) polarizable, (3) model parameters that are assigned to interaction centers (i.e., atoms) and intramolecular degrees of freedom (i.e., bond stretch energies), (4) model parameters that depend on and are changed by different chemical (i.e., within a molecule) molecular environments, and/or (5) transferability.
In some implementations, the QM-parameterized force-field molecular simulation methods described herein can extend and combine models for sub-pieces of a molecule when, for example, the whole molecule(s) are too large to compute.
In some implementations, the QM-parameterized force-field molecular simulation methods can provide a certain accuracy by including specific algorithms and/or processes in the downstream simulation protocols. One set of such algorithms can be, for example, a proper truncation method of long-range interactions (e.g. electrostatic and dispersion). A specific example of such a truncation method can be the Particle Mesh Ewald method (PME). Another set of such algorithms can be, for example, methods that may permit one to include quantum mechanics effects. A specific example of these algorithms may be described as Path Integral Molecular Dynamics (PIMD).
In some embodiments, Molecular Mechanics (MM) can refer to a Newtonian representation of molecular interaction, where the energy is a function of atomic and molecular geometries, distances and orientations. This includes monomer, two-body and n-body contributions.
In some embodiments, Quantum Mechanics (QM) can refer to a quantum mechanical description and propagation of the molecular system. It may mean an N-body wavefunction solution, or DFT (Density Functional Theory), which reduces computational load by using an electron density functional instead of an n-body wavefunction.
In some embodiments, transferable can refer to the property of extending interactions to previously un-benchmarked chemical species. For a transferrable interaction, having determined such between A<->B and B<->C, the interactions between A<->C are automatically determined and correct. The electrostatic interaction is an example.
In some embodiments, separable can refer to an interaction where the properties of the interaction between species are a function of properties belonging to the individual involved species. For example, an electrostatic interaction is proportional to the product of the individual charges q1*q2.
In some embodiments, a combination rule can be the actual function giving the energy of a separable interaction from individual properties, e.g., q1*q2/r.
The parameterized force field model, the QM-parametrized force field model, the QM-parameterized model, and the QM-parameterized physics-based molecular models can be used interchangeably in embodiments described herein.
In some embodiments, the “interaction rank”, L, can indicate the nature of interaction between two multipole moments. In some implementations, when L=0, monopole-monopole interactions are considered, while L=2 can mean a dipole-dipole interaction or a monopole-quadrupole interaction is considered.
In some embodiments, chemical accuracy may refer to thermal noise at room temperature and pressure which is 0.59 kcal/mol or kT at 300K.
Embodiments described herein that relate to dimer(s) can also apply to multimer(s). In some implementations, a multimer can be a molecule that includes two or more monomers. In other words, a multimer can be a dimer, a trimer, a tetramer, pentamer, hexamer, and so on. For example, the QM-parameterized force field computing device (e.g., 101 in
The memory 112 can be, for example, a random-access memory (RAM) (e.g., a dynamic RAM, a static RAM), a flash memory, a removable memory, a hard drive, a database and/or so forth. In some implementations, the memory 112 can include (or store), for example, a database, process, application, virtual machine, and/or other software modules (stored and/or executed in hardware) and/or hardware modules configured to execute a QM-parameterized force-field molecular simulation process(es) as described with regards to
The processor 111 can be configured to, for example, write data into and read data from the memory 112, and execute the instructions stored within the memory 112. The processor 111 can also be configured to execute and/or control, for example, the operations of other components of the parameterized force-field computing device 101 (such as a network interface card, other peripheral processing components (not shown)). In some implementations, based on the instructions stored within the memory 112, the processor 111 can be configured to execute one or more steps of the QM-parameterized force-field molecular simulation process(es) as described with regards to
In some implementations, the processor 111 can be configured to execute a QM-parameterized force-field molecular simulation method that can include, for example, the steps: (1) creating a QM description of the atoms of a molecular system with required accuracy; (2) creating a force field model from the QM description; (3) optionally, extending or transferring the existing force field model to new atom types; and (4) performing thermodynamic sampling with a simulation method such as, for example, MD or MC, that incorporates procedures such as nuclear quantum effects (NQE) to obtain results, i.e., ensemble properties that predict experimentally observable properties with chemical accuracy.
In some implementations, the memory 112 may include a parameterization software stack 113, a simulation software stack 114, a typification stack 115, a simulation analysis stack 116, and force field models 117. Each of the parameterization software stack 113, the simulation software stack 114, the typification stack 115, and the simulation analysis stack 116 can include code having instructions that, when executed by the processor 111, cause the processor 111 to perform at least a portion of the steps in the QM-parameterized force-field molecular simulation process. In some implementations, the force field models 117 and information of the force field models 117 can be stored in, for example, flat files, in suitable markup language files, in memory, in a database format, etc. The formats can be human or machine readable. In some implementations, the memory 112 can include a database that can store the interaction parameters of the QM-parameterized force field model and the processor 111 can update, manage, and control access to the database. The processor 111 can also perform consistency and integrity checks of the data stored in the database when, in some implementations, the data are added to the database. For example, when adding information of molecular properties or interaction parameters of a new non-bonded type of molecules, information of a corresponding bonded type of the molecules may be needed and/or used for the QM-parameterized force-field molecular simulation process. In such situations, the processor 111 can automatically send a notification (to, for example, a compute device of a user) to request such information to be added to the database. In some implementations, the interaction parameters provided by the database for the QM-parameterized force-field molecular simulation process may be protected by encryption. The processor 111 may assign different access rights to users to have different levels of decryption and/or access capabilities. For example, a first user can access to a first set of interaction parameters of the QM-parameterized force field model, which is not accessible by a second user. Thus, the QM-parameterized force-field molecular simulation process performed for the first user can be kept confidential from the QM-parameterized force-field molecular simulation process performed for the second user.
The typification software stack 115, when executed by the processor 111, can be configured to determine an atom ‘type’ of an atom based on the atom's QM monomer properties and the atom's chemical environment in the molecule. An atom ‘type’ can be a classification of an atom, which can depend on its atomic number. An atom ‘type’ can also be further differentiated based on its chemical connections to other atoms in the molecule and can help the interaction parameters to better represent such subclasses. An example of such differentiating atom types can be an aromatic carbon and an aliphatic carbon.
The simulation software stack 114 can include, for example, an MD package which can contain features designed to support the multi-featured QM-parameterized force field model. For example, the features can include, but are not limited to, multipolar exchange interactions. In some implementations, these features (e.g., multipolar exchange interactions) are not required to generate the QM-parameterized force field model or use the QM-parameterized force field molecular simulation methods described herein. In some implementations, the simulation software stack 114 can be configured to determine the atomic/molecular polarization at each configuration (e.g., timestep) to support various polarization descriptions in the QM-parameterized force field model. For example, the polarization description can be induced dipole moments. In some implementations, the simulation software stack 114 can include code, when executed, to enable the multipolar Ewald truncation to describe diffuse charge distributions. In some implementations, the simulation software stack 114 can be configured to implement the QM-parameterized force field model which can, in some instances, have an explicit smeared density in addition to multipole moments. The simulation analysis stack 116 can enable analysis, visualization and computation of ensemble properties from atomic and molecular trajectories calculated using the QM-parameterized force field models. In some implementations, the simulation analysis stack 116 can be configured to receive data of locations and energies of atoms at each timestep and generate predictions of ensemble properties (e.g., physical properties) of atoms/molecules.
In some implementations, the parameterization software stack 113 can be configured to receive computed QM data and generate model parameters that can enable the computation of energies and/or forces of a molecular ensemble. The parameterization software stack 113 can include features such as full parameterization, lightweight parameterization, environmental parameterization, QM calculation quality, and/or the like. In some implementations, the full parametrization feature can be applied when the atom is sufficiently different from other atoms of the same chemistry (e.g., aromatic carbon vs. carbon in alkane) such that a full parameterization of the model parameters can be determined. In some implementations, the model parameters can be fitted to, for example, homo and hetero multimer (e.g., dimer) QM energies, monomer properties (ES potential map, conformational energies), multimer QM energies, and/or the like. In some implementations, the lightweight parameterization feature can be applied when it is computationally expensive to obtain QM data. In some situations, the lightweight parameterization feature can be applied only when model parameters may be altered in nature by adjacent atoms, and/or can be fitted largely to monomer properties which are tractable. In some implementations, the environmental parameterization feature can be applied when the atoms and molecules are in a relatively extreme environment. For example, to deduce the liquid and crystal phase parameters from gas phase QM calculations, the transferability can be degraded, when atoms and molecules are in an environment of, for example, high electric fields, in the presence of a nearby ion or a large number of polar groups, in an externally induced field(s), or under high temperature/pressure. For example, a protein can be such an environment. Therefore for such cases the QM parameterized force-field model can be configured to enable an ‘environment’ containing parameterization where the QM energies are computed while the molecules are immersed in a simplified and thermally averaged representation of their environment. The QM calculation quality feature included in the parameterization software stack 113 can balance the QM calculation quality/accuracy with the complexity of the parameterization features of QM-parameterized force field model. In some situations when highly precise predictions of, for example, protein-drug interactions, as well as many other property predictions are desired, a high level of theory (may also be highly computationally expensive) method can be used.
In some implementations, the memory 112 can include and/or store a set of force field features that can be used to optimize and/or improve the QM-parameterized force field model when fitting the model to QM. The set of force field features can be implemented in the simulation software stack 114 and/or the parameterization software stack 113. For example, the set of FF features can include (1) polarizability (e.g., di-polar, quadrupolar, fluctuating charge), and/or exchange-induction (exchange can be ‘polarized’), (2) penetration effects (e.g., charge entities are not points but diffuse multipolar clouds) in electrostatics energies and damping effects in dispersion energies, (3) multipolar electrostatic interaction energy (e.g., with an “interaction rank”, L, of 2 or 3), (4) multipolar exchange, (5) charge transfer interaction, (6) multipolar dispersion interaction, (7) component-based modeling, and/or the like. When constructing the forces and energies of a molecular system, in some implementations, the QM-parameterized force-field model can be configured to decompose the interactions into components with different transferability rules (e.g., the exchange interaction, the dispersion interaction, the electrostatic interaction, the induction interaction, and charge transfer). For example, dispersion interaction can simultaneously have geometric (or multiplicative) rules for the force constants and arithmetic (additive) rule for the effective widths. Electrostatic combination rules can come directly from Coulomb's law applied to continuous charge distributions. In some implementations, the QM-parameterized force-field model can be configured to not force all the energies into transferrable components. In some implementations, there are non-negligible contributions that do not decompose or transfer and the QM-parameterized force-field model can be configured to keep track of these contributions and of the need to determine them later if necessary (while the rest is available via transferability).
In some embodiments, the example QM-parameterized force-field molecular simulation process 300 can include parameterization of a functional group in a molecular system. Specifically, in some implementations, the parameterization of the functional group includes preparing QM data of molecules containing the functional group of interest and optimizing and/or improving parameters of the QM-parameterized force-field model that describes intermolecular and intramolecular interactions.
At step 301, the example QM-parameterized force-field molecular simulation process 300 includes selecting a molecule/functional group of interest and using, for example, the typification software stack (115 in
Determining the atom type can be based on, for example, (1) a chemical identity of the atom determined by a number of bound electrons of the atom, (2) properties of a molecular neighborhood of the atom including chemical identities, arrangement and hybridization of a set of atoms neighboring the atom and atom types of each atom from the set of atoms, (c) at least one characteristic of the atom including shape or density of an electronic structure of the atom, effective dispersion coefficients, polarization coefficients, a location of the atom relative to the set of atoms neighboring the atom, and/or (4) the atom's QM monomer properties. In some implementations, the shape and the density of the electronic structure of the atom can include at least one of a dipole moment, a quadrupole moment, or a multipole moment.
At step 302, the example QM-parameterized force-field molecular simulation process 300 includes determining a first set of ab initio molecular properties of a monomer and a second plurality of ab initio molecular properties of a multimer (e.g., dimer) for bonded and non-bonded interactions of the molecule/functional group of interest, based on the plurality of atom types determined from step 301. In some implementations, the QM calculations (e.g., calculations based on quantum mechanics rule-based reasoning) can include determining electrostatic energies on monomers associated with the molecule/functional group (i.e., a first set of ab initio molecular properties of the monomer having the molecule/functional group) and/or multimers (i.e., a second set of ab initio molecular properties of the multimer having the molecule/functional group). The monomer QM calculations (i.e., a first set of ab initio molecular properties of the monomer) can include calculating, for example, molecular moment (e.g., dipole moment, quadrupole moment, multipole moment, at, for example, the mp2/avqz level), molecular electrostatic field or potential, energy of molecular conformation, and polarizability of the monomers or polarization of the monomers by an applied external field or charge. The multimer QM calculations can include calculating, for example, (1) the energies and their components based on a pre-determined standard (e.g., a gold standard, a silver standard, a bronze standard), (2) potential energies of hetero-multimer interactions between the monomer and a first probe species, (3) potential energy of homo-multimer interactions between two identical molecules of the monomer, and/or (4) potential energy of multimer-multimer interactions between the monomer and at least two probe species having a second probe species and a third probe species. The probe species (e.g., the first probe species, the second probe species and the third probe species) can be a charge, a noble gas atom, or a small molecule. In some implementations, the small molecule includes at least one of a water molecule, a methane molecule, or an ammonia molecule. The potential energy of the hetero-multimer interactions, the potential anergy of homo-multimer interactions, and the potential energy of multimer-multimer interactions are calculated based on a set of orientations (or a set of distances) of the hetero-multimer interactions, the homo-multimer interactions, and the multimer-multimer interactions, respectively.
At step 303, the example QM-parameterized force-field molecular simulation process 300 includes determining and optimizing and/or improving multiple parameters of a (polarizable or QM-parameterized) force field model by fitting the QM-parameterized force field model to the monomer QM calculations (i.e., the first set of ab initio molecular properties of the monomer) and/or the multimer QM calculations (i.e., the second set of ab initio molecular properties of the multimer) from step 302 until an agreement between (1) multiple predicted molecular properties and (2) the monomer QM calculations and the multimer QM calculations reaches a threshold. In some implementations, the agreement can be at least one of (1) a first difference between a first set of predicted molecular properties of the monomer and the first set of ab initio molecular properties of the monomer or (2) a second difference between a second set of predicted molecular properties of the multimer and the second set of ab initio molecular properties of the multimer. Fitting the QM-parameterized force field model to the monomer QM calculations and/or multimer QM calculations includes adjusting the values of the parameters of the QM-parameterized force field model to determine and/or predict molecular properties of the monomer and/or multimer so that the molecular properties are in vicinity of the monomer QM calculations (i.e., the first set of ab initio molecular properties of the monomer) and/or the multimer QM calculations (i.e., the second set of ab initio molecular properties of the multimer). For example, fitting the QM-parametrized force field model to the first set of ab initio molecular properties and the second set of ab initio molecular properties includes adjusting at least one parameter from the multiple parameters of the QM-parametrized force field model to cause molecular properties (e.g., electrostatic potential of the monomer) determined using the force field model to be in a pre-determined range of the molecular properties (e.g., electrostatic potential of the monomer) from the first set of ab initio molecular properties or the second set of ab initio molecular properties. For another example, fitting the QM-parameterized force field model to the first set of ab initio molecular properties and the second set of ab initio molecular properties includes adjusting a polarization parameter from the multiple parameters of the QM-parameterized force field model or an induction parameter from the multiple parameters to produce an effect of a probe charge or an external electric field placed around the monomer (or produce a non-additive energy of a multimer), or cause an induction component of dimerization energy of the multimer determined using the QM-parameterized force field model to be in a pre-determined range of the induction component of the multimer from the second set of ab initio molecular properties.
In some implementations, the QM-parameterized force field model can be transferable to a different molecular system from the molecular system such that molecular properties of the different molecular system predicted based on the force field model is within a chemical accuracy of the molecular properties of the different molecular system determined from experiment.
In some implementations, the QM-parameterized force field model includes a plurality of components associated with at least one of exchange energy, dispersion energy, electrostatics energy, charge transfer interaction, induction energy, exchange-induction energy, penetration effect in electrostatics potential, damping effect in dispersion energy, multipolar electrostatics energy, multipolar exchange energy, multipolar dispersion interaction energy, or many-body interaction energy.
In some implementations, the QM-parameterized force field model is associated with intermolecular interactions of the molecular system including an electric field, and/or auxiliary interaction locations including an electron lone pair site or mid-bond site.
At step 304, the example QM-parameterized force-field molecular simulation process 300 includes determining (or predicting), based on the multiple parameters of the QM-parameterized (and/or polarizable) force field model and the QM-parameterized force field model (which can be considered as a quantum mechanics rule-based reasoning technique), molecular properties of the molecule (or the molecular system) within a chemical accuracy (e.g., thermal noise at room temperature and pressure, substantially being 0.59 kcal/mol or kT at 300.) The molecular properties can be ensemble properties (e.g., Boltzmann ensemble properties) predicted using a thermodynamic sampling of the molecular system based on the QM-parameterized force field model. The thermodynamic sampling method can be, for example but not limited to, molecular dynamics simulation, stochastic simulation, or Monte-Carlo simulation, or path integral molecular dynamics (PIMD). In some implementations, the determining the molecular property of the molecular system is based on thermodynamic sampling and nuclear quantum effects (NQE). In some samples, the predicted molecular properties can be ensemble or non-ensemble properties. In some implementations, the predicted molecular property includes at least one of density, heat of vaporization, heat of melting, heat of sublimation, free energy of solvation, or free energy of ligand binding in ligand-protein systems, thermal conductivity, diffusion coefficient, X-ray diffraction, glass transition temperature, glass transition enthalpy change, or enthalpy of solid-solid phase transitions.
At step 305, the example QM-parameterized force-field molecular simulation process 300 can include sending a signal to present the molecular properties of the molecule via, for example, user interface on a monitor of a user's compute device, for visualization and analysis. For example, the signal can include information related to the protein-protein interactions, protein-ligand interactions, protein folding and various protein properties. In some implementations, the signal example QM-parameterized force-field molecular simulation process 300 can be used to screen drug candidate options, assess ligand and protein targets in a lead optimization phase of drug development, in drug design and discovery (e.g., design ligands that bind to a protein for treatment of a disease), in applications of aggregation of biotherapeutic formulations, to predict structure of antibodies, protein-protein binding, among many other applications discussed below. These applications can be performed with the use of the example QM-parameterized force-field molecular simulation process and in response to the signal.
At step 401, the example QM-parameterized force-field molecular simulation process 400 includes performing a typification process. Specifically, the example QM-parameterized force-field molecular simulation process 400 includes obtaining geometrical coordinates of solute of interest (or molecular target of interest, the functional group of interest) and assigning atom types to the atoms in the molecule.
At step 402, the example QM-parameterized force-field molecular simulation process 400 includes performing QM calculations on the functional group of interest. In some implementations, the example QM-parameterized force-field molecular simulation process 400 can include performing monomer QM calculations (e.g., Molpro) to calculate, for example, moment (e.g., dipole moment, quadrupole moment at, for example, the mp2/avqz level), electrostatic potential, and polarizability of the monomers of the solute of interest (e.g., based on a quantum mechanics rule-based reasoning). The electrostatic potential of the monomers can be calculated for, for example, several Van der Waals radii (one set of radii can be 1.4, 1.6, 1.8, 2.0, 2.5 VdW) and the QM calculated molecular electrostatic potential (MEP) can be fitted at molecular surfaces using an atom-centered point charge model (e.g., as in the Restrained ElectroStatic Potential (RESP) procedure). In some implementations, the monomer QM calculations can include calculating interactions of the molecular target of interest with a number of point charges (the number can be 100 to 200, 200 to 300, 300 to 400, more than 400) and the charge can be 0.1 e (or any suitable charge, e.g., charges <=plus/minus 1e) randomly placed around the molecule.
In some implementations, the example QM-parameterized force-field molecular simulation process 400 can then perform multimer (e.g., dimer) QM calculations based on a pre-determined standard (e.g., a gold standard, a silver standard, a bronze standard). In some implementations, for example, the example QM-parameterized force-field molecular simulation process 400 can perform the multimer QM calculations based on the silver standard, defined as:
E
silver_standard=mp2/CBS+ccsd(t)/aug-cc-pVDZ-mp2/aug-cc-pVDZ
where mp2/CBS can be obtained by extrapolation using mp2/aug-cc-pVTZ and mp2/aug-cc-pVQZ intermolecular energies. Example multimers can include: the solute of interest and water, the solute of interest and methane, the solute of interest and the solute of interest (i.e., homo-multimers). In some implementations, to perform the multimer QM calculations, the QM-parameterized force-field computing device can extract transferable components from the interaction of the multimers and with sufficient chemical accuracy (e.g., 0.5 kcal/mol or less) for total energies. In some implementations, the QM-parameterized force-field computing device can extract sub-components of the energetic interactions of the multimers. In some implementations, the QM-parameterized force-field computing device can use Symmetry-Adapted Perturbation Theory (SAPT) methods (or other energy decomposition analysis (EDA) methods) to extract such components. A specific example of such a decomposition (i.e., extracting transferable components from the interaction of the multimers, or extracting sub-components of the energetic interactions of the multimers) can be dimerization energy calculated with DFT-SAPT/aug-cc-pVTZ with PBE0 functional asymptotically corrected. A collection of computer codes can be designed to implement the SAPT methods.
In some implementations, the QM-parameterized force-field computing device can calculate the energies and their components at a chemical accuracy no less than the desired accuracy of output predictions. For example, if the desired input accuracy is on the average 0.1 kcal/mol, a quantum level of theory can be mp2/CBS+ccsd(t)/aug-cc-pVDZ-mp2/aug-cc-pVDZ. For another example, if the desired level of input accuracy is 0.01, the quantum level of theory can be mp2/CBS+ccsd(t)/aug-cc-pVTZ-mp2/aug-cc-pVTZ. If the desired level of input accuracy is 0.6 kcal/mol the quantum level of theory can be DFT-SAPT aug-cc-pVTZ with the PBE0 functional asymptotically corrected or mp2/aTZ. If the desired level of accuracy is 1.5 kcal/mol the quantum level of theory is HF (Hartree-Fock) 6-31G*.
In some implementations, the accuracy mismatch between QM methods suitable for extraction of full energy and QM methods suitable for extraction of components can be resolved by placing the mismatch into a suitable component, for example, electrostatic, dispersion, exchange, charge transfer or polarization components.
In some implementations, to generate the multimer energy input data, the QM-parameterized force field computing device can calculate a set of multimer interactions of “molecular target of optimization” (e.g., ethanol) with probe molecules. Probe molecules can include water, methane, ammonia, ions, or any preferably small molecular entities. The set of multimer interactions can include, for example, a set of surrounding neighborhoods of the molecule of interest with varying intermolecular distance placed in random orientations, or a set of potential minima and saddle points of intermolecular potential energy surface which can be generated via a random search, a directed search, a simulated annealing procedure, a stochastic optimization procedure, a genetic optimization algorithm, and/or the like. The set of multimer interactions can also include, for example, an examination set of simple translations and rotations around major minima, which can be used as input data and as a visual examination check of the interaction and its components. The set of multimer interactions can include, for example, other inter-molecular configurations that help define the potential energy surface and/or configurations generated to represent more specifically the interactions of interest in a system. Such configurations can be, for example, an ion and neighboring amino acid residues encountered in proteins, adjacent molecules in a crystal lattice, a chemical entity of special importance (e.g., a solvent).
At step 403, the example QM-parameterized force-field molecular simulation process 400 can receive and process the QM data (e.g., the monomer QM calculations and the multimer calculations obtained in step 402, the geometrical coordinates obtained in step 401) and output the results to, for example, the parameterization software stack 113 (e.g., MolMechFitter) for optimization and/or improvement of the parameters of the QM-parameterized force field model. In some implementations, the processing of the received QM data includes generating an initial prediction of the parameters of the QM-parameterized force field model based on, for example, analogous atom-type parameters existing in the repository (e.g., FF models 117 in memory 112 in
At step 404, the example QM-parameterized force-field molecular simulation process 400 includes determining and optimizing the parameters of the QM-parameterized force-field model using the parameterization software stack (e.g., 113 in
These steps of optimization and/or improvement can be performed in different orders and in some implementations, not all steps are required at all time. Some steps of the optimization and/or improvement can be optional.
At step 405, the QM-parameterized force field computing device can check whether the total or component energy differences of multimers (FF−QM) are within a pre-determined chemical accuracy. If pre-determined chemical accuracy has not been achieved, the QM-parameterized force field computing device can repeat all or some steps in step 404 to optimize and/or further improve the parameters (or introduce new parameters). In some implementations, the QM-parameterized force field computing device can perform other solutions to optimize and/or improve the parameters (e.g., extend current atom typification, alter the weights on the total energy or energy components to improve the agreement between FF and QM).
If the pre-determined chemical accuracy has been achieved, the QM-parameterized force field computing device can proceed to step 406. At step 406, the example QM-parameterized force-field molecular simulation process 400 includes molecular dynamics simulations. In some implementations, the example QM-parameterized force-field molecular simulation process 400 can include inputting the parameters of the QM-parameterized force field model determined from steps 404 and 405 into the simulation software stack (114 in
At step 407, the example QM-parameterized force-field molecular simulation process 400 includes analysis of the molecular dynamics simulations using, for example, the simulation analysis stack (116 in
At step 416, the example QM-parameterized force-field molecular simulation process 400 can further refine the charge shift parameters (from step 415) and optimize and/or improve the electrostatic and exchange widths for the force field model, based on QM calculations of multimers 417 (e.g., electrostatic and exchange energy of multimers E1pol & E1exchange of solute-probe multimer energy DFT-SAPT/avtz.) At step 419, the example QM-parameterized force-field molecular simulation process 400 can optimize and/or improve the induction parameters and charge transfer and dispersion corrections, based on QM calculations of total energy of homo and hetero-multimers 418 (e.g., total multimer interaction energy calculated at silver standard.) At step 420, the example QM-parameterized force-field molecular simulation process 400 can further refine any or all force field parameters based on steps 415, 416, 419 and QM calculations of total energy of homo and hetero-multimers 418. The example QM-parameterized force-field molecular simulation process 400 can then proceed to step 405 from
The example here illustrates using the QM-parameterized force field model and the QM-parameterized force field molecular simulation described with regards to
For biophysical ensembles that exist at room temperature and pressure, and where the entropic contributions are on par with interaction strengths, the free energies can be both important and difficult to predict. In this example, free energies of solvation are computed for a diverse set of organic compounds using a (polarizable) QM-parameterized force field model fitted, in some implementations, entirely to ab initio calculations (without fitting to experimental data). The mean absolute errors of hydration, cyclohexane solvation, and corresponding partition coefficients can substantially reach 0.2 kcal/mol, 0.3 kcal/mol and 0.22 log units, i.e. within chemical accuracy.
The predictive ability of the QM-parameterized force-field model described herein can be demonstrated by computing solvation free energies for a wide range of chemical functional groups in, for example, water and cyclohexane. The QM-parameterized force-field molecular simulation method described in this example can be executed by the processor of the QM-parameterized force-field computing device (e.g., 101 in
SI: Supplementary Information
QM: Quantum Mechanics
FF: force-field (model)
MAE: Mean Absolute Error
NQE: Nuclear Quantum Effect
MD: Molecular Dynamics
PIMD: Path Integral Molecular Dynamics
H2O: Water
CHEX: CycloHexane
MMFF94: Merck Molecular Force Field
AMOEBA: Atomic Multipole Optimized Energetics for Biomolecular Simulation
GAFF: Generalized AMBER (Assisted Model Building with Energy Refinement) Force Field
A. QM and Force Field agreement (“QM-FF agreement”)
An advantage of the QM-parameterized force-field model described herein includes determining the degree of faithfulness that is sufficient for modeling the liquid phase of arbitrary molecules and mixtures while keeping the model complexity manageable. The QM-parameterized force-field model and the QM-parameterized force field molecular simulation method described herein provide an ab-initio-based molecular interaction toolset with chemical accuracy that can enable the long-promised quantitative in-silico determination of properties of any molecular ensembles.
In this example, the first step is choosing the level and accuracy of the underlying QM computations. The chosen accuracy level of input data can propagate throughout the technology stack and can affect the accuracy of output predictions. The QM-parameterized force-field computing device can fit the intermolecular interactions to multimer and select multimer QM energies at, for example, the highest level of theory practical for large-scale parameterization. This specific level can have an average accuracy of 0.05 kcal/mol on a diverse set of small molecular configurations as compared with the most precise and computationally demanding methods. (This is referred to as the “silver standard”.)
The QM-parameterized force-field computing device can then encapsulate the QM interaction energies in a physics-based analytical model. The faithfulness can introduce a significant level of complexity from the functional form: polarizability terms enable proper transferability from multimer to bulk energies; multipole descriptions of both the electrostatic and exchange-repulsion interactions permit a precise fit of the potential energy surface for multimer orientations; a fairly detailed typification accounts for the difference in interaction properties of identical atoms in diverse chemical environments.
In some simulations, the example QM-parameterized force-field molecular simulation process includes using the functional form of the bonded interactions, with force constants and equilibrium values fitted to QM energies at the MP2/aug-cc-pVTZ level of theory. In some simulations, solvents of water and cyclohexane are selected. Cyclohexane (CHEX) is nonpolar and equilibrates relatively quickly. Reliable experimental data for both CHEX solvation free energies, as well as for the CHEX/H2O partition coefficients are used in these simulations. In some instances, though the two molecules (water and cyclohexane) were parameterized with the same procedure as every other functional group, the two molecules participate in bulk and can benefit from extra examination of their liquid-state properties (e.g., considering the molecular 2-body interactions and the multi-body contributions.) For water, which is small, polar, and polarizable, the multibody energies are estimated to be a sizable 27% of the total energy.
Biological systems exist mostly at room temperature and pressure, where the shifting interplay between enthalpy and entropy enables the immense variety of biological phenomena. The QM-parameterized force field molecular simulation process described herein enables correctly predicting the free energies of ensembles. For solvation, in addition to capturing the enthalpy of interaction with itself and the solute, a solvent model can reproduce the entropic effects of pushing aside and reordering molecules to create a cavity for placing the solute. Water molecules are small, highly polar, and highly structured at room temperature and pressure. Table 1 lists three bulk properties for each of the two solvents (water and CHEX): density, heat of vaporization and the highly informative self-solvation. The predicted values, using the QM-parameterized force field model (labeled as AFF (PEID8) for water and AFF (PIMD) for CHEX), for water agree with experimental values to within 3% or better. The predictions using the QM-parameterized force field model are significantly closer to the experimental values than the classical MD simulations (labeled as AFF (MD) for both water and CHEX). The predicted values of cyclohexane, comparing with the experimental values, present an agreement to a lesser degree than the agreement of water. Given that cyclohexane is a larger molecule, the energetics are meeting the pre-determined threshold. In addition, when predicting the properties of cyclohexane, the QM-parameterized force field molecular simulation includes assigning same atom types to atoms of the cyclohexane (similar to the step 301 in
experimental
0.997
10.51
−6.30
−6.30
experimental
0.790
7.91
1.20
−4.42
In some example simulations, solutes containing neutral chemical functional groups, for example carboxylic acids, alkanes, alkenes, aromatics, aldehydes, ketones, alcohols, amides, esters, thiols, sulfides, disulfides and heterocycles, are used. The simulations were performed independently by four groups using their own respective computational resources and architectures, and then averaged.
Specifically,
In some implementations, the QM-parameterized force field model takes into consideration or includes parameters relating to, for example, accurate treatment of long range electrostatic (PME) (and dispersion interactions, proper thermodynamic modelling (thermostats and barostats), enhanced sampling techniques and the Path Integral formulation of nuclear motion. In some simulations, NQE are taken into consideration in the QM-parameterized force field model for accurate predictions for molecular systems.
Table 2 shows time required for computation of 1 ns molecular dynamics solvation simulation for each window using the QM-parameterized force field model based on classical and path-integral methods. In some implementations, these timings do not employ the various speed-up techniques that may result in energy errors >0.1 kcal/mol. For example, in these implementations, centroid RPMID was not used.
In some simulations, the total dimer energies are calculated with silver standard i.e. MP2/CBS, calculated with Helgaker cubic extrapolation from aug-cc-pVTZ->aug-cc-pVQZ+CCSD(T)/aug-cc-pVDZ-MP2/aug-cc-pVDZ. To improve transferability and ease optimization, in some implementations, DFT-SAPT decomposition is used with dHF correction at aug-cc-pVTZ level with PBE0 functional asymptotically corrected4. In these implementations, four parts of DFT-SAPT decomposition are implemented in the QM-parameterized force field model. ES (electrostatic E1pol), EX (exchange-repulsion Elexch), IND (induction, E2ind+E2ind-exch+δHF), DS (dispersion, E2disp+E2disp-exch+Esilver-standard−DFT-SAPTtotal_energy). The dispersion term can accumulate all disagreements between total energy described by the “silver standard” and DFT-SAPT+SHF energies.
In some implementations, the QM-parameterized force field model (also referred to as ARROW-FF) includes features: the two-body non-bonded interactions are composed of electrostatic, exchange-repulsion and dispersion terms, and many others. The electrostatic and exchange-repulsion terms are multipolar, and their radial dependence is a Slater-like exponential to describe charge penetration and delocalization. The dispersion term can be represented by a spherical, 2 term (C6 and C8), Tang-Toennies-damped interaction. Multibody effects can be modeled by anisotropic atomic polarizable dipoles interacting with the electrostatic term and with each other and iterated to self-consistent field (SCF) convergence on every non-bonded step. In some implementations, there are no explicit three-body terms or charge flux. The intermolecular parameters of the QM-parameterized force field model are determined by agreements with QM values of dimer and multimer energies, electrostatic potentials, and multipole moments of monomers. To aid transferability, the QM-parameterized force field molecular simulation process can include matching the individual components to their corresponding QM counterparts, in addition to reproducing the total energy. The bonded interactions are fitted to monomer QM energies computed at the df-MP2/aug-cc-pVTZ level of theory.
In some simulations of the QM-parameterized force field molecular simulation process, the molecules were placed in a water or cyclohexane box of size 32×32×32 Å3. Particle-Mesh Ewald (PME) algorithm was used to compute the electrostatic interactions. A 32×32×32 grid mesh and 5th power spline interpolation order can be used to compute the inverse PME sum. 9 Å cutoffs were used to compute the direct PME sum as well as the exchange and dispersion interactions. Bulk corrections were applied to account for distance cut-off of dispersion interactions. Solvation free energies were computed by decoupling the interactions between the solute and the solvent molecules. Electrostatic, exchange-repulsion and dispersion interactions were switched off simultaneously using a lambda-dependent scaling (a scaling factor power=2 was used) and a soft-coring algorithm (maximal soft-coring radius=1.5 Å, soft-coring radius scaling factor power=1). 15 lambda points unequally spaced (X=0, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0) were used to decouple the solute-solvent intermolecular interactions.
In some simulations of the QM-parameterized force field molecular simulation process, energy minimization (10,000 steps) using the steepest descent algorithm was initially performed on all the simulation systems. This was followed by a 50 ps equilibration and 1 ns production runs in the isothermal-isobaric ensemble (NPT). The temperature was maintained at 298 K using a Nose-Hoover thermostat (chain length=6, relaxation time=1 ps). Pressure was maintained at 1 atm using a MTTK barostat (relaxation time=5 ps). A Multiple Time Step (MTS) algorithm was used to integrate the equations of motion both for the classical and path-integral runs. Bonded and PIMD beads interactions were integrated with the time step of 0.125 fs, while intermolecular interactions were integrated with a time step of 2 fs. The calculations were performed using the QM-parameterized force field model.
Using the QM-parameterized force field model, the QM-parameterized force field device can compute or predict the solvation free energies using both the Bennett acceptance ratio (BAR) and thermodynamic integration (TI) method with <dH/dλ> values interpolated by cubic splines.
Example embodiments described herein include a QM-parameterized force-field model and a QM-parameterized force field molecular simulation process that can predict free energies of solvation of arbitrary molecules to an accuracy better than thermal noise at room temperature (˜0.5 kcal/mol). The correspondence from quantum mechanics to ensemble predictions meets the chemical accuracy for reasons including, but not limited to, first, the benchmark QM calculations can preserve a degree of accuracy. Second, the QM-parameterized force-field model can provide a description of the QM potential energy surface (PES), which imposes a significant yet computational manageable level of complexity on the functional form. Third, the averaging of molecular ensemble can be performed with care. Finally, the dynamics of sampling of the system can account for nuclear quantum effects.
A) Charge Density
In the QM-parameterized force field model, the charge density of the electron cloud of atom ‘a’ located at position Ra is written as:
where qa is the cloud charge and wA is the atom-type parameter that characterizes the cloud size. Capital letters ‘A’, ‘B’, . . . to denote the atom-types respectively for atoms ‘a’, ‘b’, . . . ; similar notations are used for bond, bend and torsion atoms types (e.g. ‘AB’ denotes the bond type for the pair of bonded atoms ‘a’ and ‘b’).
The cloud charge is written in the form:
where ZA is the charge of the core for the atom type ‘A’, {a} denotes the set of atoms chemically bonded to atom ‘a’ and qBA is the bond charge transfer parameters (thus the equality qAB=−qBA providing for charge conservation).
Differential operator {circumflex over (D)}a
where vector ta and 3×3 tensor ωa introduce p and d type density anisotropy, respectively. They are related to the local dipole Pa and quadrupole Qa as:
P
a
=q
a
t
a,
Q
n
=q
nωn (A4)
Vector ta is a sum of permanent and induced vectors:
t
a
=t
a
per
+t
a
ind (A5)
The induced part is written as
t
a
ind
=t
A
maxτa (A6)
where tmaxA is the atom-type parameter and τa is the dimensionless vector representing the dynamic variable, its length being confined within interval (0, 1) by the restraint potential UIN.
The permanent part in Eq. (A5) is written as:
where nab is the unit vector, nab=Rab/Rab| and Rab is the vector directed from atom ‘a’ to atom ‘b’ and tAB is the bond-type parameter associated with permanent ‘chemical’ polarization along the bond ‘ab’ (tAB≠tBA).
As for the local quadrupole tensor, its components are calculated as:
where α,β=x, y, z and QAB are the bond-type parameters.
The atom core density (proportional to Dirac's delta-function) can be formally considered as the limiting case of Eq. (A1) replacing qa by ZA, dropping operator {circumflex over (D)}a and formally approaching zero by the width parameter.
B) Non-Valence Interactions
Based on representation of Equation (A1) the potential φab of electrostatic interaction of two unit charge distributions ‘a’ and ‘b’ separated by a distance Rab=|Rb−Ra| can be represented in the form:
Expanding equation (B1) one gets:
where F's are the algebraic factors defined as:
and potentials φ{k}ab satisfy the recurrence relations:
The ES potential UES between the two clouds is written as:
U
ab
ES(Rab)=332.064·qaqbφab(Rab)
The potentials are in kcal/mol, the components of vectors Rab and t are in Å, charges are in a.u. This formula is also valid for core-cloud and core-core interactions by replacing the cloud charges with the core charges, nullifying vectors t and tensor ω for the core, with the corresponding width parameter(s) formally approaching zero.
The EX potential UEX is written similarly to that of ES:
U
ab
EX=332.064·CAEXCBEXχab(Rab) (B6)
where CEXA, CEXB are the force parameters for atom types ‘A’ and ‘B’, and
The algebraic factors F were defined in Eq. (B4), potentials χ{k}ab(Rab) satisfy the recurrence relations in Eq. (B5)
where uA and uB are the EX width parameters that generally differ from ES parameters wA and wB. Again, χab(0) represents a functional form of the exchange potential.
U
ab
DS=−332.064·((CADS6CBDS6ψ6(Rab)+CADS8CBDS8ψS(Rab)) (B9)
where CDS6, CDS8 are the force constants and ψ's are the Tang-Toennies's functions:
vA and vB being the dispersion width parameters that generally differ from those for electrostatics and exchange.
The corresponding anharmonic restraint potential for the atom ‘a’ being represented as:
where αA is the QM-parameterized force field parameter characterizing polarizability prescribed to atom-type A while bond-type parameters sAB describe the anisotropy of the restraint potential (note that generally sAB≠sBA), the vector τa was introduced in Eq. (A6). Note that the force due to potential in Eq. (B12) approaches infinity as τa approaches 1. This peculiarity of the force field model restraint potential avoids the polarization catastrophe and provides the existence of a solution of the problem of optimization of electron density in any physically reasonable external field.
C) Valence Interactions
he Hamiltonian of valence interaction is given in the form:
H=H
BS
+H
AB
+H
StBn
+H
OOP
+H
TORS (C1)
where the terms correspond respectively to bond stretching (BS), angle bending (AB), stretch-bend (StBn), out-of-plane bending (OOP), and torsion interactions (TORS).
The bond stretching component is of the form:
where
Δrij=|ri−rj|−fIJ(0) (C3)
In Eq. (C2), the sum is over all the bond pairs, I and J are the indices of the atomic types ascribed respectively to atoms i and j, rIJ(0) and kIJ(b) are respectively the equilibrium (or reference) bond length and the bond force constant determined by the atomic type pair and the specific bond type index, and cb is a model parameter.
The angle bending term is given by:
where the summation is performed over all bend triplets i-j, j-k, the indices I, J and K corresponding to the atomic types ascribed respectively to atoms i, j and k. Other notations are:
Δφijk=φijk−φIJK(0) (C5)
where φijk is the angle between i-j and k-j bonds, φIJK{0} and kIJK{a} are the equilibrium (or reference) angle and the force constant determined by the atomic type triplet and the specific angle type index, and ca is a model parameter.
Stretch-bend interaction is given by:
H
St-Bn=ΣΔφijk(kIJK(ba)Δrij+kKJI(ba)Δrkj) (C6)
where the summation is similar to Eq. (C4). The bond length and angle deviations, Δrij, Δrkj and Δφijk are defined by Eqs. (C3) and (C5), k(ba)IJK and k(ba)KJI are the force constants for the interaction of ij and kj bond-angle, respectively. These constants are determined by the atomic type triplet and the specific stretch-bend type index.
Out-of-plane interaction is given by:
where the sum is over all bond tetrahedrals i-j-k-l with the 3-bond central (or vertex) atom j and the bonds i-j, k-j, l-j, the indices I, J, K and L correspond to the atomic types ascribed respectively to atoms i, j, k, and l. The angle deviation Δχikl is defined as the angle between the vector (ri−rj) and the plane defined by vectors (rk−rl) and (rl−rj); that is
where
n
aj=(ra−rj)/|ra−rj|, a=i,k,l (C9)
The force constant kIJKL is determined by the atomic type quartet IJKL.
The torsion component is of the form
H
TORS=1/2Σ[VIJKL(1)(1+cos ϕ)+VIJKL(2)(1−cos 2ϕ)+VIJKL(3)(1+cos 3ϕ)] (C10)
where the sum is over all bond quartets i-j, j-k, k-l, the indices I, J, K, and L correspond to the atomic types ascribed respectively to atoms i,j, k, and l. ø is a standard torsion angle defined as the angle between the planes i-j-k and j-k-l, given by:
cos ϕ=mij·mlk (C11)
where
The force constants VIJKL are determined by the atomic type quartet and two specific indices, the first relating to the type of the central bond j-k, while the second one relates to the hybridization state of atoms j and k.
Monomer, direr, and multimer QM data of high quality is used for fitting the QM-parameterized force field model. From the monomer calculations the ES potential map is used as well as polarizability and its related properties. The stochastic Covariance matrix adaptation evolution strategy (CMA-ES) can be used.
As shown in
Multimers of ethanol with 2,3,4 water molecules were extracted from Molecular dynamics simulations with the force field model. Geometries of those multimers were optimized with the mp2/cc-pVTZ method. Nonadditive energies are full nonbonded energy of MP2/CBS (aug-cc-pVTZ->aug-cc-pVQZ extrapolated) minus energies of dimers nonbonded energy.
As expected for ‘formula’-based functional forms, the predictability of the QM-parameterized force field model can saturate at its maximum achievable accuracy. In contrast, machine learning (ML) models can theoretically achieve accuracy, but the training data requirements are linear in log-log (accuracy vs training size) space.
Solvation data for various functional groups is given in Table 3. Table 3 shows solvation data with free energies of hydration, solvation and partition coefficients for the various functional groups. Table 4 shows a comparison of QM-parameterized force field model and PIMD calculated free energies with experiment data for neutral amino-acid analogues. Table 5 shows a comparison of hydration free energies as predicted by the QM-parameterized force field model described herein, GAFF, and AMOEBA force fields. Table 6 shows computations of free energies of hydration as predicted by the QM-parameterized force field model described herein, and models performed by academic collaborations.
Comparison with Implicit Solvent Models and Machine Learning Models
Tables 3-6 show comparisons of the free energy of hydration predictions of various functional group molecules based on the QM-parameterized force field model with that of the implicit solvent models namely Solvation Model Dielectric (SMD), Polarizable Continuum Model (PCM), and COSMO/COSMO-RS. Due to unavailability of free energy data for the original COSMO model, the free energies of solvation for the molecule dataset are calculated using MolPro ver. 2012 (gcosmo, epsilon=80.0,df-ks,B3LYP). The MAE for COSMO (computed using MolPro ver. 2012) is 2.5 kcal/mol which is in reasonable agreement with the MAE of ˜2 kcal/mol for COSMO. The comparisons to other implicit solvent models such as SMD, PCM and COSMO-RS are limited to the overlap of the molecules computed using QM-parameterized force field model. The free energies of hydration as predicted by these implicit solvent models (SMD, PCM) were received. For these subset of compounds, as shown in Table 3, the MAE's for the SMD, PCM are 0.61 kcal/mol and 1.49 kcal/mol, respectively. The free energies of hydration as determined by COSMO-RS are calculated and for the dataset the MAE to be 0.42 kcal/mol if calculated.
Machine learning models can be used for free energy determination. Based on an overlapping dataset of free energy computations of neutral organic molecules, the MAE for Free Energy Machine Learning Model (FML) was 0.50 kcal/mol. The MAE's for the free energies of hydration on the subset of neutral organic molecules dataset based on these implicit solvent models and the machine learning model-FML are consistent with the MAE's calculated and received using other models.
The QM-parameterized force field models and the QM-parameterized force field molecular simulations described herein may be applied to a variety of molecular systems. Example applications include, but are not limited to, protein-ligand related applications and material science related applications. Example protein-ligand related applications can include, but are not limited to, protein-protein interactions related applications, protein-ligand interactions related applications, protein enzyme related applications, ligand properties related applications, membrane and metalloprotein applications, protein folding and various protein properties applications.
Embodiments can include performing free energy perturbation (FEP) calculations to predict ligand-protein binding affinities via molecular simulations using the force-field model.
Embodiments can include using the force-field model to screen drug candidate options through assessment of a plurality of variations of the drug candidate obtained by replacement of a central core of the drug candidate molecule with a plurality of chemical moieties to identify an optimum and/or potential drug candidate.
Embodiments can include using the force-field model in assessing ligand and protein targets in a lead optimization phase of drug development.
Embodiments can include using the force-field model to design ligands that bind to a protein for treatment of a disease.
Embodiments can include application to aggregation of biotherapeutic formulations.
Embodiments can include using the force-field model in predicting aggregation of a biotherapeutic including an antibody, wherein the predicting includes: predicting chromatography retention times of the antibody modeled with the force-field model; mapping distribution of aggregation-prone regions on a surface of the antibody modeled by the force-field model; and/or predicting effect of residue mutation on aggregation behavior of the antibody modeled by the force-field model.
Embodiments can include predicting structure of antibodies, in particular, using the force-field model to predict the three-dimensional structures of antibodies.
Embodiments can include using the force-field model to predict protein-protein binding.
Embodiments can include using the force-field model to design a protein having a stabilized folded protein conformation through modeling structural modifications of the protein.
Embodiments can include using the force-field model to design an enzyme through modeling of active variants of an enzyme.
Example material science related applications (homogenous and heterogenous liquids or simple structured systems like crystals, etc.) can include batteries, polymers, zeolites, solvents and mixtures and phase diagrams, and/or the like.
Embodiments can include using the force-field model to enhance reactivity and selectivity of transition metal catalyzed reactions through modification of one or more of the catalyst, catalyst substrate, and reactants modeled by the force-field model.
Embodiments can include using the force-field model to predict internal optoelectronic and condensed-phase thermophysical properties of organic optoelectronic materials in applications to optimize and/or improve material behavior. The material behavior can be selected from the group including charge injection, charge transport, exciton coupling, phosphorescence, chemical stability, thermophysical stability, etc. The applications can be selected from the group including organic light-emitting diodes (OLEDs), organic field-effect transistors (OFETs), photovoltaics (PVs and OPVs), dye-sensitized solar cells (DSSCs), etc.
Embodiments can include using the force-field model to predict controlled release of a drug from formulations, wherein the formulations include lipid or polymer-based aggregates in solution that sequester, solubilize and deliver the drug. A simulation of the formulation modeled by the force-field model predicts formation and structure of the aggregates and location of the drug within the aggregate.
Embodiments can include using the force-field model to predict stability of a drug compound to degradation mechanisms comprising oxidation, hydrolysis, photo-degradation, or any combination thereof. Degradation analyses over multiple chemical moieties in high-throughput fashion may provide useful criteria for rapid screening of formulation scenarios.
Embodiments may include using the force-field model to predict miscibility of a pharmaceutical formulation in a solution.
Embodiments may include using the force-field model to predict a glass transition temperature (Tg) of an amorphous solid obtained from density-temperature curves generated from simulations of a force-field model of the amorphous solid.
Applications may include those related to energy capture and storage. Embodiments may include modeling materials for batteries, fuel cells, and/or hydrogen storage using the force-field model to predict properties selected from the group including, for example, ion mobility, intercalation potentials and load capacity of battery electrodes, additives chemistry at an electrode-electrolyte interface, electro-catalytic activity, degradation processes, elastic and thermophysical properties, and/or ionic mobility in organic and solid electrolyte.
Embodiments may include modeling atomic layer deposition at a gas-surface interface using the force-field model.
Embodiments may include modeling organometallic complexes as precursors for depositing metal or metal oxide film at a gas-surface interface using the force-field model. A plurality of variations of organometallic complexes may be screened to assess viability of organometallic complexes as precursors. The variations may include a collection of complexes having different combinations of selected ligands associated with a metal ion.
Embodiments may include modeling polymer systems using the force-field model to determine mechanical and thermophysical properties, wherein the polymer system can be selected from the group including a crosslinked polymer, a carbon fiber polymer composite, a polymer-polymer composite, a polymer-ceramic composite, a neat polymer and/or the like.
Embodiments may include determining elastic and ultimate performance properties of a polymer system modeled with the force-field model. The polymer system model may be subjected to a series of simulations under different levels of a controlled strain and a stress calculated from the simulations and the strain may be used to determine modulus and a yield point of the polymer system.
Embodiments may include determining a glass transition temperature (Tg) of a polymer system modeled with the force-field model, wherein a series of simulations of the polymer system may be performed at different temperatures and the Tg is determined from a glass transition identified from the densities calculated from the simulations and the temperatures.
Embodiments may include assessing miscibility of components in a polymer material through modeling with the force-field model, wherein the components can include reactive components, solvents, polymer additives, plasticizers, or any combination thereof. Assessing miscibility may include comparing solubility parameters determined from the modeling.
Embodiments include modeling phase separation in polymer blends and copolymer systems with the force-field model.
Embodiments include modeling inorganic solid materials with the force-field model to predict bulk and surface properties, wherein the properties can be selected from the group including band structure, mechanical, dielectric, magnetic, and/or thermodynamic properties.
Embodiments may include predicting compositional phase diagrams, material defects, and/or surface degradation and reactivity of the inorganic solid materials modeled with the force-field model.
Embodiments may include modeling membrane proteins with the force-field model including molecular dynamics simulations of protein-protein interactions and protein-lipid interactions.
Embodiments can include performing a molecular simulation of the force-field model of a metalloprotein in a biological system. A relationship between structure and function of the metalloprotein as a catalyst for use in synthesis of model catalysts that mimic the modeled metalloprotein may be evaluated from molecular simulation.
Embodiments may include performing a molecular simulation of permeation of compounds through a skin barrier layer using the force-field model to predict permeability of the compounds through the skin barrier layer. The compounds may be selected from the group including drugs, toxic chemicals, and/or cosmetics.
Embodiments may include molecular simulation of misfolding and aggregation of proteins modeled with the force-field model to elucidate conformational states and the aggregation and the misfolding pathways of the proteins.
Embodiments may include performing molecular simulations of polymer nanocomposites (PNCs) modeled as organic or inorganic nanoparticles interacting with polymer chains represented with the force-field model, wherein a conformation and the mobility of the polymer chains in proximity to the nanoparticles is monitored. A relationship of enthalpic and entropic interactions on structure and the dynamics of the polymer chains and the nanoparticles to macroscopic properties may be evaluated from the simulations.
Methods described herein can further include performing molecular simulations of a liquid crystal system using the force-field model, wherein the liquid system is selected from the group including a bulk phase, mesophase, inhomogeneous system, interfacial system, etc.
Embodiments may include performing molecular simulations of adsorption, separation and/or diffusion of molecular systems in a zeolite. The molecular system is selected from the group including small molecules comprising less than 10 atoms, long chain hydrocarbons including greater than 10 carbons, mixtures, etc. The results of the simulations may be used to evaluate and/or calculate adsorption isotherms of molecules, separation of molecules of different types, and relative diffusion of molecules of different types of the molecular system.
Embodiments may include simulations that probe protein-lipid interactions of single membrane proteins and complex membranes.
Embodiments may include molecular simulation that can be used to study the relationship between structure and function of metalloproteins, which can be used to synthesize model catalysts that mimic those metalloproteins.
Embodiments may include molecular simulations used to understand and predict permeability of compounds through skin. This is of interest for transdermal delivery of drugs, for toxicity predictions of chemicals and topical application of cosmetics. Permeability of compounds can be simulated using atomistic models of the skin's barrier structure and the compounds. Modeling predictions may be validated with data for reference compounds.
Embodiments may include using MD simulations with the force field described herein to provide a full and realistic dynamical description of the protein-ligand binding event.
Embodiments may include determining functional mechanisms of proteins through simulation of transitions of a protein from an active to inactive structure. The atomistic characterization of the transition state is a fundamental step to improve the understanding of the folding mechanism and the function of proteins.
Embodiments may include using the force field model described herein to understand the optical, electrical and mechanical properties of organic electronic materials. The spatial arrangement of the molecules as well as the chemical identity critically affects the performance of the material. Modeling can be used to relate a material's performance in a device to the molecular details that cause and optimize that performance. Processes such as charge transport, exciton coupling, and phosphorescence are important for optimizing and/or improving the performance of materials used in organic electronics devices such as organic light-emitting diodes (OLEDs), organic field-effect transistors (OFETs), photovoltaics (PVs and OPVs), and dye-sensitized solar cells (DSSCs).
Polymer nanocomposites (PNCs) are hybrid materials incorporating organic or inorganic nanoparticles (NPs) with at least one dimension in the submicron scale. These materials play an important role in industrial formulations and technological applications from food packaging to smart coatings. Incorporating nanoparticles (NPs) to a polymer matrix can significantly alter the conformation and the mobility of the polymer chains in their proximity. Understanding the delicate balance between the enthalpic and entropic interactions is crucial to control and predict the ability of NPs to diffuse and disperse in the polymer matrix. The relationship of these interactions on the structure and the dynamics of polymer chains and NPs to a number of macroscopic properties can be shown by molecular dynamics simulations using the force field described herein.
Molecular dynamics simulations may be employed to study liquid crystals including bulk phases, mesophases, inhomogeneous systems, and interfaces. In self-organizing materials, such as liquid crystals, subtle changes in intermolecular forces lead to changes in phase behavior. In liquid crystals many properties of interest arise from specific ordering of molecules (or parts of molecules) in the bulk and so are studied by simulation of many molecules.
Molecular simulation has been applied to study adsorption, separation and diffusion of various types of molecules in zeolites. There are numerous industrial applications involving pure components, various sizes of molecules including small molecules to long chain hydrocarbons, and mixtures. MD simulations can relate the adsorbates' characteristics to adsorption isotherms, separation, and relative diffusion.
It should be appreciated that the above detailed embodiments of the present disclosure are only to exemplify or explain principles of the present disclosure and not to limit the present disclosure. Therefore, any modifications, equivalent alternatives and improvement, etc. without departing from the scope of the present disclosure shall be included in the scope of protection of the present disclosure. Meanwhile, appended claims of the present disclosure aim to cover all the variations and modifications falling under the scope and boundary of the claims or equivalents of the scope and boundary.
The above-described embodiments can be implemented in any of numerous ways. For example, embodiments may be implemented using hardware, software or a combination thereof. When implemented in software, the software code can be stored (e.g., on non-transitory memory) and executed on any suitable processor or collection of processors, whether provided in a single computer or distributed among multiple computers.
Further, it should be appreciated that a compute device including a computer can be embodied in any of a number of forms, such as a rack-mounted computer, a desktop computer, a laptop computer, netbook computer, or a tablet computer. Additionally, a computer can be embedded in a device not generally regarded as a computer but with suitable processing capabilities, including a smartphone, smart device, or any other suitable portable or fixed electronic device.
Also, a computer can have one or more input and output devices. These devices can be used, among other things, to present a user interface. Examples of output devices that can be used to provide a user interface include printers or display screens for visual presentation of output and speakers or other sound generating devices for audible presentation of output. Examples of input devices that can be used for a user interface include keyboards, and pointing devices, such as mice, touch pads, and digitizing tablets. As another example, a computer can receive input information through speech recognition or in other audible format.
Such computers can be interconnected by one or more networks in any suitable form, including a local area network or a wide area network, such as an enterprise network, and intelligent network (IN) or the Internet. Such networks can be based on any suitable technology and can operate according to any suitable protocol and can include wireless networks, wired networks or fiber optic networks.
The various methods or processes outlined herein can be coded as software that is executable on one or more processors that employ any one of a variety of operating systems or platforms. Additionally, such software can be written using any of a number of suitable programming languages and/or programming or scripting tools, and also can be compiled as executable machine language code or intermediate code that is executed on a framework or virtual machine.
In this respect, various disclosed concepts can be embodied as a computer-readable storage medium (or multiple computer-readable storage media) (e.g., a computer memory, one or more floppy discs, compact discs, optical discs, magnetic tapes, flash memories, circuit configurations in Field Programmable Gate Arrays or other semiconductor devices, or other non-transitory medium or tangible computer storage medium) encoded with one or more programs that, when executed on one or more computers or other processors, perform methods that implement the various embodiments of the disclosure discussed above. The computer-readable medium or media can be transportable, such that the program or programs stored thereon can be loaded onto one or more different computers or other processors to implement various aspects of the present disclosure as discussed above.
Some embodiments described herein relate to a computer storage product with a non-transitory computer-readable medium (also can be referred to as a non-transitory processor-readable medium) having instructions or computer code thereon for performing various computer-implemented operations. The computer-readable medium (or processor-readable medium) is non-transitory in the sense that it does not include transitory propagating signals per se (e.g., a propagating electromagnetic wave carrying information on a transmission medium such as space or a cable). The media and computer code (also can be referred to as code) may be those designed and constructed for the specific purpose or purposes. Examples of non-transitory computer-readable media include, but are not limited to, magnetic storage media such as hard disks, floppy disks, and magnetic tape; optical storage media such as Compact Disc/Digital Video Discs (CD/DVDs), Compact Disc-Read Only Memories (CD-ROMs), and holographic devices; magneto-optical storage media such as optical disks; carrier wave signal processing modules; and hardware devices that are specially configured to store and execute program code, such as Application-Specific Integrated Circuits (ASICs), Programmable Logic Devices (PLDs), Read-Only Memory (ROM) and Random-Access Memory (RAM) devices. Other embodiments described herein relate to a computer program product, which can include, for example, the instructions and/or computer code discussed herein.
The terms “program” or “software” or “software stack” are used herein in a generic sense to refer to any type of computer code or set of computer-executable instructions that can be employed to program a computer or other processor to implement various aspects of embodiments as discussed above. Additionally, it should be appreciated that according to one aspect, one or more computer programs that when executed perform methods of the present disclosure need not reside on a single computer or processor, but can be distributed in a modular fashion amongst a number of different computers or processors to implement various aspects of the disclosure.
Computer-executable instructions can be in many forms, such as program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Typically the functionality of the program modules can be combined or distributed as desired in various embodiments.
Also, various concepts can be embodied as one or more methods, of which an example has been provided. The acts performed as part of the method may be ordered in any suitable way. Accordingly, embodiments can be constructed in which acts are performed in an order different than illustrated, which can include performing some acts simultaneously, even though shown as sequential acts in illustrative embodiments. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety.
This application claims priority to and the benefit of U.S. Provisional Patent Application No. 63/271,979 filed Oct. 26, 2021, and entitled “System and Method for Determining Material Properties of Molecular Systems from an ab-initio Parameterized Force-Field,” the contents of which are incorporated herein by reference in its entirety.
Number | Date | Country | |
---|---|---|---|
63271979 | Oct 2021 | US |