The present disclosure relates to a multi-level partitioning description of the electronic structure of molecular systems and materials and related applications.
In particular, the present disclosure relates to methods for treating the electronic structures of the molecular systems and materials and related applications.
Computational modeling of chemically reactive systems in the condensed phase faces challenges from the perspective of electronic structure theory.
Applications involving chemical reactions such as hydrogen transfer catalysis in enzymes, metal dendrite formation at electrolyte/electrode interfaces, and the decomposition of crystalline high energy density materials combine large system sizes with subtle molecular interactions and chemical reactions.
In particular, many such applications involve multiple dynamical timescales and electronically non-adiabatic effects. The development of new methods to perform reliable, on-the-fly electronic structure calculations at a computational cost that makes feasible the simulation of long-timescale dynamics in large systems remains a central challenge in chemistry.
Provided herein are embedded mean field (EMF) methods which, in several embodiments, allow the electronic structures of molecular systems and materials to be treated at different mean-field levels of accuracy in order to achieve a high ratio of accuracy to computational cost that makes feasible the simulation of long-timescale dynamics in large systems.
According to a first aspect, a system and method are described herein for constructing an energy model for a molecular system, said method comprising: generating, on a computer, a one-particle basis set for the molecular system; partitioning the molecular system at a basis set level into a plurality of partitions comprising a first partition and a second partition; estimating, on the computer, a total energy of the molecular system by combining mean-field quantum mechanical calculations that involve self-consistent system optimization of a density matrix for a combination of an entirety of the first and second partitions, the mean-field quantum mechanical calculations being of a first level of computational cost for the first partition and a second level of computational cost for the second partition, the first level of computational cost being higher than the second level of computational cost such that a computational cost of the estimating is lower than if the second level of computational cost were equal to the first level of computational cost; and storing, on the computer, at least one value corresponding to the total energy.
According to a second aspect, a system and method are described herein for constructing an energy model for a molecular system, said method comprising: generating, on a computer, a one-particle basis set for the molecular system; partitioning the molecular system at a basis set level into multiple partitions; estimating, on the computer, a total energy of the molecular system by combining mean-field quantum mechanical calculations that involve self-consistent system optimization of a density matrix for a combination of an entirety of the multiple partitions, the mean-field quantum mechanical calculations being of different levels of computational cost for each of the multiple partitions; deriving a density matrix from the total energy; and storing, on the computer, at least one value corresponding to the total energy.
According to a third aspect, a system and method are described herein for selecting a solvent for a compound presenting an acid, the method comprising: generating, on a computer, a first one-particle basis set associated with a protonated form of the acid; partitioning the protonated form at a basis set level into a first partition of the protonated form, the first partition comprising a hydrogen-oxygen bond, and into a second partition of the protonated form; generating, on the computer, a second one-particle basis set associated with a deprotonated form of the acid; partitioning the deprotonated form at the basis set level into a first partition of the deprotonated form, the first partition of the deprotonated form comprising a portion of the deprotonated form of the acid where the hydrogen-oxygen bond of the protonated form was broken to form the deprotonated form, and into a second partition of the deprotonated form; estimating, on the computer, a first total energy by combining mean-field quantum mechanical calculations that involve self-consistent system optimization of a first density matrix for a combination of an entirety of the first and second partitions of the protonated form, the mean-field quantum mechanical calculations being of a first level of computational cost for the first partition of the protonated form and of a second level of computational cost for the second partition of the protonated form, the first level of computational cost being higher than the second level of computational cost such that a computational cost of the estimating is lower than if the second level of computational cost were equal to the first level of computational cost; estimating, on the computer, a second total energy by combining mean-field quantum mechanical calculations that involve self-consistent system optimization of a second density matrix for a combination of an entirety of the first and second partitions of the deprotonated form, the mean-field quantum mechanical calculations being of the first level of computational cost for the first partition of the deprotonated form and of the second level of computational cost for the second partition of the deprotonated form; calculating a deprotonation energy from the first total energy and the second total energy; calculating a pKa value of the acid from the deprotonation energy; and selecting the solvent for the compound based on the pKa value.
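As a minimal numerical sketch of the pKa step (not the procedure of the disclosure), the standard thermodynamic relation pKa = ΔG/(RT ln 10) can be applied once a deprotonation free energy in solution is available. The function name, the choice of kJ/mol units, and the assumption that the deprotonation energy has already been converted into a standard solution-phase free energy (thermal and solvation corrections are not shown) are illustrative assumptions.

```python
import math

# Molar gas constant in kJ/(mol K).
R_KJ_PER_MOL_K = 8.314462618e-3


def pka_from_free_energy(delta_g_kj_per_mol, temperature_k=298.15):
    """Convert a deprotonation free energy into a pKa value.

    Assumption: delta_g_kj_per_mol is a standard free energy of
    deprotonation in solution; converting raw total-energy differences
    into such a free energy is outside the scope of this sketch.
    """
    return delta_g_kj_per_mol / (R_KJ_PER_MOL_K * temperature_k * math.log(10.0))


if __name__ == "__main__":
    # Hypothetical free energy of deprotonation, for illustration only.
    print(round(pka_from_free_energy(27.0), 2))
```

At 298.15 K the factor RT ln 10 is roughly 5.7 kJ/mol, so each 5.7 kJ/mol of deprotonation free energy corresponds to about one pKa unit.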
According to a fourth aspect, a system and method are described herein for determining a total energy of a molecular system in an external field, said method comprising: generating, on a computer, a one-particle basis set associated with the molecular system; partitioning the molecular system at a basis set level into a first partition and a second partition; estimating, on the computer, a total energy of the molecular system by combining mean-field quantum mechanical calculations that involve self-consistent system optimization of a density matrix for a combination of an entirety of the first and second partitions of the one-particle basis set, the mean-field quantum mechanical calculations being of a first level of computational cost for the first partition and a second level of computational cost for the second partition, the first level of computational cost being higher than the second level of computational cost such that a computational cost of the estimating is lower than if the second level of computational cost were equal to the first level of computational cost, the self-consistent system optimization including interactions of the system with the external field; and storing, on the computer, at least one value corresponding to the total energy.
According to a fifth aspect, a system and method are described herein for constructing a model of time dependent behavior for a molecular system, said method comprising: generating, on a computer, a one-particle basis set for a configuration of the molecular system; partitioning the molecular system at a basis set level into a first partition and a second partition, wherein the first partition and the second partition each correspond to disjoint subsets of the one-particle basis set; estimating, on the computer, a total energy of the molecular system by combining mean-field quantum mechanical calculations that involve self-consistent system optimization of a density matrix for a combination of an entirety of the first and second partitions of the basis set, the mean-field quantum mechanical calculations being of a first level of computational cost for the first partition and a second level of computational cost for the second partition, the first level of computational cost being higher than the second level of computational cost such that a computational cost of the estimating is lower than if the second level of computational cost were equal to the first level of computational cost; and calculating forces within the molecular system from a gradient, with respect to nuclear coordinates of the molecular system, of the total energy of the molecular system; generating a one-particle basis set for a new configuration of the system at a consecutive time step, the new configuration being based on the forces being applied to the molecular system in the configuration; and repeating the generating, the partitioning, the estimating, and the calculating for the new configuration.
According to a sixth aspect, a system and method are described herein for optimizing geometry of a molecular system, said method comprising: generating, on a computer, a one-particle basis set associated with the molecular system of a particular geometry; partitioning the molecular system of the particular geometry at a basis set level into a first partition and a second partition; estimating, on the computer, a total energy of the molecular system by combining mean-field quantum mechanical calculations that involve self-consistent system optimization of a density matrix for a combination of an entirety of the first and second partitions, the mean-field quantum mechanical calculations being of a first level of accuracy for the first partition and a second level of accuracy for the second partition, the first level of accuracy being higher than the second level of accuracy such that a computational cost of the estimating is lower than if the second level of accuracy were equal to the first level of accuracy; repeating the generating, the partitioning, and the estimating for the molecular system in at least one other geometry; and optimizing the geometry of the molecular system based on the total energies.
According to a seventh aspect, a system and method are described herein for creating an enzyme from a non-enzymatic protein, said method comprising: procuring a sample of the non-enzymatic protein; determining a reaction to be catalyzed by the enzyme; determining a transition state of the reaction; building a nuclear configuration associated with a molecular system, the molecular system comprising the sample and the transition state; generating, on a computer, a one-particle basis set for the nuclear configuration; partitioning the nuclear configuration at a basis set level into a first partition and a second partition; estimating, on the computer, a total energy of the molecular system by combining mean-field quantum mechanical calculations that involve self-consistent system optimization of a density matrix for a combination of an entirety of the first and second partitions, the mean-field quantum mechanical calculations being of a first level of computational cost for the first partition and a second level of computational cost for the second partition, the first level of computational cost being higher than the second level of computational cost such that a computational cost of the estimating is lower than if the second level of computational cost were equal to the first level of computational cost; storing, on the computer, the total energy; determining a modification to the non-enzymatic protein; building a new nuclear configuration associated with a new molecular system, the new molecular system comprising: the transition state, and the non-enzymatic protein with the modification; partitioning the new nuclear configuration at a basis set level into a new first partition and a new second partition; estimating, on the computer, a new total energy of the new nuclear configuration by combining mean-field quantum mechanical calculations that involve self-consistent system optimization of a new density matrix for a combination of an entirety of the new first and new second partitions, the mean-field quantum mechanical calculations being of the first level of computational cost for the new first partition and the second level of computational cost for the new second partition; storing, on the computer, the new total energy; determining if the new total energy is lower than the total energy; and if the new total energy is lower than the total energy, applying the modification to the non-enzymatic protein.
In some embodiments, the EMF methods herein described can be used in methods to control one or more chemical reactions, in methods to engineer a compound and/or material, or in methods to engineer a combination of compounds, molecules and/or material to have a certain desired reactivity, state or effect.
For example, according to an eighth aspect, the EMF methods can be used in a method for controlling a reaction between one or more reagents. In particular, the method can comprise constructing an energy model for a molecular system of the reaction according to an EMF method herein described to obtain one or more total energy values for the reaction; and selecting an energy value (e.g. a specific value or a range of values) for the molecular system corresponding to a predetermined reaction result. The method can further comprise selecting the reaction settings corresponding to the selected energy value; and selecting the number of reagents, type of reagents, chemical structure of the reagents and/or related concentration, and/or selecting the reaction conditions (such as temperature, pH, external fields, buffers, ionic strength) to obtain the selected reaction settings.
In particular, the energy value in a method to control a reaction herein described can be selected to correspond to one or more chemical and/or physicochemical states of the reactants, reaction intermediates and/or related products, resulting in reaction features such as: the making or breaking of a bond (e.g. a covalent bond or an ionic bond), the reaction rate, the product yield, the chemical and/or structural nature of the product, and additional features identifiable by a skilled person upon reading of the present disclosure.
According to a ninth aspect, the EMF methods can be used in a method for engineering a compound to control its reactivity and/or physicochemical state. The method can comprise: constructing an energy model for a molecular system of the compound according to EMF methods herein described to obtain one or more total energy values for the compound; and selecting an energy value (e.g. a specific value or a range of values) for the molecular system corresponding to a predetermined compound structure. The method can further comprise selecting structural features of the compound corresponding to the selected energy value; and engineering the compound having the selected structural features (e.g. polarity, hydrophobicity, or hydrophilicity of the compound or portions thereof; number, positioning, and orientation of functional groups; and additional features identifiable by a skilled person) corresponding to the selected energy value.
In particular, the energy value for methods to engineer a compound herein described can be selected to correspond to a compound structure that is associated with one or more chemical and/or physicochemical states of the compound, such as the reactivity of the compound or portions thereof (per se or in combination with one or more additional compounds or environment), the thermodynamic stability of the compound, and additional states identifiable by a skilled person upon reading of the present disclosure.
According to a tenth aspect, the EMF methods can be used in a method for engineering a material to have one or more chemical and/or physicochemical properties. The method can comprise: constructing an energy model for a molecular system of the material according to EMF methods herein described to obtain one or more total energy values for the material; and selecting an energy value (e.g. a specific value or a range of values) for the molecular system corresponding to a predetermined material configuration. The method can further comprise selecting structural features of the material corresponding to the selected energy value; and engineering the material having the selected structural features (e.g. crystallinity, composition, number and chemical nature of the interactions between the components, and additional features identifiable by a skilled person) corresponding to the selected energy value.
In particular, the energy value, in a method to engineer a material herein described, can be selected to correspond to a material configuration that is associated with one or more chemical and/or physicochemical properties of the material, such as conductivity, ability to absorb light, tensile strength, hardness, and additional properties identifiable by a skilled person upon reading of the present disclosure.
According to an eleventh aspect, the EMF methods can be used in a method for engineering a combination of compounds, molecules and/or material to have a desired reactivity, state or effect. The method can comprise: constructing an energy model for a molecular system of the combination according to the EMF methods herein described to obtain one or more total energy values for the combination and selecting an energy value (e.g. a specific value or a range of values) for the molecular system corresponding to a predetermined state of the combination. The method can further comprise selecting an arrangement of the compounds, molecules, and/or material corresponding to the selected energy value; and selecting the structural and/or physical features of the compounds, molecules and/or material, as well as the related positioning one with respect to the other, in order to obtain the selected arrangement.
In particular, the energy value in a method to engineer a combination of compounds, molecules and/or material herein described, can be selected to correspond to an arrangement that is associated with one or more chemical and/or physicochemical properties of the combination, such as: the complementarity of the compounds, molecules and/or material forming the combination; the stability of the combination; and additional properties identifiable by a skilled person upon reading of the present disclosure.
The EMF methods and related components, methods and systems herein described allow, in several embodiments and for a partitioned basis set, to: introduce no constraints on the number of electrons in any partition and unambiguously return the same number of electrons in each partition; capture quantum entanglement between partitions at the level of mean-field approximation by making no assumption that any partition is in a pure state; require no external or system-dependent parameters beyond the determination of the partition; and implement linear response theory more efficiently.
The EMF methods and related components, methods and systems herein described can be used in connection with applications wherein a multi-level description of molecular systems is desired. Exemplary applications comprise reactions at solid or metal surfaces, bioinorganic or organometallic chemistry, homogeneous or heterogeneous catalytic processes, chemical processes in batteries and fuel cells, and additional applications which are identifiable by a skilled person.
The EMF methods can be implemented in combination with any classical or quantum molecular simulation tool that utilizes Born-Oppenheimer potential energy surfaces, including molecular dynamics, Monte Carlo simulation, energy minimization and docking. Software realizations of the disclosed methods will enable simulation, study and design of chemical, material and biological systems. Application domains include heterogeneous catalysis, homogeneous catalysis with splitting of ligands, semiconductor etching, processes at semiconductor surfaces, chemistry on and among nanoparticles, chemistry at battery electrodes, graphene chemistry, biocatalysis, and many other application areas.
The details of one or more embodiments of the present disclosure are set forth in the description below. Other features, objects, and advantages will be apparent from the description and from the claims.
The accompanying drawings, which are incorporated into and constitute a part of this specification, illustrate one or more embodiments of the present disclosure and, together with the detailed description and the examples, serve to explain the principles and implementations of the disclosure.
Provided herein are embedded mean field methods which, in several embodiments, allow the electronic structures of molecular systems and materials to be treated at different mean-field levels of accuracy (alternatively, levels of computational cost).
Finding a solution to any electronic structure problem requires a compromise between accuracy and feasibility that is dictated by the system size. The current disclosure is directed to Embedded Mean Field (EMF) methods for the seamless, multi-level description of the electronic structure of complex molecular systems and materials with improved accuracy and at a reduced computational cost that makes feasible the simulation of long-timescale dynamics in large systems.
The EMF methods allow for partitioning of a complex molecular system into two or more subsystems at the particle level and the treatment of each subsystem within the complex molecular system at different mean-field levels of accuracy, in which subsystems are identified by subsets of one-particle basis sets. The EMF methods are distinguished from previous methods [references 1-4] by their fully consistent treatment of coupling effects between the subsystems, and by the fact that electron-number fluctuations between subsystems are fully accounted for at the mean-field level. A detailed description of the EMF methods, as well as numerical demonstrations, is provided in the following sections.
The term “model” or “modeling” used herein refers to a software construct of a molecular system. A model contains numerous variables that characterize the system being studied. Simulation is done by adjusting these variables and observing how the changes in variables affect the outcomes predicted by the model. A common feature of various modeling techniques is the requirement for an atomistic- or even electronic-level description of the molecular systems of interest.
The term “molecular system” used herein refers to a combination of atomic and/or subatomic particles possibly linked by one or more chemical bonds and possibly forming chemical and/or biological compounds, molecules, material, or combination thereof.
In particular, a molecular system in the sense of the disclosure can refer to chemical and/or biological compounds, molecules, materials, or combinations thereof comprising interacting particles, such as atoms or electrons, ranging from chemical compounds or combinations thereof consisting of a small number of atoms to polymeric biological macromolecules and material assemblies. The particles in a molecular system, in the sense of the disclosure, can be interacting and be subjected to attractive or repulsive forces which can result in some instances in formation of one or more bonds. In particular, in some instances at least some of the particles in a molecular system can be either covalently or non-covalently bonded and interact with one another through nuclear and electron interactions. Examples of a molecular system include an organic molecule, an inorganic molecule, an organic polymer, an inorganic polymer, a protein, a nucleic acid, or a combination thereof.
The term “material” used herein refers to an organic or inorganic substance, which can be a form of matter that has constant chemical composition and characteristic properties. In particular chemical substances can exist in various physical forms such as solids, liquids, gases, and plasma, and may change between these phases of matter with changes in temperature or pressure. In some instances chemical reactions can convert one chemical substance into another.
In embodiments herein described, nuclear configuration of a molecular system determines the molecular geometry by describing the three-dimensional arrangement of the particles in the molecular system. The nuclear configuration provides a description of each particle in a molecule in terms of atomic number, bond length, bond angle, and dihedral angle, and related information regarding atomic orientations and connectivity in space. In certain embodiments, the nuclear configuration of a molecular system can contain additional information such as nuclear coordinates, atomic masses and velocities. The number of variables contained in a molecular system can be modified in order to observe how the modification may affect system features of the molecular system.
Computational models introduce various levels of approximation to characterize the molecular system of interest. For small systems (<500 atoms), multi-picosecond molecular dynamics (MD) trajectories that utilize on-the-fly Kohn-Sham (KS) density functional theory (DFT) potentials may be employed [references 5-7], and MD trajectories performed using wavefunction-based ab initio methods such as Hartree-Fock (HF), second-order Møller-Plesset (MP2), coupled-cluster single/double (CCSD), and many other methods are also feasible [reference 8].
The term “molecular dynamics” or “classical molecular dynamics” refers to a computer simulation approach for the physical movements of particles, in particular atoms, in a many-body molecular system. In molecular dynamics, a simulation trajectory is generated for each atom according to classical mechanics by following Newton's law of motion:
$$m_i \frac{d^2 \mathbf{r}_i}{dt^2} = \mathbf{F}_i(\mathbf{r})$$
where the force $\mathbf{F}_i(\mathbf{r})$ on atom $i$ is due to interactions with all other atoms in the given system and $m_i$ is the mass of atom $i$. The force can be calculated from the gradient, or first derivative, of the interatomic potential energy $V(\mathbf{r})$ with respect to the nuclear coordinates of the molecular system, that is, $\mathbf{F}_i(\mathbf{r}) = -\nabla_{\mathbf{r}_i} V(\mathbf{r})$. The potential energy $V(\mathbf{r})$ can be calculated using either classical molecular mechanics methods or first-principle electronic structure theories, such as HF, DFT, MP2 and many other methods identifiable to a person skilled in the art. Therefore, given a starting conformation of a molecular system with a set of initial positions and velocities, the above equation describes the evolution of the simulated system in time in a deterministic fashion. A “simulation trajectory” used herein refers to the nuclear coordinates of the particles or atoms in the molecular system that are propagated in time, one time step at a time, over the course of the simulation. A “classical molecular dynamics trajectory” therefore refers to a “simulation trajectory” generated using the classical molecular dynamics method. Molecular dynamics can be used to estimate equilibrium and dynamic properties of a complex system that cannot be calculated analytically.
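The time propagation described above can be sketched with the velocity Verlet integrator, a common scheme for Newton's equations of motion. The disclosure does not name an integrator, so this choice, the function names, and the toy harmonic potential standing in for V(r) are all assumptions made for illustration; in practice the force routine would be supplied by a molecular mechanics or electronic structure calculation.

```python
import numpy as np


def harmonic_force(positions, k=1.0):
    """Toy force F = -grad V for V(r) = 0.5 * k * |r|^2 (illustrative only)."""
    return -k * positions


def velocity_verlet(positions, velocities, masses, force_fn, dt, n_steps):
    """Propagate positions and velocities with the velocity Verlet scheme.

    positions, velocities: arrays of shape (n_atoms, 3)
    masses: array of shape (n_atoms,)
    force_fn: callable returning forces of shape (n_atoms, 3)
    Returns the trajectory (n_steps + 1, n_atoms, 3) and final velocities.
    """
    positions = np.array(positions, dtype=float)
    velocities = np.array(velocities, dtype=float)
    masses = np.asarray(masses, dtype=float)[:, None]
    forces = force_fn(positions)
    trajectory = [positions.copy()]
    for _ in range(n_steps):
        velocities += 0.5 * dt * forces / masses  # first half kick
        positions += dt * velocities              # drift
        forces = force_fn(positions)              # forces at new positions
        velocities += 0.5 * dt * forces / masses  # second half kick
        trajectory.append(positions.copy())
    return np.array(trajectory), velocities


if __name__ == "__main__":
    traj, vel = velocity_verlet([[1.0, 0.0, 0.0]], [[0.0, 0.0, 0.0]],
                                [1.0], harmonic_force, dt=0.01, n_steps=628)
    print(traj[-1])
```

Each step requires one new force evaluation, which is why the cost of the potential energy method dominates the cost of a long trajectory.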
Among the conventional computational models, the robustness and generality of first-principles electronic structure methods come at enormous computational expense. The first-principle electronic structure theories or methods, also referred to as ab initio quantum mechanics methods, are based on the solution of the Schrödinger equation that describes the motions of the electrons and nuclei in a molecular system from first principles. These methods calculate molecular energy and associated properties of a given molecular system on the basis of fundamental physical principles, namely electronic and nuclear structures of atoms and molecules.
For example, a typical 10 ps classical simulation of 216 water molecules using KS-DFT, one of the first-principle electronic structure methods, requires 71,000 CPU hours and at least 280 wall-clock hours on the top-ranked [reference 9] NCCS Jaguar supercomputer. Even with the approach of exascale computing and advances in linear-scaling techniques [references 10-19], first-principles electronic structure methods remain impractical for applications involving large system sizes in a range of 10,000-100,000 atoms and long timescales of about nanoseconds to microseconds.
An alternative approach to first-principle electronic structure methods is known as empirical force field methods, such as molecular mechanics methods and related calculations. In molecular mechanics, a molecular system is considered as a mechanical system in which particles are connected by springs and physical forces determine the structure and dynamics of each particle. The total energy of a molecular system, also known as potential energy, is evaluated empirically with respect to the nuclei in the molecular system, with the electrons treated as implicit variables of this potential rather than treated explicitly as in the first-principle electronic structure theories. Therefore, molecular mechanics involves construction of a potential energy function from a large body of experimental data, such as crystal structure geometries, vibrational and microwave spectroscopy, heats of formation, and so on.
Computations based on first-principle electronic structure methods are carried out numerically using a set of one-particle basis functions. The term “one-particle basis function” or “one-particle basis set” is defined as a set of known functions used to represent an unknown function, such as the molecular orbitals. Such functions can be used to calculate the probability of finding any electron of an atom in any specific region around the atom's nucleus. The molecular orbital can be obtained by combining one-particle basis functions in a molecule. A molecular orbital is generally defined as a one-particle wavefunction, obtained as an eigenfunction of a model one-particle Hamiltonian operator.
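The "particles connected by springs" picture can be illustrated with a harmonic bond-stretch term, one of the simplest contributions found in molecular mechanics potentials. The sketch below is an assumption-laden toy: the function name, the bond list format, and the force constant and equilibrium length are hypothetical values, not parameters from any published force field.

```python
import numpy as np


def harmonic_bond_energy(positions, bonds):
    """Sum of 0.5 * k * (r - r0)^2 over all listed bonds.

    positions: array of shape (n_atoms, 3)
    bonds: iterable of (i, j, k, r0) tuples, where i and j index atoms,
           k is a force constant and r0 an equilibrium bond length
           (hypothetical, illustrative parameters).
    """
    positions = np.asarray(positions, dtype=float)
    energy = 0.0
    for i, j, k, r0 in bonds:
        r = np.linalg.norm(positions[i] - positions[j])
        energy += 0.5 * k * (r - r0) ** 2
    return energy


if __name__ == "__main__":
    coords = [[0.0, 0.0, 0.0], [1.1, 0.0, 0.0]]
    print(harmonic_bond_energy(coords, [(0, 1, 2.0, 1.0)]))
```

Because the energy depends only on nuclear positions and fitted parameters, no electronic degrees of freedom appear, which is the source of both the low cost and the limited transferability of such potentials.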
There are hundreds of one-particle basis sets, including atom-centered basis sets in which basis set functions are centered on atoms, such as some Gaussian-function orbitals and non-exponential Slater-type orbitals (STO), and basis sets that are not centered on atoms, such as plane wave basis sets, wavelets, splines, and exponential Slater-type orbital basis functions.
Among atom-centered basis sets, also referred to as “atomic orbital basis sets”, the smallest are called minimal basis sets, which consist of the minimum number of basis functions required to represent all the electrons on each atom. The most common minimal basis set is STO-nG (Slater-Type Orbital, n-Gaussian), where n is an integer and represents the number of Gaussian primitive functions comprising a single basis function. Examples of minimal basis sets include STO-3G, STO-3G*, STO-4G, STO-6G and many others identifiable to a person skilled in the art. A common extension of basis sets is the addition of polarization functions, denoted by an asterisk *. Two asterisks, **, indicate that polarization functions are also added to light atoms such as hydrogen and helium. In order to better describe the critical role of valence electrons in chemical bonding, valence orbitals are commonly represented by more than one basis function, each of which in turn is composed of a fixed linear combination of Gaussian primitive functions. Examples of such split-valence basis sets include 4-31G, 6-31G, 6-311G, and many other split-valence basis sets identifiable to a person skilled in the art. Other atomic orbital basis sets such as cc-pVDZ and cc-pVTZ are optimized to increase computational efficiency. These basis sets can be augmented with diffuse functions by adding the aug- prefix to the basis set keyword.
In some embodiments, products of one-particle basis functions can be constructed to generate auxiliary basis sets, using the strategy that is widely known in the field, for density fitting or resolution of the identity. The general term “basis sets” refers to both one-particle basis functions and auxiliary basis sets constructed from one-particle basis functions.
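To make the notion of a basis function built from a fixed linear combination of Gaussian primitives concrete, the following sketch evaluates the hydrogen 1s function of STO-3G (n = 3). The exponents and contraction coefficients are the commonly tabulated STO-3G values for hydrogen; the function names are assumptions introduced for this illustration, and the snippet is not tied to any particular quantum chemistry package.

```python
import math

# Commonly tabulated STO-3G parameters for the hydrogen 1s function.
STO3G_H_EXPONENTS = (3.42525091, 0.62391373, 0.16885540)
STO3G_H_COEFFS = (0.15432897, 0.53532814, 0.44463454)


def primitive_s_gaussian(alpha, r):
    """Normalized s-type Gaussian primitive evaluated at distance r."""
    return (2.0 * alpha / math.pi) ** 0.75 * math.exp(-alpha * r * r)


def contracted_s_function(r, exponents=STO3G_H_EXPONENTS, coeffs=STO3G_H_COEFFS):
    """Fixed linear combination of primitives, i.e. one contracted basis function."""
    return sum(c * primitive_s_gaussian(a, r) for a, c in zip(exponents, coeffs))


def contracted_norm(exponents=STO3G_H_EXPONENTS, coeffs=STO3G_H_COEFFS):
    """Self-overlap of the contracted function using the analytic
    overlap of two normalized, concentric s-type Gaussians."""
    total = 0.0
    for a, ca in zip(exponents, coeffs):
        for b, cb in zip(exponents, coeffs):
            total += ca * cb * (2.0 * math.sqrt(a * b) / (a + b)) ** 1.5
    return total


if __name__ == "__main__":
    print(contracted_s_function(0.0), contracted_norm())
```

The self-overlap comes out very close to one, reflecting that the tabulated coefficients are chosen so the contracted function is normalized.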
Methods for modeling the electronic structure of large, chemically reactive molecular systems can be generally categorized as either (i) reactive force fields or (ii) multi-level partitioning methods.
The term “reactive force fields (RFFs)” refers to the use of analytical molecular mechanics potential energy functions that allow for breaking and reforming of chemical bonds. The RFF approach, which includes the empirical valence bond (EVB) potentials of Warshel [references 20-22], Voth [references 23-26], and others [references 27, 28] and the ReaxFF force field of Goddard and coworkers [reference 29], has made possible many important applications, ranging from the transport of protons across membranes to the chemical etching of silicates. One feature of RFFs is that the system is seamlessly represented as a function of changes in configuration: no artificial spatial interfaces are introduced. However, RFFs are limited by the fact that relevant interactions, and in the case of EVB methods, possible bonding connectivities, must be anticipated and parameterized for each application. Consequently, RFFs generally include dozens or even hundreds of empirical parameters, and those parameters are difficult to rapidly and reliably generalize for new applications.
The term “partitioning” or “multi-level partitioning” used herein refers to subdividing a given molecular system into two or more separate, smaller subsystems at the level of one-particle basis functions, each of which can be treated differently, for example, using different algorithms, levels of accuracy or levels of computational cost. For example, one region in the molecular system can be treated at a “high” level of accuracy, which is more accurate but more computationally expensive, while a surrounding region in the molecular system can be treated at a “lower” level of accuracy, which is less computationally expensive but less accurate. Each region can be referred to as a “subsystem”. In some embodiments, the molecular system may be partitioned at the level of atoms associated with atom-centered basis sets. In some embodiments, the partitioning can be based on spatial domains to which a given particle is assigned depending on its position.
The term “subsystem” used herein is an atomic and/or subatomic representation of a portion of a given molecular system. In certain embodiments, the subsystem can be identified as a subset of one-particle basis sets for a given configuration of the molecular system. In some embodiments, the subsystem can be identified as a subset of atom-based basis sets associated with the given molecular system.
The term “high-level” or “low-level” used herein refers to different accuracy levels of the electronic structure description of the molecular system. The level of accuracy is considered equivalent to the level of computational cost for a computer implementation of the description. In general, a higher level of accuracy corresponds to a higher computational cost and a lower level of accuracy corresponds to a lower computational cost. For example, first-principles ab initio quantum mechanics methods are considered higher-level in comparison to empirical force field methods such as molecular mechanics. Computational cost is typically expressed in core hours, also known as CPU hours, which provide the estimate needed to calculate the cost of a simulation. In addition to the level of accuracy of the methods, the computational cost can be affected by a number of other factors, including the performance of the computers, the number of atoms in the molecular system under consideration, and the timescale or length of a simulation, whereas the level of computational cost is independent of those factors.
As one example of the multi-level partitioning methods, the combined QM/MM (Quantum Mechanics/Molecular Mechanics) [references 30-33] approach is commonly a method of choice for modeling reactions in bio-molecular systems consisting of a large number of atoms. QM methods are required for describing chemical reactions and other electronic processes, such as charge transfer. However, QM methods are restricted to systems of up to a few hundred atoms. In order to model larger systems (i.e. more atoms) over long time scales, force-field-based molecular mechanics (MM) methods are combined with the QM methods so that QM methods are used for the chemically active region, such as substrates and co-factors in an enzymatic reaction, and MM methods are used to treat the environment region, such as protein and solvent that surround the active region. Such combined methods enable the modeling of reactive systems at a reasonable computational effort while providing the necessary accuracy. In other approaches, as in the modeling of bulk materials and crack formation, the partitioning can be based on spatial domains in which a given atom is assigned to the high- or low-level partition depending on its position [references 34-36].
The term “active region” is a region in close proximity to a chemically reactive center where conformational reorganization, chemical reactions and other electronic processes, such as charge transfer, may take place. The active region can involve chemical bond breaking and reforming, angle bending, out-of-plane distortions, or other types of conformational change. For example, the core region of biomolecular systems in which quantum events such as catalytic reaction and electronic excitations occur is considered as an active region, while the peripheral regions such as protein environments, lipids, and waters that surround the active region are considered as an environment region.
As another example of the multi-level partitioning methods, projector based embedding introduces a projection operator and constrains the number of electrons in each subsystem. However, such a method is not self-consistent, and therefore introduces ambiguity regarding partitioning selection.
As yet another example, density matrix embedding theory provides a method for embedding high-level methods using mean-field density matrices. However, this method is not formulated as a minimization of a single energy function.
As yet another example, ONIOM (Our own N-layered Integrated molecular Orbital and molecular Mechanics) provides a simple multi-level energy model in which a molecular mechanics method can treat the system as a whole and an ab initio method can treat the active site of interest, known as ONIOM (QM:MM). Alternatively, two different levels of ab initio quantum mechanics methods can be combined to treat two different subsystems, known as ONIOM (QM:QM). However, ONIOM does not involve self-consistent optimization of a single energy functional, nor does it treat subsystems as properly embedded in their electronic environment. Limitations associated with the ONIOM method include the use of link atoms for treating covalent linkages across subsystems as well as the constrained number of electrons in each subsystem. In particular, atoms linked by double or triple bonds should be placed within the same ONIOM region.
In principle, the partitioning methods require less parameterization than the RFF methods to describe the chemically reactive region, since that part of the electronic structure theory problem is treated using wavefunction-based ab initio methods. Nonetheless, these methods have clear limitations. In particular, calculation of the electronic structure must correctly account for the interfaces between the high- and low-level subsystems. Using QM/MM calculations for enzymes as an example, the molecular groups in the active site where an enzymatic reaction takes place are often capped with hydrogen atoms, although the active site in the full system might be covalently bonded to the surrounding protein scaffolding. These boundary effects can be a significant source of error in the calculated results [reference 37].
To address the above limitations, attention has also been paid to the development of multi-level partitioning methods in which the high-level subsystem corresponds to a correlated wavefunction theory [references 3, 39] that accounts for explicit electron correlations and the low-level subsystem corresponds to a mean-field approximation. However, in such methods, it is necessary to specify a priori the number of electrons in each subsystem, and it is not possible to allow for the flow of electrons between subsystems or the fluctuation of the number of electrons between subsystems.
The “mean-field approximation” or “mean-field accuracy” or “mean-field electronic structure method” is in particular described as a method that approximates the solution for the total wavefunction of a system in terms of an anti-symmetrized product of molecular orbitals, where anti-symmetrized refers to the fact that the sign of the total wavefunction changes upon transposition of electron pairs. Both HF and KS-DFT methods correspond to examples of mean-field electronic structure methods.
The term “total energy” of a system defines the energy state of a molecular system. The total energy can be determined by various approximate solutions of the time-independent Schrödinger equation, usually with no relativistic terms included and under the Born-Oppenheimer approximation, which allows for the separation of electronic and nuclear motions.
In certain embodiments, the total energy of an atomic system can be computed by solution of the Schrödinger equation, which is defined, in the time-independent, non-relativistic, Born-Oppenheimer approximation, as follows:
\[ \hat{H}\,\Psi(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N) = E\,\Psi(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N) \qquad (2) \]
The Hamiltonian operator, Ĥ, consists of a sum of three terms: the kinetic energy, the interaction with the external potential (Vext) and the electron-electron interaction (Vee). Ψ is a wavefunction, a function describing the quantum state of a system, represented as a linear combination of basis functions. In atomic units, the Hamiltonian operator Ĥ is defined in the following equation:
\[ \hat{H} = -\frac{1}{2}\sum_{i}^{N}\nabla_i^{2} + \hat{V}_{ext} + \sum_{i<j}^{N}\frac{1}{|\mathbf{r}_i-\mathbf{r}_j|} \qquad (3) \]
The external potential of interest can be defined as the interaction of the electrons with the atomic nuclei:
\[ \hat{V}_{ext} = -\sum_{i}^{N}\sum_{\alpha}\frac{Z_{\alpha}}{|\mathbf{r}_i-\mathbf{R}_{\alpha}|} \qquad (4) \]
where ri is the coordinate of electron i and Zα is the charge on the nucleus located at Rα. The spin coordinate can be omitted in order to simplify the notation.
The average total energy for a state specified by a particular wavefunction Ψ is the expectation value of Ĥ, E[Ψ], defined in the following equation:
\[ E[\Psi] = \int \Psi^{*}\hat{H}\Psi \, d\mathbf{r} \equiv \langle \Psi | \hat{H} | \Psi \rangle \qquad (5) \]
The lowest energy eigenvalue, E0, is the ground state energy, and the probability density of finding an electron with any particular set of coordinates {ri} in the ground state is |Ψ0|^2.
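Hartree-Fock theory provides an approximate solution of the Schrödinger equation. The expression for the Hartree-Fock energy is defined as:
\[ E_{HF} = \sum_{i}^{N}\int \phi_i^{*}(\mathbf{r})\left(-\frac{1}{2}\nabla^{2} + V_{ext}(\mathbf{r})\right)\phi_i(\mathbf{r})\,d\mathbf{r} + \frac{1}{2}\sum_{i,j}^{N}\iint \frac{\phi_i^{*}(\mathbf{r}_1)\phi_i(\mathbf{r}_1)\phi_j^{*}(\mathbf{r}_2)\phi_j(\mathbf{r}_2)}{|\mathbf{r}_1-\mathbf{r}_2|}\,d\mathbf{r}_1 d\mathbf{r}_2 - \frac{1}{2}\sum_{i,j}^{N}\iint \frac{\phi_i^{*}(\mathbf{r}_1)\phi_j(\mathbf{r}_1)\phi_j^{*}(\mathbf{r}_2)\phi_i(\mathbf{r}_2)}{|\mathbf{r}_1-\mathbf{r}_2|}\,d\mathbf{r}_1 d\mathbf{r}_2 \qquad (6) \]
in which φi are the occupied one-electron orbitals and Vext(r) is the external potential acting on a single electron at r. The first term is a sum of the kinetic energy and the external potential, the second term is the classical Coulomb energy, and the third term is the exchange energy. The ground state orbitals are determined by applying the variational theorem to the above energy expression under the constraint that the orbitals are orthonormal, leading to the Hartree-Fock equation:
\[ \left[-\frac{1}{2}\nabla^{2} + V_{ext}(\mathbf{r}) + \int \frac{\rho(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,d\mathbf{r}'\right]\phi_i(\mathbf{r}) + \int V_X(\mathbf{r},\mathbf{r}')\,\phi_i(\mathbf{r}')\,d\mathbf{r}' = \epsilon_i\,\phi_i(\mathbf{r}) \qquad (7) \]
where ρ is the electron density, εi is the energy of orbital i, and VX is a non-local exchange potential. The Hartree-Fock equations describe non-interacting electrons under the influence of a mean field potential consisting of the classical Coulomb potential and a non-local exchange potential. Therefore, the Hartree-Fock method is also known as the self-consistent field (SCF) method, in which each particle is subjected to the mean field created by all other particles.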
Approximations such as electron correlation methods can be made to better represent the wavefunction and the total energy of the system. However, accurate solutions require a detailed description of the spatial variation of the wavefunction, i.e. a large basis set is required, which also leads to increased expense for practical calculations. Many correlation methods have been developed for molecular calculations. However, the cost of the most commonly used methods, such as MP2, MP3 (third order MP), and CCSD, scales steeply with system size: formally as the fifth power of the number of basis functions for MP2 and as the sixth power for MP3 and CCSD. Due to the high computational expense, the routine application of such methods to realistic models of systems of interest is not practical.
Given that direct solution of the Schrödinger equation is not currently feasible, especially for systems of interest in condensed matter science, approximation methods such as density functional theory have been developed to make a compromise between accuracy and computational cost. In this case, instead of solving the full Schrödinger equation for the wavefunction, two-particle probability density, which is the probability of finding an electron at position r1 and an electron at position r2, is sufficient for the purpose of calculating the ground state energy. Therefore, the total energy can be expressed in terms of the total electron density rather than the wavefunction.
Density functional theory (DFT) provides a theoretical approach to calculating the total energy of a molecular system from the one-particle density matrix. In the Kohn-Sham implementation of DFT (KS-DFT), whose equations resemble the HF equations, the electronic exchange and correlation effects are included via an approximate exchange-correlation functional, whose derivative with respect to the one-particle density provides an effective potential for the one-particle (i.e., HF-like) equations.
The local density approximation (LDA) is one approximation for the exchange-correlation functional. It assumes that the exchange-correlation energy at each point of an inhomogeneous system, such as a molecule, can be calculated using the homogeneous electron gas functional evaluated at the local density at that point.
Hybrid functionals are another class of approximation to the exchange-correlation functional in density functional theory. The hybrid functionals incorporate a portion of exact exchange from Hartree-Fock theory with a portion of exchange and correlation from other sources. The exact exchange energy functional is expressed in terms of Kohn-Sham orbitals rather than the density. One of the most commonly used exchange-correlation functionals in DFT is B3LYP (Becke, three-parameter, Lee-Yang-Parr). Another commonly known exchange-correlation functional is the PBE0 functional, which mixes PBE (Perdew-Burke-Ernzerhof) exchange and correlation energy with Hartree-Fock exchange energy. Many other hybrid functionals, or non-hybrid functionals such as gradient-corrected methods including PBE, can also be used in the current disclosure and are identifiable to a person skilled in the art.
The EMF methods build on density functional theory to better describe a chemically reactive system by partitioning the one-particle basis of the system into two or more subsystems, each subsystem treated with a different level of mean-field accuracy (alternatively, level of computational cost). Such a partition imposes a block-like structure on the reduced one-particle density matrix in the one-particle basis set. For example, in a case with two subsystems, the density matrix is a 2×2 block matrix that takes the form
where α and β represent one-particle basis functions in one subsystem A, and μ and ν represent one-particle basis functions in the other subsystem B. In this example, the subsystem A is referred to as the active region and the subsystem B is referred to as the environment region, with the connotation that the active region is treated at a higher mean-field level of accuracy and the environment region is treated at a lower mean-field level of accuracy. More than two subsystems can be utilized, and can include multiple active and/or environment regions. The number of block rows and block columns of the density matrix matches the number of subsystems in the molecular system. Therefore, the density matrix for a molecular system consisting of N subsystems is an N×N block matrix. It is noted that, compared to HF and KS-DFT, in the EMF methods the one-particle density matrix is partitioned at the level of one-particle basis functions with respect to the subsystems according to equation (8). For this reason, the active region and the environment region can each further comprise two or more non-contiguous sub-regions that are not adjacent or directly connected to one another. In other words, one or more active regions can be embedded in an environment region, each of which can be treated with a different level of computational cost. The density matrix shown in equation (8) describes an electronic mean-field state corresponding to a total energy of the molecular system defined below in equation (9).
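The block structure described above can be illustrated with a short NumPy sketch. The helper name `partition_density` and the index lists are hypothetical. The sketch assumes only that every one-particle basis function is assigned to exactly one subsystem, and it shows that the assignment need not be contiguous in the basis ordering.

```python
import numpy as np

def partition_density(D, idx_a, idx_b):
    """Return the four blocks of D for subsystems A and B (hypothetical helper)."""
    idx_a = np.asarray(idx_a)
    idx_b = np.asarray(idx_b)
    return {
        "AA": D[np.ix_(idx_a, idx_a)],  # active-active block
        "AB": D[np.ix_(idx_a, idx_b)],  # active-environment coupling
        "BA": D[np.ix_(idx_b, idx_a)],  # environment-active coupling
        "BB": D[np.ix_(idx_b, idx_b)],  # environment-environment block
    }

# Illustrative 4-function basis: functions 0 and 3 belong to A, 1 and 2 to B.
D = np.arange(16, dtype=float).reshape(4, 4)
D = 0.5 * (D + D.T)  # symmetrize the placeholder matrix
blocks = partition_density(D, idx_a=[0, 3], idx_b=[1, 2])
```

Because the blocks are selected by basis-function index rather than by a spatial boundary, the off-diagonal blocks remain part of the single density matrix that is optimized.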
The choice of the active region is usually made by chemical intuition. A minimum-size active region can be defined on chemical grounds by considering the chemical problem at hand. For example, an atom or a group of atoms that is involved in a reaction of interest can be selected, then the active region can be defined to include any atom that is within 5 Å from the selected atom or from the center of mass of the selected group of atoms. Each of the atoms in the active region is described by a set of one-particle basis functions. The EMF results can then be checked with respect to adjusting the active region, either enlarging the active region or reducing the active region.
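The distance-based selection described above can be sketched as follows. The function name `select_active_atoms` and the coordinates are hypothetical, and the 5 Å cutoff is taken from the example in the text. The sketch measures distances from a single selected atom; a center of mass could be substituted for the reference point.

```python
import numpy as np

def select_active_atoms(coords, center_index, cutoff=5.0):
    """Indices of atoms within `cutoff` (in angstrom) of the selected atom."""
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords - coords[center_index], axis=1)
    return np.flatnonzero(dist <= cutoff)

# Placeholder coordinates in angstrom, not taken from any real structure
coords = [[0.0, 0.0, 0.0],
          [1.2, 0.0, 0.0],
          [4.9, 0.0, 0.0],
          [0.0, 6.0, 0.0]]
active = select_active_atoms(coords, center_index=0)
```

Rerunning the selection with a larger or smaller cutoff gives the enlarged or reduced active regions against which the EMF results can be checked.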
The total EMF energy functional for the above-described system, including two subsystems A and B, is a function of the above defined density matrix D and is defined in the following equation:
where h is the core Hamiltonian that represents the kinetic energy of the electrons and the external potential. The first term is defined as the trace of the product of the density matrix D and the core Hamiltonian h. Exc1 and Exc2 are the exchange-correlation functionals for two mean-field levels of approximation representing the low-level of approximation and the high-level of approximation, respectively. G1 and G2 include any contributions from two-electron integrals, for the low-level and the high-level approximation, respectively, such as those from the Coulomb electrostatic repulsion. It is noted that in certain embodiments any of these terms can vanish.
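One way to read the roles of the low-level and high-level terms is as a high-level correction applied to the active block of the density matrix on top of a low-level description of the whole system. The sketch below assumes that subtractive arrangement, which is an illustrative assumption and not a statement of equation (9). The callables `low` and `high` are toy placeholders standing in for the combined two-electron and exchange-correlation contributions at each level.

```python
import numpy as np

def emf_energy(D, h, idx_a, low, high):
    """Assumed form: tr(D h) + low(D) + high(D_AA) - low(D_AA)."""
    D_aa = D[np.ix_(idx_a, idx_a)]
    return np.trace(D @ h) + low(D) + high(D_aa) - low(D_aa)

# Toy placeholder functionals; not the functionals of the disclosure
def low(D):
    return 0.5 * np.sum(np.diag(D) ** 2)

def high(D):
    return 0.6 * np.sum(np.diag(D) ** 2)

h = np.array([[-1.0, -0.2, 0.0],
              [-0.2, -0.5, -0.1],
              [0.0, -0.1, -0.3]])
D = np.diag([1.0, 0.6, 0.4])
idx_a = [0]  # basis function 0 forms the active subsystem

e_emf = emf_energy(D, h, idx_a, low, high)
e_same_level = emf_energy(D, h, idx_a, low, low)
```

With identical functionals at both levels, this form reduces to the low-level energy of the full system, since the two active-block terms cancel.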
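The Fock matrix can be defined as the partial derivative of the total energy defined in equation (9) with respect to the elements of the density matrix:
\[ f_{ij} = \frac{\partial E[D]}{\partial D_{ij}} \qquad (10) \]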
where i and j index the one-particle basis functions. Components in the Fock matrix can be obtained using standard methods familiar to a person skilled in the art. The Fock matrix fij, which depends on the density matrix, can be optimized iteratively using the self-consistent-field (SCF) procedure. The SCF procedure is repeated until self-consistency is achieved in the density matrix to a desired accuracy. Self-consistency is achieved, or the procedure is converged, when the new density matrix calculated from the Fock matrix fij is the same as, or within a specified convergence threshold of, the density matrix calculated in the previous step. The resultant solution, including the electron density and the total energy, can then be used to calculate chemical quantities of the system alone or in combination with external fields, such as the reaction energy, structurally stable geometry, dipole moment, magnetic moment, and other quantities of interest.
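The SCF iteration described above can be sketched on a toy model. The sketch assumes an orthonormal basis, doubly occupied orbitals, and simple linear mixing of successive density matrices. The Fock builder `toy_fock` is a hypothetical two-site model chosen only so that the loop is runnable; it is not the EMF Fock matrix.

```python
import numpy as np

def scf(h, n_occ, fock_from_density, tol=1e-10, max_iter=200, mixing=0.5):
    """Iterate D -> F(D) -> D until the density matrix stops changing."""
    def density_from_fock(F):
        _, C = np.linalg.eigh(F)        # orbitals sorted by energy
        C_occ = C[:, :n_occ]            # lowest n_occ orbitals
        return 2.0 * C_occ @ C_occ.T    # two electrons per orbital

    D = density_from_fock(h)            # core-Hamiltonian starting guess
    for iteration in range(1, max_iter + 1):
        D_new = density_from_fock(fock_from_density(D))
        if np.max(np.abs(D_new - D)) < tol:   # convergence threshold
            return D_new, iteration, True
        D = (1.0 - mixing) * D + mixing * D_new
    return D, max_iter, False

# Hypothetical two-site model with an on-site mean-field repulsion U
h = np.array([[0.0, -1.0],
              [-1.0, 0.5]])
U = 1.0

def toy_fock(D):
    return h + U * np.diag(np.diag(D) / 2.0)

D, n_iter, converged = scf(h, n_occ=1, fock_from_density=toy_fock)
```

The returned density matrix comes from the last diagonalization, so it carries the full electron count and satisfies the idempotency condition for a closed-shell state in an orthonormal basis.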
Therefore, in the EMF methods, the total energy of the molecular system is estimated by combining mean-field quantum mechanical calculations that involve self-consistent optimization of a single density matrix, as shown in equation (8), describing the entire molecular system combining two or more subsystems. As a result, the molecular system is self-consistently optimized as a whole using the SCF procedure with each subsystem treated at a different mean-field level of accuracy.
The EMF methods therefore provide a fully self-consistent solution for the electrons with respect to the molecular orbitals, which can be simple, efficient, and robust to achieve given the inherent structure of the method. In particular, the EMF methods provide a multi-level description of the electronic structure of complex molecular systems at a reduced computational cost and with improved accuracy, making feasible the simulation of long-timescale dynamics in systems consisting of a large number of atoms. It is noted that the EMF method itself is an example of a mean-field method.
Since both the active region and the environment region are treated at mean-field level of accuracy, the interfaces between the high- and low-level subsystems are correctly accounted for, and the electronic energy surfaces are guaranteed to be smooth functions of the nuclear coordinates. The methods introduce no constraints on the number of electrons in any partition and unambiguously return the same number of electrons in each partition. The methods can also capture quantum entanglement between partitions at the level of mean-field approximation by making no assumption that any partition is in a pure state. As a result, the boundary effect that commonly arises in other methods such as QM/MM or ONIOM is eliminated. In addition, in comparison to other conventional methods, such as RFFs, the EMF methods require no external or system-dependent parameters beyond the determination of the partition.
Additionally, the linear response theory can be implemented more efficiently than in other formulations. Note that the linear response theory refers to the linear response of the electron density and total energy with respect to the external fields, both in time-dependent and time-independent way, as well as any higher-order responses of the energy or density.
The gradient of the EMF energy with respect to nuclear coordinates, which is required for the calculation of forces, can be efficiently and robustly implemented due to the full minimization of the energy with respect to the linear variational parameters. In particular, a solution of the coupled-perturbed equations is not necessary for the calculation of EMF energy gradients. The calculated forces can be combined with molecular dynamics methods to study the time evolution of the molecular system. For example, the trajectories of atoms in the molecular system can be determined by numerically solving Newton's equations of motion for each individual atom, where the forces applied on each atom are derived from the EMF potential.
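The numerical solution of Newton's equations can be sketched with the velocity Verlet integrator, which is one common choice and is an assumption here rather than a method named in the text. The callable `harmonic_force` is a hypothetical stand-in for the negative gradient of the EMF energy, used only so that the example runs.

```python
import numpy as np

def velocity_verlet(x, v, masses, force, dt, n_steps):
    """Propagate positions x and velocities v (arrays of shape (n_atoms, 3))."""
    x = np.array(x, dtype=float)
    v = np.array(v, dtype=float)
    m = np.asarray(masses, dtype=float)[:, None]
    f = force(x)
    for _ in range(n_steps):
        v += 0.5 * dt * f / m   # first half-step velocity update
        x += dt * v             # full-step position update
        f = force(x)            # forces at the new positions
        v += 0.5 * dt * f / m   # second half-step velocity update
    return x, v

k = 1.0  # force constant of the placeholder potential

def harmonic_force(x):
    # Placeholder for -dE/dR; a real run would call the energy gradient here
    return -k * x

x0 = np.array([[1.0, 0.0, 0.0]])
v0 = np.zeros((1, 3))
x1, v1 = velocity_verlet(x0, v0, [1.0], harmonic_force, dt=0.01, n_steps=1000)
```

The force is evaluated once per step, which matters when each evaluation requires a converged electronic structure calculation.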
In some embodiments, different subsystems can be treated at the same mean-field level of approximation. For example, the active region and the environment region can both be treated at a higher mean-field level of accuracy (i.e. “high-level”). Alternatively, the active region and the environment region can both be treated at a lower mean-field level of accuracy (i.e. “low-level”). In such cases, the EMF methods reproduce the same result as the mean-field calculation performed on the full system. In addition, further regions outside the environment regions can be treated at an even lower level of accuracy with non-EMF methods.
In some embodiments, the EMF methods can describe the active region using Kohn-Sham Density Functional Theory (KS-DFT) and the environment region using density functional tight binding (DFTB). DFTB is an approximate density functional theory method that introduces parameters into conventional DFT to reduce computational cost while maintaining reasonable accuracy.
In certain embodiments herein described, the EMF methods can describe the active region using KS-DFT with better, and more costly, descriptions (i.e. “high-level”) of the one-particle basis set, the exchange-correlation functional, and the Coulomb interactions, and the environment region using KS-DFT with less costly descriptions (i.e. “low-level”) of the one-particle basis set, the exchange-correlation functional, and the Coulomb interactions. Such implementations are described below in detail in the section titled The EMF Methods Using Mixed Basis Sets.
The EMF methods can be used for multi-level approaches in which the active region is described using KS-DFT and the environment region is described using orbital-free DFT. Orbital-free DFT used herein is a non-KS implementation of DFT that involves the need for a kinetic energy functional of the density.
In certain embodiments, to improve the accuracy of the EMF method, elements of the Fock matrix can be shifted according to various algorithms. Since different levels of approximation for the active and environment regions of a general molecular system can be used, it is noted that some calculations can potentially exhibit errors in which electrons incorrectly flow between the active and environment regions. It is further noted that various algorithms can be developed in which elements of the Fock matrix are shifted by parameters that mitigate this problem. These parameters can be derived, for example, from calculations on isolated atoms, thus ensuring that the parameters are transferrable among different systems. In some embodiments, relative values for diagonal or off-diagonal matrix elements of the Fock matrix can be parameterized based on the properties of a library of test molecules or upon isolated atoms.
In certain embodiments in which the environment region is much larger than the active region, the total cost of evaluating the Fock matrix will be dominated by the cost of the calculation on the environment region. The higher-quality description for the active region is thus obtained at negligible cost.
Note that even though the EMF methods are described above using a system with two subsystems as an example, the EMF methods can be generalized and applied to systems comprised of a plurality of subsystems, with each subsystem treated with different mean-field level of accuracy.
In certain embodiments, the EMF methods can be implemented in periodic boundary conditions, which approximate a large, effectively infinite system by an infinite number of copies of a small unit cell, in order to model systems with a large number of atoms. Among all the unit cells, one cell is the original simulation box and the other cells are copies, also called images, of the original simulation box. All atoms in the original simulation box are replicated throughout space with their images to form an infinite lattice of cells tiled together. That is, if atoms in the original simulation box have certain positions, the positions of the image atoms can be calculated according to the periodic boundary condition. As a result, each atom in the original simulation box interacts not only with other particles in the same simulation box, but also with their images in the adjacent boxes. Therefore, by using periodic boundary conditions, calculation of a large infinite system can be approximately reproduced by a less costly calculation of the small unit cell while taking into account the environmental contribution from the images. The unit cell commonly has a cubic shape, but non-cubic shapes can also be used, including truncated octahedral or rhombic dodecahedral cells.
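The relation between an atom and the nearest image of another atom can be sketched with the minimum-image convention. The sketch assumes an orthorhombic (for example cubic) cell aligned with the coordinate axes; the function name `minimum_image` and the coordinates are hypothetical.

```python
import numpy as np

def minimum_image(r_i, r_j, box):
    """Displacement from atom i to the nearest periodic image of atom j."""
    d = np.asarray(r_j, dtype=float) - np.asarray(r_i, dtype=float)
    box = np.asarray(box, dtype=float)
    # Shift each component by a whole number of box lengths
    return d - box * np.round(d / box)

# Two atoms near opposite faces of a cubic box of edge 10 (arbitrary units)
d = minimum_image([0.5, 0.5, 0.5], [9.5, 0.5, 0.5], box=[10.0, 10.0, 10.0])
```

Here the nearest image of the second atom lies across the cell boundary, one unit away, rather than nine units away inside the original box. Non-cubic cells such as the truncated octahedron need a more general treatment than this sketch provides.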
A major factor in determining the cost of calculating the Fock matrix is the number of functions in the one-particle basis set for the active region and the environment region. Although a larger basis set leads to a more accurate description of the electronic structure, the description of the environment region might be adequately handled with a smaller one-particle basis set, thus reducing the total cost of evaluating the Fock matrix. Although the use of mixed basis sets is an available feature in most quantum chemistry packages, such feature is simply a special case of the more general EMF method.
In the embodiment described below, mixed basis sets will be used to represent the active region and the environment region, in which the environment region is described using a smaller one-particle basis set. In particular, a minimal density fitting basis set is used to describe the Coulomb electrostatic repulsion term represented in equation (9).
Full evaluation of the Coulomb electrostatic repulsion energy in equation (9) is given by
where p, q, r, and s index one-particle basis functions,
Due to the 4-index integrals (pq|rs), calculation of the above would scale theoretically as O(n^4), where O denotes big O notation for computational complexity and n is the number of one-particle basis functions. In density fitting, an auxiliary basis set of basis functions χ, indexed by A and B, is used to approximate these integrals:
where the Coulomb operator, defining the electron-electron repulsion energy is given by
The approximate Coulomb energy is thus summed up in an expression analogous to resolution of the identity:
In practice, calculation of the Coulomb energy with density fitting yields a computational cost that scales as O(mn^2), where m is the number of density fitting basis functions. By treating the environment region with a small density fitting basis, the overall cost of the Coulomb electrostatic repulsion interaction can thus be dramatically reduced.
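The difference in scaling can be seen in a small NumPy sketch. All integrals below are synthetic random numbers, and the four-index tensor is built from the three-index tensor so that the fit is exact by construction. The array names `B` (three-index integrals), `V` (auxiliary metric) and `eri` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 10  # n one-particle functions, m auxiliary functions

# Synthetic three-index integrals (A|pq), symmetric in p and q
B = rng.normal(size=(m, n, n))
B = 0.5 * (B + B.transpose(0, 2, 1))

# Synthetic symmetric positive-definite metric between auxiliary functions
M = rng.normal(size=(m, m))
V = M @ M.T + m * np.eye(m)

# Four-index integrals that the fit reproduces exactly by construction
eri = np.einsum("Apq,AB,Brs->pqrs", B, np.linalg.inv(V), B)

D = rng.normal(size=(n, n))
D = 0.5 * (D + D.T)  # symmetric placeholder density matrix

# Direct contraction over four indices: O(n^4)
e_full = 0.5 * np.einsum("pq,pqrs,rs->", D, eri, D)

# Density-fitted route: contract the density once, then work in the small basis
gamma = np.einsum("Apq,pq->A", B, D)  # O(m n^2)
coeff = np.linalg.solve(V, gamma)     # depends only on m
e_fit = 0.5 * gamma @ coeff
```

The density matrix is contracted with the three-index integrals once, after which all remaining work involves only the auxiliary index. With real integrals the two energies would agree only to within the fitting error.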
Next, the approximation of the exchange-correlation energy Exc is described. The exchange-correlation energy Exc represented in equation (9) can be described using a less-costly function (i.e. “low-level”), shown in detail as follows. For local and semi-local exchange-correlation functionals of the electron density, the exchange-correlation energy is typically evaluated via quadrature using
\[ E_{XC} = \int d\mathbf{r}\; \epsilon[\rho(\mathbf{r})] \qquad (18) \]
where the electron density ρ(r) is calculated on a numerical grid using
and ε[ρ] is an approximate exchange-correlation functional that depends on the local density and its spatial derivatives at a given grid-point. While evaluation of the Fock matrix formally scales as O(Nn^2), where N is the total number of atoms, pre-screening of the grids can be implemented easily in order to decrease the asymptotic expense. The evaluation of the non-hybrid exchange-correlation energy has a smaller computational cost than the exact exchange or Coulomb terms.
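Quadrature of a local functional on a grid can be sketched for a case with a known density. The sketch assumes the usual expansion of the density in products of basis functions weighted by the density matrix, uses the spin-unpolarized LDA exchange expression as the example functional, and takes a single hydrogen-like 1s function on a uniform radial grid (in atomic units). These are illustrative choices, not the grids or functionals of the disclosure.

```python
import numpy as np

C_X = 0.75 * (3.0 / np.pi) ** (1.0 / 3.0)  # LDA exchange prefactor

def lda_exchange_density(rho):
    """Exchange energy per unit volume in the local density approximation."""
    return -C_X * rho ** (4.0 / 3.0)

# Uniform radial grid; a production code would use atom-centered grids
r = np.linspace(0.0, 30.0, 30001)

# One basis function evaluated on the grid, shape (n_basis, n_grid)
phi = (np.exp(-r) / np.sqrt(np.pi))[None, :]
D = np.array([[1.0]])  # one electron in that function

# Density on the grid from the density matrix and basis-function values
rho = np.einsum("pq,pg,qg->g", D, phi, phi)

def radial_integral(f):
    """Trapezoid rule for the spherical integral of f(r)."""
    integrand = 4.0 * np.pi * r ** 2 * f
    return np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(r))

n_electrons = radial_integral(rho)
e_x = radial_integral(lda_exchange_density(rho))
```

Integrating the density itself on the same grid gives a quick check that the grid is fine enough before the functional is evaluated.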
As previously described, partial inclusion of the Hartree-Fock exact exchange energy in the DFT energy has been observed to allow for substantially more accurate density functionals, commonly known as hybrid functionals. This exchange energy is written as:
Techniques from the Coulomb energy calculation can be applied to the exchange integrals. However, this expression includes a more computationally difficult (i.e. “high-level”) ordering of the summation indices, as the summand is not separable in terms of the density matrix. This fact makes the use of density fitting and other approximations difficult, making the usual calculation of EK scale theoretically as O(n^4). Even without using techniques like density fitting for the Coulomb energy, exact exchange is usually the most expensive component of a hybrid functional DFT calculation.
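The different index ordering can be made concrete with two `einsum` contractions. The sketch assumes a closed-shell convention in which D is the total density matrix, so the Coulomb energy carries a factor of 1/2 and the exchange energy a factor of -1/4; these prefactors are a convention assumed here, not taken from the text. The one-function test system is a placeholder.

```python
import numpy as np

def coulomb_and_exchange(D, eri):
    """Coulomb and exact-exchange energies from integrals (pq|rs)."""
    # Coulomb: each density matrix pairs with one side of the integral,
    # so the two contractions can be carried out independently.
    e_j = 0.5 * np.einsum("pq,pqrs,rs->", D, eri, D)
    # Exchange: each density matrix connects indices on both sides of the
    # integral, so the summand does not factor in the same way.
    e_k = -0.25 * np.einsum("pr,pqrs,qs->", D, eri, D)
    return e_j, e_k

# Placeholder system: two electrons in one basis function with repulsion g
g = 0.7
eri = np.full((1, 1, 1, 1), g)
D = np.array([[2.0]])
e_j, e_k = coulomb_and_exchange(D, eri)
```

For two electrons in the same orbital, the exchange term removes the self-interaction part of the Coulomb term, leaving a single pair repulsion.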
Using the EMF method, it is straightforward to describe the active region using a hybrid exchange-correlation functional and the environment using a local or semi-local exchange-correlation functional. Similarly, active regions can be treated using a more costly semi-local exchange-correlation functional (i.e. “high-level”) and the environment regions using a less costly semi-local exchange-correlation functional (i.e. “low-level”).
Exemplary Types of System Features from the EMF Methods
Many types of system features can be determined using the EMF methods, including total system energy, optimized geometry, and frequencies, each of which provides its own information about a molecular system.
In some embodiments, total system energy can be determined from a molecular system with a given nuclear configuration or a given set of one-particle basis functions. Total system energy can predict properties such as atomic charges, ionization energy, dipole moment, spectroscopy, electronic energy levels, and many other chemical properties. Comparisons of the total system energies of a molecular system at different states represented by different nuclear configurations or different one-particle basis functions can provide information such as bond strengths or energy changes and barriers associated with conformational changes or chemical reactions.
An embodiment of the EMF procedure shown in
In some embodiments, the EMF methods can be used to calculate the reaction energy of a chemical reaction. Using the deprotonation of a carboxylic acid as an example, the protonated form of the carboxylic acid represents one state, corresponding to the reactant side of the deprotonation reaction. The deprotonated form of the carboxylic acid represents another state, corresponding to the product side of the reaction, in which the chemical bond between the hydrogen and oxygen in —COOH is broken. In such case, a first and second total energy of the molecular system at two different states can be calculated, concurrently or sequentially, using the same EMF procedure described in
Many intermediate states, including transition states, can exist in a chemical reaction between the reactant state and the product state. Accordingly, the total energies of one or more intermediate states can be calculated using the EMF methods and plotted as a function of a reaction coordinate, which, in the above-described deprotonation case, is defined as the bond length between the oxygen and hydrogen atoms of the carboxylic group.
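The scan along a reaction coordinate described above can be sketched as follows. This is a minimal illustration only: `toy_energy` is a hypothetical stand-in (a Morse curve with arbitrary parameters) for an EMF single-point energy, and the function names and grid limits are assumptions made for this example.

```python
import math

def toy_energy(r_oh):
    """Stand-in for an EMF single-point energy (hartree) at O-H distance r_oh (angstrom).

    A Morse curve with arbitrary parameters; it is not EMF output.
    """
    depth, width, r_eq = 0.17, 2.2, 0.97  # arbitrary illustrative parameters
    return depth * (1.0 - math.exp(-width * (r_oh - r_eq))) ** 2

def scan(energy, start, stop, n_points):
    """Evaluate the energy on an evenly spaced grid of the reaction coordinate."""
    step = (stop - start) / (n_points - 1)
    grid = [start + i * step for i in range(n_points)]
    return [(r, energy(r)) for r in grid]

def relative_energies(profile):
    """Shift a profile so that its first point (the reactant state) is zero."""
    e_ref = profile[0][1]
    return [(r, e - e_ref) for r, e in profile]

profile = relative_energies(scan(toy_energy, 0.97, 2.5, 12))
reaction_energy = profile[-1][1]        # last point minus first point
highest = max(e for _, e in profile)    # highest energy found along the scan
for r, e in profile:
    print(f"{r:5.2f}  {e:9.5f}")
```

In an actual study, each call to `toy_energy` would be replaced by a total energy evaluated at the corresponding nuclear configuration, and the highest point of the profile would be refined with a dedicated transition state search.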
The term “reaction coordinate” represents the changes in nuclear coordinates as the molecular system progresses from an initial state to a final state, such as from reactants to products in a chemical reaction or from an initial configuration to a final configuration. In some embodiments, the reaction coordinate corresponds to the stretching or twisting of a particular bond. In some embodiments, the reaction coordinate corresponds to a bond length between one atom and its adjacent atom, an angle of rotation of one bond with respect to another bond, or the distance between the centers of mass of the molecular system at the initial state and the final state. In certain embodiments, the reaction coordinate represents an ensemble of the nuclear coordinates of the molecular system at each individual state. The reaction coordinate can alternatively be defined as an arbitrary parameter identifiable to a person skilled in the art.
In some embodiments, the EMF method-based electronic structure calculations can be used to perform geometry optimization to predict equilibrium geometries of molecules. The total energies calculated from the EMF methods as a function of the nuclear coordinates of the molecular system can be used to define a potential energy surface. In other words, the potential energy surface establishes the relationship between the total energy of a molecular system and its geometry. A potential energy surface can be obtained by performing iterative single point calculations using equation (9) of the description and adjusting the geometry at each step. The minima of the potential energy surface define equilibrium geometries of the molecular system, corresponding to local minima of the estimated total energy. In addition, the maxima of the potential energy surface along a reaction path can be used to determine the energy and geometry of the highest energy states, such as transition states. The gradient vector, i.e. the first-order derivative of the potential energy surface with respect to the nuclear coordinates, can be used to calculate forces. The second-order derivative of the potential energy surface is typically called the Hessian matrix and can be used in geometry optimization to locate potential energy minima, transition states, and other stationary points such as saddle points. The term “stationary points” refers to configurations of the molecular system in which the first-order derivative of the total energy is zero.
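The roles of the gradient and the second derivative can be illustrated on a one-dimensional toy surface. The sketch below is an assumption-laden example, not the disclosed procedure: `toy_pes` is a hypothetical double-well function standing in for the EMF total energy, derivatives are taken by finite differences, and plain steepest descent replaces whatever optimizer a production code would use.

```python
def toy_pes(x):
    """Stand-in one-dimensional potential energy surface (arbitrary units).

    A double well with minima at x = -1 and x = +1 and a barrier top at x = 0.
    """
    return (x * x - 1.0) ** 2

def gradient(f, x, h=1e-5):
    """Central finite-difference first derivative."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def curvature(f, x, h=1e-4):
    """Central finite-difference second derivative (the one-dimensional Hessian)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

def minimize(f, x0, step=0.05, tol=1e-8, max_iter=10000):
    """Steepest descent: move against the gradient until it is below tol."""
    x = x0
    for _ in range(max_iter):
        g = gradient(f, x)
        if abs(g) < tol:
            break
        x -= step * g
    return x

x_min = minimize(toy_pes, 0.3)
print(x_min, curvature(toy_pes, x_min))  # positive curvature marks a minimum
print(curvature(toy_pes, 0.0))           # negative curvature marks a barrier top
```

Both x = 0 and the optimized point are stationary points because the gradient vanishes there; the sign of the second derivative distinguishes the minimum from the barrier top. In many dimensions the same test is applied to the eigenvalues of the Hessian matrix.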
In some embodiments, frequencies can be obtained from computing various modes of nuclear vibrational motion within a molecule based on the second order derivative of the potential energy. Such calculations can be used to determine zero-point and thermal corrections to the internal energy, enthalpy contribution to Gibbs free energies, and vibrational spectra of a molecular system.
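As a minimal sketch of how a second derivative becomes a vibrational frequency, the example below treats a single bond as a harmonic oscillator. The function names are hypothetical, the force constant in the usage lines is an arbitrary illustrative value rather than a computed result, and a real calculation would diagonalize the full mass-weighted Hessian.

```python
import math

PLANCK = 6.62607015e-34       # J s
LIGHT_SPEED = 2.99792458e10   # cm/s
AMU = 1.66053906660e-27       # kg per atomic mass unit

def reduced_mass(m1_amu, m2_amu):
    """Reduced mass in kg of two atoms whose masses are given in atomic mass units."""
    return m1_amu * m2_amu / (m1_amu + m2_amu) * AMU

def harmonic_wavenumber(force_constant, mu):
    """Wavenumber in cm^-1 from a force constant (N/m) and a reduced mass (kg)."""
    return math.sqrt(force_constant / mu) / (2.0 * math.pi * LIGHT_SPEED)

def zero_point_energy(wavenumber):
    """Zero-point energy in J of one harmonic mode, h * c * wavenumber / 2."""
    return 0.5 * PLANCK * LIGHT_SPEED * wavenumber

# Illustrative input only: an O-H pair with an assumed force constant of 500 N/m.
mu_oh = reduced_mass(15.999, 1.008)
nu = harmonic_wavenumber(500.0, mu_oh)
print(nu, zero_point_energy(nu))
```

The force constant plays the role of the second derivative of the potential energy along the bond; summing the zero-point energies of all modes gives the zero-point correction mentioned above.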
In some embodiments, differentiation of the potential energy and electron density with respect to external fields can be used to calculate response properties. Energy derivatives with respect to applied electric fields correspond to electric moments, infrared intensities, polarizabilities, and hyperpolarizabilities, while energy derivatives with respect to external and nuclear magnetic fields correspond to magnetic properties such as magnetic moments, magnetic susceptibilities, and nuclear magnetic resonance chemical shifts.
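One simple way to obtain such derivatives is the finite-field approach: the energy is recomputed at a few small field strengths and differentiated numerically. The sketch below assumes a hypothetical model energy `toy_energy` with made-up parameters in place of an EMF calculation, and uses the sign convention in which the dipole is minus the first derivative of the energy with respect to the field.

```python
def toy_energy(field, e0=-1.0, dipole=0.3, polarizability=5.0):
    """Stand-in E(F) = E0 - mu*F - alpha*F^2/2 in atomic units (illustrative values)."""
    return e0 - dipole * field - 0.5 * polarizability * field * field

def finite_field_dipole(energy, h=1e-3):
    """Dipole as minus the first derivative of E at zero field (central differences)."""
    return -(energy(h) - energy(-h)) / (2.0 * h)

def finite_field_polarizability(energy, h=1e-3):
    """Polarizability as minus the second derivative of E at zero field."""
    return -(energy(h) - 2.0 * energy(0.0) + energy(-h)) / (h * h)

print(finite_field_dipole(toy_energy))
print(finite_field_polarizability(toy_energy))
```

Because the model energy is exactly quadratic, the finite differences recover the input dipole and polarizability up to rounding; for a real energy function the field step has to balance truncation error against numerical noise.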
In some embodiments, the EMF methods herein described are implemented in combination with many other simulation tools identifiable by a skilled person upon reading of the present disclosure. The simulation tools that can be used in combination with the EMF methods include any classical or quantum molecular simulation tool that utilizes Born-Oppenheimer potential energy surfaces, including molecular dynamics, Monte Carlo simulation, energy minimization, and docking. Such implementations involve evaluating the total energy of the system under consideration and its derivatives with respect to parameters in the Hamiltonian.
In some embodiments, the EMF methods are combined with other simulation methods to determine kinetic and thermodynamic properties of a molecular system by sampling configuration space according to the Boltzmann distribution. Such samples can be obtained by molecular dynamics, which simulates the thermal motion of the molecular system by integrating the equations of motion. Alternatively, such samples can be obtained by Monte Carlo methods, which step randomly through conformation space and use the Metropolis criterion to ensure the convergence of the conformational sampling. Such kinetic and thermodynamic properties include enthalpy and entropy of the molecular system, Gibbs free energy, heat of reaction, activation energy, rate constant, and many other properties of the molecular system identifiable by a person skilled in the art.
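The Metropolis criterion mentioned above can be sketched in a few lines. In this illustration `toy_energy` is a hypothetical one-dimensional harmonic potential standing in for the EMF total energy, and the step size, temperature, seed, and function names are assumptions chosen for the example.

```python
import math
import random

BOLTZMANN_KCAL = 0.0019872041  # kcal/(mol K)

def metropolis_accept(delta_e, temperature, rng):
    """Accept downhill moves always, uphill moves with probability exp(-dE/kT)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / (BOLTZMANN_KCAL * temperature))

def sample(energy, x0, temperature, n_steps, max_move=0.5, seed=0):
    """Random-walk Metropolis sampling of a one-dimensional coordinate."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    samples = []
    for _ in range(n_steps):
        trial = x + rng.uniform(-max_move, max_move)
        e_trial = energy(trial)
        if metropolis_accept(e_trial - e, temperature, rng):
            x, e = trial, e_trial
        samples.append(x)  # a rejected move repeats the current configuration
    return samples

def toy_energy(x):
    """Stand-in harmonic potential in kcal/mol; not an EMF energy."""
    return 2.0 * x * x

samples = sample(toy_energy, 0.0, 300.0, 20000)
mean_energy = sum(toy_energy(x) for x in samples) / len(samples)
print(mean_energy)  # for a harmonic potential this should approach kT/2
```

Averages over the sampled configurations estimate thermodynamic quantities; here the mean potential energy serves as a sanity check against the equipartition value for a harmonic degree of freedom.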
The EMF methods can also be used in combination with other multi-scale strategies identifiable by a skilled person upon reading of the present disclosure. As one example, the EMF methods can be embedded in a molecular mechanics or continuum description of a larger environment. Alternatively, more accurate wavefunction-based methods can be embedded, by using the above-described projector technique, in a multi-level EMF description of the environment.
The term “system features” refers to features of a molecular system and of particles contained in the molecular system that are derived from the total energy and the density matrix of the molecular system. Exemplary system features comprise the location of the particles, related geometries, interactions between particles and related bonds, and the stability of the system. In addition to system features that can be extracted from the ground-state potential energy surfaces, the response of the electronic density to external fields can be used to obtain excited-state potential energy surfaces. Exemplary system features can additionally include response properties that characterize the response of the total energy and density matrix to external fields such as magnetic and electric fields.
System features define the chemical reactivity and/or physicochemical state of the system or portions thereof. “Chemical reactivity” refers to the relative capacity of an atom, molecule, or radical to undergo a chemical reaction with another atom, molecule, or compound. In particular, chemical reactivity can be detected and/or expressed as the rate at which a chemical substance tends to undergo a chemical reaction in time. For example, in some systems the rate constant of a chemical reaction can be determined by $k = A e^{-E_a/(RT)}$, in which $E_a$ is the activation energy of the chemical reaction, $A$ is the pre-exponential factor, $R$ is the gas constant, and $T$ is the absolute temperature. A reaction rate of a chemical reaction can be determined from the rate constant and the concentrations of the reactants in the chemical reaction. Other factors can also influence the reaction rate, including temperature and catalysts.
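The Arrhenius-type expression for the rate constant can be evaluated directly once an activation energy is available. The sketch below is illustrative: the function names are hypothetical, the activation energy is taken in J/mol, and the prefactor in the usage lines is an arbitrary assumed value rather than a computed one.

```python
import math

GAS_CONSTANT = 8.314462618  # J/(mol K)

def arrhenius_rate_constant(prefactor, activation_energy, temperature):
    """Rate constant A * exp(-Ea / (R T)); Ea in J/mol, T in K, result in the units of A."""
    return prefactor * math.exp(-activation_energy / (GAS_CONSTANT * temperature))

def barrier_ratio(delta_ea, temperature):
    """Factor by which the rate constant grows when Ea is lowered by delta_ea."""
    return math.exp(delta_ea / (GAS_CONSTANT * temperature))

# Illustrative values only: assumed prefactor of 1e13 per second, 80 kJ/mol barrier.
k_room = arrhenius_rate_constant(1.0e13, 80.0e3, 298.15)
print(k_room, barrier_ratio(5.0e3, 298.15))
```

The exponential dependence is the reason small errors in a computed barrier matter: at room temperature, a barrier error of R T ln(10), roughly 5.7 kJ/mol, changes the predicted rate constant by a factor of ten.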
In some pure compounds, reactivity is connected to, and in particular regulated by, the physicochemical state of the system.
The term “physicochemical state” refers to the macroscopic, atomic, subatomic, and particulate combination of features of a molecular system that define a state of the molecular system and the related propensity to undergo physicochemical transformations or evolutions. Exemplary combinations of features that define the physicochemical state of a molecular system comprise: intermolecular forces that act upon the physical properties of materials (plasticity, tensile strength, surface tension in liquids); the identity of ions and the electrical conductivity of materials; interaction of one body with another in terms of quantities of heat and work; number of phases; number of components; and degrees of freedom (or variance).
In particular, chemical reactivity, physicochemical state, and related properties are derived from the system features and/or the energy of the molecular system as would be understood by a skilled person.
Accordingly, the EMF method allows in several embodiments prediction and/or control of chemical reactions and/or physicochemical features of a molecular system or portions thereof. For example, reagents of a chemical reaction can be selected to obtain formation or breaking of a covalent, ionic, or other kind of bond. Also, reagents or reaction conditions can be selected to obtain the transition of a compound from one state to another state. As an additional example, interactions between a compound and a light source can be controlled to obtain a certain absorbance of light by the compound or portions thereof (detectable, for example, by a change in a rotational, vibrational, or nuclear magnetic resonance spectrum of the system). Also, a compound can be designed to have a desired stability and/or a desired reactivity alone or in combination with another compound or portions thereof (e.g. an enzyme designed to obtain a certain reaction, or a ligand and/or receptor designed to obtain a desired ligand-receptor interaction, such as an agonist, antagonist, or superagonist).
Alternatively, the EMF methods allow in some embodiments to calculate the effect of an external field on UV-visible, fluorescent, vibrational, rotational or nuclear magnetic resonance spectra of a molecular system. For example, by varying the external electric field in the EMF calculations, the impact of the external electric field in the electric double layer on an oxygen reduction reaction on an electrode can be estimated. Furthermore, the methods can be applied to calculate the effect of such external electric field in the electric double layer on possible changes in adsorption energies, reaction barriers and reaction mechanisms for the oxygen reduction reaction on different metals.
The currently disclosed EMF methods and various implementations can be used to study chemically reactive systems and non-adiabatic systems. General areas of application include: heterogeneous catalysis, homogeneous catalysis such as splitting of ligands, semiconductor etching, processes at semiconductor surfaces, chemistry on and among nanoparticles, chemistry at battery electrodes, graphene chemistry, biocatalysis, and many other application domains where chemical reactions are involved. Application domains of the EMF methods are further illustrated in the following sections, which are provided by way of illustrating exemplary uses of the EMF methods and are not intended to be limiting.
In particular, the EMF methods can be applied to systems involving strong inter-atomic bonding, such as conjugated systems. The term “conjugated system” as used herein refers to a system of connected p-orbitals with delocalized electrons. Small errors in the modeling of such complex systems can lead to incorrect results. Studies of these systems can be conducted based on high-level ab initio quantum mechanics theories, but the complexity of such systems and the number of variables that must be taken into account carry a high computational cost. Therefore, conventional methods developed for non-conjugated systems usually experience difficulties in this regard. Similarly, systems with delocalized electrons, such as metallic, semiconductor, and nanoparticle systems, present many of the same challenges. The currently disclosed EMF methodology has reduced the errors to acceptable levels, in many cases to less than 1 kcal/mol in terms of the relative energy, as shown in the above exemplary applications. The computational cost of the EMF methods is also comparable to that of many low-level theories.
The EMF methods presented above have been implemented in the special case of embedding one DFT approximation in another. To provide a proof of principle of the proposed EMF methodology, a series of applications of this implementation has been performed.
Potential energy curves for removing a proton from the methanol molecule were calculated. The reference calculation used the PBE exchange-correlation functional, with the aug-cc-pVDZ atomic orbital basis set on oxygen and cc-pVDZ on the other atoms.
2. Symmetric SN2 Reaction of F⁻ with n-Propyl Fluoride
The EMF method is demonstrated, in
The EMF calculations employ two active spaces, as is illustrated in
For the purpose of analysis,
The convergence of the EMF method is herein systematically demonstrated in both non-conjugated and conjugated systems.
From panel A in
In this implementation, an additional application of the EMF method to a conjugated system is demonstrated, and its convergence with respect to the size of the embedded subsystem is illustrated.
The EMF calculations employ three active spaces of increasing size, as is illustrated on
The accuracy of various EMF implementations is illustrated across various chemical reactions, showing the effects of using (KS-DFT)-in-(KS-DFT) (or more succinctly DFT-in-DFT) embedding with various combinations of mixed one-particle basis sets for the active and environmental regions, mixed exchange-correlation functionals for different regions, and mixed density fitting basis sets for the Coulomb interaction calculations associated with the different regions. The CPU time per SCF iteration measured on an Intel Xeon 2.6 GHz processor for the different embedding schemes is also discussed. Furthermore, the comparison between DFT-in-DFT and other embedding schemes, such as vacuum embedding, ONIOM(QM:MM), and ONIOM(QM:QM), is shown.
In the following sections, “mixed basis sets” denotes 6-31G*/STO-3G, “mixed exchange correlation” denotes PBE/LDA, “mixed density fitting” denotes Ahlrichs/Spherical, ONIOM(QM:MM) denotes ONIOM(PBE/6-31G*:UFF), and ONIOM(QM:QM) denotes ONIOM(PBE/6-31G*:LDA/STO-3G).
A system consisting of only single bonds is particularly easy to treat with conventional embedding schemes including ONIOM. This is largely because the reaction center is so localized that the rest of the system does not play a crucial role in the reaction. DFT-in-DFT and other embedding methods were tested to calculate the reaction energy of the SN2 reaction from haloalkane to alcohol (shown in
A conjugated linear alkene could be challenging for ONIOM because the distinction between single and double bonds becomes less clear. However, if the reaction still occurs quite locally, e.g. at the terminal carbon of the chain, ONIOM may perform well. Hydrogenation at the terminal carbons of a conjugated alkene is such a case, as shown in
If a reaction occurs at two carbons in the interior of a linear alkene, there are two interfaces between subsystems. Although the reaction would still occur quite locally, this case is harder for embedding methods than the previous one. As an example, a Diels-Alder reaction between a conjugated linear alkene chain and 1,3-butadiene is illustrated, as shown in
In naphthalene, or generally in aromatic systems, there is no clear distinction between single and double bonds. In this regard, ONIOM is expected to fail in capturing correct chemistry in such systems. On the other hand, DFT-in-DFT does not require any a priori assumption on chemical bonds, so it should be able to describe proper chemistry with reasonable accuracy. To demonstrate this point, the different embedding methods to evaluate the reaction energy of naphthalene hydrogenation, as shown in
DFT-in-DFT may behave less well in some types of reactions. As mentioned earlier, mixing basis sets introduces a spurious dipole across the boundary between subsystems, and it can cause slow convergence of the reaction energy as the size of the active subsystem increases. If the reaction involves a change in the charge of the system, e.g. deprotonation or protonation, the spurious dipole would affect the results even more significantly. As the first example to illustrate this point, the deprotonation energy of carboxylic acid, as shown in
Reactions on semiconductor nanocrystals are one of the many application domains that can be investigated using the EMF methods. Ligand-binding reactions on semiconductor nanocrystals provide an important class of targets for the development of efficient photovoltaic materials. In particular, cadmium selenide (CdSe) quantum dots have been extensively characterized both theoretically [references 40-44] and experimentally [references 40, 45, 46], with a focus on the dependence of material and photochemical properties on cluster size, stoichiometry, and ligand coverage. The disclosed EMF methods can be applied to model acetate- and methylamine-capped CdSe quantum dots, with a focus on ligand binding and exchange energies, as well as cluster stability [reference 40]. Results can involve the description of chemical processes that involve breaking covalent bonds in the presence of an extended semiconductor material.
Transition metal complexes in the area of inorganic catalysis are another application domain. High-level approximation methods such as KS-DFT have been used for the characterization of structures and energetics in this area, due to their adequate description of the electronic structure. However, the computational cost of high-level approximation methods such as KS-DFT for such systems is still too high to enable MD simulations, trajectory-based reaction mechanism studies, and the inclusion of explicit solvent. The disclosed EMF methods can be employed in such systems to overcome the above-mentioned drawbacks while maintaining or exceeding the accuracy.
Inorganic catalysts, such as Cobalt- and Nickel-based catalysts, have emerged as promising candidates for hydrogen evolution in solar powered water-splitting devices [reference 47]. The formation of metal-hydride intermediates [references 48, 49] is an important step in the catalytic processes. Recent studies have focused on the use of KS-DFT calculations and ONIOM calculations for the description of such systems, revealing that the multi-level partitioning description of the ONIOM method provides inadequate accuracy, even for qualitative trends [references 50, 51]. The EMF methods can be employed to compute the hydride dissociation energy for Cobalt- and Nickel-based catalysts, such as the Co(III)H(dpgBF2)2L2 (dpg=diphenylglyoxime, L=acetonitrile) system.
Another focus of transition metal catalysis is to develop systems for efficient oxygen evolution [reference 52]. Experimental work has demonstrated water oxidation catalysis by a di-nuclear Ru complex [reference 53], and a subsequent KS-DFT study of deprotonation and redox chemistry [reference 54] examined a set of corresponding Ru monomers across a broad range of pH-pE values, yielding insight into the effects from ligand field couplings and metal-ligand charge transfer. The EMF method can be employed to compute ligand acidities, reduction potentials, solvation energies, and free energy changes with respect to Ru spin state for systems such as the Ru(OH)(Q)(tpy) (Q=3,5-di-tert-butyl-1,2-benzoquinone, tpy=2,2′:6′,2″-terpyridine) system.
Another application domain of the EMF methods is the study of redox potentials of molecules encased in fullerene systems, which involve additional complexities such as extensive conjugation and multiple oxidation states. Metal nitride clusterfullerenes (NCFs) are a new and expanding class of endohedral fullerenes in which a homogeneous or heterogeneous metal nitride cluster is enclosed inside a large carbon cage [reference 55]. Experimental work on this system indicates that upon reduction or oxidation of the metal nitride cluster, the excess charge can localize on the cluster itself or can cascade to the surrounding fullerene [references 56, 57]. The EMF methods disclosed herein can be employed to calculate the system oxidation and reduction potentials [reference 56],
Fe-MOF-74 is an iron-containing metal-organic framework useful for efficiently separating short chain hydrocarbon (C1-C3) molecules based primarily on their degree of saturation [references 58, 59]. Due to the prohibitively large size of the MOF unit cell, previous KS-DFT studies [reference 60] have reduced the system to intermediate representative clusters consisting of 50-100 atoms, in order to calculate hydrocarbon binding energies with respect to the hydrocarbon identity and the Fe spin state. The currently disclosed EMF methods can be employed to study a more realistic representation of the system consisting of an increased number of atoms. Results can provide further indications of how hydrocarbons bind to metal atoms and of other subtle effects from structural distortion, charge transfer, and ligand fields. Additionally, EMF calculations can be performed with periodic boundary conditions to investigate the sensitivity of the calculated results to approximations associated with using the finite cluster representation.
The refinement of explosives such as Research Department Formula X (RDX), HMX, and TNAX is a subject of interest for a variety of industrial and military applications [references 61-63]. The energy density, cost, and environmental toxicity [reference 64] are key features of high energy density materials (HEDM). Other critical properties include their stability against thermal impulses and impact shocks, which determines the safety and practicality with which the materials can be transported and stored [reference 65].
Understanding and improving the stability of HEDM requires investigation of the early stages of chemical decomposition and explosion, but high-resolution condensed phase experiments [references 66, 67] remain a technically challenging and potentially hazardous pursuit. Reliable, simulation-based approaches for studying and predicting explosive initiation dynamics are thus attractive from the perspective of both cost and safety, although the complex chemistry and long timescales of HEDM decomposition create significant methodological challenges. Empirical descriptions [references 68-71], including ReaxFF methods and tight-binding methods such as density-functional tight-binding (DFTB), and first-principles methods such as KS-DFT can be used to computationally study such systems. However, the ReaxFF methods [references 70-72] are limited by their lack of generality and requirement for extensive parameterization, the tight-binding methods [references 73-78] do not offer the accuracy of first-principles methods, and first-principles methods such as KS-DFT [references 79-82] are too computationally costly to access the nanosecond timescales and large system sizes that are needed to realistically describe the HEDM decomposition dynamics. The EMF methods, on the other hand, provide a good compromise and are well suited to performing accurate, long-timescale simulation studies of the reactive processes associated with HEDM explosive initiation. Explosive materials such as HMX (octahydro-1,3,5,7-tetranitro-1,3,5,7-tetrazocine), both in β-crystalline form and in slurries with water, can be studied. In addition, the decomposition pathways, reaction dynamics, and energy production following ignition due to high temperatures and shock pressure can be compared between the key steps associated with HMX decomposition.
Software realizations of the currently disclosed EMF methods can be developed to enable simulation, study and design of chemical, materials and biological systems.
The examples set forth above are provided to give those of ordinary skill in the art a complete disclosure and description of how to implement the EMF methods and related methods and to apply the methods to a molecular system of interest, and are not intended to limit the scope of what the Applicants regard as their disclosure. Modifications of the above-described modes for carrying out the disclosure can be used by persons of skill in the art, and are intended to be within the scope of the following claims.
A number of embodiments of the disclosure have been described. The specific embodiments provided herein are examples of useful embodiments of the disclosure and it will be apparent to one skilled in the art that the disclosure can be carried out using a large number of variations of the methods, including basis sets, number of subsystems, and various approximation methods within mean-field level of approximation, set forth either in the present description or readily identifiable to one of skill in the art. The methods disclosed in the current document can be applied to molecular systems and materials of different size varying from tens of atoms to millions of atoms or even more. Thus, the present disclosure is not intended to be limited to the implementations shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The entire disclosure of each document cited (including patents, patent applications, journal articles, abstracts, laboratory manuals, books, or other disclosures) in the Background, Summary, Detailed Description, and Examples is hereby incorporated herein by reference. All references cited in this disclosure are incorporated by reference to the same extent as if each reference had been incorporated by reference in its entirety individually. However, if any inconsistency arises between a cited reference and the present disclosure, the present disclosure takes precedence.
The present application claims priority to U.S. Provisional Application No. 61/906,775 entitled “Embedded Mean Field Method for the Seamless, Multi-Scale Description of the Electronic Structure of Molecular Systems and Materials” filed on Nov. 20, 2013 with docket number CIT 6737 P, the disclosure of which is incorporated by reference in its entirety. The present application is also related to International Application S/N ______ entitled “Methods for a Multiscale description of the electronic structure of molecular systems and Materials and related applications” filed on Nov. 20, 2014 with docket number P1565-PCT, the disclosure of which is herein incorporated by reference in its entirety.
Number | Date | Country
---|---|---
61906775 | Nov 2013 | US