D3SC: EAGER: Collaborative Research: A probabilistic framework for automated force field parameterization from experimental datasets

Information

  • NSF Award
  • 1738979
Owner
  • Award Id
    1738979
  • Award Effective Date
    8/1/2017 - 7 years ago
  • Award Expiration Date
    7/31/2019 - 5 years ago
  • Award Amount
    $ 179,907.00
  • Award Instrument
    Standard Grant

D3SC: EAGER: Collaborative Research: A probabilistic framework for automated force field parameterization from experimental datasets

Michael Shirts of the University of Colorado Boulder and John Chodera of the Sloan Kettering Institute are supported by a grant from the Chemical Theory, Models and Computational Methods program in the Division of Chemistry to develop statistical and probabilistic methods for parameterizing molecular force fields through the automated integration of information from large experimental datasets. This project is supported under the Data-Driven Discovery Science in Chemistry (D3SC) Dear Colleague Letter (DCL), and is co-funded by the Cyberinfrastructure for Emerging Science and Engineering Research (CESER) Program in the Office of Advanced Cyberinfrastructure. Force fields are classical approximations to quantum mechanical descriptions of interacting molecules. Because they are several orders of magnitude faster to use in simulations than even approximate quantum approaches, force fields are integral to computational modeling in chemistry, chemical engineering, biophysics, materials science, and soft-matter physics. New and more accurate force fields are needed in order to accelerate drug discovery, biomaterials design, and nanoscale device engineering. Currently, force fields are primarily tuned using quantum chemical calculations and small amounts of experimental data, and rely on optimization methods that often require considerable manual intervention, may not identify optimal solutions, and do not provide a way of characterizing and propagating parameter uncertainty. When experimental data is included in the tuning process, there is currently no systematic way to incorporate information on measurement error to weight the data accordingly. Professors Shirts and Chodera are developing a rigorous Bayesian probabilistic framework and statistical techniques to overcome these problems. Their approach is designed to take advantage of large, rich experimental datasets including measurement uncertainty, leverage available data more efficiently, and automate both parameter selection and the choice of functional forms in the mathematical formulation of the force field. Software from the project is being disseminated as open source Python code that can be interfaced to simulation codes such as OpenMM and GROMACS. A new Open Force Field Group, with collaborators from academia, the National Institute of Standards and Technology, and industry, will advance community-driven force field development and applications during the project and beyond.<br/><br/>This project is addressing the challenges of force field parameterization by applying a rigorous Bayesian inference framework to determine force fields that are maximally compatible with experimental datasets. The formalism is being applied initially to organic and aqueous liquid mixtures using the NIST ThermoML Archive, which contains a wide range of thermophysical property measurements and associated measurement errors for thousands of molecules. Specific tasks include (1) developing and evaluating an automated Bayesian force field parameterization framework that scales to large numbers of parameters and large data sets, and (2) using this approach to explore the automated selection of force field functional forms. The Bayesian probabilistic framework developed in this work promises to greatly reduce human effort, maximize force field transferability and generalizability by avoiding over-fitting, and enable the systematic extraction of available information from a given set of experimental data. The probabilistic formulation will allow force fields to be easily extended to accommodate new experimental data in a consistent manner via conditional Bayesian updates, and will provide direct routes for estimating systematic error. Initial tests of the new approach will help resolve important questions on the parameterization of molecular force fields for liquid systems, such as optimal choices of functional forms and combining rules. The same techniques can be later used to determine whether pure fluid thermodynamic properties are sufficient to parameterize fluids to reproduce mixture properties, as measured experimentally by the project, and to assess the importance of including polarization in force fields. Open source software tools will be released as easily-installed interoperable Python modules and online instructive IPython/Jupyter notebooks, and all experimental datasets and parameter sets will be freely disseminated.

  • Program Officer
    Susan Atlas
  • Min Amd Letter Date
    6/30/2017 - 7 years ago
  • Max Amd Letter Date
    6/30/2017 - 7 years ago
  • ARRA Amount

Institutions

  • Name
    Sloan Kettering Institute For Cancer Research
  • City
    New York
  • State
    NY
  • Country
    United States
  • Address
    1275 York Avenue
  • Postal Code
    100650000
  • Phone Number
    6462273273

Investigators

  • First Name
    John
  • Last Name
    Chodera
  • Email Address
    john.chodera@choderalab.org
  • Start Date
    6/30/2017 12:00:00 AM

Program Element

  • Text
    Theory, Models, Comput. Method
  • Code
    6881
  • Text
    CESER-Cyberinfrastructure for
  • Code
    7684

Program Reference

  • Text
    CyberInfra Frmwrk 21st (CIF21)
  • Code
    7433
  • Text
    EAGER
  • Code
    7916
  • Text
    CDS&E
  • Code
    8084
  • Text
    COMPUTATIONAL SCIENCE & ENGING
  • Code
    9263