Open data-driven infrastructure for building biomolecular force fields for predictive biophysics and drug design

Information

Research Project
10412594

ApplicationId
10412594
Core Project Number
R01GM132386
Full Project Number
3R01GM132386-02S1
Serial Number
132386
FOA Number
PA-20-272
Sub Project Id

Project Start Date
3/1/2020
Project End Date
2/29/2024
Program Officer Name
LYSTER, PETER
Budget Start Date
3/1/2021
Budget End Date
2/28/2022
Fiscal Year
2021
Support Year
02
Suffix
S1
Award Notice Date
8/30/2021

Organizations

The University of Colorado, Inc.

Information

PROJECT SUMMARY/ABSTRACT
Current generation molecular simulation models are insufficiently accurate, and current generation tools for building those models are limited, not automated, and based on aging infrastructure. Our original R01, "Open Data-driven Infrastructure for Building Biomolecular Force Fields for Predictive Biophysics and Drug Design," aims to solve these problems, producing a modern infrastructure for building, applying, and improving accurate molecular mechanics force fields. As part of our NIH-funded project, we have collaborated closely with the Molecular Sciences Software Institute (MolSSI) to use the QCArchive ecosystem to generate and continuously expand very large quantum chemical datasets relevant to biomolecular systems on a variety of supercomputing resources. QCArchive now contains over 42M quantum chemical calculations for over 39M molecules, and has become incredibly popular, with over 1.79M accesses/month. Large quantum chemical datasets relevant to biomolecular systems are incredibly valuable to the AI/ML community. Data is the key element needed for both fundamental research into ML architectures and constructing predictive models for downstream use. Unfortunately, quantum chemical datasets are incredibly expensive to generate, limiting in-house generation of large, useful datasets needed to drive AI/ML research to a few large companies and researchers with access to sufficient computing resources. While AI/ML quantum chemical methods have shown immense promise for biomolecular systems, the limited access to large, curated datasets has greatly hindered researchers from making rapid progress in this area. We aim to bridge this gap by working closely with MolSSI QCArchive developers to address robustness, scalability, and data delivery challenges to meet the needs of the biomolecular AI/ML community requiring access to large quantum chemistry datasets (Aim 1).
Additional software developers will enable improvements to the QCArchive infrastructure to meet the rapidly growing demands of the AI/ML community. As QCArchive is primarily maintained by a single MolSSI Software Scientist, additional developers are necessary for fully enabling the AI/ML community to take full advantage of the wealth of data generated by our NIH-funded project directly, as well as the data actively being generated by the tools our project has engineered to enable distributed, fault-tolerant quantum chemistry that is rapidly populating QCArchive. We will additionally develop interfaces and dashboards to enable facile discovery, retrieval, and import of quantum chemical datasets within popular machine learning frameworks (Aim 2). To ensure our tools are specifically useful for the most promising AI/ML applications, we will collaborate directly with AI researchers in the OpenMM, TorchMD, and SchNetPack communities actively developing and deploying quantum machine learning (QML) potentials for biomolecular simulation, with the goal of producing generally useful tools suitable for the wider community yet capable of driving these high-priority applications.

IC Name

NATIONAL INSTITUTE OF GENERAL MEDICAL SCIENCES

Activity
R01
Administering IC
GM
Application Type
3

Direct Cost Amount
132694
Indirect Cost Amount
44992
Total Cost
177686
Sub Project Total Cost

ARRA Funded
False
CFDA Code
859
Ed Inst. Type
BIOMED ENGR/COL ENGR/ENGR STA
Funding ICs
OD:177686
Funding Mechanism
Non-SBIR/STTR RPGs
Study Section
Study Section Name

Organization Name
UNIVERSITY OF COLORADO
Organization Department
ENGINEERING (ALL TYPES)
Organization DUNS
007431505
Organization City
Boulder
Organization State
CO
Organization Country
UNITED STATES
Organization Zip Code
803031058
Organization District
UNITED STATES
