PROJECT SUMMARY/ABSTRACT
Current-generation molecular simulation models are insufficiently accurate, and the current-generation tools for building those models are limited, not automated, and based on aging infrastructure. Our original R01, "Open Data-driven Infrastructure for Building Biomolecular Force Fields for Predictive Biophysics and Drug Design," aims to solve these problems by producing a modern infrastructure for building, applying, and improving accurate molecular mechanics force fields. As part of this NIH-funded project, we have collaborated closely with the Molecular Sciences Software Institute (MolSSI) to use the QCArchive ecosystem to generate, and continuously expand, very large quantum chemical datasets relevant to biomolecular systems on a variety of supercomputing resources. QCArchive now contains over 42M quantum chemical calculations for over 39M molecules and has become extremely popular, with over 1.79M accesses per month. Large quantum chemical datasets relevant to biomolecular systems are highly valuable to the AI/ML community: data is the key ingredient both for fundamental research into ML architectures and for constructing predictive models for downstream use. Unfortunately, quantum chemical datasets are very expensive to generate, which restricts in-house generation of the large datasets needed to drive AI/ML research to a few large companies and to researchers with access to sufficient computing resources. While AI/ML quantum chemical methods have shown immense promise for biomolecular systems, limited access to large, curated datasets has greatly hindered rapid progress in this area. We aim to bridge this gap by working closely with the MolSSI QCArchive developers to address the robustness, scalability, and data delivery challenges involved in meeting the needs of biomolecular AI/ML researchers who require access to large quantum chemistry datasets (Aim 1).
Additional software developers will enable improvements to the QCArchive infrastructure to meet the rapidly growing demands of the AI/ML community. Because QCArchive is primarily maintained by a single MolSSI Software Scientist, additional developers are needed to let the AI/ML community take full advantage of both the wealth of data generated directly by our NIH-funded project and the data, now rapidly populating QCArchive, that is being generated by the distributed, fault-tolerant quantum chemistry tools our project has engineered. We will also develop interfaces and dashboards for easy discovery, retrieval, and import of quantum chemical datasets within popular machine learning frameworks (Aim 2). To ensure our tools are useful for the most promising AI/ML applications, we will collaborate directly with AI researchers in the OpenMM, TorchMD, and SchNetPack communities who are actively developing and deploying quantum machine learning (QML) potentials for biomolecular simulation. Our goal is to produce generally useful tools suitable for the wider community yet capable of driving these high-priority applications.