Extrapolative Analyses for Reliable Machine Learning Driven Scientific Discovery

Information

NSF Award
2324394

Owner

University of North Carolina at Chapel Hill

Award Id
2324394
Award Effective Date
9/1/2023
Award Expiration Date
8/31/2026
Award Amount
$ 391,595.00
Award Instrument
Continuing Grant

The production and analysis of digital data from experimental and observational sources have grown explosively, creating many opportunities for machine learning (ML) driven scientific discovery in high-impact fields such as chemistry (cheminformatics) and biology (bioinformatics). Unfortunately, current ML methodology often fails to properly characterize data markedly distinct from what was seen during training (i.e., it extrapolates poorly). This, in turn, hampers our ability to make scientific discoveries that truly extend past current knowledge. The problem is of great consequence in chemical virtual screening campaigns, for example, where ML predictions are used to select targets for expensive real-world experimentation (e.g., in drug discovery applications). Poor extrapolative power of ML models can result in false positives, wasting time and resources on costly synthesis and experimental testing of novel chemical entities. The work stemming from this award will improve the real-world utility of ML models in scientific domains and prevent the faulty use of model predictions. The project also provides research training opportunities for graduate students.

This project develops methodologies to more accurately assess the reliability of ML predictions on novel inputs and to improve models' extrapolative capabilities. First, the project develops empirical trials to more accurately evaluate the extrapolative capabilities of ML model fitting procedures on domains that lie beyond the distributional support of the training set. Second, using these extrapolative assessments, the project develops techniques to thoroughly explore the input space where extrapolation may occur, in order to anticipate and filter out likely unreliable predictions. Lastly, the project builds methodology to guide the acquisition of new training data that, once trained on, will improve model extrapolation.
This award by the Division of Mathematical Sciences is jointly supported by the NSF Office of Advanced Cyberinfrastructure. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
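
As a rough illustration of what an empirical trial of extrapolation might look like, the sketch below compares a model's error on held-out points inside the training region with its error on points deliberately held out beyond it. This is not the project's method: the helper names (`extrapolative_split`, `fit_linear`, `predict_linear`, `rmse`), the distance-from-centroid split rule, the synthetic quadratic target, and the linear model are all assumptions chosen only to keep the example small.

```python
import numpy as np

def extrapolative_split(X, holdout_frac=0.2):
    """Hypothetical helper: hold out the points farthest from the data centroid."""
    dist = np.linalg.norm(X - X.mean(axis=0), axis=1)
    n_holdout = max(1, int(round(holdout_frac * len(X))))
    order = np.argsort(dist)  # indices sorted from nearest to farthest
    return order[:-n_holdout], order[-n_holdout:]

def fit_linear(X, y):
    """Ordinary least squares with an intercept column (stand-in for any model)."""
    A = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ coef

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Synthetic data (an assumption for illustration): a quadratic target on [-3, 3].
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(400, 1))
y = X[:, 0] ** 2

# Outer points lie beyond the region the model will be trained on.
inner_idx, outer_idx = extrapolative_split(X, holdout_frac=0.2)

# Within the inner region, set aside a random test set of the same size
# to measure ordinary (interpolative) error.
perm = rng.permutation(inner_idx)
interp_idx, train_idx = perm[: len(outer_idx)], perm[len(outer_idx):]

coef = fit_linear(X[train_idx], y[train_idx])
interp_rmse = rmse(y[interp_idx], predict_linear(coef, X[interp_idx]))
extrap_rmse = rmse(y[outer_idx], predict_linear(coef, X[outer_idx]))

print(f"interpolation RMSE: {interp_rmse:.3f}")
print(f"extrapolation RMSE: {extrap_rmse:.3f}")
```

In this toy setting the random test split alone would understate how badly the model behaves outside the training region, which is the kind of gap an extrapolative evaluation is meant to expose. For chemical data, a real trial would more plausibly hold out whole structural groups rather than use Euclidean distance on raw features; that choice is likewise an assumption here.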

Program Officer
Yong Zeng, yzeng@nsf.gov, 7032927299
Min Amd Letter Date
8/14/2023
Max Amd Letter Date
8/14/2023
ARRA Amount

Institutions

Name
University of North Carolina at Chapel Hill
City
CHAPEL HILL
State
NC
Country
United States
Address
104 AIRPORT DR STE 2200
Postal Code
275995023
Phone Number
9199663411

Investigators

First Name
Junier
Last Name
Oliva
Email Address
joliva@cs.unc.edu
Start Date
8/14/2023 12:00:00 AM

First Name
Alexander
Last Name
Tropsha
Email Address
alex_tropsha@unc.edu
Start Date
8/14/2023 12:00:00 AM

Program Element

Text
CDS&E-MSS
Code
8069

Text
CDS&E
Code
8084

Program Reference

Text
NSCI: National Strategic Computing Initi

Text
Machine Learning Theory

Text
COMPUTATIONAL SCIENCE & ENGING
Code
9263
