There has been a promising explosion in the production and analysis of digital data from experimental and observational sources, which presents many opportunities for machine learning (ML) driven scientific discovery in high-impact fields such as chemistry (cheminformatics) and biology (bioinformatics). Unfortunately, current ML methodology often fails to properly characterize data markedly distinct from what was seen during training (i.e., it extrapolates poorly). This, in turn, hampers our ability to make scientific discoveries that truly extend past our current knowledge. For example, this is of great consequence in chemical virtual screening campaigns, where one hopes to use ML predictions to select candidate targets for expensive real-world experimentation (e.g., in drug discovery applications). Poor extrapolative power of ML models can result in false positives, wasting time and resources through costly synthesis and experimental testing of novel chemical entities. The work stemming from this award will improve the real-world utility of ML models in scientific domains and prevent the faulty use of model predictions. The project also provides research training opportunities for graduate students. <br/><br/>This project develops methodologies to more accurately assess the reliability of ML predictions on novel inputs and to improve models' extrapolative capabilities. First, the project develops empirical trials to more accurately evaluate the extrapolative capabilities of ML model fitting procedures on domains that lie beyond the distributional support of the training set. Second, utilizing these extrapolative assessments, the project develops techniques to thoroughly explore the regions of the input space where a model must extrapolate, in order to anticipate and filter out likely unreliable predictions.
Lastly, the project builds methodology to guide the acquisition of new training data that, once incorporated into training, will improve model extrapolation.<br/><br/>This award by the Division of Mathematical Sciences is jointly supported by the NSF Office of Advanced Cyberinfrastructure.<br/><br/>This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.