In this EArly-concept Grant for Exploratory Research (EAGER) project, artificial intelligence (AI) methods that can learn from, and make predictions on, simulations of the physics of materials will be developed. The approach in this project will constitute an extension of the capabilities of recent AI platforms, of which OpenAI's ChatGPT, Microsoft's Copilot, and Google's Gemini are among the best known. These AI platforms have caught the public's imagination and are widely used in virtually every field of human endeavor. However, their use in science and engineering is mostly based on the AI learning from vast volumes of scientific literature in the form of text and drawing conclusions through language prediction via underlying neural network-based large language models. Going beyond this, we will develop AI methods for such large language models to learn from both simulations and mathematical equations in an expansion of the way they learn from text. This capability will make it possible for our AI platform to learn complex processes in the physics of materials and make predictions that are too intricate to be easily attained by human experts. In particular, the focus of this project will be on the physics of battery materials. Thus, in addition to advancing the frontiers of AI, this project will make important contributions to the design of future batteries for sustainable and safe energy generation. Allowing AI to learn jointly from simulations and mathematics will be a significant departure from previous text-based learning in large language models in science, and has not been demonstrated yet. This project will educate students from diverse backgrounds in developing AI. In addition to ensuring equitable access, this is of considerable importance because diverse and inclusive human input will help mitigate some of the effects of biases in AI. 
Furthermore, close attention will be paid to constant testing to avoid harmful output from the AI.<br/><br/>Coupled electro-chemo-mechanics in materials physics leads to emergent phenomena including phase transitions and instabilities. For materials discovery and design, it is of interest to not only solve forward problems, but also to explain what type of model best represents the observations. Such inverse problems encompass the task of inference, where the goal is to identify the mechanisms of the coupled physics that best explain data that, typically, displays time evolution. Some progress has been made in inference with applications to materials physics of batteries, biomaterials, and structural alloys, but there remains a gap. Existing methods of inference in physics select the best models from a library of candidates. While useful to explain phenomena with data-driven models, such inference does not lead to discovery of previously unknown physics. This EAGER project is to develop Generative Artificial Intelligence methods, specifically Large Language Foundation Models, that will be augmented to perform true discovery of emergent phenomena in mechanics-driven coupled physics problems. Specifically, leveraging experience gained in prior work carried out by the PIs on pre-training and fine-tuning large language models, a new direction will be sought out for multi-modal foundation models that learn directly from computational physics simulations and mathematical equations. The project has broad implications, moving toward true discovery of nonlinear mechanisms in systems with complexity that is induced by coupling with other physics. Applications of this include energy materials such as batteries, biomaterials, and other materials classes such as structural alloys. This new modality of large language models generalized to learn from simulations and mathematics is novel. 
It represents a significant departure from previous text learning-based uses of large language models in science, and has not been demonstrated yet. Its success will draw from the investigators’ prior experiences with autoregressive, attention-based foundation models, and their understanding of how to extend them to learning from time series and spatially related data.<br/><br/>This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.