Large-scale AI models that generate text and images are transforming visual synthesis and recognition tasks. These models can not only generate realistic images from text but also serve as backbone models for object recognition, tracking, and segmentation. Despite their capabilities, they are complex, challenging to interpret, and sometimes unpredictable. This unpredictability is problematic for safety-critical applications such as autonomous driving and healthcare. Moreover, these models can easily generate copyrighted content, produce harmful content, or amplify stereotypes. Understanding such complex models, with billions of parameters trained on billions of images, remains challenging. Key questions include why certain input prompts lead to failures or artifacts, how these models might amplify biases, and how the training data can impact output quality. This project seeks to provide a systematic, interpretable, rational basis for understanding and controlling the learned computations of multimodal generative models, with the potential to promote the accountable and safe use of state-of-the-art multimodal AI models and to mitigate potential harms. To disseminate the research effectively, the investigators will freely release all materials (code, models, and datasets); host tutorials, workshops, and courses that engage the research community; broaden student participation at the K-12, undergraduate, and graduate levels; and engage with policymakers to inform them of the latest technology and future trends.

The project aims to develop a new systematic framework for visualizing, understanding, and rewriting the learned computation of multimodal generative models, and to leverage this knowledge to trace how the training data influence the internal representations and, ultimately, the model outputs. The project will focus on three research thrusts. First, new research methodologies will be developed to visualize the internal mechanisms and hierarchical structures of pre-trained multimodal generative models, understand their roles at different stages of the generation process, and extract visual concepts and their relationships. Second, the project will explore model editing algorithms that manipulate these discovered concepts and relationships to pinpoint and fix inconsistencies, failures, biases, and safety concerns in existing multimodal generative models. Finally, the investigators will develop new attribution methods that assess the influence of training images on generated results based on the analysis of internal representations, and will demonstrate several potential applications of these attribution algorithms.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.