Computational notebooks have become a cornerstone of scientific computing, providing an interactive means to acquire and communicate insights, share discoveries, and visualize experimental outcomes. However, today's computational notebooks are best suited to small-scale explorations carried out on a single computer and are difficult to use for large-scale computations on high performance computing clusters or commercial clouds. This project will develop NBFlow, a software toolkit for converting notebook computations into workflows that can be executed efficiently on clusters or clouds. This will make it possible to use notebook technologies in conjunction with high performance clusters to enable new discoveries in scientific fields such as high energy physics and geosciences. These technologies will be used to develop new educational curricula, outreach activities for K-12 students, and research experiences for college students.

NBFlow will support and advance the use of computational notebooks in scientific research and data analysis by bridging the gap between interactive computation and the distributed cyberinfrastructure developed for data-intensive sciences. Today's notebook environments provide easy access to standard artificial intelligence and machine learning toolkits for processing vast datasets with greater efficiency and accuracy than conventional methods. However, migrating a notebook from a scratchpad-like analysis to a robust pipeline that must be distributed across a cluster or cloud infrastructure currently requires significant effort from developers and scientists. NBFlow will build upon existing NSF investments in containerization and workflows to record notebook executions and schedule tasks for concurrent execution. By experimenting with an integrated notebook-workflow system, this research will advance understanding in the data management and distributed computing sub-fields. At the same time, the project will produce novel techniques to robustly capture provenance from notebook-based workflows, a rich source of data in itself, and will put into practice techniques developed for incremental computation. These technologies will be deployed with active user communities in high energy physics at multiple facilities and in geospatial and sustainability sciences through the I-GUIDE cyberinfrastructure.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.