To better understand how climate may change in the future, scientists look at how it has varied in the past, a field called paleoclimatology. Doing so requires the use of proxies, such as tree rings or the chemical composition of ice cores, to infer climate variability over thousands to millions of years. Given the importance of these datasets, it is crucial that scientists be able to efficiently locate, access, and integrate data. Paleoclimate studies often require the integration of hundreds of datasets that are stored in unstructured files, so scientists currently spend a significant fraction of their time manually reformatting the data before they can work with it. The goal of this project is to use artificial intelligence to identify tables in files automatically so these tables can be used more easily in analyses. The end goal is to make more data available to scientists, to reduce the time scientists spend on data wrangling, to decrease the time it takes to obtain scientific results, and to increase the reproducibility of scientific analyses performed by different research groups. Working with collaborators from the hydroclimate community, the project contributes to the understanding of how water resources have changed over the last two thousand years. <br/><br/>The Table Understanding for Paleoclimate Studies (TUPS) project aims to leverage machine learning methods for table understanding to greatly reduce the time scientists spend searching, wrangling, and annotating paleoclimate datasets prior to analysis. The resulting deep learning model will be integrated into a Python toolbox that will allow scientists to identify tables in unstructured files and retrieve them into their computational environment along with relevant metadata. A primary use case for this project is to extract tabular paleoclimate data from the National Oceanic and Atmospheric Administration World Data Service for Paleoclimatology and the PANGAEA repositories. 
Over the past decade, these archive centers have invested significant effort in standardizing and updating the metadata of the datasets in their possession, but many datasets remain difficult to extract programmatically. In addition, this project builds on an existing recommender system for paleoclimate datasets that will be updated with community-curated datasets to help with metadata annotation and correction. TUPS will directly engage with the paleoclimate community through tutorials, hackathons, and a workshop to ensure that tools and training materials meet the community's scientific needs and to help train the next generation of paleoclimate scientists. <br/><br/>This award by the Office of Advanced Cyberinfrastructure is jointly supported by the National Discovery Cloud for Climate initiative within the Directorate for Computer and Information Science and Engineering and by the Geosciences Directorate’s Research, Innovation, Synergies, and Education and Atmospheric and Geospace Sciences divisions.<br/><br/>This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.