Collaborative Research: Elements: TUPS: Table Understanding for Paleoclimate Studies

Information

  • Award Id
    2411267
  • Award Effective Date
    9/1/2024
  • Award Expiration Date
    8/31/2027
  • Award Amount
    $406,993.00
  • Award Instrument
    Standard Grant

To better understand how climate may change in the future, scientists look at how it has varied in the past, a field called paleoclimatology. Doing so requires the use of proxies, such as the rings of trees or the chemical composition of ice cores, to infer climate variability over thousands to millions of years. Given the importance of these datasets, it is crucial that scientists be able to locate, access, and integrate data efficiently. Paleoclimate studies often require the integration of hundreds of datasets that are stored in unstructured files, so scientists currently spend a significant fraction of their time manually reformatting the data before they can work with it. The goal of this project is to use artificial intelligence to identify tables in files automatically so that these tables can be used more easily in analyses. The end goal is to make more data available to scientists, to reduce the time scientists spend on data wrangling, to decrease the time it takes to obtain scientific results, and to increase the reproducibility of scientific analyses performed by different research groups. Working with collaborators from the hydroclimate community, the project contributes to the understanding of how water resources have changed over the last two thousand years.

The Table Understanding for Paleoclimate Studies (TUPS) project aims to leverage machine learning methods for table understanding to greatly reduce the time scientists spend searching, wrangling, and annotating paleoclimate datasets prior to analysis. The resulting deep learning model will be integrated into a Python toolbox that will allow scientists to identify tables in unstructured files and retrieve them, along with relevant metadata, into their computational environment. A primary use case for this project is to extract tabular paleoclimate data from the National Oceanic and Atmospheric Administration World Data Service for Paleoclimatology and the PANGAEA repositories. Over the past decade, these archive centers have spent significant effort standardizing and updating the metadata of the datasets in their possession, but many datasets remain difficult to extract programmatically. In addition, this project builds on an existing recommender system for paleoclimate datasets that will be updated with community-curated datasets to help with metadata annotation and correction. TUPS will directly engage with the paleoclimate community through tutorials, hackathons, and a workshop to ensure that tools and training materials meet the community's scientific needs and to help train the next generation of paleoclimate scientists.

This award by the Office of Advanced Cyberinfrastructure is jointly supported by the National Discovery Cloud for Climate initiative within the Directorate for Computer and Information Science and Engineering and by the Geosciences Directorate's Research, Innovation, Synergies, and Education and Atmospheric and Geospace Sciences divisions.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

  • Program Officer
    Marlon Pierce, mpierce@nsf.gov, 7032927743
  • Min Amd Letter Date
    7/12/2024
  • Max Amd Letter Date
    7/12/2024
  • ARRA Amount

Institutions

  • Name
    University of Southern California
  • City
    LOS ANGELES
  • State
    CA
  • Country
    United States
  • Address
    3720 S FLOWER ST FL 3
  • Postal Code
    90033
  • Phone Number
    2137407762

Investigators

  • First Name
    Jay
  • Last Name
    Pujara
  • Email Address
    jpujara@isi.edu
  • Start Date
    7/12/2024 12:00:00 AM
  • First Name
    Deborah
  • Last Name
    Khider
  • Email Address
    khider@usc.edu
  • Start Date
    7/12/2024 12:00:00 AM

Program Element

  • Text
    Paleoclimate
  • Code
    153000
  • Text
    GEO CI - GEO Cyberinfrastrctre
  • Text
    NDCC-Natl Discvry Cloud Climat
  • Text
    Software Institutes
  • Code
    800400

Program Reference

  • Text
    INTERDISCIPLINARY PROPOSALS
  • Code
    4444
  • Text
    Software Institutes
  • Code
    8004