Collaborative Research: Elements: TUPS: Table Understanding for Paleoclimate Studies

Information

NSF Award
2411267

Owner

University of Southern California

Award Id
2411267
Award Effective Date
9/1/2024
Award Expiration Date
8/31/2027
Award Amount
$406,993.00
Award Instrument
Standard Grant

To better understand how climate may change in the future, scientists look at how it has varied in the past, a field called paleoclimatology. Doing so requires the use of proxies such as the rings of trees or the chemical composition of ice cores to infer climate variability over thousands to millions of years. Given the importance of these datasets, it is crucial that scientists be able to efficiently locate, access, and integrate data. Paleoclimate studies often require the integration of hundreds of datasets that are stored in unstructured files, so scientists currently spend a significant fraction of their time manually reformatting the data before they can work with it. The goal of this project is to use artificial intelligence to identify tables in files automatically so these tables can be used more easily in analyses. The end goal is to make more data available to scientists, to reduce the time scientists spend on data wrangling, to decrease the time it takes to obtain scientific results, and to increase the reproducibility of scientific analyses performed by different research groups. Working with collaborators from the hydroclimate community, the project contributes to the understanding of how water resources have changed over the last two thousand years.

The Table Understanding for Paleoclimate Studies (TUPS) project aims to leverage machine learning methods for table understanding to greatly reduce the time scientists spend searching, wrangling, and annotating paleoclimate datasets prior to analysis. The resulting deep learning model will be integrated into a Python toolbox that will allow scientists to identify and retrieve tables from unstructured files into their computational environment along with relevant metadata. A primary use case for this project is to extract tabular paleoclimate data from the National Oceanic and Atmospheric Administration World Data Service for Paleoclimatology and the PANGAEA repositories. Over the past decade, these archive centers have devoted significant effort to standardizing and updating the metadata of the datasets in their possession, but many datasets remain difficult to extract programmatically. In addition, this project builds on an existing recommender system for paleoclimate datasets that will be updated with community-curated datasets to help with metadata annotation and correction. TUPS will directly engage with the paleoclimate community through tutorials, hackathons, and a workshop to ensure that tools and training materials meet the community's scientific needs and to help train the next generation of paleoclimate scientists.

This award by the Office of Advanced Cyberinfrastructure is jointly supported by the National Discovery Cloud for Climate initiative within the Directorate for Computer and Information Science and Engineering and by the Geosciences Directorate’s Research, Innovation, Synergies, and Education and Atmospheric and Geospace Sciences divisions. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Program Officer
Marlon Pierce, mpierce@nsf.gov, 7032927743
Min Amd Letter Date
7/12/2024
Max Amd Letter Date
7/12/2024
ARRA Amount

Institutions

Name
University of Southern California
City
LOS ANGELES
State
CA
Country
United States
Address
3720 S FLOWER ST FL 3
Postal Code
90033
Phone Number
2137407762

Investigators

First Name
Jay
Last Name
Pujara
Email Address
jpujara@isi.edu
Start Date
7/12/2024 12:00:00 AM

First Name
Deborah
Last Name
Khider
Email Address
khider@usc.edu
Start Date
7/12/2024 12:00:00 AM

Program Element

Text
Paleoclimate
Code
153000

Text
GEO CI - GEO Cyberinfrastrctre

Text
NDCC-Natl Discvry Cloud Climat

Text
Software Institutes
Code
800400

Program Reference

Text
INTERDISCIPLINARY PROPOSALS
Code
4444

Text
Software Institutes
Code
8004
