There is considerable scientific and societal interest in better understanding the terrestrial carbon cycle: how carbon dioxide is taken out of the air by plants, moves through ecosystems (i.e., fluxes), is stored in different plant and soil pools, and is released back to the atmosphere. Researchers need to better understand the variability and predictability of the cycle over the short term (for carbon inventory monitoring, reporting, and verification), medium term (for developing natural climate solutions), and long term (to understand climate-stabilizing feedbacks), and across spatial scales from individual sites to continents. This project aims to better understand carbon variability and predictability through the generation and analysis of a new North American terrestrial carbon cycle data tool that harmonizes information from an unprecedented volume and variety of on-the-ground measurements, satellite data, and mathematical models of how the carbon cycle works. These analyses will provide new insights into long-standing questions such as: (1) How do different carbon pools and fluxes vary across space and time and in response to environmental variables like temperature, precipitation, land use, and topography? (2) Under what conditions are mathematical models most/least reliable? (3) Where are the gaps in existing data-collection networks? (4) How far into the future are different carbon pools/fluxes predictable, and which sources of uncertainty most limit predictability? To facilitate uptake, this team of researchers will work with the US Forest Service (USFS) to incorporate these data products into federal carbon accounting efforts and seek certification of these open-source technologies for use in voluntary carbon markets. 
Finally, this project will contribute to the training of two graduate students and four undergraduates, with the latter recruited through an environmental data science program focused on increasing American Indian/Alaskan Native involvement in STEM.

This project will produce a carbon cycle “reanalysis” product based on the iterative model–data assimilation approaches commonly employed in numerical weather forecasting to harmonize process-based mathematical models with new observations. Specifically, the PEcAn terrestrial carbon data assimilation and forecasting system will be expanded to integrate twelve new bottom-up field data constraints from the National Ecological Observatory Network (NEON), five data constraints from the USFS Forest Inventory, and AmeriFlux eddy-covariance tower and ancillary data. These bottom-up constraints will “anchor” PEcAn’s existing assimilation, which is based on optical, lidar, and microwave remote sensing. To support this, existing data assimilation approaches will be refined not only to jointly estimate pools and fluxes, but also to capture spatiotemporal variability in model parameters and to employ hybrid machine learning approaches to understand the variability in model residual error. Finally, the variability in the continental-scale reanalysis product will be analyzed against a range of explanatory variables and across multiple time scales, spatial scales, and prediction lead times. Training opportunities for graduate and undergraduate students will be integrated into the research project.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.