Graph Neural Networks (GNNs) have extended the success of deep neural networks from independent data points to relational data, such as on-site observations from environmental sensors (e.g., humidity, temperature, PM2.5) distributed widely across different spatial locations. Most existing work focuses on proof-of-concept studies on relatively small, well-curated data in offline settings, whereas real-world scientific research and applications need more capable GNN models that can learn effectively from large-scale, real-time, geographically distributed (geo-distributed), and heterogeneous data. This project aims to chart a radically new cyberinfrastructure solution for training large-scale spatial GNNs to fill this gap. If successful, the project will provide a cyberinfrastructure that overcomes the fundamental computational and communication bottlenecks for a broad range of domain science applications that rely on massive spatiotemporal prediction. The proposed algorithms and systems will also help cultivate a deeper understanding of how to design large machine-learning systems at geo-distributed scale, support the teaching and training of students and peers, and provide graduate and undergraduate students with new courses, research, and internship opportunities.

This project aims to develop a comprehensive set of graph construction and partitioning methods, distributed learning algorithms, and cyberinfrastructure designs to support large-scale GNNs for real-world spatiotemporal data in geospatial scientific research and applications.
The project will address significant research challenges, including (1) formulating spatiotemporal prediction within a geographically inspired graph deep learning framework, (2) enabling highly accurate, efficient, and cost-effective spatiotemporal prediction across vast, geographically dispersed datasets, and (3) integrating spatial correlation, spatial heterogeneity, spatial computing parallelism, and geographic communication efficiency. The research is organized around three key themes: (1) creating a universal framework for constructing graphs from spatiotemporal data, determining spatial relationships, and filling in missing node attributes; (2) developing a centralized spatiotemporal graph learning infrastructure that leverages multiple edge micro-datacenters for collaborative GNN model learning; and (3) establishing a decentralized spatiotemporal graph learning infrastructure that supports decentralized geographical multitask learning to address spatial heterogeneity.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.