Rapid advances in sensing technology and computer simulation have generated vast amounts of 3D surface data in various scientific domains, from high-resolution geographic terrains to electrostatic surfaces of proteins. Analyzing this emerging 3D surface big data gives scientists an opportunity to study problems that could not be addressed before, such as mapping detailed surface water flow and distribution for the entire continental US. Despite the transformative potential of such data, machine learning tools for analyzing large volumes of 3D surface data are not readily available. This project aims to fill that gap by designing a novel parallel spatial machine learning framework for 3D surface topology and implementing the system in a distributed computing environment. The system can produce high-quality, observation-based flood inundation maps derived from satellite images. In collaboration with federal agencies (e.g., U.S. Geological Survey, NOAA), the project will enhance situational awareness for flood disaster response and improve the flood forecasting capabilities of the NOAA National Water Model by supplying observations that are currently lacking for model calibration and validation. The proposed software tools will be open-source to enhance the research infrastructure for the broader geoscience community. Educational activities include curriculum development, mentoring a group of high school students in data science seminars at K-12 Summer Camps, and year-long projects for selected high school students in regional Science Fair competitions.

The project will transform spatial machine learning research by enhancing terrain awareness through modeling large-scale 3D surface topology. Specifically, the project will bring about the following cyberinfrastructure innovations.
First, the project will design a topography-aware spatial probabilistic model called the hidden Markov contour forest, which advances existing machine learning tools by incorporating physical constraints of heterogeneous 3D terrains into zonal tree structures in the model representation. Second, the project will investigate a parallel inference framework that decomposes both intra-zone and inter-zone dependencies. Finally, the project will implement the proposed parallel learning framework in a distributed computing environment by addressing challenges related to task partitioning, load balancing, and dynamic task scheduling. The proposed system will be deployed for rapid flood disaster response in real-world settings and for the validation and calibration of the National Water Model through collaboration with the U.S. Geological Survey and NOAA.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.