Advances in cloud-computing infrastructure and machine-learning algorithms have enabled transformative techniques that push the boundaries of domains ranging from automated drug design to natural-language understanding. However, understanding the full software/hardware stack remains a grand challenge for domain experts who develop scalable, domain-specific machine-learning models, especially when the application data are inherently non-relational. The project’s novelty lies in exploring, designing, and implementing an end-to-end system for the efficient and effective management of probabilistic graphs, which can serve as a general data abstraction in a variety of domains (e.g., social networks, bioinformatics, and sensing and communication). The probabilistic graph model not only captures complicated correlations among real-world entities but also quantifies the intensity of the correlations or influences among them. The project’s impact lies in addressing important missing pieces in both theory and systems practice to support probabilistic graph management in a systematic, inductive, and verifiable way.

This planning-grant project investigates an end-to-end probabilistic graph management system that promises efficient probabilistic graph learning, representation, aggregation, and analysis with quality guarantees in a scalable distributed setting. The exploration focuses on the full software/hardware stack of probabilistic-graph management, including the design of formal abstractions for probabilistic-graph definition and manipulation, and a provable compilation process for inductive constraints that guarantees the correctness and efficiency of pipelined execution in a distributed setting.
This computing framework can serve as a general-purpose probabilistic-graph analysis tool that benefits different research domains by helping researchers discover and understand the complex correlations among real-world entities in a more comprehensive and transformative way. Beyond this, the outcomes of this project, such as open-source software, publications, and workshop tutorials, could benefit data-management research as well as decision-making in industry (sensing-based automatic operations, e.g., autopiloting and self-driving) and government (data-driven policymaking, e.g., public-health and global-trade monitoring). Furthermore, products from this project can be integrated into the curricula of undergraduate and graduate courses (with course projects related to cloud computing, data management, and machine learning) and thereby train and benefit a large body of underrepresented students (including minority and female students) at the investigators' institutions.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.