Clustering algorithms are among the most important modern tools for understanding data. Given data on a collection of entities, a clustering algorithm groups the entities into sets, or "clusters," such that similar entities tend to end up in the same cluster while dissimilar entities tend to end up in different clusters. For example, clustering algorithms can be used to group images according to their contents. However, modern datasets are so large that many existing clustering algorithms cannot feasibly be run on them. This project aims to address this situation systematically by developing new clustering algorithms that scale to massive datasets with billions of entities. Clustering is widely used by scientists, companies, and government agencies. The toolkit developed in the project will be open-sourced and will make scalable, high-performance clustering more broadly accessible to scientists and practitioners by improving the efficiency and programming productivity of their clustering tasks. Results from the project will be integrated into courses that the investigators teach, and the researchers will recruit undergraduate students to participate in the project.

This three-institution collaborative project investigates a new approach to clustering pointsets: constructing sparse graphs that preserve relevant properties of the pointset. By carefully leveraging high-quality graph clustering algorithms that require only near-linear work, the approach can cluster very large datasets with high accuracy in time that is nearly linear in the number of input objects. Particular attention will be paid to new algorithms for graph clustering and graph construction that utilize structure observed in practice, exploit parallelism, and enable dynamism with provable accuracy guarantees. A major contribution of the project will be an end-to-end clustering toolkit for graphs and pointsets that enables clustering to be scaled to inputs with billions of objects. The investigators will collaborate through regular remote meetings and seminars, student visits, joint publications, and annual in-person workshops.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
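
As a rough illustration of the sparse-graph approach described in the technical summary above, the following Python sketch turns a small pointset into a sparse graph and then clusters that graph. It is not the project's method: the abstract does not say which graph construction or graph clustering algorithms will be used. The sketch assumes a k-nearest-neighbor graph as the sparse graph and uses connected components as a stand-in for a graph clustering algorithm. The function names (`knn_graph`, `connected_components`, `cluster_pointset`) and the parameter `k` are hypothetical, and the brute-force neighbor search takes quadratic time, so it shows only the structure of the pipeline, not its intended scalability.

```python
import math


def knn_graph(points, k):
    """Build an undirected k-nearest-neighbor graph (assumed sparse graph).

    Returns a set of edges (i, j) with i < j. Brute force: O(n^2) distance
    computations, used here only for illustration on tiny inputs.
    """
    edges = set()
    for i, p in enumerate(points):
        # Distances from point i to every other point, nearest first.
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def connected_components(n, edges):
    """Label each of n vertices with a component representative (union-find).

    Stand-in for a graph clustering algorithm; hypothetical choice.
    """
    parent = list(range(n))

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
    return [find(i) for i in range(n)]


def cluster_pointset(points, k=2):
    """Pointset -> sparse graph -> graph clustering, as a two-stage pipeline."""
    edges = knn_graph(points, k)
    return connected_components(len(points), edges)


# Two well-separated groups of three points each.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
]
labels = cluster_pointset(points, k=2)
print(labels)
```

On this toy input the graph has 6 edges instead of the 15 possible pairs, and points in the same group receive the same label. The graph stage only ever sees the edges, so its cost depends on the size of the sparse graph rather than on all pairwise distances.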