The digital revolution has led to a rapid expansion of available data sources. Data integration methods such as record linkage are critically important for generating actionable insights from combined data sources in a cost-effective manner. Record linkage merges multiple files by identifying records associated with the same entity. Without unique identifiers, this process is often error-prone, bearing the risk of false links and missed links, both of which can compromise data analysis based on the merged file. Building on established implementations of record linkage, this project develops software in popular data science programming languages that addresses potential errors and uncertainty in data analysis performed post-linkage. The development of the software is driven by the needs of federal statistical agencies, researchers and practitioners in health services, and other stakeholders seeking to harness record linkage to unlock the potential of their data. The resulting additional capabilities for linked data analysis support informed decision-making in public administration and health services and enable savings in data collection and human labor. The project provides research opportunities and support for graduate students and delivers educational materials on the record linkage and analysis pipeline, thereby contributing to workforce development and the training of future data scientists.

Existing software for record linkage is almost entirely dedicated to the creation of linked files, leaving a gap in support for downstream tasks. This project fills this gap by providing software for post-linkage data analysis, an evolving subject with various open problems requiring novel statistical and computational tools.
Guided by use cases from several application domains, the investigators intend to unify and refine state-of-the-art approaches to post-linkage data analysis in primary and secondary analysis settings, given varying degrees of information about the underlying linkage process. The project develops scalable, robust, and user-friendly implementations of these approaches, embedded in modular software that is envisioned to ease burdens for users of linked data, support the validity of analyses based on such data, and propel advances in the field of data integration. The latter is achieved by providing capabilities for validation and benchmarking and by inspiring new avenues of methodological research.

This Office of Advanced Cyberinfrastructure award is jointly funded by the Division of Social and Economic Sciences (SES) in the Social, Behavioral and Economic Sciences Directorate (SBE).

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.