1 Project Summary

Evidence suggests that exposure to Superfund chemicals contributes to adverse pregnancy outcomes (APOs), including preterm birth (PTB). Rates of PTB and infant mortality in Puerto Rico (PR) are among the highest of all US states and territories. There are 18 Superfund sites in PR, and evidence of drinking water contamination is extensive. Moreover, extreme weather events (hurricanes, flooding) may result in elevated exposures to Superfund chemicals. The PROTECT center has brought together researchers from Northeastern University, the University of Puerto Rico, the University of Georgia, and the University of Michigan to provide a much-needed understanding of the relationships and mechanisms by which exposure to suspect chemicals contributes to APOs, and to develop new methods to reduce the risk of exposure in PR and beyond. To do this, PROTECT uses a source-to-outcome structure that integrates epidemiological, toxicological, fate and transport, and remediation studies with a unified sampling infrastructure, a centralized indexed data repository, and a sophisticated data management system. Since its inception in 2010, PROTECT has built detailed and extensive datasets on environmental conditions and the prenatal conditions of pregnant mothers (exposure, socioeconomic, and health data; approximately 2400 data points per participant), yielding a rich dataset collected from a cohort of over 2000 expectant mothers and their children. The PROTECT Data Management and Analytics Core (DMAC) manages data centrally for the entire Center and provides analytics support for evaluating both quantitative and qualitative data. The DMAC provides a comprehensive set of Data Dictionaries for this dataset and has developed protocols for the proper handling of sensitive data. The PROTECT database has the potential to help unlock the relationships that tie environmental factors to preterm birth outcomes.
The dataset is primed for powerful AI/ML toolsets that can help identify and establish these relationships. Before these tools can be applied, however, specific challenges in preparing the data must be overcome, namely the degree of missing data in the dataset and its inherent class imbalance. This project will address these issues head-on, drawing on feedback and experience gained by holding a series of hackathons that explore these datasets. The expected result is a suite of datasets ready for AI/ML tools and a number of new open-source toolsets for addressing missingness and imbalance in data.