In epidemiology and many other disciplines, it is often of interest to evaluate the overall effect of a treatment or exposure on a response. Of even greater interest is explaining the underlying mechanism by which the effect of an exposure on the outcome is mediated through a causal intermediate variable, or mediator. In practice, mediation analysis uses one or more measured mediators hypothesized to lie on the causal pathway between the exposure and the outcome. With the advancement of high-throughput data collection techniques, processing and analyzing complex data structures with very high-dimensional potential mediators has become a challenging task. Many existing statistical methods are designed for settings with a single mediator or low-dimensional mediators and for single-level data. This research will focus on developing statistical methods that can handle both multilevel data and high-dimensional mediators and that can assess heterogeneity in mediation effects across subpopulations. The resulting methods will provide new statistical tools for studying high-dimensional and multilevel causal mediation effects and will contribute to various fields, such as anthropology, sociology, genetics, and epidemiology. The success of this project will bolster infrastructure and STEM workforce training to solve real-life scientific problems. The investigator will make special efforts to recruit and engage students, especially women and members of underrepresented groups, to work on hands-on research problems. Novel analytical tools will be implemented in an open-source R package, making the tools accessible to research groups that use causal inference to study complex epigenetic information.
A new introductory course and workshops in causal inference will be developed for statisticians and non-statisticians to ensure that students are exposed to causal concepts earlier and can quickly assimilate into the workforce after graduation.

This project will provide new statistical tools for high-dimensional and multilevel causal mediation analysis that will lead to the development of a multidisciplinary research, education, and career preparation workshop, henceforth referred to as the Bayesian Statistics Training (BST) program. Novel Bayesian methods will be developed to estimate mediation effects in complex datasets with high-dimensional mediators by simultaneously analyzing a large number of mediators and characterizing their joint mediation effect without making any path-specific or ordering assumptions about the mediators. The methods will assess possible heterogeneity of the mediation effect across subpopulations and will offer flexibility by allowing the number of subpopulations to be inferred from the data while retaining interpretability in practical settings. The PI will illustrate the utility of the Bayesian framework for statistical mediation analysis by applying it to several studies of epigenetic mechanisms, assessing the causal mechanisms between an exposure and an outcome through DNA methylation.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.