Project Summary
Over the past decade, genome-wide association studies have discovered genetic variants associated with complex diseases, while whole-genome sequencing studies have identified risk alleles for Mendelian and complex diseases. These variants have the potential to shed light on human disease mechanisms, but several important challenges remain. More than 90% of complex disease-associated variants lie within non-coding regions, which makes it difficult to identify the relevant cell types and cell states, target genes, and regulatory mechanisms. Even the task of linking these variants to genes can be challenging. In addition, although our ability to identify de novo and rare mutations for complex and Mendelian diseases is rapidly expanding, the function of these alleles and the genes and pathways they affect remain uncertain. To address these challenges, we will predict the functional impact of disease risk variants at the level of individual variants, individual genes, and pathways to elucidate disease biology. In all aims of this proposal, we will use IGVF functional genomic data. In Aim 1, we will predict the regulatory potential of variants in disease-critical cell types and states at single base-pair resolution. We will identify pathogenic cell states by analyzing single-cell transcriptional data sets in a disease context, and then integrate single-cell epigenetic data to define the regulatory landscape of these rare disease cell states. The regulatory regions identified in this analysis can be used to annotate variants for potential function. Finally, to understand the functionality of specific variants in regulatory regions, we will quantify selective pressure using large-scale whole-genome sequencing data. In Aim 2, we will predict the functional impact of genes by effectively linking variants to genes. Defining causal disease genes is critically important because such genes may be valuable therapeutic targets.
We will develop strategies that use genetic and functional genomic data to predict downstream genes, and we will evaluate these methods against a set of gold-standard causal genes from Mendelian phenotypes. In Aim 3, we will focus on rare and de novo mutations with large effect sizes. Here we recognize that predicting the function of these alleles requires an understanding of the pathways they affect, models to connect rare non-coding variants to genes, and strategies to define the functionality of variants based on population genetic parameters. In Aim 4, we will develop a framework to synergize with the IGVF consortium and advance consortium goals, outlining our integration plan and flexible programmatic framework. The proposal represents a collaboration among Drs. Soumya Raychaudhuri, Alkes Price, and Shamil Sunyaev, bringing together analytical expertise across functional genomics, single-cell data integration, and population genetics. These investigators have a history of successful collaboration and strong publication records integrating functional genomics data with GWAS and sequencing studies to uncover disease mechanisms.