Haplotype inference and allele-specific transcript expression quantification are two fundamental problems in genetics and genomics. Haplotype inference aligns the maternal and paternal alleles of genetic variants along the two homologous chromosomes of a diploid genome, whereas allele-specific expression quantification estimates the expression levels of transcripts of maternal and paternal origin from RNA-seq reads. The two problems are coupled, in that each affects the accuracy of the other: accurate allele-specific expression quantification requires accurate haplotypes to which RNA-seq reads can be mapped, and the accuracy of haplotype inference can be enhanced by allele-specific RNA-seq reads. While existing works have considered these two problems separately, this project develops a single statistical framework that addresses them jointly, to enhance the accuracy of both the inferred haplotypes and the allele-specific expression estimates. The computational methods to be developed in this research will advance various aspects of biological research that require accurate allele-specific expression estimates and haplotypes, including mapping allele-specific eQTLs, detecting imprinted genes, imputing untyped variants, finding signatures of natural selection, and detecting recombination events. The outcome of the research will be used in outreach activities at minority-serving institutions to recruit graduate students.

The project develops a computational framework for obtaining accurate allele-specific expression measurements and haplotypes from RNA-seq and genotype data. Two existing frameworks, one for transcript expression quantification and the other for haplotype inference (e.g., Beagle), are combined into a single framework while retaining the computational efficiency of the originals. Each of the two existing frameworks is modified to address a previously unmet challenge regarding allele-specific reads. For RNA-seq quantification, the project develops a mathematically rigorous approach to obtaining identifiable allele-specific expression estimates at the gene level, at the transcript-set level, or at the individual transcript level. For haplotype inference, the project couples the model in Beagle with the investigators' RNA-seq quantification methods to jointly estimate identifiable allele-specific expression levels and haplotypes that are consistent with each other. The computational methods are benchmarked on allele-specific eQTL mapping, using genotypes and RNA-seq reads from human trios and LG/SM intercross mice with known haplotypes. The outcome of the research is available at http://www.cs.cmu.edu/~sssykim.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.