This project aims to advance the state of the art in the analysis of high-throughput RNA sequencing (RNA-seq) data by developing software that addresses current challenges in transcriptome assembly and gene annotation. RNA sequencing has become a vital method for understanding gene expression across cell types and conditions, and it has led to the discovery of new genes and splice variants in a wide range of species. However, the increasing volume of data from large-scale sequencing experiments demands more efficient and precise computational methods. This project seeks to create new algorithms that improve the accuracy and scalability of assembling the data from these experiments, thereby producing more accurate measurements of the genes and transcripts present in any tissue sample. By tackling these challenges, the project promises significant advances in the understanding of gene expression and transcriptional activity, benefiting a wide range of scientific research. Additionally, by leveraging data from previous experiments in a new way, it will offer cost savings by reducing the number of samples that need to be sequenced.

The project will focus on three key areas to overcome the limitations of current RNA-seq analysis methods. First, a scalable approach for assembling transcripts from large RNA-seq datasets will be developed by constructing a "universal splice graph" that captures all valid alignments and ensures consistent transcript structures across samples. Second, a new model of transcriptional noise will be introduced, which aggregates data from multiple experiments to distinguish genuine transcripts from background noise, enhancing the precision of transcript quantification. Third, collections of universal splice graphs will be generated to represent transcriptional activity across different species, cell types, and tissues.
These methods leverage the extensive RNA-seq data available today and provide essential tools for identifying structural variations in transcripts and performing differential expression analyses. The proposed software will be open source, facilitating widespread adoption and furthering research capabilities in computational biology. All software will be freely available from https://ccb.jhu.edu/software/stringtie and in a public GitHub repository.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.