Project Summary
The GENCODE consortium creates foundational reference genome annotation for the human and mouse genomes. In this annotation, all features are identified and classified with high accuracy based on biological evidence and then freely released for the benefit of biomedical research and genome interpretation. GENCODE seeks to create annotation that increases understanding of genome function in both human and mouse. It does this by prioritizing human disease genes and by respecting the role of mouse as the major mammalian model organism. To annotate genomes effectively, GENCODE has created a suite of tools and draws on deep expertise from its partners across four fundamental components: 1) a comprehensive gene annotation pipeline leveraging manual and computational annotation; 2) a set of computational methods to evaluate and enhance gene annotation; 3) experimental pipelines targeted to expressed sequences that are poorly detected by standard protocols; and 4) a machine learning capacity to improve all facets of the project. GENCODE will maintain a major focus on protein-coding and non-coding loci (including their alternatively spliced isoforms) and pseudogenes. It will also extend expert manual review to small non-coding RNAs (ncRNAs) and to the annotation of non-polyadenylated transcripts. GENCODE will also expand regulatory annotation to a defined set of gene-associated features. This will more accurately reflect the interconnections between regulatory regions, including those with transcribed sequences such as ncRNAs, and overall transcriptional output. GENCODE will take advantage of the increasing maturity of genomics technologies, including long-read transcriptome sequencing, functional genomics assays, and graph-based genome representations. These will be used to identify features such as genes, pseudogenes, exons, and splice sites that are incorrectly or incompletely annotated, or that lie in genomic regions not present in the current reference assembly.
More specifically, over the next four years GENCODE plans to 1) extend its human and mouse gene sets as close to completion as possible given available data and current experimental technology; 2) leverage new, high-quality human genome assemblies and targeted transcriptomic data to expand representation so that more human haplotypes have high-quality annotation; 3) annotate gene-associated regulatory regions, including enhancer-promoter connections; 4) collaborate with other resources to ensure a consistent representation of genic and regulatory features and of reference transcripts for reporting clinical variation; and 5) distribute GENCODE annotations and engage with community annotation efforts to ensure accuracy and consistency. Primary GENCODE data will continue to be available from the Ensembl and UCSC Genome Browsers and the GENCODE website. We will develop new mechanisms for effective two-way outreach, training, and communication with the community. Our long-term aim is to establish GENCODE as the standard annotation set for research and clinical genomics applications.