RESOURCE INFORMATICS - PROJECT SUMMARY
The creation, advancement and maintenance of the GENCODE resource requires both adherence to and optimization of defined processes, so that genome annotation created now and in the future is always of the same or a better standard than what has already been created. The GENCODE resource must also be attuned to the new technologies and opportunities that arise as the field of genomics evolves. A primary objective of the GENCODE resource is to ensure quality control (QC) and data validation of annotations. Ensembl will compare the GENCODE gene set to other gene sets (e.g. UniProt) to check for missing genes or transcripts; CNIO will validate the coding genes; the CNIO/CNIC proteomics pipeline will validate the gene models; and CNIO/CNIC will perform manual verification for QC of proteomics data. Project stability will be ensured through a well-maintained computational infrastructure, QC processes that ensure the highest possible quality, and regular releases of freely available annotation in high-value formats. Annotation curation for human and mouse will be completed: in particular, the existing partial human transcript models will be extended to full length, the human lncRNA annotation will be expanded, and the initial full pass of the mouse annotation will be completed. GENCODE will incorporate individual genome representation and population data, drawing on available human variation data at both the sequence level (e.g. 1000 Genomes) and the transcriptomic level (e.g. GTEx), and on the 16 mouse strain genomes produced by the Mouse Genomes Project led by the WTSI. Data from individuals and populations will be annotated. A personal genome resource will be developed to produce an accurate representation of an individual's gene set. Two pilot projects will help to define the most effective way to support future GENCODE annotations.
The first pilot project will use GENCODE's experience in developing population reference genome graphs to pilot a scalable and potentially universal approach to population-based genome annotation. The second pilot project will focus on connecting regulatory regions to the genes they regulate. GENCODE will enhance the current annotation of genes with their regulatory elements so that the annotation is specific to tissue and cell type. The demand for manual annotation of transcripts across strains and species may outstrip GENCODE's ability to provide such services via existing mechanisms; therefore, a system to enable the submission of annotated data will be developed. The described measures will ensure that GENCODE in 2020 will be significantly more valuable for research and clinical applications in genomics than it is today.