GENCODE: comprehensive reference genome annotation for human and mouse

Information

  • Research Project
  • 10186568
  • ApplicationId
    10186568
  • Core Project Number
    U24HG007234
  • Full Project Number
    2U24HG007234-09
  • Serial Number
    007234
  • FOA Number
    PAR-20-100
  • Sub Project Id
  • Project Start Date
    4/1/2013
  • Project End Date
    6/30/2025
  • Program Officer Name
    GILCHRIST, DANIEL A
  • Budget Start Date
    9/14/2021
  • Budget End Date
    6/30/2022
  • Fiscal Year
    2021
  • Support Year
    09
  • Suffix
  • Award Notice Date
    9/14/2021

Project Summary

The GENCODE consortium creates foundational reference genome annotation for the human and mouse genomes, in which all features are identified and classified with high accuracy based on biological evidence and then freely released for the benefit of biomedical research and genome interpretation. GENCODE seeks to create annotation that increases the understanding of genome function in both human and mouse by prioritizing human disease genes and respecting the role of mouse as the major mammalian model organism. To annotate genomes effectively, GENCODE has created a suite of tools and draws on deep expertise from its partners across four fundamental components: 1) a comprehensive gene annotation pipeline leveraging manual and computational annotation; 2) a set of computational methods to evaluate and enhance gene annotation; 3) experimental pipelines targeted to expressed sequences that are less detectable by standard protocols; and 4) a machine learning capacity to improve all facets of the project. GENCODE will maintain a major focus on protein-coding and non-coding loci, including their alternatively spliced isoforms and pseudogenes, and will extend expert manual review to small non-coding RNAs (ncRNAs) and to the annotation of non-polyadenylated transcripts. GENCODE will also expand regulatory annotation to a defined set of gene-associated features, to more accurately reflect the interconnections between regulatory regions (including those with transcribed sequences such as ncRNAs) and overall transcriptional output. GENCODE will take advantage of the increasing maturity of genomics technologies, including long-read transcriptome sequencing, functional genomics assays, and graph-based genome representations, to identify features such as genes, pseudogenes, exons and splice sites that are incorrect, incomplete, or located in genome regions simply not present in the current reference assembly.
More specifically, over the next four years GENCODE plans to: 1) extend its human and mouse gene sets as near to completion as possible given available data and current experimental technology; 2) leverage new, high-quality human genome assemblies and targeted transcriptomic data to expand representation so that more human haplotypes have high-quality annotation; 3) annotate gene-associated regulatory regions, including enhancer-promoter connections; 4) collaborate with other resources to ensure a consistent representation of genic and regulatory features and of the reference transcripts used for reporting clinical variation; and 5) distribute GENCODE annotations and engage with community annotation efforts to ensure accuracy and consistency. Primary GENCODE data will continue to be available from the Ensembl and UCSC Genome Browsers and the GENCODE website. We will develop new mechanisms for effective two-way outreach, training and communication with the community, with the long-term aim of establishing GENCODE as the standard annotation set for research and clinical genomics applications.

  • IC Name
    NATIONAL HUMAN GENOME RESEARCH INSTITUTE
  • Activity
    U24
  • Administering IC
    HG
  • Application Type
    2
  • Direct Cost Amount
    $2,849,784
  • Indirect Cost Amount
    $123,383
  • Total Cost
    $2,973,167
  • Sub Project Total Cost
  • ARRA Funded
    False
  • CFDA Code
    172
  • Ed Inst. Type
  • Funding ICs
    NHGRI:2973167
  • Funding Mechanism
    OTHER RESEARCH-RELATED
  • Study Section
    GNOM
  • Study Section Name
    National Human Genome Research Institute Initial Review Group
  • Organization Name
    EUROPEAN MOLECULAR BIOLOGY LABORATORY
  • Organization Department
  • Organization DUNS
    321691735
  • Organization City
    HEIDELBERG
  • Organization State
  • Organization Country
    GERMANY
  • Organization Zip Code
    69117
  • Organization District
    GERMANY