GENCODE: comprehensive reference genome annotation for human and mouse

Information

Research Project
10186568

ApplicationId
10186568
Core Project Number
U24HG007234
Full Project Number
2U24HG007234-09
Serial Number
007234
FOA Number
PAR-20-100
Sub Project Id

Project Start Date
4/1/2013
Project End Date
6/30/2025
Program Officer Name
GILCHRIST, DANIEL A
Budget Start Date
9/14/2021
Budget End Date
6/30/2022
Fiscal Year
2021
Support Year
09
Suffix
Award Notice Date
9/14/2021

Organizations

European Molecular Biology Laboratory

Information

Project Summary
The GENCODE consortium creates foundational reference annotation for the human and mouse genomes, in which all features are identified and classified with high accuracy based on biological evidence and then freely released for the benefit of biomedical research and genome interpretation. GENCODE seeks to create annotation that increases the understanding of genome function in both human and mouse by prioritizing human disease genes and respecting the role of mouse as the major mammalian model organism. To annotate genomes effectively, GENCODE has created a suite of tools and draws on deep expertise from its partners in four fundamental components: 1) a comprehensive gene annotation pipeline leveraging manual and computational annotation; 2) a set of computational methods to evaluate and enhance gene annotation; 3) experimental pipelines targeted to expressed sequences less detectable in standard protocols; and 4) a machine learning capacity to improve all facets of the project. GENCODE will maintain a major focus on protein-coding and non-coding loci, including their alternatively spliced isoforms and pseudogenes, and will extend expert manual review to small non-coding RNAs (ncRNA) and the annotation of non-polyadenylated transcripts. GENCODE will also expand regulatory annotation to a defined set of gene-associated features to more accurately reflect the interconnections between regulatory regions, including those with transcribed sequences such as ncRNA, and overall transcriptional output. GENCODE will take advantage of the increasing maturity of genomics technology, including long-read transcriptome sequencing, functional genomics assays, and graph-based genome representations, to identify features such as genes, pseudogenes, exons and splice sites that are incorrect, incomplete, or located in genome regions not present in the current reference assembly.
More specifically, in the next four years GENCODE plans to 1) extend its human and mouse gene sets to as near completion as possible given available data and current experimental technology; 2) leverage new, high-quality human genome assemblies and targeted transcriptomic data to expand representation so that more human haplotypes will have high-quality annotation; 3) annotate gene-associated regulatory regions, including enhancer-promoter connections; 4) collaborate with other resources to ensure a consistent representation of genic and regulatory features and reference transcripts for reporting clinical variation; and 5) distribute GENCODE annotations and engage with community annotation efforts to ensure accuracy and consistency. Primary GENCODE data will continue to be available from the Ensembl and UCSC Genome Browsers and the GENCODE web site. We will develop new mechanisms for effective two-way outreach, training and communication with the community, with the long-term aim of establishing GENCODE as the standard annotation set for research and clinical genomics applications.

IC Name

NATIONAL HUMAN GENOME RESEARCH INSTITUTE

Activity
U24
Administering IC
HG
Application Type
2

Direct Cost Amount
2849784
Indirect Cost Amount
123383
Total Cost
2973167
Sub Project Total Cost

ARRA Funded
False
CFDA Code
172
Ed Inst. Type
Funding ICs
NHGRI:2973167
Funding Mechanism
OTHER RESEARCH-RELATED
Study Section
GNOM
Study Section Name
National Human Genome Research Institute Initial Review Group

Organization Name
EUROPEAN MOLECULAR BIOLOGY LABORATORY
Organization Department
Organization DUNS
321691735
Organization City
HEIDELBERG
Organization State
Organization Country
GERMANY
Organization Zip Code
69117
Organization District
GERMANY
