Measuring functional similarity between transcriptional enhancers using deep learning

Information

Research Project
10302539

ApplicationId
10302539
Core Project Number
R21HG011507
Full Project Number
1R21HG011507-01A1
Serial Number
011507
FOA Number
PA-20-195
Sub Project Id

Project Start Date
9/1/2021
Project End Date
8/31/2023
Program Officer Name
GILCHRIST, DANIEL A
Budget Start Date
9/1/2021
Budget End Date
8/31/2023
Fiscal Year
2021
Support Year
01
Suffix
A1
Award Notice Date
8/25/2021

Organizations

TEXAS A&M UNIVERSITY

Information

Measuring functional similarity between transcriptional enhancers using deep learning

PROJECT SUMMARY Understanding transcriptional regulation remains a major task in molecular biology. Enhancers are genetic elements that regulate when and where genes are expressed, as well as their expression levels. These elements are hard to discover because their locations and orientations are not constrained with respect to their target genes. Several diseases, and susceptibility to certain diseases, are linked to mutations and variants in enhancers. Multiple experimental and computational methods have been developed for locating enhancers. Computational methods are better suited to handling the large number of genomes now being sequenced because they are faster, cheaper, and less labor intensive than experimental methods. Despite the many available computational tools, we lack a sophisticated tool that can measure the similarity in enhancer activity of a pair of sequences. We propose here to use Deep Artificial Neural Networks (DANNs) to develop such a tool. The long-term objective of this project is to decipher the code governing gene regulation, with the following specific aims: (i) design a computational tool for measuring enhancer-enhancer similarity, (ii) validate up to 96 putative enhancers experimentally, (iii) understand enhancer grammar, and (iv) annotate enhancers in more than 50 insect genomes. To achieve these aims, a novel application of DANNs is proposed. Current tools use DANNs to answer a yes-no question: does a sequence have similar activity to the tissue-specific enhancers comprising a particular training set of known enhancers? These approaches require training a separate network on each tissue, leading to inconsistent performance across tissues. Instead, here we use a DANN to answer a related but different question: does this sequence have similar enhancer activity to a single known tissue-specific enhancer? This deep network should perform consistently on different cell types because it is trained on pairs of sequences (not individual sequences, as is the case in the available tools) representing all tissues for which there are known enhancers. The DANN is trained to recognize sequence pairs with similar enhancer activities and those with dissimilar activities, including (i) two enhancers active in two different tissues, (ii) one enhancer and a random genomic sequence, and (iii) two random genomic sequences. The tool outputs a score between 0 and 1, indicating how similar the enhancer activities of the two sequences are. Using a much simpler machine learning algorithm than DANNs, we demonstrate that pairs with similar enhancer activities can be separated from pairs of random genomic sequences, or pairs of one enhancer and a random genomic sequence, with very high accuracy. The new tool has many important potential applications, including consistent annotation of enhancers across cell types and related species. Our tool can annotate enhancers active in a cell type that has a small number of known enhancers, and it can annotate enhancers in related genomes when there is a set of known enhancers demarcated in one of them. Discovering new transcription factor binding sites is another potential application. Studying enhancer "design principles" and the effects of variants can be facilitated using the proposed tool. Such applications will advance our field.
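
The pair-based training set described in the summary can be illustrated with a minimal Python sketch. This is not the project's code. The function name `build_pairs`, the toy sequences, and the tissue names are hypothetical. The sketch also assumes that "similar" means two enhancers active in the same tissue and that each enhancer is active in exactly one tissue, neither of which the summary states. It only builds labeled pairs (1 for similar, 0 for dissimilar); the network that would score them is not shown.

```python
import itertools
import random


def build_pairs(enhancers_by_tissue, random_seqs, seed=0):
    """Return (seq_a, seq_b, label) tuples; label 1 = similar, 0 = dissimilar.

    Assumption for illustration: two enhancers count as "similar" when they
    are listed under the same tissue.
    """
    rng = random.Random(seed)
    pairs = []

    # Similar pairs: two enhancers active in the same tissue.
    for seqs in enhancers_by_tissue.values():
        for a, b in itertools.combinations(seqs, 2):
            pairs.append((a, b, 1))

    # Dissimilar (i): two enhancers active in two different tissues.
    for (_, s1), (_, s2) in itertools.combinations(enhancers_by_tissue.items(), 2):
        for a in s1:
            for b in s2:
                pairs.append((a, b, 0))

    # Dissimilar (ii): one enhancer and one random genomic sequence.
    for seqs in enhancers_by_tissue.values():
        for a in seqs:
            pairs.append((a, rng.choice(random_seqs), 0))

    # Dissimilar (iii): two random genomic sequences.
    for a, b in itertools.combinations(random_seqs, 2):
        pairs.append((a, b, 0))

    return pairs


# Toy data (hypothetical sequences, far shorter than real enhancers).
toy_enhancers = {"wing": ["AAAA", "CCCC"], "eye": ["GGGG", "TTTT"]}
toy_random = ["ACGT", "TGCA", "AGCT"]
toy_pairs = build_pairs(toy_enhancers, toy_random)
n_similar = sum(label for _, _, label in toy_pairs)
print(len(toy_pairs), "pairs,", n_similar, "labeled similar")
```

Because every tissue contributes to the same pool of pairs, a single model would see examples from all tissues, which is the property the summary credits for consistent performance across cell types.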

IC Name

NATIONAL HUMAN GENOME RESEARCH INSTITUTE

Activity
R21
Administering IC
HG
Application Type
1

Direct Cost Amount
293951
Indirect Cost Amount
73940
Total Cost
367891
Sub Project Total Cost

ARRA Funded
False
CFDA Code
172
Ed Inst. Type
BIOMED ENGR/COL ENGR/ENGR STA
Funding ICs
NHGRI:367891
Funding Mechanism
Non-SBIR/STTR RPGs
Study Section
BDMA
Study Section Name
Biodata Management and Analysis Study Section

Organization Name
TEXAS A&M UNIVERSITY-KINGSVILLE
Organization Department
ENGINEERING (ALL TYPES)
Organization DUNS
868154089
Organization City
KINGSVILLE
Organization State
TX
Organization Country
UNITED STATES
Organization Zip Code
783638202
Organization District
UNITED STATES
