Imputing single cell RNA sequencing data: Mathematical, statistical and computational challenges

Information

Research Project
10242066

ApplicationId
10242066
Core Project Number
R01GM135928
Full Project Number
5R01GM135928-03
Serial Number
135928
FOA Number
PAR-19-001
Sub Project Id

Project Start Date
9/23/2019
Project End Date
8/31/2022
Program Officer Name
BRAZHNIK, PAUL
Budget Start Date
9/1/2021
Budget End Date
8/31/2022
Fiscal Year
2021
Support Year
03
Suffix
Award Notice Date
8/22/2021

Organizations

North Carolina State University

Information

Novel single cell RNA sequencing (scRNA-seq) technologies can simultaneously measure the expression levels of all 30,000 genes over thousands to millions of individual cells. The analysis of scRNA-seq data has already led to fundamental advances in biology, including discovery of new cell types, detection of subtle differences between similar cells, and reconstruction of cellular developmental trajectories. Single-cell measurements involve amplification of tiny amounts of RNA and result in extremely sparse data matrices with many zeros. While some of these zeros are due to missing data (dropouts), others represent true biological inactivity. Yet many scRNA-seq imputation methods treat all observed zero entries identically, leading to imputed matrices that often overestimate transcriptional activity. Other methods that do attempt to distinguish biological zeros from dropouts lack rigorous theoretical guarantees. The goals of this proposal are to develop models, supporting mathematical theory, and computational tools that explicitly take the existence of true biological zeros into account. Matrix imputation under this constraint involves both computational challenges and theoretical questions in random matrix theory and high dimensional statistics. These include rank estimation and low rank sparse matrix recovery from partially observed data, and biclustering in the presence of dropouts and zeros. We plan to develop novel approaches based on non-smooth continuous optimization, and to derive accompanying statistical guarantees. We also plan to develop ensemble learning approaches that cleverly combine the outputs of multiple imputation algorithms. Finally, we hope to gain important insights regarding recovery from such data via a study of minimax rates and information lower bounds.
To address these challenges, we will build on our promising preliminary results and the joint expertise of the investigators in spectral methods, high dimensional statistics, matrix analysis, numerical optimization, and genomics.
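
The overestimation problem described above can be illustrated with a toy example. The sketch below is not a method from this project: it assumes a hypothetical genes x cells matrix with two cell types, a hand-picked dropout pattern, and a deliberately naive imputer (the helper name naive_mean_impute is made up for illustration) that fills every zero with the gene's mean nonzero value. Because it cannot tell dropouts from true biological zeros, it assigns expression to genes that are genuinely inactive.

```python
import numpy as np


def naive_mean_impute(counts):
    """Fill every zero in a genes x cells matrix with that gene's mean nonzero value.

    Hypothetical helper for illustration only: it treats all zeros identically.
    """
    counts = np.asarray(counts, dtype=float)
    observed = counts > 0
    n_obs = observed.sum(axis=1)
    row_sums = counts.sum(axis=1)
    # Mean over nonzero entries per gene; genes with no nonzero entries stay at 0.
    gene_means = np.divide(
        row_sums, n_obs, out=np.zeros_like(row_sums), where=n_obs > 0
    )
    return np.where(observed, counts, gene_means[:, None])


# Assumed ground truth: genes 0-2 are active only in cells 0-3,
# genes 3-5 only in cells 4-7. All other entries are true biological zeros.
truth = np.zeros((6, 8))
truth[:3, :4] = 5.0
truth[3:, 4:] = 3.0
biological_zero = truth == 0

# Assumed dropout pattern: two truly expressed entries are lost in measurement.
dropout = np.zeros(truth.shape, dtype=bool)
dropout[0, 1] = True
dropout[4, 6] = True
measured = np.where(dropout, 0.0, truth)

imputed = naive_mean_impute(measured)

# Expression wrongly assigned to entries that are truly inactive.
excess = imputed[biological_zero].sum()
print("recovered dropout value:", imputed[0, 1])
print("spurious expression at biological zeros:", excess)
```

In this toy setting the two dropouts are recovered, but every biological zero is also filled with a positive value, which is the failure mode that motivates imputation methods that model true zeros explicitly.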

IC Name

NATIONAL INSTITUTE OF GENERAL MEDICAL SCIENCES

Activity
R01
Administering IC
GM
Application Type
5

Direct Cost Amount
199947
Indirect Cost Amount
23319
Total Cost
223266
Sub Project Total Cost

ARRA Funded
False
CFDA Code
859
Ed Inst. Type
Funding ICs
NIGMS:223266
Funding Mechanism
Non-SBIR/STTR RPGs
Study Section
ZGM1
Study Section Name
Special Emphasis Panel

Organization Name
NORTH CAROLINA STATE UNIVERSITY RALEIGH
Organization Department
Organization DUNS
042092122
Organization City
RALEIGH
Organization State
NC
Organization Country
UNITED STATES
Organization Zip Code
276957514
Organization District
UNITED STATES
