CAREER: Disentangled learning of high dimensional biomedical data in the presence of inherent heterogeneity

Information

NSF Award
2514834

Owner

OREGON HEALTH AND SCIENCE UNIVERSITY

Award Id
2514834
Award Effective Date
10/1/2024
Award Expiration Date
10/31/2027
Award Amount
$450,205.00
Award Instrument
Standard Grant

Information

Advances in high-throughput technologies are revolutionizing biomedical research, giving rise to large-scale patient-derived molecular datasets, including matched multi-omics data, single-cell or spatially resolved -omics data, and longitudinal -omics data. Such complex data are increasingly analyzed to further our understanding of the molecular basis of human disease, but inherent biological heterogeneity challenges the conventional paradigm of treating a given diseased system as a uniform entity: patients may form different disease subtypes that present with variable clinical outcomes, and cells collected from the same patient sample may display different phenotypic states. Biological and clinical interpretability in the face of high-dimensional molecular features remains another challenge. The proposed project will address key challenges in: 1) simultaneously detecting biologically/clinically meaningful patient/cell subgroups and extracting their unique defining genomic features; and 2) integrated learning from data of multiple modalities, data with strong systematic batch effects, or data collected over time. The proposed work thus has high potential to yield new knowledge and new approaches significant to health data science research. The project will result in algorithms for tasks such as dimensionality reduction, clustering, and data integration that are broadly applicable across data science and address a wide range of research and industry needs.

This project proposes novel ideas of disentangled learning to simultaneously address high dimensionality and inherent heterogeneity for three of the most popular biomedical data types: multi-omics, tissue-resolved -omics, and longitudinal -omics data. Three critical challenges are to be addressed: 1) detecting biologically/clinically meaningful patient subgroups by leveraging multi-omics data through novel supervised clustering methods; 2) discovering heterogeneous cell populations in noisy tissue-resolved -omics data, and multi-task learning across multiple tissue samples, through a Poisson-based low-dimensional embedding model; and 3) characterizing heterogeneous disease trajectories using longitudinal, high-dimensional -omics data through a fusion learning model. For all three scenarios, a common theme of disentangled learning is proposed to ensure the biological/clinical interpretability of the analysis results. The inherent heterogeneity within and across subjects is divided into distinct subgroups or subpopulations, each characterized by unique features extracted from a high-dimensional feature space. All proposed methods are expected to have extensive utility in both academia and industry. Educationally, the proposed peer-learning education module, R Shiny-based tool development research, and summer workshops on data science, all targeting high school and undergraduate students, can serve as a platform not only for inspiring and retaining students but also for training them for future STEM workforce roles. The proposed project thus promises far-reaching educational impacts.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Program Officer
Sylvia Spengler, sspengle@nsf.gov, 7032927347
Min Amd Letter Date
12/31/2024
Max Amd Letter Date
12/31/2024
ARRA Amount

Institutions

Name
Oregon Health & Science University
City
PORTLAND
State
OR
Country
United States
Address
3181 SW SAM JACKSON PARK RD
Postal Code
972393011
Phone Number
5034947784

Investigators

First Name
Sha
Last Name
Cao
Email Address
shacao@iu.edu
Start Date
12/31/2024 12:00:00 AM

Program Element

Text
Info Integration & Informatics
Code
736400

Program Reference

Text
CAREER-Faculty Erly Career Dev
Code
1045

Text
INFO INTEGRATION & INFORMATICS
Code
7364
