CAREER: Disentangled learning of high dimensional biomedical data in the presence of inherent heterogeneity

Information

  • Award Id
    2514834
  • Award Effective Date
    10/1/2024
  • Award Expiration Date
    10/31/2027
  • Award Amount
    $450,205.00
  • Award Instrument
    Standard Grant

Advances in high-throughput technologies are revolutionizing biomedical research, giving rise to large-scale patient-derived molecular datasets, including matched multi-omics data, single-cell or spatially resolved -omics data, and longitudinal -omics data. Despite the growing trend of analyzing such complex data to further our understanding of the molecular basis of human disease states, inherent biological heterogeneity challenges the conventional paradigm of treating a given diseased system as a uniform entity: patients may form different disease subtypes that present with variable clinical outcomes, and cells collected from the same patient sample may display different phenotypic states. Biological and clinical interpretability in the face of high-dimensional molecular features remains another challenge. The proposed project will address key challenges in: 1) simultaneously detecting biologically/clinically meaningful patient/cell subgroups and extracting their unique defining genomic features; and 2) integrated learning from data of multiple modalities, with strong systematic batch effects, or collected over time. Thus, the proposed work has high potential to discover new knowledge and to induce new approaches significant to health data science research. The project will result in algorithms, such as dimensionality reduction for high-dimensional data, clustering, and data integration, that are broadly applicable across data science and address a wide range of research and industry needs.

This project proposes novel ideas of disentangled learning to simultaneously address the high dimensionality and inherent heterogeneity issues for three of the most popular biomedical data types: multi-omics, tissue-resolved -omics, and longitudinal -omics data. The following three critical challenges are to be addressed: 1) detecting biologically/clinically meaningful patient subgroups by leveraging multi-omics data through novel supervised clustering methods; 2) discovering heterogeneous cell populations in noisy tissue-resolved -omics data and multi-task learning of multiple tissue samples through a Poisson-based low-dimensional embedding model; and 3) characterizing heterogeneous disease trajectories using longitudinal, high-dimensional -omics data through a fusion learning model. For all three scenarios, a common theme of disentangled learning is proposed to ensure the biological/clinical interpretability of the analysis results. The inherent heterogeneity within and across subjects is divided into distinct subgroups or subpopulations, each characterized by unique features extracted from a high-dimensional feature space. All proposed methods have broad utility in both academia and industry. Educationally, the proposed peer-learning education module, R Shiny-based tool development research, and summer workshops on data science, all targeting high school and undergraduate students, can serve as a platform not only for inspiring and retaining students but also for training them for future STEM workforce roles. Thus, the proposed project promises far-reaching educational impacts.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

  • Program Officer
    Sylvia Spengler, sspengle@nsf.gov, 7032927347
  • Min Amd Letter Date
    12/31/2024
  • Max Amd Letter Date
    12/31/2024
  • ARRA Amount

Institutions

  • Name
    Oregon Health & Science University
  • City
    PORTLAND
  • State
    OR
  • Country
    United States
  • Address
    3181 SW SAM JACKSON PARK RD
  • Postal Code
    972393011
  • Phone Number
    5034947784

Investigators

  • First Name
    Sha
  • Last Name
    Cao
  • Email Address
    shacao@iu.edu
  • Start Date
    12/31/2024 12:00:00 AM

Program Element

  • Text
    Info Integration & Informatics
  • Code
    736400

Program Reference

  • Text
    CAREER-Faculty Erly Career Dev
  • Code
    1045
  • Text
    INFO INTEGRATION & INFORMATICS
  • Code
    7364