Advances in high-throughput technologies are revolutionizing biomedical research, giving rise to large-scale patient-derived molecular datasets, including matched multi-omics data, single-cell or spatially resolved -omics data, and longitudinal -omics data. Although such complex data are increasingly analyzed to further our understanding of the molecular basis of human disease, their inherent biological heterogeneity challenges the conventional paradigm of treating a given diseased system as a uniform entity: patients may fall into different disease subtypes that present with variable clinical outcomes, and cells collected from the same patient sample may display different phenotypic states. Biological and clinical interpretability in the face of high-dimensional molecular features remains another challenge. The proposed project will address key challenges in: 1) simultaneously detecting biologically/clinically meaningful patient/cell subgroups and extracting their unique defining genomic features; and 2) integrated learning from data that span multiple modalities, carry strong systematic batch effects, or are collected over time. Thus, the proposed work has high potential to discover new knowledge and to inspire new approaches of significance to health data science research. The project will result in algorithms for tasks such as dimensionality reduction of high-dimensional data, clustering, and data integration that are broadly applicable across data science and address a wide range of research and industry needs.<br/><br/>This project proposes novel ideas of disentangled learning to simultaneously address the high dimensionality and inherent heterogeneity of three of the most popular biomedical data types: multi-omics, tissue resolved -omics, and longitudinal -omics data. 
The following three critical challenges will be addressed: 1) detecting biologically/clinically meaningful patient subgroups by leveraging multi-omics data through novel supervised clustering methods; 2) discovering heterogeneous cell populations in noisy tissue resolved -omics data, and multi-task learning across multiple tissue samples, through a Poisson-based low-dimensional embedding model; and 3) characterizing heterogeneous disease trajectories using longitudinal, high-dimensional -omics data through a fusion learning model. For all three scenarios, a common theme of disentangled learning is proposed to ensure the biological/clinical interpretability of the analysis results: the inherent heterogeneity within and across subjects is divided into distinct subgroups or subpopulations, each characterized by unique features extracted from a high-dimensional feature space. All proposed methods have broad utility in both academia and industry. Educationally, the proposed peer learning education module, R Shiny based tool development research, and summer workshops on data science, all targeting high school and undergraduate students, can serve as a platform not only for inspiring and retaining students but also for training them for future STEM workforce roles. Thus, the proposed project promises far-reaching educational impacts.<br/><br/>This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.