Cancer is an umbrella term for a range of disorders, from fast-growing, lethal tumors to indolent lesions with low potential to progress to death. In recent decades, important clinical advances in cancer treatment have been attributed to molecular subtyping and to targeted therapies directed at specific genes. However, a significant percentage of patients do not respond to targeted therapies or develop resistance over time. This suggests that current methods for tumor characterization and therapeutic intervention are not sufficiently accurate. This project aims to develop novel technologies that better differentiate among patients diagnosed with the same cancer type. Fundamental to this personalized approach is the ability to explain why patients with similar cancers can differ greatly in their response to treatment. The approach will also include an effective methodology for integrating multiple types of data. This work will enhance our ability to distinguish patients who are in immediate danger and need the most aggressive treatments from those whose disease will progress slowly. This will reduce health care costs and personal suffering while improving patient care by identifying the correct personalized treatment for each patient. This research will pave the way for future projects to identify clinically applicable biomarkers for diagnosis, risk prediction, and monitoring of treatment response and outcome. 
The project also has an extensive education and outreach component, including curriculum development, undergraduate research, museum exhibits for children, and outreach activities for community colleges and K-12 schools in Nevada.<br/><br/>This project will address two important challenges commonly faced in cancer subtyping: (1) incorporating pathway knowledge into cancer subtyping, patient stratification, and risk prediction, and (2) efficiently integrating multi-cohort and multi-omics data. To address the first challenge, the project will develop novel machine learning technologies to identify impacted pathways and compute personalized pathway profiles for individual patients. The innovation stems from combining classical probabilistic components with important biological factors that existing techniques do not capture: i) all gene-gene interactions described by each pathway, ii) the topology among multi-omics layers, and iii) crosstalk among pathways. The approach will transform all molecular data into a common pathway space, making it possible to efficiently address the second challenge: the systematic integration of multi-omics and multi-cohort data. This integration will be realized using non-negative-kernel variational autoencoders. The non-negative kernel is designed to accumulate consistent biomarker signals while shrinking random noise from irrelevant components. The project's goal will be achieved through three thrusts: 1) compute personalized pathway profiles that can be used for subtyping, 2) integrate multiple patient cohorts to increase the sample size and statistical power of subtyping methods, and 3) validate the proposed methodologies using 10 subtype discovery methods, 6 patient stratification techniques, and 6 risk prediction models, tested on more than 70 cancer datasets. 
The investigator will make the methodologies publicly available via a Bioconductor package and a web-based platform, increasing their potential for wide adoption by the research community.<br/><br/>This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.