Project Summary/Abstract This project proposes new methods for representing data in electronic health records (EHR) to improve pre- dictive modeling and interpretation of patient outcomes. EHR data offer a promising opportunity for advancing the understanding of how clinical decisions and patient conditions interact over time to in?uence patient health. However, EHR data are dif?cult to use for predictive modeling due to the various data types they contain (con- tinuous, categorical, text, etc.), their longitudinal nature, the high amount of non-random missingness for certain measurements, and other concerns. Furthermore, patient outcomes often have heterogenous causes and re- quire information to be synthesized from several clinical lab measures and patient visits. The core challenge at hand is overcoming the mismatch between data representations in the EHR and the assumptions underly- ing commonly used statistical and machine learning (ML) methods. To this end, this project proposes novel wrapper-based methods for learning informative features from EHR data. Both methods propose specialized operators to handle sequential data, time delays, and variable interactions, and have the capacity to discover underlying clinical rules/decisions that affect patient outcomes. Importantly, both methods also produce archives of possible models that represent the best trade-offs between complexity and accuracy, which assists in model interpretation. These method advances are made possible by encoding a rich set of data operations as nodes in a directed acyclic graph, and optimizing the graph structures using multi-objective optimization. The central hypothesis of this research is that multi-objective optimization can learn effective data representations from the EHR to produce accurate, explanatory models of patient outcomes. Preliminary work has shown that these methods can effectively learn low-order data representations that improve the predictive ability of several state- of-the-art ML methods. This technique demonstrates good scaling properties with high-dimensional biomedical data. Aim 1 (K99) is to develop a multi-objective feature engineering method that pairs with existing ML methods to iteratively improve their performance by constructing new features from the raw data and using feedback from the trained model to guide feature construction. In Aim 2 (K99), this method is applied to form predictive models of the risk of heart disease and heart failure using longitudinal EHR data. The resultant models will be inter- preted with the help of mentors in order to translate predictions into clinical recommendations. For Aim 3 (R00), a second method is proposed that uses a similar framework to optimize existing neural network approaches in order to simplify their structure as much as possible while maintaining accuracy. The goal of Aim 4 (R00) is to identify hospital patients who are at risk of readmission and propose point-of-care strategies to mitigate that risk. This goal is facilitated through the application of the proposed methods to patient data collected from the Hospital of the University of Pennsylvania, the Geisinger Health System, and publicly available EHR databases.