In the United States, millions of people have chronic conditions, including Type 2 Diabetes and Heart Failure. It is important to screen patients for these illnesses as early as possible. This research aims to mine health care data to find patients likely to develop these conditions and to develop a model for opportunistic screening in situations where the encounter with the patient may be unrelated to the specific diagnosis. Opportunistic screening is especially needed for minority and lower socio-economic status patients, who are less likely to seek regular care from primary care providers. This research will address several challenges. First, health records include different types of data, from text to numeric values, from continuous signals to images. Second, records comprise information collected at different timepoints and with different frequencies: some patients may be seen once a year, and others every few days. Third, the privacy of patients must be protected. Fourth, automatically derived models must be fair and unbiased, especially towards underprivileged groups. Finally, many powerful current Machine Learning models behave like black boxes: these models will be adopted in healthcare and other critical areas only if their conclusions can be explained. From a societal point of view, this project has the potential to positively impact the health of millions of people, and in particular, of minority and lower socio-economic status patients. With regard to education, this research will recruit underrepresented students at the University of Illinois Chicago, a federally-designated Minority-Serving Institution, and support the interdisciplinary development of a diverse cohort of PhD and undergraduate students. <br/><br/>This project will explore new Machine Learning (ML) and Natural Language Processing approaches to uncover the earliest point in temporal sequence data at which a patient can be screened for a certain chronic condition.
The research will develop novel methods to integrate heterogeneous data that feature missing values and noise; de-identification approaches to protect privacy; new approaches to concept and temporal relation extraction; and algorithms to improve fairness by addressing data heterogeneity and missing data. It will also explore concept-level explainability. A robust assessment plan is an integral part of the proposed research. First, all algorithms will be evaluated according to current ML methodology. Additionally, a human-in-the-loop approach will be employed, in which the clinicians on the team will provide informal and formal evaluation of the algorithm predictions. The methods developed in this research are likely to apply to other domains where heterogeneous, incomplete, identifiable, or biased temporal sequence data exist, for example predicting youth at risk, water resource monitoring, and supporting food safety.<br/><br/>This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.