ABSTRACT Substance misuse (SM) puts persons at risk for HIV. Systems of care for the detection and treatment of risks for HIV acquisition or transmission among SM populations are siloed. The acute-care hospital setting offers unique opportunities for screening, testing, and treatment of HIV risks among patients with SM. Adjacent to the communities with the highest number of heroin overdoses in Chicago, Rush University Medical Center launched the Substance Use Intervention Team (SUIT) in 2017. The SUIT service attempts to screen all hospitalized patients for SM and intervenes with a harm reduction model based on low, medium, and high risk; however, the busy setting and the acuity and severity of patients' illnesses limit universal screening rates and allow implicit biases to influence which patients are screened. Automated clinical decision support tools trained with supervised machine learning (ML) can relieve these screening burdens. A machine learning health system approach leverages EHR data, including clinical, social, and behavioral determinants captured in structured data fields and in clinical notes, unstructured data that are typically unavailable for predictive analytics. An ML HIV risk classifier can identify patients with SM and HIV risk and alert providers to evaluate appropriateness for medication and care to prevent or treat HIV. To date, no screening tool has been developed and validated to assess HIV risk among persons with SM. This pilot's goal is to develop, train, and test an interoperable ML classifier to identify risk for HIV transmission or acquisition among patients with SM and to assess its real-time performance. Aim 1 is to develop, train, and test an ML classifier with high sensitivity (≥0.8) and specificity (≥0.8) to identify risk for HIV acquisition or transmission among patients with SM.
Within the source cohort of encounter-level data for patients with SM from 2017 to 2019 (N=23,817), we will use a rule-based method and Centers for Disease Control and Prevention (CDC) HIV risk guidelines to identify as cases those patient encounters with diagnoses, such as chlamydia, associated with HIV transmission (6%, n=1,300). Using propensity score matching, we will match each case to two non-cases (1:2) and conduct manual chart annotation to verify or re-classify cases and non-cases and to establish the reference dataset (n=3,900). With labeled cases and non-cases, we will partition the reference dataset to train and test three supervised ML models. We will select the best performing model based on standard metrics, such as the C-statistic. Aim 2 is to integrate the best performing model from Aim 1 into the Rush EHR infrastructure to test its predictive validity prospectively, in real time. Because we expect the ML classifier to identify 50% more HIV risk cases (9-10%) than our rule-based method, we will study the effects of the classifier by measuring the number of risk cases identified over 12 one-month time points in an interrupted time series. This ML classifier is the first step toward an appropriate, scalable, and interoperable learning health system intervention that integrates HIV prevention and treatment into care for hospitalized patients with SM.
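
As an illustration of the Aim 1 selection criteria, the sketch below computes sensitivity, specificity, and the C-statistic for a candidate classifier on a held-out test set and checks them against the 0.8 targets. It is a minimal sketch, not the project's evaluation code: the function names, the 0.5 decision cutoff, and the ten example labels and scores are all hypothetical, and the C-statistic is computed here by direct pairwise comparison, which is equivalent to the area under the ROC curve.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels coded 1 = case, 0 = non-case."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def c_statistic(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random case scores higher than a random non-case (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    concordant = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return concordant / (len(pos) * len(neg))


def meets_aim1_targets(sens: float, spec: float, target: float = 0.8) -> bool:
    """Check the Aim 1 thresholds: sensitivity >= 0.8 and specificity >= 0.8."""
    return sens >= target and spec >= target


# Hypothetical annotated labels and model scores, for illustration only.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.4, 0.2, 0.1, 0.35, 0.05]

# Assumed decision cutoff of 0.5; the real cutoff would be tuned on training data.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
c_stat = c_statistic(y_true, scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} C-statistic={c_stat:.2f}")
print("meets Aim 1 targets:", meets_aim1_targets(sens, spec))
```

Running the same three functions on each of the three candidate models would give a like-for-like basis for choosing the model carried forward to Aim 2.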