Optimizing the Utility of Large Electronic Health Records Data in Data-Driven Health Research

Information

  • Research Project
  • ApplicationId
    10111205
  • Core Project Number
    R15LM013569
  • Full Project Number
    1R15LM013569-01
  • Serial Number
    013569
  • FOA Number
    PAR-18-714
  • Sub Project Id
  • Project Start Date
    9/18/2020
  • Project End Date
    8/31/2023
  • Program Officer Name
    YE, JANE
  • Budget Start Date
    9/18/2020
  • Budget End Date
    8/31/2023
  • Fiscal Year
    2020
  • Support Year
    01
  • Suffix
  • Award Notice Date
    9/17/2020

Optimizing the Utility of Electronic Medical Records Data in Data-driven Health Research

ABSTRACT

Medical centers continue to archive patient follow-up data in Electronic Medical Records (EMR), which have tremendous value for discovering new knowledge and insights. The large volume of EMR data can play an important role in improving the accuracy and generalizability of predictive models in healthcare, especially given that misdiagnosis is known to be the third leading cause of death in the United States. Despite these merits, EMR data are invariably corrupted by factors such as missing values, outliers, and unrealistic measurements, which prevent researchers from fully utilizing such abundant data in many important studies. Many studies simply discard a large number of samples to eliminate missingness, which ultimately biases their data-driven analytical models. Existing techniques for missing data imputation use simplified linear models and are mostly suitable for imputing cross-sectional missingness, ignoring the longitudinal missingness in patient follow-up data. This proposal aims to investigate novel artificial intelligence (AI) based models to improve the quality and utility of EMR data in preparation for data-driven retrospective studies. Toward this preparation, the goals of the project are 1) to investigate data imputation models that are more accurate and robust than existing ones and 2) to adapt state-of-the-art deep learning techniques to prepare an optimal representation of large EMR data. The proposed research will 1) maximize the quality and utility of EMR data to support a multitude of retrospective studies, 2) enable visualization of complex patient data, 3) identify more important and predictive clinical parameters, and 4) yield a compact and optimal representation of large EMR datasets. We hypothesize that EMR data optimally processed with state-of-the-art AI models can model patient risk more accurately than existing statistical and clinical risk models.
This project will combine the complementary expertise of the collaborators, Dr. Manar Samad, PhD (Computer Science), Dr. Owen Johnson, DPH (Biostatistics and Public Health), and Dr. Edilberto Raynes, MD, PhD (Medicine), along with the participating undergraduate students at Tennessee State University (TSU). The proposal entails several research and development components that will allow undergraduate students to gain valuable research and analytical skills in data science, programming, and health informatics. The project activities will expose health science students to AI-based computing solutions to broaden the scope of their future health research and careers. This project will help TSU prepare a strong workforce of minority students who will gain competitive skill sets in data science and health informatics, which are currently in high demand. Overall, the project will develop a data-capable workforce to strengthen interdisciplinary research capacity and collaboration between the Departments of Computer Science and Health Science at TSU.
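
The abstract contrasts discarding incomplete samples with imputing them, and cross-sectional imputation with longitudinal imputation. The sketch below illustrates that contrast on a small invented dataset. It is only an illustration under stated assumptions: the records, the function names, and the two imputation baselines (global mean and last observation carried forward) are hypothetical stand-ins chosen for simplicity, not the AI-based models the project proposes.

```python
from statistics import mean

# Hypothetical toy follow-up data: (patient_id, visit_number, lab_value).
# None marks a missing measurement. All values are invented for illustration.
RECORDS = [
    ("p1", 1, 5.0), ("p1", 2, None), ("p1", 3, 7.0),
    ("p2", 1, 9.0), ("p2", 2, None), ("p2", 3, None),
]


def complete_case(records):
    """Discard every row that has a missing value."""
    return [r for r in records if r[2] is not None]


def mean_impute(records):
    """Cross-sectional baseline: fill gaps with the mean of all observed
    values, ignoring which patient a row belongs to."""
    observed_mean = mean(r[2] for r in records if r[2] is not None)
    return [
        (pid, visit, observed_mean if value is None else value)
        for pid, visit, value in records
    ]


def locf_impute(records):
    """Simple longitudinal baseline: carry each patient's last observed
    value forward across that patient's later visits."""
    last_seen = {}
    filled = []
    for pid, visit, value in sorted(records, key=lambda r: (r[0], r[1])):
        if value is None:
            # Stays None if this patient has no earlier observation.
            value = last_seen.get(pid)
        else:
            last_seen[pid] = value
        filled.append((pid, visit, value))
    return filled
```

On this toy data, complete-case analysis keeps only three of the six rows. Mean imputation assigns patient p2 the global mean of 7.0 at both missing visits, while carrying the last observation forward assigns 9.0, that patient's own earlier value. The difference shows why a method that ignores the follow-up structure can misrepresent an individual patient's trajectory.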

IC Name
NATIONAL LIBRARY OF MEDICINE
  • Activity
    R15
  • Administering IC
    LM
  • Application Type
    1
  • Direct Cost Amount
    296972
  • Indirect Cost Amount
    124728
  • Total Cost
    421700
  • Sub Project Total Cost
  • ARRA Funded
    False
  • CFDA Code
    879
  • Ed Inst. Type
    BIOMED ENGR/COL ENGR/ENGR STA
  • Funding ICs
    NLM:421700
  • Funding Mechanism
    Non-SBIR/STTR RPGs
  • Study Section
    BCHI
  • Study Section Name
    Biomedical Computing and Health Informatics Study Section
  • Organization Name
    TENNESSEE STATE UNIVERSITY
  • Organization Department
    BIOSTATISTICS & OTHER MATH SCI
  • Organization DUNS
    108814179
  • Organization City
    NASHVILLE
  • Organization State
    TN
  • Organization Country
    UNITED STATES
  • Organization Zip Code
    372091561
  • Organization District
    UNITED STATES