Optimizing the Utility of Large Electronic Health Records Data in Data-Driven Health Research

Information

Research Project
10111205

ApplicationId
10111205
Core Project Number
R15LM013569
Full Project Number
1R15LM013569-01
Serial Number
013569
FOA Number
PAR-18-714
Sub Project Id

Project Start Date
9/18/2020
Project End Date
8/31/2023
Program Officer Name
YE, JANE
Budget Start Date
9/18/2020
Budget End Date
8/31/2023
Fiscal Year
2020
Support Year
01
Suffix
Award Notice Date
9/17/2020

Organizations

Tennessee State University

Optimizing the Utility of Electronic Medical Records Data in Data-driven Health Research
ABSTRACT
Medical centers continue to archive patient follow-up data in Electronic Medical Records (EMR), which hold tremendous value for discovering new knowledge and insights. The large volume of EMR data can play an important role in improving the accuracy and generalizability of predictive models in healthcare, especially given that misdiagnosis is known to be the third leading cause of death in the United States. Despite these merits, EMR data are invariably corrupted by missing values, outliers, and unrealistic measurements, which prevent researchers from fully utilizing such abundant data in many important studies. Many studies simply discard large numbers of samples to eliminate missingness, which ultimately biases their data-driven analytical models. Existing techniques for missing data imputation rely on simplified linear models and are mostly suited to cross-sectional missingness, ignoring the longitudinal missingness in patient follow-up data. This proposal aims to investigate novel artificial intelligence (AI) based models that improve the quality and utility of EMR data in preparation for data-driven retrospective studies. Toward this end, the project has two goals: 1) to investigate data imputation models that are more accurate and robust than existing ones, and 2) to adapt state-of-the-art deep learning techniques to produce an optimal representation of large EMR data. The proposed research will 1) maximize the quality and utility of EMR data to support a multitude of retrospective studies, 2) enable visualization of complex patient data, 3) identify the most important and predictive clinical parameters, and 4) yield a compact and optimal representation of large EMR datasets. We hypothesize that EMR data optimally processed with state-of-the-art AI models can model patient risk more accurately than existing statistical and clinical risk models.
This project will combine the complementary expertise of the collaborators, Dr. Manar Samad, PhD (Computer Science), Dr. Owen Johnson, DPH (Biostatistics and Public Health), and Dr. Edilberto Raynes, MD, PhD (Medicine), along with participating undergraduate students at Tennessee State University (TSU). The proposal entails several research and development components that will allow undergraduate students to gain valuable research and analytical skills in data science, programming, and health informatics. The project activities will expose health science students to AI-based computing solutions, broadening the scope of their future health research and careers. This project will help TSU prepare a strong workforce of minority students with competitive skill sets in data science and health informatics, which are currently in high demand. Overall, the project will develop a data-capable workforce and strengthen interdisciplinary research capacity and collaboration between the Departments of Computer Science and Health Sciences at TSU.
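The abstract contrasts cross-sectional imputation with the longitudinal missingness found in patient follow-up data. The sketch below illustrates that distinction only; it is not the project's proposed AI method. It assumes a hypothetical toy table mapping patient IDs to per-visit measurements (None marks a missing value) and compares two simple baselines: filling each gap with the mean of the same visit across all patients (cross-sectional), versus carrying each patient's own last observed value forward (a basic longitudinal approach, LOCF).

```python
from statistics import mean
from typing import Dict, List, Optional

# Hypothetical layout: patient id -> per-visit measurements, None = missing.
Records = Dict[str, List[Optional[float]]]


def cross_sectional_mean_impute(records: Records) -> Records:
    """Fill each gap with the mean of that visit across all other patients."""
    n_visits = max(len(v) for v in records.values())
    visit_means: List[Optional[float]] = []
    for t in range(n_visits):
        observed = [v[t] for v in records.values() if t < len(v) and v[t] is not None]
        visit_means.append(mean(observed) if observed else None)
    return {
        pid: [x if x is not None else visit_means[t] for t, x in enumerate(v)]
        for pid, v in records.items()
    }


def longitudinal_locf_impute(records: Records) -> Records:
    """Fill each gap from the same patient's history (last observation carried forward)."""
    out: Records = {}
    for pid, v in records.items():
        filled = list(v)
        last: Optional[float] = None
        for t, x in enumerate(filled):
            if x is None:
                filled[t] = last  # carry the patient's previous value forward
            else:
                last = x
        # Leading gaps have no earlier value; backfill them with the first observation.
        first = next((x for x in v if x is not None), None)
        out[pid] = [first if x is None else x for x in filled]
    return out


if __name__ == "__main__":
    # Illustrative values only, not real patient data.
    toy = {
        "p1": [140.0, None, 150.0],
        "p2": [120.0, 125.0, None],
        "p3": [None, 110.0, 115.0],
    }
    print(cross_sectional_mean_impute(toy))
    print(longitudinal_locf_impute(toy))
```

On this toy table, the cross-sectional approach fills patient p2's third visit with 132.5, the average of other patients, while the longitudinal baseline uses p2's own previous value of 125.0. This is the kind of patient-level temporal structure the abstract says linear, cross-sectional imputers overlook.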

IC Name

NATIONAL LIBRARY OF MEDICINE

Activity
R15
Administering IC
LM
Application Type
1

Direct Cost Amount
296972
Indirect Cost Amount
124728
Total Cost
421700
Sub Project Total Cost

ARRA Funded
False
CFDA Code
879
Ed Inst. Type
BIOMED ENGR/COL ENGR/ENGR STA
Funding ICs
NLM:421700
Funding Mechanism
Non-SBIR/STTR RPGs
Study Section
BCHI
Study Section Name
Biomedical Computing and Health Informatics Study Section

Organization Name
TENNESSEE STATE UNIVERSITY
Organization Department
BIOSTATISTICS & OTHER MATH SCI
Organization DUNS
108814179
Organization City
NASHVILLE
Organization State
TN
Organization Country
UNITED STATES
Organization Zip Code
37209-1561
Organization District
UNITED STATES