Data science tools to identify robust exposure-phenotype associations for precision medicine

Information

  • Research Project
  • 10095924
  • ApplicationId
    10095924
  • Core Project Number
    R01ES032470
  • Full Project Number
    1R01ES032470-01
  • Serial Number
    032470
  • FOA Number
    PA-19-056
  • Sub Project Id
  • Project Start Date
    9/10/2021
  • Project End Date
    6/30/2026
  • Program Officer Name
    DUNCAN, CHRISTOPHER GENTRY
  • Budget Start Date
    9/10/2021
  • Budget End Date
    6/30/2022
  • Fiscal Year
    2021
  • Support Year
    01
  • Suffix
  • Award Notice Date
    9/10/2021

Project Summary/Abstract

Phenotypic variability across demographically diverse populations is driven by environmental factors. The overall goal of this proposal is to deploy data science approaches to drive discovery of associations between exposures (E) and phenotypes (P) in demographically diverse populations. We lack data science methods to associate, replicate, and prioritize exposure variables of the exposome (E) with phenotypes (P) and disease incidence (D); such methods are required for the delivery of precision medicine. Observational studies are fraught with four unsolved data science challenges. First, (1) E-based studies are limited to associating a few hypothesized exposure-phenotype (E-P) pairs at a time, leading to a fragmented literature of environmental associations. Second, (2) although machine learning (ML) approaches for feature selection and prediction hold promise, most extant E-based cohorts contain missing data, which challenges the use of ML to detect complex E-P associations. Third, (3) biases, such as confounding and study design, influence associations and hinder translation. Fourth, (4) there are few well-powered data resources that systematically document longitudinal E-P and exposure-disease (E-D) associations across massive precision medicine cohorts.

These challenges leave three gaps. First, it is difficult to systematically associate many exposures with multiple phenotypes and to replicate these associations across cohorts (Aim 1). Second, the "vibration of effects", the degree to which associations change as a function of study design (e.g., analytic method, sample size) and model choice, is a hidden bias in observational studies (Aim 2). Third, an outstanding question is the degree to which environmental differences lead to health disparities (Aim 3). To address these challenges and gaps, we propose three aims.

Aim 1: Develop and test machine learning methods to associate multiple environmental exposure indicators with multiple phenotypes (EP-WAS).
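The EP-WAS idea in Aim 1, testing many exposures against many phenotypes at once instead of a few hypothesized pairs, can be pictured as a grid of adjusted regressions followed by multiple-testing control. The Python sketch below is illustrative only and rests on assumptions not taken from the proposal: each exposure-phenotype pair is fit by ordinary least squares on complete cases (a deliberately simple stand-in for the missing-data handling the proposal identifies as a challenge), two-sided p-values use a large-sample normal approximation, and the false discovery rate is controlled with the Benjamini-Hochberg procedure. All function names and simulated variables (`lead`, `ldl`, `bmi`) are hypothetical.

```python
import math

import numpy as np


def ols_exposure_effect(y, X):
    """OLS fit of y on design matrix X (column 0 = intercept, column 1 = exposure).

    Returns the exposure coefficient, its standard error, and a two-sided
    p-value from a large-sample normal approximation to the t distribution.
    """
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = (resid @ resid) / (n - k)        # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)     # covariance of the estimates
    se = math.sqrt(cov[1, 1])
    z = beta[1] / se
    p = math.erfc(abs(z) / math.sqrt(2.0))    # two-sided normal p-value
    return float(beta[1]), se, p


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Step-up rule: each q-value is the minimum scaled value at or above its rank.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(scaled, 1.0)
    return q


def ep_was(exposures, phenotypes, covariates):
    """Test every exposure against every phenotype, adjusting for covariates.

    exposures, phenotypes: dict name -> 1-D array of length n (NaN = missing)
    covariates: 2-D array of shape (n, c), shared by all models
    Each exposure-phenotype pair is analyzed on its own complete cases.
    """
    cov_missing = np.isnan(covariates).any(axis=1)
    results = []
    for e_name, e in exposures.items():
        for p_name, y in phenotypes.items():
            keep = ~(np.isnan(e) | np.isnan(y) | cov_missing)
            X = np.column_stack([np.ones(keep.sum()), e[keep], covariates[keep]])
            beta, se, p = ols_exposure_effect(y[keep], X)
            results.append({"exposure": e_name, "phenotype": p_name,
                            "n": int(keep.sum()), "beta": beta, "se": se, "p": p})
    for r, q in zip(results, benjamini_hochberg([r["p"] for r in results])):
        r["q"] = float(q)
    return results


# Usage on simulated data (all variable names are hypothetical).
rng = np.random.default_rng(0)
n = 2000
age = rng.normal(50, 10, n)
lead = rng.normal(size=n)                # exposure with a true effect on ldl
noise_exposure = rng.normal(size=n)      # exposure with no effect
ldl = 0.5 * lead + 0.02 * age + rng.normal(size=n)
bmi = rng.normal(size=n)
ldl[:10] = np.nan                        # a few missing phenotype values

table = ep_was({"lead": lead, "noise_exposure": noise_exposure},
               {"ldl": ldl, "bmi": bmi},
               covariates=age.reshape(-1, 1))
for r in sorted(table, key=lambda r: r["p"]):
    print(f'{r["exposure"]:>15} {r["phenotype"]:>4} n={r["n"]} '
          f'beta={r["beta"]:.3f} q={r["q"]:.2g}')
```

The resulting table of effect sizes, standard errors, and q-values per pair is the kind of output one might deposit in a catalog; a real analysis would likely also standardize exposures and replicate hits in an independent cohort.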
We hypothesize that exposures will explain a significant amount of phenotypic variation in populations, and we will deposit all data and models in a novel EP-WAS Catalog.

Aim 2: Quantitate how study design influences associations between exposure biomarkers and phenotypes. We will scale up, extend, and test a method called "vibration of effects" (VoE) to measure how study criteria influence the stability of associations (that is, how reproducible associations are as a function of analytic choice).

Aim 3: Leverage EP-WAS and VoE to disentangle biological, demographic, and environmental influences on phenotypic disparities in hypercholesterolemia. We will deploy the packaged EP-WAS and VoE libraries in the largest cohort study to partition phenotypic variation in hypercholesterolemia risk factors across demographic groups.

We will equip the biomedical community with data science approaches for robust, data-driven discovery and interpretation of exposure-phenotype associations in observational datasets, which are required for identifying environmental health disparities. For the first time, investigators will ascertain the collective role of the environment in heart disease at scale, just in time for the All of Us program.
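The "vibration of effects" in Aim 2 can be illustrated by refitting one exposure-phenotype model under every combination of candidate adjustment covariates and summarizing how much the estimate moves. The Python sketch below covers only the model-choice part of VoE (not sample size or other design choices) and is a minimal version under assumptions of its own, not the project's software: linear models fit by OLS, normal-approximation p-values, and illustrative summaries (1st and 99th percentiles of the coefficient, the spread of -log10(p), and whether estimates take both signs). Function and variable names are hypothetical.

```python
import itertools
import math

import numpy as np


def exposure_beta_p(y, X):
    """OLS fit; return (coefficient of column 1, two-sided normal-approx p)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = (resid @ resid) / (n - k)
    se = math.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    p = math.erfc(abs(beta[1] / se) / math.sqrt(2.0))
    return float(beta[1]), p


def vibration_of_effects(y, e, covariates):
    """Refit y ~ e under every subset of the candidate adjustment covariates.

    covariates: dict name -> 1-D array. Returns percentile-based summaries of
    how much the exposure association changes across model specifications.
    """
    names = sorted(covariates)
    betas, pvals = [], []
    for k in range(len(names) + 1):
        for subset in itertools.combinations(names, k):
            cols = [np.ones(len(y)), e] + [covariates[c] for c in subset]
            b, p = exposure_beta_p(y, np.column_stack(cols))
            betas.append(b)
            pvals.append(p)
    betas = np.array(betas)
    # Guard against p-values that underflow to 0 before taking logs.
    neglog10p = -np.log10(np.maximum(np.array(pvals), 1e-300))
    b1, b99 = np.percentile(betas, [1, 99])
    return {
        "n_models": len(betas),
        "beta_1st": float(b1),
        "beta_99th": float(b99),
        # Spread of -log10(p) across model specifications.
        "range_neglog10p": float(np.percentile(neglog10p, 99)
                                 - np.percentile(neglog10p, 1)),
        # True if estimates of both signs occur across specifications.
        "sign_change": bool(b1 < 0 < b99),
    }


# Usage on simulated data: confounder c drives both exposure and phenotype,
# so the estimate depends on whether c is in the model (names hypothetical).
rng = np.random.default_rng(1)
n = 2000
c = rng.normal(size=n)
e = c + rng.normal(size=n)
y = c + rng.normal(size=n)               # no direct effect of e on y
covs = {"c": c, "x1": rng.normal(size=n), "x2": rng.normal(size=n)}
voe = vibration_of_effects(y, e, covs)
print(voe)
```

Enumerating all subsets grows as 2^k in the number of candidate covariates, so a scaled-up analysis would probably sample specifications rather than fit every one.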

  • IC Name
    NATIONAL INSTITUTE OF ENVIRONMENTAL HEALTH SCIENCES
  • Activity
    R01
  • Administering IC
    ES
  • Application Type
    1
  • Direct Cost Amount
    511118
  • Indirect Cost Amount
    186634
  • Total Cost
    697752
  • Sub Project Total Cost
  • ARRA Funded
    False
  • CFDA Code
    113
  • Ed Inst. Type
    SCHOOLS OF MEDICINE
  • Funding ICs
    NIEHS:697752
  • Funding Mechanism
    Non-SBIR/STTR RPGs
  • Study Section
    BDMA
  • Study Section Name
    Biodata Management and Analysis Study Section
  • Organization Name
    HARVARD MEDICAL SCHOOL
  • Organization Department
    MISCELLANEOUS
  • Organization DUNS
    047006379
  • Organization City
    BOSTON
  • Organization State
    MA
  • Organization Country
    UNITED STATES
  • Organization Zip Code
    02115-6027
  • Organization District
    UNITED STATES