Collaborative Research: Multi-source Learning: Data-driven Algorithms, Optimality Theory, and Applications

Information

  • NSF Award
  • 2413106
  • Award Id
    2413106
  • Award Effective Date
    8/1/2024
  • Award Expiration Date
    7/31/2027
  • Award Amount
    $ 250,000.00
  • Award Instrument
    Standard Grant

Massive and diverse high-dimensional datasets are now routinely collected in a wide range of scientific fields. In many instances, in addition to the primary data from the target study, other datasets with a similar structure have been collected from different populations or under different environments. Incorporating such related auxiliary data is desirable for making more accurate and informative decisions. For example, the availability of large-scale genomic and proteomic data promises a better understanding of disease processes and suggests the possibility of more accurate prediction of disease outcomes. Efficiently extracting meaningful information from multiple such datasets has become a critical problem in medical research, one that presents unprecedented opportunities for statisticians and data scientists. The project's goal is to devise a collection of advanced statistical tools for efficient integrative analysis of electronic health record (EHR) and genomics data.

The PIs aim to address the pressing need for novel statistical methods that perform efficient integrative analysis by combining multiple data sources. The PIs plan to develop new methodologies and optimality theory for efficiently integrating large-scale data from multiple sources and to address critical biomedical problems using the newly developed methods. There are three major research goals. The first is to develop data-driven algorithms with theoretical optimality guarantees for transfer learning in various settings, including estimation and inference for high-dimensional covariance matrices, covariance functions for functional data, instrumental variable regression, and conformal inference. The second is to develop a class of adversarially robust algorithms that efficiently integrate the heterogeneous information in multi-source data, including guided adversarially robust learning and group significance tests for high-dimensional and nonparametric models. The third is to address the urgent needs and new challenges in biomedical studies through analyses of EHR data and integrative genomics, using the newly developed methods for transfer learning and adversarially robust learning.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

  • Program Officer
    Tapabrata Maiti, tmaiti@nsf.gov, 7032925307
  • Min Amd Letter Date
    7/17/2024
  • Max Amd Letter Date
    7/17/2024
  • ARRA Amount

Institutions

  • Name
    University of Pennsylvania
  • City
    PHILADELPHIA
  • State
    PA
  • Country
    United States
  • Address
    3451 WALNUT ST STE 440A
  • Postal Code
    191046205
  • Phone Number
    2158987293

Investigators

  • First Name
    T. Tony
  • Last Name
    Cai
  • Email Address
    tcai@wharton.upenn.edu
  • Start Date
    7/17/2024 12:00:00 AM

Program Element

  • Text
    STATISTICS
  • Code
    126900