Massive and diverse high-dimensional datasets are now routinely collected in a wide range of scientific fields. In many instances, in addition to the primary data from the target study, auxiliary datasets with a similar structure have been collected from different populations or under different environments. Incorporating such related auxiliary data is desirable for making more accurate and informative decisions. For example, the availability of large-scale genomic and proteomic data promises a better understanding of disease processes and suggests the possibility of more accurate prediction of disease outcomes. Efficiently extracting meaningful information from multiple such datasets has become a critical problem in medical research, and it presents unprecedented opportunities for statisticians and data scientists. The project's goal is to devise a collection of advanced statistical tools for efficient integrative analysis of electronic health record (EHR) and genomics data.

The PIs aim to address the pressing need for novel statistical methods that perform efficient integrative analysis combining multiple data sources. They plan to develop new methodologies and optimality theory for efficiently integrating large-scale data from multiple sources and to address critical biomedical problems using the newly developed methods. There are three major research goals. The first is to develop data-driven algorithms with theoretical optimality guarantees for transfer learning in various settings, including estimation and inference for high-dimensional covariance matrices, covariance functions for functional data, instrumental variable regression, and conformal inference. The second is to develop a class of adversarially robust algorithms that efficiently integrate heterogeneous information from multi-source data, including constructing guided adversarially robust learning and conducting group significance tests for high-dimensional and nonparametric models.
The third is to address urgent needs and new challenges in biomedical studies through analyses of EHR data and integrative genomics, using the newly developed methods for transfer learning and adversarially robust learning.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.