Secure and Privacy-preserving Genome-wide and Phenome-wide Association Studies via Intel Software Guard Extensions (SGX)

Information

  • Research Project
  • 10269896
  • ApplicationId
    10269896
  • Core Project Number
    R01HG010798
  • Full Project Number
    5R01HG010798-03
  • Serial Number
    010798
  • FOA Number
    PAR-18-844
  • Sub Project Id
  • Project Start Date
    8/9/2019
  • Project End Date
    5/31/2023
  • Program Officer Name
    SOFIA, HEIDI J
  • Budget Start Date
    6/1/2021
  • Budget End Date
    5/31/2022
  • Fiscal Year
    2021
  • Support Year
    03
  • Suffix
  • Award Notice Date
    8/16/2021
Organizations

With the rapid growth in the volume of data (e.g., human genomic data) collected in biomedical research, data protection, and in particular the protection of patients' privacy in secondary uses of these data, has recently attracted much attention. Today, the vast majority of sensitive biomedical data, including individual human genomic data and the associated health metadata, are shared only through controlled-access databases (e.g., dbGaP), and biomedical researchers must sign a user agreement before gaining access to these data. Security research has already produced a suite of techniques that serve the general purpose of privacy-preserving computation; applied directly, however, they consume too many resources for real-world biomedical applications. An alternative is the hardware-assisted Trusted Execution Environment (TEE), developed or under development by hardware vendors (Intel, AMD, ARM) and the open-source research community. A prominent example is Intel's Software Guard Extensions (SGX), which is available as a feature in Intel's mainstream CPUs (e.g., Skylake and Kaby Lake). In this project, we plan to explore potential applications of TEEs to two popular genome computation tasks involving sensitive biomedical data: genome-wide association studies (GWAS) and phenome-wide association studies (PheWAS). For GWAS, a secondary research user may collect genomic sequences (in encrypted form) of individuals with (cases) or without (controls) a disease phenotype from multiple data owners, and association tests or advanced GWAS algorithms can then be run on these data within the SGX enclave. Similarly, for PheWAS, a user may collect phenotype data from individuals whose genomes contain (cases) or do not contain (controls) one or more specific variations.
We will address two issues when developing these approaches: 1) we will customize GWAS/PheWAS algorithms for efficient execution in the TEE with limited resources (e.g., memory and I/O), and 2) we will develop new genome computing outsourcing and data sharing platforms using SGX techniques, and further understand and mitigate their potential side-channel risks with regard to GWAS/PheWAS computing tasks. The proposed research will lead to a practical solution for secure GWAS and PheWAS in three application scenarios: 1) secure outsourcing: a research institution collects matched genomic and phenotypic data from a large cohort of case and control individuals, and outsources the storage of these data and potentially repeated GWAS and PheWAS computation to a public or commercial cloud; 2) secure collaboration: a consortium of researchers across multiple institutions attempts to collaborate on a large GWAS/PheWAS study using the data collected by each participating institution; and 3) secure data sharing: researchers want to share their data with the broad biomedical research community so that potential data users may conduct a secondary GWAS/PheWAS analysis.
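
As an illustration of the kind of per-variant association test mentioned above, the following is a minimal sketch of a standard allelic chi-square test on case/control allele counts. It is not the project's implementation: the function names and the example counts are hypothetical, and the sketch is plain Python with no enclave, encryption, or side-channel protection. In the setting described, a computation of this kind would run on decrypted data inside the SGX enclave.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2 table of
    alternate/reference allele counts in cases and controls.
    All names here are hypothetical and chosen for illustration only."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    total = sum(row_totals)
    # A table with an empty row or column carries no association signal.
    if total == 0 or 0 in row_totals or 0 in col_totals:
        return 0.0
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    return stat


def chi_square_p_value_1df(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


if __name__ == "__main__":
    # Made-up counts for a single variant: 30/70 alt/ref alleles in cases,
    # 10/90 alt/ref alleles in controls.
    stat = allelic_chi_square(30, 70, 10, 90)
    print(stat, chi_square_p_value_1df(stat))
```

Because the test needs only four counts per variant, its working memory is small, which is one reason simple association tests are a natural fit for the limited enclave resources noted above.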

IC Name
NATIONAL HUMAN GENOME RESEARCH INSTITUTE
  • Activity
    R01
  • Administering IC
    HG
  • Application Type
    5
  • Direct Cost Amount
    257634
  • Indirect Cost Amount
    90110
  • Total Cost
    347744
  • Sub Project Total Cost
  • ARRA Funded
    False
  • CFDA Code
    172
  • Ed Inst. Type
    SCHOOLS OF ARTS AND SCIENCES
  • Funding ICs
    NHGRI:347744
  • Funding Mechanism
    Non-SBIR/STTR RPGs
  • Study Section
    ZRG1
  • Study Section Name
    Special Emphasis Panel
  • Organization Name
    INDIANA UNIVERSITY BLOOMINGTON
  • Organization Department
    MISCELLANEOUS
  • Organization DUNS
    006046700
  • Organization City
    BLOOMINGTON
  • Organization State
    IN
  • Organization Country
    UNITED STATES
  • Organization Zip Code
    474013654
  • Organization District
    UNITED STATES