Rapid response for pandemics: single cell sequencing and deep learning to predict antibody sequences against an emerging antigen

Information

  • Research Project
  • 10274223
  • ApplicationId
    10274223
  • Core Project Number
    R01AI169543
  • Full Project Number
    1R01AI169543-01
  • Serial Number
    169543
  • FOA Number
    RFA-RM-20-020
  • Sub Project Id
  • Project Start Date
    9/16/2021
  • Project End Date
    8/31/2024
  • Program Officer Name
    GONDRE-LEWIS, TIMOTHY A
  • Budget Start Date
    9/16/2021
  • Budget End Date
    8/31/2024
  • Fiscal Year
    2021
  • Support Year
    01
  • Suffix
  • Award Notice Date
    9/16/2021
Organizations

ABSTRACT One of the "holy grails" in immunology is to be able to directly predict tight-binding variable chain antibody sequences in silico against foreign or non-self "antigenic" proteins. Immunoglobulin chain rearrangement can potentially encode approximately 10^16 different variants of antibody heavy and light chain sequences. However, only a small fraction of the sequence space is generally accessed for evolving antibodies against foreign proteins. The computational challenge is to go from a model of the structure of an antigen to predicting a set of antibody chain sequences that can bind tightly to the antigen. If solved, it might be possible to move in less than 24 hours from the first cryo-electron-microscopic structure of a novel viral protein to advance a set of potent antibody-like molecular candidates for testing. Towards solving this problem, this project aims to develop a deep learning architecture that will take as input thermodynamic, quantum mechanical (density functional), and local structure-based network topographical features of the antigens and their cognate antibodies, and will output their respective binding affinity constants. We will design a generative adversarial network (GAN), which we think is uniquely suited for regression-based ML approaches for the immune system, to discover associations between the epitope and the variable chain features. This approach requires a large data stream of antigen and cognate antibody sequences, which until recently was difficult to obtain. A recently described single B-cell receptor (BCR) specific tagging method coupled with single cell deep sequencing ("linking B cell receptor to antigen specificity through sequencing" or LIBRA-seq) can rapidly isolate and sequence the BCR variable chain coding regions that can bind with high selectivity to antigenic epitopes.
Towards the specific project goals, in Task 1, LIBRA-seq will be used to rapidly identify and generate candidate immunoglobulin coding sequences in response to specific linear and nonlinear epitopes (against controls), chosen through computational/molecular modeling and prioritized with SARS-CoV-2 Spike protein epitopes (but not restricted to these), injected into a mouse model, to generate large training sets; in Task 2, these training sets, along with other data sets already available in public databases, will be used to generate a series of structural features (described above), which will in turn be used to train the GAN; in Task 3, the predicted epitope-antibody interactions will be validated by direct experiments with synthetic antibody and phage-display systems. Thus, the proposed strategy combines foundational principles in evolutionary biology, genomics, structural chemistry, and computer science to solve a general biological engineering problem. Results from this project are expected to lay the foundations for a rigorously tested and fully automated machine-learning system that could rapidly generate synthetic antibody candidates from the structure of a novel virus protein, which can enhance the rapid response ability against a future pandemic. The ability to develop targeted antibody therapy against non-infectious or chronic diseases, and to produce antibody-based industrial enzymes, will also be dramatically enhanced if this project is successful. The team: The team-leads of this multi-institutional research project comprise a computer scientist, a protein crystallographer, an immunologist, and a molecular biologist.
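
The abstract states that the proposed model would output binding affinity constants but does not say in what form or units. As an illustration only, the sketch below assumes the affinity is reported as a dissociation constant K_d in molar units and uses the standard thermodynamic relation ΔG = RT ln(K_d / 1 M) to convert it to a binding free energy, a scale on which tighter binders are more negative. The function names and the default temperature of 298.15 K are hypothetical choices for this example, not part of the project description.

```python
import math

# Molar gas constant R in J/(mol*K)
GAS_CONSTANT_J_PER_MOL_K = 8.314462618


def binding_free_energy_kj_per_mol(kd_molar, temperature_k=298.15):
    """Convert a dissociation constant (M, 1 M standard state) to ΔG in kJ/mol.

    Assumption for this sketch: affinity is expressed as K_d in molar units.
    """
    if kd_molar <= 0:
        raise ValueError("K_d must be positive")
    rt = GAS_CONSTANT_J_PER_MOL_K * temperature_k
    return rt * math.log(kd_molar) / 1000.0


def kd_from_free_energy(delta_g_kj_per_mol, temperature_k=298.15):
    """Inverse conversion: binding free energy in kJ/mol back to K_d in M."""
    rt = GAS_CONSTANT_J_PER_MOL_K * temperature_k
    return math.exp(delta_g_kj_per_mol * 1000.0 / rt)


if __name__ == "__main__":
    # Compare a nanomolar and a micromolar binder on the free-energy scale.
    for kd in (1e-9, 1e-6):
        dg = binding_free_energy_kj_per_mol(kd)
        print(f"K_d = {kd:.0e} M  ->  ΔG = {dg:.1f} kJ/mol")
```

Working on the logarithmic free-energy scale rather than on raw K_d values is a common choice for regression targets, because affinities span many orders of magnitude; whether this project uses such a transformation is not stated in the abstract.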

IC Name
NATIONAL INSTITUTE OF ALLERGY AND INFECTIOUS DISEASES
  • Activity
    R01
  • Administering IC
    AI
  • Application Type
    1
  • Direct Cost Amount
    1437367
  • Indirect Cost Amount
    414260
  • Total Cost
    1851627
  • Sub Project Total Cost
  • ARRA Funded
    False
  • CFDA Code
    310
  • Ed Inst. Type
    UNIVERSITY-WIDE
  • Funding ICs
    OD:1851627
  • Funding Mechanism
    Non-SBIR/STTR RPGs
  • Study Section
    ZRG1
  • Study Section Name
    Special Emphasis Panel
  • Organization Name
    KECK GRADUATE INST OF APPLIED LIFE SCIS
  • Organization Department
    NONE
  • Organization DUNS
    011116907
  • Organization City
    CLAREMONT
  • Organization State
    CA
  • Organization Country
    UNITED STATES
  • Organization Zip Code
    917114817
  • Organization District
    UNITED STATES