Rapid response for pandemics: single cell sequencing and deep learning to predict antibody sequences against an emerging antigen

Information

Research Project
10274223

ApplicationId
10274223
Core Project Number
R01AI169543
Full Project Number
1R01AI169543-01
Serial Number
169543
FOA Number
RFA-RM-20-020
Sub Project Id

Project Start Date
9/16/2021
Project End Date
8/31/2024
Program Officer Name
GONDRE-LEWIS, TIMOTHY A
Budget Start Date
9/16/2021
Budget End Date
8/31/2024
Fiscal Year
2021
Support Year
01
Suffix
Award Notice Date
9/16/2021

Organizations

Keck Graduate Institute

Information

ABSTRACT One of the "holy grails" in immunology is to be able to directly predict tight-binding variable chain antibody sequences in silico against foreign or non-self "antigenic" proteins. Immunoglobulin chain rearrangement can potentially encode approximately 10^16 different variants of antibody heavy and light chain sequences. However, only a small fraction of the sequence space is generally accessed for evolving antibodies against foreign proteins. The computational challenge is to go from a model of the structure of an antigen to predicting a set of antibody chain sequences that can bind tightly to the antigen. If solved, it might be possible to move in less than 24 hours from the first cryo-electron-microscopic structure of a novel viral protein to advance a set of potent antibody-like molecular candidates for testing. Towards solving this problem, this project aims to develop a deep learning architecture that will take as input thermodynamic, quantum mechanical (density functional), and local structure-based network topographical features of the antigens and their cognate antibodies, and will output their respective binding affinity constants. We will design a generative adversarial network (GAN), which we think is uniquely suited for regression-based ML approaches for the immune system, to discover associations between the epitope and the variable chain features. This approach requires a large data stream of antigen and cognate antibody sequences, which until recently was difficult to obtain. A recently described single B-cell receptor (BCR) specific tagging method coupled with single cell deep sequencing ("linking B cell receptor to antigen specificity through sequencing" or LIBRA-seq) can rapidly isolate and sequence the BCR variable chain coding regions that can bind with high selectivity to antigenic epitopes.
Towards the specific project goals, in Task 1, LIBRA-seq will be used to rapidly identify and generate candidate immunoglobulin coding sequences in response to specific linear and nonlinear epitopes (against controls), chosen through computational/molecular modeling and prioritized with SARS-CoV-2 Spike protein epitopes (but not restricted to these), injected into a mouse model, to generate large training sets; in Task 2, these training sets, along with other data sets already available in public databases, will be used to generate a series of structural features (described above), which will be used to train the GAN; in Task 3, the predicted epitope-antibody interactions will be validated by direct experiments with synthetic antibody and phage-display systems. Thus, the proposed strategy applies foundational principles in evolutionary biology, genomics, structural chemistry, and computer science to the solution of a general biological engineering problem. Results from this project are expected to lay the foundations for a rigorously tested and fully automated machine-learning system that could rapidly generate synthetic antibody candidates from the structure of a novel virus protein, which can enhance the rapid response ability against a future pandemic. The ability to develop targeted antibody therapies against non-infectious or chronic diseases, and to produce antibody-based industrial enzymes, would also be dramatically enhanced if this project were successful. The team: The team-leads of this multi-institutional research project comprise a computer scientist, a protein crystallographer, an immunologist, and a molecular biologist.

IC Name

NATIONAL INSTITUTE OF ALLERGY AND INFECTIOUS DISEASES

Activity
R01
Administering IC
AI
Application Type
1

Direct Cost Amount
1437367
Indirect Cost Amount
414260
Total Cost
1851627
Sub Project Total Cost

ARRA Funded
False
CFDA Code
310
Ed Inst. Type
UNIVERSITY-WIDE
Funding ICs
OD:1851627
Funding Mechanism
Non-SBIR/STTR RPGs
Study Section
ZRG1
Study Section Name
Special Emphasis Panel

Organization Name
KECK GRADUATE INST OF APPLIED LIFE SCIS
Organization Department
NONE
Organization DUNS
011116907
Organization City
CLAREMONT
Organization State
CA
Organization Country
UNITED STATES
Organization Zip Code
917114817
Organization District
UNITED STATES
