The instant application contains a Sequence Listing which has been submitted electronically in ASCII format and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Oct. 22, 2021, is named 081906-1271535-240910PC_SL.txt and is 20,094 bytes in size.
Even when effective vaccines for COVID19 become available for broad application, there will remain a need for prophylactic and/or therapeutic inventions that will inhibit infection by SARS-CoV-2 and its mutational descendants. This need is especially acute for urgent care providers, community police, and military personnel in close quarters for extended periods of time. Frequent hand washing, wearing of the accepted standard of face masks, and physical distancing are proven to be effective for the general population, but are not practical for these essential sub-populations. Monoclonal antibody-based prophylactics are potentially highly effective, but the cost of production from mammalian cells limits applicability and large scale social impact. This is also true of monoclonal antibody-based intravenous therapeutics. Additionally, monoclonal antibodies are relatively unstable environmentally, not easily adapted to viral mutational descendants, and risk triggering unwanted viral responses such as antibody dependent enhancement (ADE) of infection or autoimmune responses such as the cytokine storm observed in some COVID19 patients. The inventions described here offer an alternative to the monoclonal antibody standard. They are a novel extension of previously proven out-of-body antiviral biotechnology: programmatic development of human body-friendly synthetic antibody substitutes that are mass producible at low cost, highly adaptable to emerging zoonotic viral threats, and applicable to COVID19 and future viral threats.
The inventions disclosed can be utilized as an alternative to any monoclonal antibody. Of the top ten biotechnology drugs in 2014 according to Statista, three were monoclonal antibodies to Tumor Necrosis Factor (TNF): Humira/adalimumab, Remicade/infliximab, and Enbrel/etanercept, followed by three other monoclonal antibodies: Rituxan/rituximab (anti-CD20 mAb), Avastin/bevacizumab (anti-VEGF mAb) and Herceptin/trastuzumab (anti-HER2 mAb). These six monoclonal antibodies had aggregate sales of almost $60 billion per year (supra). Monoclonal antibodies are typically expressed in eukaryotic cells cultured from multicellular organisms (humans, hamsters, or insects) in order to achieve correct protein folding and proper attachment of carbohydrates, which leads to variable production quality and a production process that is slow and scales poorly. In contrast, the invention disclosed herein can be expressed in prokaryotic cells or yeast, reducing the cost of GMP (Good Manufacturing Practice) production by up to ten-fold. The invention disclosed represents a substantial economic improvement over current monoclonal antibody synthesis, particularly for chronic drug administration, such as a viral prophylactic or treatment of a chronic disease, including, but not limited to, auto-immune diseases and cancer.
While previous patents have utilized human trimeric constructs for multimeric binding with monoclonal antibodies1-5, and heteromultimers6, none have used them as constructs for purely synthetic human derived antibody substitutes as described in the invention herein.
As illustrated in
The PBD is rationally engineered from a known human receptor protein. We can choose all these proteins to be expressible in E. coli or other prokaryotic organisms, as well as in single cell eukaryotic organisms such as P. pastoris. In some embodiments, the αm is the trimerization (m=3) domain of collagen7 (PDB code 3N3F), or other multimeric human protein complexes including trimeric EDA-18, 9 (PDB code 1RJ7, m=3), trimeric Langerin10, 11 (3KQG, m=3), and tetrameric diubiquitin12, 13 (2XEW, m=4); the mBSP may be obtained from structures including the p27 unit of the human dynactin complex14 (3TV0), or from the Retinitis Pigmentosa 2 Protein15, 16 (RP2) (2BX6); the PBD may be obtained from the N-terminus of the human ACE2 receptor protein17, 18 for neutralization of SARS-CoV-1 and SARS-CoV-2, for example by binding to the corresponding coronavirus Spike protein Receptor Binding Domain (RBD), or from parts of any other known human receptor protein, such as the DPP4 human protein, which binds to the MERS Spike RBD.
The multimeric protein complex as antibody substitute described herein allows a variable size dependent upon the number, n, of mBSPs, so that one multimeric viral envelope protein (VEP) with m-fold symmetry denoted as (VEP)m can be neutralized or multiple (VEP)m can be neutralized. This is illustrated for m=3 for the SARS-CoV-2 Spike protein in
In some embodiments, the overall sequence α-βn-γp of one of the monomeric protein domains of the multimeric protein complex as antibody substitute comprises a sequence shown in SEQ ID NO: 1 or at least 90% or 95% or 98% or 99% identical to SEQ ID NO: 1, for which n=0, p=1, α is a monomeric protein of the collagen trimerization domain (3N3F on the PDB), and the PBD (γ) is at least 90% or 95% or 98% identical to the N-terminus domain of residues 19-85 of the human ACE2 receptor protein with modifications S19G, T20C, P84A, L85C. These changes provide a disulfide bridge to stabilize the ends of the protein.
In some embodiments, the overall sequence α-βn-γp of one of the monomeric protein domains of the multimeric protein complex as antibody substitute comprises a sequence shown in SEQ ID NO: 2 or at least 90% or 95% or 98% or 99% identical to SEQ ID NO: 2, for which n=0, p=1, α is a monomeric protein of the human collagen trimerization domain (3N3F on the PDB), and the PBD (γ) is at least 90% or 95% or 98% identical to the N-terminus domain of residues 19-91 of the human ACE2 receptor protein with modifications N52Q, M62K, M69K. These changes remove a potential N-linked glycosylation site and mitigate the tendency of the PBD to dimerize via hydrophobic interaction since the lysines are positively charged.
In some embodiments, the overall sequence α-βn-γp of one of the monomeric protein units of the multimeric protein complex as antibody substitute comprises a sequence shown in SEQ ID NO: 3 or at least 90% or 95% or 98% or 99% identical to SEQ ID NO: 3, for which n=0, p=1, α is a monomeric protein of the human collagen trimerization domain (3N3F on the PDB), and the PBD (γ) is at least 90% or 95% or 98% identical to the N-terminus domain of residues 19-91 of the human ACE2 receptor protein with modifications N52Q, M62R, M69R. These changes remove a potential N-linked glycosylation site and mitigate the tendency of the PBD to dimerize via hydrophobic interaction since the arginines are positively charged.
In some embodiments, the overall sequence α-βn-γp of one of the monomeric protein units of the multimeric protein complex as antibody substitute comprises a sequence shown in SEQ ID NO: 4 or at least 90% or 95% or 98% or 99% identical to SEQ ID NO: 4.
In some embodiments, the overall sequence α-βn-γp of one of the monomeric protein units of the multimeric protein complex as antibody substitute comprises a sequence shown in SEQ ID NO: 5 or at least 90% or 95% or 98% or 99% identical to SEQ ID NO: 5.
In some embodiments, the present invention may consist of a single PBD rationally engineered from a known pathogen-binding human receptor protein, or of p copies (γ)p of a single PBD=γ rationally engineered from a known pathogen binding human receptor protein that binds at a common interface.
In some embodiments, the γp may be rationally engineered from the human ACE2 receptor protein, and each element of the γp comprises the sequence shown in SEQ ID NO: 4 or at least 90% or 95% or 98% or 99% identical to SEQ ID NO: 6 or SEQ ID NO: 7.
In some embodiments, the PBD is modified to reduce probability of N-linked glycan attachment, for example by mutation of N52 in the wildtype N-terminus of the human ACE2 protein to N52Q so that the NIT glycan attachment sequence is disrupted.
In some embodiments, the PBD is modified to have a different sequence relative to wild type human form to deter multimerization of the monomeric proteins in γp.
In some embodiments, the α or β=mBSP domains of the (α-βn-γp)m or (γp-βn-α)m multimeric protein complex as antibody substitute may be modified to include at least one amino acid that promotes attachment to a solid support, a nanoparticle, or a biological molecule.
In some embodiments, the α or β=mBSP domains of the (α-βn-γp)m or (γp-βn-α)m multimeric protein complex as antibody substitute may be modified to include at least one amino acid that promotes attachment to human albumin.
In some embodiments, the pathogen is a virus. In some embodiments the virus is SARS-CoV-2 or SARS-CoV-1 or MERS or HBV or HCV or HIV or Ebola or Marburg or CMV.
In some aspects, the disclosure provides a multimeric protein complex as antibody substitute comprising a plurality (e.g., m≥3) of monomeric proteins with modular protein domains of the form (α-βn-γp)m or (γp-βn-α)m, wherein the monomeric proteins α-βn-γp or γp-βn-α comprise fused protein domains, with α being a monomeric protein from a symmetric human multimeric protein complex of point group symmetry Cm or Dm, βn being a fused domain of n modified beta solenoid proteins (mBSPs) with n≥0, and γp being a complex of p≥1 pathogen binding domains (PBDs) either fused or bound by intermolecular forces.
In some aspects, the disclosure provides a multimeric protein complex as antibody substitute comprising a plurality (e.g., m≥2) of monomeric proteins with modular protein domains of the form (α-βn-γp)m or (γp-βn-α)m wherein the monomeric proteins α-βn-γp or γp-βn-α comprise fused protein domains wherein:
In some embodiments, the multimeric protein complex as antibody substitute is symmetrical. In some embodiments, the multimeric protein complex as antibody substitute has two-fold symmetry. In some embodiments, the multimeric protein complex as antibody substitute has three-fold symmetry. In some embodiments, the multimeric protein complex as antibody substitute has four-fold symmetry. In some embodiments, the multimeric protein complex as antibody substitute has five-fold symmetry. In some embodiments, the multimeric protein complex as antibody substitute has six-fold symmetry. In some embodiments, the multimeric protein complex as antibody substitute has twelve-fold symmetry.
In some embodiments, the modular protein domain α is a monomeric protein from a wild type symmetric multimeric protein complex αm. In some embodiments, the modular domains are subsequences of human proteins.
In some embodiments, α is a monomeric protein from the m=3 human collagen trimerization domain which is at least 90%, 95%, 98%, or 99% identical to SEQ ID NO: 9. In some embodiments, α is a monomeric protein from the m=3 human growth factor EDA-A1 which is at least 90%, 95%, 98%, or 99% identical to SEQ ID NO: 10. In some embodiments, α is a monomeric protein from the m=3 human Langerin which is at least 90%, 95%, 98%, or 99% identical to SEQ ID NO: 11. In some embodiments, α is a monomeric protein from the m=4 human tetrameric diubiquitin which is at least 90%, 95%, 98%, or 99% identical to SEQ ID NO: 12.
In some embodiments, n=0, p=1 and the protein binding domain (PBD) γ is at least 90%, 95%, 98% or 99% identical to the N-terminus domain (residues 19-85 or residues 19-91) of the human ACE2 receptor protein of SEQ ID NO: 6 or SEQ ID NO: 7.
In some embodiments, the monomeric sequence is at least 90%, 95%, 98% or 99% identical to SEQ ID NO:1. In some embodiments, the monomeric sequence is at least 90%, 95%, 98% or 99% identical to SEQ ID NO:2. In some embodiments, the monomeric sequence is at least 90%, 95%, 98% or 99% identical to SEQ ID NO:3.
In some embodiments, α is a monomeric protein from other m-fold symmetric multimeric protein complexes such as trimeric EDA-1 of SEQ ID NO: 10 or Langerin of SEQ ID NO: 11 or tetrameric diubiquitin of SEQ ID NO: 12.
In some embodiments, the modified human beta solenoid protein (mBSP) is at least 80%, 90%, 95%, 98%, or 99% identical to the dynactin p27 domain (3TV0) of SEQ ID NO: 8.
In some embodiments, the monomeric sequence α-βn-γp is at least 90%, 95%, 98% or 99% identical to SEQ ID NO: 4 and n=1 and p=1.
In some embodiments, the monomeric sequence γp-βn-α is at least 90%, 95%, 98% or 99% identical to SEQ ID NO: 5, and n=4 and p=1.
In some embodiments, the sequence is of the form γ2 and γ is at least 90%, 95%, 98% or 99% identical to SEQ ID NO: 6 or SEQ ID NO: 7.
In some embodiments, the modified human beta solenoid is modified to be at least 80%, 90%, 95%, 98% or 99% identical to the human Retinitis Pigmentosa Protein 2 (RP2) (2BX6).
In some embodiments, the pathogen binding domain is at least 90%, 95%, 98% or 99% identical to the helix-turn-helix (HTH) complex from the N-terminus of the ACE2 receptor protein of SEQ ID NO: 6 or SEQ ID NO: 7.
In some embodiments, at least one (e.g., 1, 2, 3, 4, 5, or more) amino acid of the α or modified human beta solenoid domain is modified to allow attachment to a nanoparticle, a solid support, or other biological molecule.
In some embodiments, the multimeric protein complex is attached to a nanoparticle, a solid support, or other biological molecule.
In some embodiments, the multimeric protein complex is attached to human serum albumin.
In some embodiments, at least one (e.g., 1, 2, 3, 4, or more) amino acid of one or more of the module domains is modified to allow attachment to a nanoparticle, a solid support, or other biological molecule.
Also provided is a multimeric protein complex as antibody substitute comprising a plurality of pathogen binding domains (e.g., 2, 3, 4, or more). In some embodiments, the pathogen binding domain is modified to be at least 90%, 95%, 98% or 99% identical to the HTH domain (residues 19-85 or 19-91) of the N-terminus of the ACE2 receptor protein of SEQ ID NO: 6 or SEQ ID NO: 7. In some embodiments, at least one (e.g., 1, 2, 3, 4, or more) amino acid of any or all of the pathogen binding domains is modified to allow attachment to a nanoparticle, a solid support, or other biological molecule.
Also provided is a method for neutralizing a pathogen comprising contacting said pathogen with the multimeric protein complex. In some embodiments, the antibody substitute is as described above or elsewhere herein, e.g., wherein one or more pathogen binding domains binds to one or more sites on the pathogen.
Also provided is a method for immobilizing a pathogen comprising contacting said pathogen with the multimeric protein complex. In some embodiments, the antibody substitute is as described above or elsewhere herein, e.g., wherein one or more pathogen binding domains binds to one or more sites on the pathogen. In some embodiments, the pathogen is a virus.
α: “monomeric protein unit of a symmetric multimeric protein complex.” This is a single protein unit of a symmetric multimeric protein complex.
αm: “symmetric multimeric protein complex”. A set of m (m can be 2, 3, 4, 5, etc.) identical proteins that form a complex invariant under m-fold rotations about the symmetry axis and are held together by non-covalent bonds. Examples of αm include the trimerization (m=3) domain of collagen7 (PDB code 3N3F), or other multimeric human protein complexes including trimeric EDA-18, 9 (PDB code 1RJ7, m=3), trimeric Langerin10, 11 (3KQG, m=3), and tetrameric diubiquitin12, 13 (2XEW, m=4). The full sequence of each of these proteins is available from the Protein Data Bank.
BSP: “beta-solenoid protein”. Proteins having backbones that turn helically in either a left- or right-handed sense around the long axis of the protein structure from the N-terminus to the C-terminus to form contiguous β-sheets, typically with 1.5-2 nm sides. Examples of non-amyloidogenic WT-BSPs that can form amyloid fibrils upon modification include: one-sided antifreeze proteins (AFPs) (Tenebrio molitor AFP, Protein Data Bank (PDB) Accession No. 1EZG19-22), two-sided AFPs (Snow Flea AFP, PDB 2PNE and 3BOI23, 24), rye grass AFP (PDB 3ULT25, 26), three-sided “type II” left handed beta-helical solenoid AFPs, for example from the spruce budworm (PDB 1M8N21), three-sided bacterial enzymes (PDB 1LXA27, 1FWY28, 1G9529, 1HV930, 1J2Z31, 1T3D32, 1THJ33, 1KGQ34, 1MR735, 1SSM36, 2WLC37, 3R3R38, 1KRV39, 3EH040, 3Q1X41, 3BXY42, 3HJJ43, 30GZ44, 4M9845, 4IHH46; acyltransferases, γ-class carbonic anhydrases and homologs), three-sided human motor protein subunits (e.g., PDB 3TV014), a three-sided “type I” left handed beta-helical enzyme ydcK from Salmonellae cholera (2PIG47, 48), four-sided proteins (PDB 2BM649, 2W7Z49, 2J8I49), four-sided pentapeptide repeat proteins (2G0Y50 and 3DU151), and 1XAT52. The full sequence of each of these proteins is available from the Protein Data Bank.
mBSP: “modified β solenoid protein” (also referred to as “mBSP monomeric protein”). Genetically engineered β solenoid proteins that allow for insertion into a (α-βn-γp) fused monomeric protein with β=mBSP. The mBSP is modified from an existing BSP such as 3TV0. An mBSP monomeric protein can be engineered to be of any length, typically from three rungs of a beta solenoid structure up to 24 or more, depending upon the length needed for a particular binding application.
βn: “modified β solenoid protein n-meric protein domain”. An mBSP monomeric protein domain consisting of n (n can be 0, 1, 2, 3, 4 etc.) identical fused and epitaxially bonded copies of a single mBSP derived from a wild type protein such as 3TV0.
mBSP epitaxy: “mBSP epitaxy”. The structural alignment of multiple engineered monomeric mBSPs (denoted as β in the protein schematic of this proposal per [0005] and
Functionalized mBSP: “functionalized mBSPs”. BSPs that are designed to specifically carry designated functional units, for example pathogen binding proteins, which are fused to either the amino- or carboxyl-terminus (or both) of mBSPs. In some embodiments, the mBSP monomeric proteins can further include one or more amino acid residues at the end.
PBD: “pathogen binding domain”. Proteins that are rationally engineered by extraction from full length human receptors that bind to known viral envelope proteins. For example, in the examples of the present invention in SEQ ID NOS: 1-7, the PBD is taken from the N-terminus of the ACE2 receptor protein, with a few possible mutations, and binds to the RBD of the Spike VEP from SARS-CoV-2.
γp: “Pathogen binding domain with p copies”. A pathogen binding domain is denoted by γ in the multimeric protein complex schematic language of this application, and can form multimeric protein complexes with p copies (p can be 1, 2, 3, 4, etc.). For example, in the present invention, in SEQ ID NO: 6, the HTH2 complex binds together because of a significant content of hydrophobic residues on the HTH face opposite to the binding site for the RBD protein of the SARS-CoV-2 spike complex.
Multimeric protein as antibody substitute. A multimeric protein complex of the form (α-βn-γp)m with α-βn-γp the fused monomeric protein unit of the m-fold symmetric protein if the N-terminus of the symmetric human multimeric protein complex is the overall starting point of the fused protein, and where α is a monomeric protein from a symmetric human multimeric protein complex such as the human collagen trimerization domain (3N3F PDB ID) used in SEQ ID NOS: 1-3, β is an mBSP such as the modified p27 dynactin domain (PDB ID 3TV0) used in SEQ ID NOS: 4 and 5, and γ is a PBD extracted from a human receptor protein such as residues 19-91 of the ACE2 receptor protein used in SEQ ID NOS: 1-7. If the N-terminus of the monomeric unit of the protein is on the PBD, the multimeric protein complex as antibody substitute has the form (γp-βn-α)m. The index n refers to the fusion of n copies of an mBSP into a single domain, and the index p refers to p copies of a PBD either fused or bonded together by intermolecular interactions.
Fused domains—two otherwise independent protein domains are said to be fused if they are linked by a peptide bond.
VEPm: “m-fold viral envelope protein”. A symmetric complex on the surface of a virus that binds to a surface receptor protein on a human cell. The VEPm has m-fold point group symmetry such as Cm or Dm.
Neutralization: “Neutralization”. The coating of the VEP binding sites of the virus sufficiently to block attachment to a cell surface protein and thus block infection.
Sequence Identity: “identical or percent identity”. In the context of two or more nucleic acids or polypeptide sequences (e.g., two mBSPs and polynucleotides that encode them), this refers to two or more sequences or subsequences that are the same or have a specified percentage of amino acid residues or nucleotides that are the same, when compared and aligned for maximum correspondence, as measured using one of the sequence comparison algorithms listed below, or by visual inspection.
Substantially Identical: “substantially identical”. In the context of two nucleic acids or polypeptides of the invention, this refers to two or more sequences or subsequences that have at least 60%, 65%, 70%, 75%, 80%, or 90-95% nucleotide or amino acid residue identity (e.g., to any of the sequences here, including but not limited to SEQ ID NO:1, 2, 3, 4, and 5), when compared and aligned for maximum correspondence, as measured using one of the sequence comparison algorithms listed below, or by visual inspection. Preferably, the substantial identity exists over a region of the sequences that is at least 50 residues in length, more preferably over a region of at least 100 residues, and most preferably sequences that are substantially identical over at least 150 residues. In a most preferred embodiment, the sequences are substantially identical over the entire length of the coding regions.
Sequence Comparison: “sequence comparison”. Typically, one sequence acts as a reference sequence, to which test sequences are compared. When using a sequence comparison algorithm, test and reference sequences are input into a computer, subsequence coordinates are designated, if necessary, and sequence alignment program parameters are specified. The sequence alignment algorithm then calculates the percent sequence identity for the test sequence(s) relative to the reference sequence, based on the specified program parameters.
Optimal Alignment: “Optimal alignment”. This means the most likely alignment of protein sequences for comparison. This can be conducted, e.g., by the local homology algorithm of Smith & Waterman54, by the homology alignment algorithm of Needleman & Wunsch55, by the search for similarity method of Pearson & Lipman56, and by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, WI), or by visual inspection (see generally, Current Protocols in Molecular Biology, F. M. Ausubel et al., eds., Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley & Sons, Inc., (1995 Supplement) (Ausubel)).
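As an illustration of the percent identity calculation defined above, the following is a minimal sketch of a Needleman & Wunsch style global alignment followed by an identity count. It is not the Wisconsin package or BLAST implementation. The scoring values (match=1, mismatch=-1, gap=-2), the choice of the alignment length as the denominator, the function names, and the toy sequences are all assumptions made for illustration only.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of strings a and b; returns the two gapped strings.
    Scoring values are illustrative assumptions, not program defaults."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        sub = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a, b):
    """Percent of aligned columns holding the same residue in both sequences.
    Uses the full alignment length as the denominator (an assumption;
    other conventions divide by the shorter or the reference sequence)."""
    aligned_a, aligned_b = needleman_wunsch(a, b)
    matches = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


# Toy peptides, not taken from the Sequence Listing.
identity = percent_identity("ACDEFG", "ACDQFG")
```

A test sequence would meet, for example, the "at least 90% identical" threshold used in this application when `percent_identity(reference, test) >= 90.0` under whichever alignment program and parameters are specified.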
Alignment Algorithms: “Alignment algorithms”. These are programs that are suitable for determining percent sequence identity and sequence similarity, for example the BLAST and BLAST 2.0 algorithms, which are described in Altschul et al.57, 58. Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/). These algorithms involve first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence, which either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold57, 58. These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are then extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always > 0) and N (penalty score for mismatching residues; always < 0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction is halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached. The BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences) uses as defaults a word length (W) of 11, an expectation (E) of 10, M=5, N=-4, and a comparison of both strands. For amino acid sequences, the BLASTP program uses as defaults a word length (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix59.
Statistical Analysis: “Statistical analysis”. This refers to quantitative statistical analysis of the similarity between two sequences to quantify the degree of similarity apart from the visual alignment and percentage overlap of the protein sequences60, 61. One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance. For example, a nucleic acid is considered similar to a reference sequence if the smallest sum probability in a comparison of the test nucleic acid to the reference nucleic acid is less than about 0.1, more preferably less than about 0.01, and most preferably less than about 0.001.
Avidity. Avidity refers to the enhanced binding of a multimeric protein complex relative to a monomeric protein when the multimeric protein complex consists of m copies of the monomer. Because the binding free energies of the monomeric units add linearly in the absence of cooperative effects between the multimeric units, the binding strength, characterized by the equilibrium association constant, scales as $K_A^{(m)} = (K_D^{(m)})^{-1} \sim (K_A^{(1)})^{m}$, so that the binding strength is dramatically enhanced by avidity.
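The scaling in the avidity definition can be made concrete with a short numerical sketch. It assumes the idealized case stated above (binding free energies add linearly, no cooperativity or linker entropy), takes the dissociation constant relative to a 1 M standard state, and uses a hypothetical monomer dissociation constant of 10 micromolar. The function names and the numerical values are illustrative assumptions, not measured data.

```python
import math

R = 8.314  # ideal gas constant in J/(mol K)


def kd_from_free_energy(delta_g, temperature=298.15):
    """Dissociation constant (in M, 1 M standard state) from a binding
    free energy delta_g in J/mol; delta_g is negative for favorable binding."""
    return math.exp(delta_g / (R * temperature))


def multimer_kd(kd_monomer, m):
    """Idealized avidity estimate: m independent, additive binding events,
    so K_D(m) ~ K_D(1)**m with K_D expressed in molar units."""
    return kd_monomer ** m


# Hypothetical monomer K_D of 1e-5 M (10 micromolar), trimer with m = 3.
kd_trimer = multimer_kd(1e-5, 3)  # about 1e-15 M in this idealized limit
```

Tripling the binding free energy in `kd_from_free_energy` gives the same result as cubing the monomer constant in `multimer_kd`, which is the additive free energy argument restated in terms of dissociation constants.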
RBD: “Receptor Binding Domain”. Here, this is applied to an individual protein of the spike trimer complex for a coronavirus such as SARS-CoV-2 that ‘flips’ between a concealed conformation and an exposed (binding) conformation from the spike complex. It is this RBD that binds to the HTH pathogen binding domain of the ACE2 protein receptor on epithelial cells.
Point group symmetry. For the symmetric human multimeric protein complexes and the multimeric protein complexes as antibody substitutes, each complex takes the form Φm where Φ is a single (monomeric) protein and the m copies bind at the interface such that, when structural fluctuations are removed, there is an m-fold symmetry about an axis through the center of the assembly: rotations of 2πq/m where 0≤q<m take the structure into itself. Such a pure rotation symmetry about one axis is denoted Cm. If there are in addition two-fold rotation axes perpendicular to the m-fold axis, the complex has the symmetry Dm.
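The Cm condition above (rotations by 2πq/m about the axis map the assembly onto itself) can be checked numerically. The sketch below represents each monomer by a single reference point, builds an idealized m-fold assembly about the z axis, and tests invariance under the generating rotation. The single-point representation, the choice of z as the symmetry axis, and the function names are simplifying assumptions for illustration.

```python
import math


def rotate_z(point, angle):
    """Rotate a 3D point about the z axis by the given angle in radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def build_cm_assembly(monomer_point, m):
    """Place m copies of a monomer reference point at rotations 2*pi*q/m."""
    return [rotate_z(monomer_point, 2 * math.pi * q / m) for q in range(m)]


def is_cm_symmetric(points, m, tol=1e-9):
    """True if rotating by 2*pi/m maps every point onto some point of the set."""
    rotated = [rotate_z(p, 2 * math.pi / m) for p in points]
    return all(any(math.dist(r, p) < tol for p in points) for r in rotated)


# Idealized trimer (m = 3) built from an arbitrary reference point.
trimer = build_cm_assembly((1.0, 0.2, 0.5), 3)
three_fold = is_cm_symmetric(trimer, 3)  # invariant under 120 degree rotation
four_fold = is_cm_symmetric(trimer, 4)   # not invariant under 90 degree rotation
```

Checking the single generating rotation 2π/m is sufficient, because every other rotation 2πq/m is a repeated application of it.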
In this case the multimeric protein complex is a trimer (m=3) comprised of a monomeric protein from a symmetric human trimer such as the human collagen trimerization domain (PDB ID 3N3F), fused to a modified beta solenoid (mBSP) domain in which n copies of a single mBSP such as the p27 domain from dynactin (PDB ID 3TV0) are fused together, and p copies of a pathogen binding domain (PBD) such as residues 19-91 of the human ACE2 receptor protein. If the sequence begins with the α-monomer, it will have the form (α-βn-γp)m with α-βn-γp the fused (by peptide bond) monomeric protein of the α, βn, and γp domains. If the sequence begins with the PBD, it will have the form (γp-βn-α)m. The multimeric value m is chosen to match the symmetry of the corresponding viral envelope protein (VEP)m where, e.g., VEP can be a monomeric protein of the Spike trimer from the SARS-CoV-2 virus.
The inventors have discovered that engineered protein trimers and dimers composed of linked modular sequences from human proteins can neutralize viral pathogens in a prophylactic or therapeutic context to form a multimeric protein complex as antibody substitute. By fusing a purely human multimeric protein complex comprised of m monomeric proteins αm to mBSPs and PBDs to match the m-fold symmetry and geometry of the viral envelope protein, it is possible to neutralize the multimeric VEP protein with a multimeric protein as an antibody substitute properly attuned to the VEP symmetry. In contrast, monoclonal antibodies at best exhibit dimeric (m=2) protein binding, and without a geometry tuned to the VEPm geometry they typically exhibit monomeric protein binding. The multimeric protein binding increases the net binding affinity to the VEPm complex. The inventors have previously demonstrated, for example, that a similar binding domain to that of the PBD from the ACE2 of SEQ ID NOS: 1-6 attached to mBSP polymers can successfully bind vascular endothelial growth factor (VEGF) protein. The symmetric human multimeric protein complexes expressible in E. coli can be, for example, the human collagen trimerization domain (PDB code 3N3F) or other multimeric human protein complexes including but not limited to trimeric EDA-1 (PDB code 1RJ7), trimeric Langerin (3KQG), and tetrameric diubiquitin (2XEW), as detailed in SEQ ID NO: 9, 10, 11, and 12.
The inventors have discovered that in the case of the PBD in the embodiment of the N-terminus sequence (see, e.g., SEQ ID NO: 6 and SEQ ID NO: 7) from the ACE2 protein that the binding is effective and essentially unchanged from the ACE2 dimer itself (
The inventors have discovered through extensive simulations using the YASARA molecular dynamics program63 that mutations associated with extensive variants of the SARS-CoV-2 virus do not significantly alter the binding of the PBD from residues 19-91 of the ACE2 to the RBD of the SARS-CoV-2 spike protein (
The inventors have discovered that the multimeric protein complex as antibody substitute designs of SEQ ID NO: 2 and SEQ ID NO: 3 are expressible in P. pastoris with the trimers secreted from the cells and needing no post-translation modification, confirming the advantage of the invention over monoclonal antibodies. (
The inventors have discovered that by inserting a human βn=mBSPn domain consisting of n fused copies of a single mBSP domain between the α monomeric protein and the γ=PBD element of the multimeric protein complex as antibody substitute, they can adapt the size of the multimeric protein complex as antibody substitute to match the size of the VEPm to bind to more than one VEP element at once (
The inventors have discovered that in the case of the N-terminus ACE2 HTH example of SEQ ID NO: 6 or SEQ ID NO: 7 for a PBD, that there is a tendency to dimerize when detached from the ACE2 protein (
By making use of purely or nearly (see the following paragraph) human proteins in each of the modular domains of (α-βn-γp) or (γp-βn-α) for the multimeric protein complex as antibody substitute, the invention avoids immunogenic response of the human host in prophylactic or therapeutic applications.
Immunogenic tolerance of the host to these multimeric protein complexes as antibody substitutes can be maintained by modifying up to 5 residues of the PBD to either (a) increase the innate binding strength to the VEP complex, or (b) inhibit the dimerization of a PBD such as the HTH PBD construct from the ACE2 protein. For example, the substitution of lysines or arginines at positions 62 and 69 of the HTH PBD in SEQ ID NO: 2 or 3 helps to significantly diminish the hydrophobic dimerization tendency.
The probability of attachment of N-linked glycans to the multimeric proteins as antibody substitutes can be substantially reduced by modifying NIT sequences to QIT, as, for example, in SEQ ID NO: 2 or 3.
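The NIT-to-QIT modification can be illustrated with a short sketch. It assumes the commonly cited N-linked glycosylation sequon N-X-S/T, where X is any residue other than proline, and replaces the asparagine of each such motif with glutamine. The function names and the test peptide are hypothetical and are not taken from the Sequence Listing; sequences are assumed to be in upper-case one-letter code.

```python
import re

# Assumed sequon definition: N, then any residue except P, then S or T.
# The lookahead leaves the two downstream residues unconsumed, so
# overlapping motifs (e.g. "NNTS") are all detected.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(seq: str) -> list[int]:
    """Return 0-based positions of asparagines in N-X-S/T motifs."""
    return [match.start() for match in SEQUON.finditer(seq)]


def remove_sequons(seq: str) -> str:
    """Replace the asparagine of every N-X-S/T motif with glutamine."""
    return SEQUON.sub("Q", seq)


if __name__ == "__main__":
    peptide = "AANITGG"  # hypothetical test peptide, not a listed sequence
    print(find_sequons(peptide))
    print(remove_sequons(peptide))
```

Whether a given N-to-Q substitution preserves folding or binding is not addressed by this sketch; it only locates and rewrites candidate motifs.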
Exemplary multimeric protein complex as antibody substitute sequences of the form (α-βn-γp)m or (γp-βn-α)m include, but are not limited to, polypeptides comprising an amino acid sequence at least 90%, 95%, 98%, 99% or 100% identical to any one of SEQ ID NOS: 1, 2, 3, 4, 5, 6, or 7 as provided below (numbers at right are amino acid count, beginning with N-terminal).
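By way of illustration only, a percent identity of the kind recited above may be computed as sketched below. The sketch assumes that the two sequences have already been aligned to equal length with "-" marking gaps, and that identity is reported over the aligned columns; other conventions for the denominator exist, and the alignment method itself is outside this sketch. The function name and the example strings are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns of two pre-aligned sequences.

    Columns where both sequences carry a gap are ignored; a residue facing
    a gap counts as a mismatch. This is one convention among several.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [
        (x, y) for x, y in zip(aligned_a, aligned_b)
        if not (x == "-" and y == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Hypothetical 10-residue strings differing at one position.
    print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))
```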
We have developed antibody replacements for viral neutralization by assembling modular protein domains into multimeric proteins as antibody substitutes, extending methods the inventors have previously developed for basic science64-70 and for applications53. The invention is to identify human symmetric multimeric protein complexes (αm) such as the human collagen trimerization domain (SEQ ID NO: 9 and PDB code 3N3F, m=3), with an example shown in
As example 1, we consider the multimeric protein complex as antibody substitute design of SEQ ID NO: 1 and
where m is the number of binders (identical to the multimeric number m in this example) and ΔGB is the (negative) binding free energy associated with a single binding event, the dissociation constant of the trimeric antibody substitute of SEQ ID NOS: 1-4 is likely to be in the femtomolar regime when all three RBDs are simultaneously bound by a single trimer. This avidity concept is borne out by considerable theoretical evidence and argument71-75 and by recent evidence of engineered nanobody binding to the down domain of the spike protein, where the trimers likely have femtomolar affinity76.
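The avidity argument can be illustrated numerically. The sketch below assumes an idealized relation in which the binding free energies of m simultaneous, independent binding events simply add, so that the multivalent dissociation constant is exp(mΔGB/RT) with a 1 M standard state. It ignores linker strain and entropy losses, so it indicates an upper bound on the avidity gain rather than a prediction. The single-site KD of 10 µM is a purely illustrative value and is not a measurement from this disclosure; the function names are hypothetical.

```python
import math

R = 8.314    # J/(mol*K), ideal gas constant
T = 298.15   # K, room temperature (assumed value)


def binding_free_energy(kd_molar: float) -> float:
    """Single-site binding free energy in J/mol, assuming a 1 M standard state."""
    return R * T * math.log(kd_molar)


def multivalent_kd(kd_single_molar: float, m: int) -> float:
    """Idealized K_D (in M) when m identical sites are bound simultaneously.

    Assumes the free energies add without any penalty:
    K_D(m) = exp(m * dG_B / (R * T)).
    """
    dg_single = binding_free_energy(kd_single_molar)
    return math.exp(m * dg_single / (R * T))


if __name__ == "__main__":
    kd_single = 1e-5  # illustrative single-site K_D of 10 micromolar
    for m in (1, 2, 3):
        print(m, multivalent_kd(kd_single, m))
```

Under these assumptions, a trimer (m=3) built from 10 µM binders would reach the femtomolar regime, which is the order of magnitude discussed in the text.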
$$\Delta\Delta G_{\text{Mut-DMS}} = 0.03437\,\left(\Delta\Delta G_{\text{Mut-HawkDock}} - 3.8\right)$$
from which we can predict, with the theoretical ΔΔGMut−HawkDock, the dissociation constants relative to wildtype (WT) as
where T is taken at room temperature and R=8.314 J/(mol·K) is the ideal gas constant. This is the third column of
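As a hedged illustration of how the linear relation between the HawkDock and DMS values could be used, the sketch below converts a theoretical ΔΔGMut−HawkDock into a dissociation constant relative to wildtype. Two assumptions are made that the text does not state: the relative dissociation constant is taken as exp(ΔΔGMut−DMS/RT), and the free energies are taken to be in kcal/mol and converted to J/mol before use. T=298.15 K is an assumed room temperature, and the function names are hypothetical.

```python
import math

R = 8.314           # J/(mol*K), ideal gas constant
T = 298.15          # K, assumed room temperature
KCAL_TO_J = 4184.0  # unit conversion; kcal/mol input is an assumption


def ddg_dms_from_hawkdock(ddg_hawkdock: float) -> float:
    """Linear map from the text: 0.03437 * (ddG_HawkDock - 3.8)."""
    return 0.03437 * (ddg_hawkdock - 3.8)


def relative_kd(ddg_hawkdock: float) -> float:
    """K_D(mutant) / K_D(wildtype), assuming the ratio is exp(ddG_DMS / RT)."""
    ddg_joules = ddg_dms_from_hawkdock(ddg_hawkdock) * KCAL_TO_J
    return math.exp(ddg_joules / (R * T))


if __name__ == "__main__":
    for ddg in (0.0, 3.8, 10.0):  # illustrative inputs only
        print(ddg, relative_kd(ddg))
```

With this sign convention, a ratio above 1 indicates weaker binding of the mutant than of wildtype, and a ratio below 1 indicates stronger binding.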
By working with human constructs for the multimeric proteins as antibody substitutes containing a minor number (≤4-5) of point mutations, it is anticipated that our modular protein designs will be resistant to immunogenic response or rejection and to proteolytic degradation inside the body. Moreover, by using common domains from ubiquitous human proteins, the potential for autoimmune response is avoided.
By choosing protein domains for our multimeric proteins as antibody substitutes which have been expressed from well-proven, scalable, industrialized prokaryotic or single-celled eukaryotic processes, we avoid the expense and unpredictability of monoclonal antibody production.
The molecular weight of each monomeric protein in our multimeric protein complex as antibody substitute designs is relatively small compared to antibodies. The design of SEQ ID NO: 1 and
In example 2, we consider the somewhat larger construct of
This design can access the RBD domain even when it is flipped out away from the spike complex, and thus is amenable to a wider range of binding conformations than the smaller design of
The much larger multimeric protein complex as antibody substitute design of
The HTH complex itself, in dimer form, can be a potent multimeric protein complex as antibody substitute neutralizing agent. The dimer construct of
This multimeric protein complex as antibody substitute can be mutated by up to 4 amino acids to achieve higher affinity binding with the RBD without inducing an immunogenic response, and such mutations can enhance the affinity per HTH to sub-nanomolar KD values78.
If necessary, particularly in attachment to the complexes of EXAMPLES 1-3, we can modify the hydrophobic “underside” of the HTH complex by 1-4 residues to prevent self-association.
By fusing a specific human serum albumin binding sequence to the side of the trimers in the multimeric protein complexes as antibody substitutes discussed in EXAMPLES 1-3, opposite the binding face to the spike proteins, we can attach to albumin in the blood. For example, the SA2 peptide invented by Genentech79-82 has specific binding to serum albumin. This provides steric hindrance to viral binding in addition to the explicit blocking of the spike proteins and engenders enhanced lifetimes in vivo.
The multimeric protein complex as antibody substitute inventions herein, while using specific examples of binding to the SARS-CoV-2 virus, are not solely restricted to this application. For example, the general multimeric protein complex as antibody substitute schema (α-βn-γp)m or (γp-βn-α)m can be extended to develop neutralization agents for the trimeric haemagglutinin VEPs on the surface of influenza virions, the tetrameric neuraminidase complexes on influenza virions, or the trimeric gp120 VEPs on the surface of HIV virions.
Additionally, the general (α-βn-γp)m or (γp-βn-α)m scheme can be extended to binding to multimeric fusion complexes on microorganisms or tumor cells.
10. H. Feinberg, A. S. Powlesland, M. E. Taylor and W. I. Weis, Journal of Biological Chemistry 285 (17), 13285-13293 (2010).
Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, one of skill in the art will appreciate that certain changes and modifications may be practiced within the scope of the appended claims. In addition, each reference provided herein is incorporated by reference in its entirety to the same extent as if each reference was individually incorporated by reference. Where a conflict exists between the instant application and a reference provided herein, the instant application shall dominate.
The present patent application is a US National Phase Application Under 371 of International Application PCT/US2021/051772 filed Sep. 23, 2021, which claims benefit of priority to U.S. Provisional Patent Application No. 63/082,587, filed Sep. 24, 2020, each of which is incorporated by reference for all purposes.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2021/051772 | 9/23/2021 | WO | |

Number | Date | Country |
---|---|---|
63082587 | Sep 2020 | US |