The instant application contains a Sequence Listing which has been filed electronically in ASCII format and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Aug. 8, 2022, is named LBH-02001_SL.txt and is 18,144 bytes in size.
The ongoing COVID-19 pandemic has infected around 500 million people and killed more than 6.1 million people worldwide, as of April 2022. The continual emergence of SARS-CoV-2 variants with increased transmissibility and capacity for immune escape, such as B.1.1.7 (“UK variant”) and P.1 (“Brazilian variant”), threatens to prolong the pandemic through devastating outbreaks such as the one currently being witnessed in India.
While multiple vaccines have demonstrated high effectiveness in clinical trials and real-world studies, there have been reports of “vaccine breakthrough infections” with SARS-CoV-2 variants. A recent study described two such cases in New York, at least one of which occurred despite confirmation of a robust neutralizing antibody response. Variant classification schemes have been developed by the US Centers for Disease Control and Prevention (CDC) and the World Health Organization (WHO) based on factors such as prevalence, evidence of transmissibility and disease severity, and ability to be neutralized by existing therapeutics or sera from vaccinated patients.
It is imperative to further understand and combat these emerging Variants of Concern/Interest to contain the ongoing pandemic and manage or prevent future outbreaks.
In some aspects, compositions for use as a vaccine against SARS-CoV-2 infection comprise either a polypeptide that comprises at least one surge-associated mutation in its amino acid sequence with respect to SEQ ID NO: 1 or a nucleic acid that encodes said polypeptide.
In some embodiments, said at least one mutation is within the residue range 13-303 of SEQ ID NO: 1. In preferred embodiments, said at least one mutation is a deletion.
In some embodiments, the composition comprises said nucleic acid (e.g., ribonucleic acid). In some embodiments, the composition comprises a messenger ribonucleic acid (mRNA). In some embodiments, the mRNA comprises a 5′ cap, a 5′-untranslated region, a 3′-untranslated region, and a poly(A) tail. In some embodiments, the mRNA comprises at least one non-canonical nucleobase.
In some embodiments, said mutation is a deletion of any one or more residues selected from 14-16, 24-26, 63-76, 81, 85-89, 136-146, 150-165, 167-169, 210-212, 214, 216 and 241-252.
In some embodiments, said mutation is a deletion of two or more residues selected from 14-16, 24-26, 63-76, 81, 85-89, 136-146, 150-165, 167-169, 210-212, 214, 216 and 241-252. In some embodiments, said mutation is a deletion of 3, 4, 5, 6, 7, 8, 9, or 10 residues selected from 14-16, 24-26, 63-76, 81, 85-89, 136-146, 150-165, 167-169, 210-212, 214, 216 and 241-252. In some embodiments, the mutation comprises a contiguous stretch of residues. In some embodiments, the mutation comprises two separate contiguous stretches of residues. In some embodiments, the mutation comprises three or more separate contiguous stretches of residues. In certain preferred embodiments, said mutation is a deletion of one or more residues selected from those described in
In some aspects, compositions comprise two or more of the polypeptides described above, or nucleic acids that encode two or more of the polypeptides.
In some aspects, antibodies or antigen-binding fragments thereof are disclosed, which bind to the polypeptides described above. Suitable methods can be used to generate such antibodies against the disclosed polypeptides.
In some aspects, formulations comprise the compositions described above or elsewhere herein. In some such embodiments, the formulations comprise at least one excipient. For example, the formulations further comprise a delivery system. In some such embodiments, the delivery system is selected from protamine, protamine liposome, polysaccharide particles, cationic nanoemulsion, cationic polymer, cationic polymer liposome, cationic lipid nanoparticle, cationic lipid/cholesterol nanoparticles, cationic lipid/cholesterol/PEG nanoparticles, and dendrimer nanoparticles.
In some aspects, formulations comprise two or more of the polypeptides (or nucleic acids encoding two or more of the polypeptides) described above or elsewhere herein and at least one excipient.
In some aspects, methods of vaccinating a subject against SARS-CoV-2 infection comprise administering to the subject a composition or a formulation as described above or elsewhere herein. In some such embodiments, said administering is via intramuscular injection or intradermal injection.
In some aspects, methods of selecting an antibody for treating a SARS-CoV-2 infection in a subject comprise determining the presence of one or more mutations at a residue selected from 14-16, 24-26, 63-76, 81, 85-89, 136-146, 150-165, 167-169, 210-212, 214, 216 and 241-252 in a SARS-CoV-2 spike protein from the subject; and selecting an antibody that does not bind to the N-terminal domain antigenic supersite of the SARS-CoV-2 spike protein.
In some aspects, methods of making an antibody comprise using a polypeptide described above or elsewhere herein as the target antigen.
In some aspects, methods of selecting a convalescent plasma against SARS-CoV-2 infection in a subject comprise determining the presence of one or more mutations at a residue selected from 14-16, 24-26, 63-76, 81, 85-89, 136-146, 150-165, 167-169, 210-212, 214, 216 and 241-252 in a SARS-CoV-2 spike protein from the subject; and selecting a convalescent plasma having antibodies that do not bind to the N-terminal domain antigenic supersite of the SARS-CoV-2 spike protein.
In some aspects, methods of selecting a vaccine against SARS-CoV-2 infection in a subject comprise determining the presence of one or more mutations at a residue selected from 14-16, 24-26, 63-76, 81, 85-89, 136-146, 150-165, 167-169, 210-212, 214, 216 and 241-252 in a spike protein from an emerging variant of SARS-CoV-2; and selecting a vaccine having a polypeptide described above or elsewhere herein.
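The determining step shared by the selection methods above can be illustrated with a minimal Python sketch. It assumes the spike sequence under study has already been aligned to SEQ ID NO: 1 by any suitable alignment tool, with "-" marking gaps. The function names, the gap convention, and the toy sequences are illustrative assumptions and are not part of the disclosure.

```python
# Residue ranges recited above (1-based, inclusive, numbered per SEQ ID NO: 1).
RECITED_RANGES = [
    (14, 16), (24, 26), (63, 76), (81, 81), (85, 89), (136, 146),
    (150, 165), (167, 169), (210, 212), (214, 214), (216, 216), (241, 252),
]


def deleted_reference_positions(ref_aligned, query_aligned):
    """Return 1-based reference positions that are gapped ('-') in the query."""
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned sequences must have equal length")
    deleted = []
    ref_pos = 0
    for ref_char, query_char in zip(ref_aligned, query_aligned):
        if ref_char == "-":
            # Insertion in the query: no reference residue at this column.
            continue
        ref_pos += 1
        if query_char == "-":
            deleted.append(ref_pos)
    return deleted


def has_recited_deletion(deleted_positions, ranges=RECITED_RANGES):
    """True if any deleted position falls inside one of the recited ranges."""
    return any(lo <= p <= hi for p in deleted_positions for lo, hi in ranges)


# Hypothetical 20-residue toy alignment (not SEQ ID NO: 1).
toy_ref = "ACDEFGHIKLMNPQRSTVWY"
toy_query = "ACDEFGHIKLMNPQ--TVWY"
toy_deleted = deleted_reference_positions(toy_ref, toy_query)
toy_flag = has_recited_deletion(toy_deleted)
```

In this sketch a positive flag would simply be the trigger for the selecting step; how the antibody, plasma, or vaccine is then chosen is outside what the code models.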
Further embodiments and details for each of these aspects are presented throughout the disclosure.
The present disclosure is based, at least in part, on the discovery that deletions in the Spike protein NTD that map to an antigenic supersite have emerged over the course of the pandemic, are strongly associated with case surges, and are present in a subset of vaccine breakthrough variants.
In accord with this discovery and further findings, in some aspects, compositions for use as a vaccine against SARS-CoV-2 infection are disclosed, which comprise either a polypeptide that comprises at least one surge-associated mutation (e.g., deletion) in its amino acid sequence with respect to SEQ ID NO: 1 (e.g., at its NTD) or a nucleic acid (e.g., mRNA) that encodes said polypeptide. The mutation is preferably a deletion of any one residue or more than one residue or a range of contiguous residues selected from 14-16, 24-26, 63-76, 81, 85-89, 136-146, 150-165, 167-169, 210-212, 214, 216 and 241-252 with respect to SEQ ID NO: 1.
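As a minimal sketch of what a deletion "with respect to SEQ ID NO: 1" means at the sequence level, the following Python function removes residues by their 1-based position in a reference sequence. The function name and the short toy sequence are hypothetical; SEQ ID NO: 1 itself is not reproduced here.

```python
def apply_deletions(reference, positions):
    """Return `reference` with the given 1-based residue positions removed.

    Positions always refer to the numbering of the unmodified reference,
    so earlier deletions do not shift later ones.
    """
    to_remove = set(positions)
    out_of_range = sorted(p for p in to_remove if not 1 <= p <= len(reference))
    if out_of_range:
        raise ValueError(f"positions outside the reference: {out_of_range}")
    return "".join(
        residue
        for index, residue in enumerate(reference, start=1)
        if index not in to_remove
    )


# Hypothetical 10-residue toy reference; delete residues 3 and 4.
toy_variant = apply_deletions("ACDEFGHIKL", [3, 4])
```

Keeping all positions in reference numbering mirrors how the residue ranges are recited in the text, where every range is stated relative to the same unmodified sequence.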
Additional aspects include various formulations that include these compositions, antibodies or their antigen-binding fragments directed to these polypeptides, methods of making such antibodies, methods of vaccinating subjects against SARS-CoV-2 infection, and methods of selecting an antibody, convalescent plasma, or vaccine against SARS-CoV-2 infection.
In certain preferred embodiments, the compositions described herein are injectable compositions with one or more excipients and no other pathogens or biological materials.
In certain preferred embodiments, two or more of the disclosed polypeptides (or nucleic acids encoding them) can be combined, or the disclosed mutations can be combined in a multi-antigen polypeptide, for use as a multi-prong vaccine.
As used in the description, the words “a” and “an” can mean one or more than one. As used in the claims in conjunction with the word “comprising,” the words “a” and “an” can mean one or more than one. As used in the description, “another” can mean at least a second or more.
A “formulation” refers to a mixture of one or more of the polypeptides or nucleic acids described herein, or pharmaceutically acceptable salts or hydrates thereof, with other chemical components, such as physiologically acceptable carriers and excipients. The purpose of a formulation is to facilitate administration to an organism.
The term “pharmaceutically acceptable salt” includes salts derived from inorganic or organic acids or bases, including, for example, hydrochloric, hydrobromic, sulfuric, nitric, perchloric, phosphoric, formic, acetic, lactic, maleic, fumaric, succinic, tartaric, glycolic, salicylic, citric, methanesulfonic, benzenesulfonic, benzoic, malonic, trifluoroacetic, trichloroacetic, naphthalene-2-sulfonic and other acids; or salts with metals such as sodium, potassium, lithium, calcium, magnesium, and aluminum.
As used herein and as well understood in the art, “treatment” is an approach for obtaining beneficial or desired results, including clinical results. Beneficial or desired clinical results may include, but are not limited to, alleviation or amelioration of one or more symptoms or conditions, diminution of extent of disease, a stabilized (i.e., not worsening) state of disease, preventing spread of disease, delay or slowing of disease progression, amelioration or palliation of the disease state and remission (whether partial or total), whether detectable or undetectable. “Treatment” can also mean prolonging survival as compared to expected survival if not receiving treatment.
As used herein, a therapeutic that “prevents” a disorder or condition refers to an agent (e.g., compound) that, in a statistical sample, reduces the occurrence of the disorder or condition in the treated sample relative to an untreated control sample, or delays the onset or reduces the severity of one or more symptoms of the disorder or condition relative to the untreated control sample.
The term “carrier” refers to a diluent, adjuvant, excipient, or vehicle with which a compound is administered. Non-limiting examples of such pharmaceutical carriers include liquids, such as water and oils, including those of petroleum, animal, vegetable or synthetic origin, such as peanut oil, soybean oil, mineral oil, sesame oil and the like. The pharmaceutical carriers may also be saline, gum acacia, gelatin, starch paste, talc, keratin, colloidal silica, urea, and the like. In addition, auxiliary, stabilizing, thickening, lubricating and coloring agents may be used. Other examples of suitable pharmaceutical carriers are described in “Remington's Pharmaceutical Sciences” by E. W. Martin.
The terms “animal”, “subject”, and “patient” as used herein include all members of the animal kingdom including, but not limited to, birds, mammals, animals (e.g., cats, dogs, horses, and swine) and humans.
In some descriptions, reference is made to SEQ ID NO: 1, which is provided below.
In some aspects, compositions for use as a vaccine against SARS-CoV-2 infection comprise either a polypeptide that comprises at least one surge-associated mutation in its amino acid sequence with respect to SEQ ID NO: 1 or a nucleic acid that encodes said polypeptide.
SEQ ID NO: 1 is representative of the spike protein of SARS-CoV-2. Its N-terminal domain (NTD), according to the corresponding UniProt entry, is comprised of residues 13-303 of SEQ ID NO: 1. In some embodiments, the referenced mutations are in the NTD.
The mutations in some preferred embodiments are deletions. The deletions can be at any one residue, or at more than one residue (contiguous or not), which can be selected from the following set of residues: 14-16, 24-26, 63-76, 81, 85-89, 136-146, 150-165, 167-169, 210-212, 214, 216 and 241-252 with respect to SEQ ID NO: 1. In some embodiments, these selected mutations result in a spike protein with an altered NTD that does not bind to some antibodies that bind the NTD of SEQ ID NO: 1; therefore, use of such polypeptides allows creating a vaccine against emerging strains against which current therapies are not effective. In some embodiments, the polypeptide has additional mutations, such as K986P and V987P, and/or E484K, N501Y, D614G, P681H, and/or P681R.
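The recited residue set and the notion of contiguous versus separate stretches can be made concrete with a short Python sketch. The helper names are hypothetical, the range string is retyped from the list above with uniform commas, and the example input positions are arbitrary illustrations rather than data from the disclosure.

```python
RECITED = ("14-16, 24-26, 63-76, 81, 85-89, 136-146, 150-165, "
           "167-169, 210-212, 214, 216, 241-252")


def parse_residue_set(text):
    """Expand a string such as '14-16, 81' into a set of residue numbers."""
    residues = set()
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue
        start, _, end = token.partition("-")
        first = int(start)
        last = int(end) if end else first
        residues.update(range(first, last + 1))
    return residues


def contiguous_stretches(positions):
    """Group positions into sorted (first, last) runs of adjacent residues."""
    stretches = []
    for p in sorted(set(positions)):
        if stretches and p == stretches[-1][1] + 1:
            stretches[-1][1] = p          # extend the current run
        else:
            stretches.append([p, p])      # start a new run
    return [tuple(s) for s in stretches]


recited_set = parse_residue_set(RECITED)
example_positions = [70, 69, 144]         # arbitrary illustrative input
example_stretches = contiguous_stretches(example_positions)
all_recited = all(p in recited_set for p in example_positions)
```

Under these assumptions the recited ranges expand to 73 distinct residue positions, and the example input resolves to two separate contiguous stretches.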
In certain aspects, compositions include more than one polypeptide with a different set of mutations. Additionally, in some aspects, antibodies or antigen-binding fragments bind to a polypeptide described herein.
In certain embodiments, the compositions comprise a nucleic acid, such as an mRNA, that encodes the polypeptide. The mRNA, in some embodiments, has features that enable its successful use as a vaccine, such as a 5′ cap, 5′-untranslated region, a 3′-untranslated region, and a poly(A) tail. The mRNA, in some embodiments, comprises at least one non-canonical nucleobase (e.g., to improve its stability).
An mRNA comprising one or more non-canonical nucleosides or nucleotides, for example, is called a “modified” RNA to describe the presence of one or more non-naturally and/or naturally occurring components or configurations that are used instead of or in addition to the canonical A, G, C, and U residues.
Modified nucleosides and nucleotides can include one or more of: (i) alteration, e.g., replacement, of one or both of the non-linking phosphate oxygens and/or of one or more of the linking phosphate oxygens in the phosphodiester backbone linkage (an exemplary backbone modification); (ii) alteration, e.g., replacement, of a constituent of the ribose sugar, e.g., of the 2′ hydroxyl on the ribose sugar (an exemplary sugar modification); (iii) wholesale replacement of the phosphate moiety with “dephospho” linkers (an exemplary backbone modification); (iv) modification or replacement of a naturally occurring nucleobase, including with a non-canonical nucleobase (an exemplary base modification); (v) replacement or modification of the ribose-phosphate backbone (an exemplary backbone modification); (vi) modification of the 3′ end or 5′ end of the oligonucleotide, e.g., removal, modification or replacement of a terminal phosphate group or conjugation of a moiety, cap or linker (such 3′ or 5′ cap modifications may comprise a sugar and/or backbone modification); and (vii) modification or replacement of the sugar (an exemplary sugar modification). Certain embodiments comprise a 5′ end modification to an mRNA. Certain embodiments comprise a 3′ end modification to an mRNA. A modified RNA can contain 5′ end and 3′ end modifications. A modified RNA can contain one or more modified residues at non-terminal locations. In certain embodiments, an mRNA includes at least one modified residue.
In some embodiments, the mRNA comprises SEQ ID NO: 2, which is provided below (with “T” shown instead of “U”).
In some embodiments, the mRNA comprises a part of SEQ ID NO: 2, or a modified (e.g., codon-optimized) version of SEQ ID NO: 2 with the requisite mutations to encode the described polypeptides. In some embodiments, the modified version of the mRNA includes one or more (e.g., plurality, all) modified uridines. “Modified uridine” is used herein to refer to a nucleoside other than thymidine with the same hydrogen bond acceptors as uridine and one or more structural differences from uridine. In some embodiments, a modified uridine is a substituted uridine, i.e., a uridine in which one or more non-proton substituents (e.g., alkoxy, such as methoxy) takes the place of a proton. In some embodiments, a modified uridine is pseudouridine. In some embodiments, a modified uridine is a substituted pseudouridine, e.g., a pseudouridine in which one or more non-proton substituents (e.g., alkyl, such as methyl) takes the place of a proton. In some embodiments, a modified uridine is any of a substituted uridine, pseudouridine, or a substituted pseudouridine.
In some embodiments, the mRNA comprises at least one UTR from an expressed mammalian mRNA, such as a constitutively expressed mRNA. An mRNA is considered constitutively expressed in a mammal if it is continually transcribed in at least one tissue of a healthy adult mammal. In some embodiments, the mRNA comprises a 5′ UTR, 3′ UTR, or 5′ and 3′ UTRs from an expressed mammalian RNA, such as a constitutively expressed mammalian mRNA. Actin mRNA is an example of a constitutively expressed mRNA.
In some embodiments, the mRNA comprises at least one UTR from Hydroxysteroid 17-Beta Dehydrogenase 4 (HSD17B4 or HSD), e.g., a 5′ UTR from HSD. In some embodiments, the mRNA comprises at least one UTR from a globin mRNA, for example, human alpha globin (HBA) mRNA, human beta globin (HBB) mRNA, or Xenopus laevis beta globin (XBG) mRNA. In some embodiments, the mRNA comprises a 5′ UTR, 3′ UTR, or 5′ and 3′ UTRs from a globin mRNA, such as HBA, HBB, or XBG. In some embodiments, the mRNA comprises a 5′ UTR from bovine growth hormone, cytomegalovirus (CMV), mouse Hba-a1, HSD, an albumin gene, HBA, HBB, or XBG. In some embodiments, the mRNA comprises a 3′ UTR from bovine growth hormone, cytomegalovirus, mouse Hba-a1, HSD, an albumin gene, HBA, HBB, or XBG. In some embodiments, the mRNA comprises 5′ and 3′ UTRs from bovine growth hormone, cytomegalovirus, mouse Hba-a1, HSD, an albumin gene, HBA, HBB, XBG, heat shock protein 90 (Hsp90), glyceraldehyde 3-phosphate dehydrogenase (GAPDH), beta-actin, alpha-tubulin, tumor protein (p53), or epidermal growth factor receptor (EGFR).
In some embodiments, the mRNA comprises 5′ and 3′ UTRs that are from the same source, e.g., a constitutively expressed mRNA such as actin, albumin, or a globin such as HBA, HBB, or XBG.
In some embodiments, the mRNA does not comprise a 5′ UTR, e.g., there are no additional nucleotides between the 5′ cap and the start codon. In some embodiments, the mRNA comprises a Kozak sequence between the 5′ cap and the start codon, but does not have any additional 5′ UTR. In some embodiments, the mRNA does not comprise a 3′ UTR, e.g., there are no additional nucleotides between the stop codon and the poly-A tail.
In some embodiments, the mRNA comprises a Kozak sequence. The Kozak sequence can affect translation initiation and the overall yield of a polypeptide translated from an mRNA. A Kozak sequence includes a methionine codon that can function as the start codon. A minimal Kozak sequence is NNNRUGN wherein at least one of the following is true: the first N is A or G and the last N is G. In the context of a nucleotide sequence, R means a purine (A or G). In some embodiments, the Kozak sequence is RNNRUGN, NNNRUGG, RNNRUGG, RNNAUGN, NNNAUGG, or RNNAUGG. In some embodiments, the Kozak sequence is rccRUGg with zero mismatches or with up to one or two mismatches to positions in lowercase. In some embodiments, the Kozak sequence is rccAUGg with zero mismatches or with up to one or two mismatches to positions in lowercase. In some embodiments, the Kozak sequence is gccRccAUGG (SEQ ID NO: 3) with zero mismatches or with up to one, two, or three mismatches to positions in lowercase. In some embodiments, the Kozak sequence is gccAccAUG (SEQ ID NO: 4) with zero mismatches or with up to one, two, three, or four mismatches to positions in lowercase. In some embodiments, the Kozak sequence is GCCACCAUG. In some embodiments, the Kozak sequence is gccgccRccAUGG with zero mismatches or with up to one, two, three, or four mismatches to positions in lowercase.
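The Kozak patterns above can be checked mechanically. The following Python sketch is offered under two stated assumptions: the "G" condition of the minimal pattern is read as applying to the position immediately after the UG (consistent with the listed variant NNNRUGG), and a lowercase template letter marks a consensus position where a mismatch is tolerated while an uppercase letter must match. All function names are hypothetical.

```python
import re

# Subset of IUPAC nucleotide codes needed for the templates in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U", "R": "AG", "N": "ACGU"}

_MINIMAL = re.compile(r"^[ACGU]{3}[AG]UG[ACGU]$")


def _as_rna(seq):
    """Upper-case the sequence and read any T as U."""
    return seq.upper().replace("T", "U")


def is_minimal_kozak(seq):
    """Test a 7-mer against NNNRUGN plus the purine-first or G-last condition."""
    s = _as_rna(seq)
    if not _MINIMAL.match(s):
        return False
    return s[0] in "AG" or s[6] == "G"


def lowercase_mismatches(template, seq):
    """Count mismatches at lowercase template positions.

    Returns None if the lengths differ or an uppercase position mismatches.
    """
    s = _as_rna(seq)
    if len(s) != len(template):
        return None
    mismatches = 0
    for t, base in zip(template, s):
        if base in IUPAC[t.upper()]:
            continue
        if t.islower():
            mismatches += 1
        else:
            return None
    return mismatches
```

For example, `lowercase_mismatches("rccAUGg", candidate)` can be compared against the permitted number of mismatches for that embodiment; the threshold itself is left to the caller because it differs between the embodiments listed.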
In some embodiments, an mRNA disclosed herein comprises a 5′ cap, such as a Cap0, Cap1, or Cap2. A 5′ cap is generally a 7-methylguanine ribonucleotide (which may be further modified, as discussed below e.g. with respect to ARCA) linked through a 5′-triphosphate to the 5′ position of the first nucleotide of the 5′-to-3′ chain of the mRNA, i.e., the first cap-proximal nucleotide. In Cap0, the riboses of the first and second cap-proximal nucleotides of the mRNA both comprise a 2′-hydroxyl. In Cap1, the riboses of the first and second transcribed nucleotides of the mRNA comprise a 2′-methoxy and a 2′-hydroxyl, respectively. In Cap2, the riboses of the first and second cap-proximal nucleotides of the mRNA both comprise a 2′-methoxy. See, e.g., Katibah et al. (2014) Proc Natl Acad Sci USA 111(33): 12025-30; Abbas et al. (2017) Proc Natl Acad Sci USA 114(11):E2106-E2115. Most endogenous higher eukaryotic mRNAs, including mammalian mRNAs such as human mRNAs, comprise Cap1 or Cap2. Cap0 and other cap structures differing from Cap1 and Cap2 may be immunogenic in mammals, such as humans, due to recognition as “non-self” by components of the innate immune system such as IFIT-1 and IFIT-5, which can result in elevated cytokine levels including type I interferon. Components of the innate immune system such as IFIT-1 and IFIT-5 may also compete with eIF4E for binding of an mRNA with a cap other than Cap1 or Cap2, potentially inhibiting translation of the mRNA.
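The Cap0, Cap1, and Cap2 definitions above reduce to the 2′ substituents on the first two cap-proximal riboses, which the following minimal Python sketch encodes. It assumes a 7-methylguanine cap is already present and models only the 2′-hydroxyl versus 2′-methoxy distinction; the function name and the "other" label for the combination not defined in the text are illustrative assumptions.

```python
def classify_cap(first_is_2p_methoxy, second_is_2p_methoxy):
    """Classify a capped mRNA by the 2' groups of its first two nucleotides.

    Each argument is True for a 2'-methoxy ribose and False for a
    2'-hydroxyl ribose.
    """
    if not first_is_2p_methoxy and not second_is_2p_methoxy:
        return "Cap0"
    if first_is_2p_methoxy and not second_is_2p_methoxy:
        return "Cap1"
    if first_is_2p_methoxy and second_is_2p_methoxy:
        return "Cap2"
    # 2'-hydroxyl on the first and 2'-methoxy on the second nucleotide
    # is not one of the three structures defined in the text.
    return "other"
```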
A cap can be included co-transcriptionally. For example, ARCA (anti-reverse cap analog; Thermo Fisher Scientific Cat. No. AM8045) is a cap analog comprising a 7-methylguanine 3′-methoxy-5′-triphosphate linked to the 5′ position of a guanine ribonucleotide which can be incorporated in vitro into a transcript at initiation. ARCA results in a Cap0 cap in which the 2′ position of the first cap-proximal nucleotide is hydroxyl. See, e.g., Stepinski et al., (2001) “Synthesis and properties of mRNAs containing the novel ‘anti-reverse’ cap analogs 7-methyl(3′-O-methyl)GpppG and 7-methyl(3′deoxy)GpppG,” RNA 7: 1486-1495.
Alternatively, a cap can be added to an RNA post-transcriptionally. For example, Vaccinia capping enzyme is commercially available (New England Biolabs Cat. No. M2080S) and has RNA triphosphatase and guanylyltransferase activities, provided by its D1 subunit, and guanine methyltransferase, provided by its D12 subunit. As such, it can add a 7-methylguanine to an RNA, so as to give Cap0, in the presence of S-adenosyl methionine and GTP. See, e.g., Guo, P. and Moss, B. (1990) Proc. Natl. Acad. Sci. USA 87, 4023-4027; Mao, X. and Shuman, S. (1994) J. Biol. Chem. 269, 24472-24479.
In some embodiments, the mRNA further comprises a poly-adenylated (poly-A) tail. In some embodiments, the poly-A tail comprises at least 20, 30, 40, 50, 60, 70, 80, 90, or 100 adenines, optionally up to 300 adenines (SEQ ID NO: 5). In some embodiments, the poly-A tail comprises 95, 96, 97, 98, 99, or 100 adenine nucleotides (SEQ ID NO: 6). In some instances, the poly-A tail is “interrupted” with one or more non-adenine nucleotide “anchors” at one or more locations within the poly-A tail. The poly-A tails may comprise at least 8 consecutive adenine nucleotides, but also comprise one or more non-adenine nucleotides. As used herein, “non-adenine nucleotides” refer to any natural or non-natural nucleotides that do not comprise adenine. Guanine, thymine, and cytosine nucleotides are exemplary non-adenine nucleotides.
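The poly-A tail features described above (total adenine count, longest uninterrupted run, and presence of non-adenine anchors) can be summarized with a small Python sketch. The function name, the returned keys, and the choice to call a tail "interrupted" only when it has at least one non-adenine nucleotide together with a run of at least 8 adenines are illustrative assumptions drawn from the wording of this paragraph.

```python
import re


def describe_poly_a_tail(tail, min_run=8):
    """Summarize a candidate poly-A tail given as a nucleotide string."""
    s = tail.upper()
    runs = [len(m.group()) for m in re.finditer(r"A+", s)]
    longest = max(runs, default=0)
    non_adenine = sum(1 for base in s if base != "A")
    return {
        "length": len(s),
        "adenines": len(s) - non_adenine,
        "longest_a_run": longest,
        "non_adenine": non_adenine,
        "interrupted": non_adenine > 0 and longest >= min_run,
    }


# Hypothetical tail: two runs of 12 adenines separated by one guanine anchor.
example_tail = describe_poly_a_tail("A" * 12 + "G" + "A" * 12)
```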
In some embodiments, the mRNA is purified. In some embodiments, the mRNA is purified using a precipitation method (e.g., LiCl precipitation, alcohol precipitation, or an equivalent method, e.g., as described herein). In some embodiments, the mRNA is purified using a chromatography-based method, such as an HPLC-based method or an equivalent method (e.g., as described herein). In some embodiments, the mRNA is purified using both a precipitation method (e.g., LiCl precipitation) and an HPLC-based method.
In some aspects, formulations comprise the polypeptides or nucleic acids described herein. The formulations, in some embodiments, further comprise at least one excipient. In some embodiments, the formulations further comprise a delivery system (e.g., selected from protamine, protamine liposome, polysaccharide particles, cationic nanoemulsion, cationic polymer, cationic polymer liposome, cationic lipid nanoparticle, cationic lipid/cholesterol nanoparticles, cationic lipid/cholesterol/PEG nanoparticles, and dendrimer nanoparticles). Various details of such formulations can be found in Pardi et al., mRNA vaccines—a new era in vaccinology, Nature Reviews—Drug Discovery 17: 261-279 (2018). The formulations, in some aspects, include a disclosed polypeptide and an adjuvant or a disclosed nucleic acid as part of a vector or transfection system.
For facilitating delivery of nucleic acids, such as mRNAs, certain lipid formulations can be used, as described further below.
In some embodiments, the lipid formulations, mRNA modifications, and other features of the formulations are as described in the following patents: U.S. Pat. Nos. 10,703,789; 10,702,600; 10,577,403; 10,442,756; 10,266,485; 10,064,959; 9,868,692, each of which is incorporated by reference in its entirety. In some embodiments, the formulations comprise lipids (SM-102, polyethylene glycol [PEG] 2000 dimyristoyl glycerol [DMG], cholesterol, and 1,2-distearoyl-sn-glycero-3-phosphocholine [DSPC]), tromethamine, tromethamine hydrochloride, acetic acid, sodium acetate trihydrate, and/or sucrose.
Disclosed herein are various embodiments of LNP formulations for biologically active agents, such as RNAs. Such LNP formulations include an “amine lipid” or a “biodegradable lipid”, optionally along with one or more of a helper lipid, a neutral lipid, and a stealth lipid such as a PEG lipid. By “lipid nanoparticle” is meant a particle that comprises a plurality of (i.e. more than one) lipid molecules physically associated with each other by intermolecular forces.
In certain embodiments, LNP compositions for the delivery of biologically active agents comprise an “amine lipid”, which is defined as Lipid A or its equivalents, including acetal analogs of Lipid A.
In some embodiments, the amine lipid is Lipid A, which is (9Z,12Z)-3-((4,4-bis(octyloxy)butanoyl)oxy)-2-((((3-(diethylamino)propoxy)carbonyl)oxy)methyl)propyl octadeca-9,12-dienoate, also called 3-((4,4-bis(octyloxy)butanoyl)oxy)-2-((((3-(diethylamino)propoxy)carbonyl)oxy)methyl)propyl (9Z,12Z)-octadeca-9,12-dienoate.
Lipid A may be synthesized according to WO2015/095340 (e.g., pp. 84-86). In certain embodiments, the amine lipid is an equivalent to Lipid A.
In certain embodiments, an amine lipid is an analog of Lipid A. In certain embodiments, a Lipid A analog is an acetal analog of Lipid A. In particular LNP compositions, the acetal analog is a C4-C12 acetal analog. In some embodiments, the acetal analog is a C5-C12 acetal analog. In additional embodiments, the acetal analog is a C5-C10 acetal analog. In further embodiments, the acetal analog is chosen from a C4, C5, C6, C7, C9, C10, C11, and C12 acetal analog.
Amine lipids and other “biodegradable lipids” suitable for use in the LNPs described herein are biodegradable in vivo. The amine lipids have low toxicity (e.g., are tolerated in animal models without adverse effect in amounts of greater than or equal to 10 mg/kg). In certain embodiments, LNPs comprising an amine lipid include those where at least 75% of the amine lipid is cleared from the plasma within 8, 10, 12, 24, or 48 hours, or 3, 4, 5, 6, 7, or 10 days. In certain embodiments, LNPs comprising an amine lipid include those where at least 50% of the mRNA is cleared from the plasma within 8, 10, 12, 24, or 48 hours, or 3, 4, 5, 6, 7, or 10 days. In certain embodiments, LNPs comprising an amine lipid include those where at least 50% of the LNP is cleared from the plasma within 8, 10, 12, 24, or 48 hours, or 3, 4, 5, 6, 7, or 10 days, for example by measuring a lipid (e.g. an amine lipid), RNA (e.g. mRNA), or other component. In certain embodiments, lipid-encapsulated versus free lipid, RNA, or nucleic acid component of the LNP is measured.
Biodegradable lipids include, for example the biodegradable lipids of WO/2017/173054, WO2015/095340, and WO2014/136086. Lipid clearance may be measured as described in literature. See Maier, M. A., et al. Biodegradable Lipids Enabling Rapidly Eliminated Lipid Nanoparticles for Systemic Delivery of RNAi Therapeutics. Mol. Ther. 2013, 21(8), 1570-78 (“Maier”).
Lipids may be ionizable depending upon the pH of the medium they are in. For example, in a slightly acidic medium, the lipid, such as an amine lipid, may be protonated and thus bear a positive charge. Conversely, in a slightly basic medium, such as, for example, blood where pH is approximately 7.35, the lipid, such as an amine lipid, may not be protonated and thus bear no charge.
The ability of a lipid to bear a charge is related to its intrinsic pKa. In some embodiments, the amine lipids of the present disclosure may each, independently, have a pKa in the range of from about 5.1 to about 7.4. In some embodiments, the bioavailable lipids of the present disclosure may each, independently, have a pKa in the range of from about 5.1 to about 7.4. For example, the amine lipids of the present disclosure may each, independently, have a pKa in the range of from about 5.8 to about 6.5. Lipids with a pKa ranging from about 5.1 to about 7.4 are effective for delivery of cargo in vivo, e.g. to the liver. Further, it has been found that lipids with a pKa ranging from about 5.3 to about 6.4 are effective for delivery in vivo, e.g. to tumors. See, e.g., WO2014/136086.
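The relationship between pKa, pH, and charge described above can be illustrated with the Henderson-Hasselbalch equation. The Python sketch below treats the amine as a single ionizable group with one apparent pKa, which is a simplifying assumption (the apparent pKa of a lipid in a nanoparticle can differ from that of the free molecule); the function name is hypothetical.

```python
def fraction_protonated(ph, pka):
    """Fraction of a monoprotic base in its protonated (charged) form.

    From Henderson-Hasselbalch: [BH+] / ([BH+] + [B]) = 1 / (1 + 10**(pH - pKa)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Compare a slightly acidic medium with blood (pH about 7.35) for a lipid
# whose pKa sits at the upper end of the 5.8 to 6.5 range.
acidic = fraction_protonated(5.5, 6.5)
blood = fraction_protonated(7.35, 6.5)
```

With these inputs the protonated fraction is high in the acidic medium and low at blood pH, which is the qualitative behavior the text describes; when pH equals pKa exactly half of the population is protonated.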
“Neutral lipids” suitable for use in a lipid composition of the disclosure include, for example, a variety of neutral, uncharged or zwitterionic lipids. Examples of neutral phospholipids suitable for use in the present disclosure include, but are not limited to, 5-heptadecylbenzene-1,3-diol (resorcinol), dipalmitoylphosphatidylcholine (DPPC), distearoylphosphatidylcholine (DSPC), phosphocholine (DOPC), dimyristoylphosphatidylcholine (DMPC), phosphatidylcholine (PLPC), 1,2-distearoyl-sn-glycero-3-phosphocholine (DAPC), phosphatidylethanolamine (PE), egg phosphatidylcholine (EPC), dilauryloylphosphatidylcholine (DLPC), dimyristoylphosphatidylcholine (DMPC), 1-myristoyl-2-palmitoyl phosphatidylcholine (MPPC), 1-palmitoyl-2-myristoyl phosphatidylcholine (PMPC), 1-palmitoyl-2-stearoyl phosphatidylcholine (PSPC), 1,2-diarachidoyl-sn-glycero-3-phosphocholine (DBPC), 1-stearoyl-2-palmitoyl phosphatidylcholine (SPPC), 1,2-dieicosenoyl-sn-glycero-3-phosphocholine (DEPC), palmitoyloleoyl phosphatidylcholine (POPC), lysophosphatidyl choline, dioleoyl phosphatidylethanolamine (DOPE), dilinoleoylphosphatidylcholine, distearoylphosphatidylethanolamine (DSPE), dimyristoyl phosphatidylethanolamine (DMPE), dipalmitoyl phosphatidylethanolamine (DPPE), palmitoyloleoyl phosphatidylethanolamine (POPE), lysophosphatidylethanolamine and combinations thereof. In one embodiment, the neutral phospholipid may be selected from the group consisting of distearoylphosphatidylcholine (DSPC) and dimyristoyl phosphatidyl ethanolamine (DMPE). In another embodiment, the neutral phospholipid may be distearoylphosphatidylcholine (DSPC).
“Helper lipids” include steroids, sterols, and alkyl resorcinols. Helper lipids suitable for use in the present disclosure include, but are not limited to, cholesterol, 5-heptadecylresorcinol, and cholesterol hemisuccinate. In one embodiment, the helper lipid may be cholesterol. In one embodiment, the helper lipid may be cholesterol hemisuccinate.
“Stealth lipids” are lipids that alter the length of time the nanoparticles can exist in vivo (e.g., in the blood). Stealth lipids may assist in the formulation process by, for example, reducing particle aggregation and controlling particle size. Stealth lipids used herein may modulate pharmacokinetic properties of the LNP. Stealth lipids suitable for use in a lipid composition of the disclosure include, but are not limited to, stealth lipids having a hydrophilic head group linked to a lipid moiety. Stealth lipids suitable for use in a lipid composition of the present disclosure and information about the biochemistry of such lipids can be found in Romberg et al., Pharmaceutical Research, Vol. 25, No. 1, 2008, pg. 55-71 and Hoekstra et al., Biochimica et Biophysica Acta 1660 (2004) 41-52. Additional suitable PEG lipids are disclosed, e.g., in WO 2006/007712.
In one embodiment, the hydrophilic head group of the stealth lipid comprises a polymer moiety selected from polymers based on PEG. Stealth lipids may comprise a lipid moiety. In some embodiments, the stealth lipid is a PEG lipid.
In one embodiment, a stealth lipid comprises a polymer moiety selected from polymers based on PEG (sometimes referred to as poly(ethylene oxide)), poly(oxazoline), poly(vinyl alcohol), poly(glycerol), poly(N-vinylpyrrolidone), polyaminoacids and poly[N-(2-hydroxypropyl)methacrylamide].
In one embodiment, the PEG lipid comprises a polymer moiety based on PEG (sometimes referred to as poly(ethylene oxide)).
The PEG lipid further comprises a lipid moiety. In some embodiments, the lipid moiety may be derived from diacylglycerol or diacylglycamide, including those comprising a dialkylglycerol or dialkylglycamide group having alkyl chain length independently comprising from about C4 to about C40 saturated or unsaturated carbon atoms, wherein the chain may comprise one or more functional groups such as, for example, an amide or ester. In some embodiments, the alkyl chain length comprises about C10 to C20. The dialkylglycerol or dialkylglycamide group can further comprise one or more substituted alkyl groups. The chain lengths may be symmetrical or asymmetric.
Unless otherwise indicated, the term “PEG” as used herein means any polyethylene glycol or other polyalkylene ether polymer. In one embodiment, PEG is an optionally substituted linear or branched polymer of ethylene glycol or ethylene oxide. In one embodiment, PEG is unsubstituted. In one embodiment, the PEG is substituted, e.g., by one or more alkyl, alkoxy, acyl, hydroxy, or aryl groups. In one embodiment, the term includes PEG copolymers such as PEG-polyurethane or PEG-polypropylene (see, e.g., J. Milton Harris, Poly(ethylene glycol) chemistry: biotechnical and biomedical applications (1992)); in another embodiment, the term does not include PEG copolymers. In one embodiment, the PEG has a molecular weight of from about 130 to about 50,000; in a sub-embodiment, about 150 to about 30,000; in a sub-embodiment, about 150 to about 20,000; in a sub-embodiment, about 150 to about 15,000; in a sub-embodiment, about 150 to about 10,000; in a sub-embodiment, about 150 to about 6,000; in a sub-embodiment, about 150 to about 5,000; in a sub-embodiment, about 150 to about 4,000; in a sub-embodiment, about 150 to about 3,000; in a sub-embodiment, about 300 to about 3,000; in a sub-embodiment, about 1,000 to about 3,000; and in a sub-embodiment, about 1,500 to about 2,500.
In any of the embodiments described herein, the PEG lipid may be selected from PEG-dilauroylglycerol, PEG-dimyristoylglycerol (PEG-DMG) (catalog #GM-020 from NOF, Tokyo, Japan), PEG-dipalmitoylglycerol, PEG-distearoylglycerol (PEG-DSPE) (catalog #DSPE-020CN, NOF, Tokyo, Japan), PEG-dilaurylglycamide, PEG-dimyristylglycamide, PEG-dipalmitoylglycamide, PEG-distearoylglycamide, PEG-cholesterol (1-[8′-(Cholest-5-en-3[beta]-oxy)carboxamido-3′,6′-dioxaoctanyl]carbamoyl-[omega]-methyl-poly(ethylene glycol)), PEG-DMB (3,4-ditetradecoxylbenzyl-[omega]-methyl-poly(ethylene glycol)ether), 1,2-dimyristoyl-sn-glycero-3-phosphoethanolamine-N-[methoxy(polyethylene glycol)-2000] (PEG2k-DMG) (cat. #880150P from Avanti Polar Lipids, Alabaster, Ala., USA), 1,2-distearoyl-sn-glycero-3-phosphoethanolamine-N-[methoxy(polyethylene glycol)-2000] (PEG2k-DSPE) (cat. #880120C from Avanti Polar Lipids, Alabaster, Ala., USA), 1,2-distearoyl-sn-glycerol, methoxypolyethylene glycol (PEG2k-DSG; GS-020, NOF Tokyo, Japan), poly(ethylene glycol)-2000-dimethacrylate (PEG2k-DMA), and 1,2-distearyloxypropyl-3-amine-N-[methoxy(polyethylene glycol)-2000] (PEG2k-DSA). In one embodiment, the PEG lipid may be PEG2k-DMG. In some embodiments, the PEG lipid may be PEG2k-DSG. In one embodiment, the PEG lipid may be PEG2k-DSPE. In one embodiment, the PEG lipid may be PEG2k-DMA. In one embodiment, the PEG lipid may be PEG2k-C-DMA. In one embodiment, the PEG lipid may be compound S027, disclosed in WO2016/010840 (paragraphs [00240] to [00244]). In one embodiment, the PEG lipid may be PEG2k-DSA. In one embodiment, the PEG lipid may be PEG2k-C11. In some embodiments, the PEG lipid may be PEG2k-C14. In some embodiments, the PEG lipid may be PEG2k-C16. In some embodiments, the PEG lipid may be PEG2k-C18.
The LNP may contain (i) a biodegradable lipid, (ii) an optional neutral lipid, (iii) a helper lipid, and (iv) a stealth lipid, such as a PEG lipid. The LNP may contain a biodegradable lipid and one or more of a neutral lipid, a helper lipid, and a stealth lipid, such as a PEG lipid.
The LNP may contain (i) an amine lipid for encapsulation and for endosomal escape, (ii) a neutral lipid for stabilization, (iii) a helper lipid, also for stabilization, and (iv) a stealth lipid, such as a PEG lipid. The LNP may contain an amine lipid and one or more of a neutral lipid, a helper lipid, also for stabilization, and a stealth lipid, such as a PEG lipid.
In certain embodiments, lipid compositions are described according to the respective molar ratios of the component lipids in the formulation. Embodiments of the present disclosure provide lipid compositions described according to the respective molar ratios of the component lipids in the formulation. In one embodiment, the mol-% of the amine lipid may be from about 30 mol-% to about 60 mol-%. In one embodiment, the mol-% of the amine lipid may be from about 40 mol-% to about 60 mol-%. In one embodiment, the mol-% of the amine lipid may be from about 45 mol-% to about 60 mol-%. In one embodiment, the mol-% of the amine lipid may be from about 50 mol-% to about 60 mol-%. In one embodiment, the mol-% of the amine lipid may be from about 55 mol-% to about 60 mol-%. In one embodiment, the mol-% of the amine lipid may be from about 50 mol-% to about 55 mol-%. In one embodiment, the mol-% of the amine lipid may be about 50 mol-%. In one embodiment, the mol-% of the amine lipid may be about 55 mol-%. In some embodiments, the amine lipid mol-% of the LNP batch will be ±30%, ±25%, ±20%, ±15%, ±10%, ±5%, or ±2.5% of the target mol-%. In some embodiments, the amine lipid mol-% of the LNP batch will be ±4 mol-%, ±3 mol-%, ±2 mol-%, ±1.5 mol-%, ±1 mol-%, ±0.5 mol-%, or ±0.25 mol-% of the target mol-%. All mol-% numbers are given as a fraction of the lipid component of the LNP compositions. In certain embodiments, LNP inter-lot variability of the amine lipid mol-% will be less than 15%, less than 10% or less than 5%.
In one embodiment, the mol-% of the neutral lipid may be from about 5 mol-% to about 15 mol-%. In one embodiment, the mol-% of the neutral lipid may be from about 7 mol-% to about 12 mol-%. In one embodiment, the mol-% of the neutral lipid may be about 9 mol-%. In some embodiments, the neutral lipid mol-% of the LNP batch will be ±30%, ±25%, ±20%, ±15%, ±10%, ±5%, or ±2.5% of the target neutral lipid mol-%. In certain embodiments, LNP inter-lot variability will be less than 15%, less than 10% or less than 5%.
In one embodiment, the mol-% of the helper lipid may be from about 20 mol-% to about 60 mol-%. In one embodiment, the mol-% of the helper lipid may be from about 25 mol-% to about 55 mol-%. In one embodiment, the mol-% of the helper lipid may be from about 25 mol-% to about 50 mol-%. In one embodiment, the mol-% of the helper lipid may be from about 25 mol-% to about 40 mol-%. In one embodiment, the mol-% of the helper lipid may be from about 30 mol-% to about 50 mol-%. In one embodiment, the mol-% of the helper lipid may be from about 30 mol-% to about 40 mol-%. In one embodiment, the mol-% of the helper lipid is adjusted based on amine lipid, neutral lipid, and PEG lipid concentrations to bring the lipid component to 100 mol-%. In some embodiments, the helper mol-% of the LNP batch will be ±30%, ±25%, ±20%, ±15%, ±10%, ±5%, or ±2.5% of the target mol-%. In certain embodiments, LNP inter-lot variability will be less than 15%, less than 10% or less than 5%.
In one embodiment, the mol-% of the PEG lipid may be from about 1 mol-% to about 10 mol-%. In one embodiment, the mol-% of the PEG lipid may be from about 2 mol-% to about 10 mol-%. In one embodiment, the mol-% of the PEG lipid may be from about 2 mol-% to about 8 mol-%. In one embodiment, the mol-% of the PEG lipid may be from about 2 mol-% to about 4 mol-%. In one embodiment, the mol-% of the PEG lipid may be from about 2.5 mol-% to about 4 mol-%. In one embodiment, the mol-% of the PEG lipid may be about 3 mol-%. In one embodiment, the mol-% of the PEG lipid may be about 2.5 mol-%. In some embodiments, the PEG lipid mol-% of the LNP batch will be ±30%, ±25%, ±20%, ±15%, ±10%, ±5%, or ±2.5% of the target PEG lipid mol-%. In certain embodiments, LNP inter-lot variability will be less than 15%, less than 10% or less than 5%.
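By way of non-limiting illustration, the following sketch shows how the helper lipid mol-% may be computed as the balance that brings the lipid component to 100 mol-%, and how a batch value may be compared against a relative tolerance around its target. The function names and the example targets (50 mol-% amine lipid, 9 mol-% neutral lipid, 3 mol-% PEG lipid) are assumptions chosen from the ranges recited above for illustration only; they are not a prescribed formulation.

```python
def helper_lipid_mol_percent(amine, neutral, peg):
    """Return the helper lipid mol-% that brings the lipid component to 100 mol-%.

    All arguments are mol-% of the lipid component (hypothetical helper function).
    """
    helper = 100.0 - (amine + neutral + peg)
    if helper < 0:
        raise ValueError("amine, neutral, and PEG lipid mol-% exceed 100 mol-%")
    return helper


def within_tolerance(measured, target, rel_tol):
    """True if a batch value lies within +/- rel_tol (a fraction) of the target mol-%."""
    return abs(measured - target) <= rel_tol * target


# Example targets taken from the recited ranges (an assumption for illustration).
helper = helper_lipid_mol_percent(amine=50.0, neutral=9.0, peg=3.0)
print(helper)  # 38.0

# A batch measured at 52 mol-% amine lipid against a 50 mol-% target, +/-5% tolerance.
print(within_tolerance(measured=52.0, target=50.0, rel_tol=0.05))  # True
```

In this sketch the tolerance is treated as relative to the target (e.g., ±5% of 50 mol-% is ±2.5 mol-%), which corresponds to the percentage tolerances recited above rather than the absolute mol-% tolerances.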
In certain embodiments, the cargo includes an mRNA encoding one or more of the disclosed polypeptides. In one embodiment, an LNP composition may comprise a Lipid A or its equivalents. In some aspects, the amine lipid is Lipid A. In some aspects, the amine lipid is a Lipid A equivalent, e.g. an analog of Lipid A. In certain aspects, the amine lipid is an acetal analog of Lipid A. In various embodiments, an LNP composition comprises an amine lipid, a neutral lipid, a helper lipid, and a PEG lipid. In certain embodiments, the helper lipid is cholesterol. In certain embodiments, the neutral lipid is DSPC. In specific embodiments, PEG lipid is PEG2k-DMG. In some embodiments, an LNP composition may comprise a Lipid A, a helper lipid, a neutral lipid, and a PEG lipid. In some embodiments, an LNP composition comprises an amine lipid, DSPC, cholesterol, and a PEG lipid. In some embodiments, the LNP composition comprises a PEG lipid comprising DMG. In certain embodiments, the amine lipid is selected from Lipid A, and an equivalent of Lipid A, including an acetal analog of Lipid A. In additional embodiments, an LNP composition comprises Lipid A, cholesterol, DSPC, and PEG2k-DMG.
Embodiments of the present disclosure also provide lipid compositions described according to the molar ratio between the positively charged amine groups of the amine lipid (N) and the negatively charged phosphate groups (P) of the nucleic acid to be encapsulated. This may be expressed as the ratio N/P. In some embodiments, an LNP composition may comprise a lipid component that comprises an amine lipid, a helper lipid, a neutral lipid, and a PEG lipid; and a nucleic acid component, wherein the N/P ratio is about 3 to 10. In some embodiments, an LNP composition may comprise a lipid component that comprises an amine lipid, a helper lipid, a neutral lipid, and a PEG lipid; and an RNA component, wherein the N/P ratio is about 3 to 10. In one embodiment, the N/P ratio may be about 5-7. In one embodiment, the N/P ratio may be about 4.5-8. In one embodiment, the N/P ratio may be about 6. In one embodiment, the N/P ratio may be 6±1. In one embodiment, the N/P ratio may be about 6±0.5. In some embodiments, the N/P ratio will be ±30%, ±25%, ±20%, ±15%, ±10%, ±5%, or ±2.5% of the target N/P ratio. In certain embodiments, LNP inter-lot variability will be less than 15%, less than 10% or less than 5%.
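As a non-limiting worked example of the N/P ratio, the following sketch estimates the moles of phosphate in a given mass of RNA and the amount of amine lipid needed to reach a target ratio. It assumes one phosphate group per nucleotide, an average nucleotide mass of about 330 g/mol (a common approximation, not a value taken from this disclosure), and one ionizable amine per amine lipid molecule unless stated otherwise. All function names are hypothetical.

```python
AVG_NT_MW = 330.0  # g/mol per nucleotide; assumed approximation


def phosphate_nmol(rna_ug, nt_mw=AVG_NT_MW):
    """nmol of phosphate in rna_ug micrograms of RNA (one phosphate per nucleotide)."""
    # ug / (g/mol) = umol; multiply by 1000 to convert to nmol
    return rna_ug * 1000.0 / nt_mw


def np_ratio(amine_lipid_nmol, rna_ug, amines_per_lipid=1):
    """Molar ratio of amine groups (N) to phosphate groups (P)."""
    return amine_lipid_nmol * amines_per_lipid / phosphate_nmol(rna_ug)


def amine_lipid_nmol_for_target(rna_ug, target_np, amines_per_lipid=1):
    """nmol of amine lipid needed to reach target_np for rna_ug of RNA."""
    return target_np * phosphate_nmol(rna_ug) / amines_per_lipid


# Example: 100 ug of RNA formulated at a target N/P ratio of 6.
needed = amine_lipid_nmol_for_target(rna_ug=100.0, target_np=6.0)
print(round(needed, 1))                   # 1818.2 nmol of amine lipid
print(round(np_ratio(needed, 100.0), 6))  # 6.0
```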
In some embodiments, LNPs are formed by mixing an aqueous RNA solution with an organic solvent-based lipid solution, e.g., 100% ethanol. Suitable solutions or solvents include or may contain: water, PBS, Tris buffer, NaCl, citrate buffer, ethanol, chloroform, diethylether, cyclohexane, tetrahydrofuran, methanol, isopropanol. A pharmaceutically acceptable buffer, e.g., for in vivo administration of LNPs, may be used. In certain embodiments, a buffer is used to maintain the pH of the composition comprising LNPs at or above pH 6.5. In certain embodiments, a buffer is used to maintain the pH of the composition comprising LNPs at or above pH 7.0. In certain embodiments, the composition has a pH ranging from about 7.2 to about 7.7. In additional embodiments, the composition has a pH ranging from about 7.3 to about 7.7 or ranging from about 7.4 to about 7.6. In further embodiments, the composition has a pH of about 7.2, 7.3, 7.4, 7.5, 7.6, or 7.7. The pH of a composition may be measured with a micro pH probe. In certain embodiments, a cryoprotectant is included in the composition. Non-limiting examples of cryoprotectants include sucrose, trehalose, glycerol, DMSO, and ethylene glycol. Exemplary compositions may include up to 10% cryoprotectant, such as, for example, sucrose. In certain embodiments, the LNP composition may include about 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10% cryoprotectant. In certain embodiments, the LNP composition may include about 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10% sucrose. In some embodiments, the LNP composition may include a buffer. In some embodiments, the buffer may comprise a phosphate buffer (PBS), a Tris buffer, a citrate buffer, and mixtures thereof. In certain exemplary embodiments, the buffer comprises NaCl. In certain embodiments, NaCl is omitted. Exemplary amounts of NaCl may range from about 20 mM to about 45 mM. Exemplary amounts of NaCl may range from about 40 mM to about 50 mM. In some embodiments, the amount of NaCl is about 45 mM. 
In some embodiments, the buffer is a Tris buffer. Exemplary amounts of Tris may range from about 20 mM to about 60 mM. Exemplary amounts of Tris may range from about 40 mM to about 60 mM. In some embodiments, the amount of Tris is about 50 mM. In some embodiments, the buffer comprises NaCl and Tris. Certain exemplary embodiments of the LNP compositions contain 5% sucrose and 45 mM NaCl in Tris buffer. In other exemplary embodiments, compositions contain sucrose in an amount of about 5% w/v, about 45 mM NaCl, and about 50 mM Tris at pH 7.5. The salt, buffer, and cryoprotectant amounts may be varied such that the osmolality of the overall formulation is maintained. For example, the final osmolality may be maintained at less than 450 mOsm/L. In further embodiments, the osmolality is between 250 and 350 mOsm/L. Certain embodiments have a final osmolality of 300±20 mOsm/L.
In some embodiments, microfluidic mixing, T-mixing, or cross-mixing is used. In certain aspects, flow rates, junction size, junction geometry, junction shape, tube diameter, solutions, and/or RNA and lipid concentrations may be varied. LNPs or LNP compositions may be concentrated or purified, e.g., via dialysis, tangential flow filtration, or chromatography. The LNPs may be stored as a suspension, an emulsion, or a lyophilized powder, for example. In some embodiments, an LNP composition is stored at 2-8° C. In certain aspects, the LNP compositions are stored at room temperature. In additional embodiments, an LNP composition is stored frozen, for example at −20° C. or −80° C. In other embodiments, an LNP composition is stored at a temperature ranging from about 0° C. to about −80° C. Frozen LNP compositions may be thawed before use, for example on ice, at 4° C., at room temperature, or at 25° C. Frozen LNP compositions may be maintained at various temperatures, for example on ice, at 4° C., at room temperature, at 25° C., or at 37° C.
In some aspects, methods are disclosed that use the described polypeptides, nucleic acids, compositions, or formulations.
For example, methods of vaccinating a subject against SARS-CoV-2 infection comprise administering to the subject a composition or a formulation according to any of the described embodiments. The administering step, in some embodiments, is via intramuscular injection or intradermal injection.
In some aspects, methods of selecting an antibody for treating a SARS-CoV-2 infection in a subject comprise determining the presence of one or more mutations at a residue selected from 14-16, 24-26, 63-76, 81, 85-89, 136-146, 150-165, 167-169, 210-212, 214, 216 and 241-252 in a SARS-CoV-2 spike protein from the subject; and selecting an antibody that does not bind to the N-terminal domain antigenic supersite of the SARS-CoV-2 spike protein.
In some aspects, methods of selecting a convalescent plasma against SARS-CoV-2 infection in a subject comprise determining the presence of one or more mutations at a residue selected from 14-16, 24-26, 63-76, 81, 85-89, 136-146, 150-165, 167-169, 210-212, 214, 216 and 241-252 in a SARS-CoV-2 spike protein from the subject; and selecting a convalescent plasma having antibodies that do not bind to the N-terminal domain antigenic supersite of the SARS-CoV-2 spike protein.
In some aspects, methods of selecting a vaccine against SARS-CoV-2 infection in a subject comprise determining the presence of one or more mutations at a residue selected from 14-16, 24-26, 63-76, 81, 85-89, 136-146, 150-165, 167-169, 210-212, 214, 216 and 241-252 in a spike protein from an emerging variant of SARS-CoV-2; and selecting a vaccine having a polypeptide according to any of the embodiments described herein.
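The determining step recited in the foregoing methods may be illustrated, without limitation, by the following sketch, which parses the recited residue list and reports which positions of a mutation fall within it. The constant and function names are hypothetical, and the representation of a deletion as a range of Spike residue positions is an assumption made for illustration.

```python
# Residue list as recited in the methods above.
LISTED_RESIDUES = (
    "14-16, 24-26, 63-76, 81, 85-89, 136-146, 150-165, "
    "167-169, 210-212, 214, 216, 241-252"
)


def parse_ranges(spec):
    """Expand a comma-separated list of residues and inclusive ranges into a set."""
    residues = set()
    for token in spec.split(","):
        token = token.strip()
        if "-" in token:
            start, end = (int(part) for part in token.split("-"))
            residues.update(range(start, end + 1))
        else:
            residues.add(int(token))
    return residues


def listed_positions(mutated_positions, residues):
    """Return the mutated positions that fall within the listed residues."""
    return sorted(set(mutated_positions) & residues)


residues = parse_ranges(LISTED_RESIDUES)

# A deletion spanning residues 246-252 lies entirely within the list.
print(listed_positions(range(246, 253), residues))  # [246, 247, 248, 249, 250, 251, 252]

# A deletion spanning residues 167-174 overlaps the list at 167-169 only.
print(listed_positions(range(167, 175), residues))  # [167, 168, 169]

# A substitution at residue 484 is outside the list.
print(listed_positions([484], residues))  # []
```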
The raging COVID-19 pandemic in India, combined with cases of re-infection and post-vaccination “breakthrough” infection globally, has raised alarm, mandating characterization of the immuno-evasive features of SARS-CoV-2. Here, we systematically analyzed over 1.3 million SARS-CoV-2 genomes from 178 countries and also conducted whole-genome viral sequencing from 53 patients at Mayo Clinic sites who had developed SARS-CoV-2 re-infections or vaccine breakthrough infections. We identified 116 Spike protein mutations that increased in prevalence during at least one surge in PCR test positivity in any country over a three-month window. Deletions in the Spike protein N-terminal domain (NTD) are enriched for these ‘surge-associated mutations’ (Odds Ratio=1.96, 95% CI: 1.35-2.85; p<0.001) and are expanding into longer contiguous stretches of deletions over the course of the pandemic. In the ongoing COVID-19 surge in India, an emerging NTD deletion (ΔF157/R158) has increased over 10-fold in prevalence from February 2021 (1.1%) to April 2021 (15%). During the recent surge in Chile, a hitherto uncharacterized NTD deletion (Δ246-252) has increased in prevalence by over 30-fold from January 2021 (0.86%) to April 2021 (33%). Strikingly, these emerging surge-associated deletions in India and Chile map directly to an antigenic supersite that is bound by most NTD-targeted neutralizing antibodies. Finally, in three patients from Mayo Clinic in Minnesota who were previously infected or vaccinated, we identified NTD deletions (Δ85-90, Δ156-164, Δ167-174) that were never previously found in the state. These putative immune escape deletions are also proximal to the neutralizing antibody binding sites, suggesting that antigenic minimalism may be an emerging evolutionary strategy for SARS-CoV-2 to evade immune responses.
This study highlights the urgent need to sequence SARS-CoV-2 genomes at much larger scale globally and mandate a public health policy for more granular and transparent reporting of SARS-CoV-2 sample annotations such as de-identified patient phenotypes and vaccination status. Such a universal standard for genomic epidemiology and clinical genomics is imperative to proactively predict breakthrough and reinfection mutations at their incipient stages, as well as guide the development of neutralizing antibodies and future COVID-19 vaccines that thwart a broad spectrum of immunoevasive SARS-CoV-2 variants.
The ongoing COVID-19 pandemic has infected around 500 million people and killed more than 6.1 million people worldwide, as of April 2022 [1]. The continual emergence of SARS-CoV-2 variants with increased transmissibility and capacity for immune escape, such as B.1.1.7 (“UK variant”) and P.1 (“Brazilian variant”), threatens to prolong the pandemic through devastating outbreaks such as the one currently being witnessed in India [2]. While multiple vaccines have demonstrated high effectiveness in clinical trials and real-world studies [3-5], there have been reports of “vaccine breakthrough infections” with SARS-CoV-2 variants [6,7]. A recent study described two such cases in New York, at least one of which occurred despite confirmation of a robust neutralizing antibody response. Variant classification schemes have been developed by the US Centers for Disease Control and Prevention (CDC) [8] and the World Health Organisation (WHO) [9] based on factors such as prevalence, evidence of transmissibility and disease severity, and ability to be neutralized by existing therapeutics or sera from vaccinated patients. Early and rapid detection of these emerging Variants of Concern/Interest is imperative to combat and contain the ongoing pandemic and future outbreaks.
It is critical to thoroughly characterize how SARS-CoV-2 mutates to evade natural and vaccine-induced immune responses as it continues driving case surges. To this end, neutralizing antibodies which target the receptor-binding domain (RBD) or the N-terminal domain (NTD) of the Spike protein have been isolated from the sera of COVID-19 patients [10-12]. Recent studies contemporaneously found that several neutralizing antibodies target a single antigenic supersite in the NTD of the Spike protein [13,14]. The NTD is also a hotspot for in-frame deletions in the SARS-CoV-2 genome, with four recurrent deletion regions (RDRs) identified [15]. Several such deletions have been experimentally demonstrated to reduce neutralization by some NTD-targeting neutralizing antibodies [13,15]. Whether additional deletions have emerged in variants that drive surges or vaccine breakthrough infections needs to be determined.
Concerted global data sharing efforts during the pandemic have led to the rapid development of large-scale genomic and epidemiological COVID-19 resources. Over 9.3 million SARS-CoV-2 genomes from 213 distinct geographical regions have been deposited throughout the pandemic in the GISAID database (
In this study, we reveal that deletion mutations in the Spike protein have a high likelihood of being associated with surges in community transmission. We identify rapidly emerging surge-associated deletion mutations in India and Chile that map to a proposed antigenic supersite. We also identify non-overlapping deletion mutations in SARS-CoV-2 from patients with re-infection/vaccine-breakthrough infections, also mapping near the antibody-binding site and thus representing candidates for vaccine escape mutations. Finally, we highlight that the deletion-prone regions of the Spike protein are expanding during the course of the pandemic as an evolutionary strategy of “antigenic minimalism” to evade immune responses.
Deletions are Enriched for Association with Surges in Community Transmission of SARS-CoV-2
Analysis of 9,299,506 SARS-CoV-2 genome sequences (
Further, we investigated whether a class of mutations (missense and/or indels) is enriched among the surge-associated mutations. 38 of 396 (9.5%) deletions were surge-associated, as compared to 133 of 2545 (5.22%) substitutions, and 6 of 29 (20.68%) insertions. These data indicate that deletions, but not substitutions or insertions, are enriched for association with surges (Chi-square Test p-value <0.00001; Odds Ratio=1.96, 95% CI: 1.35-2.85;
Rapidly Emerging Deletion Mutations Associated with Surges in India and Chile Map to Antigenic Supersite Binding Most NTD-Targeted Neutralizing Antibodies
Recently there have been massive surges of COVID-19 infection in a few countries, most prominently in India [16] and Chile [17,18]. In order to identify the mutations associated with recent surges, we identified the mutations which have monotonically increased in frequency during a monotonic increase in test positivity in any country between February and April 2021. We found that different sets of mutations have increased in prevalence during current surges in seven countries: Poland, Bangladesh, Belgium, Chile, France, India and Sweden (Table 1).
In India, 13 mutations are correlated with the recent massive surge (“second wave of infections”, in the month of April 2021), which include an emerging deletion (ΔF157/R158) in the NTD. This deletion has co-occurred with the existing mutations (P681R, L452R, E484Q) and is found in B.1.617.2, which has been categorized as a variant of interest by the CDC [8] (
In Chile, 36 mutations are correlated with the current surge (April 2021), which cluster into three distinct groups corresponding to independently circulating variants (
Taken together, this analysis highlights two NTD deletions that are rapidly emerging in specific countries and are strongly correlated with the surges in community spread of SARS-CoV-2 in each. Furthermore, structures show that these residues are found in the binding sites for several characterized neutralizing antibodies. Deletion of these epitopes from the Spike protein is highly likely to diminish antibody binding affinity, thereby enabling immune escape.
Analysis of SARS-CoV-2 Genomes from COVID-19 Patients with Vaccine Breakthrough Reveals the Presence of Distinct Deletions in the N-Terminal Domain
While the polyclonal nature of the immune response to vaccination makes it unlikely that single mutations will alter vaccine effectiveness, combinations of mutations may indeed lower the sensitivity of particular variants to vaccine-induced immunity. As such, it is important to track the sets of mutations that are present in variants infecting vaccinated individuals. To do so, we performed whole genome viral sequencing from 52 breakthrough COVID-19 cases in the Mayo Clinic health system. In total, we have identified 92 unique mutations, of which 29 are deletions (
We identified four variants harboring one or more less characterized deletion stretches. Importantly, each one had deletions in a distinct NTD region, demonstrating the genomic heterogeneity of vaccine escape variants and emphasizing that these cases of vaccine escape are not explained simply by the spread of one immuno-evasive strain of SARS-CoV-2. Whether the deletions were already present at the times of infection or evolved within these individuals under the pressure of vaccine-induced immunity is not known.
One patient who had received two doses of BNT162b2 in January 2021 was subsequently infected in April. The virus recovered from this patient contained a Δ156-164 deletion, reminiscent of the ΔF157/R158 which has increased in prevalence during the case surge in India (
More interestingly, viral genomes recovered from two breakthrough cases contained deletions outside of the RDRs which we and others have identified from GISAID data [15]. One patient who was fully vaccinated with BNT162b2 in February was infected in March, and the recovered virus contained a Δ167-174 deletion (
From a structural standpoint, all four deletions map to parts of the NTD that are either at the antigenic supersite or are proximal to it, as seen on the structure. The deletion of these loops is likely to result in lowered antibody binding and thereby may enable immune escape. Some residues reside in a flexible loop (
The identification of deletion stretches outside of the four previously defined RDRs during test positivity surges (ΔF157/R158 in India) and in breakthrough infections (Δ85-90 and Δ167-174 in breakthrough cases at the Mayo Clinic) emphasizes that we must continue to vigilantly monitor deletion patterns to capture new RDRs as they emerge. Indeed, while the SARS-CoV-2 RDRs were initially defined based on 146,795 sequences deposited in GISAID as of Oct. 24, 2020, the number of deposited sequences has increased almost 10-fold over the past seven months.
As such, we examined the current distribution of deletion frequencies for all amino acids in the Spike protein sequence to identify any additional candidate RDRs (
In addition to identifying new RDRs, we also recognized that some RDRs appear to have the capacity to expand (i.e., to involve more flanking amino acids) over time. For example, the Δ246-252 deletion in one of the surge-associated Chile variants can be viewed as an expansion of the previously defined RDR4 (Δ242-248) [15] (
Taken together, our analysis highlights both the emergence of novel RDRs and the expansion of previously defined RDRs over the past several months. Given the clear need for dynamic classification, we suggest that nomenclature should henceforth be defined by residue numbers rather than sequential 5′ to 3′ order to avoid confusion when new RDRs arise which fall between two that have been previously characterized. As such, the currently existing RDRs in the NTD of the Spike protein can be defined as RDR14-16 (new RDR), RDR67-74 (part of previous RDR1), RDR138-146 (extended RDR2), RDR157-158 (new RDR), RDR210-211 (previous RDR3), and RDR241-252 (extended RDR4). Further, while they have not yet emerged to frequencies warranting an RDR classification in GISAID, the other regions with breakthrough infection-associated deletions (Δ85-90 and Δ167-174) should be monitored as candidates for emerging RDRs in the coming months. Our data suggests that experiments should be conducted to determine whether deletions in several NTD regions (residues 85-90, 156-159, 167-174, and 249-252) impact the binding of NTD-targeted neutralizing antibodies or the capacity of sera from vaccinated individuals to neutralize the virus.
The worldwide mass vaccination campaign has had a profound impact on COVID-19 transmission. However, certain variants are less susceptible to neutralization by sera from vaccinated individuals and convalescent COVID-19 patients [24,25]. Such findings motivate the need to vigilantly track the emergence of new variants and to determine whether they are likely to cause surges or vaccine breakthrough infections. Here, through an integrated analysis of genomic and epidemiologic data, we found that deletions in the Spike protein NTD which map to an antigenic supersite have emerged over the course of the pandemic, are strongly associated with case surges, and are present in a subset of vaccine breakthrough variants. Indeed, in addition to deletion mutations, several substitution mutations (e.g., E484Q, T478K in the receptor binding domain) are also associated with surges in cases (
There are a few limitations of this study. First, the geographic distribution of sequences deposited in GISAID is not representative of the global population, with a majority of the sequences coming from the United States or the United Kingdom. Future genomic epidemiology studies would be improved by expanded sequencing efforts in other countries. Second, the identification of mutations associated with surges during early months of the pandemic is complicated by the relative paucity of whole genome sequencing data deposited during that time. Third, the GISAID data is not linked to any phenotypic information (e.g., disease severity) or relevant medical histories (e.g., comorbidities and vaccination status). Thus, while we are able to identify correlations between mutational prevalence and case surges, we cannot determine whether particular mutations are associated with more severe disease or are observed more frequently than expected by chance in vaccinated individuals. While the latter shortcoming is partially addressed by our independent whole genome sequencing of virus isolated from reinfected and vaccinated patients, this analysis was limited by the small size of the cohort (n=53) and the lack of corresponding antibody titer data.
Taken together, this study illustrates the value of intersecting the disparate fields of epidemiologic surveillance and genomic sequencing. With the COVID-19 vaccine rollout occurring at unprecedented rates, it is critical to rapidly identify emerging mutation patterns and then to characterize single mutations and combinations thereof for their impact on vaccine effectiveness. Looking forward, this dynamic process will require interdisciplinary collaboration among experts in genomics, clinical epidemiology, structural biology, and basic virology. We emphasize that to achieve these goals, we must expand sequencing efforts around the world and encourage the transparent linking of relevant phenotypic data to each deposited sequence.
Our study is extremely timely and has important therapeutic and public health policy implications. The repeated emergence of deletions within an antigenic supersite should be considered when developing vaccines and biologics to counter the immuno-evasive strategies of SARS-CoV-2. From a public health standpoint, this study motivates the need to massively scale up whole-genome sequencing efforts globally and highlights the value of clinico-genomic studies which link sequence information to patient phenotypes, particularly in the setting of breakthrough infections.
9,299,506 SARS-CoV-2 genome sequences (with 1,601 unique lineages) were obtained from GISAID26 (data retrieved from https://www.gisaid.org/ on 23 Mar. 2022) for the period of December 2019 to March 2022 across 213 geographical locations. The mutations were called using the Wuhan-Hu-1 sequence as reference (UniProt ID: P0DTC2). To filter out potential sequencing artifacts, we excluded mutations that were present in fewer than 100 sequences, resulting in 3378 unique Spike protein mutations.
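The artifact filter described above can be sketched as follows. This is a minimal illustration, not the pipeline actually used: the input layout (one list of Spike mutation labels per sequence) and the function name `filter_rare_mutations` are assumptions introduced here, and only the threshold of 100 sequences comes from the text.

```python
from collections import Counter

MIN_SEQUENCES = 100  # threshold stated in the text


def filter_rare_mutations(mutations_per_sequence, min_sequences=MIN_SEQUENCES):
    """Return {mutation: sequence count} for mutations seen in at least
    `min_sequences` sequences.

    `mutations_per_sequence` is a hypothetical input format: an iterable in
    which each item lists the Spike mutations called for one genome.
    """
    counts = Counter()
    for mutations in mutations_per_sequence:
        # Count each mutation at most once per sequence.
        counts.update(set(mutations))
    # Mutations present in fewer than `min_sequences` sequences are excluded.
    return {m: n for m, n in counts.items() if n >= min_sequences}
```

Counting each mutation once per sequence means the threshold applies to the number of sequences carrying the mutation, which is how the text states the criterion.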
To identify mutations that have been temporally associated with surges in COVID-19 cases throughout the pandemic, we assessed monthly mutational prevalences and test positivity over three-month intervals in each country. For each of the 3378 mutations, the monthly mutational prevalence was computed for a given country as:
Positivity data for PCR tests was obtained from the OWID resource27,28 (retrieved from https://github.com/owid/covid-19-data/tree/master/public/data on Apr. 23, 2021). For each country, the monthly test positivity was calculated as:
To identify surge-associated mutations, we classified the monthly mutational prevalence (for each mutation) and the monthly test positivity as increasing (monotonically), decreasing (monotonically), or mixed over sliding three-month intervals over the course of the pandemic. Any mutation which monotonically increased in prevalence over this interval in a country with a simultaneous monotonic increase in test positivity was defined as a “surge-associated mutation.” There were 116 such mutations.
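The classification step can be illustrated with a short sketch. It assumes that the monthly prevalence and positivity values have already been computed for one country and are aligned month by month, and it interprets "monotonically" as strictly increasing or decreasing, which the text does not specify. The function names `trend` and `surge_associated` are hypothetical.

```python
def trend(values):
    """Classify a series as 'increasing', 'decreasing', or 'mixed'.

    Assumption: monotonic is read as strictly monotonic.
    """
    pairs = list(zip(values, values[1:]))
    if all(later > earlier for earlier, later in pairs):
        return "increasing"
    if all(later < earlier for earlier, later in pairs):
        return "decreasing"
    return "mixed"


def surge_associated(prevalence, positivity, window=3):
    """Return the mutations flagged as surge-associated in one country.

    prevalence: {mutation: [monthly prevalence, ...]} (hypothetical layout)
    positivity: [monthly test positivity, ...] for the same months
    A mutation is flagged if, in at least one sliding window of `window`
    months, both its prevalence and the test positivity are increasing.
    """
    hits = set()
    for start in range(len(positivity) - window + 1):
        if trend(positivity[start:start + window]) != "increasing":
            continue
        for mutation, series in prevalence.items():
            if trend(series[start:start + window]) == "increasing":
                hits.add(mutation)
    return hits
```

Running this per country and taking the union of the returned sets would give one plausible reading of the overall set of surge-associated mutations.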
In order to test the value of our method, we obtained the set of CDC variants of interest and concern as of Apr. 15, 2021 (ref. 8). At this time (April 2021), there were 5 variants of concern and 8 variants of interest, with no variants of high consequence. From the 13 classified variants, there were 56 unique mutations listed, of which 25 were found only in variants of interest, 24 were found only in variants of concern, and 7 were found in both variants of interest and concern. After identifying the surge-associated mutations as described above, we determined the fraction of mutations comprising the CDC-classified variants which were captured by this approach.
After identifying the 177 surge-associated mutations, we tested whether any of the contributing mutation types (deletions, insertions, or substitutions) were enriched for surge-associated mutations. To do so, we constructed a 3×2 table giving the number of surge-associated and non-surge-associated mutations in each category. To determine whether one or more groups showed a statistically significant enrichment, a chi-square p-value was calculated using the chisq.test function from the stats package (version 4.0.3) in R. Post-hoc tests were performed by constructing 2×2 contingency tables to compare each mutation type against all others. Then, odds ratios and their corresponding 95% confidence intervals were calculated using the fisher.test function from the stats package (version 4.0.3) in R.
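The structure of this enrichment analysis can be sketched in Python. This is an illustration only: the analysis itself was run in R, all function names here are hypothetical, and no counts from the study are reproduced. The sketch computes the Pearson chi-square statistic for a 3×2 table (rows: mutation types; columns: surge-associated, non-surge-associated), collapses the table into one-versus-rest 2×2 tables, and reports the simple sample odds ratio. Note that R's fisher.test reports a conditional maximum-likelihood estimate of the odds ratio, so its values will generally differ slightly from the sample odds ratio shown here.

```python
import math


def chi_square_3x2(table):
    """Pearson chi-square statistic and p-value for a 3x2 count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    # A 3x2 table has (3-1)*(2-1) = 2 degrees of freedom, for which the
    # chi-square survival function is exactly exp(-x/2).
    return stat, math.exp(-stat / 2)


def one_vs_rest(table, index):
    """Collapse the table to 2x2: row `index` versus all other rows combined."""
    rest = [sum(row[j] for k, row in enumerate(table) if k != index)
            for j in range(2)]
    return [list(table[index]), rest]


def sample_odds_ratio(two_by_two):
    """Sample odds ratio (a*d)/(b*c); undefined when b*c is zero."""
    (a, b), (c, d) = two_by_two
    return (a * d) / (b * c)
```

For example, `sample_odds_ratio(one_vs_rest(table, 0))` compares the first row (for instance, deletions) against the other two mutation types combined.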
Recurrent deletion regions (RDRs) were previously defined as four sites within the NTD in which over 90% of all Spike protein deletions occurred, per the 146,795 SARS-CoV-2 sequences deposited in GISAID as of Oct. 24, 2020. To identify potential new RDRs that have emerged since this time, we first plotted the distribution of deletion counts for each amino acid (i.e., the number of sequences in which deletion of the given amino acid was observed) in the Spike protein, considering all 9,299,506 sequences analyzed in this study. We calculated the 95th percentile of the deletion count distribution, which is 659. We then bucketed each residue R into categories (Yes, No, Possible) reflecting whether or not it should be considered as part of an RDR (i.e., a contiguous stretch of two or more amino acid residues which undergo deletion events more frequently than expected by chance) as follows (illustrated schematically in Table 2).
Once each residue was categorized in this way, any residue P in the “Possible” category was subjected to further analysis to convert its label into “Yes” or “No.” Specifically, we took a step-wise approach, walking in both directions from P until the first encounter of a residue categorized as “Yes” or “No” (i.e., other residues labeled as “Possible” were ignored). If a residue categorized as “Yes” was encountered before any residue categorized as “No” in either direction, then the “Possible” label was converted to “Yes.” If a residue categorized as “No” was encountered before any residue categorized as “Yes” in both directions, then the “Possible” label was converted to “No.”
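The step-wise resolution of “Possible” labels can be sketched as follows. Two assumptions are made and are not taken from the text: a “Possible” residue whose nearest definite neighbor is “No” in both directions resolves to “No”, and a walk that reaches the end of the protein without meeting a definite label is treated the same as meeting “No”. The function names are hypothetical.

```python
def first_definite(labels, start, step):
    """Walk from `start` in direction `step` (+1 or -1), skipping 'Possible',
    and return the first 'Yes' or 'No' label found, or None at the sequence end."""
    i = start + step
    while 0 <= i < len(labels):
        if labels[i] != "Possible":
            return labels[i]
        i += step
    return None


def resolve_possible(labels):
    """Convert every 'Possible' label to 'Yes' or 'No'.

    A 'Possible' residue becomes 'Yes' if the first definite label met in
    either direction is 'Yes'; otherwise it becomes 'No' (assumption: running
    off the end of the sequence counts as 'No').
    """
    resolved = list(labels)
    for i, label in enumerate(labels):
        if label == "Possible":
            # Look up neighbors in the original labels so that the result
            # does not depend on the order in which residues are visited.
            neighbors = (first_definite(labels, i, -1),
                         first_definite(labels, i, +1))
            resolved[i] = "Yes" if "Yes" in neighbors else "No"
    return resolved
```

For instance, a run of “Possible” residues flanked by “No” on one side and “Yes” on the other is converted entirely to “Yes”, while a run flanked by “No” on both sides is converted to “No”.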
With each residue categorized as “Yes” or “No”, we then simply merged the residue windows with consecutive “Yes” labels to define the updated set of Spike protein RDRs. We name the RDRs on the basis of the first and last amino acid residues contained within the region; for example, the RDR including residues C14, Q15, and V16 is defined as RDR14-16.
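The merging and naming step can be sketched as below. The function name `merge_rdrs` and the list-based input are assumptions for illustration. Runs consisting of a single “Yes” residue are dropped, following the definition of an RDR given above as a contiguous stretch of two or more residues.

```python
def merge_rdrs(labels, first_position=1):
    """Merge consecutive 'Yes' residues into named RDRs.

    labels[i] is the 'Yes'/'No' label of residue number first_position + i.
    Returns names such as 'RDR14-16', built from the first and last residue
    numbers of each run.
    """
    regions = []
    start = None
    # A trailing 'No' sentinel closes a run that extends to the last residue.
    for offset, label in enumerate(list(labels) + ["No"]):
        position = first_position + offset
        if label == "Yes" and start is None:
            start = position
        elif label != "Yes" and start is not None:
            end = position - 1
            if end > start:  # keep only runs of two or more residues
                regions.append(f"RDR{start}-{end}")
            start = None
    return regions
```

With labels for residues 13 through 19 given as No, Yes, Yes, Yes, No, Yes, No and `first_position=13`, the sketch yields the single region RDR14-16, matching the naming example in the text.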
To assess the expansion of regions undergoing deletions over time, we plotted a time series heatmap indicating the first time (month) at which a given deletion was identified across all GISAID sequences, and the number of sequences in which that deletion was detected in that month and all subsequent months. The residues plotted were defined based on the definition of RDRs provided above, which builds upon the regions defined previously15.
Structural analyses and illustrations were performed in PyMOL (version 2.3.4). The cryo-EM structure of the Spike protein characterizing the interaction with a neutralizing antibody 4A8 (PDB identifier: 7C2L), described by Chi et al.19, was retrieved from the PDB.
Whole Viral Genome Sequencing of SARS-CoV-2 Obtained from Individuals with Breakthrough Infections
This is a retrospective study of individuals who underwent polymerase chain reaction (PCR) testing for suspected SARS-CoV-2 infection at the Mayo Clinic and hospitals affiliated to the Mayo health system. This study was reviewed by the Mayo Clinic Institutional Review Board and determined to be exempt from human subjects research. Subjects were excluded if they did not have a research authorization on file.
SARS-CoV-2 RNA-positive upper respiratory tract swab specimens from patients with vaccine breakthrough or reinfection of COVID-19 were subjected to next-generation sequencing, using the commercially available Ion AmpliSeq SARS-CoV-2 Research Panel (Life Technologies Corp., South San Francisco, Calif.) based on the “sequencing by synthesis” method. The assay amplifies 237 sequences ranging from 125 to 275 base pairs in length, covering 99% of the SARS-CoV-2 genome. Viral RNA was first manually extracted and purified from these clinical specimens using MagMAX™ Viral/Pathogen Nucleic Acid Isolation Kit (Life Technologies Corp.), followed by automated reverse transcription-PCR (RT-PCR) of viral sequences, DNA library preparation (including enzymatic shearing, adapter ligation, purification, normalization), DNA template preparation, and sequencing on the automated Genexus™ Integrated Sequencer (Life Technologies Corp.) with the Genexus™ Software version 6.2.1. A no-template control and a positive SARS-CoV-2 control were included in each assay run for quality control purposes. Viral sequence data were assembled using the Iterative Refinement Meta-Assembler (IRMA) application (50% base substitution frequency threshold) to generate unamended plurality consensus sequences for analysis with the latest versions of the web-based application tools: Pangolin29 for SARS-CoV-2 lineage assignment; Nextclade30 for viral clade assignment, phylogenetic analysis, and S codon mutation calling, in comparison to the wild-type reference sequence of SARS-CoV-2 Wuhan-Hu-1 (lineage B, clade 19A).
Mutations correlated with a recent increase in test positivity rate over the three-month period starting between February 2021 and April 2021. We ensured that these mutations were present in at least 5% of the sequences deposited in GISAID within this time period. A minimum cut-off of 5% test positivity within this three-month window was also applied to ensure that we captured surges of relevant magnitude. Only the top five mutations with the largest change in prevalence % (min−max) over the three-month period are shown. The test positivity rates observed across these three months in different countries are also shown.
Schematic representation of the decision schema for considering a residue R to be part of an RDR. A deletion count of ≤687 is represented by X. A deletion count of ≥688 is represented by ✓.
List of GISAID accession IDs with the same recurrent deletions observed as seen in the vaccine breakthrough patients.
All mutations in the Spike protein that are positively correlated with the test positivity percentage across the complete timeline of the pandemic in India are tabulated here. The abbreviations used in the table header are expanded as follows. Total Seqs. Dep.: total number of sequences deposited in the particular month in India; Test Pos. %: test positivity percentage; Mut Prev. %: mutation prevalence percentage; Rho (Pearson) Mut Prev. % vs Test Pos. %: the Pearson correlation Rho value between test positivity and mutational prevalence; Test pos. List: test positivity percentage over the window of 3 months; Mut Prev. List: mutation prevalence percentage over the window of 3 months; MaxΔ Mut Prev.: maximum difference in the mutational prevalence percentage observed over the window of 3 months.
All mutations in the Spike protein that are positively correlated with the test positivity percentage across the complete timeline of the pandemic in Chile are tabulated here. The abbreviations used in the table header are expanded as follows. Total Seqs. Dep.: total number of sequences deposited in the particular month in Chile; Test Pos. %: test positivity percentage; Mut Prev. %: mutation prevalence percentage; Rho (Pearson) Mut Prev. % vs Test Pos. %: the Pearson correlation Rho value between test positivity and mutational prevalence; Test pos. List: test positivity percentage over the window of 3 months; Mut Prev. List: mutation prevalence percentage over the window of 3 months; MaxΔ Mut Prev.: maximum difference in the mutational prevalence percentage observed over the window of 3 months.
Each publication and patent mentioned herein is hereby incorporated by reference in its entirety. In case of conflict, the present specification, including any definitions herein, will control.
While specific embodiments of the subject invention have been discussed, the above specification is illustrative and not restrictive. Many variations of the invention will become apparent to those skilled in the art upon review of the preceding description and the following claims. The full scope of the invention should be determined by reference to the claims, along with their full scope of equivalents, and by reference to the rest of the specification, along with such variations.
This application claims the benefit of U.S. Provisional Application No. 63/192,434, filed May 24, 2021, which is hereby incorporated by reference in its entirety.
Number | Date | Country
---|---|---
63192434 | May 2021 | US