OPTIMIZED NUCLEOTIDE SEQUENCES ENCODING SARS-COV-2 ANTIGENS

Abstract
The present invention relates to optimized nucleotide sequences encoding SARS-COV-2 antigens. These sequences are particularly suitable for use in vaccine compositions for the treatment or prevention of infections caused by β-coronaviruses, including COVID-19 infections, in a human or animal subject in need of such treatment.
Description
SEQUENCE LISTING

The present specification makes reference to a Sequence Listing electronically filed in ASCII format. The sequence listing file, named 122548.US044_ST25.txt, was created on Apr. 3, 2024, and is 811,008 bytes in size. The contents of the sequence listing are herein incorporated by reference in their entirety.


FIELD OF THE INVENTION

The present invention relates to SARS-COV-2 antigenic polypeptides and to optimized nucleotide sequences encoding these SARS-COV-2 antigenic polypeptides. These antigenic polypeptides and optimized nucleotide sequences are particularly suitable for use in vaccine compositions for the treatment or prevention of infections caused by β-coronaviruses, including COVID-19 infections, in a human or animal subject in need of such treatment.


BACKGROUND OF THE INVENTION

The Coronavirus Disease 2019 (COVID-19) pandemic poses a serious threat to global public health. The causative agent of COVID-19 is severe acute respiratory syndrome coronavirus 2 (SARS-COV-2), a newly emerged human pathogen.


Protein antigen selection and design both contribute to the immunogenicity of a vaccine, whether it is protein-based or nucleic acid-based. Moreover, with respect to nucleic acid-based immunogenic compositions such as mRNA-based vaccines, expression levels achieved from the nucleic acid encoding one or more protein antigens can significantly impact efficacy.


Recombinant DNA technology and advances in nucleic acid sequencing and synthesis have made it possible to rapidly design protein antigens once the genome sequence of a pathogen has been determined. Success or failure of a vaccine can depend on the selection of antigenic polypeptides that yield a highly effective response in the form of neutralizing antibodies in vivo. Therefore, a need exists to provide new antigenic polypeptides derived from SARS-COV-2 proteins for use in immunogenic compositions that provide prophylaxis against COVID-19.


Effective expression or production of a protein from an mRNA within a cell depends on a variety of factors. Optimization of the composition and order of codons within a protein-coding nucleotide sequence (“codon optimization”) can lead to higher expression of the mRNA-encoded protein. Various methods of performing codon optimization are known in the art; however, each has significant drawbacks and limitations from a computational and/or therapeutic point of view. In particular, known methods of codon optimization often involve, for each amino acid, replacing every codon with the codon having the highest usage for that amino acid, such that the “optimized” sequence contains only one codon encoding each amino acid.


Accordingly, a need exists for improved codon optimization methods that generate an optimized nucleotide sequence for increased expression of mRNA encoding a selected or designed protein antigen for the production of an efficacious mRNA vaccine.


Moreover, with the global spread of SARS-COV-2, new variants of the virus have emerged. Therefore, a need exists to provide pharmaceutical compositions (e.g., immunogenic compositions) that are capable of eliciting a broadly neutralizing antibody response effective against a multitude of naturally occurring variants of SARS-COV-2.


SUMMARY OF THE INVENTION

The present invention addresses the need for selecting and/or designing a protein antigen that yields an effective immune response against SARS-COV-2. It also addresses the need for generating optimized nucleotide sequences encoding that protein antigen for the effective treatment or prevention of COVID-19 infections through the provision of a vaccine comprising a nucleic acid (e.g., an mRNA) with the optimized nucleotide sequence. Various selected and/or designed protein antigens against SARS-COV-2 are provided herein, as well as at least one optimized nucleotide sequence for each such protein antigen.


In addition, a method is provided for analyzing an amino acid sequence of a protein antigen to produce at least one optimized nucleotide sequence. The optimized nucleotide sequence for each selected and/or designed protein antigen is designed to increase the expression of that encoded protein antigen compared to the expression of the protein associated with a naturally occurring nucleotide sequence. Codon optimization produces a protein-coding nucleotide sequence based on various criteria without altering the sequence of translated amino acids of the encoded protein antigen, due to the redundancy in the genetic code. Moreover, the optimized nucleotide sequences disclosed here are designed to produce high-quality full-length transcripts during in vitro synthesis and therefore can be manufactured more cost effectively than optimized nucleotide sequences generated with prior art codon optimization algorithms. In particular, termination sequences and the like that could result in incomplete transcripts during in vitro synthesis are effectively removed by the sequence optimization processes described herein.


As demonstrated in the examples, immunogenic compositions that comprise an LNP-encapsulated optimized nucleotide sequence of the invention which encodes a full-length pre-fusion stabilized SARS-COV-2 S protein can produce an effective neutralizing antibody response and therefore can provide protective efficacy against COVID-19 infection.


The present invention also addresses the need for immunogenic compositions that are capable of eliciting a broadly effective immune response, in particular in the form of neutralizing antibodies, against naturally occurring variants of SARS-COV-2. As shown in the examples, the inventors surprisingly discovered that administration of an immunogenic composition that comprises an LNP-encapsulated optimized nucleotide sequence which encodes a South African variant of the SARS-COV-2 S protein to subjects who have been previously immunized with a COVID-19 vaccine can induce an effective neutralizing antibody response against a broad range of β-coronaviruses, including naturally occurring variants of SARS-COV-2 isolated in Wuhan, South Africa, Japan/Brazil and California, as well as the phylogenetically more distant SARS-CoV-1 strain.


In particular, the invention provides a nucleic acid comprising an optimized nucleotide sequence encoding a full-length SARS-COV-2 spike protein which has been modified relative to naturally occurring full-length SARS-COV-2 spike protein of SEQ ID NO: 1 to remove the furin cleavage site and to mutate residues 986 and 987 to proline, wherein the optimized nucleotide sequence consists of codons associated with a usage frequency which is greater than or equal to 10%; wherein the optimized nucleotide sequence: does not contain a termination signal having one of the following nucleotide sequences: 5′-X1ATCTX2TX3-3′, wherein X1, X2 and X3 are independently selected from A, C, T or G; and 5′-X1AUCUX2UX3-3′, wherein X1, X2 and X3 are independently selected from A, C, U or G; does not contain any negative cis-regulatory elements and negative repeat elements; and has a codon adaptation index greater than 0.8; wherein, when divided into non-overlapping 30 nucleotide-long portions, each portion of the optimized nucleotide sequence has a guanine cytosine content range of 30%-70%. In particular embodiments, the nucleic acid is mRNA. In some embodiments, the nucleic acid is DNA. In certain embodiments, the optimized nucleotide sequence does not contain a termination signal having one of the following sequences: TATCTGTT; TTTTTT; AAGCTT; GAAGAGC; TCTAGA; UAUCUGUU; UUUUUU; AAGCUU; GAAGAGC; UCUAGA.
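By way of illustration only, some of the sequence criteria recited above can be expressed as simple checks. The following is a minimal sketch and not the process used to generate the disclosed sequences; the function names are hypothetical, the handling of a final portion shorter than 30 nucleotides is an assumption, and the codon usage frequency, codon adaptation index and cis-regulatory element criteria are not covered.

```python
import re

# Termination signal recited in the text: 5'-X1ATCTX2TX3-3' (DNA) and
# 5'-X1AUCUX2UX3-3' (RNA), where X1, X2 and X3 are any nucleotide.
DNA_SIGNAL = re.compile(r"[ACGT]ATCT[ACGT]T[ACGT]")
RNA_SIGNAL = re.compile(r"[ACGU]AUCU[ACGU]U[ACGU]")


def has_termination_signal(seq):
    """Return True if the sequence contains the recited termination signal."""
    seq = seq.upper()
    return bool(DNA_SIGNAL.search(seq) or RNA_SIGNAL.search(seq))


def gc_fraction(portion):
    """Fraction of guanine and cytosine nucleotides in a portion."""
    portion = portion.upper()
    return sum(base in "GC" for base in portion) / len(portion)


def gc_windows_ok(seq, window=30, low=0.30, high=0.70):
    """Check the GC content of each non-overlapping portion of the sequence.

    Assumption: a trailing portion shorter than `window` is checked as well.
    """
    portions = [seq[i:i + window] for i in range(0, len(seq), window)]
    return all(low <= gc_fraction(p) <= high for p in portions)
```

For example, under these assumptions has_termination_signal("GGTATCTGTTCC") evaluates to True because the sequence contains TATCTGTT, one of the termination signals listed above.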


In some embodiments, the optimized nucleotide sequence encodes the amino acid sequence of SEQ ID NO:11. In particular embodiments, the optimized nucleotide sequence is at least 85% (e.g., at least 90%) identical to the nucleic acid sequence of SEQ ID NO: 44 or SEQ ID NO: 148 and encodes the amino acid sequence of SEQ ID NO: 11. In specific embodiments, the optimized nucleotide sequence has the nucleic acid sequence of SEQ ID NO: 44 or SEQ ID NO: 148.


In some embodiments, the full-length SARS-COV-2 spike protein encoded by the optimized sequence further contains the L18F, D80A, D215G, L242-, A243-, L244-, K417N, E484K, N501Y, D614G and A701V mutations. In these embodiments, the optimized nucleotide sequence may encode the amino acid sequence of SEQ ID NO: 167. In particular embodiments, the optimized nucleotide sequence is at least 85% (e.g., at least 90%) identical to the nucleic acid sequence of SEQ ID NO: 166 or SEQ ID NO: 173 and encodes the amino acid sequence of SEQ ID NO: 167. In specific embodiments, the optimized nucleotide sequence has the nucleic acid sequence of SEQ ID NO: 166 or SEQ ID NO: 173.
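For illustration, the mutation shorthand used above can be read as reference residue, position in the reference sequence, and replacement residue, with "-" assumed here to denote deletion of the residue (e.g., L242-). The following minimal sketch applies such a list to an amino acid sequence; the function name apply_mutations and the short example sequence are hypothetical and are not taken from the Sequence Listing.

```python
import re

# One mutation: reference residue, 1-based position, replacement ("-" = deletion).
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z-])$")


def apply_mutations(protein, mutations):
    """Apply substitutions and deletions numbered against the reference sequence."""
    residues = list(protein)
    for mutation in mutations:
        match = MUTATION.match(mutation)
        if match is None:
            raise ValueError("unrecognized mutation: " + mutation)
        ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
        index = pos - 1
        if index >= len(residues) or residues[index] != ref:
            raise ValueError("reference residue mismatch for " + mutation)
        # A deleted residue is kept as an empty slot so that later positions
        # are still counted in reference numbering.
        residues[index] = "" if alt == "-" else alt
    return "".join(residues)


# Toy example (not a spike protein sequence): substitute K2 and delete L3.
print(apply_mutations("MKLAD", ["K2N", "L3-"]))  # MNAD
```

Keeping deleted positions as empty slots until the end means that every mutation in the list is interpreted against the numbering of the unmodified reference sequence, which is the convention assumed in this sketch.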


In certain embodiments, a nucleic acid of the invention is for use in therapy. For example, the invention also provides an immunogenic composition comprising a nucleic acid of the invention for use in the prophylaxis of an infection caused by a β-coronavirus. In addition, the invention also provides use of a nucleic acid of the invention in the manufacture of a medicament for the prophylaxis of an infection caused by a β-coronavirus. In certain embodiments, the β-coronavirus expresses a spike protein which binds to angiotensin-converting enzyme 2 (ACE2). In a specific embodiment, the β-coronavirus is SARS-COV-2. In other embodiments, the β-coronavirus has a spike protein that is at least 75%, 80%, 90%, 95% or 99% identical to SEQ ID NO: 1.


The invention further provides a method of treating or preventing an infection caused by a β-coronavirus, said method comprising administering to a subject an effective amount of an immunogenic composition comprising a nucleic acid of the invention. In certain embodiments, the β-coronavirus expresses a spike protein which binds to angiotensin-converting enzyme 2 (ACE2). In a specific embodiment, the β-coronavirus is SARS-COV-2. In other embodiments, the β-coronavirus has a spike protein that is at least 75%, 80%, 90%, 95% or 99% identical to SEQ ID NO: 1.


Furthermore, the invention provides a pharmaceutical composition comprising i) a nucleic acid of the invention and ii) a lipid nanoparticle. In certain embodiments, the nucleic acid is mRNA, which may be present at a concentration of between about 0.5 mg/mL and about 1.0 mg/mL. In certain embodiments, the nucleic acid of the invention (e.g., an mRNA in accordance with the invention) is encapsulated in the lipid nanoparticle. In certain embodiments, the lipid nanoparticle comprises a cationic lipid, a non-cationic lipid, a cholesterol-based lipid, and a PEG-modified lipid. In particular embodiments, the cationic lipid is selected from cKK-E12, cKK-E10, OF-Deg-Lin and OF-02; the non-cationic lipid is selected from DOPE and DEPE; the cholesterol-based lipid is cholesterol; and the PEG-modified lipid is DMG-PEG-2K. In certain embodiments, the cationic lipid constitutes about 30-60% of the lipid nanoparticle by molar ratio, e.g., about 35-40%. In certain embodiments, the ratio of cationic lipid to non-cationic lipid to cholesterol-based lipid to PEG-modified lipid is approximately 30-60:25-35:20-30:1-15 by molar ratio.


In certain embodiments, a lipid nanoparticle encapsulating a nucleic acid of the invention (e.g., an mRNA in accordance with the invention) comprises cKK-E12, DOPE, cholesterol and DMG-PEG2K; cKK-E10, DOPE, cholesterol and DMG-PEG2K; OF-Deg-Lin, DOPE, cholesterol and DMG-PEG2K; or OF-02, DOPE, cholesterol and DMG-PEG2K. In a specific embodiment, the lipid nanoparticle comprises cKK-E10, DOPE, cholesterol and DMG-PEG2K at the molar ratios 40:30:28.5:1.5. In certain embodiments, the lipid nanoparticle has an average size of less than 150 nm, e.g., less than 130 nm, less than 110 nm, less than 100 nm. In some embodiments, the lipid nanoparticle has an average size of about 90-110 nm, or has an average size of about 50-70 nm, e.g., about 55-65 nm.
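As a worked illustration of the molar ratios recited above, the following minimal sketch converts a ratio into the mole percentage of each lipid component. The function name mole_percent is hypothetical; the ratio 40:30:28.5:1.5 is the one given in the text.

```python
def mole_percent(ratios):
    """Convert molar ratio parts into the mole percentage of each component."""
    total = sum(ratios.values())
    return {lipid: 100.0 * parts / total for lipid, parts in ratios.items()}


# Molar ratio recited in the text for cKK-E10:DOPE:cholesterol:DMG-PEG2K.
lnp_ratio = {"cKK-E10": 40, "DOPE": 30, "cholesterol": 28.5, "DMG-PEG2K": 1.5}

for lipid, percent in mole_percent(lnp_ratio).items():
    print(lipid, round(percent, 1))
```

Because the parts of this particular ratio sum to 100, each part equals the mole percentage of that lipid, so the cationic lipid constitutes 40% of the lipid nanoparticle by molar ratio.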


In certain embodiments, a pharmaceutical composition of the invention is for use in treating or preventing an infection caused by a β-coronavirus. In certain embodiments, the β-coronavirus expresses a spike protein which binds to angiotensin-converting enzyme 2 (ACE2). In a specific embodiment, the β-coronavirus is SARS-COV-2. In other embodiments, the β-coronavirus has a spike protein that is at least 75%, 80%, 90%, 95% or 99% identical to SEQ ID NO: 1.


In certain embodiments, a pharmaceutical composition of the invention is administered intramuscularly. In certain embodiments, a pharmaceutical composition of the invention is administered at least once. In some embodiments, a pharmaceutical composition is administered at least twice. In particular embodiments, the period between administrations is at least 2 weeks, e.g. 3 weeks, or 1 month. In some embodiments, the period between administrations is about 3 weeks.


In one particular embodiment, the invention provides an mRNA construct (mRNA construct 1) consisting of the following structural elements:

    • a 5′ cap with the following structure:

[embedded image: 5′ cap structure]

    • a 5′ untranslated region (5′ UTR) having the nucleic acid sequence of SEQ ID NO: 144;

    • a protein coding region having the nucleic acid sequence of SEQ ID NO: 148;

    • a 3′ untranslated region (3′ UTR) having the nucleic acid sequence of SEQ ID NO: 145; and

    • a poly A tail.





In another particular embodiment, the invention provides an mRNA construct (mRNA construct 2) consisting of the following structural elements:

    • a 5′ cap with the following structure:

[embedded image: 5′ cap structure]

    • a 5′ untranslated region (5′ UTR) having the nucleic acid sequence of SEQ ID NO: 144;

    • a protein coding region having the nucleic acid sequence of SEQ ID NO: 173;

    • a 3′ untranslated region (3′ UTR) having the nucleic acid sequence of SEQ ID NO: 145; and

    • a poly A tail.





In specific embodiments, the invention provides a lipid nanoparticle encapsulating an mRNA construct of the invention. In some embodiments, the lipid nanoparticle encapsulates more than one mRNA construct of the invention, e.g. a lipid nanoparticle may encapsulate both mRNA construct 1 and mRNA construct 2. In certain embodiments, the lipid nanoparticle comprises a cationic lipid, a non-cationic lipid, a cholesterol-based lipid and a PEG-modified lipid. In certain embodiments, the cationic lipid is selected from cKK-E12, cKK-E10, OF-Deg-Lin and OF-02; the non-cationic lipid is selected from DOPE and DEPE; the cholesterol-based lipid is cholesterol; and the PEG-modified lipid is DMG-PEG-2K. In a specific embodiment, the lipid nanoparticle comprises cKK-E10, DOPE, cholesterol and DMG-PEG2K at the molar ratios 40:30:28.5:1.5.


The invention also provides an immunogenic composition comprising an mRNA construct of the invention, or a lipid nanoparticle encapsulating an mRNA construct of the invention. In some embodiments, the immunogenic composition comprises more than one mRNA construct of the invention, e.g., mRNA construct 1 and mRNA construct 2. In some embodiments, the immunogenic composition comprises more than one mRNA construct (e.g., mRNA construct 1 and mRNA construct 2) encapsulated in the same lipid nanoparticle. In other embodiments, the more than one mRNA constructs (e.g., mRNA construct 1 and mRNA construct 2) are encapsulated in separate lipid nanoparticles. In certain embodiments, the immunogenic composition comprises between 5 μg and 200 μg of the mRNA construct(s).


In certain embodiments, the immunogenic composition comprises between 7 μg and 135 μg of the mRNA construct(s). In certain embodiments, the immunogenic composition comprises at least 10 μg of the mRNA construct(s). In certain embodiments, the immunogenic composition comprises at least 15 μg of the mRNA construct(s). In certain embodiments, the immunogenic composition comprises at least 20 μg of the mRNA construct(s). In certain embodiments, the immunogenic composition comprises at least 25 μg of the mRNA construct(s). In certain embodiments, the immunogenic composition comprises at least 35 μg of the mRNA construct(s). In certain embodiments, the immunogenic composition comprises at least 40 μg of the mRNA construct(s). In certain embodiments, the immunogenic composition comprises at least 45 μg of the mRNA construct(s). In certain embodiments, the immunogenic composition comprises 7.5 μg, 15 μg, 45 μg or 135 μg of the mRNA construct(s). Typically, reference to a certain μg amount of mRNA refers to the total dose of mRNA in the immunogenic composition.


In certain embodiments, an immunogenic composition comprising an mRNA construct of the invention, or a lipid nanoparticle encapsulating an mRNA construct of the invention, is for use in treating or preventing an infection caused by a β-coronavirus. In certain embodiments, the β-coronavirus expresses a spike protein which binds to angiotensin-converting enzyme 2 (ACE2). In a specific embodiment, the β-coronavirus is SARS-COV-2. In other embodiments, the β-coronavirus has a spike protein that is at least 75%, 80%, 90%, 95% or 99% identical to SEQ ID NO: 1.


The invention also provides a method of treating or preventing an infection caused by a β-coronavirus, said method comprising administering to a subject an effective amount of an immunogenic composition comprising an mRNA construct of the invention, or a lipid nanoparticle encapsulating an mRNA construct of the invention. In certain embodiments, the β-coronavirus expresses a spike protein which binds to angiotensin-converting enzyme 2 (ACE2). In a specific embodiment, the β-coronavirus is SARS-COV-2. In other embodiments, the β-coronavirus has a spike protein that is at least 75%, 80%, 90%, 95% or 99% identical to SEQ ID NO: 1. In particular embodiments, the immunogenic composition is administered to the subject at least twice. In certain embodiments, the period between administrations is at least 2 weeks. In some embodiments, the period between administrations is about 3 weeks.


In a particular embodiment, the invention provides an immunogenic composition comprising at least two nucleic acids, wherein the first nucleic acid comprises an optimized nucleotide sequence encoding a full-length SARS-COV-2 spike protein which has been modified relative to naturally occurring full-length SARS-COV-2 spike protein of SEQ ID NO: 1 to remove the furin cleavage site and to mutate residues 986 and 987 to proline; and the second nucleic acid comprises an optimized nucleotide sequence encoding a full-length SARS-COV-2 spike protein which has been modified relative to naturally occurring full-length SARS-COV-2 spike protein of SEQ ID NO: 1 to remove the furin cleavage site and to mutate residues 986 and 987 to proline and further contains the L18F, D80A, D215G, L242-, A243-, L244-, K417N, E484K, N501Y, D614G and A701V mutations.


In some embodiments, the first nucleic acid comprises an optimized nucleotide sequence which encodes the amino acid sequence of SEQ ID NO: 11. In particular embodiments, the first nucleic acid comprises an optimized nucleotide sequence that is at least 85% (e.g., at least 90%) identical to the nucleic acid sequence of SEQ ID NO: 44 or SEQ ID NO: 148 and encodes the amino acid sequence of SEQ ID NO: 11. In specific embodiments, the optimized nucleotide sequence has the nucleic acid sequence of SEQ ID NO: 44 or SEQ ID NO: 148.


In some embodiments, the second nucleic acid comprises an optimized nucleotide sequence which encodes the amino acid sequence of SEQ ID NO: 167. In particular embodiments, the second nucleic acid comprises an optimized nucleotide sequence that is at least 85% (e.g., at least 90%) identical to the nucleic acid sequence of SEQ ID NO: 166 or SEQ ID NO: 173 and encodes the amino acid sequence of SEQ ID NO: 167. In specific embodiments, the optimized nucleotide sequence has the nucleic acid sequence of SEQ ID NO: 166 or SEQ ID NO: 173.


In certain embodiments, the at least two nucleic acids are mRNA constructs. In specific embodiments, the optimized nucleotide sequence of the first nucleic acid has the nucleic acid sequence of SEQ ID NO: 148, and the optimized nucleotide sequence of the second nucleic acid has the nucleic acid sequence of SEQ ID NO: 173. In particular embodiments, the first nucleic acid is mRNA construct 1, and the second nucleic acid is mRNA construct 2. In certain embodiments, the at least two nucleic acids are encapsulated in lipid nanoparticles. In certain embodiments, the at least two nucleic acids are encapsulated in the same lipid nanoparticle. In certain embodiments, the at least two nucleic acids are encapsulated in separate lipid nanoparticles.


In some embodiments, the lipid nanoparticle comprises a cationic lipid, a non-cationic lipid, a cholesterol-based lipid and a PEG-modified lipid. In certain embodiments, the cationic lipid is selected from cKK-E12, cKK-E10, OF-Deg-Lin and OF-02; the non-cationic lipid is selected from DOPE and DEPE; the cholesterol-based lipid is cholesterol; and the PEG-modified lipid is DMG-PEG-2K. In specific embodiments, the lipid nanoparticle comprises cKK-E10, DOPE, cholesterol and DMG-PEG2K at the molar ratios 40:30:28.5:1.5. In further specific embodiments, the immunogenic composition comprises a total of 7.5 μg, 15 μg, 45 μg or 135 μg of the at least two nucleic acids.


The immunogenic composition described in paragraphs [0030]-[0034] can be used in the prophylaxis of an infection caused by a β-coronavirus. In certain embodiments, the β-coronavirus expresses a spike protein which binds to angiotensin-converting enzyme 2 (ACE2). In a specific embodiment, the β-coronavirus is SARS-COV-2. In other embodiments, the β-coronavirus has a spike protein that is at least 75% (e.g., at least 80%, 90%, 95% or 99%) identical to SEQ ID NO: 1.


In certain embodiments, the subject has not previously been administered an immunogenic composition for the prophylaxis of an infection caused by a β-coronavirus (e.g., SARS-COV-2), i.e., the immunogenic composition described in paragraphs [0030]-[0034] is the first immunogenic composition which is administered to the subject for that purpose. More commonly, the subject has previously been administered with one or more immunogenic composition(s) for the prophylaxis of an infection caused by a β-coronavirus (e.g., SARS-COV-2). For clarity, “the subject has previously been administered with one or more immunogenic composition(s)” means that the subject has previously been administered with one or more doses of the same immunogenic composition or with one or more doses of different immunogenic composition(s). For example, the subject may have previously been administered with two immunogenic compositions at least two weeks apart for the prophylaxis of an infection caused by a β-coronavirus (e.g., SARS-COV-2). In some embodiments, these one or more immunogenic composition(s) is/are different from the immunogenic composition described in paragraphs [0030]-[0034]. In specific embodiments, the one or more immunogenic composition(s) is/are selected from a pharmaceutical composition disclosed herein (e.g., an immunogenic composition or a vaccine disclosed herein) and a COVID-19 vaccine produced by Moderna (COVID-19 Vaccine Moderna, such as for example, mRNA-1273 or mRNA-1283), CureVac (CVnCOV), Johnson & Johnson (COVID-19 Vaccine Janssen), AstraZeneca (Vaxzevria), Pfizer/BioNTech (Comirnaty), Sputnik (Gam-COVID-Vac), Sinovac (COVID-19 Vaccine (Vero Cell) Inactivated) or Novavax (NVX-CoV2373).
In certain embodiments, the immunogenic composition described in paragraphs [0030]-[0034] is administered 3-18 months after administration of the one or more immunogenic composition(s), which were previously administered to the subject for the prophylaxis of an infection caused by a β-coronavirus (e.g., SARS-COV-2). In certain embodiments, the immunogenic composition described in paragraphs [0030]-[0034] is administered at least 9 months or at least 12 months after administration of the one or more immunogenic composition(s), which were previously administered to the subject for the prophylaxis of an infection caused by a β-coronavirus (e.g., SARS-COV-2). In certain embodiments, the immunogenic composition described in paragraphs [0030]-[0034] is administered at least once, e.g., at least twice.


In another particular embodiment, the invention provides a method of treating or preventing an infection caused by a β-coronavirus, said method comprising administering to a subject an effective amount of an immunogenic composition comprising an mRNA construct, wherein said mRNA construct comprises an optimized nucleotide sequence encoding a full-length SARS-COV-2 spike protein which has been modified relative to naturally occurring full-length SARS-COV-2 spike protein of SEQ ID NO: 1 to remove the furin cleavage site and to mutate residues 986 and 987 to proline and further contains the L18F, D80A, D215G, L242-, A243-, L244-, K417N, E484K, N501Y, D614G and A701V mutations. In some embodiments, the optimized nucleotide sequence encodes the amino acid sequence of SEQ ID NO: 167. In particular embodiments, the optimized nucleotide sequence comprises a nucleotide sequence that is at least 85% (e.g., at least 90%) identical to the nucleic acid sequence of SEQ ID NO: 166 or SEQ ID NO: 173 and encodes the amino acid sequence of SEQ ID NO: 167. In a specific embodiment, the optimized nucleotide sequence has the nucleic acid sequence of SEQ ID NO: 173. In certain embodiments, the mRNA construct is mRNA construct 2. In certain embodiments, the mRNA construct is encapsulated in a lipid nanoparticle. In certain embodiments, the lipid nanoparticle comprises a cationic lipid, a non-cationic lipid, a cholesterol-based lipid and a PEG-modified lipid. In certain embodiments, the cationic lipid is selected from cKK-E12, cKK-E10, OF-Deg-Lin and OF-02; the non-cationic lipid is selected from DOPE and DEPE; the cholesterol-based lipid is cholesterol; and the PEG-modified lipid is DMG-PEG-2K. In specific embodiments, the lipid nanoparticle comprises cKK-E10, DOPE, cholesterol and DMG-PEG2K at the molar ratios 40:30:28.5:1.5. In certain embodiments, the immunogenic composition comprises 7.5 μg, 15 μg, 45 μg or 135 μg of the mRNA construct. 
In certain embodiments, the β-coronavirus expresses a spike protein which binds to angiotensin-converting enzyme 2 (ACE2). In a specific embodiment, the β-coronavirus is SARS-COV-2. In other embodiments, the β-coronavirus has a spike protein that is at least 75%, 80%, 90%, 95% or 99% identical to SEQ ID NO:1.


In the method described in paragraph [0037], the subject may not have previously been administered an immunogenic composition for the prophylaxis of an infection caused by a β-coronavirus (e.g., SARS-COV-2). More commonly, the subject has previously been administered with one or more immunogenic composition(s) for the prophylaxis of an infection caused by a β-coronavirus (e.g., SARS-COV-2), e.g., two immunogenic compositions at least two weeks apart. In certain embodiments, the one or more immunogenic composition(s) is/are different from the immunogenic compositions of the invention. In certain embodiments, the one or more immunogenic composition(s) which has/have previously been administered to the subject is/are selected from a pharmaceutical composition disclosed herein (e.g., an immunogenic composition or a vaccine disclosed herein) and a COVID-19 vaccine produced by Moderna (COVID-19 Vaccine Moderna, such as for example, mRNA-1273 or mRNA-1283), CureVac (CVnCOV), Johnson & Johnson (COVID-19 Vaccine Janssen), AstraZeneca (Vaxzevria), Pfizer/BioNTech (Comirnaty), Sputnik (Gam-COVID-Vac), Sinovac (COVID-19 Vaccine (Vero Cell) Inactivated) or Novavax (NVX-CoV2373). In certain embodiments, the method described in paragraph [0037] comprises administering the immunogenic composition described in that paragraph about 3-18 months after administration of the one or more immunogenic composition(s) which has/have previously been administered to the subject. In certain embodiments, the method described in paragraph [0037] comprises administering the immunogenic composition described in that paragraph at least 9 months or at least 12 months after administration of the one or more immunogenic composition(s). In certain embodiments, the method described in paragraph [0037] comprises administering the immunogenic composition described in that paragraph at least once, e.g., at least twice.





BRIEF DESCRIPTION OF THE DRAWINGS


FIGS. 1A and 1B illustrate a process for generating optimized nucleotide sequences in accordance with the invention. As illustrated in FIG. 1A, the process receives an amino acid sequence of interest and a first codon usage table which reflects the frequency of each codon in a given organism (e.g., a mammal or human). The process then removes codons from the first codon usage table if they are associated with a codon usage frequency which is less than a threshold frequency (e.g., 10%). The codon usage frequencies of the codons not removed in the first step are normalized to generate a normalized codon usage table. The process uses the normalized codon usage table to generate a list of optimized nucleotide sequences. Each of the optimized nucleotide sequences encodes the amino acid sequence of interest. As illustrated in FIG. 1B, the list of optimized nucleotide sequences is further processed by applying a motif screen filter, guanine-cytosine (GC) content analysis filter, and codon adaptation index (CAI) analysis filter, in that order, to generate an updated list of optimized nucleotide sequences.
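The steps of FIG. 1A described above can be sketched as follows, for illustration only. The function names are hypothetical, the codon usage frequencies in the toy table are illustrative values rather than measured data, and selecting each codon at random in proportion to its normalized frequency is an assumption about how candidate sequences may be generated. The filters of FIG. 1B are not shown.

```python
import random

# Toy codon usage table: {amino acid: {codon: usage frequency}}.
# The frequencies are illustrative values only, not measured data.
TOY_USAGE = {
    "M": {"AUG": 1.00},
    "K": {"AAA": 0.43, "AAG": 0.57},
    "L": {"CUG": 0.40, "CUC": 0.20, "CUU": 0.13,
          "UUG": 0.13, "UUA": 0.07, "CUA": 0.07},
}


def filter_and_normalize(usage, threshold=0.10):
    """Remove codons used less often than the threshold, then renormalize."""
    normalized = {}
    for amino_acid, codons in usage.items():
        kept = {c: f for c, f in codons.items() if f >= threshold}
        if not kept:
            raise ValueError("no codon left for " + amino_acid)
        total = sum(kept.values())
        normalized[amino_acid] = {c: f / total for c, f in kept.items()}
    return normalized


def generate_sequence(protein, table, rng):
    """Pick one codon per residue, weighted by normalized usage (assumption)."""
    codons = []
    for amino_acid in protein:
        options = table[amino_acid]
        choice = rng.choices(list(options), weights=list(options.values()))[0]
        codons.append(choice)
    return "".join(codons)


table = filter_and_normalize(TOY_USAGE)
candidates = [generate_sequence("MKL", table, random.Random(seed))
              for seed in range(5)]
```

In this sketch the codons UUA and CUA fall below the 10% threshold and are removed, so no candidate sequence can contain them, while the remaining leucine codons are renormalized so that their frequencies again sum to 1.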



FIG. 2 illustrates an example bar chart depicting the yield of protein produced from various codon optimized nucleotide sequences, determined by an ELISA assay for EPO.



FIG. 3 illustrates the structure of the spike protein of SARS-COV-2. SS=signal sequence; NTD=N-terminal domain; RBD=receptor binding domain; FP=fusion peptide; HR1=heptad repeat-N; CH=central helix; CTD=connector domain; HR2=heptad repeat 2; TM=transmembrane domain; CT=cytoplasmic tail; S2′=S2′ protease cleavage site. Protease cleavage sites are denoted with arrows. The PP and GSAS mutations lead to a prefusion conformation of the spike protein. This image is based on FIG. 1 in Wrapp et al. (2020) Science 367, 6483, 1260-1263.



FIG. 4 illustrates the spike protein of SARS-COV-2 and variants thereof that may form part of the pharmaceutical compositions disclosed herein or may be encoded by the optimized nucleotide sequences disclosed herein, e.g., for use in the nucleic acid-based vaccines disclosed herein. Domains and subunits, mutations to remove the furin cleavage site and replace residues 985, 986 and 987 with proline (P, PP, PPP and GSAS mutations) and the relevant SEQ ID NOs are indicated. The same abbreviations are used as in FIG. 3.



FIGS. 5-7 demonstrate the protein production of nucleic acid vector constructs expressing optimized nucleotide sequences encoding a full length native SARS-COV-2 S protein (Construct A) and three stable prefusion conformations of a SARS-COV-2 S protein (Constructs B-D). Construct B encodes a variant SARS-COV-2 S protein that is modified relative to naturally occurring SARS-COV-2 spike protein to lack the furin cleavage site (and therefore is not cleaved to form the S1 and S2 subunits) and to contain prolines as residues 986 and 987 (thereby stabilizing the protein in its prefusion conformation). Construct C encodes a variant SARS-COV-2 S protein that is modified relative to naturally occurring SARS-COV-2 spike protein to contain prolines as residues 986 and 987, and Construct D encodes a variant SARS-COV-2 S protein that is modified relative to naturally occurring SARS-COV-2 spike protein to lack the furin cleavage site.



FIGS. 5-6 show that constructs A and B can produce a glycosylated mature protein (˜225 kDa band) and a pre-processed full length S protein (˜170-180 kDa band).



FIG. 5 also shows the presence of S1 and S2 subunit bands with Construct A, demonstrating that the native full length SARS-COV-2 S protein is processed correctly by the cells.



FIG. 7 demonstrates that all four constructs were able to produce full length S protein. S1 and S2 subunit bands were detected with Construct A and Construct C. Strong bands of fully glycosylated mature S protein were detected with Construct B and Construct D.



FIG. 8 illustrates the spike protein of SARS-COV-2 and variants thereof that may form part of the pharmaceutical compositions disclosed herein or may be encoded by the optimized nucleotide sequences disclosed herein, e.g., for use in the nucleic acid-based vaccines disclosed herein. Domains, subunits, mutations to remove the furin cleavage site and replace residues 817, 892, 899, 942, 986 and 987 with proline (P, PP, PPP, PPPPP and GSAS), the D614G mutation, removal of the ER retrieval signal and an extended N-terminal signal peptide and the relevant SEQ ID NOs are indicated. The same abbreviations are used as in FIG. 3.



FIG. 9 illustrates that an immunogenic composition of lipid nanoparticle (LNP)-encapsulated mRNA comprising an optimized nucleotide sequence encoding a full-length uncleavable pre-fusion stabilized SARS-COV-2 S protein produced a robust binding and neutralizing antibody response in mice. FIG. 9A illustrates the ELISA titers elicited in mice after immunization with two doses of 0.2 μg, 1 μg, 5 μg or 10 μg LNP-encapsulated mRNA. A group of mice to which the diluent of the mRNA-LNP composition was administered acted as a negative control. FIG. 9B illustrates the titer of neutralizing antibodies produced in mice after immunization with two doses of either 0.2 μg, 1 μg, 5 μg or 10 μg LNP-encapsulated mRNA as determined by a pseudovirus-based assay. 39 individual convalescent serum samples from COVID-19 patients with mild, strong and severe symptoms (Conv Sera) acted as a positive control. As illustrated in FIG. 9C, the immunogenic composition was administered on Day 0 and Day 21. Blood was sampled on Day −7 (baseline), Day 14, Day 21, Day 28 and Day 35.



FIG. 10 illustrates that an immunogenic composition of LNP-encapsulated mRNA comprising an optimized nucleotide sequence encoding a full-length uncleavable pre-fusion stabilized SARS-COV-2 S protein produced a Th1-biased T-cell response in mice. FIG. 10A shows that splenocytes isolated at Day 35 secreted high levels of the Th1 cytokine interferon-γ (IFN-γ). FIG. 10B shows that these splenocytes did not secrete detectable amounts of the Th2 cytokine IL-5. As illustrated in FIG. 10C, the mice were immunized with two doses of either 5 μg or 10 μg LNP-encapsulated mRNA at Day 0 and Day 21, blood was sampled on Day −4, Day 14, Day 21, Day 28 and Day 35, and spleens were harvested at Day 35 for determination of IFN-γ and IL-5 levels by ELISPOT assay.



FIG. 11 illustrates that an immunogenic composition of LNP-encapsulated mRNA comprising an optimized nucleotide sequence encoding a full-length uncleavable pre-fusion stabilized SARS-COV-2 S protein produced a robust binding and neutralizing antibody response in cynomolgus monkeys. FIG. 11A illustrates the ELISA titer elicited in cynomolgus monkeys after immunization with two doses of 15 μg, 45 μg or 135 μg LNP-encapsulated mRNA. FIG. 11B illustrates the titers of neutralizing antibodies produced in cynomolgus monkeys after immunization with two doses of 15 μg, 45 μg or 135 μg LNP-encapsulated mRNA, as determined by a pseudovirus-based assay. FIG. 11C illustrates the microneutralization titers produced in cynomolgus monkeys after immunization with two doses of either 15 μg, 45 μg or 135 μg LNP-encapsulated mRNA. 39 individual convalescent serum samples from COVID-19 patients with mild, strong and severe symptoms (Conv Sera) acted as a positive control in the assays illustrated by FIGS. 11B and 11C. As illustrated in FIG. 11D, the immunogenic composition was administered on Day 0 and Day 21. Blood was sampled on Day −4 (baseline), Day 2, Day 7, Day 14, Day 21, Day 23, Day 28, Day 35 and Day 42. Peripheral blood mononuclear cells (PBMCs) were isolated on Day 42 to determine the cell-mediated immunity (CMI) elicited by the test composition.



FIG. 12 illustrates that an immunogenic composition of LNP-encapsulated mRNA comprising an optimized nucleotide sequence encoding a full-length uncleavable pre-fusion stabilized SARS-COV-2 S protein produced a Th1-biased T-cell response in cynomolgus monkeys. The monkeys were immunized with two doses of either 5 μg or 10 μg LNP-encapsulated mRNA at Day 0 and Day 21. FIGS. 12A and 12C show that PBMCs isolated on Day 42 secreted high levels of the Th1 cytokine interferon-γ (IFN-γ) after stimulation with peptide pools S1 and S2, respectively (SARS-COV-2 S protein-derived peptides). FIGS. 12B and 12D show that these PBMCs secreted only baseline levels of the Th2 cytokine IL-13 in response to peptide stimulation. Naïve (non-activated and non-stimulated) splenocytes served as a control to establish baseline levels of IFN-γ and IL-13 (dashed line).



FIG. 13 describes a statistical analysis of the data summarized in FIGS. 9 and 11. Pseudovirus (PsV) titers in mice for the 1 μg, 5 μg and 10 μg dose levels of the tested immunogenic composition were significantly different from the control human convalescent sera PsV titers (FIG. 13A). Spearman Correlation Coefficients (SCCs) between ELISA (IgG), pseudoviral (PsV) and microneutralization (MN) titers were calculated for the cynomolgus monkey experiment summarized in FIG. 11. SCCs were calculated for each individual animal, and means (±Standard Errors) were calculated per dose (N=4) or across all test animals (N=12). The results of this analysis are shown in FIG. 13B. FIGS. 13C and 13D illustrate that microneutralization (MN) and pseudoviral (PsV) titers in cynomolgus monkeys were significantly higher than MN and PsV titers of human convalescent sera that served as controls.



FIG. 14 illustrates the neutralizing antibody titers induced in mice and NHPs by immunization with LNP formulations comprising optimized mRNAs encoding full-length prefusion stabilized SARS-COV-2 S proteins. Mice were administered two immunizations at a three-week interval with 0.4 μg per dose of each of five formulations (WT, 2P, GSAS, 2P/GSAS, 2P/GSAS/ALAYT). Non-human primates (NHPs) were immunized using the same immunization schedule at 5 μg per dose of six formulations (2P, GSAS, 2P/GSAS, 2P/GSAS/ALAYT, 6P and 6P/GSAS). Serum samples were collected from pre-immunized animals (Day −4) and on Day 14, 21, 28, 35 and 42 post administration. Each dot represents an individual serum sample and the line represents the geometric mean for the group. The dotted line in each panel represents the lower limit of assay readout.



FIG. 15 illustrates the protective efficacy of the LNP formulation of Example 5 in Syrian golden hamsters. (a) weight loss in hamsters administered either a single-dose or a two-dose regimen; (b) H&E staining of lungs of hamsters that received either one dose 0.15 μg, 1.5 μg, 4.5 μg, 13.5 μg, Sham or unchallenged (-∘-) animals; (c) Day 4 and Day 7 post-challenge pathogenicity scores of hamsters immunized with either one-dose or two-dose regimens; (d) quantification of SARS-COV-2 subgenomic mRNA (sgmRNA) in lungs and nasal tissue of hamsters immunized with two doses of the LNP formulation of Example 5 as compared to control (Sham and Naïve) on Day 4 and Day 7 post-infection (DPI).



FIG. 16 provides the strains from which the S protein was derived for the preparation of pseudoviruses (PsVs) that were used in the neutralization assays described in Example 14. For SARS-COV-2 strains, mutations compared to the SARS-COV-2 S protein from the Wuhan index strain are indicated, as well as the presence of the D614G mutation. Where applicable, the GenBank number of the S-protein amino acid sequence is provided. The PsVs were obtained from Integral Molecular, and both the catalogue number and the lot number for each PsV are also indicated.



FIG. 17 illustrates that non-human primates (NHPs), which previously had been immunized with two doses of the LNP formulation of Example 5, mount an effective neutralizing antibody response against the S protein derived from the original Wuhan strain as well as naturally occurring variants of the S protein observed in South Africa, Japan/Brazil and California, and an S protein derived from a SARS-COV-1 strain after immunization with a booster mRNA vaccine encoding a South African variant of the SARS-COV-2 S protein. NHPs were administered two immunizations on Day 0 and Day 35 with LNP formulations that comprised an optimized mRNA encoding a full-length prefusion stabilized SARS-COV-2 S protein as described in Example 5. A booster LNP formulation comprising an mRNA encoding a corresponding S protein with mutations observed in a naturally occurring South African strain was injected on Day 305. Serum samples were taken on Days 35, 308, 329 and 343. Each dot represents an individual serum sample, and the line represents the geometric mean for the group. The dotted line represents the lower limit of detection.





DEFINITIONS

In order for the present invention to be more readily understood, certain terms are first defined below. Additional definitions for the following terms and other terms are set forth throughout the Specification.


As used in this Specification and the appended claims, the singular forms “a,” “an” and “the” include plural referents unless the context clearly dictates otherwise.


Unless specifically stated or obvious from context, as used herein, the term “or” is understood to be inclusive and covers both “or” and “and”.


The terms “e.g.,” and “i.e.” as used herein, are used merely by way of example, without limitation intended, and should not be construed as referring only to those items explicitly enumerated in the specification.


Unless specifically stated or evident from context, as used herein, the term “about” is understood as within a range of normal tolerance in the art, for example within 2 standard deviations of the mean. “About” can be understood to be within 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, 0.5%, 0.1%, 0.05%, 0.01%, or 0.001% of the stated value. Unless otherwise clear from the context, all numerical values provided herein reflect normal fluctuations that can be appreciated by a skilled artisan.


As used herein, the term “abortive transcript” or “pre-aborted transcript” or the like is any transcript that is shorter than a full-length mRNA molecule encoded by the DNA template that results from the premature release of RNA polymerase from the template DNA in a sequence-independent manner. In some embodiments, an abortive transcript may be less than 90% of the length of the full-length mRNA molecule that is transcribed from the target DNA molecule, e.g., less than 80%, 70%, 60%, 50%, 40%, 30%, 20%, 10%, 5%, 1% of the length of the full-length mRNA molecule.


As used herein, the terms “codon” and “codons” refer to a sequence of three nucleotides which together form a unit of the genetic code. Each codon corresponds to a specific amino acid or stop signal in the process of translation or protein synthesis. The genetic code is degenerate, and more than one codon can encode a specific amino acid residue. For example, codons can comprise DNA or RNA nucleotides.


As used herein, the terms “codon optimization” and “codon-optimized” refer to modifications of the codon composition of a naturally-occurring or wild-type nucleic acid encoding a peptide, polypeptide or protein that do not alter its amino acid sequence, thereby improving protein expression of said nucleic acid. In the context of the present invention, “codon optimization” may also refer to the process by which one or more optimized nucleotide sequences are arrived at by removing with filters less than optimal nucleotide sequences from a list of nucleotide sequences, such as filtering by guanine-cytosine content, codon adaptation index, presence of destabilizing nucleic acid sequences or motifs, and/or presence of pause sites and/or terminator signals.


As used herein, “full-length mRNA” is as characterized when using a specific assay, e.g., gel electrophoresis and detection using UV and UV absorption spectroscopy with separation by capillary electrophoresis. The length of an mRNA molecule that encodes a full-length polypeptide is at least 50% of the length of a full-length mRNA molecule that is transcribed from the target DNA, e.g., at least 60%, 70%, 80%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.01%, 99.05%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9% of the length of a full-length mRNA molecule that is transcribed from the target DNA.


As used herein, the term “in vitro” refers to events that occur in an artificial environment, e.g., in a test tube or reaction vessel, in cell culture, etc., rather than within a multi-cellular organism.


As used herein, the term “in vivo” refers to events that occur within a multi-cellular organism, such as a human or a non-human animal. In the context of cell-based systems, the term may be used to refer to events that occur within a living cell (as opposed to, for example, in vitro systems).


As used herein, the term “messenger RNA (mRNA)” refers to a polyribonucleotide that encodes at least one polypeptide. mRNA as used herein encompasses both modified and unmodified RNA. mRNA may contain one or more coding and non-coding regions. mRNA can be purified from natural sources, produced using recombinant expression systems and optionally purified, in vitro transcribed, or chemically synthesized. Where appropriate, e.g., in the case of chemically synthesized molecules, mRNA can comprise nucleoside analogs such as analogs having chemically modified bases or sugars, backbone modifications, etc. An mRNA sequence is presented in the 5′ to 3′ direction unless otherwise indicated.


As used herein, the term “nucleic acid,” in its broadest sense, refers to any compound and/or substance that is or can be incorporated into a polynucleotide chain. In some embodiments, a nucleic acid is a compound and/or substance that is or can be incorporated into a polynucleotide chain via a phosphodiester linkage. In some embodiments, “nucleic acid” refers to individual nucleic acid residues (e.g., nucleotides and/or nucleosides). In some embodiments, “nucleic acid” refers to a polynucleotide chain comprising individual nucleic acid residues. In some embodiments, “nucleic acid” encompasses RNA as well as single and/or double-stranded DNA and/or cDNA. Furthermore, the terms “nucleic acid,” “DNA,” “RNA,” and/or similar terms include nucleic acid analogs, i.e., analogs having other than a phosphodiester backbone. A nucleic acid sequence is presented in the 5′ to 3′ direction unless otherwise indicated.


As used herein, the term “nucleotide sequence”, in its broadest sense, refers to the order of nucleobases within a nucleic acid. In some embodiments, “nucleotide sequence” refers to the order of individual nucleobases within a gene. In some embodiments, “nucleotide sequence” refers to the order of individual nucleobases within a protein-coding gene. In some embodiments, “nucleotide sequence” refers to the order of individual nucleobases within single and/or double stranded DNA and/or cDNA. In some embodiments, “nucleotide sequence” refers to the order of individual nucleobases within RNA. In some embodiments, “nucleotide sequence” refers to the order of individual nucleobases within mRNA. In a particular embodiment, “nucleotide sequence” refers to the order of individual nucleobases within the protein-coding sequence of RNA or DNA. A nucleotide sequence is normally presented in the 5′ to 3′ direction unless otherwise indicated.


As used herein, the term “premature termination” refers to the termination of transcription before the full length of the DNA template has been transcribed. As used herein, premature termination can be caused by the presence of a nucleotide sequence motif (also referred to herein simply as “motif”), e.g., a termination signal, within the DNA template and results in mRNA transcripts that are shorter than the full length mRNA (“prematurely terminated transcripts” or “truncated mRNA transcripts”). Examples of a termination signal include the E. coli rrnB terminator t1 signal (consensus sequence: ATCTGTT) and variants thereof, as described herein.


As used herein, the term “template DNA” (or “DNA template”) relates to a DNA molecule comprising a nucleic acid sequence encoding an mRNA transcript to be synthesized by in vitro transcription. The template DNA is used as template for in vitro transcription in order to produce the mRNA transcript encoded by the template DNA. The template DNA comprises all elements necessary for in vitro transcription, particularly a promoter element for binding of a DNA-dependent RNA polymerase, such as, e.g., T3, T7 and SP6 RNA polymerases, which is operably linked to the DNA sequence encoding a desired mRNA transcript. Furthermore the template DNA may comprise primer binding sites 5′ and/or 3′ of the DNA sequence encoding the mRNA transcript to determine the identity of the DNA sequence encoding the mRNA transcript, e.g., by PCR or DNA sequencing. The “template DNA” in the context of the present invention may be a linear or a circular DNA molecule. As used herein, the term “template DNA” may refer to a DNA vector, such as a plasmid DNA, which comprises a nucleic acid sequence encoding the desired mRNA transcript.


As used herein, the term “preventing” refers to partially or completely inhibiting the onset of one or more symptoms or features of a particular infection, disease, disorder, and/or condition.


As used herein, the term “prophylaxis” refers to partially or completely inhibiting the onset of one or more symptoms or features of a particular infection, disease, disorder, and/or condition.


As used herein, the term “treating” refers to partially or completely alleviating, ameliorating, improving, relieving, delaying onset of, inhibiting progression of, reducing severity of, and/or reducing incidence of one or more symptoms or features of a particular infection, disease, disorder, and/or condition.


As used herein, the term “immunogenic composition” means a composition comprising a nucleic acid or protein that, when administered to a subject, elicits an immune response. In some embodiments, the “immunogenic composition” comprises a nucleic acid. In some embodiments, the nucleic acid is mRNA. In some embodiments, the nucleic acid is DNA. It should be understood that the terms “immunogenic composition” and “vaccine” are used interchangeably herein and are thus meant to have equivalent meanings.


Percentage sequence identity between two nucleotide (or amino acid) sequences is determined after alignment of the two sequences. This alignment and the percentage sequence identity can be determined using software programs known in the art, for example those described in section 7.7.18 of Current Protocols in Molecular Biology (F. M. Ausubel et al., eds., 1987) Supplement 30. In the context of the present invention, an alignment is determined by the Smith-Waterman homology search algorithm using an affine gap search with a gap open penalty of 12 and a gap extension penalty of 2, BLOSUM matrix of 62. The Smith-Waterman homology search algorithm is disclosed in Smith & Waterman (1981) Adv. Appl. Math. 2:482-489. A comparison is then carried out between respective nucleotides (or amino acids) located at the same position in the two nucleotide (or amino acid) sequences. When a given position is occupied by the same nucleotide (or amino acid) in the two nucleotide (or amino acid) sequences, these sequences are identical for this position. The percentage of sequence identity is then determined from the number of positions for which respective nucleotides (or amino acids) are identical, over the total number of nucleotides (or amino acids) in the nucleotide (or amino acid) sequence with which the comparison is made.
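
By way of illustration only, the final counting step of the identity calculation can be sketched in Python. The sketch assumes that the two sequences have already been aligned (e.g., by a Smith-Waterman implementation, which is not shown) and that gaps are written as "-"; the function name `percent_identity` and the example sequences are hypothetical and do not appear in the Sequence Listing.

```python
def percent_identity(aligned_query, aligned_reference, gap="-"):
    """Percentage identity over the reference sequence, given two aligned strings.

    Assumption: both strings come from the same alignment and therefore have
    equal length; gap positions never count as identical.
    """
    if len(aligned_query) != len(aligned_reference):
        raise ValueError("aligned sequences must have equal length")
    # Count positions occupied by the same residue in both sequences.
    matches = sum(
        q == r and q != gap for q, r in zip(aligned_query, aligned_reference)
    )
    # Total number of residues in the sequence with which the comparison is made.
    reference_length = sum(r != gap for r in aligned_reference)
    return 100.0 * matches / reference_length


# Hypothetical aligned pair: 8 identical positions over a 10-nucleotide reference.
example_identity = percent_identity("ATGGCC-AAG", "ATGGCTTAAG")
```

In this sketch the denominator is the ungapped length of the reference sequence, which is one reading of "the sequence with which the comparison is made".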


All technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs and as commonly used in the art to which this application belongs. The publications and other reference materials referenced herein to describe the background of the invention and to provide additional detail regarding its practice are hereby incorporated by reference.


DETAILED DESCRIPTION OF THE INVENTION

The present invention addresses the need for generating optimized nucleotide sequences encoding a protein antigen for the effective treatment or prevention of an infectious disease through the provision of a vaccine comprising an mRNA with the optimized nucleotide sequence. A method is provided for processing a naturally occurring nucleotide sequence encoding a protein antigen to produce at least one optimized nucleotide sequence. The optimized nucleotide sequence is designed to increase the expression of the encoded protein antigen compared to the expression of the protein associated with the naturally occurring nucleotide sequence. Codon optimization can modify the composition of a protein-coding nucleotide sequence based on various criteria without altering the sequence of translated amino acids of the encoded protein antigen, due to the redundancy in the genetic code.


To avoid imbalance between mRNA codon usage and abundance of cognate tRNAs, codon optimization can provide a composition of codons within a nucleotide sequence that better matches the naturally occurring abundance of transfer RNAs (tRNAs) in a host cell and avoids depletion of a specific tRNA. As tRNA abundance influences the rate of protein translation, codon optimization of a nucleotide sequence can increase the efficiency of protein translation and yield for the encoded protein. For example, by not using rare codons which are characterized by a low codon usage, efficiency of protein translation and protein yield can be increased, as the shortage of rare tRNAs can stall or terminate protein translation.


Codon optimization can come at the cost of reduced functional activity of the encoded protein and an associated loss in efficacy as the process may remove information encoded in the nucleotide sequence that is important for controlling translation of the protein and ensuring proper folding of the nascent polypeptide chain (Mauro & Chappell, Trends Mol Med. 2014; 20 (11): 604-13). The inventors have found that optimized sequences which retain some variety, i.e. do not necessarily include only one codon encoding each amino acid, can achieve increased protein yield while retaining functional activity of the encoded protein.


Generation of Optimized Nucleotide Sequences


FIGS. 1A and 1B illustrate a process for generating optimized nucleotide sequences in accordance with the invention. The process first generates a list of codon-optimized sequences and then applies three filters to the list. Specifically, it applies a motif screen filter, guanine-cytosine (GC) content analysis filter, and codon adaptation index (CAI) analysis filter to produce an updated list of optimized nucleotide sequences. The updated list no longer includes nucleotide sequences containing features that are expected to interfere with effective transcription and/or translation of the encoded protein antigen.


Codon Optimization

The genetic code has 64 possible codons. Each codon comprises a sequence of three nucleotides. The usage frequency for each codon in the protein-coding regions of the genome can be calculated by determining the number of instances that a specific codon appears within the protein-coding regions of the genome, and subsequently dividing the obtained value by the total number of codons that encode the same amino acid within protein-coding regions of the genome. A codon usage table contains experimentally derived data regarding how often, for the particular biological source from which the table has been generated, each codon is used to encode a certain amino acid. This information is expressed, for each codon, as a percentage (0 to 100%), or fraction (0 to 1), of how often that codon is used to encode a certain amino acid relative to the total number of times a codon encodes that amino acid.
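
The frequency calculation described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the procedure used to build any published codon usage table: the function name `codon_usage_frequencies`, the abbreviated `CODON_TO_AA` mapping and the toy coding sequence are all introduced for this example only.

```python
from collections import Counter, defaultdict

# Abbreviated codon-to-amino-acid mapping, for illustration only
# (a complete mapping would cover all 64 codons).
CODON_TO_AA = {
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "AAA": "K", "AAG": "K",
    "ATG": "M",
}


def codon_usage_frequencies(coding_sequences):
    """Return {amino_acid: {codon: fraction}} for in-frame coding sequences."""
    counts = Counter()
    for seq in coding_sequences:
        # Walk the sequence codon by codon, ignoring a trailing partial codon.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TO_AA:
                counts[codon] += 1
    # Total number of codons observed for each amino acid.
    totals = defaultdict(int)
    for codon, n in counts.items():
        totals[CODON_TO_AA[codon]] += n
    # Divide each codon count by the total for its amino acid.
    table = defaultdict(dict)
    for codon, n in counts.items():
        aa = CODON_TO_AA[codon]
        table[aa][codon] = n / totals[aa]
    return dict(table)


# Toy input: ATG GCC GCC GCT AAA AAG
toy_table = codon_usage_frequencies(["ATGGCCGCCGCTAAAAAG"])
```

For the toy input, alanine is encoded by GCC twice and by GCT once, so the fractions for the two codons are 2/3 and 1/3 respectively.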


Codon usage tables are stored in publicly available databases, such as the Codon Usage Database (Nakamura et al. (2000) Nucleic Acids Research 28 (1), 292; available online at https://www.kazusa.or.jp/codon/), and the High-performance Integrated Virtual Environment-Codon Usage Tables (HIVE-CUTs) database (Athey et al., (2017), BMC Bioinformatics 18 (1), 391; available online at http://hive.biochemistry.gwu.edu/review/codon).


During the first step of codon optimization, codons are removed from a first codon usage table which reflects the frequency of each codon in a given organism (e.g., a mammal or human) if they are associated with a codon usage frequency which is less than a threshold frequency (e.g., 10%). The codon usage frequencies of the codons not removed in the first step are normalized to generate a normalized codon usage table. An optimized nucleotide sequence encoding an amino acid sequence of interest is generated by selecting a codon for each amino acid in the amino acid sequence based on the usage frequency of the one or more codons associated with a given amino acid in the normalized codon usage table. The probability of selecting a certain codon for a given amino acid is equal to the usage frequency associated with the codon associated with this amino acid in the normalized codon usage table.


The codon-optimized sequences of the invention are generated by a computer-implemented method for generating an optimized nucleotide sequence. The method comprises: (i) receiving an amino acid sequence, wherein the amino acid sequence encodes a peptide, polypeptide, or protein; (ii) receiving a first codon usage table, wherein the first codon usage table comprises a list of amino acids, wherein each amino acid in the table is associated with at least one codon and each codon is associated with a usage frequency; (iii) removing from the codon usage table any codons associated with a usage frequency which is less than a threshold frequency; (iv) generating a normalized codon usage table by normalizing the usage frequencies of the codons not removed in step (iii); and (v) generating an optimized nucleotide sequence encoding the amino acid sequence by selecting a codon for each amino acid in the amino acid sequence based on the usage frequency of the one or more codons associated with the amino acid in the normalized codon usage table. The threshold frequency can be in the range of 5%-30%, in particular 5%, 10%, 15%, 20%, 25%, or 30%. In the context of the present invention, the threshold frequency is typically 10%.


The step of generating a normalized codon usage table comprises: (a) distributing the usage frequency of each codon associated with a first amino acid and removed in step (iii) to the remaining codons associated with the first amino acid; and (b) repeating step (a) for each amino acid to produce a normalized codon usage table. In some embodiments, the usage frequency of the removed codons is distributed equally amongst the remaining codons. In some embodiments, the usage frequency of the removed codons is distributed amongst the remaining codons proportionally based on the usage frequency of each remaining codon. “Distributed” in this context may be defined as taking the combined magnitude of the usage frequencies of removed codons associated with a certain amino acid and apportioning some of this combined frequency to each of the remaining codons encoding the certain amino acid.
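
Steps (iii) and (iv), including both redistribution embodiments, might be sketched in Python as shown below. The function name `normalize_codon_table`, its parameters and the alanine frequencies in the example are assumptions made for illustration; the frequencies are not taken from any actual codon usage table.

```python
def normalize_codon_table(table, threshold=0.10, proportional=True):
    """Remove codons used less often than `threshold` and redistribute
    their combined frequency among the remaining synonymous codons.

    `table` maps amino acid -> {codon: usage fraction between 0 and 1}.
    """
    normalized = {}
    for aa, codons in table.items():
        # Step (iii): keep only codons at or above the threshold frequency.
        kept = {c: f for c, f in codons.items() if f >= threshold}
        if not kept:
            raise ValueError(f"no codon for {aa} meets the threshold")
        removed = sum(f for c, f in codons.items() if c not in kept)
        kept_total = sum(kept.values())
        if proportional:
            # Redistribute in proportion to each remaining codon's frequency.
            normalized[aa] = {
                c: f + removed * f / kept_total for c, f in kept.items()
            }
        else:
            # Redistribute equally among the remaining codons.
            share = removed / len(kept)
            normalized[aa] = {c: f + share for c, f in kept.items()}
    return normalized


# Hypothetical alanine frequencies; GCG (5%) falls below the 10% threshold.
example = {"A": {"GCT": 0.30, "GCC": 0.40, "GCA": 0.25, "GCG": 0.05}}
proportional_table = normalize_codon_table(example, proportional=True)
equal_table = normalize_codon_table(example, proportional=False)
```

In either mode the frequencies of the remaining codons for each amino acid sum to 1 again, so they can be used directly as selection probabilities.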


The step of selecting a codon for each amino acid comprises: (a) identifying, in the normalized codon usage table, the one or more codons associated with a first amino acid of the amino acid sequence; (b) selecting a codon associated with the first amino acid, wherein the probability of selecting a certain codon is equal to the usage frequency associated with the codon associated with the first amino acid in the normalized codon usage table; and (c) repeating steps (a) and (b) until a codon has been selected for each amino acid in the amino acid sequence.


The step of generating an optimized nucleotide sequence by selecting a codon for each amino acid in the amino acid sequence (step (v) in the above method) is performed n times to generate a list of optimized nucleotide sequences.
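
The probabilistic codon selection and its repetition n times can be illustrated with the following Python sketch. It assumes a normalized codon usage table is already available; the function name `sample_optimized_sequences`, the `seed` parameter and the small table in the example are hypothetical.

```python
import random


def sample_optimized_sequences(protein, normalized_table, n, seed=None):
    """Generate n candidate nucleotide sequences encoding `protein`.

    For each amino acid, a codon is drawn with probability equal to its
    usage frequency in the normalized codon usage table.
    """
    rng = random.Random(seed)  # seeded generator, for reproducible examples
    sequences = []
    for _ in range(n):
        codons = []
        for aa in protein:
            options = normalized_table[aa]
            codon = rng.choices(
                list(options), weights=list(options.values()), k=1
            )[0]
            codons.append(codon)
        sequences.append("".join(codons))
    return sequences


# Hypothetical normalized table covering only the amino acids in the example.
example_table = {
    "M": {"ATG": 1.0},
    "A": {"GCC": 0.6, "GCT": 0.4},
    "K": {"AAG": 0.55, "AAA": 0.45},
}
candidates = sample_optimized_sequences("MAK", example_table, n=5, seed=0)
```

Because codons are drawn at random rather than always taking the most frequent one, the resulting list retains some codon variety across and within sequences.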


Motif Screen

A motif screen filter is applied to the list of optimized nucleotide sequences. Optimized nucleotide sequences encoding any known negative cis-regulatory elements and negative repeat elements are removed from the list to generate an updated list.


For each optimized nucleotide sequence in the list, it is also determined whether it contains a termination signal. Any nucleotide sequence that contains one or more termination signals is removed from the list generating an updated list. In some embodiments, the termination signal has the following nucleotide sequence: 5′-X1ATCTX2TX3-3′, wherein X1, X2 and X3 are independently selected from A, C, T or G. In some embodiments, the termination signal has one of the following nucleotide sequences: TATCTGTT; and/or TTTTTT; and/or AAGCTT; and/or GAAGAGC; and/or TCTAGA. In some embodiments, the termination signal has the following nucleotide sequence: 5′-X1AUCUX2UX3-3′, wherein X1, X2 and X3 are independently selected from A, C, U or G. In some embodiments, the termination signal has one of the following nucleotide sequences: UAUCUGUU; and/or UUUUUU; and/or AAGCUU; and/or GAAGAGC; and/or UCUAGA.
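
A termination-signal screen of this kind might be sketched in Python with regular expressions. The sketch combines the motifs of the embodiments listed above into a single filter purely for illustration; the names `TERMINATION_PATTERNS`, `contains_termination_signal` and `motif_screen`, and the example sequences, are assumptions. Screening for negative cis-regulatory elements and negative repeat elements is not shown.

```python
import re

# Motifs taken from the embodiments above, written in the DNA alphabet.
TERMINATION_PATTERNS = [
    re.compile(r"[ACGT]ATCT[ACGT]T[ACGT]"),  # 5'-X1ATCTX2TX3-3'
    re.compile(r"TTTTTT"),
    re.compile(r"AAGCTT"),
    re.compile(r"GAAGAGC"),
    re.compile(r"TCTAGA"),
]


def contains_termination_signal(seq):
    """Return True if any listed motif occurs in the DNA or RNA sequence."""
    dna = seq.upper().replace("U", "T")  # treat RNA input as its DNA equivalent
    return any(p.search(dna) for p in TERMINATION_PATTERNS)


def motif_screen(sequences):
    """Return the updated list without sequences containing a termination signal."""
    return [s for s in sequences if not contains_termination_signal(s)]


# Hypothetical candidates: the second contains TATCTGTT, the third UUUUUU.
screened = motif_screen(["ATGGCCAAGGCC", "ATGGCCTATCTGTTGCC", "AUGUUUUUUGCC"])
```

Converting U to T before matching lets one set of patterns cover both the DNA and the RNA forms of each motif.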


Guanine-Cytosine (GC) Content

The method further comprises determining a guanine-cytosine (GC) content of each of the optimized nucleotide sequences in the updated list of optimized nucleotide sequences. The GC content of a sequence is the percentage of bases in the nucleotide sequence that are guanine or cytosine. The list of optimized nucleotide sequences is further updated by removing any nucleotide sequence from the list, if its GC content falls outside a predetermined GC content range.


Determining a GC content of each of the optimized nucleotide sequences comprises, for each nucleotide sequence: determining a GC content of a first portion of the nucleotide sequence and of one or more additional portions of the nucleotide sequence, wherein the additional portions are non-overlapping with each other and with the first portion, and wherein updating the list of optimized sequences comprises: removing the nucleotide sequence if the GC content of any portion falls outside the predetermined GC content range, optionally wherein determining the GC content of the nucleotide sequence is halted when the GC content of any portion is determined to be outside the predetermined GC content range. In some embodiments, the first portion and/or the one or more additional portions of the nucleotide sequence comprise a predetermined number of nucleotides, optionally wherein the predetermined number of nucleotides is in the range of: 5 to 300 nucleotides, or 10 to 200 nucleotides, or 15 to 100 nucleotides, or 20 to 50 nucleotides. In the context of the present invention, the predetermined number of nucleotides is typically 30 nucleotides. The predetermined GC content range can be 15%-75%, or 40%-60%, or 30%-70%. In the context of the present invention, the predetermined GC content range is typically 30%-70%.


A suitable GC content filter in the context of the invention may first analyze the first 30 nucleotides of the optimized nucleotide sequence, i.e., nucleotides 1 to 30 of the optimized nucleotide sequence. Analysis may comprise determining the number of nucleotides in the portion which are either G or C, and determining the GC content of the portion may comprise dividing the number of G or C nucleotides in the portion by the total number of nucleotides in the portion. The result of this analysis will provide a value describing the proportion of nucleotides in the portion that are G or C, and may be a percentage, for example 50%, or a decimal, for example 0.5. If the GC content of the first portion falls outside a predetermined GC content range, the optimized nucleotide sequence may be removed from the list of optimized nucleotide sequences.


If the GC content of the first portion falls inside the predetermined GC content range, the GC content filter may then analyze a second portion of the optimized nucleotide sequence. In this example, this may be the second 30 nucleotides, i.e., nucleotides 31 to 60, of the optimized nucleotide sequence. The portion analysis may be repeated for each portion until either: a portion is found having a GC content falling outside the predetermined GC content range, in which case the optimized nucleotide sequence may be removed from the list, or the whole optimized nucleotide sequence has been analyzed and no such portion has been found, in which case the GC content filter retains the optimized nucleotide sequence in the list and may move on to the next optimized nucleotide sequence in the list.
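
The portion-by-portion filter described above can be sketched as follows. This is a minimal illustration only, not the implementation used by the inventors. The function names are hypothetical, the defaults are the typical values stated above (30-nucleotide portions, 30%-70% range), the range limits are assumed to be inclusive, and a final portion shorter than 30 nucleotides is assumed to be analyzed as-is, since the text does not specify how a trailing partial portion is handled.

```python
def gc_fraction(portion: str) -> float:
    """Fraction of nucleotides in the portion that are G or C."""
    portion = portion.upper()
    return sum(base in "GC" for base in portion) / len(portion)


def passes_gc_filter(seq: str, window: int = 30,
                     low: float = 0.30, high: float = 0.70) -> bool:
    """Analyze consecutive non-overlapping portions of the sequence."""
    for start in range(0, len(seq), window):
        portion = seq[start:start + window]
        if not low <= gc_fraction(portion) <= high:
            # Halt at the first portion outside the range; the remaining
            # portions do not need to be analyzed.
            return False
    return True


def apply_gc_filter(candidates: list[str], **kwargs) -> list[str]:
    """Return the updated list, keeping only sequences that pass."""
    return [seq for seq in candidates if passes_gc_filter(seq, **kwargs)]


# Toy sequences for illustration only (not sequences of the invention).
balanced = "ATGC" * 15                # every 30-nt portion is within range
at_rich_tail = balanced + "A" * 30    # last portion has 0% GC
print(apply_gc_filter([balanced, at_rich_tail]) == [balanced])
```

Returning as soon as one portion fails corresponds to the optional halting behavior described above.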


Codon Adaptation Index (CAI)

The method further comprises determining a codon adaptation index of each of the optimized nucleotide sequences in the most recently updated list of optimized nucleotide sequences. The codon adaptation index of a sequence is a measure of codon usage bias and can be a value between 0 and 1. The most recently updated list of optimized nucleotide sequences is further updated by removing any nucleotide sequence if its codon adaptation index is less than or equal to a predetermined codon adaptation index threshold. The codon adaptation index threshold can be 0.7, or 0.75, or 0.8, or 0.85, or 0.9. The inventors have found that optimized nucleotide sequences with a codon adaptation index equal to or greater than 0.8 deliver very high protein yield. Therefore in the context of the invention, the codon adaptation index threshold is typically 0.8.


A codon adaptation index may be calculated, for each optimized nucleotide sequence, in any way that would be apparent to a person skilled in the art, for example as described in “The codon adaptation index—a measure of directional synonymous codon usage bias, and its potential applications” (Sharp and Li, 1987. Nucleic Acids Research 15 (3), p. 1281-1295); available online at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC340524/.


Implementing a codon adaptation index calculation may include a method according to, or similar to, the following. For each amino acid in a sequence, a weight of each codon in a sequence may be represented by a parameter termed relative adaptiveness (wi). Relative adaptiveness may be computed from a reference sequence set, as the ratio between the observed frequency of the codon fi and the frequency of the most frequent synonymous codon fj for that amino acid. The codon adaptation index of a sequence may then be calculated as the geometric mean of the weight associated to each codon over the length of the sequence (measured in codons). The reference sequence set used to calculate codon adaptation index may be the same reference sequence set from which a codon usage table used with methods of the invention is derived.
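
By way of illustration, the calculation outlined above may be sketched as follows. This is a simplified sketch under stated assumptions rather than a definitive implementation: the function names are hypothetical, the standard genetic code is assumed, the reference set in the usage example is a toy sequence, and codons that never occur in the reference set are simply skipped (a simplification that is not taken from the text above).

```python
import math
from collections import Counter

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(seq: str) -> list[str]:
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_seqs: list[str]) -> dict[str, float]:
    """w_i = f_i / f_j, where f_j is the count of the most frequent
    synonymous codon for the same amino acid in the reference set."""
    counts = Counter(c for s in reference_seqs for c in split_codons(s))
    best: dict[str, int] = {}
    for codon, n in counts.items():
        aa = CODON_TABLE[codon]
        best[aa] = max(best.get(aa, 0), n)
    return {codon: n / best[CODON_TABLE[codon]] for codon, n in counts.items()}


def codon_adaptation_index(seq: str, weights: dict[str, float]) -> float:
    """Geometric mean of the codon weights over the sequence."""
    # Assumption: codons absent from the reference set are skipped.
    logs = [math.log(weights[c]) for c in split_codons(seq) if c in weights]
    if not logs:
        raise ValueError("no codon of the sequence occurs in the reference set")
    return math.exp(sum(logs) / len(logs))


# Toy reference set: CTG is used three times, CTC once (both encode Leu).
weights = relative_adaptiveness(["CTGCTGCTGCTC"])
print(codon_adaptation_index("CTGCTG", weights))  # only the preferred codon
print(codon_adaptation_index("CTGCTC", weights))  # one less-preferred codon
```

A sequence would then be retained only if its index exceeds the chosen threshold, for example 0.8.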


Synthesis of Optimized Nucleotide Sequences

Once a list of optimized nucleotide sequences has been generated, in vitro synthesis (also referred to commonly as “in vitro transcription”) can be performed with a nucleic acid vector such as a linear or circular DNA template containing a promoter, a pool of ribonucleotide triphosphates, a buffer system that may include DTT and magnesium ions, and an appropriate RNA polymerase (e.g., T3, T7, or SP6 RNA polymerase), DNase I, pyrophosphatase, and/or RNase inhibitor. The exact conditions will vary according to the specific application.


The nucleic acid vector typically is a plasmid. The term ‘plasmid’ or ‘plasmid nucleic acid vector’ refers to a circular nucleic acid molecule, e.g., to an artificial nucleic acid molecule. A plasmid DNA in the context of the present invention is suitable for incorporating or harboring a desired nucleic acid sequence, such as a nucleic acid sequence comprising a sequence encoding an mRNA transcript and/or an open reading frame encoding at least one protein antigen. Such plasmid DNA constructs/vectors may be expression vectors, cloning vectors, transfer vectors etc.


The nucleic acid vector typically comprises a sequence corresponding to (coding for) a desired mRNA transcript, or a part thereof, such as a sequence corresponding to the optimized nucleotide sequence encoding a protein antigen and the 5′- and/or 3′UTR of an mRNA. In some embodiments, the sequence corresponding to the desired mRNA transcript may also encode a poly A-tail after the 3′ UTR so that the poly A-tail is included with the mRNA transcript. More typically in the context of the present invention, the sequence corresponding to the desired mRNA transcript consists of the 5′/3′ UTRs and the open reading frame. In some embodiments of the invention, the mRNA transcript synthesized from the nucleic acid vector during in vitro transcription does not contain a polyA tail. A polyA tail may be added to the mRNA transcript in a post-synthesis processing step.


Screening of Optimized Nucleotide Sequences

Individual in vitro transcribed, capped and tailed mRNAs comprising an optimized nucleotide sequence encoding a protein antigen can be transfected into a cell either in vivo or in vitro to determine the expression level of the protein encoded by the optimized nucleotide sequence. An mRNA comprising, e.g., a naturally occurring nucleotide sequence encoding the protein antigen, or a codon-optimized nucleotide sequence encoding the protein antigen prepared with a method other than the process for generating an optimized nucleotide sequence described herein, may serve as a control mRNA. Each mRNA and control mRNA is contacted with a separate cell or organism. An mRNA comprising an optimized nucleotide sequence generated in accordance with the invention is selected for use in an immunogenic composition in accordance with the present invention if it produces an increased yield of the protein antigen compared to the yield of the protein produced by the cell or organism contacted with a control mRNA.


Methods well-known in the art, such as western blotting, are suitable to experimentally verify that the optimized nucleotide sequence results in increased expression and production of the encoded protein antigen. Furthermore, multiple optimized nucleotide sequences generated by the methods of the present invention can be screened to identify the sequence or sequences which generate the highest protein yield. In some embodiments, the expression level of the protein encoded by the optimized nucleotide sequence is increased at least 2-fold, e.g., at least 3-fold or 4-fold.


In some embodiments, the functional activity of the protein antigen encoded by the optimized nucleotide sequence is determined. The functional activity of the protein encoded by the optimized nucleotide sequence can be determined using a range of well-established methods. These methods may vary depending on the properties of the encoded protein antigen. For example, antibodies recognizing a conformational epitope on the protein antigen may be used to confirm proper folding of the protein antigen expressed from the optimized nucleotide sequence. Alternatively or in addition, in embodiments of the invention relating to a spike protein of SARS-COV-2, the spike protein may be contacted with human angiotensin-converting enzyme 2 (ACE2) to confirm its receptor binding activity. Binding activity is typically assessed relative to a control, such as a spike protein of SARS-COV-2 expressed from a naturally occurring coding sequence.


SARS-COV-2 Proteins

Coronaviruses (CoVs) are the largest group of viruses belonging to the Nidovirales order, which includes Coronaviridae, Arteriviridae, and Roniviridae families. CoVs are spherical enveloped viruses with a positive-sense single-stranded RNA genome and a nucleocapsid of helical symmetry with a diameter of approximately 125 nm.


SARS-COV-2 is a β-coronavirus, like other coronaviruses that infect humans, such as MERS-COV and SARS-COV. The first two-thirds of the 30 kb viral RNA genome, referred to as the ORF1a/b region, encodes two polyproteins (pp1a and pp1ab), which constitute the main non-structural proteins. The remaining genome encodes accessory proteins and four essential structural proteins, namely the spike (S) glycoprotein, small envelope (E) protein, matrix/membrane (M) protein, and nucleocapsid (N) protein (Kang et al. (2020) https://doi.org/10.1101/2020.03.06.977876). SARS-COV-2 uses its S protein to bind host cell receptors (ACE2 in humans) and mediate cell entry. This makes the S protein the main target for neutralizing antibodies, as discussed in detail below.


Spike Glycoprotein (S Protein)

Cell entry depends on the binding of S proteins to receptors on the cell surface and on S protein priming by host cell proteases. The S protein comprises two functional subunits responsible for binding to the host cell receptor (S1 subunit) and fusion of the viral and cellular membranes (S2 subunit) (FIG. 3). The S protein forms a homotrimer that produces a distinctive spike structure on the surface of the virus. The S1 subunit has a large receptor-binding domain (RBD), while S2 forms the stalk of the spike molecule. The amino acid sequence of the full-length SARS-COV-2 S glycoprotein is provided by SEQ ID NO: 1 (GenBank QHD43416.1). The S1 subunit is located at residues 1 to 681, the S2 subunit is located at residues 686 to 1208 and the S2′ subunit is located at residues 816 to 1208. The C-terminal end of the S protein contains a transmembrane domain, and the last 19 amino acids of the cytoplasmic tail contain an endoplasmic reticulum (ER)-retention signal.
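
For illustration, the residue ranges given above are 1-based and inclusive, which differs from the 0-based, end-exclusive slicing used by most programming languages. The following sketch shows one way to convert between the two. The names REGIONS and extract_region are hypothetical, and the placeholder string merely stands in for SEQ ID NO: 1, which is not reproduced here.

```python
# 1-based, inclusive residue ranges as stated in the paragraph above.
REGIONS = {
    "S1": (1, 681),
    "S2": (686, 1208),
    "S2_prime": (816, 1208),
}


def extract_region(protein: str, name: str) -> str:
    """Return the residues of the named region from a full-length sequence."""
    start, end = REGIONS[name]
    if len(protein) < end:
        raise ValueError(f"sequence too short for region {name}")
    # Residue `start` sits at index start - 1; the slice end is exclusive,
    # so `end` itself is the correct upper bound.
    return protein[start - 1:end]


# Placeholder of 1273 letters standing in for SEQ ID NO: 1 (illustration only).
placeholder = "".join(chr(65 + i % 26) for i in range(1273))
for region in REGIONS:
    print(region, len(extract_region(placeholder, region)))
```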


References to the naturally occurring SARS-COV-2 S protein refer to the full-length SARS-COV-2 S glycoprotein provided by SEQ ID NO: 1. Any modifications to the naturally occurring SARS-COV-2 S protein are numbered based on the residues in SEQ ID NO: 1.


Although the observed diversity among pandemic SARS-COV-2 sequences is low, its rapid global spread provides the virus with ample opportunity for natural selection to act upon rare but favorable mutations. It is advantageous to target the sequences of the circulating SARS-CoV-2 virus rather than just the index strain from Wuhan (i.e. SEQ ID NO: 1).


An amino acid change in the SARS-COV-2 S glycoprotein, D614G, emerged early during the 2020 COVID-19 pandemic and as of July 2020 has become the most prevalent form of the virus around the world. Patients infected with G614 shed more viral nucleic acid compared with those with D614, and G614-bearing viruses show significantly higher infectious titers in vitro than their D614 counterparts (Korber et al., 2020, Cell 182, 1-16). An optimized nucleotide sequence encoding a SARS-COV-2 S protein comprising a D614G mutation may therefore be particularly suitable for use in an immunogenic composition as described herein.


Other rare mutations that have been identified in the SARS-COV-2 S protein are summarized in the table below (Korber et al. (2020) https://doi.org/10.1101/2020.04.29.069054):


Spike Mutation    Spike location / possible impact

L5F               Signal Peptide
L8V/W             Signal Peptide
H49Y              S1 NTD domain
Y145H/del         S1 NTD domain
Q239K             S1 NTD domain
V367F             Up/Down conformations
G476S             Directly in the RBD
V483A             Up/Down conformations
V615I/F           In SARS-CoV ADE epitope
A831V             Potential fusion peptide in S2
D839Y/N/E         S2 subunit
S943P             Fusion core of HR1
P1263L            Cytoplasmic Tail


Further SARS-COV-2 S glycoprotein mutations include: L18F, HV 69-70 deletion, Y144 deletion, E154Q, Q218E, A222V, S447N, F490S, S494P, N501Y, A570D, E583D, T618E, P681H, A701V, T716I, T723I, I843V, S982A and D1118H. In late 2020, new SARS-COV-2 variants containing multiple mutations emerged in the UK, South Africa, Brazil and California. The mutations present in the SARS-COV-2 S glycoprotein in the UK variant (named lineage B.1.1.7) include an H69 deletion (ΔH69), a V70 deletion (ΔV70), a Y144 deletion (ΔY144), N501Y, A570D, P681H, T716I, S982A and D1118H (Rambaut et al. (2020) https://virological.org/t/preliminary-genomic-characterisation-of-an-emergent-sars-cov-2-lineage-in-the-uk-defined-by-a-novel-set-of-spike-mutations/563). In October 2020, the South African variant (named lineage B.1.351) included six mutations in the SARS-COV-2 S glycoprotein: D80A, K417N, E484K, N501Y, D614G and A701V. By the end of November, three further SARS-COV-2 S glycoprotein mutations had emerged (L18F, R246I and K417N), together with the deletion of three amino acids at L242 (ΔL242), A243 (ΔA243) and L244 (ΔL244) (Tegally et al. (2020) https://doi.org/10.1101/2020.12.21.20248640). The mutations present in the SARS-COV-2 S glycoprotein in the Brazilian variant (named lineage P.1) include L18F, T20N, P26S, D138Y, R190S, K417T, E484K, N501Y, H655Y, T1027I and V1176F. The mutations present in the SARS-COV-2 S glycoprotein in the Californian variant (known as CAL.20C) include S13I, W152C and L452R (Zhang et al. (2021) https://doi.org/10.1101/2021.01.18.21249786).


In some embodiments, the amino acid sequence of the full-length SARS-COV-2 S glycoprotein can have multiple mutations, for example two or more, three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, or ten or more mutations relative to the amino acid sequence of SEQ ID NO: 1. The mutations in the SARS-COV-2 S glycoprotein can be amino acid deletions or amino acid substitutions. Possible combinations of mutations include: (a) L18F, A222V, D614G; (b) A222V, D614G; (c) A222V, E583D, D614G; (d) S447N, D614G; (e) E154Q, F490S, D614G, I834V; (f) D614G, A701V; (g) Q218E, D614G; (h) D614G, T618R; (i) ΔL242, ΔA243, ΔL244; (j) A222V, E583D, A701V; (k) ΔH69, ΔV70, ΔY144, N501Y, A570D, D614G, P681H, T716I, S982A, D1118H (UK variant+D614G); (l) D80A, K417N, E484K, N501Y, D614G and A701V (South African fixed mutations+D614G); (m) D80A, K417N, E484K, N501Y and A701V (South African fixed mutations); (n) D80A, D215G, ΔL242, ΔA243, ΔL244, A417V, E484K, N501Y, D614G, A701V (South African variant 1+D614G); (o) L18F, D80A, D215G, ΔL242, ΔA243, ΔL244, K417N, E484K, N501Y, D614G and A701V (South African variant 2+D614G); (p) L18F, T20N, P26S, D138Y, R190S, K417T, E484K, N501Y, D614G, H655Y, T1027I and V1176F (the Brazilian variant+D614G); and (q) S13I, W152C, L452R and D614G (Californian variant+D614G).
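
As an illustration of how such a combination of substitutions and deletions may be applied while keeping the residue numbering of the reference sequence, consider the following sketch. It is not part of the described method: the function name apply_mutations and the label parsing are hypothetical, and the 15-residue native signal peptide (SEQ ID NO: 7) is used only as a short stand-in for the N-terminus of SEQ ID NO: 1. L5F and S13I are mutations named in this specification; ΔL7 is an invented deletion used purely to demonstrate the deletion syntax.

```python
import re

_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")  # e.g. D614G
_DELETION = re.compile(r"^Δ([A-Z])(\d+)$")            # e.g. ΔH69


def apply_mutations(reference: str, mutations: list[str]) -> str:
    """Apply mutations numbered against the unmodified reference sequence."""
    residues = list(reference)
    for label in mutations:
        sub = _SUBSTITUTION.match(label)
        dele = _DELETION.match(label)
        match = sub or dele
        if match is None:
            raise ValueError(f"unrecognized mutation label: {label}")
        wild_type, position = match.group(1), int(match.group(2))
        if not 1 <= position <= len(reference):
            raise ValueError(f"{label}: position outside the reference")
        if reference[position - 1] != wild_type:
            raise ValueError(f"{label}: reference has {reference[position - 1]}")
        # A deleted residue becomes an empty string, so the indices of all
        # other residues still follow the reference numbering.
        residues[position - 1] = sub.group(3) if sub else ""
    return "".join(residues)


SIGNAL_PEPTIDE = "MFVFLVLLPLVSSQC"  # SEQ ID NO: 7, stand-in for SEQ ID NO: 1
print(apply_mutations(SIGNAL_PEPTIDE, ["L5F", "S13I"]))
print(apply_mutations(SIGNAL_PEPTIDE, ["ΔL7"]))  # hypothetical deletion
```

Checking the stated wild-type residue against the reference catches labels that were numbered against a different sequence.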


In some embodiments, the amino acid sequence of the full-length SARS-COV-2 S glycoprotein can have one or more mutations relative to the amino acid sequence of SEQ ID NO: 1. This may include one or more of the following mutations: D614G mutation, L5F mutation, L8V/W mutation, H49Y mutation, Y145H/del mutation, Q239K mutation, V367F mutation, G476S mutation, V483A mutation, V615I/F mutation, A831V mutation, D839Y/N/E mutation, S943P mutation, P1263L mutation. Accordingly, in particular embodiments, any of the S proteins or antigenic fragments thereof described herein comprises a D614G mutation. For example, in particular embodiments, the SARS-COV-2 spike protein, the ectodomain thereof or the antigenic fragment thereof comprises a D614G mutation.


In some embodiments, any of the S proteins or antigenic fragments thereof described herein comprises the L5F mutation. In some embodiments, any of the S proteins or antigenic fragments thereof described herein comprises the L8V/W mutation. In some embodiments, any of the S proteins or antigenic fragments thereof described herein comprises the H49Y mutation. In some embodiments, any of the S proteins or antigenic fragments thereof described herein comprises the Y145H/del mutation. In some embodiments, any of the S proteins or antigenic fragments thereof described herein comprises the Q239K mutation. In some embodiments, any of the S proteins or antigenic fragments thereof described herein comprises the V367F mutation. In some embodiments, any of the S proteins or antigenic fragments thereof described herein comprises the G476S mutation. In some embodiments, any of the S proteins or antigenic fragments thereof described herein comprises the V483A mutation. In some embodiments, any of the S proteins or antigenic fragments thereof described herein comprises the V615I/F mutation. In some embodiments, any of the S proteins or antigenic fragments thereof described herein comprises the A831V mutation. In some embodiments, any of the S proteins or antigenic fragments thereof described herein comprises the D839Y/N/E mutation. In some embodiments, any of the S proteins or antigenic fragments thereof described herein comprises the S943P mutation. In some embodiments, any of the S proteins or antigenic fragments thereof described herein comprises the P1263L mutation.


An optimized nucleotide sequence according to the present invention may encode the SARS-COV-2 S protein or an antigenic fragment thereof. In particular embodiments, the optimized nucleotide sequence encodes a full-length SARS-COV-2 S protein. The full-length SARS-COV-2 S protein can have the amino acid sequence comprising SEQ ID NO: 1 or an amino acid sequence at least 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1. In some embodiments, the optimized nucleotide sequence encoding the full-length SARS-COV-2 S protein has the sequence of SEQ ID NO: 29. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 29 and encodes the amino acid sequence of SEQ ID NO: 1.
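
For illustration only, percent identity between two coding sequences of equal length may be computed position by position, as sketched below. This sketch assumes an ungapped comparison, which is natural when both sequences encode the same amino acid sequence and therefore contain the same number of codons; it is not a statement of how identity is defined or measured elsewhere in this specification. The function names are hypothetical. The reference used in the example is the 45-nucleotide signal peptide coding sequence of SEQ ID NO: 37, and the variant differs from it by a single synonymous change (GTT to GTG) introduced here for demonstration.

```python
IDENTITY_LEVELS = [60, 65, 70, 75, 80, 85, 90, 95, 96, 97, 98, 99]


def percent_identity(a: str, b: str) -> float:
    """Ungapped, position-by-position identity of two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def highest_level_met(identity: float):
    """Highest listed identity level that is met, or None if below 60%."""
    met = [level for level in IDENTITY_LEVELS if identity >= level]
    return max(met) if met else None


reference = "ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAGTGT"  # SEQ ID NO: 37
# Demonstration variant: third base of codon 11 changed from T to G.
variant = reference[:32] + "G" + reference[33:]

identity = percent_identity(reference, variant)
print(round(identity, 2), highest_level_met(identity))
```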


In some embodiments, the optimized nucleotide sequence encodes a SARS-COV-2 S protein comprising one or more mutations relative to the amino acid sequence of SEQ ID NO: 1. For example, in some embodiments, the optimized nucleotide sequence encodes a SARS-COV-2 S protein comprising one or more of the following mutations: D614G mutation, L5F mutation, L8V/W mutation, H49Y mutation, Y145H/del mutation, Q239K mutation, V367F mutation, G476S mutation, V483A mutation, V615I/F mutation, A831V mutation, D839Y/N/E mutation, S943P mutation, P1263L mutation. Accordingly, in particular embodiments, the optimized nucleotide sequence encodes a SARS-COV-2 S protein comprising the D614G mutation. For example, in particular embodiments the optimized nucleotide sequence encodes a SARS-COV-2 spike protein, an ectodomain thereof or an antigenic fragment thereof which comprises the D614G mutation.


In some embodiments, the optimized nucleotide sequence encodes a SARS-COV-2 S protein comprising the L5F mutation. In some embodiments, the optimized nucleotide sequence encodes a SARS-COV-2 S protein comprising the L8V/W mutation. In some embodiments, the optimized nucleotide sequence encodes a SARS-COV-2 S protein comprising the H49Y mutation. In some embodiments, the optimized nucleotide sequence encodes a SARS-COV-2 S protein comprising the Y145H/del mutation. In some embodiments, the optimized nucleotide sequence encodes a SARS-COV-2 S protein comprising the Q239K mutation. In some embodiments, the optimized nucleotide sequence encodes a SARS-COV-2 S protein comprising the V367F mutation. In some embodiments, the optimized nucleotide sequence encodes a SARS-COV-2 S protein comprising the G476S mutation. In some embodiments, the optimized nucleotide sequence encodes a SARS-COV-2 S protein comprising the V483A mutation. In some embodiments, the optimized nucleotide sequence encodes a SARS-COV-2 S protein comprising the V615I/F mutation. In some embodiments, the optimized nucleotide sequence encodes a SARS-COV-2 S protein comprising the A831V mutation. In some embodiments, the optimized nucleotide sequence encodes a SARS-COV-2 S protein comprising the D839Y/N/E mutation. In some embodiments, the optimized nucleotide sequence encodes a SARS-COV-2 S protein comprising the S943P mutation. In some embodiments, the optimized nucleotide sequence encodes a SARS-COV-2 S protein comprising the P1263L mutation.


Alternatively, an optimized nucleotide sequence according to the present invention may encode an antigenic fragment of the SARS-COV-2 S protein. In certain embodiments, the optimized nucleotide sequence may encode the ectodomain of the SARS-COV-2 S protein, which can have the amino acid sequence of SEQ ID NO:2 or an amino acid sequence at least 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 2. The ectodomain does not contain residues 1209-1273 of the full length SARS-COV-2 S protein, which includes the transmembrane domain and the cytoplasmic tail. In some embodiments, the optimized nucleotide sequence encoding the ectodomain of the SARS-COV-2 S protein has the sequence of SEQ ID NO: 30. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 30 and encodes the amino acid sequence of SEQ ID NO: 2.


In other embodiments, an antigenic fragment of the SARS-COV-2 S protein may comprise one or more of the S1 subunit, the S2 subunit and/or the S2′ subunit of the SARS-COV-2 S protein. For example, the optimized nucleotide sequence may encode the S1 subunit, which has the amino acid sequence of SEQ ID NO: 3. Accordingly, in one embodiment, the optimized nucleotide sequence may encode an amino acid sequence comprising SEQ ID NO: 3. In one embodiment, an optimized nucleotide sequence encoding the S1 subunit of the SARS-COV-2 S protein has the sequence of SEQ ID NO: 31. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 31 and encodes the amino acid sequence of SEQ ID NO: 3. In an alternative embodiment, the optimized nucleotide sequence may encode the S2 subunit, which has the amino acid sequence of SEQ ID NO: 4. Accordingly, in one embodiment, the optimized nucleotide sequence may encode an amino acid sequence comprising SEQ ID NO: 4. In one embodiment, an optimized nucleotide sequence encoding the S2 subunit of the SARS-COV-2 S protein has the sequence of SEQ ID NO: 32. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 32 and encodes the amino acid sequence of SEQ ID NO: 4. In an alternative embodiment, the optimized nucleotide sequence may encode the S2′ subunit, which has the amino acid sequence of SEQ ID NO: 5. Accordingly, in one embodiment, the optimized nucleotide sequence may encode an amino acid sequence comprising SEQ ID NO: 5. In one embodiment, an optimized nucleotide sequence encoding the S2′ subunit of the SARS-COV-2 S protein has the sequence of SEQ ID NO: 33. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 33 and encodes the amino acid sequence of SEQ ID NO: 5.


In some embodiments, an antigenic fragment of the SARS-COV-2 S protein may comprise the full length S2 subunit or S2′ subunit of the SARS-COV-2 S protein. The full length S2 subunit or S2′ subunit comprises the transmembrane domain and the cytoplasmic tail. The full length S2 subunit encompasses residues 686 to 1273 of the SARS-COV-2 S protein and the S2′ subunit encompasses residues 816 to 1273 of the SARS-COV-2 S protein. For example, the optimized nucleotide sequence may encode the full length S2 subunit, which has the amino acid sequence of SEQ ID NO: 72. Accordingly, in one embodiment, the optimized nucleotide sequence may encode an amino acid sequence comprising SEQ ID NO: 72. In one embodiment, an optimized nucleotide sequence encoding the full length S2 subunit of the SARS-COV-2 S protein has the sequence of SEQ ID NO: 71. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 71 and encodes the amino acid sequence of SEQ ID NO: 72. In an alternative embodiment, the optimized nucleotide sequence may encode the full length S2′ subunit, which has the amino acid sequence of SEQ ID NO: 98. Accordingly, in one embodiment, the optimized nucleotide sequence may encode an amino acid sequence comprising SEQ ID NO: 98. In one embodiment, an optimized nucleotide sequence encoding the full length S2′ subunit of the SARS-COV-2 S protein has the sequence of SEQ ID NO: 97. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 97 and encodes the amino acid sequence of SEQ ID NO: 98.


The SARS-COV-2 S protein mediates viral entry into host cells by first binding to the angiotensin-converting enzyme 2 (ACE2) receptor through the receptor-binding domain (RBD), which is located in the S1 subunit, and then fusing the viral and host membranes through the S2 subunit (Tai et al. (2020) Cellular and Molecular Immunology, doi.org/10.1038/s41423-020-0400-4). Tai et al. identified a region of the RBD of SARS-COV-2 at residues 331 to 524 of the S protein. A putative RBD from residues 331 to 521 of the SARS-COV-2 S protein is provided by SEQ ID NO: 6 in Table 2 below. A recombinant fusion protein containing the 193-amino acid RBD (residues 318-510) of SARS-COV and a human IgG1 Fc fragment has been shown to induce highly potent antibody responses in immunized rabbits (He et al. (2004) Biochem Biophys Res Commun; 324 (2): 773-781). Therefore, the RBD of the SARS-COV-2 S protein may also be able to induce a potent antibody response. Both the RBD of SARS-COV and the RBD of SARS-COV-2 bind to ACE2. Therefore, it is contemplated that the antigenic fragment of the SARS-COV-2 S protein may comprise the RBD. Accordingly, in particular embodiments, the optimized nucleotide sequence may encode an amino acid sequence comprising the RBD of the SARS-COV-2 S protein, which has the amino acid sequence of SEQ ID NO: 6. In one embodiment, an optimized nucleotide sequence encoding the RBD of the SARS-COV-2 S protein has the sequence of SEQ ID NO: 34. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 34 and encodes the amino acid sequence of SEQ ID NO: 6.


In certain embodiments, the antigenic fragment of the SARS-COV-2 S protein is fused with an exogenous N-terminal signal peptide. The signal peptide targets the protein to the ER and the secretory pathway, so that the protein enters the secretory pathway in the host cell in which it is expressed. In particular embodiments, the invention provides an antigenic fragment of the SARS-COV-2 S protein operably linked with an N-terminal signal peptide. For example, the RBD of the SARS-COV-2 S protein may be operably linked to the N-terminal signal peptide, which enables the resulting protein to be secreted from the host cell expressing it.


In specific embodiments, the N-terminal signal peptide can have the sequence MFVFLVLLPLVSSQC (SEQ ID NO: 7), which is the native signal peptide of the naturally occurring SARS-COV-2 S protein. In some embodiments, the signal peptide is encoded by the nucleotide sequence ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAGTGT (SEQ ID NO: 37). Numerous other signal peptides are known in the art, which can be used to secrete a protein from a host cell, for example those mentioned in the review by Freudl (2018) Microbial Cell Factories 17:52. An alternative signal peptide that can be used as part of the invention is MATGSRTSLLLAFGLLCLPWLQEGSAFPTIPLS (SEQ ID NO: 38). In some embodiments, the signal peptide is encoded by the nucleotide sequence AUGGCCACUGGAUCAAGAACCUCACUGCUGCUCGCUUUUGGACUGCUUUGCCUGCCCUGGUUGCAAGAAGGAUCGGCUUUCCCGACCAUCCCACUCUCC (SEQ ID NO: 39). Another signal peptide that can be used as part of the invention is MATGSRTSLLLAFGLLCLPWLQEGSAFPTIPLS (SEQ ID NO: 40). In some embodiments, the signal peptide is encoded by the nucleotide sequence AUGGCAACUGGAUCAAGAACCUCCCUCCUGCUCGCAUUCGGCCUGCUCUGUCUCCCAUGGCUCCAAGAAGGAAGCGCGUUCCCCACUAUCCCCCUCUCG (SEQ ID NO: 41).


The original annotation of the SARS-COV-2 genome identified the signal peptide sequence of the SARS-COV-2 S protein as being SEQ ID NO: 7. An alternative annotation of the SARS-COV-2 genome identified a longer native N-terminal signal peptide sequence, MFLLTTKRTMFVFLVLLPLVSSQC (SEQ ID NO: 142), which is nine amino acids longer. In specific embodiments, the N-terminal signal peptide can have the sequence of SEQ ID NO: 142.


In some embodiments, the signal peptide is encoded by the nucleotide sequence ATGTTCCTGCTGACAACAAAAAGAACCATGTTTGTGTTCCTGGTGCTGCTGCCTCTGGTGTCCTCACAGTGT (SEQ ID NO: 143).
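
As a consistency check, the signal peptide coding sequences given above can be translated with the standard genetic code and compared with the stated amino acid sequences. The following is an illustrative sketch only. The function name translate is hypothetical, the standard genetic code is assumed, RNA sequences are handled by reading U as T, and any whitespace within a sequence is ignored.

```python
# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(nucleotides: str) -> str:
    """Translate a DNA or RNA coding sequence into one-letter amino acids."""
    seq = "".join(nucleotides.split()).upper().replace("U", "T")
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


SEQ_ID_37 = "ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAGTGT"
SEQ_ID_39 = ("AUGGCCACUGGAUCAAGAACCUCACUGCUGCUCGCUUUUGGACUGCUUUGCCUGC"
             "CCUGGUUGCAAGAAGGAUCGGCUUUCCCGACCAUCCCACUCUCC")
SEQ_ID_143 = ("ATGTTCCTGCTGACAACAAAAAGAACCATGTTTGTGTTCCTGGTGCTGCTGCCTCTG"
              "GTGTCCTCACAGTGT")

print(translate(SEQ_ID_37) == "MFVFLVLLPLVSSQC")                      # SEQ ID NO: 7
print(translate(SEQ_ID_39) == "MATGSRTSLLLAFGLLCLPWLQEGSAFPTIPLS")    # SEQ ID NO: 38
print(translate(SEQ_ID_143) == "MFLLTTKRTMFVFLVLLPLVSSQC")            # SEQ ID NO: 142
```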


In particular embodiments, the optimized nucleotide sequence of the invention can encode an amino acid sequence comprising the RBD of the SARS-COV-2 S protein operably linked with an N-terminal signal peptide, which has the amino acid sequence comprising SEQ ID NO: 8. In one embodiment, an optimized nucleotide sequence encoding the RBD of the SARS-COV-2 S protein operably linked with an N-terminal signal peptide has the sequence of SEQ ID NO: 35. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 35 and encodes the amino acid sequence of SEQ ID NO: 8.


In particular embodiments, the optimized nucleotide sequence of the invention can encode the S2 subunit of the SARS-COV-2 S protein operably linked with an N-terminal signal peptide, which has the amino acid sequence comprising SEQ ID NO:74. In one embodiment, an optimized nucleotide sequence encoding the S2 subunit of the SARS-COV-2 S protein operably linked with an N-terminal signal peptide has the sequence of SEQ ID NO: 73. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 73 and encodes the amino acid sequence of SEQ ID NO: 74.


In particular embodiments, the optimized nucleotide sequence of the invention can encode the S2′ subunit of the SARS-COV-2 S protein operably linked with an N-terminal signal peptide, which has the amino acid sequence comprising SEQ ID NO: 66. In one embodiment, an optimized nucleotide sequence encoding the S2′ subunit of the SARS-COV-2 S protein operably linked with an N-terminal signal peptide has the sequence of SEQ ID NO: 65. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 65 and encodes the amino acid sequence of SEQ ID NO: 66.


In particular embodiments, the optimized nucleotide sequence of the invention can encode the full length S2 subunit of the SARS-COV-2 S protein operably linked with an N-terminal signal peptide, which has the amino acid sequence comprising SEQ ID NO:68. In one embodiment, an optimized nucleotide sequence encoding the full length S2 subunit of the SARS-COV-2 S protein operably linked with an N-terminal signal peptide has the sequence of SEQ ID NO: 67. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 67 and encodes the amino acid sequence of SEQ ID NO:68. In particular embodiments, the optimized nucleotide sequence of the invention can encode the full length S2′ subunit of the SARS-COV-2 S protein operably linked with an N-terminal signal peptide, which has the amino acid sequence comprising SEQ ID NO:96. In one embodiment, an optimized nucleotide sequence encoding the full length S2′ subunit of the SARS-COV-2 S protein operably linked with an N-terminal signal peptide has the sequence of SEQ ID NO: 95. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 95 and encodes the amino acid sequence of SEQ ID NO:96.


CoV S proteins are typical class I viral fusion proteins, which require protease cleavage in order for the fusion potential of the S protein to be activated. A two-step sequential protease cleavage model has been proposed for activation of the SARS-COV-2 S protein: (1) priming cleavage between the S1 and S2 subunits and (2) activating cleavage at the S2′ site (Ou et al. (2020) Nature Communications, 11, 1620). The SARS-COV-2 S protein harbors a furin cleavage site at the boundary between the S1 and S2 subunits, which is processed during biogenesis and sets this virus apart from SARS-COV and SARS-related CoVs (Walls et al. (2020) Cell doi.org/10.1016/j.cell.2020.02.058).


In some embodiments, an optimized nucleotide sequence according to the present invention may encode a prefusion stabilized SARS-COV-2 S protein, a prefusion stabilized ectodomain of the SARS-COV-2 S protein, or an antigenic fragment of either, which has been modified relative to naturally occurring SARS-COV-2 S protein by removing the furin cleavage site required for activation, by mutating residues 986 and 987 to proline and which contains an extended N-terminal signal peptide. For example, an optimized nucleotide sequence may encode a prefusion stabilized SARS-COV-2 S protein, which has an amino acid sequence comprising SEQ ID NO: 123. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 122. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 122 and encodes the amino acid sequence of SEQ ID NO: 123.


In some embodiments, an optimized nucleotide sequence according to the present invention may encode a prefusion stabilized SARS-COV-2 S protein, a prefusion stabilized ectodomain of the SARS-COV-2 S protein, or an antigenic fragment of either, which has been modified relative to naturally occurring SARS-COV-2 S protein by removing the furin cleavage site required for activation, by mutating residues 817, 892, 899, 942, 986 and 987 to proline and which contains an extended N-terminal signal peptide. For example, an optimized nucleotide sequence may encode a prefusion stabilized SARS-COV-2 S protein, which has an amino acid sequence comprising SEQ ID NO: 137. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 136. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 136 and encodes the amino acid sequence of SEQ ID NO: 137.


Prefusion stabilization tends to increase the recombinant expression of viral fusion glycoproteins, possibly by preventing misfolding that results from a tendency of such proteins to adopt the more stable postfusion structure. Prefusion-stabilized viral glycoproteins are considered superior immunogens to their wild-type counterparts.


A prefusion stabilized conformation of the SARS-COV-2 S protein can be created by mutating the furin cleavage site in order to prevent the cleavage of the S1 and S2 subunits. For example, the RRAR residues in the furin cleavage site (positions 682-685) can be mutated to GSAS residues (i.e., R682G, R683S and R685S, with A684 retained). Accordingly, in some embodiments, an optimized nucleotide sequence in accordance with the invention may encode a prefusion stabilized SARS-CoV-2 S protein, a prefusion stabilized ectodomain of the SARS-COV-2 S protein, or an antigenic fragment of either, which has been modified relative to naturally occurring SARS-COV-2 S protein by removing the furin cleavage site required for activation, e.g., by replacing the amino acid residues recognized by furin with alternative amino acids that do not form a furin cleavage site but maintain the structure of the S protein. In a specific embodiment, the RRAR furin cleavage site residues 682-685 can be mutated to the residues GSAS to remove the furin cleavage site. In particular embodiments, an optimized nucleotide sequence may encode a prefusion stabilized SARS-COV-2 S protein, which has an amino acid sequence comprising SEQ ID NO: 9. In one embodiment, an optimized nucleotide sequence encoding a prefusion stabilized SARS-COV-2 S protein has the sequence of SEQ ID NO: 42. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 42 and encodes the amino acid sequence of SEQ ID NO: 9.
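The replacement of RRAR with GSAS at positions 682-685 can be sketched as follows. This is an illustration only: the eleven-residue fragment and its assumed numbering (679-689) are hypothetical stand-ins for the full-length sequence, the helper name is invented for the example, and furin recognition is reduced to the minimal Arg-X-X-Arg pattern as a simplifying assumption.

```python
import re

# Simplifying assumption for this sketch: furin recognition is modeled as the
# minimal Arg-X-X-Arg pattern, where X is any residue.
FURIN_MOTIF = re.compile(r"R..R")


def replace_segment(seq, first, last, expected, new, offset=1):
    """Replace residues first..last (inclusive) of `seq` with `new`.

    Positions follow the numbering of the full-length protein; `offset` is
    the number assigned to the first residue of `seq`.
    """
    start, end = first - offset, last - offset + 1
    if start < 0 or end > len(seq):
        raise ValueError("segment lies outside the supplied sequence")
    if seq[start:end] != expected:
        raise ValueError(
            f"expected {expected} at {first}-{last}, found {seq[start:end]}"
        )
    if len(new) != len(expected):
        raise ValueError("replacement must preserve the segment length")
    return seq[:start] + new + seq[end:]


# Hypothetical fragment assumed to cover residues 679-689 of the S protein.
FRAGMENT_START = 679
wild_type = "NSPRRARSVAS"

mutant = replace_segment(
    wild_type, 682, 685, "RRAR", "GSAS", offset=FRAGMENT_START
)

print(mutant)
print(bool(FURIN_MOTIF.search(wild_type)), bool(FURIN_MOTIF.search(mutant)))
```

Checking the expected wild-type residues before substituting guards against an off-by-one error in the numbering, which is the most likely mistake when working with fragments.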


The SARS-COV-2 S protein can be stabilized in its prefusion conformation by substituting one or more of residues 985, 986 and 987 with proline (i.e., D985P, K986P and V987P). For example, a prefusion stabilized conformation of the SARS-COV-2 S protein can be created by making one stabilizing proline mutation at residue 985 (i.e., D985P); two stabilizing proline mutations at residues 986 and 987 (i.e., K986P, V987P); or three stabilizing proline mutations at residues 985, 986 and 987 (i.e., D985P, K986P, V987P).
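The substitution notation used here (wild-type residue, position, replacement residue, e.g. K986P) lends itself to a small helper that applies each substitution and confirms the wild-type residue first. The following Python sketch is illustrative only: the ten-residue fragment and its assumed numbering (982-991) are hypothetical and stand in for the full-length sequence, and the function name is not from the specification.

```python
import re

# One substitution: wild-type residue, position, replacement (e.g. "K986P").
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(seq, substitutions, offset=1):
    """Apply point substitutions; `offset` numbers the first residue of `seq`."""
    residues = list(seq)
    for sub in substitutions:
        match = SUBSTITUTION.match(sub)
        if match is None:
            raise ValueError(f"unrecognized substitution: {sub!r}")
        old, position, new = match.group(1), int(match.group(2)), match.group(3)
        index = position - offset
        if not 0 <= index < len(residues):
            raise ValueError(f"{sub}: position outside the supplied sequence")
        if residues[index] != old:
            raise ValueError(
                f"{sub}: expected {old} at {position}, found {residues[index]}"
            )
        residues[index] = new
    return "".join(residues)


# Hypothetical fragment assumed to cover residues 982-991 of the S protein.
FRAGMENT_START = 982
wild_type = "SRLDKVEAEV"

one_proline = apply_substitutions(wild_type, ["D985P"], FRAGMENT_START)
two_prolines = apply_substitutions(wild_type, ["K986P", "V987P"], FRAGMENT_START)
three_prolines = apply_substitutions(
    wild_type, ["D985P", "K986P", "V987P"], FRAGMENT_START
)

print(one_proline, two_prolines, three_prolines)
```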


In some embodiments, an optimized nucleotide sequence according to the present invention may encode a prefusion stabilized SARS-COV-2 S protein, a prefusion stabilized ectodomain of the SARS-COV-2 S protein, or an antigenic fragment of either, which has been modified relative to naturally occurring SARS-COV-2 S protein by mutating residues 986 and 987 to proline. For example, an optimized nucleotide sequence may encode a prefusion stabilized SARS-COV-2 S protein, which has an amino acid sequence comprising SEQ ID NO:10. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 43. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 43 and encodes the amino acid sequence of SEQ ID NO: 10. In further embodiments, an optimized nucleotide sequence may encode a prefusion stabilized SARS-COV-2 S protein, which has an amino acid sequence comprising SEQ ID NO: 118. This amino acid sequence comprises the D614G mutation. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 119. In specific embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 119 and encodes the amino acid sequence of SEQ ID NO: 118.


In certain embodiments, an optimized nucleotide sequence may encode a prefusion stabilized variant of the S2 subunit of the SARS-COV-2 S protein, which has an amino acid sequence comprising SEQ ID NO:78. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 77. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 77 and encodes the amino acid sequence of SEQ ID NO:78. In certain embodiments, an optimized nucleotide sequence may encode a prefusion stabilized variant of the full length S2 subunit of the SARS-COV-2 S protein, which has an amino acid sequence comprising SEQ ID NO: 70. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 69. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO:69 and encodes the amino acid sequence of SEQ ID NO: 70.


In certain embodiments, an optimized nucleotide sequence may encode a prefusion stabilized variant of the S2′ subunit of the SARS-COV-2 S protein, which has an amino acid sequence comprising SEQ ID NO:82. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO:81. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 81 and encodes the amino acid sequence of SEQ ID NO:82. In certain embodiments, an optimized nucleotide sequence may encode a prefusion stabilized variant of the full length S2′ subunit of the SARS-COV-2 S protein, which has an amino acid sequence comprising SEQ ID NO: 86. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 85. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO:85 and encodes the amino acid sequence of SEQ ID NO:86.


In some embodiments, an optimized nucleotide sequence according to the present invention may encode a prefusion stabilized SARS-COV-2 S protein, a prefusion stabilized ectodomain of the SARS-COV-2 S protein, or an antigenic fragment of either, which has been modified relative to naturally occurring SARS-COV-2 S protein by mutating residue 985 to proline. For example, an optimized nucleotide sequence may encode a prefusion stabilized SARS-COV-2 S protein, which has an amino acid sequence comprising SEQ ID NO:88. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 87. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 87 and encodes the amino acid sequence of SEQ ID NO: 88.


In some embodiments, a prefusion stabilized conformation of the SARS-COV-2 S protein can be created by making three stabilizing proline mutations in the C-terminal of the S2 subunit at residues 985, 986 and 987 (i.e., D985P, K986P, V987P). In some embodiments, an optimized nucleotide sequence according to the present invention may encode a prefusion stabilized SARS-CoV-2 S protein, a prefusion stabilized ectodomain of the SARS-COV-2 S protein, or an antigenic fragment of either, which has been modified relative to naturally occurring SARS-COV-2 S protein by mutating residues 985, 986 and 987 to proline. For example, an optimized nucleotide sequence may encode a prefusion stabilized SARS-COV-2 S protein, which has an amino acid sequence comprising SEQ ID NO:92. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 91. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 91 and encodes the amino acid sequence of SEQ ID NO: 92.


In some embodiments, a prefusion stabilized conformation of the SARS-COV-2 S protein can be created by mutating the furin cleavage site in order to prevent the cleavage of the S1 and S2 subunits and (a) by making two stabilizing proline mutations at residues 986 and 987 (i.e., K986P, V987P) and/or (b) by making a stabilizing proline mutation at residue 985. For example, an optimized nucleotide sequence according to the present invention may encode a prefusion stabilized SARS-COV-2 S protein, a prefusion stabilized ectodomain of the SARS-COV-2 S protein, or an antigenic fragment of either, which has been modified relative to naturally occurring SARS-COV-2 S protein by removing the furin cleavage site required for activation and by mutating residues 986 and 987 to proline. In some embodiments, the residues forming the furin cleavage site at residues 682-685 are mutated to the residues GSAS. For example, an optimized nucleotide sequence may encode a prefusion stabilized SARS-COV-2 S protein, which has an amino acid sequence comprising SEQ ID NO:11. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 44. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 44 and encodes the amino acid sequence of SEQ ID NO: 11. In further embodiments, an optimized nucleotide sequence may encode a prefusion stabilized SARS-COV-2 S protein, which has an amino acid sequence comprising SEQ ID NO: 120. This amino acid sequence comprises the D614G mutation. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 121. In some embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 121 and encodes the amino acid sequence of SEQ ID NO: 120.
Alternatively, the optimized nucleotide sequence encodes a prefusion stabilized ectodomain of the SARS-COV-2 S protein which has been modified relative to naturally occurring SARS-COV-2 S protein, which has an amino acid sequence comprising SEQ ID NO:12. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 45. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 45 and encodes the amino acid sequence of SEQ ID NO: 12.


In certain embodiments, an optimized nucleotide sequence according to the present invention may encode a prefusion stabilized SARS-COV-2 S protein, a prefusion stabilized ectodomain of the SARS-COV-2 S protein, or an antigenic fragment of either, which has been modified relative to naturally occurring SARS-COV-2 S protein by removing the furin cleavage site required for activation and by mutating residue 985 to proline. In some embodiments, the residues forming the furin cleavage site at residues 682-685 are mutated to the residues GSAS. For example, an optimized nucleotide sequence may encode a prefusion stabilized SARS-COV-2 S protein, which has an amino acid sequence comprising SEQ ID NO:90. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 89. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 89 and encodes the amino acid sequence of SEQ ID NO: 90.


In certain embodiments, an optimized nucleotide sequence according to the present invention may encode a prefusion stabilized SARS-COV-2 S protein, a prefusion stabilized ectodomain of the SARS-COV-2 S protein, or an antigenic fragment of either, which has been modified relative to naturally occurring SARS-COV-2 S protein by removing the furin cleavage site required for activation and by mutating residues 985, 986 and 987 to proline. In some embodiments, the residues forming the furin cleavage site at residues 682-685 are mutated to the residues GSAS. For example, an optimized nucleotide sequence may encode a prefusion stabilized SARS-COV-2 S protein, which has an amino acid sequence comprising SEQ ID NO:94. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 93. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 93 and encodes the amino acid sequence of SEQ ID NO: 94.


The SARS-COV-2 S protein can be further stabilized in its prefusion conformation by substituting one or more of residues 817, 892, 899 and 942 with proline (i.e., F817P, A892P, A899P and A942P). For example, a prefusion stabilized conformation of the SARS-COV-2 S protein can be created by making one stabilizing proline mutation at residue 817 (i.e., F817P); two stabilizing proline mutations at residues 817 and 892 (i.e., F817P, A892P); three stabilizing proline mutations at residues 817, 892 and 899 (i.e., F817P, A892P, A899P); or four stabilizing proline mutations at residues 817, 892, 899 and 942 (i.e., F817P, A892P, A899P, A942P). In some embodiments, an optimized nucleotide sequence according to the present invention may encode a prefusion stabilized SARS-COV-2 S protein, a prefusion stabilized ectodomain of the SARS-COV-2 S protein, or an antigenic fragment of either, which has been modified relative to naturally occurring SARS-COV-2 S protein by mutating residues 817, 892, 899 and 942 to proline.


In preferred embodiments, a prefusion stabilized conformation of the SARS-COV-2 S protein can be created by making stabilizing proline mutations at residues 817, 892, 899, 942, 986 and 987. In some embodiments, the optimized nucleotide sequence according to the present invention may encode a prefusion stabilized SARS-COV-2 S protein, a prefusion stabilized ectodomain of the SARS-COV-2 S protein, or an antigenic fragment of either, which has been modified relative to naturally occurring SARS-COV-2 S protein by mutating residues 817, 892, 899, 942, 986 and 987 to proline. For example, an optimized nucleotide sequence may encode a prefusion stabilized SARS-COV-2 S protein, which has an amino acid sequence comprising SEQ ID NO: 129. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 128. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 128 and encodes the amino acid sequence of SEQ ID NO: 129.


In some embodiments, an optimized nucleotide sequence according to the present invention may encode a prefusion stabilized SARS-COV-2 S protein, a prefusion stabilized ectodomain of the SARS-COV-2 S protein, or an antigenic fragment of either, which has been modified relative to naturally occurring SARS-COV-2 S protein by removing the furin cleavage site required for activation and by mutating residues 817, 892, 899, 942, 986 and 987 to proline. For example, an optimized nucleotide sequence may encode a prefusion stabilized SARS-COV-2 S protein, which has an amino acid sequence comprising SEQ ID NO: 131. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 130. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 130 and encodes the amino acid sequence of SEQ ID NO: 131.


In some embodiments, an optimized nucleotide sequence according to the present invention may encode a prefusion stabilized SARS-COV-2 S protein, a prefusion stabilized ectodomain of the SARS-COV-2 S protein, or an antigenic fragment of either, which has been modified relative to naturally occurring SARS-COV-2 S protein by mutating residues 817, 892, 899, 942, 986 and 987 to proline and which contains the D614G mutation. For example, an optimized nucleotide sequence may encode a prefusion stabilized SARS-COV-2 S protein, which has an amino acid sequence comprising SEQ ID NO: 133. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 132. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 132 and encodes the amino acid sequence of SEQ ID NO: 133.


In some embodiments, an optimized nucleotide sequence according to the present invention may encode a prefusion stabilized SARS-COV-2 S protein, a prefusion stabilized ectodomain of the SARS-COV-2 S protein, or an antigenic fragment of either, which has been modified relative to naturally occurring SARS-COV-2 S protein by removing the furin cleavage site required for activation, by mutating residues 817, 892, 899, 942, 986 and 987 to proline and which contains the D614G mutation. For example, an optimized nucleotide sequence may encode a prefusion stabilized SARS-COV-2 S protein, which has an amino acid sequence comprising SEQ ID NO: 135. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 134. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 134 and encodes the amino acid sequence of SEQ ID NO: 135.


A T4 bacteriophage fibritin Foldon can be placed at the C terminus of an antigenic fragment of the SARS-COV-2 S protein in order to help induce trimer formation. Foldons have been used to produce trimeric influenza hemagglutinin stem domains for use in influenza vaccines (Lu et al. (2014) PNAS, 111, 1, 124-130). The Foldon can have the amino acid sequence of GYIPEAPRDGQAYVRKDGEWVLLSTFL (SEQ ID NO: 13). Accordingly, optimized nucleotide sequences according to the present invention may encode an ectodomain of the SARS-CoV-2 S protein, or an antigenic fragment thereof, and a C terminal Foldon. In particular embodiments, the Foldon is placed at the C terminus of the ectodomain of the SARS-COV-2 S protein or the S2′ subunit of the SARS-COV-2 S protein. In one embodiment, the invention provides an optimized nucleotide sequence that encodes an amino acid sequence comprising the ectodomain of the SARS-COV-2 S protein with a C terminal Foldon, which has an amino acid sequence comprising SEQ ID NO:14. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 46. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 46 and encodes the amino acid sequence of SEQ ID NO: 14. The invention also provides an optimized nucleotide sequence that encodes an amino acid sequence comprising the S2 subunit of the SARS-COV-2 S protein with a C terminal Foldon, which has an amino acid sequence comprising SEQ ID NO: 76. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 75. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 75 and encodes the amino acid sequence of SEQ ID NO: 76.
The invention also provides an optimized nucleotide sequence that encodes an amino acid sequence comprising the S2′ subunit of the SARS-COV-2 S protein with a C terminal Foldon, which has an amino acid sequence comprising SEQ ID NO: 15. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 47. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 47 and encodes the amino acid sequence of SEQ ID NO: 15.
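The domain order of such fusion constructs (optional N-terminal signal peptide, antigen, C-terminal Foldon) can be sketched in Python as a simple concatenation. Only the Foldon sequence below comes from the text (SEQ ID NO: 13); the antigen and signal peptide strings are hypothetical placeholders in which X denotes an unspecified residue, and the function name is invented for the example.

```python
# Foldon sequence as given in the text (SEQ ID NO: 13).
FOLDON = "GYIPEAPRDGQAYVRKDGEWVLLSTFL"


def assemble_construct(antigen, signal_peptide="", c_terminal_domain=""):
    """Join domains in N-to-C order: signal peptide, antigen, C-terminal domain."""
    return signal_peptide + antigen + c_terminal_domain


# Hypothetical placeholders; X stands for an unspecified residue.
PLACEHOLDER_SIGNAL = "MXXXX"
PLACEHOLDER_ANTIGEN = "XXXXXXXXXX"

construct = assemble_construct(
    PLACEHOLDER_ANTIGEN,
    signal_peptide=PLACEHOLDER_SIGNAL,
    c_terminal_domain=FOLDON,
)

print(construct)
print(construct.endswith(FOLDON), len(construct))
```

The same helper could take an Fc domain in place of the Foldon as the C-terminal domain, since both are appended at the C terminus of the antigen.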


In some embodiments, the optimized nucleotide sequence encodes a prefusion stabilized ectodomain of the SARS-COV-2 S protein with a C terminal Foldon, wherein the ectodomain has been modified relative to the ectodomain of the naturally occurring SARS-COV-2 S protein by removing the furin cleavage site required for activation and/or by mutating residues 986 and 987 to proline. In particular embodiments, an optimized nucleotide sequence encodes a prefusion stabilized ectodomain of the SARS-COV-2 S protein with a C terminal Foldon, which has an amino acid sequence comprising SEQ ID NO: 16. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 48. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 48 and encodes the amino acid sequence of SEQ ID NO: 16. In other particular embodiments, an optimized nucleotide sequence encodes a prefusion stabilized ectodomain of the SARS-COV-2 S protein with a C terminal Foldon, wherein the ectodomain has been modified relative to naturally occurring SARS-COV-2 S protein by removing the furin cleavage site required for activation and by mutating residues 986 and 987 to proline. Accordingly, in a particular embodiment, an optimized nucleotide sequence encodes a prefusion stabilized ectodomain of the SARS-COV-2 S protein, which has an amino acid sequence comprising SEQ ID NO: 17. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 49. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 49 and encodes the amino acid sequence of SEQ ID NO: 17.


In some embodiments, the optimized nucleotide sequence encodes a prefusion stabilized ectodomain of the S2 or S2′ subunit of the SARS-COV-2 S protein with a C terminal Foldon, wherein compared to the naturally occurring SARS-COV-2 S protein residues 986 and 987 have been mutated to proline. Accordingly, in a particular embodiment, an optimized nucleotide sequence encodes a prefusion stabilized S2 subunit of the SARS-COV-2 S protein, which has the amino acid sequence comprising SEQ ID NO: 80. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 79. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 79 and encodes the amino acid sequence of SEQ ID NO: 80. Accordingly, in a particular embodiment, an optimized nucleotide sequence encodes a prefusion stabilized S2 subunit of the SARS-COV-2 S protein which has an amino acid sequence comprising SEQ ID NO: 84. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 83. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 83 and encodes the amino acid sequence of SEQ ID NO: 84.


The presence of the Fc domain in a protein markedly increases the plasma half-life of the protein and thereby prolongs the molecule's therapeutic activity. The Fc domain is also able to slow renal clearance of a protein from the blood stream and enables the protein to interact with Fc-receptors (FcRs) found on immune cells, a feature that may be advantageous for their use in vaccines. In addition, the Fc domain folds independently and can improve the solubility and stability of the partner molecule both in vitro and in vivo (Czajkowsky et al. (2012) EMBO Mol Med. (10): 1015-1028). Accordingly, the invention also provides an optimized nucleotide sequence that encodes an amino acid sequence comprising the ectodomain of the SARS-COV-2 S protein or an antigenic fragment thereof with a C-terminal Fc domain. The Fc domain can comprise the following amino acid sequence: PKSCDKTHTCPPCPAPELLGGPSVFLFPPKPKDTLMISRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQYNSTYRVVSVLTVLHQDWLNGKEYKCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSRDELTKNQVSLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKSRWQQGNVFSCSVMHEALHNHYTQKSLSLSPGK (SEQ ID NO: 18). In particular embodiments, the antigenic fragment is the RBD of the SARS-COV-2 S protein. In some embodiments, an optimized nucleotide sequence encodes an amino acid sequence comprising the RBD of the SARS-COV-2 S protein and an Fc domain, which has an amino acid sequence comprising SEQ ID NO: 19. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 50. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 50 and encodes the amino acid sequence of SEQ ID NO: 19.


The invention also provides an optimized nucleotide sequence that encodes the ectodomain of the SARS-COV-2 S protein, or an antigenic fragment thereof, operably linked with an N-terminal signal peptide and a C-terminal Fc domain. The Fc can have the amino acid sequence of SEQ ID NO:18. The signal peptide can have the amino acid sequence of SEQ ID NO: 7. In particular embodiments, the antigenic fragment is the RBD of the SARS-COV-2 S protein. In some embodiments, the invention provides an optimized nucleotide sequence that encodes an amino acid sequence comprising the RBD of the SARS-COV-2 S protein operably linked with an N-terminal signal peptide and a C-terminal Fc domain, which has an amino acid sequence comprising SEQ ID NO: 20. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 36. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 36 and encodes the amino acid sequence of SEQ ID NO: 20.


The pharmacokinetic properties of antibodies are largely dictated by the pH-dependent binding of the Fc domain to the neonatal Fc receptor (FcRn). For example, Fc domains containing the amino acid substitutions M428L/N434S (LS mutant), M252Y/S254T/T256E (YTE mutant), or H433K/N434F (KF mutant) confer 10- to 12-fold higher affinity for FcRn at pH 5.8. This results in a large increase in antibody half-life (2- to 4-fold longer circulation times). Modifying the Fc region included in a fusion protein of the present invention can therefore extend its half-life in serum. An Fc variant containing L309D/Q311H/N434S (DHS) substitutions has been shown to further improve the pharmacokinetics of an antibody relative to both native IgG1 and the aforementioned variants (Lee et al. (2019) Nature communications, 10, 5031). Accordingly, in certain embodiments, the Fc region has been mutated compared to wild-type, using the EU numbering system based on human IGHG. For example, the L residue at position 309, the Q residue at 311 and the N residue at 434 can be mutated to D, H and S respectively (i.e., L309D, Q311H and N434S). The mutated Fc domain can comprise the following sequence:


(SEQ ID NO: 100)
PKSCDKTHTCPPCPAPELLGGPSVFLFPPKPKDTLMISRTPEVTCVVVD
VSHEDPEVKFNWYVDGVEVHNAKTKPREEQYNSTYRVVSVLTVDHHDWL
NGKEYKCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSRDELTKNQV
SLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTV
DKSRWQQGNVFSCSVMHEALHSHYTQKSLSLSPGK.


In other embodiments, the M residue at position 428 and the N residue at 434 can be mutated to L and S respectively (i.e. M428L and N434S). The mutated Fc domain can comprise the following sequence:


(SEQ ID NO: 101)
PKSCDKTHTCPPCPAPELLGGPSVFLFPPKPKDTLMISRTPEVTCVVVD
VSHEDPEVKFNWYVDGVEVHNAKTKPREEQYNSTYRVVSVLTVLHQDWL
NGKEYKCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSRDELTKNQV
SLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTV
DKSRWQQGNVFSCSVLHEALHSHYTQKSLSLSPGK.


In other embodiments, the M residue at position 252, the S residue at 254 and the T residue at 256 can be mutated to Y, T and E respectively (i.e. M252Y, S254T and T256E). The mutated Fc domain can comprise the following sequence:


(SEQ ID NO: 102)
PKSCDKTHTCPPCPAPELLGGPSVFLFPPKPKDTLYITREPEVTCVVVD
VSHEDPEVKFNWYVDGVEVHNAKTKPREEQYNSTYRVVSVLTVLHQDWL
NGKEYKCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSRDELTKNQV
SLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTV
DKSRWQQGNVFSCSVMHEALHNHYTQKSLSLSPGK.


In other embodiments, the H residue at position 433 and the N residue at 434 can be mutated to K and F respectively (i.e. H433K and N434F). The mutated Fc domain can comprise the following sequence:


(SEQ ID NO: 103)
PKSCDKTHTCPPCPAPELLGGPSVFLFPPKPKDTLMISRTPEVTCVVVD
VSHEDPEVKFNWYVDGVEVHNAKTKPREEQYNSTYRVVSVLTVLHQDWL
NGKEYKCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSRDELTKNQV
SLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTV
DKSRWQQGNVFSCSVMHEALKFHYTQKSLSLSPGK.


Accordingly, the invention also provides an optimized nucleotide sequence that encodes an antigenic fragment of the SARS-COV-2 S protein operably linked with an N-terminal signal peptide and a C-terminal Fc domain. The Fc can have the amino acid sequence of SEQ ID NO: 100, SEQ ID NO: 101, SEQ ID NO: 102 or SEQ ID NO: 103. The signal peptide can have the amino acid sequence of SEQ ID NO:7. In particular embodiments, the antigenic fragment is the RBD of the SARS-COV-2 S protein.
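The relationship between the EU-numbered substitutions and the mutated Fc sequences can be checked with a short Python sketch. It starts from the wild-type Fc sequence given for SEQ ID NO: 18 and assumes that the first residue of that sequence corresponds to EU position 217. That offset is not stated in the text; it is inferred here from the positions of the substituted residues, and the code verifies each wild-type residue before replacing it. The function and variable names are illustrative.

```python
import re

# Wild-type Fc domain as given in the text for SEQ ID NO: 18.
FC_WILD_TYPE = (
    "PKSCDKTHTCPPCPAPELLGGPSVFLFPPKPKDTLMISRTPEVTCVVVD"
    "VSHEDPEVKFNWYVDGVEVHNAKTKPREEQYNSTYRVVSVLTVLHQDWL"
    "NGKEYKCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSRDELTKNQV"
    "SLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTV"
    "DKSRWQQGNVFSCSVMHEALHNHYTQKSLSLSPGK"
)

# Assumption of this sketch: the first residue above is EU position 217.
EU_FIRST_RESIDUE = 217

SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def mutate_fc(seq, substitutions):
    """Apply EU-numbered substitutions, checking each wild-type residue."""
    residues = list(seq)
    for sub in substitutions:
        match = SUBSTITUTION.match(sub)
        if match is None:
            raise ValueError(f"unrecognized substitution: {sub!r}")
        old, position, new = match.group(1), int(match.group(2)), match.group(3)
        index = position - EU_FIRST_RESIDUE
        if not 0 <= index < len(residues):
            raise ValueError(f"{sub}: position outside the Fc sequence")
        if residues[index] != old:
            raise ValueError(
                f"{sub}: expected {old} at EU {position}, found {residues[index]}"
            )
        residues[index] = new
    return "".join(residues)


# Substitution sets named in the text.
VARIANTS = {
    "DHS": ["L309D", "Q311H", "N434S"],
    "LS": ["M428L", "N434S"],
    "YTE": ["M252Y", "S254T", "T256E"],
    "KF": ["H433K", "N434F"],
}

mutated = {name: mutate_fc(FC_WILD_TYPE, subs) for name, subs in VARIANTS.items()}

for name, seq in mutated.items():
    changed = [i + EU_FIRST_RESIDUE for i, (a, b) in enumerate(zip(FC_WILD_TYPE, seq)) if a != b]
    print(name, changed)
```

Under the stated offset, every substitution finds the expected wild-type residue, and the DHS, LS, YTE and KF outputs correspond to the sequences listed as SEQ ID NOs: 100, 101, 102 and 103 respectively.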


In some embodiments, the invention provides an optimized nucleotide sequence that encodes an amino acid sequence comprising the RBD of the SARS-COV-2 S protein operably linked with an N-terminal signal peptide and a C-terminal mutated Fc domain, which has an amino acid sequence comprising SEQ ID NO:104. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 105. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 105 and encodes the amino acid sequence of SEQ ID NO: 104.


In some embodiments, the invention provides an optimized nucleotide sequence that encodes an amino acid sequence comprising the RBD of the SARS-COV-2 S protein operably linked with an N-terminal signal peptide and a C-terminal mutated Fc domain, which has an amino acid sequence comprising SEQ ID NO: 106. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 107. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 107 and encodes the amino acid sequence of SEQ ID NO: 106.


In some embodiments, the invention provides an optimized nucleotide sequence that encodes an amino acid sequence comprising the RBD of the SARS-COV-2 S protein operably linked with an N-terminal signal peptide and a C-terminal mutated Fc domain, which has an amino acid sequence comprising SEQ ID NO: 108. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 109. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 109 and encodes the amino acid sequence of SEQ ID NO: 108.


In some embodiments, the invention provides an optimized nucleotide sequence that encodes an amino acid sequence comprising the RBD of the SARS-COV-2 S protein operably linked with an N-terminal signal peptide and a C-terminal mutated Fc domain, which has an amino acid sequence comprising SEQ ID NO: 110. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 111. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 111 and encodes the amino acid sequence of SEQ ID NO: 110.


Coronaviruses assemble at and bud into the lumen of the endoplasmic reticulum (ER)-Golgi intermediate compartment (ERGIC). The cytoplasmic tail of the SARS-COV-2 S protein contains an ER retrieval signal (ERRS) that can move the S protein from the Golgi to the ER. This process is thought to accumulate S proteins at the ERGIC, which facilitates S protein incorporation into viral particles. The ER retrieval signal in the SARS-COV S protein is a dibasic motif (KxHxx) in the cytoplasmic tail, which is similar to a canonical dilysine ER retrieval signal (McBride et al. (2007) Journal of Virology, 81, 5, 2418-2428).


Mutating the ER retrieval signal may prevent the virus from forming viral particles. Without wishing to be bound by any particular theory, the inventors believe that it is advantageous to remove the ER retrieval signals from SARS-COV-2 S proteins that are intended for inclusion in a vaccine. Therefore, in some embodiments, an optimized nucleotide sequence according to the present invention may encode a prefusion stabilized SARS-COV-2 S protein, a prefusion stabilized ectodomain of the SARS-COV-2 S protein, or an antigenic fragment of either, which has been modified relative to naturally occurring SARS-COV-2 S protein by mutating the ER retrieval signal. For example, the KLHYT ER retrieval signal of the SARS-COV-2 S protein can be removed by mutating residues 1268 and 1270 to alanine (i.e., ALAYT).
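The KLHYT to ALAYT change described above can be illustrated with the following minimal Python sketch, which is offered only as an example and not as the method used to design any sequence of the invention. It operates on the C-terminal motif itself rather than on residue numbers, and it uses the final 25 residues of SEQ ID NO: 1 as input. The function name and its parameters are hypothetical.

```python
def mutate_er_retrieval_signal(protein, motif="KLHYT", replacement="ALAYT"):
    """Replace a C-terminal ER retrieval motif with its alanine-substituted form.

    Raises ValueError if the protein does not end with the expected motif,
    so that a sequence lacking the signal is never altered silently.
    """
    if len(motif) != len(replacement):
        raise ValueError("motif and replacement must have the same length")
    if not protein.endswith(motif):
        raise ValueError(f"sequence does not end with {motif}")
    return protein[: -len(motif)] + replacement


# C-terminal 25 residues of the S protein of SEQ ID NO: 1.
tail = "SCGSCCKFDEDDSEPVLKGVKLHYT"
mutated_tail = mutate_er_retrieval_signal(tail)
print(mutated_tail)
```

Because the two substitutions replace residues one for one, the length of the protein is unchanged.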


In some embodiments, an optimized nucleotide sequence according to the present invention may encode a prefusion stabilized SARS-COV-2 S protein, a prefusion stabilized ectodomain of the SARS-COV-2 S protein, or an antigenic fragment of either, which has been modified relative to naturally occurring SARS-COV-2 S protein by removing the furin cleavage site required for activation, by mutating residues 986 and 987 to proline and by removing the ER retrieval signal. For example, an optimized nucleotide sequence may encode a prefusion stabilized SARS-COV-2 S protein, which has an amino acid sequence comprising SEQ ID NO: 125. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 124. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 124 and encodes the amino acid sequence of SEQ ID NO: 125.


In some embodiments, an optimized nucleotide sequence according to the present invention may encode a prefusion stabilized SARS-COV-2 S protein, a prefusion stabilized ectodomain of the SARS-COV-2 S protein, or an antigenic fragment of either, which has been modified relative to naturally occurring SARS-COV-2 S protein by removing the furin cleavage site required for activation, by mutating residues 986 and 987 to proline, by removing the ER retrieval signal and which contains an extended N-terminal signal peptide. For example, an optimized nucleotide sequence may encode a prefusion stabilized SARS-COV-2 S protein, which has an amino acid sequence comprising SEQ ID NO: 127. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 126. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 126 and encodes the amino acid sequence of SEQ ID NO: 127.


In some embodiments, an optimized nucleotide sequence according to the present invention may encode a prefusion stabilized SARS-COV-2 S protein, a prefusion stabilized ectodomain of the SARS-COV-2 S protein, or an antigenic fragment of either, which has been modified relative to naturally occurring SARS-COV-2 S protein by removing the furin cleavage site required for activation, by mutating residues 817, 892, 899, 942, 986 and 987 to proline and by removing the ER retrieval signal. For example, an optimized nucleotide sequence may encode a prefusion stabilized SARS-COV-2 S protein, which has an amino acid sequence comprising SEQ ID NO: 139. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 138. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 138 and encodes the amino acid sequence of SEQ ID NO: 139.


In some embodiments, an optimized nucleotide sequence according to the present invention may encode a prefusion stabilized SARS-COV-2 S protein, a prefusion stabilized ectodomain of the SARS-COV-2 S protein, or an antigenic fragment of either, which has been modified relative to naturally occurring SARS-COV-2 S protein by removing the furin cleavage site required for activation, by mutating residues 817, 892, 899, 942, 986 and 987 to proline, by removing the ER retrieval signal and which contains an extended N-terminal signal peptide. For example, an optimized nucleotide sequence may encode a prefusion stabilized SARS-COV-2 S protein, which has an amino acid sequence comprising SEQ ID NO: 141. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 140. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 140 and encodes the amino acid sequence of SEQ ID NO: 141.


A specific combination of the mutations listed above may be introduced in any of the SARS-COV-2 S proteins disclosed herein. For example, in specific embodiments, an optimized nucleotide sequence according to the present invention may encode a prefusion stabilized SARS-COV-2 S protein, a prefusion stabilized ectodomain of the SARS-COV-2 S protein, or an antigenic fragment of either, which has been modified relative to SARS-COV-2 S protein of the index strain from Wuhan (SEQ ID NO: 1) to contain the ΔH69, ΔV70, ΔY144, N501Y, A570D, D614G, P681H, T716I, S982A and D1118H mutations. Accordingly, in certain embodiments, an optimized nucleotide sequence of the invention may encode a prefusion stabilized SARS-COV-2 S protein, which has an amino acid sequence comprising SEQ ID NO: 151. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 150. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 150 and encodes the amino acid sequence of SEQ ID NO: 151.
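The mutation notation used herein (e.g., ΔH69 for deletion of the histidine at position 69 and N501Y for substitution of asparagine 501 with tyrosine) refers to residue numbering in the index strain sequence of SEQ ID NO: 1. The following minimal Python sketch illustrates one way such a list could be applied to a reference sequence. It is an assumption-laden example only: the function name and regular expressions are hypothetical, and the input is limited to the first 86 residues of SEQ ID NO: 1, so only the ΔH69 and ΔV70 mutations fall within range. Each mutation is checked against the reference residue before it is applied, and deletions are recorded against reference positions so that later mutations in the list are not shifted.

```python
import re

SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")  # e.g. N501Y
DELETION = re.compile(r"^Δ([A-Z])(\d+)$")            # e.g. ΔH69


def apply_mutations(reference, mutations):
    """Apply substitutions and deletions given in reference (1-based) numbering."""
    residues = list(reference)  # index i always corresponds to reference position i + 1
    for mutation in mutations:
        deletion = DELETION.match(mutation)
        substitution = SUBSTITUTION.match(mutation)
        if deletion:
            expected, position, new = deletion.group(1), int(deletion.group(2)), ""
        elif substitution:
            expected, position, new = (
                substitution.group(1), int(substitution.group(2)), substitution.group(3)
            )
        else:
            raise ValueError(f"unrecognized mutation: {mutation}")
        if not 1 <= position <= len(reference):
            raise ValueError(f"{mutation}: position outside the reference sequence")
        if reference[position - 1] != expected:
            raise ValueError(f"{mutation}: reference has {reference[position - 1]}")
        residues[position - 1] = new  # an empty string marks a deleted residue
    return "".join(residues)


# First 86 residues of SEQ ID NO: 1 (first two rows of the protein in Table 1).
fragment = (
    "MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVF"
    "RSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFDNPVLPF"
)
variant_fragment = apply_mutations(fragment, ["ΔH69", "ΔV70"])
print(variant_fragment)
```

Applied to this fragment, the two deletions shorten the sequence by two residues and convert the local sequence HAIHVSGT to HAISGT.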


In some embodiments, an optimized nucleotide sequence according to the present invention may encode a prefusion stabilized SARS-COV-2 S protein, a prefusion stabilized ectodomain of the SARS-COV-2 S protein, or an antigenic fragment of either, which has been modified relative to SARS-COV-2 S protein of the index strain from Wuhan (SEQ ID NO: 1) by mutating residues 986 and 987 to proline and which contains the ΔH69, ΔV70, ΔY144, N501Y, A570D, D614G, P681H, T716I, S982A and D1118H mutations (UK variant+D614G). Accordingly, in certain embodiments, an optimized nucleotide sequence of the invention may encode a prefusion stabilized SARS-COV-2 S protein, which has an amino acid sequence comprising SEQ ID NO: 153. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 152. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 152 and encodes the amino acid sequence of SEQ ID NO: 153.


In some embodiments, an optimized nucleotide sequence according to the present invention may encode a prefusion stabilized SARS-COV-2 S protein, a prefusion stabilized ectodomain of the SARS-COV-2 S protein, or an antigenic fragment of either, which has been modified relative to SARS-COV-2 S protein of the index strain from Wuhan (SEQ ID NO: 1) by removing the furin cleavage site required for activation and which contains the ΔH69, ΔV70, ΔY144, N501Y, A570D, D614G, P681H, T716I, S982A and D1118H mutations (UK variant+D614G). Accordingly, in certain embodiments, an optimized nucleotide sequence of the invention may encode a prefusion stabilized SARS-COV-2 S protein, which has an amino acid sequence comprising SEQ ID NO: 155. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 154. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 154 and encodes the amino acid sequence of SEQ ID NO: 155.


In some embodiments, an optimized nucleotide sequence according to the present invention may encode a prefusion stabilized SARS-COV-2 S protein, a prefusion stabilized ectodomain of the SARS-COV-2 S protein, or an antigenic fragment of either, which has been modified relative to SARS-COV-2 S protein of the index strain from Wuhan (SEQ ID NO: 1) by removing the furin cleavage site required for activation, by mutating residues 986 and 987 to proline and which contains the ΔH69, ΔV70, ΔY144, N501Y, A570D, D614G, P681H, T716I, S982A and D1118H mutations (UK variant+D614G). Accordingly, in certain embodiments, an optimized nucleotide sequence of the invention may encode a prefusion stabilized SARS-COV-2 S protein, which has an amino acid sequence comprising SEQ ID NO: 157. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 156. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 156 and encodes the amino acid sequence of SEQ ID NO: 157.


In some embodiments, an optimized nucleotide sequence according to the present invention may encode a prefusion stabilized SARS-COV-2 S protein, a prefusion stabilized ectodomain of the SARS-COV-2 S protein, or an antigenic fragment of either, which has been modified relative to SARS-COV-2 S protein of the index strain from Wuhan (SEQ ID NO: 1) by removing the furin cleavage site required for activation, by mutating residues 817, 892, 899, 942, 986 and 987 to proline and which contains the ΔH69, ΔV70, ΔY144, N501Y, A570D, D614G, P681H, T716I, S982A and D1118H mutations (UK variant+D614G). Accordingly, in certain embodiments, an optimized nucleotide sequence of the invention may encode a prefusion stabilized SARS-COV-2 S protein, which has an amino acid sequence comprising SEQ ID NO: 159. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 158. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 158 and encodes the amino acid sequence of SEQ ID NO: 159.


In some embodiments, an optimized nucleotide sequence according to the present invention may encode a prefusion stabilized SARS-COV-2 S protein, a prefusion stabilized ectodomain of the SARS-COV-2 S protein, or an antigenic fragment of either, which has been modified relative to SARS-COV-2 S protein of the index strain from Wuhan (SEQ ID NO: 1) to contain the D80A, D215G, ΔL242, ΔA243, ΔL244, K417N, E484K, N501Y, D614G and A701V mutations (South African variant 1+D614G). Accordingly, in certain embodiments, an optimized nucleotide sequence of the invention may encode a prefusion stabilized SARS-COV-2 S protein, which has an amino acid sequence comprising SEQ ID NO: 161. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 160. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 160 and encodes the amino acid sequence of SEQ ID NO: 161.


In some embodiments, an optimized nucleotide sequence according to the present invention may encode a prefusion stabilized SARS-COV-2 S protein, a prefusion stabilized ectodomain of the SARS-COV-2 S protein, or an antigenic fragment of either, which has been modified relative to SARS-COV-2 S protein of the index strain from Wuhan (SEQ ID NO: 1) by removing the furin cleavage site required for activation, by mutating residues 986 and 987 to proline and which contains the D80A, D215G, ΔL242, ΔA243, ΔL244, K417N, E484K, N501Y, D614G and A701V mutations (South African variant 1+D614G). Accordingly, in certain embodiments, an optimized nucleotide sequence of the invention may encode a prefusion stabilized SARS-COV-2 S protein, which has an amino acid sequence comprising SEQ ID NO: 163. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 162. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 162 and encodes the amino acid sequence of SEQ ID NO: 163.


In some embodiments, an optimized nucleotide sequence according to the present invention may encode a prefusion stabilized SARS-COV-2 S protein, a prefusion stabilized ectodomain of the SARS-COV-2 S protein, or an antigenic fragment of either, which has been modified relative to SARS-COV-2 S protein of the index strain from Wuhan (SEQ ID NO: 1) to contain the L18F, D80A, D215G, ΔL242, ΔA243, ΔL244, K417N, E484K, N501Y, D614G and A701V mutations (South African variant 2+D614G). Accordingly, in certain embodiments, an optimized nucleotide sequence of the invention may encode a prefusion stabilized SARS-COV-2 S protein, which has an amino acid sequence comprising SEQ ID NO: 165. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 164. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 164 and encodes the amino acid sequence of SEQ ID NO: 165.


In some embodiments, an optimized nucleotide sequence according to the present invention may encode a prefusion stabilized SARS-COV-2 S protein, a prefusion stabilized ectodomain of the SARS-COV-2 S protein, or an antigenic fragment of either, which has been modified relative to SARS-COV-2 S protein of the index strain from Wuhan (SEQ ID NO: 1) by removing the furin cleavage site required for activation, by mutating residues 986 and 987 to proline and which contains the L18F, D80A, D215G, ΔL242, ΔA243, ΔL244, K417N, E484K, N501Y, D614G and A701V mutations (South African variant 2+D614G). Accordingly, in certain embodiments, an optimized nucleotide sequence of the invention may encode a prefusion stabilized SARS-COV-2 S protein, which has an amino acid sequence comprising SEQ ID NO: 167. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 166. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 166 and encodes the amino acid sequence of SEQ ID NO: 167.


In some embodiments, an optimized nucleotide sequence according to the present invention may encode a prefusion stabilized SARS-COV-2 S protein, a prefusion stabilized ectodomain of the SARS-COV-2 S protein, or an antigenic fragment of either, which has been modified relative to SARS-COV-2 S protein of the index strain from Wuhan (SEQ ID NO: 1) to contain the L18F, T20N, P26S, D138Y, R190S, K417T, E484K, N501Y, D614G, H655Y, T1027I and V1176F mutations (Brazilian variant+D614G). Accordingly, in certain embodiments, an optimized nucleotide sequence of the invention may encode a prefusion stabilized SARS-COV-2 S protein, which has an amino acid sequence comprising SEQ ID NO: 169. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 168. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 168 and encodes the amino acid sequence of SEQ ID NO: 169.


In some embodiments, an optimized nucleotide sequence according to the present invention may encode a prefusion stabilized SARS-COV-2 S protein, a prefusion stabilized ectodomain of the SARS-COV-2 S protein, or an antigenic fragment of either, which has been modified relative to SARS-COV-2 S protein of the index strain from Wuhan (SEQ ID NO: 1) by removing the furin cleavage site required for activation, by mutating residues 986 and 987 to proline and which contains the L18F, T20N, P26S, D138Y, R190S, K417T, E484K, N501Y, D614G, H655Y, T1027I and V1176F mutations (Brazilian variant+D614G). Accordingly, in certain embodiments, an optimized nucleotide sequence of the invention may encode a prefusion stabilized SARS-COV-2 S protein, which has an amino acid sequence comprising SEQ ID NO: 171. In one embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 170. In other embodiments, the optimized nucleotide sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 170 and encodes the amino acid sequence of SEQ ID NO: 171.


Exemplary Optimized Nucleotide Sequences Encoding a SARS-CoV-2 S Protein and Antigenic Fragments

An optimized nucleotide sequence according to the present invention may encode a SARS-COV-2 S protein or an antigenic fragment thereof. In one embodiment, the invention provides a nucleic acid comprising an optimized nucleotide sequence encoding a SARS-COV-2 S protein or an antigenic fragment thereof. In some embodiments, the nucleic acid is an mRNA comprising an optimized nucleotide sequence encoding a SARS-COV-2 S protein or an antigenic fragment thereof. In some embodiments, a suitable mRNA sequence comprises a nucleotide sequence encoding a SARS-COV-2 S protein or an antigenic fragment thereof optimized for efficient expression in human cells. Exemplary optimized nucleotide sequences encoding a SARS-CoV-2 S protein or an antigenic fragment thereof, produced with the process for generating optimized nucleotide sequences in accordance with the invention, and their corresponding amino acid sequences are shown in Table 1. Bold residues indicate those amino acids which have been mutated compared to a naturally occurring SARS-COV-2 S protein, underlined residues represent a signal peptide and the residues in italics indicate the presence of an Fc region or a Foldon.
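The correspondence between an optimized nucleotide sequence and its amino acid sequence can be checked by translation with the standard genetic code, as in the following minimal Python sketch. It is illustrative only. The inputs are the first and the last rows of SEQ ID NO: 29 as shown in Table 1; the last row is read in frame on the assumption that it begins at a codon boundary, which is consistent with its ending in a TGA stop codon. The function and variable names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a DNA coding sequence, stopping at the first stop codon."""
    if len(cds) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    protein = []
    for i in range(0, len(cds), 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# First and last rows of SEQ ID NO: 29 as printed in Table 1.
first_row = "ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG"
last_row = "AAGGGCGTGAAGCTGCATTATACCTGA"

n_terminus = translate(first_row)
c_terminus = translate(last_row)
print(n_terminus, c_terminus)
```

The translated N-terminus, MFVFLVLLPLVSSQ, and C-terminus, KGVKLHYT, match the beginning and the end of the S protein of SEQ ID NO: 1.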

TABLE 1

Exemplary SARS-CoV-2 S sequences.

Optimized nucleotide
(SEQ ID NO: 29)


sequence encoding a
ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG


SARS-CoV-2 S protein
TGTGTCAACCTGACAACTAGGACTCAGCTGCCACCAGCCTA



CACCAACTCCTTCACCAGAGGCGTGTATTACCCAGACAAGG



TGTTTAGAAGCAGCGTGCTGCACTCTACCCAGGACCTCTTT



CTGCCCTTTTTCAGCAACGTGACATGGTTTCACGCAATTCA



CGTGTCCGGCACTAATGGCACAAAGCGGTTCGACAATCCA



GTCCTGCCTTTCAACGATGGCGTCTACTTTGCATCTACTGA



GAAATCCAATATCATTAGGGGATGGATCTTCGGCACAACCC



TGGATTCTAAGACCCAGAGCCTGCTGATCGTCAACAACGCC



ACAAACGTGGTCATTAAGGTTTGCGAGTTTCAGTTCTGTAA



CGATCCTTTTCTGGGCGTGTATTATCATAAGAACAATAAGA



GCTGGATGGAGTCCGAGTTTAGAGTGTATAGCTCTGCAAAT



AATTGTACCTTTGAGTACGTGAGCCAGCCCTTTCTGATGGA



CCTGGAGGGAAAACAAGGAAACTTCAAAAACCTGCGGGAA



TTCGTTTTCAAAAACATCGACGGCTATTTCAAGATCTATAG



CAAGCATACCCCAATCAACCTCGTGAGGGACCTCCCCCAG



GGCTTTAGCGCACTGGAGCCACTGGTTGACCTGCCTATCGG



CATTAATATCACAAGATTTCAGACCCTGCTGGCACTGCATA



GAAGCTATCTGACCCCTGGAGACTCCTCTAGTGGGTGGACT



GCCGGCGCCGCTGCCTACTATGTGGGCTATCTGCAGCCACG



GACATTCCTGCTGAAATACAATGAGAACGGGACAATCACA



GATGCTGTTGATTGCGCACTCGACCCCCTGTCCGAGACAAA



GTGCACTCTCAAGAGCTTTACCGTCGAGAAGGGCATCTATC



AGACCTCAAACTTCAGGGTGCAGCCCACAGAATCTATCGTG



CGCTTCCCTAATATCACTAACCTGTGTCCTTTCGGTGAAGT



GTTCAACGCCACCAGGTTTGCTAGCGTGTATGCCTGGAACA



GGAAGAGGATCTCTAACTGCGTCGCCGACTATTCCGTGCTG



TATAACAGCGCCTCCTTCTCCACATTCAAATGCTATGGAGT



GAGCCCGACAAAACTGAACGATCTCTGCTTTACAAATGTCT



ACGCCGACTCTTTTGTGATCAGAGGGGACGAGGTCCGGCA



GATCGCACCAGGACAGACAGGCAAGATTGCTGACTACAAC



TATAAGCTGCCTGACGACTTCACAGGATGTGTGATCGCATG



GAACTCAAACAATCTGGACTCCAAAGTCGGGGGCAACTAT



AATTACCTGTATCGCCTGTTCCGGAAGTCCAACCTGAAGCC



CTTCGAGAGGGACATCAGTACAGAGATCTATCAGGCTGGC



TCCACCCCTTGCAATGGCGTCGAAGGCTTTAATTGTTATTTT



CCCCTGCAGTCTTACGGGTTTCAGCCTACTAATGGAGTTGG



GTACCAGCCATACAGAGTGGTCGTGCTCAGCTTCGAGCTCC



TGCATGCTCCAGCTACAGTTTGCGGGCCAAAGAAGTCCACT



AACCTGGTGAAGAATAAGTGCGTCAACTTCAACTTTAACGG



GCTCACCGGCACCGGCGTGCTGACTGAGAGCAACAAGAAG



TTTCTGCCATTTCAACAGTTTGGACGGGACATTGCCGACAC



CACCGATGCCGTTCGGGATCCACAGACCCTGGAAATTCTGG



ACATTACACCGTGCAGCTTCGGGGGCGTGAGCGTGATCAC



ACCCGGAACCAATACAAGCAACCAGGTTGCCGTCCTGTATC



AGGATGTCAATTGCACAGAAGTGCCAGTTGCTATCCACGCA



GACCAGCTGACTCCCACATGGCGGGTGTATAGCACCGGAT



CCAACGTGTTTCAGACCCGCGCCGGATGTCTCATTGGGGCC



GAGCACGTGAATAACAGCTACGAGTGCGACATCCCCATTG



GCGCCGGCATTTGTGCGTCTTACCAGACTCAGACCAACTCT



CCTAGAAGGGCCAGGTCCGTTGCTAGTCAGTCTATTATTGC



CTATACCATGAGCCTCGGAGCTGAGAATAGCGTGGCCTACT



CCAATAATTCCATCGCAATCCCTACTAACTTCACTATTTCTG



TGACCACCGAGATCCTGCCTGTGTCTATGACTAAGACTAGC



GTTGATTGTACCATGTATATTTGTGGCGACTCTACCGAATG



TTCTAACCTGCTGCTTCAGTACGGCTCATTTTGCACACAGCT



GAACAGAGCCCTGACTGGGATCGCTGTGGAGCAGGACAAG



AACACACAGGAGGTGTTTGCACAGGTGAAGCAGATCTATA



AGACCCCTCCTATTAAGGATTTCGGCGGATTCAATTTCTCA



CAGATTCTGCCAGACCCCAGTAAGCCTTCCAAGAGGAGCTT



CATCGAGGATCTCCTGTTTAACAAGGTGACCCTGGCAGACG



CCGGCTTTATTAAGCAATATGGGGATTGCCTGGGCGACATT



GCTGCCAGAGACCTGATTTGCGCCCAGAAATTCAATGGCCT



CACAGTGCTGCCACCTCTGCTGACCGACGAGATGATCGCTC



AATACACTAGCGCACTGCTGGCCGGAACCATCACATCAGG



CTGGACCTTCGGGGCCGGAGCAGCACTGCAGATTCCATTCG



CCATGCAGATGGCCTATAGATTCAACGGCATTGGCGTCACA



CAGAACGTGCTGTACGAAAACCAGAAGCTCATCGCTAACC



AGTTTAATTCCGCAATTGGAAAGATCCAAGATTCACTCAGC



TCAACCGCCTCTGCACTCGGAAAGCTGCAGGACGTGGTCA



ACCAGAATGCTCAGGCCCTGAACACACTCGTCAAGCAGCT



GTCCTCTAACTTTGGCGCTATCAGCTCCGTTCTGAACGACA



TTCTGAGCCGCCTGGATAAGGTGGAGGCTGAAGTCCAGATT



GACCGCCTGATTACCGGCCGGCTGCAGTCTCTGCAAACATA



CGTGACCCAGCAGCTGATCAGAGCAGCCGAGATCCGGGCA



TCCGCAAATCTGGCAGCAACTAAGATGAGCGAATGCGTGC



TGGGCCAGTCCAAGCGGGTGGACTTTTGTGGCAAGGGCTA



CCACCTGATGAGCTTCCCCCAGAGCGCCCCACATGGCGTTG



TTTTTCTGCACGTGACCTATGTCCCTGCTCAGGAAAAGAAC



TTTACAACTGCTCCTGCTATCTGCCATGACGGCAAGGCCCA



CTTCCCACGGGAGGGAGTGTTTGTGTCCAATGGCACACACT



GGTTCGTGACCCAGAGGAACTTCTATGAACCCCAGATCATC



ACCACTGACAATACCTTCGTGTCTGGAAATTGCGACGTCGT



GATCGGCATCGTTAACAACACCGTGTACGACCCTCTCCAGC



CAGAGCTGGACTCCTTTAAGGAGGAACTGGATAAGTATTTT



AAGAACCACACAAGCCCAGATGTGGATCTCGGGGACATCT



CCGGAATTAACGCCTCCGTGGTGAATATCCAGAAGGAGAT



TGACCGCCTAAATGAAGTTGCCAAGAACCTCAATGAGTCTC



TGATTGATCTGCAGGAACTGGGCAAGTATGAGCAGTATATC



AAATGGCCCTGGTACATTTGGCTGGGGTTTATCGCCGGACT



GATTGCCATCGTCATGGTGACCATCATGCTGTGTTGCATGA



CCTCCTGTTGTTCCTGTCTGAAGGGCTGCTGTAGTTGCGGCT



CTTGCTGTAAATTCGACGAAGATGATAGCGAGCCCGTGCTG



AAGGGCGTGAAGCTGCATTATACCTGA





SARS-CoV-2 S protein
(SEQ ID NO: 1)


sequence
MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVF



RSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFDNPVLPF



NDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKV



CEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVS



QPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRD



LPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWT



AGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKC



TLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATR



FASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLN



DLCFTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTG



CVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQ



AGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFE



LLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKK



FLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTN



TSNQVAVLYQDVNCTEVPVAIHADQLTPTWRVYSTGSNVFQT



RAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPRRARSVA



SQSIIAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILPVSMTK



TSVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDK



NTQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSFIEDL



LFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPL



LTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQMAYRF



NGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQ



DVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDKVEAEV



QIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVL



GQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNF



TTAPAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTD



NTFVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTS



PDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGK



YEQYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCC



SCGSCCKFDEDDSEPVLKGVKLHYT





Optimized nucleotide
(SEQ ID NO: 30)


sequence encoding the
ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG


ectodomain of a SARS-
TGTGTCAACCTGACAACTAGGACTCAGCTGCCACCAGCCTA


CoV-2 S protein
CACCAACTCCTTCACCAGAGGCGTGTATTACCCAGACAAGG



TGTTTAGAAGCAGCGTGCTGCACTCTACCCAGGACCTCTTT



CTGCCCTTTTTCAGCAACGTGACATGGTTTCACGCAATTCA



CGTGTCCGGCACTAATGGCACAAAGCGGTTCGACAATCCA



GTCCTGCCTTTCAACGATGGCGTCTACTTTGCATCTACTGA



GAAATCCAATATCATTAGGGGATGGATCTTCGGCACAACCC



TGGATTCTAAGACCCAGAGCCTGCTGATCGTCAACAACGCC



ACAAACGTGGTCATTAAGGTTTGCGAGTTTCAGTTCTGTAA



CGATCCTTTTCTGGGCGTGTATTATCATAAGAACAATAAGA



GCTGGATGGAGTCCGAGTTTAGAGTGTATAGCTCTGCAAAT



AATTGTACCTTTGAGTACGTGAGCCAGCCCTTTCTGATGGA



CCTGGAGGGAAAACAAGGAAACTTCAAAAACCTGCGGGAA



TTCGTTTTCAAAAACATCGACGGCTATTTCAAGATCTATAG



CAAGCATACCCCAATCAACCTCGTGAGGGACCTCCCCCAG



GGCTTTAGCGCACTGGAGCCACTGGTTGACCTGCCTATCGG



CATTAATATCACAAGATTTCAGACCCTGCTGGCACTGCATA



GAAGCTATCTGACCCCTGGAGACTCCTCTAGTGGGTGGACT



GCCGGCGCCGCTGCCTACTATGTGGGCTATCTGCAGCCACG



GACATTCCTGCTGAAATACAATGAGAACGGGACAATCACA



GATGCTGTTGATTGCGCACTCGACCCCCTGTCCGAGACAAA



GTGCACTCTCAAGAGCTTTACCGTCGAGAAGGGCATCTATC



AGACCTCAAACTTCAGGGTGCAGCCCACAGAATCTATCGTG



CGCTTCCCTAATATCACTAACCTGTGTCCTTTCGGTGAAGT



GTTCAACGCCACCAGGTTTGCTAGCGTGTATGCCTGGAACA



GGAAGAGGATCTCTAACTGCGTCGCCGACTATTCCGTGCTG



TATAACAGCGCCTCCTTCTCCACATTCAAATGCTATGGAGT



GAGCCCGACAAAACTGAACGATCTCTGCTTTACAAATGTCT



ACGCCGACTCTTTTGTGATCAGAGGGGACGAGGTCCGGCA



GATCGCACCAGGACAGACAGGCAAGATTGCTGACTACAAC



TATAAGCTGCCTGACGACTTCACAGGATGTGTGATCGCATG



GAACTCAAACAATCTGGACTCCAAAGTCGGGGGCAACTAT



AATTACCTGTATCGCCTGTTCCGGAAGTCCAACCTGAAGCC



CTTCGAGAGGGACATCAGTACAGAGATCTATCAGGCTGGC



TCCACCCCTTGCAATGGCGTCGAAGGCTTTAATTGTTATTTT



CCCCTGCAGTCTTACGGGTTTCAGCCTACTAATGGAGTTGG



GTACCAGCCATACAGAGTGGTCGTGCTCAGCTTCGAGCTCC



TGCATGCTCCAGCTACAGTTTGCGGGCCAAAGAAGTCCACT



AACCTGGTGAAGAATAAGTGCGTCAACTTCAACTTTAACGG



GCTCACCGGCACCGGCGTGCTGACTGAGAGCAACAAGAAG



TTTCTGCCATTTCAACAGTTTGGACGGGACATTGCCGACAC



CACCGATGCCGTTCGGGATCCACAGACCCTGGAAATTCTGG



ACATTACACCGTGCAGCTTCGGGGGCGTGAGCGTGATCAC



ACCCGGAACCAATACAAGCAACCAGGTTGCCGTCCTGTATC



AGGATGTCAATTGCACAGAAGTGCCAGTTGCTATCCACGCA



GACCAGCTGACTCCCACATGGCGGGTGTATAGCACCGGAT



CCAACGTGTTTCAGACCCGCGCCGGATGTCTCATTGGGGCC



GAGCACGTGAATAACAGCTACGAGTGCGACATCCCCATTG



GCGCCGGCATTTGTGCGTCTTACCAGACTCAGACCAACTCT



CCTAGAAGGGCCAGGTCCGTTGCTAGTCAGTCTATTATTGC



CTATACCATGAGCCTCGGAGCTGAGAATAGCGTGGCCTACT



CCAATAATTCCATCGCAATCCCTACTAACTTCACTATTTCTG



TGACCACCGAGATCCTGCCTGTGTCTATGACTAAGACTAGC



GTTGATTGTACCATGTATATTTGTGGCGACTCTACCGAATG



TTCTAACCTGCTGCTTCAGTACGGCTCATTTTGCACACAGCT



GAACAGAGCCCTGACTGGGATCGCTGTGGAGCAGGACAAG



AACACACAGGAGGTGTTTGCACAGGTGAAGCAGATCTATA



AGACCCCTCCTATTAAGGATTTCGGCGGATTCAATTTCTCA



CAGATTCTGCCAGACCCCAGTAAGCCTTCCAAGAGGAGCTT



CATCGAGGATCTCCTGTTTAACAAGGTGACCCTGGCAGACG



CCGGCTTTATTAAGCAATATGGGGATTGCCTGGGCGACATT



GCTGCCAGAGACCTGATTTGCGCCCAGAAATTCAATGGCCT



CACAGTGCTGCCACCTCTGCTGACCGACGAGATGATCGCTC



AATACACTAGCGCACTGCTGGCCGGAACCATCACATCAGG



CTGGACCTTCGGGGCCGGAGCAGCACTGCAGATTCCATTCG



CCATGCAGATGGCCTATAGATTCAACGGCATTGGCGTCACA



CAGAACGTGCTGTACGAAAACCAGAAGCTCATCGCTAACC



AGTTTAATTCCGCAATTGGAAAGATCCAAGATTCACTCAGC



TCAACCGCCTCTGCACTCGGAAAGCTGCAGGACGTGGTCA



ACCAGAATGCTCAGGCCCTGAACACACTCGTCAAGCAGCT



GTCCTCTAACTTTGGCGCTATCAGCTCCGTTCTGAACGACA



TTCTGAGCCGCCTGGATAAGGTGGAGGCTGAAGTCCAGATT



GACCGCCTGATTACCGGCCGGCTGCAGTCTCTGCAAACATA



CGTGACCCAGCAGCTGATCAGAGCAGCCGAGATCCGGGCA



TCCGCAAATCTGGCAGCAACTAAGATGAGCGAATGCGTGC



TGGGCCAGTCCAAGCGGGTGGACTTTTGTGGCAAGGGCTA



CCACCTGATGAGCTTCCCCCAGAGCGCCCCACATGGCGTTG



TTTTTCTGCACGTGACCTATGTCCCTGCTCAGGAAAAGAAC



TTTACAACTGCTCCTGCTATCTGCCATGACGGCAAGGCCCA



CTTCCCACGGGAGGGAGTGTTTGTGTCCAATGGCACACACT



GGTTCGTGACCCAGAGGAACTTCTATGAACCCCAGATCATC



ACCACTGACAATACCTTCGTGTCTGGAAATTGCGACGTCGT



GATCGGCATCGTTAACAACACCGTGTACGACCCTCTCCAGC



CAGAGCTGGACTCCTTTAAGGAGGAACTGGATAAGTATTTT



AAGAACCACACAAGCCCAGATGTGGATCTCGGGGACATCT



CCGGAATTAACGCCTCCGTGGTGAATATCCAGAAGGAGAT



TGACCGCCTAAATGAAGTTGCCAAGAACCTCAATGAGTCTC



TGATTGATCTGCAGGAACTGGGCAAGTATGAGCAGTGA





Ectodomain of a SARS-
(SEQ ID NO: 2)


CoV-2 S protein
MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVF



RSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFDNPVLPF



NDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKV



CEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVS



QPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRD



LPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWT



AGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKC



TLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATR



FASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLN



DLCFTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTG



CVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQ



AGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFE



LLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKK



FLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTN



TSNQVAVLYQDVNCTEVPVAIHADQLTPTWRVYSTGSNVFQT



RAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPRRARSVA



SQSIIAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILPVSMTK



TSVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDK



NTQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSFIEDL



LFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPL



LTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQMAYRF



NGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQ



DVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDKVEAEV



QIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVL



GQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNF



TTAPAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTD



NTFVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTS



PDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGK



YEQ





Optimized nucleotide
(SEQ ID NO: 31)


sequence encoding the
ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG


S1 subunit of a SARS-
TGTGTCAACCTGACAACTAGGACTCAGCTGCCACCAGCCTA


CoV-2 S protein
CACCAACTCCTTCACCAGAGGCGTGTATTACCCAGACAAGG



TGTTTAGAAGCAGCGTGCTGCACTCTACCCAGGACCTCTTT



CTGCCCTTTTTCAGCAACGTGACATGGTTTCACGCAATTCA



CGTGTCCGGCACTAATGGCACAAAGCGGTTCGACAATCCA



GTCCTGCCTTTCAACGATGGCGTCTACTTTGCATCTACTGA



GAAATCCAATATCATTAGGGGATGGATCTTCGGCACAACCC



TGGATTCTAAGACCCAGAGCCTGCTGATCGTCAACAACGCC



ACAAACGTGGTCATTAAGGTTTGCGAGTTTCAGTTCTGTAA



CGATCCTTTTCTGGGCGTGTATTATCATAAGAACAATAAGA



GCTGGATGGAGTCCGAGTTTAGAGTGTATAGCTCTGCAAAT



AATTGTACCTTTGAGTACGTGAGCCAGCCCTTTCTGATGGA



CCTGGAGGGAAAACAAGGAAACTTCAAAAACCTGCGGGAA



TTCGTTTTCAAAAACATCGACGGCTATTTCAAGATCTATAG



CAAGCATACCCCAATCAACCTCGTGAGGGACCTCCCCCAG



GGCTTTAGCGCACTGGAGCCACTGGTTGACCTGCCTATCGG



CATTAATATCACAAGATTTCAGACCCTGCTGGCACTGCATA



GAAGCTATCTGACCCCTGGAGACTCCTCTAGTGGGTGGACT



GCCGGCGCCGCTGCCTACTATGTGGGCTATCTGCAGCCACG



GACATTCCTGCTGAAATACAATGAGAACGGGACAATCACA



GATGCTGTTGATTGCGCACTCGACCCCCTGTCCGAGACAAA



GTGCACTCTCAAGAGCTTTACCGTCGAGAAGGGCATCTATC



AGACCTCAAACTTCAGGGTGCAGCCCACAGAATCTATCGTG



CGCTTCCCTAATATCACTAACCTGTGTCCTTTCGGTGAAGT



GTTCAACGCCACCAGGTTTGCTAGCGTGTATGCCTGGAACA



GGAAGAGGATCTCTAACTGCGTCGCCGACTATTCCGTGCTG



TATAACAGCGCCTCCTTCTCCACATTCAAATGCTATGGAGT



GAGCCCGACAAAACTGAACGATCTCTGCTTTACAAATGTCT



ACGCCGACTCTTTTGTGATCAGAGGGGACGAGGTCCGGCA



GATCGCACCAGGACAGACAGGCAAGATTGCTGACTACAAC



TATAAGCTGCCTGACGACTTCACAGGATGTGTGATCGCATG



GAACTCAAACAATCTGGACTCCAAAGTCGGGGGCAACTAT



AATTACCTGTATCGCCTGTTCCGGAAGTCCAACCTGAAGCC



CTTCGAGAGGGACATCAGTACAGAGATCTATCAGGCTGGC



TCCACCCCTTGCAATGGCGTCGAAGGCTTTAATTGTTATTTT



CCCCTGCAGTCTTACGGGTTTCAGCCTACTAATGGAGTTGG



GTACCAGCCATACAGAGTGGTCGTGCTCAGCTTCGAGCTCC



TGCATGCTCCAGCTACAGTTTGCGGGCCAAAGAAGTCCACT



AACCTGGTGAAGAATAAGTGCGTCAACTTCAACTTTAACGG



GCTCACCGGCACCGGCGTGCTGACTGAGAGCAACAAGAAG



TTTCTGCCATTTCAACAGTTTGGACGGGACATTGCCGACAC



CACCGATGCCGTTCGGGATCCACAGACCCTGGAAATTCTGG



ACATTACACCGTGCAGCTTCGGGGGCGTGAGCGTGATCAC



ACCCGGAACCAATACAAGCAACCAGGTTGCCGTCCTGTATC



AGGATGTCAATTGCACAGAAGTGCCAGTTGCTATCCACGCA



GACCAGCTGACTCCCACATGGCGGGTGTATAGCACCGGAT



CCAACGTGTTTCAGACCCGCGCCGGATGTCTCATTGGGGCC



GAGCACGTGAATAACAGCTACGAGTGCGACATCCCCATTG



GCGCCGGCATTTGTGCGTCTTACCAGACTCAGACCAACTCT



CCTTGA





S1 subunit of a SARS-CoV-2 S protein
(SEQ ID NO: 3)



MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVF



RSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFDNPVLPF



NDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKV



CEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVS



QPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRD



LPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWT



AGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKC



TLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATR



FASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLN



DLCFTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTG



CVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQ



AGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFE



LLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKK



FLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTN



TSNQVAVLYQDVNCTEVPVAIHADQLTPTWRVYSTGSNVFQT



RAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSP





Optimized nucleotide sequence encoding the S2 subunit of a SARS-CoV-2 S protein
(SEQ ID NO: 32)



ATGTCCGTTGCTAGTCAGTCTATTATTGCCTATACCATGAG



CCTCGGAGCTGAGAATAGCGTGGCCTACTCCAATAATTCCA



TCGCAATCCCTACTAACTTCACTATTTCTGTGACCACCGAG



ATCCTGCCTGTGTCTATGACTAAGACTAGCGTTGATTGTAC



CATGTATATTTGTGGCGACTCTACCGAATGTTCTAACCTGC



TGCTTCAGTACGGCTCATTTTGCACACAGCTGAACAGAGCC



CTGACTGGGATCGCTGTGGAGCAGGACAAGAACACACAGG



AGGTGTTTGCACAGGTGAAGCAGATCTATAAGACCCCTCCT



ATTAAGGATTTCGGCGGATTCAATTTCTCACAGATTCTGCC



AGACCCCAGTAAGCCTTCCAAGAGGAGCTTCATCGAGGAT



CTCCTGTTTAACAAGGTGACCCTGGCAGACGCCGGCTTTAT



TAAGCAATATGGGGATTGCCTGGGCGACATTGCTGCCAGA



GACCTGATTTGCGCCCAGAAATTCAATGGCCTCACAGTGCT



GCCACCTCTGCTGACCGACGAGATGATCGCTCAATACACTA



GCGCACTGCTGGCCGGAACCATCACATCAGGCTGGACCTTC



GGGGCCGGAGCAGCACTGCAGATTCCATTCGCCATGCAGA



TGGCCTATAGATTCAACGGCATTGGCGTCACACAGAACGTG



CTGTACGAAAACCAGAAGCTCATCGCTAACCAGTTTAATTC



CGCAATTGGAAAGATCCAAGATTCACTCAGCTCAACCGCCT



CTGCACTCGGAAAGCTGCAGGACGTGGTCAACCAGAATGC



TCAGGCCCTGAACACACTCGTCAAGCAGCTGTCCTCTAACT



TTGGCGCTATCAGCTCCGTTCTGAACGACATTCTGAGCCGC



CTGGATAAGGTGGAGGCTGAAGTCCAGATTGACCGCCTGA



TTACCGGCCGGCTGCAGTCTCTGCAAACATACGTGACCCAG



CAGCTGATCAGAGCAGCCGAGATCCGGGCATCCGCAAATC



TGGCAGCAACTAAGATGAGCGAATGCGTGCTGGGCCAGTC



CAAGCGGGTGGACTTTTGTGGCAAGGGCTACCACCTGATG



AGCTTCCCCCAGAGCGCCCCACATGGCGTTGTTTTTCTGCA



CGTGACCTATGTCCCTGCTCAGGAAAAGAACTTTACAACTG



CTCCTGCTATCTGCCATGACGGCAAGGCCCACTTCCCACGG



GAGGGAGTGTTTGTGTCCAATGGCACACACTGGTTCGTGAC



CCAGAGGAACTTCTATGAACCCCAGATCATCACCACTGACA



ATACCTTCGTGTCTGGAAATTGCGACGTCGTGATCGGCATC



GTTAACAACACCGTGTACGACCCTCTCCAGCCAGAGCTGGA



CTCCTTTAAGGAGGAACTGGATAAGTATTTTAAGAACCACA



CAAGCCCAGATGTGGATCTCGGGGACATCTCCGGAATTAA



CGCCTCCGTGGTGAATATCCAGAAGGAGATTGACCGCCTA



AATGAAGTTGCCAAGAACCTCAATGAGTCTCTGATTGATCT



GCAGGAACTGGGCAAGTATGAGCAGTGA





S2 subunit of a SARS-CoV-2 S protein
(SEQ ID NO: 4)



MSVASQSIIAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILPV



SMTKTSVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAV



EQDKNTQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRS



FIEDLLFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLT



VLPPLLTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQ



MAYRFNGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASA



LGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDK



VEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKM



SECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPA



QEKNFTTAPAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEP



QIITTDNTFVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYF



KNHTSPDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDL



QELGKYEQ





Optimized nucleotide sequence encoding the full length S2 subunit of a SARS-CoV-2 S protein
(SEQ ID NO: 71)



ATGTCCGTTGCTAGTCAGTCTATTATTGCCTATACCATGAG



CCTCGGAGCTGAGAATAGCGTGGCCTACTCCAATAATTCCA



TCGCAATCCCTACTAACTTCACTATTTCTGTGACCACCGAG



ATCCTGCCTGTGTCTATGACTAAGACTAGCGTTGATTGTAC



CATGTATATTTGTGGCGACTCTACCGAATGTTCTAACCTGC



TGCTTCAGTACGGCTCATTTTGCACACAGCTGAACAGAGCC



CTGACTGGGATCGCTGTGGAGCAGGACAAGAACACACAGG



AGGTGTTTGCACAGGTGAAGCAGATCTATAAGACCCCTCCT



ATTAAGGATTTCGGCGGATTCAATTTCTCACAGATTCTGCC



AGACCCCAGTAAGCCTTCCAAGAGGAGCTTCATCGAGGAT



CTCCTGTTTAACAAGGTGACCCTGGCAGACGCCGGCTTTAT



TAAGCAATATGGGGATTGCCTGGGCGACATTGCTGCCAGA



GACCTGATTTGCGCCCAGAAATTCAATGGCCTCACAGTGCT



GCCACCTCTGCTGACCGACGAGATGATCGCTCAATACACTA



GCGCACTGCTGGCCGGAACCATCACATCAGGCTGGACCTTC



GGGGCCGGAGCAGCACTGCAGATTCCATTCGCCATGCAGA



TGGCCTATAGATTCAACGGCATTGGCGTCACACAGAACGTG



CTGTACGAAAACCAGAAGCTCATCGCTAACCAGTTTAATTC



CGCAATTGGAAAGATCCAAGATTCACTCAGCTCAACCGCCT



CTGCACTCGGAAAGCTGCAGGACGTGGTCAACCAGAATGC



TCAGGCCCTGAACACACTCGTCAAGCAGCTGTCCTCTAACT



TTGGCGCTATCAGCTCCGTTCTGAACGACATTCTGAGCCGC



CTGGATAAGGTGGAGGCTGAAGTCCAGATTGACCGCCTGA



TTACCGGCCGGCTGCAGTCTCTGCAAACATACGTGACCCAG



CAGCTGATCAGAGCAGCCGAGATCCGGGCATCCGCAAATC



TGGCAGCAACTAAGATGAGCGAATGCGTGCTGGGCCAGTC



CAAGCGGGTGGACTTTTGTGGCAAGGGCTACCACCTGATG



AGCTTCCCCCAGAGCGCCCCACATGGCGTTGTTTTTCTGCA



CGTGACCTATGTCCCTGCTCAGGAAAAGAACTTTACAACTG



CTCCTGCTATCTGCCATGACGGCAAGGCCCACTTCCCACGG



GAGGGAGTGTTTGTGTCCAATGGCACACACTGGTTCGTGAC



CCAGAGGAACTTCTATGAACCCCAGATCATCACCACTGACA



ATACCTTCGTGTCTGGAAATTGCGACGTCGTGATCGGCATC



GTTAACAACACCGTGTACGACCCTCTCCAGCCAGAGCTGGA



CTCCTTTAAGGAGGAACTGGATAAGTATTTTAAGAACCACA



CAAGCCCAGATGTGGATCTCGGGGACATCTCCGGAATTAA



CGCCTCCGTGGTGAATATCCAGAAGGAGATTGACCGCCTA



AATGAAGTTGCCAAGAACCTCAATGAGTCTCTGATTGATCT



GCAGGAACTGGGCAAGTATGAGCAGTATATCAAATGGCCC



TGGTACATTTGGCTGGGGTTTATCGCCGGACTGATTGCCAT



CGTCATGGTGACCATCATGCTGTGTTGCATGACCTCCTGTT



GTTCCTGTCTGAAGGGCTGCTGTAGTTGCGGCTCTTGCTGT



AAATTCGACGAAGATGATAGCGAGCCCGTGCTGAAGGGCG



TGAAGCTGCATTATACCTGA





Full length S2 subunit of a SARS-CoV-2 S protein
(SEQ ID NO: 72)



MSVASQSIIAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILPV



SMTKTSVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAV



EQDKNTQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRS



FIEDLLFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLT



VLPPLLTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQ



MAYRFNGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASA



LGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDK



VEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKM



SECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPA



QEKNFTTAPAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEP



QIITTDNTFVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYF



KNHTSPDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDL



QELGKYEQYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSC



LKGCCSCGSCCKFDEDDSEPVLKGVKLHYT





Optimized nucleotide sequence encoding the S2 subunit of a SARS-CoV-2 S protein with a signal sequence
(SEQ ID NO: 73)



ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG



TGTTCCGTTGCTAGTCAGTCTATTATTGCCTATACCATGAGC



CTCGGAGCTGAGAATAGCGTGGCCTACTCCAATAATTCCAT



CGCAATCCCTACTAACTTCACTATTTCTGTGACCACCGAGA



TCCTGCCTGTGTCTATGACTAAGACTAGCGTTGATTGTACC



ATGTATATTTGTGGCGACTCTACCGAATGTTCTAACCTGCT



GCTTCAGTACGGCTCATTTTGCACACAGCTGAACAGAGCCC



TGACTGGGATCGCTGTGGAGCAGGACAAGAACACACAGGA



GGTGTTTGCACAGGTGAAGCAGATCTATAAGACCCCTCCTA



TTAAGGATTTCGGCGGATTCAATTTCTCACAGATTCTGCCA



GACCCCAGTAAGCCTTCCAAGAGGAGCTTCATCGAGGATCT



CCTGTTTAACAAGGTGACCCTGGCAGACGCCGGCTTTATTA



AGCAATATGGGGATTGCCTGGGCGACATTGCTGCCAGAGA



CCTGATTTGCGCCCAGAAATTCAATGGCCTCACAGTGCTGC



CACCTCTGCTGACCGACGAGATGATCGCTCAATACACTAGC



GCACTGCTGGCCGGAACCATCACATCAGGCTGGACCTTCGG



GGCCGGAGCAGCACTGCAGATTCCATTCGCCATGCAGATG



GCCTATAGATTCAACGGCATTGGCGTCACACAGAACGTGCT



GTACGAAAACCAGAAGCTCATCGCTAACCAGTTTAATTCCG



CAATTGGAAAGATCCAAGATTCACTCAGCTCAACCGCCTCT



GCACTCGGAAAGCTGCAGGACGTGGTCAACCAGAATGCTC



AGGCCCTGAACACACTCGTCAAGCAGCTGTCCTCTAACTTT



GGCGCTATCAGCTCCGTTCTGAACGACATTCTGAGCCGCCT



GGATAAGGTGGAGGCTGAAGTCCAGATTGACCGCCTGATT



ACCGGCCGGCTGCAGTCTCTGCAAACATACGTGACCCAGC



AGCTGATCAGAGCAGCCGAGATCCGGGCATCCGCAAATCT



GGCAGCAACTAAGATGAGCGAATGCGTGCTGGGCCAGTCC



AAGCGGGTGGACTTTTGTGGCAAGGGCTACCACCTGATGA



GCTTCCCCCAGAGCGCCCCACATGGCGTTGTTTTTCTGCAC



GTGACCTATGTCCCTGCTCAGGAAAAGAACTTTACAACTGC



TCCTGCTATCTGCCATGACGGCAAGGCCCACTTCCCACGGG



AGGGAGTGTTTGTGTCCAATGGCACACACTGGTTCGTGACC



CAGAGGAACTTCTATGAACCCCAGATCATCACCACTGACA



ATACCTTCGTGTCTGGAAATTGCGACGTCGTGATCGGCATC



GTTAACAACACCGTGTACGACCCTCTCCAGCCAGAGCTGGA



CTCCTTTAAGGAGGAACTGGATAAGTATTTTAAGAACCACA



CAAGCCCAGATGTGGATCTCGGGGACATCTCCGGAATTAA



CGCCTCCGTGGTGAATATCCAGAAGGAGATTGACCGCCTA



AATGAAGTTGCCAAGAACCTCAATGAGTCTCTGATTGATCT



GCAGGAACTGGGCAAGTATGAGCAGTGA





S2 subunit of a SARS-CoV-2 S protein with a signal sequence
(SEQ ID NO: 74)



MFVFLVLLPLVSSQCSVASQSIIAYTMSLGAENSVAYSNNSIAI



PTNFTISVTTEILPVSMTKTSVDCTMYICGDSTECSNLLLQYGS



FCTQLNRALTGIAVEQDKNTQEVFAQVKQIYKTPPIKDFGGFN



FSQILPDPSKPSKRSFIEDLLFNKVTLADAGFIKQYGDCLGDIA



ARDLICAQKFNGLTVLPPLLTDEMIAQYTSALLAGTITSGWTF



GAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKLIANQFNSA



IGKIQDSLSSTASALGKLQDVVNQNAQALNTLVKQLSSNFGAI



SSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAE



IRASANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHG



VVFLHVTYVPAQEKNFTTAPAICHDGKAHFPREGVFVSNGTH



WFVTQRNFYEPQIITTDNTFVSGNCDVVIGIVNNTVYDPLQPE



LDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNE



VAKNLNESLIDLQELGKYEQ





Optimized nucleotide sequence encoding the full length S2 subunit of a SARS-CoV-2 S protein with a signal sequence
(SEQ ID NO: 67)



ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG



TGTTCCGTTGCTAGTCAGTCTATTATTGCCTATACCATGAGC



CTCGGAGCTGAGAATAGCGTGGCCTACTCCAATAATTCCAT



CGCAATCCCTACTAACTTCACTATTTCTGTGACCACCGAGA



TCCTGCCTGTGTCTATGACTAAGACTAGCGTTGATTGTACC



ATGTATATTTGTGGCGACTCTACCGAATGTTCTAACCTGCT



GCTTCAGTACGGCTCATTTTGCACACAGCTGAACAGAGCCC



TGACTGGGATCGCTGTGGAGCAGGACAAGAACACACAGGA



GGTGTTTGCACAGGTGAAGCAGATCTATAAGACCCCTCCTA



TTAAGGATTTCGGCGGATTCAATTTCTCACAGATTCTGCCA



GACCCCAGTAAGCCTTCCAAGAGGAGCTTCATCGAGGATCT



CCTGTTTAACAAGGTGACCCTGGCAGACGCCGGCTTTATTA



AGCAATATGGGGATTGCCTGGGCGACATTGCTGCCAGAGA



CCTGATTTGCGCCCAGAAATTCAATGGCCTCACAGTGCTGC



CACCTCTGCTGACCGACGAGATGATCGCTCAATACACTAGC



GCACTGCTGGCCGGAACCATCACATCAGGCTGGACCTTCGG



GGCCGGAGCAGCACTGCAGATTCCATTCGCCATGCAGATG



GCCTATAGATTCAACGGCATTGGCGTCACACAGAACGTGCT



GTACGAAAACCAGAAGCTCATCGCTAACCAGTTTAATTCCG



CAATTGGAAAGATCCAAGATTCACTCAGCTCAACCGCCTCT



GCACTCGGAAAGCTGCAGGACGTGGTCAACCAGAATGCTC



AGGCCCTGAACACACTCGTCAAGCAGCTGTCCTCTAACTTT



GGCGCTATCAGCTCCGTTCTGAACGACATTCTGAGCCGCCT



GGATAAGGTGGAGGCTGAAGTCCAGATTGACCGCCTGATT



ACCGGCCGGCTGCAGTCTCTGCAAACATACGTGACCCAGC



AGCTGATCAGAGCAGCCGAGATCCGGGCATCCGCAAATCT



GGCAGCAACTAAGATGAGCGAATGCGTGCTGGGCCAGTCC



AAGCGGGTGGACTTTTGTGGCAAGGGCTACCACCTGATGA



GCTTCCCCCAGAGCGCCCCACATGGCGTTGTTTTTCTGCAC



GTGACCTATGTCCCTGCTCAGGAAAAGAACTTTACAACTGC



TCCTGCTATCTGCCATGACGGCAAGGCCCACTTCCCACGGG



AGGGAGTGTTTGTGTCCAATGGCACACACTGGTTCGTGACC



CAGAGGAACTTCTATGAACCCCAGATCATCACCACTGACA



ATACCTTCGTGTCTGGAAATTGCGACGTCGTGATCGGCATC



GTTAACAACACCGTGTACGACCCTCTCCAGCCAGAGCTGGA



CTCCTTTAAGGAGGAACTGGATAAGTATTTTAAGAACCACA



CAAGCCCAGATGTGGATCTCGGGGACATCTCCGGAATTAA



CGCCTCCGTGGTGAATATCCAGAAGGAGATTGACCGCCTA



AATGAAGTTGCCAAGAACCTCAATGAGTCTCTGATTGATCT



GCAGGAACTGGGCAAGTATGAGCAGTATATCAAATGGCCC



TGGTACATTTGGCTGGGGTTTATCGCCGGACTGATTGCCAT



CGTCATGGTGACCATCATGCTGTGTTGCATGACCTCCTGTT



GTTCCTGTCTGAAGGGCTGCTGTAGTTGCGGCTCTTGCTGT



AAATTCGACGAAGATGATAGCGAGCCCGTGCTGAAGGGCG



TGAAGCTGCATTATACCTGA





Full length S2 subunit of a SARS-CoV-2 S protein with a signal sequence
(SEQ ID NO: 68)



MFVFLVLLPLVSSQCSVASQSIIAYTMSLGAENSVAYSNNSIAI



PTNFTISVTTEILPVSMTKTSVDCTMYICGDSTECSNLLLQYGS



FCTQLNRALTGIAVEQDKNTQEVFAQVKQIYKTPPIKDFGGFN



FSQILPDPSKPSKRSFIEDLLFNKVTLADAGFIKQYGDCLGDIA



ARDLICAQKFNGLTVLPPLLTDEMIAQYTSALLAGTITSGWTF



GAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKLIANQFNSA



IGKIQDSLSSTASALGKLQDVVNQNAQALNTLVKQLSSNFGAI



SSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAE



IRASANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHG



VVFLHVTYVPAQEKNFTTAPAICHDGKAHFPREGVFVSNGTH



WFVTQRNFYEPQIITTDNTFVSGNCDVVIGIVNNTVYDPLQPE



LDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNE



VAKNLNESLIDLQELGKYEQYIKWPWYIWLGFIAGLIAIVMVT



IMLCCMTSCCSCLKGCCSCGSCCKFDEDDSEPVLKGVKLHYT





Optimized nucleotide sequence encoding the S2′ subunit of a SARS-CoV-2 S protein
(SEQ ID NO: 33)



ATGAGCTTCATCGAGGATCTCCTGTTTAACAAGGTGACCCT



GGCAGACGCCGGCTTTATTAAGCAATATGGGGATTGCCTGG



GCGACATTGCTGCCAGAGACCTGATTTGCGCCCAGAAATTC



AATGGCCTCACAGTGCTGCCACCTCTGCTGACCGACGAGAT



GATCGCTCAATACACTAGCGCACTGCTGGCCGGAACCATCA



CATCAGGCTGGACCTTCGGGGCCGGAGCAGCACTGCAGAT



TCCATTCGCCATGCAGATGGCCTATAGATTCAACGGCATTG



GCGTCACACAGAACGTGCTGTACGAAAACCAGAAGCTCAT



CGCTAACCAGTTTAATTCCGCAATTGGAAAGATCCAAGATT



CACTCAGCTCAACCGCCTCTGCACTCGGAAAGCTGCAGGAC



GTGGTCAACCAGAATGCTCAGGCCCTGAACACACTCGTCA



AGCAGCTGTCCTCTAACTTTGGCGCTATCAGCTCCGTTCTG



AACGACATTCTGAGCCGCCTGGATAAGGTGGAGGCTGAAG



TCCAGATTGACCGCCTGATTACCGGCCGGCTGCAGTCTCTG



CAAACATACGTGACCCAGCAGCTGATCAGAGCAGCCGAGA



TCCGGGCATCCGCAAATCTGGCAGCAACTAAGATGAGCGA



ATGCGTGCTGGGCCAGTCCAAGCGGGTGGACTTTTGTGGCA



AGGGCTACCACCTGATGAGCTTCCCCCAGAGCGCCCCACAT



GGCGTTGTTTTTCTGCACGTGACCTATGTCCCTGCTCAGGA



AAAGAACTTTACAACTGCTCCTGCTATCTGCCATGACGGCA



AGGCCCACTTCCCACGGGAGGGAGTGTTTGTGTCCAATGGC



ACACACTGGTTCGTGACCCAGAGGAACTTCTATGAACCCCA



GATCATCACCACTGACAATACCTTCGTGTCTGGAAATTGCG



ACGTCGTGATCGGCATCGTTAACAACACCGTGTACGACCCT



CTCCAGCCAGAGCTGGACTCCTTTAAGGAGGAACTGGATA



AGTATTTTAAGAACCACACAAGCCCAGATGTGGATCTCGG



GGACATCTCCGGAATTAACGCCTCCGTGGTGAATATCCAGA



AGGAGATTGACCGCCTAAATGAAGTTGCCAAGAACCTCAA



TGAGTCTCTGATTGATCTGCAGGAACTGGGCAAGTATGAGC



AGTAA





S2′ subunit of a SARS-CoV-2 S protein
(SEQ ID NO: 5)



MSFIEDLLFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNG



LTVLPPLLTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAM



QMAYRFNGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTAS



ALGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLD



KVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATK



MSECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVP



AQEKNFTTAPAICHDGKAHFPREGVFVSNGTHWFVTQRNFYE



PQIITTDNTFVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKY



FKNHTSPDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLID



LQELGKYEQ





Optimized nucleotide sequence encoding the full length S2′ subunit of a SARS-CoV-2 S protein
(SEQ ID NO: 97)



ATGAGCTTCATCGAGGATCTCCTGTTTAACAAGGTGACCCT



GGCAGACGCCGGCTTTATTAAGCAATATGGGGATTGCCTGG



GCGACATTGCTGCCAGAGACCTGATTTGCGCCCAGAAATTC



AATGGCCTCACAGTGCTGCCACCTCTGCTGACCGACGAGAT



GATCGCTCAATACACTAGCGCACTGCTGGCCGGAACCATCA



CATCAGGCTGGACCTTCGGGGCCGGAGCAGCACTGCAGAT



TCCATTCGCCATGCAGATGGCCTATAGATTCAACGGCATTG



GCGTCACACAGAACGTGCTGTACGAAAACCAGAAGCTCAT



CGCTAACCAGTTTAATTCCGCAATTGGAAAGATCCAAGATT



CACTCAGCTCAACCGCCTCTGCACTCGGAAAGCTGCAGGAC



GTGGTCAACCAGAATGCTCAGGCCCTGAACACACTCGTCA



AGCAGCTGTCCTCTAACTTTGGCGCTATCAGCTCCGTTCTG



AACGACATTCTGAGCCGCCTGGATAAGGTGGAGGCTGAAG



TCCAGATTGACCGCCTGATTACCGGCCGGCTGCAGTCTCTG



CAAACATACGTGACCCAGCAGCTGATCAGAGCAGCCGAGA



TCCGGGCATCCGCAAATCTGGCAGCAACTAAGATGAGCGA



ATGCGTGCTGGGCCAGTCCAAGCGGGTGGACTTTTGTGGCA



AGGGCTACCACCTGATGAGCTTCCCCCAGAGCGCCCCACAT



GGCGTTGTTTTTCTGCACGTGACCTATGTCCCTGCTCAGGA



AAAGAACTTTACAACTGCTCCTGCTATCTGCCATGACGGCA



AGGCCCACTTCCCACGGGAGGGAGTGTTTGTGTCCAATGGC



ACACACTGGTTCGTGACCCAGAGGAACTTCTATGAACCCCA



GATCATCACCACTGACAATACCTTCGTGTCTGGAAATTGCG



ACGTCGTGATCGGCATCGTTAACAACACCGTGTACGACCCT



CTCCAGCCAGAGCTGGACTCCTTTAAGGAGGAACTGGATA



AGTATTTTAAGAACCACACAAGCCCAGATGTGGATCTCGG



GGACATCTCCGGAATTAACGCCTCCGTGGTGAATATCCAGA



AGGAGATTGACCGCCTAAATGAAGTTGCCAAGAACCTCAA



TGAGTCTCTGATTGATCTGCAGGAACTGGGCAAGTATGAGC



AGTATATCAAATGGCCCTGGTACATTTGGCTGGGGTTTATC



GCCGGACTGATTGCCATCGTCATGGTGACCATCATGCTGTG



TTGCATGACCTCCTGTTGTTCCTGTCTGAAGGGCTGCTGTA



GTTGCGGCTCTTGCTGTAAATTCGACGAAGATGATAGCGAG



CCCGTGCTGAAGGGCGTGAAGCTGCATTATACCTGA





Full length S2′ subunit of a SARS-CoV-2 S protein
(SEQ ID NO: 98)



MSFIEDLLFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNG



LTVLPPLLTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAM



QMAYRFNGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTAS



ALGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLD



KVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATK



MSECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVP



AQEKNFTTAPAICHDGKAHFPREGVFVSNGTHWFVTQRNFYE



PQIITTDNTFVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKY



FKNHTSPDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLID



LQELGKYEQYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCS



CLKGCCSCGSCCKFDEDDSEPVLKGVKLHYT





Optimized nucleotide sequence encoding the S2′ subunit of a SARS-CoV-2 S protein with a signal sequence
(SEQ ID NO: 65)



ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG



TGTAGCTTCATCGAGGATCTCCTGTTTAACAAGGTGACCCT



GGCAGACGCCGGCTTTATTAAGCAATATGGGGATTGCCTGG



GCGACATTGCTGCCAGAGACCTGATTTGCGCCCAGAAATTC



AATGGCCTCACAGTGCTGCCACCTCTGCTGACCGACGAGAT



GATCGCTCAATACACTAGCGCACTGCTGGCCGGAACCATCA



CATCAGGCTGGACCTTCGGGGCCGGAGCAGCACTGCAGAT



TCCATTCGCCATGCAGATGGCCTATAGATTCAACGGCATTG



GCGTCACACAGAACGTGCTGTACGAAAACCAGAAGCTCAT



CGCTAACCAGTTTAATTCCGCAATTGGAAAGATCCAAGATT



CACTCAGCTCAACCGCCTCTGCACTCGGAAAGCTGCAGGAC



GTGGTCAACCAGAATGCTCAGGCCCTGAACACACTCGTCA



AGCAGCTGTCCTCTAACTTTGGCGCTATCAGCTCCGTTCTG



AACGACATTCTGAGCCGCCTGGATAAGGTGGAGGCTGAAG



TCCAGATTGACCGCCTGATTACCGGCCGGCTGCAGTCTCTG



CAAACATACGTGACCCAGCAGCTGATCAGAGCAGCCGAGA



TCCGGGCATCCGCAAATCTGGCAGCAACTAAGATGAGCGA



ATGCGTGCTGGGCCAGTCCAAGCGGGTGGACTTTTGTGGCA



AGGGCTACCACCTGATGAGCTTCCCCCAGAGCGCCCCACAT



GGCGTTGTTTTTCTGCACGTGACCTATGTCCCTGCTCAGGA



AAAGAACTTTACAACTGCTCCTGCTATCTGCCATGACGGCA



AGGCCCACTTCCCACGGGAGGGAGTGTTTGTGTCCAATGGC



ACACACTGGTTCGTGACCCAGAGGAACTTCTATGAACCCCA



GATCATCACCACTGACAATACCTTCGTGTCTGGAAATTGCG



ACGTCGTGATCGGCATCGTTAACAACACCGTGTACGACCCT



CTCCAGCCAGAGCTGGACTCCTTTAAGGAGGAACTGGATA



AGTATTTTAAGAACCACACAAGCCCAGATGTGGATCTCGG



GGACATCTCCGGAATTAACGCCTCCGTGGTGAATATCCAGA



AGGAGATTGACCGCCTAAATGAAGTTGCCAAGAACCTCAA



TGAGTCTCTGATTGATCTGCAGGAACTGGGCAAGTATGAGC



AGTAA





S2′ subunit of a SARS-CoV-2 S protein with a signal sequence
(SEQ ID NO: 66)



MFVFLVLLPLVSSQCSFIEDLLFNKVTLADAGFIKQYGDCLGDI



AARDLICAQKFNGLTVLPPLLTDEMIAQYTSALLAGTITSGWT



FGAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKLIANQFNS



AIGKIQDSLSSTASALGKLQDVVNQNAQALNTLVKQLSSNFG



AISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRA



AEIRASANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAP



HGVVFLHVTYVPAQEKNFTTAPAICHDGKAHFPREGVFVSNG



THWFVTQRNFYEPQIITTDNTFVSGNCDVVIGIVNNTVYDPLQ



PELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRL



NEVAKNLNESLIDLQELGKYEQ





Optimized nucleotide sequence encoding the full length S2′ subunit of a SARS-CoV-2 S protein with a signal sequence
(SEQ ID NO: 95)



ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG



TGTAGCTTCATCGAGGATCTCCTGTTTAACAAGGTGACCCT



GGCAGACGCCGGCTTTATTAAGCAATATGGGGATTGCCTGG



GCGACATTGCTGCCAGAGACCTGATTTGCGCCCAGAAATTC



AATGGCCTCACAGTGCTGCCACCTCTGCTGACCGACGAGAT



GATCGCTCAATACACTAGCGCACTGCTGGCCGGAACCATCA



CATCAGGCTGGACCTTCGGGGCCGGAGCAGCACTGCAGAT



TCCATTCGCCATGCAGATGGCCTATAGATTCAACGGCATTG



GCGTCACACAGAACGTGCTGTACGAAAACCAGAAGCTCAT



CGCTAACCAGTTTAATTCCGCAATTGGAAAGATCCAAGATT



CACTCAGCTCAACCGCCTCTGCACTCGGAAAGCTGCAGGAC



GTGGTCAACCAGAATGCTCAGGCCCTGAACACACTCGTCA



AGCAGCTGTCCTCTAACTTTGGCGCTATCAGCTCCGTTCTG



AACGACATTCTGAGCCGCCTGGATAAGGTGGAGGCTGAAG



TCCAGATTGACCGCCTGATTACCGGCCGGCTGCAGTCTCTG



CAAACATACGTGACCCAGCAGCTGATCAGAGCAGCCGAGA



TCCGGGCATCCGCAAATCTGGCAGCAACTAAGATGAGCGA



ATGCGTGCTGGGCCAGTCCAAGCGGGTGGACTTTTGTGGCA



AGGGCTACCACCTGATGAGCTTCCCCCAGAGCGCCCCACAT



GGCGTTGTTTTTCTGCACGTGACCTATGTCCCTGCTCAGGA



AAAGAACTTTACAACTGCTCCTGCTATCTGCCATGACGGCA



AGGCCCACTTCCCACGGGAGGGAGTGTTTGTGTCCAATGGC



ACACACTGGTTCGTGACCCAGAGGAACTTCTATGAACCCCA



GATCATCACCACTGACAATACCTTCGTGTCTGGAAATTGCG



ACGTCGTGATCGGCATCGTTAACAACACCGTGTACGACCCT



CTCCAGCCAGAGCTGGACTCCTTTAAGGAGGAACTGGATA



AGTATTTTAAGAACCACACAAGCCCAGATGTGGATCTCGG



GGACATCTCCGGAATTAACGCCTCCGTGGTGAATATCCAGA



AGGAGATTGACCGCCTAAATGAAGTTGCCAAGAACCTCAA



TGAGTCTCTGATTGATCTGCAGGAACTGGGCAAGTATGAGC



AGTATATCAAATGGCCCTGGTACATTTGGCTGGGGTTTATC



GCCGGACTGATTGCCATCGTCATGGTGACCATCATGCTGTG



TTGCATGACCTCCTGTTGTTCCTGTCTGAAGGGCTGCTGTA



GTTGCGGCTCTTGCTGTAAATTCGACGAAGATGATAGCGAG



CCCGTGCTGAAGGGCGTGAAGCTGCATTATACCTGA





Full length S2′ subunit of a SARS-CoV-2 S protein with a signal sequence
(SEQ ID NO: 96)



MFVFLVLLPLVSSQCSFIEDLLFNKVTLADAGFIKQYGDCLGDI



AARDLICAQKFNGLTVLPPLLTDEMIAQYTSALLAGTITSGWT



FGAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKLIANQFNS



AIGKIQDSLSSTASALGKLQDVVNQNAQALNTLVKQLSSNFG



AISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRA



AEIRASANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAP



HGVVFLHVTYVPAQEKNFTTAPAICHDGKAHFPREGVFVSNG



THWFVTQRNFYEPQIITTDNTFVSGNCDVVIGIVNNTVYDPLQ



PELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRL



NEVAKNLNESLIDLQELGKYEQYIKWPWYIWLGFIAGLIAIVM



VTIMLCCMTSCCSCLKGCCSCGSCCKFDEDDSEPVLKGVKLH



YT





Optimized nucleotide sequence encoding the receptor-binding domain of a SARS-CoV-2 S protein
(SEQ ID NO: 34)



ATGCCTAATATCACTAACCTGTGTCCTTTCGGTGAAGTGTT



CAACGCCACCAGGTTTGCTAGCGTGTATGCCTGGAACAGG



AAGAGGATCTCTAACTGCGTCGCCGACTATTCCGTGCTGTA



TAACAGCGCCTCCTTCTCCACATTCAAATGCTATGGAGTGA



GCCCGACAAAACTGAACGATCTCTGCTTTACAAATGTCTAC



GCCGACTCTTTTGTGATCAGAGGGGACGAGGTCCGGCAGA



TCGCACCAGGACAGACAGGCAAGATTGCTGACTACAACTA



TAAGCTGCCTGACGACTTCACAGGATGTGTGATCGCATGGA



ACTCAAACAATCTGGACTCCAAAGTCGGGGGCAACTATAA



TTACCTGTATCGCCTGTTCCGGAAGTCCAACCTGAAGCCCT



TCGAGAGGGACATCAGTACAGAGATCTATCAGGCTGGCTC



CACCCCTTGCAATGGCGTCGAAGGCTTTAATTGTTATTTTCC



CCTGCAGTCTTACGGGTTTCAGCCTACTAATGGAGTTGGGT



ACCAGCCATACAGAGTGGTCGTGCTCAGCTTCGAGCTCCTG



CATGCTCCATAA





Receptor-binding domain of a SARS-CoV-2 S protein
(SEQ ID NO: 6)



MPNITNLCPFGEVFNATRFASVYAWNRKRISNCVADYSVLYN



SASFSTFKCYGVSPTKLNDLCFTNVYADSFVIRGDEVRQIAPG



QTGKIADYNYKLPDDFTGCVIAWNSNNLDSKVGGNYNYLYR



LFRKSNLKPFERDISTEIYQAGSTPCNGVEGFNCYFPLQSYGFQ



PTNGVGYQPYRVVVLSFELLHAP





Optimized nucleotide sequence encoding the receptor-binding domain of a SARS-CoV-2 S protein with a signal sequence
(SEQ ID NO: 35)



ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG



TGTCCTAATATCACTAACCTGTGTCCTTTCGGTGAAGTGTTC



AACGCCACCAGGTTTGCTAGCGTGTATGCCTGGAACAGGA



AGAGGATCTCTAACTGCGTCGCCGACTATTCCGTGCTGTAT



AACAGCGCCTCCTTCTCCACATTCAAATGCTATGGAGTGAG



CCCGACAAAACTGAACGATCTCTGCTTTACAAATGTCTACG



CCGACTCTTTTGTGATCAGAGGGGACGAGGTCCGGCAGATC



GCACCAGGACAGACAGGCAAGATTGCTGACTACAACTATA



AGCTGCCTGACGACTTCACAGGATGTGTGATCGCATGGAAC



TCAAACAATCTGGACTCCAAAGTCGGGGGCAACTATAATT



ACCTGTATCGCCTGTTCCGGAAGTCCAACCTGAAGCCCTTC



GAGAGGGACATCAGTACAGAGATCTATCAGGCTGGCTCCA



CCCCTTGCAATGGCGTCGAAGGCTTTAATTGTTATTTTCCCC



TGCAGTCTTACGGGTTTCAGCCTACTAATGGAGTTGGGTAC



CAGCCATACAGAGTGGTCGTGCTCAGCTTCGAGCTCCTGCA



TGCTCCATAA





Receptor-binding domain of a SARS-CoV-2 S protein with a signal sequence
(SEQ ID NO: 8)



MFVFLVLLPLVSSQCPNITNLCPFGEVFNATRFASVYAWNRKR



ISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSF



VIRGDEVRQIAPGQTGKIADYNYKLPDDFTGCVIAWNSNNLD



SKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPCNGVEG



FNCYFPLQSYGFQPTNGVGYQPYRVVVLSFELLHAP





Optimized nucleotide sequence encoding a SARS-CoV-2 S protein mutated to remove a furin cleavage site
(SEQ ID NO: 42)



ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG



TGTGTCAACCTGACAACTAGGACTCAGCTGCCACCAGCCTA



CACCAACTCCTTCACCAGAGGCGTGTATTACCCAGACAAGG



TGTTTAGAAGCAGCGTGCTGCACTCTACCCAGGACCTCTTT



CTGCCCTTTTTCAGCAACGTGACATGGTTTCACGCAATTCA



CGTGTCCGGCACTAATGGCACAAAGCGGTTCGACAATCCA



GTCCTGCCTTTCAACGATGGCGTCTACTTTGCATCTACTGA



GAAATCCAATATCATTAGGGGATGGATCTTCGGCACAACCC



TGGATTCTAAGACCCAGAGCCTGCTGATCGTCAACAACGCC



ACAAACGTGGTCATTAAGGTTTGCGAGTTTCAGTTCTGTAA



CGATCCTTTTCTGGGCGTGTATTATCATAAGAACAATAAGA



GCTGGATGGAGTCCGAGTTTAGAGTGTATAGCTCTGCAAAT



AATTGTACCTTTGAGTACGTGAGCCAGCCCTTTCTGATGGA



CCTGGAGGGAAAACAAGGAAACTTCAAAAACCTGCGGGAA



TTCGTTTTCAAAAACATCGACGGCTATTTCAAGATCTATAG



CAAGCATACCCCAATCAACCTCGTGAGGGACCTCCCCCAG



GGCTTTAGCGCACTGGAGCCACTGGTTGACCTGCCTATCGG



CATTAATATCACAAGATTTCAGACCCTGCTGGCACTGCATA



GAAGCTATCTGACCCCTGGAGACTCCTCTAGTGGGTGGACT



GCCGGCGCCGCTGCCTACTATGTGGGCTATCTGCAGCCACG



GACATTCCTGCTGAAATACAATGAGAACGGGACAATCACA



GATGCTGTTGATTGCGCACTCGACCCCCTGTCCGAGACAAA



GTGCACTCTCAAGAGCTTTACCGTCGAGAAGGGCATCTATC



AGACCTCAAACTTCAGGGTGCAGCCCACAGAATCTATCGTG



CGCTTCCCTAATATCACTAACCTGTGTCCTTTCGGTGAAGT



GTTCAACGCCACCAGGTTTGCTAGCGTGTATGCCTGGAACA



GGAAGAGGATCTCTAACTGCGTCGCCGACTATTCCGTGCTG



TATAACAGCGCCTCCTTCTCCACATTCAAATGCTATGGAGT



GAGCCCGACAAAACTGAACGATCTCTGCTTTACAAATGTCT



ACGCCGACTCTTTTGTGATCAGAGGGGACGAGGTCCGGCA



GATCGCACCAGGACAGACAGGCAAGATTGCTGACTACAAC



TATAAGCTGCCTGACGACTTCACAGGATGTGTGATCGCATG



GAACTCAAACAATCTGGACTCCAAAGTCGGGGGCAACTAT



AATTACCTGTATCGCCTGTTCCGGAAGTCCAACCTGAAGCC



CTTCGAGAGGGACATCAGTACAGAGATCTATCAGGCTGGC



TCCACCCCTTGCAATGGCGTCGAAGGCTTTAATTGTTATTTT



CCCCTGCAGTCTTACGGGTTTCAGCCTACTAATGGAGTTGG



GTACCAGCCATACAGAGTGGTCGTGCTCAGCTTCGAGCTCC



TGCATGCTCCAGCTACAGTTTGCGGGCCAAAGAAGTCCACT



AACCTGGTGAAGAATAAGTGCGTCAACTTCAACTTTAACGG



GCTCACCGGCACCGGCGTGCTGACTGAGAGCAACAAGAAG



TTTCTGCCATTTCAACAGTTTGGACGGGACATTGCCGACAC



CACCGATGCCGTTCGGGATCCACAGACCCTGGAAATTCTGG



ACATTACACCGTGCAGCTTCGGGGGCGTGAGCGTGATCAC



ACCCGGAACCAATACAAGCAACCAGGTTGCCGTCCTGTATC



AGGATGTCAATTGCACAGAAGTGCCAGTTGCTATCCACGCA



GACCAGCTGACTCCCACATGGCGGGTGTATAGCACCGGAT



CCAACGTGTTTCAGACCCGCGCCGGATGTCTCATTGGGGCC



GAGCACGTGAATAACAGCTACGAGTGCGACATCCCCATTG



GCGCCGGCATTTGTGCGTCTTACCAGACTCAGACCAACTCT



CCTGGCTCCGCCTCTTCCGTTGCTAGTCAGTCTATTATTGCC



TATACCATGAGCCTCGGAGCTGAGAATAGCGTGGCCTACTC



CAATAATTCCATCGCAATCCCTACTAACTTCACTATTTCTGT



GACCACCGAGATCCTGCCTGTGTCTATGACTAAGACTAGCG



TTGATTGTACCATGTATATTTGTGGCGACTCTACCGAATGTT



CTAACCTGCTGCTTCAGTACGGCTCATTTTGCACACAGCTG



AACAGAGCCCTGACTGGGATCGCTGTGGAGCAGGACAAGA



ACACACAGGAGGTGTTTGCACAGGTGAAGCAGATCTATAA



GACCCCTCCTATTAAGGATTTCGGCGGATTCAATTTCTCAC



AGATTCTGCCAGACCCCAGTAAGCCTTCCAAGAGGAGCTTC



ATCGAGGATCTCCTGTTTAACAAGGTGACCCTGGCAGACGC



CGGCTTTATTAAGCAATATGGGGATTGCCTGGGCGACATTG



CTGCCAGAGACCTGATTTGCGCCCAGAAATTCAATGGCCTC



ACAGTGCTGCCACCTCTGCTGACCGACGAGATGATCGCTCA



ATACACTAGCGCACTGCTGGCCGGAACCATCACATCAGGCT



GGACCTTCGGGGCCGGAGCAGCACTGCAGATTCCATTCGCC



ATGCAGATGGCCTATAGATTCAACGGCATTGGCGTCACACA



GAACGTGCTGTACGAAAACCAGAAGCTCATCGCTAACCAG



TTTAATTCCGCAATTGGAAAGATCCAAGATTCACTCAGCTC



AACCGCCTCTGCACTCGGAAAGCTGCAGGACGTGGTCAAC



CAGAATGCTCAGGCCCTGAACACACTCGTCAAGCAGCTGTC



CTCTAACTTTGGCGCTATCAGCTCCGTTCTGAACGACATTCT



GAGCCGCCTGGATAAGGTGGAGGCTGAAGTCCAGATTGAC



CGCCTGATTACCGGCCGGCTGCAGTCTCTGCAAACATACGT



GACCCAGCAGCTGATCAGAGCAGCCGAGATCCGGGCATCC



GCAAATCTGGCAGCAACTAAGATGAGCGAATGCGTGCTGG



GCCAGTCCAAGCGGGTGGACTTTTGTGGCAAGGGCTACCA



CCTGATGAGCTTCCCCCAGAGCGCCCCACATGGCGTTGTTT



TTCTGCACGTGACCTATGTCCCTGCTCAGGAAAAGAACTTT



ACAACTGCTCCTGCTATCTGCCATGACGGCAAGGCCCACTT



CCCACGGGAGGGAGTGTTTGTGTCCAATGGCACACACTGGT



TCGTGACCCAGAGGAACTTCTATGAACCCCAGATCATCACC



ACTGACAATACCTTCGTGTCTGGAAATTGCGACGTCGTGAT



CGGCATCGTTAACAACACCGTGTACGACCCTCTCCAGCCAG



AGCTGGACTCCTTTAAGGAGGAACTGGATAAGTATTTTAAG



AACCACACAAGCCCAGATGTGGATCTCGGGGACATCTCCG



GAATTAACGCCTCCGTGGTGAATATCCAGAAGGAGATTGA



CCGCCTAAATGAAGTTGCCAAGAACCTCAATGAGTCTCTGA



TTGATCTGCAGGAACTGGGCAAGTATGAGCAGTATATCAA



ATGGCCCTGGTACATTTGGCTGGGGTTTATCGCCGGACTGA



TTGCCATCGTCATGGTGACCATCATGCTGTGTTGCATGACC



TCCTGTTGTTCCTGTCTGAAGGGCTGCTGTAGTTGCGGCTCT



TGCTGTAAATTCGACGAAGATGATAGCGAGCCCGTGCTGA



AGGGCGTGAAGCTGCATTATACCTGA





SARS-CoV-2 S protein mutated to remove a furin cleavage site
(SEQ ID NO: 9)



MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVF



RSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFDNPVLPF



NDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKV



CEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVS



QPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRD



LPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWT



AGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKC



TLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATR



FASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLN



DLCFTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTG



CVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQ



AGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFE



LLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKK



FLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTN



TSNQVAVLYQDVNCTEVPVAIHADQLTPTWRVYSTGSNVFQT



RAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPGSASSVAS



QSIIAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILPVSMTKT



SVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDK



NTQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSFIEDL



LFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPL



LTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQMAYRF



NGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQ



DVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDKVEAEV



QIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVL



GQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNF



TTAPAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTD



NTFVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTS



PDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGK



YEQYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCC



SCGSCCKFDEDDSEPVLKGVKLHYT





Optimized nucleotide sequence encoding a SARS-CoV-2 S protein with residues 986 and 987 mutated to proline
(SEQ ID NO: 43)



ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG



TGTGTCAACCTGACAACTAGGACTCAGCTGCCACCAGCCTA



CACCAACTCCTTCACCAGAGGCGTGTATTACCCAGACAAGG



TGTTTAGAAGCAGCGTGCTGCACTCTACCCAGGACCTCTTT



CTGCCCTTTTTCAGCAACGTGACATGGTTTCACGCAATTCA



CGTGTCCGGCACTAATGGCACAAAGCGGTTCGACAATCCA



GTCCTGCCTTTCAACGATGGCGTCTACTTTGCATCTACTGA



GAAATCCAATATCATTAGGGGATGGATCTTCGGCACAACCC



TGGATTCTAAGACCCAGAGCCTGCTGATCGTCAACAACGCC



ACAAACGTGGTCATTAAGGTTTGCGAGTTTCAGTTCTGTAA



CGATCCTTTTCTGGGCGTGTATTATCATAAGAACAATAAGA



GCTGGATGGAGTCCGAGTTTAGAGTGTATAGCTCTGCAAAT



AATTGTACCTTTGAGTACGTGAGCCAGCCCTTTCTGATGGA



CCTGGAGGGAAAACAAGGAAACTTCAAAAACCTGCGGGAA



TTCGTTTTCAAAAACATCGACGGCTATTTCAAGATCTATAG



CAAGCATACCCCAATCAACCTCGTGAGGGACCTCCCCCAG



GGCTTTAGCGCACTGGAGCCACTGGTTGACCTGCCTATCGG



CATTAATATCACAAGATTTCAGACCCTGCTGGCACTGCATA



GAAGCTATCTGACCCCTGGAGACTCCTCTAGTGGGTGGACT



GCCGGCGCCGCTGCCTACTATGTGGGCTATCTGCAGCCACG



GACATTCCTGCTGAAATACAATGAGAACGGGACAATCACA



GATGCTGTTGATTGCGCACTCGACCCCCTGTCCGAGACAAA



GTGCACTCTCAAGAGCTTTACCGTCGAGAAGGGCATCTATC



AGACCTCAAACTTCAGGGTGCAGCCCACAGAATCTATCGTG



CGCTTCCCTAATATCACTAACCTGTGTCCTTTCGGTGAAGT



GTTCAACGCCACCAGGTTTGCTAGCGTGTATGCCTGGAACA



GGAAGAGGATCTCTAACTGCGTCGCCGACTATTCCGTGCTG



TATAACAGCGCCTCCTTCTCCACATTCAAATGCTATGGAGT



GAGCCCGACAAAACTGAACGATCTCTGCTTTACAAATGTCT



ACGCCGACTCTTTTGTGATCAGAGGGGACGAGGTCCGGCA



GATCGCACCAGGACAGACAGGCAAGATTGCTGACTACAAC



TATAAGCTGCCTGACGACTTCACAGGATGTGTGATCGCATG



GAACTCAAACAATCTGGACTCCAAAGTCGGGGGCAACTAT



AATTACCTGTATCGCCTGTTCCGGAAGTCCAACCTGAAGCC



CTTCGAGAGGGACATCAGTACAGAGATCTATCAGGCTGGC



TCCACCCCTTGCAATGGCGTCGAAGGCTTTAATTGTTATTTT



CCCCTGCAGTCTTACGGGTTTCAGCCTACTAATGGAGTTGG



GTACCAGCCATACAGAGTGGTCGTGCTCAGCTTCGAGCTCC



TGCATGCTCCAGCTACAGTTTGCGGGCCAAAGAAGTCCACT



AACCTGGTGAAGAATAAGTGCGTCAACTTCAACTTTAACGG



GCTCACCGGCACCGGCGTGCTGACTGAGAGCAACAAGAAG



TTTCTGCCATTTCAACAGTTTGGACGGGACATTGCCGACAC



CACCGATGCCGTTCGGGATCCACAGACCCTGGAAATTCTGG



ACATTACACCGTGCAGCTTCGGGGGCGTGAGCGTGATCAC



ACCCGGAACCAATACAAGCAACCAGGTTGCCGTCCTGTATC



AGGATGTCAATTGCACAGAAGTGCCAGTTGCTATCCACGCA



GACCAGCTGACTCCCACATGGCGGGTGTATAGCACCGGAT



CCAACGTGTTTCAGACCCGCGCCGGATGTCTCATTGGGGCC



GAGCACGTGAATAACAGCTACGAGTGCGACATCCCCATTG



GCGCCGGCATTTGTGCGTCTTACCAGACTCAGACCAACTCT



CCTAGAAGGGCCAGGTCCGTTGCTAGTCAGTCTATTATTGC



CTATACCATGAGCCTCGGAGCTGAGAATAGCGTGGCCTACT



CCAATAATTCCATCGCAATCCCTACTAACTTCACTATTTCTG



TGACCACCGAGATCCTGCCTGTGTCTATGACTAAGACTAGC



GTTGATTGTACCATGTATATTTGTGGCGACTCTACCGAATG



TTCTAACCTGCTGCTTCAGTACGGCTCATTTTGCACACAGCT



GAACAGAGCCCTGACTGGGATCGCTGTGGAGCAGGACAAG



AACACACAGGAGGTGTTTGCACAGGTGAAGCAGATCTATA



AGACCCCTCCTATTAAGGATTTCGGCGGATTCAATTTCTCA



CAGATTCTGCCAGACCCCAGTAAGCCTTCCAAGAGGAGCTT



CATCGAGGATCTCCTGTTTAACAAGGTGACCCTGGCAGACG



CCGGCTTTATTAAGCAATATGGGGATTGCCTGGGCGACATT



GCTGCCAGAGACCTGATTTGCGCCCAGAAATTCAATGGCCT



CACAGTGCTGCCACCTCTGCTGACCGACGAGATGATCGCTC



AATACACTAGCGCACTGCTGGCCGGAACCATCACATCAGG



CTGGACCTTCGGGGCCGGAGCAGCACTGCAGATTCCATTCG



CCATGCAGATGGCCTATAGATTCAACGGCATTGGCGTCACA



CAGAACGTGCTGTACGAAAACCAGAAGCTCATCGCTAACC



AGTTTAATTCCGCAATTGGAAAGATCCAAGATTCACTCAGC



TCAACCGCCTCTGCACTCGGAAAGCTGCAGGACGTGGTCA



ACCAGAATGCTCAGGCCCTGAACACACTCGTCAAGCAGCT



GTCCTCTAACTTTGGCGCTATCAGCTCCGTTCTGAACGACA



TTCTGAGCCGCCTGGATCCCCCAGAGGCTGAAGTCCAGATT



GACCGCCTGATTACCGGCCGGCTGCAGTCTCTGCAAACATA



CGTGACCCAGCAGCTGATCAGAGCAGCCGAGATCCGGGCA



TCCGCAAATCTGGCAGCAACTAAGATGAGCGAATGCGTGC



TGGGCCAGTCCAAGCGGGTGGACTTTTGTGGCAAGGGCTA



CCACCTGATGAGCTTCCCCCAGAGCGCCCCACATGGCGTTG



TTTTTCTGCACGTGACCTATGTCCCTGCTCAGGAAAAGAAC



TTTACAACTGCTCCTGCTATCTGCCATGACGGCAAGGCCCA



CTTCCCACGGGAGGGAGTGTTTGTGTCCAATGGCACACACT



GGTTCGTGACCCAGAGGAACTTCTATGAACCCCAGATCATC



ACCACTGACAATACCTTCGTGTCTGGAAATTGCGACGTCGT



GATCGGCATCGTTAACAACACCGTGTACGACCCTCTCCAGC



CAGAGCTGGACTCCTTTAAGGAGGAACTGGATAAGTATTTT



AAGAACCACACAAGCCCAGATGTGGATCTCGGGGACATCT



CCGGAATTAACGCCTCCGTGGTGAATATCCAGAAGGAGAT



TGACCGCCTAAATGAAGTTGCCAAGAACCTCAATGAGTCTC



TGATTGATCTGCAGGAACTGGGCAAGTATGAGCAGTATATC



AAATGGCCCTGGTACATTTGGCTGGGGTTTATCGCCGGACT



GATTGCCATCGTCATGGTGACCATCATGCTGTGTTGCATGA



CCTCCTGTTGTTCCTGTCTGAAGGGCTGCTGTAGTTGCGGCT



CTTGCTGTAAATTCGACGAAGATGATAGCGAGCCCGTGCTG



AAGGGCGTGAAGCTGCATTATACCTGA





SARS-CoV-2 S protein with residues 986 and 987 mutated to proline
(SEQ ID NO: 10)



MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVF



RSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFDNPVLPF



NDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKV



CEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVS



QPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRD



LPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWT



AGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKC



TLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATR



FASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLN



DLCFTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTG



CVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQ



AGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFE



LLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKK



FLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTN



TSNQVAVLYQDVNCTEVPVAIHADQLTPTWRVYSTGSNVFQT



RAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPRRARSVA



SQSIIAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILPVSMTK



TSVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDK



NTQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSFIEDL



LFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPL



LTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQMAYRF



NGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQ



DVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDPPEAEVQ



IDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLG



QSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTT



APAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTDNT



FVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTSPD



VDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYE



QYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCCSC



GSCCKFDEDDSEPVLKGVKLHYT





Optimized nucleotide sequence encoding a SARS-CoV-2 S protein mutated to remove a furin cleavage site and to replace residues 986 and 987 with proline
(SEQ ID NO: 44)



ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG



TGTGTCAACCTGACAACTAGGACTCAGCTGCCACCAGCCTA



CACCAACTCCTTCACCAGAGGCGTGTATTACCCAGACAAGG



TGTTTAGAAGCAGCGTGCTGCACTCTACCCAGGACCTCTTT



CTGCCCTTTTTCAGCAACGTGACATGGTTTCACGCAATTCA



CGTGTCCGGCACTAATGGCACAAAGCGGTTCGACAATCCA



GTCCTGCCTTTCAACGATGGCGTCTACTTTGCATCTACTGA



GAAATCCAATATCATTAGGGGATGGATCTTCGGCACAACCC



TGGATTCTAAGACCCAGAGCCTGCTGATCGTCAACAACGCC



ACAAACGTGGTCATTAAGGTTTGCGAGTTTCAGTTCTGTAA



CGATCCTTTTCTGGGCGTGTATTATCATAAGAACAATAAGA



GCTGGATGGAGTCCGAGTTTAGAGTGTATAGCTCTGCAAAT



AATTGTACCTTTGAGTACGTGAGCCAGCCCTTTCTGATGGA



CCTGGAGGGAAAACAAGGAAACTTCAAAAACCTGCGGGAA



TTCGTTTTCAAAAACATCGACGGCTATTTCAAGATCTATAG



CAAGCATACCCCAATCAACCTCGTGAGGGACCTCCCCCAG



GGCTTTAGCGCACTGGAGCCACTGGTTGACCTGCCTATCGG



CATTAATATCACAAGATTTCAGACCCTGCTGGCACTGCATA



GAAGCTATCTGACCCCTGGAGACTCCTCTAGTGGGTGGACT



GCCGGCGCCGCTGCCTACTATGTGGGCTATCTGCAGCCACG



GACATTCCTGCTGAAATACAATGAGAACGGGACAATCACA



GATGCTGTTGATTGCGCACTCGACCCCCTGTCCGAGACAAA



GTGCACTCTCAAGAGCTTTACCGTCGAGAAGGGCATCTATC



AGACCTCAAACTTCAGGGTGCAGCCCACAGAATCTATCGTG



CGCTTCCCTAATATCACTAACCTGTGTCCTTTCGGTGAAGT



GTTCAACGCCACCAGGTTTGCTAGCGTGTATGCCTGGAACA



GGAAGAGGATCTCTAACTGCGTCGCCGACTATTCCGTGCTG



TATAACAGCGCCTCCTTCTCCACATTCAAATGCTATGGAGT



GAGCCCGACAAAACTGAACGATCTCTGCTTTACAAATGTCT



ACGCCGACTCTTTTGTGATCAGAGGGGACGAGGTCCGGCA



GATCGCACCAGGACAGACAGGCAAGATTGCTGACTACAAC



TATAAGCTGCCTGACGACTTCACAGGATGTGTGATCGCATG



GAACTCAAACAATCTGGACTCCAAAGTCGGGGGCAACTAT



AATTACCTGTATCGCCTGTTCCGGAAGTCCAACCTGAAGCC



CTTCGAGAGGGACATCAGTACAGAGATCTATCAGGCTGGC



TCCACCCCTTGCAATGGCGTCGAAGGCTTTAATTGTTATTTT



CCCCTGCAGTCTTACGGGTTTCAGCCTACTAATGGAGTTGG



GTACCAGCCATACAGAGTGGTCGTGCTCAGCTTCGAGCTCC



TGCATGCTCCAGCTACAGTTTGCGGGCCAAAGAAGTCCACT



AACCTGGTGAAGAATAAGTGCGTCAACTTCAACTTTAACGG



GCTCACCGGCACCGGCGTGCTGACTGAGAGCAACAAGAAG



TTTCTGCCATTTCAACAGTTTGGACGGGACATTGCCGACAC



CACCGATGCCGTTCGGGATCCACAGACCCTGGAAATTCTGG



ACATTACACCGTGCAGCTTCGGGGGCGTGAGCGTGATCAC



ACCCGGAACCAATACAAGCAACCAGGTTGCCGTCCTGTATC



AGGATGTCAATTGCACAGAAGTGCCAGTTGCTATCCACGCA



GACCAGCTGACTCCCACATGGCGGGTGTATAGCACCGGAT



CCAACGTGTTTCAGACCCGCGCCGGATGTCTCATTGGGGCC



GAGCACGTGAATAACAGCTACGAGTGCGACATCCCCATTG



GCGCCGGCATTTGTGCGTCTTACCAGACTCAGACCAACTCT



CCTGGCTCCGCCTCTTCCGTTGCTAGTCAGTCTATTATTGCC



TATACCATGAGCCTCGGAGCTGAGAATAGCGTGGCCTACTC



CAATAATTCCATCGCAATCCCTACTAACTTCACTATTTCTGT



GACCACCGAGATCCTGCCTGTGTCTATGACTAAGACTAGCG



TTGATTGTACCATGTATATTTGTGGCGACTCTACCGAATGTT



CTAACCTGCTGCTTCAGTACGGCTCATTTTGCACACAGCTG



AACAGAGCCCTGACTGGGATCGCTGTGGAGCAGGACAAGA



ACACACAGGAGGTGTTTGCACAGGTGAAGCAGATCTATAA



GACCCCTCCTATTAAGGATTTCGGCGGATTCAATTTCTCAC



AGATTCTGCCAGACCCCAGTAAGCCTTCCAAGAGGAGCTTC



ATCGAGGATCTCCTGTTTAACAAGGTGACCCTGGCAGACGC



CGGCTTTATTAAGCAATATGGGGATTGCCTGGGCGACATTG



CTGCCAGAGACCTGATTTGCGCCCAGAAATTCAATGGCCTC



ACAGTGCTGCCACCTCTGCTGACCGACGAGATGATCGCTCA



ATACACTAGCGCACTGCTGGCCGGAACCATCACATCAGGCT



GGACCTTCGGGGCCGGAGCAGCACTGCAGATTCCATTCGCC



ATGCAGATGGCCTATAGATTCAACGGCATTGGCGTCACACA



GAACGTGCTGTACGAAAACCAGAAGCTCATCGCTAACCAG



TTTAATTCCGCAATTGGAAAGATCCAAGATTCACTCAGCTC



AACCGCCTCTGCACTCGGAAAGCTGCAGGACGTGGTCAAC



CAGAATGCTCAGGCCCTGAACACACTCGTCAAGCAGCTGTC



CTCTAACTTTGGCGCTATCAGCTCCGTTCTGAACGACATTCT



GAGCCGCCTGGATCCCCCAGAGGCTGAAGTCCAGATTGAC



CGCCTGATTACCGGCCGGCTGCAGTCTCTGCAAACATACGT



GACCCAGCAGCTGATCAGAGCAGCCGAGATCCGGGCATCC



GCAAATCTGGCAGCAACTAAGATGAGCGAATGCGTGCTGG



GCCAGTCCAAGCGGGTGGACTTTTGTGGCAAGGGCTACCA



CCTGATGAGCTTCCCCCAGAGCGCCCCACATGGCGTTGTTT



TTCTGCACGTGACCTATGTCCCTGCTCAGGAAAAGAACTTT



ACAACTGCTCCTGCTATCTGCCATGACGGCAAGGCCCACTT



CCCACGGGAGGGAGTGTTTGTGTCCAATGGCACACACTGGT



TCGTGACCCAGAGGAACTTCTATGAACCCCAGATCATCACC



ACTGACAATACCTTCGTGTCTGGAAATTGCGACGTCGTGAT



CGGCATCGTTAACAACACCGTGTACGACCCTCTCCAGCCAG



AGCTGGACTCCTTTAAGGAGGAACTGGATAAGTATTTTAAG



AACCACACAAGCCCAGATGTGGATCTCGGGGACATCTCCG



GAATTAACGCCTCCGTGGTGAATATCCAGAAGGAGATTGA



CCGCCTAAATGAAGTTGCCAAGAACCTCAATGAGTCTCTGA



TTGATCTGCAGGAACTGGGCAAGTATGAGCAGTATATCAA



ATGGCCCTGGTACATTTGGCTGGGGTTTATCGCCGGACTGA



TTGCCATCGTCATGGTGACCATCATGCTGTGTTGCATGACC



TCCTGTTGTTCCTGTCTGAAGGGCTGCTGTAGTTGCGGCTCT



TGCTGTAAATTCGACGAAGATGATAGCGAGCCCGTGCTGA



AGGGCGTGAAGCTGCATTATACCTGA





SARS-CoV-2 S protein mutated to remove a furin cleavage site and to replace residues 986 and 987 with proline
(SEQ ID NO: 11)



MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVF



RSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFDNPVLPF



NDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKV



CEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVS



QPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRD



LPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWT



AGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKC



TLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATR



FASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLN



DLCFTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTG



CVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQ



AGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFE



LLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKK



FLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTN



TSNQVAVLYQDVNCTEVPVAIHADQLTPTWRVYSTGSNVFQT



RAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPGSASSVAS



QSIIAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILPVSMTKT



SVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDK



NTQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSFIEDL



LFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPL



LTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQMAYRF



NGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQ



DVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDPPEAEVQ



IDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLG



QSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTT



APAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTDNT



FVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTSPD



VDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYE



QYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCCSC



GSCCKFDEDDSEPVLKGVKLHYT
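The furin cleavage site change that distinguishes SEQ ID NO: 11 from SEQ ID NO: 10 can be located by comparing the two listed sequences position by position. The following Python sketch is illustrative only and forms no part of the disclosed sequences. The function name `substitutions` is hypothetical, the two short windows are copied from the listed sequences, and the starting residue number 680 is an assumption based on conventional SARS-CoV-2 S protein numbering.

```python
def substitutions(ref: str, alt: str, first_residue: int = 1):
    """List (position, ref_residue, alt_residue) for every position at which
    two equal-length, already aligned protein sequences differ."""
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (first_residue + i, r, a)
        for i, (r, a) in enumerate(zip(ref, alt))
        if r != a
    ]


# Windows around the furin cleavage site; the first residue of each window
# is assumed to be residue 680 of the S protein.
ref_window = "SPRRARSVA"  # taken from SEQ ID NO: 10
alt_window = "SPGSASSVA"  # taken from SEQ ID NO: 11

for pos, r, a in substitutions(ref_window, alt_window, first_residue=680):
    print(f"{r}{pos}{a}")  # prints R682G, R683S, R685S (one per line)
```

Under that numbering assumption, the comparison reports three substitutions within the RRAR motif, with the alanine at position 684 unchanged.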





Optimized nucleotide sequence encoding the ectodomain of a SARS-CoV-2 S protein mutated to remove a furin cleavage site and to replace residues 986 and 987 with proline
(SEQ ID NO: 45)



ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG



TGTGTCAACCTGACAACTAGGACTCAGCTGCCACCAGCCTA



CACCAACTCCTTCACCAGAGGCGTGTATTACCCAGACAAGG



TGTTTAGAAGCAGCGTGCTGCACTCTACCCAGGACCTCTTT



CTGCCCTTTTTCAGCAACGTGACATGGTTTCACGCAATTCA



CGTGTCCGGCACTAATGGCACAAAGCGGTTCGACAATCCA



GTCCTGCCTTTCAACGATGGCGTCTACTTTGCATCTACTGA



GAAATCCAATATCATTAGGGGATGGATCTTCGGCACAACCC



TGGATTCTAAGACCCAGAGCCTGCTGATCGTCAACAACGCC



ACAAACGTGGTCATTAAGGTTTGCGAGTTTCAGTTCTGTAA



CGATCCTTTTCTGGGCGTGTATTATCATAAGAACAATAAGA



GCTGGATGGAGTCCGAGTTTAGAGTGTATAGCTCTGCAAAT



AATTGTACCTTTGAGTACGTGAGCCAGCCCTTTCTGATGGA



CCTGGAGGGAAAACAAGGAAACTTCAAAAACCTGCGGGAA



TTCGTTTTCAAAAACATCGACGGCTATTTCAAGATCTATAG



CAAGCATACCCCAATCAACCTCGTGAGGGACCTCCCCCAG



GGCTTTAGCGCACTGGAGCCACTGGTTGACCTGCCTATCGG



CATTAATATCACAAGATTTCAGACCCTGCTGGCACTGCATA



GAAGCTATCTGACCCCTGGAGACTCCTCTAGTGGGTGGACT



GCCGGCGCCGCTGCCTACTATGTGGGCTATCTGCAGCCACG



GACATTCCTGCTGAAATACAATGAGAACGGGACAATCACA



GATGCTGTTGATTGCGCACTCGACCCCCTGTCCGAGACAAA



GTGCACTCTCAAGAGCTTTACCGTCGAGAAGGGCATCTATC



AGACCTCAAACTTCAGGGTGCAGCCCACAGAATCTATCGTG



CGCTTCCCTAATATCACTAACCTGTGTCCTTTCGGTGAAGT



GTTCAACGCCACCAGGTTTGCTAGCGTGTATGCCTGGAACA



GGAAGAGGATCTCTAACTGCGTCGCCGACTATTCCGTGCTG



TATAACAGCGCCTCCTTCTCCACATTCAAATGCTATGGAGT



GAGCCCGACAAAACTGAACGATCTCTGCTTTACAAATGTCT



ACGCCGACTCTTTTGTGATCAGAGGGGACGAGGTCCGGCA



GATCGCACCAGGACAGACAGGCAAGATTGCTGACTACAAC



TATAAGCTGCCTGACGACTTCACAGGATGTGTGATCGCATG



GAACTCAAACAATCTGGACTCCAAAGTCGGGGGCAACTAT



AATTACCTGTATCGCCTGTTCCGGAAGTCCAACCTGAAGCC



CTTCGAGAGGGACATCAGTACAGAGATCTATCAGGCTGGC



TCCACCCCTTGCAATGGCGTCGAAGGCTTTAATTGTTATTTT



CCCCTGCAGTCTTACGGGTTTCAGCCTACTAATGGAGTTGG



GTACCAGCCATACAGAGTGGTCGTGCTCAGCTTCGAGCTCC



TGCATGCTCCAGCTACAGTTTGCGGGCCAAAGAAGTCCACT



AACCTGGTGAAGAATAAGTGCGTCAACTTCAACTTTAACGG



GCTCACCGGCACCGGCGTGCTGACTGAGAGCAACAAGAAG



TTTCTGCCATTTCAACAGTTTGGACGGGACATTGCCGACAC



CACCGATGCCGTTCGGGATCCACAGACCCTGGAAATTCTGG



ACATTACACCGTGCAGCTTCGGGGGCGTGAGCGTGATCAC



ACCCGGAACCAATACAAGCAACCAGGTTGCCGTCCTGTATC



AGGATGTCAATTGCACAGAAGTGCCAGTTGCTATCCACGCA



GACCAGCTGACTCCCACATGGCGGGTGTATAGCACCGGAT



CCAACGTGTTTCAGACCCGCGCCGGATGTCTCATTGGGGCC



GAGCACGTGAATAACAGCTACGAGTGCGACATCCCCATTG



GCGCCGGCATTTGTGCGTCTTACCAGACTCAGACCAACTCT



CCTGGCTCCGCCTCTTCCGTTGCTAGTCAGTCTATTATTGCC



TATACCATGAGCCTCGGAGCTGAGAATAGCGTGGCCTACTC



CAATAATTCCATCGCAATCCCTACTAACTTCACTATTTCTGT



GACCACCGAGATCCTGCCTGTGTCTATGACTAAGACTAGCG



TTGATTGTACCATGTATATTTGTGGCGACTCTACCGAATGTT



CTAACCTGCTGCTTCAGTACGGCTCATTTTGCACACAGCTG



AACAGAGCCCTGACTGGGATCGCTGTGGAGCAGGACAAGA



ACACACAGGAGGTGTTTGCACAGGTGAAGCAGATCTATAA



GACCCCTCCTATTAAGGATTTCGGCGGATTCAATTTCTCAC



AGATTCTGCCAGACCCCAGTAAGCCTTCCAAGAGGAGCTTC



ATCGAGGATCTCCTGTTTAACAAGGTGACCCTGGCAGACGC



CGGCTTTATTAAGCAATATGGGGATTGCCTGGGCGACATTG



CTGCCAGAGACCTGATTTGCGCCCAGAAATTCAATGGCCTC



ACAGTGCTGCCACCTCTGCTGACCGACGAGATGATCGCTCA



ATACACTAGCGCACTGCTGGCCGGAACCATCACATCAGGCT



GGACCTTCGGGGCCGGAGCAGCACTGCAGATTCCATTCGCC



ATGCAGATGGCCTATAGATTCAACGGCATTGGCGTCACACA



GAACGTGCTGTACGAAAACCAGAAGCTCATCGCTAACCAG



TTTAATTCCGCAATTGGAAAGATCCAAGATTCACTCAGCTC



AACCGCCTCTGCACTCGGAAAGCTGCAGGACGTGGTCAAC



CAGAATGCTCAGGCCCTGAACACACTCGTCAAGCAGCTGTC



CTCTAACTTTGGCGCTATCAGCTCCGTTCTGAACGACATTCT



GAGCCGCCTGGATCCCCCAGAGGCTGAAGTCCAGATTGAC



CGCCTGATTACCGGCCGGCTGCAGTCTCTGCAAACATACGT



GACCCAGCAGCTGATCAGAGCAGCCGAGATCCGGGCATCC



GCAAATCTGGCAGCAACTAAGATGAGCGAATGCGTGCTGG



GCCAGTCCAAGCGGGTGGACTTTTGTGGCAAGGGCTACCA



CCTGATGAGCTTCCCCCAGAGCGCCCCACATGGCGTTGTTT



TTCTGCACGTGACCTATGTCCCTGCTCAGGAAAAGAACTTT



ACAACTGCTCCTGCTATCTGCCATGACGGCAAGGCCCACTT



CCCACGGGAGGGAGTGTTTGTGTCCAATGGCACACACTGGT



TCGTGACCCAGAGGAACTTCTATGAACCCCAGATCATCACC



ACTGACAATACCTTCGTGTCTGGAAATTGCGACGTCGTGAT



CGGCATCGTTAACAACACCGTGTACGACCCTCTCCAGCCAG



AGCTGGACTCCTTTAAGGAGGAACTGGATAAGTATTTTAAG



AACCACACAAGCCCAGATGTGGATCTCGGGGACATCTCCG



GAATTAACGCCTCCGTGGTGAATATCCAGAAGGAGATTGA



CCGCCTAAATGAAGTTGCCAAGAACCTCAATGAGTCTCTGA



TTGATCTGCAGGAACTGGGCAAGTATGAGCAGTGA





SARS-CoV-2 S protein with residues 986 and 987 mutated to proline, and to contain the D614G mutation
(SEQ ID NO: 118)



MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVF



RSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFDNPVLPF



NDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKV



CEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVS



QPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRD



LPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWT



AGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKC



TLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATR



FASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLN



DLCFTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTG



CVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQ



AGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFE



LLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKK



FLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTN



TSNQVAVLYQGVNCTEVPVAIHADQLTPTWRVYSTGSNVFQT



RAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPRRARSVA



SQSIIAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILPVSMTK



TSVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDK



NTQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSFIEDL



LFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPL



LTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQMAYRF



NGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQ



DVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDPPEAEVQ



IDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLG



QSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTT



APAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTDNT



FVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTSPD



VDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYE



QYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCCSC



GSCCKFDEDDSEPVLKGVKLHYT





Optimized nucleotide sequence encoding SARS-CoV-2 S protein with residues 986 and 987 mutated to proline, and to contain the D614G mutation*
(SEQ ID NO: 119)
*underlined residues correspond to D614G mutation location



ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG



TGTGTCAACCTGACAACTAGGACTCAGCTGCCACCAGCCTA



CACCAACTCCTTCACCAGAGGCGTGTATTACCCAGACAAGG



TGTTTAGAAGCAGCGTGCTGCACTCTACCCAGGACCTCTTT



CTGCCCTTTTTCAGCAACGTGACATGGTTTCACGCAATTCA



CGTGTCCGGCACTAATGGCACAAAGCGGTTCGACAATCCA



GTCCTGCCTTTCAACGATGGCGTCTACTTTGCATCTACTGA



GAAATCCAATATCATTAGGGGATGGATCTTCGGCACAACCC



TGGATTCTAAGACCCAGAGCCTGCTGATCGTCAACAACGCC



ACAAACGTGGTCATTAAGGTTTGCGAGTTTCAGTTCTGTAA



CGATCCTTTTCTGGGCGTGTATTATCATAAGAACAATAAGA



GCTGGATGGAGTCCGAGTTTAGAGTGTATAGCTCTGCAAAT



AATTGTACCTTTGAGTACGTGAGCCAGCCCTTTCTGATGGA



CCTGGAGGGAAAACAAGGAAACTTCAAAAACCTGCGGGAA



TTCGTTTTCAAAAACATCGACGGCTATTTCAAGATCTATAG



CAAGCATACCCCAATCAACCTCGTGAGGGACCTCCCCCAG



GGCTTTAGCGCACTGGAGCCACTGGTTGACCTGCCTATCGG



CATTAATATCACAAGATTTCAGACCCTGCTGGCACTGCATA



GAAGCTATCTGACCCCTGGAGACTCCTCTAGTGGGTGGACT



GCCGGCGCCGCTGCCTACTATGTGGGCTATCTGCAGCCACG



GACATTCCTGCTGAAATACAATGAGAACGGGACAATCACA



GATGCTGTTGATTGCGCACTCGACCCCCTGTCCGAGACAAA



GTGCACTCTCAAGAGCTTTACCGTCGAGAAGGGCATCTATC



AGACCTCAAACTTCAGGGTGCAGCCCACAGAATCTATCGTG



CGCTTCCCTAATATCACTAACCTGTGTCCTTTCGGTGAAGT



GTTCAACGCCACCAGGTTTGCTAGCGTGTATGCCTGGAACA



GGAAGAGGATCTCTAACTGCGTCGCCGACTATTCCGTGCTG



TATAACAGCGCCTCCTTCTCCACATTCAAATGCTATGGAGT



GAGCCCGACAAAACTGAACGATCTCTGCTTTACAAATGTCT



ACGCCGACTCTTTTGTGATCAGAGGGGACGAGGTCCGGCA



GATCGCACCAGGACAGACAGGCAAGATTGCTGACTACAAC



TATAAGCTGCCTGACGACTTCACAGGATGTGTGATCGCATG



GAACTCAAACAATCTGGACTCCAAAGTCGGGGGCAACTAT



AATTACCTGTATCGCCTGTTCCGGAAGTCCAACCTGAAGCC



CTTCGAGAGGGACATCAGTACAGAGATCTATCAGGCTGGC



TCCACCCCTTGCAATGGCGTCGAAGGCTTTAATTGTTATTTT



CCCCTGCAGTCTTACGGGTTTCAGCCTACTAATGGAGTTGG



GTACCAGCCATACAGAGTGGTCGTGCTCAGCTTCGAGCTCC



TGCATGCTCCAGCTACAGTTTGCGGGCCAAAGAAGTCCACT



AACCTGGTGAAGAATAAGTGCGTCAACTTCAACTTTAACGG



GCTCACCGGCACCGGCGTGCTGACTGAGAGCAACAAGAAG



TTTCTGCCATTTCAACAGTTTGGACGGGACATTGCCGACAC



CACCGATGCCGTTCGGGATCCACAGACCCTGGAAATTCTGG



ACATTACACCGTGCAGCTTCGGGGGCGTGAGCGTGATCAC



ACCCGGAACCAATACAAGCAACCAGGTTGCCGTCCTGTATC



AGGGCGTCAATTGCACAGAAGTGCCAGTTGCTATCCACGC



AGACCAGCTGACTCCCACATGGCGGGTGTATAGCACCGGA



TCCAACGTGTTTCAGACCCGCGCCGGATGTCTCATTGGGGC



CGAGCACGTGAATAACAGCTACGAGTGCGACATCCCCATT



GGCGCCGGCATTTGTGCGTCTTACCAGACTCAGACCAACTC



TCCTAGAAGGGCCAGGTCCGTTGCTAGTCAGTCTATTATTG



CCTATACCATGAGCCTCGGAGCTGAGAATAGCGTGGCCTAC



TCCAATAATTCCATCGCAATCCCTACTAACTTCACTATTTCT



GTGACCACCGAGATCCTGCCTGTGTCTATGACTAAGACTAG



CGTTGATTGTACCATGTATATTTGTGGCGACTCTACCGAAT



GTTCTAACCTGCTGCTTCAGTACGGCTCATTTTGCACACAG



CTGAACAGAGCCCTGACTGGGATCGCTGTGGAGCAGGACA



AGAACACACAGGAGGTGTTTGCACAGGTGAAGCAGATCTA



TAAGACCCCTCCTATTAAGGATTTCGGCGGATTCAATTTCT



CACAGATTCTGCCAGACCCCAGTAAGCCTTCCAAGAGGAG



CTTCATCGAGGATCTCCTGTTTAACAAGGTGACCCTGGCAG



ACGCCGGCTTTATTAAGCAATATGGGGATTGCCTGGGCGAC



ATTGCTGCCAGAGACCTGATTTGCGCCCAGAAATTCAATGG



CCTCACAGTGCTGCCACCTCTGCTGACCGACGAGATGATCG



CTCAATACACTAGCGCACTGCTGGCCGGAACCATCACATCA



GGCTGGACCTTCGGGGCCGGAGCAGCACTGCAGATTCCATT



CGCCATGCAGATGGCCTATAGATTCAACGGCATTGGCGTCA



CACAGAACGTGCTGTACGAAAACCAGAAGCTCATCGCTAA



CCAGTTTAATTCCGCAATTGGAAAGATCCAAGATTCACTCA



GCTCAACCGCCTCTGCACTCGGAAAGCTGCAGGACGTGGTC



AACCAGAATGCTCAGGCCCTGAACACACTCGTCAAGCAGC



TGTCCTCTAACTTTGGCGCTATCAGCTCCGTTCTGAACGAC



ATTCTGAGCCGCCTGGATCCCCCAGAGGCTGAAGTCCAGAT



TGACCGCCTGATTACCGGCCGGCTGCAGTCTCTGCAAACAT



ACGTGACCCAGCAGCTGATCAGAGCAGCCGAGATCCGGGC



ATCCGCAAATCTGGCAGCAACTAAGATGAGCGAATGCGTG



CTGGGCCAGTCCAAGCGGGTGGACTTTTGTGGCAAGGGCT



ACCACCTGATGAGCTTCCCCCAGAGCGCCCCACATGGCGTT



GTTTTTCTGCACGTGACCTATGTCCCTGCTCAGGAAAAGAA



CTTTACAACTGCTCCTGCTATCTGCCATGACGGCAAGGCCC



ACTTCCCACGGGAGGGAGTGTTTGTGTCCAATGGCACACAC



TGGTTCGTGACCCAGAGGAACTTCTATGAACCCCAGATCAT



CACCACTGACAATACCTTCGTGTCTGGAAATTGCGACGTCG



TGATCGGCATCGTTAACAACACCGTGTACGACCCTCTCCAG



CCAGAGCTGGACTCCTTTAAGGAGGAACTGGATAAGTATTT



TAAGAACCACACAAGCCCAGATGTGGATCTCGGGGACATC



TCCGGAATTAACGCCTCCGTGGTGAATATCCAGAAGGAGA



TTGACCGCCTAAATGAAGTTGCCAAGAACCTCAATGAGTCT



CTGATTGATCTGCAGGAACTGGGCAAGTATGAGCAGTATAT



CAAATGGCCCTGGTACATTTGGCTGGGGTTTATCGCCGGAC



TGATTGCCATCGTCATGGTGACCATCATGCTGTGTTGCATG



ACCTCCTGTTGTTCCTGTCTGAAGGGCTGCTGTAGTTGCGG



CTCTTGCTGTAAATTCGACGAAGATGATAGCGAGCCCGTGC



TGAAGGGCGTGAAGCTGCATTATACCTGA
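The D614G location referred to in the note to SEQ ID NO: 119 corresponds to codon 614 of the coding sequence, that is, nucleotides 1840 to 1842 when counting from the A of the ATG start codon. The following Python sketch is illustrative only and forms no part of the disclosed sequences. The function names, the partial codon table and the toy coding sequence are assumptions introduced for the example; in practice the full listed sequence would be passed in place of `toy`.

```python
def codon_for_residue(cds: str, residue: int) -> str:
    """Return the codon encoding a 1-based residue number in a coding
    sequence that starts at the first base of the start codon."""
    if residue < 1:
        raise ValueError("residue numbers start at 1")
    start = 3 * (residue - 1)
    codon = cds[start:start + 3].upper()
    if len(codon) != 3:
        raise ValueError("residue lies outside the coding sequence")
    return codon


# Standard genetic code entries for aspartate (D) and glycine (G) only;
# any other codon is reported as "?".
CODON_TO_RESIDUE = {
    "GAT": "D", "GAC": "D",
    "GGT": "G", "GGC": "G", "GGA": "G", "GGG": "G",
}


def residue_614(cds: str) -> str:
    """Report whether codon 614 encodes D, G or something else ("?")."""
    return CODON_TO_RESIDUE.get(codon_for_residue(cds, 614), "?")


# Hypothetical toy sequence: ATG, 612 alanine codons, a GGC at codon 614,
# then a stop codon. It is not one of the listed sequences.
toy = "ATG" + "GCC" * 612 + "GGC" + "TGA"
print(codon_for_residue(toy, 614), residue_614(toy))  # prints: GGC G
```

Applied to a sequence carrying the D614G mutation, `residue_614` would be expected to return `G`; applied to a sequence without it, `D`.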





SARS-CoV-2 S protein mutated to remove a furin cleavage site and to replace residues 986 and 987 with proline, and to contain the D614G mutation
(SEQ ID NO: 120)



MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVF



RSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFDNPVLPF



NDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKV



CEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVS



QPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRD



LPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWT



AGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKC



TLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATR



FASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLN



DLCFTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTG



CVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQ



AGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFE



LLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKK



FLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTN



TSNQVAVLYQGVNCTEVPVAIHADQLTPTWRVYSTGSNVFQT



RAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPGSASSVAS



QSIIAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILPVSMTKT



SVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDK



NTQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSFIEDL



LFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPL



LTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQMAYRF



NGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQ



DVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDPPEAEVQ



IDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLG



QSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTT



APAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTDNT



FVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTSPD



VDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYE



QYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCCSC



GSCCKFDEDDSEPVLKGVKLHYT





Optimized nucleotide sequence encoding SARS-CoV-2 S protein mutated to remove a furin cleavage site and to replace residues 986 and 987 with proline, and to contain the D614G mutation*
(SEQ ID NO: 121)
*underlined residues correspond to D614G mutation location



ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG



TGTGTCAACCTGACAACTAGGACTCAGCTGCCACCAGCCTA



CACCAACTCCTTCACCAGAGGCGTGTATTACCCAGACAAGG



TGTTTAGAAGCAGCGTGCTGCACTCTACCCAGGACCTCTTT



CTGCCCTTTTTCAGCAACGTGACATGGTTTCACGCAATTCA



CGTGTCCGGCACTAATGGCACAAAGCGGTTCGACAATCCA



GTCCTGCCTTTCAACGATGGCGTCTACTTTGCATCTACTGA



GAAATCCAATATCATTAGGGGATGGATCTTCGGCACAACCC



TGGATTCTAAGACCCAGAGCCTGCTGATCGTCAACAACGCC



ACAAACGTGGTCATTAAGGTTTGCGAGTTTCAGTTCTGTAA



CGATCCTTTTCTGGGCGTGTATTATCATAAGAACAATAAGA



GCTGGATGGAGTCCGAGTTTAGAGTGTATAGCTCTGCAAAT



AATTGTACCTTTGAGTACGTGAGCCAGCCCTTTCTGATGGA



CCTGGAGGGAAAACAAGGAAACTTCAAAAACCTGCGGGAA



TTCGTTTTCAAAAACATCGACGGCTATTTCAAGATCTATAG



CAAGCATACCCCAATCAACCTCGTGAGGGACCTCCCCCAG



GGCTTTAGCGCACTGGAGCCACTGGTTGACCTGCCTATCGG



CATTAATATCACAAGATTTCAGACCCTGCTGGCACTGCATA



GAAGCTATCTGACCCCTGGAGACTCCTCTAGTGGGTGGACT



GCCGGCGCCGCTGCCTACTATGTGGGCTATCTGCAGCCACG



GACATTCCTGCTGAAATACAATGAGAACGGGACAATCACA



GATGCTGTTGATTGCGCACTCGACCCCCTGTCCGAGACAAA



GTGCACTCTCAAGAGCTTTACCGTCGAGAAGGGCATCTATC



AGACCTCAAACTTCAGGGTGCAGCCCACAGAATCTATCGTG



CGCTTCCCTAATATCACTAACCTGTGTCCTTTCGGTGAAGT



GTTCAACGCCACCAGGTTTGCTAGCGTGTATGCCTGGAACA



GGAAGAGGATCTCTAACTGCGTCGCCGACTATTCCGTGCTG



TATAACAGCGCCTCCTTCTCCACATTCAAATGCTATGGAGT



GAGCCCGACAAAACTGAACGATCTCTGCTTTACAAATGTCT



ACGCCGACTCTTTTGTGATCAGAGGGGACGAGGTCCGGCA



GATCGCACCAGGACAGACAGGCAAGATTGCTGACTACAAC



TATAAGCTGCCTGACGACTTCACAGGATGTGTGATCGCATG



GAACTCAAACAATCTGGACTCCAAAGTCGGGGGCAACTAT



AATTACCTGTATCGCCTGTTCCGGAAGTCCAACCTGAAGCC



CTTCGAGAGGGACATCAGTACAGAGATCTATCAGGCTGGC



TCCACCCCTTGCAATGGCGTCGAAGGCTTTAATTGTTATTTT



CCCCTGCAGTCTTACGGGTTTCAGCCTACTAATGGAGTTGG



GTACCAGCCATACAGAGTGGTCGTGCTCAGCTTCGAGCTCC



TGCATGCTCCAGCTACAGTTTGCGGGCCAAAGAAGTCCACT



AACCTGGTGAAGAATAAGTGCGTCAACTTCAACTTTAACGG



GCTCACCGGCACCGGCGTGCTGACTGAGAGCAACAAGAAG



TTTCTGCCATTTCAACAGTTTGGACGGGACATTGCCGACAC



CACCGATGCCGTTCGGGATCCACAGACCCTGGAAATTCTGG



ACATTACACCGTGCAGCTTCGGGGGCGTGAGCGTGATCAC



ACCCGGAACCAATACAAGCAACCAGGTTGCCGTCCTGTATC



AGGGCGTCAATTGCACAGAAGTGCCAGTTGCTATCCACGC



AGACCAGCTGACTCCCACATGGCGGGTGTATAGCACCGGA



TCCAACGTGTTTCAGACCCGCGCCGGATGTCTCATTGGGGC



CGAGCACGTGAATAACAGCTACGAGTGCGACATCCCCATT



GGCGCCGGCATTTGTGCGTCTTACCAGACTCAGACCAACTC



TCCTGGCTCCGCCTCTTCCGTTGCTAGTCAGTCTATTATTGC



CTATACCATGAGCCTCGGAGCTGAGAATAGCGTGGCCTACT



CCAATAATTCCATCGCAATCCCTACTAACTTCACTATTTCTG



TGACCACCGAGATCCTGCCTGTGTCTATGACTAAGACTAGC



GTTGATTGTACCATGTATATTTGTGGCGACTCTACCGAATG



TTCTAACCTGCTGCTTCAGTACGGCTCATTTTGCACACAGCT



GAACAGAGCCCTGACTGGGATCGCTGTGGAGCAGGACAAG



AACACACAGGAGGTGTTTGCACAGGTGAAGCAGATCTATA



AGACCCCTCCTATTAAGGATTTCGGCGGATTCAATTTCTCA



CAGATTCTGCCAGACCCCAGTAAGCCTTCCAAGAGGAGCTT



CATCGAGGATCTCCTGTTTAACAAGGTGACCCTGGCAGACG



CCGGCTTTATTAAGCAATATGGGGATTGCCTGGGCGACATT



GCTGCCAGAGACCTGATTTGCGCCCAGAAATTCAATGGCCT



CACAGTGCTGCCACCTCTGCTGACCGACGAGATGATCGCTC



AATACACTAGCGCACTGCTGGCCGGAACCATCACATCAGG



CTGGACCTTCGGGGCCGGAGCAGCACTGCAGATTCCATTCG



CCATGCAGATGGCCTATAGATTCAACGGCATTGGCGTCACA



CAGAACGTGCTGTACGAAAACCAGAAGCTCATCGCTAACC



AGTTTAATTCCGCAATTGGAAAGATCCAAGATTCACTCAGC



TCAACCGCCTCTGCACTCGGAAAGCTGCAGGACGTGGTCA



ACCAGAATGCTCAGGCCCTGAACACACTCGTCAAGCAGCT



GTCCTCTAACTTTGGCGCTATCAGCTCCGTTCTGAACGACA



TTCTGAGCCGCCTGGATCCCCCAGAGGCTGAAGTCCAGATT



GACCGCCTGATTACCGGCCGGCTGCAGTCTCTGCAAACATA



CGTGACCCAGCAGCTGATCAGAGCAGCCGAGATCCGGGCA



TCCGCAAATCTGGCAGCAACTAAGATGAGCGAATGCGTGC



TGGGCCAGTCCAAGCGGGTGGACTTTTGTGGCAAGGGCTA



CCACCTGATGAGCTTCCCCCAGAGCGCCCCACATGGCGTTG



TTTTTCTGCACGTGACCTATGTCCCTGCTCAGGAAAAGAAC



TTTACAACTGCTCCTGCTATCTGCCATGACGGCAAGGCCCA



CTTCCCACGGGAGGGAGTGTTTGTGTCCAATGGCACACACT



GGTTCGTGACCCAGAGGAACTTCTATGAACCCCAGATCATC



ACCACTGACAATACCTTCGTGTCTGGAAATTGCGACGTCGT



GATCGGCATCGTTAACAACACCGTGTACGACCCTCTCCAGC



CAGAGCTGGACTCCTTTAAGGAGGAACTGGATAAGTATTTT



AAGAACCACACAAGCCCAGATGTGGATCTCGGGGACATCT



CCGGAATTAACGCCTCCGTGGTGAATATCCAGAAGGAGAT



TGACCGCCTAAATGAAGTTGCCAAGAACCTCAATGAGTCTC



TGATTGATCTGCAGGAACTGGGCAAGTATGAGCAGTATATC



AAATGGCCCTGGTACATTTGGCTGGGGTTTATCGCCGGACT



GATTGCCATCGTCATGGTGACCATCATGCTGTGTTGCATGA



CCTCCTGTTGTTCCTGTCTGAAGGGCTGCTGTAGTTGCGGCT



CTTGCTGTAAATTCGACGAAGATGATAGCGAGCCCGTGCTG



AAGGGCGTGAAGCTGCATTATACCTGA





Ectodomain of a SARS-CoV-2 S protein mutated to remove a furin cleavage site and to replace residues 986 and 987 with proline
(SEQ ID NO: 12)



MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVF



RSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFDNPVLPF



NDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKV



CEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVS



QPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRD



LPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWT



AGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKC



TLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATR



FASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLN



DLCFTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTG



CVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQ



AGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFE



LLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKK



FLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTN



TSNQVAVLYQDVNCTEVPVAIHADQLTPTWRVYSTGSNVFQT



RAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPGSASSVAS



QSIIAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILPVSMTKT



SVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDK



NTQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSFIEDL



LFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPL



LTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQMAYRF



NGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQ



DVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDPPEAEVQ



IDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLG



QSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTT



APAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTDNT



FVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTSPD



VDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYE



Q





Optimized nucleotide sequence encoding the ectodomain of the SARS-CoV-2 S protein with a Foldon
(SEQ ID NO: 46)



ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG



TGTGTCAACCTGACAACTAGGACTCAGCTGCCACCAGCCTA



CACCAACTCCTTCACCAGAGGCGTGTATTACCCAGACAAGG



TGTTTAGAAGCAGCGTGCTGCACTCTACCCAGGACCTCTTT



CTGCCCTTTTTCAGCAACGTGACATGGTTTCACGCAATTCA



CGTGTCCGGCACTAATGGCACAAAGCGGTTCGACAATCCA



GTCCTGCCTTTCAACGATGGCGTCTACTTTGCATCTACTGA



GAAATCCAATATCATTAGGGGATGGATCTTCGGCACAACCC



TGGATTCTAAGACCCAGAGCCTGCTGATCGTCAACAACGCC



ACAAACGTGGTCATTAAGGTTTGCGAGTTTCAGTTCTGTAA



CGATCCTTTTCTGGGCGTGTATTATCATAAGAACAATAAGA



GCTGGATGGAGTCCGAGTTTAGAGTGTATAGCTCTGCAAAT



AATTGTACCTTTGAGTACGTGAGCCAGCCCTTTCTGATGGA



CCTGGAGGGAAAACAAGGAAACTTCAAAAACCTGCGGGAA



TTCGTTTTCAAAAACATCGACGGCTATTTCAAGATCTATAG



CAAGCATACCCCAATCAACCTCGTGAGGGACCTCCCCCAG



GGCTTTAGCGCACTGGAGCCACTGGTTGACCTGCCTATCGG



CATTAATATCACAAGATTTCAGACCCTGCTGGCACTGCATA



GAAGCTATCTGACCCCTGGAGACTCCTCTAGTGGGTGGACT



GCCGGCGCCGCTGCCTACTATGTGGGCTATCTGCAGCCACG



GACATTCCTGCTGAAATACAATGAGAACGGGACAATCACA



GATGCTGTTGATTGCGCACTCGACCCCCTGTCCGAGACAAA



GTGCACTCTCAAGAGCTTTACCGTCGAGAAGGGCATCTATC



AGACCTCAAACTTCAGGGTGCAGCCCACAGAATCTATCGTG



CGCTTCCCTAATATCACTAACCTGTGTCCTTTCGGTGAAGT



GTTCAACGCCACCAGGTTTGCTAGCGTGTATGCCTGGAACA



GGAAGAGGATCTCTAACTGCGTCGCCGACTATTCCGTGCTG



TATAACAGCGCCTCCTTCTCCACATTCAAATGCTATGGAGT



GAGCCCGACAAAACTGAACGATCTCTGCTTTACAAATGTCT



ACGCCGACTCTTTTGTGATCAGAGGGGACGAGGTCCGGCA



GATCGCACCAGGACAGACAGGCAAGATTGCTGACTACAAC



TATAAGCTGCCTGACGACTTCACAGGATGTGTGATCGCATG



GAACTCAAACAATCTGGACTCCAAAGTCGGGGGCAACTAT



AATTACCTGTATCGCCTGTTCCGGAAGTCCAACCTGAAGCC



CTTCGAGAGGGACATCAGTACAGAGATCTATCAGGCTGGC



TCCACCCCTTGCAATGGCGTCGAAGGCTTTAATTGTTATTTT



CCCCTGCAGTCTTACGGGTTTCAGCCTACTAATGGAGTTGG



GTACCAGCCATACAGAGTGGTCGTGCTCAGCTTCGAGCTCC



TGCATGCTCCAGCTACAGTTTGCGGGCCAAAGAAGTCCACT



AACCTGGTGAAGAATAAGTGCGTCAACTTCAACTTTAACGG



GCTCACCGGCACCGGCGTGCTGACTGAGAGCAACAAGAAG



TTTCTGCCATTTCAACAGTTTGGACGGGACATTGCCGACAC



CACCGATGCCGTTCGGGATCCACAGACCCTGGAAATTCTGG



ACATTACACCGTGCAGCTTCGGGGGCGTGAGCGTGATCAC



ACCCGGAACCAATACAAGCAACCAGGTTGCCGTCCTGTATC



AGGATGTCAATTGCACAGAAGTGCCAGTTGCTATCCACGCA



GACCAGCTGACTCCCACATGGCGGGTGTATAGCACCGGAT



CCAACGTGTTTCAGACCCGCGCCGGATGTCTCATTGGGGCC



GAGCACGTGAATAACAGCTACGAGTGCGACATCCCCATTG



GCGCCGGCATTTGTGCGTCTTACCAGACTCAGACCAACTCT



CCTAGAAGGGCCAGGTCCGTTGCTAGTCAGTCTATTATTGC



CTATACCATGAGCCTCGGAGCTGAGAATAGCGTGGCCTACT



CCAATAATTCCATCGCAATCCCTACTAACTTCACTATTTCTG



TGACCACCGAGATCCTGCCTGTGTCTATGACTAAGACTAGC



GTTGATTGTACCATGTATATTTGTGGCGACTCTACCGAATG



TTCTAACCTGCTGCTTCAGTACGGCTCATTTTGCACACAGCT



GAACAGAGCCCTGACTGGGATCGCTGTGGAGCAGGACAAG



AACACACAGGAGGTGTTTGCACAGGTGAAGCAGATCTATA



AGACCCCTCCTATTAAGGATTTCGGCGGATTCAATTTCTCA



CAGATTCTGCCAGACCCCAGTAAGCCTTCCAAGAGGAGCTT



CATCGAGGATCTCCTGTTTAACAAGGTGACCCTGGCAGACG



CCGGCTTTATTAAGCAATATGGGGATTGCCTGGGCGACATT



GCTGCCAGAGACCTGATTTGCGCCCAGAAATTCAATGGCCT



CACAGTGCTGCCACCTCTGCTGACCGACGAGATGATCGCTC



AATACACTAGCGCACTGCTGGCCGGAACCATCACATCAGG



CTGGACCTTCGGGGCCGGAGCAGCACTGCAGATTCCATTCG



CCATGCAGATGGCCTATAGATTCAACGGCATTGGCGTCACA



CAGAACGTGCTGTACGAAAACCAGAAGCTCATCGCTAACC



AGTTTAATTCCGCAATTGGAAAGATCCAAGATTCACTCAGC



TCAACCGCCTCTGCACTCGGAAAGCTGCAGGACGTGGTCA



ACCAGAATGCTCAGGCCCTGAACACACTCGTCAAGCAGCT



GTCCTCTAACTTTGGCGCTATCAGCTCCGTTCTGAACGACA



TTCTGAGCCGCCTGGATAAGGTGGAGGCTGAAGTCCAGATT



GACCGCCTGATTACCGGCCGGCTGCAGTCTCTGCAAACATA



CGTGACCCAGCAGCTGATCAGAGCAGCCGAGATCCGGGCA



TCCGCAAATCTGGCAGCAACTAAGATGAGCGAATGCGTGC



TGGGCCAGTCCAAGCGGGTGGACTTTTGTGGCAAGGGCTA



CCACCTGATGAGCTTCCCCCAGAGCGCCCCACATGGCGTTG



TTTTTCTGCACGTGACCTATGTCCCTGCTCAGGAAAAGAAC



TTTACAACTGCTCCTGCTATCTGCCATGACGGCAAGGCCCA



CTTCCCACGGGAGGGAGTGTTTGTGTCCAATGGCACACACT



GGTTCGTGACCCAGAGGAACTTCTATGAACCCCAGATCATC



ACCACTGACAATACCTTCGTGTCTGGAAATTGCGACGTCGT



GATCGGCATCGTTAACAACACCGTGTACGACCCTCTCCAGC



CAGAGCTGGACTCCTTTAAGGAGGAACTGGATAAGTATTTT



AAGAACCACACAAGCCCAGATGTGGATCTCGGGGACATCT



CCGGAATTAACGCCTCCGTGGTGAATATCCAGAAGGAGAT



TGACCGCCTAAATGAAGTTGCCAAGAACCTCAATGAGTCTC



TGATTGATCTGCAGGAACTGGGCAAGTATGAGCAGGGGTA



CATTCCCGAGGCTCCTAGGGACGGCCAGGCATACGTGCGC



AAAGACGGCGAGTGGGTGCTGCTGTCCACATTCCTGTAA





Ectodomain of a SARS-CoV-2 S protein with a Foldon
(SEQ ID NO: 14)



MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVF



RSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFDNPVLPF



NDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKV



CEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVS



QPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRD



LPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWT



AGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKC



TLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATR



FASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLN



DLCFTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTG



CVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQ



AGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFE



LLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKK



FLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTN



TSNQVAVLYQDVNCTEVPVAIHADQLTPTWRVYSTGSNVFQT



RAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPRRARSVA



SQSIIAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILPVSMTK



TSVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDK



NTQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSFIEDL



LFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPL



LTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQMAYRF



NGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQ



DVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDKVEAEV



QIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVL



GQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNF



TTAPAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTD



NTFVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTS



PDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGK



YEQGYIPEAPRDGQAYVRKDGEWVLLSTFL





Optimized nucleotide sequence encoding the S2′ subunit of a SARS-CoV-2 S protein with a Foldon and a signal sequence
(SEQ ID NO: 47)



ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG



TGTAGCTTCATCGAGGATCTCCTGTTTAACAAGGTGACCCT



GGCAGACGCCGGCTTTATTAAGCAATATGGGGATTGCCTGG



GCGACATTGCTGCCAGAGACCTGATTTGCGCCCAGAAATTC



AATGGCCTCACAGTGCTGCCACCTCTGCTGACCGACGAGAT



GATCGCTCAATACACTAGCGCACTGCTGGCCGGAACCATCA



CATCAGGCTGGACCTTCGGGGCCGGAGCAGCACTGCAGAT



TCCATTCGCCATGCAGATGGCCTATAGATTCAACGGCATTG



GCGTCACACAGAACGTGCTGTACGAAAACCAGAAGCTCAT



CGCTAACCAGTTTAATTCCGCAATTGGAAAGATCCAAGATT



CACTCAGCTCAACCGCCTCTGCACTCGGAAAGCTGCAGGAC



GTGGTCAACCAGAATGCTCAGGCCCTGAACACACTCGTCA



AGCAGCTGTCCTCTAACTTTGGCGCTATCAGCTCCGTTCTG



AACGACATTCTGAGCCGCCTGGATAAGGTGGAGGCTGAAG



TCCAGATTGACCGCCTGATTACCGGCCGGCTGCAGTCTCTG



CAAACATACGTGACCCAGCAGCTGATCAGAGCAGCCGAGA



TCCGGGCATCCGCAAATCTGGCAGCAACTAAGATGAGCGA



ATGCGTGCTGGGCCAGTCCAAGCGGGTGGACTTTTGTGGCA



AGGGCTACCACCTGATGAGCTTCCCCCAGAGCGCCCCACAT



GGCGTTGTTTTTCTGCACGTGACCTATGTCCCTGCTCAGGA



AAAGAACTTTACAACTGCTCCTGCTATCTGCCATGACGGCA



AGGCCCACTTCCCACGGGAGGGAGTGTTTGTGTCCAATGGC



ACACACTGGTTCGTGACCCAGAGGAACTTCTATGAACCCCA



GATCATCACCACTGACAATACCTTCGTGTCTGGAAATTGCG



ACGTCGTGATCGGCATCGTTAACAACACCGTGTACGACCCT



CTCCAGCCAGAGCTGGACTCCTTTAAGGAGGAACTGGATA



AGTATTTTAAGAACCACACAAGCCCAGATGTGGATCTCGG



GGACATCTCCGGAATTAACGCCTCCGTGGTGAATATCCAGA



AGGAGATTGACCGCCTAAATGAAGTTGCCAAGAACCTCAA



TGAGTCTCTGATTGATCTGCAGGAACTGGGCAAGTATGAGC



AGGGGTACATTCCCGAGGCTCCTAGGGACGGCCAGGCATA



CGTGCGCAAAGACGGCGAGTGGGTGCTGCTGTCCACATTCC



TGTAA





S2′ subunit of a SARS-
(SEQ ID NO: 15)


CoV-2 S protein with a

MFVFLVLLPLVSSQCSFIEDLLFNKVTLADAGFIKQYGDCLGDI



Foldon and a signal
AARDLICAQKFNGLTVLPPLLTDEMIAQYTSALLAGTITSGWT


sequence
FGAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKLIANQFNS



AIGKIQDSLSSTASALGKLQDVVNQNAQALNTLVKQLSSNFG



AISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRA



AEIRASANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAP



HGVVFLHVTYVPAQEKNFTTAPAICHDGKAHFPREGVFVSNG



THWFVTQRNFYEPQIITTDNTFVSGNCDVVIGIVNNTVYDPLQ



PELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRL



NEVAKNLNESLIDLQELGKYEQGYIPEAPRDGQAYVRKDGEWV




LLSTFL






Optimized nucleotide
(SEQ ID NO: 48)


sequence encoding the
ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG


ectodomain of a SARS-
TGTGTCAACCTGACAACTAGGACTCAGCTGCCACCAGCCTA


CoV-2 S protein, which
CACCAACTCCTTCACCAGAGGCGTGTATTACCCAGACAAGG


has been modified by
TGTTTAGAAGCAGCGTGCTGCACTCTACCCAGGACCTCTTT


mutating residues 986
CTGCCCTTTTTCAGCAACGTGACATGGTTTCACGCAATTCA


and 987 to proline, with
CGTGTCCGGCACTAATGGCACAAAGCGGTTCGACAATCCA


a Foldon
GTCCTGCCTTTCAACGATGGCGTCTACTTTGCATCTACTGA



GAAATCCAATATCATTAGGGGATGGATCTTCGGCACAACCC



TGGATTCTAAGACCCAGAGCCTGCTGATCGTCAACAACGCC



ACAAACGTGGTCATTAAGGTTTGCGAGTTTCAGTTCTGTAA



CGATCCTTTTCTGGGCGTGTATTATCATAAGAACAATAAGA



GCTGGATGGAGTCCGAGTTTAGAGTGTATAGCTCTGCAAAT



AATTGTACCTTTGAGTACGTGAGCCAGCCCTTTCTGATGGA



CCTGGAGGGAAAACAAGGAAACTTCAAAAACCTGCGGGAA



TTCGTTTTCAAAAACATCGACGGCTATTTCAAGATCTATAG



CAAGCATACCCCAATCAACCTCGTGAGGGACCTCCCCCAG



GGCTTTAGCGCACTGGAGCCACTGGTTGACCTGCCTATCGG



CATTAATATCACAAGATTTCAGACCCTGCTGGCACTGCATA



GAAGCTATCTGACCCCTGGAGACTCCTCTAGTGGGTGGACT



GCCGGCGCCGCTGCCTACTATGTGGGCTATCTGCAGCCACG



GACATTCCTGCTGAAATACAATGAGAACGGGACAATCACA



GATGCTGTTGATTGCGCACTCGACCCCCTGTCCGAGACAAA



GTGCACTCTCAAGAGCTTTACCGTCGAGAAGGGCATCTATC



AGACCTCAAACTTCAGGGTGCAGCCCACAGAATCTATCGTG



CGCTTCCCTAATATCACTAACCTGTGTCCTTTCGGTGAAGT



GTTCAACGCCACCAGGTTTGCTAGCGTGTATGCCTGGAACA



GGAAGAGGATCTCTAACTGCGTCGCCGACTATTCCGTGCTG



TATAACAGCGCCTCCTTCTCCACATTCAAATGCTATGGAGT



GAGCCCGACAAAACTGAACGATCTCTGCTTTACAAATGTCT



ACGCCGACTCTTTTGTGATCAGAGGGGACGAGGTCCGGCA



GATCGCACCAGGACAGACAGGCAAGATTGCTGACTACAAC



TATAAGCTGCCTGACGACTTCACAGGATGTGTGATCGCATG



GAACTCAAACAATCTGGACTCCAAAGTCGGGGGCAACTAT



AATTACCTGTATCGCCTGTTCCGGAAGTCCAACCTGAAGCC



CTTCGAGAGGGACATCAGTACAGAGATCTATCAGGCTGGC



TCCACCCCTTGCAATGGCGTCGAAGGCTTTAATTGTTATTTT



CCCCTGCAGTCTTACGGGTTTCAGCCTACTAATGGAGTTGG



GTACCAGCCATACAGAGTGGTCGTGCTCAGCTTCGAGCTCC



TGCATGCTCCAGCTACAGTTTGCGGGCCAAAGAAGTCCACT



AACCTGGTGAAGAATAAGTGCGTCAACTTCAACTTTAACGG



GCTCACCGGCACCGGCGTGCTGACTGAGAGCAACAAGAAG



TTTCTGCCATTTCAACAGTTTGGACGGGACATTGCCGACAC



CACCGATGCCGTTCGGGATCCACAGACCCTGGAAATTCTGG



ACATTACACCGTGCAGCTTCGGGGGCGTGAGCGTGATCAC



ACCCGGAACCAATACAAGCAACCAGGTTGCCGTCCTGTATC



AGGATGTCAATTGCACAGAAGTGCCAGTTGCTATCCACGCA



GACCAGCTGACTCCCACATGGCGGGTGTATAGCACCGGAT



CCAACGTGTTTCAGACCCGCGCCGGATGTCTCATTGGGGCC



GAGCACGTGAATAACAGCTACGAGTGCGACATCCCCATTG



GCGCCGGCATTTGTGCGTCTTACCAGACTCAGACCAACTCT



CCTAGAAGGGCCAGGTCCGTTGCTAGTCAGTCTATTATTGC



CTATACCATGAGCCTCGGAGCTGAGAATAGCGTGGCCTACT



CCAATAATTCCATCGCAATCCCTACTAACTTCACTATTTCTG



TGACCACCGAGATCCTGCCTGTGTCTATGACTAAGACTAGC



GTTGATTGTACCATGTATATTTGTGGCGACTCTACCGAATG



TTCTAACCTGCTGCTTCAGTACGGCTCATTTTGCACACAGCT



GAACAGAGCCCTGACTGGGATCGCTGTGGAGCAGGACAAG



AACACACAGGAGGTGTTTGCACAGGTGAAGCAGATCTATA



AGACCCCTCCTATTAAGGATTTCGGCGGATTCAATTTCTCA



CAGATTCTGCCAGACCCCAGTAAGCCTTCCAAGAGGAGCTT



CATCGAGGATCTCCTGTTTAACAAGGTGACCCTGGCAGACG



CCGGCTTTATTAAGCAATATGGGGATTGCCTGGGCGACATT



GCTGCCAGAGACCTGATTTGCGCCCAGAAATTCAATGGCCT



CACAGTGCTGCCACCTCTGCTGACCGACGAGATGATCGCTC



AATACACTAGCGCACTGCTGGCCGGAACCATCACATCAGG



CTGGACCTTCGGGGCCGGAGCAGCACTGCAGATTCCATTCG



CCATGCAGATGGCCTATAGATTCAACGGCATTGGCGTCACA



CAGAACGTGCTGTACGAAAACCAGAAGCTCATCGCTAACC



AGTTTAATTCCGCAATTGGAAAGATCCAAGATTCACTCAGC



TCAACCGCCTCTGCACTCGGAAAGCTGCAGGACGTGGTCA



ACCAGAATGCTCAGGCCCTGAACACACTCGTCAAGCAGCT



GTCCTCTAACTTTGGCGCTATCAGCTCCGTTCTGAACGACA



TTCTGAGCCGCCTGGATCCCCCAGAGGCTGAAGTCCAGATT



GACCGCCTGATTACCGGCCGGCTGCAGTCTCTGCAAACATA



CGTGACCCAGCAGCTGATCAGAGCAGCCGAGATCCGGGCA



TCCGCAAATCTGGCAGCAACTAAGATGAGCGAATGCGTGC



TGGGCCAGTCCAAGCGGGTGGACTTTTGTGGCAAGGGCTA



CCACCTGATGAGCTTCCCCCAGAGCGCCCCACATGGCGTTG



TTTTTCTGCACGTGACCTATGTCCCTGCTCAGGAAAAGAAC



TTTACAACTGCTCCTGCTATCTGCCATGACGGCAAGGCCCA



CTTCCCACGGGAGGGAGTGTTTGTGTCCAATGGCACACACT



GGTTCGTGACCCAGAGGAACTTCTATGAACCCCAGATCATC



ACCACTGACAATACCTTCGTGTCTGGAAATTGCGACGTCGT



GATCGGCATCGTTAACAACACCGTGTACGACCCTCTCCAGC



CAGAGCTGGACTCCTTTAAGGAGGAACTGGATAAGTATTTT



AAGAACCACACAAGCCCAGATGTGGATCTCGGGGACATCT



CCGGAATTAACGCCTCCGTGGTGAATATCCAGAAGGAGAT



TGACCGCCTAAATGAAGTTGCCAAGAACCTCAATGAGTCTC



TGATTGATCTGCAGGAACTGGGCAAGTATGAGCAGGGGTA



CATTCCCGAGGCTCCTAGGGACGGCCAGGCATACGTGCGC



AAAGACGGCGAGTGGGTGCTGCTGTCCACATTCCTGTAA





Ectodomain of a SARS-
(SEQ ID NO: 16)


CoV-2 S protein, which
MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVF


has been modified by
RSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFDNPVLPF


mutating residues 986
NDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKV


and 987 to proline, with
CEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVS


a Foldon
QPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRD



LPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWT



AGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKC



TLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATR



FASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLN



DLCFTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTG



CVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQ



AGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFE



LLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKK



FLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTN



TSNQVAVLYQDVNCTEVPVAIHADQLTPTWRVYSTGSNVFQT



RAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPRRARSVA



SQSIIAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILPVSMTK



TSVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDK



NTQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSFIEDL



LFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPL



LTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQMAYRF



NGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQ



DVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDPPEAEVQ



IDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLG



QSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTT



APAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTDNT



FVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTSPD



VDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYE



QGYIPEAPRDGQAYVRKDGEWVLLSTFL





Optimized nucleotide
(SEQ ID NO: 49)


sequence encoding the
ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG


ectodomain of a SARS-
TGTGTCAACCTGACAACTAGGACTCAGCTGCCACCAGCCTA


CoV-2 S protein, which
CACCAACTCCTTCACCAGAGGCGTGTATTACCCAGACAAGG


has been modified to
TGTTTAGAAGCAGCGTGCTGCACTCTACCCAGGACCTCTTT


remove the furin
CTGCCCTTTTTCAGCAACGTGACATGGTTTCACGCAATTCA


cleavage site and to
CGTGTCCGGCACTAATGGCACAAAGCGGTTCGACAATCCA


replace residues 986 and
GTCCTGCCTTTCAACGATGGCGTCTACTTTGCATCTACTGA


987 with proline, with a
GAAATCCAATATCATTAGGGGATGGATCTTCGGCACAACCC


C terminal Foldon
TGGATTCTAAGACCCAGAGCCTGCTGATCGTCAACAACGCC



ACAAACGTGGTCATTAAGGTTTGCGAGTTTCAGTTCTGTAA



CGATCCTTTTCTGGGCGTGTATTATCATAAGAACAATAAGA



GCTGGATGGAGTCCGAGTTTAGAGTGTATAGCTCTGCAAAT



AATTGTACCTTTGAGTACGTGAGCCAGCCCTTTCTGATGGA



CCTGGAGGGAAAACAAGGAAACTTCAAAAACCTGCGGGAA



TTCGTTTTCAAAAACATCGACGGCTATTTCAAGATCTATAG



CAAGCATACCCCAATCAACCTCGTGAGGGACCTCCCCCAG



GGCTTTAGCGCACTGGAGCCACTGGTTGACCTGCCTATCGG



CATTAATATCACAAGATTTCAGACCCTGCTGGCACTGCATA



GAAGCTATCTGACCCCTGGAGACTCCTCTAGTGGGTGGACT



GCCGGCGCCGCTGCCTACTATGTGGGCTATCTGCAGCCACG



GACATTCCTGCTGAAATACAATGAGAACGGGACAATCACA



GATGCTGTTGATTGCGCACTCGACCCCCTGTCCGAGACAAA



GTGCACTCTCAAGAGCTTTACCGTCGAGAAGGGCATCTATC



AGACCTCAAACTTCAGGGTGCAGCCCACAGAATCTATCGTG



CGCTTCCCTAATATCACTAACCTGTGTCCTTTCGGTGAAGT



GTTCAACGCCACCAGGTTTGCTAGCGTGTATGCCTGGAACA



GGAAGAGGATCTCTAACTGCGTCGCCGACTATTCCGTGCTG



TATAACAGCGCCTCCTTCTCCACATTCAAATGCTATGGAGT



GAGCCCGACAAAACTGAACGATCTCTGCTTTACAAATGTCT



ACGCCGACTCTTTTGTGATCAGAGGGGACGAGGTCCGGCA



GATCGCACCAGGACAGACAGGCAAGATTGCTGACTACAAC



TATAAGCTGCCTGACGACTTCACAGGATGTGTGATCGCATG



GAACTCAAACAATCTGGACTCCAAAGTCGGGGGCAACTAT



AATTACCTGTATCGCCTGTTCCGGAAGTCCAACCTGAAGCC



CTTCGAGAGGGACATCAGTACAGAGATCTATCAGGCTGGC



TCCACCCCTTGCAATGGCGTCGAAGGCTTTAATTGTTATTTT



CCCCTGCAGTCTTACGGGTTTCAGCCTACTAATGGAGTTGG



GTACCAGCCATACAGAGTGGTCGTGCTCAGCTTCGAGCTCC



TGCATGCTCCAGCTACAGTTTGCGGGCCAAAGAAGTCCACT



AACCTGGTGAAGAATAAGTGCGTCAACTTCAACTTTAACGG



GCTCACCGGCACCGGCGTGCTGACTGAGAGCAACAAGAAG



TTTCTGCCATTTCAACAGTTTGGACGGGACATTGCCGACAC



CACCGATGCCGTTCGGGATCCACAGACCCTGGAAATTCTGG



ACATTACACCGTGCAGCTTCGGGGGCGTGAGCGTGATCAC



ACCCGGAACCAATACAAGCAACCAGGTTGCCGTCCTGTATC



AGGATGTCAATTGCACAGAAGTGCCAGTTGCTATCCACGCA



GACCAGCTGACTCCCACATGGCGGGTGTATAGCACCGGAT



CCAACGTGTTTCAGACCCGCGCCGGATGTCTCATTGGGGCC



GAGCACGTGAATAACAGCTACGAGTGCGACATCCCCATTG



GCGCCGGCATTTGTGCGTCTTACCAGACTCAGACCAACTCT



CCTGGCTCCGCCTCTTCCGTTGCTAGTCAGTCTATTATTGCC



TATACCATGAGCCTCGGAGCTGAGAATAGCGTGGCCTACTC



CAATAATTCCATCGCAATCCCTACTAACTTCACTATTTCTGT



GACCACCGAGATCCTGCCTGTGTCTATGACTAAGACTAGCG



TTGATTGTACCATGTATATTTGTGGCGACTCTACCGAATGTT



CTAACCTGCTGCTTCAGTACGGCTCATTTTGCACACAGCTG



AACAGAGCCCTGACTGGGATCGCTGTGGAGCAGGACAAGA



ACACACAGGAGGTGTTTGCACAGGTGAAGCAGATCTATAA



GACCCCTCCTATTAAGGATTTCGGCGGATTCAATTTCTCAC



AGATTCTGCCAGACCCCAGTAAGCCTTCCAAGAGGAGCTTC



ATCGAGGATCTCCTGTTTAACAAGGTGACCCTGGCAGACGC



CGGCTTTATTAAGCAATATGGGGATTGCCTGGGCGACATTG



CTGCCAGAGACCTGATTTGCGCCCAGAAATTCAATGGCCTC



ACAGTGCTGCCACCTCTGCTGACCGACGAGATGATCGCTCA



ATACACTAGCGCACTGCTGGCCGGAACCATCACATCAGGCT



GGACCTTCGGGGCCGGAGCAGCACTGCAGATTCCATTCGCC



ATGCAGATGGCCTATAGATTCAACGGCATTGGCGTCACACA



GAACGTGCTGTACGAAAACCAGAAGCTCATCGCTAACCAG



TTTAATTCCGCAATTGGAAAGATCCAAGATTCACTCAGCTC



AACCGCCTCTGCACTCGGAAAGCTGCAGGACGTGGTCAAC



CAGAATGCTCAGGCCCTGAACACACTCGTCAAGCAGCTGTC



CTCTAACTTTGGCGCTATCAGCTCCGTTCTGAACGACATTCT



GAGCCGCCTGGATCCCCCAGAGGCTGAAGTCCAGATTGAC



CGCCTGATTACCGGCCGGCTGCAGTCTCTGCAAACATACGT



GACCCAGCAGCTGATCAGAGCAGCCGAGATCCGGGCATCC



GCAAATCTGGCAGCAACTAAGATGAGCGAATGCGTGCTGG



GCCAGTCCAAGCGGGTGGACTTTTGTGGCAAGGGCTACCA



CCTGATGAGCTTCCCCCAGAGCGCCCCACATGGCGTTGTTT



TTCTGCACGTGACCTATGTCCCTGCTCAGGAAAAGAACTTT



ACAACTGCTCCTGCTATCTGCCATGACGGCAAGGCCCACTT



CCCACGGGAGGGAGTGTTTGTGTCCAATGGCACACACTGGT



TCGTGACCCAGAGGAACTTCTATGAACCCCAGATCATCACC



ACTGACAATACCTTCGTGTCTGGAAATTGCGACGTCGTGAT



CGGCATCGTTAACAACACCGTGTACGACCCTCTCCAGCCAG



AGCTGGACTCCTTTAAGGAGGAACTGGATAAGTATTTTAAG



AACCACACAAGCCCAGATGTGGATCTCGGGGACATCTCCG



GAATTAACGCCTCCGTGGTGAATATCCAGAAGGAGATTGA



CCGCCTAAATGAAGTTGCCAAGAACCTCAATGAGTCTCTGA



TTGATCTGCAGGAACTGGGCAAGTATGAGCAGGGGTACAT



TCCCGAGGCTCCTAGGGACGGCCAGGCATACGTGCGCAAA



GACGGCGAGTGGGTGCTGCTGTCCACATTCCTGTAA





Ectodomain of a SARS-
(SEQ ID NO: 17)


CoV-2 S protein, which
MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVF


has been modified to
RSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFDNPVLPF


remove a furin cleavage
NDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKV


site and to replace
CEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVS


residues 986 and 987
QPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRD


with proline, with a C
LPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWT


terminal Foldon
AGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKC



TLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATR



FASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLN



DLCFTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTG



CVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQ



AGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFE



LLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKK



FLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTN



TSNQVAVLYQDVNCTEVPVAIHADQLTPTWRVYSTGSNVFQT



RAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPGSASSVAS



QSIIAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILPVSMTKT



SVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDK



NTQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSFIEDL



LFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPL



LTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQMAYRF



NGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQ



DVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDPPEAEVQ



IDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLG



QSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTT



APAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTDNT



FVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTSPD



VDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYE



QGYIPEAPRDGQAYVRKDGEWVLLSTFL





Optimized nucleotide
(SEQ ID NO: 50)


sequence encoding the
ATGCCTAATATCACTAACCTGTGTCCTTTCGGTGAAGTGTT


receptor-binding
CAACGCCACCAGGTTTGCTAGCGTGTATGCCTGGAACAGG


domain of a SARS-CoV-
AAGAGGATCTCTAACTGCGTCGCCGACTATTCCGTGCTGTA


2 S protein with an Fc
TAACAGCGCCTCCTTCTCCACATTCAAATGCTATGGAGTGA


region
GCCCGACAAAACTGAACGATCTCTGCTTTACAAATGTCTAC



GCCGACTCTTTTGTGATCAGAGGGGACGAGGTCCGGCAGA



TCGCACCAGGACAGACAGGCAAGATTGCTGACTACAACTA



TAAGCTGCCTGACGACTTCACAGGATGTGTGATCGCATGGA



ACTCAAACAATCTGGACTCCAAAGTCGGGGGCAACTATAA



TTACCTGTATCGCCTGTTCCGGAAGTCCAACCTGAAGCCCT



TCGAGAGGGACATCAGTACAGAGATCTATCAGGCTGGCTC



CACCCCTTGCAATGGCGTCGAAGGCTTTAATTGTTATTTTCC



CCTGCAGTCTTACGGGTTTCAGCCTACTAATGGAGTTGGGT



ACCAGCCATACAGAGTGGTCGTGCTCAGCTTCGAGCTCCTG



CATGCTCCACCTAAGTCCTGCGACAAAACCCATACATGTCC



ACCATGCCCAGCTCCTGAACTGCTCGGCGGGCCTAGTGTTT



TCCTCTTCCCTCCTAAGCCCAAGGATACCCTCATGATCTCTC



GCACACCAGAAGTGACCTGCGTGGTCGTGGATGTCTCTCAC



GAGGATCCTGAAGTGAAGTTTAACTGGTATGTCGACGGAG



TGGAAGTGCACAACGCCAAGACAAAGCCAAGAGAAGAAC



AATACAATTCTACTTATAGGGTGGTGTCTGTGCTGACAGTG



CTGCACCAGGATTGGCTGAATGGAAAAGAATATAAGTGTA



AGGTCTCTAACAAGGCCCTGCCCGCTCCAATTGAGAAGAC



AATTTCCAAGGCCAAGGGGCAGCCTCGGGAACCTCAGGTG



TACACACTGCCCCCATCCAGGGATGAACTGACTAAAAATC



AGGTGTCTCTGACATGCCTGGTGAAAGGGTTTTATCCAAGT



GACATTGCTGTGGAGTGGGAGTCTAATGGGCAGCCTGAAA



ATAACTACAAGACCACACCACCAGTGCTCGATAGCGACGG



GTCTTTCTTTCTGTATTCTAAACTGACCGTGGATAAATCTCG



GTGGCAGCAGGGAAACGTGTTTTCTTGCTCAGTGATGCACG



AAGCTCTGCACAATCACTATACACAGAAATCCCTGTCCCTG



TCTCCAGGCAAATAA





Receptor-binding
(SEQ ID NO: 19)


domain of a SARS-CoV-
MNITNLCPFGEVFNATRFASVYAWNRKRISNCVADYSVLYNS


2 S protein with an Fc
ASFSTFKCYGVSPTKLNDLCFTNVYADSFVIRGDEVRQIAPGQ


region
TGKIADYNYKLPDDFTGCVIAWNSNNLDSKVGGNYNYLYRL



FRKSNLKPFERDISTEIYQAGSTPCNGVEGFNCYFPLQSYGFQP



TNGVGYQPYRVVVLSFELLHAPPKSCDKTHTCPPCPAPELLGG




PSVFLFPPKPKDTLMISRTPEVTCVVVDVSHEDPEVKFNWYVDGV





EVHNAKTKPREEQYNSTYRVVSVLTVLHQDWLNGKEYKCKVSNK





ALPAPIEKTISKAKGQPREPQVYTLPPSRDELTKNQVSLTCLVKGF





YPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKSRW





QQGNVFSCSVMHEALHNHYTQKSLSLSPGK






Optimized nucleotide
(SEQ ID NO: 36)


sequence encoding a
ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG


receptor-binding
TGTCCTAATATCACTAACCTGTGTCCTTTCGGTGAAGTGTTC


domain of SARS-CoV-2
AACGCCACCAGGTTTGCTAGCGTGTATGCCTGGAACAGGA


S protein with a signal
AGAGGATCTCTAACTGCGTCGCCGACTATTCCGTGCTGTAT


sequence and an Fc
AACAGCGCCTCCTTCTCCACATTCAAATGCTATGGAGTGAG


region
CCCGACAAAACTGAACGATCTCTGCTTTACAAATGTCTACG



CCGACTCTTTTGTGATCAGAGGGGACGAGGTCCGGCAGATC



GCACCAGGACAGACAGGCAAGATTGCTGACTACAACTATA



AGCTGCCTGACGACTTCACAGGATGTGTGATCGCATGGAAC



TCAAACAATCTGGACTCCAAAGTCGGGGGCAACTATAATT



ACCTGTATCGCCTGTTCCGGAAGTCCAACCTGAAGCCCTTC



GAGAGGGACATCAGTACAGAGATCTATCAGGCTGGCTCCA



CCCCTTGCAATGGCGTCGAAGGCTTTAATTGTTATTTTCCCC



TGCAGTCTTACGGGTTTCAGCCTACTAATGGAGTTGGGTAC



CAGCCATACAGAGTGGTCGTGCTCAGCTTCGAGCTCCTGCA



TGCTCCACCTAAGTCCTGCGACAAAACCCATACATGTCCAC



CATGCCCAGCTCCTGAACTGCTCGGCGGGCCTAGTGTTTTC



CTCTTCCCTCCTAAGCCCAAGGATACCCTCATGATCTCTCG



CACACCAGAAGTGACCTGCGTGGTCGTGGATGTCTCTCACG



AGGATCCTGAAGTGAAGTTTAACTGGTATGTCGACGGAGT



GGAAGTGCACAACGCCAAGACAAAGCCAAGAGAAGAACA



ATACAATTCTACTTATAGGGTGGTGTCTGTGCTGACAGTGC



TGCACCAGGATTGGCTGAATGGAAAAGAATATAAGTGTAA



GGTCTCTAACAAGGCCCTGCCCGCTCCAATTGAGAAGACA



ATTTCCAAGGCCAAGGGGCAGCCTCGGGAACCTCAGGTGT



ACACACTGCCCCCATCCAGGGATGAACTGACTAAAAATCA



GGTGTCTCTGACATGCCTGGTGAAAGGGTTTTATCCAAGTG



ACATTGCTGTGGAGTGGGAGTCTAATGGGCAGCCTGAAAA



TAACTACAAGACCACACCACCAGTGCTCGATAGCGACGGG



TCTTTCTTTCTGTATTCTAAACTGACCGTGGATAAATCTCGG



TGGCAGCAGGGAAACGTGTTTTCTTGCTCAGTGATGCACGA



AGCTCTGCACAATCACTATACACAGAAATCCCTGTCCCTGT



CTCCAGGCAAATAA





Receptor-binding
(SEQ ID NO: 20)


domain of a SARS-CoV-

MFVFLVLLPLVSSQCPNITNLCPFGEVFNATRFASVYAWNRKR



2 S protein with a signal
ISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSF


sequence and an Fc
VIRGDEVRQIAPGQTGKIADYNYKLPDDFTGCVIAWNSNNLD


region
SKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPCNGVEG



FNCYFPLQSYGFQPTNGVGYQPYRVVVLSFELLHAPPKSCDKT




HTCPPCPAPELLGGPSVFLFPPKPKDTLMISRTPEVTCVVVDVSH





EDPEVKFNWYVDGVEVHNAKTKPREEQYNSTYRVVSVLTVLHQD





WLNGKEYKCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSRDE





LTKNQVSLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDG





SFFLYSKLTVDKSRWQQGNVFSCSVMHEALHNHYTQKSLSLSPGK






Optimized nucleotide
(SEQ ID NO: 69)


sequence encoding an S2
ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG


subunit of a SARS-CoV-
TGTTCCGTTGCTAGTCAGTCTATTATTGCCTATACCATGAGC


2 S protein, which has
CTCGGAGCTGAGAATAGCGTGGCCTACTCCAATAATTCCAT


been modified to replace
CGCAATCCCTACTAACTTCACTATTTCTGTGACCACCGAGA


residues 986 and 987
TCCTGCCTGTGTCTATGACTAAGACTAGCGTTGATTGTACC


with proline, with a
ATGTATATTTGTGGCGACTCTACCGAATGTTCTAACCTGCT


signal sequence
GCTTCAGTACGGCTCATTTTGCACACAGCTGAACAGAGCCC



TGACTGGGATCGCTGTGGAGCAGGACAAGAACACACAGGA



GGTGTTTGCACAGGTGAAGCAGATCTATAAGACCCCTCCTA



TTAAGGATTTCGGCGGATTCAATTTCTCACAGATTCTGCCA



GACCCCAGTAAGCCTTCCAAGAGGAGCTTCATCGAGGATCT



CCTGTTTAACAAGGTGACCCTGGCAGACGCCGGCTTTATTA



AGCAATATGGGGATTGCCTGGGCGACATTGCTGCCAGAGA



CCTGATTTGCGCCCAGAAATTCAATGGCCTCACAGTGCTGC



CACCTCTGCTGACCGACGAGATGATCGCTCAATACACTAGC



GCACTGCTGGCCGGAACCATCACATCAGGCTGGACCTTCGG



GGCCGGAGCAGCACTGCAGATTCCATTCGCCATGCAGATG



GCCTATAGATTCAACGGCATTGGCGTCACACAGAACGTGCT



GTACGAAAACCAGAAGCTCATCGCTAACCAGTTTAATTCCG



CAATTGGAAAGATCCAAGATTCACTCAGCTCAACCGCCTCT



GCACTCGGAAAGCTGCAGGACGTGGTCAACCAGAATGCTC



AGGCCCTGAACACACTCGTCAAGCAGCTGTCCTCTAACTTT



GGCGCTATCAGCTCCGTTCTGAACGACATTCTGAGCCGCCT



GGATCCCCCAGAGGCTGAAGTCCAGATTGACCGCCTGATTA



CCGGCCGGCTGCAGTCTCTGCAAACATACGTGACCCAGCA



GCTGATCAGAGCAGCCGAGATCCGGGCATCCGCAAATCTG



GCAGCAACTAAGATGAGCGAATGCGTGCTGGGCCAGTCCA



AGCGGGTGGACTTTTGTGGCAAGGGCTACCACCTGATGAG



CTTCCCCCAGAGCGCCCCACATGGCGTTGTTTTTCTGCACG



TGACCTATGTCCCTGCTCAGGAAAAGAACTTTACAACTGCT



CCTGCTATCTGCCATGACGGCAAGGCCCACTTCCCACGGGA



GGGAGTGTTTGTGTCCAATGGCACACACTGGTTCGTGACCC



AGAGGAACTTCTATGAACCCCAGATCATCACCACTGACAAT



ACCTTCGTGTCTGGAAATTGCGACGTCGTGATCGGCATCGT



TAACAACACCGTGTACGACCCTCTCCAGCCAGAGCTGGACT



CCTTTAAGGAGGAACTGGATAAGTATTTTAAGAACCACAC



AAGCCCAGATGTGGATCTCGGGGACATCTCCGGAATTAAC



GCCTCCGTGGTGAATATCCAGAAGGAGATTGACCGCCTAA



ATGAAGTTGCCAAGAACCTCAATGAGTCTCTGATTGATCTG



CAGGAACTGGGCAAGTATGAGCAGTATATCAAATGGCCCT



GGTACATTTGGCTGGGGTTTATCGCCGGACTGATTGCCATC



GTCATGGTGACCATCATGCTGTGTTGCATGACCTCCTGTTG



TTCCTGTCTGAAGGGCTGCTGTAGTTGCGGCTCTTGCTGTA



AATTCGACGAAGATGATAGCGAGCCCGTGCTGAAGGGCGT



GAAGCTGCATTATACCTGA





S2 subunit of a SARS-
(SEQ ID NO: 70)


CoV-2 S protein, which

MFVFLVLLPLVSSQCSVASQSIIAYTMSLGAENSVAYSNNSIAI



has been modified to
PTNFTISVTTEILPVSMTKTSVDCTMYICGDSTECSNLLLQYGS


replace residues 986 and
FCTQLNRALTGIAVEQDKNTQEVFAQVKQIYKTPPIKDFGGFN


987 with proline, with a
FSQILPDPSKPSKRSFIEDLLFNKVTLADAGFIKQYGDCLGDIA


signal sequence
ARDLICAQKFNGLTVLPPLLTDEMIAQYTSALLAGTITSGWTF



GAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKLIANQFNSA



IGKIQDSLSSTASALGKLQDVVNQNAQALNTLVKQLSSNFGAI



SSVLNDILSRLDPPEAEVQIDRLITGRLQSLQTYVTQQLIRAAEI



RASANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHG



VVFLHVTYVPAQEKNFTTAPAICHDGKAHFPREGVFVSNGTH



WFVTQRNFYEPQIITTDNTFVSGNCDVVIGIVNNTVYDPLQPE



LDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNE



VAKNLNESLIDLQELGKYEQYIKWPWYIWLGFIAGLIAIVMVT



IMLCCMTSCCSCLKGCCSCGSCCKFDEDDSEPVLKGVKLHYT





Optimized nucleotide
(SEQ ID NO: 75)


sequence encoding an
ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG


ectodomain of the S2
TGTTCCGTTGCTAGTCAGTCTATTATTGCCTATACCATGAGC


subunit of a SARS-CoV-
CTCGGAGCTGAGAATAGCGTGGCCTACTCCAATAATTCCAT


2 S protein with a signal
CGCAATCCCTACTAACTTCACTATTTCTGTGACCACCGAGA


sequence and a Foldon
TCCTGCCTGTGTCTATGACTAAGACTAGCGTTGATTGTACC



ATGTATATTTGTGGCGACTCTACCGAATGTTCTAACCTGCT



GCTTCAGTACGGCTCATTTTGCACACAGCTGAACAGAGCCC



TGACTGGGATCGCTGTGGAGCAGGACAAGAACACACAGGA



GGTGTTTGCACAGGTGAAGCAGATCTATAAGACCCCTCCTA



TTAAGGATTTCGGCGGATTCAATTTCTCACAGATTCTGCCA



GACCCCAGTAAGCCTTCCAAGAGGAGCTTCATCGAGGATCT



CCTGTTTAACAAGGTGACCCTGGCAGACGCCGGCTTTATTA



AGCAATATGGGGATTGCCTGGGCGACATTGCTGCCAGAGA



CCTGATTTGCGCCCAGAAATTCAATGGCCTCACAGTGCTGC



CACCTCTGCTGACCGACGAGATGATCGCTCAATACACTAGC



GCACTGCTGGCCGGAACCATCACATCAGGCTGGACCTTCGG



GGCCGGAGCAGCACTGCAGATTCCATTCGCCATGCAGATG



GCCTATAGATTCAACGGCATTGGCGTCACACAGAACGTGCT



GTACGAAAACCAGAAGCTCATCGCTAACCAGTTTAATTCCG



CAATTGGAAAGATCCAAGATTCACTCAGCTCAACCGCCTCT



GCACTCGGAAAGCTGCAGGACGTGGTCAACCAGAATGCTC



AGGCCCTGAACACACTCGTCAAGCAGCTGTCCTCTAACTTT



GGCGCTATCAGCTCCGTTCTGAACGACATTCTGAGCCGCCT



GGATAAGGTGGAGGCTGAAGTCCAGATTGACCGCCTGATT



ACCGGCCGGCTGCAGTCTCTGCAAACATACGTGACCCAGC



AGCTGATCAGAGCAGCCGAGATCCGGGCATCCGCAAATCT



GGCAGCAACTAAGATGAGCGAATGCGTGCTGGGCCAGTCC



AAGCGGGTGGACTTTTGTGGCAAGGGCTACCACCTGATGA



GCTTCCCCCAGAGCGCCCCACATGGCGTTGTTTTTCTGCAC



GTGACCTATGTCCCTGCTCAGGAAAAGAACTTTACAACTGC



TCCTGCTATCTGCCATGACGGCAAGGCCCACTTCCCACGGG



AGGGAGTGTTTGTGTCCAATGGCACACACTGGTTCGTGACC



CAGAGGAACTTCTATGAACCCCAGATCATCACCACTGACA



ATACCTTCGTGTCTGGAAATTGCGACGTCGTGATCGGCATC



GTTAACAACACCGTGTACGACCCTCTCCAGCCAGAGCTGGA



CTCCTTTAAGGAGGAACTGGATAAGTATTTTAAGAACCACA



CAAGCCCAGATGTGGATCTCGGGGACATCTCCGGAATTAA



CGCCTCCGTGGTGAATATCCAGAAGGAGATTGACCGCCTA



AATGAAGTTGCCAAGAACCTCAATGAGTCTCTGATTGATCT



GCAGGAACTGGGCAAGTATGAGCAGGGGTACATTCCCGAG



GCTCCTAGGGACGGCCAGGCATACGTGCGCAAAGACGGCG



AGTGGGTGCTGCTGTCCACATTCCTGTGA





S2 subunit of a SARS-
(SEQ ID NO: 76)


CoV-2 S protein with a

MFVFLVLLPLVSSQCSVASQSIIAYTMSLGAENSVAYSNNSIAI



signal sequence and a
PTNFTISVTTEILPVSMTKTSVDCTMYICGDSTECSNLLLQYGS


Foldon
FCTQLNRALTGIAVEQDKNTQEVFAQVKQIYKTPPIKDFGGFN



FSQILPDPSKPSKRSFIEDLLFNKVTLADAGFIKQYGDCLGDIA



ARDLICAQKFNGLTVLPPLLTDEMIAQYTSALLAGTITSGWTF



GAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKLIANQFNSA



IGKIQDSLSSTASALGKLQDVVNQNAQALNTLVKQLSSNFGAI



SSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAE



IRASANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHG



VVFLHVTYVPAQEKNFTTAPAICHDGKAHFPREGVFVSNGTH



WFVTQRNFYEPQIITTDNTFVSGNCDVVIGIVNNTVYDPLQPE



LDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNE



VAKNLNESLIDLQELGKYEQGYIPEAPRDGQAYVRKDGEWVLL




STFL






Optimized nucleotide
(SEQ ID NO: 77)


sequence encoding the
ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG


S2 subunit of a SARS-
TGTTCCGTTGCTAGTCAGTCTATTATTGCCTATACCATGAGC


CoV-2 S protein, which
CTCGGAGCTGAGAATAGCGTGGCCTACTCCAATAATTCCAT


has been modified to
CGCAATCCCTACTAACTTCACTATTTCTGTGACCACCGAGA


replace residues 986 and
TCCTGCCTGTGTCTATGACTAAGACTAGCGTTGATTGTACC


987 with proline, with a
ATGTATATTTGTGGCGACTCTACCGAATGTTCTAACCTGCT


signal sequence
GCTTCAGTACGGCTCATTTTGCACACAGCTGAACAGAGCCC



TGACTGGGATCGCTGTGGAGCAGGACAAGAACACACAGGA



GGTGTTTGCACAGGTGAAGCAGATCTATAAGACCCCTCCTA



TTAAGGATTTCGGCGGATTCAATTTCTCACAGATTCTGCCA



GACCCCAGTAAGCCTTCCAAGAGGAGCTTCATCGAGGATCT



CCTGTTTAACAAGGTGACCCTGGCAGACGCCGGCTTTATTA



AGCAATATGGGGATTGCCTGGGCGACATTGCTGCCAGAGA



CCTGATTTGCGCCCAGAAATTCAATGGCCTCACAGTGCTGC



CACCTCTGCTGACCGACGAGATGATCGCTCAATACACTAGC



GCACTGCTGGCCGGAACCATCACATCAGGCTGGACCTTCGG



GGCCGGAGCAGCACTGCAGATTCCATTCGCCATGCAGATG



GCCTATAGATTCAACGGCATTGGCGTCACACAGAACGTGCT



GTACGAAAACCAGAAGCTCATCGCTAACCAGTTTAATTCCG



CAATTGGAAAGATCCAAGATTCACTCAGCTCAACCGCCTCT



GCACTCGGAAAGCTGCAGGACGTGGTCAACCAGAATGCTC



AGGCCCTGAACACACTCGTCAAGCAGCTGTCCTCTAACTTT



GGCGCTATCAGCTCCGTTCTGAACGACATTCTGAGCCGCCT



GGATCCCCCAGAGGCTGAAGTCCAGATTGACCGCCTGATTA



CCGGCCGGCTGCAGTCTCTGCAAACATACGTGACCCAGCA



GCTGATCAGAGCAGCCGAGATCCGGGCATCCGCAAATCTG



GCAGCAACTAAGATGAGCGAATGCGTGCTGGGCCAGTCCA



AGCGGGTGGACTTTTGTGGCAAGGGCTACCACCTGATGAG



CTTCCCCCAGAGCGCCCCACATGGCGTTGTTTTTCTGCACG



TGACCTATGTCCCTGCTCAGGAAAAGAACTTTACAACTGCT



CCTGCTATCTGCCATGACGGCAAGGCCCACTTCCCACGGGA



GGGAGTGTTTGTGTCCAATGGCACACACTGGTTCGTGACCC



AGAGGAACTTCTATGAACCCCAGATCATCACCACTGACAAT



ACCTTCGTGTCTGGAAATTGCGACGTCGTGATCGGCATCGT



TAACAACACCGTGTACGACCCTCTCCAGCCAGAGCTGGACT



CCTTTAAGGAGGAACTGGATAAGTATTTTAAGAACCACAC



AAGCCCAGATGTGGATCTCGGGGACATCTCCGGAATTAAC



GCCTCCGTGGTGAATATCCAGAAGGAGATTGACCGCCTAA



ATGAAGTTGCCAAGAACCTCAATGAGTCTCTGATTGATCTG



CAGGAACTGGGCAAGTATGAGCAGTGA





S2 subunit of a SARS-
(SEQ ID NO: 78)


CoV-2 S protein, which

MFVFLVLLPLVSSQCSVASQSIIAYTMSLGAENSVAYSNNSIAI



has been modified to
PTNFTISVTTEILPVSMTKTSVDCTMYICGDSTECSNLLLQYGS


replace residues 986 and
FCTQLNRALTGIAVEQDKNTQEVFAQVKQIYKTPPIKDFGGFN


987 with proline
FSQILPDPSKPSKRSFIEDLLFNKVTLADAGFIKQYGDCLGDIA



ARDLICAQKFNGLTVLPPLLTDEMIAQYTSALLAGTITSGWTF



GAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKLIANQFNSA



IGKIQDSLSSTASALGKLQDVVNQNAQALNTLVKQLSSNFGAI



SSVLNDILSRLDPPEAEVQIDRLITGRLQSLQTYVTQQLIRAAEI



RASANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHG



VVFLHVTYVPAQEKNFTTAPAICHDGKAHFPREGVFVSNGTH



WFVTQRNFYEPQIITTDNTFVSGNCDVVIGIVNNTVYDPLQPE



LDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNE



VAKNLNESLIDLQELGKYEQ





Optimized nucleotide
(SEQ ID NO: 79)


sequence encoding S2
ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG


subunit of a SARS-CoV-
TGTTCCGTTGCTAGTCAGTCTATTATTGCCTATACCATGAGC


2 S protein with a signal
CTCGGAGCTGAGAATAGCGTGGCCTACTCCAATAATTCCAT


sequence, which has
CGCAATCCCTACTAACTTCACTATTTCTGTGACCACCGAGA


been modified to replace
TCCTGCCTGTGTCTATGACTAAGACTAGCGTTGATTGTACC


residues 986 and 987
ATGTATATTTGTGGCGACTCTACCGAATGTTCTAACCTGCT


with proline, and a
GCTTCAGTACGGCTCATTTTGCACACAGCTGAACAGAGCCC


Foldon
TGACTGGGATCGCTGTGGAGCAGGACAAGAACACACAGGA



GGTGTTTGCACAGGTGAAGCAGATCTATAAGACCCCTCCTA



TTAAGGATTTCGGCGGATTCAATTTCTCACAGATTCTGCCA



GACCCCAGTAAGCCTTCCAAGAGGAGCTTCATCGAGGATCT



CCTGTTTAACAAGGTGACCCTGGCAGACGCCGGCTTTATTA



AGCAATATGGGGATTGCCTGGGCGACATTGCTGCCAGAGA



CCTGATTTGCGCCCAGAAATTCAATGGCCTCACAGTGCTGC



CACCTCTGCTGACCGACGAGATGATCGCTCAATACACTAGC



GCACTGCTGGCCGGAACCATCACATCAGGCTGGACCTTCGG



GGCCGGAGCAGCACTGCAGATTCCATTCGCCATGCAGATG



GCCTATAGATTCAACGGCATTGGCGTCACACAGAACGTGCT



GTACGAAAACCAGAAGCTCATCGCTAACCAGTTTAATTCCG



CAATTGGAAAGATCCAAGATTCACTCAGCTCAACCGCCTCT



GCACTCGGAAAGCTGCAGGACGTGGTCAACCAGAATGCTC



AGGCCCTGAACACACTCGTCAAGCAGCTGTCCTCTAACTTT



GGCGCTATCAGCTCCGTTCTGAACGACATTCTGAGCCGCCT



GGATCCCCCAGAGGCTGAAGTCCAGATTGACCGCCTGATTA



CCGGCCGGCTGCAGTCTCTGCAAACATACGTGACCCAGCA



GCTGATCAGAGCAGCCGAGATCCGGGCATCCGCAAATCTG



GCAGCAACTAAGATGAGCGAATGCGTGCTGGGCCAGTCCA



AGCGGGTGGACTTTTGTGGCAAGGGCTACCACCTGATGAG



CTTCCCCCAGAGCGCCCCACATGGCGTTGTTTTTCTGCACG



TGACCTATGTCCCTGCTCAGGAAAAGAACTTTACAACTGCT



CCTGCTATCTGCCATGACGGCAAGGCCCACTTCCCACGGGA



GGGAGTGTTTGTGTCCAATGGCACACACTGGTTCGTGACCC



AGAGGAACTTCTATGAACCCCAGATCATCACCACTGACAAT



ACCTTCGTGTCTGGAAATTGCGACGTCGTGATCGGCATCGT



TAACAACACCGTGTACGACCCTCTCCAGCCAGAGCTGGACT



CCTTTAAGGAGGAACTGGATAAGTATTTTAAGAACCACAC



AAGCCCAGATGTGGATCTCGGGGACATCTCCGGAATTAAC



GCCTCCGTGGTGAATATCCAGAAGGAGATTGACCGCCTAA



ATGAAGTTGCCAAGAACCTCAATGAGTCTCTGATTGATCTG



CAGGAACTGGGCAAGTATGAGCAGGGGTACATTCCCGAGG



CTCCTAGGGACGGCCAGGCATACGTGCGCAAAGACGGCGA



GTGGGTGCTGCTGTCCACATTCCTGTGA





S2 subunit of a SARS-
(SEQ ID NO: 80)


CoV-2 S protein, which

MFVFLVLLPLVSSQCSVASQSIIAYTMSLGAENSVAYSNNSIAI



has been modified to
PTNFTISVTTEILPVSMTKTSVDCTMYICGDSTECSNLLLQYGS


replace residues 986 and
FCTQLNRALTGIAVEQDKNTQEVFAQVKQIYKTPPIKDFGGFN


987 with proline, with a
FSQILPDPSKPSKRSFIEDLLFNKVTLADAGFIKQYGDCLGDIA


signal sequence and a
ARDLICAQKFNGLTVLPPLLTDEMIAQYTSALLAGTITSGWTF


Foldon
GAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKLIANQFNSA



IGKIQDSLSSTASALGKLQDVVNQNAQALNTLVKQLSSNFGAI



SSVLNDILSRLDPPEAEVQIDRLITGRLQSLQTYVTQQLIRAAEI



RASANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHG



VVFLHVTYVPAQEKNFTTAPAICHDGKAHFPREGVFVSNGTH



WFVTQRNFYEPQIITTDNTFVSGNCDVVIGIVNNTVYDPLQPE



LDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNE



VAKNLNESLIDLQELGKYEQGYIPEAPRDGQAYVRKDGEWVLL




STFL






Optimized nucleotide
(SEQ ID NO: 81)


sequence encoding the
ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG


S2′ subunit of a SARS-
TGTAGCTTCATCGAGGATCTCCTGTTTAACAAGGTGACCCT


CoV-2 S protein, which
GGCAGACGCCGGCTTTATTAAGCAATATGGGGATTGCCTGG


has been modified to
GCGACATTGCTGCCAGAGACCTGATTTGCGCCCAGAAATTC


replace residues 986 and
AATGGCCTCACAGTGCTGCCACCTCTGCTGACCGACGAGAT


987 with proline, with a
GATCGCTCAATACACTAGCGCACTGCTGGCCGGAACCATCA


signal sequence
CATCAGGCTGGACCTTCGGGGCCGGAGCAGCACTGCAGAT



TCCATTCGCCATGCAGATGGCCTATAGATTCAACGGCATTG



GCGTCACACAGAACGTGCTGTACGAAAACCAGAAGCTCAT



CGCTAACCAGTTTAATTCCGCAATTGGAAAGATCCAAGATT



CACTCAGCTCAACCGCCTCTGCACTCGGAAAGCTGCAGGAC



GTGGTCAACCAGAATGCTCAGGCCCTGAACACACTCGTCA



AGCAGCTGTCCTCTAACTTTGGCGCTATCAGCTCCGTTCTG



AACGACATTCTGAGCCGCCTGGATCCCCCAGAGGCTGAAG



TCCAGATTGACCGCCTGATTACCGGCCGGCTGCAGTCTCTG



CAAACATACGTGACCCAGCAGCTGATCAGAGCAGCCGAGA



TCCGGGCATCCGCAAATCTGGCAGCAACTAAGATGAGCGA



ATGCGTGCTGGGCCAGTCCAAGCGGGTGGACTTTTGTGGCA



AGGGCTACCACCTGATGAGCTTCCCCCAGAGCGCCCCACAT



GGCGTTGTTTTTCTGCACGTGACCTATGTCCCTGCTCAGGA



AAAGAACTTTACAACTGCTCCTGCTATCTGCCATGACGGCA



AGGCCCACTTCCCACGGGAGGGAGTGTTTGTGTCCAATGGC



ACACACTGGTTCGTGACCCAGAGGAACTTCTATGAACCCCA



GATCATCACCACTGACAATACCTTCGTGTCTGGAAATTGCG



ACGTCGTGATCGGCATCGTTAACAACACCGTGTACGACCCT



CTCCAGCCAGAGCTGGACTCCTTTAAGGAGGAACTGGATA



AGTATTTTAAGAACCACACAAGCCCAGATGTGGATCTCGG



GGACATCTCCGGAATTAACGCCTCCGTGGTGAATATCCAGA



AGGAGATTGACCGCCTAAATGAAGTTGCCAAGAACCTCAA



TGAGTCTCTGATTGATCTGCAGGAACTGGGCAAGTATGAGC



AGTGA





S2′ subunit of a SARS-
(SEQ ID NO: 82)


CoV-2 S protein, which

MFVFLVLLPLVSSQCSFIEDLLFNKVTLADAGFIKQYGDCLGDI



has been modified to
AARDLICAQKFNGLTVLPPLLTDEMIAQYTSALLAGTITSGWT


replace residues 986 and
FGAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKLIANQFNS


987 with proline, with a
AIGKIQDSLSSTASALGKLQDVVNQNAQALNTLVKQLSSNFG


signal sequence
AISSVLNDILSRLDPPEAEVQIDRLITGRLQSLQTYVTQQLIRA



AEIRASANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAP



HGVVFLHVTYVPAQEKNFTTAPAICHDGKAHFPREGVFVSNG



THWFVTQRNFYEPQIITTDNTFVSGNCDVVIGIVNNTVYDPLQ



PELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRL



NEVAKNLNESLIDLQELGKYEQ





Optimized nucleotide
(SEQ ID NO: 83)


sequence encoding the
ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG


S2′ subunit of a SARS-
TGTAGCTTCATCGAGGATCTCCTGTTTAACAAGGTGACCCT


CoV-2 S protein, which
GGCAGACGCCGGCTTTATTAAGCAATATGGGGATTGCCTGG


has been modified to
GCGACATTGCTGCCAGAGACCTGATTTGCGCCCAGAAATTC


replace residues 986 and
AATGGCCTCACAGTGCTGCCACCTCTGCTGACCGACGAGAT


987 with proline, with a
GATCGCTCAATACACTAGCGCACTGCTGGCCGGAACCATCA


signal sequence and a
CATCAGGCTGGACCTTCGGGGCCGGAGCAGCACTGCAGAT


Foldon
TCCATTCGCCATGCAGATGGCCTATAGATTCAACGGCATTG



GCGTCACACAGAACGTGCTGTACGAAAACCAGAAGCTCAT



CGCTAACCAGTTTAATTCCGCAATTGGAAAGATCCAAGATT



CACTCAGCTCAACCGCCTCTGCACTCGGAAAGCTGCAGGAC



GTGGTCAACCAGAATGCTCAGGCCCTGAACACACTCGTCA



AGCAGCTGTCCTCTAACTTTGGCGCTATCAGCTCCGTTCTG



AACGACATTCTGAGCCGCCTGGATCCCCCAGAGGCTGAAG



TCCAGATTGACCGCCTGATTACCGGCCGGCTGCAGTCTCTG



CAAACATACGTGACCCAGCAGCTGATCAGAGCAGCCGAGA



TCCGGGCATCCGCAAATCTGGCAGCAACTAAGATGAGCGA



ATGCGTGCTGGGCCAGTCCAAGCGGGTGGACTTTTGTGGCA



AGGGCTACCACCTGATGAGCTTCCCCCAGAGCGCCCCACAT



GGCGTTGTTTTTCTGCACGTGACCTATGTCCCTGCTCAGGA



AAAGAACTTTACAACTGCTCCTGCTATCTGCCATGACGGCA



AGGCCCACTTCCCACGGGAGGGAGTGTTTGTGTCCAATGGC



ACACACTGGTTCGTGACCCAGAGGAACTTCTATGAACCCCA



GATCATCACCACTGACAATACCTTCGTGTCTGGAAATTGCG



ACGTCGTGATCGGCATCGTTAACAACACCGTGTACGACCCT



CTCCAGCCAGAGCTGGACTCCTTTAAGGAGGAACTGGATA



AGTATTTTAAGAACCACACAAGCCCAGATGTGGATCTCGG



GGACATCTCCGGAATTAACGCCTCCGTGGTGAATATCCAGA



AGGAGATTGACCGCCTAAATGAAGTTGCCAAGAACCTCAA



TGAGTCTCTGATTGATCTGCAGGAACTGGGCAAGTATGAGC



AGGGGTACATTCCCGAGGCTCCTAGGGACGGCCAGGCATA



CGTGCGCAAAGACGGCGAGTGGGTGCTGCTGTCCACATTCC



TGTGA





S2′ subunit of a SARS-
(SEQ ID NO: 84)


CoV-2 S protein, which

MFVFLVLLPLVSSQCSFIEDLLFNKVTLADAGFIKQYGDCLGDI



has been modified to
AARDLICAQKFNGLTVLPPLLTDEMIAQYTSALLAGTITSGWT


replace residues 986 and
FGAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKLIANQFNS


987 with proline, with a
AIGKIQDSLSSTASALGKLQDVVNQNAQALNTLVKQLSSNFG


signal sequence and a
AISSVLNDILSRLDPPEAEVQIDRLITGRLQSLQTYVTQQLIRA


Foldon
AEIRASANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAP



HGVVFLHVTYVPAQEKNFTTAPAICHDGKAHFPREGVFVSNG



THWFVTQRNFYEPQIITTDNTFVSGNCDVVIGIVNNTVYDPLQ



PELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRL



NEVAKNLNESLIDLQELGKYEQGYIPEAPRDGQAYVRKDGEWV




LLSTFL






Optimized nucleotide
(SEQ ID NO: 85)


sequence encoding the
ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG


full length S2′ subunit of
TGTAGCTTCATCGAGGATCTCCTGTTTAACAAGGTGACCCT


a SARS-CoV-2 S
GGCAGACGCCGGCTTTATTAAGCAATATGGGGATTGCCTGG


protein, which has been
GCGACATTGCTGCCAGAGACCTGATTTGCGCCCAGAAATTC


modified to replace
AATGGCCTCACAGTGCTGCCACCTCTGCTGACCGACGAGAT


residues 986 and 987
GATCGCTCAATACACTAGCGCACTGCTGGCCGGAACCATCA


with proline, with a
CATCAGGCTGGACCTTCGGGGCCGGAGCAGCACTGCAGAT


signal sequence
TCCATTCGCCATGCAGATGGCCTATAGATTCAACGGCATTG



GCGTCACACAGAACGTGCTGTACGAAAACCAGAAGCTCAT



CGCTAACCAGTTTAATTCCGCAATTGGAAAGATCCAAGATT



CACTCAGCTCAACCGCCTCTGCACTCGGAAAGCTGCAGGAC



GTGGTCAACCAGAATGCTCAGGCCCTGAACACACTCGTCA



AGCAGCTGTCCTCTAACTTTGGCGCTATCAGCTCCGTTCTG



AACGACATTCTGAGCCGCCTGGATCCCCCAGAGGCTGAAG



TCCAGATTGACCGCCTGATTACCGGCCGGCTGCAGTCTCTG



CAAACATACGTGACCCAGCAGCTGATCAGAGCAGCCGAGA



TCCGGGCATCCGCAAATCTGGCAGCAACTAAGATGAGCGA



ATGCGTGCTGGGCCAGTCCAAGCGGGTGGACTTTTGTGGCA



AGGGCTACCACCTGATGAGCTTCCCCCAGAGCGCCCCACAT



GGCGTTGTTTTTCTGCACGTGACCTATGTCCCTGCTCAGGA



AAAGAACTTTACAACTGCTCCTGCTATCTGCCATGACGGCA



AGGCCCACTTCCCACGGGAGGGAGTGTTTGTGTCCAATGGC



ACACACTGGTTCGTGACCCAGAGGAACTTCTATGAACCCCA



GATCATCACCACTGACAATACCTTCGTGTCTGGAAATTGCG



ACGTCGTGATCGGCATCGTTAACAACACCGTGTACGACCCT



CTCCAGCCAGAGCTGGACTCCTTTAAGGAGGAACTGGATA



AGTATTTTAAGAACCACACAAGCCCAGATGTGGATCTCGG



GGACATCTCCGGAATTAACGCCTCCGTGGTGAATATCCAGA



AGGAGATTGACCGCCTAAATGAAGTTGCCAAGAACCTCAA



TGAGTCTCTGATTGATCTGCAGGAACTGGGCAAGTATGAGC



AGTATATCAAATGGCCCTGGTACATTTGGCTGGGGTTTATC



GCCGGACTGATTGCCATCGTCATGGTGACCATCATGCTGTG



TTGCATGACCTCCTGTTGTTCCTGTCTGAAGGGCTGCTGTA



GTTGCGGCTCTTGCTGTAAATTCGACGAAGATGATAGCGAG



CCCGTGCTGAAGGGCGTGAAGCTGCATTATACCTGA





The full length S2′
(SEQ ID NO: 86)


subunit of a SARS-CoV-

MFVFLVLLPLVSSQCSFIEDLLFNKVTLADAGFIKQYGDCLGDI



2 S protein, which has
AARDLICAQKFNGLTVLPPLLTDEMIAQYTSALLAGTITSGWT


been modified to replace
FGAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKLIANQFNS


residues 986 and 987
AIGKIQDSLSSTASALGKLQDVVNQNAQALNTLVKQLSSNFG


with proline, with a
AISSVLNDILSRLDPPEAEVQIDRLITGRLQSLQTYVTQQLIRA


signal sequence
AEIRASANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAP



HGVVFLHVTYVPAQEKNFTTAPAICHDGKAHFPREGVFVSNG



THWFVTQRNFYEPQIITTDNTFVSGNCDVVIGIVNNTVYDPLQ



PELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRL



NEVAKNLNESLIDLQELGKYEQYIKWPWYIWLGFIAGLIAIVM



VTIMLCCMTSCCSCLKGCCSCGSCCKFDEDDSEPVLKGVKLH



YT





Optimized nucleotide
(SEQ ID NO: 87)


sequence encoding a
ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG


SARS-CoV-2 S protein
TGTGTCAACCTGACAACTAGGACTCAGCTGCCACCAGCCTA


which has been modified
CACCAACTCCTTCACCAGAGGCGTGTATTACCCAGACAAGG


to replace residue 985
TGTTTAGAAGCAGCGTGCTGCACTCTACCCAGGACCTCTTT


with proline
CTGCCCTTTTTCAGCAACGTGACATGGTTTCACGCAATTCA



CGTGTCCGGCACTAATGGCACAAAGCGGTTCGACAATCCA



GTCCTGCCTTTCAACGATGGCGTCTACTTTGCATCTACTGA



GAAATCCAATATCATTAGGGGATGGATCTTCGGCACAACCC



TGGATTCTAAGACCCAGAGCCTGCTGATCGTCAACAACGCC



ACAAACGTGGTCATTAAGGTTTGCGAGTTTCAGTTCTGTAA



CGATCCTTTTCTGGGCGTGTATTATCATAAGAACAATAAGA



GCTGGATGGAGTCCGAGTTTAGAGTGTATAGCTCTGCAAAT



AATTGTACCTTTGAGTACGTGAGCCAGCCCTTTCTGATGGA



CCTGGAGGGAAAACAAGGAAACTTCAAAAACCTGCGGGAA



TTCGTTTTCAAAAACATCGACGGCTATTTCAAGATCTATAG



CAAGCATACCCCAATCAACCTCGTGAGGGACCTCCCCCAG



GGCTTTAGCGCACTGGAGCCACTGGTTGACCTGCCTATCGG



CATTAATATCACAAGATTTCAGACCCTGCTGGCACTGCATA



GAAGCTATCTGACCCCTGGAGACTCCTCTAGTGGGTGGACT



GCCGGCGCCGCTGCCTACTATGTGGGCTATCTGCAGCCACG



GACATTCCTGCTGAAATACAATGAGAACGGGACAATCACA



GATGCTGTTGATTGCGCACTCGACCCCCTGTCCGAGACAAA



GTGCACTCTCAAGAGCTTTACCGTCGAGAAGGGCATCTATC



AGACCTCAAACTTCAGGGTGCAGCCCACAGAATCTATCGTG



CGCTTCCCTAATATCACTAACCTGTGTCCTTTCGGTGAAGT



GTTCAACGCCACCAGGTTTGCTAGCGTGTATGCCTGGAACA



GGAAGAGGATCTCTAACTGCGTCGCCGACTATTCCGTGCTG



TATAACAGCGCCTCCTTCTCCACATTCAAATGCTATGGAGT



GAGCCCGACAAAACTGAACGATCTCTGCTTTACAAATGTCT



ACGCCGACTCTTTTGTGATCAGAGGGGACGAGGTCCGGCA



GATCGCACCAGGACAGACAGGCAAGATTGCTGACTACAAC



TATAAGCTGCCTGACGACTTCACAGGATGTGTGATCGCATG



GAACTCAAACAATCTGGACTCCAAAGTCGGGGGCAACTAT



AATTACCTGTATCGCCTGTTCCGGAAGTCCAACCTGAAGCC



CTTCGAGAGGGACATCAGTACAGAGATCTATCAGGCTGGC



TCCACCCCTTGCAATGGCGTCGAAGGCTTTAATTGTTATTTT



CCCCTGCAGTCTTACGGGTTTCAGCCTACTAATGGAGTTGG



GTACCAGCCATACAGAGTGGTCGTGCTCAGCTTCGAGCTCC



TGCATGCTCCAGCTACAGTTTGCGGGCCAAAGAAGTCCACT



AACCTGGTGAAGAATAAGTGCGTCAACTTCAACTTTAACGG



GCTCACCGGCACCGGCGTGCTGACTGAGAGCAACAAGAAG



TTTCTGCCATTTCAACAGTTTGGACGGGACATTGCCGACAC



CACCGATGCCGTTCGGGATCCACAGACCCTGGAAATTCTGG



ACATTACACCGTGCAGCTTCGGGGGCGTGAGCGTGATCAC



ACCCGGAACCAATACAAGCAACCAGGTTGCCGTCCTGTATC



AGGATGTCAATTGCACAGAAGTGCCAGTTGCTATCCACGCA



GACCAGCTGACTCCCACATGGCGGGTGTATAGCACCGGAT



CCAACGTGTTTCAGACCCGCGCCGGATGTCTCATTGGGGCC



GAGCACGTGAATAACAGCTACGAGTGCGACATCCCCATTG



GCGCCGGCATTTGTGCGTCTTACCAGACTCAGACCAACTCT



CCTAGAAGGGCCAGGTCCGTTGCTAGTCAGTCTATTATTGC



CTATACCATGAGCCTCGGAGCTGAGAATAGCGTGGCCTACT



CCAATAATTCCATCGCAATCCCTACTAACTTCACTATTTCTG



TGACCACCGAGATCCTGCCTGTGTCTATGACTAAGACTAGC



GTTGATTGTACCATGTATATTTGTGGCGACTCTACCGAATG



TTCTAACCTGCTGCTTCAGTACGGCTCATTTTGCACACAGCT



GAACAGAGCCCTGACTGGGATCGCTGTGGAGCAGGACAAG



AACACACAGGAGGTGTTTGCACAGGTGAAGCAGATCTATA



AGACCCCTCCTATTAAGGATTTCGGCGGATTCAATTTCTCA



CAGATTCTGCCAGACCCCAGTAAGCCTTCCAAGAGGAGCTT



CATCGAGGATCTCCTGTTTAACAAGGTGACCCTGGCAGACG



CCGGCTTTATTAAGCAATATGGGGATTGCCTGGGCGACATT



GCTGCCAGAGACCTGATTTGCGCCCAGAAATTCAATGGCCT



CACAGTGCTGCCACCTCTGCTGACCGACGAGATGATCGCTC



AATACACTAGCGCACTGCTGGCCGGAACCATCACATCAGG



CTGGACCTTCGGGGCCGGAGCAGCACTGCAGATTCCATTCG



CCATGCAGATGGCCTATAGATTCAACGGCATTGGCGTCACA



CAGAACGTGCTGTACGAAAACCAGAAGCTCATCGCTAACC



AGTTTAATTCCGCAATTGGAAAGATCCAAGATTCACTCAGC



TCAACCGCCTCTGCACTCGGAAAGCTGCAGGACGTGGTCA



ACCAGAATGCTCAGGCCCTGAACACACTCGTCAAGCAGCT



GTCCTCTAACTTTGGCGCTATCAGCTCCGTTCTGAACGACA



TTCTGAGCCGCCTGCCCAAGGTGGAGGCTGAAGTCCAGATT



GACCGCCTGATTACCGGCCGGCTGCAGTCTCTGCAAACATA



CGTGACCCAGCAGCTGATCAGAGCAGCCGAGATCCGGGCA



TCCGCAAATCTGGCAGCAACTAAGATGAGCGAATGCGTGC



TGGGCCAGTCCAAGCGGGTGGACTTTTGTGGCAAGGGCTA



CCACCTGATGAGCTTCCCCCAGAGCGCCCCACATGGCGTTG



TTTTTCTGCACGTGACCTATGTCCCTGCTCAGGAAAAGAAC



TTTACAACTGCTCCTGCTATCTGCCATGACGGCAAGGCCCA



CTTCCCACGGGAGGGAGTGTTTGTGTCCAATGGCACACACT



GGTTCGTGACCCAGAGGAACTTCTATGAACCCCAGATCATC



ACCACTGACAATACCTTCGTGTCTGGAAATTGCGACGTCGT



GATCGGCATCGTTAACAACACCGTGTACGACCCTCTCCAGC



CAGAGCTGGACTCCTTTAAGGAGGAACTGGATAAGTATTTT



AAGAACCACACAAGCCCAGATGTGGATCTCGGGGACATCT



CCGGAATTAACGCCTCCGTGGTGAATATCCAGAAGGAGAT



TGACCGCCTAAATGAAGTTGCCAAGAACCTCAATGAGTCTC



TGATTGATCTGCAGGAACTGGGCAAGTATGAGCAGTATATC



AAATGGCCCTGGTACATTTGGCTGGGGTTTATCGCCGGACT



GATTGCCATCGTCATGGTGACCATCATGCTGTGTTGCATGA



CCTCCTGTTGTTCCTGTCTGAAGGGCTGCTGTAGTTGCGGCT



CTTGCTGTAAATTCGACGAAGATGATAGCGAGCCCGTGCTG



AAGGGCGTGAAGCTGCATTATACCTGA





A SARS-CoV-2 S
(SEQ ID NO: 88)


protein which has been
MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVF


modified to replace
RSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFDNPVLPF


residue 985 with
NDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKV


proline
CEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVS



QPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRD



LPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWT



AGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKC



TLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATR



FASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLN



DLCFTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTG



CVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQ



AGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFE



LLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKK



FLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTN



TSNQVAVLYQDVNCTEVPVAIHADQLTPTWRVYSTGSNVFQT



RAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPRRARSVA



SQSIIAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILPVSMTK



TSVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDK



NTQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSFIEDL



LFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPL



LTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQMAYRF



NGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQ



DVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLPKVEAEV



QIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVL



GQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNF



TTAPAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTD



NTFVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTS



PDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGK



YEQYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCC



SCGSCCKFDEDDSEPVLKGVKLHYT





Optimized nucleotide
(SEQ ID NO: 89)


sequence encoding a
ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG


SARS-CoV-2 S protein
TGTGTCAACCTGACAACTAGGACTCAGCTGCCACCAGCCTA


which has been modified
CACCAACTCCTTCACCAGAGGCGTGTATTACCCAGACAAGG


to remove a furin
TGTTTAGAAGCAGCGTGCTGCACTCTACCCAGGACCTCTTT


cleavage site and to
CTGCCCTTTTTCAGCAACGTGACATGGTTTCACGCAATTCA


replace residue 985
CGTGTCCGGCACTAATGGCACAAAGCGGTTCGACAATCCA


with proline
GTCCTGCCTTTCAACGATGGCGTCTACTTTGCATCTACTGA



GAAATCCAATATCATTAGGGGATGGATCTTCGGCACAACCC



TGGATTCTAAGACCCAGAGCCTGCTGATCGTCAACAACGCC



ACAAACGTGGTCATTAAGGTTTGCGAGTTTCAGTTCTGTAA



CGATCCTTTTCTGGGCGTGTATTATCATAAGAACAATAAGA



GCTGGATGGAGTCCGAGTTTAGAGTGTATAGCTCTGCAAAT



AATTGTACCTTTGAGTACGTGAGCCAGCCCTTTCTGATGGA



CCTGGAGGGAAAACAAGGAAACTTCAAAAACCTGCGGGAA



TTCGTTTTCAAAAACATCGACGGCTATTTCAAGATCTATAG



CAAGCATACCCCAATCAACCTCGTGAGGGACCTCCCCCAG



GGCTTTAGCGCACTGGAGCCACTGGTTGACCTGCCTATCGG



CATTAATATCACAAGATTTCAGACCCTGCTGGCACTGCATA



GAAGCTATCTGACCCCTGGAGACTCCTCTAGTGGGTGGACT



GCCGGCGCCGCTGCCTACTATGTGGGCTATCTGCAGCCACG



GACATTCCTGCTGAAATACAATGAGAACGGGACAATCACA



GATGCTGTTGATTGCGCACTCGACCCCCTGTCCGAGACAAA



GTGCACTCTCAAGAGCTTTACCGTCGAGAAGGGCATCTATC



AGACCTCAAACTTCAGGGTGCAGCCCACAGAATCTATCGTG



CGCTTCCCTAATATCACTAACCTGTGTCCTTTCGGTGAAGT



GTTCAACGCCACCAGGTTTGCTAGCGTGTATGCCTGGAACA



GGAAGAGGATCTCTAACTGCGTCGCCGACTATTCCGTGCTG



TATAACAGCGCCTCCTTCTCCACATTCAAATGCTATGGAGT



GAGCCCGACAAAACTGAACGATCTCTGCTTTACAAATGTCT



ACGCCGACTCTTTTGTGATCAGAGGGGACGAGGTCCGGCA



GATCGCACCAGGACAGACAGGCAAGATTGCTGACTACAAC



TATAAGCTGCCTGACGACTTCACAGGATGTGTGATCGCATG



GAACTCAAACAATCTGGACTCCAAAGTCGGGGGCAACTAT



AATTACCTGTATCGCCTGTTCCGGAAGTCCAACCTGAAGCC



CTTCGAGAGGGACATCAGTACAGAGATCTATCAGGCTGGC



TCCACCCCTTGCAATGGCGTCGAAGGCTTTAATTGTTATTTT



CCCCTGCAGTCTTACGGGTTTCAGCCTACTAATGGAGTTGG



GTACCAGCCATACAGAGTGGTCGTGCTCAGCTTCGAGCTCC



TGCATGCTCCAGCTACAGTTTGCGGGCCAAAGAAGTCCACT



AACCTGGTGAAGAATAAGTGCGTCAACTTCAACTTTAACGG



GCTCACCGGCACCGGCGTGCTGACTGAGAGCAACAAGAAG



TTTCTGCCATTTCAACAGTTTGGACGGGACATTGCCGACAC



CACCGATGCCGTTCGGGATCCACAGACCCTGGAAATTCTGG



ACATTACACCGTGCAGCTTCGGGGGCGTGAGCGTGATCAC



ACCCGGAACCAATACAAGCAACCAGGTTGCCGTCCTGTATC



AGGATGTCAATTGCACAGAAGTGCCAGTTGCTATCCACGCA



GACCAGCTGACTCCCACATGGCGGGTGTATAGCACCGGAT



CCAACGTGTTTCAGACCCGCGCCGGATGTCTCATTGGGGCC



GAGCACGTGAATAACAGCTACGAGTGCGACATCCCCATTG



GCGCCGGCATTTGTGCGTCTTACCAGACTCAGACCAACTCT



CCTGGCTCCGCCTCTTCCGTTGCTAGTCAGTCTATTATTGCC



TATACCATGAGCCTCGGAGCTGAGAATAGCGTGGCCTACTC



CAATAATTCCATCGCAATCCCTACTAACTTCACTATTTCTGT



GACCACCGAGATCCTGCCTGTGTCTATGACTAAGACTAGCG



TTGATTGTACCATGTATATTTGTGGCGACTCTACCGAATGTT



CTAACCTGCTGCTTCAGTACGGCTCATTTTGCACACAGCTG



AACAGAGCCCTGACTGGGATCGCTGTGGAGCAGGACAAGA



ACACACAGGAGGTGTTTGCACAGGTGAAGCAGATCTATAA



GACCCCTCCTATTAAGGATTTCGGCGGATTCAATTTCTCAC



AGATTCTGCCAGACCCCAGTAAGCCTTCCAAGAGGAGCTTC



ATCGAGGATCTCCTGTTTAACAAGGTGACCCTGGCAGACGC



CGGCTTTATTAAGCAATATGGGGATTGCCTGGGCGACATTG



CTGCCAGAGACCTGATTTGCGCCCAGAAATTCAATGGCCTC



ACAGTGCTGCCACCTCTGCTGACCGACGAGATGATCGCTCA



ATACACTAGCGCACTGCTGGCCGGAACCATCACATCAGGCT



GGACCTTCGGGGCCGGAGCAGCACTGCAGATTCCATTCGCC



ATGCAGATGGCCTATAGATTCAACGGCATTGGCGTCACACA



GAACGTGCTGTACGAAAACCAGAAGCTCATCGCTAACCAG



TTTAATTCCGCAATTGGAAAGATCCAAGATTCACTCAGCTC



AACCGCCTCTGCACTCGGAAAGCTGCAGGACGTGGTCAAC



CAGAATGCTCAGGCCCTGAACACACTCGTCAAGCAGCTGTC



CTCTAACTTTGGCGCTATCAGCTCCGTTCTGAACGACATTCT



GAGCCGCCTGCCCAAGGTGGAGGCTGAAGTCCAGATTGAC



CGCCTGATTACCGGCCGGCTGCAGTCTCTGCAAACATACGT



GACCCAGCAGCTGATCAGAGCAGCCGAGATCCGGGCATCC



GCAAATCTGGCAGCAACTAAGATGAGCGAATGCGTGCTGG



GCCAGTCCAAGCGGGTGGACTTTTGTGGCAAGGGCTACCA



CCTGATGAGCTTCCCCCAGAGCGCCCCACATGGCGTTGTTT



TTCTGCACGTGACCTATGTCCCTGCTCAGGAAAAGAACTTT



ACAACTGCTCCTGCTATCTGCCATGACGGCAAGGCCCACTT



CCCACGGGAGGGAGTGTTTGTGTCCAATGGCACACACTGGT



TCGTGACCCAGAGGAACTTCTATGAACCCCAGATCATCACC



ACTGACAATACCTTCGTGTCTGGAAATTGCGACGTCGTGAT



CGGCATCGTTAACAACACCGTGTACGACCCTCTCCAGCCAG



AGCTGGACTCCTTTAAGGAGGAACTGGATAAGTATTTTAAG



AACCACACAAGCCCAGATGTGGATCTCGGGGACATCTCCG



GAATTAACGCCTCCGTGGTGAATATCCAGAAGGAGATTGA



CCGCCTAAATGAAGTTGCCAAGAACCTCAATGAGTCTCTGA



TTGATCTGCAGGAACTGGGCAAGTATGAGCAGTATATCAA



ATGGCCCTGGTACATTTGGCTGGGGTTTATCGCCGGACTGA



TTGCCATCGTCATGGTGACCATCATGCTGTGTTGCATGACC



TCCTGTTGTTCCTGTCTGAAGGGCTGCTGTAGTTGCGGCTCT



TGCTGTAAATTCGACGAAGATGATAGCGAGCCCGTGCTGA



AGGGCGTGAAGCTGCATTATACCTGA





A SARS-CoV-2 S
(SEQ ID NO: 90)


protein which has been
MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVF


modified to remove a
RSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFDNPVLPF


furin cleavage site and
NDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKV


to replace residue 985
CEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVS


with proline
QPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRD



LPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWT



AGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKC



TLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATR



FASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLN



DLCFTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTG



CVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQ



AGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFE



LLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKK



FLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTN



TSNQVAVLYQDVNCTEVPVAIHADQLTPTWRVYSTGSNVFQT



RAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPGSASSVAS



QSIIAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILPVSMTKT



SVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDK



NTQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSFIEDL



LFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPL



LTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQMAYRF



NGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQ



DVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLPKVEAEV



QIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVL



GQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNF



TTAPAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTD



NTFVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTS



PDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGK



YEQYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCC



SCGSCCKFDEDDSEPVLKGVKLHYT





Optimized nucleotide
(SEQ ID NO: 91)


sequence encoding a
ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG


SARS-CoV-2 S protein
TGTGTCAACCTGACAACTAGGACTCAGCTGCCACCAGCCTA


which has been modified
CACCAACTCCTTCACCAGAGGCGTGTATTACCCAGACAAGG


to replace residues 985,
TGTTTAGAAGCAGCGTGCTGCACTCTACCCAGGACCTCTTT


986 and 987 with proline
CTGCCCTTTTTCAGCAACGTGACATGGTTTCACGCAATTCA



CGTGTCCGGCACTAATGGCACAAAGCGGTTCGACAATCCA



GTCCTGCCTTTCAACGATGGCGTCTACTTTGCATCTACTGA



GAAATCCAATATCATTAGGGGATGGATCTTCGGCACAACCC



TGGATTCTAAGACCCAGAGCCTGCTGATCGTCAACAACGCC



ACAAACGTGGTCATTAAGGTTTGCGAGTTTCAGTTCTGTAA



CGATCCTTTTCTGGGCGTGTATTATCATAAGAACAATAAGA



GCTGGATGGAGTCCGAGTTTAGAGTGTATAGCTCTGCAAAT



AATTGTACCTTTGAGTACGTGAGCCAGCCCTTTCTGATGGA



CCTGGAGGGAAAACAAGGAAACTTCAAAAACCTGCGGGAA



TTCGTTTTCAAAAACATCGACGGCTATTTCAAGATCTATAG



CAAGCATACCCCAATCAACCTCGTGAGGGACCTCCCCCAG



GGCTTTAGCGCACTGGAGCCACTGGTTGACCTGCCTATCGG



CATTAATATCACAAGATTTCAGACCCTGCTGGCACTGCATA



GAAGCTATCTGACCCCTGGAGACTCCTCTAGTGGGTGGACT



GCCGGCGCCGCTGCCTACTATGTGGGCTATCTGCAGCCACG



GACATTCCTGCTGAAATACAATGAGAACGGGACAATCACA



GATGCTGTTGATTGCGCACTCGACCCCCTGTCCGAGACAAA



GTGCACTCTCAAGAGCTTTACCGTCGAGAAGGGCATCTATC



AGACCTCAAACTTCAGGGTGCAGCCCACAGAATCTATCGTG



CGCTTCCCTAATATCACTAACCTGTGTCCTTTCGGTGAAGT



GTTCAACGCCACCAGGTTTGCTAGCGTGTATGCCTGGAACA



GGAAGAGGATCTCTAACTGCGTCGCCGACTATTCCGTGCTG



TATAACAGCGCCTCCTTCTCCACATTCAAATGCTATGGAGT



GAGCCCGACAAAACTGAACGATCTCTGCTTTACAAATGTCT



ACGCCGACTCTTTTGTGATCAGAGGGGACGAGGTCCGGCA



GATCGCACCAGGACAGACAGGCAAGATTGCTGACTACAAC



TATAAGCTGCCTGACGACTTCACAGGATGTGTGATCGCATG



GAACTCAAACAATCTGGACTCCAAAGTCGGGGGCAACTAT



AATTACCTGTATCGCCTGTTCCGGAAGTCCAACCTGAAGCC



CTTCGAGAGGGACATCAGTACAGAGATCTATCAGGCTGGC



TCCACCCCTTGCAATGGCGTCGAAGGCTTTAATTGTTATTTT



CCCCTGCAGTCTTACGGGTTTCAGCCTACTAATGGAGTTGG



GTACCAGCCATACAGAGTGGTCGTGCTCAGCTTCGAGCTCC



TGCATGCTCCAGCTACAGTTTGCGGGCCAAAGAAGTCCACT



AACCTGGTGAAGAATAAGTGCGTCAACTTCAACTTTAACGG



GCTCACCGGCACCGGCGTGCTGACTGAGAGCAACAAGAAG



TTTCTGCCATTTCAACAGTTTGGACGGGACATTGCCGACAC



CACCGATGCCGTTCGGGATCCACAGACCCTGGAAATTCTGG



ACATTACACCGTGCAGCTTCGGGGGCGTGAGCGTGATCAC



ACCCGGAACCAATACAAGCAACCAGGTTGCCGTCCTGTATC



AGGATGTCAATTGCACAGAAGTGCCAGTTGCTATCCACGCA



GACCAGCTGACTCCCACATGGCGGGTGTATAGCACCGGAT



CCAACGTGTTTCAGACCCGCGCCGGATGTCTCATTGGGGCC



GAGCACGTGAATAACAGCTACGAGTGCGACATCCCCATTG



GCGCCGGCATTTGTGCGTCTTACCAGACTCAGACCAACTCT



CCTAGAAGGGCCAGGTCCGTTGCTAGTCAGTCTATTATTGC



CTATACCATGAGCCTCGGAGCTGAGAATAGCGTGGCCTACT



CCAATAATTCCATCGCAATCCCTACTAACTTCACTATTTCTG



TGACCACCGAGATCCTGCCTGTGTCTATGACTAAGACTAGC



GTTGATTGTACCATGTATATTTGTGGCGACTCTACCGAATG



TTCTAACCTGCTGCTTCAGTACGGCTCATTTTGCACACAGCT



GAACAGAGCCCTGACTGGGATCGCTGTGGAGCAGGACAAG



AACACACAGGAGGTGTTTGCACAGGTGAAGCAGATCTATA



AGACCCCTCCTATTAAGGATTTCGGCGGATTCAATTTCTCA



CAGATTCTGCCAGACCCCAGTAAGCCTTCCAAGAGGAGCTT



CATCGAGGATCTCCTGTTTAACAAGGTGACCCTGGCAGACG



CCGGCTTTATTAAGCAATATGGGGATTGCCTGGGCGACATT



GCTGCCAGAGACCTGATTTGCGCCCAGAAATTCAATGGCCT



CACAGTGCTGCCACCTCTGCTGACCGACGAGATGATCGCTC



AATACACTAGCGCACTGCTGGCCGGAACCATCACATCAGG



CTGGACCTTCGGGGCCGGAGCAGCACTGCAGATTCCATTCG



CCATGCAGATGGCCTATAGATTCAACGGCATTGGCGTCACA



CAGAACGTGCTGTACGAAAACCAGAAGCTCATCGCTAACC



AGTTTAATTCCGCAATTGGAAAGATCCAAGATTCACTCAGC



TCAACCGCCTCTGCACTCGGAAAGCTGCAGGACGTGGTCA



ACCAGAATGCTCAGGCCCTGAACACACTCGTCAAGCAGCT



GTCCTCTAACTTTGGCGCTATCAGCTCCGTTCTGAACGACA



TTCTGAGCCGCCTGCCTCCACCCGAGGCTGAAGTCCAGATT



GACCGCCTGATTACCGGCCGGCTGCAGTCTCTGCAAACATA



CGTGACCCAGCAGCTGATCAGAGCAGCCGAGATCCGGGCA



TCCGCAAATCTGGCAGCAACTAAGATGAGCGAATGCGTGC



TGGGCCAGTCCAAGCGGGTGGACTTTTGTGGCAAGGGCTA



CCACCTGATGAGCTTCCCCCAGAGCGCCCCACATGGCGTTG



TTTTTCTGCACGTGACCTATGTCCCTGCTCAGGAAAAGAAC



TTTACAACTGCTCCTGCTATCTGCCATGACGGCAAGGCCCA



CTTCCCACGGGAGGGAGTGTTTGTGTCCAATGGCACACACT



GGTTCGTGACCCAGAGGAACTTCTATGAACCCCAGATCATC



ACCACTGACAATACCTTCGTGTCTGGAAATTGCGACGTCGT



GATCGGCATCGTTAACAACACCGTGTACGACCCTCTCCAGC



CAGAGCTGGACTCCTTTAAGGAGGAACTGGATAAGTATTTT



AAGAACCACACAAGCCCAGATGTGGATCTCGGGGACATCT



CCGGAATTAACGCCTCCGTGGTGAATATCCAGAAGGAGAT



TGACCGCCTAAATGAAGTTGCCAAGAACCTCAATGAGTCTC



TGATTGATCTGCAGGAACTGGGCAAGTATGAGCAGTATATC



AAATGGCCCTGGTACATTTGGCTGGGGTTTATCGCCGGACT



GATTGCCATCGTCATGGTGACCATCATGCTGTGTTGCATGA



CCTCCTGTTGTTCCTGTCTGAAGGGCTGCTGTAGTTGCGGCT



CTTGCTGTAAATTCGACGAAGATGATAGCGAGCCCGTGCTG



AAGGGCGTGAAGCTGCATTATACCTGA





A SARS-CoV-2 S
(SEQ ID NO: 92)


protein which has been
MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVF


modified to replace
RSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFDNPVLPF


residues 985, 986 and
NDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKV


987 with proline
CEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVS



QPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRD



LPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWT



AGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKC



TLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATR



FASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLN



DLCFTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTG



CVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQ



AGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFE



LLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKK



FLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTN



TSNQVAVLYQDVNCTEVPVAIHADQLTPTWRVYSTGSNVFQT



RAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPRRARSVA



SQSIIAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILPVSMTK



TSVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDK



NTQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSFIEDL



LFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPL



LTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQMAYRF



NGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQ



DVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLPPPEAEVQ



IDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLG



QSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTT



APAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTDNT



FVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTSPD



VDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYE



QYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCCSC



GSCCKFDEDDSEPVLKGVKLHYT





Optimized nucleotide sequence
(SEQ ID NO: 93)


encoding a SARS-CoV-2
ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG


S protein sequence
TGTGTCAACCTGACAACTAGGACTCAGCTGCCACCAGCCTA


which has been modified
CACCAACTCCTTCACCAGAGGCGTGTATTACCCAGACAAGG


to remove a furin
TGTTTAGAAGCAGCGTGCTGCACTCTACCCAGGACCTCTTT


cleavage site and to
CTGCCCTTTTTCAGCAACGTGACATGGTTTCACGCAATTCA


replace residues 985,
CGTGTCCGGCACTAATGGCACAAAGCGGTTCGACAATCCA


986 and 987 with proline
GTCCTGCCTTTCAACGATGGCGTCTACTTTGCATCTACTGA



GAAATCCAATATCATTAGGGGATGGATCTTCGGCACAACCC



TGGATTCTAAGACCCAGAGCCTGCTGATCGTCAACAACGCC



ACAAACGTGGTCATTAAGGTTTGCGAGTTTCAGTTCTGTAA



CGATCCTTTTCTGGGCGTGTATTATCATAAGAACAATAAGA



GCTGGATGGAGTCCGAGTTTAGAGTGTATAGCTCTGCAAAT



AATTGTACCTTTGAGTACGTGAGCCAGCCCTTTCTGATGGA



CCTGGAGGGAAAACAAGGAAACTTCAAAAACCTGCGGGAA



TTCGTTTTCAAAAACATCGACGGCTATTTCAAGATCTATAG



CAAGCATACCCCAATCAACCTCGTGAGGGACCTCCCCCAG



GGCTTTAGCGCACTGGAGCCACTGGTTGACCTGCCTATCGG



CATTAATATCACAAGATTTCAGACCCTGCTGGCACTGCATA



GAAGCTATCTGACCCCTGGAGACTCCTCTAGTGGGTGGACT



GCCGGCGCCGCTGCCTACTATGTGGGCTATCTGCAGCCACG



GACATTCCTGCTGAAATACAATGAGAACGGGACAATCACA



GATGCTGTTGATTGCGCACTCGACCCCCTGTCCGAGACAAA



GTGCACTCTCAAGAGCTTTACCGTCGAGAAGGGCATCTATC



AGACCTCAAACTTCAGGGTGCAGCCCACAGAATCTATCGTG



CGCTTCCCTAATATCACTAACCTGTGTCCTTTCGGTGAAGT



GTTCAACGCCACCAGGTTTGCTAGCGTGTATGCCTGGAACA



GGAAGAGGATCTCTAACTGCGTCGCCGACTATTCCGTGCTG



TATAACAGCGCCTCCTTCTCCACATTCAAATGCTATGGAGT



GAGCCCGACAAAACTGAACGATCTCTGCTTTACAAATGTCT



ACGCCGACTCTTTTGTGATCAGAGGGGACGAGGTCCGGCA



GATCGCACCAGGACAGACAGGCAAGATTGCTGACTACAAC



TATAAGCTGCCTGACGACTTCACAGGATGTGTGATCGCATG



GAACTCAAACAATCTGGACTCCAAAGTCGGGGGCAACTAT



AATTACCTGTATCGCCTGTTCCGGAAGTCCAACCTGAAGCC



CTTCGAGAGGGACATCAGTACAGAGATCTATCAGGCTGGC



TCCACCCCTTGCAATGGCGTCGAAGGCTTTAATTGTTATTTT



CCCCTGCAGTCTTACGGGTTTCAGCCTACTAATGGAGTTGG



GTACCAGCCATACAGAGTGGTCGTGCTCAGCTTCGAGCTCC



TGCATGCTCCAGCTACAGTTTGCGGGCCAAAGAAGTCCACT



AACCTGGTGAAGAATAAGTGCGTCAACTTCAACTTTAACGG



GCTCACCGGCACCGGCGTGCTGACTGAGAGCAACAAGAAG



TTTCTGCCATTTCAACAGTTTGGACGGGACATTGCCGACAC



CACCGATGCCGTTCGGGATCCACAGACCCTGGAAATTCTGG



ACATTACACCGTGCAGCTTCGGGGGCGTGAGCGTGATCAC



ACCCGGAACCAATACAAGCAACCAGGTTGCCGTCCTGTATC



AGGATGTCAATTGCACAGAAGTGCCAGTTGCTATCCACGCA



GACCAGCTGACTCCCACATGGCGGGTGTATAGCACCGGAT



CCAACGTGTTTCAGACCCGCGCCGGATGTCTCATTGGGGCC



GAGCACGTGAATAACAGCTACGAGTGCGACATCCCCATTG



GCGCCGGCATTTGTGCGTCTTACCAGACTCAGACCAACTCT



CCTGGCTCCGCCTCTTCCGTTGCTAGTCAGTCTATTATTGCC



TATACCATGAGCCTCGGAGCTGAGAATAGCGTGGCCTACTC



CAATAATTCCATCGCAATCCCTACTAACTTCACTATTTCTGT



GACCACCGAGATCCTGCCTGTGTCTATGACTAAGACTAGCG



TTGATTGTACCATGTATATTTGTGGCGACTCTACCGAATGTT



CTAACCTGCTGCTTCAGTACGGCTCATTTTGCACACAGCTG



AACAGAGCCCTGACTGGGATCGCTGTGGAGCAGGACAAGA



ACACACAGGAGGTGTTTGCACAGGTGAAGCAGATCTATAA



GACCCCTCCTATTAAGGATTTCGGCGGATTCAATTTCTCAC



AGATTCTGCCAGACCCCAGTAAGCCTTCCAAGAGGAGCTTC



ATCGAGGATCTCCTGTTTAACAAGGTGACCCTGGCAGACGC



CGGCTTTATTAAGCAATATGGGGATTGCCTGGGCGACATTG



CTGCCAGAGACCTGATTTGCGCCCAGAAATTCAATGGCCTC



ACAGTGCTGCCACCTCTGCTGACCGACGAGATGATCGCTCA



ATACACTAGCGCACTGCTGGCCGGAACCATCACATCAGGCT



GGACCTTCGGGGCCGGAGCAGCACTGCAGATTCCATTCGCC



ATGCAGATGGCCTATAGATTCAACGGCATTGGCGTCACACA



GAACGTGCTGTACGAAAACCAGAAGCTCATCGCTAACCAG



TTTAATTCCGCAATTGGAAAGATCCAAGATTCACTCAGCTC



AACCGCCTCTGCACTCGGAAAGCTGCAGGACGTGGTCAAC



CAGAATGCTCAGGCCCTGAACACACTCGTCAAGCAGCTGTC



CTCTAACTTTGGCGCTATCAGCTCCGTTCTGAACGACATTCT



GAGCCGCCTGCCTCCACCCGAGGCTGAAGTCCAGATTGACC



GCCTGATTACCGGCCGGCTGCAGTCTCTGCAAACATACGTG



ACCCAGCAGCTGATCAGAGCAGCCGAGATCCGGGCATCCG



CAAATCTGGCAGCAACTAAGATGAGCGAATGCGTGCTGGG



CCAGTCCAAGCGGGTGGACTTTTGTGGCAAGGGCTACCACC



TGATGAGCTTCCCCCAGAGCGCCCCACATGGCGTTGTTTTT



CTGCACGTGACCTATGTCCCTGCTCAGGAAAAGAACTTTAC



AACTGCTCCTGCTATCTGCCATGACGGCAAGGCCCACTTCC



CACGGGAGGGAGTGTTTGTGTCCAATGGCACACACTGGTTC



GTGACCCAGAGGAACTTCTATGAACCCCAGATCATCACCAC



TGACAATACCTTCGTGTCTGGAAATTGCGACGTCGTGATCG



GCATCGTTAACAACACCGTGTACGACCCTCTCCAGCCAGAG



CTGGACTCCTTTAAGGAGGAACTGGATAAGTATTTTAAGAA



CCACACAAGCCCAGATGTGGATCTCGGGGACATCTCCGGA



ATTAACGCCTCCGTGGTGAATATCCAGAAGGAGATTGACC



GCCTAAATGAAGTTGCCAAGAACCTCAATGAGTCTCTGATT



GATCTGCAGGAACTGGGCAAGTATGAGCAGTATATCAAAT



GGCCCTGGTACATTTGGCTGGGGTTTATCGCCGGACTGATT



GCCATCGTCATGGTGACCATCATGCTGTGTTGCATGACCTC



CTGTTGTTCCTGTCTGAAGGGCTGCTGTAGTTGCGGCTCTTG



CTGTAAATTCGACGAAGATGATAGCGAGCCCGTGCTGAAG



GGCGTGAAGCTGCATTATACCTGA





A SARS-CoV-2 S
(SEQ ID NO: 94)


protein which has been
MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVF


modified to remove a
RSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFDNPVLPF


furin cleavage site and
NDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKV


to replace residues 985,
CEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVS


986 and 987 with proline
QPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRD



LPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWT



AGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKC



TLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATR



FASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLN



DLCFTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTG



CVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQ



AGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFE



LLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKK



FLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTN



TSNQVAVLYQDVNCTEVPVAIHADQLTPTWRVYSTGSNVFQT



RAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPGSASSVAS



QSIIAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILPVSMTKT



SVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDK



NTQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSFIEDL



LFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPL



LTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQMAYRF



NGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQ



DVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLPPPEAEVQ



IDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLG



QSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTT



APAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTDNT



FVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTSPD



VDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYE



QYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCCSC



GSCCKFDEDDSEPVLKGVKLHYT





Optimized nucleotide
(SEQ ID NO: 95)


sequence encoding a
ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG


SARS-CoV-2 S2′
TGTAGCTTCATCGAGGATCTCCTGTTTAACAAGGTGACCCT


subunit protein
GGCAGACGCCGGCTTTATTAAGCAATATGGGGATTGCCTGG


sequence with a signal
GCGACATTGCTGCCAGAGACCTGATTTGCGCCCAGAAATTC


sequence
AATGGCCTCACAGTGCTGCCACCTCTGCTGACCGACGAGAT



GATCGCTCAATACACTAGCGCACTGCTGGCCGGAACCATCA



CATCAGGCTGGACCTTCGGGGCCGGAGCAGCACTGCAGAT



TCCATTCGCCATGCAGATGGCCTATAGATTCAACGGCATTG



GCGTCACACAGAACGTGCTGTACGAAAACCAGAAGCTCAT



CGCTAACCAGTTTAATTCCGCAATTGGAAAGATCCAAGATT



CACTCAGCTCAACCGCCTCTGCACTCGGAAAGCTGCAGGAC



GTGGTCAACCAGAATGCTCAGGCCCTGAACACACTCGTCA



AGCAGCTGTCCTCTAACTTTGGCGCTATCAGCTCCGTTCTG



AACGACATTCTGAGCCGCCTGGATAAGGTGGAGGCTGAAG



TCCAGATTGACCGCCTGATTACCGGCCGGCTGCAGTCTCTG



CAAACATACGTGACCCAGCAGCTGATCAGAGCAGCCGAGA



TCCGGGCATCCGCAAATCTGGCAGCAACTAAGATGAGCGA



ATGCGTGCTGGGCCAGTCCAAGCGGGTGGACTTTTGTGGCA



AGGGCTACCACCTGATGAGCTTCCCCCAGAGCGCCCCACAT



GGCGTTGTTTTTCTGCACGTGACCTATGTCCCTGCTCAGGA



AAAGAACTTTACAACTGCTCCTGCTATCTGCCATGACGGCA



AGGCCCACTTCCCACGGGAGGGAGTGTTTGTGTCCAATGGC



ACACACTGGTTCGTGACCCAGAGGAACTTCTATGAACCCCA



GATCATCACCACTGACAATACCTTCGTGTCTGGAAATTGCG



ACGTCGTGATCGGCATCGTTAACAACACCGTGTACGACCCT



CTCCAGCCAGAGCTGGACTCCTTTAAGGAGGAACTGGATA



AGTATTTTAAGAACCACACAAGCCCAGATGTGGATCTCGG



GGACATCTCCGGAATTAACGCCTCCGTGGTGAATATCCAGA



AGGAGATTGACCGCCTAAATGAAGTTGCCAAGAACCTCAA



TGAGTCTCTGATTGATCTGCAGGAACTGGGCAAGTATGAGC



AGTATATCAAATGGCCCTGGTACATTTGGCTGGGGTTTATC



GCCGGACTGATTGCCATCGTCATGGTGACCATCATGCTGTG



TTGCATGACCTCCTGTTGTTCCTGTCTGAAGGGCTGCTGTA



GTTGCGGCTCTTGCTGTAAATTCGACGAAGATGATAGCGAG



CCCGTGCTGAAGGGCGTGAAGCTGCATTATACCTGA





Full length SARS-CoV-2
(SEQ ID NO: 96)


S2′ subunit of a SARS-

MFVFLVLLPLVSSQCSFIEDLLFNKVTLADAGFIKQYGDCLGDI



CoV-2 S protein with a
AARDLICAQKFNGLTVLPPLLTDEMIAQYTSALLAGTITSGWT


signal sequence
FGAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKLIANQFNS



AIGKIQDSLSSTASALGKLQDVVNQNAQALNTLVKQLSSNFG



AISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRA



AEIRASANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAP



HGVVFLHVTYVPAQEKNFTTAPAICHDGKAHFPREGVFVSNG



THWFVTQRNFYEPQIITTDNTFVSGNCDVVIGIVNNTVYDPLQ



PELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRL



NEVAKNLNESLIDLQELGKYEQYIKWPWYIWLGFIAGLIAIVM



VTIMLCCMTSCCSCLKGCCSCGSCCKFDEDDSEPVLKGVKLH



YT





Optimized nucleotide
(SEQ ID NO: 105)


sequence encoding a
ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG


receptor-binding
TGTCCTAATATCACTAACCTGTGTCCTTTCGGTGAAGTGTTC


domain of SARS-CoV-2
AACGCCACCAGGTTTGCTAGCGTGTATGCCTGGAACAGGA


S protein with a signal
AGAGGATCTCTAACTGCGTCGCCGACTATTCCGTGCTGTAT


sequence and a mutated
AACAGCGCCTCCTTCTCCACATTCAAATGCTATGGAGTGAG


Fc region
CCCGACAAAACTGAACGATCTCTGCTTTACAAATGTCTACG


(L309D/Q311H/N434S)
CCGACTCTTTTGTGATCAGAGGGGACGAGGTCCGGCAGATC



GCACCAGGACAGACAGGCAAGATTGCTGACTACAACTATA



AGCTGCCTGACGACTTCACAGGATGTGTGATCGCATGGAAC



TCAAACAATCTGGACTCCAAAGTCGGGGGCAACTATAATT



ACCTGTATCGCCTGTTCCGGAAGTCCAACCTGAAGCCCTTC



GAGAGGGACATCAGTACAGAGATCTATCAGGCTGGCTCCA



CCCCTTGCAATGGCGTCGAAGGCTTTAATTGTTATTTTCCCC



TGCAGTCTTACGGGTTTCAGCCTACTAATGGAGTTGGGTAC



CAGCCATACAGAGTGGTCGTGCTCAGCTTCGAGCTCCTGCA



TGCTCCACCTAAGTCCTGCGACAAAACCCATACATGTCCAC



CATGCCCAGCTCCTGAACTGCTCGGCGGGCCTAGTGTTTTC



CTCTTCCCTCCTAAGCCCAAGGATACCCTCATGATCTCTCG



CACACCAGAAGTGACCTGCGTGGTCGTGGATGTCTCTCACG



AGGATCCTGAAGTGAAGTTTAACTGGTATGTCGACGGAGT



GGAAGTGCACAACGCCAAGACAAAGCCAAGAGAAGAACA



ATACAATTCTACTTATAGGGTGGTGTCTGTGCTGACAGTGG



ATCACCATGATTGGCTGAATGGAAAAGAATATAAGTGTAA



GGTCTCTAACAAGGCCCTGCCCGCTCCAATTGAGAAGACA



ATTTCCAAGGCCAAGGGGCAGCCTCGGGAACCTCAGGTGT



ACACACTGCCCCCATCCAGGGATGAACTGACTAAAAATCA



GGTGTCTCTGACATGCCTGGTGAAAGGGTTTTATCCAAGTG



ACATTGCTGTGGAGTGGGAGTCTAATGGGCAGCCTGAAAA



TAACTACAAGACCACACCACCAGTGCTCGATAGCGACGGG



TCTTTCTTTCTGTATTCTAAACTGACCGTGGATAAATCTCGG



TGGCAGCAGGGAAACGTGTTTTCTTGCTCAGTGATGCACGA



AGCTCTGCACTCTCACTATACACAGAAATCCCTGTCCCTGT



CTCCAGGCAAATAA





A receptor-binding
(SEQ ID NO: 104)


domain of SARS-CoV-2

MFVFLVLLPLVSSQCPNITNLCPFGEVFNATRFASVYAWNRKR



S protein with a signal
ISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSF


sequence and a mutated
VIRGDEVRQIAPGQTGKIADYNYKLPDDFTGCVIAWNSNNLD


Fc region
SKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPCNGVEG


(L309D/Q311H/N434S)
FNCYFPLQSYGFQPTNGVGYQPYRVVVLSFELLHAPPKSCDKT




HTCPPCPAPELLGGPSVFLFPPKPKDTLMISRTPEVTCVVVDVSH





EDPEVKFNWYVDGVEVHNAKTKPREEQYNSTYRVVSVLTVDHHD





WLNGKEYKCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSRDE





LTKNQVSLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDG





SFFLYSKLTVDKSRWQQGNVFSCSVMHEALHSHYTQKSLSLSPGK






Optimized nucleotide sequence encoding a receptor-binding domain of SARS-CoV-2 S protein with a signal sequence and a mutated Fc region (M428L/N434S)
(SEQ ID NO: 107)

ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG

TGTCCTAATATCACTAACCTGTGTCCTTTCGGTGAAGTGTTC

AACGCCACCAGGTTTGCTAGCGTGTATGCCTGGAACAGGA

AGAGGATCTCTAACTGCGTCGCCGACTATTCCGTGCTGTAT

AACAGCGCCTCCTTCTCCACATTCAAATGCTATGGAGTGAG

CCCGACAAAACTGAACGATCTCTGCTTTACAAATGTCTACG

CCGACTCTTTTGTGATCAGAGGGGACGAGGTCCGGCAGATC



GCACCAGGACAGACAGGCAAGATTGCTGACTACAACTATA



AGCTGCCTGACGACTTCACAGGATGTGTGATCGCATGGAAC



TCAAACAATCTGGACTCCAAAGTCGGGGGCAACTATAATT



ACCTGTATCGCCTGTTCCGGAAGTCCAACCTGAAGCCCTTC



GAGAGGGACATCAGTACAGAGATCTATCAGGCTGGCTCCA



CCCCTTGCAATGGCGTCGAAGGCTTTAATTGTTATTTTCCCC



TGCAGTCTTACGGGTTTCAGCCTACTAATGGAGTTGGGTAC



CAGCCATACAGAGTGGTCGTGCTCAGCTTCGAGCTCCTGCA



TGCTCCACCTAAGTCCTGCGACAAAACCCATACATGTCCAC



CATGCCCAGCTCCTGAACTGCTCGGCGGGCCTAGTGTTTTC



CTCTTCCCTCCTAAGCCCAAGGATACCCTCTATATCACTCG



CGAACCAGAAGTGACCTGCGTGGTCGTGGATGTCTCTCACG



AGGATCCTGAAGTGAAGTTTAACTGGTATGTCGACGGAGT



GGAAGTGCACAACGCCAAGACAAAGCCAAGAGAAGAACA



ATACAATTCTACTTATAGGGTGGTGTCTGTGCTGACAGTGC



TGCACCAGGATTGGCTGAATGGAAAAGAATATAAGTGTAA



GGTCTCTAACAAGGCCCTGCCCGCTCCAATTGAGAAGACA



ATTTCCAAGGCCAAGGGGCAGCCTCGGGAACCTCAGGTGT



ACACACTGCCCCCATCCAGGGATGAACTGACTAAAAATCA



GGTGTCTCTGACATGCCTGGTGAAAGGGTTTTATCCAAGTG



ACATTGCTGTGGAGTGGGAGTCTAATGGGCAGCCTGAAAA



TAACTACAAGACCACACCACCAGTGCTCGATAGCGACGGG



TCTTTCTTTCTGTATTCTAAACTGACCGTGGATAAATCTCGG



TGGCAGCAGGGAAACGTGTTTTCTTGCTCAGTGATGCACGA



AGCTCTGCACAATCACTATACACAGAAATCCCTGTCCCTGT



CTCCAGGCAAATAA





A receptor-binding domain of SARS-CoV-2 S protein with a signal sequence and a mutated Fc region (M428L/N434S)
(SEQ ID NO: 106)

MFVFLVLLPLVSSQCPNITNLCPFGEVFNATRFASVYAWNRKR

ISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSF

VIRGDEVRQIAPGQTGKIADYNYKLPDDFTGCVIAWNSNNLD

SKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPCNGVEG

FNCYFPLQSYGFQPTNGVGYQPYRVVVLSFELLHAPPKSCDKT




HTCPPCPAPELLGGPSVFLFPPKPKDTLMISRTPEVTCVVVDVSH





EDPEVKFNWYVDGVEVHNAKTKPREEQYNSTYRVVSVLTVLHQD





WLNGKEYKCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSRDE





LTKNQVSLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDG





SFFLYSKLTVDKSRWQQGNVFSCSVLHEALHSHYTQKSLSLSPGK






Optimized nucleotide sequence encoding a receptor-binding domain of SARS-CoV-2 S protein with a signal sequence and a mutated Fc region (M252Y/S254T/T256E)
(SEQ ID NO: 109)

ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG

TGTCCTAATATCACTAACCTGTGTCCTTTCGGTGAAGTGTTC

AACGCCACCAGGTTTGCTAGCGTGTATGCCTGGAACAGGA

AGAGGATCTCTAACTGCGTCGCCGACTATTCCGTGCTGTAT

AACAGCGCCTCCTTCTCCACATTCAAATGCTATGGAGTGAG

CCCGACAAAACTGAACGATCTCTGCTTTACAAATGTCTACG

CCGACTCTTTTGTGATCAGAGGGGACGAGGTCCGGCAGATC



GCACCAGGACAGACAGGCAAGATTGCTGACTACAACTATA



AGCTGCCTGACGACTTCACAGGATGTGTGATCGCATGGAAC



TCAAACAATCTGGACTCCAAAGTCGGGGGCAACTATAATT



ACCTGTATCGCCTGTTCCGGAAGTCCAACCTGAAGCCCTTC



GAGAGGGACATCAGTACAGAGATCTATCAGGCTGGCTCCA



CCCCTTGCAATGGCGTCGAAGGCTTTAATTGTTATTTTCCCC



TGCAGTCTTACGGGTTTCAGCCTACTAATGGAGTTGGGTAC



CAGCCATACAGAGTGGTCGTGCTCAGCTTCGAGCTCCTGCA



TGCTCCACCTAAGTCCTGCGACAAAACCCATACATGTCCAC



CATGCCCAGCTCCTGAACTGCTCGGCGGGCCTAGTGTTTTC



CTCTTCCCTCCTAAGCCCAAGGATACCCTCTATATCACTCG



CGAACCAGAAGTGACCTGCGTGGTCGTGGATGTCTCTCACG



AGGATCCTGAAGTGAAGTTTAACTGGTATGTCGACGGAGT



GGAAGTGCACAACGCCAAGACAAAGCCAAGAGAAGAACA



ATACAATTCTACTTATAGGGTGGTGTCTGTGCTGACAGTGC



TGCACCAGGATTGGCTGAATGGAAAAGAATATAAGTGTAA



GGTCTCTAACAAGGCCCTGCCCGCTCCAATTGAGAAGACA



ATTTCCAAGGCCAAGGGGCAGCCTCGGGAACCTCAGGTGT



ACACACTGCCCCCATCCAGGGATGAACTGACTAAAAATCA



GGTGTCTCTGACATGCCTGGTGAAAGGGTTTTATCCAAGTG



ACATTGCTGTGGAGTGGGAGTCTAATGGGCAGCCTGAAAA



TAACTACAAGACCACACCACCAGTGCTCGATAGCGACGGG



TCTTTCTTTCTGTATTCTAAACTGACCGTGGATAAATCTCGG



TGGCAGCAGGGAAACGTGTTTTCTTGCTCAGTGATGCACGA



AGCTCTGCACAATCACTATACACAGAAATCCCTGTCCCTGT



CTCCAGGCAAATAA





A receptor-binding domain of SARS-CoV-2 S protein with a signal sequence and a mutated Fc region (M252Y/S254T/T256E)
(SEQ ID NO: 108)

MFVFLVLLPLVSSQCPNITNLCPFGEVFNATRFASVYAWNRKR

ISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSF

VIRGDEVRQIAPGQTGKIADYNYKLPDDFTGCVIAWNSNNLD

SKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPCNGVEG

FNCYFPLQSYGFQPTNGVGYQPYRVVVLSFELLHAPPKSCDKT




HTCPPCPAPELLGGPSVFLFPPKPKDTLYITREPEVTCVVVDVSH





EDPEVKFNWYVDGVEVHNAKTKPREEQYNSTYRVVSVLTVLHQD





WLNGKEYKCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSRDE





LTKNQVSLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDG





SFFLYSKLTVDKSRWQQGNVFSCSVMHEALHNHYTQKSLSLSPGK






Optimized nucleotide sequence encoding a receptor-binding domain of SARS-CoV-2 S protein with a signal sequence and a mutated Fc region (H433K/N434F)
(SEQ ID NO: 111)

ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG

TGTCCTAATATCACTAACCTGTGTCCTTTCGGTGAAGTGTTC

AACGCCACCAGGTTTGCTAGCGTGTATGCCTGGAACAGGA

AGAGGATCTCTAACTGCGTCGCCGACTATTCCGTGCTGTAT

AACAGCGCCTCCTTCTCCACATTCAAATGCTATGGAGTGAG

CCCGACAAAACTGAACGATCTCTGCTTTACAAATGTCTACG

CCGACTCTTTTGTGATCAGAGGGGACGAGGTCCGGCAGATC



GCACCAGGACAGACAGGCAAGATTGCTGACTACAACTATA



AGCTGCCTGACGACTTCACAGGATGTGTGATCGCATGGAAC



TCAAACAATCTGGACTCCAAAGTCGGGGGCAACTATAATT



ACCTGTATCGCCTGTTCCGGAAGTCCAACCTGAAGCCCTTC



GAGAGGGACATCAGTACAGAGATCTATCAGGCTGGCTCCA



CCCCTTGCAATGGCGTCGAAGGCTTTAATTGTTATTTTCCCC



TGCAGTCTTACGGGTTTCAGCCTACTAATGGAGTTGGGTAC



CAGCCATACAGAGTGGTCGTGCTCAGCTTCGAGCTCCTGCA



TGCTCCACCTAAGTCCTGCGACAAAACCCATACATGTCCAC



CATGCCCAGCTCCTGAACTGCTCGGCGGGCCTAGTGTTTTC



CTCTTCCCTCCTAAGCCCAAGGATACCCTCATGATCTCTCG



CACACCAGAAGTGACCTGCGTGGTCGTGGATGTCTCTCACG



AGGATCCTGAAGTGAAGTTTAACTGGTATGTCGACGGAGT



GGAAGTGCACAACGCCAAGACAAAGCCAAGAGAAGAACA



ATACAATTCTACTTATAGGGTGGTGTCTGTGCTGACAGTGC



TGCACCAGGATTGGCTGAATGGAAAAGAATATAAGTGTAA



GGTCTCTAACAAGGCCCTGCCCGCTCCAATTGAGAAGACA



ATTTCCAAGGCCAAGGGGCAGCCTCGGGAACCTCAGGTGT



ACACACTGCCCCCATCCAGGGATGAACTGACTAAAAATCA



GGTGTCTCTGACATGCCTGGTGAAAGGGTTTTATCCAAGTG



ACATTGCTGTGGAGTGGGAGTCTAATGGGCAGCCTGAAAA



TAACTACAAGACCACACCACCAGTGCTCGATAGCGACGGG



TCTTTCTTTCTGTATTCTAAACTGACCGTGGATAAATCTCGG



TGGCAGCAGGGAAACGTGTTTTCTTGCTCAGTGATGCACGA



AGCTCTGAAATTTCACTATACACAGAAATCCCTGTCCCTGT



CTCCAGGCAAATAA





A receptor-binding domain of SARS-CoV-2 S protein with a signal sequence and a mutated Fc region (H433K/N434F)
(SEQ ID NO: 110)

MFVFLVLLPLVSSQCPNITNLCPFGEVFNATRFASVYAWNRKR

ISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSF

VIRGDEVRQIAPGQTGKIADYNYKLPDDFTGCVIAWNSNNLD

SKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPCNGVEG

FNCYFPLQSYGFQPTNGVGYQPYRVVVLSFELLHAPPKSCDKT




HTCPPCPAPELLGGPSVFLFPPKPKDTLMISRTPEVTCVVVDVSH





EDPEVKFNWYVDGVEVHNAKTKPREEQYNSTYRVVSVLTVLHQD





WLNGKEYKCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSRDE





LTKNQVSLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDG





SFFLYSKLTVDKSRWQQGNVFSCSVMHEALKFHYTQKSLSLSPGK
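
An optimized nucleotide sequence in this table can be checked against its paired amino acid sequence by translating it codon by codon. The following is a minimal sketch in Python, assuming the standard genetic code; the helper name `translate` is illustrative and not part of the specification. It translates the short fragment of SEQ ID NO: 111 that spans the H433K/N434F substitution.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(dna: str) -> str:
    """Translate an in-frame DNA string into one-letter amino acid codes.

    Stop codons are rendered as '*'. Hypothetical helper for illustration.
    """
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# In-frame fragment of SEQ ID NO: 111 covering the mutated Fc positions.
fragment = "ATGCACGAAGCTCTGAAATTTCACTATACACAG"
print(translate(fragment))  # MHEALKFHYTQ
```

The translated fragment contains the residues KF in place of the unmodified HN, in agreement with the amino acid sequence of SEQ ID NO: 110.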






Optimized nucleotide sequence encoding a SARS-CoV-2 S protein mutated to remove a furin cleavage site, to replace residues 986 and 987 with proline and containing an extended signal sequence
(SEQ ID NO: 122)

ATGTTCCTGCTGACAACAAAAAGAACCATGTTTGTGTTCCT

GGTGCTGCTGCCTCTGGTGTCCTCACAGTGTGTCAACCTGA

CAACAAGAACTCAGCTGCCACCAGCCTACACCAACTCCTTC

ACCAGAGGCGTGTATTACCCAGACAAGGTGTTTAGAAGCA

GCGTGCTGCACTCTACCCAGGACCTCTTTCTGCCCTTTTTCA

GCAACGTGACATGGTTTCACGCAATTCACGTGTCCGGCACT



AATGGCACAAAGCGGTTCGACAATCCAGTCCTGCCTTTCAA



CGATGGCGTCTACTTTGCATCTACTGAGAAATCCAATATCA



TTAGGGGATGGATCTTCGGCACAACCCTGGATTCTAAGACC



CAGAGCCTGCTGATCGTCAACAACGCCACAAACGTGGTCA



TTAAGGTTTGCGAGTTTCAGTTCTGTAACGATCCTTTTCTGG



GCGTGTATTATCATAAGAACAATAAGAGCTGGATGGAGTC



CGAGTTTAGAGTGTATAGCTCTGCAAATAATTGTACCTTTG



AGTACGTGAGCCAGCCCTTTCTGATGGACCTGGAGGGAAA



ACAAGGAAACTTCAAAAACCTGCGGGAATTCGTTTTCAAA



AACATCGACGGCTATTTCAAGATCTATAGCAAGCATACCCC



AATCAACCTCGTGAGGGACCTCCCCCAGGGCTTTAGCGCAC



TGGAGCCACTGGTTGACCTGCCTATCGGCATTAATATCACA



AGATTTCAGACCCTGCTGGCACTGCATAGAAGCTATCTGAC



CCCTGGAGACTCCTCTAGTGGGTGGACTGCCGGCGCCGCTG



CCTACTATGTGGGCTATCTGCAGCCACGGACATTCCTGCTG



AAATACAATGAGAACGGGACAATCACAGATGCTGTTGATT



GCGCACTCGACCCCCTGTCCGAGACAAAGTGCACTCTCAAG



AGCTTTACCGTCGAGAAGGGCATCTATCAGACCTCAAACTT



CAGGGTGCAGCCCACAGAATCTATCGTGCGCTTCCCTAATA



TCACTAACCTGTGTCCTTTCGGTGAAGTGTTCAACGCCACC



AGGTTTGCTAGCGTGTATGCCTGGAACAGGAAGAGGATCT



CTAACTGCGTCGCCGACTATTCCGTGCTGTATAACAGCGCC



TCCTTCTCCACATTCAAATGCTATGGAGTGAGCCCGACAAA



ACTGAACGATCTCTGCTTTACAAATGTCTACGCCGACTCTT



TTGTGATCAGAGGGGACGAGGTCCGGCAGATCGCACCAGG



ACAGACAGGCAAGATTGCTGACTACAACTATAAGCTGCCT



GACGACTTCACAGGATGTGTGATCGCATGGAACTCAAACA



ATCTGGACTCCAAAGTCGGGGGCAACTATAATTACCTGTAT



CGCCTGTTCCGGAAGTCCAACCTGAAGCCCTTCGAGAGGG



ACATCAGTACAGAGATCTATCAGGCTGGCTCCACCCCTTGC



AATGGCGTCGAAGGCTTTAATTGTTATTTTCCCCTGCAGTCT



TACGGGTTTCAGCCTACTAATGGAGTTGGGTACCAGCCATA



CAGAGTGGTCGTGCTCAGCTTCGAGCTCCTGCATGCTCCAG



CTACAGTTTGCGGGCCAAAGAAGTCCACTAACCTGGTGAA



GAATAAGTGCGTCAACTTCAACTTTAACGGGCTCACCGGCA



CCGGCGTGCTGACTGAGAGCAACAAGAAGTTTCTGCCATTT



CAACAGTTTGGACGGGACATTGCCGACACCACCGATGCCG



TTCGGGATCCACAGACCCTGGAAATTCTGGACATTACACCG



TGCAGCTTCGGGGGCGTGAGCGTGATCACACCCGGAACCA



ATACAAGCAACCAGGTTGCCGTCCTGTATCAGGATGTCAAT



TGCACAGAAGTGCCAGTTGCTATCCACGCAGACCAGCTGA



CTCCCACATGGCGGGTGTATAGCACCGGATCCAACGTGTTT



CAGACCCGCGCCGGATGTCTCATTGGGGCCGAGCACGTGA



ATAACAGCTACGAGTGCGACATCCCCATTGGCGCCGGCATT



TGTGCGTCTTACCAGACTCAGACCAACTCTCCTGGCTCCGC



CTCTTCCGTTGCTAGTCAGTCTATTATTGCCTATACCATGAG



CCTCGGAGCTGAGAATAGCGTGGCCTACTCCAATAATTCCA



TCGCAATCCCTACTAACTTCACTATTTCTGTGACCACCGAG



ATCCTGCCTGTGTCTATGACTAAGACTAGCGTTGATTGTAC



CATGTATATTTGTGGCGACTCTACCGAATGTTCTAACCTGC



TGCTTCAGTACGGCTCATTTTGCACACAGCTGAACAGAGCC



CTGACTGGGATCGCTGTGGAGCAGGACAAGAACACACAGG



AGGTGTTTGCACAGGTGAAGCAGATCTATAAGACCCCTCCT



ATTAAGGATTTCGGCGGATTCAATTTCTCACAGATTCTGCC



AGACCCCAGTAAGCCTTCCAAGAGGAGCTTCATCGAGGAT



CTCCTGTTTAACAAGGTGACCCTGGCAGACGCCGGCTTTAT



TAAGCAATATGGGGATTGCCTGGGCGACATTGCTGCCAGA



GACCTGATTTGCGCCCAGAAATTCAATGGCCTCACAGTGCT



GCCACCTCTGCTGACCGACGAGATGATCGCTCAATACACTA



GCGCACTGCTGGCCGGAACCATCACATCAGGCTGGACCTTC



GGGGCCGGAGCAGCACTGCAGATTCCATTCGCCATGCAGA



TGGCCTATAGATTCAACGGCATTGGCGTCACACAGAACGTG



CTGTACGAAAACCAGAAGCTCATCGCTAACCAGTTTAATTC



CGCAATTGGAAAGATCCAAGATTCACTCAGCTCAACCGCCT



CTGCACTCGGAAAGCTGCAGGACGTGGTCAACCAGAATGC



TCAGGCCCTGAACACACTCGTCAAGCAGCTGTCCTCTAACT



TTGGCGCTATCAGCTCCGTTCTGAACGACATTCTGAGCCGC



CTGGATCCCCCAGAGGCTGAAGTCCAGATTGACCGCCTGAT



TACCGGCCGGCTGCAGTCTCTGCAAACATACGTGACCCAGC



AGCTGATCAGAGCAGCCGAGATCCGGGCATCCGCAAATCT



GGCAGCAACTAAGATGAGCGAATGCGTGCTGGGCCAGTCC



AAGCGGGTGGACTTTTGTGGCAAGGGCTACCACCTGATGA



GCTTCCCCCAGAGCGCCCCACATGGCGTTGTTTTTCTGCAC



GTGACCTATGTCCCTGCTCAGGAAAAGAACTTTACAACTGC



TCCTGCTATCTGCCATGACGGCAAGGCCCACTTCCCACGGG



AGGGAGTGTTTGTGTCCAATGGCACACACTGGTTCGTGACC



CAGAGGAACTTCTATGAACCCCAGATCATCACCACTGACA



ATACCTTCGTGTCTGGAAATTGCGACGTCGTGATCGGCATC



GTTAACAACACCGTGTACGACCCTCTCCAGCCAGAGCTGGA



CTCCTTTAAGGAGGAACTGGATAAGTATTTTAAGAACCACA



CAAGCCCAGATGTGGATCTCGGGGACATCTCCGGAATTAA



CGCCTCCGTGGTGAATATCCAGAAGGAGATTGACCGCCTA



AATGAAGTTGCCAAGAACCTCAATGAGTCTCTGATTGATCT



GCAGGAACTGGGCAAGTATGAGCAGTATATCAAATGGCCC



TGGTACATTTGGCTGGGGTTTATCGCCGGACTGATTGCCAT



CGTCATGGTGACCATCATGCTGTGTTGCATGACCTCCTGTT



GTTCCTGTCTGAAGGGCTGCTGTAGTTGCGGCTCTTGCTGT



AAATTCGACGAAGATGATAGCGAGCCCGTGCTGAAGGGCG



TGAAGCTGCATTATACCTGA





SARS-CoV-2 S protein mutated to remove a furin cleavage site, to replace residues 986 and 987 with proline and which contains an extended signal sequence
(SEQ ID NO: 123)

MFLLTTKRTMFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPD

KVFRSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFDNPV

LPFNDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVV

IKVCEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFE

YVSQPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINL

VRDLPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSG

WTAGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSET

KCTLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFN

ATRFASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPT



KLNDLCFTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPD



DFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDIS



TEIYQAGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQPYRVV



VLSFELLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLT



ESNKKFLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSV



ITPGTNTSNQVAVLYQDVNCTEVPVAIHADQLTPTWRVYSTG



SNVFQTRAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPGS




ASSVASQSIIAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILPV




SMTKTSVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAV



EQDKNTQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRS



FIEDLLFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLT



VLPPLLTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQ



MAYRFNGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASA



LGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDPP



EAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMS



ECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQ



EKNFTTAPAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQI



ITTDNTFVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFK



NHTSPDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQ



ELGKYEQYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCL



KGCCSCGSCCKFDEDDSEPVLKGVKLHYT





Optimized nucleotide sequence encoding a SARS-CoV-2 S protein mutated to remove a furin cleavage site, to replace residues 986 and 987 with proline and to mutate the ER retrieval signal
(SEQ ID NO: 124)

ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG

TGTGTCAACCTGACAACTAGGACTCAGCTGCCACCAGCCTA

CACCAACTCCTTCACCAGAGGCGTGTATTACCCAGACAAGG

TGTTTAGAAGCAGCGTGCTGCACTCTACCCAGGACCTCTTT

CTGCCCTTTTTCAGCAACGTGACATGGTTTCACGCAATTCA

CGTGTCCGGCACTAATGGCACAAAGCGGTTCGACAATCCA

GTCCTGCCTTTCAACGATGGCGTCTACTTTGCATCTACTGA

GAAATCCAATATCATTAGGGGATGGATCTTCGGCACAACCC



TGGATTCTAAGACCCAGAGCCTGCTGATCGTCAACAACGCC



ACAAACGTGGTCATTAAGGTTTGCGAGTTTCAGTTCTGTAA



CGATCCTTTTCTGGGCGTGTATTATCATAAGAACAATAAGA



GCTGGATGGAGTCCGAGTTTAGAGTGTATAGCTCTGCAAAT



AATTGTACCTTTGAGTACGTGAGCCAGCCCTTTCTGATGGA



CCTGGAGGGAAAACAAGGAAACTTCAAAAACCTGCGGGAA



TTCGTTTTCAAAAACATCGACGGCTATTTCAAGATCTATAG



CAAGCATACCCCAATCAACCTCGTGAGGGACCTCCCCCAG



GGCTTTAGCGCACTGGAGCCACTGGTTGACCTGCCTATCGG



CATTAATATCACAAGATTTCAGACCCTGCTGGCACTGCATA



GAAGCTATCTGACCCCTGGAGACTCCTCTAGTGGGTGGACT



GCCGGCGCCGCTGCCTACTATGTGGGCTATCTGCAGCCACG



GACATTCCTGCTGAAATACAATGAGAACGGGACAATCACA



GATGCTGTTGATTGCGCACTCGACCCCCTGTCCGAGACAAA



GTGCACTCTCAAGAGCTTTACCGTCGAGAAGGGCATCTATC



AGACCTCAAACTTCAGGGTGCAGCCCACAGAATCTATCGTG



CGCTTCCCTAATATCACTAACCTGTGTCCTTTCGGTGAAGT



GTTCAACGCCACCAGGTTTGCTAGCGTGTATGCCTGGAACA



GGAAGAGGATCTCTAACTGCGTCGCCGACTATTCCGTGCTG



TATAACAGCGCCTCCTTCTCCACATTCAAATGCTATGGAGT



GAGCCCGACAAAACTGAACGATCTCTGCTTTACAAATGTCT



ACGCCGACTCTTTTGTGATCAGAGGGGACGAGGTCCGGCA



GATCGCACCAGGACAGACAGGCAAGATTGCTGACTACAAC



TATAAGCTGCCTGACGACTTCACAGGATGTGTGATCGCATG



GAACTCAAACAATCTGGACTCCAAAGTCGGGGGCAACTAT



AATTACCTGTATCGCCTGTTCCGGAAGTCCAACCTGAAGCC



CTTCGAGAGGGACATCAGTACAGAGATCTATCAGGCTGGC



TCCACCCCTTGCAATGGCGTCGAAGGCTTTAATTGTTATTTT



CCCCTGCAGTCTTACGGGTTTCAGCCTACTAATGGAGTTGG



GTACCAGCCATACAGAGTGGTCGTGCTCAGCTTCGAGCTCC



TGCATGCTCCAGCTACAGTTTGCGGGCCAAAGAAGTCCACT



AACCTGGTGAAGAATAAGTGCGTCAACTTCAACTTTAACGG



GCTCACCGGCACCGGCGTGCTGACTGAGAGCAACAAGAAG



TTTCTGCCATTTCAACAGTTTGGACGGGACATTGCCGACAC



CACCGATGCCGTTCGGGATCCACAGACCCTGGAAATTCTGG



ACATTACACCGTGCAGCTTCGGGGGCGTGAGCGTGATCAC



ACCCGGAACCAATACAAGCAACCAGGTTGCCGTCCTGTATC



AGGATGTCAATTGCACAGAAGTGCCAGTTGCTATCCACGCA



GACCAGCTGACTCCCACATGGCGGGTGTATAGCACCGGAT



CCAACGTGTTTCAGACCCGCGCCGGATGTCTCATTGGGGCC



GAGCACGTGAATAACAGCTACGAGTGCGACATCCCCATTG



GCGCCGGCATTTGTGCGTCTTACCAGACTCAGACCAACTCT



CCTGGCTCCGCCTCTTCCGTTGCTAGTCAGTCTATTATTGCC



TATACCATGAGCCTCGGAGCTGAGAATAGCGTGGCCTACTC



CAATAATTCCATCGCAATCCCTACTAACTTCACTATTTCTGT



GACCACCGAGATCCTGCCTGTGTCTATGACTAAGACTAGCG



TTGATTGTACCATGTATATTTGTGGCGACTCTACCGAATGTT



CTAACCTGCTGCTTCAGTACGGCTCATTTTGCACACAGCTG



AACAGAGCCCTGACTGGGATCGCTGTGGAGCAGGACAAGA



ACACACAGGAGGTGTTTGCACAGGTGAAGCAGATCTATAA



GACCCCTCCTATTAAGGATTTCGGCGGATTCAATTTCTCAC



AGATTCTGCCAGACCCCAGTAAGCCTTCCAAGAGGAGCTTC



ATCGAGGATCTCCTGTTTAACAAGGTGACCCTGGCAGACGC



CGGCTTTATTAAGCAATATGGGGATTGCCTGGGCGACATTG



CTGCCAGAGACCTGATTTGCGCCCAGAAATTCAATGGCCTC



ACAGTGCTGCCACCTCTGCTGACCGACGAGATGATCGCTCA



ATACACTAGCGCACTGCTGGCCGGAACCATCACATCAGGCT



GGACCTTCGGGGCCGGAGCAGCACTGCAGATTCCATTCGCC



ATGCAGATGGCCTATAGATTCAACGGCATTGGCGTCACACA



GAACGTGCTGTACGAAAACCAGAAGCTCATCGCTAACCAG



TTTAATTCCGCAATTGGAAAGATCCAAGATTCACTCAGCTC



AACCGCCTCTGCACTCGGAAAGCTGCAGGACGTGGTCAAC



CAGAATGCTCAGGCCCTGAACACACTCGTCAAGCAGCTGTC



CTCTAACTTTGGCGCTATCAGCTCCGTTCTGAACGACATTCT



GAGCCGCCTGGATCCCCCAGAGGCTGAAGTCCAGATTGAC



CGCCTGATTACCGGCCGGCTGCAGTCTCTGCAAACATACGT



GACCCAGCAGCTGATCAGAGCAGCCGAGATCCGGGCATCC



GCAAATCTGGCAGCAACTAAGATGAGCGAATGCGTGCTGG



GCCAGTCCAAGCGGGTGGACTTTTGTGGCAAGGGCTACCA



CCTGATGAGCTTCCCCCAGAGCGCCCCACATGGCGTTGTTT



TTCTGCACGTGACCTATGTCCCTGCTCAGGAAAAGAACTTT



ACAACTGCTCCTGCTATCTGCCATGACGGCAAGGCCCACTT



CCCACGGGAGGGAGTGTTTGTGTCCAATGGCACACACTGGT



TCGTGACCCAGAGGAACTTCTATGAACCCCAGATCATCACC



ACTGACAATACCTTCGTGTCTGGAAATTGCGACGTCGTGAT



CGGCATCGTTAACAACACCGTGTACGACCCTCTCCAGCCAG



AGCTGGACTCCTTTAAGGAGGAACTGGATAAGTATTTTAAG



AACCACACAAGCCCAGATGTGGATCTCGGGGACATCTCCG



GAATTAACGCCTCCGTGGTGAATATCCAGAAGGAGATTGA



CCGCCTAAATGAAGTTGCCAAGAACCTCAATGAGTCTCTGA



TTGATCTGCAGGAACTGGGCAAGTATGAGCAGTATATCAA



ATGGCCCTGGTACATTTGGCTGGGGTTTATCGCCGGACTGA



TTGCCATCGTCATGGTGACCATCATGCTGTGTTGCATGACC



TCCTGTTGTTCCTGTCTGAAGGGCTGCTGTAGTTGCGGCTCT



TGCTGTAAATTCGACGAAGATGATAGCGAGCCCGTGCTGA



AGGGCGTGGCCCTGGCTTATACCTGA





SARS-CoV-2 S protein mutated to remove a furin cleavage site, to replace residues 986 and 987 with proline and to mutate the ER retrieval signal
(SEQ ID NO: 125)

MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVF

RSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFDNPVLPF

NDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKV

CEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVS

QPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRD

LPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWT



AGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKC



TLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATR



FASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLN



DLCFTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTG



CVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQ



AGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFE



LLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKK



FLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTN



TSNQVAVLYQDVNCTEVPVAIHADQLTPTWRVYSTGSNVFQT



RAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPGSASSVAS



QSIIAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILPVSMTKT



SVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDK



NTQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSFIEDL



LFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPL



LTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQMAYRF



NGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQ



DVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDPPEAEVQ



IDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLG



QSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTT



APAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTDNT



FVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTSPD



VDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYE



QYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCCSC



GSCCKFDEDDSEPVLKGVALAYT
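
The ER retrieval signal mutation can be located by comparing the C-terminal residues of SEQ ID NO: 123 and SEQ ID NO: 125 as printed above. The following is a minimal Python sketch, assuming two aligned fragments of equal length; `substitutions` is a hypothetical helper name, and the reported indices are 0-based offsets within the fragment, not S protein residue numbers.

```python
from typing import List, Tuple


def substitutions(reference: str, variant: str) -> List[Tuple[int, str, str]]:
    """Return (offset, reference residue, variant residue) for each mismatch.

    Both fragments must already be aligned and of equal length.
    """
    if len(reference) != len(variant):
        raise ValueError("fragments must have the same length")
    return [
        (i, ref, var)
        for i, (ref, var) in enumerate(zip(reference, variant))
        if ref != var
    ]


# Last twelve residues of SEQ ID NO: 123 and SEQ ID NO: 125.
tail_unmodified = "EPVLKGVKLHYT"
tail_mutated = "EPVLKGVALAYT"

print(substitutions(tail_unmodified, tail_mutated))
# [(7, 'K', 'A'), (9, 'H', 'A')]
```

In these fragments the C-terminal motif KLHYT is changed to ALAYT, that is, the lysine and the histidine are each replaced with alanine.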





Optimized nucleotide sequence encoding a SARS-CoV-2 S protein mutated to remove a furin cleavage site, to replace residues 986 and 987 with proline, to mutate the ER retrieval signal and which contains an extended signal sequence
(SEQ ID NO: 126)

ATGTTCCTGCTGACAACAAAAAGAACCATGTTTGTGTTCCT

GGTGCTGCTGCCTCTGGTGTCCTCACAGTGTGTCAACCTGA

CAACAAGAACTCAGCTGCCACCAGCCTACACCAACTCCTTC

ACCAGAGGCGTGTATTACCCAGACAAGGTGTTTAGAAGCA

GCGTGCTGCACTCTACCCAGGACCTCTTTCTGCCCTTTTTCA

GCAACGTGACATGGTTTCACGCAATTCACGTGTCCGGCACT

AATGGCACAAAGCGGTTCGACAATCCAGTCCTGCCTTTCAA

CGATGGCGTCTACTTTGCATCTACTGAGAAATCCAATATCA

TTAGGGGATGGATCTTCGGCACAACCCTGGATTCTAAGACC

CAGAGCCTGCTGATCGTCAACAACGCCACAAACGTGGTCA



TTAAGGTTTGCGAGTTTCAGTTCTGTAACGATCCTTTTCTGG



GCGTGTATTATCATAAGAACAATAAGAGCTGGATGGAGTC



CGAGTTTAGAGTGTATAGCTCTGCAAATAATTGTACCTTTG



AGTACGTGAGCCAGCCCTTTCTGATGGACCTGGAGGGAAA



ACAAGGAAACTTCAAAAACCTGCGGGAATTCGTTTTCAAA



AACATCGACGGCTATTTCAAGATCTATAGCAAGCATACCCC



AATCAACCTCGTGAGGGACCTCCCCCAGGGCTTTAGCGCAC



TGGAGCCACTGGTTGACCTGCCTATCGGCATTAATATCACA



AGATTTCAGACCCTGCTGGCACTGCATAGAAGCTATCTGAC



CCCTGGAGACTCCTCTAGTGGGTGGACTGCCGGCGCCGCTG



CCTACTATGTGGGCTATCTGCAGCCACGGACATTCCTGCTG



AAATACAATGAGAACGGGACAATCACAGATGCTGTTGATT



GCGCACTCGACCCCCTGTCCGAGACAAAGTGCACTCTCAAG



AGCTTTACCGTCGAGAAGGGCATCTATCAGACCTCAAACTT



CAGGGTGCAGCCCACAGAATCTATCGTGCGCTTCCCTAATA



TCACTAACCTGTGTCCTTTCGGTGAAGTGTTCAACGCCACC



AGGTTTGCTAGCGTGTATGCCTGGAACAGGAAGAGGATCT



CTAACTGCGTCGCCGACTATTCCGTGCTGTATAACAGCGCC



TCCTTCTCCACATTCAAATGCTATGGAGTGAGCCCGACAAA



ACTGAACGATCTCTGCTTTACAAATGTCTACGCCGACTCTT



TTGTGATCAGAGGGGACGAGGTCCGGCAGATCGCACCAGG



ACAGACAGGCAAGATTGCTGACTACAACTATAAGCTGCCT



GACGACTTCACAGGATGTGTGATCGCATGGAACTCAAACA



ATCTGGACTCCAAAGTCGGGGGCAACTATAATTACCTGTAT



CGCCTGTTCCGGAAGTCCAACCTGAAGCCCTTCGAGAGGG



ACATCAGTACAGAGATCTATCAGGCTGGCTCCACCCCTTGC



AATGGCGTCGAAGGCTTTAATTGTTATTTTCCCCTGCAGTCT



TACGGGTTTCAGCCTACTAATGGAGTTGGGTACCAGCCATA



CAGAGTGGTCGTGCTCAGCTTCGAGCTCCTGCATGCTCCAG



CTACAGTTTGCGGGCCAAAGAAGTCCACTAACCTGGTGAA



GAATAAGTGCGTCAACTTCAACTTTAACGGGCTCACCGGCA



CCGGCGTGCTGACTGAGAGCAACAAGAAGTTTCTGCCATTT



CAACAGTTTGGACGGGACATTGCCGACACCACCGATGCCG



TTCGGGATCCACAGACCCTGGAAATTCTGGACATTACACCG



TGCAGCTTCGGGGGCGTGAGCGTGATCACACCCGGAACCA



ATACAAGCAACCAGGTTGCCGTCCTGTATCAGGATGTCAAT



TGCACAGAAGTGCCAGTTGCTATCCACGCAGACCAGCTGA



CTCCCACATGGCGGGTGTATAGCACCGGATCCAACGTGTTT



CAGACCCGCGCCGGATGTCTCATTGGGGCCGAGCACGTGA



ATAACAGCTACGAGTGCGACATCCCCATTGGCGCCGGCATT



TGTGCGTCTTACCAGACTCAGACCAACTCTCCTGGCTCCGC



CTCTTCCGTTGCTAGTCAGTCTATTATTGCCTATACCATGAG



CCTCGGAGCTGAGAATAGCGTGGCCTACTCCAATAATTCCA



TCGCAATCCCTACTAACTTCACTATTTCTGTGACCACCGAG



ATCCTGCCTGTGTCTATGACTAAGACTAGCGTTGATTGTAC



CATGTATATTTGTGGCGACTCTACCGAATGTTCTAACCTGC



TGCTTCAGTACGGCTCATTTTGCACACAGCTGAACAGAGCC



CTGACTGGGATCGCTGTGGAGCAGGACAAGAACACACAGG



AGGTGTTTGCACAGGTGAAGCAGATCTATAAGACCCCTCCT



ATTAAGGATTTCGGCGGATTCAATTTCTCACAGATTCTGCC



AGACCCCAGTAAGCCTTCCAAGAGGAGCTTCATCGAGGAT



CTCCTGTTTAACAAGGTGACCCTGGCAGACGCCGGCTTTAT



TAAGCAATATGGGGATTGCCTGGGCGACATTGCTGCCAGA



GACCTGATTTGCGCCCAGAAATTCAATGGCCTCACAGTGCT



GCCACCTCTGCTGACCGACGAGATGATCGCTCAATACACTA



GCGCACTGCTGGCCGGAACCATCACATCAGGCTGGACCTTC



GGGGCCGGAGCAGCACTGCAGATTCCATTCGCCATGCAGA



TGGCCTATAGATTCAACGGCATTGGCGTCACACAGAACGTG



CTGTACGAAAACCAGAAGCTCATCGCTAACCAGTTTAATTC



CGCAATTGGAAAGATCCAAGATTCACTCAGCTCAACCGCCT



CTGCACTCGGAAAGCTGCAGGACGTGGTCAACCAGAATGC



TCAGGCCCTGAACACACTCGTCAAGCAGCTGTCCTCTAACT



TTGGCGCTATCAGCTCCGTTCTGAACGACATTCTGAGCCGC



CTGGATCCCCCAGAGGCTGAAGTCCAGATTGACCGCCTGAT



TACCGGCCGGCTGCAGTCTCTGCAAACATACGTGACCCAGC



AGCTGATCAGAGCAGCCGAGATCCGGGCATCCGCAAATCT



GGCAGCAACTAAGATGAGCGAATGCGTGCTGGGCCAGTCC



AAGCGGGTGGACTTTTGTGGCAAGGGCTACCACCTGATGA



GCTTCCCCCAGAGCGCCCCACATGGCGTTGTTTTTCTGCAC



GTGACCTATGTCCCTGCTCAGGAAAAGAACTTTACAACTGC



TCCTGCTATCTGCCATGACGGCAAGGCCCACTTCCCACGGG



AGGGAGTGTTTGTGTCCAATGGCACACACTGGTTCGTGACC



CAGAGGAACTTCTATGAACCCCAGATCATCACCACTGACA



ATACCTTCGTGTCTGGAAATTGCGACGTCGTGATCGGCATC



GTTAACAACACCGTGTACGACCCTCTCCAGCCAGAGCTGGA



CTCCTTTAAGGAGGAACTGGATAAGTATTTTAAGAACCACA



CAAGCCCAGATGTGGATCTCGGGGACATCTCCGGAATTAA



CGCCTCCGTGGTGAATATCCAGAAGGAGATTGACCGCCTA



AATGAAGTTGCCAAGAACCTCAATGAGTCTCTGATTGATCT



GCAGGAACTGGGCAAGTATGAGCAGTATATCAAATGGCCC



TGGTACATTTGGCTGGGGTTTATCGCCGGACTGATTGCCAT



CGTCATGGTGACCATCATGCTGTGTTGCATGACCTCCTGTT



GTTCCTGTCTGAAGGGCTGCTGTAGTTGCGGCTCTTGCTGT



AAATTCGACGAAGATGATAGCGAGCCCGTGCTGAAGGGCG



TGGCCCTGGCTTATACCTGA





SARS-CoV-2 S protein mutated to remove a furin cleavage site, to replace residues 986 and 987 with proline, to mutate the ER retrieval signal and which contains an extended signal sequence
(SEQ ID NO: 127)

MFLLTTKRTMFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPD

KVFRSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFDNPV

LPFNDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVV

IKVCEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFE

YVSQPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINL

VRDLPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSG

WTAGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSET

KCTLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFN



ATRFASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPT



KLNDLCFTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPD



DFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDIS



TEIYQAGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQPYRVV



VLSFELLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLT



ESNKKFLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSV



ITPGTNTSNQVAVLYQDVNCTEVPVAIHADQLTPTWRVYSTG



SNVFQTRAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPGS




ASSVASQSIIAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILPV




SMTKTSVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAV



EQDKNTQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRS



FIEDLLFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLT



VLPPLLTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQ



MAYRFNGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASA



LGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDPP



EAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMS



ECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQ



EKNFTTAPAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQI



ITTDNTFVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFK



NHTSPDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQ



ELGKYEQYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCL



KGCCSCGSCCKFDEDDSEPVLKGVALAYT





Optimized nucleotide sequence encoding a SARS-CoV-2 S protein mutated to replace residues 817, 892, 899, 942, 986 and 987 with proline
(SEQ ID NO: 128)

ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG

TGTGTCAACCTGACAACTAGGACTCAGCTGCCACCAGCCTA

CACCAACTCCTTCACCAGAGGCGTGTATTACCCAGACAAGG

TGTTTAGAAGCAGCGTGCTGCACTCTACCCAGGACCTCTTT

CTGCCCTTTTTCAGCAACGTGACATGGTTTCACGCAATTCA

CGTGTCCGGCACTAATGGCACAAAGCGGTTCGACAATCCA



GTCCTGCCTTTCAACGATGGCGTCTACTTTGCATCTACTGA



GAAATCCAATATCATTAGGGGATGGATCTTCGGCACAACCC



TGGATTCTAAGACCCAGAGCCTGCTGATCGTCAACAACGCC



ACAAACGTGGTCATTAAGGTTTGCGAGTTTCAGTTCTGTAA



CGATCCTTTTCTGGGCGTGTATTATCATAAGAACAATAAGA



GCTGGATGGAGTCCGAGTTTAGAGTGTATAGCTCTGCAAAT



AATTGTACCTTTGAGTACGTGAGCCAGCCCTTTCTGATGGA



CCTGGAGGGAAAACAAGGAAACTTCAAAAACCTGCGGGAA



TTCGTTTTCAAAAACATCGACGGCTATTTCAAGATCTATAG



CAAGCATACCCCAATCAACCTCGTGAGGGACCTCCCCCAG



GGCTTTAGCGCACTGGAGCCACTGGTTGACCTGCCTATCGG



CATTAATATCACAAGATTTCAGACCCTGCTGGCACTGCATA



GAAGCTATCTGACCCCTGGAGACTCCTCTAGTGGGTGGACT



GCCGGCGCCGCTGCCTACTATGTGGGCTATCTGCAGCCACG



GACATTCCTGCTGAAATACAATGAGAACGGGACAATCACA



GATGCTGTTGATTGCGCACTCGACCCCCTGTCCGAGACAAA



GTGCACTCTCAAGAGCTTTACCGTCGAGAAGGGCATCTATC



AGACCTCAAACTTCAGGGTGCAGCCCACAGAATCTATCGTG



CGCTTCCCTAATATCACTAACCTGTGTCCTTTCGGTGAAGT



GTTCAACGCCACCAGGTTTGCTAGCGTGTATGCCTGGAACA



GGAAGAGGATCTCTAACTGCGTCGCCGACTATTCCGTGCTG



TATAACAGCGCCTCCTTCTCCACATTCAAATGCTATGGAGT



GAGCCCGACAAAACTGAACGATCTCTGCTTTACAAATGTCT



ACGCCGACTCTTTTGTGATCAGAGGGGACGAGGTCCGGCA



GATCGCACCAGGACAGACAGGCAAGATTGCTGACTACAAC



TATAAGCTGCCTGACGACTTCACAGGATGTGTGATCGCATG



GAACTCAAACAATCTGGACTCCAAAGTCGGGGGCAACTAT



AATTACCTGTATCGCCTGTTCCGGAAGTCCAACCTGAAGCC



CTTCGAGAGGGACATCAGTACAGAGATCTATCAGGCTGGC



TCCACCCCTTGCAATGGCGTCGAAGGCTTTAATTGTTATTTT



CCCCTGCAGTCTTACGGGTTTCAGCCTACTAATGGAGTTGG



GTACCAGCCATACAGAGTGGTCGTGCTCAGCTTCGAGCTCC



TGCATGCTCCAGCTACAGTTTGCGGGCCAAAGAAGTCCACT



AACCTGGTGAAGAATAAGTGCGTCAACTTCAACTTTAACGG



GCTCACCGGCACCGGCGTGCTGACTGAGAGCAACAAGAAG



TTTCTGCCATTTCAACAGTTTGGACGGGACATTGCCGACAC



CACCGATGCCGTTCGGGATCCACAGACCCTGGAAATTCTGG



ACATTACACCGTGCAGCTTCGGGGGCGTGAGCGTGATCAC



ACCCGGAACCAATACAAGCAACCAGGTTGCCGTCCTGTATC



AGGATGTCAATTGCACAGAAGTGCCAGTTGCTATCCACGCA



GACCAGCTGACTCCCACATGGCGGGTGTATAGCACCGGAT



CCAACGTGTTTCAGACCCGCGCCGGATGTCTCATTGGGGCC



GAGCACGTGAATAACAGCTACGAGTGCGACATCCCCATTG



GCGCCGGCATTTGTGCGTCTTACCAGACTCAGACCAACTCT



CCTAGAAGGGCCAGGTCCGTTGCTAGTCAGTCTATTATTGC



CTATACCATGAGCCTCGGAGCTGAGAATAGCGTGGCCTACT



CCAATAATTCCATCGCAATCCCTACTAACTTCACTATTTCTG



TGACCACCGAGATCCTGCCTGTGTCTATGACTAAGACTAGC



GTTGATTGTACCATGTATATTTGTGGCGACTCTACCGAATG



TTCTAACCTGCTGCTTCAGTACGGCTCATTTTGCACACAGCT



GAACAGAGCCCTGACTGGGATCGCTGTGGAGCAGGACAAG



AACACACAGGAGGTGTTTGCACAGGTGAAGCAGATCTATA



AGACCCCTCCTATTAAGGATTTCGGCGGATTCAATTTCTCA



CAGATTCTGCCAGACCCCAGTAAGCCTTCCAAGAGGAGCC



CTATCGAGGATCTCCTGTTTAACAAGGTGACCCTGGCAGAC



GCCGGCTTTATTAAGCAATATGGGGATTGCCTGGGCGACAT



TGCTGCCAGAGACCTGATTTGCGCCCAGAAATTCAATGGCC



TCACAGTGCTGCCACCTCTGCTGACCGACGAGATGATCGCT



CAATACACTAGCGCACTGCTGGCCGGAACCATCACATCAG



GCTGGACCTTCGGGGCCGGACCAGCACTGCAGATTCCATTC



CCTATGCAGATGGCCTATAGATTCAACGGCATTGGCGTCAC



ACAGAACGTGCTGTACGAAAACCAGAAGCTCATCGCTAAC



CAGTTTAATTCCGCAATTGGAAAGATCCAAGATTCACTCAG



CTCAACCCCCTCTGCACTCGGAAAGCTGCAGGACGTGGTCA



ACCAGAATGCTCAGGCCCTGAACACACTCGTCAAGCAGCT



GTCCTCTAACTTTGGCGCTATCAGCTCCGTTCTGAACGACA



TTCTGAGCCGCCTGGATCCCCCAGAGGCTGAAGTCCAGATT



GACCGCCTGATTACCGGCCGGCTGCAGTCTCTGCAAACATA



CGTGACCCAGCAGCTGATCAGAGCAGCCGAGATCCGGGCA



TCCGCAAATCTGGCAGCAACTAAGATGAGCGAATGCGTGC



TGGGCCAGTCCAAGCGGGTGGACTTTTGTGGCAAGGGCTA



CCACCTGATGAGCTTCCCCCAGAGCGCCCCACATGGCGTTG



TTTTTCTGCACGTGACCTATGTCCCTGCTCAGGAAAAGAAC



TTTACAACTGCTCCTGCTATCTGCCATGACGGCAAGGCCCA



CTTCCCACGGGAGGGAGTGTTTGTGTCCAATGGCACACACT



GGTTCGTGACCCAGAGGAACTTCTATGAACCCCAGATCATC



ACCACTGACAATACCTTCGTGTCTGGAAATTGCGACGTCGT



GATCGGCATCGTTAACAACACCGTGTACGACCCTCTCCAGC



CAGAGCTGGACTCCTTTAAGGAGGAACTGGATAAGTATTTT



AAGAACCACACAAGCCCAGATGTGGATCTCGGGGACATCT



CCGGAATTAACGCCTCCGTGGTGAATATCCAGAAGGAGAT



TGACCGCCTAAATGAAGTTGCCAAGAACCTCAATGAGTCTC



TGATTGATCTGCAGGAACTGGGCAAGTATGAGCAGTATATC



AAATGGCCCTGGTACATTTGGCTGGGGTTTATCGCCGGACT



GATTGCCATCGTCATGGTGACCATCATGCTGTGTTGCATGA



CCTCCTGTTGTTCCTGTCTGAAGGGCTGCTGTAGTTGCGGCT



CTTGCTGTAAATTCGACGAAGATGATAGCGAGCCCGTGCTG



AAGGGCGTGAAGCTGCATTATACCTGA





SARS-CoV-2 S protein mutated to replace residues 817, 892, 899, 942, 986 and 987 with proline
(SEQ ID NO: 129)

MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVF

RSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFDNPVLPF

NDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKV

CEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVS



QPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRD



LPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWT



AGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKC



TLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATR



FASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLN



DLCFTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTG



CVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQ



AGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFE



LLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKK



FLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTN



TSNQVAVLYQDVNCTEVPVAIHADQLTPTWRVYSTGSNVFQT



RAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPRRARSVA



SQSIIAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILPVSMTK



TSVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDK



NTQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSPIEDL



LFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPL



LTDEMIAQYTSALLAGTITSGWTFGAGPALQIPFPMQMAYRF



NGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTPSALGKLQ



DVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDPPEAEVQ



IDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLG



QSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTT



APAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTDNT



FVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTSPD



VDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYE



QYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCCSC



GSCCKFDEDDSEPVLKGVKLHYT





Optimized nucleotide sequence encoding a SARS-CoV-2 S protein mutated to remove a furin cleavage site and to replace residues 817, 892, 899, 942, 986 and 987 with proline
(SEQ ID NO: 130)

ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG

TGTGTCAACCTGACAACTAGGACTCAGCTGCCACCAGCCTA

CACCAACTCCTTCACCAGAGGCGTGTATTACCCAGACAAGG

TGTTTAGAAGCAGCGTGCTGCACTCTACCCAGGACCTCTTT

CTGCCCTTTTTCAGCAACGTGACATGGTTTCACGCAATTCA

CGTGTCCGGCACTAATGGCACAAAGCGGTTCGACAATCCA

GTCCTGCCTTTCAACGATGGCGTCTACTTTGCATCTACTGA



GAAATCCAATATCATTAGGGGATGGATCTTCGGCACAACCC



TGGATTCTAAGACCCAGAGCCTGCTGATCGTCAACAACGCC



ACAAACGTGGTCATTAAGGTTTGCGAGTTTCAGTTCTGTAA



CGATCCTTTTCTGGGCGTGTATTATCATAAGAACAATAAGA



GCTGGATGGAGTCCGAGTTTAGAGTGTATAGCTCTGCAAAT



AATTGTACCTTTGAGTACGTGAGCCAGCCCTTTCTGATGGA



CCTGGAGGGAAAACAAGGAAACTTCAAAAACCTGCGGGAA



TTCGTTTTCAAAAACATCGACGGCTATTTCAAGATCTATAG



CAAGCATACCCCAATCAACCTCGTGAGGGACCTCCCCCAG



GGCTTTAGCGCACTGGAGCCACTGGTTGACCTGCCTATCGG



CATTAATATCACAAGATTTCAGACCCTGCTGGCACTGCATA



GAAGCTATCTGACCCCTGGAGACTCCTCTAGTGGGTGGACT



GCCGGCGCCGCTGCCTACTATGTGGGCTATCTGCAGCCACG



GACATTCCTGCTGAAATACAATGAGAACGGGACAATCACA



GATGCTGTTGATTGCGCACTCGACCCCCTGTCCGAGACAAA



GTGCACTCTCAAGAGCTTTACCGTCGAGAAGGGCATCTATC



AGACCTCAAACTTCAGGGTGCAGCCCACAGAATCTATCGTG



CGCTTCCCTAATATCACTAACCTGTGTCCTTTCGGTGAAGT



GTTCAACGCCACCAGGTTTGCTAGCGTGTATGCCTGGAACA



GGAAGAGGATCTCTAACTGCGTCGCCGACTATTCCGTGCTG



TATAACAGCGCCTCCTTCTCCACATTCAAATGCTATGGAGT



GAGCCCGACAAAACTGAACGATCTCTGCTTTACAAATGTCT



ACGCCGACTCTTTTGTGATCAGAGGGGACGAGGTCCGGCA



GATCGCACCAGGACAGACAGGCAAGATTGCTGACTACAAC



TATAAGCTGCCTGACGACTTCACAGGATGTGTGATCGCATG



GAACTCAAACAATCTGGACTCCAAAGTCGGGGGCAACTAT



AATTACCTGTATCGCCTGTTCCGGAAGTCCAACCTGAAGCC



CTTCGAGAGGGACATCAGTACAGAGATCTATCAGGCTGGC



TCCACCCCTTGCAATGGCGTCGAAGGCTTTAATTGTTATTTT



CCCCTGCAGTCTTACGGGTTTCAGCCTACTAATGGAGTTGG



GTACCAGCCATACAGAGTGGTCGTGCTCAGCTTCGAGCTCC



TGCATGCTCCAGCTACAGTTTGCGGGCCAAAGAAGTCCACT



AACCTGGTGAAGAATAAGTGCGTCAACTTCAACTTTAACGG



GCTCACCGGCACCGGCGTGCTGACTGAGAGCAACAAGAAG



TTTCTGCCATTTCAACAGTTTGGACGGGACATTGCCGACAC



CACCGATGCCGTTCGGGATCCACAGACCCTGGAAATTCTGG



ACATTACACCGTGCAGCTTCGGGGGCGTGAGCGTGATCAC



ACCCGGAACCAATACAAGCAACCAGGTTGCCGTCCTGTATC



AGGATGTCAATTGCACAGAAGTGCCAGTTGCTATCCACGCA



GACCAGCTGACTCCCACATGGCGGGTGTATAGCACCGGAT



CCAACGTGTTTCAGACCCGCGCCGGATGTCTCATTGGGGCC



GAGCACGTGAATAACAGCTACGAGTGCGACATCCCCATTG



GCGCCGGCATTTGTGCGTCTTACCAGACTCAGACCAACTCT



CCTGGCTCCGCCTCTTCCGTTGCTAGTCAGTCTATTATTGCC



TATACCATGAGCCTCGGAGCTGAGAATAGCGTGGCCTACTC



CAATAATTCCATCGCAATCCCTACTAACTTCACTATTTCTGT



GACCACCGAGATCCTGCCTGTGTCTATGACTAAGACTAGCG



TTGATTGTACCATGTATATTTGTGGCGACTCTACCGAATGTT



CTAACCTGCTGCTTCAGTACGGCTCATTTTGCACACAGCTG



AACAGAGCCCTGACTGGGATCGCTGTGGAGCAGGACAAGA



ACACACAGGAGGTGTTTGCACAGGTGAAGCAGATCTATAA



GACCCCTCCTATTAAGGATTTCGGCGGATTCAATTTCTCAC



AGATTCTGCCAGACCCCAGTAAGCCTTCCAAGAGGAGCCCT



ATCGAGGATCTCCTGTTTAACAAGGTGACCCTGGCAGACGC



CGGCTTTATTAAGCAATATGGGGATTGCCTGGGCGACATTG



CTGCCAGAGACCTGATTTGCGCCCAGAAATTCAATGGCCTC



ACAGTGCTGCCACCTCTGCTGACCGACGAGATGATCGCTCA



ATACACTAGCGCACTGCTGGCCGGAACCATCACATCAGGCT



GGACCTTCGGGGCCGGACCAGCACTGCAGATTCCATTCCCT



ATGCAGATGGCCTATAGATTCAACGGCATTGGCGTCACACA



GAACGTGCTGTACGAAAACCAGAAGCTCATCGCTAACCAG



TTTAATTCCGCAATTGGAAAGATCCAAGATTCACTCAGCTC



AACCCCCTCTGCACTCGGAAAGCTGCAGGACGTGGTCAAC



CAGAATGCTCAGGCCCTGAACACACTCGTCAAGCAGCTGTC



CTCTAACTTTGGCGCTATCAGCTCCGTTCTGAACGACATTCT



GAGCCGCCTGGATCCCCCAGAGGCTGAAGTCCAGATTGAC



CGCCTGATTACCGGCCGGCTGCAGTCTCTGCAAACATACGT



GACCCAGCAGCTGATCAGAGCAGCCGAGATCCGGGCATCC



GCAAATCTGGCAGCAACTAAGATGAGCGAATGCGTGCTGG



GCCAGTCCAAGCGGGTGGACTTTTGTGGCAAGGGCTACCA



CCTGATGAGCTTCCCCCAGAGCGCCCCACATGGCGTTGTTT



TTCTGCACGTGACCTATGTCCCTGCTCAGGAAAAGAACTTT



ACAACTGCTCCTGCTATCTGCCATGACGGCAAGGCCCACTT



CCCACGGGAGGGAGTGTTTGTGTCCAATGGCACACACTGGT



TCGTGACCCAGAGGAACTTCTATGAACCCCAGATCATCACC



ACTGACAATACCTTCGTGTCTGGAAATTGCGACGTCGTGAT



CGGCATCGTTAACAACACCGTGTACGACCCTCTCCAGCCAG



AGCTGGACTCCTTTAAGGAGGAACTGGATAAGTATTTTAAG



AACCACACAAGCCCAGATGTGGATCTCGGGGACATCTCCG



GAATTAACGCCTCCGTGGTGAATATCCAGAAGGAGATTGA



CCGCCTAAATGAAGTTGCCAAGAACCTCAATGAGTCTCTGA



TTGATCTGCAGGAACTGGGCAAGTATGAGCAGTATATCAA



ATGGCCCTGGTACATTTGGCTGGGGTTTATCGCCGGACTGA



TTGCCATCGTCATGGTGACCATCATGCTGTGTTGCATGACC



TCCTGTTGTTCCTGTCTGAAGGGCTGCTGTAGTTGCGGCTCT



TGCTGTAAATTCGACGAAGATGATAGCGAGCCCGTGCTGA



AGGGCGTGAAGCTGCATTATACCTGA





SARS-CoV-2 S protein mutated to remove a furin cleavage site and to replace residues 817, 892, 899, 942, 986 and 987 with proline
(SEQ ID NO: 131)



MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVF



RSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFDNPVLPF



NDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKV



CEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVS



QPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRD



LPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWT



AGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKC



TLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATR



FASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLN



DLCFTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTG



CVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQ



AGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFE



LLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKK



FLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTN



TSNQVAVLYQDVNCTEVPVAIHADQLTPTWRVYSTGSNVFQT



RAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPGSASSVAS



QSIIAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILPVSMTKT



SVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDK



NTQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSPIEDL



LFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPL



LTDEMIAQYTSALLAGTITSGWTFGAGPALQIPFPMQMAYRF



NGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTPSALGKLQ



DVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDPPEAEVQ



IDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLG



QSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTT



APAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTDNT



FVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTSPD



VDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYE



QYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCCSC



GSCCKFDEDDSEPVLKGVKLHYT





Optimized nucleotide sequence encoding a SARS-CoV-2 S protein mutated to replace residues 817, 892, 899, 942, 986 and 987 with proline and which contains the D614G mutation
(SEQ ID NO: 132)



ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG



TGTGTCAACCTGACAACTAGGACTCAGCTGCCACCAGCCTA



CACCAACTCCTTCACCAGAGGCGTGTATTACCCAGACAAGG



TGTTTAGAAGCAGCGTGCTGCACTCTACCCAGGACCTCTTT



CTGCCCTTTTTCAGCAACGTGACATGGTTTCACGCAATTCA



CGTGTCCGGCACTAATGGCACAAAGCGGTTCGACAATCCA



GTCCTGCCTTTCAACGATGGCGTCTACTTTGCATCTACTGA



GAAATCCAATATCATTAGGGGATGGATCTTCGGCACAACCC



TGGATTCTAAGACCCAGAGCCTGCTGATCGTCAACAACGCC



ACAAACGTGGTCATTAAGGTTTGCGAGTTTCAGTTCTGTAA



CGATCCTTTTCTGGGCGTGTATTATCATAAGAACAATAAGA



GCTGGATGGAGTCCGAGTTTAGAGTGTATAGCTCTGCAAAT



AATTGTACCTTTGAGTACGTGAGCCAGCCCTTTCTGATGGA



CCTGGAGGGAAAACAAGGAAACTTCAAAAACCTGCGGGAA



TTCGTTTTCAAAAACATCGACGGCTATTTCAAGATCTATAG



CAAGCATACCCCAATCAACCTCGTGAGGGACCTCCCCCAG



GGCTTTAGCGCACTGGAGCCACTGGTTGACCTGCCTATCGG



CATTAATATCACAAGATTTCAGACCCTGCTGGCACTGCATA



GAAGCTATCTGACCCCTGGAGACTCCTCTAGTGGGTGGACT



GCCGGCGCCGCTGCCTACTATGTGGGCTATCTGCAGCCACG



GACATTCCTGCTGAAATACAATGAGAACGGGACAATCACA



GATGCTGTTGATTGCGCACTCGACCCCCTGTCCGAGACAAA



GTGCACTCTCAAGAGCTTTACCGTCGAGAAGGGCATCTATC



AGACCTCAAACTTCAGGGTGCAGCCCACAGAATCTATCGTG



CGCTTCCCTAATATCACTAACCTGTGTCCTTTCGGTGAAGT



GTTCAACGCCACCAGGTTTGCTAGCGTGTATGCCTGGAACA



GGAAGAGGATCTCTAACTGCGTCGCCGACTATTCCGTGCTG



TATAACAGCGCCTCCTTCTCCACATTCAAATGCTATGGAGT



GAGCCCGACAAAACTGAACGATCTCTGCTTTACAAATGTCT



ACGCCGACTCTTTTGTGATCAGAGGGGACGAGGTCCGGCA



GATCGCACCAGGACAGACAGGCAAGATTGCTGACTACAAC



TATAAGCTGCCTGACGACTTCACAGGATGTGTGATCGCATG



GAACTCAAACAATCTGGACTCCAAAGTCGGGGGCAACTAT



AATTACCTGTATCGCCTGTTCCGGAAGTCCAACCTGAAGCC



CTTCGAGAGGGACATCAGTACAGAGATCTATCAGGCTGGC



TCCACCCCTTGCAATGGCGTCGAAGGCTTTAATTGTTATTTT



CCCCTGCAGTCTTACGGGTTTCAGCCTACTAATGGAGTTGG



GTACCAGCCATACAGAGTGGTCGTGCTCAGCTTCGAGCTCC



TGCATGCTCCAGCTACAGTTTGCGGGCCAAAGAAGTCCACT



AACCTGGTGAAGAATAAGTGCGTCAACTTCAACTTTAACGG



GCTCACCGGCACCGGCGTGCTGACTGAGAGCAACAAGAAG



TTTCTGCCATTTCAACAGTTTGGACGGGACATTGCCGACAC



CACCGATGCCGTTCGGGATCCACAGACCCTGGAAATTCTGG



ACATTACACCGTGCAGCTTCGGGGGCGTGAGCGTGATCAC



ACCCGGAACCAATACAAGCAACCAGGTTGCCGTCCTGTATC



AGGGCGTCAATTGCACAGAAGTGCCAGTTGCTATCCACGC



AGACCAGCTGACTCCCACATGGCGGGTGTATAGCACCGGA



TCCAACGTGTTTCAGACCCGCGCCGGATGTCTCATTGGGGC



CGAGCACGTGAATAACAGCTACGAGTGCGACATCCCCATT



GGCGCCGGCATTTGTGCGTCTTACCAGACTCAGACCAACTC



TCCTAGAAGGGCCAGGTCCGTTGCTAGTCAGTCTATTATTG



CCTATACCATGAGCCTCGGAGCTGAGAATAGCGTGGCCTAC



TCCAATAATTCCATCGCAATCCCTACTAACTTCACTATTTCT



GTGACCACCGAGATCCTGCCTGTGTCTATGACTAAGACTAG



CGTTGATTGTACCATGTATATTTGTGGCGACTCTACCGAAT



GTTCTAACCTGCTGCTTCAGTACGGCTCATTTTGCACACAG



CTGAACAGAGCCCTGACTGGGATCGCTGTGGAGCAGGACA



AGAACACACAGGAGGTGTTTGCACAGGTGAAGCAGATCTA



TAAGACCCCTCCTATTAAGGATTTCGGCGGATTCAATTTCT



CACAGATTCTGCCAGACCCCAGTAAGCCTTCCAAGAGGAG



CCCTATCGAGGATCTCCTGTTTAACAAGGTGACCCTGGCAG



ACGCCGGCTTTATTAAGCAATATGGGGATTGCCTGGGCGAC



ATTGCTGCCAGAGACCTGATTTGCGCCCAGAAATTCAATGG



CCTCACAGTGCTGCCACCTCTGCTGACCGACGAGATGATCG



CTCAATACACTAGCGCACTGCTGGCCGGAACCATCACATCA



GGCTGGACCTTCGGGGCCGGACCAGCACTGCAGATTCCATT



CCCTATGCAGATGGCCTATAGATTCAACGGCATTGGCGTCA



CACAGAACGTGCTGTACGAAAACCAGAAGCTCATCGCTAA



CCAGTTTAATTCCGCAATTGGAAAGATCCAAGATTCACTCA



GCTCAACCCCCTCTGCACTCGGAAAGCTGCAGGACGTGGTC



AACCAGAATGCTCAGGCCCTGAACACACTCGTCAAGCAGC



TGTCCTCTAACTTTGGCGCTATCAGCTCCGTTCTGAACGAC



ATTCTGAGCCGCCTGGATCCCCCAGAGGCTGAAGTCCAGAT



TGACCGCCTGATTACCGGCCGGCTGCAGTCTCTGCAAACAT



ACGTGACCCAGCAGCTGATCAGAGCAGCCGAGATCCGGGC



ATCCGCAAATCTGGCAGCAACTAAGATGAGCGAATGCGTG



CTGGGCCAGTCCAAGCGGGTGGACTTTTGTGGCAAGGGCT



ACCACCTGATGAGCTTCCCCCAGAGCGCCCCACATGGCGTT



GTTTTTCTGCACGTGACCTATGTCCCTGCTCAGGAAAAGAA



CTTTACAACTGCTCCTGCTATCTGCCATGACGGCAAGGCCC



ACTTCCCACGGGAGGGAGTGTTTGTGTCCAATGGCACACAC



TGGTTCGTGACCCAGAGGAACTTCTATGAACCCCAGATCAT



CACCACTGACAATACCTTCGTGTCTGGAAATTGCGACGTCG



TGATCGGCATCGTTAACAACACCGTGTACGACCCTCTCCAG



CCAGAGCTGGACTCCTTTAAGGAGGAACTGGATAAGTATTT



TAAGAACCACACAAGCCCAGATGTGGATCTCGGGGACATC



TCCGGAATTAACGCCTCCGTGGTGAATATCCAGAAGGAGA



TTGACCGCCTAAATGAAGTTGCCAAGAACCTCAATGAGTCT



CTGATTGATCTGCAGGAACTGGGCAAGTATGAGCAGTATAT



CAAATGGCCCTGGTACATTTGGCTGGGGTTTATCGCCGGAC



TGATTGCCATCGTCATGGTGACCATCATGCTGTGTTGCATG



ACCTCCTGTTGTTCCTGTCTGAAGGGCTGCTGTAGTTGCGG



CTCTTGCTGTAAATTCGACGAAGATGATAGCGAGCCCGTGC



TGAAGGGCGTGAAGCTGCATTATACCTGA





SARS-CoV-2 S protein mutated to replace residues 817, 892, 899, 942, 986 and 987 with proline and which contains the D614G mutation
(SEQ ID NO: 133)



MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVF



RSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFDNPVLPF



NDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKV



CEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVS



QPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRD



LPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWT



AGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKC



TLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATR



FASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLN



DLCFTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTG



CVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQ



AGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFE



LLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKK



FLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTN



TSNQVAVLYQGVNCTEVPVAIHADQLTPTWRVYSTGSNVFQ



TRAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPRRARSV



ASQSIIAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILPVSMT



KTSVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQD



KNTQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSPIED



LLFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPP



LLTDEMIAQYTSALLAGTITSGWTFGAGPALQIPFPMQMAYR



FNGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTPSALGKLQ



DVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDPPEAEVQ



IDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLG



QSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTT



APAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTDNT



FVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTSPD



VDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYE



QYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCCSC



GSCCKFDEDDSEPVLKGVKLHYT





Optimized nucleotide sequence encoding a SARS-CoV-2 S protein mutated to remove a furin cleavage site, to replace residues 817, 892, 899, 942, 986 and 987 with proline and which contains the D614G mutation
(SEQ ID NO: 134)



ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG



TGTGTCAACCTGACAACTAGGACTCAGCTGCCACCAGCCTA



CACCAACTCCTTCACCAGAGGCGTGTATTACCCAGACAAGG



TGTTTAGAAGCAGCGTGCTGCACTCTACCCAGGACCTCTTT



CTGCCCTTTTTCAGCAACGTGACATGGTTTCACGCAATTCA



CGTGTCCGGCACTAATGGCACAAAGCGGTTCGACAATCCA



GTCCTGCCTTTCAACGATGGCGTCTACTTTGCATCTACTGA



GAAATCCAATATCATTAGGGGATGGATCTTCGGCACAACCC



TGGATTCTAAGACCCAGAGCCTGCTGATCGTCAACAACGCC



ACAAACGTGGTCATTAAGGTTTGCGAGTTTCAGTTCTGTAA



CGATCCTTTTCTGGGCGTGTATTATCATAAGAACAATAAGA



GCTGGATGGAGTCCGAGTTTAGAGTGTATAGCTCTGCAAAT



AATTGTACCTTTGAGTACGTGAGCCAGCCCTTTCTGATGGA



CCTGGAGGGAAAACAAGGAAACTTCAAAAACCTGCGGGAA



TTCGTTTTCAAAAACATCGACGGCTATTTCAAGATCTATAG



CAAGCATACCCCAATCAACCTCGTGAGGGACCTCCCCCAG



GGCTTTAGCGCACTGGAGCCACTGGTTGACCTGCCTATCGG



CATTAATATCACAAGATTTCAGACCCTGCTGGCACTGCATA



GAAGCTATCTGACCCCTGGAGACTCCTCTAGTGGGTGGACT



GCCGGCGCCGCTGCCTACTATGTGGGCTATCTGCAGCCACG



GACATTCCTGCTGAAATACAATGAGAACGGGACAATCACA



GATGCTGTTGATTGCGCACTCGACCCCCTGTCCGAGACAAA



GTGCACTCTCAAGAGCTTTACCGTCGAGAAGGGCATCTATC



AGACCTCAAACTTCAGGGTGCAGCCCACAGAATCTATCGTG



CGCTTCCCTAATATCACTAACCTGTGTCCTTTCGGTGAAGT



GTTCAACGCCACCAGGTTTGCTAGCGTGTATGCCTGGAACA



GGAAGAGGATCTCTAACTGCGTCGCCGACTATTCCGTGCTG



TATAACAGCGCCTCCTTCTCCACATTCAAATGCTATGGAGT



GAGCCCGACAAAACTGAACGATCTCTGCTTTACAAATGTCT



ACGCCGACTCTTTTGTGATCAGAGGGGACGAGGTCCGGCA



GATCGCACCAGGACAGACAGGCAAGATTGCTGACTACAAC



TATAAGCTGCCTGACGACTTCACAGGATGTGTGATCGCATG



GAACTCAAACAATCTGGACTCCAAAGTCGGGGGCAACTAT



AATTACCTGTATCGCCTGTTCCGGAAGTCCAACCTGAAGCC



CTTCGAGAGGGACATCAGTACAGAGATCTATCAGGCTGGC



TCCACCCCTTGCAATGGCGTCGAAGGCTTTAATTGTTATTTT



CCCCTGCAGTCTTACGGGTTTCAGCCTACTAATGGAGTTGG



GTACCAGCCATACAGAGTGGTCGTGCTCAGCTTCGAGCTCC



TGCATGCTCCAGCTACAGTTTGCGGGCCAAAGAAGTCCACT



AACCTGGTGAAGAATAAGTGCGTCAACTTCAACTTTAACGG



GCTCACCGGCACCGGCGTGCTGACTGAGAGCAACAAGAAG



TTTCTGCCATTTCAACAGTTTGGACGGGACATTGCCGACAC



CACCGATGCCGTTCGGGATCCACAGACCCTGGAAATTCTGG



ACATTACACCGTGCAGCTTCGGGGGCGTGAGCGTGATCAC



ACCCGGAACCAATACAAGCAACCAGGTTGCCGTCCTGTATC



AGGGCGTCAATTGCACAGAAGTGCCAGTTGCTATCCACGC



AGACCAGCTGACTCCCACATGGCGGGTGTATAGCACCGGA



TCCAACGTGTTTCAGACCCGCGCCGGATGTCTCATTGGGGC



CGAGCACGTGAATAACAGCTACGAGTGCGACATCCCCATT



GGCGCCGGCATTTGTGCGTCTTACCAGACTCAGACCAACTC



TCCTGGCTCCGCCTCTTCCGTTGCTAGTCAGTCTATTATTGC



CTATACCATGAGCCTCGGAGCTGAGAATAGCGTGGCCTACT



CCAATAATTCCATCGCAATCCCTACTAACTTCACTATTTCTG



TGACCACCGAGATCCTGCCTGTGTCTATGACTAAGACTAGC



GTTGATTGTACCATGTATATTTGTGGCGACTCTACCGAATG



TTCTAACCTGCTGCTTCAGTACGGCTCATTTTGCACACAGCT



GAACAGAGCCCTGACTGGGATCGCTGTGGAGCAGGACAAG



AACACACAGGAGGTGTTTGCACAGGTGAAGCAGATCTATA



AGACCCCTCCTATTAAGGATTTCGGCGGATTCAATTTCTCA



CAGATTCTGCCAGACCCCAGTAAGCCTTCCAAGAGGAGCC



CTATCGAGGATCTCCTGTTTAACAAGGTGACCCTGGCAGAC



GCCGGCTTTATTAAGCAATATGGGGATTGCCTGGGCGACAT



TGCTGCCAGAGACCTGATTTGCGCCCAGAAATTCAATGGCC



TCACAGTGCTGCCACCTCTGCTGACCGACGAGATGATCGCT



CAATACACTAGCGCACTGCTGGCCGGAACCATCACATCAG



GCTGGACCTTCGGGGCCGGACCAGCACTGCAGATTCCATTC



CCTATGCAGATGGCCTATAGATTCAACGGCATTGGCGTCAC



ACAGAACGTGCTGTACGAAAACCAGAAGCTCATCGCTAAC



CAGTTTAATTCCGCAATTGGAAAGATCCAAGATTCACTCAG



CTCAACCCCCTCTGCACTCGGAAAGCTGCAGGACGTGGTCA



ACCAGAATGCTCAGGCCCTGAACACACTCGTCAAGCAGCT



GTCCTCTAACTTTGGCGCTATCAGCTCCGTTCTGAACGACA



TTCTGAGCCGCCTGGATCCCCCAGAGGCTGAAGTCCAGATT



GACCGCCTGATTACCGGCCGGCTGCAGTCTCTGCAAACATA



CGTGACCCAGCAGCTGATCAGAGCAGCCGAGATCCGGGCA



TCCGCAAATCTGGCAGCAACTAAGATGAGCGAATGCGTGC



TGGGCCAGTCCAAGCGGGTGGACTTTTGTGGCAAGGGCTA



CCACCTGATGAGCTTCCCCCAGAGCGCCCCACATGGCGTTG



TTTTTCTGCACGTGACCTATGTCCCTGCTCAGGAAAAGAAC



TTTACAACTGCTCCTGCTATCTGCCATGACGGCAAGGCCCA



CTTCCCACGGGAGGGAGTGTTTGTGTCCAATGGCACACACT



GGTTCGTGACCCAGAGGAACTTCTATGAACCCCAGATCATC



ACCACTGACAATACCTTCGTGTCTGGAAATTGCGACGTCGT



GATCGGCATCGTTAACAACACCGTGTACGACCCTCTCCAGC



CAGAGCTGGACTCCTTTAAGGAGGAACTGGATAAGTATTTT



AAGAACCACACAAGCCCAGATGTGGATCTCGGGGACATCT



CCGGAATTAACGCCTCCGTGGTGAATATCCAGAAGGAGAT



TGACCGCCTAAATGAAGTTGCCAAGAACCTCAATGAGTCTC



TGATTGATCTGCAGGAACTGGGCAAGTATGAGCAGTATATC



AAATGGCCCTGGTACATTTGGCTGGGGTTTATCGCCGGACT



GATTGCCATCGTCATGGTGACCATCATGCTGTGTTGCATGA



CCTCCTGTTGTTCCTGTCTGAAGGGCTGCTGTAGTTGCGGCT



CTTGCTGTAAATTCGACGAAGATGATAGCGAGCCCGTGCTG



AAGGGCGTGAAGCTGCATTATACCTGA





SARS-CoV-2 S protein mutated to remove a furin cleavage site, to replace residues 817, 892, 899, 942, 986 and 987 with proline and which contains the D614G mutation
(SEQ ID NO: 135)



MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVF



RSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFDNPVLPF



NDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKV



CEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVS



QPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRD



LPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWT



AGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKC



TLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATR



FASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLN



DLCFTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTG



CVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQ



AGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFE



LLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKK



FLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTN



TSNQVAVLYQGVNCTEVPVAIHADQLTPTWRVYSTGSNVFQ



TRAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPGSASSVA



SQSIIAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILPVSMTK



TSVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDK



NTQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSPIEDL



LFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPL



LTDEMIAQYTSALLAGTITSGWTFGAGPALQIPFPMQMAYRF



NGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTPSALGKLQ



DVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDPPEAEVQ



IDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLG



QSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTT



APAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTDNT



FVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTSPD



VDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYE



QYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCCSC



GSCCKFDEDDSEPVLKGVKLHYT





Optimized nucleotide sequence encoding a SARS-CoV-2 S protein mutated to remove a furin cleavage site, to replace residues 817, 892, 899, 942, 986 and 987 with proline and containing an extended signal sequence
(SEQ ID NO: 136)



ATGTTCCTGCTGACAACAAAAAGAACCATGTTTGTGTTCCT



GGTGCTGCTGCCTCTGGTGTCCTCACAGTGTGTCAACCTGA



CAACAAGAACTCAGCTGCCACCAGCCTACACCAACTCCTTC



ACCAGAGGCGTGTATTACCCAGACAAGGTGTTTAGAAGCA



GCGTGCTGCACTCTACCCAGGACCTCTTTCTGCCCTTTTTCA



GCAACGTGACATGGTTTCACGCAATTCACGTGTCCGGCACT



AATGGCACAAAGCGGTTCGACAATCCAGTCCTGCCTTTCAA



CGATGGCGTCTACTTTGCATCTACTGAGAAATCCAATATCA



TTAGGGGATGGATCTTCGGCACAACCCTGGATTCTAAGACC



CAGAGCCTGCTGATCGTCAACAACGCCACAAACGTGGTCA



TTAAGGTTTGCGAGTTTCAGTTCTGTAACGATCCTTTTCTGG



GCGTGTATTATCATAAGAACAATAAGAGCTGGATGGAGTC



CGAGTTTAGAGTGTATAGCTCTGCAAATAATTGTACCTTTG



AGTACGTGAGCCAGCCCTTTCTGATGGACCTGGAGGGAAA



ACAAGGAAACTTCAAAAACCTGCGGGAATTCGTTTTCAAA



AACATCGACGGCTATTTCAAGATCTATAGCAAGCATACCCC



AATCAACCTCGTGAGGGACCTCCCCCAGGGCTTTAGCGCAC



TGGAGCCACTGGTTGACCTGCCTATCGGCATTAATATCACA



AGATTTCAGACCCTGCTGGCACTGCATAGAAGCTATCTGAC



CCCTGGAGACTCCTCTAGTGGGTGGACTGCCGGCGCCGCTG



CCTACTATGTGGGCTATCTGCAGCCACGGACATTCCTGCTG



AAATACAATGAGAACGGGACAATCACAGATGCTGTTGATT



GCGCACTCGACCCCCTGTCCGAGACAAAGTGCACTCTCAAG



AGCTTTACCGTCGAGAAGGGCATCTATCAGACCTCAAACTT



CAGGGTGCAGCCCACAGAATCTATCGTGCGCTTCCCTAATA



TCACTAACCTGTGTCCTTTCGGTGAAGTGTTCAACGCCACC



AGGTTTGCTAGCGTGTATGCCTGGAACAGGAAGAGGATCT



CTAACTGCGTCGCCGACTATTCCGTGCTGTATAACAGCGCC



TCCTTCTCCACATTCAAATGCTATGGAGTGAGCCCGACAAA



ACTGAACGATCTCTGCTTTACAAATGTCTACGCCGACTCTT



TTGTGATCAGAGGGGACGAGGTCCGGCAGATCGCACCAGG



ACAGACAGGCAAGATTGCTGACTACAACTATAAGCTGCCT



GACGACTTCACAGGATGTGTGATCGCATGGAACTCAAACA



ATCTGGACTCCAAAGTCGGGGGCAACTATAATTACCTGTAT



CGCCTGTTCCGGAAGTCCAACCTGAAGCCCTTCGAGAGGG



ACATCAGTACAGAGATCTATCAGGCTGGCTCCACCCCTTGC



AATGGCGTCGAAGGCTTTAATTGTTATTTTCCCCTGCAGTCT



TACGGGTTTCAGCCTACTAATGGAGTTGGGTACCAGCCATA



CAGAGTGGTCGTGCTCAGCTTCGAGCTCCTGCATGCTCCAG



CTACAGTTTGCGGGCCAAAGAAGTCCACTAACCTGGTGAA



GAATAAGTGCGTCAACTTCAACTTTAACGGGCTCACCGGCA



CCGGCGTGCTGACTGAGAGCAACAAGAAGTTTCTGCCATTT



CAACAGTTTGGACGGGACATTGCCGACACCACCGATGCCG



TTCGGGATCCACAGACCCTGGAAATTCTGGACATTACACCG



TGCAGCTTCGGGGGCGTGAGCGTGATCACACCCGGAACCA



ATACAAGCAACCAGGTTGCCGTCCTGTATCAGGATGTCAAT



TGCACAGAAGTGCCAGTTGCTATCCACGCAGACCAGCTGA



CTCCCACATGGCGGGTGTATAGCACCGGATCCAACGTGTTT



CAGACCCGCGCCGGATGTCTCATTGGGGCCGAGCACGTGA



ATAACAGCTACGAGTGCGACATCCCCATTGGCGCCGGCATT



TGTGCGTCTTACCAGACTCAGACCAACTCTCCTGGCTCCGC



CTCTTCCGTTGCTAGTCAGTCTATTATTGCCTATACCATGAG



CCTCGGAGCTGAGAATAGCGTGGCCTACTCCAATAATTCCA



TCGCAATCCCTACTAACTTCACTATTTCTGTGACCACCGAG



ATCCTGCCTGTGTCTATGACTAAGACTAGCGTTGATTGTAC



CATGTATATTTGTGGCGACTCTACCGAATGTTCTAACCTGC



TGCTTCAGTACGGCTCATTTTGCACACAGCTGAACAGAGCC



CTGACTGGGATCGCTGTGGAGCAGGACAAGAACACACAGG



AGGTGTTTGCACAGGTGAAGCAGATCTATAAGACCCCTCCT



ATTAAGGATTTCGGCGGATTCAATTTCTCACAGATTCTGCC



AGACCCCAGTAAGCCTTCCAAGAGGAGCCCTATCGAGGAT



CTCCTGTTTAACAAGGTGACCCTGGCAGACGCCGGCTTTAT



TAAGCAATATGGGGATTGCCTGGGCGACATTGCTGCCAGA



GACCTGATTTGCGCCCAGAAATTCAATGGCCTCACAGTGCT



GCCACCTCTGCTGACCGACGAGATGATCGCTCAATACACTA



GCGCACTGCTGGCCGGAACCATCACATCAGGCTGGACCTTC



GGGGCCGGACCAGCACTGCAGATTCCATTCCCTATGCAGAT



GGCCTATAGATTCAACGGCATTGGCGTCACACAGAACGTG



CTGTACGAAAACCAGAAGCTCATCGCTAACCAGTTTAATTC



CGCAATTGGAAAGATCCAAGATTCACTCAGCTCAACCCCCT



CTGCACTCGGAAAGCTGCAGGACGTGGTCAACCAGAATGC



TCAGGCCCTGAACACACTCGTCAAGCAGCTGTCCTCTAACT



TTGGCGCTATCAGCTCCGTTCTGAACGACATTCTGAGCCGC



CTGGATCCCCCAGAGGCTGAAGTCCAGATTGACCGCCTGAT



TACCGGCCGGCTGCAGTCTCTGCAAACATACGTGACCCAGC



AGCTGATCAGAGCAGCCGAGATCCGGGCATCCGCAAATCT



GGCAGCAACTAAGATGAGCGAATGCGTGCTGGGCCAGTCC



AAGCGGGTGGACTTTTGTGGCAAGGGCTACCACCTGATGA



GCTTCCCCCAGAGCGCCCCACATGGCGTTGTTTTTCTGCAC



GTGACCTATGTCCCTGCTCAGGAAAAGAACTTTACAACTGC



TCCTGCTATCTGCCATGACGGCAAGGCCCACTTCCCACGGG



AGGGAGTGTTTGTGTCCAATGGCACACACTGGTTCGTGACC



CAGAGGAACTTCTATGAACCCCAGATCATCACCACTGACA



ATACCTTCGTGTCTGGAAATTGCGACGTCGTGATCGGCATC



GTTAACAACACCGTGTACGACCCTCTCCAGCCAGAGCTGGA



CTCCTTTAAGGAGGAACTGGATAAGTATTTTAAGAACCACA



CAAGCCCAGATGTGGATCTCGGGGACATCTCCGGAATTAA



CGCCTCCGTGGTGAATATCCAGAAGGAGATTGACCGCCTA



AATGAAGTTGCCAAGAACCTCAATGAGTCTCTGATTGATCT



GCAGGAACTGGGCAAGTATGAGCAGTATATCAAATGGCCC



TGGTACATTTGGCTGGGGTTTATCGCCGGACTGATTGCCAT



CGTCATGGTGACCATCATGCTGTGTTGCATGACCTCCTGTT



GTTCCTGTCTGAAGGGCTGCTGTAGTTGCGGCTCTTGCTGT



AAATTCGACGAAGATGATAGCGAGCCCGTGCTGAAGGGCG



TGAAGCTGCATTATACCTGA





SARS-CoV-2 S protein mutated to remove a furin cleavage site, to replace residues 817, 892, 899, 942, 986 and 987 with proline and containing an extended signal sequence
(SEQ ID NO: 137)



MFLLTTKRTMFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTR



GVYYPDKVFRSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTK



RFDNPVLPFNDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVN



NATNVVIKVCEFQFCNDPFLGVYYHKNNKSWMESEFRVYSS



ANNCTFEYVSQPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYS



KHTPINLVRDLPQGFSALEPLVDLPIGINITRFQTLLALHRSYLT



PGDSSSGWTAGAAAYYVGYLQPRTFLLKYNENGTITDAVDC



ALDPLSETKCTLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLC



PFGEVFNATRFASVYAWNRKRISNCVADYSVLYNSASFSTFK



CYGVSPTKLNDLCFTNVYADSFVIRGDEVRQIAPGQTGKIADY



NYKLPDDFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLK



PFERDISTEIYQAGSTPCNGVEGFNCYFPLQSYGFQPTNGVGY



QPYRVVVLSFELLHAPATVCGPKKSTNLVKNKCVNFNFNGLT



GTGVLTESNKKFLPFQQFGRDIADTTDAVRDPQTLEILDITPCS



FGGVSVITPGTNTSNQVAVLYQDVNCTEVPVAIHADQLTPTW



RVYSTGSNVFQTRAGCLIGAEHVNNSYECDIPIGAGICASYQT



QTNSPGSASSVASQSIIAYTMSLGAENSVAYSNNSIAIPTNFTIS



VTTEILPVSMTKTSVDCTMYICGDSTECSNLLLQYGSFCTQLN



RALTGIAVEQDKNTQEVFAQVKQIYKTPPIKDFGGFNFSQILPD



PSKPSKRSPIEDLLFNKVTLADAGFIKQYGDCLGDIAARDLICA



QKFNGLTVLPPLLTDEMIAQYTSALLAGTITSGWTFGAGPALQ



IPFPMQMAYRFNGIGVTQNVLYENQKLIANQFNSAIGKIQDSL



SSTPSALGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDIL



SRLDPPEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASANLA



ATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVT



YVPAQEKNFTTAPAICHDGKAHFPREGVFVSNGTHWFVTQRN



FYEPQIITTDNTFVSGNCDVVIGIVNNTVYDPLQPELDSFKEEL



DKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEVAKNLNE



SLIDLQELGKYEQYIKWPWYIWLGFIAGLIAIVMVTIMLCCMT



SCCSCLKGCCSCGSCCKFDEDDSEPVLKGVKLHYT





Optimized nucleotide sequence encoding a SARS-CoV-2 S protein mutated to remove a furin cleavage site, to replace residues 817, 892, 899, 942, 986 and 987 with proline and to mutate the ER retrieval signal
(SEQ ID NO: 138)



ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG



TGTGTCAACCTGACAACTAGGACTCAGCTGCCACCAGCCTA



CACCAACTCCTTCACCAGAGGCGTGTATTACCCAGACAAGG



TGTTTAGAAGCAGCGTGCTGCACTCTACCCAGGACCTCTTT



CTGCCCTTTTTCAGCAACGTGACATGGTTTCACGCAATTCA



CGTGTCCGGCACTAATGGCACAAAGCGGTTCGACAATCCA



GTCCTGCCTTTCAACGATGGCGTCTACTTTGCATCTACTGA



GAAATCCAATATCATTAGGGGATGGATCTTCGGCACAACCC



TGGATTCTAAGACCCAGAGCCTGCTGATCGTCAACAACGCC



ACAAACGTGGTCATTAAGGTTTGCGAGTTTCAGTTCTGTAA



CGATCCTTTTCTGGGCGTGTATTATCATAAGAACAATAAGA



GCTGGATGGAGTCCGAGTTTAGAGTGTATAGCTCTGCAAAT



AATTGTACCTTTGAGTACGTGAGCCAGCCCTTTCTGATGGA



CCTGGAGGGAAAACAAGGAAACTTCAAAAACCTGCGGGAA



TTCGTTTTCAAAAACATCGACGGCTATTTCAAGATCTATAG



CAAGCATACCCCAATCAACCTCGTGAGGGACCTCCCCCAG



GGCTTTAGCGCACTGGAGCCACTGGTTGACCTGCCTATCGG



CATTAATATCACAAGATTTCAGACCCTGCTGGCACTGCATA



GAAGCTATCTGACCCCTGGAGACTCCTCTAGTGGGTGGACT



GCCGGCGCCGCTGCCTACTATGTGGGCTATCTGCAGCCACG



GACATTCCTGCTGAAATACAATGAGAACGGGACAATCACA



GATGCTGTTGATTGCGCACTCGACCCCCTGTCCGAGACAAA



GTGCACTCTCAAGAGCTTTACCGTCGAGAAGGGCATCTATC



AGACCTCAAACTTCAGGGTGCAGCCCACAGAATCTATCGTG



CGCTTCCCTAATATCACTAACCTGTGTCCTTTCGGTGAAGT



GTTCAACGCCACCAGGTTTGCTAGCGTGTATGCCTGGAACA



GGAAGAGGATCTCTAACTGCGTCGCCGACTATTCCGTGCTG



TATAACAGCGCCTCCTTCTCCACATTCAAATGCTATGGAGT



GAGCCCGACAAAACTGAACGATCTCTGCTTTACAAATGTCT



ACGCCGACTCTTTTGTGATCAGAGGGGACGAGGTCCGGCA



GATCGCACCAGGACAGACAGGCAAGATTGCTGACTACAAC



TATAAGCTGCCTGACGACTTCACAGGATGTGTGATCGCATG



GAACTCAAACAATCTGGACTCCAAAGTCGGGGGCAACTAT



AATTACCTGTATCGCCTGTTCCGGAAGTCCAACCTGAAGCC



CTTCGAGAGGGACATCAGTACAGAGATCTATCAGGCTGGC



TCCACCCCTTGCAATGGCGTCGAAGGCTTTAATTGTTATTTT



CCCCTGCAGTCTTACGGGTTTCAGCCTACTAATGGAGTTGG



GTACCAGCCATACAGAGTGGTCGTGCTCAGCTTCGAGCTCC



TGCATGCTCCAGCTACAGTTTGCGGGCCAAAGAAGTCCACT



AACCTGGTGAAGAATAAGTGCGTCAACTTCAACTTTAACGG



GCTCACCGGCACCGGCGTGCTGACTGAGAGCAACAAGAAG



TTTCTGCCATTTCAACAGTTTGGACGGGACATTGCCGACAC



CACCGATGCCGTTCGGGATCCACAGACCCTGGAAATTCTGG



ACATTACACCGTGCAGCTTCGGGGGCGTGAGCGTGATCAC



ACCCGGAACCAATACAAGCAACCAGGTTGCCGTCCTGTATC



AGGATGTCAATTGCACAGAAGTGCCAGTTGCTATCCACGCA



GACCAGCTGACTCCCACATGGCGGGTGTATAGCACCGGAT



CCAACGTGTTTCAGACCCGCGCCGGATGTCTCATTGGGGCC



GAGCACGTGAATAACAGCTACGAGTGCGACATCCCCATTG



GCGCCGGCATTTGTGCGTCTTACCAGACTCAGACCAACTCT



CCTGGCTCCGCCTCTTCCGTTGCTAGTCAGTCTATTATTGCC



TATACCATGAGCCTCGGAGCTGAGAATAGCGTGGCCTACTC



CAATAATTCCATCGCAATCCCTACTAACTTCACTATTTCTGT



GACCACCGAGATCCTGCCTGTGTCTATGACTAAGACTAGCG



TTGATTGTACCATGTATATTTGTGGCGACTCTACCGAATGTT



CTAACCTGCTGCTTCAGTACGGCTCATTTTGCACACAGCTG



AACAGAGCCCTGACTGGGATCGCTGTGGAGCAGGACAAGA



ACACACAGGAGGTGTTTGCACAGGTGAAGCAGATCTATAA



GACCCCTCCTATTAAGGATTTCGGCGGATTCAATTTCTCAC



AGATTCTGCCAGACCCCAGTAAGCCTTCCAAGAGGAGCCCT



ATCGAGGATCTCCTGTTTAACAAGGTGACCCTGGCAGACGC



CGGCTTTATTAAGCAATATGGGGATTGCCTGGGCGACATTG



CTGCCAGAGACCTGATTTGCGCCCAGAAATTCAATGGCCTC



ACAGTGCTGCCACCTCTGCTGACCGACGAGATGATCGCTCA



ATACACTAGCGCACTGCTGGCCGGAACCATCACATCAGGCT



GGACCTTCGGGGCCGGACCAGCACTGCAGATTCCATTCCCT



ATGCAGATGGCCTATAGATTCAACGGCATTGGCGTCACACA



GAACGTGCTGTACGAAAACCAGAAGCTCATCGCTAACCAG



TTTAATTCCGCAATTGGAAAGATCCAAGATTCACTCAGCTC



AACCCCCTCTGCACTCGGAAAGCTGCAGGACGTGGTCAAC



CAGAATGCTCAGGCCCTGAACACACTCGTCAAGCAGCTGTC



CTCTAACTTTGGCGCTATCAGCTCCGTTCTGAACGACATTCT



GAGCCGCCTGGATCCCCCAGAGGCTGAAGTCCAGATTGAC



CGCCTGATTACCGGCCGGCTGCAGTCTCTGCAAACATACGT



GACCCAGCAGCTGATCAGAGCAGCCGAGATCCGGGCATCC



GCAAATCTGGCAGCAACTAAGATGAGCGAATGCGTGCTGG



GCCAGTCCAAGCGGGTGGACTTTTGTGGCAAGGGCTACCA



CCTGATGAGCTTCCCCCAGAGCGCCCCACATGGCGTTGTTT



TTCTGCACGTGACCTATGTCCCTGCTCAGGAAAAGAACTTT



ACAACTGCTCCTGCTATCTGCCATGACGGCAAGGCCCACTT



CCCACGGGAGGGAGTGTTTGTGTCCAATGGCACACACTGGT



TCGTGACCCAGAGGAACTTCTATGAACCCCAGATCATCACC



ACTGACAATACCTTCGTGTCTGGAAATTGCGACGTCGTGAT



CGGCATCGTTAACAACACCGTGTACGACCCTCTCCAGCCAG



AGCTGGACTCCTTTAAGGAGGAACTGGATAAGTATTTTAAG



AACCACACAAGCCCAGATGTGGATCTCGGGGACATCTCCG



GAATTAACGCCTCCGTGGTGAATATCCAGAAGGAGATTGA



CCGCCTAAATGAAGTTGCCAAGAACCTCAATGAGTCTCTGA



TTGATCTGCAGGAACTGGGCAAGTATGAGCAGTATATCAA



ATGGCCCTGGTACATTTGGCTGGGGTTTATCGCCGGACTGA



TTGCCATCGTCATGGTGACCATCATGCTGTGTTGCATGACC



TCCTGTTGTTCCTGTCTGAAGGGCTGCTGTAGTTGCGGCTCT



TGCTGTAAATTCGACGAAGATGATAGCGAGCCCGTGCTGA



AGGGCGTGGCCCTGGCTTATACCTGA





SARS-CoV-2 S protein mutated to remove a furin cleavage site, to replace residues 817, 892, 899, 942, 986 and 987 with proline and to mutate the ER retrieval signal
(SEQ ID NO: 139)



MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVF



RSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFDNPVLPF



NDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKV



CEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVS



QPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRD



LPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWT



AGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKC



TLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATR



FASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLN



DLCFTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTG



CVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQ



AGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFE



LLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKK



FLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTN



TSNQVAVLYQDVNCTEVPVAIHADQLTPTWRVYSTGSNVFQT



RAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPGSASSVAS



QSIIAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILPVSMTKT



SVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDK



NTQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSPIEDL



LFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPL



LTDEMIAQYTSALLAGTITSGWTFGAGPALQIPFPMQMAYRF



NGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTPSALGKLQ



DVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDPPEAEVQ



IDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLG



QSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTT



APAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTDNT



FVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTSPD



VDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYE



QYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCCSC



GSCCKFDEDDSEPVLKGVALAYT





Optimized nucleotide sequence encoding a SARS-CoV-2 S protein mutated to remove a furin cleavage site, to replace residues 817, 892, 899, 942, 986 and 987 with proline, to mutate the ER retrieval signal and containing an extended signal sequence
(SEQ ID NO: 140)



ATGTTCCTGCTGACAACAAAAAGAACCATGTTTGTGTTCCT



GGTGCTGCTGCCTCTGGTGTCCTCACAGTGTGTCAACCTGA



CAACAAGAACTCAGCTGCCACCAGCCTACACCAACTCCTTC



ACCAGAGGCGTGTATTACCCAGACAAGGTGTTTAGAAGCA



GCGTGCTGCACTCTACCCAGGACCTCTTTCTGCCCTTTTTCA



GCAACGTGACATGGTTTCACGCAATTCACGTGTCCGGCACT



AATGGCACAAAGCGGTTCGACAATCCAGTCCTGCCTTTCAA



CGATGGCGTCTACTTTGCATCTACTGAGAAATCCAATATCA



TTAGGGGATGGATCTTCGGCACAACCCTGGATTCTAAGACC



CAGAGCCTGCTGATCGTCAACAACGCCACAAACGTGGTCA



TTAAGGTTTGCGAGTTTCAGTTCTGTAACGATCCTTTTCTGG



GCGTGTATTATCATAAGAACAATAAGAGCTGGATGGAGTC



CGAGTTTAGAGTGTATAGCTCTGCAAATAATTGTACCTTTG



AGTACGTGAGCCAGCCCTTTCTGATGGACCTGGAGGGAAA



ACAAGGAAACTTCAAAAACCTGCGGGAATTCGTTTTCAAA



AACATCGACGGCTATTTCAAGATCTATAGCAAGCATACCCC



AATCAACCTCGTGAGGGACCTCCCCCAGGGCTTTAGCGCAC



TGGAGCCACTGGTTGACCTGCCTATCGGCATTAATATCACA



AGATTTCAGACCCTGCTGGCACTGCATAGAAGCTATCTGAC



CCCTGGAGACTCCTCTAGTGGGTGGACTGCCGGCGCCGCTG



CCTACTATGTGGGCTATCTGCAGCCACGGACATTCCTGCTG



AAATACAATGAGAACGGGACAATCACAGATGCTGTTGATT



GCGCACTCGACCCCCTGTCCGAGACAAAGTGCACTCTCAAG



AGCTTTACCGTCGAGAAGGGCATCTATCAGACCTCAAACTT



CAGGGTGCAGCCCACAGAATCTATCGTGCGCTTCCCTAATA



TCACTAACCTGTGTCCTTTCGGTGAAGTGTTCAACGCCACC



AGGTTTGCTAGCGTGTATGCCTGGAACAGGAAGAGGATCT



CTAACTGCGTCGCCGACTATTCCGTGCTGTATAACAGCGCC



TCCTTCTCCACATTCAAATGCTATGGAGTGAGCCCGACAAA



ACTGAACGATCTCTGCTTTACAAATGTCTACGCCGACTCTT



TTGTGATCAGAGGGGACGAGGTCCGGCAGATCGCACCAGG



ACAGACAGGCAAGATTGCTGACTACAACTATAAGCTGCCT



GACGACTTCACAGGATGTGTGATCGCATGGAACTCAAACA



ATCTGGACTCCAAAGTCGGGGGCAACTATAATTACCTGTAT



CGCCTGTTCCGGAAGTCCAACCTGAAGCCCTTCGAGAGGG



ACATCAGTACAGAGATCTATCAGGCTGGCTCCACCCCTTGC



AATGGCGTCGAAGGCTTTAATTGTTATTTTCCCCTGCAGTCT



TACGGGTTTCAGCCTACTAATGGAGTTGGGTACCAGCCATA



CAGAGTGGTCGTGCTCAGCTTCGAGCTCCTGCATGCTCCAG



CTACAGTTTGCGGGCCAAAGAAGTCCACTAACCTGGTGAA



GAATAAGTGCGTCAACTTCAACTTTAACGGGCTCACCGGCA



CCGGCGTGCTGACTGAGAGCAACAAGAAGTTTCTGCCATTT



CAACAGTTTGGACGGGACATTGCCGACACCACCGATGCCG



TTCGGGATCCACAGACCCTGGAAATTCTGGACATTACACCG



TGCAGCTTCGGGGGCGTGAGCGTGATCACACCCGGAACCA



ATACAAGCAACCAGGTTGCCGTCCTGTATCAGGATGTCAAT



TGCACAGAAGTGCCAGTTGCTATCCACGCAGACCAGCTGA



CTCCCACATGGCGGGTGTATAGCACCGGATCCAACGTGTTT



CAGACCCGCGCCGGATGTCTCATTGGGGCCGAGCACGTGA



ATAACAGCTACGAGTGCGACATCCCCATTGGCGCCGGCATT



TGTGCGTCTTACCAGACTCAGACCAACTCTCCTGGCTCCGC



CTCTTCCGTTGCTAGTCAGTCTATTATTGCCTATACCATGAG



CCTCGGAGCTGAGAATAGCGTGGCCTACTCCAATAATTCCA



TCGCAATCCCTACTAACTTCACTATTTCTGTGACCACCGAG



ATCCTGCCTGTGTCTATGACTAAGACTAGCGTTGATTGTAC



CATGTATATTTGTGGCGACTCTACCGAATGTTCTAACCTGC



TGCTTCAGTACGGCTCATTTTGCACACAGCTGAACAGAGCC



CTGACTGGGATCGCTGTGGAGCAGGACAAGAACACACAGG



AGGTGTTTGCACAGGTGAAGCAGATCTATAAGACCCCTCCT



ATTAAGGATTTCGGCGGATTCAATTTCTCACAGATTCTGCC



AGACCCCAGTAAGCCTTCCAAGAGGAGCCCTATCGAGGAT



CTCCTGTTTAACAAGGTGACCCTGGCAGACGCCGGCTTTAT



TAAGCAATATGGGGATTGCCTGGGCGACATTGCTGCCAGA



GACCTGATTTGCGCCCAGAAATTCAATGGCCTCACAGTGCT



GCCACCTCTGCTGACCGACGAGATGATCGCTCAATACACTA



GCGCACTGCTGGCCGGAACCATCACATCAGGCTGGACCTTC



GGGGCCGGACCAGCACTGCAGATTCCATTCCCTATGCAGAT



GGCCTATAGATTCAACGGCATTGGCGTCACACAGAACGTG



CTGTACGAAAACCAGAAGCTCATCGCTAACCAGTTTAATTC



CGCAATTGGAAAGATCCAAGATTCACTCAGCTCAACCCCCT



CTGCACTCGGAAAGCTGCAGGACGTGGTCAACCAGAATGC



TCAGGCCCTGAACACACTCGTCAAGCAGCTGTCCTCTAACT



TTGGCGCTATCAGCTCCGTTCTGAACGACATTCTGAGCCGC



CTGGATCCCCCAGAGGCTGAAGTCCAGATTGACCGCCTGAT



TACCGGCCGGCTGCAGTCTCTGCAAACATACGTGACCCAGC



AGCTGATCAGAGCAGCCGAGATCCGGGCATCCGCAAATCT



GGCAGCAACTAAGATGAGCGAATGCGTGCTGGGCCAGTCC



AAGCGGGTGGACTTTTGTGGCAAGGGCTACCACCTGATGA



GCTTCCCCCAGAGCGCCCCACATGGCGTTGTTTTTCTGCAC



GTGACCTATGTCCCTGCTCAGGAAAAGAACTTTACAACTGC



TCCTGCTATCTGCCATGACGGCAAGGCCCACTTCCCACGGG



AGGGAGTGTTTGTGTCCAATGGCACACACTGGTTCGTGACC



CAGAGGAACTTCTATGAACCCCAGATCATCACCACTGACA



ATACCTTCGTGTCTGGAAATTGCGACGTCGTGATCGGCATC



GTTAACAACACCGTGTACGACCCTCTCCAGCCAGAGCTGGA



CTCCTTTAAGGAGGAACTGGATAAGTATTTTAAGAACCACA



CAAGCCCAGATGTGGATCTCGGGGACATCTCCGGAATTAA



CGCCTCCGTGGTGAATATCCAGAAGGAGATTGACCGCCTA



AATGAAGTTGCCAAGAACCTCAATGAGTCTCTGATTGATCT



GCAGGAACTGGGCAAGTATGAGCAGTATATCAAATGGCCC



TGGTACATTTGGCTGGGGTTTATCGCCGGACTGATTGCCAT



CGTCATGGTGACCATCATGCTGTGTTGCATGACCTCCTGTT



GTTCCTGTCTGAAGGGCTGCTGTAGTTGCGGCTCTTGCTGT



AAATTCGACGAAGATGATAGCGAGCCCGTGCTGAAGGGCG



TGGCCCTGGCTTATACCTGA





SARS-CoV-2 S protein mutated to remove a furin cleavage site, to replace residues 817, 892, 899, 942, 986 and 987 with proline, to mutate the ER retrieval signal and containing an extended signal sequence
(SEQ ID NO: 141)



MFLLTTKRTMFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTR



GVYYPDKVFRSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTK



RFDNPVLPFNDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVN



NATNVVIKVCEFQFCNDPFLGVYYHKNNKSWMESEFRVYSS



ANNCTFEYVSQPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYS



KHTPINLVRDLPQGFSALEPLVDLPIGINITRFQTLLALHRSYLT



PGDSSSGWTAGAAAYYVGYLQPRTFLLKYNENGTITDAVDC



ALDPLSETKCTLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLC



PFGEVFNATRFASVYAWNRKRISNCVADYSVLYNSASFSTFK



CYGVSPTKLNDLCFTNVYADSFVIRGDEVRQIAPGQTGKIADY



NYKLPDDFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLK



PFERDISTEIYQAGSTPCNGVEGFNCYFPLQSYGFQPTNGVGY



QPYRVVVLSFELLHAPATVCGPKKSTNLVKNKCVNFNFNGLT



GTGVLTESNKKFLPFQQFGRDIADTTDAVRDPQTLEILDITPCS



FGGVSVITPGTNTSNQVAVLYQDVNCTEVPVAIHADQLTPTW



RVYSTGSNVFQTRAGCLIGAEHVNNSYECDIPIGAGICASYQT



QTNSPGSASSVASQSIIAYTMSLGAENSVAYSNNSIAIPTNFTIS



VTTEILPVSMTKTSVDCTMYICGDSTECSNLLLQYGSFCTQLN



RALTGIAVEQDKNTQEVFAQVKQIYKTPPIKDFGGFNFSQILPD



PSKPSKRSPIEDLLFNKVTLADAGFIKQYGDCLGDIAARDLICA



QKFNGLTVLPPLLTDEMIAQYTSALLAGTITSGWTFGAGPALQ



IPFPMQMAYRFNGIGVTQNVLYENQKLIANQFNSAIGKIQDSL



SSTPSALGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDIL



SRLDPPEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASANLA



ATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVT



YVPAQEKNFTTAPAICHDGKAHFPREGVFVSNGTHWFVTQRN



FYEPQIITTDNTFVSGNCDVVIGIVNNTVYDPLQPELDSFKEEL



DKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEVAKNLNE



SLIDLQELGKYEQYIKWPWYIWLGFIAGLIAIVMVTIMLCCMT



SCCSCLKGCCSCGSCCKFDEDDSEPVLKGVALAYT





Optimized nucleotide sequence encoding a SARS-CoV-2 S protein mutated to contain the H69-, V70-, Y144-, N501Y, A570D, D614G, P681H, T716I, S982A and D1118H mutations
(SEQ ID NO: 150)



ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG



TGTGTCAACCTGACAACTAGGACTCAGCTGCCACCAGCCTA



CACCAACTCCTTCACCAGAGGCGTGTATTACCCAGACAAGG



TGTTTAGAAGCAGCGTGCTGCACTCTACCCAGGACCTCTTT



CTGCCCTTTTTCAGCAACGTGACATGGTTTCACGCAATTTCC



GGCACTAATGGCACAAAGCGGTTCGACAATCCAGTCCTGC



CTTTCAACGATGGCGTCTACTTTGCATCTACTGAGAAATCC



AATATCATTAGGGGATGGATCTTCGGCACAACCCTGGATTC



TAAGACCCAGAGCCTGCTGATCGTCAACAACGCCACAAAC



GTGGTCATTAAGGTTTGCGAGTTTCAGTTCTGTAACGATCC



TTTTCTGGGCGTTTACCATAAGAACAATAAGAGCTGGATGG



AGTCCGAGTTTAGAGTGTATAGCTCTGCAAATAATTGTACC



TTTGAGTACGTGAGCCAGCCCTTTCTGATGGACCTGGAGGG



AAAACAAGGAAACTTCAAAAACCTGCGGGAATTCGTTTTC



AAAAACATCGACGGCTATTTCAAGATCTATAGCAAGCATA



CCCCAATCAACCTCGTGAGGGACCTCCCCCAGGGCTTTAGC



GCACTGGAGCCACTGGTTGACCTGCCTATCGGCATTAATAT



CACAAGATTTCAGACCCTGCTGGCACTGCATAGAAGCTATC



TGACCCCTGGAGACTCCTCTAGTGGGTGGACTGCCGGCGCC



GCTGCCTACTATGTGGGCTATCTGCAGCCACGGACATTCCT



GCTGAAATACAATGAGAACGGGACAATCACAGATGCTGTT



GATTGCGCACTCGACCCCCTGTCCGAGACAAAGTGCACTCT



CAAGAGCTTTACCGTCGAGAAGGGCATCTATCAGACCTCA



AACTTCAGGGTGCAGCCCACAGAATCTATCGTGCGCTTCCC



TAATATCACTAACCTGTGTCCTTTCGGTGAAGTGTTCAACG



CCACCAGGTTTGCTAGCGTGTATGCCTGGAACAGGAAGAG



GATCTCTAACTGCGTCGCCGACTATTCCGTGCTGTATAACA



GCGCCTCCTTCTCCACATTCAAATGCTATGGAGTGAGCCCG



ACAAAACTGAACGATCTCTGCTTTACAAATGTCTACGCCGA



CTCTTTTGTGATCAGAGGGGACGAGGTCCGGCAGATCGCAC



CAGGACAGACAGGCAAGATTGCTGACTACAACTATAAGCT



GCCTGACGACTTCACAGGATGTGTGATCGCATGGAACTCAA



ACAATCTGGACTCCAAAGTCGGGGGCAACTATAATTACCTG



TATCGCCTGTTCCGGAAGTCCAACCTGAAGCCCTTCGAGAG



GGACATCAGTACAGAGATCTATCAGGCTGGCTCCACCCCTT



GCAATGGCGTCGAAGGCTTTAATTGTTATTTTCCCCTGCAG



TCTTACGGGTTTCAGCCTACTTATGGAGTTGGGTACCAGCC



ATACAGAGTGGTCGTGCTCAGCTTCGAGCTCCTGCATGCTC



CAGCTACAGTTTGCGGGCCAAAGAAGTCCACTAACCTGGT



GAAGAATAAGTGCGTCAACTTCAACTTTAACGGGCTCACCG



GCACCGGCGTGCTGACTGAGAGCAACAAGAAGTTTCTGCC



ATTTCAACAGTTTGGACGGGACATTGATGACACCACCGAT



GCCGTTCGGGATCCACAGACCCTGGAAATTCTGGACATTAC



ACCGTGCAGCTTCGGGGGCGTGAGCGTGATCACACCCGGA



ACCAATACAAGCAACCAGGTTGCCGTCCTGTATCAGGGAG



TCAATTGCACAGAAGTGCCAGTTGCTATCCACGCAGACCAG



CTGACTCCCACATGGCGGGTGTATAGCACCGGATCCAACGT



GTTTCAGACCCGCGCCGGATGTCTCATTGGGGCCGAGCACG



TGAATAACAGCTACGAGTGCGACATCCCCATTGGCGCCGG



CATTTGTGCGTCTTACCAGACTCAGACCAACTCTCATAGAA



GGGCCAGGTCCGTTGCTAGTCAGTCTATTATTGCCTATACC



ATGAGCCTCGGAGCTGAGAATAGCGTGGCCTACTCCAATA



ATTCCATCGCAATCCCTATAAACTTCACTATTTCTGTGACC



ACCGAGATCCTGCCTGTGTCTATGACTAAGACTAGCGTTGA



TTGTACCATGTATATTTGTGGCGACTCTACCGAATGTTCTA



ACCTGCTGCTTCAGTACGGCTCATTTTGCACACAGCTGAAC



AGAGCCCTGACTGGGATCGCTGTGGAGCAGGACAAGAACA



CACAGGAGGTGTTTGCACAGGTGAAGCAGATCTATAAGAC



CCCTCCTATTAAGGATTTCGGCGGATTCAATTTCTCACAGA



TTCTGCCAGACCCCAGTAAGCCTTCCAAGAGGAGCTTCATC



GAGGATCTCCTGTTTAACAAGGTGACCCTGGCAGACGCCG



GCTTTATTAAGCAATATGGGGATTGCCTGGGCGACATTGCT



GCCAGAGACCTGATTTGCGCCCAGAAATTCAATGGCCTCAC



AGTGCTGCCACCTCTGCTGACCGACGAGATGATCGCTCAAT



ACACTAGCGCACTGCTGGCCGGAACCATCACATCAGGCTG



GACCTTCGGGGCCGGAGCAGCACTGCAGATTCCATTCGCCA



TGCAGATGGCCTATAGATTCAACGGCATTGGCGTCACACAG



AACGTGCTGTACGAAAACCAGAAGCTCATCGCTAACCAGT



TTAATTCCGCAATTGGAAAGATCCAAGATTCACTCAGCTCA



ACCGCCTCTGCACTCGGAAAGCTGCAGGACGTGGTCAACC



AGAATGCTCAGGCCCTGAACACACTCGTCAAGCAGCTGTCC



TCTAACTTTGGCGCTATCAGCTCCGTTCTGAACGACATTCT



GGCACGCCTGGATAAGGTGGAGGCTGAAGTCCAGATTGAC



CGCCTGATTACCGGCCGGCTGCAGTCTCTGCAAACATACGT



GACCCAGCAGCTGATCAGAGCAGCCGAGATCCGGGCATCC



GCAAATCTGGCAGCAACTAAGATGAGCGAATGCGTGCTGG



GCCAGTCCAAGCGGGTGGACTTTTGTGGCAAGGGCTACCA



CCTGATGAGCTTCCCCCAGAGCGCCCCACATGGCGTTGTTT



TTCTGCACGTGACCTATGTCCCTGCTCAGGAAAAGAACTTT



ACAACTGCTCCTGCTATCTGCCATGACGGCAAGGCCCACTT



CCCACGGGAGGGAGTGTTTGTGTCCAATGGCACACACTGGT



TCGTGACCCAGAGGAACTTCTATGAACCCCAGATCATCACC



ACTCACAATACCTTCGTGTCTGGAAATTGCGACGTCGTGAT



CGGCATCGTTAACAACACCGTGTACGACCCTCTCCAGCCAG



AGCTGGACTCCTTTAAGGAGGAACTGGATAAGTATTTTAAG



AACCACACAAGCCCAGATGTGGATCTCGGGGACATCTCCG



GAATTAACGCCTCCGTGGTGAATATCCAGAAGGAGATTGA



CCGCCTAAATGAAGTTGCCAAGAACCTCAATGAGTCTCTGA



TTGATCTGCAGGAACTGGGCAAGTATGAGCAGTATATCAA



ATGGCCCTGGTACATTTGGCTGGGGTTTATCGCCGGACTGA



TTGCCATCGTCATGGTGACCATCATGCTGTGTTGCATGACC



TCCTGTTGTTCCTGTCTGAAGGGCTGCTGTAGTTGCGGCTCT



TGCTGTAAATTCGACGAAGATGATAGCGAGCCCGTGCTGA



AGGGCGTGAAGCTGCATTATACCTGA





SARS-CoV-2 S protein mutated to contain the H69-, V70-, Y144-, N501Y, A570D, D614G, P681H, T716I, S982A and D1118H mutations
(SEQ ID NO: 151)



MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVF



RSSVLHSTQDLFLPFFSNVTWFHAISGTNGTKRFDNPVLPFND



GVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKVCE



FQFCNDPFLGVYHKNNKSWMESEFRVYSSANNCTFEYVSQPF



LMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRDLPQ



GFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWTAGA



AAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKCTLK



SFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATRFAS



VYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLC



FTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTGCVI



AWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAG



STPCNGVEGFNCYFPLQSYGFQPTYGVGYQPYRVVVLSFELL



HAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKKFL



PFQQFGRDIDDTTDAVRDPQTLEILDITPCSFGGVSVITPGTNTS



NQVAVLYQGVNCTEVPVAIHADQLTPTWRVYSTGSNVFQTR



AGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSHRRARSVAS



QSIIAYTMSLGAENSVAYSNNSIAIPINFTISVTTEILPVSMTKTS



VDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDKN



TQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSFIEDLL



FNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPLL



TDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQMAYRFN



GIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQD



VVNQNAQALNTLVKQLSSNFGAISSVLNDILARLDKVEAEVQI



DRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLGQ



SKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTTA



PAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTHNTF



VSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTSPDV



DLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYEQ



YIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCCSCG



SCCKFDEDDSEPVLKGVKLHYT*





Optimized nucleotide sequence encoding a SARS-CoV-2 S protein with residues 986 and 987 mutated to proline and which contains the H69-, V70-, Y144-, N501Y, A570D, D614G, P681H, T716I, S982A and D1118H mutations
(SEQ ID NO: 152)



ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG



TGTGTCAACCTGACAACTAGGACTCAGCTGCCACCAGCCTA



CACCAACTCCTTCACCAGAGGCGTGTATTACCCAGACAAGG



TGTTTAGAAGCAGCGTGCTGCACTCTACCCAGGACCTCTTT



CTGCCCTTTTTCAGCAACGTGACATGGTTTCACGCAATTTCC



GGCACTAATGGCACAAAGCGGTTCGACAATCCAGTCCTGC



CTTTCAACGATGGCGTCTACTTTGCATCTACTGAGAAATCC



AATATCATTAGGGGATGGATCTTCGGCACAACCCTGGATTC



TAAGACCCAGAGCCTGCTGATCGTCAACAACGCCACAAAC



GTGGTCATTAAGGTTTGCGAGTTTCAGTTCTGTAACGATCC



TTTTCTGGGCGTTTACCATAAGAACAATAAGAGCTGGATGG



AGTCCGAGTTTAGAGTGTATAGCTCTGCAAATAATTGTACC



TTTGAGTACGTGAGCCAGCCCTTTCTGATGGACCTGGAGGG



AAAACAAGGAAACTTCAAAAACCTGCGGGAATTCGTTTTC



AAAAACATCGACGGCTATTTCAAGATCTATAGCAAGCATA



CCCCAATCAACCTCGTGAGGGACCTCCCCCAGGGCTTTAGC



GCACTGGAGCCACTGGTTGACCTGCCTATCGGCATTAATAT



CACAAGATTTCAGACCCTGCTGGCACTGCATAGAAGCTATC



TGACCCCTGGAGACTCCTCTAGTGGGTGGACTGCCGGCGCC



GCTGCCTACTATGTGGGCTATCTGCAGCCACGGACATTCCT



GCTGAAATACAATGAGAACGGGACAATCACAGATGCTGTT



GATTGCGCACTCGACCCCCTGTCCGAGACAAAGTGCACTCT



CAAGAGCTTTACCGTCGAGAAGGGCATCTATCAGACCTCA



AACTTCAGGGTGCAGCCCACAGAATCTATCGTGCGCTTCCC



TAATATCACTAACCTGTGTCCTTTCGGTGAAGTGTTCAACG



CCACCAGGTTTGCTAGCGTGTATGCCTGGAACAGGAAGAG



GATCTCTAACTGCGTCGCCGACTATTCCGTGCTGTATAACA



GCGCCTCCTTCTCCACATTCAAATGCTATGGAGTGAGCCCG



ACAAAACTGAACGATCTCTGCTTTACAAATGTCTACGCCGA



CTCTTTTGTGATCAGAGGGGACGAGGTCCGGCAGATCGCAC



CAGGACAGACAGGCAAGATTGCTGACTACAACTATAAGCT



GCCTGACGACTTCACAGGATGTGTGATCGCATGGAACTCAA



ACAATCTGGACTCCAAAGTCGGGGGCAACTATAATTACCTG



TATCGCCTGTTCCGGAAGTCCAACCTGAAGCCCTTCGAGAG



GGACATCAGTACAGAGATCTATCAGGCTGGCTCCACCCCTT



GCAATGGCGTCGAAGGCTTTAATTGTTATTTTCCCCTGCAG



TCTTACGGGTTTCAGCCTACTTATGGAGTTGGGTACCAGCC



ATACAGAGTGGTCGTGCTCAGCTTCGAGCTCCTGCATGCTC



CAGCTACAGTTTGCGGGCCAAAGAAGTCCACTAACCTGGT



GAAGAATAAGTGCGTCAACTTCAACTTTAACGGGCTCACCG



GCACCGGCGTGCTGACTGAGAGCAACAAGAAGTTTCTGCC



ATTTCAACAGTTTGGACGGGACATTGATGACACCACCGAT



GCCGTTCGGGATCCACAGACCCTGGAAATTCTGGACATTAC



ACCGTGCAGCTTCGGGGGCGTGAGCGTGATCACACCCGGA



ACCAATACAAGCAACCAGGTTGCCGTCCTGTATCAGGGAG



TCAATTGCACAGAAGTGCCAGTTGCTATCCACGCAGACCAG



CTGACTCCCACATGGCGGGTGTATAGCACCGGATCCAACGT



GTTTCAGACCCGCGCCGGATGTCTCATTGGGGCCGAGCACG



TGAATAACAGCTACGAGTGCGACATCCCCATTGGCGCCGG



CATTTGTGCGTCTTACCAGACTCAGACCAACTCTCATAGAA



GGGCCAGGTCCGTTGCTAGTCAGTCTATTATTGCCTATACC



ATGAGCCTCGGAGCTGAGAATAGCGTGGCCTACTCCAATA



ATTCCATCGCAATCCCTATAAACTTCACTATTTCTGTGACC



ACCGAGATCCTGCCTGTGTCTATGACTAAGACTAGCGTTGA



TTGTACCATGTATATTTGTGGCGACTCTACCGAATGTTCTA



ACCTGCTGCTTCAGTACGGCTCATTTTGCACACAGCTGAAC



AGAGCCCTGACTGGGATCGCTGTGGAGCAGGACAAGAACA



CACAGGAGGTGTTTGCACAGGTGAAGCAGATCTATAAGAC



CCCTCCTATTAAGGATTTCGGCGGATTCAATTTCTCACAGA



TTCTGCCAGACCCCAGTAAGCCTTCCAAGAGGAGCTTCATC



GAGGATCTCCTGTTTAACAAGGTGACCCTGGCAGACGCCG



GCTTTATTAAGCAATATGGGGATTGCCTGGGCGACATTGCT



GCCAGAGACCTGATTTGCGCCCAGAAATTCAATGGCCTCAC



AGTGCTGCCACCTCTGCTGACCGACGAGATGATCGCTCAAT



ACACTAGCGCACTGCTGGCCGGAACCATCACATCAGGCTG



GACCTTCGGGGCCGGAGCAGCACTGCAGATTCCATTCGCCA



TGCAGATGGCCTATAGATTCAACGGCATTGGCGTCACACAG



AACGTGCTGTACGAAAACCAGAAGCTCATCGCTAACCAGT



TTAATTCCGCAATTGGAAAGATCCAAGATTCACTCAGCTCA



ACCGCCTCTGCACTCGGAAAGCTGCAGGACGTGGTCAACC



AGAATGCTCAGGCCCTGAACACACTCGTCAAGCAGCTGTCC



TCTAACTTTGGCGCTATCAGCTCCGTTCTGAACGACATTCT



GGCACGCCTGGATCCCCCAGAGGCTGAAGTCCAGATTGAC



CGCCTGATTACCGGCCGGCTGCAGTCTCTGCAAACATACGT



GACCCAGCAGCTGATCAGAGCAGCCGAGATCCGGGCATCC



GCAAATCTGGCAGCAACTAAGATGAGCGAATGCGTGCTGG



GCCAGTCCAAGCGGGTGGACTTTTGTGGCAAGGGCTACCA



CCTGATGAGCTTCCCCCAGAGCGCCCCACATGGCGTTGTTT



TTCTGCACGTGACCTATGTCCCTGCTCAGGAAAAGAACTTT



ACAACTGCTCCTGCTATCTGCCATGACGGCAAGGCCCACTT



CCCACGGGAGGGAGTGTTTGTGTCCAATGGCACACACTGGT



TCGTGACCCAGAGGAACTTCTATGAACCCCAGATCATCACC



ACTCACAATACCTTCGTGTCTGGAAATTGCGACGTCGTGAT



CGGCATCGTTAACAACACCGTGTACGACCCTCTCCAGCCAG



AGCTGGACTCCTTTAAGGAGGAACTGGATAAGTATTTTAAG



AACCACACAAGCCCAGATGTGGATCTCGGGGACATCTCCG



GAATTAACGCCTCCGTGGTGAATATCCAGAAGGAGATTGA



CCGCCTAAATGAAGTTGCCAAGAACCTCAATGAGTCTCTGA



TTGATCTGCAGGAACTGGGCAAGTATGAGCAGTATATCAA



ATGGCCCTGGTACATTTGGCTGGGGTTTATCGCCGGACTGA



TTGCCATCGTCATGGTGACCATCATGCTGTGTTGCATGACC



TCCTGTTGTTCCTGTCTGAAGGGCTGCTGTAGTTGCGGCTCT



TGCTGTAAATTCGACGAAGATGATAGCGAGCCCGTGCTGA



AGGGCGTGAAGCTGCATTATACCTGA





SARS-CoV-2 S protein with residues 986 and 987 mutated to proline and which contains the H69-, V70-, Y144-, N501Y, A570D, D614G, P681H, T716I, S982A and D1118H mutations
(SEQ ID NO: 153)



MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVF



RSSVLHSTQDLFLPFFSNVTWFHAISGTNGTKRFDNPVLPFND



GVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKVCE



FQFCNDPFLGVYHKNNKSWMESEFRVYSSANNCTFEYVSQPF



LMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRDLPQ



GFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWTAGA



AAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKCTLK



SFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATRFAS



VYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLC



FTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTGCVI



AWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAG



STPCNGVEGFNCYFPLQSYGFQPTYGVGYQPYRVVVLSFELL



HAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKKFL



PFQQFGRDIDDTTDAVRDPQTLEILDITPCSFGGVSVITPGTNTS



NQVAVLYQGVNCTEVPVAIHADQLTPTWRVYSTGSNVFQTR



AGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSHRRARSVAS



QSIIAYTMSLGAENSVAYSNNSIAIPINFTISVTTEILPVSMTKTS



VDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDKN



TQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSFIEDLL



FNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPLL



TDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQMAYRFN



GIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQD



VVNQNAQALNTLVKQLSSNFGAISSVLNDILARLDPPEAEVQI



DRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLGQ



SKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTTA



PAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTHNTF



VSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTSPDV



DLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYEQ



YIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCCSCG



SCCKFDEDDSEPVLKGVKLHYT





Optimized nucleotide sequence encoding a SARS-CoV-2 S protein mutated to remove the furin cleavage site required for activation and which contains the H69-, V70-, Y144-, N501Y, A570D, D614G, P681H, T716I, S982A and D1118H mutations
(SEQ ID NO: 154)



ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG



TGTGTCAACCTGACAACTAGGACTCAGCTGCCACCAGCCTA



CACCAACTCCTTCACCAGAGGCGTGTATTACCCAGACAAGG



TGTTTAGAAGCAGCGTGCTGCACTCTACCCAGGACCTCTTT



CTGCCCTTTTTCAGCAACGTGACATGGTTTCACGCAATTTCC



GGCACTAATGGCACAAAGCGGTTCGACAATCCAGTCCTGC



CTTTCAACGATGGCGTCTACTTTGCATCTACTGAGAAATCC



AATATCATTAGGGGATGGATCTTCGGCACAACCCTGGATTC



TAAGACCCAGAGCCTGCTGATCGTCAACAACGCCACAAAC



GTGGTCATTAAGGTTTGCGAGTTTCAGTTCTGTAACGATCC



TTTTCTGGGCGTTTACCATAAGAACAATAAGAGCTGGATGG



AGTCCGAGTTTAGAGTGTATAGCTCTGCAAATAATTGTACC



TTTGAGTACGTGAGCCAGCCCTTTCTGATGGACCTGGAGGG



AAAACAAGGAAACTTCAAAAACCTGCGGGAATTCGTTTTC



AAAAACATCGACGGCTATTTCAAGATCTATAGCAAGCATA



CCCCAATCAACCTCGTGAGGGACCTCCCCCAGGGCTTTAGC



GCACTGGAGCCACTGGTTGACCTGCCTATCGGCATTAATAT



CACAAGATTTCAGACCCTGCTGGCACTGCATAGAAGCTATC



TGACCCCTGGAGACTCCTCTAGTGGGTGGACTGCCGGCGCC



GCTGCCTACTATGTGGGCTATCTGCAGCCACGGACATTCCT



GCTGAAATACAATGAGAACGGGACAATCACAGATGCTGTT



GATTGCGCACTCGACCCCCTGTCCGAGACAAAGTGCACTCT



CAAGAGCTTTACCGTCGAGAAGGGCATCTATCAGACCTCA



AACTTCAGGGTGCAGCCCACAGAATCTATCGTGCGCTTCCC



TAATATCACTAACCTGTGTCCTTTCGGTGAAGTGTTCAACG



CCACCAGGTTTGCTAGCGTGTATGCCTGGAACAGGAAGAG



GATCTCTAACTGCGTCGCCGACTATTCCGTGCTGTATAACA



GCGCCTCCTTCTCCACATTCAAATGCTATGGAGTGAGCCCG



ACAAAACTGAACGATCTCTGCTTTACAAATGTCTACGCCGA



CTCTTTTGTGATCAGAGGGGACGAGGTCCGGCAGATCGCAC



CAGGACAGACAGGCAAGATTGCTGACTACAACTATAAGCT



GCCTGACGACTTCACAGGATGTGTGATCGCATGGAACTCAA



ACAATCTGGACTCCAAAGTCGGGGGCAACTATAATTACCTG



TATCGCCTGTTCCGGAAGTCCAACCTGAAGCCCTTCGAGAG



GGACATCAGTACAGAGATCTATCAGGCTGGCTCCACCCCTT



GCAATGGCGTCGAAGGCTTTAATTGTTATTTTCCCCTGCAG



TCTTACGGGTTTCAGCCTACTTATGGAGTTGGGTACCAGCC



ATACAGAGTGGTCGTGCTCAGCTTCGAGCTCCTGCATGCTC



CAGCTACAGTTTGCGGGCCAAAGAAGTCCACTAACCTGGT



GAAGAATAAGTGCGTCAACTTCAACTTTAACGGGCTCACCG



GCACCGGCGTGCTGACTGAGAGCAACAAGAAGTTTCTGCC



ATTTCAACAGTTTGGACGGGACATTGATGACACCACCGAT



GCCGTTCGGGATCCACAGACCCTGGAAATTCTGGACATTAC



ACCGTGCAGCTTCGGGGGCGTGAGCGTGATCACACCCGGA



ACCAATACAAGCAACCAGGTTGCCGTCCTGTATCAGGGAG



TCAATTGCACAGAAGTGCCAGTTGCTATCCACGCAGACCAG



CTGACTCCCACATGGCGGGTGTATAGCACCGGATCCAACGT



GTTTCAGACCCGCGCCGGATGTCTCATTGGGGCCGAGCACG



TGAATAACAGCTACGAGTGCGACATCCCCATTGGCGCCGG



CATTTGTGCGTCTTACCAGACTCAGACCAACTCTCATGGCT



CCGCCTCTTCCGTTGCTAGTCAGTCTATTATTGCCTATACCA



TGAGCCTCGGAGCTGAGAATAGCGTGGCCTACTCCAATAAT



TCCATCGCAATCCCTATAAACTTCACTATTTCTGTGACCAC



CGAGATCCTGCCTGTGTCTATGACTAAGACTAGCGTTGATT



GTACCATGTATATTTGTGGCGACTCTACCGAATGTTCTAAC



CTGCTGCTTCAGTACGGCTCATTTTGCACACAGCTGAACAG



AGCCCTGACTGGGATCGCTGTGGAGCAGGACAAGAACACA



CAGGAGGTGTTTGCACAGGTGAAGCAGATCTATAAGACCC



CTCCTATTAAGGATTTCGGCGGATTCAATTTCTCACAGATT



CTGCCAGACCCCAGTAAGCCTTCCAAGAGGAGCTTCATCGA



GGATCTCCTGTTTAACAAGGTGACCCTGGCAGACGCCGGCT



TTATTAAGCAATATGGGGATTGCCTGGGCGACATTGCTGCC



AGAGACCTGATTTGCGCCCAGAAATTCAATGGCCTCACAGT



GCTGCCACCTCTGCTGACCGACGAGATGATCGCTCAATACA



CTAGCGCACTGCTGGCCGGAACCATCACATCAGGCTGGAC



CTTCGGGGCCGGAGCAGCACTGCAGATTCCATTCGCCATGC



AGATGGCCTATAGATTCAACGGCATTGGCGTCACACAGAA



CGTGCTGTACGAAAACCAGAAGCTCATCGCTAACCAGTTTA



ATTCCGCAATTGGAAAGATCCAAGATTCACTCAGCTCAACC



GCCTCTGCACTCGGAAAGCTGCAGGACGTGGTCAACCAGA



ATGCTCAGGCCCTGAACACACTCGTCAAGCAGCTGTCCTCT



AACTTTGGCGCTATCAGCTCCGTTCTGAACGACATTCTGGC




ACGCCTGGATAAGGTGGAGGCTGAAGTCCAGATTGACCGC




CTGATTACCGGCCGGCTGCAGTCTCTGCAAACATACGTGAC



CCAGCAGCTGATCAGAGCAGCCGAGATCCGGGCATCCGCA



AATCTGGCAGCAACTAAGATGAGCGAATGCGTGCTGGGCC



AGTCCAAGCGGGTGGACTTTTGTGGCAAGGGCTACCACCTG



ATGAGCTTCCCCCAGAGCGCCCCACATGGCGTTGTTTTTCT



GCACGTGACCTATGTCCCTGCTCAGGAAAAGAACTTTACAA



CTGCTCCTGCTATCTGCCATGACGGCAAGGCCCACTTCCCA



CGGGAGGGAGTGTTTGTGTCCAATGGCACACACTGGTTCGT



GACCCAGAGGAACTTCTATGAACCCCAGATCATCACCACTC




ACAATACCTTCGTGTCTGGAAATTGCGACGTCGTGATCGGC




ATCGTTAACAACACCGTGTACGACCCTCTCCAGCCAGAGCT



GGACTCCTTTAAGGAGGAACTGGATAAGTATTTTAAGAACC



ACACAAGCCCAGATGTGGATCTCGGGGACATCTCCGGAAT



TAACGCCTCCGTGGTGAATATCCAGAAGGAGATTGACCGC



CTAAATGAAGTTGCCAAGAACCTCAATGAGTCTCTGATTGA



TCTGCAGGAACTGGGCAAGTATGAGCAGTATATCAAATGG



CCCTGGTACATTTGGCTGGGGTTTATCGCCGGACTGATTGC



CATCGTCATGGTGACCATCATGCTGTGTTGCATGACCTCCT



GTTGTTCCTGTCTGAAGGGCTGCTGTAGTTGCGGCTCTTGCT



GTAAATTCGACGAAGATGATAGCGAGCCCGTGCTGAAGGG



CGTGAAGCTGCATTATACCTGA





SARS-CoV-2 S protein mutated to remove the furin cleavage site required for activation and which contains the H69-, V70-, Y144-, N501Y, A570D, D614G, P681H, T716I, S982A and D1118H mutations
(SEQ ID NO: 155)



MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVF



RSSVLHSTQDLFLPFFSNVTWFHAISGTNGTKRFDNPVLPFND



GVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKVCE



FQFCNDPFLGVYHKNNKSWMESEFRVYSSANNCTFEYVSQPF



LMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRDLPQ



GFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWTAGA



AAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKCTLK



SFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATRFAS



VYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLC



FTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTGCVI



AWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAG



STPCNGVEGFNCYFPLQSYGFQPTYGVGYQPYRVVVLSFELL



HAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKKFL



PFQQFGRDIDDTTDAVRDPQTLEILDITPCSFGGVSVITPGTNTS



NQVAVLYQGVNCTEVPVAIHADQLTPTWRVYSTGSNVFQTR



AGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSHGSASSVAS



QSIIAYTMSLGAENSVAYSNNSIAIPINFTISVTTEILPVSMTKTS



VDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDKN



TQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSFIEDLL



FNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPLL



TDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQMAYRFN



GIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQD



VVNQNAQALNTLVKQLSSNFGAISSVLNDILARLDKVEAEVQI



DRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLGQ



SKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTTA



PAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTHNTF



VSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTSPDV



DLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYEQ



YIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCCSCG



SCCKFDEDDSEPVLKGVKLHYT





Optimized nucleotide sequence encoding a SARS-CoV-2 S protein mutated to remove a furin cleavage site and to replace residues 986 and 987 with proline and which contains the H69-, V70-, Y144-, N501Y, A570D, D614G, P681H, T716I, S982A and D1118H mutations
(SEQ ID NO: 156)



ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG



TGTGTCAACCTGACAACTAGGACTCAGCTGCCACCAGCCTA



CACCAACTCCTTCACCAGAGGCGTGTATTACCCAGACAAGG



TGTTTAGAAGCAGCGTGCTGCACTCTACCCAGGACCTCTTT



CTGCCCTTTTTCAGCAACGTGACATGGTTTCACGCAATTTCC



GGCACTAATGGCACAAAGCGGTTCGACAATCCAGTCCTGC



CTTTCAACGATGGCGTCTACTTTGCATCTACTGAGAAATCC



AATATCATTAGGGGATGGATCTTCGGCACAACCCTGGATTC



TAAGACCCAGAGCCTGCTGATCGTCAACAACGCCACAAAC



GTGGTCATTAAGGTTTGCGAGTTTCAGTTCTGTAACGATCC



TTTTCTGGGCGTTTACCATAAGAACAATAAGAGCTGGATGG



AGTCCGAGTTTAGAGTGTATAGCTCTGCAAATAATTGTACC



TTTGAGTACGTGAGCCAGCCCTTTCTGATGGACCTGGAGGG



AAAACAAGGAAACTTCAAAAACCTGCGGGAATTCGTTTTC



AAAAACATCGACGGCTATTTCAAGATCTATAGCAAGCATA



CCCCAATCAACCTCGTGAGGGACCTCCCCCAGGGCTTTAGC



GCACTGGAGCCACTGGTTGACCTGCCTATCGGCATTAATAT



CACAAGATTTCAGACCCTGCTGGCACTGCATAGAAGCTATC



TGACCCCTGGAGACTCCTCTAGTGGGTGGACTGCCGGCGCC



GCTGCCTACTATGTGGGCTATCTGCAGCCACGGACATTCCT



GCTGAAATACAATGAGAACGGGACAATCACAGATGCTGTT



GATTGCGCACTCGACCCCCTGTCCGAGACAAAGTGCACTCT



CAAGAGCTTTACCGTCGAGAAGGGCATCTATCAGACCTCA



AACTTCAGGGTGCAGCCCACAGAATCTATCGTGCGCTTCCC



TAATATCACTAACCTGTGTCCTTTCGGTGAAGTGTTCAACG



CCACCAGGTTTGCTAGCGTGTATGCCTGGAACAGGAAGAG



GATCTCTAACTGCGTCGCCGACTATTCCGTGCTGTATAACA



GCGCCTCCTTCTCCACATTCAAATGCTATGGAGTGAGCCCG



ACAAAACTGAACGATCTCTGCTTTACAAATGTCTACGCCGA



CTCTTTTGTGATCAGAGGGGACGAGGTCCGGCAGATCGCAC



CAGGACAGACAGGCAAGATTGCTGACTACAACTATAAGCT



GCCTGACGACTTCACAGGATGTGTGATCGCATGGAACTCAA



ACAATCTGGACTCCAAAGTCGGGGGCAACTATAATTACCTG



TATCGCCTGTTCCGGAAGTCCAACCTGAAGCCCTTCGAGAG



GGACATCAGTACAGAGATCTATCAGGCTGGCTCCACCCCTT



GCAATGGCGTCGAAGGCTTTAATTGTTATTTTCCCCTGCAG



TCTTACGGGTTTCAGCCTACTTATGGAGTTGGGTACCAGCC



ATACAGAGTGGTCGTGCTCAGCTTCGAGCTCCTGCATGCTC



CAGCTACAGTTTGCGGGCCAAAGAAGTCCACTAACCTGGT



GAAGAATAAGTGCGTCAACTTCAACTTTAACGGGCTCACCG



GCACCGGCGTGCTGACTGAGAGCAACAAGAAGTTTCTGCC



ATTTCAACAGTTTGGACGGGACATTGATGACACCACCGAT



GCCGTTCGGGATCCACAGACCCTGGAAATTCTGGACATTAC



ACCGTGCAGCTTCGGGGGCGTGAGCGTGATCACACCCGGA



ACCAATACAAGCAACCAGGTTGCCGTCCTGTATCAGGGAG



TCAATTGCACAGAAGTGCCAGTTGCTATCCACGCAGACCAG



CTGACTCCCACATGGCGGGTGTATAGCACCGGATCCAACGT



GTTTCAGACCCGCGCCGGATGTCTCATTGGGGCCGAGCACG



TGAATAACAGCTACGAGTGCGACATCCCCATTGGCGCCGG



CATTTGTGCGTCTTACCAGACTCAGACCAACTCTCATGGCT



CCGCCTCTTCCGTTGCTAGTCAGTCTATTATTGCCTATACCA



TGAGCCTCGGAGCTGAGAATAGCGTGGCCTACTCCAATAAT



TCCATCGCAATCCCTATAAACTTCACTATTTCTGTGACCAC



CGAGATCCTGCCTGTGTCTATGACTAAGACTAGCGTTGATT



GTACCATGTATATTTGTGGCGACTCTACCGAATGTTCTAAC



CTGCTGCTTCAGTACGGCTCATTTTGCACACAGCTGAACAG



AGCCCTGACTGGGATCGCTGTGGAGCAGGACAAGAACACA



CAGGAGGTGTTTGCACAGGTGAAGCAGATCTATAAGACCC



CTCCTATTAAGGATTTCGGCGGATTCAATTTCTCACAGATT



CTGCCAGACCCCAGTAAGCCTTCCAAGAGGAGCTTCATCGA



GGATCTCCTGTTTAACAAGGTGACCCTGGCAGACGCCGGCT



TTATTAAGCAATATGGGGATTGCCTGGGCGACATTGCTGCC



AGAGACCTGATTTGCGCCCAGAAATTCAATGGCCTCACAGT



GCTGCCACCTCTGCTGACCGACGAGATGATCGCTCAATACA



CTAGCGCACTGCTGGCCGGAACCATCACATCAGGCTGGAC



CTTCGGGGCCGGAGCAGCACTGCAGATTCCATTCGCCATGC



AGATGGCCTATAGATTCAACGGCATTGGCGTCACACAGAA



CGTGCTGTACGAAAACCAGAAGCTCATCGCTAACCAGTTTA



ATTCCGCAATTGGAAAGATCCAAGATTCACTCAGCTCAACC



GCCTCTGCACTCGGAAAGCTGCAGGACGTGGTCAACCAGA



ATGCTCAGGCCCTGAACACACTCGTCAAGCAGCTGTCCTCT



AACTTTGGCGCTATCAGCTCCGTTCTGAACGACATTCTGGC




ACGCCTGGATCCCCCAGAGGCTGAAGTCCAGATTGACCGC




CTGATTACCGGCCGGCTGCAGTCTCTGCAAACATACGTGAC



CCAGCAGCTGATCAGAGCAGCCGAGATCCGGGCATCCGCA



AATCTGGCAGCAACTAAGATGAGCGAATGCGTGCTGGGCC



AGTCCAAGCGGGTGGACTTTTGTGGCAAGGGCTACCACCTG



ATGAGCTTCCCCCAGAGCGCCCCACATGGCGTTGTTTTTCT



GCACGTGACCTATGTCCCTGCTCAGGAAAAGAACTTTACAA



CTGCTCCTGCTATCTGCCATGACGGCAAGGCCCACTTCCCA



CGGGAGGGAGTGTTTGTGTCCAATGGCACACACTGGTTCGT



GACCCAGAGGAACTTCTATGAACCCCAGATCATCACCACTC




ACAATACCTTCGTGTCTGGAAATTGCGACGTCGTGATCGGC




ATCGTTAACAACACCGTGTACGACCCTCTCCAGCCAGAGCT



GGACTCCTTTAAGGAGGAACTGGATAAGTATTTTAAGAACC



ACACAAGCCCAGATGTGGATCTCGGGGACATCTCCGGAAT



TAACGCCTCCGTGGTGAATATCCAGAAGGAGATTGACCGC



CTAAATGAAGTTGCCAAGAACCTCAATGAGTCTCTGATTGA



TCTGCAGGAACTGGGCAAGTATGAGCAGTATATCAAATGG



CCCTGGTACATTTGGCTGGGGTTTATCGCCGGACTGATTGC



CATCGTCATGGTGACCATCATGCTGTGTTGCATGACCTCCT



GTTGTTCCTGTCTGAAGGGCTGCTGTAGTTGCGGCTCTTGCT



GTAAATTCGACGAAGATGATAGCGAGCCCGTGCTGAAGGG



CGTGAAGCTGCATTATACCTGA





SARS-CoV-2 S protein mutated to remove a furin cleavage site and to replace residues 986 and 987 with proline and which contains the H69-, V70-, Y144-, N501Y, A570D, D614G, P681H, T716I, S982A and D1118H mutations
(SEQ ID NO: 157)



MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVF



RSSVLHSTQDLFLPFFSNVTWFHAISGTNGTKRFDNPVLPFND



GVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKVCE



FQFCNDPFLGVYHKNNKSWMESEFRVYSSANNCTFEYVSQPF



LMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRDLPQ



GFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWTAGA



AAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKCTLK



SFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATRFAS



VYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLC



FTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTGCVI



AWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAG



STPCNGVEGFNCYFPLQSYGFQPTYGVGYQPYRVVVLSFELL



HAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKKFL



PFQQFGRDIDDTTDAVRDPQTLEILDITPCSFGGVSVITPGTNTS



NQVAVLYQGVNCTEVPVAIHADQLTPTWRVYSTGSNVFQTR



AGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSHGSASSVAS



QSIIAYTMSLGAENSVAYSNNSIAIPINFTISVTTEILPVSMTKTS



VDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDKN



TQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSFIEDLL



FNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPLL



TDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQMAYRFN



GIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQD



VVNQNAQALNTLVKQLSSNFGAISSVLNDILARLDPPEAEVQI



DRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLGQ



SKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTTA



PAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTHNTF



VSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTSPDV



DLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYEQ



YIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCCSCG



SCCKFDEDDSEPVLKGVKLHYT





Optimized nucleotide sequence encoding a SARS-CoV-2 S protein mutated to remove a furin cleavage site and to replace residues 817, 892, 899, 942, 986 and 987 with proline and which contains the H69-, V70-, Y144-, N501Y, A570D, D614G, P681H, T716I, S982A and D1118H mutations
(SEQ ID NO: 158)



ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG



TGTGTCAACCTGACAACTAGGACTCAGCTGCCACCAGCCTA



CACCAACTCCTTCACCAGAGGCGTGTATTACCCAGACAAGG



TGTTTAGAAGCAGCGTGCTGCACTCTACCCAGGACCTCTTT



CTGCCCTTTTTCAGCAACGTGACATGGTTTCACGCAATTTCC



GGCACTAATGGCACAAAGCGGTTCGACAATCCAGTCCTGC



CTTTCAACGATGGCGTCTACTTTGCATCTACTGAGAAATCC



AATATCATTAGGGGATGGATCTTCGGCACAACCCTGGATTC



TAAGACCCAGAGCCTGCTGATCGTCAACAACGCCACAAAC



GTGGTCATTAAGGTTTGCGAGTTTCAGTTCTGTAACGATCC



TTTTCTGGGCGTTTACCATAAGAACAATAAGAGCTGGATGG



AGTCCGAGTTTAGAGTGTATAGCTCTGCAAATAATTGTACC



TTTGAGTACGTGAGCCAGCCCTTTCTGATGGACCTGGAGGG



AAAACAAGGAAACTTCAAAAACCTGCGGGAATTCGTTTTC



AAAAACATCGACGGCTATTTCAAGATCTATAGCAAGCATA



CCCCAATCAACCTCGTGAGGGACCTCCCCCAGGGCTTTAGC



GCACTGGAGCCACTGGTTGACCTGCCTATCGGCATTAATAT



CACAAGATTTCAGACCCTGCTGGCACTGCATAGAAGCTATC



TGACCCCTGGAGACTCCTCTAGTGGGTGGACTGCCGGCGCC



GCTGCCTACTATGTGGGCTATCTGCAGCCACGGACATTCCT



GCTGAAATACAATGAGAACGGGACAATCACAGATGCTGTT



GATTGCGCACTCGACCCCCTGTCCGAGACAAAGTGCACTCT



CAAGAGCTTTACCGTCGAGAAGGGCATCTATCAGACCTCA



AACTTCAGGGTGCAGCCCACAGAATCTATCGTGCGCTTCCC



TAATATCACTAACCTGTGTCCTTTCGGTGAAGTGTTCAACG



CCACCAGGTTTGCTAGCGTGTATGCCTGGAACAGGAAGAG



GATCTCTAACTGCGTCGCCGACTATTCCGTGCTGTATAACA



GCGCCTCCTTCTCCACATTCAAATGCTATGGAGTGAGCCCG



ACAAAACTGAACGATCTCTGCTTTACAAATGTCTACGCCGA



CTCTTTTGTGATCAGAGGGGACGAGGTCCGGCAGATCGCAC



CAGGACAGACAGGCAAGATTGCTGACTACAACTATAAGCT



GCCTGACGACTTCACAGGATGTGTGATCGCATGGAACTCAA



ACAATCTGGACTCCAAAGTCGGGGGCAACTATAATTACCTG



TATCGCCTGTTCCGGAAGTCCAACCTGAAGCCCTTCGAGAG



GGACATCAGTACAGAGATCTATCAGGCTGGCTCCACCCCTT



GCAATGGCGTCGAAGGCTTTAATTGTTATTTTCCCCTGCAG



TCTTACGGGTTTCAGCCTACTTATGGAGTTGGGTACCAGCC



ATACAGAGTGGTCGTGCTCAGCTTCGAGCTCCTGCATGCTC



CAGCTACAGTTTGCGGGCCAAAGAAGTCCACTAACCTGGT



GAAGAATAAGTGCGTCAACTTCAACTTTAACGGGCTCACCG



GCACCGGCGTGCTGACTGAGAGCAACAAGAAGTTTCTGCC



ATTTCAACAGTTTGGACGGGACATTGATGACACCACCGAT



GCCGTTCGGGATCCACAGACCCTGGAAATTCTGGACATTAC



ACCGTGCAGCTTCGGGGGCGTGAGCGTGATCACACCCGGA



ACCAATACAAGCAACCAGGTTGCCGTCCTGTATCAGGGCG



TCAATTGCACAGAAGTGCCAGTTGCTATCCACGCAGACCAG



CTGACTCCCACATGGCGGGTGTATAGCACCGGATCCAACGT



GTTTCAGACCCGCGCCGGATGTCTCATTGGGGCCGAGCACG



TGAATAACAGCTACGAGTGCGACATCCCCATTGGCGCCGG



CATTTGTGCGTCTTACCAGACTCAGACCAACTCTCATGGCT



CCGCCTCTTCCGTTGCTAGTCAGTCTATTATTGCCTATACCA



TGAGCCTCGGAGCTGAGAATAGCGTGGCCTACTCCAATAAT



TCCATCGCAATCCCTATAAACTTCACTATTTCTGTGACCAC



CGAGATCCTGCCTGTGTCTATGACTAAGACTAGCGTTGATT



GTACCATGTATATTTGTGGCGACTCTACCGAATGTTCTAAC



CTGCTGCTTCAGTACGGCTCATTTTGCACACAGCTGAACAG



AGCCCTGACTGGGATCGCTGTGGAGCAGGACAAGAACACA



CAGGAGGTGTTTGCACAGGTGAAGCAGATCTATAAGACCC



CTCCTATTAAGGATTTCGGCGGATTCAATTTCTCACAGATT



CTGCCAGACCCCAGTAAGCCTTCCAAGAGGAGCCCTATCG



AGGATCTCCTGTTTAACAAGGTGACCCTGGCAGACGCCGGC



TTTATTAAGCAATATGGGGATTGCCTGGGCGACATTGCTGC



CAGAGACCTGATTTGCGCCCAGAAATTCAATGGCCTCACAG



TGCTGCCACCTCTGCTGACCGACGAGATGATCGCTCAATAC



ACTAGCGCACTGCTGGCCGGAACCATCACATCAGGCTGGA



CCTTCGGGGCCGGACCAGCACTGCAGATTCCATTCCCTATG



CAGATGGCCTATAGATTCAACGGCATTGGCGTCACACAGA



ACGTGCTGTACGAAAACCAGAAGCTCATCGCTAACCAGTTT



AATTCCGCAATTGGAAAGATCCAAGATTCACTCAGCTCAAC



CCCCTCTGCACTCGGAAAGCTGCAGGACGTGGTCAACCAG



AATGCTCAGGCCCTGAACACACTCGTCAAGCAGCTGTCCTC



TAACTTTGGCGCTATCAGCTCCGTTCTGAACGACATTCTGG




CACGCCTGGATCCCCCAGAGGCTGAAGTCCAGATTGACCG




CCTGATTACCGGCCGGCTGCAGTCTCTGCAAACATACGTGA



CCCAGCAGCTGATCAGAGCAGCCGAGATCCGGGCATCCGC



AAATCTGGCAGCAACTAAGATGAGCGAATGCGTGCTGGGC



CAGTCCAAGCGGGTGGACTTTTGTGGCAAGGGCTACCACCT



GATGAGCTTCCCCCAGAGCGCCCCACATGGCGTTGTTTTTC



TGCACGTGACCTATGTCCCTGCTCAGGAAAAGAACTTTACA



ACTGCTCCTGCTATCTGCCATGACGGCAAGGCCCACTTCCC



ACGGGAGGGAGTGTTTGTGTCCAATGGCACACACTGGTTCG



TGACCCAGAGGAACTTCTATGAACCCCAGATCATCACCACT




CACAATACCTTCGTGTCTGGAAATTGCGACGTCGTGATCGG




CATCGTTAACAACACCGTGTACGACCCTCTCCAGCCAGAGC



TGGACTCCTTTAAGGAGGAACTGGATAAGTATTTTAAGAAC



CACACAAGCCCAGATGTGGATCTCGGGGACATCTCCGGAA



TTAACGCCTCCGTGGTGAATATCCAGAAGGAGATTGACCGC



CTAAATGAAGTTGCCAAGAACCTCAATGAGTCTCTGATTGA



TCTGCAGGAACTGGGCAAGTATGAGCAGTATATCAAATGG



CCCTGGTACATTTGGCTGGGGTTTATCGCCGGACTGATTGC



CATCGTCATGGTGACCATCATGCTGTGTTGCATGACCTCCT



GTTGTTCCTGTCTGAAGGGCTGCTGTAGTTGCGGCTCTTGCT



GTAAATTCGACGAAGATGATAGCGAGCCCGTGCTGAAGGG



CGTGAAGCTGCATTATACCTGA





SARS-CoV-2 S protein mutated to remove a furin cleavage site and to replace residues 817, 892, 899, 942, 986 and 987 with proline and which contains the H69-, V70-, Y144-, N501Y, A570D, D614G, P681H, T716I, S982A and D1118H mutations
(SEQ ID NO: 159)



MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVF



RSSVLHSTQDLFLPFFSNVTWFHAISGTNGTKRFDNPVLPFND



GVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKVCE



FQFCNDPFLGVYHKNNKSWMESEFRVYSSANNCTFEYVSQPF



LMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRDLPQ



GFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWTAGA



AAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKCTLK



SFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATRFAS



VYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLC



FTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTGCVI



AWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAG



STPCNGVEGFNCYFPLQSYGFQPTYGVGYQPYRVVVLSFELL



HAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKKFL



PFQQFGRDIDDTTDAVRDPQTLEILDITPCSFGGVSVITPGTNTS



NQVAVLYQGVNCTEVPVAIHADQLTPTWRVYSTGSNVFQTR



AGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSHGSASSVAS



QSIIAYTMSLGAENSVAYSNNSIAIPINFTISVTTEILPVSMTKTS



VDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDKN



TQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSPIEDLL



FNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPLL



TDEMIAQYTSALLAGTITSGWTFGAGPALQIPFPMQMAYRFN



GIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTPSALGKLQD



VVNQNAQALNTLVKQLSSNFGAISSVLNDILARLDPPEAEVQI



DRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLGQ



SKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTTA



PAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTHNTF



VSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTSPDV



DLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYEQ



YIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCCSCG



SCCKFDEDDSEPVLKGVKLHYT





Optimized nucleotide sequence encoding a SARS-CoV-2 S protein containing the D80A, D215G, L242-, A243-, L244-, K417N, E484K, N501Y, D614G and A701V mutations
(SEQ ID NO: 160)


ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG



TGTGTCAACCTGACAACTAGGACTCAGCTGCCACCAGCCTA



CACCAACTCCTTCACCAGAGGCGTGTATTACCCAGACAAGG



TGTTTAGAAGCAGCGTGCTGCACTCTACCCAGGACCTCTTT



CTGCCCTTTTTCAGCAACGTGACATGGTTTCACGCAATTCA



CGTGTCCGGCACTAATGGCACAAAGCGGTTCGCCAATCCA



GTCCTGCCTTTCAACGATGGCGTCTACTTTGCATCTACTGA



GAAATCCAATATCATTAGGGGATGGATCTTCGGCACAACCC



TGGATTCTAAGACCCAGAGCCTGCTGATCGTCAACAACGCC



ACAAACGTGGTCATTAAGGTTTGCGAGTTTCAGTTCTGTAA



CGATCCTTTTCTGGGCGTGTATTATCATAAGAACAATAAGA



GCTGGATGGAGTCCGAGTTTAGAGTGTATAGCTCTGCAAAT



AATTGTACCTTTGAGTACGTGAGCCAGCCCTTTCTGATGGA



CCTGGAGGGAAAACAAGGAAACTTCAAAAACCTGCGGGAA



TTCGTTTTCAAAAACATCGACGGCTATTTCAAGATCTATAG



CAAGCATACCCCAATCAACCTCGTGAGGGGCCTCCCCCAG



GGCTTTAGCGCACTGGAGCCACTGGTTGACCTGCCTATCGG



CATTAATATCACAAGATTTCAGACCCTGCATAGAAGCTATC



TGACCCCTGGAGACTCCTCTAGTGGGTGGACTGCCGGCGCC



GCTGCCTACTATGTGGGCTATCTGCAGCCACGGACATTCCT



GCTGAAATACAATGAGAACGGGACAATCACAGATGCTGTT



GATTGCGCACTCGACCCCCTGTCCGAGACAAAGTGCACTCT



CAAGAGCTTTACCGTCGAGAAGGGCATCTATCAGACCTCA



AACTTCAGGGTGCAGCCCACAGAATCTATCGTGCGCTTCCC



TAATATCACTAACCTGTGTCCTTTCGGTGAAGTGTTCAACG



CCACCAGGTTTGCTAGCGTGTATGCCTGGAACAGGAAGAG



GATCTCTAACTGCGTCGCCGACTATTCCGTGCTGTATAACA



GCGCCTCCTTCTCCACATTCAAATGCTATGGAGTGAGCCCG



ACAAAACTGAACGATCTCTGCTTTACAAATGTCTACGCCGA



CTCTTTTGTGATCAGAGGGGACGAGGTCCGGCAGATCGCAC



CAGGACAGACAGGCAACATTGCTGACTACAACTATAAGCT



GCCTGACGACTTCACAGGATGTGTGATCGCATGGAACTCAA



ACAATCTGGACTCCAAAGTCGGGGGCAACTATAATTACCTG



TATCGCCTGTTCCGGAAGTCCAACCTGAAGCCCTTCGAGAG



GGACATCAGTACAGAGATCTATCAGGCTGGCTCCACCCCTT



GCAATGGCGTCAAGGGCTTTAATTGTTATTTTCCCCTGCAG



TCTTACGGGTTTCAGCCTACTTACGGAGTTGGGTACCAGCC



ATACAGAGTGGTCGTGCTCAGCTTCGAGCTCCTGCATGCTC



CAGCTACAGTTTGCGGGCCAAAGAAGTCCACTAACCTGGT



GAAGAATAAGTGCGTCAACTTCAACTTTAACGGGCTCACCG



GCACCGGCGTGCTGACTGAGAGCAACAAGAAGTTTCTGCC



ATTTCAACAGTTTGGACGGGACATTGCCGACACCACCGATG



CCGTTCGGGATCCACAGACCCTGGAAATTCTGGACATTACA



CCGTGCAGCTTCGGGGGCGTGAGCGTGATCACACCCGGAA



CCAATACAAGCAACCAGGTTGCCGTCCTGTATCAGGGCGT



CAATTGCACAGAAGTGCCAGTTGCTATCCACGCAGACCAG



CTGACTCCCACATGGCGGGTGTATAGCACCGGATCCAACGT



GTTTCAGACCCGCGCCGGATGTCTCATTGGGGCCGAGCACG



TGAATAACAGCTACGAGTGCGACATCCCCATTGGCGCCGG



CATTTGTGCGTCTTACCAGACTCAGACCAACTCTCCTAGAA



GGGCCAGGTCCGTTGCTAGTCAGTCTATTATTGCCTATACC



ATGAGCCTCGGAGTGGAGAATAGCGTGGCCTACTCCAATA



ATTCCATCGCAATCCCTACTAACTTCACTATTTCTGTGACCA



CCGAGATCCTGCCTGTGTCTATGACTAAGACTAGCGTTGAT



TGTACCATGTATATTTGTGGCGACTCTACCGAATGTTCTAA



CCTGCTGCTTCAGTACGGCTCATTTTGCACACAGCTGAACA



GAGCCCTGACTGGGATCGCTGTGGAGCAGGACAAGAACAC



ACAGGAGGTGTTTGCACAGGTGAAGCAGATCTATAAGACC



CCTCCTATTAAGGATTTCGGCGGATTCAATTTCTCACAGAT



TCTGCCAGACCCCAGTAAGCCTTCCAAGAGGAGCTTCATCG



AGGATCTCCTGTTTAACAAGGTGACCCTGGCAGACGCCGGC



TTTATTAAGCAATATGGGGATTGCCTGGGCGACATTGCTGC



CAGAGACCTGATTTGCGCCCAGAAATTCAATGGCCTCACAG



TGCTGCCACCTCTGCTGACCGACGAGATGATCGCTCAATAC



ACTAGCGCACTGCTGGCCGGAACCATCACATCAGGCTGGA



CCTTCGGGGCCGGAGCAGCACTGCAGATTCCATTCGCCATG



CAGATGGCCTATAGATTCAACGGCATTGGCGTCACACAGA



ACGTGCTGTACGAAAACCAGAAGCTCATCGCTAACCAGTTT



AATTCCGCAATTGGAAAGATCCAAGATTCACTCAGCTCAAC



CGCCTCTGCACTCGGAAAGCTGCAGGACGTGGTCAACCAG



AATGCTCAGGCCCTGAACACACTCGTCAAGCAGCTGTCCTC



TAACTTTGGCGCTATCAGCTCCGTTCTGAACGACATTCTGA



GCCGCCTGGATAAGGTGGAGGCTGAAGTCCAGATTGACCG



CCTGATTACCGGCCGGCTGCAGTCTCTGCAAACATACGTGA



CCCAGCAGCTGATCAGAGCAGCCGAGATCCGGGCATCCGC



AAATCTGGCAGCAACTAAGATGAGCGAATGCGTGCTGGGC



CAGTCCAAGCGGGTGGACTTTTGTGGCAAGGGCTACCACCT



GATGAGCTTCCCCCAGAGCGCCCCACATGGCGTTGTTTTTC



TGCACGTGACCTATGTCCCTGCTCAGGAAAAGAACTTTACA



ACTGCTCCTGCTATCTGCCATGACGGCAAGGCCCACTTCCC



ACGGGAGGGAGTGTTTGTGTCCAATGGCACACACTGGTTCG



TGACCCAGAGGAACTTCTATGAACCCCAGATCATCACCACT



GACAATACCTTCGTGTCTGGAAATTGCGACGTCGTGATCGG



CATCGTTAACAACACCGTGTACGACCCTCTCCAGCCAGAGC



TGGACTCCTTTAAGGAGGAACTGGATAAGTATTTTAAGAAC



CACACAAGCCCAGATGTGGATCTCGGGGACATCTCCGGAA



TTAACGCCTCCGTGGTGAATATCCAGAAGGAGATTGACCGC



CTAAATGAAGTTGCCAAGAACCTCAATGAGTCTCTGATTGA



TCTGCAGGAACTGGGCAAGTATGAGCAGTATATCAAATGG



CCCTGGTACATTTGGCTGGGGTTTATCGCCGGACTGATTGC



CATCGTCATGGTGACCATCATGCTGTGTTGCATGACCTCCT



GTTGTTCCTGTCTGAAGGGCTGCTGTAGTTGCGGCTCTTGCT



GTAAATTCGACGAAGATGATAGCGAGCCCGTGCTGAAGGG



CGTGAAGCTGCATTATACCTGA





SARS-CoV-2 S protein containing the D80A, D215G, L242-, A243-, L244-, K417N, E484K, N501Y, D614G and A701V mutations
(SEQ ID NO: 161)


MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVF



RSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFANPVLPF



NDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKV



CEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVS



QPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRG



LPQGFSALEPLVDLPIGINITRFQTLHRSYLTPGDSSSGWTAGA



AAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKCTLK



SFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATRFAS



VYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLC



FTNVYADSFVIRGDEVRQIAPGQTGNIADYNYKLPDDFTGCVI



AWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAG



STPCNGVKGFNCYFPLQSYGFQPTYGVGYQPYRVVVLSFELL



HAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKKFL



PFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTNTS



NQVAVLYQGVNCTEVPVAIHADQLTPTWRVYSTGSNVFQTR



AGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPRRARSVAS



QSIIAYTMSLGVENSVAYSNNSIAIPTNFTISVTTEILPVSMTKT



SVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDK



NTQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSFIEDL



LFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPL



LTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQMAYRF



NGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQ



DVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDKVEAEV



QIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVL



GQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNF



TTAPAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTD



NTFVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTS



PDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGK



YEQYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCC



SCGSCCKFDEDDSEPVLKGVKLHYT





Optimized nucleotide sequence encoding a SARS-CoV-2 S protein mutated to remove a furin cleavage site and to replace residues 986 and 987 with proline and which contains the D80A, D215G, L242-, A243-, L244-, K417N, E484K, N501Y, D614G and A701V mutations
(SEQ ID NO: 162)


ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG



TGTGTCAACCTGACAACTAGGACTCAGCTGCCACCAGCCTA



CACCAACTCCTTCACCAGAGGCGTGTATTACCCAGACAAGG



TGTTTAGAAGCAGCGTGCTGCACTCTACCCAGGACCTCTTT



CTGCCCTTTTTCAGCAACGTGACATGGTTTCACGCAATTCA



CGTGTCCGGCACTAATGGCACAAAGCGGTTCGCCAATCCA



GTCCTGCCTTTCAACGATGGCGTCTACTTTGCATCTACTGA



GAAATCCAATATCATTAGGGGATGGATCTTCGGCACAACCC



TGGATTCTAAGACCCAGAGCCTGCTGATCGTCAACAACGCC



ACAAACGTGGTCATTAAGGTTTGCGAGTTTCAGTTCTGTAA



CGATCCTTTTCTGGGCGTGTATTATCATAAGAACAATAAGA



GCTGGATGGAGTCCGAGTTTAGAGTGTATAGCTCTGCAAAT



AATTGTACCTTTGAGTACGTGAGCCAGCCCTTTCTGATGGA



CCTGGAGGGAAAACAAGGAAACTTCAAAAACCTGCGGGAA



TTCGTTTTCAAAAACATCGACGGCTATTTCAAGATCTATAG



CAAGCATACCCCAATCAACCTCGTGAGGGGCCTCCCCCAG



GGCTTTAGCGCACTGGAGCCACTGGTTGACCTGCCTATCGG



CATTAATATCACAAGATTTCAGACCCTGCATAGAAGCTATC



TGACCCCTGGAGACTCCTCTAGTGGGTGGACTGCCGGCGCC



GCTGCCTACTATGTGGGCTATCTGCAGCCACGGACATTCCT



GCTGAAATACAATGAGAACGGGACAATCACAGATGCTGTT



GATTGCGCACTCGACCCCCTGTCCGAGACAAAGTGCACTCT



CAAGAGCTTTACCGTCGAGAAGGGCATCTATCAGACCTCA



AACTTCAGGGTGCAGCCCACAGAATCTATCGTGCGCTTCCC



TAATATCACTAACCTGTGTCCTTTCGGTGAAGTGTTCAACG



CCACCAGGTTTGCTAGCGTGTATGCCTGGAACAGGAAGAG



GATCTCTAACTGCGTCGCCGACTATTCCGTGCTGTATAACA



GCGCCTCCTTCTCCACATTCAAATGCTATGGAGTGAGCCCG



ACAAAACTGAACGATCTCTGCTTTACAAATGTCTACGCCGA



CTCTTTTGTGATCAGAGGGGACGAGGTCCGGCAGATCGCAC



CAGGACAGACAGGCAACATTGCTGACTACAACTATAAGCT



GCCTGACGACTTCACAGGATGTGTGATCGCATGGAACTCAA



ACAATCTGGACTCCAAAGTCGGGGGCAACTATAATTACCTG



TATCGCCTGTTCCGGAAGTCCAACCTGAAGCCCTTCGAGAG



GGACATCAGTACAGAGATCTATCAGGCTGGCTCCACCCCTT



GCAATGGCGTCAAGGGCTTTAATTGTTATTTTCCCCTGCAG



TCTTACGGGTTTCAGCCTACTTACGGAGTTGGGTACCAGCC



ATACAGAGTGGTCGTGCTCAGCTTCGAGCTCCTGCATGCTC



CAGCTACAGTTTGCGGGCCAAAGAAGTCCACTAACCTGGT



GAAGAATAAGTGCGTCAACTTCAACTTTAACGGGCTCACCG



GCACCGGCGTGCTGACTGAGAGCAACAAGAAGTTTCTGCC



ATTTCAACAGTTTGGACGGGACATTGCCGACACCACCGATG



CCGTTCGGGATCCACAGACCCTGGAAATTCTGGACATTACA



CCGTGCAGCTTCGGGGGCGTGAGCGTGATCACACCCGGAA



CCAATACAAGCAACCAGGTTGCCGTCCTGTATCAGGGCGT



CAATTGCACAGAAGTGCCAGTTGCTATCCACGCAGACCAG



CTGACTCCCACATGGCGGGTGTATAGCACCGGATCCAACGT



GTTTCAGACCCGCGCCGGATGTCTCATTGGGGCCGAGCACG



TGAATAACAGCTACGAGTGCGACATCCCCATTGGCGCCGG



CATTTGTGCGTCTTACCAGACTCAGACCAACTCTCCTGGCT



CCGCCTCTTCCGTTGCTAGTCAGTCTATTATTGCCTATACCA



TGAGCCTCGGAGTGGAGAATAGCGTGGCCTACTCCAATAA



TTCCATCGCAATCCCTACTAACTTCACTATTTCTGTGACCAC



CGAGATCCTGCCTGTGTCTATGACTAAGACTAGCGTTGATT



GTACCATGTATATTTGTGGCGACTCTACCGAATGTTCTAAC



CTGCTGCTTCAGTACGGCTCATTTTGCACACAGCTGAACAG



AGCCCTGACTGGGATCGCTGTGGAGCAGGACAAGAACACA



CAGGAGGTGTTTGCACAGGTGAAGCAGATCTATAAGACCC



CTCCTATTAAGGATTTCGGCGGATTCAATTTCTCACAGATT



CTGCCAGACCCCAGTAAGCCTTCCAAGAGGAGCTTCATCGA



GGATCTCCTGTTTAACAAGGTGACCCTGGCAGACGCCGGCT



TTATTAAGCAATATGGGGATTGCCTGGGCGACATTGCTGCC



AGAGACCTGATTTGCGCCCAGAAATTCAATGGCCTCACAGT



GCTGCCACCTCTGCTGACCGACGAGATGATCGCTCAATACA



CTAGCGCACTGCTGGCCGGAACCATCACATCAGGCTGGAC



CTTCGGGGCCGGAGCAGCACTGCAGATTCCATTCGCCATGC



AGATGGCCTATAGATTCAACGGCATTGGCGTCACACAGAA



CGTGCTGTACGAAAACCAGAAGCTCATCGCTAACCAGTTTA



ATTCCGCAATTGGAAAGATCCAAGATTCACTCAGCTCAACC



GCCTCTGCACTCGGAAAGCTGCAGGACGTGGTCAACCAGA



ATGCTCAGGCCCTGAACACACTCGTCAAGCAGCTGTCCTCT



AACTTTGGCGCTATCAGCTCCGTTCTGAACGACATTCTGAG



CCGCCTGGATCCCCCAGAGGCTGAAGTCCAGATTGACCGCC



TGATTACCGGCCGGCTGCAGTCTCTGCAAACATACGTGACC



CAGCAGCTGATCAGAGCAGCCGAGATCCGGGCATCCGCAA



ATCTGGCAGCAACTAAGATGAGCGAATGCGTGCTGGGCCA



GTCCAAGCGGGTGGACTTTTGTGGCAAGGGCTACCACCTGA



TGAGCTTCCCCCAGAGCGCCCCACATGGCGTTGTTTTTCTG



CACGTGACCTATGTCCCTGCTCAGGAAAAGAACTTTACAAC



TGCTCCTGCTATCTGCCATGACGGCAAGGCCCACTTCCCAC



GGGAGGGAGTGTTTGTGTCCAATGGCACACACTGGTTCGTG



ACCCAGAGGAACTTCTATGAACCCCAGATCATCACCACTGA



CAATACCTTCGTGTCTGGAAATTGCGACGTCGTGATCGGCA



TCGTTAACAACACCGTGTACGACCCTCTCCAGCCAGAGCTG



GACTCCTTTAAGGAGGAACTGGATAAGTATTTTAAGAACCA



CACAAGCCCAGATGTGGATCTCGGGGACATCTCCGGAATT



AACGCCTCCGTGGTGAATATCCAGAAGGAGATTGACCGCC



TAAATGAAGTTGCCAAGAACCTCAATGAGTCTCTGATTGAT



CTGCAGGAACTGGGCAAGTATGAGCAGTATATCAAATGGC



CCTGGTACATTTGGCTGGGGTTTATCGCCGGACTGATTGCC



ATCGTCATGGTGACCATCATGCTGTGTTGCATGACCTCCTG



TTGTTCCTGTCTGAAGGGCTGCTGTAGTTGCGGCTCTTGCTG



TAAATTCGACGAAGATGATAGCGAGCCCGTGCTGAAGGGC



GTGAAGCTGCATTATACCTGA





SARS-CoV-2 S protein mutated to remove a furin cleavage site and to replace residues 986 and 987 with proline and which contains the D80A, D215G, L242-, A243-, L244-, K417N, E484K, N501Y, D614G and A701V mutations
(SEQ ID NO: 163)


MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVF



RSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFANPVLPF



NDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKV



CEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVS



QPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRG



LPQGFSALEPLVDLPIGINITRFQTLHRSYLTPGDSSSGWTAGA



AAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKCTLK



SFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATRFAS



VYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLC



FTNVYADSFVIRGDEVRQIAPGQTGNIADYNYKLPDDFTGCVI



AWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAG



STPCNGVKGFNCYFPLQSYGFQPTYGVGYQPYRVVVLSFELL



HAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKKFL



PFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTNTS



NQVAVLYQGVNCTEVPVAIHADQLTPTWRVYSTGSNVFQTR



AGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPGSASSVASQ



SIIAYTMSLGVENSVAYSNNSIAIPTNFTISVTTEILPVSMTKTS



VDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDKN



TQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSFIEDLL



FNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPLL



TDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQMAYRFN



GIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQD



VVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDPPEAEVQI



DRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLGQ



SKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTTA



PAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTDNTF



VSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTSPDV



DLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYEQ



YIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCCSCG



SCCKFDEDDSEPVLKGVKLHYT





Optimized nucleotide sequence encoding a SARS-CoV-2 S protein containing the L18F, D80A, D215G, L242-, A243-, L244-, K417N, E484K, N501Y, D614G and A701V mutations
(SEQ ID NO: 164)


ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG



TGTGTCAACTTCACAACTAGGACTCAGCTGCCACCAGCCTA



CACCAACTCCTTCACCAGAGGCGTGTATTACCCAGACAAGG



TGTTTAGAAGCAGCGTGCTGCACTCTACCCAGGACCTCTTT



CTGCCCTTTTTCAGCAACGTGACATGGTTTCACGCAATTCA



CGTGTCCGGCACTAATGGCACAAAGCGGTTCGCCAATCCA



GTCCTGCCTTTCAACGATGGCGTCTACTTTGCATCTACTGA



GAAATCCAATATCATTAGGGGATGGATCTTCGGCACAACCC



TGGATTCTAAGACCCAGAGCCTGCTGATCGTCAACAACGCC



ACAAACGTGGTCATTAAGGTTTGCGAGTTTCAGTTCTGTAA



CGATCCTTTTCTGGGCGTGTATTATCATAAGAACAATAAGA



GCTGGATGGAGTCCGAGTTTAGAGTGTATAGCTCTGCAAAT



AATTGTACCTTTGAGTACGTGAGCCAGCCCTTTCTGATGGA



CCTGGAGGGAAAACAAGGAAACTTCAAAAACCTGCGGGAA



TTCGTTTTCAAAAACATCGACGGCTATTTCAAGATCTATAG



CAAGCATACCCCAATCAACCTCGTGAGGGGCCTCCCCCAG



GGCTTTAGCGCACTGGAGCCACTGGTTGACCTGCCTATCGG



CATTAATATCACAAGATTTCAGACCCTGCATAGAAGCTATC



TGACCCCTGGAGACTCCTCTAGTGGGTGGACTGCCGGCGCC



GCTGCCTACTATGTGGGCTATCTGCAGCCACGGACATTCCT



GCTGAAATACAATGAGAACGGGACAATCACAGATGCTGTT



GATTGCGCACTCGACCCCCTGTCCGAGACAAAGTGCACTCT



CAAGAGCTTTACCGTCGAGAAGGGCATCTATCAGACCTCA



AACTTCAGGGTGCAGCCCACAGAATCTATCGTGCGCTTCCC



TAATATCACTAACCTGTGTCCTTTCGGTGAAGTGTTCAACG



CCACCAGGTTTGCTAGCGTGTATGCCTGGAACAGGAAGAG



GATCTCTAACTGCGTCGCCGACTATTCCGTGCTGTATAACA



GCGCCTCCTTCTCCACATTCAAATGCTATGGAGTGAGCCCG



ACAAAACTGAACGATCTCTGCTTTACAAATGTCTACGCCGA



CTCTTTTGTGATCAGAGGGGACGAGGTCCGGCAGATCGCAC



CAGGACAGACAGGCAACATTGCTGACTACAACTATAAGCT



GCCTGACGACTTCACAGGATGTGTGATCGCATGGAACTCAA



ACAATCTGGACTCCAAAGTCGGGGGCAACTATAATTACCTG



TATCGCCTGTTCCGGAAGTCCAACCTGAAGCCCTTCGAGAG



GGACATCAGTACAGAGATCTATCAGGCTGGCTCCACCCCTT



GCAATGGCGTCAAGGGCTTTAATTGTTATTTTCCCCTGCAG



TCTTACGGGTTTCAGCCTACTTACGGAGTTGGGTACCAGCC



ATACAGAGTGGTCGTGCTCAGCTTCGAGCTCCTGCATGCTC



CAGCTACAGTTTGCGGGCCAAAGAAGTCCACTAACCTGGT



GAAGAATAAGTGCGTCAACTTCAACTTTAACGGGCTCACCG



GCACCGGCGTGCTGACTGAGAGCAACAAGAAGTTTCTGCC



ATTTCAACAGTTTGGACGGGACATTGCCGACACCACCGATG



CCGTTCGGGATCCACAGACCCTGGAAATTCTGGACATTACA



CCGTGCAGCTTCGGGGGCGTGAGCGTGATCACACCCGGAA



CCAATACAAGCAACCAGGTTGCCGTCCTGTATCAGGGCGT



CAATTGCACAGAAGTGCCAGTTGCTATCCACGCAGACCAG



CTGACTCCCACATGGCGGGTGTATAGCACCGGATCCAACGT



GTTTCAGACCCGCGCCGGATGTCTCATTGGGGCCGAGCACG



TGAATAACAGCTACGAGTGCGACATCCCCATTGGCGCCGG



CATTTGTGCGTCTTACCAGACTCAGACCAACTCTCCTAGAA



GGGCCAGGTCCGTTGCTAGTCAGTCTATTATTGCCTATACC



ATGAGCCTCGGAGTGGAGAATAGCGTGGCCTACTCCAATA



ATTCCATCGCAATCCCTACTAACTTCACTATTTCTGTGACCA



CCGAGATCCTGCCTGTGTCTATGACTAAGACTAGCGTTGAT



TGTACCATGTATATTTGTGGCGACTCTACCGAATGTTCTAA



CCTGCTGCTTCAGTACGGCTCATTTTGCACACAGCTGAACA



GAGCCCTGACTGGGATCGCTGTGGAGCAGGACAAGAACAC



ACAGGAGGTGTTTGCACAGGTGAAGCAGATCTATAAGACC



CCTCCTATTAAGGATTTCGGCGGATTCAATTTCTCACAGAT



TCTGCCAGACCCCAGTAAGCCTTCCAAGAGGAGCTTCATCG



AGGATCTCCTGTTTAACAAGGTGACCCTGGCAGACGCCGGC



TTTATTAAGCAATATGGGGATTGCCTGGGCGACATTGCTGC



CAGAGACCTGATTTGCGCCCAGAAATTCAATGGCCTCACAG



TGCTGCCACCTCTGCTGACCGACGAGATGATCGCTCAATAC



ACTAGCGCACTGCTGGCCGGAACCATCACATCAGGCTGGA



CCTTCGGGGCCGGAGCAGCACTGCAGATTCCATTCGCCATG



CAGATGGCCTATAGATTCAACGGCATTGGCGTCACACAGA



ACGTGCTGTACGAAAACCAGAAGCTCATCGCTAACCAGTTT



AATTCCGCAATTGGAAAGATCCAAGATTCACTCAGCTCAAC



CGCCTCTGCACTCGGAAAGCTGCAGGACGTGGTCAACCAG



AATGCTCAGGCCCTGAACACACTCGTCAAGCAGCTGTCCTC



TAACTTTGGCGCTATCAGCTCCGTTCTGAACGACATTCTGA



GCCGCCTGGATAAGGTGGAGGCTGAAGTCCAGATTGACCG



CCTGATTACCGGCCGGCTGCAGTCTCTGCAAACATACGTGA



CCCAGCAGCTGATCAGAGCAGCCGAGATCCGGGCATCCGC



AAATCTGGCAGCAACTAAGATGAGCGAATGCGTGCTGGGC



CAGTCCAAGCGGGTGGACTTTTGTGGCAAGGGCTACCACCT



GATGAGCTTCCCCCAGAGCGCCCCACATGGCGTTGTTTTTC



TGCACGTGACCTATGTCCCTGCTCAGGAAAAGAACTTTACA



ACTGCTCCTGCTATCTGCCATGACGGCAAGGCCCACTTCCC



ACGGGAGGGAGTGTTTGTGTCCAATGGCACACACTGGTTCG



TGACCCAGAGGAACTTCTATGAACCCCAGATCATCACCACT



GACAATACCTTCGTGTCTGGAAATTGCGACGTCGTGATCGG



CATCGTTAACAACACCGTGTACGACCCTCTCCAGCCAGAGC



TGGACTCCTTTAAGGAGGAACTGGATAAGTATTTTAAGAAC



CACACAAGCCCAGATGTGGATCTCGGGGACATCTCCGGAA



TTAACGCCTCCGTGGTGAATATCCAGAAGGAGATTGACCGC



CTAAATGAAGTTGCCAAGAACCTCAATGAGTCTCTGATTGA



TCTGCAGGAACTGGGCAAGTATGAGCAGTATATCAAATGG



CCCTGGTACATTTGGCTGGGGTTTATCGCCGGACTGATTGC



CATCGTCATGGTGACCATCATGCTGTGTTGCATGACCTCCT



GTTGTTCCTGTCTGAAGGGCTGCTGTAGTTGCGGCTCTTGCT



GTAAATTCGACGAAGATGATAGCGAGCCCGTGCTGAAGGG



CGTGAAGCTGCATTATACCTGA





SARS-CoV-2 S protein containing the L18F, D80A, D215G, L242-, A243-, L244-, K417N, E484K, N501Y, D614G and A701V mutations
(SEQ ID NO: 165)


MFVFLVLLPLVSSQCVNFTTRTQLPPAYTNSFTRGVYYPDKVF



RSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFANPVLPF



NDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKV



CEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVS



QPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRG



LPQGFSALEPLVDLPIGINITRFQTLHRSYLTPGDSSSGWTAGA



AAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKCTLK



SFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATRFAS



VYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLC



FTNVYADSFVIRGDEVRQIAPGQTGNIADYNYKLPDDFTGCVI



AWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAG



STPCNGVKGFNCYFPLQSYGFQPTYGVGYQPYRVVVLSFELL



HAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKKFL



PFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTNTS



NQVAVLYQGVNCTEVPVAIHADQLTPTWRVYSTGSNVFQTR



AGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPRRARSVAS



QSIIAYTMSLGVENSVAYSNNSIAIPTNFTISVTTEILPVSMTKT



SVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDK



NTQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSFIEDL



LFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPL



LTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQMAYRF



NGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQ



DVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDKVEAEV



QIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVL



GQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNF



TTAPAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTD



NTFVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTS



PDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGK



YEQYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCC



SCGSCCKFDEDDSEPVLKGVKLHYT





Optimized nucleotide sequence encoding a SARS-CoV-2 S protein mutated to remove a furin cleavage site and to replace residues 986 and 987 with proline and which contains the L18F, D80A, D215G, L242-, A243-, L244-, K417N, E484K, N501Y, D614G and A701V mutations
(SEQ ID NO: 166)


ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG



TGTGTCAACTTCACAACTAGGACTCAGCTGCCACCAGCCTA



CACCAACTCCTTCACCAGAGGCGTGTATTACCCAGACAAGG



TGTTTAGAAGCAGCGTGCTGCACTCTACCCAGGACCTCTTT



CTGCCCTTTTTCAGCAACGTGACATGGTTTCACGCAATTCA



CGTGTCCGGCACTAATGGCACAAAGCGGTTCGCCAATCCA



GTCCTGCCTTTCAACGATGGCGTCTACTTTGCATCTACTGA



GAAATCCAATATCATTAGGGGATGGATCTTCGGCACAACCC



TGGATTCTAAGACCCAGAGCCTGCTGATCGTCAACAACGCC



ACAAACGTGGTCATTAAGGTTTGCGAGTTTCAGTTCTGTAA



CGATCCTTTTCTGGGCGTGTATTATCATAAGAACAATAAGA



GCTGGATGGAGTCCGAGTTTAGAGTGTATAGCTCTGCAAAT



AATTGTACCTTTGAGTACGTGAGCCAGCCCTTTCTGATGGA



CCTGGAGGGAAAACAAGGAAACTTCAAAAACCTGCGGGAA



TTCGTTTTCAAAAACATCGACGGCTATTTCAAGATCTATAG



CAAGCATACCCCAATCAACCTCGTGAGGGGCCTCCCCCAG



GGCTTTAGCGCACTGGAGCCACTGGTTGACCTGCCTATCGG



CATTAATATCACAAGATTTCAGACCCTGCATAGAAGCTATC



TGACCCCTGGAGACTCCTCTAGTGGGTGGACTGCCGGCGCC



GCTGCCTACTATGTGGGCTATCTGCAGCCACGGACATTCCT



GCTGAAATACAATGAGAACGGGACAATCACAGATGCTGTT



GATTGCGCACTCGACCCCCTGTCCGAGACAAAGTGCACTCT



CAAGAGCTTTACCGTCGAGAAGGGCATCTATCAGACCTCA



AACTTCAGGGTGCAGCCCACAGAATCTATCGTGCGCTTCCC



TAATATCACTAACCTGTGTCCTTTCGGTGAAGTGTTCAACG



CCACCAGGTTTGCTAGCGTGTATGCCTGGAACAGGAAGAG



GATCTCTAACTGCGTCGCCGACTATTCCGTGCTGTATAACA



GCGCCTCCTTCTCCACATTCAAATGCTATGGAGTGAGCCCG



ACAAAACTGAACGATCTCTGCTTTACAAATGTCTACGCCGA



CTCTTTTGTGATCAGAGGGGACGAGGTCCGGCAGATCGCAC



CAGGACAGACAGGCAACATTGCTGACTACAACTATAAGCT



GCCTGACGACTTCACAGGATGTGTGATCGCATGGAACTCAA



ACAATCTGGACTCCAAAGTCGGGGGCAACTATAATTACCTG



TATCGCCTGTTCCGGAAGTCCAACCTGAAGCCCTTCGAGAG



GGACATCAGTACAGAGATCTATCAGGCTGGCTCCACCCCTT



GCAATGGCGTCAAGGGCTTTAATTGTTATTTTCCCCTGCAG



TCTTACGGGTTTCAGCCTACTTACGGAGTTGGGTACCAGCC



ATACAGAGTGGTCGTGCTCAGCTTCGAGCTCCTGCATGCTC



CAGCTACAGTTTGCGGGCCAAAGAAGTCCACTAACCTGGT



GAAGAATAAGTGCGTCAACTTCAACTTTAACGGGCTCACCG



GCACCGGCGTGCTGACTGAGAGCAACAAGAAGTTTCTGCC



ATTTCAACAGTTTGGACGGGACATTGCCGACACCACCGATG



CCGTTCGGGATCCACAGACCCTGGAAATTCTGGACATTACA



CCGTGCAGCTTCGGGGGCGTGAGCGTGATCACACCCGGAA



CCAATACAAGCAACCAGGTTGCCGTCCTGTATCAGGGCGT



CAATTGCACAGAAGTGCCAGTTGCTATCCACGCAGACCAG



CTGACTCCCACATGGCGGGTGTATAGCACCGGATCCAACGT



GTTTCAGACCCGCGCCGGATGTCTCATTGGGGCCGAGCACG



TGAATAACAGCTACGAGTGCGACATCCCCATTGGCGCCGG



CATTTGTGCGTCTTACCAGACTCAGACCAACTCTCCTGGCT



CCGCCTCTTCCGTTGCTAGTCAGTCTATTATTGCCTATACCA



TGAGCCTCGGAGTGGAGAATAGCGTGGCCTACTCCAATAA



TTCCATCGCAATCCCTACTAACTTCACTATTTCTGTGACCAC



CGAGATCCTGCCTGTGTCTATGACTAAGACTAGCGTTGATT



GTACCATGTATATTTGTGGCGACTCTACCGAATGTTCTAAC



CTGCTGCTTCAGTACGGCTCATTTTGCACACAGCTGAACAG



AGCCCTGACTGGGATCGCTGTGGAGCAGGACAAGAACACA



CAGGAGGTGTTTGCACAGGTGAAGCAGATCTATAAGACCC



CTCCTATTAAGGATTTCGGCGGATTCAATTTCTCACAGATT



CTGCCAGACCCCAGTAAGCCTTCCAAGAGGAGCTTCATCGA



GGATCTCCTGTTTAACAAGGTGACCCTGGCAGACGCCGGCT



TTATTAAGCAATATGGGGATTGCCTGGGCGACATTGCTGCC



AGAGACCTGATTTGCGCCCAGAAATTCAATGGCCTCACAGT



GCTGCCACCTCTGCTGACCGACGAGATGATCGCTCAATACA



CTAGCGCACTGCTGGCCGGAACCATCACATCAGGCTGGAC



CTTCGGGGCCGGAGCAGCACTGCAGATTCCATTCGCCATGC



AGATGGCCTATAGATTCAACGGCATTGGCGTCACACAGAA



CGTGCTGTACGAAAACCAGAAGCTCATCGCTAACCAGTTTA



ATTCCGCAATTGGAAAGATCCAAGATTCACTCAGCTCAACC



GCCTCTGCACTCGGAAAGCTGCAGGACGTGGTCAACCAGA



ATGCTCAGGCCCTGAACACACTCGTCAAGCAGCTGTCCTCT



AACTTTGGCGCTATCAGCTCCGTTCTGAACGACATTCTGAG



CCGCCTGGATCCCCCAGAGGCTGAAGTCCAGATTGACCGCC



TGATTACCGGCCGGCTGCAGTCTCTGCAAACATACGTGACC



CAGCAGCTGATCAGAGCAGCCGAGATCCGGGCATCCGCAA



ATCTGGCAGCAACTAAGATGAGCGAATGCGTGCTGGGCCA



GTCCAAGCGGGTGGACTTTTGTGGCAAGGGCTACCACCTGA



TGAGCTTCCCCCAGAGCGCCCCACATGGCGTTGTTTTTCTG



CACGTGACCTATGTCCCTGCTCAGGAAAAGAACTTTACAAC



TGCTCCTGCTATCTGCCATGACGGCAAGGCCCACTTCCCAC



GGGAGGGAGTGTTTGTGTCCAATGGCACACACTGGTTCGTG



ACCCAGAGGAACTTCTATGAACCCCAGATCATCACCACTGA



CAATACCTTCGTGTCTGGAAATTGCGACGTCGTGATCGGCA



TCGTTAACAACACCGTGTACGACCCTCTCCAGCCAGAGCTG



GACTCCTTTAAGGAGGAACTGGATAAGTATTTTAAGAACCA



CACAAGCCCAGATGTGGATCTCGGGGACATCTCCGGAATT



AACGCCTCCGTGGTGAATATCCAGAAGGAGATTGACCGCC



TAAATGAAGTTGCCAAGAACCTCAATGAGTCTCTGATTGAT



CTGCAGGAACTGGGCAAGTATGAGCAGTATATCAAATGGC



CCTGGTACATTTGGCTGGGGTTTATCGCCGGACTGATTGCC



ATCGTCATGGTGACCATCATGCTGTGTTGCATGACCTCCTG



TTGTTCCTGTCTGAAGGGCTGCTGTAGTTGCGGCTCTTGCTG



TAAATTCGACGAAGATGATAGCGAGCCCGTGCTGAAGGGC



GTGAAGCTGCATTATACCTGA





SARS-CoV-2 S protein mutated to remove a furin cleavage site and to replace residues 986 and 987 with proline and which contains the L18F, D80A, D215G, L242-, A243-, L244-, K417N, E484K, N501Y, D614G and A701V mutations
(SEQ ID NO: 167)


MFVFLVLLPLVSSQCVNFTTRTQLPPAYTNSFTRGVYYPDKVF



RSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFANPVLPF



NDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKV



CEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVS



QPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRG



LPQGFSALEPLVDLPIGINITRFQTLHRSYLTPGDSSSGWTAGA



AAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKCTLK



SFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATRFAS



VYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLC



FTNVYADSFVIRGDEVRQIAPGQTGNIADYNYKLPDDFTGCVI



AWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAG



STPCNGVKGFNCYFPLQSYGFQPTYGVGYQPYRVVVLSFELL



HAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKKFL



PFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTNTS



NQVAVLYQGVNCTEVPVAIHADQLTPTWRVYSTGSNVFQTR



AGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPGSASSVASQ



SIIAYTMSLGVENSVAYSNNSIAIPTNFTISVTTEILPVSMTKTS



VDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDKN



TQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSFIEDLL



FNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPLL



TDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQMAYRFN



GIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQD



VVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDPPEAEVQI



DRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLGQ



SKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTTA



PAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTDNTF



VSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTSPDV



DLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYEQ



YIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCCSCG



SCCKFDEDDSEPVLKGVKLHYT





Optimized nucleotide sequence encoding a SARS-CoV-2 S protein containing the L18F, T20N, P26S, D138Y, R190S, K417T, E484K, N501Y, D614G, H655Y, T1027I and V1176F mutations
(SEQ ID NO: 168)


ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG



TGTGTCAACTTTACAAACAGGACTCAGCTGCCATCCGCCT



ACACCAACTCCTTCACCAGAGGCGTGTATTACCCAGACAAG



GTGTTTAGAAGCAGCGTGCTGCACTCTACCCAGGACCTCTT



TCTGCCCTTTTTCAGCAACGTGACATGGTTTCACGCAATTC



ACGTGTCCGGCACTAATGGCACAAAGCGGTTCGACAATCC



AGTCCTGCCTTTCAACGATGGCGTCTACTTTGCATCTACTG



AGAAATCCAATATCATTAGGGGATGGATCTTCGGCACAAC



CCTGGATTCTAAGACCCAGAGCCTGCTGATCGTCAACAACG



CCACAAACGTGGTCATTAAGGTTTGCGAGTTTCAGTTCTGT



AACTACCCTTTTCTGGGCGTGTATTATCATAAGAACAATAA



GAGCTGGATGGAGTCCGAGTTTAGAGTGTATAGCTCTGCAA



ATAATTGTACCTTTGAGTACGTGAGCCAGCCCTTTCTGATG



GACCTGGAGGGAAAACAAGGAAACTTCAAAAACCTGAGC



GAATTCGTTTTCAAAAACATCGACGGCTATTTCAAGATCTA



TAGCAAGCATACCCCAATCAACCTCGTGAGGGACCTCCCCC



AGGGCTTTAGCGCACTGGAGCCACTGGTTGACCTGCCTATC



GGCATTAATATCACAAGATTTCAGACCCTGCTGGCACTGCA



TAGAAGCTATCTGACCCCTGGAGACTCCTCTAGTGGGTGGA



CTGCCGGCGCCGCTGCCTACTATGTGGGCTATCTGCAGCCA



CGGACATTCCTGCTGAAATACAATGAGAACGGGACAATCA



CAGATGCTGTTGATTGCGCACTCGACCCCCTGTCCGAGACA



AAGTGCACTCTCAAGAGCTTTACCGTCGAGAAGGGCATCTA



TCAGACCTCAAACTTCAGGGTGCAGCCCACAGAATCTATCG



TGCGCTTCCCTAATATCACTAACCTGTGTCCTTTCGGTGAA



GTGTTCAACGCCACCAGGTTTGCTAGCGTGTATGCCTGGAA



CAGGAAGAGGATCTCTAACTGCGTCGCCGACTATTCCGTGC



TGTATAACAGCGCCTCCTTCTCCACATTCAAATGCTATGGA



GTGAGCCCGACAAAACTGAACGATCTCTGCTTTACAAATGT



CTACGCCGACTCTTTTGTGATCAGAGGGGACGAGGTCCGGC



AGATCGCACCAGGACAGACAGGCACCATTGCTGACTACAA



CTATAAGCTGCCTGACGACTTCACAGGATGTGTGATCGCAT



GGAACTCAAACAATCTGGACTCCAAAGTCGGGGGCAACTA



TAATTACCTGTATCGCCTGTTCCGGAAGTCCAACCTGAAGC



CCTTCGAGAGGGACATCAGTACAGAGATCTATCAGGCTGG



CTCCACCCCTTGCAATGGCGTCAAGGGCTTTAATTGTTATT



TTCCCCTGCAGTCTTACGGGTTTCAGCCTACTTACGGAGTT



GGGTACCAGCCATACAGAGTGGTCGTGCTCAGCTTCGAGCT



CCTGCATGCTCCAGCTACAGTTTGCGGGCCAAAGAAGTCCA



CTAACCTGGTGAAGAATAAGTGCGTCAACTTCAACTTTAAC



GGGCTCACCGGCACCGGCGTGCTGACTGAGAGCAACAAGA



AGTTTCTGCCATTTCAACAGTTTGGACGGGACATTGCCGAC



ACCACCGATGCCGTTCGGGATCCACAGACCCTGGAAATTCT



GGACATTACACCGTGCAGCTTCGGGGGCGTGAGCGTGATC



ACACCCGGAACCAATACAAGCAACCAGGTTGCCGTCCTGT



ATCAGGGCGTCAATTGCACAGAAGTGCCAGTTGCTATCCA



CGCAGACCAGCTGACTCCCACATGGCGGGTGTATAGCACC



GGATCCAACGTGTTTCAGACCCGCGCCGGATGTCTCATTGG



GGCCGAGTACGTGAATAACAGCTACGAGTGCGACATCCCC



ATTGGCGCCGGCATTTGTGCGTCTTACCAGACTCAGACCAA



CTCTCCTAGAAGGGCCAGGTCCGTTGCTAGTCAGTCTATTA



TTGCCTATACCATGAGCCTCGGAGCTGAGAATAGCGTGGCC



TACTCCAATAATTCCATCGCAATCCCTACTAACTTCACTATT



TCTGTGACCACCGAGATCCTGCCTGTGTCTATGACTAAGAC



TAGCGTTGATTGTACCATGTATATTTGTGGCGACTCTACCG



AATGTTCTAACCTGCTGCTTCAGTACGGCTCATTTTGCACA



CAGCTGAACAGAGCCCTGACTGGGATCGCTGTGGAGCAGG



ACAAGAACACACAGGAGGTGTTTGCACAGGTGAAGCAGAT



CTATAAGACCCCTCCTATTAAGGATTTCGGCGGATTCAATT



TCTCACAGATTCTGCCAGACCCCAGTAAGCCTTCCAAGAGG



AGCTTCATCGAGGATCTCCTGTTTAACAAGGTGACCCTGGC



AGACGCCGGCTTTATTAAGCAATATGGGGATTGCCTGGGCG



ACATTGCTGCCAGAGACCTGATTTGCGCCCAGAAATTCAAT



GGCCTCACAGTGCTGCCACCTCTGCTGACCGACGAGATGAT



CGCTCAATACACTAGCGCACTGCTGGCCGGAACCATCACAT



CAGGCTGGACCTTCGGGGCCGGAGCAGCACTGCAGATTCC



ATTCGCCATGCAGATGGCCTATAGATTCAACGGCATTGGCG



TCACACAGAACGTGCTGTACGAAAACCAGAAGCTCATCGC



TAACCAGTTTAATTCCGCAATTGGAAAGATCCAAGATTCAC



TCAGCTCAACCGCCTCTGCACTCGGAAAGCTGCAGGACGTG



GTCAACCAGAATGCTCAGGCCCTGAACACACTCGTCAAGC



AGCTGTCCTCTAACTTTGGCGCTATCAGCTCCGTTCTGAAC



GACATTCTGAGCCGCCTGGATAAGGTGGAGGCTGAAGTCC



AGATTGACCGCCTGATTACCGGCCGGCTGCAGTCTCTGCAA



ACATACGTGACCCAGCAGCTGATCAGAGCAGCCGAGATCC



GGGCATCCGCAAATCTGGCAGCAATCAAGATGAGCGAATG



CGTGCTGGGCCAGTCCAAGCGGGTGGACTTTTGTGGCAAG



GGCTACCACCTGATGAGCTTCCCCCAGAGCGCCCCACATGG



CGTTGTTTTTCTGCACGTGACCTATGTCCCTGCTCAGGAAA



AGAACTTTACAACTGCTCCTGCTATCTGCCATGACGGCAAG



GCCCACTTCCCACGGGAGGGAGTGTTTGTGTCCAATGGCAC



ACACTGGTTCGTGACCCAGAGGAACTTCTATGAACCCCAGA



TCATCACCACTGACAATACCTTCGTGTCTGGAAATTGCGAC



GTCGTGATCGGCATCGTTAACAACACCGTGTACGACCCTCT



CCAGCCAGAGCTGGACTCCTTTAAGGAGGAACTGGATAAG



TATTTTAAGAACCACACAAGCCCAGATGTGGATCTCGGGG



ACATCTCCGGAATTAACGCCTCCTTCGTGAATATCCAGAAG



GAGATTGACCGCCTAAATGAAGTTGCCAAGAACCTCAATG



AGTCTCTGATTGATCTGCAGGAACTGGGCAAGTATGAGCA



GTATATCAAATGGCCCTGGTACATTTGGCTGGGGTTTATCG



CCGGACTGATTGCCATCGTCATGGTGACCATCATGCTGTGT



TGCATGACCTCCTGTTGTTCCTGTCTGAAGGGCTGCTGTAG



TTGCGGCTCTTGCTGTAAATTCGACGAAGATGATAGCGAGC



CCGTGCTGAAGGGCGTGAAGCTGCATTATACCTGA





SARS-CoV-2 S protein containing the L18F, T20N, P26S, D138Y, R190S, K417T, E484K, N501Y, D614G, H655Y, T1027I and V1176F mutations
(SEQ ID NO: 169)


MFVFLVLLPLVSSQCVNFTNRTQLPSAYTNSFTRGVYYPDKV



FRSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFDNPVLP



FNDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIK



VCEFQFCNYPFLGVYYHKNNKSWMESEFRVYSSANNCTFEY



VSQPFLMDLEGKQGNFKNLSEFVFKNIDGYFKIYSKHTPINLV



RDLPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSG



WTAGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSET



KCTLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFN



ATRFASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPT



KLNDLCFTNVYADSFVIRGDEVRQIAPGQTGTIADYNYKLPD



DFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDIS



TEIYQAGSTPCNGVKGFNCYFPLQSYGFQPTYGVGYQPYRVV



VLSFELLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLT



ESNKKFLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSV



ITPGTNTSNQVAVLYQGVNCTEVPVAIHADQLTPTWRVYSTG



SNVFQTRAGCLIGAEYVNNSYECDIPIGAGICASYQTQTNSPRR



ARSVASQSIIAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILP



VSMTKTSVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIA



VEQDKNTQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKR



SFIEDLLFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGL



TVLPPLLTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQ



MAYRFNGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASA



LGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDK



VEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAAIKM



SECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPA



QEKNFTTAPAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEP



QIITTDNTFVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYF



KNHTSPDVDLGDISGINASFVNIQKEIDRLNEVAKNLNESLIDL



QELGKYEQYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSC



LKGCCSCGSCCKFDEDDSEPVLKGVKLHYT





Optimized nucleotide sequence encoding a SARS-CoV-2 S protein mutated to remove a furin cleavage site and to replace residues 986 and 987 with proline and which contains the L18F, T20N, P26S, D138Y, R190S, K417T, E484K, N501Y, D614G, H655Y, T1027I and V1176F mutations
(SEQ ID NO: 170)


ATGTTCGTCTTCCTCGTGCTGCTCCCACTCGTTTCTTCCCAG



TGTGTCAACTTTACAAACAGGACTCAGCTGCCATCCGCCT



ACACCAACTCCTTCACCAGAGGCGTGTATTACCCAGACAAG



GTGTTTAGAAGCAGCGTGCTGCACTCTACCCAGGACCTCTT



TCTGCCCTTTTTCAGCAACGTGACATGGTTTCACGCAATTC



ACGTGTCCGGCACTAATGGCACAAAGCGGTTCGACAATCC



AGTCCTGCCTTTCAACGATGGCGTCTACTTTGCATCTACTG



AGAAATCCAATATCATTAGGGGATGGATCTTCGGCACAAC



CCTGGATTCTAAGACCCAGAGCCTGCTGATCGTCAACAACG



CCACAAACGTGGTCATTAAGGTTTGCGAGTTTCAGTTCTGT



AACTACCCTTTTCTGGGCGTGTATTATCATAAGAACAATAA



GAGCTGGATGGAGTCCGAGTTTAGAGTGTATAGCTCTGCAA



ATAATTGTACCTTTGAGTACGTGAGCCAGCCCTTTCTGATG



GACCTGGAGGGAAAACAAGGAAACTTCAAAAACCTGAGC



GAATTCGTTTTCAAAAACATCGACGGCTATTTCAAGATCTA



TAGCAAGCATACCCCAATCAACCTCGTGAGGGACCTCCCCC



AGGGCTTTAGCGCACTGGAGCCACTGGTTGACCTGCCTATC



GGCATTAATATCACAAGATTTCAGACCCTGCTGGCACTGCA



TAGAAGCTATCTGACCCCTGGAGACTCCTCTAGTGGGTGGA



CTGCCGGCGCCGCTGCCTACTATGTGGGCTATCTGCAGCCA



CGGACATTCCTGCTGAAATACAATGAGAACGGGACAATCA



CAGATGCTGTTGATTGCGCACTCGACCCCCTGTCCGAGACA



AAGTGCACTCTCAAGAGCTTTACCGTCGAGAAGGGCATCTA



TCAGACCTCAAACTTCAGGGTGCAGCCCACAGAATCTATCG



TGCGCTTCCCTAATATCACTAACCTGTGTCCTTTCGGTGAA



GTGTTCAACGCCACCAGGTTTGCTAGCGTGTATGCCTGGAA



CAGGAAGAGGATCTCTAACTGCGTCGCCGACTATTCCGTGC



TGTATAACAGCGCCTCCTTCTCCACATTCAAATGCTATGGA



GTGAGCCCGACAAAACTGAACGATCTCTGCTTTACAAATGT



CTACGCCGACTCTTTTGTGATCAGAGGGGACGAGGTCCGGC



AGATCGCACCAGGACAGACAGGCACCATTGCTGACTACAA



CTATAAGCTGCCTGACGACTTCACAGGATGTGTGATCGCAT



GGAACTCAAACAATCTGGACTCCAAAGTCGGGGGCAACTA



TAATTACCTGTATCGCCTGTTCCGGAAGTCCAACCTGAAGC



CCTTCGAGAGGGACATCAGTACAGAGATCTATCAGGCTGG



CTCCACCCCTTGCAATGGCGTCAAGGGCTTTAATTGTTATT



TTCCCCTGCAGTCTTACGGGTTTCAGCCTACTTACGGAGTT



GGGTACCAGCCATACAGAGTGGTCGTGCTCAGCTTCGAGCT



CCTGCATGCTCCAGCTACAGTTTGCGGGCCAAAGAAGTCCA



CTAACCTGGTGAAGAATAAGTGCGTCAACTTCAACTTTAAC



GGGCTCACCGGCACCGGCGTGCTGACTGAGAGCAACAAGA



AGTTTCTGCCATTTCAACAGTTTGGACGGGACATTGCCGAC



ACCACCGATGCCGTTCGGGATCCACAGACCCTGGAAATTCT



GGACATTACACCGTGCAGCTTCGGGGGCGTGAGCGTGATC



ACACCCGGAACCAATACAAGCAACCAGGTTGCCGTCCTGT



ATCAGGGCGTCAATTGCACAGAAGTGCCAGTTGCTATCCA



CGCAGACCAGCTGACTCCCACATGGCGGGTGTATAGCACC



GGATCCAACGTGTTTCAGACCCGCGCCGGATGTCTCATTGG



GGCCGAGTACGTGAATAACAGCTACGAGTGCGACATCCCC



ATTGGCGCCGGCATTTGTGCGTCTTACCAGACTCAGACCAA



CTCTCCTGGCTCCGCCTCTTCCGTTGCTAGTCAGTCTATTAT



TGCCTATACCATGAGCCTCGGAGCTGAGAATAGCGTGGCCT



ACTCCAATAATTCCATCGCAATCCCTACTAACTTCACTATTT



CTGTGACCACCGAGATCCTGCCTGTGTCTATGACTAAGACT



AGCGTTGATTGTACCATGTATATTTGTGGCGACTCTACCGA



ATGTTCTAACCTGCTGCTTCAGTACGGCTCATTTTGCACAC



AGCTGAACAGAGCCCTGACTGGGATCGCTGTGGAGCAGGA



CAAGAACACACAGGAGGTGTTTGCACAGGTGAAGCAGATC



TATAAGACCCCTCCTATTAAGGATTTCGGCGGATTCAATTT



CTCACAGATTCTGCCAGACCCCAGTAAGCCTTCCAAGAGGA



GCTTCATCGAGGATCTCCTGTTTAACAAGGTGACCCTGGCA



GACGCCGGCTTTATTAAGCAATATGGGGATTGCCTGGGCGA



CATTGCTGCCAGAGACCTGATTTGCGCCCAGAAATTCAATG



GCCTCACAGTGCTGCCACCTCTGCTGACCGACGAGATGATC



GCTCAATACACTAGCGCACTGCTGGCCGGAACCATCACATC



AGGCTGGACCTTCGGGGCCGGAGCAGCACTGCAGATTCCA



TTCGCCATGCAGATGGCCTATAGATTCAACGGCATTGGCGT



CACACAGAACGTGCTGTACGAAAACCAGAAGCTCATCGCT



AACCAGTTTAATTCCGCAATTGGAAAGATCCAAGATTCACT



CAGCTCAACCGCCTCTGCACTCGGAAAGCTGCAGGACGTG



GTCAACCAGAATGCTCAGGCCCTGAACACACTCGTCAAGC



AGCTGTCCTCTAACTTTGGCGCTATCAGCTCCGTTCTGAAC



GACATTCTGAGCCGCCTGGATCCCCCAGAGGCTGAAGTCCA



GATTGACCGCCTGATTACCGGCCGGCTGCAGTCTCTGCAAA



CATACGTGACCCAGCAGCTGATCAGAGCAGCCGAGATCCG



GGCATCCGCAAATCTGGCAGCAATCAAGATGAGCGAATGC



GTGCTGGGCCAGTCCAAGCGGGTGGACTTTTGTGGCAAGG



GCTACCACCTGATGAGCTTCCCCCAGAGCGCCCCACATGGC



GTTGTTTTTCTGCACGTGACCTATGTCCCTGCTCAGGAAAA



GAACTTTACAACTGCTCCTGCTATCTGCCATGACGGCAAGG



CCCACTTCCCACGGGAGGGAGTGTTTGTGTCCAATGGCACA



CACTGGTTCGTGACCCAGAGGAACTTCTATGAACCCCAGAT



CATCACCACTGACAATACCTTCGTGTCTGGAAATTGCGACG



TCGTGATCGGCATCGTTAACAACACCGTGTACGACCCTCTC



CAGCCAGAGCTGGACTCCTTTAAGGAGGAACTGGATAAGT



ATTTTAAGAACCACACAAGCCCAGATGTGGATCTCGGGGA



CATCTCCGGAATTAACGCCTCCTTCGTGAATATCCAGAAGG



AGATTGACCGCCTAAATGAAGTTGCCAAGAACCTCAATGA



GTCTCTGATTGATCTGCAGGAACTGGGCAAGTATGAGCAGT



ATATCAAATGGCCCTGGTACATTTGGCTGGGGTTTATCGCC



GGACTGATTGCCATCGTCATGGTGACCATCATGCTGTGTTG



CATGACCTCCTGTTGTTCCTGTCTGAAGGGCTGCTGTAGTT



GCGGCTCTTGCTGTAAATTCGACGAAGATGATAGCGAGCCC



GTGCTGAAGGGCGTGAAGCTGCATTATACCTGA





SARS-CoV-2 S protein
(SEQ ID NO: 171)


mutated to
MFVFLVLLPLVSSQCVNFTNRTQLPSAYTNSFTRGVYYPDKV


remove a furin cleavage
FRSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFDNPVLP


site and to replace
FNDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIK


residues 986 and 987
VCEFQFCNYPFLGVYYHKNNKSWMESEFRVYSSANNCTFEY


with proline and which
VSQPFLMDLEGKQGNFKNLSEFVFKNIDGYFKIYSKHTPINLV


contains the L18F,
RDLPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSG


T20N, P26S, D138Y,
WTAGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSET


R190S, K417T, E484K,
KCTLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFN


N501Y, D614G, H655Y,
ATRFASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPT


T1027I and V1176F
KLNDLCFTNVYADSFVIRGDEVRQIAPGQTGTIADYNYKLPD


mutations
DFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDIS



TEIYQAGSTPCNGVKGFNCYFPLQSYGFQPTYGVGYQPYRVV



VLSFELLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLT



ESNKKFLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSV



ITPGTNTSNQVAVLYQGVNCTEVPVAIHADQLTPTWRVYSTG



SNVFQTRAGCLIGAEYVNNSYECDIPIGAGICASYQTQTNSPGS




ASSVASQSIIAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILPV




SMTKTSVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAV



EQDKNTQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRS



FIEDLLFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLT



VLPPLLTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQ



MAYRFNGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASA



LGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDPP



EAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAAIKMSE



CVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQE



KNFTTAPAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQII



TTDNTFVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKN



HTSPDVDLGDISGINASFVNIQKEIDRLNEVAKNLNESLIDLQE



LGKYEQYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLK



GCCSCGSCCKFDEDDSEPVLKGVKLHYT
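
The substitution labels used in the table entries above (e.g., L18F, T20N, P26S) name the original residue, its 1-based position and the replacement residue. As a minimal sketch of how such labels can be applied and checked, the Python snippet below uses only the first 30 residues of the S protein. The unmutated segment is the translation of the start of the coding region of SEQ ID NO: 147; the function name `apply_substitutions` and the label pattern are illustrative assumptions, and deletions such as L242- are not handled.

```python
import re

# First 30 residues translated from the coding region of SEQ ID NO: 147,
# used here only as a short, unmutated reference segment.
REFERENCE_NTERM = "MFVFLVLLPLVSSQCVNLTTRTQLPPAYTN"

# <original residue><1-based position><replacement residue>, e.g. "L18F".
MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(sequence, mutations):
    """Apply point substitutions to a protein sequence.

    Raises ValueError if a label is malformed, out of range, or names an
    original residue that is not found at the stated position.
    """
    residues = list(sequence)
    for label in mutations:
        match = MUTATION_PATTERN.match(label)
        if match is None:
            raise ValueError(f"unsupported mutation label: {label}")
        original, replacement = match.group(1), match.group(3)
        position = int(match.group(2))
        if not 1 <= position <= len(residues):
            raise ValueError(f"{label}: position outside the sequence")
        found = residues[position - 1]
        if found != original:
            raise ValueError(f"{label}: expected {original}, found {found}")
        residues[position - 1] = replacement
    return "".join(residues)


mutated_nterm = apply_substitutions(REFERENCE_NTERM, ["L18F", "T20N", "P26S"])
print(mutated_nterm)
```

The result can be compared with the first 30 residues of SEQ ID NO: 171 as listed above. Checking the original residue before substituting guards against off-by-one numbering errors.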









Peptide Fusions

The inventors have identified regions in the SARS-COV-2 S protein which are likely to be highly antigenic. These include residues 815-833 (FP), 820-846 (D1), 1078-1111 (D2) and residues 815-846 (F1/D1). The sequences for these antigenic fragments in the full-length SARS-CoV-2 protein with the amino acid sequence of SEQ ID NO: 1 are SFIEDLLFNKVTLADAGF (SEQ ID NO: 21), LLFNKVTLADAGFIKQYGDCLGDIAA (SEQ ID NO: 22), PAICHDGKAHFPREGVFVSNGTHWFVTQRNFYE (SEQ ID NO: 23), and GGFNFSQILPDPSKPSKRSFIEDLLFNKVTLA (SEQ ID NO: 24), respectively. The antigenic regions can be arranged in different orders to form a variety of fusion peptides that are likely to be highly antigenic and therefore are expected to induce a strong immunogenic response. The domains can be linked by a linker sequence, e.g., GGGGS. Alternatively, given the similarity in their amino acid sequences, the FP and D1 regions can be overlapped to produce a single immunogenic motif:











(SEQ ID NO: 99)



SFIEDLLFNKVTLADAGFIKQYGDCLGDIAA (FP/D1),







with the overlap sequence (LLFNKVTLADAGF) underlined.
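
The overlap of the FP and D1 regions can be reproduced computationally. The following Python sketch joins two peptides by collapsing the longest suffix of the first that is also a prefix of the second. The function name `merge_overlapping` is an illustrative assumption; the two input sequences are those given above for SEQ ID NO: 21 and SEQ ID NO: 22.

```python
def merge_overlapping(left, right):
    """Join two peptides, writing their shared end/start region only once.

    Returns (merged, overlap), where overlap is the longest suffix of `left`
    that is also a prefix of `right` (empty if there is none).
    """
    for size in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:size]):
            return left + right[size:], right[:size]
    return left + right, ""


FP = "SFIEDLLFNKVTLADAGF"          # SEQ ID NO: 21
D1 = "LLFNKVTLADAGFIKQYGDCLGDIAA"  # SEQ ID NO: 22

fp_d1, overlap = merge_overlapping(FP, D1)
print(fp_d1)    # merged FP/D1 motif
print(overlap)  # region shared by FP and D1
```

The merged string can be compared with the FP/D1 motif of SEQ ID NO: 99 shown above.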


An exemplary peptide fusion may have the following domains:

    • D1-linker-FP-linker-D2-linker-D1 (Fusion peptide A)
    • FP/D1-linker-FP/D1-linker-FP/D1 (Fusion peptide B)
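
A domain arrangement of this kind can be written out as a simple concatenation. The Python sketch below is illustrative only: the helper `assemble_fusion` is a hypothetical name, the GGGGS linker is the example linker mentioned above, and the leading methionine is an assumption taken from the exemplary entries of Table 2, which all begin with M. The domain order used for the first call is the order stated in the list above.

```python
LINKER = "GGGGS"

FP = "SFIEDLLFNKVTLADAGF"                    # SEQ ID NO: 21
D1 = "LLFNKVTLADAGFIKQYGDCLGDIAA"            # SEQ ID NO: 22
D2 = "PAICHDGKAHFPREGVFVSNGTHWFVTQRNFYE"     # SEQ ID NO: 23
FP_D1 = "SFIEDLLFNKVTLADAGFIKQYGDCLGDIAA"    # SEQ ID NO: 99


def assemble_fusion(domains, linker=LINKER, start="M"):
    """Concatenate domains with a linker between each pair.

    `start` is prepended once (an initiator methionine by default).
    """
    if not domains:
        raise ValueError("at least one domain is required")
    return start + linker.join(domains)


# D1-linker-FP-linker-D2-linker-D1, as listed for Fusion peptide A
fusion_a = assemble_fusion([D1, FP, D2, D1])

# FP/D1-linker-FP/D1-linker-FP/D1, as listed for Fusion peptide B
fusion_b = assemble_fusion([FP_D1, FP_D1, FP_D1])

print(len(fusion_a), fusion_a)
print(len(fusion_b), fusion_b)
```

Keeping the domains in a list makes it straightforward to try other orders or a different linker without editing the sequences themselves.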


Accordingly, the invention provides optimized nucleotide sequences that encode fusion peptides comprising antigenic regions of the SARS-COV-2 S protein. In one embodiment, an optimized nucleotide sequence encodes an amino acid sequence comprising Fusion peptide A. For example, the optimized nucleotide sequence can encode the amino acid sequence of SEQ ID NO: 25. In a particular embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 26. In another embodiment, the optimized nucleotide sequence encodes an amino acid sequence comprising Fusion peptide B. For example, the optimized nucleotide sequence can encode an amino acid sequence of SEQ ID NO: 27. In a particular embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 28.


In certain embodiments, the fusion peptide may be operably linked to an N-terminal signal sequence, such as SEQ ID NO: 7. For example, an optimized nucleotide sequence may encode an amino acid sequence comprising Fusion peptide A operably linked with an N-terminal signal sequence. The optimized nucleotide sequence can encode the amino acid sequence of SEQ ID NO: 51. In a particular embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 52. Alternatively, the optimized nucleotide sequence may encode an amino acid sequence comprising Fusion peptide B operably linked with an N-terminal signal sequence. The optimized nucleotide sequence can encode the amino acid sequence of SEQ ID NO: 53. In a particular embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 54.


Additionally, the fusion peptides can be operably linked with a C-terminal Fc domain, typically in addition to an N-terminal signal sequence. For example, an optimized nucleotide sequence may encode an amino acid sequence comprising Fusion peptide A operably linked with a C-terminal Fc domain (e.g., SEQ ID NO: 18) and an N-terminal signal sequence (e.g., SEQ ID NO: 7). The optimized nucleotide sequence can encode the amino acid sequence of SEQ ID NO: 55. In a particular embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 56. Alternatively, the optimized nucleotide sequence may encode an amino acid sequence comprising Fusion peptide B operably linked with a C-terminal Fc domain (e.g., SEQ ID NO: 18) and an N-terminal signal sequence (e.g., SEQ ID NO: 7). The optimized nucleotide sequence can encode the amino acid sequence of SEQ ID NO: 57. In a particular embodiment, the optimized nucleotide sequence has the sequence of SEQ ID NO: 58.


In some embodiments, the fusion peptides can be operably linked with a C-terminal Fc domain which has been altered to improve the circulation half-life of the resulting fusion protein. In a particular embodiment, the Fc domain with improved circulation half-life has the amino acid sequence of SEQ ID NO: 100, SEQ ID NO: 101, SEQ ID NO: 102 or SEQ ID NO: 103. Accordingly, the invention also provides an optimized nucleotide sequence that encodes Fusion peptide A or Fusion peptide B, operably linked with an N-terminal signal peptide and a C-terminal Fc domain having the amino acid sequence of SEQ ID NO: 100, SEQ ID NO: 101, SEQ ID NO: 102 or SEQ ID NO: 103. The signal peptide can have the amino acid sequence of SEQ ID NO: 7.


Exemplary Optimized Nucleotide Sequences Encoding a Fusion Peptide

An optimized nucleotide sequence according to the present invention may encode one or more antigenic regions of a SARS-COV-2 S protein in the form of a fusion peptide. In one embodiment, the invention provides a nucleic acid comprising an optimized nucleotide sequence encoding one or more antigenic regions of the SARS-COV-2 S protein in the form of a fusion peptide. In some embodiments, the nucleic acid is an mRNA comprising an optimized nucleotide sequence encoding one or more antigenic regions of the SARS-COV-2 S protein in the form of a fusion peptide. In some embodiments, a suitable mRNA sequence comprises a nucleotide sequence encoding one or more antigenic regions of the SARS-COV-2 S protein in the form of a fusion peptide optimized for efficient expression in human cells. Exemplary optimized nucleotide sequences encoding antigenic regions of the SARS-COV-2 S protein in the form of a fusion peptide produced with the process for generating optimized nucleotide sequences in accordance with the invention and the corresponding amino acid sequence are shown in Table 2. Bold residues indicate those amino acids which have been mutated compared to a naturally occurring SARS-COV-2 S protein, underlined residues represent a signal peptide and the residues in italics indicate the presence of an Fc region.









TABLE 2





Exemplary fusion peptides.
















Optimized nucleotide
(SEQ ID NO: 25)


sequence encoding Fusion
ATGCTGCTGTTTAACAAAGTGACTCTGGCAGACGCAG


peptide A
GCTTTATCAAGCAGTACGGAGACTGTCTCGGGGACAT



TGCAGCCGGCGGCGGAGGCTCATCTTTCATTGAGGAC



CTGCTGTTCAACAAGGTCACTCTGGCAGATGCCGGAT



TCGGAGGAGGGGGATCTCCAGCTATCTGCCATGACGG



AAAGGCTCATTTTCCTCGGGAGGGTGTGTTTGTGTCCA



ACGGAACCCATTGGTTCGTCACACAGCGCAACTTCTA



TGAAGGAGGGGGGGGCTCCAGCTTCATCGAGGACCTG



CTCTTTAACAAAGTGACCCTGGCCGATGCTGGATTTG



GGGGAGGGGGATCCCTGCTGTTCAACAAAGTTACACT



GGCCGACGCAGGCTTCATCAAACAGTACGGCGATTGT



TTAGGGGACATCGCCGCTGGCGGCGGAGGATCACCTA



AGTCCTGCGACAAAACCCATACATGTCCACCATGCCC



AGCTCCTGAACTGCTCGGCGGGCCTAGTGTTTTCCTCT



TCCCTCCTAAGCCCAAGGATACCCTCATGATCTCTCGC



ACACCAGAAGTGACCTGCGTGGTCGTGGATGTCTCTC



ACGAGGATCCTGAAGTGAAGTTTAACTGGTATGTCGA



CGGAGTGGAAGTGCACAACGCCAAGACAAAGCCAAG



AGAAGAACAATACAATTCTACTTATAGGGTGGTGTCT



GTGCTGACAGTGCTGCACCAGGATTGGCTGAATGGAA



AAGAATATAAGTGTAAGGTCTCTAACAAGGCCCTGCC



CGCTCCAATTGAGAAGACAATTTCCAAGGCCAAGGGG



CAGCCTCGGGAACCTCAGGTGTACACACTGCCCCCAT



CCAGGGATGAACTGACTAAAAATCAGGTGTCTCTGAC



ATGCCTGGTGAAAGGGTTTTATCCAAGTGACATTGCT



GTGGAGTGGGAGTCTAATGGGCAGCCTGAAAATAACT



ACAAGACCACACCACCAGTGCTCGATAGCGACGGGTC



TTTCTTTCTGTATTCTAAACTGACCGTGGATAAATCTC



GGTGGCAGCAGGGAAACGTGTTTTCTTGCTCAGTGAT



GCACGAAGCTCTGCACAATCACTATACACAGAAATCC



CTGTCCCTGTCTCCAGGCAAATAA





Fusion peptide A
(SEQ ID NO: 26)



MLLFNKVTLADAGFIKQYGDCLGDIAAGGGGSSFIEDLL



FNKVTLADAGFGGGGSPAICHDGKAHFPREGVFVSNGT



HWFVTQRNFYEGGGGSSFIEDLLFNKVTLADAGFGGGG



SLLFNKVTLADAGFIKQYGDCLGDIAA





Optimized nucleotide
(SEQ ID NO: 27)


sequence encoding Fusion
ATGTCCTTCATTGAGGACCTGCTGTTTAATAAGGTGAC


peptide B
CCTGGCCGACGCTGGGTTCATCAAACAGTATGGAGAT



TGTCTGGGAGATATTGCAGCAGGCGGGGGCGGCAGC



AGCTTTATTGAGGACCTCCTGTTCAACAAGGTGACCC



TTGCCGACGCAGGGTTTATTAAGCAGTATGGCGACTG



TCTGGGAGACATTGCAGCCGGCGGCGGCGGGTCTTCT



TTTATCGAGGACCTGCTGTTCAACAAGGTGACACTGG



CCGACGCAGGCTTTATTAAGCAGTACGGGGACTGCCT



GGGAGACATTGCCGCCTGA





Fusion peptide B
(SEQ ID NO: 28)



MSFIEDLLFNKVTLADAGFIKQYGDCLGDIAAGGGGSSFI



EDLLFNKVTLADAGFIKQYGDCLGDIAAGGGGSSFIEDL



LFNKVTLADAGFIKQYGDCLGDIAA





Optimized nucleotide
(SEQ ID NO: 52)


sequence encoding Fusion
ATGTTCGTGTTCCTGGTGCTGCTGCCACTGGTTTCCTC


peptide A with a signal
CCAGTGTCTGCTGTTTAACAAGGTTACACTGGCAGAC


peptide
GCCGGCTTCATCAAGCAGTATGGGGACTGTCTGGGCG



ATATCGCCGCTGGCGGCGGAGGATCTAGCTTCATTGA



GGACCTGCTGTTCAACAAAGTGACTCTGGCTGACGCC



GGATTTGGCGGAGGAGGGTCTCCTGCCATTTGTCATG



ACGGGAAGGCTCATTTCCCTAGGGAGGGGGTTTTTGT



CTCCAATGGAACTCACTGGTTCGTGACCCAAAGAAAC



TTCTATGAGGGAGGTGGCGGATCCTCTTTTATCGAGG



ACCTGCTGTTTAACAAGGTCACTCTGGCCGATGCAGG



CTTCGGAGGAGGAGGGTCTCTGCTGTTCAACAAAGTT



ACTCTGGCAGATGCTGGGTTCATTAAGCAGTACGGCG



ACTGTCTGGGCGATATTGCCGCCTGA





Fusion peptide A with a
(SEQ ID NO: 51)


signal peptide

MFVFLVLLPLVSSQCLLFNKVTLADAGFIKQYGDCLGDI




AAGGGGSSFIEDLLFNKVTLADAGFGGGGSPAICHDGKA



HFPREGVFVSNGTHWFVTQRNFYEGGGGSSFIEDLLFNK



VTLADAGFGGGGSLLFNKVTLADAGFIKQYGDCLGDIA



A





Optimized nucleotide
(SEQ ID NO: 54)


sequence encoding Fusion
ATGTTCGTGTTCCTGGTCCTGCTACCCCTGGTGTCCTC


peptide B with a signal
TCAGTGCTCCTTCATTGAGGACCTGCTGTTTAATAAGG


peptide
TGACCCTGGCCGACGCTGGGTTCATCAAACAGTATGG



AGATTGTCTGGGAGATATTGCAGCAGGCGGGGGCGGC



AGCAGCTTTATTGAGGACCTCCTGTTCAACAAGGTGA



CCCTTGCCGACGCAGGGTTTATTAAGCAGTATGGCGA



CTGTCTGGGAGACATTGCAGCCGGCGGCGGCGGGTCT



TCTTTTATCGAGGACCTGCTGTTCAACAAGGTGACACT



GGCCGACGCAGGCTTTATTAAGCAGTACGGGGACTGC



CTGGGAGACATTGCCGCCTGA





Fusion peptide B with a
(SEQ ID NO: 53)


signal peptide

MFVFLVLLPLVSSQCSFIEDLLFNKVTLADAGFIKQYGD




CLGDIAAGGGGSSFIEDLLFNKVTLADAGFIKQYGDCLG



DIAAGGGGSSFIEDLLFNKVTLADAGFIKQYGDCLGDIA



A





Optimized nucleotide
(SEQ ID NO: 56)


sequence encoding Fusion
ATGTTTGTGTTCCTCGTTCTGCTGCCTCTGGTGAGCTC


peptide A with a signal
CCAGTGTCTGCTGTTTAACAAAGTGACTCTGGCAGAC


peptide and an Fc region
GCAGGCTTTATCAAGCAGTACGGAGACTGTCTCGGGG



ACATTGCAGCCGGCGGCGGAGGCTCATCTTTCATTGA



GGACCTGCTGTTCAACAAGGTCACTCTGGCAGATGCC



GGATTCGGAGGAGGGGGATCTCCAGCTATCTGCCATG



ACGGAAAGGCTCATTTTCCTCGGGAGGGTGTGTTTGT



GTCCAACGGAACCCATTGGTTCGTCACACAGCGCAAC



TTCTATGAAGGAGGGGGGGGCTCCAGCTTCATCGAGG



ACCTGCTCTTTAACAAAGTGACCCTGGCCGATGCTGG



ATTTGGGGGAGGGGGATCCCTGCTGTTCAACAAAGTT



ACACTGGCCGACGCAGGCTTCATCAAACAGTACGGCG



ATTGTTTAGGGGACATCGCCGCTGGCGGCGGAGGATC



ACCTAAGTCCTGCGACAAAACCCATACATGTCCACCA



TGCCCAGCTCCTGAACTGCTCGGCGGGCCTAGTGTTTT



CCTCTTCCCTCCTAAGCCCAAGGATACCCTCATGATCT



CTCGCACACCAGAAGTGACCTGCGTGGTCGTGGATGT



CTCTCACGAGGATCCTGAAGTGAAGTTTAACTGGTAT



GTCGACGGAGTGGAAGTGCACAACGCCAAGACAAAG



CCAAGAGAAGAACAATACAATTCTACTTATAGGGTGG



TGTCTGTGCTGACAGTGCTGCACCAGGATTGGCTGAA



TGGAAAAGAATATAAGTGTAAGGTCTCTAACAAGGCC



CTGCCCGCTCCAATTGAGAAGACAATTTCCAAGGCCA



AGGGGCAGCCTCGGGAACCTCAGGTGTACACACTGCC



CCCATCCAGGGATGAACTGACTAAAAATCAGGTGTCT



CTGACATGCCTGGTGAAAGGGTTTTATCCAAGTGACA



TTGCTGTGGAGTGGGAGTCTAATGGGCAGCCTGAAAA



TAACTACAAGACCACACCACCAGTGCTCGATAGCGAC



GGGTCTTTCTTTCTGTATTCTAAACTGACCGTGGATAA



ATCTCGGTGGCAGCAGGGAAACGTGTTTTCTTGCTCA



GTGATGCACGAAGCTCTGCACAATCACTATACACAGA



AATCCCTGTCCCTGTCTCCAGGCAAATAA





Fusion peptide A with a
(SEQ ID NO: 55)


signal peptide and an Fc

MFVFLVLLPLVSSQCLLFNKVTLADAGFIKQYGDCLGDI



region
AAGGGGSSFIEDLLFNKVTLADAGFGGGGSPAICHDGKA



HFPREGVFVSNGTHWFVTQRNFYEGGGGSSFIEDLLFNK



VTLADAGFGGGGSLLFNKVTLADAGFIKQYGDCLGDIA



AGGGGSPKSCDKTHTCPPCPAPELLGGPSVFLFPPKPKDT




LMISRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTK





PREEQYNSTYRVVSVLTVLHQDWLNGKEYKCKVSNKALPAP





IEKTISKAKGQPREPQVYTLPPSRDELTKNQVSLTCLVKGFY





PSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKS





RWQQGNVFSCSVMHEALHNHYTQKSLSLSPGK






Optimized nucleotide
(SEQ ID NO: 58)


sequence encoding Fusion
ATGTTCGTGTTCCTGGTCCTGCTGCCTCTGGTGTCCTC


peptide B with a signal
TCAGTGCAGCTTCATCGAGGACCTGCTCTTTAACAAG


peptide and an Fc region
GTGACTCTCGCAGATGCTGGCTTCATCAAGCAGTACG



GAGACTGCCTTGGAGACATCGCTGCAGGCGGAGGGG



GCAGCAGTTTCATCGAGGACCTGCTGTTTAACAAGGT



GACCCTGGCCGACGCCGGGTTCATTAAGCAATACGGC



GATTGTCTGGGAGACATCGCAGCTGGGGGAGGGGGG



AGCTCTTTTATTGAGGACCTGCTGTTCAACAAGGTGA



CTCTGGCCGACGCAGGGTTCATCAAACAGTATGGGGA



CTGTCTGGGAGATATCGCAGCCGGGGGAGGAGGCTCC



CCTAAGTCCTGCGACAAAACCCATACATGTCCACCAT



GCCCAGCTCCTGAACTGCTCGGCGGGCCTAGTGTTTTC



CTCTTCCCTCCTAAGCCCAAGGATACCCTCATGATCTC



TCGCACACCAGAAGTGACCTGCGTGGTCGTGGATGTC



TCTCACGAGGATCCTGAAGTGAAGTTTAACTGGTATG



TCGACGGAGTGGAAGTGCACAACGCCAAGACAAAGC



CAAGAGAAGAACAATACAATTCTACTTATAGGGTGGT



GTCTGTGCTGACAGTGCTGCACCAGGATTGGCTGAAT



GGAAAAGAATATAAGTGTAAGGTCTCTAACAAGGCCC



TGCCCGCTCCAATTGAGAAGACAATTTCCAAGGCCAA



GGGGCAGCCTCGGGAACCTCAGGTGTACACACTGCCC



CCATCCAGGGATGAACTGACTAAAAATCAGGTGTCTC



TGACATGCCTGGTGAAAGGGTTTTATCCAAGTGACAT



TGCTGTGGAGTGGGAGTCTAATGGGCAGCCTGAAAAT



AACTACAAGACCACACCACCAGTGCTCGATAGCGACG



GGTCTTTCTTTCTGTATTCTAAACTGACCGTGGATAAA



TCTCGGTGGCAGCAGGGAAACGTGTTTTCTTGCTCAG



TGATGCACGAAGCTCTGCACAATCACTATACACAGAA



ATCCCTGTCCCTGTCTCCAGGCAAATAA





Fusion peptide B with a
(SEQ ID NO: 57)


signal peptide and an Fc

MFVFLVLLPLVSSQCSFIEDLLFNKVTLADAGFIKQYGD



region
CLGDIAAGGGGSSFIEDLLFNKVTLADAGFIKQYGDCLG



DIAAGGGGSSFIEDLLFNKVTLADAGFIKQYGDCLGDIA



AGGGGSPKSCDKTHTCPPCPAPELLGGPSVFLFPPKPKDT




LMISRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTK





PREEQYNSTYRVVSVLTVLHQDWLNGKEYKCKVSNKALPAP





IEKTISKAKGQPREPQVYTLPPSRDELTKNQVSLTCLVKGFY





PSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKS





RWQQGNVFSCSVMHEALHNHYTQKSLSLSPGK
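
Whether an optimized nucleotide sequence encodes the intended amino acid sequence can be checked by translating it with the standard genetic code. The Python sketch below is a minimal illustration of such a check, using the Fusion peptide B entries as printed in Table 2. The function name `translate` and the compact codon-table construction are assumptions introduced for this example and are not part of the process described in this specification.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering
# (first codon TTT, last codon GGG); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides):
    """Translate a coding sequence (DNA or RNA) up to the first stop codon."""
    seq = nucleotides.upper().replace("U", "T")
    if len(seq) % 3:
        raise ValueError("coding sequence length must be a multiple of three")
    protein = []
    for i in range(0, len(seq), 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Optimized nucleotide sequence encoding Fusion peptide B, as printed in Table 2
FUSION_B_DNA = (
    "ATGTCCTTCATTGAGGACCTGCTGTTTAATAAGGTGAC"
    "CCTGGCCGACGCTGGGTTCATCAAACAGTATGGAGAT"
    "TGTCTGGGAGATATTGCAGCAGGCGGGGGCGGCAGC"
    "AGCTTTATTGAGGACCTCCTGTTCAACAAGGTGACCC"
    "TTGCCGACGCAGGGTTTATTAAGCAGTATGGCGACTG"
    "TCTGGGAGACATTGCAGCCGGCGGCGGCGGGTCTTCT"
    "TTTATCGAGGACCTGCTGTTCAACAAGGTGACACTGG"
    "CCGACGCAGGCTTTATTAAGCAGTACGGGGACTGCCT"
    "GGGAGACATTGCCGCCTGA"
)

# Fusion peptide B, as printed in Table 2
FUSION_B_PEPTIDE = (
    "MSFIEDLLFNKVTLADAGFIKQYGDCLGDIAAGGGGSSFI"
    "EDLLFNKVTLADAGFIKQYGDCLGDIAAGGGGSSFIEDL"
    "LFNKVTLADAGFIKQYGDCLGDIAA"
)

encoded = translate(FUSION_B_DNA)
print(encoded == FUSION_B_PEPTIDE)
```

The same check can be applied to the other nucleotide and amino acid pairs in Table 2, which also helps to catch transcription errors in individual residues.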










Other Essential Structural Proteins

Based on their homology to proteins in related β-coronaviruses, the M, N and E proteins of SARS-COV-2 are considered to play important roles in forming the structure of the virus particle. The M protein is believed to be the most abundant structural protein in the virion. It is 222 amino acids in length with 3 transmembrane domains. It has been proposed that the M protein gives the virus particle its shape. The M protein is suggested to exist as a dimer in the virion where it may adopt two different conformations allowing it to promote membrane curvature and bind to the nucleocapsid.


The 419 amino acid long N protein likely forms the nucleocapsid. It is composed of two separate domains, which are both capable of binding RNA in vitro using different mechanisms. The N protein binds the viral genome in a beads-on-a-string type conformation and can also bind to nsp3, a key component of the viral replicase complex, and the M protein.


The E protein is 75 amino acids in length and is believed to be present only in small quantities within the virus particle. One of the E protein's proposed functions is to facilitate the assembly and release of the virus. The amino acid sequences of the M, N and E proteins of SARS-CoV-2 are shown in Table 3 below.


While memory CD8+ T cells have broad reactivity against many SARS-COV-2 proteins, including ORF1ab, S, N, M, and ORF3a, most of the epitopes are located in ORF1ab and the highest density of epitopes is located in the N protein (Ferretti et al. (2020) https://doi.org/10.1101/2020.07.24.20161653). ORF1ab is encoded by residues 266..13555 of the NC_045512.2 SARS-COV-2 genome. The ORF1ab and N proteins of SARS-COV-2 may therefore be useful for inducing a T cell response.









TABLE 3





SARS-CoV-2 M, E and N proteins
















Nucleotide sequence of
(SEQ ID NO: 59)


SARS-CoV-2 M protein
ATGGCAGACAACGGTACTATTACCGTTGAGGAGCTTA


NC_004718.3 SARS-CoV-2
AACAACTCCTGGAACAATGGAACCTAGTAATAGGTTT


genome
CCTATTCCTAGCCTGGATTATGTTACTACAATTTGCCT


Range 26398..27063
ATTCTAATCGGAACAGGTTTTTGTACATAATAAAGCTT



GTTTTCCTCTGGCTCTTGTGGCCAGTAACACTTGCTTG



TTTTGTGCTTGCTGCTGTCTACAGAATTAATTGGGTGA



CTGGCGGGATTGCGATTGCAATGGCTTGTATTGTAGG



CTTGATGTGGCTTAGCTACTTCGTTGCTTCCTTCAGGC



TGTTTGCTCGTACCCGCTCAATGTGGTCATTCAACCCA



GAAACAAACATTCTTCTCAATGTGCCTCTCCGGGGGA



CAATTGTGACCAGACCGCTCATGGAAAGTGAACTTGT



CATTGGTGCTGTGATCATTCGTGGTCACTTGCGAATGG



CCGGACACTCCCTAGGGCGCTGTGACATTAAGGACCT



GCCAAAAGAGATCACTGTGGCTACATCACGAACGCTT



TCTTATTACAAATTAGGAGCGTCGCAGCGTGTAGGCA



CTGATTCAGGTTTTGCTGCATACAACCGCTACCGTATT



GGAAACTATAAATTAAATACAGACCACGCCGGTAGCA



ACGACAATATTGCTTTGCTAGTACAGTAA





SARS-CoV-2 M protein
(SEQ ID NO: 60)


sequence
MADSNGTITVEELKKLLEQWNLVIGFLFLTWICLLQFAY


Accession number
ANRNRFLYIIKLIFLWLLWPVTLACFVLAAVYRINWITG


QII57163
GIAIAMACLVGLMWLSYFIASFRLFARTRSMWSFNPETN



ILLNVPLHGTILTRPLLESELVIGAVILRGHLRIAGHHLGR



CDIKDLPKEITVATSRTLSYYKLGASQRVAGDSGFAAYS



RYRIGNYKLNTDHSSSSDNIALLVQ





Nucleotide sequence of
(SEQ ID NO: 61)


SARS-CoV-2 E protein
ATGTACTCATTCGTTTCGGAAGAAACAGGTACGTTAA


NC_004718.3 SARS-CoV-2
TAGTTAATAGCGTACTTCTTTTTCTTGCTTTCGTGGTAT


genome
TCTTGCTAGTCACACTAGCCATCCTTACTGCGCTTCGA


Range 26117..26347
TTGTGTGCGTACTGCTGCAATATTGTTAACGTGAGTTT



AGTAAAACCAACGGTTTACGTCTACTCGCGTGTTAAA



AATCTGAACTCTTCTGAAGGAGTTCCTGATCTTCTGGT



CTAA





SARS-CoV-2 E protein
(SEQ ID NO: 62)


sequence
MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLC


Accession number P59637.1
AYCCNIVNVSLVKPSFYVYSRVKNLNSSRVPDLLV





Nucleotide sequence of
(SEQ ID NO: 63)


SARS-CoV-2 N protein
ATGTCTGATAATGGACCCCAAAATCAGCGAAATGCAC


NC_045512.2 SARS-CoV-2
CCCGCATTACGTTTGGTGGACCCTCAGATTCAACTGG


genome
CAGTAACCAGAATGGAGAACGCAGTGGGGCGCGATC


range 28274..29533
AAAACAACGTCGGCCCCAAGGTTTACCCAATAATACT



GCGTCTTGGTTCACCGCTCTCACTCAACATGGCAAGG



AAGACCTTAAATTCCCTCGAGGACAAGGCGTTCCAAT



TAACACCAATAGCAGTCCAGATGACCAAATTGGCTAC



TACCGAAGAGCTACCAGACGAATTCGTGGTGGTGACG



GTAAAATGAAAGATCTCAGTCCAAGATGGTATTTCTA



CTACCTAGGAACTGGGCCAGAAGCTGGACTTCCCTAT



GGTGCTAACAAAGACGGCATCATATGGGTTGCAACTG



AGGGAGCCTTGAATACACCAAAAGATCACATTGGCAC



CCGCAATCCTGCTAACAATGCTGCAATCGTGCTACAA



CTTCCTCAAGGAACAACATTGCCAAAAGGCTTCTACG



CAGAAGGGAGCAGAGGCGGCAGTCAAGCCTCTTCTCG



TTCCTCATCACGTAGTCGCAACAGTTCAAGAAATTCA



ACTCCAGGCAGCAGTAGGGGAACTTCTCCTGCTAGAA



TGGCTGGCAATGGCGGTGATGCTGCTCTTGCTTTGCTG



CTGCTTGACAGATTGAACCAGCTTGAGAGCAAAATGT



CTGGTAAAGGCCAACAACAACAAGGCCAAACTGTCA



CTAAGAAATCTGCTGCTGAGGCTTCTAAGAAGCCTCG



GCAAAAACGTACTGCCACTAAAGCATACAATGTAACA



CAAGCTTTCGGCAGACGTGGTCCAGAACAAACCCAAG



GAAATTTTGGGGACCAGGAACTAATCAGACAAGGAA



CTGATTACAAACATTGGCCGCAAATTGCACAATTTGC



CCCCAGCGCTTCAGCGTTCTTCGGAATGTCGCGCATTG



GCATGGAAGTCACACCTTCGGGAACGTGGTTGACCTA



CACAGGTGCCATCAAATTGGATGACAAAGATCCAAAT



TTCAAAGATCAAGTCATTTTGCTGAATAAGCATATTG



ACGCATACAAAACATTCCCACCAACAGAGCCTAAAAA



GGACAAAAAGAAGAAGGCTGATGAAACTCAAGCCTT



ACCGCAGAGACAGAAGAAACAGCAAACTGTGACTCT



TCTTCCTGCTGCAGATTTGGATGATTTCTCCAAACAAT



TGCAACAATCCATGAGCAGTGCTGACTCAACTCAGGC



CTAA





SARS-CoV-2 N protein
(SEQ ID NO: 64)


sequence
MSDNGPQNQRNAPRITFGGPSDSTGSNQNGERSGARSK


Accession number
QRRPQGLPNNTASWFTALTQHGKEDLKFPRGQGVPINT


QIS29990.1
NSSPDDQIGYYRRATRRIRGGDGKMKDLSPRWYFYYLG



TGPEAGLPYGANKDGIIWVATEGALNTPKDHIGTRNPAN



NAAIVLQLPQGTTLPKGFYAEGSRGGSQASSRSSSRSRNS



SRNLTPGSSRGTSPARMAGNGGDAALALLLLDRLNQLE



SKMSGKGQQQQGQTVTKKSAAEASKKPRQKRTATKAY



NVTQAFGRRGPEQTQGNFGDQELIRQGTDYKHWPQIAQ



FAPSASAFFGMSRIGMEVTPSGTWLTYTGAIKLDDKDPN



FKDQVILLNKHIDAYKTFPPTEPKKDKKKKADETQALPQ



RQKKQQTVTLLPAADLDDFSKQLQQSMSSADSTQA









An optimized nucleotide sequence according to the present invention may encode a SARS-COV-2 E protein or an antigenic fragment thereof. In one embodiment, the invention provides a nucleic acid comprising an optimized nucleotide sequence encoding a SARS-COV-2 E protein or an antigenic fragment thereof. In some embodiments, the nucleic acid is an mRNA comprising an optimized nucleotide sequence encoding a SARS-COV-2 E protein or an antigenic fragment thereof. In some embodiments, a suitable mRNA sequence comprises a nucleotide sequence encoding a SARS-COV-2 small envelope protein or an antigenic fragment thereof optimized for efficient expression in human cells.


An optimized nucleotide sequence according to the present invention may encode a SARS-COV-2 M protein or an antigenic fragment thereof. In one embodiment, the invention provides a nucleic acid comprising an optimized nucleotide sequence encoding a SARS-COV-2 M protein or an antigenic fragment thereof. In some embodiments, the nucleic acid is an mRNA comprising an optimized nucleotide sequence encoding a SARS-COV-2 M protein or an antigenic fragment thereof. In some embodiments, a suitable mRNA sequence comprises a nucleotide sequence encoding a SARS-COV-2 M protein or an antigenic fragment thereof optimized for efficient expression in human cells.


An optimized nucleotide sequence according to the present invention may encode a SARS-COV-2 N protein or an antigenic fragment thereof. In one embodiment, the invention provides a nucleic acid comprising an optimized nucleotide sequence encoding a SARS-COV-2 N protein or an antigenic fragment thereof. In some embodiments, the nucleic acid is an mRNA comprising an optimized nucleotide sequence encoding a SARS-COV-2 N protein or an antigenic fragment thereof. In some embodiments, a suitable mRNA sequence comprises a nucleotide sequence encoding a SARS-COV-2 N protein or an antigenic fragment thereof optimized for efficient expression in human cells.


An optimized nucleotide sequence according to the present invention may encode a SARS-COV-2 ORF1ab protein or an antigenic fragment thereof. In one embodiment, the invention provides a nucleic acid comprising an optimized nucleotide sequence encoding a SARS-COV-2 ORF1ab protein or an antigenic fragment thereof. In some embodiments, the nucleic acid is an mRNA comprising an optimized nucleotide sequence encoding a SARS-COV-2 ORF1ab protein or an antigenic fragment thereof. In some embodiments, a suitable mRNA sequence comprises a nucleotide sequence encoding a SARS-COV-2 ORF1ab protein or an antigenic fragment thereof optimized for efficient expression in human cells.


In some embodiments, a first nucleic acid comprising an optimized nucleotide sequence encoding a SARS-COV-2 S protein or an antigenic fragment thereof is combined with a second nucleic acid comprising an optimized nucleotide sequence encoding a SARS-COV-2 M protein or an antigenic fragment thereof. In some embodiments, a first nucleic acid comprising an optimized nucleotide sequence encoding a SARS-COV-2 S protein or an antigenic fragment thereof is combined with a second nucleic acid comprising an optimized nucleotide sequence encoding a SARS-COV-2 N protein or an antigenic fragment thereof. In some embodiments, a first nucleic acid comprising an optimized nucleotide sequence encoding a SARS-COV-2 S protein or an antigenic fragment thereof is combined with a second nucleic acid comprising an optimized nucleotide sequence encoding a SARS-COV-2 E protein or an antigenic fragment thereof. In other embodiments, a first nucleic acid comprising an optimized nucleotide sequence encoding a SARS-CoV-2 S protein or an antigenic fragment thereof is combined with second, third and/or fourth nucleic acids, wherein said second nucleic acid comprises an optimized nucleotide sequence encoding a SARS-COV-2 M protein or an antigenic fragment thereof, wherein said third nucleic acid comprises an optimized nucleotide sequence encoding a SARS-COV-2 N protein or an antigenic fragment thereof, and wherein said fourth nucleic acid comprises an optimized nucleotide sequence encoding a SARS-COV-2 E protein or an antigenic fragment thereof.


mRNA Sequences


In some embodiments, an mRNA comprising an optimized nucleotide sequence encoding a SARS-COV-2 S protein or an antigenic fragment thereof also contains 5′ and 3′ UTR sequences. Exemplary 5′ and 3′ UTR sequences are shown below:


Exemplary 5′ UTR Sequence









(SEQ ID NO: 144)


GGACAGAUCGCCUGGAGACGCCAUCCACGCUGUUUUGACCUCCAUAGAA


GACACCGGGACCGAUCCAGCCUCCGCGGCCGGGAACGGUGCAUUGGAAC


GCGGAUUCCCCGUGCCAAGAGUGACUCACCGUCCUUGACACG






Exemplary 3′ UTR Sequence









(SEQ ID NO: 145)


CGGGUGGCAUCCCUGUGACCCCUCCCCAGUGCCUCUCCUGGCCCUGGAA


GUUGCCACUCCAGUGCCCACCAGCCUUGUCCUAAUAAAAUUAAGUUGCA


UCAAGCU


OR





(SEQ ID NO: 146)


GGGUGGCAUCCCUGUGACCCCUCCCCAGUGCCUCUCCUGGCCCUGGAAG


UUGCCACUCCAGUGCCCACCAGCCUUGUCCUAAUAAAAUUAAGUUGCAU


CAAAGCU







Exemplary mRNA Constructs


In a particular embodiment, an mRNA comprising an optimized nucleotide sequence encoding a SARS-COV-2 S protein comprises the following structural elements:









TABLE 4

Structural elements of exemplary mRNA constructs

Structural Element          Description             Sequence Coordinates

mRNA construct 1

Cap Structure               [embedded image]        1
5′ UTR                      GGAC . . . CACG         1-140 (SEQ ID NO: 144)
SARS-CoV-2 S protein (1)    AUG . . . UGA           141-3962 (SEQ ID NO: 148), which corresponds to the nucleotide sequence of SEQ ID NO: 44
3′ UTR                      CGGG . . . AGCU         3963-4067 (SEQ ID NO: 145)
PolyA tail                  (A)x, x = 100-500 (3)   NA

mRNA construct 2

Cap Structure               [embedded image]        1
5′ UTR                      GGAC . . . CACG         1-140 (SEQ ID NO: 144)
SARS-CoV-2 S protein (2)    AUG . . . UGA           141-3962 (SEQ ID NO: 173), which corresponds to the nucleotide sequence of SEQ ID NO: 166
3′ UTR                      CGGG . . . AGCU         3963-4067 (SEQ ID NO: 145)
PolyA tail                  (A)x, x = 100-500 (3)   NA

NA = not applicable

UTR = untranslated region

(1) Optimized nucleotide sequence encoding a SARS-CoV-2 S protein mutated to remove a furin cleavage site and to replace residues 986 and 987 with proline

(2) Optimized nucleotide sequence encoding a SARS-CoV-2 S protein mutated to remove a furin cleavage site and to replace residues 986 and 987 with proline and further containing the L18F, D80A, D215G, L242-, A243-, L244-, K417N, E484K, N501Y, D614G and A701V mutations

(3) Expected range
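
The sequence coordinates in Table 4 follow from the lengths of the individual elements placed end to end. The Python sketch below recomputes them as an illustration. The UTR sequences are copied from SEQ ID NO: 144 and SEQ ID NO: 145 above, the coding-region length of 3822 nucleotides is derived from the coordinates 141-3962 given in Table 4, and the names `layout` and `to_rna` are hypothetical helpers introduced only for this example. The cap structure and the polyA tail are left out because Table 4 assigns them no sequence range.

```python
UTR5 = (  # SEQ ID NO: 144
    "GGACAGAUCGCCUGGAGACGCCAUCCACGCUGUUUUGACCUCCAUAGAA"
    "GACACCGGGACCGAUCCAGCCUCCGCGGCCGGGAACGGUGCAUUGGAAC"
    "GCGGAUUCCCCGUGCCAAGAGUGACUCACCGUCCUUGACACG"
)
UTR3 = (  # SEQ ID NO: 145
    "CGGGUGGCAUCCCUGUGACCCCUCCCCAGUGCCUCUCCUGGCCCUGGAA"
    "GUUGCCACUCCAGUGCCCACCAGCCUUGUCCUAAUAAAAUUAAGUUGCA"
    "UCAAGCU"
)
CDS_LENGTH = 3822  # derived from coordinates 141-3962 in Table 4


def to_rna(dna):
    """Write a DNA coding sequence in RNA notation (T replaced by U)."""
    return dna.upper().replace("T", "U")


def layout(elements):
    """Return (name, start, end) tuples with 1-based inclusive coordinates.

    `elements` is an ordered list of (name, length) pairs.
    """
    rows = []
    start = 1
    for name, length in elements:
        end = start + length - 1
        rows.append((name, start, end))
        start = end + 1
    return rows


coordinates = layout([
    ("5' UTR", len(UTR5)),
    ("coding region", CDS_LENGTH),
    ("3' UTR", len(UTR3)),
])

for name, start, end in coordinates:
    print(f"{name}: {start}-{end}")

# A DNA-notation start of a coding sequence, rewritten in RNA notation
print(to_rna("ATGTTCGTCT"))
```

Deriving the coordinates from element lengths rather than typing them by hand keeps the table consistent when a coding region of a different length is substituted.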







In one particular embodiment, the mRNA in accordance with the present invention has the following nucleic acid sequence:










(SEQ ID NO: 147)










1
GGACAGAUCG CCUGGAGACG CCAUCCACGC UGUUUUGACC UCCAUAGAAG






51
ACACCGGGAC CGAUCCAGCC UCCGCGGCCG GGAACGGUGC AUUGGAACGC





101
GGAUUCCCCG UGCCAAGAGU GACUCACCGU CCUUGACACG AUGUUCGUCU





151
UCCUCGUGCU GCUCCCACUC GUUUCUUCCC AGUGUGUCAA CCUGACAACU





201
AGGACUCAGC UGCCACCAGC CUACACCAAC UCCUUCACCA GAGGCGUGUA





251
UUACCCAGAC AAGGUGUUUA GAAGCAGCGU GCUGCACUCU ACCCAGGACC





301
UCUUUCUGCC CUUUUUCAGC AACGUGACAU GGUUUCACGC AAUUCACGUG





351
UCCGGCACUA AUGGCACAAA GCGGUUCGAC AAUCCAGUCC UGCCUUUCAA





401
CGAUGGCGUC UACUUUGCAU CUACUGAGAA AUCCAAUAUC AUUAGGGGAU





451
GGAUCUUCGG CACAACCCUG GAUUCUAAGA CCCAGAGCCU GCUGAUCGUC





501
AACAACGCCA CAAACGUGGU CAUUAAGGUU UGCGAGUUUC AGUUCUGUAA





551
CGAUCCUUUU CUGGGCGUGU AUUAUCAUAA GAACAAUAAG AGCUGGAUGG





601
AGUCCGAGUU UAGAGUGUAU AGCUCUGCAA AUAAUUGUAC CUUUGAGUAC





651
GUGAGCCAGC CCUUUCUGAU GGACCUGGAG GGAAAACAAG GAAACUUCAA





701
AAACCUGCGG GAAUUCGUUU UCAAAAACAU CGACGGCUAU UUCAAGAUCU





751
AUAGCAAGCA UACCCCAAUC AACCUCGUGA GGGACCUCCC CCAGGGCUUU





801
AGCGCACUGG AGCCACUGGU UGACCUGCCU AUCGGCAUUA AUAUCACAAG





851
AUUUCAGACC CUGCUGGCAC UGCAUAGAAG CUAUCUGACC CCUGGAGACU





901
CCUCUAGUGG GUGGACUGCC GGCGCCGCUG CCUACUAUGU GGGCUAUCUG





951
CAGCCACGGA CAUUCCUGCU GAAAUACAAU GAGAACGGGA CAAUCACAGA





1001
UGCUGUUGAU UGCGCACUCG ACCCCCUGUC CGAGACAAAG UGCACUCUCA





1051
AGAGCUUUAC CGUCGAGAAG GGCAUCUAUC AGACCUCAAA CUUCAGGGUG





1101
CAGCCCACAG AAUCUAUCGU GCGCUUCCCU AAUAUCACUA ACCUGUGUCC





1151
UUUCGGUGAA GUGUUCAACG CCACCAGGUU UGCUAGCGUG UAUGCCUGGA





1201
ACAGGAAGAG GAUCUCUAAC UGCGUCGCCG ACUAUUCCGU GCUGUAUAAC





1251
AGCGCCUCCU UCUCCACAUU CAAAUGCUAU GGAGUGAGCC CGACAAAACU





1301
GAACGAUCUC UGCUUUACAA AUGUCUACGC CGACUCUUUU GUGAUCAGAG





1351
GGGACGAGGU CCGGCAGAUC GCACCAGGAC AGACAGGCAA GAUUGCUGAC





1401
UACAACUAUA AGCUGCCUGA CGACUUCACA GGAUGUGUGA UCGCAUGGAA





1451
CUCAAACAAU CUGGACUCCA AAGUCGGGGG CAACUAUAAU UACCUGUAUC





1501
GCCUGUUCCG GAAGUCCAAC CUGAAGCCCU UCGAGAGGGA CAUCAGUACA





1551
GAGAUCUAUC AGGCUGGCUC CACCCCUUGC AAUGGCGUCG AAGGCUUUAA





1601
UUGUUAUUUU CCCCUGCAGU CUUACGGGUU UCAGCCUACU AAUGGAGUUG





1651
GGUACCAGCC AUACAGAGUG GUCGUGCUCA GCUUCGAGCU CCUGCAUGCU





1701
CCAGCUACAG UUUGCGGGCC AAAGAAGUCC ACUAACCUGG UGAAGAAUAA





1751
GUGCGUCAAC UUCAACUUUA ACGGGCUCAC CGGCACCGGC GUGCUGACUG





1801
AGAGCAACAA GAAGUUUCUG CCAUUUCAAC AGUUUGGACG GGACAUUGCC





1851
GACACCACCG AUGCCGUUCG GGAUCCACAG ACCCUGGAAA UUCUGGACAU





1901
UACACCGUGC AGCUUCGGGG GCGUGAGCGU GAUCACACCC GGAACCAAUA





1951
CAAGCAACCA GGUUGCCGUC CUGUAUCAGG AUGUCAAUUG CACAGAAGUG





2001
CCAGUUGCUA UCCACGCAGA CCAGCUGACU CCCACAUGGC GGGUGUAUAG





2051
CACCGGAUCC AACGUGUUUC AGACCCGCGC CGGAUGUCUC AUUGGGGCCG





2101
AGCACGUGAA UAACAGCUAC GAGUGCGACA UCCCCAUUGG CGCCGGCAUU





2151
UGUGCGUCUU ACCAGACUCA GACCAACUCU CCUGGCUCCG CCUCUUCCGU





2201
UGCUAGUCAG UCUAUUAUUG CCUAUACCAU GAGCCUCGGA GCUGAGAAUA





2251
GCGUGGCCUA CUCCAAUAAU UCCAUCGCAA UCCCUACUAA CUUCACUAUU





2301
UCUGUGACCA CCGAGAUCCU GCCUGUGUCU AUGACUAAGA CUAGCGUUGA





2351
UUGUACCAUG UAUAUUUGUG GCGACUCUAC CGAAUGUUCU AACCUGCUGC





2401
UUCAGUACGG CUCAUUUUGC ACACAGCUGA ACAGAGCCCU GACUGGGAUC





2451
GCUGUGGAGC AGGACAAGAA CACACAGGAG GUGUUUGCAC AGGUGAAGCA





2501
GAUCUAUAAG ACCCCUCCUA UUAAGGAUUU CGGCGGAUUC AAUUUCUCAC





2551
AGAUUCUGCC AGACCCCAGU AAGCCUUCCA AGAGGAGCUU CAUCGAGGAU





2601
CUCCUGUUUA ACAAGGUGAC CCUGGCAGAC GCCGGCUUUA UUAAGCAAUA





2651
UGGGGAUUGC CUGGGCGACA UUGCUGCCAG AGACCUGAUU UGCGCCCAGA





2701
AAUUCAAUGG CCUCACAGUG CUGCCACCUC UGCUGACCGA CGAGAUGAUC





2751
GCUCAAUACA CUAGCGCACU GCUGGCCGGA ACCAUCACAU CAGGCUGGAC





2801
CUUCGGGGCC GGAGCAGCAC UGCAGAUUCC AUUCGCCAUG CAGAUGGCCU





2851
AUAGAUUCAA CGGCAUUGGC GUCACACAGA ACGUGCUGUA CGAAAACCAG





2901
AAGCUCAUCG CUAACCAGUU UAAUUCCGCA AUUGGAAAGA UCCAAGAUUC





2951
ACUCAGCUCA ACCGCCUCUG CACUCGGAAA GCUGCAGGAC GUGGUCAACC





3001
AGAAUGCUCA GGCCCUGAAC ACACUCGUCA AGCAGCUGUC CUCUAACUUU





3051
GGCGCUAUCA GCUCCGUUCU GAACGACAUU CUGAGCCGCC UGGAUCCCCC





3101
AGAGGCUGAA GUCCAGAUUG ACCGCCUGAU UACCGGCCGG CUGCAGUCUC





3151
UGCAAACAUA CGUGACCCAG CAGCUGAUCA GAGCAGCCGA GAUCCGGGCA





3201
UCCGCAAAUC UGGCAGCAAC UAAGAUGAGC GAAUGCGUGC UGGGCCAGUC





3251
CAAGCGGGUG GACUUUUGUG GCAAGGGCUA CCACCUGAUG AGCUUCCCCC





3301
AGAGCGCCCC ACAUGGCGUU GUUUUUCUGC ACGUGACCUA UGUCCCUGCU





3351
CAGGAAAAGA ACUUUACAAC UGCUCCUGCU AUCUGCCAUG ACGGCAAGGC





3401
CCACUUCCCA CGGGAGGGAG UGUUUGUGUC CAAUGGCACA CACUGGUUCG





3451
UGACCCAGAG GAACUUCUAU GAACCCCAGA UCAUCACCAC UGACAAUACC





3501
UUCGUGUCUG GAAAUUGCGA CGUCGUGAUC GGCAUCGUUA ACAACACCGU





3551
GUACGACCCU CUCCAGCCAG AGCUGGACUC CUUUAAGGAG GAACUGGAUA





3601
AGUAUUUUAA GAACCACACA AGCCCAGAUG UGGAUCUCGG GGACAUCUCC





3651
GGAAUUAACG CCUCCGUGGU GAAUAUCCAG AAGGAGAUUG ACCGCCUAAA





3701
UGAAGUUGCC AAGAACCUCA AUGAGUCUCU GAUUGAUCUG CAGGAACUGG





3751
GCAAGUAUGA GCAGUAUAUC AAAUGGCCCU GGUACAUUUG GCUGGGGUUU





3801
AUCGCCGGAC UGAUUGCCAU CGUCAUGGUG ACCAUCAUGC UGUGUUGCAU





3851
GACCUCCUGU UGUUCCUGUC UGAAGGGCUG CUGUAGUUGC GGCUCUUGCU





3901
GUAAAUUCGA CGAAGAUGAU AGCGAGCCCG UGCUGAAGGG CGUGAAGCUG





3951
CAUUAUACCU GACGGGUGGC AUCCCUGUGA CCCCUCCCCA GUGCCUCUCC





4001
UGGCCCUGGA AGUUGCCACU CCAGUGCCCA CCAGCCUUGU CCUAAUAAAA





4051
UUAAGUUGCA UCAAGCU






+Poly A Tail

Nucleic acids in bold denote start and stop codons
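
Because bold formatting may not survive in every rendering of the sequence, the start and stop codons can also be located programmatically. The following is a minimal sketch, assuming that the coding region begins at the first AUG of the transcript and ends at the first in-frame stop codon; the function name `find_orf` is hypothetical and is not part of the specification.

```python
from typing import Optional, Tuple

STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_orf(mrna: str) -> Optional[Tuple[int, Optional[int]]]:
    """Locate the first AUG and the first in-frame stop codon.

    Returns (start, end) as 0-based indices, where `end` is exclusive and
    covers the stop codon. `end` is None if the supplied fragment contains
    no in-frame stop. Returns None if there is no AUG at all.
    """
    seq = "".join(mrna.split()).upper()  # drop spaces between 10-nt blocks
    start = seq.find("AUG")
    if start == -1:
        return None
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return start, i + 3
    return start, None


# Positions 1-150 of the sequence listed above, copied from the first three rows.
FRAGMENT = (
    "GGACAGAUCG CCUGGAGACG CCAUCCACGC UGUUUUGACC UCCAUAGAAG "
    "ACACCGGGAC CGAUCCAGCC UCCGCGGCCG GGAACGGUGC AUUGGAACGC "
    "GGAUUCCCCG UGCCAAGAGU GACUCACCGU CCUUGACACG AUGUUCGUCU"
)

if __name__ == "__main__":
    print(find_orf(FRAGMENT))  # start index of the first AUG in the fragment
```

Applied to the 150-nucleotide fragment, the sketch reports the first AUG at 0-based index 140, i.e. nucleotide 141 of the listed sequence. Passing the full sequence would also return the end of the first in-frame stop codon.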


In another particular embodiment, the mRNA in accordance with the present invention has the following nucleic acid sequence:










(SEQ ID NO: 172)










1
GGACAGAUCG CCUGGAGACG CCAUCCACGC UGUUUUGACC UCCAUAGAAG






51
ACACCGGGAC CGAUCCAGCC UCCGCGGCCG GGAACGGUGC AUUGGAACGC





101
GGAUUCCCCG UGCCAAGAGU GACUCACCGU CCUUGACACG AUGUUCGUCU





151
UCCUCGUGCU GCUCCCACUC GUUUCUUCCC AGUGUGUCAA CUUCACAACU





201
AGGACUCAGC UGCCACCAGC CUACACCAAC UCCUUCACCA GAGGCGUGUA





251
UUACCCAGAC AAGGUGUUUA GAAGCAGCGU GCUGCACUCU ACCCAGGACC





301
UCUUUCUGCC CUUUUUCAGC AACGUGACAU GGUUUCACGC AAUUCACGUG





351
UCCGGCACUA AUGGCACAAA GCGGUUCGCC AAUCCAGUCC UGCCUUUCAA





401
CGAUGGCGUC UACUUUGCAU CUACUGAGAA AUCCAAUAUC AUUAGGGGAU





451
GGAUCUUCGG CACAACCCUG GAUUCUAAGA CCCAGAGCCU GCUGAUCGUC





501
AACAACGCCA CAAACGUGGU CAUUAAGGUU UGCGAGUUUC AGUUCUGUAA





551
CGAUCCUUUU CUGGGCGUGU AUUAUCAUAA GAACAAUAAG AGCUGGAUGG





601
AGUCCGAGUU UAGAGUGUAU AGCUCUGCAA AUAAUUGUAC CUUUGAGUAC





651
GUGAGCCAGC CCUUUCUGAU GGACCUGGAG GGAAAACAAG GAAACUUCAA





701
AAACCUGCGG GAAUUCGUUU UCAAAAACAU CGACGGCUAU UUCAAGAUCU





751
AUAGCAAGCA UACCCCAAUC AACCUCGUGA GGGGCCUCCC CCAGGGCUUU





801
AGCGCACUGG AGCCACUGGU UGACCUGCCU AUCGGCAUUA AUAUCACAAG





851
AUUUCAGACC CUGCAUAGAA GCUAUCUGAC CCCUGGAGAC UCCUCUAGUG





901
GGUGGACUGC CGGCGCCGCU GCCUACUAUG UGGGCUAUCU GCAGCCACGG





951
ACAUUCCUGC UGAAAUACAA UGAGAACGGG ACAAUCACAG AUGCUGUUGA





1001
UUGCGCACUC GACCCCCUGU CCGAGACAAA GUGCACUCUC AAGAGCUUUA





1051
CCGUCGAGAA GGGCAUCUAU CAGACCUCAA ACUUCAGGGU GCAGCCCACA





1101
GAAUCUAUCG UGCGCUUCCC UAAUAUCACU AACCUGUGUC CUUUCGGUGA





1151
AGUGUUCAAC GCCACCAGGU UUGCUAGCGU GUAUGCCUGG AACAGGAAGA





1201
GGAUCUCUAA CUGCGUCGCC GACUAUUCCG UGCUGUAUAA CAGCGCCUCC





1251
UUCUCCACAU UCAAAUGCUA UGGAGUGAGC CCGACAAAAC UGAACGAUCU





1301
CUGCUUUACA AAUGUCUACG CCGACUCUUU UGUGAUCAGA GGGGACGAGG





1351
UCCGGCAGAU CGCACCAGGA CAGACAGGCA ACAUUGCUGA CUACAACUAU





1401
AAGCUGCCUG ACGACUUCAC AGGAUGUGUG AUCGCAUGGA ACUCAAACAA





1451
UCUGGACUCC AAAGUCGGGG GCAACUAUAA UUACCUGUAU CGCCUGUUCC





1501
GGAAGUCCAA CCUGAAGCCC UUCGAGAGGG ACAUCAGUAC AGAGAUCUAU





1551
CAGGCUGGCU CCACCCCUUG CAAUGGCGUC AAGGGCUUUA AUUGUUAUUU





1601
UCCCCUGCAG UCUUACGGGU UUCAGCCUAC UUACGGAGUU GGGUACCAGC





1651
CAUACAGAGU GGUCGUGCUC AGCUUCGAGC UCCUGCAUGC UCCAGCUACA





1701
GUUUGCGGGC CAAAGAAGUC CACUAACCUG GUGAAGAAUA AGUGCGUCAA





1751
CUUCAACUUU AACGGGCUCA CCGGCACCGG CGUGCUGACU GAGAGCAACA





1801
AGAAGUUUCU GCCAUUUCAA CAGUUUGGAC GGGACAUUGC CGACACCACC





1851
GAUGCCGUUC GGGAUCCACA GACCCUGGAA AUUCUGGACA UUACACCGUG





1901
CAGCUUCGGG GGCGUGAGCG UGAUCACACC CGGAACCAAU ACAAGCAACC





1951
AGGUUGCCGU CCUGUAUCAG GGCGUCAAUU GCACAGAAGU GCCAGUUGCU





2001
AUCCACGCAG ACCAGCUGAC UCCCACAUGG CGGGUGUAUA GCACCGGAUC





2051
CAACGUGUUU CAGACCCGCG CCGGAUGUCU CAUUGGGGCC GAGCACGUGA





2101
AUAACAGCUA CGAGUGCGAC AUCCCCAUUG GCGCCGGCAU UUGUGCGUCU





2151
UACCAGACUC AGACCAACUC UCCUGGCUCC GCCUCUUCCG UUGCUAGUCA





2201
GUCUAUUAUU GCCUAUACCA UGAGCCUCGG AGUGGAGAAU AGCGUGGCCU





2251
ACUCCAAUAA UUCCAUCGCA AUCCCUACUA ACUUCACUAU UUCUGUGACC





2301
ACCGAGAUCC UGCCUGUGUC UAUGACUAAG ACUAGCGUUG AUUGUACCAU





2351
GUAUAUUUGU GGCGACUCUA CCGAAUGUUC UAACCUGCUG CUUCAGUACG





2401
GCUCAUUUUG CACACAGCUG AACAGAGCCC UGACUGGGAU CGCUGUGGAG





2451
CAGGACAAGA ACACACAGGA GGUGUUUGCA CAGGUGAAGC AGAUCUAUAA





2501
GACCCCUCCU AUUAAGGAUU UCGGCGGAUU CAAUUUCUCA CAGAUUCUGC





2551
CAGACCCCAG UAAGCCUUCC AAGAGGAGCU UCAUCGAGGA UCUCCUGUUU





2601
AACAAGGUGA CCCUGGCAGA CGCCGGCUUU AUUAAGCAAU AUGGGGAUUG





2651
CCUGGGCGAC AUUGCUGCCA GAGACCUGAU UUGCGCCCAG AAAUUCAAUG





2701
GCCUCACAGU GCUGCCACCU CUGCUGACCG ACGAGAUGAU CGCUCAAUAC





2751
ACUAGCGCAC UGCUGGCCGG AACCAUCACA UCAGGCUGGA CCUUCGGGGC





2801
CGGAGCAGCA CUGCAGAUUC CAUUCGCCAU GCAGAUGGCC UAUAGAUUCA





2851
ACGGCAUUGG CGUCACACAG AACGUGCUGU ACGAAAACCA GAAGCUCAUC





2901
GCUAACCAGU UUAAUUCCGC AAUUGGAAAG AUCCAAGAUU CACUCAGCUC





2951
AACCGCCUCU GCACUCGGAA AGCUGCAGGA CGUGGUCAAC CAGAAUGCUC





3001
AGGCCCUGAA CACACUCGUC AAGCAGCUGU CCUCUAACUU UGGCGCUAUC





3051
AGCUCCGUUC UGAACGACAU UCUGAGCCGC CUGGAUCCCC CAGAGGCUGA





3101
AGUCCAGAUU GACCGCCUGA UUACCGGCCG GCUGCAGUCU CUGCAAACAU





3151
ACGUGACCCA GCAGCUGAUC AGAGCAGCCG AGAUCCGGGC AUCCGCAAAU





3201
CUGGCAGCAA CUAAGAUGAG CGAAUGCGUG CUGGGCCAGU CCAAGCGGGU





3251
GGACUUUUGU GGCAAGGGCU ACCACCUGAU GAGCUUCCCC CAGAGCGCCC





3301
CACAUGGCGU UGUUUUUCUG CACGUGACCU AUGUCCCUGC UCAGGAAAAG





3351
AACUUUACAA CUGCUCCUGC UAUCUGCCAU GACGGCAAGG CCCACUUCCC





3401
ACGGGAGGGA GUGUUUGUGU CCAAUGGCAC ACACUGGUUC GUGACCCAGA





3451
GGAACUUCUA UGAACCCCAG AUCAUCACCA CUGACAAUAC CUUCGUGUCU





3501
GGAAAUUGCG ACGUCGUGAU CGGCAUCGUU AACAACACCG UGUACGACCC





3551
UCUCCAGCCA GAGCUGGACU CCUUUAAGGA GGAACUGGAU AAGUAUUUUA





3601
AGAACCACAC AAGCCCAGAU GUGGAUCUCG GGGACAUCUC CGGAAUUAAC





3651
GCCUCCGUGG UGAAUAUCCA GAAGGAGAUU GACCGCCUAA AUGAAGUUGC





3701
CAAGAACCUC AAUGAGUCUC UGAUUGAUCU GCAGGAACUG GGCAAGUAUG





3751
AGCAGUAUAU CAAAUGGCCC UGGUACAUUU GGCUGGGGUU UAUCGCCGGA





3801
CUGAUUGCCA UCGUCAUGGU GACCAUCAUG CUGUGUUGCA UGACCUCCUG





3851
UUGUUCCUGU CUGAAGGGCU GCUGUAGUUG CGGCUCUUGC UGUAAAUUCG





3901
ACGAAGAUGA UAGCGAGCCC GUGCUGAAGG GCGUGAAGCU GCAUUAUACC





3951
UGACGGGUGG CAUCCCUGUG ACCCCUCCCC AGUGCCUCUC CUGGCCCUGG






4001
AAGUUGCCAC UCCAGUGCCC ACCAGCCUUG UCCUAAUAAA AUUAAGUUGC





4051
AUCAAGCU






+Poly A Tail

Nucleic acids in bold denote start and stop codons


mRNA Synthesis


In Vitro Transcription

mRNAs according to the present invention may be synthesized according to any of a variety of known methods. Various methods are described in published U.S. Application No. US 2018/0258423, and can be used to practice the present invention, all of which are incorporated herein by reference. For example, mRNAs according to the present invention may be synthesized via in vitro transcription (IVT). Briefly, IVT is typically performed with a linear or circular DNA template containing a promoter, a pool of ribonucleotide triphosphates, a buffer system that may include DTT and magnesium ions, an appropriate RNA polymerase (e.g., T3, T7, or SP6 RNA polymerase), DNase I, pyrophosphatase, and/or RNase inhibitor. The exact conditions will vary according to the specific application.


In some embodiments, for the preparation of mRNA according to the invention, a DNA template is transcribed in vitro. A suitable DNA template typically has a promoter for in vitro transcription, for example a T3, T7 or SP6 promoter, followed by the desired nucleotide sequence for the desired mRNA and a termination signal.
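
The relationship between a DNA template and its transcript can be illustrated with a short sketch. It assumes a linearized template read to its end (run-off transcription) rather than an explicit termination signal, and it uses the widely cited T7 promoter sequence, in which transcription begins at the final G. The function name `run_off_transcript` and the example template are hypothetical.

```python
# T7 promoter as commonly cited; the last G is the +1 nucleotide
# (first transcribed base). Used here as an illustrative assumption.
T7_PROMOTER = "TAATACGACTCACTATAG"


def run_off_transcript(coding_strand: str, promoter: str = T7_PROMOTER) -> str:
    """Return the RNA made from a linear template, given its coding strand.

    The transcript starts at the last base of the promoter and runs to the
    end of the template; T is replaced by U.
    """
    dna = coding_strand.upper()
    pos = dna.find(promoter)
    if pos == -1:
        raise ValueError("promoter not found in template")
    plus_one = pos + len(promoter) - 1  # index of the +1 G
    return dna[plus_one:].replace("T", "U")


if __name__ == "__main__":
    # Hypothetical template: 3 nt of upstream filler, promoter, short insert.
    template = "CCC" + T7_PROMOTER + "GACAGATCG"
    print(run_off_transcript(template))
```

A template containing a termination signal would require truncating the transcript at that signal; this is omitted from the sketch.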


Nucleotides

In some embodiments, an mRNA comprises or consists of naturally-occurring nucleosides (or unmodified nucleosides; i.e., adenosine, guanosine, cytidine, and uridine). In some embodiments an mRNA comprises one or more modified nucleosides, such as nucleoside analogs (e.g. adenosine analog, guanosine analog, cytidine analog, or uridine analog). The presence of one or more nucleoside analogs may render an mRNA more stable and/or less immunogenic than a control mRNA with the same sequence but containing only naturally-occurring nucleosides. In a particular embodiment of the invention, mRNAs comprising an optimized nucleotide sequence encoding a SARS-COV-2 antigen are synthesized with naturally-occurring nucleosides. Without wishing to be bound by any particular theory, the inventors believe that the use of mRNAs prepared with naturally-occurring nucleosides is advantageous for providing an immunogenic composition of the invention.


In some embodiments, an mRNA comprises both unmodified and modified nucleosides. In some embodiments, the one or more modified nucleosides is a nucleoside analog. In some embodiments, the one or more modified nucleosides comprises at least one modification selected from a modified sugar, and a modified nucleobase. In some embodiments, the mRNA comprises one or more modified internucleoside linkages.


In some embodiments, the one or more modified nucleosides is a nucleoside analog, for example one of 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyl adenosine, 5-methylcytidine, C-5 propynyl-cytidine, C-5 propynyl-uridine, 2-aminoadenosine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-propynyl-uridine, C5-propynyl-cytidine, C5-methylcytidine, 2-aminoadenosine, 7-deazaadenosine, 7-deazaguanosine, 8-oxoadenosine, 8-oxoguanosine, O(6)-methylguanine, pseudouridine (e.g., N-1-methyl-pseudouridine), 2-thiouridine, and 2-thiocytidine. See, e.g., U.S. Pat. No. 8,278,036 or WO 2011/012316 for a discussion of 5-methyl-cytidine, pseudouridine, and 2-thio-uridine and their incorporation into mRNA. In some embodiments, the mRNA may be RNA wherein 25% of U residues are 2-thio-uridine and 25% of C residues are 5-methylcytidine. Teachings for the use of such modified RNA are disclosed in US Patent Publication US 2012/0195936 and international publication WO 2011/012316, both of which are hereby incorporated by reference in their entirety.
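
As a worked illustration of the 25% substitution embodiment, the expected number of modified residues in a given transcript can be estimated from its U and C counts. This is a sketch under the assumption that substitution is uniform across positions; the function name and dictionary keys are hypothetical.

```python
def expected_modified_counts(mrna: str, frac_u: float = 0.25, frac_c: float = 0.25) -> dict:
    """Estimate how many U and C residues would be replaced by analogs.

    frac_u: fraction of U residues replaced by 2-thio-uridine.
    frac_c: fraction of C residues replaced by 5-methylcytidine.
    """
    seq = "".join(mrna.split()).upper()
    u_total = seq.count("U")
    c_total = seq.count("C")
    return {
        "U_total": u_total,
        "C_total": c_total,
        "expected_2_thio_U": u_total * frac_u,
        "expected_5_methyl_C": c_total * frac_c,
    }


if __name__ == "__main__":
    print(expected_modified_counts("GGACAGAUCG CCUGGAGACG"))
```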


Post-Synthesis Processing

Typically, a 5′ cap and/or a 3′ tail may be added after mRNA synthesis. The presence of the cap is important in providing resistance to nucleases found in most eukaryotic cells. The presence of a “tail” serves to protect the mRNA from exonuclease degradation. Alternatively, the 5′ cap and/or 3′ tail sequences are included in the DNA template sequence used in the in vitro transcription reaction.


A 5′ cap is typically added as follows: first, an RNA terminal phosphatase removes one of the terminal phosphate groups from the 5′ nucleotide, leaving two terminal phosphates; guanosine triphosphate (GTP) is then added to the terminal phosphates via a guanylyl transferase, producing a 5′-5′ triphosphate linkage; and the 7-nitrogen of guanine is then methylated by a methyltransferase. Examples of cap structures include, but are not limited to, m7G(5′)ppp(5′)A, G(5′)ppp(5′)A and G(5′)ppp(5′)G. Additional cap structures are described in published U.S. Application No. US 2016/0032356 and published U.S. Application No. US 2018/0125989, which are incorporated herein by reference.


Typically, a tail structure includes a poly(A) and/or poly(C) tail. A poly-A or poly-C tail on the 3′ terminus of mRNA typically includes at least 50 adenosine or cytosine nucleotides, at least 150 adenosine or cytosine nucleotides, at least 200 adenosine or cytosine nucleotides, at least 250 adenosine or cytosine nucleotides, at least 300 adenosine or cytosine nucleotides, at least 350 adenosine or cytosine nucleotides, at least 400 adenosine or cytosine nucleotides, at least 450 adenosine or cytosine nucleotides, at least 500 adenosine or cytosine nucleotides, at least 550 adenosine or cytosine nucleotides, at least 600 adenosine or cytosine nucleotides, at least 650 adenosine or cytosine nucleotides, at least 700 adenosine or cytosine nucleotides, at least 750 adenosine or cytosine nucleotides, at least 800 adenosine or cytosine nucleotides, at least 850 adenosine or cytosine nucleotides, at least 900 adenosine or cytosine nucleotides, at least 950 adenosine or cytosine nucleotides, or at least 1 kb adenosine or cytosine nucleotides, respectively. 
In some embodiments, a poly A or poly C tail may be about 10 to 800 adenosine or cytosine nucleotides (e.g., about 10 to 200 adenosine or cytosine nucleotides, about 10 to 300 adenosine or cytosine nucleotides, about 10 to 400 adenosine or cytosine nucleotides, about 10 to 500 adenosine or cytosine nucleotides, about 10 to 550 adenosine or cytosine nucleotides, about 10 to 600 adenosine or cytosine nucleotides, about 50 to 600 adenosine or cytosine nucleotides, about 100 to 600 adenosine or cytosine nucleotides, about 150 to 600 adenosine or cytosine nucleotides, about 200 to 600 adenosine or cytosine nucleotides, about 250 to 600 adenosine or cytosine nucleotides, about 300 to 600 adenosine or cytosine nucleotides, about 350 to 600 adenosine or cytosine nucleotides, about 400 to 600 adenosine or cytosine nucleotides, about 450 to 600 adenosine or cytosine nucleotides, about 500 to 600 adenosine or cytosine nucleotides, about 10 to 150 adenosine or cytosine nucleotides, about 10 to 100 adenosine or cytosine nucleotides, about 20 to 70 adenosine or cytosine nucleotides, or about 20 to 60 adenosine or cytosine nucleotides), respectively. In some embodiments, a tail structure includes a combination of poly(A) and poly(C) tails with various lengths described herein. In some embodiments, a tail structure includes at least 50%, 55%, 65%, 70%, 75%, 80%, 85%, 90%, 92%, 94%, 95%, 96%, 97%, 98%, or 99% adenosine nucleotides. In some embodiments, a tail structure includes at least 50%, 55%, 65%, 70%, 75%, 80%, 85%, 90%, 92%, 94%, 95%, 96%, 97%, 98%, or 99% cytosine nucleotides.
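
The length and composition criteria described for tail structures can be checked with a few lines of code. The following sketch is illustrative only; the function names `tail_composition` and `meets_min_percent` are hypothetical, and the example tail is invented for the demonstration.

```python
def tail_composition(tail: str) -> dict:
    """Return the length of a 3' tail and its adenosine/cytosine percentages."""
    seq = tail.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("tail is empty")
    return {
        "length": n,
        "pct_A": 100.0 * seq.count("A") / n,
        "pct_C": 100.0 * seq.count("C") / n,
    }


def meets_min_percent(tail: str, base: str, min_pct: float) -> bool:
    """True if `base` ('A' or 'C') makes up at least `min_pct` percent of the tail."""
    comp = tail_composition(tail)
    return comp["pct_" + base.upper()] >= min_pct


if __name__ == "__main__":
    mixed_tail = "A" * 90 + "C" * 10  # invented example: 100-nt mixed tail
    print(tail_composition(mixed_tail))
    print(meets_min_percent(mixed_tail, "A", 85))
```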


Post-Synthesis Purification

Various methods may be used to purify mRNA after synthesis. In some embodiments, the mRNA is purified using Tangential Flow Filtration. Suitable purification methods include those described in published U.S. Application No. US 2016/0040154, published U.S. Application No. US 2015/0376220, published U.S. Application No. US 2018/0251755, published U.S. Application No. US 2018/0251754, U.S. Provisional Application No. 62/757,612 filed on Nov. 8, 2018, and U.S. Provisional Application No. 62/891,781 filed on Aug. 26, 2019, all of which are incorporated by reference herein and may be used to practice the present invention.


In some embodiments, the mRNA is purified before capping and tailing. In some embodiments, the mRNA is purified after capping and tailing. In some embodiments, the mRNA is purified both before and after capping and tailing.


In some embodiments, the mRNA is purified either before or after or both before and after capping and tailing, by centrifugation.


In some embodiments, the mRNA is purified either before or after or both before and after capping and tailing, by filtration.


In some embodiments, the mRNA is purified either before or after or both before and after capping and tailing, by Tangential Flow Filtration (TFF).


Lipid Nanoparticles

According to the present invention, an mRNA comprising an optimized nucleotide sequence of the invention may be delivered in a lipid nanoparticle. Typically, a lipid nanoparticle suitable for use with the present invention comprises one or more cationic lipids. In some embodiments, a lipid nanoparticle comprises one or more cationic lipids, one or more non-cationic lipids, one or more cholesterol-based lipids and one or more PEG-modified lipids. In some embodiments, a lipid nanoparticle comprises one or more cationic lipids, one or more non-cationic lipids, and one or more PEG-modified lipids. In some embodiments, a lipid nanoparticle comprises no more than four distinct lipid components.


A typical lipid nanoparticle for use with the invention is composed of four lipid components: a cationic lipid (e.g., a sterol-based cationic lipid), a non-cationic lipid (e.g., DOPE or DEPE), a cholesterol-based lipid (e.g., cholesterol) and a PEG-modified lipid (e.g., DMG-PEG2K). In some embodiments, a lipid nanoparticle comprises no more than three distinct lipid components. An exemplary lipid nanoparticle is composed of three lipid components: a cationic lipid (e.g., a sterol-based cationic lipid), a non-cationic lipid (e.g., DOPE or DEPE) and a PEG-modified lipid (e.g., DMG-PEG2K).


Formation of Lipid Nanoparticles Encapsulating mRNA


The lipid nanoparticles for use in the invention can be prepared by various techniques which are presently known in the art. For example, multilamellar vesicles (MLV) may be prepared according to conventional techniques, such as by depositing a selected lipid on the inside wall of a suitable container or vessel by dissolving the lipid in an appropriate solvent, and then evaporating the solvent to leave a thin film on the inside of the vessel or by spray drying. An aqueous phase may then be added to the vessel with a vortexing motion which results in the formation of MLVs. Unilamellar vesicles (ULV) can then be formed by homogenization, sonication or extrusion of the multilamellar vesicles. In addition, unilamellar vesicles can be formed by detergent removal techniques.


Various methods are described in published U.S. Application No. US 2011/0244026, published U.S. Application No. US 2016/0038432, published U.S. Application No. US 2018/0153822, published U.S. Application No. US 2018/0125989 and U.S. Provisional Application No. 62/877,597, filed Jul. 23, 2019 and can be used to practice the present invention, all of which are incorporated herein by reference. As used herein, Process A refers to a conventional method of encapsulating mRNA by mixing it with a mixture of lipids, without first pre-forming the lipids into lipid nanoparticles, as described in US 2016/0038432. As used herein, Process B refers to a process of encapsulating mRNA by mixing pre-formed lipid nanoparticles with mRNA, as described in US 2018/0153822.


Briefly, the process of preparing mRNA-loaded lipid nanoparticles includes a step of heating one or more of the solutions (i.e., applying heat from a heat source to the solution) to a temperature (or to maintain at a temperature) greater than ambient temperature, the one or more solutions being the solution comprising the pre-formed lipid nanoparticles, the solution comprising the mRNA and the mixed solution comprising the lipid nanoparticle encapsulated mRNA. In some embodiments, the process includes the step of heating one or both of the mRNA solution and the pre-formed lipid nanoparticle solution, prior to the mixing step. In some embodiments, the process includes heating one or more of the solution comprising the pre-formed lipid nanoparticles, the solution comprising the mRNA and the solution comprising the lipid nanoparticle encapsulated mRNA, during the mixing step. In some embodiments, the process includes the step of heating the lipid nanoparticle encapsulated mRNA, after the mixing step. In some embodiments, the temperature to which one or more of the solutions is heated (or at which one or more of the solutions is maintained) is or is greater than about 30° C., 37° C., 40° C., 45° C., 50° C., 55° C., 60° C., 65° C., or 70° C. In some embodiments, the temperature to which one or more of the solutions is heated ranges from about 25-70° C., about 30-70° C., about 35-70° C., about 40-70° C., about 45-70° C., about 50-70° C., or about 60-70° C. In some embodiments, the temperature greater than ambient temperature to which one or more of the solutions is heated is about 65° C.


Various methods may be used to prepare an mRNA solution suitable for the present invention. In some embodiments, mRNA may be directly dissolved in a buffer solution described herein. In some embodiments, an mRNA solution may be generated by mixing an mRNA stock solution with a buffer solution prior to mixing with a lipid solution for encapsulation. In some embodiments, an mRNA solution may be generated by mixing an mRNA stock solution with a buffer solution immediately before mixing with a lipid solution for encapsulation. In some embodiments, a suitable mRNA stock solution may contain mRNA in water at a concentration at or greater than about 0.2 mg/ml, 0.4 mg/ml, 0.5 mg/ml, 0.6 mg/ml, 0.8 mg/ml, 1.0 mg/ml, 1.2 mg/ml, 1.4 mg/ml, 1.5 mg/ml, 1.6 mg/ml, 2.0 mg/ml, 2.5 mg/ml, 3.0 mg/ml, 3.5 mg/ml, 4.0 mg/ml, 4.5 mg/ml, or 5.0 mg/ml.


In some embodiments, an mRNA stock solution is mixed with a buffer solution using a pump. Exemplary pumps include but are not limited to gear pumps, peristaltic pumps and centrifugal pumps.


Typically, the buffer solution is mixed at a rate greater than that of the mRNA stock solution. For example, the buffer solution may be mixed at a rate at least 1×, 2×, 3×, 4×, 5×, 6×, 7×, 8×, 9×, 10×, 15×, or 20× greater than the rate of the mRNA stock solution. In some embodiments, a buffer solution is mixed at a flow rate ranging between about 100-6000 ml/minute (e.g., about 100-300 ml/minute, 300-600 ml/minute, 600-1200 ml/minute, 1200-2400 ml/minute, 2400-3600 ml/minute, 3600-4800 ml/minute, 4800-6000 ml/minute, or 60-420 ml/minute). In some embodiments, a buffer solution is mixed at a flow rate of or greater than about 60 ml/minute, 100 ml/minute, 140 ml/minute, 180 ml/minute, 220 ml/minute, 260 ml/minute, 300 ml/minute, 340 ml/minute, 380 ml/minute, 420 ml/minute, 480 ml/minute, 540 ml/minute, 600 ml/minute, 1200 ml/minute, 2400 ml/minute, 3600 ml/minute, 4800 ml/minute, or 6000 ml/minute.


In some embodiments, an mRNA stock solution is mixed at a flow rate ranging between about 10-600 ml/minute (e.g., about 5-50 ml/minute, about 10-30 ml/minute, about 30-60 ml/minute, about 60-120 ml/minute, about 120-240 ml/minute, about 240-360 ml/minute, about 360-480 ml/minute, or about 480-600 ml/minute). In some embodiments, an mRNA stock solution is mixed at a flow rate of or greater than about 5 ml/minute, 10 ml/minute, 15 ml/minute, 20 ml/minute, 25 ml/minute, 30 ml/minute, 35 ml/minute, 40 ml/minute, 45 ml/minute, 50 ml/minute, 60 ml/minute, 80 ml/minute, 100 ml/minute, 200 ml/minute, 300 ml/minute, 400 ml/minute, 500 ml/minute, or 600 ml/minute.
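
The effect of the two flow rates on the resulting mRNA solution can be estimated by a simple mass balance. The sketch below assumes that the stock and buffer volumes are additive and that mixing is complete; the function name and the example values (1.0 mg/ml stock at 50 ml/minute, buffer at 200 ml/minute) are illustrative choices drawn from within the ranges recited above.

```python
def mixed_stream(stock_conc_mg_ml: float,
                 stock_flow_ml_min: float,
                 buffer_flow_ml_min: float) -> dict:
    """Estimate the combined stream when an mRNA stock meets a buffer stream.

    Assumes additive volumes, so the mRNA concentration is diluted by the
    ratio of stock flow to total flow.
    """
    if stock_flow_ml_min <= 0 or buffer_flow_ml_min < 0:
        raise ValueError("flow rates must be positive")
    total_flow = stock_flow_ml_min + buffer_flow_ml_min
    return {
        "buffer_to_stock_ratio": buffer_flow_ml_min / stock_flow_ml_min,
        "total_flow_ml_min": total_flow,
        "mrna_conc_mg_ml": stock_conc_mg_ml * stock_flow_ml_min / total_flow,
    }


if __name__ == "__main__":
    print(mixed_stream(1.0, 50.0, 200.0))
```

With these example values the buffer is mixed at 4× the rate of the stock, and the mRNA concentration in the combined stream is one fifth of the stock concentration.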


According to the present invention, a lipid solution contains a mixture of lipids suitable to form lipid nanoparticles for encapsulation of mRNA. In some embodiments, a suitable lipid solution is ethanol based. For example, a suitable lipid solution may contain a mixture of desired lipids dissolved in pure ethanol (i.e., 100% ethanol). In another embodiment, a suitable lipid solution is isopropyl alcohol based. In another embodiment, a suitable lipid solution is dimethylsulfoxide-based. In another embodiment, a suitable lipid solution is a mixture of suitable solvents including, but not limited to, ethanol, isopropyl alcohol and dimethylsulfoxide.


A suitable lipid solution may contain a mixture of desired lipids at various concentrations. For example, a suitable lipid solution may contain a mixture of desired lipids at a total concentration of or greater than about 0.1 mg/ml, 0.5 mg/ml, 1.0 mg/ml, 2.0 mg/ml, 3.0 mg/ml, 4.0 mg/ml, 5.0 mg/ml, 6.0 mg/ml, 7.0 mg/ml, 8.0 mg/ml, 9.0 mg/ml, 10 mg/ml, 15 mg/ml, 20 mg/ml, 30 mg/ml, 40 mg/ml, 50 mg/ml, or 100 mg/ml. In some embodiments, a suitable lipid solution may contain a mixture of desired lipids at a total concentration ranging from about 0.1-100 mg/ml, 0.5-90 mg/ml, 1.0-80 mg/ml, 1.0-70 mg/ml, 1.0-60 mg/ml, 1.0-50 mg/ml, 1.0-40 mg/ml, 1.0-30 mg/ml, 1.0-20 mg/ml, 1.0-15 mg/ml, 1.0-10 mg/ml, 1.0-9 mg/ml, 1.0-8 mg/ml, 1.0-7 mg/ml, 1.0-6 mg/ml, or 1.0-5 mg/ml. In some embodiments, a suitable lipid solution may contain a mixture of desired lipids at a total concentration up to about 100 mg/ml, 90 mg/ml, 80 mg/ml, 70 mg/ml, 60 mg/ml, 50 mg/ml, 40 mg/ml, 30 mg/ml, 20 mg/ml, or 10 mg/ml.


Any desired lipids may be mixed at any ratios suitable for encapsulating mRNA. In some embodiments, a suitable lipid solution contains a mixture of desired lipids including cationic lipids, non-cationic lipids, cholesterol-based lipids, amphiphilic block copolymers (e.g. poloxamers) and/or PEG-modified lipids. In some embodiments, a suitable lipid solution contains a mixture of desired lipids including one or more cationic lipids, one or more non-cationic lipids, one or more cholesterol-based lipids, and/or one or more PEG-modified lipids.
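
When lipids are specified by molar ratio but the lipid solution is specified by total mass concentration, the mass concentration of each component follows from its molecular weight. The following sketch shows that conversion. The 40:30:25:5 molar ratio and all molecular weights below are placeholder values chosen for illustration only; they are not taken from this specification, and the cationic lipid value in particular is hypothetical.

```python
def lipid_mass_concentrations(molar_ratio: dict,
                              mol_weights: dict,
                              total_mg_ml: float) -> dict:
    """Split a total lipid concentration (mg/ml) among components.

    Each component's mass share is proportional to
    (molar ratio) x (molecular weight).
    """
    weighted = {name: molar_ratio[name] * mol_weights[name] for name in molar_ratio}
    total_weighted = sum(weighted.values())
    return {name: total_mg_ml * w / total_weighted for name, w in weighted.items()}


if __name__ == "__main__":
    # Placeholder inputs (assumptions, not values from the specification).
    ratio = {"cationic": 40, "non_cationic": 30, "cholesterol": 25, "peg_lipid": 5}
    mw = {"cationic": 600.0, "non_cationic": 744.0, "cholesterol": 386.7, "peg_lipid": 2509.0}
    for name, conc in lipid_mass_concentrations(ratio, mw, 10.0).items():
        print(f"{name}: {conc:.2f} mg/ml")
```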


In some embodiments, provided pharmaceutical compositions comprise a lipid nanoparticle wherein the mRNA is both associated with the surface of the lipid nanoparticle and encapsulated within the same lipid nanoparticle. For example, during preparation of the pharmaceutical compositions of the present invention, cationic lipid nanoparticles may associate with the mRNA through electrostatic interactions.


In some embodiments, the compounds, pharmaceutical compositions and methods of the invention comprise mRNA encapsulated in a lipid nanoparticle. In some embodiments, the mRNA may be encapsulated in the same lipid nanoparticle. In some embodiments, the mRNA may be encapsulated in different lipid nanoparticles. In some embodiments, the mRNA is encapsulated in one or more lipid nanoparticles, which differ in their lipid composition, molar ratio of lipid components, size, charge (zeta potential), targeting ligands and/or combinations thereof. In some embodiments, the one or more lipid nanoparticles may have a different composition of sterol-based cationic lipids, neutral lipids, PEG-modified lipids and/or combinations thereof. In some embodiments the one or more lipid nanoparticles may have a different molar ratio of cholesterol-based lipids, cationic lipids, neutral lipids, and PEG-modified lipids used to create the lipid nanoparticles.


The process of incorporation of a desired mRNA into a lipid nanoparticle is often referred to as “loading”. Exemplary methods are described in Lasic, et al. FEBS Lett., 312:255-258, 1992, which is incorporated herein by reference. The lipid nanoparticle-incorporated nucleic acids may be completely or partially located in the interior space of the lipid nanoparticle, within the bilayer membrane of the lipid nanoparticle, or associated with the exterior surface of the lipid nanoparticle membrane. The incorporation of an mRNA into lipid nanoparticles is also referred to herein as “encapsulation” wherein the nucleic acid is entirely contained within the interior space of the lipid nanoparticle. The purpose of incorporating an mRNA into a lipid nanoparticle is often to protect the mRNA from an environment which may contain enzymes or chemicals that degrade mRNA and/or systems or receptors that cause the rapid excretion of the mRNA. Accordingly, in some embodiments, a suitable lipid nanoparticle is capable of enhancing the stability of the mRNA contained therein and/or facilitating the delivery of an mRNA to the target cell or tissue.


Suitable lipid nanoparticles in accordance with the present invention may be made in various sizes. In some embodiments, provided lipid nanoparticles may be made smaller than previously known lipid nanoparticles. In some embodiments, decreased size of lipid nanoparticles is associated with more efficient delivery of an mRNA. Selection of an appropriate lipid nanoparticle size may take into consideration the site of the target cell or tissue and to some extent the application for which the lipid nanoparticle is being made.


In some embodiments, an appropriate size of lipid nanoparticle is selected to facilitate systemic distribution of the mRNA. Alternatively or additionally, a lipid nanoparticle may be sized such that the dimensions of the lipid nanoparticle are of a sufficient diameter to limit or expressly avoid distribution into certain cells or tissues.


A variety of alternative methods known in the art are available for sizing of a population of lipid nanoparticles. One such sizing method is described in U.S. Pat. No. 4,737,323, incorporated herein by reference. Sonicating a lipid nanoparticle suspension either by bath or probe sonication produces a progressive size reduction down to small ULV less than about 0.05 microns in diameter. Homogenization is another method that relies on shearing energy to fragment large lipid nanoparticles into smaller ones. In a typical homogenization procedure, MLV are recirculated through a standard emulsion homogenizer until selected lipid nanoparticle sizes, typically between about 0.1 and 0.5 microns, are observed. The size of the lipid nanoparticles may be determined by quasi-elastic light scattering (QELS) as described in Bloomfield, Ann. Rev. Biophys. Bioeng., 10:421-450 (1981), incorporated herein by reference. Average lipid nanoparticle diameter may be reduced by sonication of formed lipid nanoparticles. Intermittent sonication cycles may be alternated with QELS assessment to guide efficient lipid nanoparticle synthesis.


Lipid Nanoparticle Formulations

In some embodiments, the majority of purified lipid nanoparticles in a pharmaceutical composition, i.e., greater than about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% of the lipid nanoparticles, have a size of about 150 nm (e.g., about 145 nm, about 140 nm, about 135 nm, about 130 nm, about 125 nm, about 120 nm, about 115 nm, about 110 nm, about 105 nm, about 100 nm, about 95 nm, about 90 nm, about 85 nm, or about 80 nm). In some embodiments, substantially all of the purified lipid nanoparticles have a size of about 150 nm (e.g., about 145 nm, about 140 nm, about 135 nm, about 130 nm, about 125 nm, about 120 nm, about 115 nm, about 110 nm, about 105 nm, about 100 nm, about 95 nm, about 90 nm, about 85 nm, or about 80 nm).


In some embodiments, a lipid nanoparticle has an average size of less than 150 nm. In some embodiments, a lipid nanoparticle has an average size of less than 120 nm. In some embodiments, a lipid nanoparticle has an average size of less than 100 nm. In some embodiments, a lipid nanoparticle has an average size of less than 90 nm. In some embodiments, a lipid nanoparticle has an average size of less than 80 nm. In some embodiments, a lipid nanoparticle has an average size of less than 70 nm. In some embodiments, a lipid nanoparticle has an average size of less than 60 nm. In some embodiments, a lipid nanoparticle has an average size of less than 50 nm. In some embodiments, a lipid nanoparticle has an average size of less than 30 nm. In some embodiments, a lipid nanoparticle has an average size of less than 20 nm.


In some embodiments, greater than about 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% of the lipid nanoparticles in a pharmaceutical composition provided by the present invention have a size ranging from about 40-90 nm (e.g., about 45-85 nm, about 50-80 nm, about 55-75 nm, about 60-70 nm). In some embodiments, substantially all of the lipid nanoparticles have a size ranging from about 40-90 nm (e.g., about 45-85 nm, about 50-80 nm, about 55-75 nm, about 60-70 nm). Compositions with lipid nanoparticles having an average size of about 50-70 nm (e.g., 55-65 nm) are particularly suitable for pulmonary delivery via nebulization.
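By way of non-limiting illustration only, the proportion of lipid nanoparticles falling within a recited size range may be computed from a set of per-particle diameter measurements. The following Python sketch assumes that such per-particle diameters are available (e.g., from a particle-tracking measurement); the function name and the sample values are hypothetical and are not taken from the present specification.

```python
from typing import Sequence


def fraction_in_range(diameters_nm: Sequence[float],
                      low_nm: float,
                      high_nm: float) -> float:
    """Return the fraction of particles with diameter in [low_nm, high_nm]."""
    if len(diameters_nm) == 0:
        raise ValueError("at least one diameter is required")
    if low_nm > high_nm:
        raise ValueError("low_nm must not exceed high_nm")
    in_range = sum(1 for d in diameters_nm if low_nm <= d <= high_nm)
    return in_range / len(diameters_nm)


# Hypothetical diameters in nm, for illustration only.
sample_diameters_nm = [42.0, 55.0, 61.0, 66.0, 70.0, 78.0, 88.0, 95.0, 64.0, 59.0]

# Fraction of the sample within the about 40-90 nm range recited above.
share_40_90 = fraction_in_range(sample_diameters_nm, 40.0, 90.0)
meets_70_percent = share_40_90 > 0.70
```

In this sketch the range bounds are treated as inclusive; a different convention may be adopted where the term "about" is construed otherwise.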


In some embodiments, the polydispersity index (PDI), a measure of the heterogeneity in size of the lipid nanoparticles in a pharmaceutical composition provided by the present invention, is less than about 0.5. In some embodiments, a lipid nanoparticle has a PDI of less than about 0.5.


In some embodiments, a lipid nanoparticle has a PDI of less than about 0.4. In some embodiments, a lipid nanoparticle has a PDI of less than about 0.3. In some embodiments, a lipid nanoparticle has a PDI of less than about 0.28. In some embodiments, a lipid nanoparticle has a PDI of less than about 0.25. In some embodiments, a lipid nanoparticle has a PDI of less than about 0.23. In some embodiments, a lipid nanoparticle has a PDI of less than about 0.20. In some embodiments, a lipid nanoparticle has a PDI of less than about 0.18. In some embodiments, a lipid nanoparticle has a PDI of less than about 0.16. In some embodiments, a lipid nanoparticle has a PDI of less than about 0.14. In some embodiments, a lipid nanoparticle has a PDI of less than about 0.12. In some embodiments, a lipid nanoparticle has a PDI of less than about 0.10. In some embodiments, a lipid nanoparticle has a PDI of less than about 0.08.
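By way of non-limiting illustration only, a PDI value may be estimated from a set of measured diameters and compared with the thresholds recited above. The following Python sketch assumes the commonly used approximation in which PDI is the square of the ratio of the standard deviation to the mean diameter; this approximation is not recited in the present specification, and dynamic light scattering instruments typically derive PDI from a cumulants analysis instead. The function name and sample values are hypothetical.

```python
import statistics
from typing import Sequence


def approximate_pdi(diameters_nm: Sequence[float]) -> float:
    """Estimate PDI as (population standard deviation / mean) squared.

    This is an assumed approximation, not the method of any particular
    instrument or of the present specification.
    """
    if len(diameters_nm) == 0:
        raise ValueError("at least one diameter is required")
    mean_nm = statistics.fmean(diameters_nm)
    if mean_nm <= 0:
        raise ValueError("mean diameter must be positive")
    sd_nm = statistics.pstdev(diameters_nm)
    return (sd_nm / mean_nm) ** 2


# Hypothetical two-particle population: mean 60 nm, standard deviation 10 nm.
pdi_estimate = approximate_pdi([50.0, 70.0])
below_0_25 = pdi_estimate < 0.25
```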


In some embodiments, greater than about 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% of the purified lipid nanoparticles in a pharmaceutical composition provided by the present invention encapsulate an mRNA within each individual particle. In some embodiments, substantially all of the purified lipid nanoparticles in a pharmaceutical composition encapsulate an mRNA within each individual particle. In some embodiments, a lipid nanoparticle has an encapsulation efficiency of between 50% and 99%. In some embodiments, a lipid nanoparticle has an encapsulation efficiency of greater than about 60%. In some embodiments, a lipid nanoparticle has an encapsulation efficiency of greater than about 65%. In some embodiments, a lipid nanoparticle has an encapsulation efficiency of greater than about 70%. In some embodiments, a lipid nanoparticle has an encapsulation efficiency of greater than about 75%. In some embodiments, a lipid nanoparticle has an encapsulation efficiency of greater than about 80%. In some embodiments, a lipid nanoparticle has an encapsulation efficiency of greater than about 85%. In some embodiments, a lipid nanoparticle has an encapsulation efficiency of greater than about 90%. In some embodiments, a lipid nanoparticle has an encapsulation efficiency of greater than about 92%. In some embodiments, a lipid nanoparticle has an encapsulation efficiency of greater than about 95%. In some embodiments, a lipid nanoparticle has an encapsulation efficiency of greater than about 98%. In some embodiments, a lipid nanoparticle has an encapsulation efficiency of greater than about 99%. Typically, lipid nanoparticles for use with the invention have an encapsulation efficiency of at least 90%-95%.
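By way of non-limiting illustration only, encapsulation efficiency is commonly expressed as the percentage of total mRNA that is not free (unencapsulated) mRNA. The following Python sketch assumes that total and free mRNA amounts have been measured in the same units by any suitable assay; the assay, the function name, and the sample values are assumptions made for illustration and are not recited in the present specification.

```python
def encapsulation_efficiency(total_mrna: float, free_mrna: float) -> float:
    """Return encapsulation efficiency in percent.

    total_mrna: total mRNA in the sample (encapsulated plus free).
    free_mrna:  unencapsulated mRNA, in the same units as total_mrna.
    """
    if total_mrna <= 0:
        raise ValueError("total_mrna must be positive")
    if not 0 <= free_mrna <= total_mrna:
        raise ValueError("free_mrna must lie between 0 and total_mrna")
    return 100.0 * (total_mrna - free_mrna) / total_mrna


# Hypothetical measurement: 100 units of total mRNA, 8 units free.
ee_percent = encapsulation_efficiency(100.0, 8.0)
at_least_90 = ee_percent >= 90.0
```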


In some embodiments, a lipid nanoparticle has an N/P ratio of between 1 and 10. In some embodiments, a lipid nanoparticle has an N/P ratio above 1. In some embodiments, a lipid nanoparticle has an N/P ratio of about 1. In some embodiments, a lipid nanoparticle has an N/P ratio of about 2. In some embodiments, a lipid nanoparticle has an N/P ratio of about 3. In some embodiments, a lipid nanoparticle has an N/P ratio of about 4. In some embodiments, a lipid nanoparticle has an N/P ratio of about 5. In some embodiments, a lipid nanoparticle has an N/P ratio of about 6. In some embodiments, a lipid nanoparticle has an N/P ratio of about 7. In some embodiments, a lipid nanoparticle has an N/P ratio of about 8. A typical lipid nanoparticle for use with the invention has an N/P ratio of about 4.
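By way of non-limiting illustration only, the N/P ratio is generally understood as the molar ratio of ionizable nitrogen atoms in the cationic lipid to phosphate groups in the mRNA. The following Python sketch assumes one phosphate group per nucleotide and an average nucleotide molar mass of about 330 g/mol; both values, together with the function name, the lipid molar mass, and the sample quantities, are assumptions made for illustration and are not recited in the present specification.

```python
def n_to_p_ratio(lipid_mass_mg: float,
                 lipid_molar_mass_g_per_mol: float,
                 ionizable_nitrogens_per_lipid: int,
                 mrna_mass_mg: float,
                 avg_nucleotide_molar_mass_g_per_mol: float = 330.0) -> float:
    """Return the molar ratio of ionizable nitrogen (N) to mRNA phosphate (P).

    Assumes one phosphate per nucleotide; the default average nucleotide
    molar mass of 330 g/mol is an approximation.
    """
    if min(lipid_mass_mg, lipid_molar_mass_g_per_mol, mrna_mass_mg,
           avg_nucleotide_molar_mass_g_per_mol) <= 0:
        raise ValueError("masses and molar masses must be positive")
    if ionizable_nitrogens_per_lipid < 1:
        raise ValueError("at least one ionizable nitrogen is required")
    # mg divided by g/mol gives mmol.
    nitrogen_mmol = (lipid_mass_mg / lipid_molar_mass_g_per_mol) * ionizable_nitrogens_per_lipid
    phosphate_mmol = mrna_mass_mg / avg_nucleotide_molar_mass_g_per_mol
    return nitrogen_mmol / phosphate_mmol


# Hypothetical formulation: 8 mg of a 660 g/mol lipid with one ionizable
# nitrogen, combined with 1 mg of mRNA.
ratio = n_to_p_ratio(8.0, 660.0, 1, 1.0)
```

With these hypothetical inputs the computed ratio is approximately 4, corresponding to the typical value mentioned above.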


In some embodiments, a pharmaceutical composition according to the present invention contains at least about 0.5 mg, 1 mg, 5 mg, 10 mg, 100 mg, 500 mg, or 1000 mg of encapsulated mRNA. In some embodiments, a pharmaceutical composition contains about 0.1 mg to 1000 mg of encapsulated mRNA. In some embodiments, a pharmaceutical composition contains at least about 0.5 mg of encapsulated mRNA. In some embodiments, a pharmaceutical composition contains at least about 0.8 mg of encapsulated mRNA. In some embodiments, a pharmaceutical composition contains at least about 1 mg of encapsulated mRNA. In some embodiments, a pharmaceutical composition contains at least about 5 mg of encapsulated mRNA. In some embodiments, a pharmaceutical composition contains at least about 8 mg of encapsulated mRNA.


In some embodiments, a pharmaceutical composition contains at least about 10 mg of encapsulated mRNA. In some embodiments, a pharmaceutical composition contains at least about 50 mg of encapsulated mRNA. In some embodiments, a pharmaceutical composition contains at least about 100 mg of encapsulated mRNA. In some embodiments, a pharmaceutical composition contains at least about 500 mg of encapsulated mRNA. In some embodiments, a pharmaceutical composition contains at least about 1000 mg of encapsulated mRNA.


Cationic Lipids

Suitable cationic lipids for use in the pharmaceutical compositions and methods of the invention include the cationic lipids as described in International Patent Publication WO 2010/144740, which is incorporated herein by reference. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid, (6Z,9Z,28Z,31Z)-heptatriaconta-6,9,28,31-tetraen-19-yl 4-(dimethylamino)butanoate, having a compound structure of:




embedded image


and pharmaceutically acceptable salts thereof.


Other suitable cationic lipids for use in the pharmaceutical compositions and methods of the present invention include ionizable cationic lipids as described in International Patent Publication WO 2013/149140, which is incorporated herein by reference. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid of one of the following formulas:




embedded image


or a pharmaceutically acceptable salt thereof, wherein R1 and R2 are each independently selected from the group consisting of hydrogen, an optionally substituted, variably saturated or unsaturated C1-C20 alkyl and an optionally substituted, variably saturated or unsaturated C6-C20 acyl; wherein L1 and L2 are each independently selected from the group consisting of hydrogen, an optionally substituted C1-C30 alkyl, an optionally substituted variably unsaturated C1-C30 alkenyl, and an optionally substituted C1-C30 alkynyl; wherein m and o are each independently selected from the group consisting of zero and any positive integer (e.g., where m is three); and wherein n is zero or any positive integer (e.g., where n is one). In some embodiments, the pharmaceutical compositions and methods of the present invention include the cationic lipid (15Z, 18Z)—N,N-dimethyl-6-((9Z,12Z)-octadeca-9,12-dien-1-yl) tetracosa-15,18-dien-1-amine (“HGT5000”), having a compound structure of:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include the cationic lipid (15Z, 18Z)—N,N-dimethyl-6-((9Z,12Z)-octadeca-9,12-dien-1-yl) tetracosa-4,15,18-trien-1-amine (“HGT5001”), having a compound structure of:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include the cationic lipid (15Z,18Z)—N,N-dimethyl-6-((9Z,12Z)-octadeca-9,12-dien-1-yl) tetracosa-5,15,18-trien-1-amine (“HGT5002”), having a compound structure of:




embedded image


and pharmaceutically acceptable salts thereof.


Other suitable cationic lipids for use in the pharmaceutical compositions and methods of the invention include cationic lipids described as aminoalcohol lipidoids in International Patent Publication WO 2010/053572, which is incorporated herein by reference. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having a compound structure of:




embedded image


and pharmaceutically acceptable salts thereof.


Other suitable cationic lipids for use in the pharmaceutical compositions and methods of the invention include the cationic lipids as described in International Patent Publication WO 2016/118725, which is incorporated herein by reference. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having a compound structure of:




embedded image


and pharmaceutically acceptable salts thereof.


Other suitable cationic lipids for use in the pharmaceutical compositions and methods of the invention include the cationic lipids as described in International Patent Publication WO 2016/118724, which is incorporated herein by reference. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having a compound structure of:




embedded image


and pharmaceutically acceptable salts thereof.


Other suitable cationic lipids for use in the pharmaceutical compositions and methods of the invention include a cationic lipid having the formula of 14,25-ditridecyl 15,18,21,24-tetraaza-octatriacontane, and pharmaceutically acceptable salts thereof.


Other suitable cationic lipids for use in the pharmaceutical compositions and methods of the invention include the cationic lipids as described in International Patent Publications WO 2013/063468 and WO 2016/205691, each of which is incorporated herein by reference. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid of the following formula:




embedded image


or pharmaceutically acceptable salts thereof, wherein each instance of RL is independently optionally substituted C6-C40 alkenyl. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having a compound structure of:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having a compound structure of:




embedded image


and pharmaceutically acceptable salts thereof.


In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having a compound structure of:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having a compound structure of:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having a compound structure of:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having a compound structure of:




embedded image


and pharmaceutically acceptable salts thereof.


Other suitable cationic lipids for use in the pharmaceutical compositions and methods of the invention include the cationic lipids as described in International Patent Publication WO 2015/184256, which is incorporated herein by reference. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid of the following formula:




embedded image


or a pharmaceutically acceptable salt thereof, wherein each X independently is O or S; each Y independently is O or S; each m independently is 0 to 20; each n independently is 1 to 6; each RA is independently hydrogen, optionally substituted C1-50 alkyl, optionally substituted C2-50 alkenyl, optionally substituted C2-50 alkynyl, optionally substituted C3-10 carbocyclyl, optionally substituted 3-14 membered heterocyclyl, optionally substituted C6-14 aryl, optionally substituted 5-14 membered heteroaryl or halogen; and each RB is independently hydrogen, optionally substituted C1-50 alkyl, optionally substituted C2-50 alkenyl, optionally substituted C2-50 alkynyl, optionally substituted C3-10 carbocyclyl, optionally substituted 3-14 membered heterocyclyl, optionally substituted C6-14 aryl, optionally substituted 5-14 membered heteroaryl or halogen. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid, “Target 23”, having a compound structure of:




embedded image


and pharmaceutically acceptable salts thereof.


Other suitable cationic lipids for use in the pharmaceutical compositions and methods of the invention include the cationic lipids as described in International Patent Publication WO 2016/004202, which is incorporated herein by reference. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having the compound structure:




embedded image


or a pharmaceutically acceptable salt thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having the compound structure:




embedded image


or a pharmaceutically acceptable salt thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having the compound structure:




embedded image


or a pharmaceutically acceptable salt thereof.


Other suitable cationic lipids for use in the pharmaceutical compositions and methods of the present invention include cationic lipids as described in U.S. Provisional Patent Application Ser. No. 62/758,179, filed on Nov. 9, 2018, and U.S. Provisional Patent Application Ser. No. 62/871,510, filed on Jul. 8, 2019, each of which is incorporated herein by reference. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid of the following formula:




embedded image


or a pharmaceutically acceptable salt thereof, wherein each R1 and R2 is independently H or C1-C6 aliphatic; each m is independently an integer having a value of 1 to 4; each A is independently a covalent bond or arylene; each L1 is independently an ester, thioester, disulfide, or anhydride group; each L2 is independently C2-C10 aliphatic; each X1 is independently H or OH; and each R3 is independently C6-C20 aliphatic. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid of the following formula:




embedded image


or a pharmaceutically acceptable salt thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid of the following formula:




embedded image


or a pharmaceutically acceptable salt thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid of the following formula:




embedded image


or a pharmaceutically acceptable salt thereof.


Other suitable cationic lipids for use in the pharmaceutical compositions and methods of the present invention include the cationic lipids as described in J. McClellan, M. C. King, Cell 2010, 141, 210-217 and in Whitehead et al., Nature Communications (2014) 5:4277, each of which is incorporated herein by reference. In some embodiments, the cationic lipids of the pharmaceutical compositions and methods of the present invention include a cationic lipid having a compound structure of:




embedded image


and pharmaceutically acceptable salts thereof.


Other suitable cationic lipids for use in the pharmaceutical compositions and methods of the invention include the cationic lipids as described in International Patent Publication WO 2015/199952, which is incorporated herein by reference. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having the compound structure:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having the compound structure:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having the compound structure:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having the compound structure:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having the compound structure:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having the compound structure:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having the compound structure:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having the compound structure:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having the compound structure:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having the compound structure:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having the compound structure:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having the compound structure:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having the compound structure:




embedded image


and pharmaceutically acceptable salts thereof.


Other suitable cationic lipids for use in the pharmaceutical compositions and methods of the invention include the cationic lipids as described in International Patent Publication WO 2017/004143, which is incorporated herein by reference. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having the compound structure:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having the compound structure:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having the compound structure:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having the compound structure:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having the compound structure:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having the compound structure:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having the compound structure:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having the compound structure:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having the compound structure:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having the compound structure:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having the compound structure:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having the compound structure:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having the compound structure:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having the compound structure:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having the compound structure:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having the compound structure:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having the compound structure:




embedded image


and pharmaceutically acceptable salts thereof.


Other suitable cationic lipids for use in the pharmaceutical compositions and methods of the invention include the cationic lipids as described in International Patent Publication WO 2017/075531, which is incorporated herein by reference. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid of the following formula:




embedded image


or a pharmaceutically acceptable salt thereof, wherein one of L1 or L2 is —O(C═O)—, —(C═O)O—, —C(═O)—, —O—, —S(O)x—, —S—S—, —C(═O)S—, —SC(═O)—, —NRaC(═O)—, —C(═O)NRa—, —NRaC(═O)NRa—, —OC(═O)NRa—, or —NRaC(═O)O—; and the other of L1 or L2 is —O(C═O)—, —(C═O)O—, —C(═O)—, —O—, —S(O)x—, —S—S—, —C(═O)S—, —SC(═O)—, —NRaC(═O)—, —C(═O)NRa—, —NRaC(═O)NRa—, —OC(═O)NRa— or —NRaC(═O)O— or a direct bond; G1 and G2 are each independently unsubstituted C1-C12 alkylene or C1-C12 alkenylene; G3 is C1-C24 alkylene, C1-C24 alkenylene, C3-C8 cycloalkylene, or C3-C8 cycloalkenylene; Ra is H or C1-C12 alkyl; R1 and R2 are each independently C6-C24 alkyl or C6-C24 alkenyl; R3 is H, OR5, CN, —C(═O)OR4, —OC(═O)R4 or —NR5C(═O)R4; R4 is C1-C12 alkyl; R5 is H or C1-C6 alkyl; and x is 0, 1 or 2.


Other suitable cationic lipids for use in the pharmaceutical compositions and methods of the invention include the cationic lipids as described in International Patent Publication WO 2017/117528, which is incorporated herein by reference. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having the compound structure:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having the compound structure:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having the compound structure:




embedded image


and pharmaceutically acceptable salts thereof.


Other suitable cationic lipids for use in the pharmaceutical compositions and methods of the invention include the cationic lipids as described in International Patent Publication WO 2017/049245, which is incorporated herein by reference. In some embodiments, the cationic lipids of the pharmaceutical compositions and methods of the present invention include a compound of one of the following formulas:




embedded image


and pharmaceutically acceptable salts thereof. For any one of these four formulas, R4 is independently selected from —(CH2)nQ and —(CH2)nCHQR; Q is selected from the group consisting of —OR, —OH, —O(CH2)nN(R)2, —OC(O)R, —CX3, —CN, —N(R)C(O)R, —N(H)C(O)R, —N(R)S(O)2R, —N(H)S(O)2R, —N(R)C(O)N(R)2, —N(H)C(O)N(R)2, —N(H)C(O)N(H)(R), —N(R)C(S)N(R)2, —N(H)C(S)N(R)2, —N(H)C(S)N(H)(R), and a heterocycle; and n is 1, 2, or 3. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having a compound structure of:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having a compound structure of:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having a compound structure of:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having a compound structure of:




embedded image


and pharmaceutically acceptable salts thereof.


Other suitable cationic lipids for use in the pharmaceutical compositions and methods of the invention include the cationic lipids as described in International Patent Publications WO 2017/173054 and WO 2015/095340, each of which is incorporated herein by reference. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having a compound structure of:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having a compound structure of:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having a compound structure of:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having a compound structure of:




embedded image


and pharmaceutically acceptable salts thereof.


Other suitable cationic lipids for use in the pharmaceutical compositions and methods of the present invention include cationic lipids as described in U.S. Provisional Patent Application Ser. No. 62/865,555, filed on Jun. 24, 2019, which is incorporated herein by reference. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having a compound structure of:




embedded image


and pharmaceutically acceptable salts thereof.


Other suitable cationic lipids for use in the pharmaceutical compositions and methods of the present invention include cationic lipids as described in U.S. Provisional Patent Application Ser. No. 62/864,818, filed on Jun. 21, 2019, which is incorporated herein by reference. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having a compound structure according to the following formula:




embedded image


or a pharmaceutically acceptable salt thereof, wherein each of R2, R3, and R4 is independently C6-C30 alkyl, C6-C30 alkenyl, or C6-C30 alkynyl; L1 is C1-C30 alkylene, C2-C30 alkenylene, or C2-C30 alkynylene; and B1 is an ionizable nitrogen-containing group. In embodiments, L1 is C1-C10 alkylene. In embodiments, L1 is unsubstituted C1-C10 alkylene. In embodiments, L1 is (CH2)2, (CH2)3, (CH2)4, or (CH2)5. In embodiments, L1 is (CH2), (CH2)6, (CH2)7, (CH2)8, (CH2)9, or (CH2)10. In embodiments, B1 is independently NH2, guanidine, amidine, a mono- or dialkylamine, 5- to 6-membered nitrogen-containing heterocycloalkyl, or 5- to 6-membered nitrogen-containing heteroaryl. In embodiments, B1 is




embedded image


In embodiments, B1 is




embedded image


In embodiments, B1 is




embedded image


In embodiments, each of R2, R3, and R4 is independently unsubstituted linear C6-C22 alkyl, unsubstituted linear C6-C22 alkenyl, unsubstituted linear C6-C22 alkynyl, unsubstituted branched C6-C22 alkyl, unsubstituted branched C6-C22 alkenyl, or unsubstituted branched C6-C22 alkynyl. In embodiments, each of R2, R3, and R4 is unsubstituted C6-C22 alkyl. In embodiments, each of R2, R3, and R4 is —C6H13, —C7H15, —C8H17, —C9H19, —C10H21, —C11H23, —C12H25, —C13H27, —C14H29, —C15H31, —C16H33, —C17H35, —C18H37, —C19H39, —C20H41, —C21H43, —C22H45, —C23H47, —C24H49, or —C25H51. In embodiments, each of R2, R3, and R4 is independently C6-C12 alkyl substituted by —O(CO)R5 or —C(O)OR5, wherein R5 is unsubstituted C6-C14 alkyl. In embodiments, each of R2, R3, and R4 is unsubstituted C6-C22 alkenyl. In embodiments, each of R2, R3, and R4 is —(CH2)4CH═CH2, —(CH2)5CH═CH2, —(CH2)6CH═CH2, —(CH2)7CH═CH2, —(CH2)8CH═CH2, —(CH2)9CH═CH2, —(CH2)10CH═CH2, —(CH2)11CH═CH2, —(CH2)12CH═CH2, —(CH2)13CH═CH2, —(CH2)14CH═CH2, —(CH2)15CH═CH2, —(CH2)16CH═CH2, —(CH2)17CH═CH2, —(CH2)18CH═CH2, —(CH2)7CH═CH(CH2)3CH3, —(CH2)7CH═CH(CH2)5CH3, —(CH2)4CH═CH(CH2)8CH3, —(CH2)7CH═CH(CH2)7CH3, —(CH2)6CH═CHCH2CH═CH(CH2)4CH3, —(CH2)7CH═CHCH2CH═CH(CH2)4CH3, —(CH2)7CH═CHCH2CH═CHCH2CH═CHCH2CH3, —(CH2)3CH═CHCH2CH═CHCH2CH═CHCH2CH═CH(CH2)4CH3, —(CH2)3CH═CHCH2CH═CHCH2CH═CHCH2CH═CHCH2CH═CHCH2CH3, —(CH2)11CH═CH(CH2)7CH3, or —(CH2)2CH═CHCH2CH═CHCH2CH═CHCH2CH═CHCH2CH═CHCH2CH═CHCH2CH3.


In embodiments, said C6-C22 alkenyl is a monoalkenyl, a dienyl, or a trienyl. In embodiments, each of R2, R3, and R4 is




embedded image


In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having a compound structure of:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having a compound structure of:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having a compound structure of:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid having a compound structure of:




embedded image


and pharmaceutically acceptable salts thereof.


Other suitable cationic lipids for use in the pharmaceutical compositions and methods of the present invention include cleavable cationic lipids as described in International Patent Publication WO 2012/170889, which is incorporated herein by reference. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid of the following formula:




embedded image


wherein R1 is selected from the group consisting of imidazole, guanidinium, amino, imine, enamine, an optionally-substituted alkyl amino (e.g., an alkyl amino such as dimethylamino) and pyridyl; wherein R2 is selected from the group consisting of one of the following two formulas:




embedded image


and wherein R3 and R4 are each independently selected from the group consisting of an optionally substituted, variably saturated or unsaturated C6-C20 alkyl and an optionally substituted, variably saturated or unsaturated C6-C20 acyl; and wherein n is zero or any positive integer (e.g., one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty or more). In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid, “HGT4001”, having a compound structure of:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid, “HGT4002”, having a compound structure of:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid, “HGT4003,” having a compound structure of:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid, “HGT4004,” having a compound structure of:




embedded image


and pharmaceutically acceptable salts thereof. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid, “HGT4005,” having a compound structure of:




embedded image


and pharmaceutically acceptable salts thereof.


Other suitable cationic lipids for use in the pharmaceutical compositions and methods of the present invention include cleavable cationic lipids as described in International Patent Publication WO 2019/222424, which is incorporated herein by reference. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid that is any of general formulas or any of structures (1a)-(21a) and (1b)-(21b) and (22)-(237) described in International Patent Publication WO 2019/222424. In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid that has a structure according to Formula (I′),




embedded image


wherein:

    • RX is independently —H, -L1-R1, or -L5A-L5B-B′;
    • each of L1, L2, and L3 is independently a covalent bond, —C(O)—, —C(O)O—, —C(O)S—, or —C(O)NRL—;
    • each L4A and L5A is independently —C(O)—, —C(O)O—, or —C(O)NRL—;
    • each L4B and L5B is independently C1-C20 alkylene, C2-C20 alkenylene, or C2-C20 alkynylene;
    • each B and B′ is NR4R5 or a 5- to 10-membered nitrogen-containing heteroaryl;
    • each R1, R2, and R3 is independently C6-C30 alkyl, C6-C30 alkenyl, or C6-C30 alkynyl;
    • each R4 and R5 is independently hydrogen, C1-C10 alkyl, C2-C10 alkenyl, or C2-C10 alkynyl; and
    • each RL is independently hydrogen, C1-C20 alkyl, C2-C20 alkenyl, or C2-C20 alkynyl.


In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid that is Compound (139) of International Application No. PCT/US2019/032522, having a compound structure of:




embedded image


(“18:1 Carbon tail-ribose lipid”).


In some embodiments, the pharmaceutical compositions and methods of the present invention include a cationic lipid that is RL3-DMA-07D having a compound structure of:




embedded image


and pharmaceutically acceptable salts thereof.


In some embodiments, the pharmaceutical compositions and methods of the present invention include the cationic lipid N-[1-(2,3-dioleyloxy)propyl]-N,N,N-trimethylammonium chloride (“DOTMA”) (Felgner et al., Proc. Nat'l Acad. Sci. 84, 7413 (1987); U.S. Pat. No. 4,897,355, which is incorporated herein by reference). Other cationic lipids suitable for the pharmaceutical compositions and methods of the present invention include, for example, 5-carboxyspermylglycinedioctadecylamide (“DOGS”); 2,3-dioleyloxy-N-[2-(spermine-carboxamido)ethyl]-N,N-dimethyl-1-propanaminium (“DOSPA”) (Behr et al., Proc. Nat'l Acad. Sci. 86, 6982 (1989); U.S. Pat. Nos. 5,171,678; 5,334,761); 1,2-Dioleoyl-3-Dimethylammonium-Propane (“DODAP”); and 1,2-Dioleoyl-3-Trimethylammonium-Propane (“DOTAP”).


Additional exemplary cationic lipids suitable for the pharmaceutical compositions and methods of the present invention also include: 1,2-distearyloxy-N,N-dimethyl-3-aminopropane (“DSDMA”); 1,2-dioleyloxy-N,N-dimethyl-3-aminopropane (“DODMA”); 1,2-dilinoleyloxy-N,N-dimethyl-3-aminopropane (“DLinDMA”); 1,2-dilinolenyloxy-N,N-dimethyl-3-aminopropane (“DLenDMA”); N-dioleyl-N,N-dimethylammonium chloride (“DODAC”); N,N-distearyl-N,N-dimethylammonium bromide (“DDAB”); N-(1,2-dimyristyloxyprop-3-yl)-N,N-dimethyl-N-hydroxyethyl ammonium bromide (“DMRIE”); 3-dimethylamino-2-(cholest-5-en-3-beta-oxybutan-4-oxy)-1-(cis,cis-9,12-octadecadienoxy)propane (“CLinDMA”); 2-[5′-(cholest-5-en-3-beta-oxy)-3′-oxapentoxy)-3-dimethyl-1-(cis,cis-9′,12′-octadecadienoxy)propane (“CpLinDMA”); N,N-dimethyl-3,4-dioleyloxybenzylamine (“DMOBA”); 1,2-N,N′-dioleylcarbamyl-3-dimethylaminopropane (“DOcarbDAP”); 2,3-Dilinoleoyloxy-N,N-dimethylpropylamine (“DLinDAP”); 1,2-N,N′-Dilinoleylcarbamyl-3-dimethylaminopropane (“DLincarbDAP”); 1,2-Dilinoleoylcarbamyl-3-dimethylaminopropane (“DLinCDAP”); 2,2-dilinoleyl-4-dimethylaminomethyl-[1,3]-dioxolane (“DLin-K-DMA”); 2-((8-[(3beta)-cholest-5-en-3-yloxy]octyl)oxy)-N,N-dimethyl-3-[(9Z,12Z)-octadeca-9,12-dien-1-yloxy]propane-1-amine (“Octyl-CLinDMA”); (2R)-2-((8-[(3beta)-cholest-5-en-3-yloxy]octyl)oxy)-N,N-dimethyl-3-[(9Z,12Z)-octadeca-9,12-dien-1-yloxy]propan-1-amine (“Octyl-CLinDMA (2R)”); (2S)-2-((8-[(3beta)-cholest-5-en-3-yloxy]octyl)oxy)-N,N-dimethyl-3-[(9Z,12Z)-octadeca-9,12-dien-1-yloxy]propan-1-amine (“Octyl-CLinDMA (2S)”); 2,2-dilinoleyl-4-dimethylaminoethyl-[1,3]-dioxolane (“DLin-K-XTC2-DMA”); and 2-(2,2-di((9Z,12Z)-octadeca-9,12-dien-1-yl)-1,3-dioxolan-4-yl)-N,N-dimethylethanamine (“DLin-KC2-DMA”) (see WO 2010/042877, which is incorporated herein by reference; Semple et al., Nature Biotech. 28:172-176 (2010); Heyes, J., et al., J Controlled Release 107:276-287 (2005); Morrissey, D V., et al., Nat. Biotechnol. 23 (8): 1003-1007 (2005); International Patent Publication WO 2005/121348). In some embodiments, one or more of the cationic lipids comprise at least one of an imidazole, dialkylamino, or guanidinium moiety.


In some embodiments, one or more cationic lipids suitable for the pharmaceutical compositions and methods of the present invention include 2,2-Dilinoleyl-4-dimethylaminoethyl-[1,3]-dioxolane (“XTC”); (3aR,5s,6aS)-N,N-dimethyl-2,2-di((9Z,12Z)-octadeca-9,12-dienyl)tetrahydro-3aH-cyclopenta[d][1,3]dioxol-5-amine (“ALNY-100”) and/or 4,7,13-tris(3-oxo-3-(undecylamino)propyl)-N1,N16-diundecyl-4,7,10,13-tetraazahexadecane-1,16-diamide (“NC98-5”).


In some embodiments, the pharmaceutical compositions of the present invention include one or more cationic lipids that constitute at least about 5%, 10%, 20%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, or 70%, measured by weight, of the total lipid content in the pharmaceutical composition, e.g., a lipid nanoparticle. In some embodiments, the pharmaceutical compositions of the present invention include one or more cationic lipids that constitute at least about 5%, 10%, 20%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, or 70%, measured as a mol %, of the total lipid content in the pharmaceutical composition, e.g., a lipid nanoparticle. In some embodiments, the pharmaceutical compositions of the present invention include one or more cationic lipids that constitute about 30-70% (e.g., about 30-65%, about 30-60%, about 30-55%, about 30-50%, about 30-45%, about 30-40%, about 35-50%, about 35-45%, or about 35-40%), measured by weight, of the total lipid content in the pharmaceutical composition, e.g., a lipid nanoparticle. In some embodiments, the pharmaceutical compositions of the present invention include one or more cationic lipids that constitute about 30-70% (e.g., about 30-65%, about 30-60%, about 30-55%, about 30-50%, about 30-45%, about 30-40%, about 35-50%, about 35-45%, or about 35-40%), measured as mol %, of the total lipid content in the pharmaceutical composition, e.g., a lipid nanoparticle.
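
Because the same percentages are recited both by weight and as mol %, it can help to see how the two bases relate. The following is a minimal sketch, not part of the disclosed method: the function name, the component names and the molar masses are all hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def mol_to_weight_percent(mol_percent, molar_mass):
    """Convert a mol % composition to wt % using molar masses in g/mol.

    Both arguments are dicts keyed by component name. The values used
    below are illustrative placeholders, not data from the specification.
    """
    # Mass contributed by each component per 100 mol of total lipid.
    mass = {name: mol_percent[name] * molar_mass[name] for name in mol_percent}
    total_mass = sum(mass.values())
    return {name: 100.0 * m / total_mass for name, m in mass.items()}


# Hypothetical two-component mixture with made-up molar masses.
mol_percent = {"lipid_A": 50.0, "lipid_B": 50.0}
molar_mass = {"lipid_A": 300.0, "lipid_B": 900.0}  # g/mol, illustrative only

weight_percent = mol_to_weight_percent(mol_percent, molar_mass)
print(weight_percent)
```

In this made-up case an equimolar mixture is 25% lipid_A and 75% lipid_B by weight, which shows why a range stated as mol % and the same range stated by weight generally describe different compositions.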


Non-Cationic Lipids

In some embodiments, the lipid nanoparticles contain one or more non-cationic lipids. As used herein, the phrase “non-cationic lipid” refers to any neutral, zwitterionic or anionic lipid. As used herein, the phrase “anionic lipid” refers to any of a number of lipid species that carry a net negative charge at a selected pH, such as physiological pH. Non-cationic lipids include, but are not limited to, distearoylphosphatidylcholine (DSPC), dioleoylphosphatidylcholine (DOPC), dipalmitoylphosphatidylcholine (DPPC), dioleoylphosphatidylglycerol (DOPG), dipalmitoylphosphatidylglycerol (DPPG), dioleoylphosphatidylethanolamine (DOPE), palmitoyloleoylphosphatidylcholine (POPC), palmitoyloleoyl-phosphatidylethanolamine (POPE), dioleoyl-phosphatidylethanolamine 4-(N-maleimidomethyl)-cyclohexane-1-carboxylate (DOPE-mal), dipalmitoyl phosphatidyl ethanolamine (DPPE), dimyristoylphosphoethanolamine (DMPE), distearoyl-phosphatidyl-ethanolamine (DSPE), 1,2-dierucoyl-sn-glycero-3-phosphoethanolamine (DEPE), phosphatidylserine, sphingolipids, cerebrosides, gangliosides, 16-O-monomethyl PE, 16-O-dimethyl PE, 18-1-trans PE, 1-stearoyl-2-oleoyl-phosphatidylethanolamine (SOPE), or a mixture thereof. In some embodiments, lipid nanoparticles suitable for use with the invention include DOPE as the non-cationic lipid component. In other embodiments, lipid nanoparticles suitable for use with the invention include DEPE as the non-cationic lipid component.


In some embodiments, a non-cationic lipid is a neutral lipid, i.e., a lipid that does not carry a net charge in the conditions under which the pharmaceutical composition is formulated and/or administered.


In some embodiments, such non-cationic lipids may be used alone, but are preferably used in combination with other lipids, for example, cationic lipids.


In some embodiments, a non-cationic lipid may be present in a molar ratio (mol %) of about 5% to about 90%, about 5% to about 70%, about 5% to about 50%, about 5% to about 40%, about 5% to about 30%, about 10% to about 70%, about 10% to about 50%, or about 10% to about 40% of the total lipids present in a pharmaceutical composition. In some embodiments, total non-cationic lipids may be present in a molar ratio (mol %) of about 5% to about 90%, about 5% to about 70%, about 5% to about 50%, about 5% to about 40%, about 5% to about 30%, about 10% to about 70%, about 10% to about 50%, or about 10% to about 40% of the total lipids present in a pharmaceutical composition. In some embodiments, the percentage of non-cationic lipid in a lipid nanoparticle may be greater than about 5 mol %, greater than about 10 mol %, greater than about 20 mol %, greater than about 30 mol %, or greater than about 40 mol %. In some embodiments, the percentage of total non-cationic lipids in a lipid nanoparticle may be greater than about 5 mol %, greater than about 10 mol %, greater than about 20 mol %, greater than about 30 mol %, or greater than about 40 mol %. In some embodiments, the percentage of non-cationic lipid in a lipid nanoparticle is no more than about 5 mol %, no more than about 10 mol %, no more than about 20 mol %, no more than about 30 mol %, or no more than about 40 mol %. In some embodiments, the percentage of total non-cationic lipids in a lipid nanoparticle may be no more than about 5 mol %, no more than about 10 mol %, no more than about 20 mol %, no more than about 30 mol %, or no more than about 40 mol %.


In some embodiments, a non-cationic lipid may be present in a weight ratio (wt %) of about 5% to about 90%, about 5% to about 70%, about 5% to about 50%, about 5% to about 40%, about 5% to about 30%, about 10% to about 70%, about 10% to about 50%, or about 10% to about 40% of the total lipids present in a pharmaceutical composition. In some embodiments, total non-cationic lipids may be present in a weight ratio (wt %) of about 5% to about 90%, about 5% to about 70%, about 5% to about 50%, about 5% to about 40%, about 5% to about 30%, about 10% to about 70%, about 10% to about 50%, or about 10% to about 40% of the total lipids present in a pharmaceutical composition. In some embodiments, the percentage of non-cationic lipid in a lipid nanoparticle may be greater than about 5 wt %, greater than about 10 wt %, greater than about 20 wt %, greater than about 30 wt %, or greater than about 40 wt %. In some embodiments, the percentage of total non-cationic lipids in a lipid nanoparticle may be greater than about 5 wt %, greater than about 10 wt %, greater than about 20 wt %, greater than about 30 wt %, or greater than about 40 wt %. In some embodiments, the percentage of non-cationic lipid in a lipid nanoparticle is no more than about 5 wt %, no more than about 10 wt %, no more than about 20 wt %, no more than about 30 wt %, or no more than about 40 wt %. In some embodiments, the percentage of total non-cationic lipids in a lipid nanoparticle may be no more than about 5 wt %, no more than about 10 wt %, no more than about 20 wt %, no more than about 30 wt %, or no more than about 40 wt %.


Cholesterol-Based Lipids

In some embodiments, the lipid nanoparticles comprise one or more cholesterol-based lipids. Suitable cholesterol-based cationic lipids include, for example, DC-Chol (N,N-dimethyl-N-ethylcarboxamidocholesterol), 1,4-bis(3-N-oleylamino-propyl)piperazine (Gao et al., Biochem. Biophys. Res. Comm. 179, 280 (1991); Wolf et al., BioTechniques 23, 139 (1997); U.S. Pat. No. 5,744,335), or imidazole cholesterol ester (ICE), as disclosed in International Patent Publication WO 2011/068810, which has the following structure:




embedded image


In embodiments, a cholesterol-based lipid is cholesterol.


In some embodiments, the cholesterol-based lipid may comprise a molar ratio (mol %) of about 1% to about 30%, or about 5% to about 20% of the total lipids present in a lipid nanoparticle. In some embodiments, the percentage of cholesterol-based lipid in the lipid nanoparticle may be greater than about 5 mol %, greater than about 10 mol %, greater than about 20 mol %, greater than about 30 mol %, or greater than about 40 mol %. In some embodiments, the percentage of cholesterol-based lipid in the lipid nanoparticle may be no more than about 5 mol %, no more than about 10 mol %, no more than about 20 mol %, no more than about 30 mol %, or no more than about 40 mol %.


In some embodiments, a cholesterol-based lipid may be present in a weight ratio (wt %) of about 1% to about 30%, or about 5% to about 20% of the total lipids present in a lipid nanoparticle. In some embodiments, the percentage of cholesterol-based lipid in the lipid nanoparticle may be greater than about 5 wt %, greater than about 10 wt %, greater than about 20 wt %, greater than about 30 wt %, or greater than about 40 wt %. In some embodiments, the percentage of cholesterol-based lipid in the lipid nanoparticle may be no more than about 5 wt %, no more than about 10 wt %, no more than about 20 wt %, no more than about 30 wt %, or no more than about 40 wt %.


PEG-Modified Lipids

In some embodiments, the lipid nanoparticle comprises one or more PEGylated lipids.


For example, the use of polyethylene glycol (PEG)-modified phospholipids and derivatized lipids such as derivatized ceramides (PEG-CER), including N-Octanoyl-Sphingosine-1-[Succinyl(Methoxy Polyethylene Glycol)-2000] (C8 PEG-2000 ceramide), is also contemplated by the present invention, either alone or preferably in combination with other lipid components that together constitute the transfer vehicle (e.g., a lipid nanoparticle).


Contemplated PEG-modified lipids include, but are not limited to, a polyethylene glycol chain of up to 5 kDa in length covalently attached to a lipid with alkyl chain(s) of C6-C20 length. In some embodiments, a PEG-modified or PEGylated lipid is PEGylated cholesterol or PEG-2K. The addition of such components may prevent complex aggregation and may also provide a means for increasing circulation lifetime and increasing the delivery of the lipid-nucleic acid pharmaceutical composition to the target tissues (Klibanov et al. (1990) FEBS Letters, 268 (1): 235-237), or they may be selected to rapidly exchange out of the pharmaceutical composition in vivo (see U.S. Pat. No. 5,885,613). Particularly useful exchangeable lipids are PEG-ceramides having shorter acyl chains (e.g., C14 or C18). Lipid nanoparticles suitable for use with the invention typically include a PEG-modified lipid such as 1,2-dimyristoyl-rac-glycero-3-methoxypolyethylene glycol-2000 (DMG-PEG2K).


The PEG-modified phospholipid and derivatized lipids of the present invention may comprise a molar ratio from about 0% to about 20%, about 0.5% to about 20%, about 1% to about 15%, about 4% to about 10%, or about 2% of the total lipid present in the liposomal transfer vehicle (e.g., a lipid nanoparticle disclosed herein). In some embodiments, one or more PEG-modified lipids constitute about 4% of the total lipids by molar ratio. In some embodiments, one or more PEG-modified lipids constitute about 5% of the total lipids by molar ratio. In some embodiments, one or more PEG-modified lipids constitute about 6% of the total lipids by molar ratio. For certain applications, such as pulmonary delivery, lipid nanoparticles in which the PEG-modified lipid component constitutes about 5% of the total lipids by molar ratio have been found to be particularly suitable.


Ratio of Distinct Lipid Components

A suitable lipid nanoparticle for the present invention may include one or more of any of the cationic lipids, non-cationic lipids, cholesterol-based lipids, PEG-modified lipids, amphiphilic block copolymers and/or polymers described herein at various ratios. In some embodiments, a lipid nanoparticle comprises five and no more than five distinct nanoparticle components. In some embodiments, a lipid nanoparticle comprises four and no more than four distinct nanoparticle components. In some embodiments, a lipid nanoparticle comprises three and no more than three distinct nanoparticle components. As non-limiting examples, a suitable lipid nanoparticle pharmaceutical composition may include a combination selected from cKK-E12, DOPE, cholesterol and DMG-PEG2K; C12-200, DOPE, cholesterol and DMG-PEG2K; HGT4003, DOPE, cholesterol and DMG-PEG2K; ICE, DOPE, cholesterol and DMG-PEG2K; HGT4001, DOPE, cholesterol and DMG-PEG2K; HGT4002, DOPE, cholesterol and DMG-PEG2K; TL1-01D-DMA, DOPE, cholesterol and DMG-PEG2K; TL1-04D-DMA, DOPE, cholesterol and DMG-PEG2K; TL1-08D-DMA, DOPE, cholesterol and DMG-PEG2K; TL1-10D-DMA, DOPE, cholesterol and DMG-PEG2K; ICE, DOPE and DMG-PEG2K; HGT4001, DOPE and DMG-PEG2K; or HGT4002, DOPE and DMG-PEG2K.


In various embodiments, cationic lipids (e.g., cKK-E12, C12-200, TL1-01D-DMA, TL1-04D-DMA, TL1-08D-DMA, TL1-10D-DMA, ICE, HGT4001, HGT4002 and/or HGT4003) constitute about 30-60% (e.g., about 30-55%, about 30-50%, about 30-45%, about 30-40%, about 35-50%, about 35-45%, or about 35-40%) of the lipid nanoparticle by molar ratio. In some embodiments, the percentage of cationic lipids (e.g., cKK-E12, C12-200, TL1-01D-DMA, TL1-04D-DMA, TL1-08D-DMA, TL1-10D-DMA, ICE, HGT4001, HGT4002 and/or HGT4003) is, or is greater than, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, or about 60% of the lipid nanoparticle by molar ratio.


In some embodiments, the molar ratio of cationic lipid(s) to non-cationic lipid(s) to cholesterol-based lipid(s) to PEG-modified lipid(s) may be between about 30-60:25-35:20-30:1-15, respectively. In some embodiments, the ratio of cationic lipid(s) to non-cationic lipid(s) to cholesterol-based lipid(s) to PEG-modified lipid(s) is approximately 40:30:20:10, respectively. In some embodiments, the ratio of cationic lipid(s) to non-cationic lipid(s) to cholesterol-based lipid(s) to PEG-modified lipid(s) is approximately 40:30:25:5, respectively. In some embodiments, the ratio of cationic lipid(s) to non-cationic lipid(s) to cholesterol-based lipid(s) to PEG-modified lipid(s) is approximately 40:32:25:3, respectively. In some embodiments, the ratio of cationic lipid(s) to non-cationic lipid(s) to cholesterol-based lipid(s) to PEG-modified lipid(s) is approximately 50:25:20:5.
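
The four example ratios above can be checked against the recited ranges of about 30-60:25-35:20-30:1-15 with a short script. This is only an illustrative sketch: the helper names are hypothetical, and the word "about" is simplified here to hard numeric bounds, which is an assumption made for the example rather than an interpretation of the ranges.

```python
# Recited ranges for cationic : non-cationic : cholesterol-based : PEG-modified.
# Treating "about" as exact bounds is a simplifying assumption.
RANGES = ((30, 60), (25, 35), (20, 30), (1, 15))

# The four example molar ratios given in the paragraph above.
EXAMPLES = ["40:30:20:10", "40:30:25:5", "40:32:25:3", "50:25:20:5"]


def parse_ratio(text):
    """Split a colon-separated ratio such as '40:30:25:5' into floats."""
    return tuple(float(part) for part in text.split(":"))


def within_ranges(ratio, ranges=RANGES):
    """Return True if every component lies inside its (low, high) range."""
    return len(ratio) == len(ranges) and all(
        low <= value <= high for value, (low, high) in zip(ratio, ranges)
    )


results = {text: within_ranges(parse_ratio(text)) for text in EXAMPLES}
totals = {text: sum(parse_ratio(text)) for text in EXAMPLES}

for text in EXAMPLES:
    print(text, "in range:", results[text], "total:", totals[text])
```

Each of the four example ratios falls inside the recited ranges and totals 100, so the components can be read directly as molar percentages.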


In embodiments where a lipid nanoparticle comprises three and no more than three distinct components of lipids, the ratio of total lipid content (i.e., the ratio of lipid component (1):lipid component (2):lipid component (3)) can be represented as x:y:z, wherein

$$(y + z) = 100 - x.$$


In some embodiments, each of “x,” “y,” and “z” represents molar percentages of the three distinct components of lipids, and the ratio is a molar ratio.


In some embodiments, each of “x,” “y,” and “z” represents weight percentages of the three distinct components of lipids, and the ratio is a weight ratio.


In some embodiments, lipid component (1), represented by variable “x,” is a sterol-based cationic lipid.


In some embodiments, lipid component (2), represented by variable “y,” is a non-cationic lipid.


In some embodiments, lipid component (3), represented by variable “z,” is a PEG lipid.


In some embodiments, variable “x,” representing the molar percentage of lipid component (1) (e.g., a sterol-based cationic lipid), is at least about 10%, about 20%, about 30%, about 40%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, or about 95%.


In some embodiments, variable “x,” representing the molar percentage of lipid component (1) (e.g., a sterol-based cationic lipid), is no more than about 95%, about 90%, about 85%, about 80%, about 75%, about 70%, about 65%, about 60%, about 55%, about 50%, about 40%, about 30%, about 20%, or about 10%. In embodiments, variable “x” is no more than about 65%, about 60%, about 55%, about 50%, or about 40%.


In some embodiments, variable “x,” representing the molar percentage of lipid component (1) (e.g., a sterol-based cationic lipid), is: at least about 50% but less than about 95%; at least about 50% but less than about 90%; at least about 50% but less than about 85%; at least about 50% but less than about 80%; at least about 50% but less than about 75%; at least about 50% but less than about 70%; at least about 50% but less than about 65%; or at least about 50% but less than about 60%. In embodiments, variable “x” is at least about 50% but less than about 70%; at least about 50% but less than about 65%; or at least about 50% but less than about 60%.


In some embodiments, variable “x,” representing the weight percentage of lipid component (1) (e.g., a sterol-based cationic lipid), is at least about 10%, about 20%, about 30%, about 40%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, or about 95%.


In some embodiments, variable “x,” representing the weight percentage of lipid component (1) (e.g., a sterol-based cationic lipid), is no more than about 95%, about 90%, about 85%, about 80%, about 75%, about 70%, about 65%, about 60%, about 55%, about 50%, about 40%, about 30%, about 20%, or about 10%. In embodiments, variable “x” is no more than about 65%, about 60%, about 55%, about 50%, or about 40%.


In some embodiments, variable “x,” representing the weight percentage of lipid component (1) (e.g., a sterol-based cationic lipid), is: at least about 50% but less than about 95%; at least about 50% but less than about 90%; at least about 50% but less than about 85%; at least about 50% but less than about 80%; at least about 50% but less than about 75%; at least about 50% but less than about 70%; at least about 50% but less than about 65%; or at least about 50% but less than about 60%. In embodiments, variable “x” is at least about 50% but less than about 70%; at least about 50% but less than about 65%; or at least about 50% but less than about 60%.


In some embodiments, variable “z,” representing the molar percentage of lipid component (3) (e.g., a PEG lipid) is no more than about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, or 25%. In embodiments, variable “z,” representing the molar percentage of lipid component (3) (e.g., a PEG lipid) is about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, or 10%. In embodiments, variable “z,” representing the molar percentage of lipid component (3) (e.g., a PEG lipid) is about 1% to about 10%, about 2% to about 10%, about 3% to about 10%, about 4% to about 10%, about 1% to about 7.5%, about 2.5% to about 10%, about 2.5% to about 7.5%, about 2.5% to about 5%, about 5% to about 7.5%, or about 5% to about 10%.


In some embodiments, variable “z,” representing the weight percentage of lipid component (3) (e.g., a PEG lipid) is no more than about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, or 25%. In embodiments, variable “z,” representing the weight percentage of lipid component (3) (e.g., a PEG lipid) is about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, or 10%. In embodiments, variable “z,” representing the weight percentage of lipid component (3) (e.g., a PEG lipid) is about 1% to about 10%, about 2% to about 10%, about 3% to about 10%, about 4% to about 10%, about 1% to about 7.5%, about 2.5% to about 10%, about 2.5% to about 7.5%, about 2.5% to about 5%, about 5% to about 7.5%, or about 5% to about 10%.


For pharmaceutical compositions having three and only three distinct lipid components, variables “x,” “y,” and “z” may be in any combination so long as the total of the three variables sums to 100% of the total lipid content. For example, in typical three-component lipid nanoparticles suitable for use with the invention, the molar ratio of cationic lipid to non-cationic lipid to PEG-modified lipid may be between about 55-65:30-40:1-15, respectively. In some embodiments, a molar ratio of cationic lipid (e.g., a sterol-based lipid) to non-cationic lipid (e.g., DOPE or DEPE) to PEG-modified lipid (e.g., DMG-PEG2K) of 60:35:5 is particularly suitable, e.g., for pulmonary delivery of lipid nanoparticles via nebulization.
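
The constraint that x, y and z total 100% means that any two of the variables fix the third. The sketch below, with a hypothetical function name, derives y from chosen values of x and z and reproduces the 60:35:5 example given above; it is an arithmetic illustration only.

```python
def non_cationic_percent(x, z):
    """Return y such that x + y + z = 100 for a three-component lipid mix.

    x is the percentage of lipid component (1) (e.g., a sterol-based
    cationic lipid) and z the percentage of lipid component (3) (e.g., a
    PEG lipid). Both must be expressed on the same basis (mol % or wt %).
    """
    if not (0 <= x <= 100 and 0 <= z <= 100):
        raise ValueError("x and z must each be between 0 and 100")
    y = 100.0 - x - z
    if y < 0:
        raise ValueError("x and z together exceed 100%")
    return y


# The 60:35:5 molar ratio mentioned above: x = 60, z = 5 gives y = 35.
y = non_cationic_percent(60, 5)
print(f"x:y:z = 60:{y:g}:5")
```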


Exemplary Lipid Nanoparticle Formulation

An exemplary lipid nanoparticle for in vivo delivery of nucleic acids in accordance with the present invention comprises a cationic lipid (e.g., cKK-E10), a non-cationic lipid (e.g., DOPE), cholesterol and a PEG-modified lipid (e.g., DMG-PEG2K). In a particular embodiment, the invention provides a lipid nanoparticle for the delivery of the nucleic acids of the invention, which has a lipid component consisting of cKK-E10, DOPE, cholesterol and DMG-PEG2K at the molar ratios 40:30:28.5:1.5. As shown in the examples, this lipid nanoparticle formulation has been found to be particularly effective for use in the immunogenic compositions of the invention, in particular for intramuscular administration of lipid nanoparticles comprising the nucleic acids of the invention.


Lipid Nanoparticle Compositions Containing at Least Two Nucleic Acids

In some embodiments, at least two nucleic acids comprising different optimized nucleotide sequences of the invention are encapsulated in the same lipid nanoparticle (e.g., a lipid nanoparticle comprising cKK-E10, DOPE, cholesterol and DMG-PEG2K). For example, a first nucleic acid (e.g., an mRNA) comprising a first optimized nucleotide sequence of the invention may be combined with a second nucleic acid (e.g., an mRNA) comprising a second optimized nucleotide sequence of the invention and encapsulated in the same lipid nanoparticle.


In other embodiments, at least two nucleic acids comprising different optimized nucleotide sequences of the invention are encapsulated separately (typically using a lipid nanoparticle formulation having the same lipid composition, e.g., cKK-E10, DOPE, cholesterol and DMG-PEG2K). For example, a first nucleic acid (e.g., an mRNA) comprising a first optimized nucleotide sequence of the invention and a second nucleic acid (e.g., an mRNA) comprising a second optimized nucleotide sequence of the invention may each be encapsulated in separate lipid nanoparticles, which are then combined to provide a mixture of lipid nanoparticles encapsulating the first nucleic acid and lipid nanoparticles encapsulating the second nucleic acid (typically at a 1:1 ratio).


For instance, an immunogenic composition in accordance with the invention may comprise at least two nucleic acids, wherein the first nucleic acid comprises an optimized nucleotide sequence encoding a full-length SARS-COV-2 spike protein which has been modified relative to naturally occurring full-length SARS-COV-2 spike protein of SEQ ID NO: 1 to remove the furin cleavage site and to mutate residues 986 and 987 to proline (e.g., an mRNA comprising the optimized nucleotide sequence of SEQ ID NO: 44, or the exemplary mRNA construct 1 shown in Table 4); and the second nucleic acid comprises an optimized nucleotide sequence encoding a full-length SARS-COV-2 spike protein which has been modified relative to naturally occurring full-length SARS-COV-2 spike protein of SEQ ID NO: 1 to remove the furin cleavage site and to mutate residues 986 and 987 to proline and further contains the L18F, D80A, D215G, L242-, A243-, L244-, K417N, E484K, N501Y, D614G and A701V mutations (e.g., an mRNA comprising the optimized nucleotide sequence of SEQ ID NO: 166, or the exemplary mRNA construct 2 shown in Table 4). In some embodiments, the first nucleic acid may be combined with the second nucleic acid and encapsulated in the same lipid nanoparticle. In other embodiments, the first nucleic acid and the second nucleic acid may each be encapsulated in separate lipid nanoparticles (typically formed from the same lipid components, e.g., cKK-E10, DOPE, cholesterol and DMG-PEG2K). The lipid nanoparticles encapsulating the first nucleic acid and the lipid nanoparticles encapsulating the second nucleic acid are then combined (typically at a 1:1 ratio).
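
The mutation labels above follow the common convention of wild-type residue, position, and replacement residue, with “-” marking a deletion (e.g., L242-). As a hedged illustration of how such labels can be read programmatically, the sketch below parses them and applies them to a short made-up peptide. The function names and the toy sequence are hypothetical; the sequence is not SEQ ID NO: 1 or any part of it.

```python
import re

# One-letter wild-type residue, 1-based position, replacement residue or "-".
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z-])$")


def parse_mutation(label):
    """Parse a label such as 'N501Y' or 'L242-' into (wild_type, position, new)."""
    match = MUTATION.match(label)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    wild_type, position, new = match.groups()
    return wild_type, int(position), new


def apply_mutations(sequence, labels):
    """Apply substitutions and deletions using the original residue numbering."""
    residues = list(sequence)
    for label in labels:
        wild_type, position, new = parse_mutation(label)
        if residues[position - 1] != wild_type:
            raise ValueError(f"{label}: expected {wild_type} at position {position}")
        # A deletion becomes an empty string so later positions keep their index.
        residues[position - 1] = "" if new == "-" else new
    return "".join(residues)


# Toy five-residue peptide, purely illustrative.
toy_sequence = "MKLAL"
mutated = apply_mutations(toy_sequence, ["K2N", "L3-", "A4-"])
print(mutated)
```

Keeping deleted positions as empty placeholders until the final join means every label can be applied against the numbering of the unmodified reference sequence, which matches how the mutations are listed above.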


Pharmaceutical Compositions

A nucleic acid comprising an optimized nucleotide sequence encoding a SARS-COV-2 antigen in accordance with the invention may be provided in a pharmaceutical composition (e.g., an immunogenic composition or a vaccine). In a typical embodiment, a pharmaceutical composition in accordance with the invention comprises a nucleic acid in accordance with the invention and a lipid nanoparticle. In particular embodiments, the nucleic acid is encapsulated in the lipid nanoparticle. In some embodiments, the lipid nanoparticle may comprise one or more of a cationic lipid, a non-cationic lipid, a cholesterol-based lipid, a PEG-modified lipid, or a combination thereof. In a typical embodiment, the lipid nanoparticle comprises a cationic lipid, a non-cationic lipid, a cholesterol-based lipid, and a PEG-modified lipid. In some embodiments, the lipid nanoparticle comprises a cationic lipid, a non-cationic lipid, and a PEG-modified lipid.


Pharmaceutically Acceptable Excipients

To stabilize the nucleic acid and/or lipid nanoparticle, or to facilitate administration of the pharmaceutical composition and/or enhance in vivo expression of the nucleic acids of the invention, the nucleic acid and/or lipid nanoparticle can be formulated in combination with one or more additional nucleic acids, carriers, targeting ligands, stabilizing reagents, and/or other pharmaceutically acceptable excipients. Techniques for formulation and administration of drugs may be found in “Remington's Pharmaceutical Sciences,” Mack Publishing Co., Easton, Pa., latest edition.


In some embodiments, the pharmaceutical composition is formulated with a diluent. In some embodiments, the diluent is selected from the group consisting of DMSO, ethylene glycol, glycerol, 2-Methyl-2,4-pentanediol (MPD), propylene glycol, sucrose, and trehalose. In some embodiments, the formulation comprises 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19% or 20% diluent. In a particular embodiment, the mRNA is formulated in 10% trehalose as the diluent.


Therapeutically Effective Amount

The nucleic acid in accordance with the invention is provided in a therapeutically effective amount in the pharmaceutical compositions provided herein. As used herein, the term “therapeutically effective amount” is largely determined based on the total amount of the therapeutic agent contained in the pharmaceutical compositions of the present invention. Generally, a therapeutically effective amount is sufficient to achieve a meaningful benefit to the subject (e.g., treating or preventing a SARS-COV-2 infection). For example, a therapeutically effective amount may be an amount sufficient to achieve a desired prophylactic effect with an immunogenic composition of the invention.


In some embodiments, a pharmaceutical composition (e.g., an immunogenic composition) in accordance with the present invention comprises an mRNA comprising an optimized nucleotide sequence encoding a SARS-COV-2 antigen at a concentration ranging from 0.1 mg/mL to 10.0 mg/mL. In some embodiments, the mRNA is at a concentration of at least 0.1 mg/mL. In some embodiments, the mRNA is at a concentration of at least 0.2 mg/mL. In some embodiments, the mRNA is at a concentration of at least 0.3 mg/mL. In some embodiments, the mRNA is at a concentration of at least 0.4 mg/mL. In some embodiments, the mRNA is at a concentration of at least 0.5 mg/mL. In some embodiments, the mRNA is at a concentration of at least 0.6 mg/mL. In some embodiments, the mRNA is at a concentration of at least 0.7 mg/mL. In some embodiments, the mRNA is at a concentration of at least 0.8 mg/mL. In some embodiments, the mRNA is at a concentration of at least 0.9 mg/mL. In some embodiments, the mRNA is at a concentration of at least 1.0 mg/mL. In a typical embodiment, the mRNA is at a concentration of about 0.5 mg/mL to about 1.0 mg/mL, e.g., about 0.6 mg/mL to about 0.8 mg/mL.


In some embodiments, a pharmaceutical composition (e.g., an immunogenic composition) in accordance with the present invention comprises an mRNA comprising an optimized nucleotide sequence encoding a SARS-COV-2 antigen at a dose of between 5 μg and 200 μg. In some embodiments, the mRNA dose in the pharmaceutical composition is between 10 μg and 200 μg. In some embodiments, the mRNA dose in the pharmaceutical composition is between 7 μg and 135 μg. In particular embodiments, the mRNA dose in the pharmaceutical composition is between 15 μg and 135 μg (e.g., between 15 μg and 45 μg).


In some embodiments, the mRNA dose in the pharmaceutical composition is at least 5 μg. In some embodiments, the mRNA dose in the pharmaceutical composition is at least 10 μg. In some embodiments, the mRNA dose in the pharmaceutical composition is at least 15 μg. In some embodiments, the mRNA dose in the pharmaceutical composition is at least 20 μg. In some embodiments, the mRNA dose in the pharmaceutical composition is at least 25 μg. In some embodiments, the mRNA dose in the pharmaceutical composition is at least 30 μg. In some embodiments, the mRNA dose in the pharmaceutical composition is at least 35 μg. In some embodiments, the mRNA dose in the pharmaceutical composition is at least 40 μg. In some embodiments, the mRNA dose in the pharmaceutical composition is at least 45 μg. In some embodiments, the mRNA dose in the pharmaceutical composition is at least 50 μg. In some embodiments, the mRNA dose in the pharmaceutical composition is at least 75 μg. In some embodiments, the mRNA dose in the pharmaceutical composition is at least 100 μg. In some embodiments, the mRNA dose in the pharmaceutical composition is at least 150 μg.


In a specific embodiment, the mRNA dose in the pharmaceutical composition is about 7.5 μg. In another specific embodiment, the mRNA dose in the pharmaceutical composition is about 10 μg. In another specific embodiment, the mRNA dose in the pharmaceutical composition is about 15 μg. In another specific embodiment, the mRNA dose in the pharmaceutical composition is about 20 μg. In another specific embodiment, the mRNA dose in the pharmaceutical composition is about 30 μg. In another specific embodiment, the mRNA dose in the pharmaceutical composition is about 40 μg. In another specific embodiment, the mRNA dose in the pharmaceutical composition is about 45 μg. In another specific embodiment, the mRNA dose in the pharmaceutical composition is about 135 μg.
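
By way of non-limiting illustration only, the mRNA concentrations and mRNA doses recited above are related by a simple volume calculation. The following Python sketch is an assumption made for illustration and does not form part of the disclosed embodiments: the function name volume_for_dose_ul is hypothetical, and the sketch assumes that the entire dose is taken from a single composition at the stated concentration without further dilution.

```python
def volume_for_dose_ul(dose_ug: float, concentration_mg_per_ml: float) -> float:
    """Return the volume (in microliters) that contains dose_ug of mRNA.

    Uses the unit identity 1 mg/mL == 1 ug/uL, so the volume in microliters
    is the dose in micrograms divided by the concentration in mg/mL.
    """
    if dose_ug <= 0 or concentration_mg_per_ml <= 0:
        raise ValueError("dose and concentration must be positive")
    return dose_ug / concentration_mg_per_ml


# Example: a 15 ug dose at 0.5 mg/mL corresponds to 30 uL.
example_volume_ul = volume_for_dose_ul(15.0, 0.5)
```

Under these assumptions, a higher concentration within the recited range simply reduces the volume needed to deliver a given dose.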


In some embodiments, a pharmaceutical composition (e.g., an immunogenic composition) in accordance with the present invention comprises more than one mRNA construct (e.g., at least two mRNA constructs) comprising an optimized nucleotide sequence encoding a SARS-COV-2 antigen (e.g., two mRNA constructs encoding naturally occurring variants of the SARS-COV-2 S protein). Accordingly, in some embodiments, the total dose of the mRNA constructs is between 5 μg and 200 μg. For example, the total dose of the mRNA constructs is between 10 μg and 200 μg. In some embodiments, the total dose of the mRNA constructs is between 7 μg and 135 μg. In particular embodiments, the total dose of the mRNA constructs is between 15 μg and 135 μg (e.g., between 15 μg and 45 μg).


In some embodiments, the total dose of the mRNA constructs is at least 5 μg. In some embodiments, the total dose of the mRNA constructs is at least 10 μg. In some embodiments, the total dose of the mRNA constructs is at least 15 μg. In some embodiments, the total dose of the mRNA constructs is at least 20 μg. In some embodiments, the total dose of the mRNA constructs is at least 25 μg. In some embodiments, the total dose of the mRNA constructs is at least 30 μg. In some embodiments, the total dose of the mRNA constructs is at least 35 μg. In some embodiments, the total dose of the mRNA constructs is at least 40 μg. In some embodiments, the total dose of the mRNA constructs is at least 45 μg. In some embodiments, the total dose of the mRNA constructs is at least 50 μg. In some embodiments, the total dose of the mRNA constructs is at least 75 μg. In some embodiments, the total dose of the mRNA constructs is at least 100 μg. In some embodiments, the total dose of the mRNA constructs is at least 150 μg.


In a specific embodiment, the total dose of the mRNA constructs in the pharmaceutical composition is about 7.5 μg. In another specific embodiment, the total dose of the mRNA constructs in the pharmaceutical composition is about 10 μg. In another specific embodiment, the total dose of the mRNA constructs in the pharmaceutical composition is about 15 μg. In another specific embodiment, the total dose of the mRNA constructs in the pharmaceutical composition is about 20 μg. In another specific embodiment, the total dose of the mRNA constructs in the pharmaceutical composition is about 30 μg. In another specific embodiment, the total dose of the mRNA constructs in the pharmaceutical composition is about 40 μg. In another specific embodiment, the total dose of the mRNA constructs in the pharmaceutical composition is about 45 μg. In another specific embodiment, the total dose of the mRNA constructs in the pharmaceutical composition is about 135 μg.


Combinations of SARS-COV-2 S Proteins

In some embodiments, an immunogenic composition in accordance with the invention comprises more than one optimized nucleotide sequence encoding a SARS-COV-2 spike protein. In some embodiments, each of the optimized nucleotide sequences encodes a naturally occurring variant of a SARS-COV-2 spike protein. In some embodiments, one or more of these optimized nucleotide sequences encodes a SARS-COV-2 spike protein that has been modified relative to naturally occurring SARS-COV-2 spike protein. In particular embodiments, the modifications stabilize the SARS-COV-2 spike protein in its prefusion conformation, as described in detail above.


In some embodiments, an immunogenic composition in accordance with the present invention comprises at least two nucleic acids, for use in prophylaxis of an infection with SARS-CoV-2, wherein a first nucleic acid comprises an optimized nucleotide sequence which encodes an amino acid sequence comprising a sequence selected from SEQ ID NO: 1, 2, 3, 4, 5, 8, 9, 10, 11, 12, 14, 15, 16, 17, 19, 20, 35, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 104, 106, 108, 110, 118, 120, 123, 125, 127, 129, 131, 133, 135, 137, 139 or 141, and wherein one or more further nucleic acid(s) comprise(s) an optimized nucleotide sequence which encodes an amino acid sequence comprising a sequence selected from SEQ ID NO: 151, 153, 155, 157, 159, 161, 163, 165, 167, 169 or 171.


In some embodiments, an immunogenic composition in accordance with the present invention comprises at least two nucleic acids, for use in prophylaxis of an infection with SARS-CoV-2, wherein a first nucleic acid encodes an amino acid sequence comprising SEQ ID NO: 11; and wherein a second nucleic acid comprises an optimized nucleotide sequence that encodes an amino acid sequence of SEQ ID NO: 157.


In some embodiments, an immunogenic composition in accordance with the present invention comprises at least two nucleic acids, for use in prophylaxis of an infection with SARS-CoV-2, wherein a first nucleic acid comprises an optimized nucleotide sequence that is at least 85% (e.g., at least 90%) identical to the nucleic acid sequence of SEQ ID NO: 44 and encodes an amino acid sequence comprising SEQ ID NO: 11, optionally wherein the optimized nucleotide sequence has the nucleic acid sequence of SEQ ID NO: 44; and wherein a second nucleic acid comprises an optimized nucleotide sequence that is at least 85% (e.g., at least 90%) identical to the nucleic acid sequence of SEQ ID NO: 156 and encodes an amino acid sequence comprising SEQ ID NO: 157, optionally wherein the optimized nucleotide sequence has the nucleic acid sequence of SEQ ID NO: 156.


In some embodiments, an immunogenic composition in accordance with the present invention comprises at least two nucleic acids, for use in prophylaxis of an infection with SARS-CoV-2, wherein a first nucleic acid encodes an amino acid sequence comprising SEQ ID NO: 11; and wherein a second nucleic acid comprises an optimized nucleotide sequence that encodes an amino acid sequence of SEQ ID NO: 163.


In some embodiments, an immunogenic composition in accordance with the present invention comprises at least two nucleic acids, for use in prophylaxis of an infection with SARS-CoV-2, wherein a first nucleic acid comprises an optimized nucleotide sequence that is at least 85% (e.g., at least 90%) identical to the nucleic acid sequence of SEQ ID NO: 44 and encodes an amino acid sequence comprising SEQ ID NO: 11, optionally wherein the optimized nucleotide sequence has the nucleic acid sequence of SEQ ID NO: 44; and wherein a second nucleic acid comprises an optimized nucleotide sequence that is at least 85% (e.g., at least 90%) identical to the nucleic acid sequence of SEQ ID NO: 162 and encodes an amino acid sequence comprising SEQ ID NO: 163, optionally wherein the optimized nucleotide sequence has the nucleic acid sequence of SEQ ID NO: 162.


In some embodiments, an immunogenic composition in accordance with the present invention comprises at least two nucleic acids, for use in prophylaxis of an infection with SARS-CoV-2, wherein a first nucleic acid encodes an amino acid sequence comprising SEQ ID NO: 11; and wherein a second nucleic acid comprises an optimized nucleotide sequence that encodes an amino acid sequence of SEQ ID NO: 167.


In some embodiments, an immunogenic composition in accordance with the present invention comprises at least two nucleic acids, for use in prophylaxis of an infection with SARS-CoV-2, wherein a first nucleic acid comprises an optimized nucleotide sequence that is at least 85% (e.g., at least 90%) identical to the nucleic acid sequence of SEQ ID NO: 44 and encodes an amino acid sequence comprising SEQ ID NO: 11, optionally wherein the optimized nucleotide sequence has the nucleic acid sequence of SEQ ID NO: 44; and wherein a second nucleic acid comprises an optimized nucleotide sequence that is at least 85% (e.g., at least 90%) identical to the nucleic acid sequence of SEQ ID NO: 166 and encodes an amino acid sequence comprising SEQ ID NO: 167, optionally wherein the optimized nucleotide sequence has the nucleic acid sequence of SEQ ID NO: 166.


In some embodiments, an immunogenic composition in accordance with the present invention comprises at least two nucleic acids, for use in prophylaxis of an infection with SARS-CoV-2, wherein a first nucleic acid encodes an amino acid sequence comprising SEQ ID NO: 11; and wherein a second nucleic acid comprises an optimized nucleotide sequence that encodes an amino acid sequence of SEQ ID NO: 171.


In some embodiments, an immunogenic composition in accordance with the present invention comprises at least two nucleic acids, for use in prophylaxis of an infection with SARS-CoV-2, wherein a first nucleic acid comprises an optimized nucleotide sequence that is at least 85% (e.g., at least 90%) identical to the nucleic acid sequence of SEQ ID NO: 44 and encodes an amino acid sequence comprising SEQ ID NO: 11, optionally wherein the optimized nucleotide sequence has the nucleic acid sequence of SEQ ID NO: 44; and wherein a second nucleic acid comprises an optimized nucleotide sequence that is at least 85% (e.g., at least 90%) identical to the nucleic acid sequence of SEQ ID NO: 170 and encodes an amino acid sequence comprising SEQ ID NO: 171, optionally wherein the optimized nucleotide sequence has the nucleic acid sequence of SEQ ID NO: 170.
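
By way of non-limiting illustration only, a percent identity threshold such as the "at least 85%" recited above can be checked once two nucleotide sequences have been aligned. The following Python sketch is an assumption made for illustration and does not form part of the disclosed embodiments: it assumes the two sequences are already aligned to equal length with "-" marking gaps, it counts identical non-gap positions over the full alignment length (other conventions for percent identity exist and may give different values), and the function names and the short sequences are hypothetical rather than any sequence of the Sequence Listing.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same non-gap nucleotide."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_identity_threshold(
    aligned_a: str, aligned_b: str, threshold: float = 85.0
) -> bool:
    """True if the aligned sequences are at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy example (hypothetical sequences): 9 of 10 aligned positions match.
toy_identity = percent_identity("AUGGCUAAGG", "AUGGCUAAGC")
```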


In some embodiments, an immunogenic composition in accordance with the present invention comprises at least three, at least four or at least five nucleic acids, for use in prophylaxis of an infection with SARS-COV-2. The first, second, third, fourth and fifth nucleic acids, as applicable, may be encapsulated in the same lipid nanoparticles. Alternatively, the first, second, third, fourth and fifth nucleic acids, as applicable, may be encapsulated in separate lipid nanoparticles which are mixed together to form a pharmaceutical composition in accordance with the present invention.


Combinations of SARS-COV-2 Antigens

In some embodiments, a pharmaceutical composition in accordance with the invention comprises more than one optimized nucleotide sequence encoding a SARS-COV-2 antigen. In some embodiments, a pharmaceutical composition may comprise a first nucleic acid comprising an optimized nucleotide sequence encoding a SARS-COV-2 S protein or an antigenic fragment thereof and a second nucleic acid comprising an optimized nucleotide sequence encoding a SARS-CoV-2 M protein or an antigenic fragment thereof. In some embodiments, a pharmaceutical composition may comprise a first nucleic acid comprising an optimized nucleotide sequence encoding a SARS-COV-2 S protein or an antigenic fragment thereof and a second nucleic acid comprising an optimized nucleotide sequence encoding a SARS-COV-2 N protein or an antigenic fragment thereof. In some embodiments, a pharmaceutical composition may comprise a first nucleic acid comprising an optimized nucleotide sequence encoding a SARS-COV-2 S protein or an antigenic fragment thereof and a second nucleic acid comprising an optimized nucleotide sequence encoding a SARS-COV-2 E protein or an antigenic fragment thereof. In other embodiments, a pharmaceutical composition may comprise a first nucleic acid comprising an optimized nucleotide sequence encoding a SARS-COV-2 S protein or an antigenic fragment thereof and second, third and/or fourth nucleic acids, wherein said second nucleic acid comprises an optimized nucleotide sequence encoding a SARS-COV-2 M protein or an antigenic fragment thereof, wherein said third nucleic acid comprises an optimized nucleotide sequence encoding a SARS-COV-2 N protein or an antigenic fragment thereof, and wherein said fourth nucleic acid comprises an optimized nucleotide sequence encoding a SARS-COV-2 E protein or an antigenic fragment thereof.


The first, second, third and fourth nucleic acids, as applicable, may be encapsulated in the same lipid nanoparticles. Alternatively, the first, second, third and fourth nucleic acids, as applicable, may be encapsulated in separate lipid nanoparticles which are mixed together to form a pharmaceutical composition in accordance with the present invention.


Administration

Typically, a pharmaceutical composition in accordance with the invention (e.g., an immunogenic composition or a vaccine) is administered parenterally, e.g., by an intravenous, intradermal, subcutaneous, or intramuscular route. Most commonly the administration is intramuscular. Administration may be by injection, e.g., by needle-free and/or needle injection.


For example, using lipid nanoparticles containing the cationic lipid OF-Deg-Lin, Fenton et al. (Adv Mater. 2017; 29 (33)) were able to deliver encapsulated mRNA successfully to the spleen via intravenous injection. They observed that more than 85% of total protein production occurred in the spleen. When they analyzed the spleen of test animals, they found that lipid nanoparticles delivered the encapsulated mRNA primarily to B cell and monocyte/macrophage populations. A small percentage of the mRNA also appeared to be delivered to the neutrophil and T cell populations. As shown in the examples of the present specification, pharmaceutical compositions comprising lipid nanoparticles which have a lipid component consisting of cKK-E10, DOPE, cholesterol and DMG-PEG2K at the molar ratios 40:30:28.5:1.5 are especially effective in eliciting an immune response against the encapsulated nucleic acid(s), in particular when administered intramuscularly.
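
By way of non-limiting illustration only, the molar ratios 40:30:28.5:1.5 recited above can be converted to mole fractions by dividing each part by the sum of the parts. The following Python sketch is an assumption made for illustration and does not form part of the disclosed embodiments; the function name mole_fractions and the variable names are hypothetical.

```python
from typing import Dict


def mole_fractions(molar_ratio: Dict[str, float]) -> Dict[str, float]:
    """Normalize a molar ratio so that the returned fractions sum to 1."""
    total = sum(molar_ratio.values())
    if total <= 0:
        raise ValueError("molar ratio parts must sum to a positive number")
    return {lipid: part / total for lipid, part in molar_ratio.items()}


# Lipid component and molar ratios as recited in the text above.
lipid_ratio = {
    "cKK-E10": 40.0,
    "DOPE": 30.0,
    "cholesterol": 28.5,
    "DMG-PEG2K": 1.5,
}
fractions = mole_fractions(lipid_ratio)
```

Because the recited parts sum to 100, each part can also be read directly as a mole percentage of the lipid component.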


Prime-Boost Immunization

In some embodiments, a pharmaceutical composition in accordance with the invention is administered once. In some embodiments, a pharmaceutical composition in accordance with the invention is administered at least twice.


For example, a prime-boost immunization of a subject who has not previously been immunized against an infection with a β-coronavirus, e.g., SARS-COV-2, typically comprises at least two immunizations. Commonly, these two immunizations are administered at an interval. Accordingly, in some embodiments, a pharmaceutical composition in accordance with the invention is administered at least twice (e.g., three times) at an interval of 2, 3, 4, 5, 6, 7 or 8 weeks. In some embodiments, a pharmaceutical composition in accordance with the invention is administered twice at an interval of 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 or 12 weeks. In typical embodiments, the administration interval is 2 weeks or 4 weeks (e.g., 1 month). In other embodiments, the administration interval is 11 weeks, or 12 weeks (e.g., about 3 months). Accordingly, in one embodiment, the invention provides a method of preventing an infection caused by a β-coronavirus (e.g., SARS-COV-2), wherein said method comprises administering to a subject a first dose of an immunogenic composition comprising an mRNA construct of the invention, and a second dose of an immunogenic composition of the invention, wherein said first and second doses are administered at least 2 weeks apart from each other. In some embodiments, the invention provides a method of preventing an infection caused by a β-coronavirus (e.g., SARS-CoV-2), wherein said method comprises administering to a subject a first dose of an immunogenic composition comprising an mRNA construct of the invention, and a second dose of an immunogenic composition of the invention, wherein said first and second doses are administered about 3 weeks apart from each other.
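
By way of non-limiting illustration only, the date of a second dose can be derived from the date of the first dose and one of the administration intervals of 2 to 12 weeks recited above. The following Python sketch is an assumption made for illustration and does not form part of the disclosed embodiments; the function name second_dose_date and the example date are hypothetical.

```python
from datetime import date, timedelta


def second_dose_date(first_dose: date, interval_weeks: int) -> date:
    """Return the date falling interval_weeks after the first dose.

    Only the whole-week intervals recited in the text (2 to 12 weeks)
    are accepted in this sketch.
    """
    if not 2 <= interval_weeks <= 12:
        raise ValueError("interval must be between 2 and 12 weeks")
    return first_dose + timedelta(weeks=interval_weeks)


# Example: a first dose on 1 January 2024 with a 4-week interval.
example_second_dose = second_dose_date(date(2024, 1, 1), 4)
```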


Sometimes, an initial prime-boost immunization is followed by at least one further immunization to refresh the protective effect of the initial immunization series. This further immunization typically takes place several months, and sometimes several years, after the initial prime-boost immunization. Accordingly, in some embodiments, a pharmaceutical composition in accordance with the invention is administered to a subject at least once 3-18 months (e.g., about 9 months or about 12 months) after the subject was administered at least one dose of an immunogenic composition for the prophylaxis of an infection with a β-coronavirus, e.g., a β-coronavirus expressing a spike protein which binds to angiotensin-converting enzyme 2 (ACE2), such as SARS-COV-2. For example, a subject may have received at least one dose of an immunogenic composition for the prophylaxis of an infection with a β-coronavirus (e.g., SARS-CoV-2), and 3-18 months (e.g., about 9 months or about 12 months) later, the subject is administered a pharmaceutical composition of the invention. More typically, a subject may have received two doses of an immunogenic composition for the prophylaxis of an infection with a β-coronavirus (e.g., SARS-COV-2), e.g., a first dose and, at least two weeks later, a second dose. 3-18 months after having received the second dose, the subject may be administered a pharmaceutical composition of the invention. The administration of a pharmaceutical composition of the invention may commonly occur at least 9 months (e.g., about 12 months) after the subject has received the second dose of an immunogenic composition for the prophylaxis of an infection with a β-coronavirus (e.g., SARS-COV-2).


In some embodiments, the first and second doses may be an immunogenic composition for the prophylaxis of an infection with a β-coronavirus (e.g., SARS-COV-2), e.g., a vaccine that elicits neutralizing antibodies against the S protein of the SARS-COV-2 index strain from Wuhan (SEQ ID NO: 1). For example, the vaccine may comprise a nucleic acid encoding a full-length SARS-COV-2 spike protein which has been modified relative to naturally occurring full-length SARS-COV-2 spike protein of SEQ ID NO: 1 to mutate residues 986 and 987 to proline to stabilize the full-length SARS-COV-2 spike protein in its prefusion conformation. Vaccines that elicit neutralizing antibodies include the pharmaceutical compositions disclosed herein (e.g., an immunogenic composition or a vaccine disclosed herein) as well as COVID-19 vaccines produced by Moderna (COVID-19 Vaccine Moderna, such as for example, mRNA-1273 or mRNA-1283), CureVac (CVnCOV), Johnson & Johnson (COVID-19 Vaccine Janssen), AstraZeneca (Vaxzevria), Pfizer/BioNTech (Comirnaty), Sputnik (Gam-COVID-Vac), Sinovac (COVID-19 Vaccine (Vero Cell) Inactivated) and Novavax (NVX-CoV2373). The first dose and the second dose may comprise the same vaccine. The first dose and the second dose may comprise different vaccines.


In a particular embodiment, the pharmaceutical composition of the invention which is administered 3-18 months later comprises a nucleic acid (e.g., an mRNA) comprising an optimized nucleotide sequence encoding a full-length SARS-COV-2 spike protein which has been modified relative to naturally occurring full-length SARS-COV-2 spike protein of SEQ ID NO: 1 to remove the furin cleavage site and to mutate residues 986 and 987 to proline and further contains the L18F, D80A, D215G, L242-, A243-, L244-, K417N, E484K, N501Y, D614G and A701V mutations. In a particular embodiment, the nucleic acid (e.g., an mRNA) comprising the optimized nucleotide sequence is capable of eliciting a broadly neutralizing antibody response against naturally occurring variants of SARS-COV-2, including the Wuhan index strain as well as variants observed in South Africa, Japan, Brazil, the UK, India and California. In some embodiments, the nucleic acid (e.g., an mRNA) comprising the optimized nucleotide sequence is capable of eliciting a neutralizing antibody response against SARS-COV-1. In a specific embodiment, the nucleic acid (e.g., an mRNA) comprising the optimized nucleotide sequence is capable of eliciting a neutralizing antibody response to a β-coronavirus expressing a spike protein which binds to angiotensin-converting enzyme 2 (ACE2). In some embodiments, the spike protein is at least 75% (e.g., at least 80%, 90%, 95% or 99%) identical to SEQ ID NO: 1. In a specific embodiment, the nucleic acid (e.g., the mRNA) comprises an optimized nucleotide sequence that encodes an amino acid sequence comprising SEQ ID NO: 167, optionally wherein the optimized nucleotide sequence has the nucleic acid sequence of SEQ ID NO: 166. For example, the optimized nucleotide sequence of the mRNA may have the nucleic acid sequence of SEQ ID NO: 173.


In one specific embodiment, the pharmaceutical composition of the invention which is administered 3-18 months later comprises at least two nucleic acids (e.g., a first mRNA and a second mRNA), wherein the first nucleic acid (e.g., the first mRNA) comprises an optimized nucleotide sequence encoding a full-length SARS-COV-2 spike protein which has been modified relative to naturally occurring full-length SARS-COV-2 spike protein of SEQ ID NO: 1 to remove the furin cleavage site and to mutate residues 986 and 987 to proline; and the second nucleic acid (e.g., the second mRNA) comprises an optimized nucleotide sequence encoding a full-length SARS-COV-2 spike protein which has been modified relative to naturally occurring full-length SARS-COV-2 spike protein of SEQ ID NO: 1 to remove the furin cleavage site and to mutate residues 986 and 987 to proline and further contains the L18F, D80A, D215G, L242-, A243-, L244-, K417N, E484K, N501Y, D614G and A701V mutations. In a particular embodiment, the pharmaceutical composition comprising the first and second mRNAs is capable of eliciting a broadly neutralizing antibody response against naturally occurring variants of SARS-COV-2, including the Wuhan index strain as well as variants observed in South Africa, Japan, Brazil, the UK, India and California. In some embodiments, the pharmaceutical composition comprising the first and second mRNAs is capable of eliciting a neutralizing antibody response against SARS-CoV-1. In a specific embodiment, the pharmaceutical composition comprising the first and second mRNAs is capable of eliciting a neutralizing antibody response to a β-coronavirus expressing a spike protein which binds to angiotensin-converting enzyme 2 (ACE2). In some embodiments, the spike protein is at least 75% (e.g., at least 80%, 90%, 95% or 99%) identical to SEQ ID NO: 1. 
The first nucleic acid may comprise an optimized nucleotide sequence which encodes an amino acid sequence comprising SEQ ID NO: 11, optionally wherein the optimized nucleotide sequence has the nucleic acid sequence of SEQ ID NO: 44. The second nucleic acid comprises an optimized nucleotide sequence that encodes an amino acid sequence comprising SEQ ID NO: 167, optionally wherein the optimized nucleotide sequence has the nucleic acid sequence of SEQ ID NO: 166. For example, the optimized nucleotide sequence of the first mRNA may have the nucleic acid sequence of SEQ ID NO: 148, and the optimized nucleotide sequence of the second mRNA may have the nucleic acid sequence of SEQ ID NO: 173. Typically, the at least two nucleic acids are encapsulated in lipid nanoparticles. For example, the first nucleic acid and the second nucleic acid may be encapsulated in the same lipid nanoparticle. Alternatively, the first nucleic acid and the second nucleic acid may be encapsulated in separate lipid nanoparticles.


As shown in the examples, subjects who have previously been immunized with a vaccine that elicits neutralizing antibodies against the S protein of the SARS-COV-2 index strain from Wuhan (SEQ ID NO: 1) and who are administered about 9 months later an mRNA vaccine comprising an optimized nucleotide sequence of the invention that encodes a prefusion stabilized South African variant of the SARS-COV-2 S protein are able to mount a broadly neutralizing antibody response effective against a wide variety of S proteins expressed by naturally occurring variants of the original SARS-COV-2 Wuhan strain as well as other β-coronaviruses, in particular those expressing a spike protein which binds to angiotensin-converting enzyme 2 (ACE2), such as SARS-COV-1.


Accordingly, in some embodiments, the pharmaceutical compositions of the invention are for use in the prophylaxis of an infection caused by a β-coronavirus, in particular a β-coronavirus expressing a spike protein which binds to angiotensin-converting enzyme 2 (ACE2). In some embodiments, the pharmaceutical compositions of the invention are for use in the manufacture of a medicament for the prophylaxis of an infection caused by a β-coronavirus, in particular a β-coronavirus expressing a spike protein which binds to angiotensin-converting enzyme 2 (ACE2). In some embodiments, the spike protein is at least 75% (e.g., at least 80%, 90%, 95% or 99%) identical to SEQ ID NO: 1. In a typical embodiment, the β-coronavirus is SARS-COV-2 (e.g., a naturally occurring variant of the Wuhan index strain, such as a South African variant, a Japanese variant, a Brazilian variant, a UK variant, an Indian variant or a California variant).


In a specific embodiment, the invention provides a method of preventing an infection caused by SARS-COV-2, wherein said method comprises administering to a subject an effective amount of an immunogenic composition comprising an mRNA construct, wherein said mRNA construct comprises an optimized nucleotide sequence encoding a full-length SARS-COV-2 spike protein which has been modified relative to naturally occurring full-length SARS-COV-2 spike protein of SEQ ID NO: 1 to remove the furin cleavage site and to mutate residues 986 and 987 to proline and further contains the L18F, D80A, D215G, L242-, A243-, L244-, K417N, E484K, N501Y, D614G and A701V mutations, wherein said immunogenic composition is administered to the subject at least 3 months (e.g., about 6 months, about 9 months or about 12 months) after the subject was immunized with a first COVID-19 vaccine and a second COVID-19 vaccine, wherein said first and second COVID-19 vaccines were administered to the subject at least two weeks apart from each other and wherein said first and second COVID-19 vaccines were designed to elicit neutralizing antibodies against the S protein of SARS-COV-2, e.g., the S-protein of the SARS-COV-2 index strain from Wuhan (SEQ ID NO: 1). In some embodiments, the first and second COVID-19 vaccines are identical. In other embodiments, said first and second vaccines are different. In particular embodiments, said first and second COVID-19 vaccines are produced by Moderna (COVID-19 Vaccine Moderna, such as for example, mRNA-1273 or mRNA-1283), CureVac (CVnCOV), Johnson & Johnson (COVID-19 Vaccine Janssen), AstraZeneca (Vaxzevria), Pfizer/BioNTech (Comirnaty), Sputnik (Gam-COVID-Vac), Sinovac (COVID-19 Vaccine (Vero Cell) Inactivated) or Novavax (NVX-CoV2373).


In some embodiments, the immunogenic composition is capable of eliciting a broadly neutralizing antibody response against naturally occurring variants of SARS-COV-2, including the Wuhan index strain as well as variants observed in South Africa, Japan, Brazil, the UK, India and California. In some embodiments, the immunogenic composition is capable of eliciting a neutralizing antibody response against SARS-COV-1. In particular embodiments, the immunogenic composition is capable of eliciting a neutralizing antibody response to a β-coronavirus expressing a spike protein which binds to angiotensin-converting enzyme 2 (ACE2). In particular embodiments, the spike protein is at least 75% (e.g., at least 80%, 90%, 95% or 99%) identical to SEQ ID NO: 1. In particular embodiments, the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO: 167, optionally wherein the optimized nucleotide sequence has the nucleic acid sequence of SEQ ID NO: 173. In a specific embodiment, the mRNA construct is mRNA construct 2. In particular embodiments, said mRNA construct is encapsulated in a lipid nanoparticle which has a lipid component consisting of cKK-E10, DOPE, cholesterol and DMG-PEG2K, e.g., at the molar ratios 40:30:28.5:1.5. In some embodiments, the immunogenic composition comprises between 7 μg and 135 μg of the mRNA construct, e.g., 7.5 μg, 15 μg, 45 μg or 135 μg.


Further Exemplary Embodiments of the Invention

In one aspect, the invention provides a nucleic acid comprising an optimized nucleotide sequence encoding a SARS-COV-2 antigen, wherein the optimized nucleotide sequence consists of codons associated with a usage frequency which is greater than or equal to 10%; wherein the optimized nucleotide sequence:

    • (i) does not contain a termination signal having one of the following nucleotide sequences:
      • 5′-X1ATCTX2TX3-3′, wherein X1, X2 and X3 are independently selected from A, C, T or G; and 5′-X1AUCUX2UX3-3′, wherein X1, X2 and X3 are independently selected from A, C, U or G;
    • (ii) does not contain any negative cis-regulatory elements and negative repeat elements; and
    • (iii) has a codon adaptation index greater than 0.8;
    • wherein, when divided into non-overlapping 30 nucleotide-long portions, each portion of the optimized nucleotide sequence has a guanine cytosine content range of 30%-70%.
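Two of the criteria above lend themselves to a simple computational check: the absence of the termination signal pattern of item (i), and the guanine cytosine content of each non-overlapping 30-nucleotide portion. The sketch below is illustrative only. The function names are hypothetical, and the decision to evaluate a final portion shorter than 30 nucleotides in the same way as a full portion is an assumption, since the text does not address it. Codon usage frequency, the codon adaptation index and negative cis-regulatory elements are not covered here.

```python
import re

# Item (i): X1-ATCT-X2-T-X3 for DNA, or X1-AUCU-X2-U-X3 for RNA,
# where X1, X2 and X3 may each be any nucleotide of that alphabet.
TERMINATION_PATTERN = re.compile(
    r"[ACGT]ATCT[ACGT]T[ACGT]|[ACGU]AUCU[ACGU]U[ACGU]"
)

def has_termination_signal(sequence):
    """Return True if the sequence contains the item (i) pattern anywhere."""
    return TERMINATION_PATTERN.search(sequence.upper()) is not None

def gc_fraction(portion):
    """Fraction of G and C nucleotides in a portion."""
    return sum(base in "GC" for base in portion) / len(portion)

def gc_windows_ok(sequence, window=30, low=0.30, high=0.70):
    """Check every non-overlapping window against the 30%-70% GC range.

    Assumption: a trailing portion shorter than `window` is checked as well.
    """
    sequence = sequence.upper()
    portions = [sequence[i:i + window] for i in range(0, len(sequence), window)]
    return all(low <= gc_fraction(portion) <= high for portion in portions)
```

For example, `has_termination_signal("GGGTATCTGTTGGG")` evaluates to true because the sequence contains TATCTGTT, and `gc_windows_ok("A" * 30 + "GC" * 15)` evaluates to false because the first portion contains no G or C.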


In certain embodiments, the optimized nucleotide sequence does not contain a termination signal having one of the following sequences: TATCTGTT; TTTTTT; AAGCTT; GAAGAGC; TCTAGA; UAUCUGUU; UUUUUU; AAGCUU; GAAGAGC; UCUAGA. In certain embodiments the nucleic acid is mRNA or DNA.
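As an illustration of the preceding paragraph, the listed sequences can be screened with a plain substring search. This is a minimal sketch: the function name is hypothetical, and the RNA forms are derived from the DNA forms by replacing T with U, which reproduces the RNA sequences listed above.

```python
# DNA forms as listed in the text; RNA forms are obtained by T -> U.
DNA_MOTIFS = ("TATCTGTT", "TTTTTT", "AAGCTT", "GAAGAGC", "TCTAGA")
RNA_MOTIFS = tuple(motif.replace("T", "U") for motif in DNA_MOTIFS)

def find_listed_motifs(sequence):
    """Return sorted (motif, 0-based start) pairs for every listed motif found."""
    sequence = sequence.upper()
    hits = set()  # a set, because GAAGAGC is identical in its DNA and RNA forms
    for motif in DNA_MOTIFS + RNA_MOTIFS:
        start = sequence.find(motif)
        while start != -1:
            hits.add((motif, start))
            # Advance by one position so overlapping occurrences are reported.
            start = sequence.find(motif, start + 1)
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))
```

A sequence satisfies this embodiment when the function returns an empty list.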


In the following, modified SARS-COV-2 spike proteins or antigenic fragments thereof are described by reference to particular optimized nucleic acid sequences. It should be understood that, although these modified SARS-COV-2 spike proteins or antigenic fragments thereof may have particular utility in the context of the disclosed nucleic acid-based vaccines of the invention, they may also have utility in protein-based vaccines. Moreover, the optimized nucleic acid sequences may also be useful in the efficient production of such protein-based vaccines.


In certain aspects, the nucleic acid of the invention is an optimized nucleotide sequence encoding the SARS-COV-2 spike protein or an antigenic fragment thereof. In certain embodiments, the optimized nucleotide sequence encodes the full-length SARS-COV-2 spike protein. In specific embodiments, the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO:1. In other embodiments, the nucleic acid of the invention is an optimized nucleotide sequence encoding the ectodomain of the SARS-COV-2 spike protein or an antigenic fragment thereof. In specific embodiments, the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO:2. In certain embodiments, the antigenic fragment comprises the receptor-binding domain (RBD) of the SARS-COV-2 spike protein. In specific embodiments, the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO:6.


In certain embodiments, the antigenic fragment further comprises a signal sequence. In certain embodiments, the signal sequence is SEQ ID NO: 7. In other embodiments, the optimized nucleotide sequence of the invention encodes an amino acid sequence comprising SEQ ID NO:8. In certain embodiments, the signal sequence is SEQ ID NO: 142. In other embodiments, the optimized nucleotide sequence of the invention encodes an amino acid sequence comprising SEQ ID NO: 143. In further aspects of the invention, the antigenic fragment can additionally comprise an Fc region. In specific embodiments, the Fc region has the amino acid sequence of SEQ ID NO: 18. In certain embodiments, the antigenic fragment further comprises a signal sequence and an Fc region.


In certain embodiments, the antigenic fragment consists of the RBD of the SARS-COV-2 spike protein operably linked to a signal sequence and an Fc region. In particular embodiments, the optimized nucleotide sequences encode an amino acid sequence comprising SEQ ID NO:20.


In other embodiments, the SARS-COV-2 spike protein, the ectodomain of the SARS-CoV-2 spike protein or the antigenic fragment thereof has been modified to form a stable prefusion conformation. In certain embodiments, the SARS-COV-2 spike protein, the ectodomain of the SARS-COV-2 spike protein or the antigenic fragment has been modified relative to naturally occurring SARS-COV-2 spike protein to remove the furin cleavage site required for activation. In specific embodiments, the optimized nucleotide sequence encodes a SARS-COV-2 S protein which has been modified relative to naturally occurring SARS-COV-2 spike protein to remove the furin cleavage site required for activation. In further specific embodiments, the optimized nucleotide sequences encode an amino acid sequence comprising SEQ ID NO:9.


In certain embodiments, the SARS-COV-2 spike protein, the ectodomain of the SARS-CoV-2 spike protein or the antigenic fragment thereof has been modified relative to naturally occurring SARS-COV-2 spike protein to mutate residue 985 to proline and/or mutate residues 986 and 987 to proline. In specific embodiments, the optimized nucleotide sequence encodes a SARS-CoV-2 S protein which has been modified relative to naturally occurring SARS-COV-2 spike protein to mutate residues 986 and 987 to proline. In further specific embodiments, the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO: 10. In further specific embodiments, the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO: 118.


In certain embodiments, the optimized nucleotide sequence encodes a SARS-COV-2 S protein which has been modified relative to naturally occurring SARS-COV-2 spike protein to mutate residues 985, 986 and 987 to proline. In specific embodiments, the optimized nucleotide sequences encode an amino acid sequence comprising SEQ ID NO: 92.


In certain embodiments, the SARS-COV-2 spike protein, the ectodomain of the SARS-CoV-2 spike protein or the antigenic fragment thereof has been modified relative to naturally occurring SARS-COV-2 spike protein to remove the furin cleavage site and to mutate (a) residue 985 to proline; and/or (b) residues 986 and 987 to proline. In specific embodiments, the SARS-CoV-2 spike protein, the ectodomain of the SARS-COV-2 spike protein or the antigenic fragment thereof has been modified relative to naturally occurring SARS-COV-2 spike protein to remove the furin cleavage site and to mutate residues 986 and 987 to proline. In certain embodiments, the optimized nucleotide sequence encodes a SARS-COV-2 S protein. In specific embodiments, the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO:11. For example, the optimized nucleotide sequence has the nucleic acid sequence of SEQ ID NO: 44 or SEQ ID NO: 148. In further specific embodiments, the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO:120. For example, the optimized nucleotide sequence encodes the ectodomain of the SARS-COV-2 S protein. In specific embodiments, the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO:12.


In certain embodiments, the SARS-COV-2 spike protein, the ectodomain of the SARS-CoV-2 spike protein or the antigenic fragment thereof has been modified relative to naturally occurring SARS-COV-2 spike protein to remove the furin cleavage site and to mutate residues 985, 986 and 987 to proline. In specific embodiments, the optimized nucleotide sequence encodes a SARS-COV-2 S protein. In further specific embodiments, the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO:94.


In certain embodiments, the SARS-COV-2 spike protein, the ectodomain thereof or the antigenic fragment thereof has been modified relative to naturally occurring SARS-COV-2 spike protein to mutate residues 986 and 987 to proline and to contain the D614G mutation. In specific embodiments, the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO: 118.


In certain embodiments, the SARS-COV-2 spike protein, the ectodomain thereof or the antigenic fragment thereof has been modified relative to naturally occurring SARS-COV-2 spike protein to remove the furin cleavage site, to mutate residues 986 and 987 to proline and to contain the D614G mutation. In specific embodiments, the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO: 120.


In certain embodiments, the SARS-COV-2 spike protein, the ectodomain thereof or the antigenic fragment thereof has been modified relative to naturally occurring SARS-COV-2 spike protein to mutate residues 817, 892, 899, 942, 986 and 987 to proline. In specific embodiments, the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO: 129.


In certain embodiments, the SARS-COV-2 spike protein, the ectodomain thereof or the antigenic fragment thereof has been modified relative to naturally occurring SARS-COV-2 spike protein to remove the furin cleavage site and to mutate residues 817, 892, 899, 942, 986 and 987 to proline. In specific embodiments, the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO: 131.


In certain embodiments, the SARS-COV-2 spike protein, the ectodomain thereof or the antigenic fragment thereof has been modified relative to naturally occurring SARS-COV-2 spike protein to mutate residues 817, 892, 899, 942, 986 and 987 to proline and to contain the D614G mutation. In specific embodiments, the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO: 133.


In certain embodiments, the SARS-COV-2 spike protein, the ectodomain thereof or the antigenic fragment thereof has been modified relative to naturally occurring SARS-COV-2 spike protein to remove the furin cleavage site, to mutate residues 817, 892, 899, 942, 986 and 987 to proline and to contain the D614G mutation. In specific embodiments, the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO: 135.


In certain embodiments, the SARS-COV-2 spike protein, the ectodomain thereof or the antigenic fragment thereof has been modified relative to naturally occurring SARS-COV-2 spike protein to remove the furin cleavage site, to mutate residues 986 and 987 to proline and to contain an extended N-terminal signal peptide. In specific embodiments, the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO: 123. In certain embodiments, the SARS-COV-2 spike protein, the ectodomain thereof or the antigenic fragment thereof has been modified relative to naturally occurring SARS-COV-2 spike protein to remove the furin cleavage site, to mutate residues 817, 892, 899, 942, 986 and 987 to proline and to contain an extended N-terminal signal peptide. In specific embodiments, the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO: 137.


In certain embodiments, the SARS-COV-2 spike protein, the ectodomain thereof or the antigenic fragment thereof has been modified relative to naturally occurring SARS-COV-2 spike protein to mutate the ER retrieval signal. In certain embodiments, the SARS-COV-2 spike protein, the ectodomain thereof or the antigenic fragment thereof has been modified relative to naturally occurring SARS-COV-2 spike protein to remove the furin cleavage site, to mutate residues 986 and 987 to proline and to remove the ER retrieval signal. In specific embodiments, the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO: 125.


In certain embodiments, the SARS-COV-2 spike protein, the ectodomain thereof or the antigenic fragment thereof has been modified relative to naturally occurring SARS-COV-2 spike protein to remove the furin cleavage site, to mutate residues 986 and 987 to proline, to remove the ER retrieval signal and to contain an extended N-terminal signal peptide. In specific embodiments, the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO: 127.


In certain embodiments, the SARS-COV-2 spike protein, the ectodomain thereof or the antigenic fragment thereof has been modified relative to naturally occurring SARS-COV-2 spike protein to remove the furin cleavage site, to mutate residues 817, 892, 899, 942, 986 and 987 to proline and to remove the ER retrieval signal. In specific embodiments, the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO: 139.


In certain embodiments, the SARS-COV-2 spike protein, the ectodomain thereof or the antigenic fragment thereof has been modified relative to naturally occurring SARS-COV-2 spike protein to remove the furin cleavage site, to mutate residues 817, 892, 899, 942, 986 and 987 to proline, to remove the ER retrieval signal and to contain an extended N-terminal signal peptide. In specific embodiments, the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO: 141.


In certain embodiments, an antigenic fragment comprises or consists of the S1, S2 or S2′ subunit of the SARS-COV-2 spike protein. In certain embodiments, the optimized nucleotide sequences encode an amino acid sequence comprising SEQ ID NO: 3, SEQ ID NO: 4 or SEQ ID NO: 5.


In certain embodiments, an optimized nucleotide sequence encodes a fusion peptide comprising one or more antigenic fragments of the SARS-COV-2 S protein. In specific embodiments, the one or more antigenic fragments of the SARS-COV-2 S protein has/have the amino acid sequence of SEQ ID NO: 21, the amino acid sequence SEQ ID NO: 22, the amino acid sequence SEQ ID NO: 23 and/or the amino acid sequence SEQ ID NO: 24.


In certain embodiments, the one or more antigenic fragments are linked by a linker sequence, e.g., GGGGS. In specific embodiments, the optimized nucleotide sequence encodes a fusion peptide comprising SEQ ID NO: 25 or SEQ ID NO: 27. In certain embodiments, the fusion peptide comprises an N-terminal signal sequence; for example, the optimized nucleotide sequence encodes a fusion peptide comprising SEQ ID NO: 51 or SEQ ID NO: 53. In certain embodiments, the fusion peptide comprises a C-terminal Fc domain. In other embodiments, the fusion peptide comprises an N-terminal signal sequence and a C-terminal Fc domain. In specific embodiments, the optimized nucleotide sequence encodes a fusion peptide comprising SEQ ID NO: 55 or SEQ ID NO: 57.


In other aspects, the nucleic acid of the invention as disclosed above is for use in therapy. For example, the nucleic acid of the invention as disclosed above may be for use in the manufacture of a medicament for the prophylaxis of an infection with SARS-COV-2. In other aspects an immunogenic composition comprising the nucleic acid of the invention for use in prophylaxis of an infection with SARS-COV-2 is provided. The invention also provides methods of treating or preventing a SARS-COV-2 infection, said method comprising administering to a subject an effective amount of an immunogenic composition comprising the nucleic acid of the invention.


In other aspects, an immunogenic composition according to the invention comprising at least two nucleic acids is provided for use in prophylaxis of an infection with SARS-COV-2, wherein a first nucleic acid comprises an optimized nucleotide sequence which encodes an amino acid sequence comprising a sequence selected from SEQ ID NO: 1, 2, 3, 4, 5, 8, 9, 10, 11, 12, 14, 15, 16, 17, 19, 20, 35, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 104, 106, 108, 110, 118, 120, 123, 125, 127, 129, 131, 133, 135, 137, 139 or 141, and wherein one or more further nucleic acid(s) comprise(s) an optimized nucleotide sequence which encodes an amino acid sequence comprising a sequence selected from SEQ ID NO: 151, 153, 155, 157, 159, 161, 163, 165, 167, 169 or 171.


In other aspects, an immunogenic composition according to the invention comprising at least two nucleic acids is provided for use in prophylaxis of an infection with SARS-COV-2, wherein a first nucleic acid comprises an optimized nucleotide sequence which encodes an amino acid sequence comprising SEQ ID NO:11, optionally wherein the optimized nucleotide sequence has the nucleic acid sequence of SEQ ID NO: 44, and wherein one or more further nucleic acid(s) is (are) selected from:

    • (a) a nucleic acid comprising an optimized nucleotide sequence that encodes an amino acid sequence comprising SEQ ID NO: 157, optionally wherein the optimized nucleotide sequence has the nucleic acid sequence of SEQ ID NO: 156;
    • (b) a nucleic acid comprising an optimized nucleotide sequence that encodes an amino acid sequence comprising SEQ ID NO: 163, optionally wherein the optimized nucleotide sequence has the nucleic acid sequence of SEQ ID NO: 162;
    • (c) a nucleic acid comprising an optimized nucleotide sequence that encodes an amino acid sequence comprising SEQ ID NO: 167, optionally wherein the optimized nucleotide sequence has the nucleic acid sequence of SEQ ID NO: 166; and
    • (d) a nucleic acid comprising an optimized nucleotide sequence that encodes an amino acid sequence comprising SEQ ID NO: 171, optionally wherein the optimized nucleotide sequence has the nucleic acid sequence of SEQ ID NO: 170.


In certain aspects, the invention provides a pharmaceutical composition comprising i) a nucleic acid of the invention and ii) a lipid nanoparticle. In certain embodiments, the nucleic acid is encapsulated in the lipid nanoparticle. The lipid nanoparticle can comprise one or more of a cationic lipid, a non-cationic lipid, a cholesterol-based lipid, a PEG-modified lipid, or a combination thereof. In certain embodiments, the lipid nanoparticle comprises a cationic lipid, a non-cationic lipid, a cholesterol-based lipid, and a PEG-modified lipid.


In certain embodiments, the lipid nanoparticle comprises a cationic lipid, a non-cationic lipid, and a PEG-modified lipid. In certain embodiments, the lipid nanoparticle comprises:

    • (a) a cationic lipid selected from DOTAP (1,2-dioleyl-3-trimethylammonium propane), DODAP (1,2-dioleyl-3-dimethylammonium propane), DOTMA (N-[1-(2,3-dioleyloxy) propyl]-N,N,N-trimethylammonium chloride), DLinKC2DMA, DLin-KC2-DM, C12-200, cKK-E12, cKK-E10, HGT5000, HGT5001, HGT4003, ICE, HGT4001, HGT4002, TL1-01D-DMA, TL1-04D-DMA, TL1-08D-DMA, TL1-10D-DMA, OF-Deg-Lin and OF-02;
    • (b) a non-cationic lipid selected from DSPC (1,2-distearoyl-sn-glycero-3-phosphocholine), DPPC (1,2-dipalmitoyl-sn-glycero-3-phosphocholine), DOPE (1,2-dioleyl-sn-glycero-3-phosphoethanolamine), DEPE (1,2-dierucoyl-sn-glycero-3-phosphoethanolamine), DOPC (1,2-dioleyl-sn-glycero-3-phosphatidylcholine), DPPE (1,2-dipalmitoyl-sn-glycero-3-phosphoethanolamine), DMPE (1,2-dimyristoyl-sn-glycero-3-phosphoethanolamine), and DOPG (1,2-dioleoyl-sn-glycero-3-phospho-(1′-rac-glycerol));
    • (c) a cholesterol-based lipid selected from DC-Chol (N,N-dimethyl-N-ethylcarboxamidocholesterol), 1,4-bis(3-N-oleylamino-propyl) piperazine, or imidazole cholesterol ester (ICE); and/or
    • (d) a PEG-modified lipid selected from PEGylated cholesterol and DMG-PEG-2K.


In certain embodiments of the pharmaceutical composition:

    • a. the cationic lipid is selected from cKK-E12, cKK-E10, OF-Deg-Lin and OF-02;
    • b. the non-cationic lipid is selected from DOPE and DEPE;
    • c. the cholesterol-based lipid is cholesterol; and
    • d. the PEG-modified lipid is DMG-PEG-2K.


In certain embodiments, the cationic lipid constitutes about 30-60% of the lipid nanoparticle by molar ratio, e.g., about 35-40%. In certain embodiments, the ratio of cationic lipid to non-cationic lipid to cholesterol-based lipid to PEG-modified lipid is approximately 30-60:25-35:20-30:1-15 by molar ratio or wherein the ratio of cationic lipid to non-cationic lipid to PEG-modified lipid is approximately 55-65:30-40:1-15 by molar ratio.
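The molar ratios above can be related to per-component mole percentages by simple normalization. The sketch below is illustrative only: the function names and dictionary keys are hypothetical, and the example ratio 40:30:28.5:1.5 is the one given elsewhere in this specification for a lipid component consisting of cKK-E10, DOPE, cholesterol and DMG-PEG2K.

```python
def mole_percent(parts):
    """Normalize molar ratio parts so that the components sum to 100 mol%."""
    total = sum(parts.values())
    return {name: 100.0 * value / total for name, value in parts.items()}

# Four-component ranges stated above (cationic : non-cationic :
# cholesterol-based : PEG-modified), expressed as (low, high) ratio parts.
FOUR_COMPONENT_RANGES = {
    "cationic": (30, 60),
    "non_cationic": (25, 35),
    "cholesterol_based": (20, 30),
    "peg_modified": (1, 15),
}

def within_ranges(parts, ranges=FOUR_COMPONENT_RANGES):
    """Check that each ratio part lies inside its stated range (inclusive)."""
    return all(ranges[name][0] <= value <= ranges[name][1]
               for name, value in parts.items())

# Example ratio 40:30:28.5:1.5 (cKK-E10 : DOPE : cholesterol : DMG-PEG2K).
example = {
    "cationic": 40.0,
    "non_cationic": 30.0,
    "cholesterol_based": 28.5,
    "peg_modified": 1.5,
}
example_percent = mole_percent(example)
example_ok = within_ranges(example)
```

Because the example parts already sum to 100, each part equals its mole percentage, and every part falls inside the corresponding stated range.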


In certain embodiments, the lipid nanoparticle includes a combination of a cationic lipid, a non-cationic lipid, a PEG-modified lipid and optionally cholesterol selected from cKK-E12, DOPE, cholesterol and DMG-PEG2K; cKK-E10, DOPE, cholesterol and DMG-PEG2K; OF-Deg-Lin, DOPE, cholesterol and DMG-PEG2K; OF-02, DOPE, cholesterol and DMG-PEG2K; TL1-01D-DMA, DOPE, cholesterol and DMG-PEG2K; TL1-04D-DMA, DOPE, cholesterol and DMG-PEG2K; TL1-08D-DMA, DOPE, cholesterol and DMG-PEG2K; TL1-10D-DMA, DOPE, cholesterol and DMG-PEG2K; ICE, DOPE and DMG-PEG2K; HGT4001, DOPE and DMG-PEG2K; or HGT4002, DOPE and DMG-PEG2K.


In certain embodiments, the lipid nanoparticle has an average size of less than 150 nm, e.g., less than 100 nm. In specific embodiments, the lipid nanoparticle has an average size of about 50-70 nm, e.g., about 55-65 nm.


In certain embodiments, the lipid nanoparticles are suspended in 10% trehalose in water for injection. In certain embodiments, the nucleic acid is mRNA at a concentration of between about 0.5 mg/mL and about 1.0 mg/mL.


In certain aspects, the invention provides a pharmaceutical composition comprising i) an optimized nucleic acid of the invention (e.g., an mRNA) and ii) a lipid nanoparticle. Such pharmaceutical compositions are for use in treating or preventing an infection with SARS-COV-2. In certain embodiments, the pharmaceutical composition is administered parenterally. In certain embodiments, the pharmaceutical composition is administered intravenously, intradermally, subcutaneously, or intramuscularly. In specific embodiments, the pharmaceutical composition is administered intravenously or intramuscularly.


In certain embodiments, the pharmaceutical composition is administered at least once. In specific embodiments, the pharmaceutical composition is administered at least twice. In more specific embodiments, the period between administrations is at least 2 weeks, e.g., 1 month. In some embodiments, the period between administrations is about 3 weeks.


In certain aspects, the invention provides a SARS-COV-2 antigen. For example, the SARS-COV-2 antigen can be any of the SARS-COV-2 spike proteins, antigenic fragments or fusion peptides of antigenic fragments which are described above or in more detail below in reference to particular optimized nucleic acid sequences. In some embodiments, the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO: 1. In some embodiments, the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO: 10. In some embodiments, the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO: 9. In some embodiments, the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO: 11. In some embodiments, the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:2. In some embodiments, the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO: 12. In some embodiments, the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:3. In some embodiments, the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:8. In some embodiments, the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:20. In some embodiments, the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO: 17. In some embodiments, the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:14. In some embodiments, the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:16.
In some embodiments, the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:66. In some embodiments, the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:15. In some embodiments, the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:82. In some embodiments, the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:84. In some embodiments, the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:74. In some embodiments, the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:76. In some embodiments, the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:78. In some embodiments, the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:80. In some embodiments, the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:68. In some embodiments, the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:70. In some embodiments, the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:96. In some embodiments, the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:86. In some embodiments, the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:88. In some embodiments, the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:90. In some embodiments, the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:92. 
In some embodiments, the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:94. In some embodiments, the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO: 118. In some embodiments, the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO: 120.


In further aspects, the invention provides a peptide fusion construct comprising one or more antigenic regions of the SARS-COV-2 S protein, where the one or more antigenic regions comprise or consist of the following components: FP, D1, D2 and/or B1, wherein FP comprises residues 815-833 of the SARS-COV-2 S protein, wherein D1 comprises residues 820-846 of the SARS-COV-2 S protein, wherein D2 comprises residues 1078-1111 of the SARS-COV-2 S protein, and wherein B1 comprises residues 798-829 of the SARS-COV-2 S protein. The peptide fusion construct may have the following structure: D1-linker-FP-linker-D2-linker-D1. D1 may have the sequence of SEQ ID NO: 22. FP may have the sequence of SEQ ID NO: 21. The linker comprises or consists of the amino acid sequence GGGGS. For example, the peptide fusion construct may comprise or consist of the sequence of SEQ ID NO: 25, 51 or 55. Alternatively, the peptide fusion construct may have the following structure: FP-linker-FP-linker-FP, D1-linker-D1-linker-D1, or FP/D1-linker-FP/D1-linker-FP/D1. The FP/D1 portion may have the sequence of SEQ ID NO: 99. The linker may comprise or consist of the amino acid sequence GGGGS. For example, the peptide fusion construct may comprise or consist of the sequence of SEQ ID NO: 27, 53 or 57.
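The assembly of such a peptide fusion construct from residue ranges and a GGGGS linker can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, and the placeholder protein is an arbitrary repeating string that merely provides enough positions to index. It is not the SARS-COV-2 S protein and not a sequence from the Sequence Listing.

```python
LINKER = "GGGGS"

def residue_range(protein, first, last):
    """Return residues first..last of a protein (1-based, inclusive)."""
    return protein[first - 1:last]

def fuse(segments, linker=LINKER):
    """Join segments in order, placing the linker between neighbouring segments."""
    return linker.join(segments)

# Placeholder of 1200 residues (hypothetical, NOT the spike protein).
placeholder = "ACDEFGHIKLMNPQRSTVWY" * 60

fp = residue_range(placeholder, 815, 833)    # FP: residues 815-833
d1 = residue_range(placeholder, 820, 846)    # D1: residues 820-846
d2 = residue_range(placeholder, 1078, 1111)  # D2: residues 1078-1111

# Structure D1-linker-FP-linker-D2-linker-D1 as described above.
construct = fuse([d1, fp, d2, d1])
```

With these ranges, FP, D1 and D2 contain 19, 27 and 34 residues respectively, so the assembled construct has 27 + 19 + 34 + 27 residues plus three five-residue linkers, i.e., 122 residues in total.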


The invention also provides a pharmaceutical composition comprising the SARS-COV-2 antigen or the peptide fusion construct of the invention. In some embodiments, the pharmaceutical composition further comprises an adjuvant. In certain embodiments, the adjuvant is selected from alum, CpG, PolyI:C, MF59, AS01, AS02, AS03, AS04, AF03, flagellin, ISCOMs and ISCOMMATRIX. In some aspects, the pharmaceutical composition is for use in treating or preventing an infection with SARS-COV-2. In some embodiments, the pharmaceutical composition is administered parenterally. In some embodiments, the pharmaceutical composition is administered intradermally, subcutaneously, or intramuscularly. In some embodiments, the pharmaceutical composition is administered intramuscularly. In some embodiments, the pharmaceutical composition is administered at least once. In some embodiments, the pharmaceutical composition is administered at least twice. In some embodiments, the period between administrations is at least 2 weeks, e.g., 1 month. In some embodiments, the period between administrations is about 3 weeks.


In a particular embodiment, the invention provides an mRNA construct consisting of the following structural elements:

    • (i) a 5′ cap with the following structure:




[Embedded image: chemical structure of the 5′ cap]




    • (ii) a 5′ untranslated region (5′ UTR) having the nucleic acid sequence of SEQ ID NO: 144;

    • (iii) a protein coding region having the nucleic acid sequence of SEQ ID NO: 148;

    • (iv) a 3′ untranslated region (3′ UTR) having the nucleic acid sequence of SEQ ID NO: 145; and

    • (v) a polyA tail.





In a specific embodiment, the invention provides a lipid nanoparticle encapsulating said mRNA construct. The lipid nanoparticle may comprise a cationic lipid (e.g., cKK-E12, cKK-E10, OF-Deg-Lin or OF-02), a non-cationic lipid (e.g., DOPE or DEPE), a cholesterol-based lipid (e.g., cholesterol) and a PEG-modified lipid (e.g., DMG-PEG-2K). In a particular embodiment, the mRNA construct or the lipid nanoparticle encapsulating it is provided as an immunogenic composition. In some embodiments, the immunogenic composition comprises between 10 μg and 200 μg of the mRNA construct. In particular embodiments, the immunogenic composition comprises between 15 μg and 135 μg (e.g., between 15 μg and 45 μg) of the mRNA construct. In some embodiments, the immunogenic composition may comprise at least 20 μg, at least 25 μg, at least 30 μg, at least 35 μg, at least 40 μg, or at least 45 μg of the mRNA construct. In specific embodiments, the immunogenic composition comprises 15 μg, 45 μg or 135 μg of the mRNA construct. The invention further provides a method of treating or preventing a SARS-COV-2 infection, wherein said method comprises administering to a subject an effective amount of the immunogenic composition. In some embodiments, the immunogenic composition is administered to the subject at least twice. In some embodiments, the period between administrations is at least 2 weeks. In some embodiments, the period between administrations is about 3 weeks.


In certain embodiments, the invention is further described by the following numbered embodiments:

    • 1. A nucleic acid comprising an optimized nucleotide sequence encoding a SARS-COV-2 antigen, wherein the optimized nucleotide sequence consists of codons associated with a usage frequency which is greater than or equal to 10%; wherein the optimized nucleotide sequence:
    • (i) does not contain a termination signal having one of the following nucleotide sequences:
    • 5′-X1ATCTX2TX3-3′, wherein X1, X2 and X3 are independently selected from A, C, T or G; and 5′-X1AUCUX2UX3-3′, wherein X1, X2 and X3 are independently selected from A, C, U or G;
    • (ii) does not contain any negative cis-regulatory elements and negative repeat elements; and
    • (iii) has a codon adaptation index greater than 0.8;
    • wherein, when divided into non-overlapping 30 nucleotide-long portions, each portion of the optimized nucleotide sequence has a guanine cytosine content range of 30%-70%.
    • 2. The nucleic acid of embodiment 1, wherein the optimized nucleotide sequence does not contain a termination signal having one of the following sequences: TATCTGTT; TTTTTT; AAGCTT; GAAGAGC; TCTAGA; UAUCUGUU; UUUUUU; AAGCUU; GAAGAGC; UCUAGA.
    • 3. The nucleic acid of embodiment 1 or 2, wherein the nucleic acid is mRNA.
    • 4. The nucleic acid of embodiment 1 or 2, wherein the nucleic acid is DNA.
    • 5. The nucleic acid of any one of the preceding embodiments, wherein the optimized nucleotide sequence encodes the SARS-COV-2 spike protein or an antigenic fragment thereof.
    • 6. The nucleic acid of embodiment 5, wherein the optimized nucleotide sequence encodes the full-length SARS-COV-2 spike protein.
    • 7. The nucleic acid of embodiment 5 or embodiment 6, wherein the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO:1.
    • 8. The nucleic acid of embodiment 5, wherein the optimized nucleotide sequence encodes the ectodomain of the SARS-COV-2 spike protein or an antigenic fragment thereof.
    • 9. The nucleic acid of embodiment 8, wherein the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO:2.
    • 10. The nucleic acid of embodiment 5, wherein the antigenic fragment comprises the receptor-binding domain (RBD) of the SARS-COV-2 spike protein.
    • 11. The nucleic acid of embodiment 10, wherein the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO:6.
    • 12. The nucleic acid of embodiment 10 or 11, wherein the antigenic fragment further comprises a signal sequence.
    • 13. The nucleic acid of embodiment 12, wherein the signal sequence is SEQ ID NO: 7.
    • 14. The nucleic acid of embodiment 12 or embodiment 13, wherein the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO:8.
    • 15. The nucleic acid of embodiment 12, wherein the signal sequence is SEQ ID NO: 142.
    • 16. The nucleic acid of embodiment 12 or embodiment 13, wherein the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO:143.
    • 17. The nucleic acid of embodiments 10-16, wherein the antigenic fragment further comprises an Fc region.
    • 18. The nucleic acid of embodiment 17, wherein the Fc region is SEQ ID NO: 18.
    • 19. The nucleic acid of embodiments 10-18, wherein the antigenic fragment further comprises a signal sequence and an Fc region.
    • 20. The nucleic acid of embodiments 10-18, wherein the antigenic fragment consists of the RBD of the SARS-COV-2 spike protein operably linked to a signal sequence and an Fc region.
    • 21. The nucleic acid of embodiment 20, wherein the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO:20.
    • 22. The nucleic acid of any one of embodiment 5, embodiment 6 or embodiment 8, wherein the SARS-COV-2 spike protein, the ectodomain thereof or the antigenic fragment thereof has been modified relative to naturally occurring SARS-COV-2 spike protein to assume a stable prefusion conformation.
    • 23. The nucleic acid of embodiment 22, wherein the SARS-COV-2 spike protein, the ectodomain or the antigenic fragment has been modified relative to naturally occurring SARS-CoV-2 spike protein to remove the furin cleavage site required for activation.
    • 24. The nucleic acid of embodiment 23, wherein the optimized nucleotide sequence encodes a SARS-COV-2 spike protein which has been modified relative to naturally occurring SARS-COV-2 spike protein to remove the furin cleavage site required for activation.
    • 25. The nucleic acid of embodiment 23, wherein the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO:9.
    • 26. The nucleic acid of embodiments 22-25, wherein the SARS-COV-2 spike protein, the ectodomain thereof or the antigenic fragment thereof has been modified relative to naturally occurring SARS-COV-2 spike protein to mutate residue 985 to proline and/or mutate residues 986 and 987 to proline.
    • 27. The nucleic acid of embodiment 26, wherein the optimized nucleotide sequence encodes a SARS-COV-2 spike protein which has been modified relative to naturally occurring SARS-COV-2 spike protein to mutate residues 986 and 987 to proline.
    • 28. The nucleic acid of embodiment 27, wherein the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO: 10 or SEQ ID NO: 118.
    • 29. The nucleic acid of embodiment 26, wherein the optimized nucleotide sequence encodes a SARS-COV-2 spike protein which has been modified relative to naturally occurring SARS-COV-2 spike protein to mutate residues 985, 986 and 987 to proline.
    • 30. The nucleic acid of embodiment 29, wherein the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO:92.
    • 31. The nucleic acid of embodiments 22-30, wherein the SARS-COV-2 spike protein, the ectodomain thereof or the antigenic fragment thereof has been modified relative to naturally occurring SARS-COV-2 spike protein to remove the furin cleavage site and to mutate
    • (a) residue 985 to proline; and/or
    • (b) residues 986 and 987 to proline.
    • 32. The nucleic acid of embodiment 31, wherein the SARS-COV-2 spike protein, the ectodomain of the SARS-COV-2 spike protein or the antigenic fragment thereof has been modified relative to naturally occurring SARS-COV-2 spike protein to remove the furin cleavage site and to mutate residues 986 and 987 to proline.
    • 33. The nucleic acid of embodiment 32, wherein the optimized nucleotide sequence encodes a SARS-COV-2 spike protein.
    • 34. The nucleic acid of embodiment 33, wherein the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO:11 or SEQ ID NO: 120, optionally wherein the optimized nucleotide sequence has the nucleic acid sequence of SEQ ID NO: 44 or SEQ ID NO: 148.
    • 35. The nucleic acid of embodiment 32, wherein the optimized nucleotide sequence encodes the ectodomain of the SARS-COV-2 spike protein.
    • 36. The nucleic acid of embodiment 35, wherein the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO:12.
    • 37. The nucleic acid of embodiment 31, wherein the SARS-COV-2 spike protein, the ectodomain thereof or the antigenic fragment thereof has been modified relative to naturally occurring SARS-COV-2 spike protein to remove the furin cleavage site and to mutate residues 985, 986 and 987 to proline.
    • 38. The nucleic acid of embodiment 37, wherein the optimized nucleotide sequence encodes a SARS-COV-2 spike protein.
    • 39. The nucleic acid of embodiment 38, wherein the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO:94.
    • 40. The nucleic acid of embodiments 22-39, wherein the SARS-COV-2 spike protein, the ectodomain thereof or the antigenic fragment thereof has been modified relative to naturally occurring SARS-COV-2 spike protein to mutate residues 986 and 987 to proline and to contain the D614G mutation.
    • 41. The nucleic acid of embodiment 40, wherein the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO: 118.
    • 42. The nucleic acid of embodiments 22-41, wherein the SARS-COV-2 spike protein, the ectodomain thereof or the antigenic fragment thereof has been modified relative to naturally occurring SARS-COV-2 spike protein to remove the furin cleavage site, to mutate residues 986 and 987 to proline and to contain the D614G mutation.
    • 43. The nucleic acid of embodiment 42, wherein the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO: 120.
    • 44. The nucleic acid of embodiments 22-43, wherein the SARS-COV-2 spike protein, the ectodomain thereof or the antigenic fragment thereof has been modified relative to naturally occurring SARS-COV-2 spike protein to mutate residues 817, 892, 899, 942, 986 and 987 to proline.
    • 45. The nucleic acid of embodiment 44, wherein the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO: 129.
    • 46. The nucleic acid of embodiments 22-45, wherein the SARS-COV-2 spike protein, the ectodomain thereof or the antigenic fragment thereof has been modified relative to naturally occurring SARS-COV-2 spike protein to remove the furin cleavage site and to mutate residues 817, 892, 899, 942, 986 and 987 to proline.
    • 47. The nucleic acid of embodiment 46, wherein the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO: 131.
    • 48. The nucleic acid of embodiments 22-47, wherein the SARS-COV-2 spike protein, the ectodomain thereof or the antigenic fragment thereof has been modified relative to naturally occurring SARS-COV-2 spike protein to mutate residues 817, 892, 899, 942, 986 and 987 to proline and which contains the D614G mutation.
    • 49. The nucleic acid of embodiment 48, wherein the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO: 133.
    • 50. The nucleic acid of embodiments 22-49, wherein the SARS-COV-2 spike protein, the ectodomain thereof or the antigenic fragment thereof has been modified relative to naturally occurring SARS-COV-2 spike protein to remove the furin cleavage site and to mutate residues 817, 892, 899, 942, 986 and 987 to proline and which contains the D614G mutation.
    • 51. The nucleic acid of embodiment 50, wherein the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO: 135.
    • 52. The nucleic acid of embodiments 22-51, wherein the SARS-COV-2 spike protein, the ectodomain thereof or the antigenic fragment thereof has been modified relative to naturally occurring SARS-COV-2 spike protein to remove the furin cleavage site, to mutate residues 986 and 987 to proline and which contains an extended N-terminal signal peptide.
    • 53. The nucleic acid of embodiment 52, wherein the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO: 123.
    • 54. The nucleic acid of embodiments 22-53, wherein the SARS-COV-2 spike protein, the ectodomain thereof or the antigenic fragment thereof has been modified relative to naturally occurring SARS-COV-2 spike protein to remove the furin cleavage site, to mutate residues 817, 892, 899, 942, 986 and 987 to proline and which contains an extended N-terminal signal peptide.
    • 55. The nucleic acid of embodiment 54, wherein the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO: 137.
    • 56. The nucleic acid of embodiments 22-55, wherein the SARS-COV-2 spike protein, the ectodomain thereof or the antigenic fragment thereof has been modified relative to naturally occurring SARS-COV-2 spike protein to mutate the ER retrieval signal.
    • 57. The nucleic acid of embodiment 56, wherein the SARS-COV-2 spike protein, the ectodomain thereof or the antigenic fragment thereof has been modified relative to naturally occurring SARS-COV-2 spike protein to remove the furin cleavage site, to mutate residues 986 and 987 to proline and to remove the ER retrieval signal.
    • 58. The nucleic acid of embodiment 57, wherein the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO: 125.
    • 59. The nucleic acid of embodiment 56, wherein the SARS-COV-2 spike protein, the ectodomain thereof or the antigenic fragment thereof has been modified relative to naturally occurring SARS-COV-2 spike protein to remove the furin cleavage site, to mutate residues 986 and 987 to proline, to remove the ER retrieval signal and which contains an extended N-terminal signal peptide.
    • 60. The nucleic acid of embodiment 59, wherein the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO: 127.
    • 61. The nucleic acid of embodiment 56, wherein the SARS-COV-2 spike protein, the ectodomain thereof or the antigenic fragment thereof has been modified relative to naturally occurring SARS-COV-2 spike protein to remove the furin cleavage site, to mutate residues 817, 892, 899, 942, 986 and 987 to proline and to remove the ER retrieval signal.
    • 62. The nucleic acid of embodiment 61, wherein the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO: 139.
    • 63. The nucleic acid of embodiment 56, wherein the SARS-COV-2 spike protein, the ectodomain thereof or the antigenic fragment thereof has been modified relative to naturally occurring SARS-COV-2 spike protein to remove the furin cleavage site, to mutate residues 817, 892, 899, 942, 986 and 987 to proline, to remove the ER retrieval signal and which contains an extended N-terminal signal peptide.
    • 64. The nucleic acid of embodiment 63, wherein the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO: 141.
    • 65. The nucleic acid of embodiment 5, wherein the antigenic fragment comprises or consists of the S1, S2 or S2′ subunit of the SARS-COV-2 spike protein.
    • 66. The nucleic acid of embodiment 65, wherein the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO: 3, SEQ ID NO: 4 or SEQ ID NO: 5.
    • 67. The nucleic acid of embodiments 1-4, wherein the optimized nucleotide sequence encodes a fusion peptide comprising one or more antigenic fragments of the SARS-COV-2 spike protein.
    • 68. The nucleic acid of embodiment 67, wherein the one or more antigenic fragments of the SARS-COV-2 spike protein has/have the amino acid sequence of SEQ ID NO: 21, the amino acid sequence of SEQ ID NO: 22, the amino acid sequence of SEQ ID NO: 23 and/or the amino acid sequence of SEQ ID NO: 24.
    • 69. The nucleic acid of embodiment 67 or 68, wherein the one or more antigenic fragments are linked by a linker sequence, e.g., GGGGS.
    • 70. The nucleic acid of embodiment 69, wherein the optimized nucleotide sequence encodes a fusion peptide comprising SEQ ID NO: 25 or SEQ ID NO: 27.
    • 71. The nucleic acid of any one of embodiments 67-70, wherein the fusion peptide comprises an N-terminal signal sequence.
    • 72. The nucleic acid of embodiment 71, wherein the optimized nucleotide sequence encodes a fusion peptide comprising SEQ ID NO: 51 or SEQ ID NO: 53.
    • 73. The nucleic acid of any one of embodiments 67-72, wherein the fusion peptide comprises a C-terminal Fc domain.
    • 74. The nucleic acid of any one of embodiments 67-73, wherein the fusion peptide comprises an N-terminal signal sequence and a C-terminal Fc domain.
    • 75. The nucleic acid of embodiment 74, wherein the optimized nucleotide sequence encodes a fusion peptide comprising SEQ ID NO: 55 or SEQ ID NO: 57.
    • 76. The nucleic acid of any one of embodiments 1 to 75 for use in therapy.
    • 77. An immunogenic composition comprising the nucleic acid of any one of embodiments 1-76 for use in prophylaxis of an infection with SARS-COV-2.
    • 78. A method of treating or preventing a SARS-COV-2 infection, said method comprising administering to a subject an effective amount of an immunogenic composition comprising the nucleic acid of any one of embodiments 1-76.
    • 79. A pharmaceutical composition comprising i) the nucleic acid of any one of embodiments 1-76 and ii) a lipid nanoparticle.
    • 80. The pharmaceutical composition of embodiment 79, wherein the nucleic acid is encapsulated in the lipid nanoparticle.
    • 81. The pharmaceutical composition of embodiment 79 or embodiment 80, wherein the lipid nanoparticle comprises one or more of a cationic lipid, a non-cationic lipid, a cholesterol-based lipid, a PEG-modified lipid, or a combination thereof.
    • 82. The pharmaceutical composition of embodiment 81, wherein the lipid nanoparticle comprises a cationic lipid, a non-cationic lipid, a cholesterol-based lipid, and a PEG-modified lipid.
    • 83. The pharmaceutical composition of embodiment 79, wherein the lipid nanoparticle comprises a cationic lipid, a non-cationic lipid, and a PEG-modified lipid.
    • 84. The pharmaceutical composition of any one of embodiments 79-83, wherein the lipid nanoparticle comprises:
    • a. a cationic lipid selected from DOTAP (1,2-dioleyl-3-trimethylammonium propane), DODAP (1,2-dioleyl-3-dimethylammonium propane), DOTMA (N-[1-(2,3-dioleyloxy) propyl]-N,N,N-trimethylammonium chloride), DLinKC2DMA, DLin-KC2-DM, C12-200, cKK-E12, cKK-E10, HGT5000, HGT5001, HGT4003, ICE, HGT4001, HGT4002, TL1-01D-DMA, TL1-04D-DMA, TL1-08D-DMA, TL1-10D-DMA, OF-Deg-Lin and OF-02;
    • b. a non-cationic lipid selected from DSPC (1,2-distearoyl-sn-glycero-3-phosphocholine), DPPC (1,2-dipalmitoyl-sn-glycero-3-phosphocholine), DOPE (1,2-dioleyl-sn-glycero-3-phosphoethanolamine), DEPE (1,2-dierucoyl-sn-glycero-3-phosphoethanolamine), DOPC (1,2-dioleyl-sn-glycero-3-phosphatidylcholine), DPPE (1,2-dipalmitoyl-sn-glycero-3-phosphoethanolamine), DMPE (1,2-dimyristoyl-sn-glycero-3-phosphoethanolamine), and DOPG (1,2-dioleoyl-sn-glycero-3-phospho-(1′-rac-glycerol));
    • c. a cholesterol-based lipid selected from DC-Chol (N,N-dimethyl-N-ethylcarboxamidocholesterol), 1,4-bis(3-N-oleylamino-propyl) piperazine, or imidazole cholesterol ester (ICE); and/or
    • d. a PEG-modified lipid selected from PEGylated cholesterol and DMG-PEG-2K.
    • 85. The pharmaceutical composition of embodiment 82, wherein
    • a. the cationic lipid is selected from cKK-E12, cKK-E10, OF-Deg-Lin and OF-02;
    • b. the non-cationic lipid is selected from DOPE and DEPE;
    • c. the cholesterol-based lipid is cholesterol; and
    • d. the PEG-modified lipid is DMG-PEG-2K.
    • 86. The pharmaceutical composition of any one of embodiments 79-85, wherein the cationic lipid constitutes about 30-60% of the lipid nanoparticle by molar ratio, e.g., about 35-40%.
    • 87. The pharmaceutical composition of any one of embodiments 79-86, wherein the ratio of cationic lipid to non-cationic lipid to cholesterol-based lipid to PEG-modified lipid is approximately 30-60:25-35:20-30:1-15 by molar ratio or wherein the ratio of cationic lipid to non-cationic lipid to PEG-modified lipid is approximately 55-65:30-40:1-15 by molar ratio.
    • 88. The pharmaceutical composition of any one of embodiments 79-87, wherein the lipid nanoparticle includes a combination of a cationic lipid, a non-cationic lipid, a PEG-modified lipid and optionally cholesterol selected from cKK-E12, DOPE, cholesterol and DMG-PEG2K; cKK-E10, DOPE, cholesterol and DMG-PEG2K; OF-Deg-Lin, DOPE, cholesterol and DMG-PEG2K; OF-02, DOPE, cholesterol and DMG-PEG2K; TL1-01D-DMA, DOPE, cholesterol and DMG-PEG2K; TL1-04D-DMA, DOPE, cholesterol and DMG-PEG2K; TL1-08D-DMA, DOPE, cholesterol and DMG-PEG2K; TL1-10D-DMA, DOPE, cholesterol and DMG-PEG2K; ICE, DOPE and DMG-PEG2K; HGT4001, DOPE and DMG-PEG2K; or HGT4002, DOPE and DMG-PEG2K.
    • 89. The pharmaceutical composition of any one of embodiments 79-88, wherein the lipid nanoparticle has an average size of less than 150 nm, e.g., less than 100 nm.
    • 90. The pharmaceutical composition of embodiment 89, wherein the lipid nanoparticle has an average size of about 50-70 nm, e.g., about 55-65 nm.
    • 91. The pharmaceutical composition of any one of embodiments 79-90, wherein the lipid nanoparticles are suspended in 10% trehalose in water for injection.
    • 92. The pharmaceutical composition of any one of embodiments 79-91, wherein the nucleic acid is mRNA at a concentration of between about 0.5 mg/mL and about 1.0 mg/mL.
    • 93. The pharmaceutical composition of any one of embodiments 79-92 for use in treating or preventing an infection with SARS-COV-2.
    • 94. The pharmaceutical composition for use according to any one of embodiments 79-93, wherein the pharmaceutical composition is administered parenterally.
    • 95. The pharmaceutical composition for use according to any one of embodiments 79-93, wherein the pharmaceutical composition is administered intravenously, intradermally, subcutaneously, or intramuscularly.
    • 96. The pharmaceutical composition for use according to embodiment 95, wherein the pharmaceutical composition is administered intravenously.
    • 97. The pharmaceutical composition for use according to embodiment 95, wherein the pharmaceutical composition is administered intramuscularly.
    • 98. The pharmaceutical composition for use according to any one of embodiments 79-97, wherein the pharmaceutical composition is administered at least once.
    • 99. The pharmaceutical composition for use according to embodiment 98, wherein the pharmaceutical composition is administered at least twice.
    • 100. The pharmaceutical composition for use according to embodiment 99, wherein the period between administrations is at least 2 weeks, e.g. 3 weeks, or 1 month.
    • 101. A SARS-COV-2 antigen, wherein the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO: 1.
    • 102. A SARS-COV-2 antigen, wherein the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO: 10.
    • 103. A SARS-COV-2 antigen, wherein the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO: 9.
    • 104. A SARS-COV-2 antigen, wherein the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO: 11.
    • 105. A SARS-COV-2 antigen, wherein the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:2.
    • 106. A SARS-COV-2 antigen, wherein the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:12.
    • 107. A SARS-COV-2 antigen, wherein the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:3.
    • 108. A SARS-COV-2 antigen, wherein the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:8.
    • 109. A SARS-COV-2 antigen, wherein the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:20.
    • 110. A SARS-COV-2 antigen, wherein the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO: 17.
    • 111. A SARS-COV-2 antigen, wherein the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO: 14.
    • 112. A SARS-COV-2 antigen, wherein the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:16.
    • 113. A SARS-COV-2 antigen, wherein the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:66.
    • 114. A SARS-COV-2 antigen, wherein the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:15.
    • 115. A SARS-COV-2 antigen, wherein the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:82.
    • 116. A SARS-COV-2 antigen, wherein the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:84.
    • 117. A SARS-COV-2 antigen, wherein the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:74.
    • 118. A SARS-COV-2 antigen, wherein the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:76.
    • 119. A SARS-COV-2 antigen, wherein the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:78.
    • 120. A SARS-COV-2 antigen, wherein the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:80.
    • 121. A SARS-COV-2 antigen, wherein the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:68.
    • 122. A SARS-COV-2 antigen, wherein the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:70.
    • 123. A SARS-COV-2 antigen, wherein the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:96.
    • 124. A SARS-COV-2 antigen, wherein the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:86.
    • 125. A SARS-COV-2 antigen, wherein the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:88.
    • 126. A SARS-COV-2 antigen, wherein the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:90.
    • 127. A SARS-COV-2 antigen, wherein the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:92.
    • 128. A SARS-COV-2 antigen, wherein the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO:94.
    • 129. A SARS-COV-2 antigen, wherein the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO: 118.
    • 130. A SARS-COV-2 antigen, wherein the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO: 120.
    • 131. A SARS-COV-2 antigen, wherein the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO: 123.
    • 132. A SARS-COV-2 antigen, wherein the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO: 125.
    • 133. A SARS-COV-2 antigen, wherein the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO: 127.
    • 134. A SARS-COV-2 antigen, wherein the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO: 129.
    • 135. A SARS-COV-2 antigen, wherein the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO: 131.
    • 136. A SARS-COV-2 antigen, wherein the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO: 133.
    • 137. A SARS-COV-2 antigen, wherein the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO: 135.
    • 138. A SARS-COV-2 antigen, wherein the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO: 139.
    • 139. A SARS-COV-2 antigen, wherein the SARS-COV-2 antigen is a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO: 141.
    • 140. A peptide fusion construct comprising one or more antigenic regions of the SARS-COV-2 S protein, wherein the one or more antigenic regions comprise or consist of the following components: FP, D1, D2 and/or B1, wherein FP comprises residues 815-833 of the SARS-COV-2 S protein, wherein D1 comprises residues 820-846 of the SARS-COV-2 S protein, wherein D2 comprises residues 1078-1111 of the SARS-COV-2 S protein, and wherein B1 comprises residues 798-829 of the SARS-COV-2 S protein.
    • 141. The peptide fusion construct according to embodiment 140, wherein the peptide fusion construct has the following structure: D1-linker-FP-linker-D2-linker-D1.
    • 142. The peptide fusion construct according to embodiment 141, wherein D1 has the sequence of SEQ ID NO: 22.
    • 143. The peptide fusion construct according to embodiment 140 or 141, wherein FP has the sequence of SEQ ID NO: 21.
    • 144. The peptide fusion construct according to any one of embodiments 140, 141 and 142, wherein the linker comprises or consists of the amino acid sequence GGGGS.
    • 145. The peptide fusion construct according to any one of embodiments 140-144, comprising or consisting of the sequence of SEQ ID NO: 25, 51 or 55.
    • 146. The peptide fusion construct according to embodiment 140, wherein the peptide fusion construct has the following structure: FP-linker-FP-linker-FP, D1-linker-D1-linker-D1, or FP/D1-linker-FP/D1-linker-FP/D1.
    • 147. The peptide fusion construct according to embodiment 146, wherein the FP/D1 portion has the sequence of SEQ ID NO: 99.
    • 148. The peptide fusion construct according to embodiment 146 or 147, wherein the linker comprises or consists of the amino acid sequence GGGGS.
    • 149. The peptide fusion construct according to any one of embodiments 146-148, comprising or consisting of the sequence of SEQ ID NO: 27, 53 or 57.
    • 150. A pharmaceutical composition comprising the SARS-COV-2 antigen of any one of embodiments 101-131 or the peptide fusion construct of any one of embodiments 146-149.
    • 151. The pharmaceutical composition of embodiment 150, further comprising an adjuvant.
    • 152. The pharmaceutical composition of embodiment 151, wherein the adjuvant is selected from alum, CpG, PolyI:C, MF59, AS01, AS02, AS03, AS04, AF03, flagellin, ISCOMs and ISCOMMATRIX.
    • 153. The pharmaceutical composition of any one of embodiments 150-152 for use in treating or preventing an infection with SARS-COV-2.
    • 154. The pharmaceutical composition for use according to embodiment 153, wherein the pharmaceutical composition is administered parenterally.
    • 155. The pharmaceutical composition for use according to embodiment 154, wherein the pharmaceutical composition is administered intradermally, subcutaneously, or intramuscularly.
    • 156. The pharmaceutical composition for use according to embodiment 155, wherein the pharmaceutical composition is administered intramuscularly.
    • 157. The pharmaceutical composition for use according to any one of embodiments 153-156, wherein the pharmaceutical composition is administered at least once.
    • 158. The pharmaceutical composition for use according to any one of embodiments 153-156, wherein the pharmaceutical composition is administered at least twice.
    • 159. The pharmaceutical composition for use according to embodiment 158, wherein the period between administrations is at least 2 weeks, e.g. 3 weeks, or 1 month.
    • 160. An mRNA construct consisting of the following structural elements:
    • (i) a 5′ cap with the following structure:

    [embedded image: chemical structure of the 5′ cap]

    • (ii) a 5′ untranslated region (5′ UTR) having the nucleic acid sequence of SEQ ID NO: 144;

    • (iii) a protein coding region having the nucleic acid sequence of SEQ ID NO: 148;

    • (iv) a 3′ untranslated region (3′ UTR) having the nucleic acid sequence of SEQ ID NO: 145; and

    • (v) a poly A tail.

    • 161. A lipid nanoparticle encapsulating the mRNA construct of embodiment 160.

    • 162. The lipid nanoparticle of embodiment 161, wherein the lipid nanoparticle comprises a cationic lipid, a non-cationic lipid, a cholesterol-based lipid and a PEG-modified lipid.

    • 163. The lipid nanoparticle of embodiment 161 or 162, wherein the cationic lipid is selected from cKK-E12, cKK-E10, OF-Deg-Lin and OF-02; the non-cationic lipid is selected from DOPE and DEPE; the cholesterol-based lipid is cholesterol; and the PEG-modified lipid is DMG-PEG-2K.

    • 164. An immunogenic composition comprising the mRNA construct of embodiment 160 or the lipid nanoparticle of any of embodiments 161-163.

    • 165. The immunogenic composition according to embodiment 164 comprising between 10 μg and 200 μg of the mRNA construct.

    • 166. The immunogenic composition according to embodiment 165 comprising between 15 μg and 135 μg of the mRNA construct.

    • 167. The immunogenic composition according to embodiment 166 comprising at least 20 μg of the mRNA construct.

    • 168. The immunogenic composition according to embodiment 166 comprising at least 25 μg of the mRNA construct.

    • 169. The immunogenic composition according to embodiment 166 comprising at least 35 μg of the mRNA construct.

    • 170. The immunogenic composition according to embodiment 166 comprising at least 40 μg of the mRNA construct.

    • 171. The immunogenic composition according to embodiment 166 comprising at least 45 μg of the mRNA construct.

    • 172. The immunogenic composition according to embodiment 166 comprising 15 μg, 45 μg or 135 μg of the mRNA construct.

    • 173. A method of treating or preventing a SARS-COV-2 infection, said method comprising administering to a subject an effective amount of the immunogenic composition of any one of embodiments 164 to 172.

    • 174. The method of embodiment 173, wherein the immunogenic composition is administered to the subject at least twice.

    • 175. The method of embodiment 174, wherein the period between administrations is at least 2 weeks, e.g., 3 weeks, or 1 month.

    • 176. An immunogenic composition comprising at least two nucleic acids, for use in prophylaxis of an infection with SARS-COV-2, wherein a first nucleic acid comprises an optimized nucleotide sequence which encodes an amino acid sequence comprising a sequence selected from SEQ ID NO: 1, 2, 3, 4, 5, 8, 9, 10, 11, 12, 14, 15, 16, 17, 19, 20, 35, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 104, 106, 108, 110, 118, 120, 123, 125, 127, 129, 131, 133, 135, 137, 139 or 141, and wherein one or more further nucleic acid(s) comprise(s) an optimized nucleotide sequence which encodes an amino acid sequence comprising a sequence selected from SEQ ID NO: 151, 153, 155, 157, 159, 161, 163, 165, 167, 169 or 171.

    • 177. An immunogenic composition comprising at least two nucleic acids, for use in prophylaxis of an infection with SARS-COV-2, wherein a first nucleic acid comprises an optimized nucleotide sequence which encodes an amino acid sequence comprising SEQ ID NO: 11, optionally wherein the optimized nucleotide sequence has the nucleic acid sequence of SEQ ID NO: 44, and

    • wherein one or more further nucleic acid(s) is (are) selected from:
      • (a) a nucleic acid comprising an optimized nucleotide sequence which encodes an amino acid sequence comprising a sequence selected from SEQ ID NO: 157, optionally wherein the optimized nucleotide sequence has the nucleic acid sequence of SEQ ID NO: 156, and
      • (b) a nucleic acid comprising an optimized nucleotide sequence that encodes an amino acid sequence comprising a sequence selected from SEQ ID NO: 163, optionally wherein the optimized nucleotide sequence has the nucleic acid sequence of SEQ ID NO: 162; and
      • (c) a nucleic acid comprising an optimized nucleotide sequence that encodes an amino acid sequence comprising a sequence selected from SEQ ID NO: 167, optionally wherein the optimized nucleotide sequence has the nucleic acid sequence of SEQ ID NO: 166; and
      • (d) a nucleic acid comprising an optimized nucleotide sequence that encodes an amino acid sequence comprising a sequence selected from SEQ ID NO: 171, optionally wherein the optimized nucleotide sequence has the nucleic acid sequence of SEQ ID NO: 170.

    • 178. The immunogenic composition according to embodiment 176 or embodiment 177, wherein the at least two nucleic acids are mRNA.

    • 179. The immunogenic composition according to embodiment 178, wherein the first nucleic acid comprises an optimized nucleotide sequence which encodes an amino acid sequence comprising SEQ ID NO:11 and wherein the optimized nucleotide sequence has the nucleic acid sequence of SEQ ID NO: 148.

    • 180. The immunogenic composition according to any one of embodiments 176-178, wherein the nucleic acids are encapsulated in a lipid nanoparticle.

    • 181. A method of treating or preventing a SARS-COV-2 infection, said method comprising administering to a subject an effective amount of the immunogenic composition of any one of embodiments 176-179.





EXAMPLES
Example 1. Generating Optimized Nucleotide Sequences

This example illustrates a process that results in optimized nucleotide sequences in accordance with the invention that are optimized to yield full-length transcripts during in vitro synthesis and result in high levels of expression of the encoded protein.


The process combines the codon optimization method of FIG. 1A with a sequence of filtering steps illustrated in FIG. 1B to generate a list of optimized nucleotide sequences. Specifically, as illustrated in FIG. 1A, the process receives an amino acid sequence of interest and a first codon usage table which reflects the frequency of each codon in a given organism (namely human codon usage preferences in the context of the present example). The process then removes codons from the first codon usage table if they are associated with a codon usage frequency which is less than a threshold frequency (10%). The codon usage frequencies of the codons not removed in the first step are normalized to generate a normalized codon usage table.


Normalizing the codon usage table involves re-distributing the usage frequency value for each removed codon; the usage frequency for a given removed codon is added to the usage frequencies of the other codons with which the removed codon shares an amino acid. In this example, the re-distribution is proportional to the magnitude of the usage frequencies of the codons not removed from the table. The process uses the normalized codon usage table to generate a list of optimized nucleotide sequences. Each of the optimized nucleotide sequences encodes the amino acid sequence of interest.
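The threshold removal and proportional re-distribution described above can be illustrated with a minimal Python sketch. The function names, the data layout (amino acid mapped to a dictionary of codon frequencies) and the example frequencies are assumptions made for illustration only. The specification does not state how sequences are generated from the normalized table; weighted random sampling is used here as one plausible reading, not as the method of FIG. 1A.

```python
import random


def normalize_codon_table(usage, threshold=0.10):
    """Drop codons used less often than `threshold` and re-distribute their share.

    `usage` maps amino acid -> {codon: usage frequency among that amino acid's codons}.
    """
    normalized = {}
    for amino_acid, codons in usage.items():
        kept = {c: f for c, f in codons.items() if f >= threshold}
        if not kept:
            raise ValueError(f"no codon for {amino_acid} meets the threshold")
        removed_total = sum(f for f in codons.values() if f < threshold)
        kept_total = sum(kept.values())
        # Each kept codon receives a share of the removed frequency that is
        # proportional to its own frequency.
        normalized[amino_acid] = {
            c: f + removed_total * f / kept_total for c, f in kept.items()
        }
    return normalized


def sample_sequences(protein, table, n, seed=0):
    """Assumed generation step: draw each codon with its normalized frequency."""
    rng = random.Random(seed)
    sequences = []
    for _ in range(n):
        codons = [
            rng.choices(list(table[aa]), weights=list(table[aa].values()))[0]
            for aa in protein
        ]
        sequences.append("".join(codons))
    return sequences


# Illustrative frequencies only; they are not taken from the specification.
example_usage = {
    "A": {"GCC": 0.40, "GCT": 0.30, "GCA": 0.22, "GCG": 0.08},
    "M": {"ATG": 1.00},
}
normalized = normalize_codon_table(example_usage)
candidates = sample_sequences("MAA", normalized, n=5)
```

In this illustration the codon GCG (8%) falls below the 10% threshold and is removed, and its share is split among GCC, GCT and GCA in proportion to their original frequencies, so that the frequencies for alanine again sum to 1.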


As illustrated in FIG. 1B, the list of optimized nucleotide sequences is further processed by applying a motif screen filter, guanine-cytosine (GC) content analysis filter, and codon adaptation index (CAI) analysis filter, in that order, to generate an updated list of optimized nucleotide sequences.
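The ordered filtering of FIG. 1B can be sketched as follows. This is a simplified illustration, not the actual filter implementation: the motif list, the GC window and the example relative-adaptiveness weights are hypothetical values. CAI is computed in its standard form, as the geometric mean of the relative adaptiveness of each codon (its frequency divided by that of the most frequent synonymous codon).

```python
import math


def gc_content(seq):
    """Fraction of G and C nucleotides in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def codon_adaptation_index(seq, weights):
    """Geometric mean of per-codon relative adaptiveness values in (0, 1]."""
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    logs = [math.log(weights[c]) for c in codons]
    return math.exp(sum(logs) / len(logs))


def apply_filters(candidates, motifs, gc_min, gc_max, cai_min, weights):
    """Apply the motif screen, then the GC filter, then the CAI filter."""
    after_motif = [s for s in candidates if not any(m in s for m in motifs)]
    after_gc = [s for s in after_motif if gc_min <= gc_content(s) <= gc_max]
    return [s for s in after_gc if codon_adaptation_index(s, weights) >= cai_min]


# Hypothetical inputs for illustration only.
weights = {"ATG": 1.0, "GCC": 1.0, "GCT": 0.75, "GCA": 0.55}
candidates = ["ATGGCCGCC", "ATGGCTGCA", "ATGGCAGCA"]
selected = apply_filters(
    candidates,
    motifs=["GCTGCA"],  # stand-in for an undesired sequence motif
    gc_min=0.5,
    gc_max=0.8,
    cai_min=0.8,
    weights=weights,
)
```

Here the second candidate is rejected by the motif screen and the third by the CAI filter, leaving only the first candidate in the updated list.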


As illustrated in the following examples, this process results in optimized nucleotide sequences encoding the amino acid sequence of interest. The nucleotide sequences yield full-length transcripts during in vitro synthesis and result in high levels of expression of the encoded protein (see Example 2).


Example 2. Codon Optimization to Generate Nucleotide Sequences with a High CAI Score Improves Protein Yield

This example demonstrates that codon-optimized protein coding sequences with a codon adaptation index (CAI) of about 0.8 or higher outperform codon-optimized protein coding sequences with a CAI below 0.8.


Codon optimization was performed on a wild-type amino acid sequence of human erythropoietin (hEPO). hEPO is a protein hormone secreted by the kidney in response to low cellular oxygen levels (hypoxia). hEPO is essential for erythropoiesis, the production of red blood cells. Recombinant hEPO is commonly used in the treatment of anemia, a condition characterized by a low red blood cell or hemoglobin count, which can occur in subjects with chronic kidney disease or in subjects undergoing cancer chemotherapy.


Using different codon optimization algorithms, a total of 5 new codon-optimized nucleotide sequences encoding hEPO (#1 through #5) were generated. Nucleotide sequences #4 and #5 were generated according to a codon optimization method as illustrated in FIGS. 1A and 1B. As a reference, a nucleotide sequence with a codon-optimized hEPO coding sequence was provided that had previously been validated experimentally both in vitro and in vivo. The reference nucleotide sequence had been found to provide superior protein yield relative to the wild-type nucleotide sequence and other codon-optimized nucleotide sequences encoding the hEPO protein.









TABLE 5





hEPO-encoding nucleotide sequences
















SEQ ID NO: 112
ATGGGTGTGCACGAATGTCCTGCTTGGCTGTGGCTCCTTCTCTC



CCTGCTGTCCCTGCCTCTTGGACTCCCGGTGCTTGGAGCACCCC



CGAGACTGATCTGCGACAGCAGGGTGCTCGAGCGCTACCTCCT



GGAAGCCAAGGAAGCCGAAAACATCACTACTGGCTGCGCCGA



ACACTGCTCCCTGAACGAGAACATCACCGTGCCGGACACCAAG



GTCAACTTCTACGCGTGGAAGAGAATGGAGGTCGGACAGCAA



GCCGTGGAAGTGTGGCAGGGACTTGCGCTCCTGTCGGAAGCCG



TGCTGAGGGGACAAGCCCTGCTCGTGAACAGCTCACAGCCTTG



GGAGCCCCTGCAGCTGCATGTCGACAAGGCCGTGTCCGGACTG



CGCTCACTGACCACTCTGCTGAGGGCCTTGGGTGCCCAGAAAG



AGGCTATTTCCCCACCGGATGCAGCCTCGGCAGCTCCTCTGCG



GACCATTACGGCGGACACCTTTCGGAAGCTGTTCCGCGTCTAC



AGCAATTTCCTCCGGGGGAAGTTGAAACTGTATACCGGCGAAG



CCTGTCGGACTGGCGATCGCTGA





SEQ ID NO: 113
ATGGGGGTTCATGAGTGCCCAGCTTGGCTTTGGCTCCTGCTCAG



CTTGCTTAGTCTCCCTTTGGGCCTGCCCGTGCTGGGCGCCCCTC



CACGCTTGATCTGTGACAGCAGGGTCTTGGAACGGTATTTGCTT



GAAGCTAAAGAAGCTGAGAACATAACAACGGGATGTGCTGAA



CATTGCTCCTTGAACGAAAACATCACAGTTCCCGACACAAAAG



TCAATTTTTACGCATGGAAGCGGATGGAGGTTGGCCAGCAAGC



TGTGGAGGTCTGGCAAGGGCTGGCTCTTCTCAGTGAAGCCGTG



CTGCGCGGACAAGCACTCTTGGTGAACTCCAGCCAGCCCTGGG



AGCCCCTTCAGCTCCATGTCGATAAAGCAGTTAGCGGCCTCCG



ATCATTGACTACCCTCCTTAGGGCTTTGGGTGCACAAAAAGAG



GCCATTTCACCACCGGACGCGGCAAGTGCTGCTCCGTTGCGAA



CTATAACTGCTGACACCTTCCGGAAACTTTTTCGGGTATATTCC



AACTTTCTCAGGGGGAAACTCAAGCTCTACACCGGCGAGGCGT



GCCGAACTGGAGACCGCTGA





SEQ ID NO: 114
ATGGGCGTACATGAATGCCCGGCATGGCTTTGGCTGCTGCTGT



CCCTGCTGAGTTTGCCGCTGGGCCTCCCCGTCCTCGGCGCTCCC



CCGAGACTCATTTGCGACTCTAGGGTCCTCGAACGCTATCTGCT



GGAAGCAAAAGAAGCTGAGAACATAACTACAGGATGCGCTGA



GCACTGTTCCTTGAATGAGAATATCACAGTACCTGACACTAAG



GTGAATTTTTACGCATGGAAACGCATGGAAGTGGGTCAGCAGG



CCGTGGAAGTGTGGCAGGGCCTGGCGCTGCTGTCCGAGGCTGT



TCTTAGAGGCCAAGCCTTGTTGGTCAATTCCTCTCAACCCTGGG



AGCCCCTCCAGCTGCATGTTGATAAAGCCGTCTCTGGTCTCCGG



TCCCTTACCACCCTGCTCAGGGCACTTGGCGCACAGAAGGAAG



CTATCTCCCCCCCAGACGCTGCCAGTGCCGCCCCCCTCCGGACT



ATTACCGCCGATACTTTCAGGAAACTGTTTCGAGTCTATAGCAA



TTTTCTCCGCGGGAAACTGAAGCTGTATACAGGTGAGGCCTGC



AGGACAGGAGATCGCTGA





SEQ ID NO: 115
ATGGGCGTGCACGAATGTCCTGCTTGGCTGTGGCTGCTGCTGA



GTCTGCTGTCTCTGCCTCTGGGACTGCCTGTTCTTGGAGCCCCT



CCTAGACTGATCTGCGACAGCAGAGTGCTGGAAAGATACCTGC



TGGAAGCCAAAGAGGCCGAGAACATCACAACAGGCTGTGCCG



AGCACTGCAGCCTGAACGAGAATATCACCGTGCCTGACACCAA



AGTGAACTTCTACGCCTGGAAGCGGATGGAAGTGGGACAGCA



GGCTGTGGAAGTTTGGCAAGGACTGGCCCTGCTGTCTGAAGCT



GTTCTGAGAGGACAGGCTCTGCTGGTCAATAGCTCTCAGCCTT



GGGAACCTCTCCAGCTGCATGTGGATAAGGCCGTGTCTGGCCT



GAGAAGCCTGACAACACTGCTGAGAGCCCTGGGAGCCCAGAA



AGAGGCCATTTCTCCACCTGATGCTGCCAGCGCTGCCCCTCTGA



GAACAATCACCGCCGACACCTTCAGAAAGCTGTTCCGGGTGTA



CAGCAACTTCCTGCGGGGCAAGCTGAAACTGTACACCGGCGAA



GCCTGCAGAACCGGCGATAGATAA





SEQ ID NO: 116
ATGGGGGTGCACGAGTGCCCTGCCTGGCTGTGGTTGCTGCTGT



CCCTGCTGTCTCTGCCACTGGGACTGCCAGTGCTGGGAGCTCCA



CCTAGGCTGATCTGCGACAGCCGGGTCCTGGAGAGGTACCTGC



TCGAGGCCAAGGAGGCCGAGAACATTACCACAGGCTGCGCCG



AGCACTGCAGCCTGAACGAGAACATTACAGTGCCCGATACAAA



GGTGAACTTCTACGCCTGGAAGAGGATGGAGGTGGGCCAGCA



GGCCGTGGAGGTGTGGCAGGGGCTGGCCCTGCTGAGCGAGGCC



GTGCTGAGGGGCCAAGCCCTGCTGGTCAACAGCAGCCAGCCTT



GGGAGCCCCTGCAGCTCCACGTGGACAAGGCTGTGTCTGGCTT



GAGGTCTCTCACAACATTGCTGAGGGCCCTGGGCGCACAGAAA



GAAGCTATCAGCCCACCTGATGCCGCTAGTGCCGCTCCACTGC



GGACAATTACCGCCGATACCTTTAGAAAATTGTTCAGGGTCTA



CTCCAACTTTTTGCGCGGGAAGCTGAAGCTCTATACCGGCGAG



GCCTGCCGGACAGGGGACAGATGA





SEQ ID NO: 117
ATGGGAGTGCACGAATGTCCTGCATGGCTCTGGCTCCTGCTGTC



TCTCCTGAGCCTGCCACTGGGACTCCCAGTGCTGGGAGCACCC



CCTAGGCTGATCTGCGATTCTCGGGTGCTGGAGCGCTACCTGCT



CGAGGCTAAGGAGGCCGAGAATATCACTACTGGGTGTGCCGAA



CACTGTAGCCTCAATGAAAACATTACAGTCCCAGATACCAAGG



TGAACTTTTATGCATGGAAGAGGATGGAGGTCGGGCAGCAGGC



AGTGGAGGTGTGGCAGGGACTGGCTCTGCTGTCCGAAGCCGTG



CTCAGAGGTCAGGCCCTGCTGGTTAATTCCAGCCAGCCTTGGG



AACCTCTGCAGCTGCATGTGGACAAGGCAGTGTCTGGCCTGAG



ATCCCTTACTACACTGCTGAGAGCACTGGGGGCTCAGAAAGAA



GCTATTTCCCCACCAGACGCCGCCTCAGCAGCACCTCTCCGGA



CCATCACTGCTGACACCTTCCGCAAGCTCTTTAGGGTGTACTCC



AACTTCCTGCGCGGGAAGCTCAAGCTGTACACCGGCGAAGCCT



GCAGGACCGGGGATCGCTGA









The characteristics of each of the nucleotide sequences (the reference and nucleotide sequences #1 through #5) in terms of CAI, GC content, codon frequency distribution (CFD) as well as the presence of negative CIS elements and negative repeat elements are summarized in Table 6.









TABLE 6
Characteristics of the optimized nucleotide sequences encoding hEPO

Nucleotide                                GC content   CFD    Negative CIS   Negative repeat
Sequence      SEQ ID NO.         CAI      %            %      elements       elements
Reference     SEQ ID NO: 112     0.79     61.06%       3%     0              0
#1            SEQ ID NO: 113     0.69     54.12%       2%     0              0
#2            SEQ ID NO: 114     0.76     56.23%       1%     0              0
#3            SEQ ID NO: 115     0.90     57.28%       0%     0              0
#4            SEQ ID NO: 116     0.89     60.95%       0%     0              0
#5            SEQ ID NO: 117     0.86     59.56%       0%     0              0









In order to test the protein yield from each of the codon-optimized sequences, 6 nucleic acid vectors were prepared, each comprising an expression cassette that contained one of the 6 nucleotide sequences encoding the hEPO protein flanked by identical 3′ and 5′ untranslated sequences (3′ and 5′ UTRs) and preceded by an RNA polymerase promoter. These nucleic acid vectors served as templates for in vitro transcription reactions to provide 6 batches of mRNA containing the 6 codon-optimized nucleotide sequences (reference and nucleotide sequences #1 through #5). Capping and tailing were performed separately. Each of the capped and tailed mRNAs was separately transfected into a cell line (HEK293). Expression levels of the encoded hEPO protein were assessed by ELISA. The results of this experiment are summarized in FIG. 2.


As can be seen from FIG. 2, the highest level of expression was observed with nucleotide sequence #3, which yielded nearly twice as much hEPO protein as the experimentally validated reference nucleotide sequence. A trend towards higher protein yield could be observed for sequences depending on their CAI (cf. Table 6). Nucleotide sequence #3 with the highest protein yield had the highest CAI. The second and third highest yielding nucleotide sequences #4 and #5 had the second and third highest CAI. The lowest performing nucleotide sequences #1 and #2 also had the lowest CAI. Incidentally, these were also the nucleotide sequences with the lowest GC content. However, GC content alone was not determinative. The reference nucleotide sequence had the highest GC content (61%) of all tested codon-optimized sequences, but did not perform as well as nucleotide sequences #3, #4 and #5, all of which had a lower GC content. Notably, the lowest performing nucleotide sequences #1 and #2 also had a higher CFD than nucleotide sequences #3, #4 and #5.


Taken together, the data in this example demonstrate that codon optimization of a therapeutically relevant nucleotide sequence to achieve a CAI of about 0.8 or higher results in greater protein yield than, e.g., codon optimization to achieve a nucleotide sequence with the highest possible GC content.


Example 3. Detection of Spike Proteins Produced Using Optimized Nucleic Acid Constructs

This example demonstrates that optimized nucleotide sequences encoding a full-length SARS-COV-2 S protein are successfully expressed in cultured cells at high levels following transfection. It also demonstrates that the expressed protein is processed by the cells as expected.


Nucleic acid constructs comprising optimized nucleotide sequences encoding a full-length SARS-COV-2 S protein were generated according to a codon optimization method as illustrated in FIGS. 1A and 1B. The optimized nucleotide sequences are shown in Table 7.









TABLE 7
Nucleic acids comprising an optimized nucleotide sequence encoding a SARS-CoV-2 S protein

Construct   Optimized nucleic   Amino acid
No.         acid sequence       sequence         Protein description
A           SEQ ID NO: 29       SEQ ID NO: 1     Native full-length SARS-CoV-2 spike protein
B           SEQ ID NO: 44       SEQ ID NO: 11    SARS-CoV-2 S protein that has been modified relative to naturally occurring SARS-CoV-2 spike protein to remove the furin cleavage site and to mutate residues 986 and 987 to proline
C           SEQ ID NO: 43       SEQ ID NO: 10    SARS-CoV-2 S protein that has been modified relative to naturally occurring SARS-CoV-2 spike protein to mutate residues 986 and 987 to proline
D           SEQ ID NO: 42       SEQ ID NO: 9     SARS-CoV-2 S protein that has been modified relative to naturally occurring SARS-CoV-2 spike protein to remove the furin cleavage site









For transfection of cultured cells, 150 μL OptiMEM Reduced Serum Medium was added to a 1.5 mL Eppendorf tube, along with 0.5 μg (FIG. 7) or 1 μg (FIGS. 5 and 6) mRNA and 2.5 μL Lipofectamine 2000 for complexation of the mRNA to the transfection reagent. Each tube was gently mixed on a Vortex and spun briefly in a microcentrifuge to collect the contents. The complexes were incubated for 10±2 minutes at room temperature. Then the entire complex volume was carefully added to a well of a 12-well plate, so as not to disturb the HEK293 cell monolayer (5×10⁵ per well). The cells were returned to a 37° C. incubator and incubated for 18±2 hours prior to harvesting.


The contents of each well were harvested by removing the culture medium and adding 250 μL of CelLytic M (Sigma)+1× HALT. The cell suspension was left for 20 minutes on ice to allow the cells to fully lyse, before the lysates were collected in 1.5 mL Eppendorf tubes. The lysates were centrifuged at 13,000 RPM for 3 minutes to pellet the debris. The supernatants were transferred to clean 1.5 mL Eppendorf tubes. From this point forward, samples were always kept on ice.


For Western blotting, 15 μL of each cell lysate was combined with 5 μL 4× Novex NuPAGE LDS Sample Buffer supplemented with 1× NuPage Sample Reducing Agent. The samples were incubated at 85° C. for 5 minutes, then cooled on ice. The entire sample volume was loaded into a Novex WedgeWell 12-well 6% tris-glycine mini gel with 3 μg I-56578SS/gel and run for 1-1.5 hours at 165 V. A TransBlot Turbo with the PVDF transfer pack was used for transfer and the membranes were blocked in 0.2% iBlock (Thermo) with 0.05% Tween-20 in 1×PBS. The membranes were incubated for ≥1 hour with primary antibody (Anti-rabbit HRP #W401B) diluted as specified in blocking buffer. They were then washed twice with 1×TBST (Thermo). The membranes were then incubated for ≥1 hour with species-appropriate secondary antibody diluted 1:10,000 in blocking buffer. They were then washed four times with 1×TBST. The membranes were then developed using SuperSignal Pico West substrate on film.


Transfection of mRNAs containing the optimized nucleotide sequences described in Table 7 resulted in high levels of protein expression in cultured HEK293 cells. FIGS. 5 and 6 show a ˜170-180 kDa band corresponding to a pre-processed full-length S protein. FIG. 5 also shows the presence of S1 and S2 subunit bands, demonstrating that the native full-length SARS-COV-2 S protein (Construct A) is processed correctly by the cells. A large band corresponding to fully glycosylated mature protein was observed when cells expressed construct B. Construct B encodes a variant SARS-COV-2 S protein that is modified relative to naturally occurring SARS-COV-2 spike protein to lack the furin cleavage site (and therefore is not cleaved to form the S1 and S2 subunits) and to contain prolines as residues 986 and 987 (thereby stabilizing the protein in its prefusion conformation).



FIG. 7 also shows the full length S protein band of ˜170-180 kDa. This band was observed with all 4 constructs tested. S1 and S2 subunit bands were detected with construct A and construct C. Construct C expresses a variant SARS-COV-2 S protein which is modified relative to naturally occurring SARS-COV-2 S protein to contain prolines as residues 986 and 987 (thereby stabilizing the protein in its prefusion conformation). Again, the fully glycosylated mature protein was detected as a strong band with construct B and construct D. Construct D encodes a variant SARS-COV-2 S protein that is modified relative to naturally occurring SARS-COV-2 S protein to lack the furin cleavage site (and therefore is not cleaved to form the S1 and S2 subunits).


This example demonstrates that optimized nucleic acid sequences encoding full length SARS-COV-2 S protein or variants thereof are expressed at high levels. It also demonstrates that the expressed protein is processed by the cells as expected.


Example 4. Neutralizing Antibody Response to Immunization with Sequence-Optimized mRNAs Encoding a Full-Length Prefusion Stabilized SARS-COV-2 S Protein

This example demonstrates that mRNAs comprising an optimized nucleotide sequence encoding a full-length prefusion stabilized SARS-COV-2 S protein are effective in inducing a neutralizing antibody response in mice.


Each of the four mRNAs containing the optimized nucleotide sequences described in Table 7 of Example 3 was encapsulated in lipid nanoparticles (LNPs). Groups of BALB/c mice received two immunizations, three weeks apart, each at a 0.4 μg dose of one of the four formulations. Binding antibody activities in the serum samples were assessed via Enzyme-Linked Immunosorbent Assay (ELISA). To determine titers of neutralizing antibodies, a pseudovirus-based neutralization assay was used.


For the ELISA, 2019-nCOV Spike protein (S1+S2) ectodomain (Sino Biological, Cat #40589-V08B1) was used as substrate and coated at 2 μg/mL concentration in bicarbonate buffer overnight at 4° C. The plates were developed using colorimetric substrate, Sure Blue TMB 1-component (SERA CARE, KPL Cat #5120-0077), and stopped by Stop solution (SERA CARE Sure Blue, KPL Cat #5120-0024). The endpoint antibody titer for each sample was determined as the highest dilution which gave an OD value 3× higher than the background.
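The endpoint titer rule stated above can be expressed as a short Python sketch. The function name and the example OD readings are hypothetical, and the cutoff is read here as an OD value greater than three times the background OD, which is an interpretation of "3× higher than the background" rather than a detail confirmed by the specification.

```python
def endpoint_titer(od_by_dilution, background_od, fold=3.0):
    """Return the highest reciprocal dilution whose OD exceeds fold x background.

    `od_by_dilution` maps reciprocal dilution (e.g. 400 for 1:400) -> OD reading.
    Returns None if no dilution exceeds the cutoff.
    """
    cutoff = fold * background_od
    positive = [dilution for dilution, od in od_by_dilution.items() if od > cutoff]
    return max(positive) if positive else None


# Hypothetical readings for one serum sample.
readings = {100: 2.10, 400: 1.20, 1600: 0.50, 6400: 0.12}
titer = endpoint_titer(readings, background_od=0.05)
```

With a background OD of 0.05 the cutoff is 0.15, so the 1:1600 dilution is the last one scored as positive in this made-up example.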


For the pseudovirus-based neutralization assay, serum samples were diluted 1:4 in medium (FluoroBrite phenol red free DMEM+10% FBS+10 mM HEPES+1% PS+1% Glutamax) and heat inactivated at 56° C. for 0.5 h. A further 2-fold dilution series of the heat-inactivated sera was prepared and mixed with the reporter virus particle (RVP)-GFP (Integral Molecular), diluted to contain 300 infectious particles per well, and incubated for 1 h at 37° C. 96-well plates of 50% confluent 293T-hsACE2 clonal cells in 75 μL volume were inoculated with 50 μL of the serum/virus mixtures and incubated at 37° C. for 72 h. At the end of the incubation, plates were scanned on a high-content imager and individual GFP-expressing cells were counted. The inhibitory dilution titer (ID50) was reported as the reciprocal of the dilution that reduced the number of virus plaques in the test by 50%. ID50 for each test sample was interpolated by calculating the slope and intercept using the last dilution with a plaque number below the 50% neutralization point and the first dilution with a plaque number above the 50% neutralization point. ID50 titer = (50% neutralization point − intercept)/slope.
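The interpolation of the ID50 titer can be illustrated with the following Python sketch. It assumes that the 50% neutralization point is half of the count obtained in a virus-only control well, and that the line is fitted on a linear scale of reciprocal dilution versus count; neither detail is stated in the text. The function name and the example counts are hypothetical.

```python
def id50_titer(dilutions, counts, virus_only_count):
    """Interpolate the reciprocal dilution at which the count is reduced by 50%.

    `dilutions` are reciprocal dilutions (e.g. 8 for 1:8); `counts` are the
    GFP-positive cell counts measured at those dilutions.
    """
    half = 0.5 * virus_only_count  # assumed 50% neutralization point
    points = sorted(zip(dilutions, counts))
    for (d_low, c_low), (d_high, c_high) in zip(points, points[1:]):
        # Last dilution below the 50% point and first dilution at or above it.
        if c_low < half <= c_high:
            slope = (c_high - c_low) / (d_high - d_low)
            intercept = c_low - slope * d_low
            return (half - intercept) / slope
    return None  # the 50% point is not bracketed by the dilution series


# Hypothetical counts for one serum sample.
titer = id50_titer(
    dilutions=[8, 16, 32, 64],
    counts=[10, 60, 180, 280],
    virus_only_count=300,
)
```

In this made-up series the 50% point (150 cells) lies between the 1:16 and 1:32 dilutions, giving a slope of 7.5, an intercept of −60 and an interpolated ID50 titer of 28.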


All four mRNA formulations induced similar levels of binding antibodies 14 days after the first vaccination, and the responses were further enhanced one week after the second dose at Day 28. On Day 35, the geometric mean titers (GMTs) for neutralizing antibodies as determined by pseudovirus neutralization assay were 152 for construct A, 354 for construct B, 195 for construct C, and 1005 for construct D. The neutralizing potential of the construct D variant trended higher than that of construct B.
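For reference, a geometric mean titer is the exponential of the mean of the log-transformed individual titers. The sketch below uses made-up titers, not data from this example, and assumes all titers are positive (non-responders are commonly assigned a floor value before averaging, which is an assumption not addressed in the text).

```python
import math


def geometric_mean_titer(titers):
    """Geometric mean of strictly positive titers."""
    if not titers or any(t <= 0 for t in titers):
        raise ValueError("titers must be a non-empty list of positive values")
    return math.exp(sum(math.log(t) for t in titers) / len(titers))


# Hypothetical titers from three animals.
gmt = geometric_mean_titer([100, 400, 1600])
```

Because the geometric mean operates on the log scale, a single high responder shifts the GMT less than it would shift an arithmetic mean.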


Serological antibody titers detected for binding in ELISA were not predictive of neutralizing titers determined by pseudovirus. Some mice in the construct A and construct C groups did not seroconvert in the neutralization assay but their endpoint titration titers in ELISA were comparable to the others in the group. Constructs B and D were likely comparable in immunogenicity for induction of neutralizing antibodies.


This example demonstrates that mRNAs comprising an optimized nucleotide sequence encoding a full-length prefusion stabilized SARS-COV-2 S protein are more effective in inducing neutralizing antibody titers than an mRNA that encodes a native full-length SARS-COV-2 S protein. Blocking the furin cleavage site in addition to mutating residues 986 and 987 to proline adds another layer for prevention of prefusion to postfusion conversion. Considering the importance of the pre-fusion conformation, construct B (encoding a SARS-COV-2 S protein that has been modified relative to naturally occurring SARS-COV-2 S protein to remove the furin cleavage site and to mutate residues 986 and 987 to proline) was selected for further preclinical evaluations.


Example 5. Preparation of mRNA-Encapsulating Lipid Nanoparticles

An mRNA comprising an optimized nucleotide sequence encoding a full-length SARS-COV-2 S protein that has been modified relative to naturally occurring SARS-COV-2 spike protein to remove the furin cleavage site and to mutate residues 986 and 987 to proline was synthesized in vitro. The mRNA was prepared using a template plasmid comprising the following nucleic acid sequence operably linked to an RNA polymerase promoter sequence:










(SEQ ID NO: 149)










1
GGACAGATCG CCTGGAGACG CCATCCACGC TGTTTTGACC TCCATAGAAG






51
ACACCGGGAC CGATCCAGCC TCCGCGGCCG GGAACGGTGC ATTGGAACGC





101
GGATTCCCCG TGCCAAGAGT GACTCACCGT CCTTGACACG ATGTTCGTCT





151
TCCTCGTGCT GCTCCCACTC GTTTCTTCCC AGTGTGTCAA CCTGACAACT





201
AGGACTCAGC TGCCACCAGC CTACACCAAC TCCTTCACCA GAGGCGTGTA





251
TTACCCAGAC AAGGTGTTTA GAAGCAGCGT GCTGCACTCT ACCCAGGACC





301
TCTTTCTGCC CTTTTTCAGC AACGTGACAT GGTTTCACGC AATTCACGTG





351
TCCGGCACTA ATGGCACAAA GCGGTTCGAC AATCCAGTCC TGCCTTTCAA





401
CGATGGCGTC TACTTTGCAT CTACTGAGAA ATCCAATATC ATTAGGGGAT





451
GGATCTTCGG CACAACCCTG GATTCTAAGA CCCAGAGCCT GCTGATCGTC





501
AACAACGCCA CAAACGTGGT CATTAAGGTT TGCGAGTTTC AGTTCTGTAA





551
CGATCCTTTT CTGGGCGTGT ATTATCATAA GAACAATAAG AGCTGGATGG





601
AGTCCGAGTT TAGAGTGTAT AGCTCTGCAA ATAATTGTAC CTTTGAGTAC





651
GTGAGCCAGC CCTTTCTGAT GGACCTGGAG GGAAAACAAG GAAACTTCAA





701
AAACCTGCGG GAATTCGTTT TCAAAAACAT CGACGGCTAT TTCAAGATCT





751
ATAGCAAGCA TACCCCAATC AACCTCGTGA GGGACCTCCC CCAGGGCTTT





801
AGCGCACTGG AGCCACTGGT TGACCTGCCT ATCGGCATTA ATATCACAAG





851
ATTTCAGACC CTGCTGGCAC TGCATAGAAG CTATCTGACC CCTGGAGACT





901
CCTCTAGTGG GTGGACTGCC GGCGCCGCTG CCTACTATGT GGGCTATCTG





951
CAGCCACGGA CATTCCTGCT GAAATACAAT GAGAACGGGA CAATCACAGA





1001
TGCTGTTGAT TGCGCACTCG ACCCCCTGTC CGAGACAAAG TGCACTCTCA





1051
AGAGCTTTAC CGTCGAGAAG GGCATCTATC AGACCTCAAA CTTCAGGGTG





1101
CAGCCCACAG AATCTATCGT GCGCTTCCCT AATATCACTA ACCTGTGTCC





1151
TTTCGGTGAA GTGTTCAACG CCACCAGGTT TGCTAGCGTG TATGCCTGGA





1201
ACAGGAAGAG GATCTCTAAC TGCGTCGCCG ACTATTCCGT GCTGTATAAC





1251
AGCGCCTCCT TCTCCACATT CAAATGCTAT GGAGTGAGCC CGACAAAACT





1301
GAACGATCTC TGCTTTACAA ATGTCTACGC CGACTCTTTT GTGATCAGAG





1351
GGGACGAGGT CCGGCAGATC GCACCAGGAC AGACAGGCAA GATTGCTGAC





1401
TACAACTATA AGCTGCCTGA CGACTTCACA GGATGTGTGA TCGCATGGAA





1451
CTCAAACAAT CTGGACTCCA AAGTCGGGGG CAACTATAAT TACCTGTATC





1501
GCCTGTTCCG GAAGTCCAAC CTGAAGCCCT TCGAGAGGGA CATCAGTACA





1551
GAGATCTATC AGGCTGGCTC CACCCCTTGC AATGGCGTCG AAGGCTTTAA





1601
TTGTTATTTT CCCCTGCAGT CTTACGGGTT TCAGCCTACT AATGGAGTTG





1651
GGTACCAGCC ATACAGAGTG GTCGTGCTCA GCTTCGAGCT CCTGCATGCT





1701
CCAGCTACAG TTTGCGGGCC AAAGAAGTCC ACTAACCTGG TGAAGAATAA





1751
GTGCGTCAAC TTCAACTTTA ACGGGCTCAC CGGCACCGGC GTGCTGACTG





1801
AGAGCAACAA GAAGTTTCTG CCATTTCAAC AGTTTGGACG GGACATTGCC





1851
GACACCACCG ATGCCGTTCG GGATCCACAG ACCCTGGAAA TTCTGGACAT





1901
TACACCGTGC AGCTTCGGGG GCGTGAGCGT GATCACACCC GGAACCAATA





1951
CAAGCAACCA GGTTGCCGTC CTGTATCAGG ATGTCAATTG CACAGAAGTG





2001
CCAGTTGCTA TCCACGCAGA CCAGCTGACT CCCACATGGC GGGTGTATAG





2051
CACCGGATCC AACGTGTTTC AGACCCGCGC CGGATGTCTC ATTGGGGCCG





2101
AGCACGTGAA TAACAGCTAC GAGTGCGACA TCCCCATTGG CGCCGGCATT





2151
TGTGCGTCTT ACCAGACTCA GACCAACTCT CCTGGCTCCG CCTCTTCCGT





2201
TGCTAGTCAG TCTATTATTG CCTATACCAT GAGCCTCGGA GCTGAGAATA





2251
GCGTGGCCTA CTCCAATAAT TCCATCGCAA TCCCTACTAA CTTCACTATT





2301
TCTGTGACCA CCGAGATCCT GCCTGTGTCT ATGACTAAGA CTAGCGTTGA





2351
TTGTACCATG TATATTTGTG GCGACTCTAC CGAATGTTCT AACCTGCTGC





2401
TTCAGTACGG CTCATTTTGC ACACAGCTGA ACAGAGCCCT GACTGGGATC





2451
GCTGTGGAGC AGGACAAGAA CACACAGGAG GTGTTTGCAC AGGTGAAGCA





2501
GATCTATAAG ACCCCTCCTA TTAAGGATTT CGGCGGATTC AATTTCTCAC





2551
AGATTCTGCC AGACCCCAGT AAGCCTTCCA AGAGGAGCTT CATCGAGGAT





2601
CTCCTGTTTA ACAAGGTGAC CCTGGCAGAC GCCGGCTTTA TTAAGCAATA





2651
TGGGGATTGC CTGGGCGACA TTGCTGCCAG AGACCTGATT TGCGCCCAGA





2701
AATTCAATGG CCTCACAGTG CTGCCACCTC TGCTGACCGA CGAGATGATC





2751
GCTCAATACA CTAGCGCACT GCTGGCCGGA ACCATCACAT CAGGCTGGAC





2801
CTTCGGGGCC GGAGCAGCAC TGCAGATTCC ATTCGCCATG CAGATGGCCT





2851
ATAGATTCAA CGGCATTGGC GTCACACAGA ACGTGCTGTA CGAAAACCAG





2901
AAGCTCATCG CTAACCAGTT TAATTCCGCA ATTGGAAAGA TCCAAGATTC





2951
ACTCAGCTCA ACCGCCTCTG CACTCGGAAA GCTGCAGGAC GTGGTCAACC





3001
AGAATGCTCA GGCCCTGAAC ACACTCGTCA AGCAGCTGTC CTCTAACTTT





3051
GGCGCTATCA GCTCCGTTCT GAACGACATT CTGAGCCGCC TGGATCCCCC





3101
AGAGGCTGAA GTCCAGATTG ACCGCCTGAT TACCGGCCGG CTGCAGTCTC





3151
TGCAAACATA CGTGACCCAG CAGCTGATCA GAGCAGCCGA GATCCGGGCA





3201
TCCGCAAATC TGGCAGCAAC TAAGATGAGC GAATGCGTGC TGGGCCAGTC





3251
CAAGCGGGTG GACTTTTGTG GCAAGGGCTA CCACCTGATG AGCTTCCCCC





3301
AGAGCGCCCC ACATGGCGTT GTTTTTCTGC ACGTGACCTA TGTCCCTGCT





3351
CAGGAAAAGA ACTTTACAAC TGCTCCTGCT ATCTGCCATG ACGGCAAGGC





3401
CCACTTCCCA CGGGAGGGAG TGTTTGTGTC CAATGGCACA CACTGGTTCG





3451
TGACCCAGAG GAACTTCTAT GAACCCCAGA TCATCACCAC TGACAATACC





3501
TTCGTGTCTG GAAATTGCGA CGTCGTGATC GGCATCGTTA ACAACACCGT





3551
GTACGACCCT CTCCAGCCAG AGCTGGACTC CTTTAAGGAG GAACTGGATA





3601
AGTATTTTAA GAACCACACA AGCCCAGATG TGGATCTCGG GGACATCTCC





3651
GGAATTAACG CCTCCGTGGT GAATATCCAG AAGGAGATTG ACCGCCTAAA





3701
TGAAGTTGCC AAGAACCTCA ATGAGTCTCT GATTGATCTG CAGGAACTGG





3751
GCAAGTATGA GCAGTATATC AAATGGCCCT GGTACATTTG GCTGGGGTTT





3801
ATCGCCGGAC TGATTGCCAT CGTCATGGTG ACCATCATGC TGTGTTGCAT





3851
GACCTCCTGT TGTTCCTGTC TGAAGGGCTG CTGTAGTTGC GGCTCTTGCT





3901
GTAAATTCGA CGAAGATGAT AGCGAGCCCG TGCTGAAGGG CGTGAAGCTG





3951
CATTATACCT GACGGGTGGC ATCCCTGTGA CCCCTCCCCA GTGCCTCTCC





4001
TGGCCCTGGA AGTTGCCACT CCAGTGCCCA CCAGCCTTGT CCTAATAAAA





4051
TTAAGTTGCA TCAAGCT






Template-dependent RNA synthesis using unmodified nucleotides yielded a polynucleotide with the nucleic acid sequence of SEQ ID NO: 147, which comprises the optimized nucleic acid sequence of SEQ ID NO: 148. In a multi-step, enzyme-catalyzed process, the final mRNA product was synthesized, which was purified to remove enzyme reagents and prematurely aborted synthesis products (“shortmers”).


The final mRNA had the structural elements shown in Table 4. The SARS-COV-2 S protein coding sequence is flanked by 5′ and 3′ untranslated regions (UTRs) of 140 and 105 nucleotides, respectively. The mRNA also contains a 5′ cap structure consisting of a 7-methyl guanosine (m7G) residue linked via an inverted 5′-5′ triphosphate bridge to the first nucleoside of the 5′ UTR, which is itself modified by 2′-O-ribose methylation. The 5′ cap is essential for initiation of translation by the ribosome. The entire linear structure is terminated at the 3′ end by a tract of approximately 100 to 500 adenosine nucleosides (polyA). The polyA region confers stability to the mRNA and is also thought to enhance translation. All of these structural elements are naturally occurring components which are required for the efficient translation of the SARS-COV-2 spike mRNA.
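The linear arrangement of these elements can be summarized in a small Python sketch. The UTR lengths and the polyA range are taken from the paragraph above. The coding region length of 3,822 nucleotides is an assumption, inferred by subtracting the two UTR lengths from the 4,067 nucleotides of the template sequence listed above; it has not been checked against SEQ ID NO: 148. The class and method names are hypothetical, and the single m7G cap nucleoside is not counted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MrnaLayout:
    utr5_len: int = 140      # 5' UTR length stated in the text
    cds_len: int = 3822      # assumed: 4067-nt template minus 140 and 105
    utr3_len: int = 105      # 3' UTR length stated in the text
    poly_a_min: int = 100    # approximate polyA range stated in the text
    poly_a_max: int = 500

    def body_len(self):
        """Length of 5' UTR + coding region + 3' UTR, without cap or polyA."""
        return self.utr5_len + self.cds_len + self.utr3_len

    def cds_span(self):
        """1-based inclusive start and end of the coding region."""
        return (self.utr5_len + 1, self.utr5_len + self.cds_len)

    def total_len_range(self):
        """Approximate transcript length range once the polyA tail is added."""
        body = self.body_len()
        return (body + self.poly_a_min, body + self.poly_a_max)


layout = MrnaLayout()
span = layout.cds_span()
length_range = layout.total_len_range()
```

Under these assumptions the coding region would occupy positions 141 to 3962 of the transcript body, and the tailed mRNA would be roughly 4,167 to 4,567 nucleotides long.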


The purified mRNA was encapsulated in lipid nanoparticles (LNPs) comprising a proprietary cationic lipid, a non-cationic lipid (DOPE), a cholesterol-based lipid (cholesterol) and a PEG-modified lipid (DMG-PEG-2K). The final mRNA-LNP formulation was an aqueous suspension.


Example 6. Induction of a Neutralizing Antibody Response in Mice

This example demonstrates that an immunogenic composition of LNP-encapsulated mRNA comprising an optimized nucleotide sequence encoding a full-length pre-fusion stabilized SARS-COV-2 S protein induces a robust response of binding and neutralizing antibodies against the SARS-COV-2 S protein in mice.


The LNP formulation prepared in Example 5 was used to immunize mice twice by intramuscular injection (IM), at Day 0 and Day 21 (see FIG. 9C). Four groups of eight 6-8 week-old BALB/c mice were immunized with 0.2 μg, 1 μg, 5 μg or 10 μg mRNA per dose, respectively. A fifth group of mice (which served as a negative control) received only the diluent of the mRNA-LNP composition. Seven days (Day −7) prior to immunization a blood sample was taken from each mouse to determine the baseline level of antibodies against the SARS-COV-2 S protein. Additional blood samples were taken at Day 14, Day 21, Day 28 and Day 35. The mouse experiments were carried out in compliance with all pertinent US National Institutes of Health regulations and with approval from the Animal Care and Use Committee of Covance Inc, Denver, PA.


An ELISA assay was used to determine the antibody titer against SARS-COV-2 S protein. 96-well plates were coated with commercially available SARS-COV-2 S protein (SinoBio), incubated with serially diluted mouse sera from Day −7, Day 14, Day 21, Day 28 and Day 35 and probed with secondary antibodies to detect bound total mouse IgG.


To determine titers of neutralizing antibodies, a pseudovirus-based assay was used. 39 individual conversion serum samples from COVID-19 patients with mild, strong and severe symptoms served as positive controls. Serum samples were diluted 1:4 in medium (FluoroBrite phenol red free DMEM+10% FBS+10 mM HEPES+1% PS+1% Glutamax) and heat inactivated at 56° C. for 30 minutes. A further 2-fold, 9-point serial dilution series of the heat inactivated sera was performed in the same medium. Diluted serum samples were mixed with a volume of Reporter Virus Particle (RVP)-Green Fluorescent Protein (GFP) (Integral Molecular) diluted to contain ˜300 infectious particles per well and incubated for 1 hour at 37° C. 96-well plates of ˜50% confluent 293T-hsACE2 clonal cells in 75 μL volume were inoculated with 50 μL of the serum+virus mixtures in singleton and incubated at 37° C. for 72 hours. At the end of the 72-hour incubation, plates were scanned on a high-content imager and individual GFP expressing cells were counted. The neutralizing antibody titers are reported as the reciprocal of the dilution that reduced the number of virus plaques in the test by 50% (see FIG. 9B).
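The titer definition above (the reciprocal of the dilution giving a 50% reduction) may be illustrated by the following minimal sketch. It is not the analysis actually used: it assumes that percent neutralization is calculated against a virus-only control well, that the 1:4 dilution is counted as the first of the nine points, and that the 50% point is found by linear interpolation on a log2 dilution scale. The function names and the example GFP cell counts are hypothetical.

```python
import math

def reciprocal_dilutions(start=4, fold=2, points=9):
    # Assumption: the 1:4 dilution is the first point of the 2-fold, 9-point series.
    return [start * fold ** i for i in range(points)]

def id50_titer(dilutions, gfp_counts, virus_only_count):
    """Reciprocal dilution at which GFP-positive cell counts are reduced by 50%.

    Returns None when the 50% point is not bracketed by the dilution series.
    """
    # Percent neutralization relative to a virus-only control well (assumed control).
    neut = [100.0 * (1.0 - c / virus_only_count) for c in gfp_counts]
    for i in range(len(neut) - 1):
        if neut[i] >= 50.0 > neut[i + 1]:
            # Linear interpolation between the two bracketing points on a log2 scale.
            frac = (neut[i] - 50.0) / (neut[i] - neut[i + 1])
            lo, hi = math.log2(dilutions[i]), math.log2(dilutions[i + 1])
            return 2.0 ** (lo + frac * (hi - lo))
    return None

# Hypothetical counts for one serum sample; 300 cells in the virus-only well.
dils = reciprocal_dilutions()
counts = [0, 0, 15, 30, 75, 225, 270, 290, 300]
titer = id50_titer(dils, counts, virus_only_count=300)
print(round(titer, 1))
```

In this hypothetical data set the 50% point falls midway (on the log2 scale) between the 1:64 and 1:128 dilutions. A four-parameter logistic fit could be substituted for the interpolation step without changing the titer definition.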


The results of this mouse immunization experiment are summarized in FIGS. 9A and 9B. Even after a single shot, a robust antibody response was observed by ELISA at Day 14 for all tested doses (see FIG. 9A). A second shot resulted in a significant boost of the antibody response and dramatically improved the titer of neutralizing antibodies (see FIG. 9B). Administration of two doses of 1 μg, 5 μg or 10 μg mRNA resulted in comparable antibody titers as determined by ELISA at Day 35. As can be seen in FIG. 9B, two doses of 0.2 μg mRNA were slightly less effective in inducing neutralizing antibodies at Day 35, whereas two doses of 1 μg, 5 μg or 10 μg mRNA induced comparable titers of antibodies at Day 35, exceeding the titer of neutralizing antibodies observed in the conversion sera of human patients previously infected with SARS-CoV-2.


This example demonstrates that the immunogenic composition tested in this example induces a robust neutralizing antibody response after two doses. The magnitude of the response was dose-dependent. The results indicate that the immunogenic composition can induce neutralizing antibody titers comparable to those in convalescent human patients.


Example 7. Induction of a Th1-Biased T Cell Response in Mice

A vaccine that promotes Th1-biased immunity is typically more protective against viral pathogens than a vaccine that does not. The secretion of Th1 cytokines such as IFN-γ activates cytotoxic T lymphocytes (CTL), a sub-group of T cells, which can induce the death of cells infected with viruses. This example demonstrates that the immunogenic composition tested in Example 6 induces a Th1-biased T cell response in mice.


To further assess the quality of the immune response of the vaccine tested in Example 6, the experiment described in that example was repeated by immunizing groups of mice twice by IM injection with 5 μg or 10 μg mRNA, respectively. Blood was sampled on Day −4 (baseline), Day 14, Day 21, Day 28 and Day 35 (see FIG. 10C). The mouse experiments were carried out in compliance with all pertinent US National Institutes of Health regulations and with approval from the Animal Care and Use Committee of Covance Inc, Denver, PA. The mice were sacrificed on Day 35, and their spleens were removed. The isolated spleens were homogenized and splenocytes were isolated as described below. IFN-γ and IL-5 secretion by peptide-stimulated splenocytes was determined by ELISPOT assay.


Harvested spleens were stored in 5 mL of chilled medium on ice. Just prior to processing, the spleens were placed into a sterile petri dish containing medium. The back of a 10 cc syringe plunger was used to homogenize the spleens. The homogenate was passed through a filter and transferred into a sterile tube. The homogenate was then pelleted by centrifugation at 1200 rpm for 8-10 minutes. The supernatant was gently poured off and the edge of the tube was blotted with a clean paper towel. ACK lysis buffer was added to lyse the red blood cells and the cells were incubated at room temperature for 5 min. The tube was centrifuged at 1200 rpm for 8-10 minutes. The supernatant was poured off and the pellet was resuspended in 2 mM L-Glutamine CTL-Test Media. The suspensions were filtered into new 15 mL conical tubes. The cells were maintained at 37° C. in a humidified incubator, 5% CO2, until use.


Solutions of PepMix™ SARS-COV-2 (Spike Glycoprotein, Cat #PM-WCPV-S-1) peptide pool 1 and peptide pool 2 were prepared using test medium. The final concentration of each peptide in the assay was 2 μg/ml. As a positive control, 1 μg/ml of ConA in test medium was used. These antigen/mitogen solutions were plated at 100 μL/well. The plates containing the antigen/mitogen solutions were placed into a 37° C. incubator for 10-20 minutes before plating cells to ensure the pH and temperature were optimal for cells. The cell concentration was adjusted to the desired concentration, and 0.3×10⁶ splenocytes in 100 μl were added per well to the plates with the antigen/mitogen solution. Once completed, the plate was gently tapped and placed into a 37° C. humidified incubator, 5% CO2, and incubated overnight. Plates were washed 2× with PBS and then 2× with 0.05% Tween-PBS, 200 μL/well.


Mouse IFN-γ/IL-5 Double-Color enzymatic ELISPOT kits (CTL, Shaker Heights, Cleveland) were used according to the manufacturer's protocol. Detection solution was prepared per the manufacturer's instructions and 80 μL was added to each well. The plates were then incubated at RT for 2 hrs. Plates were washed 3× with 0.05% Tween-PBS, 200 μL/well. Tertiary solution at 80 μL/well was added and plates were incubated at RT for 30 min. Plates were washed 2× with 0.05% Tween-PBS, and then 2× with distilled water, 200 μL/well each time. Developer Solution was added to wells at 80 μL/well and incubated at RT for 15 min. The reaction was stopped by gently rinsing the membrane with tap water three times. Plates were air-dried and scanned using a CTL analyzer. The number of cytokine producing cells per million cells is reported (see FIG. 10).


As can be seen from FIG. 10A, splenocytes isolated at Day 35 from mice immunized twice with either 5 μg or 10 μg of mRNA secreted large amounts of the Th1 cytokine IFN-γ. As can be seen from FIG. 10B, these cells did not, however, secrete detectable amounts of the Th2 cytokine IL-5.


This example demonstrates that the tested immunogenic composition is effective in inducing a Th1-biased T cell response in mice, indicating that vaccination with this immunogenic composition can induce a CTL response that recognizes and eliminates SARS-COV-2-infected cells.


Example 8. Induction of a Neutralizing Antibody Response in Cynomolgus Monkeys

This example demonstrates that an immunogenic composition of LNP-encapsulated mRNA comprising an optimized nucleotide sequence encoding a full-length pre-fusion stabilized SARS-COV-2 S protein induces a robust response of binding and neutralizing antibodies against the SARS-COV-2 S protein in cynomolgus monkeys.


The LNP formulation prepared in Example 5 was used to immunize monkeys twice by IM administration, at Day 0 and Day 21 (see FIG. 11D). Three groups of four 3-4 year-old cynomolgus monkeys were immunized with 15 μg, 45 μg or 135 μg mRNA per dose, respectively. Four days (Day −4) prior to immunization a blood sample was taken from each monkey to determine the baseline level of antibodies against the SARS-COV-2 S protein. Additional blood samples were taken at Day 2, Day 7, Day 14, Day 21, Day 23, Day 28, Day 35 and Day 42. Cynomolgus monkey experiments were carried out in compliance with all pertinent US National Institutes of Health regulations and with approval from the Animal Care and Use Committee of the New Iberia Research Center.


An ELISA assay was used to determine the antibody titers against SARS-COV-2 S protein in the blood samples obtained from the cynomolgus monkeys. 39 individual serum samples from COVID-19 patients with mild, strong and severe symptoms served as positive controls. Nunc microwell plates were coated with SARS-COV S-GCN4 protein (GeneArt, expressed in Expi 293 cell line) at 0.5 μg/ml in PBS overnight at 4° C. Plates were washed 3 times with PBS-Tween 0.1% before blocking with 1% BSA in PBS-Tween 0.1% for 1 hour. Samples were plated at a 1:450 initial dilution followed by a 3-fold, 7-point serial dilution in blocking buffer. Plates were washed 3 times after a 1-hour incubation at room temperature before adding 50 μl of 1:5000 Rabbit anti-human IgG (Jackson Immuno Research) to each well. Plates were incubated at room temperature for 1 hr and washed 3×. Plates were developed using Pierce 1-Step Ultra TMB-ELISA Substrate Solution for 6 minutes and stopped with TMB STOP solution. Plates were read at 450 nm in a SpectraMax plate reader. Antibody titers were reported as the highest dilution at which the signal is equal to the 0.2 OD cutoff.
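By way of illustration only, the endpoint titer described above may be estimated as sketched below. The sketch assumes that the 1:450 dilution is the first of the seven points and that the dilution at which the signal equals the 0.2 OD cutoff is obtained by linear interpolation of OD against log10 of the dilution. The interpolation method, the function names and the example OD readings are hypothetical and are not taken from the assay protocol.

```python
import math

def elisa_dilutions(start=450, fold=3, points=7):
    # Assumption: the 1:450 starting dilution is the first of the 7 points.
    return [start * fold ** i for i in range(points)]

def endpoint_titer(dilutions, ods, cutoff=0.2):
    """Reciprocal dilution at which the OD450 signal crosses the cutoff.

    Returns None if the first dilution is already below the cutoff, and the
    last dilution if the signal never falls below the cutoff.
    """
    if ods[0] < cutoff:
        return None
    for i in range(len(ods) - 1):
        if ods[i] >= cutoff > ods[i + 1]:
            # Interpolate OD against log10(dilution) between the bracketing points.
            frac = (ods[i] - cutoff) / (ods[i] - ods[i + 1])
            lo, hi = math.log10(dilutions[i]), math.log10(dilutions[i + 1])
            return 10.0 ** (lo + frac * (hi - lo))
    return float(dilutions[-1])

# Hypothetical OD450 readings for one serum sample.
dilutions = elisa_dilutions()
ods = [2.0, 1.5, 0.9, 0.4, 0.2, 0.1, 0.05]
print(round(endpoint_titer(dilutions, ods)))
```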


Titers of neutralizing antibodies in the serum of the cynomolgus monkeys were determined using a pseudovirus-based assay. 39 individual conversion serum samples from COVID-19 patients with mild, strong and severe symptoms served as positive controls. Serum samples were diluted 1:4 in medium (FluoroBrite phenol red free DMEM+10% FBS+10 mM HEPES+1% PS+1% Glutamax) and heat inactivated at 56° C. for 30 minutes. A further 2-fold, 9-point serial dilution series of the heat inactivated serum was performed in the same medium. Diluted serum samples were mixed with a volume of reporter virus particle (RVP)-GFP (Integral Molecular) diluted to contain ˜300 infectious particles per well and incubated for 1 hour at 37° C. 96-well plates of ˜50% confluent 293T-hsACE2 clonal cells in 75 μL volume were inoculated with 50 μL of the serum+virus mixtures in singleton and incubated at 37° C. for 72 hours. At the end of the incubation, plates were scanned on a high-content imager and individual GFP expressing cells were counted. The neutralizing antibody titers are reported as the reciprocal of the dilution that reduced the number of virus plaques in the test by 50% (see FIG. 11B).


In addition, the microneutralization titer of each monkey sample was determined, using the 39 human conversion sera as positive controls. Vero E6 cells were seeded into 96-well flat bottom cell culture plates at a concentration of 2×10⁴ cells in 0.1 mL per well one day before use. On the day of the experiment, starting at a 1:10 dilution, 2-fold serial dilutions of heat-inactivated monkey or human sera were incubated with SARS-COV-2 virus (e.g., isolate USA-WA1/2020 [BEI Resources; catalog #NR-52281]) in a 37° C. incubator for 60±5 minutes. Then the growth medium was aseptically removed from the Vero E6 cells and the test samples (sera and virus) were added to the Vero E6-seeded plates and incubated in a 37° C. incubator for 30±5 minutes. Subsequently, 100 μL of growth medium was added to all wells of all the plates without removing the existing inoculum. The plates were then placed back into the incubator and incubated for 2 days. Two days post infection, the cells were fixed and stained with a primary antibody (SARS-COV anti-nucleoprotein mouse monoclonal antibody; SinoBio catalog #40143-MM05 or equivalent) and then with an HRP-tagged secondary antibody (horseradish peroxidase (HRP)-conjugated goat anti-mouse immunoglobulin G (IgG) antibody; Jackson ImmunoResearch Laboratories, catalog #115-035-062 or equivalent).


The results of these assays are summarized in FIG. 11. Even at the lowest tested mRNA dose of 15 μg, a robust binding and neutralizing antibody response was observed after two shots (see FIGS. 11A and 11B). Administration of two doses of 15 μg, 45 μg or 135 μg mRNA resulted in comparable antibody titers as determined by ELISA at Days 28, 35 and 42 (see FIG. 11A). Two doses of 15 μg or 45 μg mRNA also yielded comparable levels of neutralizing antibodies at these days (see FIG. 11B). Two doses of 135 μg mRNA induced titers of antibodies at Days 28, 35 and 42 that exceeded the titers of neutralizing antibodies observed in conversion sera of human patients infected with SARS-COV-2. The microneutralization titer assay provided similar results, with 15 μg and 45 μg mRNA doses resulting in comparable titers, and the 135 μg dose exceeding the titers observed in conversion sera of human patients infected with SARS-COV-2 (see FIG. 11C).


This example demonstrates that the tested immunogenic composition induces a robust neutralizing antibody response even at the lowest dose of 15 μg after two shots, when the period between administrations is at least 2 weeks (in particular about 3 weeks). The data support the use of the test composition in human patients to induce a protective neutralizing antibody response.


Example 9. Induction of a Th1-Biased T Cell Response in Cynomolgus Monkeys

This example demonstrates that the immunogenic composition tested in Example 8 induces a Th1-biased T cell response in cynomolgus monkeys.


To further assess the quality of the immune response of the vaccine tested in Example 8, PBMCs were isolated from cynomolgus monkey blood samples. Isolated PBMCs were stored in cryovials. T cell responses were assessed by determining IFN-γ and IL-13 secretion by peptide-stimulated PBMCs using ELISPOT assays. Naïve PBMCs served as a control to establish baseline levels of IFN-γ or IL-13 secretion in non-activated, non-stimulated cells. The results are summarized in FIG. 12.


To perform the assays, complete medium for monkey PBMCs (DMEM1640+10% heat-inactivated FCS) was prewarmed in a 37° C. water bath. PBMC cryovials were quickly thawed in a 37° C. water bath, and their contents were slowly transferred dropwise into the prewarmed medium in conical tubes. The tubes were then centrifuged at 1500 RPM for 5 min. The cell pellets were washed once with prewarmed complete medium, and re-pelleted at 1500 RPM for 15 min. The supernatant was discarded, and the PBMCs were resuspended in complete medium and counted using a Guava cell counter.


Monkey IFN-γ ELISPOT kit (CTL, cat #3421M-4APW) and IL-13 ELISPOT kit (CTL, cat #3470M-4APW) were used to determine the levels of IFN-γ and IL-13 secretion by peptide-stimulated PBMCs. The precoated plates provided with the kits were washed 4 times with sterile PBS and then blocked with 200 μl/well complete medium. The blocking step was performed in a 37° C. incubator for at least 30 minutes. PepMix™ SARS-COV-2 (JPT Cat #PM-WCPV-S-1) peptide pool 1 and pool 2 were used as recall antigens at a final concentration of 2 μg/ml per peptide in the assay. 2 μg/ml of Concanavalin A (Sigma, cat #C5275) was used as a positive control. 50 μl of recall antigen and 300,000 PBMCs in 50 μl were added to each well for stimulation. The plates were then placed in a 37° C., 5% CO2 humidified incubator for 24 hours. Following the 24 hour incubation, the plates were washed 5 times with PBS. 100 μl/well of biotinylated anti-IFN-γ or anti-IL-13 detection antibodies (1 μg/ml) prepared in PBS containing 1% fetal calf serum were added, and the plates were incubated for 2 hours at room temperature. The plates were then washed 5 times with PBS as before and incubated for 1 hr at room temperature with 100 μl/well of streptavidin at a dilution of 1:1000 in PBS containing 1% fetal calf serum. The plates were again washed 5 times with PBS and developed using 100 μl/well BCIP/NBT substrate solution until the spots were visible. Color development was stopped by washing the plates in tap water. The plates were then dried overnight, scanned, and spots were counted using a CTL analyzer. The data are reported as spot forming cells (SFC) per million PBMCs (see FIG. 12).
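The normalization of raw spot counts to spot forming cells (SFC) per million PBMCs may be sketched as follows, assuming 300,000 cells per well as stated above. The optional subtraction of spots from an unstimulated (medium-only) well is an assumption introduced for illustration and is not described in the protocol; the function name and example counts are hypothetical.

```python
def sfc_per_million(spot_count, cells_per_well=300_000, background_spots=0):
    """Convert a raw ELISPOT spot count for one well to SFC per million cells.

    background_spots is a hypothetical, optional correction for spots counted
    in an unstimulated well; negative net counts are clamped to zero.
    """
    net_spots = max(spot_count - background_spots, 0)
    return net_spots * 1_000_000 / cells_per_well

# Hypothetical example: 90 spots in a peptide-stimulated well of 300,000 PBMCs.
print(sfc_per_million(90))
```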


As can be seen from FIGS. 12A (peptide pool S1) and 12C (peptide pool S2), PBMCs isolated at Day 42 from monkeys immunized twice with a dose of 15 μg, 45 μg or 135 μg mRNA secreted large amounts of the Th1 cytokine IFN-γ in response to stimulation with peptides derived from the SARS-COV-2 S protein. In contrast, these cells secreted only baseline amounts of the Th2 cytokine IL-13 in response to peptide stimulation (see FIGS. 12B (peptide pool S1) and 12D (peptide pool S2)).


This example demonstrates that the tested immunogenic composition is effective in inducing a Th1-biased T cell response in cynomolgus monkeys, indicating that vaccination with this immunogenic composition can induce a CTL response in humans that recognizes and eliminates SARS-COV-2-infected cells.


Example 10. Dose Modelling

This example demonstrates that low mRNA doses of the immunogenic composition tested in Examples 6 and 8 are effective in yielding neutralizing antibody titers that are significantly higher than corresponding titers observed in a control panel of convalescent sera from COVID-19 patients.


There were no statistically significant differences in pseudovirus neutralization titers on Day 35 between 1 μg, 5 μg and 10 μg groups of immunized mice described in Example 6, suggesting a dose-saturation effect beyond 1 μg of mRNA comprising the tested optimized nucleotide sequence encoding a full-length pre-fusion stabilized SARS-COV-2 S protein. Peak pseudovirus neutralization titers on Day 35 in mice were significantly higher than corresponding titers observed in the control panel of convalescent sera from COVID-19 patients (see FIG. 13A).


The results from the pseudovirus neutralization assay and the microneutralization assay for the cynomolgus monkey experiments described in Example 8 were highly correlated (FIG. 13B). Regardless of the dose level, Day 35 pseudovirus and microneutralization titers were about 130-fold higher than those of pre-immune animals. Further statistical analysis of a complete data set with 93 convalescent sera from COVID-19 patients revealed that the titers obtained with mRNA doses of 15 μg, 45 μg and 135 μg, respectively, were significantly higher than corresponding titers observed in the convalescent human sera (all P values were less than 0.005; FIGS. 13C and 13D).


This example supports an mRNA dose range of 10 μg to 200 μg for human clinical trials that investigate the safety and efficacy of the immunogenic composition prepared in Example 5. Indeed, a dose between 15 μg and 45 μg may be sufficient to induce an effective neutralizing antibody response, while being well-tolerated at the same time.


Example 11. Immunogenicity of mRNAs Encoding Full-Length Prefusion Stabilized SARS-CoV-2 S Proteins

This example demonstrates that an mRNA encoding a SARS-COV-2 S protein that has been modified relative to naturally occurring SARS-COV-2 S protein to remove the furin cleavage site and to mutate residues 986 and 987 to proline (2P/GSAS) is more effective in eliciting a neutralizing antibody response than mRNA encoding other full-length prefusion stabilized SARS-COV-2 S protein.


To determine the impact on immunogenicity of mutations that stabilize the SARS-COV-2 S protein in its prefusion conformation, seven mRNA constructs, encoding a wild-type SARS-COV-2 S protein (WT) and corresponding prefusion stabilized SARS-COV-2 S proteins (2P, GSAS, 2P/GSAS, 2P/GSAS/ALAYT, 6P and 6P/GSAS, respectively), were formulated in a lipid nanoparticle (LNP) as mRNA vaccines as described in Example 5. WT, 2P/GSAS, 2P and GSAS correspond to constructs A-D in Example 3, respectively. 2P/GSAS/ALAYT is a SARS-COV-2 S protein mutated to remove a furin cleavage site, to replace residues 986 and 987 with proline and to mutate the ER retrieval signal, which has the optimized nucleic acid sequence of SEQ ID NO: 124 and an amino acid sequence of SEQ ID NO: 125. 6P is a SARS-COV-2 S protein mutated to replace residues 817, 892, 899, 942, 986 and 987 with proline, which has the optimized nucleic acid sequence of SEQ ID NO: 128 and an amino acid sequence of SEQ ID NO: 129. 6P/GSAS is a SARS-COV-2 S protein mutated to remove a furin cleavage site and to replace residues 817, 892, 899, 942, 986 and 987 with proline, which has the optimized nucleic acid sequence of SEQ ID NO: 130 and an amino acid sequence of SEQ ID NO: 131.


Two animal models were used for the immune assessment. BALB/c mice were administered two immunizations at a three-week interval with 0.4 μg per dose of each of five formulations (WT, 2P, GSAS, 2P/GSAS, 2P/GSAS/ALAYT). In parallel, non-human primates (NHPs) were immunized using the same immunization schedule at 5 μg per dose of six S mRNA vaccines (2P, GSAS, 2P/GSAS, 2P/GSAS/ALAYT, 6P and 6P/GSAS).


To evaluate functional antibodies, e.g., neutralizing antibody (nAb) titers, the ability of immune sera to neutralize the infectivity of GFP reporter pseudoviral particles (RVP) in HEK-293T cells stably over-expressing human ACE2 was tested. RVPs expressing SARS-COV-2 S protein are capable of a single round of infection, indicated by GFP expression upon entry. Neutralizing potency was determined as the serum dilution that achieves 50% inhibition of RVP entry (ID50). In addition, Enzyme-Linked Immunosorbent Assay (ELISA) titers were evaluated using a recombinant soluble S protein trimerized by a GCN4 helix bundle as antigen.


Although a few animals developed neutralizing titers at Day 14 after the first immunization, the titers were in general low. As expected, the majority of test animals developed neutralizing titers after the second immunization (FIG. 14). On Day 35, the geometric mean titers (GMTs) with the 95% confidence interval (95% CI) for pseudoviral (PsV) nAb titers in mice were 152 (36; 645) for WT, 195 (44; 870) for 2P, 1005 (261; 3877) for GSAS, 354 (129; 976) for 2P/GSAS and 940 for 2P/GSAS/ALAYT. There was a trend for higher GMTs, especially at Day 35 and Day 42, for the three constructs with GSAS mutations when compared to those of WT and 2P constructs.
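A geometric mean titer with a 95% confidence interval of the kind reported above may be computed as in the following sketch. The sketch assumes that titers are log10-transformed, that the interval is a two-sided Student's t interval on the log scale, and that the result is back-transformed; the statistical method actually used is not stated in this example. The critical values are the standard 97.5th percentile t values for 1 to 10 degrees of freedom, and the function name and example titers are hypothetical.

```python
import math
from statistics import mean, stdev

# Standard two-sided 95% Student's t critical values, keyed by degrees of freedom.
T_CRIT_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
              6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}

def gmt_with_ci(titers):
    """Return (GMT, lower 95% bound, upper 95% bound) for 2 to 11 positive titers."""
    logs = [math.log10(t) for t in titers]
    n = len(logs)
    m = mean(logs)
    se = stdev(logs) / math.sqrt(n)      # standard error on the log10 scale
    half_width = T_CRIT_975[n - 1] * se
    return 10 ** m, 10 ** (m - half_width), 10 ** (m + half_width)

# Hypothetical Day 35 titers for a group of eight animals.
gmt, lower, upper = gmt_with_ci([80, 160, 160, 320, 320, 640, 640, 1280])
print(round(gmt), round(lower), round(upper))
```

Because the interval is symmetric only on the log scale, the back-transformed bounds are asymmetric around the GMT, as is the case for the intervals reported above.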


In NHPs, diverse neutralizing titers were observed within each group even after the second immunization (FIG. 14). The 2P and 6P/GSAS vaccines showed lower immunogenicity than the other constructs, with GMTs at Day 35 of 78 and 10, respectively. The 6P vaccine failed to elicit any detectable neutralizing titers. Consistent with the observations in the mouse study, all GSAS constructs with the exception of 6P/GSAS induced higher neutralizing titers after the second dose, with GMTs (95% CI) at Day 35 recorded as 425 (48; 3769) for GSAS, 772 (116; 5121) for 2P/GSAS and 280 (11; 6970) for 2P/GSAS/ALAYT, as compared to those of the 2P vaccine group. The trend in GMTs in both mice and NHPs suggested superior immunogenicity for 2P/GSAS relative to the other constructs. Moreover, the peak PsV nAb titers (Day 35) for the 2P/GSAS variant in mice and NHPs were comparable to or higher than the titers observed in a panel of 93 convalescent sera from COVID-19 patients.


This example demonstrates that the GSAS mutation is beneficial for vaccine immunogenicity. The 2P mutation, which was introduced for stabilization of prefusion form of S protein, appeared beneficial in the context of the GSAS mutation, while ALAYT showed less impact on immunogenicity, especially in NHPs, in the context of 2P/GSAS. Accordingly, this example provides further confirmation that an optimized mRNA encoding a SARS-COV-2 S protein that has been modified relative to naturally occurring SARS-COV-2 S protein to remove the furin cleavage site and to mutate residues 986 and 987 to proline can be more effective in inducing neutralizing antibodies than mRNAs encoding other prefusion stabilized SARS-COV-2 S protein.


Example 12. Protective Efficacy in Syrian Golden Hamsters

This example demonstrates that an immunogenic composition of LNP-encapsulated mRNA comprising an optimized nucleotide sequence encoding a SARS-COV-2 S protein that has been modified relative to naturally occurring SARS-COV-2 S protein to remove the furin cleavage site and to mutate residues 986 and 987 to proline can have protective efficacy in an animal model of COVID-19 by reducing viral infection of the lung and preventing lung pathology.


SARS-COV-2 infection in the Syrian golden hamster is a pathology model in which the viral infection is associated with high levels of virus replication, with peak titers in the lungs and nasal epithelium at 2 days post infection (DPI), histopathological evidence of disease in the lungs at 7 DPI, and about 8-15% weight loss around 7 DPI.


To evaluate the potential of the LNP formulation prepared in Example 5 to protect against viral infection and disease, Syrian golden hamsters were immunized with four vaccine formulation dose levels of 0.15, 1.5, 4.5 or 13.5 μg per dose, either as a single IM immunization at Day 21 or as two IM administrations at Day 0 and Day 21. Animals were challenged at Day 49 via intranasal (IN) inoculation of SARS-COV-2 and monitored for clinical manifestations of disease as body weight loss at 8 DPI. Lungs and nasal tissues were harvested at 4 or 7 DPI for histopathology, and for quantification of viral replication by subgenomic RNA RT-PCR assays.


The LNP formulation of Example 5 induced robust dose-dependent neutralizing antibody responses after the first vaccination, which were significantly enhanced by the second immunization. After the first immunization, all animals, except for the 0.15 μg dose group, developed neutralizing antibodies recorded as plaque reduction neutralization titers (PRNT) against wild-type SARS-COV-2 virus. Day 35 PRNT50 GMTs for single-dose immunization schedules were 237, 410 and 711 for 1.5, 4.5 and 13.5 μg dose respectively, while corresponding values for two-dose groups were 3219, 2446 and 3219. Despite the observed trend towards higher titers with increasing dose, the differences between titers in the 1.5, 4.5 and 13.5 μg groups were not statistically significant.


To test the protective effects of vaccination, all groups were challenged intranasally. The body weight of each animal was monitored daily for 7 days (FIG. 15a). Sham (diluent) vaccinated animals showed the most pronounced weight loss, with more than 10% loss at 7 DPI. Vaccination with 1.5, 4.5 or 13.5 μg, whether in a one-dose or a two-dose regimen, protected animals against body weight loss, with most animals experiencing less than 5% loss, mostly peaking around 2-3 DPI. There was no significant difference in weight among these groups. The only group experiencing a degree of weight loss similar to that of the sham group was the 0.15 μg dose group with a single immunization.


To assess the pathology caused by viral infection, lung samples were harvested from 4 animals of each group on either 4 or 7 DPI, and the fixed tissues were sectioned, randomized and blinded for histopathological examination. A pathology score of 0-3 was assigned to each sample based on the severity of tissue damage, with a higher score reflecting more severe pathology. A score of 1 was attributed to lung sections that revealed histopathology findings in less than 25% of the section. Similarly, if greater than 25% but less than 50% of the parenchyma was involved, a score of 2 was assigned. A score of 3 was designated to those sections where more than 50% of the total section was affected. Sham vaccinated hamsters inoculated with SARS-COV-2 revealed widespread lung histopathology which resembled the reports of severe pneumonia detected in COVID-19 patients (FIG. 15b). Lungs from naïve hamsters were histologically unremarkable. Lesions similar to those of the sham vaccinated hamsters could be seen in lung samples from the 0.15 μg dose group of single vaccination, which were scored as 3 in blind examination. In contrast, the lung samples from the 13.5 μg dose group of single vaccination revealed no such lesions, similar to those of the healthy controls, and both were scored as 0 (FIG. 15c).
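The scoring scheme described above may be expressed as a simple rule, sketched below for illustration. The text does not specify how sections with exactly 25% or exactly 50% involvement are scored; assigning both boundary values to a score of 2 is an assumption of this sketch, as is the use of a score of 0 only when no findings are present. The function name is hypothetical.

```python
def pathology_score(percent_affected):
    """Map the percentage of a lung section showing histopathology to a 0-3 score."""
    if not 0 <= percent_affected <= 100:
        raise ValueError("percent_affected must be between 0 and 100")
    if percent_affected == 0:
        return 0  # no findings (e.g., healthy control)
    if percent_affected < 25:
        return 1  # findings in less than 25% of the section
    if percent_affected <= 50:
        return 2  # assumption: exact 25% and 50% boundaries are scored as 2
    return 3      # more than 50% of the section affected

print([pathology_score(p) for p in (0, 10, 40, 75)])
```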


Lung pathology was markedly attenuated in hamsters that received either one or two doses of the LNP formulation of Example 5, and there appeared to be a dose-dependent effect at both 4 and 7 DPI (FIG. 15b). While a single vaccination of 1.5, 4.5 and 13.5 μg substantially attenuated pathology caused by infection, the two-dose vaccination of 1.5, 4.5 and 13.5 μg provided almost complete protection against pathology. The very low dose level of 0.15 μg showed no protection when used in a single-dose regimen but some marginal protection in a two-dose vaccination regimen.


To assess whether immunization with the LNP formulation of Example 5 could impact viral infection in hamsters, viral subgenomic mRNA (sgRNA) in lung and nasal samples was measured by RT-PCR. Lung and nasal samples of half of each group (n=4) were collected at either 4 or 7 DPI and total RNA was processed for detection of sgRNA by RT-PCR (FIG. 15d). For lung samples collected at 4 and 7 DPI, the sham vaccinated group yielded about 10⁸ and 10⁵ copies per gram of tissue, respectively, while those receiving the 13.5 μg two-dose regimen were below the level of detection at both time points. The lung samples from those receiving the 1.5 μg and 4.5 μg two-dose regimens had a nearly 3 log reduction in viral sgRNA copies at 4 DPI and were below detection at 7 DPI. For the lung samples from those receiving the 1.5, 4.5 and 13.5 μg single-dose vaccination, the viral loads at 4 DPI were not different from those of the sham vaccinated group while the loads at 7 DPI were below the threshold of detection. Notably, the lung samples from the 0.15 μg dose groups receiving one-dose or two-dose regimens had similar or even higher viral loads as compared to those of the sham vaccinated group at either 4 or 7 DPI. However, the viral loads (sgRNA) were more diverse at 4 DPI among all groups, with one or two animals testing negative in most groups. The only group that achieved clearance of viral sgRNA in nasal samples at 7 DPI was the 13.5 μg two-dose vaccination group.
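The log reduction in sgRNA copies referred to above is the base-10 logarithm of the ratio of copies in the sham vaccinated group to copies in the vaccinated group. A minimal sketch follows. The treatment of samples below the level of detection (substituting a limit-of-detection value, so that the result is a lower bound on the reduction) and the numerical limit used in the example are hypothetical and are not taken from the assay.

```python
import math

def log10_reduction(control_copies, treated_copies, lod_copies=None):
    """log10 fold reduction in sgRNA copies per gram relative to a control group.

    If lod_copies is given, treated values below it are replaced by the limit
    of detection, so the returned value is then a lower bound.
    """
    if lod_copies is not None:
        treated_copies = max(treated_copies, lod_copies)
    return math.log10(control_copies / treated_copies)

# About 10^8 copies per gram in sham animals versus a hypothetical 10^5 in vaccinated animals.
print(round(log10_reduction(1e8, 1e5), 2))
# Undetectable sample with a hypothetical limit of detection of 10^3 copies per gram.
print(round(log10_reduction(1e8, 0, lod_copies=1e3), 2))
```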


This example demonstrates that the immunogenic composition prepared in Example 5 can reduce viral infection of the lung and prevent lung pathology in an animal model of COVID-19. Immunization with the immunogenic composition prepared in Example 5 may have an impact on transmission due to shortened duration and lower loads of viral shedding from the upper respiratory tract.


Example 13. Preparation of mRNA-Encapsulating Lipid Nanoparticles

An mRNA comprising an optimized nucleotide sequence encoding a full-length SARS-COV-2 S protein that has been modified relative to naturally occurring SARS-COV-2 spike protein to remove the furin cleavage site, to mutate residues 986 and 987 to proline and which contains the L18F, D80A, D215G, ΔL242, ΔA243, ΔL244, K417N, E484K, N501Y, D614G and A701V mutations (South African variant 2+D614G) was synthesized in vitro. The mRNA was prepared using a template plasmid comprising the sequence SEQ ID NO: 166 operably linked to an RNA polymerase promoter sequence.


Template-dependent RNA synthesis of unmodified nucleotides yielded a polynucleotide with the nucleic acid sequence of SEQ ID NO: 172 which comprises the optimized nucleic acid sequence of SEQ ID NO: 173. In a multi-step, enzyme-catalyzed process, the final mRNA product was synthesized, which was purified to remove enzyme reagents and prematurely aborted synthesis products (“shortmers”).


The final mRNA had the structural elements shown in mRNA construct 2 in Table 4. The SARS-COV-2 S protein coding sequence is flanked by 5′ and 3′ untranslated regions (UTRs) of 140 and 105 nucleotides, respectively. The mRNA also contains a 5′ cap structure consisting of a 7-methyl guanosine (m7G) residue linked via an inverted 5′-5′ triphosphate bridge to the first nucleoside of the 5′ UTR, which is itself modified by 2′-O-ribose methylation. The 5′ cap is essential for initiation of translation by the ribosome. The entire linear structure is terminated at the 3′ end by a tract of approximately 100 to 500 adenosine nucleosides (polyA). The polyA region confers stability to the mRNA and is also thought to enhance translation. All of these structural elements are naturally occurring components which are required for the efficient translation of the SARS-CoV-2 spike mRNA.


The purified mRNA was encapsulated in lipid nanoparticles (LNPs) comprising 40% cKK-E10, 30% DOPE, 28.5% Cholesterol and 1.5% DMG-PEG-2K (molar ratios). The final mRNA-LNP formulation was an aqueous suspension.


Example 14. Neutralizing Antibody Response Effective Against Variant Strains of SARS-COV-2

This example demonstrates that non-human primates (NHPs), which previously had been immunized with two doses of the LNP formulation of Example 5, mount an effective neutralizing antibody response against the SARS-COV-2 S protein derived from the original Wuhan strain, against naturally occurring variants of the SARS-COV-2 S protein observed in South Africa, Japan/Brazil and California, and against an S protein derived from SARS-COV-1. This response is mounted upon exposure to an immunogenic composition of LNP-encapsulated mRNA comprising an optimized nucleotide sequence encoding a SARS-COV-2 S protein that has been modified relative to naturally occurring SARS-COV-2 S protein to remove the furin cleavage site and to mutate residues 986 and 987 to proline, and which contains the L18F, D80A, D215G, ΔL242, ΔA243, ΔL244, K417N, E484K, N501Y, D614G and A701V mutations of a South African variant (South African variant 2+D614G) of SARS-COV-2. The immunogenic composition was prepared as described in Example 13.


A non-human primate (NHP) model (cynomolgus monkeys) was used to investigate whether the original antigen specificity towards the original Wuhan strain, which was induced by the mRNA vaccine described in Example 5 (encoding a prefusion-stabilized Wuhan variant of the SARS-COV-2 S protein), could be overcome by subsequent immunization with an mRNA vaccine comprising an optimized nucleotide sequence encoding a prefusion-stabilized South African (SA) variant of the SARS-COV-2 S protein, either alone or in combination with the mRNA vaccine of Example 5 (Wuhan), in order to elicit a broad immune response targeting different circulating variants of SARS-COV-2 and an S protein derived from SARS-COV-1. Cynomolgus monkeys (n=4) were immunized twice three weeks apart (Day 0 and Day 21) with either 15 μg, 45 μg or 135 μg each of the LNP formulation prepared in Example 5. On Day 315, the animals were randomized into two groups and immunized. Group 1 was immunized with the mRNA vaccine described in Example 13, which contained mutations derived from a South African variant of SARS-COV-2 (SA alone). Group 2 was immunized with a formulation that contained the original mRNA vaccine from Example 5 plus the variant given to Group 1 (Wuhan+SA). Both Group 1 and Group 2 received a total mRNA dose of 10 μg. The study was designed to evaluate whether a bivalent immunogenic composition (Wuhan+SA) was required to broaden the antigen response, or whether a monovalent immunogenic composition comprising a SARS-CoV-2 S protein derived from a non-Wuhan variant (SA alone) was sufficient to broaden the antigen response.


Serum samples from pre-immunized and pre-boost animals (Day 4, Day 308) as well as samples collected on Day 14, Day 21, Day 28, Day 35, Day 42, Day 90, Day 308 and Day 329 were tested in a Wuhan S-protein-expressing pseudovirus (PsV) neutralization assay. Serum samples collected on Day 35, Day 308 and Day 329 were tested in pseudovirus (PsV) neutralization assays. The tested PsVs expressed an S protein derived from SARS-COV-2 strains Wuhan, South African (SA 20C and SA 20H), Japan/Brazil (Jap/Braz) or California, or an S protein derived from a SARS-COV-1 strain, as shown in FIG. 16. Serum samples were diluted in medium (FluoroBrite phenol red free DMEM+10% FBS+10 mM HEPES+1% PS+1% Glutamax) and heat-inactivated at 56° C. for 30 minutes. A further 2-fold, 11-point serial dilution series of the heat-inactivated serum was performed in medium. Diluted serum samples were mixed with reporter virus particle (RVP)-GFP (Integral Molecular) diluted to contain ˜300 infectious particles per well and incubated for 1 hour at 37° C. 96-well plates of ˜50% confluent 293T-hsACE2 clonal cells in 75 μL volume were inoculated with 50 μL of the serum+RVP mixtures in singleton and incubated at 37° C. for 72 h. At the end of the incubation, plates were scanned on a high-content imager and individual GFP-expressing cells were counted. The inhibitory dilution titer (ID50) was reported as the reciprocal of the dilution that reduced the number of virus plaques in the test by 50%. ID50 for each test sample was interpolated by calculating the slope and intercept using the last dilution with a plaque number below the 50% neutralization point and the first dilution with a plaque number above the 50% neutralization point (ID50 Titer = (50% neutralization point − intercept)/slope). The results are summarised in FIG. 17.
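The interpolation described above can be illustrated with a minimal Python sketch. It is not the assay software used in the study. It assumes that the straight line is fitted on a linear axis of reciprocal dilution versus GFP-positive cell count, and that the 50% neutralization point is half the count of a virus-only control well. The function names, the starting dilution of 1:20 and the example counts are hypothetical and are chosen only for illustration.

```python
def two_fold_series(start_reciprocal_dilution, points=11):
    """Reciprocal dilutions of a 2-fold serial dilution series (e.g. 20, 40, 80, ...)."""
    return [start_reciprocal_dilution * 2 ** i for i in range(points)]


def id50_titer(reciprocal_dilutions, counts, virus_only_count):
    """Interpolate the ID50 between the two dilutions that bracket 50% neutralization.

    reciprocal_dilutions: ascending reciprocal dilutions (least dilute serum first).
    counts: GFP-positive cell counts measured at each dilution.
    virus_only_count: count in a control well without serum (assumed control).
    Returns None if the 50% point is not bracketed by the dilution series.
    """
    half_point = 0.5 * virus_only_count  # the "50% neutralization point"
    pairs = list(zip(reciprocal_dilutions, counts))
    for (d_below, c_below), (d_above, c_above) in zip(pairs, pairs[1:]):
        # last dilution below the 50% point, first dilution at or above it
        if c_below < half_point <= c_above:
            slope = (c_above - c_below) / (d_above - d_below)
            intercept = c_below - slope * d_below
            # ID50 Titer = (50% neutralization point - intercept) / slope
            return (half_point - intercept) / slope
    return None


if __name__ == "__main__":
    dilutions = two_fold_series(20, points=11)  # hypothetical 1:20 start
    example_counts = [0, 2, 10, 40, 100, 200, 260, 290, 300, 300, 300]  # made-up data
    print(id50_titer(dilutions, example_counts, virus_only_count=300))  # prints 480.0
```

In this made-up example the 50% point (150 cells) falls between the 1:320 and 1:640 dilutions, so the interpolated titer lies between those two values. A sample whose counts never cross the 50% point returns None rather than an extrapolated titer.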


As can be seen from FIG. 17, in both groups of NHPs, booster immunization with an mRNA vaccine comprising an optimized nucleotide sequence encoding a prefusion-stabilized South African variant of the SARS-COV-2 S protein about 9 months after the original 2-dose prime-boost immunization resulted in high neutralization potencies against Wuhan PsV, which expressed the SARS-COV-2 S protein of the original Wuhan strain. These data suggest that exposure to an mRNA vaccine encoding a South African variant of the SARS-COV-2 S protein boosts the neutralizing antibody response against the SARS-COV-2 S protein encoded by the original mRNA vaccine. Exposure to a mixture of the mRNA vaccine encoding the prefusion stabilized South African variant of the SARS-COV-2 S protein and the original mRNA encoding a prefusion stabilized S protein derived from the Wuhan strain was no more effective in boosting a neutralizing antibody response against the S protein of the original Wuhan strain than exposure to only the mRNA vaccine encoding the prefusion stabilized South African variant of the SARS-CoV-2 S protein.


Interestingly, immunization with an mRNA vaccine encoding the prefusion stabilized South African variant of the SARS-COV-2 S protein also resulted in high neutralization potencies against all other tested PsVs, which expressed a naturally occurring variant of the SARS-CoV-2 S protein observed in South Africa or naturally occurring variants of the SARS-COV-2 S protein observed in Japan/Brazil and California. Surprisingly, the antigen response was so broad that PsVs expressing the S protein of SARS-COV-1 were also effectively neutralized by the NHP test sera. This was unexpected since the S protein of SARS-COV-1 is only 76% identical to the S protein of SARS-COV-2 Wuhan.


As can be seen from FIG. 17, in most instances the neutralizing antibody response was as effective against a variant S protein as against the S protein derived from the original Wuhan strain. Moreover, the magnitude of the neutralizing antibody response observed after booster immunization with an mRNA vaccine encoding a prefusion stabilized South African variant of the SARS-COV-2 S protein was similar to or greater than the neutralizing antibody response induced at Day 35 in response to the original prime-boost immunization with the mRNA vaccine of Example 5.


These data demonstrate that subjects who have been previously immunized with a vaccine that elicits neutralizing antibodies against the S protein of SARS-COV-2 Wuhan and who are subsequently administered an mRNA vaccine comprising an optimized nucleotide sequence of the invention that encodes a prefusion stabilized South African variant of the SARS-COV-2 S protein are able to mount a broad neutralizing antibody response effective against a wide variety of S protein variants and therefore should be effectively protected against COVID-19 infections caused by naturally occurring variants of the original SARS-COV-2 Wuhan strain, as well as other β-coronaviruses, in particular those expressing a spike protein which binds to angiotensin-converting enzyme 2 (ACE2), such as SARS-COV-1.

Claims
  • 1. A nucleic acid comprising an optimized nucleotide sequence encoding a full-length SARS-COV-2 spike protein which has been modified relative to naturally occurring full-length SARS-COV-2 spike protein of SEQ ID NO: 1 to remove the furin cleavage site and to mutate residues 986 and 987 to proline, wherein the optimized nucleotide sequence consists of codons associated with a usage frequency which is greater than or equal to 10%; wherein the optimized nucleotide sequence: (i) does not contain a termination signal having one of the following nucleotide sequences: 5′-X1ATCTX2TX3-3′, wherein X1, X2 and X3 are independently selected from A, C, T or G; and 5′-X1AUCUX2UX3-3′, wherein X1, X2 and X3 are independently selected from A, C, U or G; (ii) does not contain any negative cis-regulatory elements and negative repeat elements; and (iii) has a codon adaptation index greater than 0.8;
  • 2. The nucleic acid of claim 1, wherein the optimized nucleotide sequence does not contain a termination signal having one of the following sequences: TATCTGTT; TTTTTT; AAGCTT; GAAGAGC; TCTAGA; UAUCUGUU; UUUUUU; AAGCUU; GAAGAGC; UCUAGA.
  • 3. The nucleic acid of claim 1, wherein the full-length SARS-CoV-2 spike protein encoded by the optimized sequence further contains the L18F, D80A, D215G, L242-, A243-, L244-, K417N, E484K, N501Y, D614G and A701V mutations.
  • 4. The nucleic acid of claim 1, wherein the nucleic acid is mRNA.
  • 5. (canceled)
  • 6. The nucleic acid of claim 1, wherein the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO:11, optionally wherein the optimized nucleotide sequence has the nucleic acid sequence of SEQ ID NO: 44 or SEQ ID NO: 148; or wherein the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO: 167, optionally wherein the optimized nucleotide sequence has the nucleic acid sequence of SEQ ID NO: 166 or SEQ ID NO: 173.
  • 7.-16. (canceled)
  • 17. A pharmaceutical composition comprising i) the nucleic acid of claim 1 and ii) a lipid nanoparticle, wherein the nucleic acid is encapsulated in the lipid nanoparticle.
  • 18. (canceled)
  • 19. The pharmaceutical composition of claim 17, wherein the lipid nanoparticle comprises a cationic lipid, a non-cationic lipid, a cholesterol-based lipid, and a PEG-modified lipid.
  • 20.-35. (canceled)
  • 36. The nucleic acid of claim 4, wherein the mRNA comprises an mRNA construct consisting of the following structural elements: (i) a 5′ cap with the following structure:
  • 37. The nucleic acid of claim 4, wherein the mRNA comprises an mRNA construct consisting of the following structural elements: (i) a 5′ cap with the following structure:
  • 38.-43. (canceled)
  • 44. The pharmaceutical composition of claim 17 comprising the mRNA construct of claim 36 and/or the mRNA construct of claim 37.
  • 45. The immunogenic composition according to claim 44 comprising between 5 μg and 200 μg of the mRNA construct(s).
  • 46.-60. (canceled)
  • 61. An immunogenic composition comprising at least two nucleic acids, wherein 1. the first nucleic acid comprises an optimized nucleotide sequence encoding a full-length SARS-COV-2 spike protein which has been modified relative to naturally occurring full-length SARS-COV-2 spike protein of SEQ ID NO: 1 to remove the furin cleavage site and to mutate residues 986 and 987 to proline; and 2. the second nucleic acid comprises an optimized nucleotide sequence encoding a full-length SARS-COV-2 spike protein which has been modified relative to naturally occurring full-length SARS-COV-2 spike protein of SEQ ID NO: 1 to remove the furin cleavage site and to mutate residues 986 and 987 to proline and further contains the L18F, D80A, D215G, L242-, A243-, L244-, K417N, E484K, N501Y, D614G and A701V mutations.
  • 62. The immunogenic composition according to claim 61, wherein the first nucleic acid comprises an optimized nucleotide sequence which encodes an amino acid sequence comprising SEQ ID NO: 11, optionally wherein the optimized nucleotide sequence has the nucleic acid sequence of SEQ ID NO: 44 or SEQ ID NO: 148.
  • 63. The immunogenic composition according to claim 61 or 62, wherein the second nucleic acid comprises an optimized nucleotide sequence that encodes an amino acid sequence comprising SEQ ID NO: 167, optionally wherein the optimized nucleotide sequence has the nucleic acid sequence of SEQ ID NO: 166 or SEQ ID NO: 173.
  • 64. (canceled)
  • 65. The immunogenic composition according to claim 61, wherein the optimized nucleotide sequence of the first nucleic acid has the nucleic acid sequence of SEQ ID NO: 148, and wherein the optimized nucleotide sequence of the second nucleic acid has the nucleic acid sequence of SEQ ID NO: 173.
  • 66. (canceled)
  • 67. (canceled)
  • 68. The immunogenic composition according to claim 61, wherein the at least two nucleic acids are encapsulated in the same lipid nanoparticle, or in separate lipid nanoparticles.
  • 69. (canceled)
  • 70. The immunogenic composition of claim 68, wherein the lipid nanoparticle comprises a cationic lipid, a non-cationic lipid, a cholesterol-based lipid and a PEG-modified lipid.
  • 71. The immunogenic composition according to claim 70, wherein the cationic lipid is selected from cKK-E12, cKK-E10, OF-Deg-Lin and OF-02; the non-cationic lipid is selected from DOPE and DEPE; the cholesterol-based lipid is cholesterol; and the PEG-modified lipid is DMG-PEG-2K.
  • 72. The immunogenic composition according to claim 71, wherein the lipid nanoparticle comprises cKK-E10, DOPE, cholesterol and DMG-PEG2K at the molar ratios 40:30:28.5:1.5.
  • 73. The immunogenic composition of claim 61 comprising a total of 7.5 μg, 15 μg, 45 μg or 135 μg of the at least two nucleic acids.
  • 74.-77. (canceled)
  • 78. A method of treating or preventing an infection caused by a β-coronavirus, said method comprising administering to a subject an effective amount of the pharmaceutical composition of claim 17, or the immunogenic composition of claim 61.
  • 79. (canceled)
  • 80. The method of claim 78, wherein the β-coronavirus is SARS-COV-2.
  • 81.-89. (canceled)
  • 90. A method of treating or preventing an infection caused by a β-coronavirus, said method comprising administering to a subject an effective amount of an immunogenic composition comprising an mRNA construct, wherein said mRNA construct comprises an optimized nucleotide sequence encoding a full-length SARS-COV-2 spike protein which has been modified relative to naturally occurring full-length SARS-COV-2 spike protein of SEQ ID NO: 1 to remove the furin cleavage site and to mutate residues 986 and 987 to proline and further contains the L18F, D80A, D215G, L242-, A243-, L244-, K417N, E484K, N501Y, D614G and A701V mutations.
  • 91. The method of claim 90, wherein the optimized nucleotide sequence encodes an amino acid sequence comprising SEQ ID NO: 167, optionally wherein the optimized nucleotide sequence has the nucleic acid sequence of SEQ ID NO: 173.
  • 92.-108. (canceled)
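
By way of illustration only, the sequence screen recited in claims 1(i) and 2 can be sketched in Python as follows. This is a hypothetical helper, not the procedure used to generate the optimized sequences. It treats X1, X2 and X3 as any single nucleotide, converts U to T so that DNA and RNA inputs are handled alike, and reports the position of every match. It does not evaluate codon usage frequency, negative cis-regulatory or repeat elements, or the codon adaptation index.

```python
import re

# 5'-X1ATCTX2TX3-3' with X1, X2, X3 each any nucleotide; the lookahead
# lets overlapping occurrences be reported as well.
CLAIM_1_MOTIF = re.compile(r"(?=([ACGT]ATCT[ACGT]T[ACGT]))")

# DNA forms of the sequences listed in claim 2 (RNA forms map onto these after U -> T).
CLAIM_2_SEQUENCES = ["TATCTGTT", "TTTTTT", "AAGCTT", "GAAGAGC", "TCTAGA"]


def to_dna(sequence):
    """Upper-case the input and express RNA (U) as DNA (T)."""
    return sequence.upper().replace("U", "T")


def find_termination_signals(sequence):
    """Return sorted (0-based position, matched sequence) pairs for every hit."""
    dna = to_dna(sequence)
    hits = {(m.start(1), m.group(1)) for m in CLAIM_1_MOTIF.finditer(dna)}
    for signal in CLAIM_2_SEQUENCES:
        start = dna.find(signal)
        while start != -1:
            hits.add((start, signal))
            start = dna.find(signal, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    print(find_termination_signals("GGGUAUCUGUUCCC"))  # one hit at position 3
    print(find_termination_signals("ATGGCCGCC"))       # no hits: empty list
```

A sequence would pass this particular screen only if the returned list is empty.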
CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a 35 U.S.C. § 371 National Stage Application of International Application No. PCT/US2021/031256, filed on May 7, 2021, which claims benefit of, and priority to U.S. Provisional Patent Application Ser. No. 63/021,319 filed on May 7, 2020, U.S. Provisional Patent Application Ser. No. 63/032,825 filed on Jun. 1, 2020, U.S. Provisional Patent Application Ser. No. 63/076,718 filed on Sep. 10, 2020, U.S. Provisional Patent Application Ser. No. 63/076,729 filed on Sep. 10, 2020, U.S. Provisional Patent Application Ser. No. 63/088,739 filed on Oct. 7, 2020, U.S. Provisional Patent Application Ser. No. 63/143,604 filed on Jan. 29, 2021, U.S. Provisional Patent Application Ser. No. 63/143,612 filed on Jan. 29, 2021, and U.S. Provisional Patent Application Ser. No. 63/146,807 filed on Feb. 8, 2021. The contents of each of the foregoing applications are hereby incorporated by reference in their entireties.

PCT Information
Filing Document Filing Date Country Kind
PCT/US2021/031256 5/7/2021 WO
Provisional Applications (8)
Number Date Country
63021319 May 2020 US
63032825 Jun 2020 US
63076729 Sep 2020 US
63076718 Sep 2020 US
63088739 Oct 2020 US
63143604 Jan 2021 US
63143612 Jan 2021 US
63146807 Feb 2021 US