Biopharmaceuticals, including recombinant therapeutic proteins, nucleic acid products, and therapies based on engineered cells, represent an important public health need. Despite major advances, the price, affordability, and ease of production remain obstacles to ubiquitous access to groundbreaking therapies. In biomanufacturing, a significant cost driver is product titer, or produced concentration of functional product. All current industrial cell hosts contain weaknesses in which improvement would enhance the production of biologics.
Current industrial cell hosts include E. coli, Chinese Hamster Ovary (CHO) cells, and S. cerevisiae, which combine to produce nearly all marketed biologics. E. coli offers a fast and inexpensive host, but production of proteins from eukaryotic sources can be problematic. CHO cells are capable of human-like post-translational modifications but are slow to grow, inconsistent in reproducibility, require expensive media for growth, and produce proteins that can be difficult to purify. S. cerevisiae also possesses eukaryotic post-translational machinery; however, excess mannose sugar residues are added, sometimes resulting in immunogenicity and toxicity, and recovery of these proteins often requires whole-cell lysis, complicating purification. Thus, a need exists to engineer new types of host cells to produce proteins efficiently.
The invention provides expression constructs, cells expressing heterologous proteins, and methods of producing heterologous proteins. In one aspect, the invention features an expression construct including an OLE1 promoter operably linked to a nucleic acid encoding a polypeptide including a signal sequence and a heterologous protein. In a related aspect, the invention features a methylotrophic cell expressing a heterologous protein, wherein the expression is under the control of an OLE1 promoter. In some embodiments, the OLE1 promoter is located at an OLE1, AOX1, GAPDH, DAS2, or PIF1 locus. The methylotrophic cell may be transformed using an expression construct of the invention. In some embodiments, the OLE1 promoter has at least 95% (e.g., 95%, 96%, 97%, 98%, 99%, or 100%) homology with SEQ ID NO: 1 or a protein-expressing fragment thereof.
In another aspect, the invention features an expression construct including a DAS2 promoter operably linked to a nucleic acid encoding a polypeptide including a signal sequence and a heterologous protein and a targeting sequence for integration in a methylotrophic cell at a non-native locus. In a related aspect, the invention features a methylotrophic cell expressing a heterologous protein, wherein the expression is under the control of a DAS2 promoter integrated at a non-native locus, e.g., an OLE1, AOX1, GAPDH, or PIF1 locus. The methylotrophic cell may be transformed using an expression construct of the invention. In some embodiments, the DAS2 promoter has at least 95% (e.g. 95%, 96%, 97%, 98%, 99%, or 100%) homology with SEQ ID NO: 2 or a protein-expressing fragment thereof.
In another aspect, the invention features an expression construct including an AOX1 promoter operably linked to a nucleic acid encoding a polypeptide including a signal sequence and a heterologous protein, the construct further including a targeting sequence for integration in a methylotrophic cell at a PIF1, OLE1, or DAS2 locus. In a related aspect, the invention features a methylotrophic cell expressing a heterologous protein, wherein the expression is under the control of an AOX1 promoter integrated at a PIF1, OLE1, or DAS2 locus. The methylotrophic cell may be transformed using an expression construct of the invention. In some embodiments, the AOX1 promoter has at least 95% (e.g. 95%, 96%, 97%, 98%, 99%, or 100%) homology with SEQ ID NO: 3 or a protein-expressing fragment thereof.
In another aspect, the invention features an expression construct including a GAPDH promoter operably linked to a nucleic acid encoding a polypeptide including a signal sequence and a heterologous protein, the construct further including a targeting sequence for integration in a cell at an AOX1, PIF1, OLE1, or DAS2 locus. In a related aspect, the invention features a cell, e.g., a yeast cell or methylotrophic cell, expressing a heterologous protein, wherein the expression is under the control of a GAPDH promoter integrated at an AOX1, PIF1, OLE1, or DAS2 locus. The cell may be transformed using an expression construct of the invention. In some embodiments, the GAPDH promoter has at least 95% (e.g. 95%, 96%, 97%, 98%, 99%, or 100%) homology with SEQ ID NO: 4 or a protein-expressing fragment thereof.
In some embodiments of any of the above aspects, the signal sequence is identical to the signal sequence of a naturally occurring yeast protein such as SCW11, MSC1, EXG1, 0841, 1286, BGL2, 2488, 2848, PRY2, 4355, PIR1, KAR2, TOS1, 2241, LHS1, TIF1, CTS1, or 5326, e.g., KAR2, MSC1, TOS1, 2241, LHS1, TIF1, CTS1, or 5326.
In another aspect, the invention features an expression construct including a promoter operably linked to a nucleic acid encoding a polypeptide including a signal sequence and a heterologous protein, wherein the signal sequence is a signal sequence of KAR2, MSC1, TOS1, 2241, LHS1, TIF1, CTS1, or 5326. In some embodiments, the promoter is an OLE1, AOX1, DAS2, or GAPDH promoter. In some embodiments, the expression construct includes a targeting sequence for integration in a methylotrophic cell at an AOX1, PIF1, OLE1, GAPDH, or DAS2 locus. In a related aspect, the invention features a methylotrophic cell expressing a heterologous protein fused to a signal sequence of KAR2, MSC1, TOS1, 2241, LHS1, TIF1, CTS1, or 5326. In some embodiments, the expression is under the control of an OLE1, AOX1, DAS2, or GAPDH promoter. In some embodiments, the heterologous protein is integrated at an AOX1, PIF1, OLE1, GAPDH, or DAS2 locus.
In another aspect, the invention features an expression construct comprising a promoter operably linked to a nucleic acid encoding a polypeptide comprising a signal sequence and a heterologous protein, wherein (i) the promoter is an AOX1 or DAS2 promoter and/or the construct further comprises a targeting sequence for integration in a methylotrophic cell at an AOX1 or DAS2 locus; (ii) the expression construct further comprises a Kozak sequence beginning at the −3 position relative to the translation start site of the nucleic acid encoding the polypeptide; and/or (iii) a mRNA secondary structure of the nucleic acid encoding a polypeptide has been reduced or eliminated relative to the endogenous mRNA encoding the heterologous protein. In a related aspect, the invention features a cell, e.g., a yeast cell or methylotrophic cell, expressing a heterologous protein under the control of a promoter, wherein (i) the promoter is an AOX1 promoter or a DAS2 promoter and/or the promoter is located at an AOX1 or DAS2 locus; (ii) mRNA encoding the heterologous protein comprises a Kozak sequence beginning at the −3 position relative to the translation start site; and/or (iii) a mRNA secondary structure of the mRNA encoding the heterologous protein has been reduced or eliminated relative to the endogenous mRNA encoding the heterologous protein.
In another aspect, the invention features a method for preparing a transgene expression construct for expressing a heterologous protein in Pichia comprising providing a nucleic acid encoding a heterologous protein; and (i) selecting a promoter that increases expression of genes of the Mut pathway upon integration; or (ii) selecting a targeting sequence for guided recombination into a locus, wherein insertion of the heterologous protein into the locus increases expression of genes of the Mut pathway; or (i) and (ii).
In some embodiments of any of the above aspects, an expression construct of the invention is a plasmid or viral vector. The plasmid may be an episomal plasmid or an integrative plasmid. The expression construct may be linearized (e.g. by a restriction enzyme).
In another aspect, the invention features a method of producing a heterologous protein with a methylotrophic cell. The method includes culturing the cell under conditions suitable to express the heterologous protein. In some embodiments, the method includes first culturing the cell with a first carbon source lacking methanol under conditions in which the heterologous protein is substantially not expressed, followed by switching the carbon source to a carbon source that includes methanol to express the heterologous protein. In some embodiments, the method further includes isolating the protein. In other embodiments, the method further includes transforming the methylotrophic cell with an expression construct encoding the heterologous protein, as described herein.
In embodiments of any of the above aspects, the heterologous protein is selected from the group consisting of enzymes, hormones, antibodies or antigen binding fragments thereof, vaccine components, blood factors, thrombolytic agents, cytokines, receptors, and fusion proteins. In further embodiments of any of the above aspects, the methylotrophic cell is a yeast cell, such as a Pichia pastoris, Komagataella phaffii or Komagataella pastoris cell. The Komagataella phaffii cell may be a Komagataella phaffii Y-11430, Y-7556, YB-4290, Y-12729, Y-17741, Y-48123, Y-48124, YB-378, YB-4289, GS115, KM71H, SMD1168, SMD1168H, or X-33 cell.
In some embodiments of any of the above aspects, the expression construct comprises a Kozak sequence beginning at the −3 position relative to the translation start site of the nucleic acid encoding the polypeptide. In some embodiments, the mRNA encoding the heterologous protein comprises a Kozak sequence beginning at the −3 position relative to the translation start site. In some embodiments, the Kozak sequence comprises (i) the sequence ANAATGNC, wherein N comprises A, T, G, or C; or (ii) the sequence AMMATG, wherein M comprises A or C.
In some embodiments of any of the above aspects, a mRNA secondary structure of the nucleic acid encoding a polypeptide has been reduced or eliminated relative to the endogenous mRNA encoding the polypeptide. In some embodiments, a mRNA secondary structure of the mRNA encoding the heterologous protein has been reduced or eliminated relative to the endogenous mRNA encoding the heterologous protein. In some embodiments, the mRNA secondary structure is selected from a hairpin loop or any other structure as predicted by likelihood of pairing and/or low free energy.
The invention provides expression constructs and methylotrophic cells that express heterologous proteins, as well as methods to produce heterologous proteins. The cells advantageously produce a significantly higher titer of heterologous protein compared to prior expression systems. The DNA constructs are designed to drive gene expression under the control of highly active methanol-inducible promoters and can be integrated at various loci in the genome that enhance protein production. Furthermore, signal sequences of efficiently secreted proteins can be incorporated into the constructs to produce cells resulting in an increase in the titer of protein produced.
By “expression construct” is meant a nucleic acid construct including a promoter operably linked to a nucleic acid sequence of a heterologous protein. Other elements may be included as described herein and known in the art.
By “integration” is meant insertion of a nucleotide sequence into a host cell chromosome or episomal DNA element, such as by homologous recombination.
By “methylotrophic cell” is meant a cell having the ability to use reduced one-carbon compounds, such as methanol or methane, as a carbon source for cellular growth.
By “operably linked” is meant that a gene and a regulatory sequence(s) (e.g., a promoter) are connected in such a way as to permit gene expression when the appropriate molecules (e.g., transcriptional activator proteins) are bound to the regulatory sequence(s).
By “protein” is meant any chain of amino acids, regardless of length or post-translational modification (e.g., glycosylation or phosphorylation). For the purposes of this invention, a “heterologous protein” is a protein not natively expressed by a methylotrophic cell, e.g., a mammalian protein, such as a human protein.
By “promoter” is meant a DNA sequence sufficient to direct transcription; such elements may be located in the 5′ region of the gene. An OLE1 promoter is one having at least 80% homology to SEQ ID NO.: 1 or any protein-expressing fragment thereof and producing at least 80% of the heterologous protein as SEQ ID NO: 1 under the same conditions. A DAS2 promoter is one having at least 80% homology to SEQ ID NO.: 2 or any protein-expressing fragment thereof and producing at least 80% of the heterologous protein as SEQ ID NO: 2 under the same conditions. An AOX1 promoter is one having at least 80% homology to SEQ ID NO.: 3 or any protein-expressing fragment thereof and producing at least 80% of the heterologous protein as SEQ ID NO: 3 under the same conditions. A GAPDH promoter is one having at least 80% homology to SEQ ID NO.: 4 or any protein-expressing fragment thereof and producing at least 80% of the heterologous protein as SEQ ID NO: 4 under the same conditions.
By “signal sequence” is meant a short peptide present at the N-terminus of a newly synthesized heterologous protein that directs the protein toward the secretory pathway of a cell. The signal sequence is typically cleaved from the heterologous protein prior to secretion.
The term “nucleic acid,” in its broadest sense, includes any compound and/or substance that comprises a polymer of nucleotides. These polymers are referred to as polynucleotides.
Nucleic acids (also referred to as polynucleotides) may be or may include, for example, ribonucleic acids (RNAs), deoxyribonucleic acids (DNAs), threose nucleic acids (TNAs), glycol nucleic acids (GNAs), peptide nucleic acids (PNAs), locked nucleic acids (LNAs, including LNA having a β-D-ribo configuration, α-LNA having an α-L-ribo configuration (a diastereomer of LNA), 2′-amino-LNA having a 2′-amino functionalization, and 2′-amino-α-LNA having a 2′-amino functionalization), ethylene nucleic acids (ENA), cyclohexenyl nucleic acids (CeNA) or chimeras or combinations thereof.
In some embodiments, polynucleotides of the present disclosure function as messenger RNA (mRNA). “Messenger RNA” (mRNA) refers to any polynucleotide that encodes a (at least one) polypeptide (a naturally-occurring, non-naturally-occurring, or modified polymer of amino acids) and can be translated to produce the encoded polypeptide in vitro, in vivo, in situ or ex vivo. In some preferred embodiments, an mRNA is translated in vivo.
The basic components of an mRNA molecule typically include at least one coding region, a 5′ untranslated region (UTR), a 3′ UTR, a 5′ cap and a poly-A tail.
An exemplary methylotrophic cell for use in the present invention is a yeast cell, such as Pichia pastoris, which offers an attractive blend of advantages as a host for protein production. Two useful P. pastoris strains include Komagataella pastoris and Komagataella phaffii. As a eukaryotic organism, P. pastoris is capable of producing the complex post-translational modifications required for human biologics, and it exhibits fast, robust growth on inexpensive media. It possesses a small, tractable 9.4 Mbp genome that can be easily manipulated with an established toolbox of genetic techniques. Examples of strains of K. phaffii include NRRL Y-11430, Y-7556, YB-4290, Y-12729, Y-17741, Y-48123, Y-48124, YB-378, YB-4289, GS115, KM71H, SMD1168, SMD1168H, and X-33.
Heterologous proteins can be expressed in methylotrophic cells using a promoter at either its native locus or an alternate locus and a source of carbon, e.g., methanol. In the context of the present invention, such promoters include OLE1, DAS2, AOX1, and GAPDH promoters.
Expression constructs can provide an early and inexpensive opportunity for optimization of protein quality and titer. High-quality protein is properly folded and full-length (intact), with native N- and C-termini, and without significant proteolysis. In engineering the expression constructs, factors such as the promoter for heterologous gene expression, target site for transgene integration, sequence for translation initiation, and mRNA codon-optimization of the gene of interest are important design points for a given protein-expressing strain.
Expression constructs are nucleic acid constructs that minimally include a promoter or any protein-expressing fragment thereof operably linked to a nucleotide sequence for a heterologous protein. Expression constructs may also include additional elements as described herein and known in the art. In some embodiments, the expression construct can include one or more of any of the following components: signal sequence, targeting sequence, transcription terminator sequence, origin of replication, multi-cloning site, and an antibiotic resistance marker (which is optionally under the control of its own promoter, e.g., TEF1 or GAPDH). In some embodiments, the construct is a viral vector or a plasmid, such as an episomal plasmid or an integrative plasmid. In some embodiments, the construct comprises a transgene cassette. Transgene cassettes may include, e.g., a promoter, a nucleotide sequence for a heterologous protein of interest, and a terminator. Transgene cassettes may also include, e.g., a targeting sequence for guided recombination and/or a selective marker for isolation of positive clones. The construct can be linearized, e.g., with a restriction enzyme, or it can be in closed-circular form. The construct can be used to transform a methylotrophic cell (e.g., yeast) by electroporation, heat shock, or chemical transformation with lithium acetate. Once integrated, the altered genome is preferably passed on to each replicative generation.
Efforts to-date regarding selection of loci for transgene cassette insertion have focused primarily on locus accessibility for expressing the gene of interest. However, this disclosure demonstrates that use of certain promoters may upregulate native (endogenous) genes (e.g., coding regions) and provide an unexpected benefit to cell health and metabolism that results in increased titers and/or quality of heterologous proteins. This includes, but is not limited to, upregulation of the DAS1, DAS2, AOX1, GAPDH, and ATG30 genes by use of the respective promoter or locus. In the case of DAS1, DAS2, and AOX1, upregulating these genes can upregulate the overall Mut pathway. Since the organism relies on methanol as its carbon source during the production phase of fermentation, enhanced utilization by upregulation of the Mut pathway enables greater cell productivity. It was unexpected that use of a Mut pathway promoter or locus can drive significant upregulation of this pathway.
In some embodiments, expression of the heterologous protein from the promoter and/or at the loci results in an increase or decrease in expression of one or more endogenous genes. In some embodiments, expression of the heterologous protein from the promoter and/or at the loci results in an upregulation of expression of one or more genes in the Mut pathway. In some embodiments, one or more genes in the Mut pathway are upregulated at least 2-fold, at least 3-fold, at least 4-fold, at least 5-fold, at least 10-fold, at least 50-fold, at least 100-fold, or at least 1000-fold compared to cells that do not have the heterologous protein inserted.
Exemplary promoters include OLE1, DAS2, AOX1, and GAPDH promoters. These promoter sequences may have at least 80% homology to SEQ ID NOs.: 1-4 (e.g., identical to SEQ ID NOs: 1-4) or any protein-expressing fragment thereof. For example, the promoter sequence may have at least 85, 90, 95, or 99% homology to one of SEQ ID NOs.: 1-4 or any protein-expressing fragment thereof. For a promoter not identical to one of SEQ ID NOs.: 1-4 or any protein-expressing fragment thereof, the promoter will result in protein expression of at least 80% of the protein expressed under control of the corresponding wild type sequence under the same conditions. For example, a promoter sequence or any protein-expressing fragment thereof with less than 100% homology to one of SEQ ID NOs.: 1-4 may result in protein expression of at least 85, 90, 95, or 99% of the protein expressed under control of the corresponding wild type sequence under the same conditions.
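By way of non-limiting illustration, the two-part promoter criterion described above (sequence homology and relative protein expression, each at or above a threshold such as 80%) can be sketched in Python. The sketch assumes that homology is approximated as percent identity over a pair of sequences that have already been aligned to equal length with "-" as the gap character, and that titers for the variant and reference promoters were measured under the same conditions. The function names and the gap convention are hypothetical and are not part of the disclosure.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences carry the same base.

    Assumption: the inputs are pre-aligned to equal length, with '-' as the gap
    character. Columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no alignment columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_promoter_definition(
    identity_pct: float,
    variant_titer: float,
    reference_titer: float,
    threshold_pct: float = 80.0,
) -> bool:
    """Apply both criteria: sequence identity and relative expression.

    Titers may be in any unit as long as both use the same unit and were
    measured under the same conditions.
    """
    if reference_titer <= 0:
        raise ValueError("reference titer must be positive")
    relative_expression_pct = 100.0 * variant_titer / reference_titer
    return identity_pct >= threshold_pct and relative_expression_pct >= threshold_pct
```

A stricter embodiment (e.g., 95% homology) can be checked by passing a different threshold_pct.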
The heterologous protein expressed by a methylotrophic cell of the invention can be any non-natively expressed protein. Such proteins may be native to another species or artificial and include enzymes (such as trypsin or imiglucerase), hormones (e.g., insulin, glucagon, human growth hormone, gonadotrophins, erythropoietin, or a colony stimulating factor), antibodies or antigen binding fragments thereof (e.g., a monoclonal antibody or Fab fragment), single chain variable fragments (scFvs), nanobodies, a vaccine component, a blood factor (e.g., Factor VIII or Factor IX), a thrombolytic agent (e.g., tissue plasminogen activator), cytokines (such as interferons (e.g., interferon-α, -β, or -γ), interleukins (e.g., IL-2) and tumor necrosis factors), receptors, and fusion proteins (e.g., receptor fusions).
Typically, the heterologous protein will be expressed with a signal sequence. The signal sequences may be expressed under the control of any of the promoters described herein or other suitable promoters, e.g., any methanol inducible promoter. A signal sequence is a short peptide present at the N-terminus of newly synthesized proteins. The peptide directs the proteins toward the secretory pathway and is typically cleaved from the heterologous protein prior to secretion. Examples of signal sequences that may be employed in this invention are shown in Table 1. It will be understood that other nucleic acid sequences may be employed that result in the same protein sequence because of the degeneracy of the genetic code. Signal sequences producing a peptide with at least 80% homology to those listed in Table 1 may be employed. For example, signal sequences may produce a peptide having at least 85, 90, 95, or 99% homology to a peptide listed in Table 1. In certain embodiments, the signal sequence is one of KAR2, MSC1, TOS1, 2241, LHS1, TIF1, CTS1, and 5326. Other signal sequences are known in the art, e.g., alpha mating factor (MFα) from S. cerevisiae.
The expression construct may be designed to insert a sequence into a methylotrophic cell genome or to be transiently or stably expressed in an episomal construct. Constructs useful for integration into a methylotrophic cell minimally include a targeting sequence flanking an insertion sequence. The targeting sequence determines the locus sequence in the genome where the construct will be integrated. In some embodiments, the targeting sequence is a promoter (e.g. OLE1, AOX1, GAPDH, or DAS2 promoter) or another gene (e.g. PIF1). A targeting sequence may encompass the promoter when the construct inserts at the native locus of the promoter. A targeting sequence may include a nucleic acid sequence of from about 10 bp to about 10,000 bp (e.g., 10 bp-100 bp, e.g., 10 bp, 20 bp, 30 bp, 40 bp, 50 bp, 60 bp, 70 bp, 80 bp, 90 bp, 100 bp, e.g. 100 bp-1000 bp, e.g., 200 bp, 300 bp, 400 bp, 500 bp, 600 bp, 700 bp, 800 bp, 900 bp, 1000 bp, e.g., 1,000 bp-10,000 bp, e.g., 1,000 bp, 2,000 bp, 3,000 bp, 4,000 bp, 5,000 bp, 6,000 bp, 7,000 bp, 8,000 bp, 9,000 bp, 10,000 bp) that may enable efficient homologous recombination.
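As a non-limiting sketch of the layout described above, a linear integration cassette can be modeled as an insertion sequence (promoter, signal sequence, coding sequence, terminator) flanked on each end by a targeting sequence. The Python class below is hypothetical: the field names, the element order, and the omission of a selectable marker are assumptions made for illustration only. The 10 bp to 10,000 bp bounds are the targeting-sequence lengths stated above.

```python
from dataclasses import dataclass

# Targeting-sequence length bounds taken from the text (about 10 bp to about 10,000 bp).
MIN_ARM_BP = 10
MAX_ARM_BP = 10_000


@dataclass(frozen=True)
class IntegrationCassette:
    """Hypothetical model of a linear cassette: arm + insertion sequence + arm."""

    upstream_arm: str      # targeting sequence at the 5' end
    promoter: str
    signal_sequence: str   # encodes the N-terminal signal peptide
    coding_sequence: str   # encodes the heterologous protein
    terminator: str
    downstream_arm: str    # targeting sequence at the 3' end

    def __post_init__(self) -> None:
        # Reject targeting sequences outside the stated length range.
        for name in ("upstream_arm", "downstream_arm"):
            length = len(getattr(self, name))
            if not MIN_ARM_BP <= length <= MAX_ARM_BP:
                raise ValueError(
                    f"{name} is {length} bp; expected {MIN_ARM_BP}-{MAX_ARM_BP} bp"
                )

    def insertion_sequence(self) -> str:
        """Sequence placed between the two targeting sequences."""
        return (
            self.promoter
            + self.signal_sequence
            + self.coding_sequence
            + self.terminator
        )

    def linear_sequence(self) -> str:
        """Full linearized cassette, 5' to 3'."""
        return self.upstream_arm + self.insertion_sequence() + self.downstream_arm
```

In practice, the two arms would be taken from the chosen locus (e.g., an AOX1 or PIF1 locus) so that homologous recombination directs the insertion sequence to that locus.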
Heterologous proteins may be inserted into the genome of a methylotrophic cell at any suitable locus. Such loci include the native locus of the promoter employed or an alternative locus, such as the locus of a different promoter. Exemplary loci for use in the present invention include that of the OLE1, DAS2, AOX1, or GAPDH promoters or PIF1 (e.g., SEQ ID NO: 65).
Also provided herein are methods of preparing transgene expression constructs for expressing a heterologous protein comprising: (i) selecting a promoter that increases expression of one or more genes of the Mut pathway upon integration; or (ii) selecting a targeting sequence for guided recombination into a locus, wherein insertion of the heterologous protein into the locus increases expression of one or more genes of the Mut pathway; or (i) and (ii).
Alternatively, the heterologous protein may be expressed from an expression construct that is not integrated in the genome of the methylotrophic cell.
Sequences for other possible elements of expression constructs are known in the art. For example, transcription terminator, origin of replication, multi-cloning site, and antibiotic resistance marker sequences are known.
The methylotrophic cells and expression constructs of the present disclosure may encode a nucleic acid comprising one or more regions or sequences which act or function as an untranslated region (UTR). As their name implies, UTRs are transcribed but not translated. In mRNA, the 5′ UTR is located directly upstream (5′) from the start codon (the first codon of an mRNA transcript translated by a ribosome). The first nucleic acid in the start codon is designated as +1 and nucleic acids located upstream are designated as −1, −2, −3 and so on, while nucleic acids located downstream of this first nucleic acid are designated as +2, +3, +4 and so on. In some embodiments of the present disclosure, at least one 5′ untranslated region (UTR) is located upstream from the start codon of the nucleic acid encoding a heterologous protein of interest.
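The numbering convention described above has no position 0: the first base of the start codon is +1 and the base immediately upstream is −1. As a minimal, non-limiting sketch, the conversion between a 0-based string index and this numbering can be written in Python as follows. The function names are hypothetical, and the index of the first base of the start codon is assumed to be known.

```python
def position_relative_to_start(index: int, start_index: int) -> int:
    """Convert a 0-based index into the +1/-1 numbering (there is no position 0).

    start_index is the 0-based index of the first base of the start codon.
    """
    offset = index - start_index
    return offset + 1 if offset >= 0 else offset


def index_of_position(position: int, start_index: int) -> int:
    """Inverse mapping: +1/-1 numbering back to a 0-based index."""
    if position == 0:
        raise ValueError("position 0 is not defined in this numbering")
    return start_index + (position - 1 if position > 0 else position)


def base_at(sequence: str, position: int, start_index: int) -> str:
    """Return the base found at a given position relative to the start codon."""
    index = index_of_position(position, start_index)
    if not 0 <= index < len(sequence):
        raise IndexError("position lies outside the sequence")
    return sequence[index]
```

For a sequence whose start codon begins at 0-based index 6, the base at index 3 is at the −3 position and the base at index 8 is at the +3 position.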
5′UTRs may harbor Kozak sequences, which are commonly involved in translation initiation. While Kozak sequences are known to broadly affect translation efficiency, study of the effect of a consensus Kozak sequence in Pichia has been heretofore limited. This disclosure is premised in part on the discovery of promoters (including but not limited to the DAS2, OLE1, AOX1, and SIT1 promoters) causing increased titers of downstream coding sequences, in part, because the promoters comprise enhanced Kozak sequences, leading to high translation efficiency.
Exemplary Kozak sequences include the Kozak sequence located in the 5′ UTR of nucleic acids encoding AOX1, DAS2, OLE1 and SIT1. For example, the Kozak sequence starting at the −4 position relative to the translation start site of the nucleic acid encoding the heterologous protein of interest may be AAAAATG, CACAATG, or AACGATG.
In some embodiments, the Kozak sequence is a native Kozak sequence (i.e., a Kozak sequence found in nature associated with the heterologous protein of interest). In some embodiments, the Kozak sequence is a heterologous Kozak sequence (i.e., a Kozak sequence found in nature not associated with the heterologous protein of interest). In some embodiments, the Kozak sequence is a synthetic Kozak sequence, which does not occur in nature. Synthetic Kozak sequences include sequences that have been mutated to improve their properties (e.g., which increase expression of a heterologous protein of interest). Synthetic Kozak sequences may also include nucleic acid analogues and chemically modified nucleic acids.
In some embodiments, the Kozak sequences of the present disclosure may begin at the −3 position relative to the translation start site of the nucleic acid encoding the heterologous protein of interest. In some embodiments, the Kozak sequence of the present disclosure comprises an adenine (A) at the −3 position and an adenine (A) at the −1 position relative to the translation start site of the nucleic acid encoding the heterologous protein of interest. In some embodiments, the Kozak sequence may comprise the sequence AN1A starting at the −3 position relative to the translation start site of the nucleic acid encoding the heterologous protein of interest. The N1 in the AN1A sequence may be any nucleic acid. In some embodiments, the N1 in AN1A is adenine (A). In some embodiments, the N1 in AN1A is cytosine (C). In some embodiments, the N1 in AN1A is guanine (G). In some embodiments, the N1 in AN1A is thymine (T). In some embodiments, the Kozak sequence is AN1AATGN2C starting at the −3 position. The N2 in the AN1AATGN2C sequence may be any nucleic acid. In some embodiments, N2 is adenine (A). In some embodiments, N2 is cytosine (C). In some embodiments, N2 is guanine (G). In some embodiments, N2 is thymine (T). In some embodiments, the Kozak sequence, starting at the −3 position relative to the translation start site, is A(A/C)(A/C), in which the −3 position is adenine (A), the −2 position is adenine (A) or cytosine (C) and the −1 position is either adenine (A) or cytosine (C). In some embodiments, the Kozak sequence starting at the −3 position is A(A/C)(A/C)ATG.
Kozak sequences increase expression of a heterologous protein. In some embodiments, a Kozak sequence may increase expression of a heterologous protein at least 2-fold, at least 3-fold, at least 4-fold, at least 5-fold, at least 10-fold, at least 50-fold, at least 100-fold, or at least 1000-fold compared to a control under similar or substantially similar conditions. In some embodiments, the control is the level of heterologous protein expression using a Kozak sequence that does not have an adenine (A) at the −1 position relative to the translation start site. In some embodiments, the control is the level of heterologous protein expression using a Kozak sequence that does not have an adenine (A) at the −3 position relative to the translation start site. In some embodiments, the control is the level of heterologous protein expression using a Kozak sequence that does not have an adenine (A) at the −3 position or the −1 position relative to the translation start site.
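The Kozak patterns recited above, AN1AATGN2C and A(A/C)(A/C)ATG, each starting at the −3 position, can be checked against a candidate sequence with simple pattern matching. The following Python sketch is illustrative only and assumes a DNA-alphabet sequence (A, C, G, T) and a known 0-based index for the first base of the start codon. The function and key names are hypothetical.

```python
import re

# Patterns from the text, written over the window that spans -3 through +5.
KOZAK_AMMATG = re.compile(r"A[AC][AC]ATG")        # A(A/C)(A/C)ATG
KOZAK_ANAATGNC = re.compile(r"A[ACGT]AATG[ACGT]C")  # AN1AATGN2C


def kozak_window(sequence: str, start_index: int) -> str:
    """Return the 8 bases spanning positions -3 to +5 around the start codon."""
    if start_index < 3 or start_index + 5 > len(sequence):
        raise ValueError("sequence does not cover positions -3 through +5")
    return sequence[start_index - 3 : start_index + 5].upper()


def classify_kozak(sequence: str, start_index: int) -> dict:
    """Report which of the described Kozak features the context satisfies."""
    window = kozak_window(sequence, start_index)
    return {
        "window": window,
        "A_at_minus3": window[0] == "A",
        "A_at_minus1": window[2] == "A",
        "matches_AMMATG": KOZAK_AMMATG.fullmatch(window[:6]) is not None,
        "matches_ANAATGNC": KOZAK_ANAATGNC.fullmatch(window) is not None,
    }
```

For example, a context of ACCATGGC satisfies A(A/C)(A/C)ATG but not AN1AATGN2C, because the −1 position is cytosine rather than adenine.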
Secondary Structures in mRNA
Complementary base pairing in mRNA often gives rise to secondary structures. As used herein, secondary structures in mRNA include stem-loops (hairpins). Complementary base pairing in mRNA forms the stem portion of a hairpin, while unpaired bases can form loops in the mRNA. Additional mRNA secondary structures include pseudoknots (see e.g., Staple et al., PLoS Biol. 3(6):e213, 2005). Algorithms known in the art may be used to predict mRNA secondary structure (see e.g., Matthews et al., Cold Spring Harb Perspect Biol. 2(12):a003665, 2010).
Free energy minimization can also be used to predict RNA secondary structure. For example, the stability of resulting helices (regions with base pairing) and loop regions often promotes the formation of stem-loops in RNA. Parameters that affect the stability of double helix formation include the length of the double helix, the number of mismatches, the length of unpaired regions, the number of unpaired regions, the type of bases in the paired region and base stacking interactions. For example, guanine and cytosine can form three hydrogen bonds, while adenine and uracil form two hydrogen bonds. Thus, guanine-cytosine pairings are more stable than adenine-uracil pairings. Loop formation may be limited by steric hindrance, while base-stacking interactions stabilize loops. As an example, tetraloops (loops of four nucleotides) often cap RNA hairpins and common tetraloop sequences include UNCG (N=A, C, G, or U).
In some embodiments, the secondary structure is any structure as predicted by likelihood of pairing and/or low free energy. In some embodiments, the secondary structure is a hairpin loop. In some embodiments, the secondary structure is a duplex, a single-stranded region, a hairpin, a bulge, or an internal loop.
Secondary structures may interfere with translation (e.g., block translation initiation and prevent translation elongation). For example, secondary structures in the 5′ UTR may disrupt binding of the ribosome and/or formation of the ribosomal initiation complex on mRNA. Secondary structures downstream of the translation start site may prevent translation elongation. In some embodiments, a secondary structure in mRNA decreases total expression of a heterologous protein of interest relative to an mRNA without the secondary structure (e.g., reduces total expression by at least 2-fold, at least 3-fold, at least 4-fold, at least 5-fold, at least 10-fold, at least 100-fold, or at least 1000-fold). In some embodiments, a secondary structure in mRNA, e.g., a hairpin loop or any other structure as predicted by likelihood of pairing and/or low free energy, decreases expression of a full length version of a heterologous protein of interest (e.g., reduces expression by at least 2-fold, at least 3-fold, at least 4-fold, at least 5-fold, at least 10-fold, at least 100-fold, or at least 1000-fold). In some embodiments, a secondary structure in mRNA increases expression (e.g., by at least 2-fold, at least 3-fold, at least 4-fold, at least 5-fold, at least 10-fold, at least 100-fold, or at least 1000-fold) of at least one truncated form of a heterologous protein of interest.
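As a rough, non-limiting illustration of how pairing propensity can be scored, the Python sketch below uses a Nussinov-style dynamic program that maximizes the total number of Watson-Crick hydrogen bonds (three per G-C pair, two per A-U pair) over nested base pairs. This is a deliberately simplified stand-in and is not a free energy model: it ignores base stacking, mismatches, loop penalties, wobble pairs, and pseudoknots. The minimum loop length of three unpaired bases and the function name are assumptions made for illustration. The recursion is intended for short sequences only.

```python
from functools import lru_cache

# Hydrogen bonds per Watson-Crick pair, as stated in the text.
H_BONDS = {("G", "C"): 3, ("C", "G"): 3, ("A", "U"): 2, ("U", "A"): 2}


def max_hydrogen_bonds(rna: str, min_loop: int = 3) -> int:
    """Maximum total hydrogen bonds over nested pairings of a short RNA.

    min_loop is the minimum number of unpaired bases enclosed by any pair
    (an assumed steric constraint on hairpin loops).
    """
    rna = rna.upper().replace("T", "U")

    @lru_cache(maxsize=None)
    def best(i: int, j: int) -> int:
        # Intervals too short to hold a pair plus its loop score zero.
        if j - i <= min_loop:
            return 0
        # Case 1: base j is left unpaired.
        score = best(i, j - 1)
        # Case 2: base j pairs with some base k, splitting the interval.
        for k in range(i, j - min_loop):
            bonds = H_BONDS.get((rna[k], rna[j]), 0)
            if bonds:
                score = max(score, best(i, k - 1) + bonds + best(k + 1, j - 1))
        return score

    return best(0, len(rna) - 1) if rna else 0
```

Under this simplified score, a G-C rich stem scores higher than an A-U stem of the same length, which is consistent with the greater stability of guanine-cytosine pairings noted above. Dedicated structure-prediction software would be used for actual free energy estimates.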
Codon optimization, using one or more synonymous mutations that do not alter the amino acid sequence, may be used to mitigate the formation of secondary structures in mRNA encoding a heterologous protein of interest. In some embodiments, codon optimization reduces the number of complementary base pairs in the mRNA. In some embodiments, codon optimization of an mRNA encoding a heterologous protein of interest increases expression of the heterologous protein by at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90% or at least 100% compared to a control mRNA sequence that encodes the heterologous protein but is not codon optimized.
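As a rough illustration of how synonymous mutations can reduce the number of complementary base pairs, the sketch below greedily swaps codons for synonymous alternatives whenever doing so lowers a crude self-complementarity count. The scoring function `stem_score`, the six-base stem window, the three-base minimum loop, and the greedy search are all assumptions chosen for brevity; the sketch ignores host codon usage and is not the optimization procedure used in the Examples.

```python
# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence (length divisible by 3)."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def stem_score(dna: str, stem: int = 6, min_loop: int = 3) -> int:
    """Count window pairs that could form a perfect stem of length `stem`."""
    hits = 0
    last = len(dna) - stem + 1
    for i in range(last):
        target = dna[i:i + stem].translate(COMPLEMENT)[::-1]
        for j in range(i + stem + min_loop, last):
            if dna[j:j + stem] == target:
                hits += 1
    return hits


def reduce_hairpins(dna: str, stem: int = 6) -> str:
    """Greedy pass: keep a synonymous codon only if it lowers stem_score."""
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    for idx, original in enumerate(codons):
        best, best_score = original, stem_score("".join(codons), stem)
        amino_acid = CODON_TABLE[original]
        for alt, aa in CODON_TABLE.items():
            if aa != amino_acid or alt == original:
                continue
            trial = codons[:idx] + [alt] + codons[idx + 1:]
            trial_score = stem_score("".join(trial), stem)
            if trial_score < best_score:
                best, best_score = alt, trial_score
        codons[idx] = best
    return "".join(codons)


if __name__ == "__main__":
    # Toy coding sequence containing an inverted repeat (hypothetical input).
    toy = "GCTGCAGCA" + "AAAAAA" + "TGCTGCAGC"
    recoded = reduce_hairpins(toy)
    print(translate(toy) == translate(recoded))
    print(stem_score(toy), stem_score(recoded))
```

Because every substitution is drawn from codons encoding the same amino acid, the translated protein is unchanged by construction, and the score can only stay the same or decrease with each accepted swap.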
Heterologous protein production begins with the design of the expression construct carrying the gene of interest. Methods for introducing such constructs are known in the art. For example, a construct may be designed for homologous recombination at a particular chromosomal locus in a methylotrophic cell, e.g., yeast. Once transformed (e.g., via electroporation, heat shock, or lithium acetate), single or multi-copy strains are typically selected based on an antibiotic resistance gene (e.g., Zeocin (phleomycin D1)). Higher-copy strains are generally achieved by iterative selection on increasing concentrations of antibiotic. The plasmid is directed to a specific locus by the target sequence on each end of the linearized cassette (
Methylotrophic cells, e.g., yeast, can be cultured via common methods known in the art, such as in a shaker flask in an incubator at optimal growth temperatures (e.g., about 25° C.). Culture sizes can be scaled up to increase protein yield. First, the cells are grown to a suitable cell density such that sufficient biomass is present. Cultures can be grown in media containing glucose or glycerol as the carbon source to promote efficient production of biomass. For example, cultures can be inoculated in buffered glycerol-containing media (BMGY, 4% (v/v) glycerol, 10 g/L yeast extract, 20 g/L peptone, 13.4 g/L yeast nitrogen base, 0.1 M potassium phosphate buffer pH 6.5) for about 24 hours. The glycerol concentration may vary from about 1% to about 5% (e.g., about 1%, 2%, 3%, 4%, or 5%). When the culture achieves a desired cell density (e.g., OD600 0.2-1.0) after about 24 hours, the medium is switched to a medium containing a different carbon source (e.g., methanol), which activates expression of genes under control of an inducible promoter, such as OLE1, DAS2, and AOX1. In some embodiments, a constitutively active promoter such as GAPDH can be used. For example, the medium is switched to buffered methanol-containing media (BMMY, 1.5% (v/v) methanol, 10 g/L yeast extract, 20 g/L peptone, 13.4 g/L yeast nitrogen base, 0.1 M potassium phosphate buffer pH 6.5) and the culture is grown for about 24 hours. The methanol concentration may vary from about 0.01% to about 10% (e.g., 0.01%-0.1%, e.g., 0.01%, 0.02%, 0.03%, 0.04%, 0.05%, 0.06%, 0.07%, 0.08%, 0.09%, 0.1%, e.g., 0.1%-1%, e.g., 0.2%, 0.3%, 0.4%, 0.5%, 0.6%, 0.7%, 0.8%, 0.9%, 1.0%, e.g., 1%-10%, e.g., 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%). About 24 hours after induction with BMMY, the culture may be supplemented with an additional 1.5% (v/v) methanol as a carbon source. The methanol supplement concentration may vary from about 0.01% to about 10% (e.g., 0.01%-0.1%, e.g., 0.01%, 0.02%, 0.03%, 0.04%, 0.05%, 0.06%, 0.07%, 0.08%, 0.09%, 0.1%, e.g., 0.1%-1%, e.g., 0.2%, 0.3%, 0.4%, 0.5%, 0.6%, 0.7%, 0.8%, 0.9%, 1.0%, e.g., 1%-10%, e.g., 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%). The culture may be grown for about an additional 24 hours, after which the cells may be harvested. Other modes of fermentation are known, e.g., chemostat and perfusion. The heterologous protein is secreted by the cells and can be purified using known methods. Protein expression levels, purity, and identity can be assayed, e.g., with SDS-PAGE analysis, ELISA, and mass spectrometry.
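The cultivation timeline and the methanol supplement described above can be summarized in a short sketch. The schedule uses the approximate 24-hour intervals given in the text. The helper `methanol_volume_ml` is hypothetical and assumes that the percentage is expressed relative to the final volume, that neat methanol is added, and that volumes are additive; a given laboratory protocol may define % (v/v) differently.

```python
# Approximate timeline from the text (hours after inoculation, step).
SCHEDULE_H = [
    (0, "inoculate in BMGY (biomass accumulation)"),
    (24, "switch to BMMY (induction, 1.5% v/v methanol)"),
    (48, "supplement with 1.5% v/v methanol"),
    (72, "harvest"),
]


def methanol_volume_ml(culture_ml: float, percent_vv: float) -> float:
    """Volume of neat methanol to add so it makes up percent_vv of the total.

    Assumes additive volumes and that the culture holds no methanol yet.
    """
    if not 0 <= percent_vv < 100:
        raise ValueError("percent_vv must be in [0, 100)")
    fraction = percent_vv / 100.0
    return culture_ml * fraction / (1.0 - fraction)


if __name__ == "__main__":
    for hour, step in SCHEDULE_H:
        print(f"{hour:>3} h  {step}")
    # Example: a 3-mL deep-well culture supplemented to 1.5% (v/v).
    print(f"{methanol_volume_ml(3.0, 1.5) * 1000:.1f} uL methanol")
```

For small percentages the result is close to simply multiplying the culture volume by the target fraction, so either convention gives nearly the same pipetting volume at 1.5% (v/v).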
Gene expression profiles of K. phaffii were analyzed using RNA-Seq under either glycerol or glucose conditions first, and then methanol growth conditions (
Heterologous protein production began with the design of the integration cassette carrying the gene of interest. Once transformed with the purified, linearized plasmid, single or multi-copy strains were selected on Zeocin. Higher-copy strains were achieved by iterative selection on increasing concentrations of Zeocin. Promoter sequences were selected by taking the 5′ UTR intergenic region, up to 1000 bp. Each promoter was either used as both the promoter sequence and integration locus, or preceded by the AOX1 or GAPDH promoter sequence for integration in the AOX1 or GAPDH locus. Each promoter was used to express human growth hormone (hGH) fused to the 5′ MFα (α mating factor) signal sequence. Promoter-αhGH sequences were synthesized by GeneArt (Invitrogen) and cloned in either the pPICZA (AOX1 locus) or pGAPZA (GAPDH locus) vectors. Two additional vectors were created for the AOX1 and DAS2 promoters using the PIF1 gene sequence as the locus, which flanks the GAPDH locus, to evaluate the presence of promoter contamination by the GAPDH promoter on the AOX1 or DAS2 promoters.
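The promoter selection rule stated above (the 5′ intergenic region, capped at 1000 bp) can be sketched as a simple slicing operation. The function `upstream_promoter`, its 0-based coordinates, and the restriction to forward-strand genes are assumptions made for illustration; the actual annotation and extraction workflow is not described in this document.

```python
def upstream_promoter(
    chrom: str, gene_start: int, prev_gene_end: int = 0, max_len: int = 1000
) -> str:
    """Return the intergenic sequence upstream of a forward-strand gene.

    chrom         -- chromosome sequence (hypothetical input)
    gene_start    -- 0-based index of the first base of the start codon
    prev_gene_end -- 0-based index just past the upstream neighboring gene
    max_len       -- cap on promoter length (1000 bp in the text)
    """
    if not 0 <= prev_gene_end <= gene_start <= len(chrom):
        raise ValueError("coordinates out of order or out of range")
    left = max(prev_gene_end, gene_start - max_len)
    return chrom[left:gene_start]


if __name__ == "__main__":
    # Toy chromosome: 1500 bp of filler followed by a start codon.
    toy_chrom = "A" * 1500 + "ATGGCC"
    print(len(upstream_promoter(toy_chrom, 1500, prev_gene_end=200)))
    print(len(upstream_promoter(toy_chrom, 1500, prev_gene_end=900)))
```

When the neighboring gene lies more than 1000 bp away the cap applies, and otherwise the whole intergenic region is returned. A reverse-strand gene would need the mirrored slice and a reverse complement, which this sketch omits.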
Vectors were linearized in the integration locus sequence and transformed by electroporation into wild-type P. pastoris by Blue Sky Biosciences (Worcester, Mass.). Clonal stocks were screened by immunoblot, and the top 1 or 2 clones per construct were evaluated in triplicate in 3-mL deep-well cultivation plates. Supernatant hGH titers were quantified by ELISA (
The results indicated that the promoter, and not the locus, dominated the phenotype, as the same promoter at various loci all produced comparable hGH titers. Compared to the benchmark hGH production strain (AOX1 at native locus), both the DAS2 and OLE1 promoters showed comparable or improved titers. A qualitative immunoblot (
Native secretion signal sequences were identified by culturing K. phaffii cells and analyzing secreted proteins. Cultures were inoculated at 25° C. in buffered glycerol-containing media (BMGY, 4% (v/v) glycerol, 10 g/L yeast extract, 20 g/L peptone, 13.4 g/L yeast nitrogen base, 0.1 M potassium phosphate buffer pH 6.5) and grown for 24 hours during a biomass accumulation phase. Protein induction was achieved by switching the media to buffered methanol-containing media (BMMY, 1.5% (v/v) methanol, 10 g/L yeast extract, 20 g/L peptone, 13.4 g/L yeast nitrogen base, 0.1 M potassium phosphate buffer pH 6.5) and cultures were grown for 24 hours. Next, cultures were supplemented with 1.5% (v/v) methanol and grown for an additional 24 hours. 48 hours after induction, the cultures were harvested.
Proteins secreted during fermentation were analyzed by SDS-PAGE and LC-MS. These data were compared with quantification of mRNA transcripts (
This Example examined the effect of DAS2 and AOX1 promoters on expression of human growth hormone (hGH) and also characterized the effect of these promoters on expression of endogenous methanol utilization pathway (Mut) genes. In particular, hGH cassettes carrying the DAS2 or AOX1 promoter were integrated into various loci and tested in P. pastoris. The results demonstrate that altered Mut pathway expression may enhance hGH productivity.
hGH protein titer was measured at 24 hr post-induction as a function of cassette copy number for strains in which hGH transgene expression is driven by a DAS2 promoter (referred to as PDAS2 or DAS2 strains) and for strains in which hGH transgene expression is driven by the AOX1 promoter (referred to as PAOX1 or AOX1 strains) at various loci (
Upregulation of the DAS2 and AOX1 genes was surprisingly found to provide added benefits: with these promoters and loci, transgene expression exceeded what was expected from the level of transgene transcript observed in these strains via RNA-Seq.
As shown in
These results suggest that altered Mut pathway expression may further enhance hGH productivity.
This Example analyzed 5′ UTR sequences from various gene promoters from P. pastoris to determine a consensus Kozak sequence and compared the translation efficiency of each 5′ UTR in directing heterologous expression of hGH.
An HMM logo of Kozak sequences across all P. pastoris genes was generated by Skylign from aligned input sequences (
A preferential Kozak sequence of ANAATGNC was discovered. As shown in
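To make the idea of a consensus concrete, the sketch below derives a consensus string from aligned sequences by simple per-column majority, writing N where no base reaches a frequency threshold. This is a simplification and not the HMM logo method used in this Example. The function `consensus`, the 0.5 threshold, and the four input sequences are all hypothetical, and the toy inputs were constructed only so that the output has the same form as the consensus reported above.

```python
from collections import Counter


def consensus(aligned: list[str], threshold: float = 0.5) -> str:
    """Per-column majority consensus; 'N' where no base reaches threshold."""
    if not aligned:
        raise ValueError("no sequences given")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    out = []
    for column in zip(*aligned):
        base, count = Counter(column).most_common(1)[0]
        out.append(base if count / len(aligned) >= threshold else "N")
    return "".join(out)


if __name__ == "__main__":
    # Made-up toy windows around a start codon (not data from this document).
    toy_windows = ["ACAATGGC", "AGAATGTC", "ATAATGAC", "AAAATGCC"]
    print(consensus(toy_windows))
```

A profile-based logo conveys more than a consensus string because it also shows how strongly each position is conserved, which a single threshold hides.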
This Example analyzed whether use of codon optimization to mitigate mRNA hairpin formation for VP8* would affect expression of full length VP8* and N-terminally truncated VP8* variants.
The desired full length VP8* protein consists of residues 86 through 265, directly following the alpha mating factor (αMF) signal sequence (
As shown in
Thus, through the combination of promoter/locus selection (such as DAS2), an optimal Kozak sequence (ANA), and an mRNA sequence which lacks predicted, strong secondary structure, transgene cassette design can enable rapid and robust strain engineering for heterologous protein expression.
While the invention has been described in connection with specific embodiments thereof, it will be understood that it is capable of further modifications and this application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the invention that come within known or customary practice within the art to which the invention pertains and may be applied to the essential features hereinbefore set forth, and follows in the scope of the claims.
Other embodiments are within the claims.
This application claims the benefit of the filing date of U.S. Provisional Application No. 62/444,758, filed on Jan. 10, 2017, the content of which is herein incorporated by reference in its entirety.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2018/013220 | 1/10/2018 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
20200032279 A1 | Jan 2020 | US |
Number | Date | Country | |
---|---|---|---|
62444758 | Jan 2017 | US |