The present invention relates to gene regulatory elements, particularly promoters, for use in gene expression in prokaryotes, particularly methanogenic archaea.
Methanogenic archaea (also known as methanogens) are a large and diverse group of strict anaerobes capable of a specialized metabolism that produces large quantities of methane. They employ unusual enzymes and coenzymes to metabolize a limited number of simple substrates such as CO2, formate, methylated C1 compounds and acetate to meet their energy and carbon needs. These coenzymes function either as C1 or electron carriers and can perform reactions with extremely low redox potentials. Methanogens have large amounts of intracellular Fe, and their genomes encode large numbers of 4Fe-4S motifs. Metabolic models exist that examine the relationships between CO2 and H2 consumption rates, CH4 production rates as well as carbon flux to biomass.
According to a first aspect, provided herein is an isolated DNA molecule, comprising a sequence having at least about 80% identity to at least about 80 contiguous nucleotides of SEQ ID No. 1.
According to a second aspect, there is also provided an isolated DNA molecule comprising a sequence having at least about 80% identity to at least about 200 contiguous nucleotides of SEQ ID No. 2.
According to a third aspect, there is also provided an expression cassette comprising: a promoter comprising the DNA molecule according to the first aspect or the second aspect, and a gene encoding a polypeptide or a functional RNA sequence, wherein the gene is operably linked to the promoter.
According to a fourth aspect, there is provided a vector comprising the expression cassette according to the third aspect.
According to a fifth aspect, there is provided a prokaryotic host cell comprising the expression cassette according to the third aspect.
According to a sixth aspect, there is provided a cell culture comprising prokaryotic host cells according to the fifth aspect.
According to a seventh aspect, there is provided a method for inducing expression of an exogenous gene in a prokaryotic cell culture according to the sixth aspect, the method comprising lowering the concentration of inorganic phosphates in the culture from an initial concentration.
According to an eighth aspect, there is provided a method for transforming a prokaryotic cell, the method comprising: introducing the vector according to the fourth aspect into the prokaryotic cell; and selecting for a transformed prokaryotic cell.
According to a ninth aspect, there is provided a method for co-transforming a prokaryotic cell, the method comprising: introducing the expression cassette according to the third aspect and a nucleic acid sequence encoding a selectable marker into the prokaryotic cell; and selecting for the presence of the selectable marker in a transformed prokaryotic cell to provide a prokaryotic cell transformed with the expression cassette.
These and other features and attributes of the present disclosure and their advantageous applications and/or uses will be apparent from the detailed description which follows.
To assist those of ordinary skill in the relevant art in making and using the subject matter hereof, reference is made to the appended drawings, wherein:
Many challenges are involved in the synthetic conversion of CO2 into CH4 and the production of valuable products from CH4. These include catalyst stability and selectivity, the cost of catalysts and associated technology, and the need to avoid contamination.
Methanogenic archaea are a potentially useful microbial cell factory to produce proteins, biocatalysts and biochemicals through recombinant gene expression. However, the use of methanogenic archaea in a microbial cell factory has been hindered by a lack of molecular tools for auto-inducible gene expression. A regulatory system that uncouples growth from recombinant gene expression is required, especially when the product of interest or its precursors are toxic, or when the engineered pathway competes with endogenous pathways that are essential for cell growth.
Methanococcus maripaludis is a rapidly growing mesophilic methanogen that utilizes H2:CO2 or formate as its sole energy and carbon source for biomass and methane formation and is a promising candidate for a microbial cell factory. A tetracycline-inducible expression system developed for regulated gene expression in Methanosarcina cells has been trialled in M. maripaludis. However, it is not suitable for large-scale (e.g. industrial) cultures, due to the high cost of tetracycline. Of the two remaining regulatory systems known in M. maripaludis, one is nitrogen-dependent and the other is temperature-dependent. However, regulation in both systems is accompanied by other large changes in gene expression due to general responses to nitrogen limitation and temperature. Moreover, to switch gene expression on and off, the nitrogen-dependent promoter needs a laborious change in the nitrogen source between ammonia and alanine or dinitrogen, while the temperature-dependent promoter requires a dramatic change in temperature. Thus, these systems are not practical for large-scale (e.g. industrial) cultures.
The present inventors have now identified a region of the M. maripaludis genome containing a phosphate transporter gene promoter (Ppst) and have introduced the promoter into a shuttle vector, thereby providing a recombinant gene expression system. This system induces recombinant gene expression in response to limiting the phosphate levels in a cell culture (without the need for an external inducer such as tetracycline).
When the concentration of phosphate is high, cells grow and recombinant gene expression is low. When the concentration of phosphate drops to a limiting concentration during growth, recombinant gene expression can be upregulated at the same time as biomass production is limited. In this way, growth is decoupled from expression, allowing the production of proteins or metabolites inhibitory to growth.
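By way of non-limiting illustration only, the decoupling of growth from expression described above can be sketched with a toy batch-culture model. The sketch assumes Monod-type growth limited by phosphate and treats the promoter as a simple switch that turns on below a phosphate threshold. Every parameter value (`mu_max`, `k_p`, `yield_od_per_mM`, `induce_below_mM`) and every name is a hypothetical placeholder chosen for illustration; none is a measured value for M. maripaludis or for the Ppst promoter, and a real promoter response would be graded rather than binary.

```python
from typing import List, NamedTuple, Optional


class Sample(NamedTuple):
    time_h: float
    biomass_od: float
    phosphate_mM: float
    induced: bool


def simulate_batch(p0_mM: float,
                   hours: float = 60.0,
                   dt: float = 0.1,
                   mu_max: float = 0.3,           # 1/h, hypothetical
                   k_p: float = 0.05,             # mM, hypothetical
                   yield_od_per_mM: float = 2.0,  # OD per mM phosphate, hypothetical
                   induce_below_mM: float = 0.1,  # switch threshold, hypothetical
                   od0: float = 0.01) -> List[Sample]:
    """Euler integration of phosphate-limited Monod growth in a batch culture."""
    biomass, phosphate = od0, p0_mM
    samples = []
    for step in range(int(round(hours / dt)) + 1):
        # Record the state, flagging the promoter as "on" below the threshold.
        samples.append(Sample(step * dt, biomass, phosphate,
                              phosphate < induce_below_mM))
        mu = mu_max * phosphate / (k_p + phosphate)  # Monod growth rate
        growth = mu * biomass * dt
        biomass += growth
        phosphate = max(phosphate - growth / yield_od_per_mM, 0.0)
    return samples


def induction_time(samples: List[Sample]) -> Optional[float]:
    """Return the first time point at which the promoter is flagged as induced."""
    for sample in samples:
        if sample.induced:
            return sample.time_h
    return None
```

Under these assumptions, a culture started with more phosphate accumulates more biomass before the switch is reached, which is the sense in which the initial phosphate concentration sets the point of auto-induction.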
Apart from manipulating the concentrations of phosphate and other growth conditions, protein expression can be controlled by modifying the transcription and/or translation initiation rates, for example by altering the core promoter elements, 5′ untranslated region (UTR), ribosome binding sites (RBS), and/or start codon.
This provides a practical regulatory system for protein overexpression and is particularly suitable for large-scale (e.g. industrial) cultures and thus for large-scale protein and chemical production.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. In case of conflict, the present application, including the definitions, will prevail.
Although methods and materials similar or equivalent to those described herein can be used in practice or testing of the present invention, suitable methods and materials are described below. The materials, methods and examples are illustrative only and are not intended to be limiting. Other features and advantages of the invention will be apparent from the detailed description and from the claims.
To facilitate an understanding of the present invention, a number of terms and phrases are defined below.
As used in the present disclosure and claims, the singular forms “a,” “an,” and “the” include plural forms unless the context clearly dictates otherwise.
Wherever embodiments are described herein with the language “comprising,” otherwise analogous embodiments described in terms of “consisting of” and/or “consisting essentially of” are also provided.
The term “and/or” as used in a phrase such as “A and/or B” herein is intended to include “A and B”, “A or B”, “A”, and “B”.
The terms “cell” (e.g., host cell) and “cell culture” include the primary subject cells and any progeny thereof, without regard to the number of transfers. It should be understood that not all progeny are exactly identical to the parental cell (due to deliberate or inadvertent mutations or differences in environment). However, such altered progeny are included in these terms, so long as the progeny retain the same functionality as that of the originally transformed cell.
The term “gene” is used broadly to refer to any segment of nucleic acid molecule (typically DNA, but optionally RNA) that encodes a protein or that can be transcribed into a functional RNA. Genes may include sequences that are transcribed but are not part of a final, mature, and/or functional RNA transcript, and genes that encode proteins may further comprise sequences that are transcribed but not translated, for example, 5′ untranslated regions, 3′ untranslated regions, introns, etc. Further, genes may optionally comprise regulatory sequences required for their expression, and such sequences may be, e.g., sequences that are not transcribed or translated. Genes can be obtained from a variety of sources, including cloning from a source of interest or synthesizing from known or predicted sequence information, and may include sequences designed to have desired parameters.
The term “nucleic acid” or “nucleic acid molecule” refers to, e.g., DNA or RNA (e.g., mRNA). The nucleic acid molecules can be double-stranded or single-stranded; single stranded RNA or DNA can be the coding (sense) strand or the non-coding (antisense) strand.
The term “isolated”, as applied herein to a biomolecule such as a protein or nucleic acid, refers to a biomolecule removed from the context in which the biomolecule exists in nature. An isolated biomolecule can be, in some instances, partially or substantially purified. For example, an isolated nucleic acid molecule can be a nucleic acid sequence that has been excised from the chromosome, genome, or episome into which it is integrated in nature.
A “purified” nucleic acid molecule or nucleotide sequence, or protein or polypeptide sequence, is substantially free of cellular material and cellular components. The purified nucleic acid molecule or protein may be free of chemicals beyond buffer or solvent, for example. “Substantially free” is not intended to mean that other components beyond the novel nucleic acid molecules are undetectable.
“Exogenous nucleic acid molecule” or “exogenous gene” refers to a nucleic acid molecule or gene that has been introduced (“transformed”) into a cell. A transformed cell may be referred to as a recombinant cell, into which additional exogenous gene(s) may be introduced. A descendant of a cell transformed with a nucleic acid molecule is also referred to as “transformed” if it has inherited the exogenous nucleic acid molecule. The exogenous gene may be from a different species (and so “heterologous”), or from the same species (and so “homologous”), relative to the cell being transformed. An “endogenous” nucleic acid molecule, gene or protein is a native nucleic acid molecule, gene or protein as it occurs in, or is naturally produced by, the host.
The term “heterologous” when used in reference to a polynucleotide, a gene, a nucleic acid, a polypeptide, or an enzyme refers to a polynucleotide, gene, a nucleic acid, polypeptide, or an enzyme not derived from the host species. For example, “heterologous gene” or “heterologous nucleic acid sequence” as used herein, refers to a gene or nucleic acid sequence from a different species than the species of the host organism it is introduced into. When referring to a gene regulatory sequence or to an auxiliary nucleic acid sequence used for maintaining or manipulating a gene sequence (e.g. a 5′ untranslated region, 3′ untranslated region, poly A addition sequence, intron sequence, splice site, ribosome binding site, internal ribosome entry sequence, genome homology region, recombination site, etc.) or to a nucleic acid sequence encoding a protein domain or protein localization sequence, “heterologous” means that the regulatory or auxiliary sequence or sequence encoding a protein domain or localization sequence is from a different source than the gene with which the regulatory or auxiliary nucleic acid sequence or nucleic acid sequence encoding a protein domain or localization sequence is juxtaposed in a construct, genome, chromosome or episome. Thus, a promoter operably linked to a gene to which it is not operably linked in its natural state may be referred to herein as a “heterologous promoter,” even though the promoter may be derived from the same species as the gene to which it is linked. Similarly, when referring to a protein localization sequence or protein domain of an engineered protein, “heterologous” means that the localization sequence or protein domain is derived from a protein different from that into which it is incorporated by genetic engineering.
The term “recombinant” or “engineered” nucleic acid molecule as used herein, refers to a nucleic acid molecule that has been altered through human intervention. As non-limiting examples, a recombinant nucleic acid molecule: 1) has been synthesized or modified in vitro, for example, using chemical or enzymatic techniques (for example, by use of chemical nucleic acid synthesis, or by use of enzymes for the replication, polymerization, exonucleolytic digestion, endonucleolytic digestion, ligation, reverse transcription, transcription, base modification (including, e.g., methylation), or recombination (including homologous and site-specific recombination) of nucleic acid molecules); 2) includes cojoined nucleotide sequences that are not cojoined in nature; 3) has been engineered using molecular cloning techniques such that it lacks one or more nucleotides with respect to the naturally occurring nucleic acid molecule sequence; and/or 4) has been manipulated using molecular cloning techniques such that it has one or more sequence changes or rearrangements with respect to the naturally occurring nucleic acid sequence. As non-limiting examples, a cDNA is a recombinant DNA molecule, as is any nucleic acid molecule that has been generated by in vitro polymerase reaction(s), or to which linkers have been attached, or that has been integrated into a vector, such as a cloning vector or expression vector.
The term “recombinant protein” as used herein refers to a protein produced by genetic engineering. The terms “peptide,” “polypeptide” and “protein” are used interchangeably herein, although “peptide” may be used to refer to a polypeptide having no more than about 100 amino acids, or no more than about 60 amino acids.
When applied to organisms, the terms “transgenic” or “recombinant” or “engineered” or “genetically engineered” refer to organisms that have been manipulated by introduction of an exogenous or recombinant nucleic acid sequence into the organism. Non-limiting examples of such manipulations include gene knockouts, targeted mutations and gene replacement, promoter replacement, deletion, or insertion, as well as introduction of transgenes into the organism. For example, a transgenic microorganism can include an introduced exogenous regulatory sequence, for example a promoter sequence, operably linked to an endogenous gene of the transgenic microorganism. Recombinant or genetically engineered organisms can also be organisms into which constructs for gene “knock down” have been introduced. Such constructs include, but are not limited to, RNAi, microRNA, shRNA, antisense, and ribozyme constructs. Also included are organisms whose genomes have been altered by the activity of meganucleases or zinc finger nucleases. A heterologous or recombinant nucleic acid molecule can be integrated into a genetically engineered/recombinant organism's genome or, in other instances, not integrated into a recombinant/genetically engineered organism's genome. As used herein, “recombinant microorganism” or “recombinant host cell” includes progeny or derivatives of the recombinant microorganisms of the invention. Because certain modifications may occur in succeeding generations from either mutation or environmental influences, such progeny or derivatives may not, in fact, be identical to the parent cell, but are still included within the scope of the term as used herein.
The term “expression cassette” as used herein, refers to a nucleic acid construct that encodes a protein or functional RNA (e.g. a tRNA, a short hairpin RNA, one or more microRNAs, a ribosomal RNA, etc.) operably linked to expression control elements, such as a promoter, and optionally, any or a combination of other nucleic acid sequences that affect the transcription or translation of the gene, such as, but not limited to, a transcriptional terminator, a ribosome binding site, a splice site or splicing recognition sequence, an intron, an enhancer, a polyadenylation signal, an internal ribosome entry site, etc.
“Regulatory sequence”, “regulatory element”, or “regulatory element sequence” refers to a nucleotide sequence located upstream (5′), within, or downstream (3′) of a coding sequence or functional RNA-encoding sequence. Transcription of the coding sequence or functional RNA-encoding sequence and/or translation of an RNA molecule resulting from transcription of the coding sequence are typically affected by the presence or absence of the regulatory sequence. These regulatory element sequences may comprise promoters, cis-elements, enhancers, terminators, or introns. Regulatory elements may be isolated or identified from untranslated regions (UTRs) from a particular polynucleotide sequence. Any of the regulatory elements described herein may be present in a chimeric or hybrid regulatory expression element. Any of the regulatory elements described herein may be present in a recombinant construct of the present invention.
The terms “promoter”, “promoter region”, or “promoter sequence” refer to a nucleic acid sequence capable of binding RNA polymerase to initiate transcription of a gene in a 5′ to 3′ (“downstream”) direction. A gene is “under the control of” or “regulated by” a promoter when the binding of RNA polymerase to the promoter is the proximate cause of said gene's transcription. The promoter or promoter region typically provides a recognition site for RNA polymerase and other factors necessary for proper initiation of transcription. A promoter may be isolated from the 5′ untranslated region (5′ UTR) of a genomic copy of a gene. Alternatively, a promoter may be synthetically produced or designed by altering known DNA elements. Also considered are chimeric promoters that combine sequences of one promoter with sequences of another promoter. Promoters may be defined by their expression pattern based on, for example, metabolic, environmental, or developmental conditions. A promoter can be used as a regulatory element for modulating expression of an operably linked transcribable polynucleotide molecule, e.g., a coding sequence or functional RNA sequence. Promoters may contain, in addition to sequences recognized by RNA polymerase and, preferably, other transcription factors, regulatory sequence elements such as cis-elements or enhancer domains that affect the transcription of operably linked genes.
The term “inducible” promoter refers to a promoter having activity dependent on environmental and developmental conditions. The activity of an inducible promoter is dependent on the external environment, such as light and culture medium composition. In some examples, an inducible promoter is inactive in the presence of one or more nutrients (e.g., inorganic phosphates) but active when the one or more nutrients is/are absent. Thus, an inducible promoter is a promoter that is active in response to particular environmental conditions, such as the presence or absence of a nutrient or regulator, the presence or absence of light, etc. In contrast, a “constitutive” promoter is a promoter that is active under most environmental and developmental conditions.
The term “operably linked,” as used herein, denotes a configuration in which a control sequence or localization sequence is placed at an appropriate position relative to a sequence that encodes a polypeptide or functional RNA such that the control sequence directs or regulates the expression or localization of the mRNA encoding the polypeptide, the polypeptide, and/or the functional RNA. Thus, a promoter is in operable linkage with a nucleic acid sequence if it can mediate transcription of the nucleic acid sequence. When introduced into a host cell, an expression cassette that includes a control sequence can result in transcription of the gene to which it is operably linked and/or translation of an encoded RNA or polypeptide under appropriate conditions. Antisense or sense constructs that are not or cannot be translated are not excluded by this definition. In the case of both expression of transgenes and suppression of endogenous genes (e.g., by antisense, or sense suppression) one of ordinary skill will recognize that the inserted polynucleotide sequence need not be identical, but may be only substantially identical to a sequence of the gene from which it was derived. As explained herein, these substantially identical variants are specifically covered by reference to a specific nucleic acid sequence.
The term “selectable marker” or “selectable marker gene” as used herein includes any gene that confers a phenotype on a cell in which it is expressed to facilitate the selection of cells that are transfected or transformed with a nucleic acid construct of the invention. The term may also be used to refer to gene products that effectuate said phenotypes.
A “reporter gene” is a gene encoding a protein that is detectable or has an activity that produces a detectable product. A reporter gene can encode a visual marker or enzyme that produces a detectable signal. Non-limiting examples include: a β-glucuronidase gene, a β-galactosidase gene, or a gene encoding a fluorescent protein, including but not limited to a blue, cyan, green, red, or yellow fluorescent protein, a photoconvertible, photoswitchable, or optical highlighter fluorescent protein, or any variant thereof, including, without limitation, codon-optimized, rapidly folding, monomeric, increased stability, and enhanced fluorescence variants.
The term “transformation” as used herein refers to the introduction of one or more exogenous nucleic acid sequences or polynucleotides into a host cell or organism by using one or more physical, chemical, or biological methods. Physical and chemical methods of transformation (i.e., “transfection”) include, by way of non-limiting example, polyethylene glycol (PEG) mediated transformation. Biological methods of transformation include, by way of non-limiting example, transfer of DNA using engineered viruses or microbes (e.g., E. coli).
The terms, “identical” or percent “identity”, in the context of two or more nucleic acids or polypeptide sequences, refer to two or more sequences or subsequences that are the same or have a specified percentage of amino acid residues or nucleotides that are the same, when compared and aligned for maximum correspondence over a comparison window. The degree of amino acid or nucleic acid sequence identity can be determined by various computer programs for aligning the sequences to be compared based on designated program parameters. For example, sequences can be aligned and compared using the local homology algorithm of Smith & Waterman (1981) Adv. Appl. Math. 2:482-89, the homology alignment algorithm of Needleman & Wunsch (1970) J. Mol. Biol. 48:443-53, or the search for similarity method of Pearson & Lipman (1988) Proc. Nat'l. Acad. Sci. USA 85:2444-48, and can be aligned and compared based on visual inspection or can use computer programs for the analysis (for example, GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, WI).
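By way of non-limiting illustration only, the following sketch shows how a percent identity can be derived from a global alignment in the manner of Needleman & Wunsch. The scoring values (`match=1`, `mismatch=-1`, `gap=-1`) and the choice to divide matching positions by the full alignment length are assumptions made for this sketch; the programs named above use their own parameters and may define the denominator differently.

```python
from typing import Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> Tuple[str, str]:
    """Return one optimal global alignment of a and b as two gapped strings."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover the alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue in both rows."""
    matches = sum(1 for x, y in zip(aligned_a, aligned_b)
                  if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)
```

For instance, `percent_identity(*needleman_wunsch("ACGT", "ACGA"))` evaluates to 75.0 under these assumed scores, since three of the four aligned columns match.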
The BLAST algorithm, described in Altschul et al. (1990) J. Mol. Biol. 215:403-10, is publicly available through software provided by the National Center for Biotechnology Information (available at http://www.ncbi.nlm.nih.gov). This algorithm identifies high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence, which either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold (Altschul et al., 1990). Initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are then extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated for nucleotide sequences using the parameters M (reward score for a pair of matching residues; always >0) and N (penalty score for mismatching residues; always <0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction is halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached. For determining the percent identity of a nucleic acid sequence, the default parameters of the BLAST programs can be used. For analysis of nucleic acid sequences, the BLASTN program defaults are word length (W), 11; expectation (E), 10; M=5; N=−4; and a comparison of both strands. The TBLASTN program (using a protein sequence to query nucleotide sequence databases) uses as defaults a word length (W) of 3, an expectation (E) of 10, and a BLOSUM 62 scoring matrix. See Henikoff & Henikoff (1992) Proc. Nat'l. Acad. Sci. USA 89:10915-19.
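By way of non-limiting illustration only, the seeding and extension steps described above can be sketched as follows for nucleotide sequences. The sketch is a simplification rather than the BLAST implementation: it seeds on exact word matches only (no neighborhood words or threshold T), performs ungapped extension on one strand, and uses a drop-off value `x_drop=20` that is an assumption of this sketch, since no value of X is stated above. The function names are hypothetical; the defaults W=11, M=5 and N=−4 follow the BLASTN defaults recited above.

```python
from typing import Iterator, Tuple


def find_seeds(query: str, subject: str, w: int = 11) -> Iterator[Tuple[int, int]]:
    """Yield (query_pos, subject_pos) for every exact shared word of length w."""
    index = {}
    for s in range(len(subject) - w + 1):
        index.setdefault(subject[s:s + w], []).append(s)
    for q in range(len(query) - w + 1):
        for s in index.get(query[q:q + w], ()):
            yield q, s


def _extend(query, subject, q, s, step, start_score, reward, penalty, x_drop):
    """Extend in one direction; return (score gained, residues kept)."""
    running = best = start_score
    length = best_len = 0
    while 0 <= q < len(query) and 0 <= s < len(subject):  # stop at sequence end
        running += reward if query[q] == subject[s] else penalty
        length += 1
        if running > best:
            best, best_len = running, length
        # Halt on a drop of X below the maximum, or a non-positive total.
        if best - running >= x_drop or running <= 0:
            break
        q += step
        s += step
    return best - start_score, best_len


def extend_seed(query: str, subject: str, q: int, s: int, w: int = 11,
                reward: int = 5, penalty: int = -4,
                x_drop: int = 20) -> Tuple[int, int, int, int]:
    """Return (query_start, subject_start, length, score) of the extended hit."""
    seed_score = w * reward
    right_gain, right_len = _extend(query, subject, q + w, s + w, 1,
                                    seed_score, reward, penalty, x_drop)
    left_gain, left_len = _extend(query, subject, q - 1, s - 1, -1,
                                  seed_score + right_gain, reward, penalty, x_drop)
    return (q - left_len, s - left_len, left_len + w + right_len,
            seed_score + left_gain + right_gain)
```

Each extension keeps only the residues up to the highest-scoring position, so trailing mismatches encountered before the drop-off condition triggers are not included in the reported segment.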
In addition to calculating percent sequence identity, the BLAST algorithm also performs a statistical analysis of the similarity between two sequences (see, e.g., Karlin & Altschul (1993) Proc. Nat'l. Acad. Sci. USA 90:5873-87). The smallest sum probability (P(N)), provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance. For example, a nucleic acid is considered similar to a reference sequence if the smallest sum probability in a comparison of the test nucleic acid to the reference nucleic acid is less than about 0.1, preferably less than about 0.01, and more preferably less than about 0.001.
Nucleotide sequences were identified and isolated, which find use as promoter sequences (e.g., inducible or auto-inducible promoter sequences) in the expression of genes (e.g., genes encoding enzymes associated with methanogenesis or reverse methanogenesis) in prokaryotic microorganisms (e.g., methanogenic archaea). The methods by which these sequences were identified are described more fully in the Examples set forth herein.
Thus, an isolated DNA molecule is provided, comprising a sequence having at least about 80% identity to at least about 80 contiguous nucleotides of SEQ ID No. 1.
For example, the isolated DNA molecule can comprise a sequence having at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity, or having about 100% identity, to at least about 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91 or 92, or to about 93, contiguous nucleotides of SEQ ID No. 1. Each combination of a listed percent identity with a listed number of contiguous nucleotides is contemplated.
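By way of non-limiting illustration only, a criterion of the form “at least about 80% identity to at least about 80 contiguous nucleotides” of a reference sequence can be screened with a simple sliding-window comparison. The sketch below makes several assumptions: it compares windows without gaps, it tests only windows of exactly the minimum length (a sufficient but not exhaustive check), and it treats the thresholds as exact rather than “about”. The function names are hypothetical, and any reference string passed in is supplied by the user; the sequences of the SEQ ID Nos. are not reproduced in this sketch. An alignment program as described herein would ordinarily be used for the actual determination.

```python
def best_window_identity(candidate: str, reference: str, window: int) -> float:
    """Highest ungapped percent identity between any `window`-nucleotide stretch
    of `reference` and any same-length stretch of `candidate`."""
    if window <= 0 or window > len(reference) or window > len(candidate):
        return 0.0
    best = 0
    for r in range(len(reference) - window + 1):
        ref_win = reference[r:r + window]
        for c in range(len(candidate) - window + 1):
            matches = sum(x == y for x, y in zip(ref_win, candidate[c:c + window]))
            best = max(best, matches)
    return 100.0 * best / window


def meets_identity(candidate: str, reference: str,
                   min_identity: float = 80.0, min_contiguous: int = 80) -> bool:
    """True if some window of `min_contiguous` nucleotides reaches `min_identity`."""
    return best_window_identity(candidate, reference, min_contiguous) >= min_identity
```

The same helper can be called with a different `min_contiguous` value (for example, 200) when screening against a longer reference sequence.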
Also provided herein is an isolated DNA molecule comprising a sequence having at least about 80% identity to at least about 200 contiguous nucleotides of SEQ ID No. 2.
For example, the isolated DNA molecule can comprise a sequence that can have at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity, or that can have about 100% identity, to at least about 200, 205, 210, 215, 220, 225, 230, 235 or 240, or to about 243, contiguous nucleotides of SEQ ID No. 2.
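By way of non-limiting illustration, the percent identity of a candidate sequence to a stretch of contiguous nucleotides of a reference sequence can be estimated computationally. The following Python sketch assumes an ungapped, position-by-position comparison, which is a simplification (alignment tools that permit gaps may report different values). The function names and the two short sequences are hypothetical placeholders; they are not SEQ ID No. 1 or SEQ ID No. 2.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of positions at which two equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    if not a:
        raise ValueError("sequences must be non-empty")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def best_window_identity(candidate: str, reference: str, window: int) -> float:
    """Best ungapped identity between any `window`-long stretch of the
    reference and any `window`-long stretch of the candidate."""
    if window < 1 or window > len(reference) or window > len(candidate):
        return 0.0
    best = 0.0
    for i in range(len(reference) - window + 1):
        ref_win = reference[i:i + window]
        for j in range(len(candidate) - window + 1):
            best = max(best, percent_identity(ref_win, candidate[j:j + window]))
    return best


# Hypothetical placeholder sequences (30 nt each), differing at one position.
reference = "ATGCATTTATATAGGCATCGATCGTTAACG"
candidate = "ATGCATTTATTTAGGCATCGATCGTTAACG"

# Compare the result with a chosen threshold, e.g. 80.0 percent.
print(best_window_identity(candidate, reference, window=20) >= 80.0)
```

In use, the window would be set to the stated number of contiguous nucleotides (for example 80 for SEQ ID No. 1 or 200 for SEQ ID No. 2) and the returned value compared with the stated percent identity.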
The sequences provided herein (e.g., a sequence having a percent identity to a number of contiguous nucleotides of SEQ ID No. 1 or a sequence having a percent identity to a number of contiguous nucleotides of SEQ ID No. 2) can comprise a region, for example a region rich in A nucleotides and T nucleotides (“an AT-rich region”) having at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity, or having about 100% identity, to at least about 15, 16, 17, 18, 19, 20, 21, 22, 23 or 24, or to about 25, contiguous nucleotides of SEQ ID No. 3. For example, the region may be a region having about 80% identity to at least about 20 contiguous nucleotides of SEQ ID No. 3.
The sequences provided herein (e.g., a sequence having a percent identity to a number of contiguous nucleotides of SEQ ID No. 1 or a sequence having a percent identity to a number of contiguous nucleotides of SEQ ID No. 2) can comprise a region, for example an untranslated region, having at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity, or having about 100% identity, to at least about 15, 16, 17, 18, 19, 20, or 21, or to about 22, contiguous nucleotides of SEQ ID No. 4. For example, the region may be a region having about 80% identity to at least about 18 contiguous nucleotides of SEQ ID No. 4. Optionally, both a region having a percent identity to a number of contiguous nucleotides of SEQ ID No. 3 and a region having a percent identity to a number of contiguous nucleotides of SEQ ID No. 4 can be present in a sequence provided herein, e.g., in a sequence having a percent identity to a number of contiguous nucleotides of SEQ ID No. 1, or in a sequence having a percent identity to a number of contiguous nucleotides of SEQ ID No. 2.
The sequences provided herein (e.g., a sequence having a percent identity to a number of contiguous nucleotides of SEQ ID No. 1 or a sequence having a percent identity to a number of contiguous nucleotides of SEQ ID No. 2) can comprise a region, for example an untranslated region, having at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity, or having about 100% identity, to at least about 15, 16, 17, 18, 19, or 20, or to about 21, contiguous nucleotides of SEQ ID No. 5. For example, the region may be a region having about 80% identity to at least about 17 contiguous nucleotides of SEQ ID No. 5. Optionally, both a region having a percent identity to a number of contiguous nucleotides of SEQ ID No. 3 and a region having a percent identity to a number of contiguous nucleotides of SEQ ID No. 5 can be present in a sequence provided herein, e.g., in a sequence having a percent identity to a number of contiguous nucleotides of SEQ ID No. 1, or in a sequence having a percent identity to a number of contiguous nucleotides of SEQ ID No. 2.
The sequences provided herein (e.g., a sequence having a percent identity to a number of contiguous nucleotides of SEQ ID No. 1 or a sequence having a percent identity to a number of contiguous nucleotides of SEQ ID No. 2) can comprise a region, for example an untranslated region, having at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity, or having about 100% identity, to at least about 15, 16, 17, 18, or 19, or to about 20, contiguous nucleotides of SEQ ID No. 6. For example, the region may be a region having about 80% identity to at least about 16 contiguous nucleotides of SEQ ID No. 6. Optionally, both a region having a percent identity to a number of contiguous nucleotides of SEQ ID No. 3 and a region having a percent identity to a number of contiguous nucleotides of SEQ ID No. 6 can be present in a sequence provided herein, e.g., in a sequence having a percent identity to a number of contiguous nucleotides of SEQ ID No. 1, or in a sequence having a percent identity to a number of contiguous nucleotides of SEQ ID No. 2.
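By way of non-limiting illustration, a region rich in A and T nucleotides, such as the AT-rich region referred to above, could be located in a candidate sequence with a sliding-window scan. The following Python sketch is an assumption-laden example only: the present disclosure does not define AT-richness numerically, so the threshold of 0.8 is hypothetical, and the default window of 25 is chosen merely to mirror the approximate length of SEQ ID No. 3.

```python
def at_fraction(seq: str) -> float:
    """Fraction of bases in `seq` that are A or T."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    seq = seq.upper()
    return sum(base in "AT" for base in seq) / len(seq)


def at_rich_windows(seq: str, window: int = 25, threshold: float = 0.8):
    """Return (start, fraction) for each window whose A/T fraction
    is at least `threshold` (hypothetical cut-off)."""
    hits = []
    for start in range(len(seq) - window + 1):
        frac = at_fraction(seq[start:start + window])
        if frac >= threshold:
            hits.append((start, frac))
    return hits


# Hypothetical test sequence: a 10 nt A/T stretch flanked by G and C runs.
example = "GGGG" + "AT" * 5 + "CCCC"
print(at_rich_windows(example, window=10, threshold=1.0))
```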
The sequences provided herein may optionally have a total length of no more than 100, 200, 300, 400, 500, 600, 700, 800, 900 or 1000 base pairs (bp). The sequences provided herein may optionally have a total length of no more than 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100 kilobases (kb). The sequences provided herein may optionally have a total length of no more than 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 megabases (Mb).
An isolated DNA molecule provided herein can find use, for example, as a sequence that when operably linked to a nucleic acid sequence can affect expression of the nucleic acid sequence, which can comprise, for example, a sequence encoding a polypeptide or functional RNA. For example, the isolated DNA molecule may mediate transcription of the operably-linked nucleic acid sequence (or a portion thereof) as a promoter. Thus, isolated DNA molecules provided herein (e.g., a sequence having a percent identity to a number of contiguous nucleotides of SEQ ID No. 1 or a sequence having a percent identity to a number of contiguous nucleotides of SEQ ID No. 2) can have promoter activity. Thus, provided herein are promoters that can comprise or consist of one or more (e.g., 2, 3, 4 or 5) isolated DNA molecules provided herein.
Methods for assessing the functionality of nucleotide sequences for promoter activity, as well as for enhancing or decreasing the activity of proximal promoters, are well-known in the art. For example, promoter function can be validated by confirming the ability of the putative promoter (or promoter variant or fragment) to drive expression of a selectable marker gene conferring resistance to an antibiotic or antibiotics (e.g., puromycin, neomycin, 8-azahypoxanthine, 6-azauracil) to which the putative promoter (or promoter fragment or variant) is operably linked, by detecting and, optionally, analyzing, resistant colonies after plating of cells transformed with the promoter construct on selective media.
Additionally or alternatively, promoter activity may be assessed by measuring the levels of RNA transcripts produced from a promoter construct, for example, using reverse transcription-polymerase chain reaction (RT-PCR), by detection of the expressed protein, or by in vivo assays that rely on an activity of the protein encoded by the transcribed sequence. By way of non-limiting example, promoter activity can be assessed by in vivo assays using a fluorescent protein gene to determine the functionality of any of the sequences disclosed herein, including sequences of reduced size or having one or more nucleotide changes with respect to any of the sequences disclosed herein, such as SEQ ID Nos. 1 or 2.
Thus, provided herein is a promoter comprising a sequence provided herein, such as a sequence having a percent identity to a number of contiguous nucleotides of SEQ ID No. 1, as described hereinabove. For example, provided herein is a promoter comprising a sequence having at least about 80% identity to at least about 80 contiguous nucleotides of SEQ ID No. 1. Also provided herein is a promoter comprising a sequence having a percent identity to a number of contiguous nucleotides of SEQ ID No. 2, as described hereinabove. For example, provided herein is a promoter comprising a sequence having at least about 80% identity to at least about 200 contiguous nucleotides of SEQ ID No. 2.
A promoter provided herein may optionally comprise a sequence including a region having a percent identity to a number of contiguous nucleotides of SEQ ID No. 3, as described hereinabove, for example about 80% identity to at least about 20 contiguous nucleotides of SEQ ID No. 3. A promoter provided herein may optionally comprise a sequence including a region having a percent identity to a number of contiguous nucleotides of SEQ ID No. 4, SEQ ID No. 5, or SEQ ID No. 6, as described hereinabove. For example, the region may be: a region having about 80% identity to at least about 18 contiguous nucleotides of SEQ ID No. 4; a region having about 80% identity to at least about 17 contiguous nucleotides of SEQ ID No. 5; or a region having about 80% identity to at least about 16 contiguous nucleotides of SEQ ID No. 6. A promoter provided herein may optionally comprise a sequence including both: a region having a percent identity to a number of contiguous nucleotides of SEQ ID No. 4, SEQ ID No. 5, or SEQ ID No. 6; and a region having a percent identity to a number of contiguous nucleotides of SEQ ID No. 3.
A promoter provided herein can optionally be an inducible promoter (e.g., an inducible phosphate responsive promoter, such as an inducible inorganic phosphate responsive promoter). For example, the promoter can be active in culture conditions in which one or more nutrients (e.g., inorganic phosphates) are deficient, but not in culture conditions in which the one or more nutrients are sufficient for proliferation and/or growth of the culture. For example, a promoter provided herein can optionally direct expression of an operably linked nucleic acid sequence under conditions in which a host cell that includes the promoter construct is limited in inorganic phosphate availability (inorganic phosphate depletion/deficiency) but not under conditions in which a host cell that includes the promoter construct has unlimited inorganic phosphate availability (inorganic phosphate replete conditions). A promoter provided herein can optionally be an autoinducible promoter (e.g., an autoinducible phosphate responsive promoter, such as an autoinducible inorganic phosphate responsive promoter).
Without wishing to be bound by theory, promoters allow RNA polymerase to attach to DNA near a gene in order for transcription to take place. Promoters contain specific DNA sequences that provide transcription factors with an initial binding site from which they can recruit RNA polymerase. These transcription factors have specific protein motifs that enable them to interact with specific corresponding nucleotide sequences to regulate gene expression.
A proximal promoter sequence may optionally be approximately 250 basepairs (bp) upstream of the translational start site of the open reading frame of the gene and may contain, in addition to sequences for binding RNA polymerase, specific transcription factor binding sites. Some promoters also include a distal sequence upstream of the gene that may contain additional regulatory elements, often with a weaker influence than the proximal promoter. Further, promoters or portions of promoters provided herein may optionally be combined in series to achieve a stronger level of expression or a more complex pattern of regulation.
Thus, a promoter provided herein can optionally comprise more than one, such as 2, 3, 4 or 5, of the sequences provided herein. For example, a promoter provided herein can comprise more than one sequence (e.g., 2, 3, 4 or 5 sequences) having a percent identity to a number of contiguous nucleotides of SEQ ID No. 1, such as about 80% identity to at least about 80 contiguous nucleotides of SEQ ID No. 1. A promoter provided herein can optionally comprise more than one sequence (e.g., 2, 3, 4 or 5 sequences) having a percent identity to a number of contiguous nucleotides of SEQ ID No. 2, such as about 80% identity to at least about 200 contiguous nucleotides of SEQ ID No. 2. A promoter provided herein can optionally comprise one or more sequences (e.g., 1, 2 or 3 sequences) having a percent identity to a number of contiguous nucleotides of SEQ ID No. 1 (such as about 80% identity to at least about 80 contiguous nucleotides of SEQ ID No. 1) in addition to one or more sequences (e.g., 1, 2 or 3 sequences) having a percent identity to a number of contiguous nucleotides of SEQ ID No. 2 (such as about 80% identity to at least about 200 contiguous nucleotides of SEQ ID No. 2).
A promoter provided herein can optionally comprise further elements in addition to one or more sequences provided herein. Additional regulatory element sequences contemplated herein for use with the promoters provided herein include cis regulatory elements (e.g., further promoters, enhancers, silencers, operators) and trans regulatory elements. Regulatory elements may be isolated or identified from untranslated regions (UTRs) from a particular polynucleotide sequence. Any of the regulatory elements described herein may optionally be present in a chimeric or hybrid regulatory expression element.
A promoter or promoter region can optionally include variants of the promoters provided herein, derived by deleting sequences, duplicating sequences, or adding sequences from other promoters or as designed, for example, by bioinformatics, or by subjecting the promoter to random or site-directed mutagenesis, etc.
In various optional examples, a promoter provided herein can mediate transcription of an operably linked nucleic acid sequence in a prokaryotic cell (e.g., an archaeal cell, such as a methanogenic archaeal cell, such as a Methanococcus maripaludis cell). In some instances, a promoter provided herein can mediate transcription of an operably linked nucleic acid sequence in a prokaryotic cell, during culturing of the cell under conditions of inorganic phosphate depletion, but not during culturing of the cell under inorganic phosphate replete conditions. For example, a promoter provided herein can mediate transcription of an operably linked nucleic acid sequence in a methanogenic archaeal cell (e.g., a Methanococcus maripaludis cell) cultured under conditions of inorganic phosphate depletion, but not under inorganic phosphate replete conditions.
The activity or strength of a promoter may be measured in terms of the amount of RNA it produces, or the amount of protein accumulation in a cell or tissue, which can optionally be measured by an activity of the expressed protein, e.g., fluorescence, luminescence, acyltransferase activity, etc., relative to a promoter whose transcriptional activity has been previously assessed, relative to a promoterless construct, or relative to non-transformed cells. For example, the activity or strength of a promoter may be measured in terms of the amount of mRNA accumulated that corresponds to a nucleic acid sequence to which it is operably linked in a cell, relative to the total amount of mRNA or protein produced by the cell. The promoter preferably expresses an operably linked nucleic acid sequence at a level greater than 0.01%; preferably in a range of about 0.5% to about 20% (w/w) of the total cellular RNA. The activity can also be measured by quantifying fluorescence, luminescence, or absorbance of the cells or a product made by the cells or an extract thereof, depending on the activity of a reporter protein that may be expressed from the promoter. The activity or strength of a promoter may be expressed relative to a well-characterized promoter (for which transcriptional activity was previously assessed). For example, a less-characterized promoter may optionally be operably linked to a reporter sequence (e.g., a fluorescent protein) and introduced into a specific cell type. A well-characterized promoter is similarly prepared and introduced into the same cellular context. Transcriptional activity of the unknown promoter is determined by comparing the amount of reporter expression, relative to the well characterized promoter.
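By way of non-limiting illustration, the comparison of a less-characterized promoter with a well-characterized promoter using a reporter can be expressed as a simple ratio. The following Python sketch assumes that reporter signal (e.g., fluorescence) scales linearly with expression and that a promoterless construct supplies the background; the function name and the numerical readings are hypothetical and are not measured data.

```python
def relative_promoter_activity(test_signal: float,
                               reference_signal: float,
                               background_signal: float) -> float:
    """Background-subtracted reporter signal of the test promoter,
    expressed as a multiple of the reference promoter's signal."""
    reference_net = reference_signal - background_signal
    if reference_net <= 0:
        raise ValueError("reference signal must exceed the background")
    return (test_signal - background_signal) / reference_net


# Hypothetical readings in arbitrary fluorescence units.
ratio = relative_promoter_activity(test_signal=5200.0,
                                   reference_signal=2700.0,
                                   background_signal=200.0)
print(ratio)
```

A ratio above 1 would indicate, under these assumptions, that the test promoter drives more reporter expression than the reference promoter in the same cellular context.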
Promoter activity can optionally be assessed by measuring the amount, relative to total protein in cell-free extracts, of a recombinant protein made by the cells. The promoter (e.g., an inducible or autoinducible phosphate responsive promoter, such as an inducible or autoinducible inorganic phosphate responsive promoter) may express a recombinant protein (e.g., a recombinant enzyme, such as an enzyme associated with methanogenesis or reverse methanogenesis) at a level greater than 1%, e.g., in a range of about 1% to about 20% (w/w), about 1% to about 15%, or about 1% to about 10% (w/w) of total protein in cell-free extracts.
Expression cassettes are also provided herein. The expression cassettes comprise a promoter comprising an isolated DNA molecule as provided herein, and a gene encoding a polypeptide or a functional RNA sequence, wherein the gene is operably linked to the promoter. The gene may optionally be positioned downstream of the promoter sequence. Optionally, the expression cassette also comprises one or more additional regulatory elements as described herein. The basic techniques for operably linking two or more sequences of DNA together are familiar to the person of ordinary skill in the art.
The promoters of the invention can optionally be used with any exogenous or endogenous gene(s). An exogenous or endogenous gene contemplated herein may encode a protein or a polypeptide. For example, the gene can be an exogenous or endogenous gene that encodes an enzyme that catalyzes one or more reactions in methanogenesis or in reverse methanogenesis (e.g., methyl-coenzyme M reductase). Alternatively, the gene can encode a functional RNA. Any known or later-discovered exogenous or endogenous gene which encodes a desired product can be operably linked to a promoter sequence provided herein using known methods.
Non-limiting examples of known genes suitable for use with the promoters provided herein include genes encoding enzymes associated with methanogenesis or reverse methanogenesis, such as: formylmethanofuran dehydrogenase (Fmd); formylmethanofuran:H4MPT formyltransferase (Ftr); methenyl-H4MPT cyclohydrolase (Mch); F420-dependent methylene-H4MPT dehydrogenase (Mtd); H2-forming (F420-independent) methylene-H4MPT dehydrogenase (Hmd); methylene-H4MPT reductase (Mer); methyl-H4MPT:coenzyme M methyltransferase (Mtr); methyl-coenzyme M reductase (Mcr); heterodisulfide reductase (Hdr); and F420-reducing hydrogenase (Frh).
Further non-limiting examples of known genes suitable for use with the promoters provided herein include genes encoding reporter proteins (e.g., fluorescent proteins or enzymes that produce detectable products).
In further examples, an expression cassette can comprise a promoter provided herein operably linked to a gene encoding a functional RNA, optionally wherein the functional RNA is an antisense RNA, a small hairpin RNA, a microRNA, an siRNA, an snoRNA, a piRNA, or a ribozyme.
Also provided herein are vectors that comprise the expression cassettes provided herein. The vector can be a plasmid. The vector can be a shuttle vector, such as a pURB500 shuttle vector.
A vector provided herein may optionally further comprise one or more selectable markers and/or reporter genes. For example, in addition to the expression cassette provided herein, the vector may optionally further comprise one or more of: a selectable marker gene, a reporter gene, an origin of replication, and one or more sequences for promoting integration of the expression cassette into the host genome.
By way of example, a vector that includes an expression cassette may optionally include, as one or more selectable markers, one or more genes conferring resistance to an antibiotic or antibiotics (e.g., puromycin, neomycin, 8-azahypoxanthine, 6-azauracil) so that transformants can be selected by exposing the cells to the agent(s) and selecting those cells which survive the encounter. The selectable marker can optionally be operably linked to and/or under the control of a promoter. The promoter regulating expression of the selectable marker may be inducible (e.g. autoinducible) and can be, for example, any promoter provided herein, or another promoter. The selectable marker may optionally be placed under the control of the expression cassette promoter.
If a vector provided herein that includes an expression cassette lacks a selectable marker gene, transformants may be selected by routine methods familiar to those skilled in the art, such as, by way of a non-limiting example, extracting nucleic acid from the putative transformants and screening by PCR.
Alternatively or in addition, transformants may be screened by detecting expression of a reporter gene, such as but not limited to a gene encoding a fluorescent protein, such as any of the blue, cyan, green, red, yellow, photoconvertible, or photoswitchable fluorescent proteins or any of their variants. The reporter gene can be operably linked to and/or under the control of a promoter. The promoter regulating expression of the reporter gene may be inducible and can be, for example, any promoter provided herein, or another promoter. The reporter gene may optionally be placed under the control of the expression cassette promoter.
In addition to the promoters provided herein, one skilled in the art would know various promoters, introns, enhancers, promoter-proximal DNA elements, initiator elements, transit peptides, targeting signal sequences, 5′ and 3′ untranslated regions (UTRs), IRES, 2A sequences, and terminator sequences, as well as other molecules involved in the regulation of gene expression that are useful in the design of effective expression vectors. The expression vector may contain one or more enhancer elements. Enhancers are short regions of DNA that can bind trans-acting factors to enhance transcription levels. Although enhancers usually act in cis, an enhancer need not be particularly close to its target gene. Enhancers can sometimes be located in introns.
The vector can further include one or more additional genes or constructs for transfer into the host cell, which can optionally be operably linked to a promoter provided herein, or can optionally be operably linked to another promoter.
A prokaryotic host cell is also provided, transformed with an expression vector as provided herein. Thus, a prokaryotic host cell is provided comprising a vector as described herein. A prokaryotic host cell is provided comprising an isolated nucleic acid molecule as described herein. A prokaryotic host cell is provided comprising an expression cassette as described herein.
The prokaryotic host cell can be an archaeal cell. Archaea are single-celled microorganisms. Although their cellular structure may be similar to that of bacteria, they form a separate domain of life and are evolutionarily distinct from bacteria and eukaryotes. Methanogenic archaea are obligate anaerobes. Typically, methanogenic archaea are found in oxygen-depleted environments.
Archaeal transcription is different from that of bacteria. The archaeal basal transcription apparatus closely resembles that of eukaryotes. Archaeal RNA polymerase (RNAP) requires the activity of transcription initiation factors, including the TATA-box binding protein (TBP) and transcription factor B (TFB), to recognize promoter sequences, and these initiation factors are homologs of their eukaryotic counterparts. Transcription initiation occurs upon the sequence-specific binding of TBP to a TATA box upstream of the transcription start site (TSS) and of TFB to a factor B recognition element (BRE) upstream of the TATA box. RNAP is then recruited to the TSS, and the preinitiation complex is formed. By contrast, transcription regulation is bacterial-like, with over half of the identified archaeal transcription factors having at least one bacterial homolog.
The prokaryotic host cell provided herein can optionally be a methanogenic archaeal cell. Methanogenic archaea are also known as methanogens. Methanogenic archaea can produce methane (in a process known as methanogenesis) from CO2, using H2 as the reducing agent. Methanogens that use carbon dioxide as a source of carbon, and hydrogen as a reducing agent are known as hydrogenotrophic methanogens. Alternatives to CO2 include, but are not limited to: acetate; formate; methanol; and methylamines. Methanogens play an important role in the ecosystems of anaerobic environments, by removing the products of other anaerobes, including excess hydrogen. The thermal breakdown of water and water radiolysis are other possible sources of hydrogen. Methanogens may thrive in environments in which electron acceptors other than CO2 (for example, oxygen, Fe(III) ions, nitrate ions, and sulfate ions) are depleted.
The overall reduction of carbon dioxide to methane in the presence of hydrogen (H2/CO2 methanogenesis) can be expressed as:
CO2 + 4 H2 → CH4 + 2 H2O
In the earliest stage of H2/CO2 methanogenesis, CO2 binds to methanofuran (MF) and is reduced to formyl-MF with oxidation of reduced ferredoxin (Fdred2−) to oxidized ferredoxin (Fdox). The reaction is catalyzed by formyl-MF dehydrogenase (Fmd):
CO2 + Fdred2− + MF + 2 H+ → HCO-MF + Fdox + H2O
Subsequently, the formyl moiety from formyl-MF is transferred to the coenzyme tetrahydromethanopterin (H4MPT) thereby forming formyl-H4MPT. The reaction is catalyzed by formyl transferase (Ftr):
HCO-MF + H4MPT → HCO-H4MPT + MF
Formyl-H4MPT is then dehydrated to methenyl-H4MPT. This step is catalyzed by methenyl-H4MPT cyclohydrolase (Mch):
HCO-H4MPT + H+ → CH≡H4MPT+ + H2O
Methenyl-H4MPT is then reduced to methylene-H4MPT with oxidation of reduced coenzyme F420 (F420H2). The reaction is catalyzed by F420-dependent methylene-H4MPT dehydrogenase (Mtd):
CH≡H4MPT+ + F420H2 → CH2=H4MPT + F420 + H+
Alternatively, the formation of methylene-H4MPT may be catalyzed by H2-forming (F420-independent) methylene-H4MPT dehydrogenase (Hmd):
CH≡H4MPT+ + H2 → CH2=H4MPT + H+
Subsequently, methylene-H4MPT is reduced to methyl-H4MPT with oxidation of coenzyme F420, which reaction is catalyzed by methylene-H4MPT reductase (Mer):
CH2=H4MPT + F420H2 → CH3-H4MPT + F420
Following this, a methyl group is transferred from methyl-H4MPT to coenzyme M. The reaction is catalyzed by methyl-H4MPT:coenzyme M methyltransferase (Mtr):
CH3-H4MPT + HS-CoM → CH3-S-CoM + H4MPT
The final steps of H2/CO2 methanogenesis are assisted by the coenzymes N-7-mercaptoheptanoylthreonine phosphate (coenzyme B or HS-CoB) and coenzyme F430. H2 donates electrons to the mixed disulfide of HS-CoM and HS-CoB (CoM-S-S-CoB), regenerating coenzyme M and coenzyme B. The relevant enzymes are methyl-coenzyme M reductase (Mcr) and the heterodisulfide reductase (Hdr) complex:
CH3-S-CoM + HS-CoB → CH4 + CoM-S-S-CoB
CoM-S-S-CoB + Fdox + 2 H2 → HS-CoM + HS-CoB + Fdred2− + 2 H+
It will be appreciated that methanogenesis from different starting materials than H2/CO2 can occur via analogous processes.
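By way of non-limiting illustration, the atom balance of the overall H2/CO2 methanogenesis equation given above (one CO2 and four H2 yielding one CH4 and two H2O) can be verified with a short script. The following Python sketch assumes simple molecular formulas without parentheses or charges, so it is not suited to the coenzyme-bound intermediates; the function names are hypothetical.

```python
import re
from collections import Counter


def atoms(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'CO2' or 'H2O'."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_total(side) -> Counter:
    """Sum atoms over a list of (coefficient, formula) pairs."""
    total = Counter()
    for coefficient, formula in side:
        for element, n in atoms(formula).items():
            total[element] += coefficient * n
    return total


def is_balanced(reactants, products) -> bool:
    """True if both sides contain the same number of each atom."""
    return side_total(reactants) == side_total(products)


# Overall H2/CO2 methanogenesis: CO2 + 4 H2 -> CH4 + 2 H2O
reactants = [(1, "CO2"), (4, "H2")]
products = [(1, "CH4"), (2, "H2O")]
print(is_balanced(reactants, products))
```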
The methanogenic archaeal host cell can optionally be a cell of one of the following families: Methanomicrobiaceae; Methanospirillaceae; Methanocorpusculaceae; Methanoregulaceae; Methanocalculaceae; Methanosarcinaceae; Methanosaetaceae; Methermicoccaceae; Methanopyraceae; Methanocellaceae; and Methanomassiliicoccaceae.
The methanogenic archaeal host cell can optionally be a cell of one of the following genera: Methanobacterium; Methanobrevibacter; Methanococcus; Methanocorpusculum; Methanoculleus; Methanoflorens; Methanofollis; Methanogenium; Methanomicrobium; Methanopyrus; Methanoregula; Methanosaeta; Methanosarcina; Methanosphaera; Methanospirillum; Methanothermobacter; and Methanothrix.
The methanogenic archaeal host cell can optionally be a cell of: Methanobacterium bryantii; Methanobacterium formicum; Methanobrevibacter arboriphilicus; Methanobrevibacter gottschalkii; Methanobrevibacter ruminantium; Methanobrevibacter smithii; Methanocalculus chunghsingensis; Methanococcoides burtonii; Methanococcus aeolicus; Methanococcus voltae; Methanocaldococcus jannaschii; Methanococcus maripaludis; Methanococcus vannielii; Methanocorpusculum labreanum; Methanoculleus bourgensis (Methanogenium olentangyi & Methanogenium bourgense); Methanoculleus marisnigri; Methanoflorens stordalenmirensis; Methanofollis liminatans; Methanogenium cariaci; Methanogenium frigidum; Methanogenium organophilum; Methanogenium wolfei; Methanomicrobium mobile; Methanopyrus kandleri; Methanoregula boonei; Methanosaeta concilii; Methanosaeta thermophila; Methanosarcina acetivorans; Methanosarcina barkeri; Methanosarcina mazei; Methanosphaera stadtmanae; Methanospirillum hungatei; Methanothermobacter defluvii; Methanothermobacter thermautotrophicus; Methanothermobacter marburgensis; Methanothermobacter thermoflexus; Methanothermobacter wolfei; or Methanothrix soehngenii. The host cell can be a Methanococcus cell. The host cell can be a Methanococcus maripaludis cell.
Provided herein is a cell culture, comprising prokaryotic host cells as provided herein.
The cell culture may be an anaerobic cell culture. For example, the cell culture may be under an atmosphere comprising, or consisting of, a mixture of N2 and CO2. N2 and CO2 may be present in a ratio by volume of about 70-90% N2:about 10-30% CO2; about 75-90% N2:about 10-25% CO2; or about 75-85% N2:about 15-25% CO2. For example, N2 and CO2 may be present in a ratio by volume of about 80% N2:about 20% CO2. Where the cell culture is under an atmosphere comprising N2 and CO2, the pressure of the atmosphere may be about 50-200 kPa, 50-150 kPa, or 70-130 kPa, for example about 103 kPa.
The cell culture, including its atmosphere, may optionally contain a limited supply of electron acceptors other than CO2 (for example, oxygen, Fe(III) ions, nitrate ions, and sulfate ions). The cell culture, including its atmosphere, may be free or substantially free of electron acceptors other than CO2 (for example, oxygen, Fe(III) ions, nitrate ions, and sulfate ions). For example, the medium of the cell culture may be sparged with N2 and/or CO2 prior to inoculation, thereby removing all or substantially all O2.
The cell culture may optionally comprise a medium that contains a source of carbon for reduction, including, but not limited to: CO2; acetate; formate; methanol; and methylamines, which may be introduced to the medium by any appropriate means. The cell culture may optionally comprise a medium that contains a source of reductant, including, but not limited to, H2 or cysteine hydrochloride.
The cell culture may optionally comprise a medium comprising (e.g., having an atmosphere comprising) H2 and CO2 (in this case, the medium may conveniently be termed an ‘H2/CO2 medium’). Where the cell culture is supplied with H2 and CO2, these may be present in a ratio by volume of about 70-90% H2:about 10-30% CO2; about 75-90% H2:about 10-25% CO2; or about 75-85% H2:about 15-25% CO2. For example, H2 and CO2 may be present in a ratio by volume of about 80% H2:about 20% CO2. One or both of H2 and CO2 may optionally be supplied to the culture medium intermittently. For example, prior to inoculation, the atmosphere of the culture medium may be pressurized with H2 and/or CO2. Subsequent to inoculation, refills of H2 and/or CO2 can be provided at suitable time intervals, for example at 12 hour, 24 hour, or 36 hour time intervals. Where both CO2 and H2 are supplied intermittently, CO2 may be supplied at different time intervals to H2. One or both of H2 and CO2 may be supplied to the culture medium continuously. Thus, both CO2 and H2 may be supplied to the culture medium intermittently; or CO2 may be supplied intermittently while H2 is supplied continuously; or H2 may be supplied intermittently while CO2 is supplied continuously; or both CO2 and H2 may be supplied continuously. It will be appreciated that where a supply of CO2 and/or H2 is continuous, it need not be never-ending, and could be paused, for example during start-up or shut-down of the cell culture (e.g., for the purpose of maintenance of the apparatus comprising the cell culture). Where the cell culture is under an atmosphere comprising H2 and CO2, the pressure of the atmosphere may be about 250-300 kPa, 260-290 kPa, or 270-280 kPa, for example about 276 kPa.
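By way of non-limiting illustration, the partial pressure of each gas in a culture atmosphere can be estimated from the ratio by volume and the total pressure. The following Python sketch assumes ideal-gas behavior (Dalton's law) and assumes that the stated pressures are total pressures of the gas mixture; whether those figures are absolute or gauge pressures is not stated herein, and the function name is hypothetical.

```python
def partial_pressures(total_kpa: float, volume_fractions: dict) -> dict:
    """Partial pressure of each gas (kPa) from its volume fraction,
    assuming an ideal gas mixture."""
    if abs(sum(volume_fractions.values()) - 1.0) > 1e-9:
        raise ValueError("volume fractions must sum to 1")
    return {gas: total_kpa * fraction
            for gas, fraction in volume_fractions.items()}


# About 80% H2 : about 20% CO2 at about 276 kPa.
h2_co2 = partial_pressures(276.0, {"H2": 0.80, "CO2": 0.20})

# About 80% N2 : about 20% CO2 at about 103 kPa.
n2_co2 = partial_pressures(103.0, {"N2": 0.80, "CO2": 0.20})

for name, mix in (("H2/CO2", h2_co2), ("N2/CO2", n2_co2)):
    print(name, {gas: round(p, 1) for gas, p in mix.items()})
```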
The cell culture may optionally be conveniently located close to an industrial source of CO2, for example a power plant (e.g., a fossil fuel or a biomass power plant) or petroleum refinery. Additionally or alternatively, CO2 for provision to the cell culture may be captured from the atmosphere. CO2 capture may be carried out by any appropriate means known to one of ordinary skill in the art, including but not limited to membrane capture; oxyfuel combustion; absorption; multiphase absorption; adsorption; chemical looping combustion; calcium looping; and cryogenic capture.
The cell culture may optionally comprise a medium comprising formate (which may conveniently be termed a ‘formate medium’). For example, the medium may comprise sodium formate. The medium may comprise about 0.01-1 M, 0.01-0.8 M, 0.01-0.6 M, 0.1-0.6 M, or 0.2-0.6 M formate. For example, the medium may comprise about 0.4 M formate, such as sodium formate. In addition to formate, the medium may comprise a reductant, for example cysteine hydrochloride, coenzyme M, or dithiothreitol. The medium may comprise about 1-10 mM, 1-5 mM, or 2-5 mM of a reductant. For example, the medium may comprise about 3 mM of a reductant, such as cysteine hydrochloride. Where the medium comprises formate, it may be under an atmosphere comprising, or consisting of, a mixture of N2 and CO2, as described hereinabove.
The cell culture medium may optionally comprise a buffer, for example one or more of Tris/HCl (wherein Tris is 2-amino-2-hydroxy-methyl-propane-1,3-diol), glycine/NaOH, and glycylglycine/NaOH.
The cell culture medium may optionally comprise a sulfur source, such as a sulfide, for example sodium sulfide. The medium may optionally comprise 1-10 mM, 1-5 mM, or 2-5 mM sulfide. For example, the medium may comprise about 3 mM sulfide, such as sodium sulfide.
The cell culture medium may optionally comprise phosphates, for example inorganic phosphates. The concentration of phosphates in the medium may optionally be kept constant over time. Alternatively, the concentration of phosphates in the medium may vary over time. For example, the concentration of phosphates in the medium may be adjusted to an initial value prior to inoculation, and decline following inoculation as phosphate is taken up by the cells. The initial concentration of phosphate may be less than 500 μM, less than 400 μM, less than 300 μM, or less than 200 μM. The supply of phosphates may decline over time as phosphate is taken up by cells, but be replenished intermittently, for example at intervals of about 36 hours, 48 hours, or 72 hours.
The cell culture may be kept at a temperature of from about 0-100° C., 10-60° C., 20-50° C., 30-50° C., or 35-45° C., for example at about 37° C.
The cell culture may be a co-culture, comprising one or more further microbial cells in addition to the prokaryotic host cells provided herein. The additional microbial cells may be bacterial cells. The additional microbial cells may be eukaryotic cells. The additional microbial cells may be prokaryotic cells. Thus, the additional microbial cells may be of the same or a different domain to the prokaryotic host cells provided herein. The additional microbial cells may be archaeal cells, for example methanogenic archaeal cells. For example, both the host cells provided herein and the additional microbial cells may be methanogenic archaeal cells, in which case they may be cells of the same strain (e.g., Methanococcus maripaludis) or of different strains.
The cell culture may optionally be provided in a culture tube having a volume of at least about 1 mL, 2 mL, 5 mL, 10 mL, 25 mL, 50 mL or 100 mL and up to about 1 L, 500 mL, 200 mL, 100 mL, 50 mL or 30 mL. For example, the tube may have a volume of about 28 mL. Suitable tubes will be familiar to one of ordinary skill in the art. The tube may be acid-washed in 1% (v/v) hydrochloric acid (HCl) overnight before use. A rubber stopper and aluminum crimp seal may be used to seal the tube.
The cell culture may be of an industrial scale. For example, the cell culture may occupy a container having a volume of up to about 1 m3, 2 m3, 4 m3, 5 m3, 6 m3, 7 m3, 8 m3, 9 m3, 10 m3, 20 m3, 30 m3, 40 m3, 50 m3, 60 m3, 70 m3, 80 m3, 90 m3, 100 m3, 200 m3, 300 m3, 400 m3, 500 m3, 600 m3, 700 m3, 800 m3, 900 m3, or 1000 m3. For example, the cell culture may occupy a container having a volume of at least about 0.1 m3, 0.2 m3, 0.5 m3, 1 m3, 2 m3, 4 m3, 5 m3, 6 m3, 7 m3, 8 m3, 9 m3 or 10 m3.
Provided herein is a method for inducing expression of an exogenous gene in a prokaryotic cell culture provided herein. The method comprises lowering the concentration of phosphate, for example inorganic phosphate (e.g. K2HPO4) in the culture, from an initial phosphate concentration.
The initial phosphate concentration may optionally be selected depending on the environmental conditions under which the cells exist. For example, the initial phosphate concentration may be selected depending on the type of medium in which the cells are cultured. It will be appreciated that the initial phosphate concentration selected when the medium comprises formate (a “formate medium”) may be the same as, or may differ from, the initial phosphate concentration selected when the medium does not comprise formate but is under an atmosphere of H2 and CO2 (an “H2/CO2 medium”).
The initial phosphate concentration may optionally be less than about 1 mM, 750 μM, 500 μM, 400 μM, 300 μM, 290 μM, 280 μM, 270 μM, 260 μM, 250 μM, 240 μM, 230 μM, 220 μM, 200 μM, 190 μM, 180 μM, 170 μM, 160 μM, 150 μM, 140 μM, 130 μM, 120 μM, 110 μM, 100 μM, 90 μM, 80 μM, 70 μM, 60 μM, 50 μM, 40 μM, 30 μM, 20 μM, or 10 μM. The initial phosphate concentration may optionally be in a range of about 50-750 μM, 50-500 μM, 50-300 μM, 50-200 μM, or 80-150 μM.
By way of example, when the medium comprises formate, the initial phosphate concentration may be less than about 300 μM, 290 μM, 280 μM, 270 μM, 260 μM, 250 μM, 240 μM, 230 μM, 220 μM, 200 μM, 190 μM, 180 μM, 170 μM, 160 μM, 150 μM, 140 μM, 130 μM, 120 μM, 110 μM, 100 μM, 90 μM or 80 μM. By way of example, when the medium comprises (e.g., the atmosphere of the medium comprises) H2 and CO2, the initial phosphate concentration may be less than about 200 μM, 190 μM, 180 μM, 170 μM, 160 μM, 150 μM, 140 μM, 130 μM, 120 μM, 110 μM, 100 μM, 90 μM or 80 μM.
In some instances, a promoter provided herein can mediate transcription of an operably linked nucleic acid sequence in a prokaryotic cell, during culturing of the host cell under conditions of phosphate (e.g., inorganic phosphate) depletion, but not during culturing of the cell under phosphate replete conditions. For example, a promoter provided herein can mediate transcription of an operably linked nucleic acid sequence in a methanogenic archaeal cell (e.g., a Methanococcus maripaludis cell) cultured under conditions of inorganic phosphate depletion, but not under inorganic phosphate replete conditions.
In this way, when the concentration of phosphate (e.g., inorganic phosphate) is high, the host cells grow; at this stage, recombinant gene expression is low. When the concentration of phosphate drops to limiting concentration during growth, recombinant gene expression can be upregulated at the same time as biomass production is limited. Thus, growth is decoupled from expression, allowing the production of proteins or metabolites inhibitory to growth.
It will be appreciated that, apart from manipulating the concentrations of phosphate and other growth conditions, protein expression can optionally be controlled by modifying the transcription initiation rate. It will similarly be appreciated that protein expression can optionally be controlled by modifying the translation initiation rate. For example, protein expression can be controlled by altering one or more of: the core promoter elements, 5′ untranslated region (UTR), ribosome binding sites (RBS), and start codon.
Thus, provided herein is a practical regulatory system for protein overexpression. The present system is particularly suitable for large scale (e.g. industrial) cultures and thus for large-scale protein and chemical production.
The present invention also provides transformation methods in which a prokaryotic cell (in particular one of the cells listed above under the heading “cells”) is transformed with an expression vector provided herein. A method provided herein comprises introducing a vector provided herein into a prokaryotic cell and selecting for a transformed prokaryotic cell. A method is also provided herein for co-transforming a prokaryotic cell, comprising introducing the expression cassette provided herein and a nucleic acid sequence encoding a selectable marker into a prokaryotic cell, and selecting for the presence of the selectable marker in a transformed prokaryotic cell, thereby providing a prokaryotic cell transformed with the expression cassette.
The expression vector may be introduced by methods familiar to those skilled in the art. Relevant processes include, as non-limiting examples: natural transformation; polyethylene glycol (PEG) mediated transformation; E. coli conjugation; liposome-mediated transformation; PEG spheroplasting; electroporation; CaCl2 mediated transformation; and heat shock. Thus, by way of example, host cells provided herein (for example, methanogenic archaeal cells, such as Methanococcus cells, e.g., Methanococcus maripaludis cells) may be transformed by a method such as E. coli conjugation.
Selectable markers contemplated herein include any gene that confers a phenotype on a cell in which it is expressed to facilitate the selection of cells that are transfected or transformed with a nucleic acid construct of the invention. The term may also be used to refer to gene products that effectuate said phenotypes. Examples of selectable markers include genes conferring resistance to antibiotics or antimicrobials (e.g., puromycin, neomycin, 8-azahypoxanthine, 6-azauracil). Further examples of selectable markers include genes that may be used in auxotrophic strains (e.g., histidine auxotrophy) or to confer other metabolic effects.
The following examples are merely illustrative, and do not limit this disclosure in any way.
Methanococcus maripaludis strains (Table 1) were grown anaerobically at 37° C. in an atmosphere of 80% N2: 20% CO2 in minimal formate or H2/CO2 medium as described in Long, F. et al., A Flexible System for Cultivation of Methanococcus and Other Formate-Utilizing Methanogens, Archaea 2017, 7046026, and Sarmiento, F. et al., Genetic systems for hydrogenotrophic methanogens, Methods Enzymol 494, 43-73, the contents of which are incorporated herein by reference in their entirety. For low inorganic phosphate (LPi) treatment, the concentration of potassium phosphate dibasic (K2HPO4) in the medium was reduced from 800 μM to 40 μM or 80 μM, while the high Pi treatment (HPi) remained at 800 μM. 28 mL culture tubes or 160 mL culture bottles with a 28 mL tube side-arm were acid-washed in 1% (v/v) hydrochloric acid (HCl) overnight before use. Rubber stoppers and aluminum crimp seals were used to seal the tubes and bottles. The plasmids (Table 1) were maintained in the recombinant M. maripaludis strains by adding 2.5 μg/mL puromycin to the medium unless otherwise stated. The strains were pre-grown in 5 mL low Pi medium unless otherwise stated. After 12-24 hours, 4% inoculum was transferred into the low or high Pi medium. All cultures were prepared at least in triplicate. Growth was monitored via optical density at 600 nm (OD600) with a spectrophotometer.
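By way of illustration only, the volumes involved in preparing the low and high Pi media can be sketched as follows. The 100 mM K2HPO4 stock concentration and the 5 mL receiving culture volume are assumptions made for this example and are not taken from the procedure above; the function names are hypothetical.

```python
def stock_volume_ul(target_um, final_volume_ml, stock_mm):
    """Volume of stock (in µL) needed to reach target_um (µM) in final_volume_ml (mL).

    Uses C1 * V1 = C2 * V2. Units: µM * mL = nmol, and 1 mM = 1 nmol/µL,
    so nmol / (nmol/µL) gives µL directly.
    """
    if stock_mm <= 0:
        raise ValueError("stock concentration must be positive")
    return target_um * final_volume_ml / stock_mm


def inoculum_volume_ml(culture_volume_ml, percent):
    """Volume of pre-culture (mL) corresponding to a given % (v/v) inoculum."""
    return culture_volume_ml * percent / 100.0


STOCK_MM = 100.0   # assumed K2HPO4 stock concentration (hypothetical)
CULTURE_ML = 5.0   # assumed receiving culture volume (hypothetical)

for label, target in (("LPi", 40.0), ("LPi", 80.0), ("HPi", 800.0)):
    volume = stock_volume_ul(target, CULTURE_ML, STOCK_MM)
    print(f"{label} {target:.0f} µM: add {volume:.1f} µL of stock")

print(f"4% inoculum: {inoculum_volume_ml(CULTURE_ML, 4.0):.2f} mL")
```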
M. maripaludis mcr operon
The plasmids and PCR primers used are listed in Table 1 and Table 2, respectively. All PCR amplifications were done either by Phusion® High-Fidelity DNA Polymerase (NEB, M0530) or Q5® High-Fidelity DNA Polymerase (NEB, M0515). All ligations were done with T4-ligase (NEB, M0202S). Plasmid pMEV5mT was made by postfixing the Tsl terminator (Table 1) after the mCherry reporter gene in the pMEV4m plasmid, where expression of mCherry was driven by the constitutive promoter PhmvA. The terminator was introduced by amplifying pMEV4m with the AflII-containing primers 0 mT-F/R, followed by digestion and ligation at the AflII site. The addition of the terminator was expected to separate transcription of the gene inserts from that of the puromycin resistance (pac) cassette, which would minimize any pleiotropic effects, especially in the event of strong and constitutive gene expression.
To develop a phosphate-dependent regulatory expression system, the wild-type pst promoter (Ppst; including the 5′-UTR for MMP1095) was cloned into the pMEV5mT by replacing the constitutive PhmvA promoter therein. First, a PhmvA and RBS region-less backbone was made by amplifying the pMEV5mT with primers 5p-F346/R46 where the restriction sites 5′-NdeI and 3′-HindIII were included. Then, the Ppst was amplified from the M. maripaludis genomic DNA with primers 5i-F4/R346 where restriction sites 5′-HindIII and 3′-NdeI were introduced. Lastly, the amplified products were restricted at HindIII and NdeI before cloning into the backbone restricted at the same sites. The resulting plasmid, pMEV5mT-P243, carried the wild-type Ppst. Two plasmids were made to further truncate the wild-type Ppst. This truncation was done by amplifying the pMEV5mT-P243 plasmid with primers P1-F/R and P2-F/P1-R where an XbaI site was included at both the 5′- and 3′-ends. The amplified products were then restricted by XbaI and ligated, which resulted in plasmids pMEV5mT-P93 and pMEV5mT-P67 carrying the 93 bp_Ppst and 67 bp_Ppst promoters, respectively. Plasmids pMEV5mT-P85, pMEV5mT-P80, pMEV5mT-P73, pMEV5mT-P93-BRE1-3 and pMEV5mT-P93-UTR1-5 were all synthesized by GenScript. These plasmids were cloned into E. coli Top10. The plasmids pMEV5mT-P93-5S and pMEV5mT-P93-4S were constructed by deleting the 6th nucleotide, or the 5th and 6th nucleotides, from the start codon by amplifying the pMEV5mT-P93 with the primers SL1-F/R and SL2-F/SL1-R, respectively, using the Q5® Site-Directed Mutagenesis kit (E0554S). The plasmids pMEV5mT-P93-TTG and pMEV5mT-P93-GTG were constructed by amplifying the pMEV5mT-P93 with primers SC1-F/R and SC2-F/SC1-R, respectively, using the same kit. The primers were designed on the NEBaseChanger™ (http://nebasechanger.neb.com/) and are listed in Table 2. These plasmids were cloned into NEB® 5-alpha Competent E. coli using the NEBuilder® HiFi DNA Assembly Cloning Kit (NEB #E5520).
The plasmids were purified from E. coli and transformed into M. maripaludis S0001. Colonies were picked from formate medium plates containing puromycin (see Long, F. et al., A Flexible System for Cultivation of Methanococcus and Other Formate-Utilizing Methanogens, Archaea 2017, 7046026).
To put the mcrBDCGA under the phosphate-dependent expression system, the mCherry gene in pMEV5mT-P243 was replaced by the TAP-tagged mcrBDCGA. First, an mCherry-less backbone was amplified from pMEV5mT-P243 with the primers Vector-F/R while mcrBDC and mcrGA were amplified from M. maripaludis genome with the primers BDC-F/R and GA-F/R. All primers were designed using the NEBuilder assembly tool (https://nebuilder.neb.com/) with overlapping sequences on the 5′ end of both forward and reverse primers. The TAP tag (3×FLAG-Twin Strep Tag II) was first amplified from the plasmid AAVS1_Puro_PGK1_3×FLAG_Twin_Strep_EZH2 (Addgene, 79902) with the primers TAP-ATG-F/R which include ATG before the N-terminal 3×FLAG and the amino acids serine and glycine after the Twin Strep Tag II. This product was amplified again with primers TAP-F/R including the overlapping sequences required for Gibson assembly. All PCR amplification was done using Q5® High-Fidelity DNA Polymerase (NEB #M0491), followed by digestion of the template DNA with DpnI (NEB #R0176). The PCR fragments were then assembled using the NEBuilder® HiFi DNA Assembly Master Mix (NEB #E2621) before cloning into NEB® 5-alpha Competent E. coli using the NEBuilder® HiFi DNA Assembly Cloning Kit (NEB #E5520). Similarly, the plasmids pMEV5mT-P243-N-TAP-MMPX and pMEV5mT-P243-C-TAP-MmpX were constructed by cloning the M. maripaludis mmpX into pMEV5mT-P243 with an N- or C-terminal TAP tag instead of the mCherry gene. The M. maripaludis mmpX was amplified from M. maripaludis genome using the primers N-mmp10-F/R or C-mmp10-F/R. The TAP tag was amplified from the plasmid AAVS1_Puro_PGK1_3×FLAG_Twin_Strep (Addgene, 68375) with the primers N-TAP-F/R or C-TAP-F/R. The two PCR products (mmpX gene and TAP tag) were assembled into the mCherry-less backbone using the NEBuilder® HiFi DNA Assembly Cloning Kit (NEB #E5520), resulting in the expression plasmids pMEV5mT-P243-N-TAP-MMPX and pMEV5mT-P243-C-TAP-MMP10.
All resulting plasmids were purified from E. coli. Except for the mmpX plasmids, they were transformed into M. maripaludis S0001, and colonies were picked from formate medium plates containing puromycin. The two recombinant mmpX plasmids were transformed into the ΔmmpX deletion strain of M. maripaludis, and colonies were also isolated on formate plates with puromycin (see Lyu, Z. et al., Posttranslational Methylation of Arginine in Methyl Coenzyme M Reductase Has a Profound Impact on both Methanogenesis and Growth of Methanococcus maripaludis, J Bacteriol 202, 2020, which is incorporated herein by reference in its entirety). All the recombinant plasmids were verified by both PCR and sequencing.
mCherry Reporter Assay.
At each sampling point, 2 mL of culture was anaerobically collected in a micro-centrifuge tube. The cells were harvested by centrifugation at 17,000 g for 1 minute. The cell pellet was resuspended in 200 μL of 25 mM PIPES (Piperazine-1,4-bis (2-ethanesulfonic acid) dipotassium salt) buffer adjusted to pH 6.8 using KOH. The cells were lysed by freezing at −20° C. and thawing at room temperature once before exposing the cell extract to oxygen overnight at 30° C. to allow for the mCherry chromophore to mature. The cell debris was separated from the cell extract by centrifugation at 17,000 g for 1 minute before collecting the cell extract containing the mCherry protein in the supernatant. Then, 100 μL of the supernatant was transferred into a Nunc® 96-well polystyrene, black, flat-bottom plate. Fluorescence readings were obtained using the BioTek Synergy™ Mx plate reader. Using the Gen 5 software, the excitation and emission wavelengths were set to 575 nm and 610 nm respectively with a bandwidth of 13.5 nm and gain of 75. The shake time was set to 10 seconds. The optics position was at the top, and the read height was 8 mm. The resulting fluorescent unit (FU) was normalized to the optical density at 600 nm (OD600) using the formula: NFU=FU/OD600. The FU of wild type M. maripaludis S0001 without the expression vector was used to correct for background noise.
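The normalization above (NFU = FU/OD600) can be sketched as follows, by way of example only. The procedure does not state how the wild-type background was applied; subtracting it from the raw FU before normalization is an assumption of this sketch, and the function name and numerical values are hypothetical.

```python
def normalized_fluorescence(fu, od600, background_fu=0.0):
    """Return NFU = (FU - background) / OD600.

    Subtracting the background before dividing is an assumption of this
    sketch; with background_fu=0.0 this reduces to NFU = FU / OD600.
    """
    if od600 <= 0:
        raise ValueError("OD600 must be positive")
    return (fu - background_fu) / od600


# Hypothetical readings: reporter strain and wild-type background
nfu = normalized_fluorescence(fu=1200.0, od600=0.4, background_fu=200.0)
print(f"NFU = {nfu:.0f}")
```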
RNA Extraction and cDNA Synthesis.
RNA was extracted using the Invitrogen PureLink RNA mini kit and the Invitrogen PureLink Pro 96 total RNA Purification kit, with column digestion of DNA using the Invitrogen PureLink DNase according to the manufacturer's instructions (Invitrogen, USA). Following the RNA purification, an additional DNase treatment was applied to all pure samples using Invitrogen TURBO DNase according to the manufacturer's instructions, in order to completely remove any residual DNA. Total RNA was quantified using the Invitrogen Qubit RNA BR Assay kit. An Agilent 4150 TapeStation was used to confirm clear 16S and 23S rRNA peaks and evaluate RNA integrity before reverse transcription of the purified RNA was performed using the Applied Biosystems Power SYBR Green RNA-to-Ct 1-Step kit on the Applied Biosystems 7500 Fast Real-Time PCR system according to the manufacturer's protocol (Applied Biosystems, USA). Data generated was analyzed with the 7500 Software v2.0.6 (Applied Biosystems, USA).
qRT-PCR.
The primers were designed from the mCherry gene DNA sequence within the plasmid construct using ThermoFisher Primer Express software v3.0.1, with a melting temperature of 60° C., a primer length of 25 bp for the forward primer (mCherry-F) and 23 bp for the reverse primer (mCherry-R), a GC content of 48%, and an amplicon length of 125 bp with a 39% GC content. The mCherry-F/R primers were evaluated in silico for similarity to the M. maripaludis genome (RefSeq NC_0035552) with a BLASTn search, and evaluated for non-specific binding to the M. maripaludis genome with a qPCR assay using purified M. maripaludis genomic DNA (gDNA). qPCR standard curves were created with 10-fold serial dilutions between 10^9 and 10^5 copies per reaction. qPCR DNA standards were made from pure mCherry plasmids. The plasmid DNA was quantified with the Invitrogen Qubit 1× dsDNA HS Assay kit. Triplicate qPCR reactions were performed for each control, standard, and sample. All assays had efficiencies of 90-99% and R2 values above 0.99. Melt curves were analyzed for all reactions and showed no non-specific products. 40-cycle qPCR assays with no reverse transcriptase enzyme were run for all samples to check for amplification from contaminating DNA. No amplification was observed in any no-template control or no-RT enzyme control reaction. For qRT-PCR of mCherry, 0.5 ng of total RNA was used for each PCR reaction. All samples fit within the standard curve, and the amplification efficiency was 97% with an R2 of 0.99.
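For illustration, the relationship between a qPCR standard curve and the reported efficiency and R2 values can be sketched as below. The sketch fits Ct against log10(copies) by ordinary least squares and uses the conventional relation efficiency = 10^(-1/slope) - 1; the Ct values are invented for the example and are not data from this Example, and the function names are hypothetical.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept.

    Returns (slope, intercept, r_squared).
    """
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ct_values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


def amplification_efficiency(slope):
    """Conventional qPCR efficiency; a slope of about -3.32 corresponds to 100%."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_ct(ct, slope, intercept):
    """Interpolate the copy number of a sample from its Ct value."""
    return 10 ** ((ct - intercept) / slope)


# 10-fold serial dilutions from 10^5 to 10^9 copies; Ct values are made up.
standard_copies = [1e5, 1e6, 1e7, 1e8, 1e9]
standard_ct = [27.1, 23.6, 20.3, 16.9, 13.5]

slope, intercept, r_squared = fit_standard_curve(standard_copies, standard_ct)
efficiency = amplification_efficiency(slope)
print(f"slope={slope:.2f}, efficiency={efficiency:.1%}, R2={r_squared:.4f}")
```

A sample Ct can then be converted to a copy number with `copies_from_ct`, provided it falls within the range spanned by the standards.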
The Phosphate Colorimetric kit (Sigma-Aldrich MAK030-1KT) was modified to measure the Pi concentration in the medium. This modification was required because salts of carboxylic acids, such as sodium formate and glycylglycine present in the medium, interfered with the assay. The interference was eliminated by the addition of HCl (final concentration of 0.9 M) to protonate the anions of these carboxylic acids. For generating the standard curve, 100 μL of phosphate-free medium was acidified with 100 μL of 1.8 M HCl before adding sufficient 0.5 mM K2HPO4 to generate 0, 5, 10, 20, 40 μM K2HPO4 standards. Mixing was performed in a Cellstar® tissue culture 96-well flat-bottom plate (Greiner Bio-One). To each well, 30 μL of the phosphate reagent was added and mixed. The reaction was incubated for 30 minutes at room temperature before measuring the absorbance at 650 nm using the BioTek Synergy™ Mx plate reader. For the assay of spent culture medium, after centrifugation to remove the cells, 100 μL of supernatant was acidified with 100 μL of 1.8 M HCl, 30 μL of the phosphate reagent was added, and the reaction was performed as described above. For the high Pi cultures, the spent medium was diluted 10-fold with phosphate-free formate medium before acidification.
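By way of example only, the conversion of absorbance readings to Pi concentrations using the 0-40 μM standards can be sketched as below. The sketch assumes a linear response over the standard range; the absorbance values are invented for illustration, and the function names are hypothetical. It also checks the stated final HCl concentration for a 1:1 mixture of sample and 1.8 M HCl.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def phosphate_um(absorbance, slope, intercept, dilution_factor=1.0):
    """Convert A650 to Pi (µM) in the original sample, correcting for dilution."""
    return (absorbance - intercept) / slope * dilution_factor


def final_concentration(stock_conc, stock_volume, other_volume):
    """Concentration after mixing a stock with a volume lacking the solute."""
    return stock_conc * stock_volume / (stock_volume + other_volume)


# 100 µL sample + 100 µL of 1.8 M HCl gives 0.9 M HCl
hcl_molar = final_concentration(1.8, 100.0, 100.0)

# Standards (µM K2HPO4) with made-up A650 readings
standards_um = [0.0, 5.0, 10.0, 20.0, 40.0]
standards_a650 = [0.05, 0.15, 0.25, 0.45, 0.85]
slope, intercept = fit_line(standards_um, standards_a650)

low_pi = phosphate_um(0.45, slope, intercept)                         # undiluted
high_pi = phosphate_um(0.85, slope, intercept, dilution_factor=10.0)  # diluted 10-fold
print(f"HCl: {hcl_molar:.1f} M, low Pi: {low_pi:.0f} µM, high Pi: {high_pi:.0f} µM")
```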
Cells expressing TAP-tagged recombinant MCR or MMPX proteins were grown in minimal formate medium with replete (800 μM) or limiting phosphate (40 or 80 μM). Cells were harvested by centrifugation at 17,000 g for 4 min and resuspended in 50 mM Tris-HCl [pH 7.6]. Cells were lysed by sonication for 10 cycles of 5 sec ON/OFF with the output set at 4 and the duty cycle set at 40%. The cell debris was removed by centrifugation at 17,000 g for 4 min, and the supernatant was collected. Protein concentrations were determined by a Pierce BCA protein assay kit (Thermo Fisher Scientific). Proteins were separated on precast 4-20% SDS-PAGE gels (Bio-Rad) and then transferred onto methanol-activated polyvinylidene difluoride (PVDF) membranes. Nonspecific binding was blocked with 5% milk in phosphate-buffered saline and 0.1% Tween 20 (PBST) for 1.5 h at room temperature. The PVDF membranes were then incubated with primary antibodies against the FLAG tag (1:2,000 dilution; catalog no. MA1-91878, Thermo Fisher Scientific) for 1.5 h at room temperature, washed three times for 15 min with PBST, and then incubated with horseradish peroxidase (HRP)-conjugated goat anti-mouse secondary antibodies (1:20,000 dilution; catalog no. 31430; Thermo Fisher Scientific) for 1 h. After additional washing with PBST, PVDF membranes were developed using the Western HRP substrate (ECL, catalog no. 32132; Thermo Fisher Scientific). The relative intensity of each immunoreactive band was estimated using ImageJ.
The genome of M. maripaludis contains a gene cluster consisting of five ORFs (MMP1095-MMP1099) predicted to encode the phosphate specific transport (Pst) system that is highly upregulated during phosphate limitation. The 243 bp intergenic sequence between MMP1094 and MMP1095 was presumed to contain the promoter or Ppst (
To test the functionality of the Ppst, the entire 243 bp intergenic region was cloned upstream of a mCherry reporter system optimized for M. maripaludis in the expression plasmid, pMEV5mT, resulting in the pMEV5-Ppst-mCherry plasmid (
The predicted regulatory elements of Ppst were examined by a series of truncation mutants. To determine the role of the cis-1 BRE and TATA box, the 243 bp Ppst was truncated to create 93 bp Ppst. Expression from 93 bp Ppst was very similar to that from 243 bp Ppst, indicating that the cis-1 BRE and TATA site did not play a significant role in expression from the promoter (
In formate-based medium, growth is limited by formate availability at an OD600 of about 0.8. Thus, it was possible that expression might have been limited by the availability of an energy source late in growth. To test this possibility, expression was examined in H2:CO2-based medium where the electron donor was in great excess (
To optimize regulation from Ppst, the best growth conditions to achieve good cell yields and mCherry expression were investigated. Four initial phosphate concentrations (40, 80, 150, and 800 μM) were selected based on phosphate limitation studies in M. maripaludis34, 38, 39. Increasing phosphate concentrations resulted in higher cell yields, with the highest optical density at 800 μM Pi (OD600, ~0.8) and the lowest at 40 μM Pi (OD600, ~0.3) (
To further examine the role of phosphate in the inoculum, these experiments were repeated with inocula grown at high or low phosphate (
Puromycin is required for selection and maintenance of the expression plasmid; however, its detoxification by puromycin transacetylase requires acetyl-CoA, a central metabolite whose limitation might impose metabolic restraints40. That said, the addition of up to 2.5 μg ml−1 puromycin had no significant impact on either cell growth or mCherry expression (
BRE and TATA box sequences are major determinants of transcription efficiency in archaea. Stable binding of the TATA binding protein (TBP) to the highly conserved TATA element requires the concomitant binding of the transcription factor B protein (TFB) to the BRE. Consequently, weak binding of the TBP to the TATA box can be compensated by a strong TFB-BRE interaction or the presence of activators to recruit TBP. On the other hand, a weak TFB-BRE can only be compensated by an activator. There are no reported specific BRE consensus sequences for methanogens. However, in archaea generally, positions −3 and −6 of a 6-7 bp BRE relative to the TATA box have the strongest specificity determinants. To test the effect of BRE on the strength of the pst promoter, the native BRE sequence was replaced by sequences from the promoter of a highly transcribed gene, slp (BRE-slp), and two weakly transcribed genes, MMP1338 and MMP0466 (BRE-1338 and BRE-0466, respectively) (
Next, the impact of translation initiation on mCherry expression was studied. First, the effect of 5′ UTR on protein production from Ppst was investigated. The 5′ UTR can influence transcript stability, translation efficiency and protect mRNA from ribonuclease attack. Using the RNAfold program, the secondary structure of the native Ppst 22 bp 5′ UTR (UTR-pst) was predicted. The minimum free energy of folding (ΔG) was −16.3 kJ/mol, and the RBS was within a predicted stem-loop structure (
Similarly, the stability of the hairpins in UTR-slp (−3.3 kJ/mol), UTR-hmmA (−3.8 kJ/mol) and UTR-mcr (−1.7 kJ/mol) were all weaker than that in UTR-pst (
The effect of alternative start codons on translational efficiency of mCherry was also examined. The predominant start codon in most archaeal genomes is ATG, constituting 70-90% of predicted start codons; however, TTG and GTG are also common start codons. In fact, TTG is the start codon of MMP1095, the first ORF of the native pst operon. The ATG start codon of mCherry was replaced with TTG and GTG. During phosphate limitation, there was no significant difference in mCherry NFU between ATG and the alternative start codons TTG and GTG (
To test the Ppst gene expression system, expression vectors were constructed for recombinant mmpX and mcrBDCGA in M. maripaludis.
The mmpX gene encodes the methanogen marker protein 10, an S-adenosyl methionine-dependent arginine methylase responsible for the methyl-Arg posttranslational modifications of methylcoenzyme M reductase (which can conveniently be termed Mcr or MCR, and which is referred to in this Example as MCR). Under control of the PhmvA promoter, only low levels of the recombinant protein were expressed in M. maripaludis, presumably because of its toxicity. By contrast, both C-terminal and N-terminal FLAG-tagged MmpXs were expressed at high levels using the 243 bp Ppst following growth at low phosphate concentrations (
The gene operon mcrBDCGA encodes methylcoenzyme M reductase (MCR). MCR is the enzyme catalyzing the terminal step in methanogenesis. This complex enzyme contains the nickel tetrapyrrole coenzyme F430 as a prosthetic group and possesses multiple unusual posttranslational modifications.
The 243-bp Ppst was introduced into the pMEV5 shuttle vector for heterologous expression of the Methanococcus aeolicus methylcoenzyme M reductase (MCRaeo) enzyme complex. The recombinant MCR was provided with a tandem affinity purification (TAP) tag comprising 3×FLAG and Twin-Strep tags at the N-terminus of the gamma subunit.
Following growth in formate medium under low and high phosphate concentrations, the relative expression levels of the recombinant MCR were determined by Western blots of cell free extracts of M. maripaludis. Following growth in 40 and 80 μM phosphate, MCR was expressed 2.6 and 3.3-fold higher, respectively, than following growth in 800 μM phosphate (
The 93-bp Ppst was introduced into the pMEV5 shuttle vector for heterologous expression of the Methanococcus aeolicus methylcoenzyme M reductase (MCRaeo) enzyme complex (
In summary, an inorganic phosphate-regulated gene expression system for heterologous protein production in M. maripaludis was studied using the mCherry reporter assay, and a 3- to 4-fold increase in protein production was achieved upon phosphate limitation. This fold-change was further increased to 6-fold, with comparable overall mCherry expression, when translation initiation was optimized via changes to the 5′ UTR of Ppst. Other (alternative) changes to the 5′-UTR increased overall mCherry expression by 2.5-fold while maintaining the same 3- to 4-fold change between limiting and replete Pi concentrations, suggesting that translation initiation plays an important and tunable role in protein expression. The optimal growth conditions for increased gene expression were found to be 80-150 μM initial phosphate concentration. Recombinant MCR expression of ca. 5% of total protein was observed in limiting phosphate concentrations. This expression system, using an inducible promoter, also overcomes the growth burden associated with the toxicity of expressing MmpX with a constitutive promoter, since it largely increases expression late in growth, after biomass has been accumulated.
Set forth below are sequences disclosed herein together with their SEQ ID Nos where applicable.
M. aeolicus_
M. voltae_A3
M. vannielii_
M. maripaludis_
M. maripaludis_
M. maripaludis_
M. maripaludis_
M. maripaludis_
Many alterations, modifications, and variations will be apparent to those skilled in the art in light of the foregoing description without departing from the spirit or scope of the present disclosure.
When numerical lower limits and numerical upper limits are listed herein, ranges from any lower limit to any upper limit are contemplated.
This application claims the priority benefit of U.S. Ser. No. 63/202,230, filed Jun. 2, 2021, which is hereby incorporated by reference in its entirety. This application contains references to nucleic acid sequences which have been submitted concurrently herewith as a sequence listing file. The aforementioned sequence listing is hereby incorporated by reference in its entirety.