The present invention relates to the fields of life sciences, genetics and regulation of gene expression. Specifically, the invention relates to a non-viral transcription activation domain for a eukaryotic host. Also, the present invention relates to a polypeptide or artificial transcription factor comprising the transcription activation domain of the present invention. And furthermore, the present invention relates to a polynucleotide, an expression cassette, expression system, and/or a eukaryotic host. Still, the present invention relates to a method for producing a desired protein product in the eukaryotic host of the present invention or to a method of preparing a non-viral transcription activation domain of the present invention or a polynucleotide encoding said non-viral transcription activation domain. And still further, the present invention relates to use of the transcription activation domain, polypeptide, artificial transcription factor, polynucleotide, expression cassette, expression system or eukaryotic host of the present invention for metabolic engineering and/or production of a desired protein product.
Controlled and predictable gene expression is very difficult to achieve even in well-established hosts, especially stable expression across diverse cultivation conditions or growth stages. In addition, for many potentially interesting industrial hosts, the spectrum of tools and/or methods for expressing heterologous genes or controlling the expression of endogenous genes is very limited or even absent. In many instances, this prevents otherwise very promising hosts from being used in industrial applications.
Transcription factors greatly influence the regulation of gene expression. Usually there are at least two domains in transcription factors. DNA binding domains (DBD) bind promoters of target genes and activation domains (AD) participate in activating the transcription by interacting with the transcriptional machinery. There have been numerous previous attempts to introduce new transcription factors or domains thereof suitable for robust control of gene expression in engineered biological systems.
In artificial gene expression systems, the use of virus-derived transcription activation domains (e.g. VP16 or VP64) is currently the most common solution for high-level expression. Also, other components derived from viruses or cancer-development-associated proteins may be used in efficient artificial expression systems. For example, Chavez A et al. describe an improved transcriptional regulator obtained through the rational design of a tripartite activator, VP64-p65-Rta (VPR) fused to nuclease-null Cas9, where the VP64 is derived from human herpes simplex virus, p65 is a human protein associated with multiple types of cancer, and Rta is derived from the Epstein-Barr virus (Chavez A et al. 2015, Nat Methods, 12(4), 326-328).
The use of native plant (Arabidopsis thaliana) transcription factors for regulating gene expression in yeast has been described by Naseri G et al. (2017, ACS Synthetic Biology, 6, 1742-1756). In that study, Naseri G et al. focused on fusion transcription factors containing additional activation domains in their structure, especially the virus-based VP16 activation domain, the GAL4 activation domain of Saccharomyces cerevisiae origin, and the EDLL motif of Arabidopsis thaliana origin.
While expression systems containing viral or cancer-associated transcription activation domains are highly efficient, their use in many biotechnological applications, especially in food or medicine production, might be problematic due to current regulations and customer and/or patient acceptance. There is, therefore, a need for novel transcription activation domains that could replace the currently used virus-based domains. Furthermore, the new types of activation domains must provide a sufficient level of functionality in gene expression systems to achieve similar or better production of the target compounds. In addition, efficient non-viral transcription activation domains, and the gene expression systems based on them, should provide robust and stable gene expression in several different species and genera of production organisms.
The objects of the invention, namely novel efficient transcription activation domains and tools and methods related thereto, can be used for functionally replacing the virus-based activation domains without compromising the performance of the gene expression system. The expression systems, containing the novel transcription activation domains, will provide robust and stable expression, a broad spectrum of expression levels, and can be used in several different species and genera. This is achieved by utilizing transcription activation domains derived from transcription factors found in plant species, e.g. in the species of edible plants.
Indeed, it has now been surprisingly found that modifying plant-derived transcription activation domains yielded novel activation domains that are highly active and, importantly, retain high activity in diverse eukaryotic organisms. These novel activation domains are non-viral transcription activation domains originating from plants that can be used for regulation of gene expression in an expression system, e.g. in eukaryotes.
With the present invention, defects of the prior art, including but not limited to the use of viral DNA elements in an artificial expression system, can be overcome. The prior art lacks efficient activation domains and expression systems that are functional across diverse species and, at the same time, acceptable or suitable for all technological fields and industries that utilize gene expression, including food and pharma.
Surprisingly, the inventors were able to develop specific activation domains originating from plant species. Said activation domains can be used as such in diverse expression systems, e.g. to replace the activation domains currently used. Indeed, the activation domains of the present invention can be incorporated into expression systems based on artificial (synthetic) transcription factors without compromising the function of said systems; all previously demonstrated benefits of the artificial transcription systems can be retained or improved.
The present invention enables e.g. efficient transfer of engineered metabolic pathways to several potential production hosts and their simultaneous testing for functionality evaluation. Furthermore, the present invention provides tools for orthogonal gene expression, thus benefiting the scientific community studying e.g. eukaryotic organisms.
Furthermore, the present invention allows broadening the use of artificial expression systems in applications, where the use of potentially problematic (viral) DNA elements is not welcome.
The present invention relates to a non-viral transcription activation domain for a eukaryotic host or for an artificial expression system in a eukaryotic host, wherein said transcription activation domain originates from a plant or from a plant transcription factor, e.g. from an edible plant or found in an edible plant.
Also, the present invention relates to a polypeptide comprising a non-viral transcription activation domain for a eukaryotic host or for an artificial expression system in a eukaryotic host, wherein said transcription activation domain originates from a plant or from a plant transcription factor.
Also, the present invention relates to an artificial transcription factor, wherein said artificial transcription factor comprises a non-viral transcription activation domain for a eukaryotic host or for an artificial expression system in a eukaryotic host, a DNA-binding domain and a nuclear localization signal, wherein said transcription activation domain originates from a plant or from a plant transcription factor. Still, the present invention relates to a polynucleotide encoding the transcription activation domain, polypeptide or artificial transcription factor of the present invention.
And still, the present invention relates to an expression cassette or expression system, wherein said expression cassette or expression system comprises the polynucleotide encoding the transcription activation domain, polypeptide or artificial transcription factor of the present invention.
Still furthermore, the present invention relates to a eukaryotic host comprising the transcription activation domain, polypeptide, artificial transcription factor, polynucleotide, expression cassette or expression system of the present invention.
Still furthermore, the present invention relates to a method for producing a desired protein product in a eukaryotic host comprising cultivating the host of the present invention under suitable cultivation conditions.
And still furthermore, the present invention relates to use of the transcription activation domain, polypeptide, artificial transcription factor, polynucleotide, expression cassette, expression system or eukaryotic host of the present invention for metabolic engineering and/or production of a desired protein product.
And still furthermore, the present invention relates to a method of preparing a non-viral transcription activation domain of the present invention or a polynucleotide encoding said non-viral transcription activation domain, wherein said method comprises obtaining a transcription activation domain polypeptide originating from a plant transcription factor or obtaining a polynucleotide encoding said transcription activation domain polypeptide originating from a plant transcription factor, and modifying the obtained transcription activation domain polypeptide or polynucleotide.
Other objects, details and advantages of the present invention will become apparent from the following drawings, detailed description and examples.
The target gene expression cassette can comprise or comprises multiple sTF-specific binding sites, here exemplified by eight sTF-specific binding sites (8 BS) positioned upstream of a core promoter, here exemplified by An_201cp (SEQ ID NO: 23) of Aspergillus niger origin. The eight sTF-binding sites and the core promoter form a synthetic promoter, which strongly activates the transcription of a target gene in the presence of the synthetic transcription factor (sTF). The target gene could be any DNA sequence encoding a protein product of interest, here exemplified by an mCherry-encoding DNA sequence (see Example 1, Example 2, and Example 8), by a xylanase enzyme-encoding DNA sequence (see Example 3 and Example 5), or by a bovine β-lactoglobulin B-encoding DNA sequence (see Example 7). The transcription of the target gene can be terminated on the terminator sequence, here exemplified by the Trichoderma reesei pdc1 terminator (Tr_PDC1t).
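The synthetic promoter described above is a simple concatenation: several copies of the sTF binding site placed 5′ of a core promoter. The sketch below illustrates only that arrangement. The binding-site and core-promoter strings are hypothetical placeholders (they are not the real binding site or An_201cp, SEQ ID NO: 23), and the optional spacer between sites is an assumption, not something the text specifies.

```python
# Hypothetical placeholders: NOT the real sTF binding site or An_201cp sequence.
PLACEHOLDER_BINDING_SITE = "TTAGGACGTAAACA"
PLACEHOLDER_CORE_PROMOTER = "TATAAAAGGCACTCA"


def build_synthetic_promoter(binding_site: str, core_promoter: str,
                             copies: int = 8, spacer: str = "") -> str:
    """Place `copies` sTF binding sites (optionally separated by a spacer,
    an assumption here) directly upstream (5') of the core promoter."""
    if copies < 1:
        raise ValueError("at least one binding site is required")
    sites = spacer.join([binding_site] * copies)
    return sites + core_promoter


# 8 BS + core promoter, as in the "8 BS" design described above.
promoter = build_synthetic_promoter(PLACEHOLDER_BINDING_SITE,
                                    PLACEHOLDER_CORE_PROMOTER)
```

Keeping the number of copies as a parameter makes it easy to model designs with more or fewer binding sites, although only the eight-site layout is described here.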
The synthetic transcription factor (sTF) expression cassette contains a core promoter (Tr_hfb2cp; SEQ ID NO: 25), a sTF coding sequence, and a terminator. The core promoter provides constitutive low expression of the sTF. The sTF binds to the sTF-dependent synthetic promoter in the target gene expression cassette, facilitating its transcription. The sTF comprises or is composed of a DNA-binding domain (DBD), which consists of a bacterial DNA-binding protein and a nuclear localization signal, such as the SV40 NLS, and the transcription activation domain (AD). The AD is any transcription activation domain of plant origin, here exemplified by ten examples based on or originating from transcription factors found in Arabidopsis thaliana, Brassica napus, and Spinacia oleracea. The control AD is VP16 of herpes simplex virus origin. The transcription of the sTF gene can be terminated on the terminator sequence, here exemplified by the Trichoderma reesei tef1 terminator (Tr_TEF1t).
The selection marker (SM) expression cassette is any expression cassette allowing production of a specific protein in a host organism, which provides the host organism with the means to grow under selection conditions, such as in the presence of an antibiotic compound or in the absence of an essential metabolite. The SM cassette is exemplified here by the expression cassette allowing expression of the pyr4 gene (encoding orotidine 5′-phosphate decarboxylase enzyme) in a Trichoderma reesei strain (Example 1, Example 3, and Example 8), or allowing expression of the hygR gene (encoding Hygromycin-B 4-O-kinase) in Myceliophthora thermophila (Example 5), or allowing expression of the pyrG gene (encoding orotidine 5′-phosphate decarboxylase enzyme) in an Aspergillus oryzae strain (Example 7).
The sTF expression cassette can comprise (or consist of) a core promoter (An_008cp, SEQ ID NO: 22), a sTF coding sequence, and a terminator. The sTF comprises (or consists of) a DNA-binding domain (DBD), which consists of a bacterial DNA-binding protein, here exemplified by the Bm3R1 repressor (Example 4), and a nuclear localization signal, such as the SV40 NLS, and the transcription activation domain (AD). The AD is any transcription activation domain of plant origin, here exemplified by five examples based on or originating from transcription factors found in Arabidopsis thaliana, Brassica napus, and Spinacia oleracea selected based on the analysis performed in Example 1 (
The target gene expression cassette can comprise or comprises multiple sTF-specific binding sites, here exemplified by eight Bm3R1-specific binding sites (8 BS) positioned upstream of a core promoter, here exemplified by An_201cp (SEQ ID NO: 23) of Aspergillus niger origin. The target gene could be any DNA sequence encoding a protein product of interest, here exemplified by a phytase enzyme-encoding DNA sequence (see Example 4). The transcription of the target gene can be terminated on the terminator sequence, here exemplified by the Saccharomyces cerevisiae ADH1 terminator (Sc_ADH1t). The SM cassette is exemplified here by the expression cassette allowing expression of the Pichia pastoris URA3 gene (encoding orotidine 5′-phosphate decarboxylase enzyme) in Pichia pastoris (Example 4). The genome integration DNA regions (flanks) are exemplified here by genomic DNA sequences from Pichia pastoris located upstream of the AOX2 gene (AOX2-5′) and downstream of the AOX2 gene (AOX2-3′).
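The integration construct for Pichia pastoris can be thought of as an ordered list of parts between the AOX2-5′ and AOX2-3′ flanks. The sketch below joins such parts 5′ to 3′ and records where each one lies, which is handy when checking primer sites or junctions. The order of the three cassettes between the flanks is an assumption made for illustration, and all sequences are hypothetical placeholders far shorter than the real parts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    name: str
    sequence: str


def assemble(parts):
    """Join parts 5'->3' and return the construct together with each part's
    0-based, end-exclusive (start, end) span."""
    layout = []
    position = 0
    for part in parts:
        end = position + len(part.sequence)
        layout.append((part.name, position, end))
        position = end
    return "".join(part.sequence for part in parts), layout


# Hypothetical placeholder sequences; cassette order between the flanks is assumed.
construct, layout = assemble([
    Part("AOX2-5'", "AAAA"),
    Part("target cassette", "CCCCCC"),
    Part("sTF cassette", "GGG"),
    Part("URA3 SM cassette", "TT"),
    Part("AOX2-3'", "AAAA"),
])
```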
The target gene expression cassette can comprise or comprises multiple sTF-specific binding sites, here exemplified by eight sTF-specific binding sites (8 BS) positioned upstream of a core promoter (CP1), here exemplified by any of Mm_Atp5Bcp (SEQ ID NO: 26), or Mm_Eef2cp (SEQ ID NO: 27), or Mm_Rpl4cp (SEQ ID NO: 28) of Mus musculus origin. The target gene could be any DNA sequence encoding a protein product of interest, here exemplified by mCherry-encoding DNA sequence (see Example 6). The transcription of the target gene can be terminated on the terminator sequence (term1), here exemplified by any of SV40 terminator of simian virus 40 origin, or FTH1 terminator of Mus musculus origin (Table 1F; sequences shown in italics with grey highlight).
The sTF expression cassette can comprise a core promoter (CP2), a sTF coding sequence, and a terminator. The CP2 is exemplified here by any of Mm_Atp5Bcp (SEQ ID NO: 26), Mm_Eef2cp (SEQ ID NO: 27), or Mm_Rpl4cp (SEQ ID NO: 28) of Mus musculus origin (Example 6). The sTF comprises or is composed of a DNA-binding domain (DBD), which comprises or consists of a bacterial DNA-binding protein, exemplified here by the PhlF repressor of Pseudomonas protegens origin or by the McbR repressor of Corynebacterium sp. origin (Example 6), and a nuclear localization signal, such as the SV40 NLS, and the transcription activation domain (AD). The AD is any transcription activation domain of plant origin, here exemplified by two examples (So-NAC102M-SEQ ID NO: 10, and Bn-TAF1M-SEQ ID NO: 11) based on transcription factors found in Spinacia oleracea and Brassica napus, respectively, which were selected based on the analysis performed in fungal hosts (Example 3, Example 4, Example 5). The control AD is VP64 of herpes simplex virus origin (SEQ ID NO: 30). The transcription of the sTF gene can be terminated on the terminator sequence (term2), here exemplified by either the SV40 terminator of simian virus 40 origin or the FTH1 terminator of Mus musculus origin (Table 1F; sequences shown in italics with grey highlight). The SM cassette is exemplified here by the expression cassette allowing expression of the pac gene (encoding puromycin N-acetyltransferase enzyme) in CHO cells (Example 6).
The target gene expression cassette can comprise or comprises multiple sTF-specific binding sites, here exemplified by eight sTF-specific binding sites (8 BS) positioned upstream of a core promoter (cp1), exemplified in Example 8 by An_201cp (SEQ ID NO: 23) of Aspergillus niger origin or by Yl_565cp (SEQ ID NO: 32) of Yarrowia lipolytica origin, and in Example 9 by other core promoters. The eight sTF-binding sites and the core promoter form a synthetic promoter, which strongly activates the transcription of a target gene in the presence of the synthetic transcription factor (sTF). The target gene could be any DNA sequence encoding a protein product of interest, here exemplified by an mCherry-encoding DNA sequence (see Example 8 and Example 9). The transcription of the target gene can be terminated on the terminator sequence, here exemplified by the Saccharomyces cerevisiae ADH1 terminator (term1).
The synthetic transcription factor (sTF) expression cassette contains a core promoter (cp2), exemplified in Example 8 by An_008cp (SEQ ID NO: 22) or Yl_242cp (SEQ ID NO: 33), and in Example 9 by other core promoters; the expression cassette further contains a sTF coding sequence and a terminator. The core promoter provides constitutive low expression of the sTF. The sTF comprises or is composed of a DNA-binding domain (DBD), which consists of a bacterial DNA-binding protein, such as Bm3R1 or TetR, and a nuclear localization signal, such as the SV40 NLS, and the transcription activation domain, here exemplified by Bn_TAF1M (SEQ ID NO: 11). The sTF binds to the sTF-dependent synthetic promoter in the target gene expression cassette, facilitating its transcription. In Example 8, where TetR was used as the DBD of the sTF, the binding occurs in the absence of doxycycline, and increasing amounts of doxycycline increasingly inhibit the binding. The transcription of the sTF gene can be terminated on the terminator sequence, here exemplified by the Trichoderma reesei tef1 terminator (term2).
The selection marker (SM) expression cassette is any expression cassette allowing production of a specific protein in a host organism, which provides the host organism with the means to grow under selection conditions, such as in the presence of an antibiotic compound or in the absence of an essential metabolite. The SM cassette is exemplified here by the expression cassette allowing expression of the kanR gene (encoding aminoglycoside phosphotransferase enzyme) in a Pichia pastoris strain (Example 8), or the expression cassette allowing expression of the NAT gene (encoding nourseothricin N-acetyl transferase) in Yarrowia lipolytica (Example 8 and Example 9) or Cutaneotrichosporon oleaginosus (Example 9).
The transcription factors studied by Naseri G et al. (2017, ACS Synthetic Biology, 6, 1742-1756) were from the NAC family of the Arabidopsis thaliana transcription factors, and some of the tested transcription factors, namely JUB1 and ATAF1, were shown to activate the transcription in Saccharomyces cerevisiae also without a fusion with other activation domains.
The NAC (i.e. NAM, ATAF, and CUC) family of transcription factors is a large protein family containing functionally and structurally dissimilar proteins (Olsen, Ernst et al. 2005, Trends Plant Sci 10(2): 79-87). NAC transcription factors share a high degree of homology in their DNA-binding domains (the NAC domain), but often very low homology in their transcription activation domains.
The inventors of the present disclosure have now been able to identify the transcription activation domains of (e.g. NAC-family) transcription factors from e.g. Arabidopsis thaliana, Brassica napus, and Spinacia oleracea, the latter two being common edible plant species, oilseed rape and spinach, respectively. While a high degree of sequence identity was present within the NAC domain, sequence identity varied widely between the corresponding activation domains. For instance, the amino-acid sequence identity between the TAF1 activation domains from Arabidopsis thaliana and Brassica napus was approximately 77%, while the amino-acid sequence identity between the JUB1 activation domains from Arabidopsis thaliana and Spinacia oleracea was only approximately 23%.
Also, the functionality of the activation domains in the expression systems implemented in diverse fungal hosts was highly variable. For instance, the TAF1 activation domain of Arabidopsis thaliana origin was highly active in Trichoderma reesei, but almost inactive in Pichia pastoris (
In addition, the EDLL motif previously used successfully by Naseri G et al. in S. cerevisiae, or by Tiwari, Belachew et al. (2012, The Plant Journal 70(5): 855-865) in Arabidopsis thaliana, proved to be completely inactive when tested in Trichoderma reesei (data not shown). Therefore, the observations of the present disclosure indicate that the function of (some) plant activation domains in diverse host organisms is unpredictable.
The inventors noticed that some of the tested plant-derived activation domains, in particular the TAF1 activation domain of Brassica napus (Bn-TAF1-SEQ ID NO: 6) and the NAC102 activation domain of Spinacia oleracea (So-NAC102-SEQ ID NO: 3), have an amino-acid composition resembling that of typical acidic activation domains, i.e. enriched in acidic amino acids (such as glutamate and/or aspartate) and hydrophobic amino acids (such as leucine, isoleucine, and/or phenylalanine). The native versions of these activation domains, however, also contained some basic amino acids (especially lysine), which were hypothesized to limit the activity of the activation domains. The inventors modified the sequences of the two mentioned activation domains by replacing the unfavorable amino acids (e.g. lysines) with amino acids that better fit the typical acidic activation domain sequence (e.g. leucines and/or glutamates). Surprising results were found with the modified domains.
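The lysine-replacement strategy described above amounts to locating basic residues and substituting selected positions with leucine or glutamate at the sequence-design stage. The sketch below shows that bookkeeping on a short hypothetical peptide; it is not Bn-TAF1 or So-NAC102, and which lysine receives L or E is an arbitrary choice here, not the substitution pattern used for SEQ ID NO: 10 or 11.

```python
def lysine_positions(seq: str) -> list[int]:
    """Return 0-based positions of lysine (K) residues."""
    return [i for i, aa in enumerate(seq) if aa == "K"]


def apply_substitutions(seq: str, substitutions: dict[int, str],
                        expected: str = "K") -> str:
    """Replace residues at the given 0-based positions, checking that each
    position originally holds the `expected` residue."""
    residues = list(seq)
    for position, new_residue in substitutions.items():
        if residues[position] != expected:
            raise ValueError(
                f"position {position} holds {residues[position]!r}, not {expected!r}")
        residues[position] = new_residue
    return "".join(residues)


# Hypothetical native peptide (NOT a sequence from the sequence listing).
native = "MDELKSFEDLKKLE"
# Arbitrary illustrative choice: K->L at 4 and 11, K->E at 10.
variant = apply_substitutions(native, {4: "L", 10: "E", 11: "L"})
```

The position check guards against off-by-one errors, which are easy to make when substitution tables are written with 1-based residue numbering.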
Indeed, the inventors of the present disclosure were able to create modified effective transcription activation domains from native plant transcription activation domains. Very strong domains were obtained, which can be successfully used e.g. for replacing the current viral or other domains in artificial expression systems.
Indeed, the present invention concerns a modified non-viral transcription activation domain i.e. a variant of a non-viral transcription activation domain. As used herein “a modified domain” or “a modified transcription activation domain” refers to any non-native domain or transcription activation domain, respectively, that contains different material (e.g. a different amino acid or modified amino acid) compared to a corresponding unmodified (i.e. native or wild type) domain. As an example, a modified domain may comprise a deletion, substitution, disruption or insertion of one or more amino acids or parts of a domain, or insertion of one or more modified amino acids, compared to the corresponding (native or wild type) domain without said modification.
A modification of a domain may have been obtained e.g. by modifying the polynucleotide encoding said domain by any genetic method. Methods for making genetic modifications are generally well known and are described in various practical manuals describing laboratory molecular techniques. Some examples of the general procedure and specific embodiments are described in the Examples chapter. In one specific embodiment of the invention a modified non-viral transcription activation domain has been obtained by rational mutagenesis or random mutagenesis of the polynucleotide encoding said transcription activation domain.
In one embodiment of the invention the transcription activation domain comprises one or several modifications and/or mutations compared to the corresponding wild type transcription activation domain (amino acid) sequence. In a specific embodiment said transcription activation domain comprises one or several amino acid modifications or amino acid mutations compared to the corresponding wild type (i.e. native) transcription activation domain sequence.
In one embodiment the modified transcription activation domain is a transcription activation domain variant comprising increased acidic and/or hydrophobic amino acid content compared to a native (i.e. unmodified) transcription activation domain. The acidic amino acids include aspartate and glutamate. The hydrophobic amino acids include alanine, valine, leucine, isoleucine, proline, phenylalanine, cysteine and methionine. In a specific embodiment the modified transcription activation domain or the transcription activation domain variant comprises more aspartate, glutamate, leucine, isoleucine, and/or phenylalanine amino acids compared to the native (i.e. unmodified) transcription activation domain.
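Whether a variant has "increased acidic and/or hydrophobic amino acid content" can be checked by counting residues from the sets named above. The sketch below uses exactly those sets (D and E as acidic; A, V, L, I, P, F, C, M as hydrophobic) and adds K, R and H as basic residues for reference. The peptides are hypothetical, and comparing fractions rather than absolute counts is a design choice of this sketch; for same-length substitution variants the two give the same answer.

```python
ACIDIC = set("DE")
HYDROPHOBIC = set("AVLIPFCM")  # the hydrophobic residues listed in the text
BASIC = set("KRH")             # added here for reference only


def fraction(seq: str, residues: set[str]) -> float:
    """Fraction of positions in `seq` occupied by residues from `residues`."""
    seq = seq.upper()
    return sum(aa in residues for aa in seq) / len(seq) if seq else 0.0


def composition_report(seq: str) -> dict[str, float]:
    return {"acidic": fraction(seq, ACIDIC),
            "hydrophobic": fraction(seq, HYDROPHOBIC),
            "basic": fraction(seq, BASIC)}


def has_increased_acidic_or_hydrophobic(native: str, variant: str) -> bool:
    """True if the variant is richer in acidic and/or hydrophobic residues."""
    n, v = composition_report(native), composition_report(variant)
    return v["acidic"] > n["acidic"] or v["hydrophobic"] > n["hydrophobic"]


# Hypothetical peptides (NOT sequences from the sequence listing).
native = "MDELKSFEDLKKLE"
variant = "MDELLSFEDLELLE"
```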
In one embodiment the transcription activation domain is a recombinant, synthetic or artificial transcription activation domain. As used herein "a recombinant activation domain" refers to an activation domain that has been obtained by genetically modifying genetic material, i.e. said domain may have been produced by recombinant DNA technology. In one embodiment a polynucleotide encoding "a recombinant activation domain" comprises mutations compared to the corresponding wild type polynucleotide (e.g. comprises a deletion, substitution, disruption or insertion of one or more nucleotides, including entire genes or parts thereof, compared to the polynucleotide before modification). In one embodiment "a recombinant activation domain" comprises or is a polypeptide encoded by a polynucleotide that has been cloned in a system that supports expression of said polynucleotide and, furthermore, translation of said polypeptide. Indeed, a (genetically) modified polynucleotide can encode a mutant polypeptide. As used herein "a synthetic domain" refers to a domain that has been produced by linking multiple amino acids via amide bonds. Synthesis of polypeptides can be carried out by methods including but not limited to classical solution-phase techniques and solid-phase methods. Also, in some embodiments "synthetic" can be seen as a synonym for "recombinant" as defined above. "An artificial domain" refers to a domain that is non-native, i.e. has not been made by nature or does not occur in nature, or e.g. a wild type domain when used in a non-native context.
A transcription activation domain (e.g. a modified transcription activation domain) of the present invention originates from a plant or plant transcription factor (e.g. of an edible plant). As used herein "originates from a plant or plant transcription factor", i.e. "is of plant or plant transcription factor origin" or "is derived from a plant or plant transcription factor", refers to a situation wherein said transcription activation domain is a protein or polypeptide, typically a transcription factor, that exists in plants. Indeed, in one embodiment of the invention the amino acid sequence of a plant activation domain or a nucleotide sequence encoding said plant activation domain has been modified. In one specific embodiment the transcription activation domain originates from an edible plant or plant species, or from a food grade plant or plant species. As used herein "a food grade plant" refers to a non-toxic plant that is safe for consumption and is e.g. of sufficient quality to be used for food production, food storage, or food preparation purposes.
In one embodiment, the transcription activation domain originates from Spinacia, Brassica, Ocimum or Arabidopsis, or from Spinacia oleracea, Brassica napus, Ocimum basilicum or Arabidopsis thaliana. The transcription activation domain is any transcription activation domain of plant origin, here exemplified by ten examples based on or originating from transcription factors found in Arabidopsis thaliana, Brassica napus, and Spinacia oleracea.
Many see the use of viral activation domains or viral transcription factors as a problem in synthetic expression systems. Thus, there is a strong need for highly functional activation domains, which originate from acceptable sources (e.g. as judged by public or industry). The present invention provides a non-viral transcription activation domain originating from a plant, i.e. a transcription activation domain free from any viral components. Said non-viral transcription activation domains can offer the same or improved efficiency as the current virus-based transcription activation domains.
In one embodiment the transcription activation domain is selected from the group consisting of a transcription activation domain from the plant NAC-family transcription factors (e.g. a TAF (e.g. TAF1) transcription activation domain, a JUB (e.g. JUB1) transcription activation domain), or any fragment thereof. JUB transcription activation domains refer to transcription activation domains of JUNGBRUNNEN factors. For example, among other effects, JUB1 acts as a negative regulator of senescence and a positive regulator of tolerance to heat and salinity stress in plants.
The new activation domains can be incorporated into existing synthetic expression systems, in particular in the structure of the synthetic transcription factors of the expression systems, where they can replace the current activation domains without compromising the function of the systems. In one embodiment the transcription activation domain of the present invention is used in a structure of an artificial transcription factor or said transcription activation domain is for a synthetic expression system.
In one embodiment of the invention the transcription activation domain is functional across diverse species. In cases where the transcription activation domain is for a synthetic expression system, the synthetic expression system is functional across diverse species.
The activation domain of the present invention can be of any length, preferably less than 500 amino acids. In one embodiment the transcription activation domain has a length of 20-300 amino acids, specifically 30-250 amino acids, or more specifically 40-200 amino acids, e.g. 20-30, 31-40, 41-50, 51-60, 61-70, 71-80, 81-90, 91-100, 101-110, 111-120, 121-130, 131-140, 141-150, 151-160, 161-170, 171-180, 181-190, 191-200, 201-210, 211-220, 221-230, 231-240, 241-250, 251-260, 261-270, 271-280, 281-290, 291-300 amino acids.
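The length embodiments above form nested ranges (20-300, 30-250, 40-200 amino acids) plus ten-residue sub-ranges (20-30, 31-40, ..., 291-300). The sketch below maps a domain length onto the narrowest nested range and onto its sub-range. The function names are hypothetical, and the sketch only encodes the ranges as written in the text.

```python
# Nested length ranges (amino acids) stated in the text, broadest first.
LENGTH_RANGES = [(20, 300), (30, 250), (40, 200)]


def narrowest_range(length: int):
    """Return the narrowest stated range containing `length`, or None."""
    hits = [r for r in LENGTH_RANGES if r[0] <= length <= r[1]]
    return min(hits, key=lambda r: r[1] - r[0]) if hits else None


def ten_residue_bin(length: int):
    """Return the sub-range (20-30, 31-40, ..., 291-300) containing `length`."""
    if not 20 <= length <= 300:
        return None
    if length <= 30:
        return (20, 30)
    low = ((length - 1) // 10) * 10 + 1
    return (low, low + 9)
```

Note that the first sub-range (20-30) is eleven residues wide while the others are ten, which is why it is handled as a special case.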
In a specific embodiment the transcription activation domain comprises or consists of an amino acid sequence having 70-100%, 75-100%, 80-100%, 85-100%, 90-100%, or 95-100% sequence identity, e.g. at least 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity, to the amino acid sequence of SEQ ID NO: 2, 3, 4, 5, 6, 7, 8, 9, 10 or 11 (these sequences do not include nuclear localization signals), e.g. SEQ ID NO: 3, 5, 6, 8, 9, 10 or 11.
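The identity thresholds above presuppose a way of aligning two sequences and counting identical positions. The text does not specify the alignment method or scoring, so the sketch below is an assumption: a plain Needleman-Wunsch global alignment with simple match/mismatch/gap scores, with identity computed as identical aligned positions divided by alignment length (gaps included). Real analyses typically use a substitution matrix such as BLOSUM62 with affine gap penalties (e.g. EMBOSS Needle), and different denominators give different percentages.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -1) -> tuple[str, str]:
    """Needleman-Wunsch global alignment with linear gap penalties."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Traceback from the bottom-right cell, preferring the diagonal on ties.
    i, j, out_a, out_b = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions / alignment length (gaps included) x 100."""
    x, y = global_align(a, b)
    identical = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return 100.0 * identical / len(x)
```

For example, two hypothetical 14-residue peptides differing at three positions align without gaps and give 11/14, i.e. about 78.6% identity under these assumptions.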
In one embodiment the transcription activation domain comprises or consists of an amino acid sequence having 60-100%, 65-100%, 70-100%, 75-100%, 80-100%, 85-100%, 90-100%, or 95-100% sequence identity, e.g. at least 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity, to the amino acid sequence of SEQ ID NO: 12, 13, 14, 15, 16, 17, 18, 19, 20 or 21 (these sequences include nuclear localization signals), e.g. SEQ ID NO: 13, 15, 16, 18, 19, 20 or 21.
In a very specific embodiment the transcription activation domain belongs to one of the following groups: i) acidic domains (also called “acid blobs” or “negative noodles”, rich in D and E amino acids), ii) glutamine-rich domains (comprising multiple repeats, e.g. “QQQXXXQQQ”-type repeats), iii) proline-rich domains (comprising repeats such as “PPPXXXPPP”) or iv) isoleucine-rich domains (comprising repeats such as “IIXXII”).
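The four groups above are defined by amino acid composition. As a rough illustration only, the sketch below computes the fraction of each group's characteristic residues in a candidate sequence and reports the most enriched group. The helper names and the 0.25 cut-off are hypothetical assumptions, not values from this description, and the repeat motifs themselves are not checked.

```python
from collections import Counter
from typing import Dict, Optional

# Characteristic residues of the four activation-domain groups listed above.
CLASS_RESIDUES = {
    "acidic": "DE",
    "glutamine-rich": "Q",
    "proline-rich": "P",
    "isoleucine-rich": "I",
}


def residue_fractions(seq: str) -> Dict[str, float]:
    """Fraction of the sequence made up of each group's characteristic residues."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {
        name: sum(counts[aa] for aa in residues) / len(seq)
        for name, residues in CLASS_RESIDUES.items()
    }


def guess_ad_class(seq: str, min_fraction: float = 0.25) -> Optional[str]:
    """Most enriched group, or None if no group reaches min_fraction.

    The 0.25 cut-off is an illustrative assumption only.
    """
    fractions = residue_fractions(seq)
    best = max(fractions, key=fractions.get)
    return best if fractions[best] >= min_fraction else None
```

Note that the acidic group pools two residues (D and E), so its fraction is not strictly comparable with the single-residue groups; a practical screen would likely also look for the repeat patterns themselves.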
The present invention also concerns a polypeptide comprising the modified non-viral plant based transcription activation domain of the present invention, and a nuclear localization signal.
In one embodiment the modified activation domain of the present invention is for an artificial transcription factor. The present invention also concerns an artificial transcription factor. Generally, a transcription factor refers to a protein that binds to specific DNA sequences present in the upstream activation sequence (UAS), thereby controlling the rate of transcription, which is performed by RNA polymerase II. Transcription factors perform this function alone or with other proteins in a complex, by promoting (as an activator) or blocking (as a repressor) the recruitment of RNA polymerase to the core promoters of genes. An artificial or synthetic transcription factor (sTF) refers to a protein which functions as a transcription factor but is not a native protein of the host organism. The artificial transcription factor of the present invention comprises the transcription activation domain of the present invention, a DNA-binding domain and a nuclear localization signal. In one embodiment, the DNA-binding protein of the artificial transcription factor is of prokaryotic origin. In one embodiment, the artificial transcription factor comprises a transcription activation domain of the present invention, a DNA-binding protein of prokaryotic, typically bacterial, origin, and a nuclear localization signal, such as the SV40 NLS.
In the polypeptides or artificial transcription factors of the present invention the nuclear localization signal can be any suitable localization signal known to a person skilled in the art, e.g. an SV40 nuclear localization signal, or the nuclear localization signal can have an amino acid sequence comprising or consisting of PKKKRKV.
DNA-binding domain refers to the region of a protein, typically a specific protein domain, which is responsible for the interaction (binding) of the protein with a specific DNA sequence, such as a promoter of a target gene.
The modified transcription activation domain, polypeptide or artificial transcription factor of the present invention can be obtained from a polynucleotide encoding said modified transcription activation domain, polypeptide or artificial transcription factor, or from a polynucleotide modified to encode said modified transcription activation domain, polypeptide or artificial transcription factor.
The present invention also concerns a polynucleotide encoding the transcription activation domain, polypeptide or artificial transcription factor of the present invention.
The polynucleotide encoding the transcription activation domain, polypeptide or artificial transcription factor of the present invention may be operatively linked to any suitable promoter or controlling sequence including, but not limited to, core promoter sequences, e.g. any one of SEQ ID NO:s 22, 23, 25, 26, 27, 28 or 32-44, or any combination thereof.
As used herein “polynucleotide” refers to any polynucleotide, such as single or double-stranded DNA (synthetic DNA, genomic DNA, or cDNA) or RNA, comprising a nucleic acid sequence encoding a polymer of amino acids or a polypeptide in question.
A codon is a trinucleotide unit that encodes a single amino acid in protein-coding genes. Codons encoding the same amino acid may differ in any of their three nucleotides. Different organisms use codons at different frequencies in their genomes, which can affect the efficiency of mRNA translation and protein production.
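Codon usage is commonly expressed as the share of each codon among all synonymous codons for the same amino acid. The sketch below computes this from an in-frame coding sequence using the standard genetic code (NCBI translation table 1). It is a minimal illustration; the function and constant names are our own, and it does not reproduce any particular codon-optimization tool.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def synonymous_codon_usage(cds: str) -> dict:
    """Share of each codon among all codons for the same amino acid (or stop).

    Assumes `cds` is in frame from its first base; a trailing partial codon and
    codons containing ambiguous bases are ignored.
    """
    seq = cds.upper().replace("U", "T")  # accept RNA as well as DNA
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    counts = Counter(c for c in codons if c in CODON_TABLE)
    per_amino_acid = Counter()
    for codon, n in counts.items():
        per_amino_acid[CODON_TABLE[codon]] += n
    return {codon: n / per_amino_acid[CODON_TABLE[codon]] for codon, n in counts.items()}
```

For example, in `GCTGCTGCCTAA` alanine is encoded twice by GCT and once by GCC, giving shares of 2/3 and 1/3.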
Coding sequence refers to a DNA sequence that encodes a specific RNA or polypeptide (i.e. a specific amino acid sequence). The coding sequence could, in some instances, contain introns (i.e. additional sequences interrupting the reading frame, which are removed during RNA molecule maturation in a process called RNA splicing). If the coding sequence encodes a polypeptide, this sequence contains a reading frame.
A reading frame is defined by a start codon (AUG in RNA; corresponding to ATG in the DNA sequence) and is a sequence of consecutive codons encoding a polypeptide (protein). The reading frame ends with a stop codon (one of the three: UAG, UGA and UAA in RNA; corresponding to TAG, TGA and TAA in the DNA sequence). A person skilled in the art can predict the location of open reading frames by using generally available computer programs and databases.
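The definition above translates directly into a simple scan: in each of the three frames, an ORF opens at an ATG and closes at the first in-frame stop codon. The sketch below is a simplified, forward-strand-only illustration under those assumptions; dedicated ORF-prediction programs also handle the reverse strand, alternative start codons and other details.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) pairs of ORFs on the forward strand.

    An ORF runs from an ATG to the first in-frame stop codon; `end` is exclusive
    and includes the stop codon. `min_codons` counts codons from the ATG up to,
    but not including, the stop. Downstream in-frame ATGs are not reported
    separately.
    """
    seq = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)
```

For example, `find_orfs("CCATGAAATTTTAGCC")` finds a single ORF, `ATGAAATTTTAG`, spanning positions 2 to 14.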
Herein, the terms “polypeptide” and “protein” are used interchangeably to refer to polymers of amino acids of any length.
Variations or modifications of any one of the sequences or subsequences set forth in the description and claims are still within the scope of the invention provided that they can be used in the present invention, as activation domains for engineering gene expression, or as polynucleotides encoding said activation domains.
Identity of any sequence or fragment thereof compared to a sequence of this disclosure refers to the identity of that sequence compared to the entire sequence of the present invention. As used herein, the % identity between two sequences is a function of the number of identical positions shared by the sequences (e.g. identity = number of identical positions / total number of positions × 100), taking into account the number of gaps, and the length of each gap, which need to be introduced for optimal alignment of the two sequences. The comparison of sequences and determination of identity percentage between two sequences can be accomplished using mathematical algorithms available in the art. This applies to both amino acid and nucleic acid sequences. As an example, sequence identity may be determined by using BLAST (Basic Local Alignment Search Tool) or FASTA (FAST-All). In the searches, the “gap penalties” and “matrix” parameters are typically kept at their default settings.
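To make the identity formula above concrete, the sketch below aligns two sequences over their full length with a basic Needleman-Wunsch global alignment (simple match/mismatch scores and a linear gap penalty, all assumed values) and then divides the number of identical positions by the total number of alignment positions, gap columns included. This is a simplified stand-in chosen for illustration: BLAST and FASTA use local alignment, substitution matrices and affine gap penalties, and different tools may use different denominators, so their identity values can differ.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns the two aligned strings, with '-' marking gap positions.
    """
    n, m = len(a), len(b)

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical positions / total alignment positions (gap columns included) x 100."""
    if not a and not b:
        raise ValueError("both sequences are empty")
    aln_a, aln_b = global_align(a, b)
    identical = sum(x == y != "-" for x, y in zip(aln_a, aln_b))
    return 100.0 * identical / len(aln_a)
```

For example, `ACGT` and `AGT` align as `ACGT` / `A-GT`: three identical positions out of four alignment columns, i.e. 75% identity.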
An expression cassette or expression system of the present invention comprises the polynucleotide encoding the transcription activation domain, polypeptide or artificial transcription factor of the present invention. In one embodiment the expression cassette further comprises a polynucleotide sequence encoding a desired product.
In one embodiment the polynucleotide encoding the modified activation domain of the present invention is for an expression cassette or expression system or the modified activation domain of the present invention is for an expression cassette or expression system.
In one embodiment the expression system comprises one or more expression cassettes, and optionally at least one expression cassette further comprises a polynucleotide sequence encoding a desired product.
An expression system of the present invention can be an orthogonal expression system, i.e. a system comprising or consisting of heterologous (non-native) core promoters, transcription factor(s), and transcription-factor-specific binding sites. Typically, the orthogonal expression system is functional (transferable) in diverse eukaryotic organisms such as eukaryotic microorganisms.
In one embodiment an expression system comprises a target gene expression cassette and/or an artificial transcription factor expression cassette comprising the activation domain of the present invention. Furthermore, the expression system can comprise e.g. one or more selection marker (SM) expression cassettes and optionally genome integration DNA regions (flanks). In one embodiment the expression system is constructed as a single DNA molecule or as two separate DNA molecules.
FIGS. 3 and 14 show examples of schemes of an expression system or expression cassette comprising the activation domain of the present invention, e.g. for heterologous protein production.
In one embodiment a target gene expression cassette refers to a cassette, which comprises a target gene coding sequence and the sequences controlling the expression (see
In one embodiment a target gene expression cassette comprises a synthetic promoter, which comprises a variable number of sTF-binding sites, usually 1 to 10, typically 1, 2, 4 or 8, separated by 0-20, typically 5-15, random nucleotides, and a core promoter (CP); a target gene; and a terminator.
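The synthetic promoter layout above (1 to 10 sTF-binding sites separated by 0-20 random nucleotides, followed by a core promoter) can be sketched as simple sequence assembly. In this minimal illustration the function name is hypothetical, and spacers are placed only between binding sites because the text does not say whether a spacer also precedes the core promoter. A practical design would presumably also screen the random spacers for unwanted motifs.

```python
import random
from typing import Optional, Tuple


def build_synthetic_promoter(binding_site: str, n_sites: int, core_promoter: str,
                             spacer_range: Tuple[int, int] = (5, 15),
                             seed: Optional[int] = None) -> str:
    """sTF-binding sites separated by random spacers, followed by a core promoter.

    Spacers are inserted only between binding sites (an assumption).
    """
    if not 1 <= n_sites <= 10:
        raise ValueError("expected 1-10 binding sites")
    low, high = spacer_range
    if not 0 <= low <= high <= 20:
        raise ValueError("spacer lengths are expected to lie within 0-20 nt")
    rng = random.Random(seed)  # a fixed seed makes the design reproducible
    parts = []
    for k in range(n_sites):
        parts.append(binding_site)
        if k < n_sites - 1:
            spacer_len = rng.randint(low, high)
            parts.append("".join(rng.choice("ACGT") for _ in range(spacer_len)))
    return "".join(parts) + core_promoter
```

With four sites and 5-nt spacers, for instance, the result consists of four copies of the binding site, three spacers and the core promoter, in that order.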
A target gene can be any DNA sequence (e.g. native or heterologous) encoding a polypeptide or a protein product of interest (see e.g. Examples 1, 4, 6, 8 and 9,
In a specific embodiment the expression system comprises at least two individual expression cassettes, e.g. formed as one or more DNA molecules (e.g. two or more):
(a) a target gene expression cassette, which comprises a synthetic promoter, which comprises a variable number of sTF-binding sites, usually 1 to 10, typically 1, 2, 4 or 8, separated by 0-20, typically 5-15, random nucleotides, and a CP; a target gene; and a terminator, and
(b) an artificial transcription factor cassette, which comprises a CP controlling expression of a gene encoding a fusion protein (artificial transcription factor, sTF), the artificial transcription factor itself (sTF), and a terminator.
A selection marker (SM) expression cassette is any expression cassette allowing production of a specific protein in a host organism that enables the host organism to grow under selective conditions, such as in the presence of an antibiotic compound or in the absence of an essential metabolite. In one embodiment of the invention the SM cassette can be an expression cassette allowing expression of the pyr4 gene (encoding orotidine 5′-phosphate decarboxylase enzyme) e.g. in a Trichoderma reesei strain (see e.g. Examples 1 and 3), the pyrG gene (encoding orotidine 5′-phosphate decarboxylase enzyme) e.g. in an Aspergillus oryzae strain (see e.g. Example 7), the hygR gene (encoding Hygromycin-B 4-O-kinase) e.g. in a Myceliophthora thermophila strain (see e.g. Example 5), the URA3 gene (encoding orotidine 5′-phosphate decarboxylase enzyme) e.g. in a Pichia pastoris strain (see e.g. Example 4), A (encoding aminoglycoside phosphotransferase enzyme) e.g. in a Pichia pastoris strain (see e.g. Example 4), the pac gene (encoding puromycin N-acetyltransferase enzyme) e.g. in CHO cells (see e.g. Example 6), the kanR gene (encoding aminoglycoside phosphotransferase enzyme) e.g. in a Pichia pastoris strain (see e.g. Example 8), and/or the NAT gene (encoding nourseothricin N-acetyl transferase) e.g. in a Yarrowia lipolytica or Cutaneotrichosporon oleaginosus strain (see e.g. Examples 8 and 9).
When an expression system is constructed as two separate DNA molecules, the first DNA molecule can comprise or consist of an artificial transcription factor expression cassette comprising the activation domain of the present invention, and optionally a selection marker (SM) expression cassette and/or genome integration DNA regions (flanks); and the second DNA molecule can comprise or consist of a target gene expression cassette, and optionally a selection marker (SM) expression cassette and/or genome integration DNA regions (flanks). Each cassette can be integrated into a separate locus of the host genome, together forming a functional gene expression system.
The genome integration DNA regions (flanks) used in the present invention can be selected from any genomic loci present in the production hosts, e.g. the genomic DNA sequences from Trichoderma reesei located upstream of the egl1 gene (EGL1-5′) and downstream of the egl1 gene (EGL1-3′) (see e.g. Example 5); the genomic DNA sequences from Pichia pastoris located upstream of the URA3 gene (URA3-5′) and downstream of the URA3 gene (URA3-3′) (see e.g. Example 4) and the genomic DNA sequences from Pichia pastoris located upstream of the AOX2 gene (AOX2-5′) and downstream of the AOX2 gene (AOX2-3′) (see e.g. Example 4); the genomic DNA sequences from Aspergillus oryzae located upstream of the gaaC gene (gaaC-5′) and downstream of the gaaC gene (gaaC-3′) (see e.g. Example 7) and the genomic DNA sequences from Aspergillus oryzae located upstream of the gluC gene (gluC-5′) and downstream of the gluC gene (gluC-3′) (see e.g. Example 7); or the genomic DNA sequences for targeting the ADE1 gene of Pichia pastoris or the anti gene of Y. lipolytica (see e.g. Examples 8 and 9).
In one specific embodiment of the present invention, the expression system, e.g. for a eukaryotic or microorganism host, comprises: (a) an expression cassette comprising a core promoter, said core promoter being the only “promoter” controlling the expression of a DNA sequence encoding the activation domain or artificial transcription factor (sTF) of the present invention, and (b) one or more expression cassettes, each comprising a target gene sequence encoding a desired protein product operably linked to a synthetic promoter, said synthetic promoter comprising a core promoter identical to that of (a) or another core promoter, and activation domain- or sTF-specific binding sites upstream of the core promoter.
A eukaryotic promoter is a region of DNA necessary for initiation of transcription of a gene. It is located upstream of a DNA sequence encoding a specific RNA or polypeptide (coding sequence) and contains an upstream activation sequence (UAS) and a core promoter. A person skilled in the art can predict the location of a promoter by using generally available computer programs and databases.
The core promoter (CP) is part of a (eukaryotic) promoter; it is a region of DNA immediately upstream (5′-upstream region) of a coding sequence which encodes a polypeptide, as defined by the start codon. The core promoter comprises all the general transcription regulatory motifs necessary for initiation of transcription, such as a TATA-box, but does not comprise any specific regulatory motifs, such as UAS sequences (binding sites for native activators and repressors).
The selection of CPs can be based on the expression level, in the selected organisms, of genes whose promoters contain the candidate CP. Another selection criterion can be the presence of a TATA-box in the candidate CP. In one embodiment the screen for functional CPs to be used in the present invention is advantageously performed by assembling in vivo the candidate CP with the sTF-dependent reporter cassette in an organism constitutively expressing the sTF, e.g. an S. cerevisiae strain. The resulting strains are tested for the level of a reporter, preferably fluorescence, and these levels are compared to a control strain.
The core promoter (CP) typically comprises a DNA sequence containing the 5′-upstream region of a eukaryotic gene, starting 10-50 bp upstream of a TATA-box and ending 9 bp upstream of the ATG start codon. In one embodiment the distance between the TATA-box and the start codon is no greater than 180 bp and no smaller than 80 bp. The core promoter typically also comprises a DNA sequence of 1-20 random bp at its 3′-end. In one embodiment the core promoter comprises a DNA sequence having at least 90% sequence identity to said 5′-upstream region of a eukaryotic gene, and a DNA sequence of 1-20 random bp at its 3′-end.
In one embodiment the core promoter is a DNA sequence containing: 1) a 5′-upstream region of a highly expressed gene starting 10-50 bp upstream of the TATA box and ending 9 bp upstream of the start codon, where the distance between the TATA box and the start codon is no greater than 180 bp and no smaller than 80 bp, and 2) random 1-20 bp, typically 5 to 15 or 6 to 10, which are located in place of the 9 bp of the DNA region (1) immediately upstream of the start codon; or a DNA sequence containing: 1) a DNA sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity to said 5′-upstream region and 2) random 1-20 bp, typically 5 to 15 or 6 to 10, which are located in place of the 9 bp of the DNA region (1) immediately upstream of the start codon.
As used above, a “highly expressed gene” in an organism is a gene which has been shown in that organism to be expressed among the top 3% or 5% of all genes in any studied condition, as determined by transcriptomics analysis, or, in an organism where transcriptomics analysis has not been performed, the gene that is the closest sequence homologue to such a highly expressed gene.
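The “top 3% or 5% in any studied condition” criterion can be expressed as a per-condition ranking followed by a union over conditions. The sketch below assumes expression values are already normalized and comparable within each condition. Rounding the cut-off up with `ceil`, and breaking ties by sort order, are our own assumptions, and the function name is hypothetical.

```python
import math
from typing import Dict, Set


def highly_expressed_genes(expression: Dict[str, Dict[str, float]],
                           top_fraction: float = 0.03) -> Set[str]:
    """Genes ranked within the top `top_fraction` of all genes in at least one condition.

    `expression` maps a condition name to {gene_id: expression level}, e.g.
    normalized values from a transcriptomics experiment.
    """
    selected: Set[str] = set()
    for levels in expression.values():
        n_top = max(1, math.ceil(top_fraction * len(levels)))
        ranked = sorted(levels, key=levels.get, reverse=True)
        selected.update(ranked[:n_top])
    return selected
```

With 100 genes per condition, the 3% cut-off keeps the three highest-ranked genes of each condition, and a gene qualifies if it appears in any of those per-condition lists.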
TATA-box refers to a DNA sequence (TATA) upstream of the start codon, where the distance of the TATA sequence and the start codon is no greater than 180 bp and no smaller than 80 bp. In case of multiple sequences fulfilling the description, the TATA-box is defined as the TATA sequence with smallest distance from the start codon.
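The core promoter and TATA-box definitions above can be combined into a short procedure: locate the TATA sequence closest to the start codon within the 80-180 bp window, keep the native region from 10-50 bp upstream of it up to 9 bp before the ATG, and append 1-20 random bases in place of the removed 9 bp. In this sketch the input is assumed to be the upstream sequence ending immediately 5′ of the ATG, and distance is assumed to be counted from the first T of “TATA” to the A of the ATG. Both conventions, and the function names, are illustrative assumptions.

```python
import random
from typing import Optional


def find_tata_box(upstream: str, min_dist: int = 80, max_dist: int = 180) -> Optional[int]:
    """Index of the TATA-box in `upstream`, a sequence ending immediately 5' of the ATG.

    Among TATA sequences lying min_dist..max_dist bp upstream of the start codon,
    the one closest to the start codon is returned.
    """
    seq = upstream.upper()
    atg = len(seq)  # position the A of the ATG would occupy
    best = None
    for i in range(len(seq) - 3):
        if seq[i:i + 4] == "TATA" and min_dist <= atg - i <= max_dist:
            best = i  # later hits are closer to the start codon
    return best


def derive_core_promoter(upstream: str, lead: int = 30, n_random: int = 8,
                         seed: Optional[int] = None) -> str:
    """Native region from `lead` bp upstream of the TATA-box to 9 bp before the ATG,
    with the removed 9 bp replaced by `n_random` random bases."""
    if not 10 <= lead <= 50:
        raise ValueError("lead must be 10-50 bp")
    if not 1 <= n_random <= 20:
        raise ValueError("n_random must be 1-20 bp")
    tata = find_tata_box(upstream)
    if tata is None:
        raise ValueError("no TATA-box 80-180 bp upstream of the start codon")
    if tata < lead:
        raise ValueError("upstream sequence too short for the requested lead")
    rng = random.Random(seed)
    native = upstream[tata - lead:len(upstream) - 9]
    return native + "".join(rng.choice("ACGT") for _ in range(n_random))
```

The “closest to the start codon” rule from the TATA-box definition is implemented by keeping the last qualifying hit while scanning from 5′ to 3′.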
The core promoters (CPs) used in the expression system or in one or several expression cassettes of the present invention can be different from or identical to each other, e.g. the first one, CP1, can be identical to the second one, CP2 (or the third one, CP3, or the fourth one, CP4, in expression systems composed of multiple expression cassettes), or the first one, CP1, can be different from the second one, CP2.
In one embodiment one or more CPs are universal core promoters functional in diverse eukaryotic organisms. In one embodiment of the present invention, e.g. Tr_hfb2cp (SEQ ID NO: 25), An_008cp (SEQ ID NO: 22), or YI_242cp (SEQ ID NO: 33) can be used for controlling the expression of the sTF in several organisms, e.g. Trichoderma reesei (see e.g. Examples 1, 3 and 8), Aspergillus oryzae (see e.g. Example 7), Myceliophthora thermophila (see e.g. Example 5), Pichia pastoris (see e.g. Example 8) or Yarrowia lipolytica (see e.g. Example 8). In another embodiment of the present invention, e.g. An_201cp (SEQ ID NO: 23) can be used for controlling the expression of the target gene in conjunction with upstream sTF-binding sites in several organisms, e.g. Pichia pastoris (see e.g. Examples 4 and 8), Trichoderma reesei (see e.g. Examples 1, 3 and 8), Aspergillus oryzae (see e.g. Example 7), Myceliophthora thermophila (see e.g. Example 5) or Yarrowia lipolytica (see e.g. Example 8). Other CPs suitable for the present invention include but are not limited to An_008cp (SEQ ID NO: 22) (e.g. in Pichia pastoris, see Example 4), Mm_Atp5Bcp (SEQ ID NO: 26) (e.g. in Trichoderma reesei or CHO cells, see Examples 1 and 6), Mm_Eef2cp (SEQ ID NO: 27) (e.g. in Trichoderma reesei or CHO cells, see Examples 1 and 6), Mm_Rpl4cp (SEQ ID NO: 28), any CP of SEQ ID NO:s 32-44, or any combination thereof.
The sTF-binding sites and a core promoter (e.g. eight Bm3R1-specific binding sites and An_201cp;
A synthetic promoter refers to a region of DNA which functions as a eukaryotic promoter, but it is not a naturally occurring promoter of a host organism. It contains an upstream activation sequence (UAS) and a core promoter, wherein the UAS, or the core promoter, or both elements, are not native to the host organism. In one embodiment of the invention, the synthetic promoter comprises (usually 1-10, typically 1, 2, 4 or 8) sTF-specific binding sites (synthetic UAS—sUAS) linked to a core promoter. In one embodiment of the invention sTF-binding sites and the core promoter form a synthetic promoter, which strongly activates the transcription of a target gene, in the presence of an artificial transcription factor capable of binding sTF binding sites. It is also possible to construct multiple synthetic promoters with different numbers of binding sites (usually 1-10, typically 1, 2, 4 or 8, separated by 0-20, typically 5-15 random nucleotides) controlling different target genes simultaneously by one sTF. This would for instance result in a set of differently expressed genes forming a metabolic pathway.
Two or more expression cassettes can be introduced to a eukaryotic host (typically integrated into a genome) as two or more individual DNA molecules, or as one DNA molecule in which the two or more expression cassettes are connected (fused) to form a single DNA.
In one embodiment, the present invention provides tools for expression systems not dependent on the intrinsic transcriptional regulation of the expression host.
The tuning of the expression system for different expression levels of at least target genes and/or transcription factors can be carried out in a host organism where a multitude of options, including choices of CPs, sTFs, different numbers of BSs, and target genes, can be tested.
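The tuning described above amounts to exploring a combinatorial design space. The sketch below simply enumerates every combination of core promoter, sTF, number of binding sites and target gene so that each combination can be built and tested. The `Design` type, the sTF labels and the choice of reporter are hypothetical placeholders; only the core promoter names and binding-site numbers are taken from this description.

```python
from itertools import product
from typing import List, NamedTuple, Sequence


class Design(NamedTuple):
    core_promoter: str
    stf: str
    n_binding_sites: int
    target_gene: str


def design_space(core_promoters: Sequence[str], stfs: Sequence[str],
                 site_counts: Sequence[int], target_genes: Sequence[str]) -> List[Design]:
    """Every combination of the tunable parts of the expression system."""
    return [Design(*combo)
            for combo in product(core_promoters, stfs, site_counts, target_genes)]


# Example: three core promoters named in this text, two hypothetical sTF variants
# and 1, 2, 4 or 8 binding sites give 3 x 2 x 4 x 1 = 24 designs.
designs = design_space(["Tr_hfb2cp", "An_008cp", "An_201cp"],
                       ["sTF-A", "sTF-B"], [1, 2, 4, 8], ["mCherry"])
```

Because the number of designs grows multiplicatively, fully enumerating the space is realistic only for a handful of options per part.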
The present invention concerns a non-viral transcription activation domain, which can be used in a eukaryotic host. In one embodiment the polypeptide, artificial transcription factor, polynucleotide, expression cassette or expression system of the present invention is for a eukaryotic host. A eukaryotic host of the present invention comprises the transcription activation domain, polypeptide, artificial transcription factor, polynucleotide, expression cassette or expression system of the present invention.
A eukaryotic (production) host suitable for the present invention can be selected from the group consisting of:
1) Fungal kingdom, including yeast, such as classes Saccharomycetales, including but not limited to species Saccharomyces cerevisiae, Kluyveromyces lactis, Candida krusei (Pichia kudriavzevii), Pichia pastoris (Komagataella pastoris), Pichia kudriavzevii, Eremothecium gossypii, Kazachstania exigua, Yarrowia lipolytica, Zygosaccharomyces lentus, and others; or Schizosaccharomycetes, such as Schizosaccharomyces pombe; filamentous fungi, such as classes Eurotiomycetes, including but not limited to species Aspergillus niger, Aspergillus nidulans, Aspergillus oryzae, Penicillium chrysogenum, and others; Sordariomycetes, including but not limited to species Trichoderma reesei, Myceliophthora thermophila, and others; or Mucorales, such as Mucor indicus and others;
2) Animal kingdom, including but not limited to mammals (Mammalia) and cells thereof, including but not limited to species Mus musculus (mouse), Cricetulus griseus (hamster), Homo sapiens (human), and others; insects, including but not limited to species Mamestra brassicae, Spodoptera frugiperda, Trichoplusia ni, Drosophila melanogaster, and others.
In one embodiment the eukaryotic host is selected from the group consisting of a cell of fungal species including yeast and filamentous fungi, and a cell of animal species including mammals (e.g. non-human mammals); or from the group consisting of a cell of Trichoderma, Trichoderma reesei, Pichia, Pichia pastoris, Pichia kudriavzevii, Aspergillus, Aspergillus oryzae, Aspergillus niger, Myceliophthora, Myceliophthora thermophila, Saccharomyces, Saccharomyces cerevisiae, Yarrowia, Yarrowia lipolytica, Cutaneotrichosporon, Cutaneotrichosporon oleaginosus (Trichosporon oleaginosus, Cryptococcus curvatus), Zygosaccharomyces, Chinese hamster ovary (CHO) cells, and Cricetulus griseus.
A method for producing a desired protein product in a eukaryotic host comprises cultivating the host under suitable cultivation conditions. By “suitable cultivation conditions” are meant any conditions allowing survival or growth of the host organism, and/or production of the desired product in the host organism. A desired product can be a product of the target polynucleotide (i.e. a polypeptide or protein), or a compound produced by a polypeptide or protein or by a metabolic pathway. In the present context the desired product is typically a protein product.
The present invention also concerns use of the transcription activation domain, polypeptide, artificial transcription factor, polynucleotide, expression cassette, expression system or eukaryotic host for metabolic engineering and/or production of a desired protein product. As used herein “metabolic engineering” refers to controlling or optimizing genetic or regulatory processes within a cell. Metabolic engineering allows e.g. modified production of a desired protein product in a cell.
The tools of the present invention speed up the process of industrial host development and enable the use of novel hosts which have high potential for specific purposes but a very limited spectrum of tools for genetic engineering.
The present invention also relates to a method of preparing a non-viral transcription activation domain of the present invention or a polynucleotide encoding said non-viral transcription activation domain, wherein said method comprises obtaining a transcription activation domain polypeptide originating from a plant transcription factor or obtaining a polynucleotide encoding said transcription activation domain polypeptide originating from a plant transcription factor, and modifying the obtained transcription activation domain polypeptide or polynucleotide. Methods of modifying polypeptides are well known to a person skilled in the art and include but are not limited to e.g. methods causing a deletion, substitution, disruption or insertion of one or more amino acids or parts of a polypeptide, or insertion of one or more modified amino acids. Methods of modifying polynucleotides are also well known to a person skilled in the art and include but are not limited to e.g. methods causing a deletion, substitution, disruption or insertion of one or more nucleotides or parts of a polynucleotide, or insertion of one or more modified nucleotides. A modification of a polypeptide can be obtained e.g. by modifying the polynucleotide encoding the polypeptide by any genetic method. Methods for making genetic modifications are generally well known and are described in various practical manuals describing laboratory molecular techniques. Some examples of the general procedure and specific embodiments are described in the Examples chapter. In one specific embodiment of the invention a modified non-viral transcription activation domain has been obtained by rational mutagenesis or random mutagenesis of the polynucleotide encoding said transcription activation domain.
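When planning substitution variants of an activation domain, it can help to enumerate candidate sequences in silico before any laboratory work. The sketch below lists every single amino acid substitution (as used, for example, in site-saturation designs) and generates a random variant with a chosen number of substitutions. It is purely an illustrative planning aid with hypothetical helper names; it does not model the actual mutagenesis methods, such as error-prone PCR, or their mutation biases.

```python
import random
from typing import Iterator, Optional, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def single_substitutions(seq: str) -> Iterator[Tuple[str, str]]:
    """Yield (label, variant) for every single amino acid substitution, e.g. 'D12A'."""
    for i, wild_type in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                yield f"{wild_type}{i + 1}{aa}", seq[:i] + aa + seq[i + 1:]


def random_variant(seq: str, n_mutations: int, seed: Optional[int] = None) -> str:
    """Substitute residues at `n_mutations` distinct, randomly chosen positions."""
    rng = random.Random(seed)
    residues = list(seq)
    for i in rng.sample(range(len(seq)), n_mutations):
        residues[i] = rng.choice([aa for aa in AMINO_ACIDS if aa != residues[i]])
    return "".join(residues)
```

A sequence of length L has 19 × L single-substitution variants, so even a 100-residue domain yields 1,900 candidates.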
It will be obvious to a person skilled in the art that, as the technology advances, the inventive concept can be implemented in various ways. The invention and its embodiments are not limited to the examples described below but may vary within the scope of the claims.
The reporter expression systems for testing different transcription activation domains were constructed as single DNA molecules (plasmids) (
The sTF coding regions of all the plasmids contained the same DNA-binding domain (DBD; Bm3R1 transcriptional regulator from Bacillus megaterium; NCBI Reference Sequence: WP_013083972.1; encoding DNA codon-optimized for Aspergillus niger; sequence shown in Tables 1A and 1B) and the SV40 NLS. The transcription activation domains (AD) were selected from plant transcription factors available in public databases, and the corresponding protein-encoding DNA sequences were codon-optimized for T. reesei. The following protein sequences were selected and used:
Trichoderma reesei strain M1909 (VTT culture collection) was used as the parental strain. This strain is a mutagenized version of the QM9414 strain and contains additional deletions, including a deletion of the pyr4 gene, which renders the strain auxotrophic for uracil. The reporter expression systems (
The correct strains were selected by qPCR of the genomic DNA of each transformed strain. The qPCR signal of the mCherry gene was compared to the qPCR signal of a unique native sequence in each host. In addition, the correct deletion of the egl1 gene was confirmed by the absence of a qPCR signal for the egl1 target. The selected strains were sporulated on PDA agar plates (39 g/L BD-Difco Potato dextrose agar). Spores (conidia) were collected from the PDA plates and used as inoculum in liquid cultivations for the fluorescence analysis.
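One common way to compare a target qPCR signal with that of a single-copy native reference is the delta-Ct calculation. Assuming similar amplification efficiency for both assays, the relative copy number is efficiency^(Ct_reference − Ct_target). The sketch below applies this, together with a hypothetical acceptance rule (roughly one mCherry copy per genome and no egl1 amplification), to show how such a screen could be scored. The description does not state which calculation, copy number or tolerance was actually used; those values here are assumptions.

```python
from typing import Optional


def relative_copy_number(ct_target: float, ct_reference: float,
                         efficiency: float = 2.0) -> float:
    """Copies of the target per copy of a single-copy native reference (delta-Ct).

    `efficiency` is the fold amplification per cycle; 2.0 assumes both assays
    amplify with about 100% efficiency.
    """
    return efficiency ** (ct_reference - ct_target)


def looks_correct(ct_mcherry: float, ct_native: float, ct_egl1: Optional[float],
                  tolerance: float = 0.5) -> bool:
    """Hypothetical acceptance rule: about one mCherry copy and no egl1 signal.

    `ct_egl1` is None when the egl1 assay gives no amplification.
    """
    if ct_egl1 is not None:  # any egl1 signal means the deletion is not confirmed
        return False
    return abs(relative_copy_number(ct_mcherry, ct_native) - 1.0) <= tolerance
```

A target Ct one cycle lower than the reference corresponds to about two target copies per reference copy under the 100% efficiency assumption.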
For the quantitative fluorometry analysis of the mCherry production in the mycelia of the tested strains (
CATTCCGGACTCTAGATAAGCACGGAATGAACTTTCATTCCGCTGAAGCTTGTCAATCGGAATGAAGGTTCAT
CTCCAAGTTCTATCTAACCAGCCATCCTACACTCTACATATCCACACCAATCTACTACAATTAATTAAAATGGTG
AGCAAGGGCGAGGAGGATAACATGGCCATCATCAAGGAGTTCATGCGCTTCAAGGTGCACATGGAGGGCT
CCGTGAACGGCCACGAGTTCGAGATCGAGGGCGAGGGCGAGGGCCGCCCCTACGAGGGCACCCAGACCG
CCAAGCTGAAGGTGACCAAGGGTGGCCCCCTGCCCTTCGCCTGGGACATCCTGTCCCCTCAGTTCATGTAC
GGCTCCAAGGCCTACGTGAAGCACCCCGCCGACATCCCCGACTACTTGAAGCTGTCCTTCCCCGAGGGCTT
CAAGTGGGAGCGCGTGATGAACTTCGAGGACGGCGGCGTGGTGACCGTGACCCAGGACTCCTCCCTGCAG
GACGGCGAGTTCATCTACAAGGTGAAGCTGCGCGGCACCAACTTCCCCTCCGACGGCCCCGTAATGCAGA
AGAAGACCATGGGCTGGGAGGCCTCCTCCGAGCGGATGTACCCCGAGGACGGCGCCCTGAAGGGCGAGA
TCAAGCAGAGGCTGAAGCTGAAGGACGGCGGCCACTACGACGCTGAGGTCAAGACCACCTACAAGGCCA
AGAAGCCCGTGCAGCTGCCCGGCGCCTACAACGTCAACATCAAGTTGGACATCACCTCCCACAACGAGGA
CTACACCATCGTGGAACAGTACGAACGCGCCGAGGGCCGCCACTCCACCGGCGGCATGGACGAGTTATAC
AAG
TAATGAGGATCTCCCGGCATGAAGTCTGACCGGGTAGTATGAGGGTTCATCGTCACCTTGATAGAATAAT
AGACGATAAAGCAGGCCACGGGCAGGTACCGATTGTCAATCCGGCAGGTTAGGAGGCGTGTTGGAAATGAGT
TTATGGGTTATGGTCAAATCGGATAGTATGAGGTACATAGTTTGTAAATCTCAAGATTATTTTCTTCCTTAATCT
TGCACGTCGCATGAGAGGGACCGAGAAGAGAATTGATGAAGGGCTCTTGAAGATGAGATGAATCACGTGGTT
GCTGAAGCTTCAGTAGTCTCGGGTACCTGTTCTTTCCCACAAACAGTAGCCAGGCTAGAGGTACTGAGTACCC
GCTCACCGTATCTAATCATCCGACCTGAAATCTTCAAGCTGTTTTATTGACACTTCGAGTCCATCTTCATTCAC
GTAAGGAGAACTTCTAGGACATCACTTATCCCGCCATATTTAGCTGCAAGGAGTCAATTGCAATGTCAGATTCC
GCTCCTAAGAGGAAACAGGGCCCTGGCGGCTCAGATGGCTCGGCATTGAAGAAGAGAAAGGTATGATGACAA
GAATGCTTGCTACAAATTACCCAGTAGCCGGGCACTAACAGCTCCCTGGCCTAGGTAGACTACCTACCTCAAG
GTACGACACATGGCAGCACTGGAGGGGGAATAGGCAGACTGGACGACAGTGGACAAGATACGGTCGCACAA
CCTTTGTCGTGGCATCGCGAGAATAATCGTCACAAGCTTCACGTATGCAGACGGAGACAAGATGATTTGGTTG
TCGAAGTCATGAATTCACTTCTATCTAGTTTTTTTGTTCCCTTTTGTTTTGCATTCCCAGAGAAGTTCTGATGGA
ACCCTTATTCCCAGCCTCTCAATTAACGTGCCTCGATTCATAGTCGAGTGCTCATGCATAGCAACATTGATCGT
TTCGTCGTAGAAGTGAGCGCATGGTGGTGCCCACCTGGAGAAACCTCACGAGGGACCCCAGAACATCAGGT
GTTGATGATGGGTATCGCGGCCGGCCTTA
TGTTATAAGTGGTGATGGTTGGTATTCAACAAAGA
ATGTTTGTGTTTGGAGAGTTGAGAAAGAGGAGTTGAGTGAATGTGGTGATGGTTGTAGATGAGTGTGCTGATG
AGGATGGAAAAGATTGTTGGATGGCGGGAATCGAGGTCTTCTTTATACTTTTTTTTCTGGCCCTCTTCATCTTC
CAGCTCTCGCAGGCTGTTGCTAGAAATCTCGACGCGCAATTAACCCTCACGGGCGCGGCCGC
CATTCCGGACTCTAGATAAGCACGGAATGAACTTTCATTCCGCTGAAGCTTGTCAATCGGAATGAAGGTTCAT
TCCGGCTAGTCGGAATGAACATTCATTCCGAGACCTAGGATGTGACGGAATGAAGGTTCATTCCGGACTCTA
CTCCAAGTTCTATCTAACCAGCCATCCTACACTCTACATATCCACACCAATCTACTACAATTAATTAAAATGGTG
AGCAAGGGCGAGGAGGATAACATGGCCATCATCAAGGAGTTCATGCGCTTCAAGGTGCACATGGAGGGCT
CCGTGAACGGCCACGAGTTCGAGATCGAGGGCGAGGGCGAGGGCCGCCCCTACGAGGGCACCCAGACCG
CCAAGCTGAAGGTGACCAAGGGTGGCCCCCTGCCCTTCGCCTGGGACATCCTGTCCCCTCAGTTCATGTAC
GGCTCCAAGGCCTACGTGAAGCACCCCGCCGACATCCCCGACTACTTGAAGCTGTCCTTCCCCGAGGGCTT
CAAGTGGGAGCGCGTGATGAACTTCGAGGACGGCGGCGTGGTGACCGTGACCCAGGACTCCTCCCTGCAG
GACGGCGAGTTCATCTACAAGGTGAAGCTGCGCGGCACCAACTTCCCCTCCGACGGCCCCGTAATGCAGA
AGAAGACCATGGGCTGGGAGGCCTCCTCCGAGCGGATGTACCCCGAGGACGGCGCCCTGAAGGGCGAGA
TCAAGCAGAGGCTGAAGCTGAAGGACGGCGGCCACTACGACGCTGAGGTCAAGACCACCTACAAGGCCA
AGAAGCCCGTGCAGCTGCCCGGCGCCTACAACGTCAACATCAAGTTGGACATCACCTCCCACAACGAGGA
CTACACCATCGTGGAACAGTACGAACGCGCCGAGGGCCGCCACTCCACCGGCGGCATGGACGAGTTATAC
AAG
TAATGAGGATCTCCCGGCATGAAGTCTGACCGGGTAGTATGAGGGTTCATCGTCACCTTGATAGAATAAT
AGACGATAAAGCAGGCCACGGGCAGGTACCGATTGTCAATCCGGCAGGTTAGGAGGCGTGTTGGAAATGAGT
TTATGGGTTATGGTCAAATCGGATAGTATGAGGTACATAGTTTGTAAATCTCAAGATTATTTTCTTCCTTAATCT
TGCACGTCGCATGAGAGGGACCGAGAAGAGAATTGATGAAGGGCTCTTGAAGATGAGATGAATCACGTGGTT
GCTGAAGCTTCAGTAGTCTCGGGTACCTGTTCTTTCCCACAAACAGTAGCCAGGCTAGAGGTACTGAGTACCC
GCTCACCGTATCTAATCATCCGACCTGAAATCTTCAAGCTGTTTTATTGACACTTCGAGTCCATCTTCATTCAC
GTAAGGAGAACTTCTAGGACATCACTTATCCCGCCATATTTAGCTGCAAGGAGTCAATTGCAATGTCAGATTCC
GCTCCTAAGAGGAAACAGGGCCCTGGCGGCTCAGATGGCTCGGCATTGAAGAAGAGAAAGGTATGATGACAA
GAATGCTTGCTACAAATTACCCAGTAGCCGGGCACTAACAGCTCCCTGGCCTAGGTAGACTACCTACCTCAAG
GTACGACACATGGCAGCACTGGAGGGGGAATAGGCAGACTGGACGACAGTGGACAAGATACGGTCGCACAA
CCTTTGTCGTGGCATCGCGAGAATAATCGTCACAAGCTTCACGTATGCAGACGGAGACAAGATGATTTGGTTG
TCGAAGTCATGAATTCACTTCTATCTAGTTTTTTTGTTCCCTTTTGTTTTGCATTCCCAGAGAAGTTCTGATGGA
ACCCTTATTCCCAGCCTCTCAATTAACGTGCCTCGATTCATAGTCGAGTGCTCATGCATAGCAACATTGATCGT
TTCGTCGTAGAAGTGAGCGCATGGTGGTGCCCACCTGGAGAAACCTCACGAGGGACCCCAGAACATCAGGT
GTTGATGATGGGTATCGCGGCCGGCCCTA
TGTTATAAGTGGTGATGGTTGGTATTCAACAAAGAATGTTTGTGTTTGGA
GAGTTGAGAAAGAGGAGTTGAGTGAATGTGGTGATGGTTGTAGATGAGTGTGCTGATGAGGATGGAAAAGAT
TGTTGGATGGCGGGAATCGAGGTCTTCTTTATACTTTTTTTTCTGGCCCTCTTCATCTTCCAGCTCTCGCAGGC
TGTTGCTAGAAATCTCGACGCGCAATTAACCCTCACGGGCGCGGCCGC
ACTATAAATCAACCACTTTCCCTCCTCCCCCCCGCCCCCACTTGGTCGATTCTTCGTTTTCTCTCTACCTTCTTT
CTATTCGGTTTTCTTCTTCTTTTATTTTCCCTCTCCCATCAATCAAATTCATATTTGAAAAAAATTAACATTAATAA
ATATCTACA
TGAGGCCGGCCG
CGATACCCATCATCAACACCTGATGTTCTGGGGTCCCTCGTGAGGTTTCTCCAGGTGGGCACCACCATGCGC
TCACTTCTACGACGAAACGATCAATGTTGCTATGCATGAGCACTCGACTATGAATCGAGGCACGTTAATTGAG
AGGCTGGGAATAAGGGTTCCATCAGAACTTCTCTGGGAATGCAAAACAAAAGGGAACAAAAAAACTAGATAGA
AGTGAATTCATGACTTCGACAACCAAATCATCTTGTCTCCGTCTGCATACGTGAAGCTTGTGACGATTATTCTC
GCGATGCCACGACAAAGGTTGTGCGACCGTATCTTGTCCACTGTCGTCCAGTCTGCCTATTCCCCCTCCAGTG
CTGCCATGTGTCGTACCTTGAGGTAGGTAGTCTACCTAGGCCAGGGAGCTGTTAGTGCCCGGCTACTGGGTA
ATTTGTAGCGCTGGAGCG
ACTATAAATCAACCACTTTCCCTCCTCCCCCCCGCCCCCACTTGGTCGATTCTTCGTTTTCTCTCTACCTTCTTT
CTATTCGGTTTTCTTCTTCTTTTATTTTCCCTCTCCCATCAATCAAATTCATATTTGAAAAAAATTAACATTAATAA
ATATGTACA
TGAGGCCGGCCGCGATACCCATCATCAACACCTGATGTTCTGGGGTCCCTCGTGAGGTTTCTCCAGGTGGG
CACCACCATGCGCTCACTTCTACGACGAAACGATCAATGTTGCTATGCATGAGCACTCGACTATGAATCGAGG
CACGTTAATTGAGAGGCTGGGAATAAGGGTTCCATCAGAACTTCTCTGGGAATGCAAAACAAAAGGGAACAAA
AAAACTAGATAGAAGTGAATTCATGACTTCGACAACCAAATCATCTTGTCTCCGTCTGCATACGTGAAGCTTGT
GACGATTATTCTCGCGATGCCACGACAAAGGTTGTGCGACCGTATCTTGTCCACTGTCGTCCAGTCTGCCTAT
TCCCCCTCCAGTGCTGCCATGTGTCGTACCTTGAGGTAGGTAGTCTACCTAGGCCAGGGAGCTGTTAGTGCC
CGGCTACTGGGTAATTTGTAGCGCTGGAGCG
CGTACCGTATCGTTAAGGTAGACCTAGGATGTGAATGATACGAAACGTACCGTATCGTTAAGGTGACTCTAG
TCGTTAAGGTGCTAGTATGATACGAAACGTACCGTATCGTTAAGGTAGACCTAGGATGTGAATGATACGAAA
CGTACCGTATCGTTAAGGTGACTCTAGATAAGCCATGATACGAAACGTACCGTATCGTTAAGGTCTGAAGCT
GGCGCCGTCACGTGACGCACCCAACCGGCGTTGACCTATAAAAGGCCGGGCGTTGACGTCAGCGGTCTCTT
CCGCCGCAGCCGCCGCCATCGTCGGCGCGCTTCCCTGTTCACCTCTGACTCTGAGAATCCGTCGCCATCCG
CCACCATGGTGAGCAAGGGCGAGGAGGATAACATGGCCATCATCAAGGAGTTCATGCGCTTCAAGGTGCA
CATGGAGGGCTCCGTGAACGGCCACGAGTTCGAGATCGAGGGCGAGGGCGAGGGCCGCCCCTACGAGGG
CACCCAGACCGCCAAGCTGAAGGTGACCAAGGGTGGCCCCCTGCCCTTCGCCTGGGACATCCTGTCCCCT
CAGTTCATGTACGGCTCCAAGGCCTACGTGAAGCACCCCGCCGACATCCCCGACTACTTGAAGCTGTCCTT
CCCCGAGGGCTTCAAGTGGGAGCGCGTGATGAACTTCGAGGACGGCGGCGTGGTGACCGTGACCCAGGA
CTCCTCCCTGCAGGACGGCGAGTTCATCTACAAGGTGAAGCTGCGCGGCACCAACTTCCCCTCCGACGGC
CCCGTAATGCAGAAGAAGACCATGGGCTGGGAGGCCTCCTCCGAGCGGATGTACCCCGAGGACGGCGCC
CTGAAGGGCGAGATCAAGCAGAGGCTGAAGCTGAAGGACGGCGGCCACTACGACGCTGAGGTCAAGACC
ACCTACAAGGCCAAGAAGCCCGTGCAGCTGCCCGGCGCCTACAACGTCAACATCAAGTTGGACATCACCT
CCCACAACGAGGACTACACCATCGTGGAACAGTACGAACGCGCCGAGGGCCGCCACTCCACCGGCGGCA
TGGACGAGCTGTACAAGTCCGGACTCAGATCTCGAGCTCAAGCTTCGAATTCTGCAGTCGACGGTACCGCG
GGCCCGGGATCCACCGGATCTAGA
TAACTGATCATAATCAGCCATACCACATTTGTAGAGGTTTTACTTGCTT
TAAAAAACCTCCCACACCTCCCCCTGAACCTGAAACATAAAATGAATGCAATTGTTGTTGTTAACTTGTTTATTG
CAGCTTATAATGGTTACAAATAAAGCAATAGCATCACAAATTTCACAAATAAAGCATTTTTTCACTGCATTCTA
GTTGTGGTTTGTCCAAACTCATCAATGTATCTTAACGCGTAAATTGTAAGCGTTAATATTTTGTTAAAATTCGCG
TTAAATTTTTGATTTAAATGGCGCGCCGCCGTCACTGACCCAGTCAAAGGCACACAAGCAGCGACACCCAGGA
GTGTGTTCCCACGACAGTCTAGCATGTAACTCAGAACCAAGAGTACTTAATAGTCCTGCCTGAAAACACCTGT
ATTTTACGATCTTTCCCAAACTAAGGAGTTTAATAAACGTGAATATTCTTTTAGGTGTTTCAGTGTGATTAGTATA
ACTGGCGGTGAAGCAACTGGAAGCTGGAATGCTTATCCTCAATCACAAAGAAAAGAAGCTGGGTACCAAAATT
CTTTATTTGAAGAAATGGTACAAATTAAAGAACTTAAGCAGATGTTTTGGTGCAACTTATAGAAAAGATGAAGG
CAGCCTGACATGCATGCACTGCCTCAGTGACCAGTAAAGTCACGTGGCTTTGGGGAAGTTA
GGCGGAATCCGGGTGGAGACTGAGCGCCGAAGCGGTCCTCTCCGCCGGTCCTGCAGCTGGGGCGGGGCAA
CCTCCGCCGTAGGCACAGTAATTGGGTGATTTTGCTGTTCGTCATCACCACTAACGCTTCTATAGGGTAAAAA
AACTCGGAGCTTATCAGCTATTGGTCTAAACTGGTGCCAATGGCGCGCCACGTCCGAGGGCGGCCGC
GTCTAATTGACAAGCTTCAGATAGACTTGAGTGTCTAGGCTTATCTAGAGTCATAGACAGAGCAGTCTATCAC
AGTCAGTCTAGGCTTATCTAGAGTCATAGACACGCTTGTCTATCACATCCTAGGTCTATAGACTGAATCGTCT
ACCTACTTGAGCAAATGCCTGATTGGCACCAGTTTAGACCAATAGCTGATAAGCTCCGAGTTTTTTTACCCTAT
AGAAGCGTTAGTGGTGATGACGAACAGCAAAATCACCCAATTACTGTGCCTACGGCGGAGGTTGCCCCGCCC
CAGCTGCAGGACCGGCGGAGAGGACCGCTTCGGCGCTCAGTCTCCACCCGGATTCCGCCATGGTGAGCAA
GGGCGAGGAGGATAACATGGCCATCATCAAGGAGTTCATGCGCTTCAAGGTGCACATGGAGGGCTCCGTG
AACGGCCACGAGTTCGAGATCGAGGGCGAGGGCGAGGGCCGCCCCTACGAGGGCACCCAGACCGCCAAG
CTGAAGGTGACCAAGGGTGGCCCCCTGCCCTTCGCCTGGGACATCCTGTCCCCTCAGTTCATGTACGGCTC
CAAGGCCTACGTGAAGCACCCCGCCGACATCCCCGACTACTTGAAGCTGTCCTTCCCCGAGGGCTTCAAGT
GGGAGCGCGTGATGAACTTCGAGGACGGCGGCGTGGTGACCGTGACCCAGGACTCCTCCCTGCAGGACG
GCGAGTTCATCTACAAGGTGAAGCTGCGCGGCACCAACTTCCCCTCCGACGGCCCCGTAATGCAGAAGAA
GACCATGGGCTGGGAGGCCTCCTCCGAGCGGATGTACCCCGAGGACGGCGCCCTGAAGGGCGAGATCAA
GCAGAGGCTGAAGCTGAAGGACGGCGGCCACTACGACGCTGAGGTCAAGACCACCTACAAGGCCAAGAA
GCCCGTGCAGCTGCCCGGCGCCTACAACGTCAACATCAAGTTGGACATCACCTCCCACAACGAGGACTAC
ACCATCGTGGAACAGTACGAACGCGCCGAGGGCCGCCACTCCACCGGCGGCATGGACGAGCTGTACAAGT
CCGGACTCAGATCTCGAGCTCAAGCTTCGAATTCTGCAGTCGACGGTACCGCGGGCCCGGGATCCACCGG
ATCTAGA
TAACTGATCATAATCAGCCATACCACATTTGTAGAGGTTTTACTTGCTTTAAAAAACCTCCCACACCT
CCCCCTGAACCTGAAACATAAAATGAATGCAATTGTTGTTGTTAACTTGTTTATTGCAGCTTATAATGGTTACAA
ATAAAGCAATAGCATCACAAATTTCACAAATAAAGCATTTTTTTCACTGCATTCTAGTTGTGGTTTGTCCAAACT
CATCAATGTATCTTAACGCGTAAATTGTAAGCGTTAATATTTTGTTAAAATTCGCGTTAAATTTTTGATTTAAATG
GCGCGCCGCCGTCACTGACCCAGTCAAAGGCACACAAGCAGCGACACCCAGGAGTGTGTTCCCACGACAGT
CTAGCATGTAACTCAGAACCAAGAGTACTTAATAGTCCTGCCTGAAAACACCTGTATTTTACGATCTTTCCCAA
ACTAAGGAGTTTAATAAACGTGAATATTCTTTTAGGTGTTTCAGTGTGATTAGTATAACTGGCGGTGAAGCAAC
TGGAAGCTGGAATGCTTATCCTCAATCACAAAGAAAAGAAGCTGGGTACCAAAATTCTTTATTTGAAGAAATGG
TACAAATTAAAGAACTTAAGCAGATGTTTTGGTGCAACTTATAGAAAAGATGAAGGCAGCCTGACATGCATGCA
CTGCCTCAGTGACCAGTAAAGTCACGTGGCTTTGGGGAAGTTA
GGTGGCGGATGGCGACGGATTCTCAGAGTCAGAGGTGAACAGGGAAGCGCGCCGACGAT
GGCGGCGGCTGCGGCGGAAGAGACCGCTGACGTCAACGCCCGGCCTTTTATAGGTCAACGCCGGTTGGGTG
CGTCACGTGACGGCGCCGGGTGCTCGTCCGGGGCGCGCCACGTCCGAGGGCGGCCGC
To increase the activity of plant-based transcription activation domains, rational mutagenesis was performed on two selected activation domains derived from transcription factors found in edible plant species: spinach (Spinacia oleracea) and rapeseed/canola (Brassica napus). The So_NAC102-AD and Bn_TAF1-AD (Example 1) contain significant amounts of acidic (glutamate and aspartate) and hydrophobic (leucine, isoleucine, phenylalanine) amino acids. This indicates that they could belong to the group of acidic/hydrophobic transcription activation domains, which are typically enriched in these types of amino acids. However, some basic amino acids (lysine and arginine) are also present in the native sequences of these activation domains. Some of these basic amino acids were mutated (and other changes were introduced) to give the selected activation domains a more pronounced acidic/hydrophobic pattern. Two novel activation domains were designed:
The new activation domains were tested in a setup identical to that of Example 1, following the same steps. The domains were implemented in the reporter expression system (
The five best performing expression systems containing plant-based activation domains according to the results presented in
The xylanase expression cassettes were transformed into T. reesei by the protocol described in Example 1. Trichoderma reesei strain M1909 was used as the parental strain, and the DNA was transformed into T. reesei protoplasts by the CRISPR-Cas9 protein transformation protocol. The transformed colonies were selected and the strains analyzed as described above (in Example 1), except that the xynHB_N188A gene, instead of the mCherry gene, was targeted in the qPCR analysis.
The xylanase production was tested in small-scale liquid cultures and analyzed in the culture supernatants by SDS-PAGE (
The 1 L bioreactor cultivations were carried out in the Sartorius Stedim BioStat Q Plus Fermentor Bioreactor System. Pre-cultures (inoculated with conidia) were grown for 24 hours in 100 mL of YE-glucose medium to produce a sufficient amount of mycelium for bioreactor inoculation. The bioreactor cultivations were started by inoculating 80 mL of the pre-culture into 800 mL of YE-glucose medium (10 g/L glucose, 20 g/L yeast extract, 5 g/L KH2PO4, 5 g/L (NH4)2SO4, 1 mL/L trace elements, 2.4 mM MgSO4, 4.1 mM CaCl2, and 1 mL/L Antifoam J647, pH 4.8). The cultures were continuously fed with 500 g/L glucose (Watson Marlow 120U/DV peristaltic pump at 0.3-0.7 rpm), aerated at 0.5 slpm (0.4-0.6 vvm), and stirred at 900-1200 rpm. The cultivation was carried out for 6 days, with samples taken every day. A subset of the culture supernatants was analyzed by SDS-PAGE (
The equivalent of 2 μL of culture supernatant from each time point of each culture was loaded on a gel (4-20% gradient), and the proteins were separated in an electric field (PowerPac HC; BioRad). The gel was stained with colloidal Coomassie (PageBlue Protein Staining Solution; Thermo Fisher Scientific) and imaged on the Odyssey CLx Imaging System (LI-COR Biosciences). The scan of the stained gel is shown in
The culture supernatants from the xylanase production bioreactor cultures (day 5 and day 6), and a culture supernatant from a bioreactor culture performed under the same conditions with a T. reesei strain not containing the xylanase production expression system (day 6, negative control, NC in
The five best performing plant-based activation domains according to the results presented in
The first DNA was composed of: 1) an sTF expression cassette; 2) a selection marker (SM) expression cassette; 3) genome integration DNA regions (flanks); and 4) regions needed for propagation of the plasmids in E. coli. The sTF expression cassette consisted of a core promoter (An_008cp SEQ ID NO: 22), an sTF coding sequence, and a terminator (see Table 1C and 1D for example sequences of sTF expression cassettes used in Pichia pastoris). The sTF gene encoded a fusion protein (synthetic transcription factor) composed of the bacterial DNA-binding protein Bm3R1 (whose encoding DNA sequence was codon-optimized for Saccharomyces cerevisiae), the SV40 nuclear localization signal (SV40 NLS), a short peptide linker, and the transcription activation domain (AD). The DNA sequences encoding the activation domains were codon-optimized for Pichia pastoris. The control AD was VP16-AD. The terminator was the Trichoderma reesei tef1 terminator (Tr_TEF1t). The SM cassette allowed expression of the kanR gene (encoding an aminoglycoside phosphotransferase) in Pichia pastoris using a suitable promoter and terminator. The genome integration DNA regions (flanks) allowed integration of the construct into the URA3 locus of P. pastoris (JG138543; genome.jgi.doe.gov.Picipa1/Picpa1.home.html). The URA3 integration flanks corresponded to the DNA regions immediately outside the URA3 coding region: URA3-5′ was the sequence from 500 to 1 bp upstream of the start codon, and URA3-3′ was the sequence from 1 to 499 bp downstream of the stop codon.
The second DNA was composed of: 1) a target gene expression cassette; 2) a selection marker (SM) expression cassette; 3) genome integration DNA regions (flanks); and 4) regions needed for propagation of the plasmids in E. coli. The target gene expression cassette contained eight Bm3R1-binding sites (BS; sequences shown in Table 1A and 1B); the An_201 core promoter (An_201cp SEQ ID NO: 23; sequence shown in Table 1A and 1B); the target gene encoding DNA (target gene); and the Saccharomyces cerevisiae ADH1 terminator (Sc_ADH1t). The target gene was a DNA sequence encoding a phytase enzyme of Escherichia coli origin (thermostable mutated version AppA_K24E, amino acid SEQ ID NO: 24) previously produced in Pichia pastoris (Zhang J. et al, 2016, Biosci. Biotech. Res. Comm. 9(3): 357-365). The phytase coding DNA was codon-optimized for Pichia pastoris, and an appropriate secretion signal sequence (SS) with the Kex2 recognition site was added in frame at its 5′ end. This resulted in a DNA encoding a fusion protein (SS-Kex2-AppA_K24E; target gene in
Each cassette was integrated into separate loci of the P. pastoris genome. The transformations were done sequentially; first, the sTF expression cassette-containing constructs were integrated into the P. pastoris parental strain forming the sTF-background strains; and then the target gene expression cassette-containing construct was integrated into the sTF-background strains forming the final production strains.
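The URA3 flank definitions given for the first DNA translate directly into slicing coordinates. The following is a minimal sketch, assuming a gene on the forward strand and 0-based, end-exclusive coordinates for the coding region; a gene annotated on the reverse strand would need its flanks swapped and reverse-complemented. The function name `integration_flanks` and the toy locus are hypothetical; real coordinates would come from the genome annotation of the URA3 locus.

```python
def integration_flanks(chrom: str, cds_start: int, cds_end: int,
                       up_len: int = 500, down_len: int = 499) -> tuple:
    """Return (5' flank, 3' flank) around a forward-strand coding region.

    cds_start: 0-based index of the first base of the start codon.
    cds_end:   0-based index one past the last base of the stop codon.
    With the defaults, the 5' flank spans 500..1 bp upstream of the start
    codon and the 3' flank spans 1..499 bp downstream of the stop codon.
    """
    if cds_start - up_len < 0 or cds_end + down_len > len(chrom):
        raise ValueError("requested flank extends beyond the sequence")
    five_prime = chrom[cds_start - up_len:cds_start]
    three_prime = chrom[cds_end:cds_end + down_len]
    return five_prime, three_prime


# Toy locus (hypothetical): 500 G, a 9-bp "CDS" (ATG AAA TAA), 499 C, 10 T.
toy = "G" * 500 + "ATGAAATAA" + "C" * 499 + "T" * 10
up, down = integration_flanks(toy, cds_start=500, cds_end=509)
print(len(up), len(down), up[-3:], down[:3])
```

Keeping the flank lengths as parameters makes the asymmetry in the text (500 bp upstream versus 499 bp downstream) explicit rather than hard-coded.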
Pichia pastoris strain Y-11430 (currently also called Komagataella phaffii; obtained from the NRRL Culture Collection) was used as the parental strain. The sTF-expression-cassette-containing constructs (
The transformed clones were first tested for growth in the absence of uracil, and those unable to grow were analyzed by qPCR. The genomic DNA of each selected strain was isolated and used as template DNA in qPCR reactions. The qPCR signal of the sTF gene (Bm3R1) was compared to the qPCR signal of a unique native sequence in each strain. In addition, correct deletion of the URA3 gene was confirmed by the absence of a qPCR signal for the URA3 target. Strains with correct URA3 deletions and a single-copy sTF cassette integrated in the genome (sTF-background strains) were selected for the second round of transformations.
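The text does not state how the two qPCR signals were compared. One common approach, shown here only as an assumption, is the ΔCt method: if both amplicons amplify with the same, near-100% efficiency and the native reference is single-copy, the target copy number is roughly 2^(Ct_reference - Ct_target). The tolerance used to call a single copy and the Ct cutoff treated as "no signal" are hypothetical values, as are the example Ct values.

```python
from typing import Optional


def relative_copy_number(ct_target: float, ct_reference: float,
                         efficiency: float = 2.0,
                         reference_copies: int = 1) -> float:
    """Estimate target copies per genome from a delta-Ct comparison.

    efficiency=2.0 assumes the product doubles every cycle for both amplicons.
    """
    return reference_copies * efficiency ** (ct_reference - ct_target)


def is_single_copy(copy_number: float, tolerance: float = 0.3) -> bool:
    """Hypothetical call rule: within +/- tolerance of exactly one copy."""
    return abs(copy_number - 1.0) <= tolerance


def signal_absent(ct: Optional[float], cutoff: float = 35.0) -> bool:
    """Treat 'no Ct' or a very late Ct (hypothetical cutoff) as no signal."""
    return ct is None or ct >= cutoff


# Hypothetical Ct values for one clone: sTF target, native reference, URA3.
ct_stf, ct_ref, ct_ura3 = 22.1, 22.0, None
cn = relative_copy_number(ct_stf, ct_ref)
print(f"sTF copies ~ {cn:.2f}; single copy: {is_single_copy(cn)}; "
      f"URA3 deleted: {signal_absent(ct_ura3)}")
```

In practice the amplification efficiency of each primer pair would be measured from a dilution series and passed in, rather than assumed to be 2.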
The second transformation was done by a lithium acetate protocol: the sTF-background strains were cultivated in YPD+URA medium (20 g/L Bacto peptone, 10 g/L yeast extract, 1 g/L uracil, 20 g/L D-glucose) to reach OD600=0.6-1.0. Fifty mL of each culture was centrifuged, and the cell pellet was washed with water and then with LiAc/TE solution (100 mM lithium acetate; 10 mM Tris-HCl (pH=7.5); 1 mM EDTA). The washed cell pellets were resuspended in 0.5 mL of LiAc/TE solution. Fifty μL of the cell suspension was mixed with 10 μg of the AppA-expression construct DNA (linear AppA-target gene expression cassette fragment corresponding to the construct shown in
The genomic DNA of each selected clone was isolated and used as template DNA in qPCR reactions. The qPCR signal of the target gene (AppA) was compared to the qPCR signal of a unique native sequence in each strain. Strains with a single-copy target gene cassette integrated in the genome were used in phytase production experiments.
The phytase production was tested in small-scale liquid cultures and analyzed in the culture supernatants by SDS-PAGE (
The 1 L bioreactor cultivations were carried out in the Sartorius Stedim BioStat Q Plus Fermentor Bioreactor System. Pre-cultures were grown for 24 hours in 100 mL of BMG medium to produce a sufficient amount of biomass for bioreactor inoculation. The bioreactor cultivations were started by inoculating 80 mL of the pre-culture into 800 mL of BMG medium containing 1 mL/L Antifoam J647. The cultures were continuously fed with 500 g/L glucose (Watson Marlow 120U/DV peristaltic pump at 0.3-0.7 rpm), aerated at 0.5 slpm (0.4-0.6 vvm), and stirred at 900-1200 rpm. The cultivation was carried out for 6 days, with samples taken every day. The culture supernatants were analyzed by SDS-PAGE (
The equivalent of 2 μL of culture supernatant from each time point of each culture was loaded on a gel (4-20% gradient), and the proteins were separated in an electric field (PowerPac HC; BioRad). The gel was stained with colloidal Coomassie (PageBlue Protein Staining Solution; Thermo Fisher Scientific) and imaged on the Odyssey CLx Imaging System (LI-COR Biosciences). The scan of the stained gel is shown in
The culture supernatants from the phytase production bioreactor cultures (day 4 and day 6), and a culture supernatant from a bioreactor culture performed under the same conditions with a P. pastoris strain not containing the phytase production expression system (negative control, NC in
The activity was calculated and expressed in arbitrary units per mL of the culture supernatant (AU/mL). The obtained phytase activities are shown in
The two best performing plant-based activation domains (So_NAC102M and Bn_TAF1M) according to the results presented in
Myceliophthora thermophila strain D-76003 (also called Thielavia heterothallica, VTT culture collection) was used as the parental strain, and the DNA was transformed into the M. thermophila protoplasts by the PEG transformation protocol: Isolated M. thermophila protoplasts were suspended into 400 μL of STC solution (1.33 M sorbitol, 10 mM Tris-HCl, 50 mM CaCl2, pH 8.0). For each transformation, one hundred μL of protoplast suspension was mixed with 30 μg of the expression construct DNA dissolved in <100 μL of solution (linear fragment corresponding to the construct shown in
Four clones from each transformation were selected for small-scale liquid cultures and analysis of the culture supernatants by SDS-PAGE (
The culture supernatants from cultures of M. thermophila strains transformed with the xylanase expression constructs, and a culture supernatant from a culture performed under the same conditions with the parental M. thermophila strain (NC in
The two best plant-based activation domains based on the fungal experiments, So_NAC102M and Bn_TAF1M, are used to construct artificial expression systems for CHO cells (Cricetulus griseus) (see Table 1E and 1F for example sequences of the expression cassettes for CHO cells). The CHO-K1 cell line is transformed with a plasmid comprising eight sTF-specific binding sites (8 BS) positioned upstream of the core promoter Mm_Atp5Bcp (SEQ ID NO: 26). The target gene, mCherry, is positioned directly after the core promoter, and its transcription is terminated by the SV40 terminator. Adjacent to the mCherry expression cassette, in the opposite orientation, is the sTF expression cassette, which consists of the core promoter Mm_Eef2cp (SEQ ID NO: 27), the PhlF repressor, the SV40 nuclear localization signal (SV40 NLS), and the transcription activation domain (AD) of plant origin. Transcription of the sTF gene is terminated by the FTH1 terminator of Mus musculus origin. The plasmid also contains a pac gene encoding puromycin N-acetyltransferase, which confers resistance to the antibiotic puromycin. The performance of these expression systems is compared to that of an expression system using the CMV (cytomegalovirus) promoter for mCherry expression, and to that of the artificial expression system in which the VP64 activation domain (of herpes simplex virus origin) (SEQ ID NO: 30) is used instead of the plant-based ADs.
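Placing the sTF cassette "in the opposite orientation" means that, in the plasmid's top-strand sequence, the sTF cassette appears as the reverse complement of its own 5′→3′ sequence. The following is a minimal sketch of that bookkeeping. The part names and short placeholder sequences are hypothetical, and the text does not say whether the two cassettes are arranged head-to-head or tail-to-tail, so the order of parts in the example is arbitrary.

```python
# Complement table for A/C/G/T and N, upper and lower case.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def assemble(parts: list) -> str:
    """Join (name, sequence, strand) parts into one top-strand sequence.

    Each part is written 5'->3' in its own sense; parts on strand '-' are
    reverse-complemented before joining.
    """
    out = []
    for name, seq, strand in parts:
        if strand not in ("+", "-"):
            raise ValueError(f"{name}: strand must be '+' or '-'")
        out.append(seq if strand == "+" else reverse_complement(seq))
    return "".join(out)


# Placeholder sequences, not the real cassettes.
plasmid_fragment = assemble([
    ("mCherry cassette", "AAACCC", "+"),
    ("sTF cassette", "ATGCTT", "-"),
])
print(plasmid_fragment)
```

Recording orientation as data, rather than pre-reversing sequences by hand, keeps each part readable in its own coding sense.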
CHO-K1 cells are maintained in RPMI medium (Thermo Fisher) supplemented with 2 mM L-glutamine, 10% fetal bovine serum, and penicillin-streptomycin solution to final concentrations of 100 units/mL penicillin and 0.1 g/L streptomycin. Cells are grown at 37° C. in the presence of 5% CO2. The day before transfection, 70-80% confluent CHO cells are washed with PBS (pH ~7.4) and then trypsinized by adding 2 mL of trypsin to cultures in 250 mL (75 cm2) flasks and incubating them at +37° C. for 2-4 minutes until the cells have dissociated. Eight mL of fresh RPMI medium with the above-mentioned supplements is added to the flask. One hundred μL of the cell suspension is pipetted into each well of a 24-well plate containing 400 μL of RPMI medium (1/5 dilution) supplemented with 2 mM L-glutamine, 10% fetal bovine serum, and penicillin-streptomycin solution to final concentrations of 100 units/mL penicillin and 0.1 g/L streptomycin. The following day, the medium is removed by pipetting and immediately replaced with 400 μL of fresh RPMI medium without antibiotic supplements. Cells are incubated for 20 minutes at 37° C. with 5% CO2. For each transfection, 2 μL of Lipofectamine LTX (Thermo Fisher) is combined with 25 μL of Opti-MEM medium (Thermo Fisher), and 0.5-1 μg of plasmid DNA is combined with 0.5 μL of Plus reagent (provided with the Lipofectamine LTX reagent) and 25 μL of Opti-MEM medium. The Opti-MEM-diluted DNA is then mixed with the diluted Lipofectamine® LTX reagent and incubated for 5 minutes at room temperature. The DNA-lipid complex is immediately added to the CHO cells by slow pipetting on top of each culture. The cells are incubated for 1-2 days at 37° C. in the presence of 5% CO2. The expression of mCherry can be visualized and analyzed by fluorescence microscopy or by flow cytometry. For selection of stably transfected cells, the medium is replaced with RPMI medium supplemented with puromycin (1-10 μg/mL) 2-4 days after transfection.
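The per-transfection volumes above scale linearly when several wells receive the same plasmid. The following is a minimal sketch of that arithmetic, together with the seeding dilution. The 10% pipetting overage and the choice of 1 μg DNA (the upper end of the stated 0.5-1 μg) are assumptions, not part of the protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerTransfection:
    """Per-well amounts taken from the protocol text."""
    ltx_ul: float = 2.0             # Lipofectamine LTX
    optimem_lipid_ul: float = 25.0  # Opti-MEM for the lipid dilution
    dna_ug: float = 1.0             # text allows 0.5-1 ug; 1 ug assumed here
    plus_ul: float = 0.5            # Plus reagent
    optimem_dna_ul: float = 25.0    # Opti-MEM for the DNA dilution


def master_mix(n_wells: int, overage: float = 0.10,
               per: PerTransfection = PerTransfection()) -> dict:
    """Scale per-well amounts to n_wells of one plasmid, plus an overage."""
    if n_wells < 1:
        raise ValueError("need at least one well")
    f = n_wells * (1.0 + overage)
    return {
        "lipid tube: LTX (uL)": per.ltx_ul * f,
        "lipid tube: Opti-MEM (uL)": per.optimem_lipid_ul * f,
        "DNA tube: plasmid (ug)": per.dna_ug * f,
        "DNA tube: Plus (uL)": per.plus_ul * f,
        "DNA tube: Opti-MEM (uL)": per.optimem_dna_ul * f,
    }


def seeding_dilution(cell_ul: float, medium_ul: float) -> float:
    """Fraction of the final well volume contributed by the cell suspension."""
    return cell_ul / (cell_ul + medium_ul)


print(f"seeding dilution: 1/{1 / seeding_dilution(100, 400):.0f}")
for item, amount in master_mix(4).items():
    print(f"{item}: {amount:.1f}")
```

The lipid and DNA tubes are kept separate, as in the protocol, because they are combined only immediately before the 5-minute complex formation.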
The expression system containing one example plant-based activation domain, Bn_TAF1M-AD (SEQ ID NO: 11), was constructed and tested in Aspergillus oryzae for the production of an example heterologous protein product secreted into the culture medium. The expression system described in Example 2 (and its scheme shown in
Aspergillus oryzae strain D-171652 (VTT culture collection) was used as the parental strain. This strain was first modified by deleting two genes: the AO090011000868 gene (fungi.ensembl.org/) encoding the orotidine 5′-phosphate decarboxylase (pyrG) enzyme, and the AO090120000322 gene (fungi.ensembl.org/) encoding a homolog of the non-homologous end-joining (NHEJ) complex subunit (lig4) protein. The resulting strain (called here A. oryzae pyrGΔ/lig4Δ) is unable to grow in the absence of uracil and is defective in the NHEJ DNA repair pathway.
The two LGB-expression cassettes were transformed into the protoplasts prepared from the A. oryzae pyrGΔ/lig4Δ strain by the PEG transformation protocol: Isolated A. oryzae pyrGΔ/lig4Δ protoplasts were suspended into 400 μL of STC solution (1.33 M sorbitol, 10 mM Tris-HCl, 50 mM CaCl2, pH 8.0). For the transformation, one hundred μL of protoplast suspension was mixed with 20 μg of the LGB expression construct with the gaaC-genome-integration flanks dissolved in 50 μL of solution (linear fragment corresponding to the construct shown in
Transformed strains were tested by qPCR of the genomic DNA isolated from the strains. The qPCR signal of the LGB gene was compared to the qPCR signal of a unique native sequence in each strain. In addition, correct simultaneous deletion of the gaaC and gluC genes was confirmed by the absence of qPCR signals for the gaaC and gluC targets. Four correct strains were selected and sporulated on PDA agar plates (39 g/L BD-Difco Potato dextrose agar). Spores (conidia) were collected from the PDA plates and used as inoculum in liquid cultivations for the LGB production experiment.
Four selected clones were tested in small-scale liquid cultures, and the culture supernatants were analyzed by SDS-PAGE on day 2, day 3, and day 4 (
The reporter expression system for testing doxycycline-dependent expression in Trichoderma reesei was constructed as a single DNA molecule (plasmid) (
In all three expression cassettes, the DNA-binding domain (DBD) was TetR (transcriptional regulator from Escherichia coli, GenBank: EFK45326.1) extended by the SV40 NLS. The DBD-encoding DNA was codon-optimized for Saccharomyces cerevisiae in the case of the construct used in Pichia pastoris (Table 2B), and for Aspergillus niger in the case of the constructs used in Trichoderma reesei (Table 2A) and Yarrowia lipolytica (Table 2C).
The transcription activation domain (AD) was Bn_TAF1M (SEQ ID NO: 11) in all expression cassettes. The AD-encoding DNA was codon-optimized for Aspergillus niger in the case of the constructs used in Trichoderma reesei and Yarrowia lipolytica (Table 2A and 2C), and for Pichia pastoris in the case of the construct used in Pichia pastoris (Table 2B).
Each expression cassette contained a target gene cassette, which consisted of eight TetR-binding sites (BS; sequences shown in Table 2A, 2B, and 2C); the Aspergillus niger 201 core promoter (An_201cp; sequence shown in Table 2A and 2B) or the Yarrowia lipolytica 565 core promoter (YI_565cp; sequence shown in Table 2C); mCherry-encoding DNA (target gene; sequence shown in Table 2A, 2B, and 2C); and the Trichoderma reesei pdc1 terminator (Tr_PDC1t; Table 2A) or the Saccharomyces cerevisiae ADH1 terminator (Sc_ADH1t; Table 2B and 2C). The plasmids further contained a synthetic transcription factor (sTF) expression cassette, which consisted of the Trichoderma reesei hfb2 core promoter (Tr_hfb2cp; sequence shown in Table 2A), the Aspergillus niger 008 core promoter (An_008cp; Table 2B), or the Yarrowia lipolytica 242 core promoter (YI_242cp; Table 2C); the sTF coding region; and the Trichoderma reesei tef1 terminator (Tr_TEF1t; Table 2A, 2B, and 2C).
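Checking that a cassette carries the expected eight operator copies amounts to a scan of both strands. The following is a minimal sketch, assuming the canonical 19-bp tet operator (tetO) as the TetR site. The binding sites actually used are those shown in Table 2A, 2B, and 2C and may differ from this consensus, so the mismatch allowance is a tunable, hypothetical parameter, and the toy insert is illustrative only.

```python
# Canonical 19-bp tet operator (tetO); an assumption, not taken from Table 2.
TETO = "TCCCTATCAGTGATAGAGA"
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_sites(seq: str, motif: str, max_mismatches: int = 0) -> list:
    """Return 0-based start positions where motif matches either strand."""
    seq, motif = seq.upper(), motif.upper()
    rc = reverse_complement(motif)
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if min(hamming(window, motif), hamming(window, rc)) <= max_mismatches:
            hits.append(i)
    return hits


# Toy insert: one site on each strand, separated by short spacers.
toy = "GG" + TETO + "AA" + reverse_complement(TETO) + "CC"
print(find_sites(toy, TETO))
```

Scanning for both the motif and its reverse complement means a site is found regardless of which strand it was cloned on; `len(find_sites(...))` then gives the copy count to compare against the expected eight.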
The expression cassette for Pichia pastoris also contained a selection marker allowing expression of the kanR gene, and genome integration DNA flanks for targeting the ADE1 gene. The expression cassette for Yarrowia lipolytica also contained a selection marker allowing expression of the NAT gene, and genome integration DNA flanks for targeting the anti gene.
Trichoderma reesei strain M1909 (VTT culture collection), Pichia pastoris Y-11430 strain, and Yarrowia lipolytica strain C-00365 (VTT culture collection) were used as the parental strains. The expression system (
Three randomly selected colonies from each transformation were analyzed for mCherry fluorescence in liquid cultures in the absence of doxycycline (DOX) and in the presence of 1 mg/L or 3 mg/L DOX (
For the quantitative fluorometry analysis of the mCherry production in the mycelia of the T. reesei strains or in the cells of P. pastoris and Y. lipolytica strains (
Developing a Synthetic Expression System Based on a Plant-Derived Activation Domain for High-Level Gene Expression in Yarrowia lipolytica and Cutaneotrichosporon oleaginosus
Microbial lipid production is becoming an increasingly attractive topic in biotechnology, including for food applications. Several promising production hosts have been identified, and some of them are being established in bioprocesses for the production of diverse lipid compounds. However, further development of these production hosts is often hindered by the limited number of robust gene expression tools available for genetic manipulation, such as heterologous gene expression. A synthetic expression system based on an sTF containing a plant-derived activation domain was tested and optimized for two yeast species known for high-level lipid production, Yarrowia lipolytica and Cutaneotrichosporon oleaginosus.
Bn_TAF1M, one of the best performing plant-based activation domains identified and extensively tested in the previous examples, was chosen as the activation domain for the development of expression systems for Yarrowia lipolytica and Cutaneotrichosporon oleaginosus. The expression systems were constructed as a single DNA molecule (
In case of Yarrowia lipolytica, the expression system (
In case of Cutaneotrichosporon oleaginosus, the expression system (
Yarrowia lipolytica strain C-00365 (VTT culture collection) and Cutaneotrichosporon oleaginosus (previously known as Trichosporon oleaginosus, Cryptococcus curvatus, Apiotrichum curvatum, or Candida curvata) strain ATCC 20509 were used as the parental strains. The expression systems were transformed into Y. lipolytica by a lithium acetate protocol (described in Example 4), and into C. oleaginosus by electroporation (the following protocol is for one transformation): 20 mL of a liquid culture grown in YPD to OD ~1.0 was centrifuged briefly (4000 rpm/1 min) to pellet the cells. The cells were washed with 10 mL of ice-cold sterile EB solution (10 mM Tris pH=7.5; 270 mM sucrose; 1 mM MgCl2) and resuspended in 5 mL of IB solution (25 mM DTT; 20 mM HEPES pH=8.0; in YPD). The cell suspension was incubated at 30° C. with shaking at 220 rpm for 30 min and then centrifuged briefly (4000 rpm/1 min) to pellet the cells. The cells were washed with 20 mL of EB solution, and the cell pellet obtained after centrifugation (4000 rpm/1 min) was resuspended in 500 μL of EB solution to prepare transformation-competent cells. 400 μL of this cell suspension was mixed with 5-10 μg of DNA (expression system DNA cassette) in an electroporation cuvette (4 mm gap) and incubated on ice for 15 min. Two consecutive electroporations were performed (BioRad GenePulser; 1800 V; 1000 Ω; 25 μF). The transformation mix was diluted with 1 mL of YPD and incubated at 30° C. with shaking at 220 rpm for 4 h before the cells were spread on selective agar plates.
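The pulse settings can be translated into the two quantities usually reported for electroporation: field strength (voltage divided by cuvette gap) and the RC time constant. The following is a minimal sketch. The nominal time constant R×C is an upper bound: when the resistor is in parallel with the sample, as is typical for such pulse controllers, a conductive sample lowers the effective resistance. The 1000 Ω sample resistance in the example is a hypothetical value.

```python
def field_strength_kv_per_cm(voltage_v: float, gap_mm: float) -> float:
    """Electric field across the cuvette in kV/cm."""
    return (voltage_v / 1000.0) / (gap_mm / 10.0)


def time_constant_ms(resistance_ohm: float, capacitance_uf: float) -> float:
    """RC time constant: ohm x microfarad gives microseconds; /1000 gives ms."""
    return resistance_ohm * capacitance_uf / 1000.0


def parallel_ohm(r1: float, r2: float) -> float:
    """Combined resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)


# Settings from the protocol: 1800 V, 4 mm cuvette, 1000 ohm, 25 uF.
print(f"field strength: {field_strength_kv_per_cm(1800, 4):.2f} kV/cm")
print(f"nominal time constant: {time_constant_ms(1000, 25):.1f} ms")
# Hypothetical sample resistance of 1000 ohm in parallel with the resistor.
effective = time_constant_ms(parallel_ohm(1000, 1000), 25)
print(f"with sample in parallel: {effective:.1f} ms")
```

With these settings the field is about 4.5 kV/cm and the nominal time constant is 25 ms. The time constant reported by the instrument for a real sample may be shorter.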
The transformed cells of Y. lipolytica and C. oleaginosus were selected on YPD agar containing 150 mg/L nourseothricin. Three colonies from each transformation were analyzed for mCherry fluorescence in liquid cultures.
For the quantitative fluorometry analysis of the mCherry production in the cells of P. pastoris (
DNA sequences of example doxycycline-repressible reporter expression cassettes for testing the engineered plant-based transcription activation domains in Trichoderma reesei (A), Pichia pastoris (B), and Yarrowia lipolytica (C), and of an example expression system used in Cutaneotrichosporon oleaginosus (D). The functional DNA parts are indicated: 8× sTF-specific binding sites (black bolded text); core promoters (underlined text); mCherry coding region (black bolded underlined text); terminators (italics); and sTF (bolded italics), including the plant-based activation domain (bolded underlined italics).
ATCACTGATAGGGAGTATTGACAAGCTTTCTCTATCACTGATAGGAGTGGCTTATCTAGATCTCTATCACTG
ATAGGGAGTTCACATCCTAGGTCTCTATCACTGATAGGGAGTACTAGTTCTCCCCGGAAACTGTGGCCATA
CAGCCATCCTACACTCTACATATCCACACCAATCTACTACAATTAATTAAAATGGTGAGCAAGGGCGAGGA
GGATAACATGGCCATCATCAAGGAGTTCATGCGCTTCAAGGTGCACATGGAGGGCTCCGTGAACGGCCA
CGAGTTCGAGATCGAGGGCGAGGGCGAGGGCCGCCCCTACGAGGGCACCCAGACCGCCAAGCTGAAG
GTGACCAAGGGTGGCCCCCTGCCCTTCGCCTGGGACATCCTGTCCCCTCAGTTCATGTACGGCTCCAAG
GCCTACGTGAAGCACCCCGCCGACATCCCCGACTACTTGAAGCTGTCCTTCCCCGAGGGCTTCAAGTGG
GAGCGCGTGATGAACTTCGAGGACGGCGGCGTGGTGACCGTGACCCAGGACTCCTCCCTGCAGGACGG
CGAGTTCATCTACAAGGTGAAGCTGCGCGGCACCAACTTCCCCTCCGACGGCCCCGTAATGCAGAAGAA
GACCATGGGCTGGGAGGCCTCCTCCGAGCGGATGTACCCCGAGGACGGCGCCCTGAAGGGCGAGATCA
AGCAGAGGCTGAAGCTGAAGGACGGCGGCCACTACGACGCTGAGGTCAAGACCACCTACAAGGCCAAG
AAGCCCGTGCAGCTGCCCGGCGCCTACAACGTCAACATCAAGTTGGACATCACCTCCCACAACGAGGAC
TACACCATCGTGGAACAGTACGAACGCGCCGAGGGCCGCCACTCCACCGGCGGCATGGACGAGTTATA
CAAG
TAATGAGGATCTCCCGGCATGAAGTCTGACCGGGTAGTATGAGGGTTCATCGTCACCTTGATAGAAT
AATAGACGATAAAGCAGGCCACGGGCAGGTACCGATTGTCAATCCGGCAGGTTAGGAGGCGTGTTGGAAA
TGAGTTTATGGGTTATGGTCAAATCGGATAGTATGAGGTACATAGTTTGTAAATCTCAAGATTATTTTCTTCC
TTAATCTTGCACGTCGCATGAGAGGGACCGAGAAGAGAATTGATGAAGGGCTCTTGAAGATGAGATGAATC
ACGTGGTTGCTGAAGCTTCAGTAGTCTCGGGTACCTGTTCTTTCCCACAAACAGTAGCCAGGCTAGAGGTA
CTGAGTACCCGCTCACCGTATCTAATCATCCGACCTGAAATCTTCAAGCTGTTTTATTGACACTTCGAGTCC
ATCTTCATTCACGTAAGGAGAACTTCTAGGACATCACTTATCCCGCCATATTTAGCTGCAAGGAGTCAATTG
CAATGTCAGATTCCGCTCCTAAGAGGAAACAGGGCCCTGGCGGCTCAGATGGCTCGGCATTGAAGAAGAG
AAAGGTATGATGACAAGAATGCTTGCTACAAATTACCCAGTAGCCGGGCACTAACAGCTCCCTGGCCTAGG
TAGACTACCTACCTCAAGGTACGACACATGGCAGCACTGGAGGGGGAATAGGCAGACTGGACGACAGTGG
ACAAGATACGGTCGCACAACCTTTGTCGTGGCATCGCGAGAATAATCGTCACAAGCTTCACGTATGCAGAC
GGAGACAAGATGATTTGGTTGTCGAAGTCATGAATTCACTTCTATCTAGTTTTTTTGTTCCCTTTTGTTTTGCA
TTCCCAGAGAAGTTCTGATGGAACCCTTATTCCCAGCCTCTCAATTAACGTGCCTCGATTCATAGTCGAGTG
CTCATGCATAGCAACATTGATCGTTTCGTCGTAGAAGTGAGCGCATGGTGGTGCCCACCTGGAGAAACCTC
ACGAGGGACCCCAGAACATCAGGTGTTGATGATGGGTATCGCGGCCGGCCCTA
TGTATTTAAATGTGATGGTTGGTATTCAACA
AAGAATGTTTGTGTTTGGAGAGTTGAGAAAGAGGAGTTGAGTGAATGTGGTGATGGTTGTAGATGAGTGTG
CTGATGAGGATGGAAAAGATTGTTGGATGGCGGGAATCGAGGTCTTCTTTATACTTTTTTTTCTGGCCCTCT
TCATCTTCCAGCTCTCGCAGGCTGTTGCTAGAAATCTCGACGCGCAATTAACCCTCACGGGCGCGGCCGC
ATCACTGATAGGGAGTATTGACAAGCTTTCTCTATCACTGATAGGAGTGGCTTATCTAGATCTCTATCACTG
ATAGGGAGTTCACATCCTAGGTCTCTATCACTGATAGGGAGTACTAGTTCTCCCCGGAAACTGTGGCCATA
CAGCCATCCTACACTCTACATATCCACACCAATCTACTACAATTATTAATTAAAATGGTGAGCAAGGGCGAG
GAGGATAACATGGCCATCATCAAGGAGTTCATGCGCTTCAAGGTGCACATGGAGGGCTCCGTGAACGGC
CACGAGTTCGAGATCGAGGGCGAGGGCGAGGGCCGCCCCTACGAGGGCACCCAGACCGCCAAGCTGA
AGGTGACCAAGGGTGGCCCCCTGCCCTTCGCCTGGGACATCCTGTCCCCTCAGTTCATGTACGGCTCCA
AGGCCTACGTGAAGCACCCCGCCGACATCCCCGACTACTTGAAGCTGTCCTTCCCCGAGGGCTTCAAGT
GGGAGCGCGTGATGAACTTCGAGGACGGCGGCGTGGTGACCGTGACCCAGGACTCCTCCCTGCAGGAC
GGCGAGTTCATCTACAAGGTGAAGCTGCGCGGCACCAACTTCCCCTCCGACGGCCCCGTAATGCAGAAG
AAGACCATGGGCTGGGAGGCCTCCTCCGAGCGGATGTACCCCGAGGACGGCGCCCTGAAGGGCGAGAT
CAAGCAGAGGCTGAAGCTGAAGGACGGCGGCCACTACGACGCTGAGGTCAAGACCACCTACAAGGCCA
AGAAGCCCGTGCAGCTGCCCGGCGCCTACAACGTCAACATCAAGTTGGACATCACCTCCCACAACGAGG
ACTACACCATCGTGGAACAGTACGAACGCGCCGAGGGCCGCCACTCCACCGGCGGCATGGACGAGTTA
TACAAG
TAATGAGGATCCGAATTTCTTATGATTTATGATTTTTATTATTAAATAAGTTATAAAAAAAATAAGTGT
ATACAAATTTTAAAGTGACTCTTAGGTTTTAAAACGAAAATTCTTATTCTTGAGTAACTCTTTCCTGTAGGTCA
GGTTGCTTTCTCAGGTATAGCATGAGGTCGCTCTTATTGACCACACCTCTACCGGCCAGCTTTTGTTCCCTT
AATTCCAAACTATAAATCAACCACTTTCCCTCCTCCCCCCCGCCCCCACTTGGTCGATTCTTCGTTTTCTCTC
TACCTTCTTTCTATTCGGTTTTCTTCTTCTTTTATTTTCCCTCTCCCATCAATCAAATTCATATTTGAAAAAAAT
TAACATTAATTTAAATACA
TGA
CCACCATGCGCTCACTTCTACGACGAAACGATCAATGTTGCTATGCATGAGCACTCGACTATGAATCGAGG
CACGTTAATTGAGAGGCTGGGAATAAGGGTTCCATCAGAACTTCTCTGGGAATGCAAAACAAAAGGGAACA
AAAAAACTAGATAGAAGTGAATTCATGACTTCGACAACCAAATCATCTTGTCTCCGTCTGCATACGTGAAGCT
TGTGACGATTATTCTCGCGATGCCACGACAAAGGTTGTGCGACCGTATCTTGTCCACTGTCGTCCAGTCTG
CCTATTCCCCCTCCAGTGCTGCCATGTGTCGTACCTTGAGGTAGGTAGTCTA
ATCACTGATAGGGAGTATTGACAAGCTTTCTCTATCACTGATAGGAGTGGCTTATCTAGATCTCTATCACTG
ATAGGGAGTTCACATCCTAGGTCTCTATCACTGATAGGGAGTACTAGTTCTCCCCGGAAACTGTGGCCATA
GGAGATTTGGAGCCGTCTACTCTGTCGGCCAACGACATAAATAGACCCCCTCAGTCACCTTAGACACAGCA
GAATTCCACCAGATCAGCTTCCTTAATTAATCATGGTGAGCAAGGGCGAGGAGGATAACATGGCCATCAT
CAAGGAGTTCATGCGCTTCAAGGTGCACATGGAGGGCTCCGTGAACGGCCACGAGTTCGAGATCGAGG
GCGAGGGCGAGGGCCGCCCCTACGAGGGCACCCAGACCGCCAAGCTGAAGGTGACCAAGGGTGGCCC
CCTGCCCTTCGCCTGGGACATCCTGTCCCCTCAGTTCATGTACGGCTCCAAGGCCTACGTGAAGCACCCC
GCCGACATCCCCGACTACTTGAAGCTGTCCTTCCCCGAGGGCTTCAAGTGGGAGCGCGTGATGAACTTC
GAGGACGGCGGCGTGGTGACCGTGACCCAGGACTCCTCCCTGCAGGACGGCGAGTTCATCTACAAGGT
GAAGCTGCGCGGCACCAACTTCCCCTCCGACGGCCCCGTAATGCAGAAGAAGACCATGGGCTGGGAGG
CCTCCTCCGAGCGGATGTACCCCGAGGACGGCGCCCTGAAGGGCGAGATCAAGCAGAGGCTGAAGCTG
AAGGACGGCGGCCACTACGACGCTGAGGTCAAGACCACCTACAAGGCCAAGAAGCCCGTGCAGCTGCC
CGGCGCCTACAACGTCAACATCAAGTTGGACATCACCTCCCACAACGAGGACTACACCATCGTGGAACA
GTACGAACGCGCCGAGGGCCGCCACTCCACCGGCGGCATGGACGAGTTATACAAG
TAATGATCAGAATT
TCTTATGATTTATGATTTTTATTATTAAATAAGTTATAAAAAAAATAAGTGTATACAAATTTTAAAGTGACTCTTA
GGTTTTAAAACGAAAATTCTTATTCTTGAGTAACTCTTTCCTGTAGGTCAGGTTGCTTTCTCAGGTATAGCAT
GAGGTCGCTCTTATTGACCACACCTCTACCGGCCAGCTTTTGTTCCCTTTAGTGAGGGTTAATTGCGCGTCG
GTCACCTCCGTGACATGATGTAACTCCTTTACTATATATAGACGTGTGTTCGTATCGAAAATAGCCAGACACT
CTTTGCTCCATCACTCACATTTAAATACA
TAGGGCCGGCCGCGATACCCATCATCAACACCTGATGTTCTGGGGTCCCTCGT
GAGGTTTCTCCAGGTGGGCACCACCATGCGCTCACTTCTACGACGAAACGATCAATGTTGCTATGCATGAG
CACTCGACTATGAATCGAGGCACGTTAATTGAGAGGCTGGGAATAAGGGTTCCATCAGAACTTCTCTGGGA
ATGCAAAACAAAAGGGAACAAAAAAACTAGATAGAAGTGAATTCATGACTTCGACAACCAAATCATCTTGTCT
CCGTCTGCATACGTGAAGCTTGTGACGATTATTCTCGCGATGCCACGACAAAGGTTGTGCGACCGTATCTT
GTCCACTGTCGTCCAGTCTGCCTATTCCCCCTCCAGTGCTGCCATGTGTCGTACCTTGAGGTAGGTAGTCT
ACCTAGGCCAGGGAGCTGTTAGTGCCCGGCTACTGGGTAATTTGTAGCGCTGGAGCGATTCGGTCACAGG
CGTCAAGAGTGCTGTAGCAATGTCCGACGCCATTGATCCTGATATCAAATACCACCTGGGCAGGTCTGGGT
ATGTGAGGTCTTGTCGGATGTGTCGAGTTCTTCTCCAACGTAGTGTTCATTCGCGCTCAT
TTCATTCCGGACTCTAGATAAGCACGGAATGAACTTTCATTCCGCTGAAGCTTGTCAATCGGAATGAAGGT
TCATTCCGGCTAGTCGGAATGAACATTCATTCCGAGACCTAGGATGTGACGGAATGAAGGTTCATTCCGG
TCCCGTTATAAGAAGCCGACGACGTGGCTAAGCCCCCAAAGCCTCCACCACCTTCCATCCGTCTCTCTCTT
CTCCTACTACCACAACTTAATTAATCATGGTGAGCAAGGGCGAGGAGGATAACATGGCCATCATCAAGGA
GTTCATGCGCTTCAAGGTGCACATGGAGGGCTCCGTGAACGGCCACGAGTTCGAGATCGAGGGCGAGG
GCGAGGGCCGCCCCTACGAGGGCACCCAGACCGCCAAGCTGAAGGTGACCAAGGGTGGCCCCCTGCCC
TTCGCCTGGGACATCCTGTCCCCTCAGTTCATGTACGGCTCCAAGGCCTACGTGAAGCACCCCGCCGAC
ATCCCCGACTACTTGAAGCTGTCCTTCCCCGAGGGCTTCAAGTGGGAGCGCGTGATGAACTTCGAGGAC
GGCGGCGTGGTGACCGTGACCCAGGACTCCTCCCTGCAGGACGGCGAGTTCATCTACAAGGTGAAGCTG
CGCGGCACCAACTTCCCCTCCGACGGCCCCGTAATGCAGAAGAAGACCATGGGCTGGGAGGCCTCCTC
CGAGCGGATGTACCCCGAGGACGGCGCCCTGAAGGGCGAGATCAAGCAGAGGCTGAAGCTGAAGGAC
GGCGGCCACTACGACGCTGAGGTCAAGACCACCTACAAGGCCAAGAAGCCCGTGCAGCTGCCCGGCGC
CTACAACGTCAACATCAAGTTGGACATCACCTCCCACAACGAGGACTACACCATCGTGGAACAGTACGA
ACGCGCCGAGGGCCGCCACTCCACCGGCGGCATGGACGAGTTATACAAGTAA
TGATCAGAATTTCTTATG
ATTTATGATTTTTATTATTAAATAAGTTATAAAAAAAATAAGTGTATACAAATTTTAAAGTGACTCTTAGGTTTT
AAAACGAAAATTCTTATTCTTGAGTAACTCTTTCCTGTAGGTCAGGTTGCTTTCTCAGGTATAGCATGAGGTC
GCTCTTATTGACCACACCTCTACCGGCCAGCTTTTGTTCCCTTTAGTGAGGGTTAATTGCGCGTCGAGGCTA
CTGCCACCGCCGATCACCTCCACTCCCTCCACTACCTCACCACTACCACCTCACCTCATTTCATTTAAATAC
A
TAGGGCCGGCCGCGATACCCATCATCAACACCTGATGTTCTGGGGTCCCT
CGTGAGGTTTCTCCAGGTGGGCACCACCATGCGCTCACTTCTACGACGAAACGATCAATGTTGCTATGCAT
GAGCACTCGACTATGAATCGAGGCACGTTAATTGAGAGGCTGGGAATAAGGGTTCCATCAGAACTTCTCTG
GGAATGCAAAACAAAAGGGAACAAAAAAACTAGATAGAAGTGAATTCATGACTTCGACAACCAAATCATCTT
GTCTCCGTCTGCATACGTGAAGCTTGTGACGATTATTCTCGCGATGCCACGACAAAGGTTGTGCGACCGTA
TCTTGTCCACTGTCGTCCAGTCTGCCTATTCCCCCTCCAGTGCTGCCATGTGTCGTACCTTGAGGTAGGTA
GTCTACCTAGGCCAGGGAGCTGTTAGTGCCCGGCTACTGGGTAATTTGTAGCGCTGGAGCGATTCGGTCA
CAGGCGTCAAGAGTGCTGTAGCAATGTCCGACGCCATTGATCCTGATATCAAATACCACCTGGGCAGGTCT
GGGTATGTGAGGTCTTGTCGGATGTGTCGAGTTCTTCTCCAACGTAGTGTTCATTCGCGCTCAT
Chavez A et al. (2015). "Highly efficient Cas9-mediated transcriptional programming." Nat Methods 12(4): 326-328.
Lu Y et al. (2016). "High-level expression of improved thermo-stable alkaline xylanase variant in Pichia pastoris through codon optimization, multiple gene insertion and high-density fermentation." Sci Rep 6: 37869.
Naseri G et al. (2017). "Plant-derived transcription factors for orthologous regulation of gene expression in the yeast Saccharomyces cerevisiae." ACS Synth Biol 6: 1742-1756.
Olsen AN et al. (2005). "NAC transcription factors: structurally distinct, functionally diverse." Trends Plant Sci 10(2): 79-87.
Tiwari SB et al. (2012). "The EDLL motif: a potent plant transcriptional activation domain from AP2/ERF transcription factors." Plant J 70(5): 855-865.
Zhang J et al. (2016). "Site-directed mutagenesis and thermal stability analysis of phytase from Escherichia coli." Biosci Biotech Res Comm 9(3): 357-365.
Number | Date | Country | Kind |
---|---|---|---|
20195988 | Nov 2019 | FI | national |
This application is a United States National Phase Patent Application of International Patent Application Number PCT/FI2020/050772, filed on Nov. 18, 2020, which claims the benefit of priority to Finnish National Patent Application number FI 20195988, filed on Nov. 19, 2019, both of which are incorporated by reference herein in their entireties.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/FI2020/050772 | 11/18/2020 | WO |