This patent application file contains a Sequence Listing submitted in computer readable ASCII text format (file name: CBTH-13-US_SeqList.txt, date recorded: May 27, 2022, size: 1,485,254 bytes). The Sequence Listing, which is a part of the present disclosure, includes a computer readable form and a written sequence listing comprising nucleotide and/or amino acid sequences of the present invention. The sequence listing information recorded in computer readable form is identical to the written sequence listing. The content of the Sequence Listing file is incorporated herein by reference in its entirety.
The present application generally relates to recombinant enzymes and genes encoding those enzymes. More specifically, the application provides recombinant olivetolic acid cyclase genes and enzymes that function in microorganisms.
Cannabinoids are a class of organic small molecules of meroterpenoid structures found in the plant genus Cannabis. The small molecules are currently under investigation as therapeutic agents for a wide variety of health issues, including epilepsy, pain, and other neurological problems, and mental health conditions such as depression, PTSD, opioid addiction, and alcoholism (Committee on the Health Effects of Marijuana, 2017).
While it is known that cannabinoids may be obtained via biosynthesis in plant species, there are many problems associated with the synthesis of such molecules which need to be overcome, including problems with large-scale manufacturing, purification, and heterologous expression for biosynthesis.
Producing cannabinoids in recombinant microorganisms such as yeast is a promising solution to the above problems. See, e.g., U.S. patent application Ser. Nos. 16/553,103, 16/553,120, 16/558,973, 17/068,636 and 63/053,539; U.S. Pat. No. 10,435,727; and US Patent Publications 2020/0063170 and 2020/0063171, all incorporated by reference.
One way to improve biosynthetic cannabinoid production in microorganisms is by the discovery and use of new enzymes that catalyze the same reactions as plant derived enzymes but with improved parameters. The present invention provides such enzymes with olivetolic acid cyclase activity.
In Cannabis spp., the olivetolic acid cyclase (OAC) enzyme catalyzes the cyclization of the linear polyketide olivetol to form olivetolic acid (OA), a precursor for biosynthesis of downstream cannabinoids such as cannabidiol (CBD) and tetrahydrocannabinol (THC). See
DABB domains are small alpha/beta barrel motifs of unknown function. They are known to be upregulated in response to salt stress in plants and for purposes of molybdopterin uptake. A description of the domain can be found in the SMART (Simple Modular Architecture Research Tool) database (Letunic I, Khedkar S, Bork P. SMART: recent updates, new developments and status in 2020. Nucleic Acids Res. 2021 Jan. 8; 49(D1):D458-D460). The domain typically forms an alpha-beta barrel dimer.
There remains a need for new engineered enzymes that are capable of catalyzing the cyclization of the linear polyketide olivetol to form olivetolic acid (OA), which is a precursor for biosynthesis of downstream valuable cannabinoid molecules.
Engineered enzymes from microorganisms with OAC activity are provided herein.
Provided is an engineered olivetolic acid cyclase (OAC or altOAC or PKSC) enzyme derived from a microorganism or a non-Cannabis plant, wherein the enzyme catalyzes cyclization of a polyketide (I) into a resorcylic acid derivative (II), where R1=CH3, CH2CH3, (CH2)2CH3, (CH2)3CH3, (CH2)4CH3, (CH2)5CH3, or (CH2)6CH3
Also provided are engineered nucleic acids encoding the above OAC enzyme, expression cassettes comprising those nucleic acids, and recombinant microorganisms comprising those expression cassettes that express the OAC enzyme encoded therein.
Additionally provided is a method of converting a polyketide (I) into a resorcylic acid derivative (II), where R1=CH3, CH2CH3, (CH2)2CH3, (CH2)3CH3, (CH2)4CH3, (CH2)5CH3, or (CH2)6CH3,
the method comprising contacting the polyketide with the above-identified engineered OAC enzyme in a manner and for a time sufficient to convert the polyketide (I) into the resorcylic acid derivative (II).
The present teachings also include an engineered olivetolic acid cyclase (OAC or altOAC or PKSC) enzyme derived from a microorganism or a non-Cannabis plant, wherein the enzyme catalyzes cyclization of a polyketide (I) into a resorcylic acid derivative (II), where R1=CH3, CH2CH3, (CH2)2CH3, (CH2)3CH3, (CH2)4CH3, (CH2)5CH3, or (CH2)6CH3
and wherein the enzyme comprises an amino acid sequence that is at least 95% identical to any one of the sequences set forth in SEQ ID NO: 193-SEQ ID NO: 378. In some preferred embodiments, the enzyme comprises an amino acid sequence that is at least 99% identical to any one of the sequences set forth in SEQ ID NO: 193-SEQ ID NO: 378. In some more preferred embodiments, the enzyme comprises an amino acid sequence that is one of the sequences set forth in SEQ ID NO: 193-SEQ ID NO: 378. In some preferred embodiments, the enzyme comprises an amino acid sequence that is at least 50% identical to the sequence set forth in either SEQ ID NO: 385 or SEQ ID NO: 386. In some embodiments, the enzyme comprises an amino acid sequence that is at least 20% identical to SEQ ID NO: 334.
The present teachings also include an isolated codon-optimized polynucleotide that encodes the above-identified engineered OAC enzymes, comprising a nucleotide sequence that is at least 50% identical to the sequence set forth in SEQ ID NO: 1, SEQ ID NO: 2, or SEQ ID NO: 3. In some embodiments, the codon-optimized polynucleotide that encodes the above-identified engineered OAC enzymes comprises a nucleotide sequence that is at least 95% identical to any one of the sequences set forth in SEQ ID NO: 1-SEQ ID NO: 186. In some embodiments, the codon-optimized polynucleotide that encodes the above-identified engineered OAC enzymes comprises a nucleotide sequence that is at least 95% identical to the sequence set forth in SEQ ID NO: 142. In some embodiments, the codon-optimized polynucleotide is inserted in a vector configured for replication and protein expression in microbial (e.g., yeast) cells.
It is expected that the above-identified engineered OAC enzymes from a microorganism would have low amino acid sequence homology (e.g., less than 60%, less than 50%, less than 40%, less than 30% or less than 20% sequence homology) to amino acid sequences of OAC from a Cannabis sp. (for example, to the sequence set forth in SEQ ID NO: 379).
To facilitate understanding of the invention, a number of terms and abbreviations as used herein are defined below as follows:
Conservative amino acid substitutions: As used herein, when referring to mutations in a protein, “conservative amino acid substitutions” are those in which at least one amino acid of the polypeptide encoded by the nucleic acid sequence is substituted with another amino acid having similar characteristics. Examples of conservative amino acid substitutions are ser for ala, thr, or cys; lys for arg; gln for asn, his, or lys; his for asn; glu for asp or lys; asn for his or gln; asp for glu; pro for gly; leu for ile, phe, met, or val; val for ile or leu; ile for leu, met, or val; arg for lys; met for phe; tyr for phe or trp; thr for ser; trp for tyr; and phe for tyr.
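The example substitutions listed above can be written as a simple lookup. The following is a minimal sketch, assuming one-letter amino acid codes; the names CONSERVATIVE and is_listed_conservative are hypothetical and are used only for illustration. The table transcribes only the listed examples and is not an exhaustive definition of conservative substitution.

```python
# Keys are the replacement residue; values are the original residues that the
# replacement may substitute for, transcribed from the examples listed above
# (e.g., "ser for ala, thr, or cys" becomes "S": "ATC").
CONSERVATIVE = {
    "S": "ATC", "K": "R", "Q": "NHK", "H": "N", "E": "DK", "N": "HQ",
    "D": "E", "P": "G", "L": "IFMV", "V": "IL", "I": "LMV", "R": "K",
    "M": "F", "Y": "FW", "T": "S", "W": "Y", "F": "Y",
}


def is_listed_conservative(original: str, replacement: str) -> bool:
    """Return True if 'replacement for original' is one of the listed examples."""
    original = original.upper()
    replacement = replacement.upper()
    if len(original) != 1 or len(replacement) != 1:
        raise ValueError("expected single one-letter residue codes")
    return original in CONSERVATIVE.get(replacement, "")
```

For instance, is_listed_conservative("Y", "F") checks whether phe for tyr is among the listed examples. Note that the list is directional: ser for ala is listed, whereas ala for ser is not.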
Functional variant: The term “functional variant,” as used herein, refers to a recombinant enzyme such as an OAC enzyme that comprises a nucleotide and/or amino acid sequence that is altered by one or more nucleotides and/or amino acids compared to the nucleotide and/or amino acid sequences of the parent protein and that is still capable of performing an enzymatic function (e.g., synthesis of olivetolic acid) of the parent enzyme. In other words, the modifications in the amino acid and/or nucleotide sequence of the parent enzyme may cause desirable changes in reaction parameters without altering the fundamental enzymatic function encoded by the nucleotide sequence or possessed by the amino acid sequence. The functional variant may have conservative changes including nucleotide and amino acid substitutions, additions and deletions. These modifications can be introduced by standard techniques known in the art, such as site-directed mutagenesis and random PCR-mediated mutagenesis, and may comprise natural as well as non-natural nucleotides and amino acids. Also envisioned is the use of amino acid analogs, e.g., amino acids not encoded by DNA or RNA in biological systems, and labels such as fluorescent dyes, radioactive elements, electron dense agents, or any other protein modification, now known or later discovered.
The terms “modified”, “engineered”, and “variant”, as used in reference to nucleic acid molecules and protein molecules, e.g., an engineered enzyme, imply that such molecules are created by human intervention and are non-naturally occurring. The engineered enzyme is a polypeptide or peptide having an altered amino acid sequence, relative to an unmodified or wild-type protein, such as a starting naturally occurring enzyme (wild-type enzyme), or a portion thereof. An engineered enzyme is a polypeptide or peptide which differs from a wild-type enzyme sequence, or a portion thereof, by one or more amino acid substitutions, deletions, additions, or combinations thereof. The sequence of an engineered enzyme can contain 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 or more amino acid differences (e.g., mutations) compared to the sequence of the starting wild-type enzyme. An engineered enzyme generally exhibits at least 70%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to a corresponding starting wild-type enzyme. An engineered enzyme can exhibit at least 70%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence homology to a corresponding starting wild-type enzyme. Non-naturally occurring amino acids as well as naturally occurring amino acids are included within the scope of permissible substitutions or additions. An engineered enzyme is not limited to any enzymes made or generated by a particular method of making and includes, for example, an engineered enzyme made or generated by genetic selection, protein engineering, directed evolution, de novo recombinant DNA techniques, or combinations thereof.
In some embodiments, variants of an engineered enzyme displaying only non-substantial or negligible differences in structure can be generated by making conservative amino acid substitutions in the engineered enzyme. By doing this, engineered enzyme variants that comprise a sequence having at least 90% (90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, and 99%) sequence identity with wild-type or another engineered enzyme sequences can be generated, retaining at least one functional activity of the engineered enzyme, e.g., ability to catalyze a specific reaction. Examples of conservative amino acid changes are known in the art. Examples of non-conservative amino acid changes that are likely to cause major changes in protein structure are those that cause substitution of (a) a hydrophilic residue, e.g., serine or threonine, for (or by) a hydrophobic residue, e.g., leucine, isoleucine, phenylalanine, valine or alanine; (b) a cysteine or proline for (or by) any other residue; (c) a residue having an electropositive side chain, e.g., lysine, arginine, or histidine, for (or by) an electronegative residue, e.g., glutamic acid or aspartic acid; or (d) a residue having a bulky side chain, e.g., phenylalanine, for (or by) one not having a side chain, e.g., glycine. Methods of making targeted amino acid substitutions, deletions, truncations, and insertions are generally known in the art. For example, amino acid sequence variants can be prepared by mutations in the DNA. Methods for polynucleotide alterations are well known in the art, for example, Kunkel et al. (1987) Methods in Enzymol. 154:367-382; U.S. Pat. No. 4,873,192 and the references cited therein.
The term “sequence identity” as used herein refers to the sequence identity between polynucleotides or proteins at the nucleotide or amino acid level, respectively. “Sequence identity” is a measure of identity between proteins at the amino acid level and a measure of identity between nucleic acids at nucleotide level. The protein sequence identity may be determined by comparing the amino acid sequence in a given position in each sequence when the sequences are aligned. Similarly, the nucleic acid sequence identity may be determined by comparing the nucleotide sequence in a given position in each sequence when the sequences are aligned. “Sequence identity” means the percentage of identical subunits at corresponding positions in two sequences when the two sequences are aligned to maximize subunit matching, i.e., taking into account gaps and insertions. Sequence identity is present when a subunit position in both of the two sequences is occupied by the same nucleotide or amino acid, e.g., if a given position is occupied by an adenine in each of two DNA molecules, then the molecules are identical at that position. Methods for the alignment of sequences for comparison are well known in the art, such methods include the BLAST algorithm, which calculates percent sequence identity and performs a statistical analysis of the similarity between the two sequences. The software for performing BLAST analysis is publicly available through the National Center for Biotechnology Information (NCBI) website.
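As an illustration of the definition above, percent identity can be computed directly from two sequences that have already been aligned. The following is a minimal sketch, not the BLAST implementation; it assumes gaps are written as "-" and that the denominator is the number of alignment columns (other conventions, such as dividing by the length of the shorter sequence, give different values). The function name percent_identity is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns occupied by the same residue in both sequences.

    Both inputs must already be aligned (equal length, '-' marks a gap).
    Assumption: the denominator is the number of alignment columns,
    ignoring any column that is a gap in both sequences.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)
```

For the toy alignment "MKT-LV" against "MKTALI", four of the six columns are identical, so the function returns approximately 66.7.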
The term “sequence homology” as used herein is a measure of similarity and refers to the sequence similarity between polynucleotides or proteins at the nucleotide or amino acid level, respectively. The protein sequence homology may be determined by comparing the amino acid sequence in a given position in each sequence when the sequences are aligned. Similarly, the nucleic acid sequence homology may be determined by comparing the nucleotide sequence in a given position in each sequence when the sequences are aligned. “Sequence homology” means the percentage of homologous subunits (i.e., amino acids) at corresponding positions in two sequences when the two sequences are aligned to maximize subunit matching, i.e., taking into account gaps which factor in insertions and deletions in the aligned sequences. Sequence homology is present when a subunit position in each of the two or more sequences is occupied by identical amino acid residues or functionally similar amino acid residues (e.g., isosteric or isoelectric amino acid identities; amino acid residues that belong to the same functional class, such as e.g. positively charged residues, or small hydrophobic residues). Sequence homology is absent when a subunit position in each of the two or more sequences is occupied by a functionally different amino acid (i.e., lacking structural and/or functional similarity). Methods for the alignment of sequences for comparison are well known in the art, such methods include the BLAST algorithm, which calculates percent sequence homology and performs a statistical analysis of the homology between the two sequences.
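Sequence homology as defined above can be sketched the same way, counting a column as homologous when both residues are identical or fall in the same functional class. The class grouping below is an assumption chosen for illustration only (the text names positively charged and small hydrophobic residues as examples but does not fix a grouping), and the names CLASSES, CLASS_OF and percent_homology are hypothetical. Alignment tools such as BLAST typically score similarity with a substitution matrix rather than fixed classes.

```python
# Illustrative functional classes (an assumption, not defined by the text):
# positive, negative, polar uncharged, small/aliphatic hydrophobic, aromatic,
# and three residues treated as classes of their own.
CLASSES = ["KRH", "DE", "STNQ", "AVLIM", "FWY", "C", "G", "P"]
CLASS_OF = {res: idx for idx, group in enumerate(CLASSES) for res in group}


def percent_homology(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns holding identical or same-class residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    similar = 0
    for a, b in columns:
        if a == "-" or b == "-":
            continue  # a gap opposite a residue is never homologous
        if a == b or (a in CLASS_OF and CLASS_OF.get(a) == CLASS_OF.get(b)):
            similar += 1
    return 100.0 * similar / len(columns)
```

With the toy alignment "MKT-LV" against "MKTALI", the V/I column counts as homologous under this grouping, so five of six columns match and the result is approximately 83.3, compared with approximately 66.7 for identity.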
The terms “corresponding to position(s)” or “position(s) . . . with reference to position(s)” of or within a peptide or a polynucleotide, such as recitation that nucleotides or amino acid positions “correspond to” nucleotides or amino acid positions of a disclosed sequence, such sequence set forth in the Sequence Listing, refers to nucleotides or amino acid positions identified in the polynucleotide or in the peptide upon alignment with the disclosed sequence using a standard alignment algorithm, such as the BLAST algorithm (NCBI). One skilled in the art can identify any given amino acid residue in a given peptide at a position corresponding to a particular position of a reference sequence, such as set forth in the Sequence Listing, by performing alignment of the peptide sequence with the reference sequence (for example, by using BLASTP publicly available through the NCBI website), matching the corresponding position of the reference sequence with the position in peptide sequence and thus identifying the amino acid residue within the peptide. Amino acid positions corresponding to the recited residues can be also determined by structural alignment to the experimentally-determined template structure in the PDB (as given by the PDB accession code after making structural truncations corresponding to the SEQ ID NO of interest). The reference structures used in the structural alignment can be experimentally determined or generated by homology modeling using state of the art homology modeling methods such as Rosetta or PyRosetta macromolecular software suites, machine learning models such as AlphaFold2, or the like. Other useful structural alignment methods and/or programs include, but are not limited to, TM-align, PyMOL (superalign, cealign, and align methods), LSQMAN, Fr-TM-align, DALI, DaliLite, CE, CE-MC, and the like.
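Once a pairwise alignment with the reference sequence is available, identifying corresponding positions reduces to walking the alignment columns. The following is a minimal sketch assuming the alignment is supplied as two equal-length gapped strings (for example, as exported from an alignment tool) and that positions are numbered from 1; the function name corresponding_positions is hypothetical.

```python
from typing import Dict, Optional


def corresponding_positions(aligned_ref: str,
                            aligned_query: str) -> Dict[int, Optional[int]]:
    """Map each 1-based reference position to the 1-based query position
    aligned with it, or to None if the query has a gap at that column."""
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    mapping: Dict[int, Optional[int]] = {}
    ref_pos = 0
    query_pos = 0
    for ref_res, query_res in zip(aligned_ref, aligned_query):
        if query_res != "-":
            query_pos += 1
        if ref_res != "-":
            ref_pos += 1
            mapping[ref_pos] = query_pos if query_res != "-" else None
    return mapping
```

For the toy alignment of reference "MK-TLY" with query "MKATL-", reference position 3 corresponds to query position 4, and reference position 5 has no counterpart in the query.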
Recombinant nucleic acid and recombinant protein: As used herein, a recombinant nucleic acid or protein is a nucleic acid or protein produced by recombinant DNA technology, e.g., as described in Green and Sambrook (2012).
Polypeptide, protein, and peptide: The terms “polypeptide,” “protein,” and “peptide” are used herein interchangeably to refer to amino acid chains in which the amino acid residues are linked by peptide bonds or modified peptide bonds. The amino acid chains can be of any length of greater than two amino acids. Unless otherwise specified, the terms “polypeptide,” “protein,” and “peptide” also encompass various modified forms thereof. Such modified forms may be naturally occurring modified forms or chemically modified forms. Examples of modified forms include, but are not limited to, glycosylated forms, phosphorylated forms, myristoylated forms, palmitoylated forms, ribosylated forms, acetylated forms, and the like. Modifications also include intra-molecular crosslinking and covalent attachment of various moieties such as lipids, flavin, biotin, polyethylene glycol or derivatives thereof, and the like. In addition, modifications may also include protein cyclization, branching of the amino acid chain, and cross-linking of the protein. Further, amino acids other than the conventional twenty amino acids encoded by genes may also be included in a polypeptide.
The term “protein” or “polypeptide” may also encompass a “purified” polypeptide that is substantially separated from other polypeptides in a cell or organism in which the polypeptide naturally occurs (e.g., 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, 100% free of contaminants).
Primer, probe and oligonucleotide: The terms “primer,” “probe,” and “oligonucleotide” may be used herein interchangeably to refer to a relatively short nucleic acid fragment or sequence. They can be DNA, RNA, or a hybrid thereof, or chemically modified analogs or derivatives thereof. Typically, they are single-stranded. However, they can also be double-stranded having two complementing strands that can be separated apart by denaturation. In certain aspects, they are of a length of from about 8 nucleotides to about 200 nucleotides. In other aspects, they are from about 12 nucleotides to about 100 nucleotides. In additional aspects, they are about 18 to about 50 nucleotides. They can be labeled with detectable markers or modified in any conventional manners for various molecular biological applications.
Vector: As used herein, the term “vector” refers to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked. One type of vector is an episome, i.e., a nucleic acid capable of extra-chromosomal replication. Various vectors are those capable of autonomous replication and/or expression of nucleic acids to which they are linked. Vectors capable of directing the expression of genes to which they are operatively linked are referred to herein as “expression vectors.”
Linker: The term “linker” refers to a short amino acid sequence that separates multiple domains of a polypeptide. In some embodiments, the linker prohibits energetically or structurally unfavorable interactions between the discrete domains.
Cannabinoid: As used herein, the term “cannabinoid” refers to a family of structurally related aromatic meroterpenoid molecules. Cannabinoids are generally formed by the enzymatic fusion, by a cannabinoid synthase (having geranylpyrophosphate:olivetolate geranyltransferase activity), of an alkylresorcylic acid
where R1=CH3, (CH2)2CH3 (divarinolic acid), (CH2)4CH3 (olivetolic acid), or (CH2)6CH3, with a polyprenyl pyrophosphate such as geranyl pyrophosphate, neryl pyrophosphate, geranylgeranyl pyrophosphate, or farnesyl pyrophosphate (
Codon optimized: As used herein, a recombinant gene is “codon optimized” when its nucleotide sequence is modified to accommodate codon bias of the host organism to improve gene expression and increase translational efficiency of the gene.
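The idea of accommodating host codon bias can be sketched as a back-translation that always picks one host-preferred codon per amino acid. This is a simplified illustration only: the table below is a hypothetical placeholder covering a few residues, not data from this disclosure, and a practical codon table would be derived from codon-usage data for the chosen host. Practical optimization also typically weighs factors beyond single-codon preference.

```python
# Hypothetical placeholder table: one codon per residue, chosen for
# illustration. Each codon does encode the residue shown under the standard
# genetic code, but the preference ranking is an assumption.
PREFERRED_CODON = {
    "M": "ATG", "K": "AAG", "T": "ACT", "L": "TTG",
    "V": "GTT", "E": "GAA", "*": "TAA",
}


def back_translate(protein: str) -> str:
    """Return a coding sequence that uses the preferred codon for each residue."""
    codons = []
    for residue in protein.upper():
        if residue not in PREFERRED_CODON:
            raise ValueError(f"no preferred codon defined for {residue!r}")
        codons.append(PREFERRED_CODON[residue])
    return "".join(codons)
```

For example, back_translate("MKTLV*") yields "ATGAAGACTTTGGTTTAA" with this placeholder table. The encoded amino acid sequence is unchanged; only the choice among synonymous codons differs.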
Expression cassette: As used herein, an “expression cassette” is a nucleic acid that comprises a gene and a regulatory sequence, such as a promoter, operatively coupled to the gene such that the promoter drives the expression of the gene in a cell. An example is a gene for an enzyme with a promoter functional in yeast, where the promoter is situated such that it drives the expression of the enzyme in a yeast cell.
Olivetolic Acid Cyclase Enzymes from Microorganisms and Non-Cannabis Plants
Provided herein is an olivetolic acid cyclase (OAC or altOAC or PKSC) enzyme derived from a microorganism or a non-Cannabis plant, where the enzyme catalyzes cyclization of a polyketide (I) into a resorcylic acid derivative (II), where R1=CH3, CH2CH3, (CH2)2CH3, (CH2)3CH3, (CH2)4CH3, (CH2)5CH3, or (CH2)6CH3
Many enzymes in nature catalyze cyclization reactions of polyketides. The ability of the plant enzyme to make OA is not unique; enzymes derived from other species are able to catalyze the same reaction if the substrate is present.
It is useful to have several possible enzymes to catalyze a single step in any biosynthesis pathway. A new enzyme may have better catalytic properties than the original enzyme, may have higher throughput, or may have better binding affinity for the substrate. Other properties can make an alternative enzyme desirable as well, including better expression in a heterologous host, greater robustness under fermentation conditions, or requirements that more closely match the requirements of other enzymes in the pathway. There is also a benefit to using multiple OAC enzymes in combination within a single host in order to take advantage of different properties of the enzymes. Some variant enzymes may have longer half-lives inside the host, and so would be able to sustain high rates of catalysis for a longer time period. An additional desirable property would be the ability to selectively make one of the alkyl chain variant cannabinoids depicted in
In some embodiments, the enzyme is engineered by selected conservative or non-conservative amino acid changes to alter its substrate specificity, making it more or less likely to synthesize a specific alkyl chain variant. Other enzymes are able to synthesize multiple variants. Enzymes with increased specificity for only one chain length are desirable for industrial purposes, as a pure substrate leads to a pure product and eliminates the need to separate different but closely related variants after the fermentation.
These enzymes may be sourced from nature or specificity may be engineered into an enzyme by changing some residues, resulting in a non-natural amino acid sequence with improved properties. An example of this is the Y24F mutation described in Yang (2016) that increases the capacity of the Cannabis OAC for making OA, and also increases its specificity for OA over the lactone byproduct.
Enzyme variations may also result from mutations introduced into the DNA and amino acid sequences to prevent or promote post translational modifications of the protein. Nonlimiting examples of post translational modifications include phosphorylation, acetylation, methylation, SUMOylation, ubiquitination, proteolytic cleavage, lipidation, including prenylation such as farnesylation or myristoylation, glycosylation, nitrosylation and biotinylation.
In some embodiments, the naturally occurring enzymes from non-Cannabis sources (e.g., a microorganism or a non-Cannabis plant) carry out the cyclization reaction using the same mechanism as the plant enzyme, i.e., a C2-C7 aldol cyclization. In other embodiments, the enzyme carries out the polyketide cyclization by a different mechanism, for example Dieckmann condensation, Claisen condensation or Knoevenagel condensation (
Enzymes may be sourced from nature by homology to the plant enzyme, resulting in mostly DABB domain or DABB domain-like proteins. The OAC enzyme from Cannabis acts as a dimer. Novel altOAC or PKSC enzymes from other organisms may also act as dimers, though they need not necessarily dimerize in order to be active. They have the potential to act as monomers or as higher order complexes, binding either other OACs or other proteins.
Enzymes that catalyze this type of reaction in fungi and bacteria are often involved in secondary metabolite biosynthesis. Genes involved in biosynthesis of a single compound are often organized into clusters in the genomes of fungi and bacteria, so searching genomes for secondary metabolite clusters is a way to uncover desirable candidate genes. Thus, in some embodiments, the altOAC or PKSC is encoded by a gene that is located within a secondary metabolite cluster in a microorganism.
Another source of cyclases is individual domains of fungal polyketide synthases (PKSs). Polyketides are the basis for the biosynthesis of many secondary metabolites. Polyketide synthases (PKSs) are the enzymes that biosynthesize polyketides and these enzymes are conserved in multiple kingdoms. PKSs are particularly common in bacteria, plants and fungi. PKS enzymes are classified into three types (
It is expected that an engineered altOAC or PKSC enzyme from a microorganism would have low amino acid sequence homology (e.g., less than 60%, less than 50%, less than 40%, less than 30% or less than 20% sequence homology) to amino acid sequences of OAC from a Cannabis sp. (for example, to the sequence set forth in SEQ ID NO: 379). In some preferred embodiments, the engineered olivetolic acid cyclase enzyme has an amino acid sequence that is less than 60% homologous to the sequence set forth in SEQ ID NO: 334. In some preferred embodiments, the engineered olivetolic acid cyclase enzyme has an amino acid sequence that is less than 50%, less than 40%, less than 30% or less than 20% homologous to the sequence set forth in SEQ ID NO: 334.
The present teachings include an engineered olivetolic acid cyclase (OAC or altOAC or PKSC) enzyme derived from a microorganism or a non-Cannabis plant, wherein the enzyme catalyzes cyclization of a polyketide (I) into a resorcylic acid derivative (II), where R1=CH3, CH2CH3, (CH2)2CH3, (CH2)3CH3, (CH2)4CH3, (CH2)5CH3, or (CH2)6CH3
and wherein the enzyme comprises an amino acid sequence that is at least 90% or at least 95% identical, or at least 99% identical to any one of the sequences set forth in SEQ ID NO: 193-SEQ ID NO: 378. In some preferred embodiments, the enzyme comprises an amino acid sequence that is one of the sequences set forth in SEQ ID NO: 193-SEQ ID NO: 378. In some embodiments, the enzyme comprises an amino acid sequence that is at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, or at least 90% identical to SEQ ID NO: 334. In some embodiments, the enzyme comprises an amino acid sequence that is at least 50%, at least 60%, at least 70%, at least 80%, or at least 90% identical to the sequence set forth in SEQ ID NO: 193, or in SEQ ID NO: 194, or in SEQ ID NO: 195.
The present teachings also include an isolated codon-optimized polynucleotide that encodes the above-identified engineered OAC enzymes, comprising a nucleotide sequence that is at least 50%, at least 60%, at least 70%, at least 80%, or at least 90% identical to the sequence set forth in SEQ ID NO: 1, or in SEQ ID NO: 2, or in SEQ ID NO: 3. In some embodiments, the codon-optimized polynucleotide that encodes the above-identified engineered OAC enzymes comprises a nucleotide sequence that is at least 90%, at least 95%, or at least 99% identical to any one of the sequences set forth in SEQ ID NO: 1-SEQ ID NO: 186. In some embodiments, the codon-optimized polynucleotide that encodes the above-identified engineered OAC enzymes comprises a nucleotide sequence that is at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 95% identical to SEQ ID NO: 142. In some embodiments, the codon-optimized polynucleotide is inserted in a vector configured for replication and protein expression in microbial (e.g., yeast) cells.
In some embodiments, engineered olivetolic acid cyclase enzyme (e.g., PKSC) sequences listed in Table 1 and in the Sequence Listing are defined as (i) having at least 50% sequence identity to a reference sequence and (ii) functioning as part of a large, multiprotein complex. In some embodiments, engineered PKSC sequences, such as those comprising the amino acid sequence set forth in SEQ ID NOs: 196, 199, 202, 205, 208, 211, 214, 217, 220, 223, 235, 238 or 250, have at least 50% sequence identity to SEQ ID NO: 193. In other embodiments, engineered PKSC sequences, such as those comprising the amino acid sequence set forth in SEQ ID NOs: 197, 203, 206, 209, 212, 215, 218, 221, 224, 236, 251, 256 or 257, have at least 50% sequence identity to SEQ ID NO: 194. In yet other embodiments, engineered PKSC sequences, such as those comprising the amino acid sequence set forth in SEQ ID NOs: 204, 207, 210, 213, 216, 219, 222, 237, 255, 258 or 259, have at least 50% sequence identity to SEQ ID NO: 195. In yet other embodiments, engineered olivetolic acid cyclase enzyme (e.g., altOAC) sequences listed in Table 1 and in the Sequence Listing comprise an amino acid sequence that is at least 50% identical to the sequence set forth in either SEQ ID NO: 385 or SEQ ID NO: 386. In yet other embodiments, engineered olivetolic acid cyclase enzyme (e.g., altOAC) sequences listed in Table 1 and in the Sequence Listing comprise an amino acid sequence that is at least 50% identical to the sequence set forth in SEQ ID NO: 334.
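The grouping of example PKSC sequences by reference sequence described above can be held in a small lookup structure. The following is a minimal sketch; the SEQ ID NO values are transcribed from the preceding paragraph, while the names PKSC_GROUPS and reference_for are hypothetical and used only for illustration.

```python
from typing import Optional

# Reference SEQ ID NO -> example PKSC SEQ ID NOs stated above to have at
# least 50% sequence identity to that reference.
PKSC_GROUPS = {
    193: [196, 199, 202, 205, 208, 211, 214, 217, 220, 223, 235, 238, 250],
    194: [197, 203, 206, 209, 212, 215, 218, 221, 224, 236, 251, 256, 257],
    195: [204, 207, 210, 213, 216, 219, 222, 237, 255, 258, 259],
}


def reference_for(seq_id: int) -> Optional[int]:
    """Return the reference SEQ ID NO whose example group lists seq_id, if any."""
    for reference, members in PKSC_GROUPS.items():
        if seq_id in members:
            return reference
    return None
```

For instance, reference_for(203) returns 194, and a SEQ ID NO that is not among the listed examples returns None. The lookup only reflects the examples enumerated in the text; it does not itself compute sequence identity.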
AltOAC sequences (for example, SEQ ID NO: 260-SEQ ID NO: 378) are characterized by greater sequence diversity, smaller protein size and the ability to act independently of non-OAC polypeptides. AltOACs also frequently contain a DABB domain, a protein motif that is associated with stress response. They share this in common with the OAC enzyme from Cannabis sativa and as a result may have some homology to csOAC. Similarities in amino acid sequences underlie similarities in protein function for the claimed engineered olivetolic acid cyclase enzyme. In some embodiments, altOAC enzymes contain a DABB domain, which is a structural motif that is often present in proteins involved in stress response in plants. Two sequences of the DABB domain are provided herein (SEQ ID NO: 385-SEQ ID NO: 386), which can be used as reference sequences when aligning sequences of the altOAC enzymes disclosed herein. In some preferred embodiments, engineered olivetolic acid cyclase enzyme (e.g., altOAC) sequences comprise an amino acid sequence that is at least 50% identical to the sequence set forth in either SEQ ID NO: 385 or SEQ ID NO: 386. Not all proteins that contain a DABB domain will be capable of catalyzing the OAC reaction, but the presence of a DABB domain is likely in sequences that are capable of catalyzing the OAC reaction. Similarly, PKSC enzymes are likely to contain structural interaction motifs to facilitate their recruitment to and interaction with other subunits in a multiprotein complex.
PKSC enzymes were chosen to belong within (derived from) the following protein superfamilies: Abhydrolase superfamily, ABM superfamily, Acyl_transf_1 superfamily, AdoMet_MTases superfamily, AFD_class_I superfamily, BioC superfamily, cond_enzymes superfamily, cupin_RmlC-like superfamily, entF superfamily, EthD superfamily, fabG superfamily, hot_dog superfamily, MDR superfamily, NADB_Rossmann superfamily, NRPS_MxcG superfamily, omega_3_PfaA superfamily, PKS_MbtD superfamily, PKS_NbtC superfamily, PP-binding superfamily, and/or PRK12467 superfamily. These superfamilies have the corresponding accession numbers as provided by the National Center for Biotechnology Information (NCBI) structure database (NCBI, Conserved Protein Domain Families): cl17068, cl09936, cl08282, cl16912, cl17173, cl27680, cl17068, cl37044, cl40423, cl10022, cl21494, cl35902, cl35333, cl09938, cl21454, cl40423, cl41646, cl37173, cl00509, cl41573, cl41574, cl36129.
PKSC enzymes within these superfamilies of proteins were also chosen to include one or more of the following protein domains: A_NRPS, ACP, AcpP, acyl_carrier, Acyl_transf_1, ADH_zinc_N, AdoMet_MTases, alpha_am_amid, AMP-binding, BioC, CaiC, Cupin_2, cupin_RmlC-like, Dabb, enoyl_red, EntF, entF, FabD, fabG, ketoacyl-synt, KR, KR_2_FAS_SDR_x, KR_FAS_SDR_x, ManC, Methyltransf_11, Methyltransf_12, NRPS_MxcG, omega_3_PfaA, PKS, PKS_AT, PKS_DH, PKS_ER, PKS_KR, PKS_KS, PKS_MbtD, PKS_MbtD superfamily, PKS_NbtC, PKS_NbtC superfamily, PKS_PP, PKS_TE, PksD, PLN02752, PLN02836, PP-binding, PP-binding superfamily, PRK06333, PRK07314, PRK12467, PRK12467 superfamily, PS-DH, PT_fungal_PKS, PTZ00050, PTZ00354, Qor, quinone_pig3, SAT, Thioesterase, UbiE, and/or ubiE.
The altOAC and PKSC enzymes can be naturally occurring enzymes, or enzymes derived from a naturally occurring enzyme, now known or later discovered, that occurs in any living organism, for example a bacterium, an archaeon, a protist, a fungus, an alga, an animal or a plant.
Many enzymes catalyze reactions of these classes using similar substrates, but have never been tested for activity on cannabinoids. To determine a source of an altOAC or PKSC enzyme, microbes can be screened for bioconversion activity of appropriate cannabinoids, after the methods of Abbott (1977). Enzymes from the above listed enzyme classes can be found, e.g., from the sequenced genomes, by cloning enzymes homologous to other cyclases, or by other cloning methods, thereby identifying good candidates for OAC activity. Organisms that make molecules similar to desired cannabinoids can also be identified from literature and those genomes searched as well to identify additional candidate enzymes. Bioinformatics methods to do this are provided in U.S. Pat. No. 10,671,632.
In some embodiments, the gene for the altOAC or PKSC enzyme is derived from a bacterium. It is envisioned that an altOAC or PKSC enzyme derived from any bacterium now known or later discovered can be utilized in the present invention. For example, the bacterium can be from phylum Abditibacteriota, including class Abditibacteria, including order Abditibacteriales; phylum Abyssubacteria or Acidobacteria, including class Acidobacteriia, Blastocatellia, Holophagae, Thermoanaerobaculia, or Vicinamibacteria, including order Acidobacteriales, Bryobacterales, Blastocatellales, Acanthopleuribacterales, Holophagales, Thermotomaculales, Thermoanaerobaculales, or Vicinamibacteraceae; phylum Actinobacteria, including class Acidimicrobiia, Actinobacteria, Actinomarinidae, Coriobacteriia, Nitriliruptoria, Rubrobacteria, or Thermoleophilia, including orders Acidimicrobiales, Acidothermales, Actinomycetales, Actinopolysporales, Bifidobacteriales, Nanopelagicales, Catenulisporales, Corynebacteriales, Cryptosporangiales, Frankiales, Geodermatophilales, Glycomycetales, Jiangellales, Micrococcales, Micromonosporales, Nakamurellales, Propionibacteriales, Pseudonocardiales, Sporichthyales, Streptomycetales, Streptosporangiales, Actinomarinales, Coriobacteriales, Eggerthellales, Egibacterales, Egicoccales, Euzebyales, Nitriliruptorales, Gaiellales, Rubrobacterales, Solirubrobacterales, or Thermoleophilales; phylum Aquificae, including class Aquificae, including order Aquificales or Desulfurobacteriales; phylum Armatimonadetes, including class Armatimonadia, including order Armatimonadales, Capsulimonadales, Chthonomonadetes, Chthonomonadales, Fimbriimonadia, or Fimbriimonadales; phylum Aureabacteria or Bacteroidetes, including class Armatimonadia, Bacteroidia, Chitinophagia, Cytophagia, Flavobacteria, Saprospiria or Sphingobacteriia, including order Bacteroidales, Marinilabiliales, Chitinophagales, Cytophagales, Flavobacteriales, Saprospirales, or Sphingobacteriales; phylum Balneolaeota, 
Caldiserica, Calditrichaeota, or Chlamydiae, including class Balneolia, Caldisericia, Calditrichae, or Chlamydia, including order Balneolales, Caldisericales, Calditrichales, Anoxychlamydiales, Chlamydiales, or Parachlamydiales; phylum Chlorobi or Chloroflexi, including class Chlorobia, Anaerolineae, Ardenticatenia, Caldilineae, Thermofonsia, Chloroflexia, Dehalococcoidia, Ktedonobacteria, Tepidiformia, Thermoflexia, Thermomicrobia, or Sphaerobacteridae, including order Chlorobiales, Anaerolineales, Ardenticatenales, Caldilineales, Chloroflexales, Herpetosiphonales, Kallotenuales, Dehalococcoidales, Dehalogenimonas, Ktedonobacterales, Thermogemmatisporales, Tepidiformales, Thermoflexales, Thermomicrobiales, or Sphaerobacterales; phylum Chrysiogenetes, Cloacimonetes, Coprothermobacterota, Cryosericota, or Cyanobacteria, including class Chrysiogenetes, Coprothermobacteria, Gloeobacteria, or Oscillatoriophycideae, including order Chrysiogenales, Coprothermobacterales, Chroococcidiopsidales, Gloeoemargaritales, Nostocales, Pleurocapsales, Spirulinales, Synechococcales, Gloeobacterales, Chroococcales, or Oscillatoriales; phylum Deferribacteres, Deinococcus-Thermus, Dictyoglomi, Dormibacteraeota, Elusimicrobia, Eremiobacteraeota, Fermentibacteria, or Fibrobacteres, including class Deferribacteres, Deinococci, Dictyoglomia, Elusimicrobia, Endomicrobia, Chitinispirillia, Chitinivibrionia, or Fibrobacteria, including order Deferribacterales, Deinococcales, Thermales, Dictyoglomales, Elusimicrobiales, Endomicrobiales, Chitinispirillales, Chitinivibrionales, Fibrobacterales, or Fibromonadales; phylum Firmicutes, Fusobacteria, Gemmatimonadetes, or Hydrogenedentes, including class Bacilli, Clostridia, Erysipelotrichia, Limnochordia, Negativicutes, Thermolithobacteria, Tissierellia, Fusobacteriia, Gemmatimonadetes, or Longimicrobia, including order Bacillales, Lactobacillales, Borkfalkiales, Clostridiales, Halanaerobiales, Natranaerobiales, Thermoanaerobacterales, Erysipelotrichales, 
Limnochordales, Acidaminococcales, Selenomonadales, Veillonellales, Thermolithobacterales, Tissierellales, Fusobacteriales, Gemmatimonadales, or Longimicrobia; phylum Hydrogenedentes, Ignavibacteriae, Kapabacteria, Kiritimatiellaeota, Krumholzibacteriota, Kryptonia, Latescibacteria, LCP-89, Lentisphaerae, Margulisbacteria, Marinimicrobia, Melainabacteria, Nitrospinae, or Omnitrophica, including class Ignavibacteria, Kiritimatiellae, Krumholzibacteria, Lentisphaeria, Oligosphaeria, or Nitrospinae, including order Ignavibacteriales, Kiritimatiellales, Krumholzibacteriales, Lentisphaerales, Victivallales, Oligosphaerales, or Nitrospinia; phylum Omnitrophica or Planctomycetes, including class Brocadiae, Phycisphaerae, Planctomycetia, or Phycisphaerales, including order Sedimentisphaerales, Tepidisphaerales, Gemmatales, Isosphaerales, Pirellulales, or Planctomycetales; phylum Proteobacteria including class Acidithiobacillia, Alphaproteobacteria, Betaproteobacteria, Lambdaproteobacteria, Muproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Gammaproteobacteria, Hydrogenophilalia, Oligoflexia, or Zetaproteobacteria, including order Acidithiobacillales, Caulobacterales, Emcibacterales, Holosporales, Iodidimonadales, Kiloniellales, Kopriimonadales, Kordiimonadales, Magnetococcales, Micropepsales, Minwuiales, Parvularculales, Pelagibacterales, Rhizobiales, Rhodobacterales, Rhodospirillales, Rhodothalassiales, Rickettsiales, Sneathiellales, Sphingomonadales, Burkholderiales, Ferritrophicales, Ferrovales, Neisseriales, Nitrosomonadales, Procabacteriales, Rhodocyclales, Bradymonadales, Acidulodesulfobacterales, Desulfarculales, Desulfobacterales, Desulfovibrionales, Desulfurellales, Desulfuromonadales, Myxococcales, Syntrophobacterales, Campylobacterales, Nautiliales, Acidiferrobacterales, Aeromonadales, Alteromonadales, Arenicellales, Cardiobacteriales, Cellvibrionales, Chromatiales, Enterobacterales, Immundisolibacterales, Legionellales, Methylococcales, Nevskiales, 
Oceanospirillales, Orbales, Pasteurellales, Pseudomonadales, Salinisphaerales, Thiotrichales, Vibrionales, Xanthomonadales, Hydrogenophilales, Bacteriovoracales, Bdellovibrionales, Oligoflexales, Silvanigrellales, or Mariprofundales; phylum Rhodothermaeota, Saganbacteria, Sericytochromatia, Spirochaetes, Synergistetes, Tectomicrobia, or Tenericutes, including class Rhodothermia, Spirochaetia, Synergistia, Izimaplasma, or Mollicutes, including order Rhodothermales, Brachyspirales, Brevinematales, Leptospirales, Spirochaetales, Synergistales, Acholeplasmatales, Anaeroplasmatales, Entomoplasmatales, or Mycoplasmatales; phylum Thermodesulfobacteria, Thermotogae, Verrucomicrobia, or Zixibacteria, including class Thermodesulfobacteria, Thermotogae, Methylacidiphilae, Opitutae, Spartobacteria, or Verrucomicrobiae, including order Thermodesulfobacteriales, Kosmotogales, Mesoaciditogales, Petrotogales, Thermotogales, Methylacidiphilales, Opitutales, Puniceicoccales, Xiphinematobacter, Chthoniobacterales, Terrimicrobium, or Verrucomicrobiales.
In other embodiments, the gene for the enzyme is derived from an archaeon. It is envisioned that an altOAC or PKSC enzyme derived from any archaeon now known or later discovered can be utilized in the present invention. For example, the archaeon can be from phylum Euryarchaeota, including class Archaeoglobi, Hadesarchaea, Halobacteria, Methanobacteria, Methanococci, Methanofastidiosa, Methanomicrobia, Methanopyri, Nanohaloarchaea, Theionarchaea, Thermococci, or Thermoplasmata, including order Archaeoglobales, Hadesarchaeales, Halobacteriales, Methanobacteriales, Methanococcales, Methanocellales, Methanomicrobiales, Methanophagales, Methanosarcinales, Methanopyrales, Thermococcales, Methanomassiliicoccales, Thermoplasmatales, or Nanoarchaeales; DPANN superphylum, including subphyla Aenigmarchaeota, Altiarchaeota, Diapherotrites, Micrarchaeota, Nanoarchaeota, Pacearchaeota, Parvarchaeota, or Woesearchaeota; TACK superphylum, including subphylum Korarchaeota, Crenarchaeota, Aigarchaeota, Geoarchaeota, Thaumarchaeota, or Bathyarchaeota; Asgard superphylum including subphylum Odinarchaeota, Thorarchaeota, Lokiarchaeota, Helarchaeota, or Heimdallarchaeota.
In additional embodiments, the gene for the altOAC or PKSC enzyme is derived from a fungus. It is envisioned that an altOAC or PKSC enzyme from any fungus now known or later discovered can be utilized in the present invention. This includes but is not limited to the phyla Chytridiomycota, Basidiomycota, Ascomycota, Blastocladiomycota, Microsporidia, Glomeromycota, Symbiomycota, and Neocallimastigomycota. For example, the fungus can be from the phylum Ascomycota, including classes and orders Pezizomycotina, Arthoniomycetes, Coniocybomycetes, Dothideomycetes, Eurotiomycetes, Geoglossomycetes, Laboulbeniomycetes, Lecanoromycetes, Leotiomycetes, Lichinomycetes, Orbiliomycetes, Pezizomycetes, Sordariomycetes, Xylonomycetes, Lahmiales, Itchiclahmadion, Triblidiales, Saccharomycotina, Saccharomycetes, Taphrinomycotina, Archaeorhizomyces, Neolectomycetes, Pneumocystidomycetes, Schizosaccharomycetes, Taphrinomycetes; phylum Basidiomycota including subphyla or classes Pucciniomycotina, Ustilaginomycotina, Wallemiomycetes, and Entorrhizomycetes; subphylum Agaricomycotina including classes Tremellomycetes, Dacrymycetes, and Agaricomycetes; phylum Symbiomycota, including class Entorrhizomycota; subphylum Ustilaginomycotina including classes Ustilaginomycetes and Exobasidiomycetes; phylum Glomeromycota including classes Archaeosporomycetes, Glomeromycetes, and Paraglomeromycetes; subphylum Pucciniomycotina including orders and classes: Pucciniomycotina, Cystobasidiomycetes, Agaricostilbomycetes, Microbotryomycetes, Atractiellomycetes, Classiculomycetes, Mixiomycetes, and Cryptomycocolacomycetes; subphylum incertae sedis Mucoromyceta including orders Calcarisporiellomycota and Mucoromycota; phylum Mortierellomyceta including class Mortierellomycota; subphylum incertae sedis Entomophthoromycotina including order Entomophthorales; phylum Zoopagomyceta including classes Basidiobolomycota, Entomophthoromycota, Kickxellomycota, and Zoopagomycotina; subphylum 
incertae sedis Mucoromycotina including orders Mucorales, Endogonales, and Mortierellales; phylum Neocallimastigomycota including class Neocallimastigomycetes; phylum Blastocladiomycota including classes Physodermatomycetes and Blastocladiomycetes; phylum Rozellomyceta including classes Rozellomycota and Microsporidia; phylum Aphelidiomyceta including class Aphelidiomycota; Chytridiomyceta including classes Chytridiomycetes and Monoblepharidomycetes; and phylum Oomycota including classes or orders Leptomitales, Myzocytiopsidales, Olpidiopsidales, Peronosporales, Pythiales, Rhipidiales, Salilagenidiales, Saprolegniales, Sclerosporales, Anisolpidiales, Lagenismatales, Rozellopsidales, and Haptoglossales.
The present invention is additionally directed to nucleic acids encoding any of the above-described altOAC and PKSC enzymes, including but not limited to the microbial and non-Cannabis plant OACs. Gene sequences can be determined using the techniques disclosed in U.S. Pat. No. 10,671,632, or by any other method known in the art. Table 1 provides SEQ ID NOs for nucleic acid and amino acid sequences listed in the sequence listing provided below. A Clustermap showing homologies between the amino acid sequences of select altOACs is provided in
In some embodiments, the nucleic acids are codon optimized to improve expression, e.g., using techniques as disclosed in U.S. Pat. No. 10,435,727. More specifically, optimized nucleotide sequences are generated based on a number of considerations: (1) For each amino acid of the recombinant polypeptide to be expressed, a codon (triplet of nucleotide bases) is selected based on the frequency of each codon in the Saccharomyces cerevisiae genome; the codon can be chosen to be the most frequent codon or can be selected probabilistically based on the frequencies of all possible codons. (2) In order to prevent DNA cleavage due to a restriction enzyme, certain restriction sites are removed by changing codons that cover those sites. (3) To prevent low-complexity regions, long repeats (runs of any single base longer than five bases) are modified. Steps (2) and (3) are performed recursively to ensure that codon modification does not lead to additional undesirable sequences. (4) A ribosome binding site is added at the 5′ end of the coding sequence (corresponding to the N-terminus of the encoded polypeptide). (5) A stop codon is added.
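The codon selection and clean-up considerations above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of U.S. Pat. No. 10,435,727: the function names are hypothetical, the codon weights default to a uniform placeholder (real Saccharomyces cerevisiae codon frequencies would be supplied by the user through the weights argument), the EcoRI site GAATTC is only an example of a restriction site to avoid, and the ribosome binding site of consideration (4) is omitted.

```python
import itertools
import random
import re

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa
               for c, aa in zip(itertools.product(BASES, repeat=3), AMINO)}
SYNONYMS = {}
for _codon, _aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)


def translate(dna):
    """Translate an in-frame DNA string; '*' marks a stop codon."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


def pick_codon(aa, weights, rng, most_frequent):
    """Consideration (1): take the top-weighted codon, or sample in proportion to weight."""
    codons = SYNONYMS[aa]
    w = [weights.get(c, 1.0) for c in codons]  # 1.0 is a placeholder, not real usage data
    if most_frequent:
        return codons[w.index(max(w))]
    return rng.choices(codons, weights=w, k=1)[0]


def first_problem(dna, sites, max_run):
    """Return (start, end) of the earliest forbidden site or single-base run longer than max_run."""
    hits = [(dna.find(s), dna.find(s) + len(s)) for s in sites if s in dna]
    run = re.search(r"([ACGT])\1{%d,}" % max_run, dna)
    if run:
        hits.append(run.span())
    return min(hits) if hits else None


def repair_once(codons, residues, hit, sites, max_run):
    """Swap one overlapping codon for a synonym so the earliest problem moves later or vanishes."""
    start, end = hit
    for idx in range(start // 3, (end - 1) // 3 + 1):
        for alt in SYNONYMS[residues[idx]]:
            trial = codons[:idx] + [alt] + codons[idx + 1:]
            new_hit = first_problem("".join(trial), sites, max_run)
            if new_hit is None or new_hit[0] > start:
                return trial
    raise ValueError("no synonymous substitution removes the problem at %d" % start)


def optimize(protein, weights=None, sites=("GAATTC",), max_run=5,
             most_frequent=True, seed=0):
    """Considerations (1)-(3) and (5): choose codons, then repair repeatedly until clean."""
    weights = weights or {}
    rng = random.Random(seed)
    residues = protein + "*"  # the stop codon is treated as a final residue
    codons = [pick_codon(aa, weights, rng, most_frequent) for aa in residues]
    while True:
        hit = first_problem("".join(codons), sites, max_run)
        if hit is None:
            return "".join(codons)
        codons = repair_once(codons, residues, hit, sites, max_run)
```

Because each accepted substitution must push the earliest remaining problem strictly further downstream, the repair loop terminates; it raises an error rather than looping when no synonymous change helps (for example, a forbidden site built entirely from single-codon residues). Calling optimize("MEFKKK") returns a coding sequence that translates back to the input followed by a stop codon.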
In various embodiments, the nucleic acids further comprise additional nucleic acids encoding amino acids that are not part of the altOAC or PKSC enzyme. In some of these embodiments, the additional sequences encode additional amino acids present when the nucleic acid is translated, encoding, for example, an additional protein domain, with or without a linker sequence, creating a fusion protein. Other examples are localization sequences, i.e., signals directing the localization of the folded protein to a specific subcellular compartment or membrane.
In some embodiments, the nucleic acids have, at the 5′ end, a nucleic acid encoding a codon optimized cofolding peptide, e.g., comprising any of SEQ ID NOs: 188-192 (Table 2). Joining the sequences together generates a recombinant fusion polypeptide in which the cofolding peptide, e.g., comprising the amino acid sequence of any of SEQ ID NOs: 380-384, is fused at the N terminus of the enzyme polypeptide.
Further provided are non-naturally occurring nucleic acids that encode an enzyme having the enzymatic activity of any of the non-naturally occurring altOAC or PKSC enzymes described above, or a naturally occurring enzyme having OAC activity. The nucleic acids may be codon optimized, e.g., for production in yeast.
In some embodiments, the nucleic acid comprises additional nucleotide sequences that are not translated. Nonlimiting examples include promoters, terminators, barcodes, Kozak sequences, targeting sequences, and enhancer elements. Particularly useful here are promoters that are functional in yeast.
Expression of a gene encoding an enzyme is determined by the promoter controlling the gene. In order for a gene to be expressed, a promoter must be present within 1,000 nucleotides upstream of the gene. A gene is generally cloned under the control of a desired promoter. The promoter regulates the amount of enzyme expressed in the cell and also the timing of expression, or expression in response to external factors such as sugar source.
Any promoter now known or later discovered can be utilized to drive the expression of the altOAC or PKSC genes described herein. See e.g. http://parts.igem.org/Yeast for a listing of various yeast promoters. Exemplary promoters listed in Table 3 below drive strong expression, constant gene expression, medium or weak gene expression, or inducible gene expression. Inducible or repressible gene expression is dependent on the presence or absence of a certain molecule. For example, the GAL1, GAL7, and GAL10 promoters are activated by the presence of the sugar galactose and repressed by the presence of the sugar glucose. The HO promoter is active and drives gene expression only in the presence of the alpha factor peptide. The HXT1 promoter is activated by the presence of glucose while the ADH2 promoter is repressed by the presence of glucose.
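The inducible and repressible behavior described above can be captured in a small lookup table, sketched below. The table encodes only the statements made in the preceding paragraph; the data structure, the function name is_expected_active, and the rule that a repressor overrides an activator when both are present are illustrative assumptions, and actual promoter behavior depends on strain and conditions.

```python
# Regulation rules transcribed from the paragraph above; structure and names are hypothetical.
PROMOTER_RULES = {
    "GAL1":  {"activated_by": {"galactose"}, "repressed_by": {"glucose"}},
    "GAL7":  {"activated_by": {"galactose"}, "repressed_by": {"glucose"}},
    "GAL10": {"activated_by": {"galactose"}, "repressed_by": {"glucose"}},
    "HO":    {"activated_by": {"alpha factor"}, "repressed_by": set()},
    "HXT1":  {"activated_by": {"glucose"}, "repressed_by": set()},
    "ADH2":  {"activated_by": set(), "repressed_by": {"glucose"}},
}


def is_expected_active(promoter, medium):
    """Predict whether a promoter drives expression given the molecules present in the medium.

    Assumption: any repressor present switches the promoter off, and otherwise
    every listed activator must be present.
    """
    rule = PROMOTER_RULES[promoter]
    present = set(medium)
    if rule["repressed_by"] & present:
        return False
    return rule["activated_by"] <= present
```

For example, is_expected_active("GAL1", {"galactose"}) evaluates to True, while adding glucose to the same medium yields False under the stated assumption.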
In various embodiments, the nucleic acid is in an expression cassette, e.g., a yeast expression cassette. Any yeast expression cassette capable of expressing the enzyme in a yeast cell can be utilized. In some embodiments, the expression cassette consists of a nucleic acid encoding an altOAC or PKSC with a promoter.
Additional regulatory elements can also be present in the expression cassette, including restriction enzyme cleavage sites, antibiotic resistance genes, integration sites, auxotrophic selection markers, origins of replication, and degrons.
The expression cassette can be present in a vector that, when transformed into a host cell, either integrates into chromosomal DNA or remains episomal in the host cell. Such vectors are well-known in the art. See e.g. http://parts.igem.org/Yeast for a listing of various yeast vectors.
A nonlimiting example of a yeast vector is a yeast episomal plasmid (YEp) that contains the pBluescript II SK(+) phagemid backbone, an auxotrophic selectable marker, yeast and bacterial origins of replication and multiple cloning sites enabling gene cloning under a suitable promoter (see Table 3). Other exemplary vectors include pRS series plasmids.
The present invention is also directed to genetically engineered host cells that comprise the above-described nucleic acids. Such cells may be, e.g., any species of filamentous fungus, including but not limited to any species of Aspergillus, which have been genetically altered to produce precursor molecules, intermediate molecules, or cannabinoid molecules. Host cells may also be any species of bacteria, including but not limited to Escherichia, Corynebacterium, Caulobacter, Pseudomonas, Streptomyces, Bacillus, or Lactobacillus.
In some embodiments, the genetically engineered host cell is a yeast cell, which may comprise any of the above-described expression cassettes and is capable of expressing the recombinant altOAC or PKSC enzyme encoded therein.
Any yeast cell capable of being genetically engineered can be utilized in these embodiments. Nonlimiting examples of such yeast cells include species of Saccharomyces, Candida, Pichia, Schizosaccharomyces, Scheffersomyces, Blakeslea, Rhodotorula, or Yarrowia.
These cells can be further engineered through gene expression controlled by inducible promoter systems; natural or induced mutagenesis, recombination, and/or shuffling of genes, pathways, and whole cells performed sequentially or in cycles; overexpression and/or deletion of single or multiple genes; and reduction or elimination of parasitic side pathways that reduce precursor concentration.
The host cells of the recombinant organism may also be engineered to produce any or all precursor molecules necessary for the biosynthesis of cannabinoids, including but not limited to olivetol (OL), farnesyl diphosphate (FPP) and geranyl diphosphate (GPP), hexanoic acid and hexanoyl-CoA, malonic acid and malonyl-CoA, dimethylallylpyrophosphate (DMAPP) and isopentenylpyrophosphate (IPP) as disclosed in U.S. Pat. No. 10,435,727.
The gene encoding the enzyme can be cloned into vectors with the proper regulatory elements for gene expression (e.g., promoter, terminator) and the derived plasmid can be confirmed by DNA sequencing. As an alternative to expression from an episomal plasmid, the gene encoding the enzyme may be inserted into the recombinant host genome. Integration may be achieved by a single or double cross-over insertion event of a plasmid, or by nuclease-based genome editing methods as are known in the art, e.g., CRISPR, TALEN and ZFN. Strains with the integrated gene can be screened by rescue of auxotrophy and genome sequencing. See, e.g., Green and Sambrook (2012).
To produce the desired cannabinoid, each candidate polypeptide may be introduced, using standard yeast cell transformation techniques (Green and Sambrook, 2012), into a host cell genetically modified to contain all other components necessary for cannabinoid biosynthesis, e.g., other enzymes in the cannabinoid biosynthetic pathway such as PKS, geranyl pyrophosphate synthase (see, e.g., U.S. Provisional Patent Application 63/141,486), prenyltransferase (see, e.g., U.S. Provisional Patent Application 63/053,539), and the enzymes described in U.S. Provisional Patent Application 63/164,126. Cells are subjected to fermentation under conditions that activate the promoter controlling the candidate polypeptide (see, e.g., Table 3). The broth may be subsequently subjected to HPLC analysis.
In some embodiments, for recombinant enzyme purification, the gene encoding the enzyme is cloned into an expression vector such as the pET expression vectors from Novagen, transformed into a protease deficient strain of E. coli such as BL21 and expressed by induction with IPTG. The protein of interest may be tagged with a common tag to facilitate purification, e.g. hexahistidine, GST, calmodulin, TAP, AP, CAT, HA, FLAG, MBP etc. Coexpression of a bacterial chaperone such as dnaK, GroES/GroEL or SecY may help facilitate protein folding. See Green and Sambrook (2012).
The present invention is also directed to a method of converting a polyketide (I) into a resorcyclic acid derivative (II), where R1=CH3, CH2CH3, (CH2)2CH3, (CH2)3CH3, (CH2)4CH3, (CH2)5CH3, or (CH2)6CH3,
The method comprises contacting the polyketide with any of the olivetolic acid cyclase (OAC) enzymes described herein in a manner and for a time sufficient to convert the polyketide (I) into the resorcyclic acid derivative (II).
In some embodiments, the method is carried out in vitro (outside of a cell). In other embodiments, the polyketide and the OAC enzyme are present in a living microorganism, for example any of the recombinant host cells described above, such as a yeast cell. In various embodiments, the recombinant host cell can further convert the resorcyclic acid derivative into a cannabinoid, and/or synthesize the polyketide from precursors.
In various specific embodiments, samples from fermentations of recombinant hosts expressing the cannabinoid pathway with fungal olivetolic acid cyclases outlined above are: (i) prepared and extracted using a combination of fermentation, dissolution, and purification steps; and (ii) analyzed by HPLC for the presence of directing molecules, precursor molecules, intermediate molecules, and target molecules such as OA, OL and common variants.
In various embodiments, the host cells are provided with various feedstocks to drive production of the desired cannabinoids, e.g., glucose, fructose, sucrose, galactose, raffinose, maltose, ethanol, xylose, fatty acids, glycerol, acetate, molasses, malt syrup, corn steep liquor, dairy, flour, protein powder, olive mill waste, fish waste, etc., for example as discussed in U.S. patent application Ser. No. 17/068,636.
In various embodiments, an inducer is used to activate the expression of the OAC pathway, such as the expression of altOAC or PKSC, or a combination of PKSC genes, or a combination of the cannabis OAC, csOAC, with altOAC and PKSC genes. Inducers include: galactose, glycerol, sucrose, maltose, lactose, glucose, hexanoic acid, hexanol, butyric acid, butanol, tributyrin, xylose, copper, and/or zinc.
In some embodiments, a vitamin mixture is added to a fermentation. Such a mixture can contain choline chloride, niacin, pyridoxine hydrochloride, riboflavin, calcium pantothenate, para-aminobenzoic acid (PABA), thiamine HCl, biotin, cyanocobalamin, and/or folic acid. Mineral mixes can also be added, which can include calcium chloride dihydrate, ferrous sulfate heptahydrate, manganese (II) sulfate monohydrate, copper sulfate pentahydrate, zinc sulfate heptahydrate, and/or magnesium chloride, as can solutes, such as glycerol, up to 10% v/v. Since these are oxidoreductase reactions, they may be stimulated by changing the redox potential of the culture. This can be accomplished by addition of oxidants such as H2O2, sulfuric acid (H2SO4), nitric acid (HNO3), potassium permanganate (KMnO4), and/or Fenton's reagent, antioxidants such as ascorbic acid, butylated hydroxyanisole (BHA), and/or butylated hydroxytoluene (BHT), or reductants such as 2-mercaptoethanol (β-ME), dithiothreitol (DTT), glutathione, cysteine hydrochloride, and/or tris(2-carboxyethyl)phosphine (TCEP).
The following enumerated embodiments are representative of the invention:
1. An olivetolic acid cyclase (OAC) enzyme derived from a microorganism or a non-Cannabis plant, wherein the enzyme catalyzes cyclization of a polyketide (I) into a resorcyclic acid derivative (II), where R1=CH3, CH2CH3, (CH2)2CH3, (CH2)3CH3, (CH2)4CH3, (CH2)5CH3, or (CH2)6CH3, wherein (I) and (II) have the following structural formulas:
2. The enzyme of embodiment 1, wherein the microorganism is a bacterium or a fungus.
3. The enzyme of embodiment 1, comprising one or more mutations that increase specificity for particular R1 alkyl chain length.
4. The enzyme of any one of embodiments 1-3, having an amino acid sequence that has less than 50% homology to the sequence set forth in SEQ ID NO: 379.
5. The enzyme of any one of embodiments 1-3, having an amino acid sequence that is at least 95% identical to any one of the sequences set forth in SEQ ID NO: 193-SEQ ID NO: 378.
6. The enzyme of any one of embodiments 1-3, having an amino acid sequence that is at least 50% identical to the sequence set forth in either SEQ ID NO: 385 or SEQ ID NO: 386.
7. The enzyme of any one of embodiments 1-3, having an amino acid sequence that is at least 20% identical to the sequence set forth in SEQ ID NO: 334.
8. The enzyme of any one of embodiments 1-3, having an amino acid sequence that is at least 50% identical to any one of the sequences set forth in SEQ ID NO: 193-SEQ ID NO: 195.
9. The enzyme of any one of embodiments 1-3, wherein the catalysis of the polyketide cyclization is by a mechanism selected from the group consisting of C2-C7 aldol condensation, Dieckmann condensation, Claisen condensation, and Knoevenagel condensation.
10. An isolated nucleic acid encoding the enzyme of any one of embodiments 1-9.
11. The isolated nucleic acid of embodiment 10, which is codon optimized for production in yeast.
12. The codon-optimized nucleic acid of embodiment 11, inserted in a vector configured for replication and protein expression in yeast cells.
13. The isolated nucleic acid of any one of embodiments 10-12, having a nucleic acid sequence that is at least 95% identical to a sequence selected from the group consisting of SEQ ID NO: 1-SEQ ID NO: 186.
14. An expression cassette comprising the isolated nucleic acid of any one of embodiments 10-13.
15. The expression cassette of embodiment 14, which is a yeast expression cassette.
16. The expression cassette of embodiment 14 or embodiment 15, further comprising a nucleic acid fragment at the 5′ end of the isolated nucleic acid, wherein the nucleic acid fragment encodes a codon optimized cofolding peptide.
17. The expression cassette of embodiment 16, wherein the codon optimized cofolding peptide has an amino acid sequence that is at least 95% identical to a sequence selected from the group consisting of SEQ ID NO: 380-384.
18. A recombinant microorganism comprising the expression cassette of any one of embodiments 14-17, that expresses the engineered OAC enzyme encoded therein.
19. The recombinant microorganism of embodiment 18, which is a yeast cell.
20. The yeast cell of embodiment 19, which is a species of Saccharomyces, Candida, Pichia, Schizosaccharomyces, Scheffersomyces, Blakeslea, Rhodotorula, Aspergillus or Yarrowia.
21. The recombinant microorganism of any one of embodiments 18-20, further expressing at least one other enzyme in a cannabinoid biosynthetic pathway.
22. The recombinant microorganism of embodiment 21, wherein the at least one other enzyme is an OAC enzyme having an amino acid sequence that is at least 80% identical to the sequence set forth in SEQ ID NO: 379.
23. The recombinant microorganism of embodiment 21, wherein the at least one other enzyme is a polyketide synthase or a prenyltransferase.
24. The recombinant microorganism of any one of embodiments 21-23, capable of synthesizing a cannabinoid.
25. The recombinant microorganism of any one of embodiments 21-24, which is a yeast cell.
26. A method of converting a polyketide (I) into a resorcyclic acid derivative (II), where R1=CH3, CH2CH3, (CH2)2CH3, (CH2)3CH3, (CH2)4CH3, (CH2)5CH3, or (CH2)6CH3, wherein (I) and (II) have the following structural formulas:
and wherein the method comprises contacting the polyketide with the olivetolic acid cyclase (OAC) enzyme of any one of embodiments 1-9 in a manner and for a time sufficient to convert the polyketide (I) into the resorcyclic acid derivative (II).
In preferred embodiments of the method, the olivetolic acid cyclase (OAC) enzyme catalyzes cyclization of the polyketide by a mechanism selected from the group consisting of C2-C7 aldol condensation, Dieckmann condensation, Claisen condensation, and Knoevenagel condensation.
27. The method of embodiment 26, wherein the polyketide and the OAC enzyme are in vitro.
28. The method of embodiment 26, wherein the polyketide and the OAC enzyme are present in a living microorganism.
29. The method of embodiment 28, wherein the living microorganism is the recombinant microorganism of any one of embodiments 18-25.
30. The method of embodiment 29, wherein the living microorganism is the yeast cell of embodiment 25, and wherein the resorcyclic acid derivative is converted into a cannabinoid in the yeast cell.
In view of the above, it will be seen that several objectives of the invention are achieved and other advantages attained.
As various changes could be made in the above methods and compositions without departing from the scope of the invention, it is intended that all matter contained in the above description and shown in the accompanying drawings shall be interpreted as illustrative and not in a limiting sense.
All references cited in this specification, including but not limited to patent publications and non-patent literature, and references cited therein, are hereby incorporated by reference. The discussion of the references herein is intended merely to summarize the assertions made by the authors and no admission is made that any reference constitutes prior art. Applicants reserve the right to challenge the accuracy and pertinence of the cited references.
As used herein, in particular embodiments, the terms “about” or “approximately,” when preceding a numerical value, indicate the value plus or minus a range of 10%. Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limit of that range and any other stated or intervening value in that stated range is encompassed within the disclosure. That the upper and lower limits of these smaller ranges can independently be included in the smaller ranges is also encompassed within the disclosure, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the disclosure.
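As a non-limiting numerical illustration of the “about” definition above, the following sketch computes the interval spanned by a value plus or minus 10% and checks whether a candidate falls within it. The function names and the choice to treat the endpoints as included are assumptions made for the example.

```python
def about_range(value, tolerance=0.10):
    """Return (low, high) for 'about value', i.e. value plus or minus 10% by default."""
    delta = abs(value) * tolerance
    return value - delta, value + delta


def is_about(candidate, value, tolerance=0.10):
    """True if candidate lies within 'about value' (endpoints included by assumption)."""
    low, high = about_range(value, tolerance)
    return low <= candidate <= high


# "About 30" covers roughly 27 to 33.
print(about_range(30.0))
print(is_about(27.5, 30.0))   # inside the interval
print(is_about(26.0, 30.0))   # outside the interval
```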
The indefinite articles “a” and “an,” as used herein in the specification and in the embodiments, unless clearly indicated to the contrary, should be understood to mean “at least one.”
The phrase “and/or,” as used herein in the specification and in the embodiments, should be understood to mean “either or both” of the elements so conjoined, i.e., elements that are conjunctively present in some cases and disjunctively present in other cases. Multiple elements listed with “and/or” should be construed in the same fashion, i.e., “one or more” of the elements so conjoined. Other elements can optionally be present other than the elements specifically identified by the “and/or” clause, whether related or unrelated to those elements specifically identified. Thus, as a non-limiting example, a reference to “A and/or B”, when used in conjunction with open-ended language such as “comprising” can refer, in one embodiment, to A only (optionally including elements other than B); in another embodiment, to B only (optionally including elements other than A); in yet another embodiment, to both A and B (optionally including other elements); etc.
As used herein in the specification and in the embodiments, “or” should be understood to have the same meaning as “and/or” as defined above. For example, when separating items in a list, “or” or “and/or” shall be interpreted as being inclusive, i.e., the inclusion of at least one, but also including more than one, of a number or list of elements, and, optionally, additional unlisted items. Only terms clearly indicated to the contrary, such as “only one of” or “exactly one of,” or, when used in the embodiments, “consisting of,” will refer to the inclusion of exactly one element of a number or list of elements. In general, the term “or” as used herein shall only be interpreted as indicating exclusive alternatives (i.e., “one or the other but not both”) when preceded by terms of exclusivity, such as “either,” “one of,” “only one of,” or “exactly one of.” “Consisting essentially of,” when used in the embodiments, shall have its ordinary meaning as used in the field of patent law.
As used herein in the specification and in the embodiments, the phrase “at least one,” in reference to a list of one or more elements, should be understood to mean at least one element selected from any one or more of the elements in the list of elements, but not necessarily including at least one of each and every element specifically listed within the list of elements and not excluding any combinations of elements in the list of elements. This definition also allows that elements can optionally be present other than the elements specifically identified within the list of elements to which the phrase “at least one” refers, whether related or unrelated to those elements specifically identified. Thus, as a non-limiting example, “at least one of A and B” (or, equivalently, “at least one of A or B,” or, equivalently “at least one of A and/or B”) can refer, in one embodiment, to at least one, optionally including more than one, A, with no B present (and optionally including elements other than B); in another embodiment, to at least one, optionally including more than one, B, with no A present (and optionally including elements other than A); in yet another embodiment, to at least one, optionally including more than one, A, and at least one, optionally including more than one, B (and optionally including other elements); etc.
The following examples are offered to illustrate but not to limit the methods, compositions, and uses provided herein. Certain aspects of the present invention, including, but not limited to, methods for producing cannabinoid compounds in microorganisms, such as yeast cells, were disclosed in the earlier published patents and patent publications U.S. Pat. Nos. 10,435,727; 10,671,632; 9,765,308; 11,028,417; 10,988,785; 11,041,002; 10,837,031; and 11,293,038; and US Patent Publications 2020/0063170 and 2020/0063171, the contents of which are incorporated herein by reference in their entirety.
Modified host cells that produce olivetolic acid (OA) and/or divarinic acid, and downstream cannabinoid compounds, such as the altOAC- and PKSC-expressing strains disclosed herein, express engineered altOAC or PKSC biosynthesis genes and enzymes, singly or in combination. Combining two or more altOAC or PKSC genes in a microorganism can increase production yields of OA, divarinic acid, and downstream cannabinoid compounds. More specifically, the OA-producing strain herein is grown in a minimal, complete culture medium containing yeast nitrogen base, amino acids, vitamins, ammonium sulfate, and a carbon source of glucose and galactose. The recombinant host cells are grown in 24-well plates or shake flasks in a volume range of 2 mL to 100 mL of medium, starting from an inoculation density of OD600 nm=1. The strains herein can be harvested during a fermentation period ranging from 12 hours onward from the start of pathway enzyme induction.
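As a worked illustration of the inoculation step described above, the volume of preculture needed to start a culture at OD600 = 1 can be estimated from the usual dilution relation C1·V1 = C2·V2. The sketch below is hypothetical: the function name and example values are assumptions, OD600 is assumed to scale linearly with cell density, and the stated culture volume is taken as the final volume including the inoculum.

```python
def inoculum_volume_ml(preculture_od600, culture_volume_ml, target_od600=1.0):
    """Volume of preculture (mL) that brings the final culture to the target OD600.

    Uses C1 * V1 = C2 * V2, where culture_volume_ml is the final volume.
    """
    if not 2.0 <= culture_volume_ml <= 100.0:
        raise ValueError("culture volume outside the 2 mL to 100 mL range described")
    if preculture_od600 < target_od600:
        raise ValueError("preculture is too dilute to reach the target OD600")
    return target_od600 * culture_volume_ml / preculture_od600


# Example: a preculture at OD600 = 8 used to start a 50 mL shake-flask culture.
volume = inoculum_volume_ml(preculture_od600=8.0, culture_volume_ml=50.0)
print(f"Add {volume:.2f} mL preculture and {50.0 - volume:.2f} mL fresh medium")
```

Dense precultures are typically diluted before the OD600 reading so that the measurement stays in the linear range of the spectrophotometer; the reading is then multiplied by the dilution factor before being passed to the function.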
Construction of the Saccharomyces OA and cannabinoid production strains is carried out via expression of the altOAC or PKSC genes, singly or co-expressed with genes that encode the downstream cannabinoid enzymes, which can consume olivetolic acid or divarinic acid to produce cannabigerolic acid or cannabigerovarinic acid using a synthase in a GPP-production host, as described in PCT/US21/42090, filed on Jul. 16, 2021, and in U.S. Provisional Patent Application No. 63/053,539. AltOAC and PKSC genes encode the enzymes that synthesize olivetolic acid or divarinic acid, which serve as precursors for the synthesis of valuable cannabinoids. In particular, they serve as a prenyl acceptor for a cannabinoid prenyltransferase, which combines the prenyl acceptor with a prenyl donor, such as geranyl pyrophosphate (GPP), farnesyl pyrophosphate (FPP), and/or geranylgeranyl pyrophosphate (GGPP). Recombinant genes for producing prenyl donors can be co-expressed with altOAC and PKSC genes, alongside cannabinoid enzymes, as described in PCT/US22/13857 and in U.S. Provisional Patent Application No. 63/141,486. The optimized altOAC and PKSC genes described herein are synthesized using DNA synthesis techniques known in the art. The optimized genes can be cloned into vectors with the proper regulatory elements for gene expression (e.g., promoter, terminator), and the derived plasmid can be confirmed by DNA sequencing. As an alternative to expression from an episomal plasmid, the optimized altOAC and PKSC genes can be inserted into the recombinant host genome. Integration is achieved by a single cross-over insertion event of the plasmid. Strains with the integrated gene can be screened by rescue of auxotrophy and genome sequencing.
Construction of Saccharomyces cerevisiae altOAC and/or PKSC production strains is carried out via expression of 1) an altOAC gene in combination with the cannabis OAC gene csOAC; 2) a PKSC1A-PKSC20A gene with a PKSC1B-PKSC25B gene and a PKSC1C-PKSC27C gene; 3) a PKSC1A-PKSC20A gene with a PKSC1B-PKSC25B gene; 4) a PKSC1A-PKSC20A gene with a PKSC1B-PKSC25B gene, a PKSC1C-PKSC27C gene, and the recombinant cannabis OAC gene csOAC; 5) a PKSC1C-PKSC27C gene; or 6) the recombinant cannabis OAC gene csOAC co-expressed with a PKSC1C-PKSC27C gene. The optimized altOAC, csOAC, and PKSC genes are synthesized using DNA synthesis techniques known in the art. The optimized genes can be cloned into vectors with the proper regulatory elements for gene expression (e.g., promoter, terminator), and the derived plasmid can be confirmed by DNA sequencing. Plasmids can be constructed to contain multiple expression cassettes to encode multiple genes on a single plasmid by methods known to those skilled in the art. As an alternative to expression from an episomal plasmid, the optimized combination of genes can be inserted into the recombinant host genome. Integration is achieved by a single cross-over insertion event of the plasmid. Strains with the integrated genes can be screened by rescue of auxotrophy and genome sequencing.
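To give a sense of the design space implied by the PKSC-based strain configurations above, the following sketch enumerates gene combinations for configurations 2) through 6). It assumes, for illustration only, that each hyphenated range (e.g., PKSC1A-PKSC20A) denotes consecutively numbered genes and that one gene is chosen from each family per strain; the helper names are hypothetical. Configuration 1) is omitted because the number of altOAC genes is not stated in this passage.

```python
from itertools import product


def family(count, suffix):
    """Gene names PKSC1<suffix> .. PKSC<count><suffix> (assumed consecutive numbering)."""
    return [f"PKSC{i}{suffix}" for i in range(1, count + 1)]


A = family(20, "A")   # PKSC1A-PKSC20A
B = family(25, "B")   # PKSC1B-PKSC25B
C = family(27, "C")   # PKSC1C-PKSC27C
CS_OAC = ["csOAC"]    # recombinant cannabis OAC

# Configurations 2) to 6) from the text, each as a list of gene families.
designs = {
    "2) A + B + C": [A, B, C],
    "3) A + B": [A, B],
    "4) A + B + C + csOAC": [A, B, C, CS_OAC],
    "5) C": [C],
    "6) csOAC + C": [CS_OAC, C],
}


def enumerate_design(families):
    """All strains that take exactly one gene from each listed family."""
    return list(product(*families))


for name, families in designs.items():
    combos = enumerate_design(families)
    print(f"{name}: {len(combos)} candidate strains, e.g. {' + '.join(combos[0])}")
```

Under these assumptions the full three-family configurations contain 20 × 25 × 27 = 13,500 candidate combinations, which is one reason a screening subset is usually chosen rather than building every strain.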
To identify fermentation-derived olivetolic acid, divarinic acid, olivetol, divarinol, their precursors, downstream cannabinoids, and all other products of a recombinant host expressing an engineered biosynthetic pathway for OA and cannabinoids, an Agilent 1100 series liquid chromatography (LC) system equipped with a reverse phase C18 column (Agilent Eclipse Plus C18, Santa Clara, CA, USA) is used. A gradient of mobile phase A (ultraviolet (UV) grade H2O+0.1% formic acid) and mobile phase B (UV grade acetonitrile+0.1% formic acid) is used. Column temperature is set at 30° C. Compound absorbance is measured at 210 nm and 305 nm using a diode array detector (DAD), with spectral analysis from 200 nm to 400 nm. A 0.1 mg/mL analytical standard is made from certified reference material for each compound (Cayman Chemical Company, USA). Each sample is prepared by diluting fermentation biomass from a recombinant host expressing the engineered biosynthesis pathway 1:3 or 1:20 in 100% acetonitrile and filtering it in 0.2 µm nanofilter vials. To identify olivetolic acid and the related compounds mentioned above, the retention time and UV-visible absorption spectrum (i.e., spectral fingerprint) of each sample are compared to the retention time and UV-visible spectrum of the analytical standard. Examples of results from the detection of isolated cannabinoid products via fermentation of recombinant host organisms are shown in
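The comparison of retention time and spectral fingerprint against an analytical standard, as described for the LC analysis above, can be sketched as follows. This is a minimal illustration and not the analysis software used: the retention-time tolerance, the cosine-similarity threshold, the single-point calibration, and all function names are assumptions. The dilution factor is passed explicitly because the convention behind "1:3" and "1:20" (parts sample to parts solvent versus parts sample in total volume) is not specified here.

```python
import math


def cosine_similarity(spectrum_a, spectrum_b):
    """Cosine similarity of two absorbance spectra sampled at the same wavelengths."""
    if len(spectrum_a) != len(spectrum_b):
        raise ValueError("spectra must be sampled at the same wavelengths")
    dot = sum(x * y for x, y in zip(spectrum_a, spectrum_b))
    norm = math.sqrt(sum(x * x for x in spectrum_a)) * math.sqrt(sum(y * y for y in spectrum_b))
    if norm == 0:
        raise ValueError("spectrum has zero intensity")
    return dot / norm


def matches_standard(sample_rt, sample_spectrum, std_rt, std_spectrum,
                     rt_tol_min=0.1, min_similarity=0.99):
    """True if both retention time and spectral fingerprint agree with the standard.

    rt_tol_min and min_similarity are hypothetical acceptance thresholds.
    """
    same_rt = abs(sample_rt - std_rt) <= rt_tol_min
    same_spectrum = cosine_similarity(sample_spectrum, std_spectrum) >= min_similarity
    return same_rt and same_spectrum


def concentration_mg_per_ml(sample_area, std_area, dilution_factor, std_conc=0.1):
    """Single-point estimate of the undiluted concentration.

    Assumes a linear detector response through the origin and a 0.1 mg/mL standard.
    """
    return sample_area / std_area * std_conc * dilution_factor


# Example with made-up numbers: absorbance at five wavelengths between 200 and 400 nm.
standard = [0.9, 0.5, 0.2, 0.6, 0.1]
sample = [1.8, 1.0, 0.4, 1.2, 0.2]   # same shape, twice the intensity
print(matches_standard(5.02, sample, 5.00, standard))
print(concentration_mg_per_ml(sample_area=500.0, std_area=1000.0, dilution_factor=20))
```

Cosine similarity is used here because it compares the shape of two spectra independently of their absolute intensity, so a more concentrated sample still matches its standard.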
Cyclase genes such as the altOACs and PKSCs described herein can be expressed in a modified Saccharomyces cerevisiae host cell to yield cannabinoid precursors, such as olivetolic acid and divarinic acid. Construction of a modified host cell expressing a cyclase such as the altOACs and/or PKSCs can be accomplished by transforming a microorganism such as a modified Saccharomyces cerevisiae via chemical transformation of episomal plasmids containing the gene cassettes encoding cyclases such as altOACs and/or PKSCs. Such plasmid transformation protocols are known to those skilled in the art. Such transformation procedures include chemical transformation or electroporation of mixtures of plasmids and the host microorganism.
Cyclase genes such as the altOACs and PKSCs described herein can be expressed singly or in combination to yield cannabinoids such as cannabigerolic acid (CBGA). Construction of a modified Saccharomyces cerevisiae with OACs, altOACs, and/or PKSCs can be accomplished by transforming a microorganism such as a modified Saccharomyces cerevisiae via chemical transformation of episomal plasmids containing the gene cassettes encoding cyclases such as altOACs and/or PKSCs. Such plasmid transformation protocols are known to those skilled in the art. Such transformation procedures include chemical transformation or electroporation of mixtures of plasmids and the host microorganism. When the modified host strain expresses downstream genes of the cannabinoid biosynthesis pathway, such as a CBGA synthase, including those genes described in PCT/US21/42090, the product of the disclosed cyclases, including the altOACs and/or PKSCs, is consumed by the CBGA synthase to yield the cannabinoid CBGA.
The present application claims priority to U.S. provisional patent application No. 63/194,121, filed on May 27, 2021, the disclosure and content of which is incorporated herein by reference in its entirety for all purposes.