BIOSYNTHESIS OF CANNABINOIDS AND CANNABINOID PRECURSORS

REFERENCE TO A SEQUENCE LISTING SUBMITTED AS A TEXT FILE VIA EFS-WEB

The instant application contains a Sequence Listing which has been submitted in ASCII format via EFS-Web and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Oct. 8, 2021, is named G091970063WO00-SEQ-OMJ, and is 3,122,581 bytes in size.

FIELD OF INVENTION

The present disclosure relates to the biosynthesis of cannabinoids and cannabinoid precursors, such as in recombinant cells.

BACKGROUND

Cannabinoids are chemical compounds that may act as ligands for endocannabinoid receptors and have multiple medical applications. Traditionally, cannabinoids have been isolated from plants of the genus Cannabis. The use of plants for producing cannabinoids is inefficient, however, with isolated products often limited to the two most prevalent endogenous cannabinoids, THC and CBD, as other cannabinoids are typically produced in very low concentrations in Cannabis plants. Further, the cultivation of Cannabis plants is restricted in many jurisdictions. In addition, in order to obtain consistent results, Cannabis plants are often grown in a controlled environment, such as indoor grow rooms without windows, to provide flexibility in modulating growing conditions such as lighting, temperature, humidity, airflow, etc. Growing Cannabis plants in such controlled environments can result in high energy usage per gram of cannabinoid produced, especially for rare cannabinoids that the plants produce only in small amounts. For example, lighting in such grow rooms is provided by artificial sources, such as high-powered sodium lights. As many species of Cannabis have a vegetative cycle that requires 18 or more hours of light per day, powering such lights can result in significant energy expenditures. It has been estimated that between 0.88 and 1.34 kWh of energy is required to produce one gram of THC in dried Cannabis flower form (e.g., before any extraction or purification). Additionally, concern has been raised over agricultural practices in certain jurisdictions, such as California, where the growing season coincides with the dry season such that the water usage may impact connected surface water in streams (Dillis, Christopher, Connor McIntee, Van Butsic, Lance Le, Kason Grady, and Theodore Grantham. “Water storage and irrigation practices for Cannabis drive seasonal patterns of water extraction and use in Northern California.” Journal of Environmental Management 272 (2020): 110955).

Cannabinoids can also be produced through chemical synthesis (see, e.g., U.S. Pat. No. 7,323,576 to Souza et al.). However, such methods suffer from low yields and high cost.

Production of cannabinoids, cannabinoid analogs, and cannabinoid precursors using engineered organisms may provide an advantageous approach to meet the increasing demand for these compounds.

SUMMARY

Aspects of the present disclosure provide methods for production of cannabinoids and cannabinoid precursors from fatty acid substrates using genetically modified host cells.

Aspects of the disclosure relate to chimeric prenyltransferases (PTs), wherein the chimeric PT comprises one or more portions of at least two different PTs and wherein the chimeric PT is capable of producing a CBG-type cannabinoid from a resorcylic acid. In some embodiments, the CBG-type cannabinoid and the resorcylic acid are: cannabigerolic acid (CBGA) and olivetolic acid; or cannabigerovarinic acid (CBGVA) and divaric acid (DA).

In some embodiments, the chimeric PT comprises one or more portions of CsPT1. In some embodiments, the chimeric PT comprises one or more portions of CsPT4. In some embodiments, the chimeric PT comprises one or more portions of CsPT6. In some embodiments, the chimeric PT comprises one or more portions of CsPT7.

In some embodiments, the chimeric PT comprises multiple transmembrane helices, and at least one transmembrane helix of the multiple transmembrane helices comprises one or more portions of at least two different CsPTs. In some embodiments, at least one transmembrane helix of the multiple transmembrane helices comprises both a portion of CsPT4 and a portion of CsPT1, CsPT6 or CsPT7. In some embodiments, all the transmembrane helices comprise both a portion of CsPT4 and a portion of CsPT1, CsPT6 or CsPT7.

In some embodiments, the chimeric PT comprises one or more of the following motifs: MTVMGMT (SEQ ID NO: 11); [EV][LMW][RS]P[SAP]F[ST]F[IL][IL]AF (SEQ ID NO: 12); QFFEFIW (SEQ ID NO: 13); HNTNL (SEQ ID NO: 14); TCWKL (SEQ ID NO: 15); M[IL]LSHAILAFC (SEQ ID NO: 16); HVG[LV][AN]FT[SCF]Y[YS]A[ST][RT][AS]A[LF] (SEQ ID NO: 17); GLIVT (SEQ ID NO: 18); L[YH]YAEY[LF]V (SEQ ID NO: 19); KAFFAL (SEQ ID NO: 20); KLGARNMT (SEQ ID NO: 21); QAF[NK]SN (SEQ ID NO: 22); LIFQT (SEQ ID NO: 23); SIIVALT (SEQ ID NO: 24); MSIETAW (SEQ ID NO: 25); VVSGV (SEQ ID NO: 26); RPYVV (SEQ ID NO: 27); KPDLP (SEQ ID NO: 28); RWKQY (SEQ ID NO: 29); FLITI (SEQ ID NO: 30); DIEGD (SEQ ID NO: 31); and KYGVST (SEQ ID NO: 32).
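By way of non-limiting illustration, the bracketed motif notation above (e.g., [IL] meaning either I or L at that position) happens to coincide with regular-expression character classes, so a candidate sequence could be screened for motifs roughly as sketched below. The function name find_motifs, the subset of motifs chosen, and the toy sequence are assumptions made for illustration only and are not taken from the Sequence Listing.

```python
import re

# A few motifs copied from the paragraph above, keyed by SEQ ID NO.
# The selection is arbitrary; any of the listed motifs could be added.
MOTIFS = {
    11: "MTVMGMT",
    12: "[EV][LMW][RS]P[SAP]F[ST]F[IL][IL]AF",
    16: "M[IL]LSHAILAFC",
    22: "QAF[NK]SN",
}


def find_motifs(sequence, motifs=MOTIFS):
    """Return {SEQ ID NO: [0-based start positions]} for motifs found in sequence."""
    hits = {}
    for seq_id, pattern in motifs.items():
        # A lookahead is used so that overlapping occurrences are also reported.
        starts = [m.start() for m in re.finditer(f"(?=({pattern}))", sequence)]
        if starts:
            hits[seq_id] = starts
    return hits


# Hypothetical toy sequence, not a real prenyltransferase.
toy_sequence = "AAMTVMGMTGGQAFKSNAA"
print(find_motifs(toy_sequence))  # {11: [2], 22: [11]}
```

Positions are reported 0-based here; adding 1 would give the 1-based residue numbering conventionally used for protein sequences.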

In some embodiments, the chimeric PT comprises the structure: X1-X2-X3-X4-X5-X6-X7-X8-X9-X10, wherein at least one of X1, X2, X3, X4, X5, X6, X7, X8, X9 or X10 comprises a portion of CsPT4. In some embodiments, at least one of X1, X3, X5, X7, and X9 comprises a portion of CsPT4. In some embodiments, all of X1, X3, X5, X7, and X9 comprise portions of CsPT4. In some embodiments, at least one of X2, X4, X6, X8, and X10 comprises a portion of CsPT1, CsPT6, or CsPT7. In some embodiments, all of X2, X4, X6, X8, and X10 comprise portions of CsPT1, CsPT6 or CsPT7.
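As a purely illustrative sketch of the alternating block structure described above, the following assembles a ten-block sequence in which the odd-numbered blocks (X1, X3, ...) are taken from one parent and the even-numbered blocks (X2, X4, ...) from another. The function name assemble_chimera and the placeholder block strings are hypothetical; how parent sequences are actually divided into blocks is not specified by this sketch.

```python
def assemble_chimera(blocks_a, blocks_b):
    """Join blocks so that X1, X3, ... come from parent A and X2, X4, ... from parent B.

    blocks_a and blocks_b are equal-length lists holding each parent's
    candidate sequence for every block position.
    """
    if len(blocks_a) != len(blocks_b):
        raise ValueError("both parents must supply the same number of blocks")
    chosen = []
    for index, (block_a, block_b) in enumerate(zip(blocks_a, blocks_b)):
        # index 0 corresponds to X1, index 1 to X2, and so on.
        chosen.append(block_a if index % 2 == 0 else block_b)
    return "".join(chosen)


# Placeholder blocks: upper case marks parent A, lower case marks parent B.
parent_a = ["AA", "BB", "CC", "DD", "EE", "FF", "GG", "HH", "II", "JJ"]
parent_b = ["aa", "bb", "cc", "dd", "ee", "ff", "gg", "hh", "ii", "jj"]
print(assemble_chimera(parent_a, parent_b))  # AAbbCCddEEffGGhhIIjj
```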

In some embodiments, the chimeric PT comprises the structure: X1-X2-X3-X4-X5-X6-X7-X8-X9-X10, and: the sequence of X1 comprises any of SEQ ID NOs: 33-39 or a sequence that comprises no more than 2 amino acid substitutions, insertions, additions or deletions relative to any one of SEQ ID NOs: 33-39; the sequence of X2 comprises any of SEQ ID NOs: 40-46 or a sequence that comprises no more than 2 amino acid substitutions, insertions, additions or deletions relative to any one of SEQ ID NOs: 40-46; the sequence of X3 comprises any of SEQ ID NOs: 47-53 or a sequence that comprises no more than 2 amino acid substitutions, insertions, additions or deletions relative to any one of SEQ ID NOs: 47-53; the sequence of X4 comprises any of SEQ ID NOs: 54-60 or a sequence that comprises no more than 2 amino acid substitutions, insertions, additions or deletions relative to any one of SEQ ID NOs: 54-60; the sequence of X5 comprises any of SEQ ID NOs: 61-67 or a sequence that comprises no more than 2 amino acid substitutions, insertions, additions or deletions relative to any one of SEQ ID NOs: 61-67; the sequence of X6 comprises any of SEQ ID NOs: 68-74 or a sequence that comprises no more than 2 amino acid substitutions, insertions, additions or deletions relative to any one of SEQ ID NOs: 68-74; the sequence of X7 comprises any of SEQ ID NOs: 75-81 or a sequence that comprises no more than 2 amino acid substitutions, insertions, additions or deletions relative to any one of SEQ ID NOs: 75-81; the sequence of X8 comprises any of SEQ ID NOs: 82-88 or a sequence that comprises no more than 2 amino acid substitutions, insertions, additions or deletions relative to any one of SEQ ID NOs: 82-88; the sequence of X9 comprises any of SEQ ID NOs: 89-95 or a sequence that comprises no more than 2 amino acid substitutions, insertions, additions or deletions relative to any one of SEQ ID NOs: 89-95; and/or the sequence of X10 comprises any of SEQ ID NOs: 96-102 or a sequence that comprises no more than 2 amino acid substitutions, insertions, additions or deletions relative to any one of SEQ ID NOs: 96-102.
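One possible way to check computationally whether a candidate block differs from a reference block by no more than 2 substitutions, insertions, or deletions is the Levenshtein edit distance, sketched below. Treating the recited limit as a Levenshtein distance of at most 2 is an assumption made here for illustration and is not a definition supplied by the disclosure; the function names and example strings are likewise hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-residue substitutions,
    insertions, and deletions needed to turn sequence a into sequence b."""
    previous = list(range(len(b) + 1))
    for i, residue_a in enumerate(a, start=1):
        current = [i]
        for j, residue_b in enumerate(b, start=1):
            substitution_cost = 0 if residue_a == residue_b else 1
            current.append(min(
                previous[j] + 1,                      # delete residue_a
                current[j - 1] + 1,                   # insert residue_b
                previous[j - 1] + substitution_cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


def within_two_changes(candidate, reference):
    """True if candidate is at most 2 edits away from reference."""
    return edit_distance(candidate, reference) <= 2


# Hypothetical example strings, not entries from the Sequence Listing.
print(within_two_changes("KLGVRNMT", "KLGARNMT"))  # True  (one substitution)
print(within_two_changes("AAAARNMT", "KLGARNMT"))  # False (three substitutions)
```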

In some embodiments, the chimeric PT comprises a sequence that is at least 90% identical to any one of SEQ ID NOs: 113-121, 757-868, and 982-1081. In some embodiments, the chimeric PT comprises any one of SEQ ID NOs: 113-118, 757-868, and 982-1081.
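For illustration only, percent identity between two sequences that have already been aligned (with "-" marking gaps) could be computed as sketched below. Several conventions for percent identity exist; this sketch assumes identity is the number of matching columns divided by the total number of alignment columns, and it assumes the alignment itself has been produced separately by an alignment tool. The function name and example strings are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over the columns of a pre-computed pairwise alignment.

    Both inputs must have the same length and use '-' for gaps. Columns in
    which both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y) for x, y in zip(aligned_a, aligned_b)
        if not (x == "-" and y == "-")
    ]
    if not columns:
        raise ValueError("alignment contains no residues")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# Hypothetical aligned toy sequences: 9 of 10 columns match.
identity = percent_identity("MTVMGMTAAA", "MTVMGMTAAC")
print(identity, identity >= 90.0)  # 90.0 True
```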

In some embodiments, the chimeric PT comprises an amino acid substitution relative to SEQ ID NO: 5 at one or more of the following positions within SEQ ID NO: 5: C31, M43, M75, I46, F82, F83, I86, M87, D94, E113, F145, I147, F151, Q162, A227, S232, F245, Q267, Q288, and L311. In some embodiments, the chimeric PT comprises one or more of the following amino acid substitutions relative to SEQ ID NO: 5: C31F, M43V, M43L, I46C, M75V, F82G, F83Y, I86S, I86A, I86G, I86V, I86S, M87V, M87I, D94E, E113R, I140L, F145T, F145L, F145S, I147L, F151T, A227K, S232R, F245R, F245W, T254N, Q267F, Q288R, L331N, and L311R. In some embodiments, the chimeric PT is capable of producing more CBGA from olivetolic acid or more CBGVA from divaric acid than a chimeric PT that comprises SEQ ID NO: 324.
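The substitution notation used above follows the common convention of wild-type residue, 1-based position, and replacement residue (e.g., C31F denotes cysteine at position 31 replaced by phenylalanine). A minimal sketch of applying such substitutions to a reference sequence, while verifying that the stated wild-type residue is actually present at the stated position, is given below. The function name apply_substitutions and the four-residue toy sequence are hypothetical and do not correspond to SEQ ID NO: 5.

```python
import re

# One substitution: wild-type residue, 1-based position, replacement residue.
SUBSTITUTION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(sequence, substitutions):
    """Return a copy of sequence with each substitution (e.g. 'C31F') applied.

    Raises ValueError if a substitution is malformed, out of range, or names
    a wild-type residue that does not match the reference sequence.
    """
    residues = list(sequence)
    for substitution in substitutions:
        match = SUBSTITUTION_PATTERN.match(substitution)
        if match is None:
            raise ValueError(f"malformed substitution: {substitution!r}")
        wild_type, position, replacement = (
            match.group(1), int(match.group(2)), match.group(3)
        )
        if position < 1 or position > len(sequence):
            raise ValueError(f"position out of range: {substitution!r}")
        if sequence[position - 1] != wild_type:
            raise ValueError(
                f"{substitution!r} expects {wild_type} at position {position}, "
                f"found {sequence[position - 1]}"
            )
        residues[position - 1] = replacement
    return "".join(residues)


# Hypothetical four-residue reference sequence.
print(apply_substitutions("MCAF", ["C2F"]))  # MFAF
```

Checking the wild-type residue before substituting is a simple safeguard against position-numbering errors in a list of substitutions.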

Further aspects of the disclosure relate to polynucleotides encoding any of the chimeric PTs of the disclosure. In some embodiments, the polynucleotide comprises a sequence that is at least 90% identical to any one of SEQ ID NOs: 136-144, 869-980, and 1083-1182. In some embodiments, the polynucleotide comprises the sequence of any one of SEQ ID NOs: 136-144, 869-980, and 1083-1182.

Further aspects of the disclosure relate to fusion proteins comprising chimeric PTs of the disclosure wherein the fusion protein further comprises a farnesyl pyrophosphate synthase. In some embodiments, the farnesyl pyrophosphate synthase comprises a mutation that increases the production of geranylpyrophosphate relative to farnesylpyrophosphate. In some embodiments, the farnesyl pyrophosphate synthase sequence comprises a tryptophan residue at a residue corresponding to residues 96, 127, or both 96 and 127, in wild-type ERG20 (SEQ ID NO: 424).

In some embodiments, the farnesyl pyrophosphate synthase is amino terminal to the chimeric prenyltransferase within the fusion protein. In some embodiments, the farnesyl pyrophosphate synthase and the chimeric prenyltransferase are separated by a linker sequence. In some embodiments, the linker comprises any one of SEQ ID NOs: 104-109, or a sequence that comprises no more than 2 amino acid substitutions, insertions, additions or deletions relative to any one of SEQ ID NOs: 104-109.

In some embodiments, the sequence of the farnesyl pyrophosphate synthase comprises one or more of the following motifs: NVPGGKLNR (SEQ ID NO: 647); FYLPVALA[LM]H (SEQ ID NO: 648); A[EH]D[IV]LIPLG (SEQ ID NO: 651); LGW[CL][ITV]ELLQA[FY]FL (SEQ ID NO: 655); KKEV[FL][ET][SA]FL[AGN]KIYK (SEQ ID NO: 663); QRK[VI]L[DE]ENYG (SEQ ID NO: 667); VGMIAIWD (SEQ ID NO: 672); TDI[QK]DNKCSW (SEQ ID NO: 673); TAYYSFYLP (SEQ ID NO: 676); GKIGTDI[QK]DNKCSW (SEQ ID NO: 677); ILIP[LM]GEYFQ (SEQ ID NO: 680); IL[VM][EP][ML]G[ET][YF]FQ (SEQ ID NO: 683); AKIYKRSK (SEQ ID NO: 685); DPEVIGKI (SEQ ID NO: 686); RGQPCW[YF]RVP[EQ] (SEQ ID NO: 687); IVKYKTA[YF]Y[ST]FYLP (SEQ ID NO: 689); WC[IV]E[LW]LQA[YF][WF]LV[ALW]D (SEQ ID NO: 692); CSWLV[VN]Q[AC]L[AQ][RI][AC][ST]P[ED]Q (SEQ ID NO: 699).

In some embodiments, the farnesyl pyrophosphate synthase comprises a sequence that is at least 90% identical to any one of SEQ ID NOs: 103, 426-476, or 753. In some embodiments, the farnesyl pyrophosphate synthase comprises any one of SEQ ID NOs: 426-476 or 753.

In some embodiments, the fusion protein comprises a sequence that is at least 90% identical to any one of SEQ ID NOs: 532-582 or 755. In some embodiments, the fusion protein comprises any one of SEQ ID NOs: 532-582 or 755.

Further aspects of the disclosure relate to host cells comprising any of the chimeric PTs or fusion proteins associated with the disclosure. In some embodiments, the host cell comprises one or more copies of a heterologous farnesyl pyrophosphate synthase. In some embodiments, one or more copies of the farnesyl pyrophosphate synthase are integrated into the genome of the host cell. In some embodiments, the host cell is a plant cell, an algal cell, a yeast cell, a bacterial cell, or an animal cell. In some embodiments, the host cell is a yeast cell. In some embodiments, the yeast cell is a Saccharomyces cell, a Yarrowia cell, a Komagataella cell, or a Pichia cell. In some embodiments, the Saccharomyces cell is a Saccharomyces cerevisiae cell. In some embodiments, the host cell is a bacterial cell. In some embodiments, the bacterial cell is an E. coli cell.

In some embodiments, the host cell further comprises one or more heterologous polynucleotides encoding one or more of: an acyl activating enzyme (AAE), a polyketide synthase (PKS), a polyketide cyclase (PKC), and/or a terminal synthase (TS). In some embodiments, the PKS is an olivetol synthase (OLS).

Further aspects of the disclosure relate to methods comprising culturing any of the host cells associated with the disclosure.

Further aspects of the disclosure relate to host cells that comprise a heterologous polynucleotide encoding a farnesyl pyrophosphate synthase wherein the sequence of the farnesyl pyrophosphate synthase comprises one or more of the following motifs: NVPGGKLNR (SEQ ID NO: 647); FYLPVALA[LM]H (SEQ ID NO: 648); A[EH]D[IV]LIPLG (SEQ ID NO: 651); LGW[CL][ITV]ELLQA[FY]FL (SEQ ID NO: 655); KKEV[FL][ET][SA]FL[AGN]KIYK (SEQ ID NO: 663); QRK[VI]L[DE]ENYG (SEQ ID NO: 667); VGMIAIWD (SEQ ID NO: 672); TDI[QK]DNKCSW (SEQ ID NO: 673); TAYYSFYLP (SEQ ID NO: 676); GKIGTDI[QK]DNKCSW (SEQ ID NO: 677); ILIP[LM]GEYFQ (SEQ ID NO: 680); IL[VM][EP][ML]G[ET][YF]FQ (SEQ ID NO: 683); AKIYKRSK (SEQ ID NO: 685); DPEVIGKI (SEQ ID NO: 686); RGQPCW[YF]RVP[EQ] (SEQ ID NO: 687); IVKYKTA[YF]Y[ST]FYLP (SEQ ID NO: 689); WC[IV]E[LW]LQA[YF][WF]LV[ALW]D (SEQ ID NO: 692); CSWLV[VN]Q[AC]L[AQ][RI][AC][ST]P[ED]Q (SEQ ID NO: 699); wherein the farnesyl pyrophosphate synthase does not comprise SEQ ID NO: 103 or SEQ ID NO: 424.

In some embodiments, the farnesyl pyrophosphate synthase comprises a sequence that is at least 90% identical to any one of SEQ ID NOs: 426-476 or 753. In some embodiments, the farnesyl pyrophosphate synthase comprises any one of SEQ ID NOs: 426-476 or 753.

Further aspects of the disclosure relate to polynucleotides encoding a chimeric PT, wherein the polynucleotide comprises a sequence that is at least 90% identical to any one of SEQ ID NOs: 136-144, 869-980, and 1083-1182.

Further aspects of the disclosure relate to non-naturally occurring polynucleotides encoding a farnesyl pyrophosphate synthase, wherein the non-naturally occurring polynucleotide comprises a sequence that is at least 90% identical to any one of SEQ ID NOs: 479-529 or 754.

Further aspects of the disclosure relate to polynucleotides encoding a fusion protein, wherein the polynucleotide comprises a sequence that is at least 90% identical to any one of SEQ ID NOs: 585-635, 728-752 or 756.

Further aspects of the disclosure relate to vectors comprising any of the polynucleotides associated with the disclosure. Further aspects of the disclosure relate to expression cassettes comprising any of the polynucleotides associated with the disclosure. Further aspects of the disclosure relate to host cells transformed with any of the polynucleotides associated with the disclosure, any of the vectors associated with the disclosure, or any of the expression cassettes associated with the disclosure.

Further aspects of the disclosure relate to variant PTs or active fragments thereof comprising a non-naturally occurring amino acid sequence relative to a wild-type PT, wherein the variant PT or active fragment thereof acts on a substrate to produce an altered amount of a cannabinoid relative to the amount of the cannabinoid produced by the wild-type PT. In some embodiments, the variant PT or active fragment thereof comprises an amino acid substitution relative to a prenyltransferase of SEQ ID NO: 5. In some embodiments, the variant PT or active fragment thereof comprises an amino acid substitution relative to SEQ ID NO: 5 at one or more of the following positions within SEQ ID NO: 5: C31, M43, I46, F82, F83, I86, M87, D94, E113, S119, V122, F145, I147, F151, Q162, S232, F245, Q267, Q288, and L311. In some embodiments, the PT comprises one or more of the following amino acid substitutions relative to SEQ ID NO: 5: C31F, M43V, M43L, I46C, F82G, F83Y, I86S, I86A, I86G, I86V, I86S, M87V, M87I, D94E, E113R, F145T, F145L, F145S, I147L, F151T, S232R, F245R, F245W, Q267F, Q288R, L331N, and L311R.

In some embodiments, the variant PT or active fragment thereof produces an increased amount of CBGA relative to the amount of CBGA produced by the wild-type PT. In some embodiments, the variant PT or active fragment thereof produces an increased amount of CBGVA relative to the amount of CBGVA produced by the wild-type PT.

Further aspects of the disclosure relate to polynucleotides encoding variant PTs or active fragments thereof. Further aspects of the disclosure relate to vectors comprising variant PTs or active fragments thereof. Further aspects of the disclosure relate to expression cassettes comprising variant PTs or active fragments thereof. Further aspects of the disclosure relate to host cells transformed with polynucleotides, vectors, or expression cassettes comprising variant PTs or active fragments thereof.

Further aspects of the disclosure relate to methods of producing a cannabinoid comprising reacting:

- a) a CBG-type compound, and
- b) a prenyl pyrophosphate, in the presence of: a chimeric PT associated with the disclosure, a PT encoded by a polynucleotide associated with the disclosure, a fusion protein associated with the disclosure, or a variant PT associated with the disclosure.

In some embodiments, the compound of Formula (6) is CBGA or CBGVA. In some embodiments, the prenyl pyrophosphate is geranyl pyrophosphate.

Further aspects of the disclosure relate to bioreactors for producing a cannabinoid compound. In some embodiments, the bioreactors comprise a chimeric PT associated with the disclosure, a PT encoded by a polynucleotide associated with the disclosure, a fusion protein associated with the disclosure, a variant PT associated with the disclosure, and/or a host cell associated with the disclosure.

Each of the limitations of the invention can encompass various embodiments of the invention. It is, therefore, anticipated that each of the limitations of the invention involving any one element or combinations of elements can be included in each aspect of the invention. This disclosure is not limited in its application to the details of construction and the arrangement of components set forth in the following description or illustrated in the drawings. The invention is capable of other embodiments and of being practiced or of being carried out in various ways. Also, the phraseology and terminology used in this application is for the purpose of description and should not be regarded as limiting. The use of “including,” “comprising,” or “having,” “containing,” “involving,” and variations thereof, is meant to encompass the items listed thereafter and equivalents thereof as well as additional items.

BRIEF DESCRIPTION OF DRAWINGS

The accompanying drawings are not intended to be drawn to scale. In the drawings, each identical or nearly identical component that is illustrated in various figures is represented by a like numeral. For purposes of clarity, not every component may be labeled in every drawing. In the drawings:

FIG. 1 is a schematic depicting the native Cannabis biosynthetic pathway for production of cannabinoid compounds, including five enzymatic steps mediated by: (R1a) acyl activating enzymes (AAE); (R2a) olivetol synthase enzymes (OLS); (R3a) olivetolic acid cyclase enzymes (OAC); (R4a) prenyltransferase enzymes (PT); and (R5a) terminal synthase enzymes (TS). Formulae 1a-11a correspond to hexanoic acid (1a), hexanoyl-CoA (2a), malonyl-CoA (3a), 3,5,7-trioxododecanoyl-CoA (4a), olivetol (5a), olivetolic acid (6a), geranyl pyrophosphate (7a), cannabigerolic acid (8a), cannabidiolic acid (9a), tetrahydrocannabinolic acid (10a), and cannabichromenic acid (11a). Hexanoic acid is an exemplary carboxylic acid substrate; other carboxylic acids may also be used (e.g., butyric acid, isovaleric acid, octanoic acid, decanoic acid, etc.; see, e.g., FIG. 3 below). The enzymes that catalyze the synthesis of 3,5,7-trioxododecanoyl-CoA and olivetolic acid are shown in R2a and R3a, respectively, and can include multi-functional enzymes that catalyze the synthesis of 3,5,7-trioxododecanoyl-CoA and olivetolic acid. The enzymes cannabidiolic acid synthase (CBDAS), tetrahydrocannabinolic acid synthase (THCAS), and cannabichromenic acid synthase (CBCAS) that catalyze the synthesis of cannabidiolic acid, tetrahydrocannabinolic acid, and cannabichromenic acid, respectively, are shown in step R5a. FIG. 1 is adapted from Carvalho et al. “Designing Microorganisms for Heterologous Biosynthesis of Cannabinoids” (2017) FEMS Yeast Research June 1;17(4), which is incorporated by reference in its entirety.

FIG. 2 is a schematic depicting a heterologous biosynthetic pathway for production of cannabinoid compounds, including five enzymatic steps mediated by: (R1) acyl activating enzymes (AAE); (R2) polyketide synthase enzymes (PKS) or bifunctional polyketide synthase-polyketide cyclase enzymes (PKS-PKC); (R3) polyketide cyclase enzymes (PKC) or bifunctional PKS-PKC enzymes; (R4) prenyltransferase enzymes (PT); and (R5) terminal synthase enzymes (TS). Any carboxylic acid of varying chain lengths, structures (e.g., aliphatic, alicyclic, or aromatic) and functionalization (e.g., hydroxylic-, keto-, amino-, thiol-, aryl-, or halogeno-) may also be used as precursor substrates (e.g., thiopropionic acid, hydroxy phenyl acetic acid, norleucine, bromodecanoic acid, butyric acid, isovaleric acid, octanoic acid, decanoic acid, etc.).

FIG. 3 is a non-exclusive representation of select putative precursors for the cannabinoid pathway in FIG. 2.

FIG. 4 is a schematic showing a reaction catalyzed by a PT enzyme wherein Olivetolic Acid (OA, Formula (6a)) and Geranyl Pyrophosphate (GPP, Formula (7a)) are condensed to form either the major cannabinoid Cannabigerolic Acid (CBGA, Formula (8a)) or 2-O-Geranyl Olivetolic Acid (OGOA, Formula (8b)).

FIGS. 5A-5B depict 3-D structural models showing regions that were targeted for mutagenesis in a representative C. sativa PT (CsPT) protein. FIG. 5A depicts an approach whereby point mutations were generated at locations (depicted in black) spread throughout the whole sequence of a CsPT protein based on bioinformatics analysis. FIG. 5B depicts an approach whereby point mutations were focused within regions (depicted in black) near the active site of a representative CsPT protein. The active site is located around the pair of Mg2+ ions (depicted as spheres) and GPP substrate (depicted as sticks).

FIG. 6 depicts the crystal structure of the PT AfUbiA from A. fulgidus (corresponding to PDB ID 4TQ3; UniProt Accession No. O28625).

FIGS. 7A-7B depict approaches used to generate chimeras involving CsPT enzymes. FIG. 7A depicts an example of a “within membrane” approach for generating chimeras in which the cross-over points between different CsPT proteins occur within the membrane. FIG. 7B depicts an example of a “through membrane” approach for generating chimeras in which there is a single cross-over point between two helices. In the example shown in FIG. 7B, the cross-over point is between helices 6 and 7 of the CsPT protein.

FIG. 8 is a schematic showing a plasmid bearing the transcriptional unit encoding each PT. The coding sequence for the PT enzymes (labeled “Library gene”) was driven by the GAL1 promoter. The plasmid contains markers for both yeast (URA3) and bacteria (ampR), as well as origins of replication for yeast (2 micron), and bacteria (pBR322).

FIGS. 9A-9B depict graphs showing secondary screening activity data of PT enzymes, including point mutations, chimeric PTs, and PT fusion proteins based on an in vivo activity assay in S. cerevisiae described in Example 2. FIG. 9A depicts results for CBGA production and FIG. 9B depicts results for CBGVA production. Strain t444508, expressing a truncated CsPT4 protein (SEQ ID NO: 5), was used as a positive control and for determining hit ranking of the library members. Strain t444525, expressing GFP, was used as a negative control. The data represent the average of four bioreplicates ± one standard deviation of the mean. Strain IDs and their corresponding activity from these graphs are shown in Table 5.

FIGS. 10A-10B depict graphs showing secondary screening activity data of PT fusion proteins based on an in vivo activity assay in S. cerevisiae described in Example 2. FIG. 10A depicts results for CBGA production, and FIG. 10B depicts results for CBGVA production. The data represent the average of four bioreplicates ± one standard deviation of the mean. Strain IDs and their corresponding activity from these graphs are shown in Table 5.

FIGS. 11A-11B depict graphs showing secondary screening activity data of chimeric PTs based on an in vivo activity assay in S. cerevisiae described in Example 2. FIG. 11A depicts results for CBGA production, and FIG. 11B depicts results for CBGVA production. Strain IDs and their corresponding activity from these graphs are shown in Table 5.

FIGS. 12A-12B depict graphs showing activity data from a second-generation library of chimeric PTs and chimeric fusion proteins (Gen 2 library) based on an in vivo activity assay in S. cerevisiae described in Examples 3-4. Strain t612212, expressing a truncated CsPT4 protein (SEQ ID NO: 5), was used as a positive control and for determining hit ranking of the library members. FIG. 12A depicts results for CBGA production in the presence of 1 mM olivetolic acid (OA), and FIG. 12B depicts results for CBGVA production in the presence of 1 mM divaric acid (DA). Strain IDs and their corresponding activity from these graphs are shown in Table 7.

FIG. 13 depicts a graph showing activity data from a third-generation library of chimeric fusion proteins (Gen 3 PT library) for CBGA production based on an in vivo activity assay in S. cerevisiae described in Example 5. Strain t704346, which comprises an ERG20ww-CsPT chimera identified in Example 4, was used as a benchmark for determining hit ranking of the library members. Strain IDs and their corresponding activity from this graph are shown in Table 8.

FIG. 14 depicts a graph showing library screening activity data of chimeric fusions including ERG20 homologs based on an in vivo activity assay for CBGA production in S. cerevisiae described in Example 6. Strains t756346 and t56349 were used as positive controls. Strain IDs and their corresponding activity from this graph are shown in Table 9.

FIGS. 15A-15B depict graphs showing activity data from a fourth-generation library of chimeric PTs (Gen 4 library) based on an in vivo activity assay in S. cerevisiae described in Example 7. The Gen 4 library contained chimeric PTs from strains t523834 (SEQ ID NO: 114, corresponding to a CsPT1-CsPT4 chimera) and t524816 (SEQ ID NO: 116, corresponding to a CsPT4-CsPT7 chimera), described in Examples 1 and 2, which were modified to include point mutations characterized in Example 1. FIG. 15A depicts results for CBGA production and FIG. 15B depicts results for CBGVA production. Strain t827885, expressing a chimeric PT corresponding to SEQ ID NO: 324, was used as a positive control and for determining hit ranking of the library members. Strain t819232, expressing RFP, was used as a negative control. The data represent the average of four bioreplicates ± one standard deviation of the mean. Strain IDs and their corresponding activity from these graphs are shown in Table 11.

FIG. 16 depicts a graph showing activity data from a fifth-generation library of chimeric PTs (Gen 5 library) based on an in vivo activity assay in S. cerevisiae described in Example 8. The Gen 5 library contained chimeric PTs from the Gen 4 library described in Example 7 that were modified to include additional point mutations. Strain t819140, expressing RFP, was used as a negative control. Strains t818980 and t819132 were used as positive controls. Strain IDs and their corresponding activity from this graph are shown in Table 12.

DETAILED DESCRIPTION

This disclosure provides methods for production of cannabinoids and cannabinoid precursors from fatty acid substrates using genetically modified host cells. Methods include heterologous expression of a prenyltransferase (PT). The application describes the identification of multiple PTs that can be functionally expressed in host cells such as S. cerevisiae cells. As demonstrated in Examples 1-8, synthetic chimeric PTs were generated that contain portions of different C. sativa PT proteins. Surprisingly, chimeric PTs, and fusion proteins including chimeric PTs, were identified that were capable of producing more cannabigerolic acid (CBGA) and/or cannabigerovarinic acid (CBGVA) than CsPT4.

Definitions

While the following terms are believed to be well understood by one of ordinary skill in the art, the following definitions are set forth to facilitate explanation of the disclosed subject matter.

The term “a” or “an” refers to one or more of an entity, i.e., can identify a referent as plural. Thus, the terms “a” or “an,” “one or more” and “at least one” are used interchangeably in this application. In addition, reference to “an element” by the indefinite article “a” or “an” does not exclude the possibility that more than one of the elements is present, unless the context clearly requires that there is one and only one of the elements.

The terms “microorganism” or “microbe” should be taken broadly. These terms are used interchangeably and include, but are not limited to, the two prokaryotic domains, Bacteria and Archaea, as well as certain eukaryotic fungi and protists. In some embodiments, the disclosure may refer to the “microorganisms” or “microbes” of lists/tables and figures present in the disclosure. This characterization can refer to not only the identified taxonomic genera of the tables and figures, but also the identified taxonomic species, as well as the various novel and newly identified or designed strains of any organism in the tables or figures. The same characterization holds true for the recitation of these terms in other parts of the specification, such as in the Examples.

The term “prokaryotes” is recognized in the art and refers to cells that contain no nucleus or other cell organelles. The prokaryotes are generally classified in one of two domains, the Bacteria and the Archaea.

“Bacteria” or “eubacteria” refers to a domain of prokaryotic organisms. Bacteria include at least 11 distinct groups as follows: (1) Gram-positive (gram+) bacteria, of which there are two major subdivisions: (a) high G+C group (Actinomycetes, Mycobacteria, Micrococcus, others) and (b) low G+C group (Bacillus, Clostridia, Lactobacillus, Staphylococci, Streptococci, Mycoplasmas); (2) Proteobacteria, e.g., Purple photosynthetic+non-photosynthetic Gram-negative bacteria (includes most “common” Gram-negative bacteria); (3) Cyanobacteria, e.g., oxygenic phototrophs; (4) Spirochetes and related species; (5) Planctomyces; (6) Bacteroides, Flavobacteria; (7) Chlamydia; (8) Green sulfur bacteria; (9) Green non-sulfur bacteria (also anaerobic phototrophs); (10) Radioresistant micrococci and relatives; and (11) Thermotoga and Thermosipho thermophiles.

The term “Archaea” refers to a taxonomic classification of prokaryotic organisms with certain properties that make them distinct from Bacteria in physiology and phylogeny.

The term “Cannabis” refers to a genus in the family Cannabaceae. Cannabis is a dioecious plant. Glandular structures located on female flowers of Cannabis, called trichomes, accumulate relatively high amounts of a class of terpeno-phenolic compounds known as phytocannabinoids (described in further detail below). Cannabis has conventionally been cultivated for production of fibre and seed (commonly referred to as “hemp-type”), or for production of intoxicants (commonly referred to as “drug-type”). In drug-type Cannabis, the trichomes contain relatively high amounts of tetrahydrocannabinolic acid (THCA), which can convert to tetrahydrocannabinol (THC) via a decarboxylation reaction, for example upon combustion of dried Cannabis flowers, to provide an intoxicating effect. Drug-type Cannabis often contains other cannabinoids in lesser amounts. In contrast, hemp-type Cannabis contains relatively low concentrations of THCA, often less than 0.3% THC by dry weight. Hemp-type Cannabis may contain non-THC and non-THCA cannabinoids, such as cannabidiolic acid (CBDA), cannabidiol (CBD), and other cannabinoids. Presently, there is a lack of consensus regarding the taxonomic organization of the species within the genus. Unless context dictates otherwise, the term “Cannabis” is intended to include all putative species within the genus, such as, without limitation, Cannabis sativa, Cannabis indica, and Cannabis ruderalis and without regard to whether the Cannabis is hemp-type or drug-type.

The term “cyclase activity” in reference to a polyketide synthase (PKS) enzyme (e.g., an olivetol synthase (OLS) enzyme) or a polyketide cyclase (PKC) enzyme (e.g., an olivetolic acid cyclase (OAC) enzyme), refers to the activity of catalyzing the cyclization of an oxo fatty acyl-CoA (e.g., 3,5,7-trioxododecanoyl-CoA, 3,5,7-trioxodecanoyl-CoA) to the corresponding intramolecular cyclization product (e.g., olivetolic acid, divarinic acid). In some embodiments, the PKS or PKC catalyzes the C2-C7 aldol condensation of an acyl-CoA with three additional ketide moieties added thereto.

A “cytosolic” or “soluble” enzyme refers to an enzyme that is predominantly localized (or predicted to be localized) in the cytosol of a host cell.

A “eukaryote” is any organism whose cells contain a nucleus and other organelles enclosed within membranes. Eukaryotes belong to the taxon Eukarya or Eukaryota. The defining feature that sets eukaryotic cells apart from prokaryotic cells (i.e., bacteria and archaea) is that they have membrane-bound organelles, especially the nucleus, which contains the genetic material, and is enclosed by the nuclear envelope.

The term “host cell” refers to a cell that can be used to express a polynucleotide, such as a polynucleotide that encodes an enzyme used in biosynthesis of cannabinoids or cannabinoid precursors. The terms “genetically modified host cell,” “recombinant host cell,” and “recombinant strain” are used interchangeably and refer to host cells that have been genetically modified by, e.g., cloning and transformation methods, or by other methods known in the art (e.g., selective editing methods, such as CRISPR). Thus, the terms include a host cell (e.g., bacterial cell, yeast cell, fungal cell, insect cell, plant cell, mammalian cell, human cell, etc.) that has been genetically altered, modified, or engineered, so that it exhibits an altered, modified, or different genotype and/or phenotype, as compared to the naturally-occurring cell from which it was derived. It is understood that in some embodiments, the terms refer not only to the particular recombinant host cell in question, but also to the progeny or potential progeny of such a host cell.

The term “control host cell,” or the term “control” when used in relation to a host cell, refers to an appropriate comparator host cell for determining the effect of a genetic modification or experimental treatment. In some embodiments, the control host cell is a wild type cell. In other embodiments, a control host cell is genetically identical to the genetically modified host cell, except for the genetic modification(s) differentiating the genetically modified or experimental treatment host cell. In some embodiments, the control host cell has been genetically modified to express a wild type or otherwise known variant of an enzyme being tested for activity in other test host cells.

The term “heterologous” with respect to a polynucleotide, such as a polynucleotide comprising a gene, is used interchangeably with the term “exogenous” and the term “recombinant” and refers to: a polynucleotide that has been artificially supplied to a biological system; a polynucleotide that has been modified within a biological system, or a polynucleotide whose expression or regulation has been manipulated within a biological system. A heterologous polynucleotide that is introduced into or expressed in a host cell may be a polynucleotide that comes from a different organism or species from the host cell, or may be a synthetic polynucleotide, or may be a polynucleotide that is also endogenously expressed in the same organism or species as the host cell. For example, a polynucleotide that is endogenously expressed in a host cell may be considered heterologous when it is situated non-naturally in the host cell; expressed recombinantly in the host cell, either stably or transiently; modified within the host cell; selectively edited within the host cell; expressed in a copy number that differs from the naturally occurring copy number within the host cell; or expressed in a non-natural way within the host cell, such as by manipulating regulatory regions that control expression of the polynucleotide. In some embodiments, a heterologous polynucleotide is a polynucleotide that is endogenously expressed in a host cell but whose expression is driven by a promoter that does not naturally regulate expression of the polynucleotide. In other embodiments, a heterologous polynucleotide is a polynucleotide that is endogenously expressed in a host cell and whose expression is driven by a promoter that does naturally regulate expression of the polynucleotide, but the promoter or another regulatory region is modified. In some embodiments, the promoter is recombinantly activated or repressed. 
For example, gene-editing based techniques may be used to regulate expression of a polynucleotide, including an endogenous polynucleotide, from a promoter, including an endogenous promoter. See, e.g., Chavez et al., Nat Methods. 2016 July; 13(7): 563-567. A heterologous polynucleotide may comprise a wild-type sequence or a mutant sequence as compared with a reference polynucleotide sequence.

The term “at least a portion” or “at least a fragment” of a nucleic acid or polypeptide means a portion having the minimal size characteristics of such sequences, or any larger fragment of the full length molecule, up to and including the full length molecule. A fragment of a polynucleotide of the disclosure may encode a biologically active portion of an enzyme, such as a catalytic domain. A biologically active portion of a genetic regulatory element may comprise a portion or fragment of a full length genetic regulatory element and have the same type of activity as the full length genetic regulatory element, although the level of activity of the biologically active portion of the genetic regulatory element may vary compared to the level of activity of the full length genetic regulatory element.

A coding sequence and a regulatory sequence are said to be “operably joined” or “operably linked” when the coding sequence and the regulatory sequence are covalently linked and the expression or transcription of the coding sequence is under the influence or control of the regulatory sequence. If the coding sequence is to be translated into a functional protein, the coding sequence and the regulatory sequence are said to be operably joined if induction of a promoter in the 5′ regulatory sequence promotes transcription of the coding sequence and if the nature of the linkage between the coding sequence and the regulatory sequence does not (1) result in the introduction of a frame-shift mutation, (2) interfere with the ability of the promoter region to direct the transcription of the coding sequence, or (3) interfere with the ability of the corresponding RNA transcript to be translated into a protein.

The terms “link,” “linked,” or “linkage” mean that two entities (e.g., two polynucleotides or two proteins) are bound to one another by any physicochemical means. Any linkage known to those of ordinary skill in the art, covalent or non-covalent, is embraced. In some embodiments, a nucleic acid sequence encoding an enzyme of the disclosure is linked to a nucleic acid encoding a signal peptide. In some embodiments, an enzyme of the disclosure is linked to a signal peptide. Linkage can be direct or indirect.

The terms “transformed” or “transform” with respect to a host cell refer to a host cell in which one or more nucleic acids have been introduced, for example on a plasmid or vector or by integration into the genome. In some instances where one or more nucleic acids are introduced into a host cell on a plasmid or vector, one or more of the nucleic acids, or fragments thereof, may be retained in the cell, such as by integration into the genome of the cell, while the plasmid or vector itself may be removed from the cell. In such instances, the host cell is considered to be transformed with the nucleic acids that were introduced into the cell regardless of whether the plasmid or vector is retained in the cell or not.

The term “volumetric productivity” or “production rate” refers to the amount of product formed per volume of medium per unit of time. Volumetric productivity can be reported in gram per liter per hour (g/L/h).

The term “specific productivity” of a product refers to the rate of formation of the product normalized by unit volume or mass or biomass and has the physical dimension of a quantity of substance per unit time per unit mass or volume [M·T⁻¹·M⁻¹ or M·T⁻¹·L⁻³, where M is mass or moles, T is time, L is length].

The term “biomass specific productivity” refers to the specific productivity in gram product per gram of cell dry weight (CDW) per hour (g/g CDW/h) or in mmol of product per gram of cell dry weight (CDW) per hour (mmol/g CDW/h). Using the relation of CDW to OD600 for the given microorganism, specific productivity can also be expressed as gram product per liter culture medium per optical density of the culture broth at 600 nm (OD) per hour (g/L/h/OD). Also, if the elemental composition of the biomass is known, biomass specific productivity can be expressed in mmol of product per C-mole (carbon mole) of biomass per hour (mmol/C-mol/h).
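By way of non-limiting illustration, the productivity measures defined above may be computed as in the following minimal sketch. The function names and the numerical inputs are hypothetical and are chosen only to show the units involved; they are not measurements from this disclosure.

```python
def volumetric_productivity(product_g: float, volume_l: float, hours: float) -> float:
    """Amount of product per volume of medium per unit time (g/L/h)."""
    return product_g / volume_l / hours


def biomass_specific_productivity(product_g: float, cdw_g: float, hours: float) -> float:
    """Amount of product per gram of cell dry weight per hour (g/g CDW/h)."""
    return product_g / cdw_g / hours


def productivity_per_od(product_g: float, volume_l: float, hours: float, od600: float) -> float:
    """Productivity per liter per optical density unit at 600 nm per hour (g/L/h/OD)."""
    return volumetric_productivity(product_g, volume_l, hours) / od600


# Hypothetical run: 12 g of product formed in 2 L of medium over 24 h,
# with 8 g of cell dry weight and a culture OD600 of 10.
vp = volumetric_productivity(12.0, 2.0, 24.0)        # g/L/h
bsp = biomass_specific_productivity(12.0, 8.0, 24.0)  # g/g CDW/h
per_od = productivity_per_od(12.0, 2.0, 24.0, 10.0)   # g/L/h/OD
```

The same product amount yields different values depending on the normalization basis, which is why the basis (volume, cell dry weight, or optical density) should be stated alongside any reported productivity.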

The term “yield” refers to the amount of product obtained per unit weight of a certain substrate and may be expressed as g product per g substrate (g/g) or moles of product per mole of substrate (mol/mol). Yield may also be expressed as a percentage of the theoretical yield. “Theoretical yield” is defined as the maximum amount of product that can be generated per a given amount of substrate as dictated by the stoichiometry of the metabolic pathway used to make the product and may be expressed as g product per g substrate (g/g) or moles of product per mole of substrate (mol/mol).
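As a non-limiting sketch of the relationship between yield and theoretical yield, the following computes a mass yield and expresses it as a percentage of a theoretical yield. The function names and all numerical values are hypothetical; in practice the theoretical yield would be derived from the stoichiometry of the particular metabolic pathway.

```python
def mass_yield(product_g: float, substrate_g: float) -> float:
    """Yield as g product per g substrate (g/g)."""
    if substrate_g <= 0:
        raise ValueError("substrate amount must be positive")
    return product_g / substrate_g


def percent_of_theoretical(actual_yield: float, theoretical_yield: float) -> float:
    """Actual yield as a percentage of the theoretical yield (same units for both)."""
    if theoretical_yield <= 0:
        raise ValueError("theoretical yield must be positive")
    return 100.0 * actual_yield / theoretical_yield


# Hypothetical example: 5 g of product from 100 g of substrate,
# compared against an assumed theoretical yield of 0.25 g/g.
actual = mass_yield(5.0, 100.0)
pct = percent_of_theoretical(actual, 0.25)
```

Both yields must be expressed on the same basis (g/g or mol/mol) before the percentage is taken.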

The term “titer” refers to the strength of a solution or the concentration of a substance in solution. For example, the titer of a product of interest (e.g., small molecule, peptide, synthetic compound, fuel, alcohol, etc.) in a fermentation broth is described as g of product of interest in solution per liter of fermentation broth or cell-free broth (g/L) or as g of product of interest in solution per kg of fermentation broth or cell-free broth (g/kg).

The term “total titer” refers to the sum of all products of interest produced in a process, including but not limited to the products of interest in solution, the products of interest in gas phase if applicable, and any products of interest removed from the process and recovered relative to the initial volume in the process or the operating volume in the process. For example, the total titer of products of interest (e.g., small molecule, peptide, synthetic compound, fuel, alcohol, etc.) in a fermentation broth is described as g of products of interest in solution per liter of fermentation broth or cell-free broth (g/L) or as g of products of interest in solution per kg of fermentation broth or cell-free broth (g/kg).

The term “amino acid” refers to organic compounds that comprise an amino group, —NH2, and a carboxyl group, —COOH. The term “amino acid” includes both naturally occurring and unnatural amino acids. Nomenclature for the twenty common amino acids is as follows: alanine (ala or A); arginine (arg or R); asparagine (asn or N); aspartic acid (asp or D); cysteine (cys or C); glutamine (gln or Q); glutamic acid (glu or E); glycine (gly or G); histidine (his or H); isoleucine (ile or I); leucine (leu or L); lysine (lys or K); methionine (met or M); phenylalanine (phe or F); proline (pro or P); serine (ser or S); threonine (thr or T); tryptophan (trp or W); tyrosine (tyr or Y); and valine (val or V). Non-limiting examples of unnatural amino acids include homo-amino acids, proline and pyruvic acid derivatives, 3-substituted alanine derivatives, glycine derivatives, ring-substituted phenylalanine derivatives, ring-substituted tyrosine derivatives, linear core amino acids, amino acids with protecting groups including Fmoc, Boc, and Cbz, β-amino acids (β3 and β2), and N-methyl amino acids.
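For illustration only, the three-letter and one-letter nomenclature listed above for the twenty common amino acids may be represented as a simple lookup table. The names `THREE_TO_ONE` and `to_one_letter` are hypothetical helpers introduced for this sketch, and unnatural amino acids are not covered.

```python
# Three-letter to one-letter codes for the twenty common amino acids,
# as listed in the definition above.
THREE_TO_ONE = {
    "ala": "A", "arg": "R", "asn": "N", "asp": "D", "cys": "C",
    "gln": "Q", "glu": "E", "gly": "G", "his": "H", "ile": "I",
    "leu": "L", "lys": "K", "met": "M", "phe": "F", "pro": "P",
    "ser": "S", "thr": "T", "trp": "W", "tyr": "Y", "val": "V",
}


def to_one_letter(residues):
    """Convert a sequence of three-letter codes to a one-letter string.

    Raises KeyError for any code outside the twenty common amino acids.
    """
    return "".join(THREE_TO_ONE[r.lower()] for r in residues)


peptide = to_one_letter(["Met", "Ala", "Gly"])
```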

The term “aliphatic” refers to alkyl, alkenyl, alkynyl, and carbocyclic groups. Likewise, the term “heteroaliphatic” refers to heteroalkyl, heteroalkenyl, heteroalkynyl, and heterocyclic groups.

The term “alkyl” refers to a radical of, or a substituent that is, a straight-chain or branched saturated hydrocarbon group having from 1 to 20 carbon atoms (“C_1-20alkyl”). In certain embodiments, the term “alkyl” refers to a radical of, or a substituent that is, a straight-chain or branched saturated hydrocarbon group having from 1 to 10 carbon atoms (“C_1-10alkyl”). In some embodiments, an alkyl group has 1 to 9 carbon atoms (“C_1-9alkyl”). In some embodiments, an alkyl group has 1 to 8 carbon atoms (“C_1-8alkyl”). In some embodiments, an alkyl group has 1 to 7 carbon atoms (“C_1-7alkyl”). In some embodiments, an alkyl group has 2 to 7 carbon atoms (“C_2-7alkyl”). In some embodiments, an alkyl group has 3 to 7 carbon atoms (“C_3-7alkyl”). In some embodiments, an alkyl group has 1 to 6 carbon atoms (“C_1-6alkyl”). In some embodiments, an alkyl group has 2 to 6 carbon atoms (“C_2-6alkyl”). In some embodiments, an alkyl group has 3 to 5 carbon atoms (“C_3-5alkyl”). In some embodiments, an alkyl group has 5 carbon atoms (“C₅alkyl”). In some embodiments, the alkyl group has 3 carbon atoms (“C₃alkyl”). In some embodiments, the alkyl group has 7 carbon atoms (“C₇alkyl”). In some embodiments, an alkyl group has 1 to 5 carbon atoms (“C_1-5alkyl”). In some embodiments, an alkyl group has 1 to 4 carbon atoms (“C_1-4alkyl”). In some embodiments, an alkyl group has 1 to 3 carbon atoms (“C_1-3alkyl”). In some embodiments, an alkyl group has 1 to 2 carbon atoms (“C_1-2alkyl”). In some embodiments, an alkyl group has 1 carbon atom (“C₁alkyl”).

Examples of C_1-6alkyl groups include methyl (C₁), ethyl (C₂), propyl (C₃) (e.g., n-propyl, isopropyl), butyl (C₄) (e.g., n-butyl, tert-butyl, sec-butyl, iso-butyl), pentyl (C₅) (e.g., n-pentyl, 3-pentanyl, amyl, neopentyl, 3-methyl-2-butanyl, tertiary amyl), and hexyl (C₆) (e.g., n-hexyl). Additional examples of alkyl groups include n-heptyl (C₇), n-octyl (C₈), and the like. Unless otherwise specified, each instance of an alkyl group is independently unsubstituted (an “unsubstituted alkyl”) or substituted (a “substituted alkyl”) with one or more substituents (e.g., halogen, such as F). In certain embodiments, the alkyl group is an unsubstituted C_1-10alkyl (such as unsubstituted C_1-6alkyl, e.g., —CH₃(Me), unsubstituted ethyl (Et), unsubstituted propyl (Pr, e.g., unsubstituted n-propyl (n-Pr), unsubstituted isopropyl (i-Pr)), unsubstituted butyl (Bu, e.g., unsubstituted n-butyl (n-Bu), unsubstituted tert-butyl (tert-Bu or t-Bu), unsubstituted sec-butyl (sec-Bu), unsubstituted isobutyl (i-Bu)). In certain embodiments, the alkyl group is a substituted C_1-10alkyl (such as substituted C_1-6alkyl, e.g., —CF₃, benzyl).

The term “acyl” refers to a group having the general formula —C(═O)R^X1, —C(═O)OR^X1, —C(═O)—O—C(═O)R^X1, —C(═O)SR^X1, —C(═O)N(R^X1)₂, —C(═S)R^X1, —C(═S)N(R^X1)₂, and —C(═S)S(R^X1), —C(═NR^X1)R^X1, —C(═NR^X1)OR^X1, —C(═NR^X1)SR^X1, and —C(═NR^X1)N(R^X1)₂, wherein R^X1 is hydrogen; halogen; substituted or unsubstituted hydroxyl; substituted or unsubstituted thiol; substituted or unsubstituted amino; substituted or unsubstituted acyl, cyclic or acyclic, substituted or unsubstituted, branched or unbranched aliphatic; cyclic or acyclic, substituted or unsubstituted, branched or unbranched heteroaliphatic; cyclic or acyclic, substituted or unsubstituted, branched or unbranched alkyl; cyclic or acyclic, substituted or unsubstituted, branched or unbranched alkenyl; substituted or unsubstituted alkynyl; substituted or unsubstituted aryl, substituted or unsubstituted heteroaryl, aliphaticoxy, heteroaliphaticoxy, alkyloxy, heteroalkyloxy, aryloxy, heteroaryloxy, aliphaticthioxy, heteroaliphaticthioxy, alkylthioxy, heteroalkylthioxy, arylthioxy, heteroarylthioxy, mono- or di-aliphaticamino, mono- or di-heteroaliphaticamino, mono- or di-alkylamino, mono- or di-heteroalkylamino, mono- or di-arylamino, or mono- or di-heteroarylamino; or two R^X1 groups taken together form a 5- to 6-membered heterocyclic ring. Exemplary acyl groups include aldehydes (—CHO), carboxylic acids (—CO₂H), ketones, acyl halides, esters, amides, imines, carbonates, carbamates, and ureas. 
Acyl substituents include, but are not limited to, any of the substituents described in this application that result in the formation of a stable moiety (e.g., aliphatic, alkyl, alkenyl, alkynyl, heteroaliphatic, heterocyclic, aryl, heteroaryl, acyl, oxo, imino, thiooxo, cyano, isocyano, amino, azido, nitro, hydroxyl, thiol, halo, aliphaticamino, heteroaliphaticamino, alkylamino, heteroalkylamino, arylamino, heteroarylamino, alkylaryl, arylalkyl, aliphaticoxy, heteroaliphaticoxy, alkyloxy, heteroalkyloxy, aryloxy, heteroaryloxy, aliphaticthioxy, heteroaliphaticthioxy, alkylthioxy, heteroalkylthioxy, arylthioxy, heteroarylthioxy, acyloxy, and the like, each of which may or may not be further substituted).

“Alkenyl” refers to a radical of, or a substituent that is, a straight-chain or branched hydrocarbon group having from 2 to 20 carbon atoms, one or more carbon-carbon double bonds, and no triple bonds (“C_2-20alkenyl”). In some embodiments, an alkenyl group has 2 to 10 carbon atoms (“C_2-10alkenyl”). In some embodiments, an alkenyl group has 2 to 9 carbon atoms (“C_2-9alkenyl”). In some embodiments, an alkenyl group has 2 to 8 carbon atoms (“C_2-8alkenyl”). In some embodiments, an alkenyl group has 2 to 7 carbon atoms (“C_2-7alkenyl”). In some embodiments, an alkenyl group has 2 to 6 carbon atoms (“C_2-6alkenyl”). In some embodiments, an alkenyl group has 2 to 5 carbon atoms (“C_2-5alkenyl”). In some embodiments, an alkenyl group has 2 to 4 carbon atoms (“C_2-4alkenyl”). In some embodiments, an alkenyl group has 2 to 3 carbon atoms (“C_2-3alkenyl”). In some embodiments, an alkenyl group has 2 carbon atoms (“C₂alkenyl”). The one or more carbon-carbon double bonds can be internal (such as in 2-butenyl) or terminal (such as in 1-butenyl). Examples of C_2-4alkenyl groups include ethenyl (C₂), 1-propenyl (C₃), 2-propenyl (C₃), 1-butenyl (C₄), 2-butenyl (C₄), butadienyl (C₄), and the like. Examples of C_2-6alkenyl groups include the aforementioned C_2-4alkenyl groups as well as pentenyl (C₅), pentadienyl (C₅), hexenyl (C₆), and the like. Additional examples of alkenyl include heptenyl (C₇), octenyl (C₈), octatrienyl (C₈), and the like. Unless otherwise specified, each instance of an alkenyl group is independently optionally substituted, i.e., unsubstituted (an “unsubstituted alkenyl”) or substituted (a “substituted alkenyl”) with one or more substituents. In certain embodiments, the alkenyl group is unsubstituted C_2-10alkenyl. In certain embodiments, the alkenyl group is substituted C_2-10alkenyl.

“Alkynyl” refers to a radical of, or a substituent that is, a straight-chain or branched hydrocarbon group having from 2 to 20 carbon atoms, one or more carbon-carbon triple bonds, and optionally one or more double bonds (“C_2-20alkynyl”). In some embodiments, an alkynyl group has 2 to 10 carbon atoms (“C_2-10alkynyl”). In some embodiments, an alkynyl group has 2 to 9 carbon atoms (“C_2-9alkynyl”). In some embodiments, an alkynyl group has 2 to 8 carbon atoms (“C_2-8alkynyl”). In some embodiments, an alkynyl group has 2 to 7 carbon atoms (“C_2-7alkynyl”). In some embodiments, an alkynyl group has 2 to 6 carbon atoms (“C_2-6alkynyl”). In some embodiments, an alkynyl group has 2 to 5 carbon atoms (“C_2-5alkynyl”). In some embodiments, an alkynyl group has 2 to 4 carbon atoms (“C_2-4alkynyl”). In some embodiments, an alkynyl group has 2 to 3 carbon atoms (“C_2-3alkynyl”). In some embodiments, an alkynyl group has 2 carbon atoms (“C₂alkynyl”). The one or more carbon-carbon triple bonds can be internal (such as in 2-butynyl) or terminal (such as in 1-butynyl). Examples of C_2-4alkynyl groups include, without limitation, ethynyl (C₂), 1-propynyl (C₃), 2-propynyl (C₃), 1-butynyl (C₄), 2-butynyl (C₄), and the like. Examples of C_2-6alkynyl groups include the aforementioned C_2-4alkynyl groups as well as pentynyl (C₅), hexynyl (C₆), and the like. Additional examples of alkynyl include heptynyl (C₇), octynyl (C₈), and the like. Unless otherwise specified, each instance of an alkynyl group is independently optionally substituted, i.e., unsubstituted (an “unsubstituted alkynyl”) or substituted (a “substituted alkynyl”) with one or more substituents. In certain embodiments, the alkynyl group is unsubstituted C_2-10alkynyl. In certain embodiments, the alkynyl group is substituted C_2-10alkynyl.

“Carbocyclyl” or “carbocyclic” refers to a radical of a non-aromatic cyclic hydrocarbon group having from 3 to 10 ring carbon atoms (“C_3-10carbocyclyl”) and zero heteroatoms in the non-aromatic ring system. In some embodiments, a carbocyclyl group has 3 to 8 ring carbon atoms (“C_3-8carbocyclyl”). In some embodiments, a carbocyclyl group has 3 to 6 ring carbon atoms (“C_3-6carbocyclyl”). In some embodiments, a carbocyclyl group has 5 to 10 ring carbon atoms (“C_5-10carbocyclyl”). Exemplary C_3-6carbocyclyl groups include, without limitation, cyclopropyl (C₃), cyclopropenyl (C₃), cyclobutyl (C₄), cyclobutenyl (C₄), cyclopentyl (C₅), cyclopentenyl (C₅), cyclohexyl (C₆), cyclohexenyl (C₆), cyclohexadienyl (C₆), and the like. Exemplary C_3-8carbocyclyl groups include, without limitation, the aforementioned C_3-6carbocyclyl groups as well as cycloheptyl (C₇), cycloheptenyl (C₇), cycloheptadienyl (C₇), cycloheptatrienyl (C₇), cyclooctyl (C₈), cyclooctenyl (C₈), bicyclo[2.2.1]heptanyl (C₇), bicyclo[2.2.2]octanyl (C₈), and the like. Exemplary C_3-10carbocyclyl groups include, without limitation, the aforementioned C_3-8carbocyclyl groups as well as cyclononyl (C₉), cyclononenyl (C₉), cyclodecyl (C₁₀), cyclodecenyl (C₁₀), octahydro-1H-indenyl (C₉), decahydronaphthalenyl (C₁₀), spiro[4.5]decanyl (C₁₀), and the like. As the foregoing examples illustrate, in certain embodiments, the carbocyclyl group is either monocyclic (“monocyclic carbocyclyl”) or contains a fused, bridged or spiro ring system such as a bicyclic system (“bicyclic carbocyclyl”) and can be saturated or can be partially unsaturated. 
“Carbocyclyl” also includes ring systems wherein the carbocyclic ring, as defined above, is fused with one or more aryl or heteroaryl groups wherein the point of attachment is on the carbocyclic ring, and in such instances, the number of carbons continues to designate the number of carbons in the carbocyclic ring system. Unless otherwise specified, each instance of a carbocyclyl group is independently optionally substituted, i.e., unsubstituted (an “unsubstituted carbocyclyl”) or substituted (a “substituted carbocyclyl”) with one or more substituents. In certain embodiments, the carbocyclyl group is unsubstituted C_3-10carbocyclyl. In certain embodiments, the carbocyclyl group is a substituted C_3-10carbocyclyl.

In some embodiments, “carbocyclyl” is a monocyclic, saturated carbocyclyl group having from 3 to 10 ring carbon atoms (“C_3-10cycloalkyl”). In some embodiments, a cycloalkyl group has 3 to 8 ring carbon atoms (“C_3-8cycloalkyl”). In some embodiments, a cycloalkyl group has 3 to 6 ring carbon atoms (“C_3-6cycloalkyl”). In some embodiments, a cycloalkyl group has 5 to 6 ring carbon atoms (“C_5-6cycloalkyl”). In some embodiments, a cycloalkyl group has 5 to 10 ring carbon atoms (“C_5-10cycloalkyl”). Examples of C_5-6cycloalkyl groups include cyclopentyl (C₅) and cyclohexyl (C₆). Examples of C_3-6cycloalkyl groups include the aforementioned C_5-6cycloalkyl groups as well as cyclopropyl (C₃) and cyclobutyl (C₄). Examples of C_3-8cycloalkyl groups include the aforementioned C_3-6cycloalkyl groups as well as cycloheptyl (C₇) and cyclooctyl (C₈). Unless otherwise specified, each instance of a cycloalkyl group is independently unsubstituted (an “unsubstituted cycloalkyl”) or substituted (a “substituted cycloalkyl”) with one or more substituents. In certain embodiments, the cycloalkyl group is unsubstituted C_3-10cycloalkyl. In certain embodiments, the cycloalkyl group is substituted C_3-10cycloalkyl.

“Aryl” refers to a radical of a monocyclic or polycyclic (e.g., bicyclic or tricyclic) 4n+2 aromatic ring system (e.g., having 6, 10, or 14 pi electrons shared in a cyclic array) having 6-14 ring carbon atoms and zero heteroatoms provided in the aromatic ring system (“C_6-14aryl”). In some embodiments, an aryl group has six ring carbon atoms (“C₆aryl”; e.g., phenyl). In some embodiments, an aryl group has ten ring carbon atoms (“C₁₀aryl”; e.g., naphthyl such as 1-naphthyl and 2-naphthyl). In some embodiments, an aryl group has fourteen ring carbon atoms (“C₁₄aryl”; e.g., anthracyl). “Aryl” also includes ring systems wherein the aryl ring, as defined above, is fused with one or more carbocyclyl or heterocyclyl groups wherein the radical or point of attachment is on the aryl ring, and in such instances, the number of carbon atoms continues to designate the number of carbon atoms in the aryl ring system. Unless otherwise specified, each instance of an aryl group is independently optionally substituted, i.e., unsubstituted (an “unsubstituted aryl”) or substituted (a “substituted aryl”) with one or more substituents. In certain embodiments, the aryl group is unsubstituted C_6-14aryl. In certain embodiments, the aryl group is substituted C_6-14aryl.
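As a non-limiting illustration of the “4n+2” electron count referred to above, the following sketch checks whether a given number of pi electrons can be written as 4n+2 for a non-negative integer n. The function name is hypothetical, and the electron count alone is an arithmetic check only; it does not by itself establish that a ring system is cyclic, planar, and fully conjugated.

```python
def satisfies_4n_plus_2(pi_electrons: int) -> bool:
    """Return True if pi_electrons equals 4n + 2 for some integer n >= 0."""
    return pi_electrons >= 2 and (pi_electrons - 2) % 4 == 0


# The counts named in the definition (6, 10, and 14) satisfy the rule;
# counts such as 4, 8, and 12 do not.
passing = [e for e in range(2, 16) if satisfies_4n_plus_2(e)]
```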

“Aralkyl” is a subset of alkyl and aryl and refers to an optionally substituted alkyl group substituted by an optionally substituted aryl group. In certain embodiments, the aralkyl is optionally substituted benzyl. In certain embodiments, the aralkyl is benzyl. In certain embodiments, the aralkyl is optionally substituted phenethyl. In certain embodiments, the aralkyl is phenethyl. In certain embodiments, the aralkyl is 7-phenylheptanyl. In certain embodiments, the aralkyl is C7 alkyl substituted by an optionally substituted aryl group (e.g., phenyl). In certain embodiments, the aralkyl is a C7-C10 alkyl group substituted by an optionally substituted aryl group (e.g., phenyl).

“Partially unsaturated” refers to a group that includes at least one double or triple bond. A “partially unsaturated” ring system is further intended to encompass rings having multiple sites of unsaturation but is not intended to include aromatic groups (e.g., aryl or heteroaryl groups) as defined in this application. Likewise, “saturated” refers to a group that does not contain a double or triple bond, i.e., contains all single bonds.

The term “optionally substituted” means substituted or unsubstituted.

Alkyl, alkenyl, alkynyl, carbocyclyl, heterocyclyl, aryl, and heteroaryl groups are optionally substituted (e.g., “substituted” or “unsubstituted” alkyl, “substituted” or “unsubstituted” alkenyl, “substituted” or “unsubstituted” alkynyl, “substituted” or “unsubstituted” carbocyclyl, “substituted” or “unsubstituted” heterocyclyl, “substituted” or “unsubstituted” aryl or “substituted” or “unsubstituted” heteroaryl group). In general, the term “substituted,” whether preceded by the term “optionally” or not, means that at least one hydrogen present on a group (e.g., a carbon or nitrogen atom) is replaced with a permissible substituent, e.g., a substituent which upon substitution results in a stable compound, e.g., a compound which does not spontaneously undergo transformation such as by rearrangement, cyclization, elimination, or other reaction. Unless otherwise indicated, a “substituted” group has a substituent at one or more substitutable positions of the group, and when more than one position in any given structure is substituted, the substituent is either the same or different at each position. The term “substituted” is contemplated to include substitution with all permissible substituents of organic compounds, including any of the substituents described in this application that result in the formation of a stable compound. The present invention contemplates any and all such combinations in order to arrive at a stable compound. For purposes of this invention, heteroatoms such as nitrogen may have hydrogen substituents and/or any suitable substituent as described in this application which satisfies the valencies of the heteroatoms and results in the formation of a stable moiety.

Exemplary carbon atom substituents include, but are not limited to, halogen, —CN, —NO₂, —N₃, —SO₂H, —SO₃H, —OH, —OR^aa, —ON(R^bb)₂, —N(R^bb)₂, —N(R^bb)₃⁺X⁻, —N(OR^cc)R^bb, —SH, —SR^aa, —SSR^cc, —C(═O)R^aa, —CO₂H, —CHO, —C(OR^cc)₂, —CO₂R^aa, —OC(═O)R^aa, —OCO₂R^aa, —C(═O)N(R^bb)₂, —OC(═O)N(R^bb)₂, —NR^bbC(═O)R^aa, —NR^bbCO₂R^aa, —NR^bbC(═O)N(R^bb)₂, —C(═NR^bb)R^aa, —C(═NR^bb)OR^aa, —OC(═NR^bb)R^aa, —OC(═NR^bb)OR^aa, —C(═NR^bb)N(R^bb)₂, —OC(═NR^bb)N(R^bb)₂, —NR^bbC(═NR^bb)N(R^bb)₂, —C(═O)NR^bbSO₂R^aa, —NR^bbSO₂R^aa, —SO₂N(R^bb)₂, —SO₂R^aa, —SO₂OR^aa, —OSO₂R^aa, —S(═O)R^aa, —OS(═O)R^aa, —Si(R^aa)₃, —OSi(R^aa)₃, —C(═S)N(R^bb)₂, —C(═O)SR^aa, —C(═S)SR^aa, —SC(═S)SR^aa, —SC(═O)SR^aa, —OC(═O)SR^aa, —SC(═O)OR^aa, —SC(═O)R^aa, —P(═O)(R^aa)₂, —P(═O)(OR^cc)₂, —OP(═O)(R^aa)₂, —OP(═O)(OR^cc)₂, —P(═O)(N(R^bb)₂)₂, —OP(═O)(N(R^bb)₂)₂, —NR^bbP(═O)(R^aa)₂, —NR^bbP(═O)(OR^cc)₂, —NR^bbP(═O)(N(R^bb)₂)₂, —P(R^cc)₂, —P(OR^cc)₂, —P(R^cc)₃⁺X⁻, —P(OR^cc)₃⁺X⁻, —P(R^cc)₄, —P(OR^cc)₄, —OP(R^cc)₂, —OP(R^cc)₃⁺X⁻, —OP(OR^cc)₂, —OP(OR^cc)₃⁺X⁻, —OP(R^cc)₄, —OP(OR^cc)₄, —B(R^aa)₂, —B(OR^cc)₂, —BR^aa(OR^cc), C_1-10alkyl, C_1-10perhaloalkyl, C_2-10alkenyl, C_2-10alkynyl, heteroC_1-10alkyl, heteroC_2-10alkenyl, heteroC_2-10alkynyl, C_3-10carbocyclyl, 3-14 membered heterocyclyl, C_6-14aryl, and 5-14 membered heteroaryl;

- wherein:
  - each instance of R^aais, independently, selected from C^1-10alkyl, C_1-10perhaloalkyl, C_2-10alkenyl, C_2-10alkynyl, heteroC_1-10alkyl, heteroC_2-10alkenyl, heteroC_2-10alkynyl, C_3-10carbocyclyl, 3-14 membered heterocyclyl, C_6-14aryl, and 5-14 membered heteroaryl, or two R^aagroups are joined to form a 3-14 membered heterocyclyl or 5-14 membered heteroaryl ring, wherein each alkyl, alkenyl, alkynyl, heteroalkyl, heteroalkenyl, heteroalkynyl, carbocyclyl, heterocyclyl, aryl, and heteroaryl is independently substituted with 0, 1, 2, 3, 4, or 5 R^ddgroups;
  - each instance of R^bbis, independently, selected from hydrogen, —OH, —OR^aa, —N(R^cc)₂, —CN, —C(═O)R^aa, —C(═O)N(R^cc)₂, —CO₂R^aa, —SO₂R^aa, —C(═NR^cc)OR^aa, —C(═NR^cc)N(R^cc)₂, —SO₂N(R^cc)₂, —SO₂R^cc, —SO₂OR^cc, —SOR^aa, —C(═S)N(R^cc)₂, —C(═O)SR^cc, —C(═S)SR^cc, —P(═O)(R^aa)₂, —P(═O)(OR^cc)₂, —P(═O)(N(R^cc)₂)₂, C_1-10alkyl, C_1-10perhaloalkyl, C_2-10alkenyl, C_2-10alkynyl, heteroC_1-10alkyl, heteroC_2-10alkenyl, heteroC_2-10alkynyl, C_3-10carbocyclyl, 3-14 membered heterocyclyl, C_6-14aryl, and 5-14 membered heteroaryl, or two R^bbgroups are joined to form a 3-14 membered heterocyclyl or 5-14 membered heteroaryl ring, wherein each alkyl, alkenyl, alkynyl, heteroalkyl, heteroalkenyl, heteroalkynyl, carbocyclyl, heterocyclyl, aryl, and heteroaryl is independently substituted with 0, 1, 2, 3, 4, or 5 R^ddgroups; wherein X⁻ is a counterion;
  - each instance of R^ccis, independently, selected from hydrogen, C_1-10alkyl, C_1-10perhaloalkyl, C_2-10alkenyl, C_2-10alkynyl, heteroC_1-10alkyl, heteroC_2-10alkenyl, heteroC_2-10alkynyl, C_3-10carbocyclyl, 3-14 membered heterocyclyl, C_6-14aryl, and 5-14 membered heteroaryl, or two R^ccgroups are joined to form a 3-14 membered heterocyclyl or 5-14 membered heteroaryl ring, wherein each alkyl, alkenyl, alkynyl, heteroalkyl, heteroalkenyl, heteroalkynyl, carbocyclyl, heterocyclyl, aryl, and heteroaryl is independently substituted with 0, 1, 2, 3, 4, or 5 R^ddgroups;
  - each instance of R^dd is, independently, selected from halogen, —CN, —NO₂, —N₃, —SO₂H, —SO₃H, —OH, —OR^ee, —ON(R^ff)₂, —N(R^ff)₂, —N(R^ff)₃⁺X⁻, —N(OR^ee)R^ff, —SH, —SR^ee, —SSR^ee, —C(═O)R^ee, —CO₂H, —CO₂R^ee, —OC(═O)R^ee, —OCO₂R^ee, —C(═O)N(R^ff)₂, —OC(═O)N(R^ff)₂, —NR^ffC(═O)R^ee, —NR^ffCO₂R^ee, —NR^ffC(═O)N(R^ff)₂, —C(═NR^ff)OR^ee, —OC(═NR^ff)R^ee, —OC(═NR^ff)OR^ee, —C(═NR^ff)N(R^ff)₂, —OC(═NR^ff)N(R^ff)₂, —NR^ffC(═NR^ff)N(R^ff)₂, —NR^ffSO₂R^ee, —SO₂N(R^ff)₂, —SO₂R^ee, —SO₂OR^ee, —OSO₂R^ee, —S(═O)R^ee, —Si(R^ee)₃, —OSi(R^ee)₃, —C(═S)N(R^ff)₂, —C(═O)SR^ee, —C(═S)SR^ee, —SC(═S)SR^ee, —P(═O)(OR^ee)₂, —P(═O)(R^ee)₂, —OP(═O)(R^ee)₂, —OP(═O)(OR^ee)₂, C_1-6alkyl, C_1-6perhaloalkyl, C_2-6alkenyl, C_2-6alkynyl, heteroC_1-6alkyl, heteroC_2-6alkenyl, heteroC_2-6alkynyl, C_3-10carbocyclyl, 3-10 membered heterocyclyl, C_6-10aryl, 5-10 membered heteroaryl, wherein each alkyl, alkenyl, alkynyl, heteroalkyl, heteroalkenyl, heteroalkynyl, carbocyclyl, heterocyclyl, aryl, and heteroaryl is independently substituted with 0, 1, 2, 3, 4, or 5 R^gg groups, or two geminal R^dd substituents can be joined to form ═O or ═S; wherein X⁻ is a counterion;
  - each instance of R^ee is, independently, selected from C_1-6alkyl, C_1-6perhaloalkyl, C_2-6alkenyl, C_2-6alkynyl, heteroC_1-6alkyl, heteroC_2-6alkenyl, heteroC_2-6alkynyl, C_3-10carbocyclyl, C_6-10aryl, 3-10 membered heterocyclyl, and 3-10 membered heteroaryl, wherein each alkyl, alkenyl, alkynyl, heteroalkyl, heteroalkenyl, heteroalkynyl, carbocyclyl, heterocyclyl, aryl, and heteroaryl is independently substituted with 0, 1, 2, 3, 4, or 5 R^gg groups;
  - each instance of R^ffis, independently, selected from hydrogen, C_1-6alkyl, C_1-6perhaloalkyl, C_2-6alkenyl, C_2-6alkynyl, heteroC_1-6alkyl, heteroC_2-6alkenyl, heteroC_2-6alkynyl, C_3-10carbocyclyl, 3-10 membered heterocyclyl, C_6-10aryl and 5-10 membered heteroaryl, or two R^ffgroups are joined to form a 3-10 membered heterocyclyl or 5-10 membered heteroaryl ring, wherein each alkyl, alkenyl, alkynyl, heteroalkyl, heteroalkenyl, heteroalkynyl, carbocyclyl, heterocyclyl, aryl, and heteroaryl is independently substituted with 0, 1, 2, 3, 4, or 5 R^gggroups; and
  - each instance of R^gg is, independently, halogen, —CN, —NO₂, —N₃, —SO₂H, —SO₃H, —OH, —OC_1-6alkyl, —ON(C_1-6alkyl)₂, —N(C_1-6alkyl)₂, —N(C_1-6alkyl)₃⁺X⁻, —NH(C_1-6alkyl)₂⁺X⁻, —NH₂(C_1-6alkyl)⁺X⁻, —NH₃⁺X⁻, —N(OC_1-6alkyl)(C_1-6alkyl), —N(OH)(C_1-6alkyl), —NH(OH), —SH, —SC_1-6alkyl, —SS(C_1-6alkyl), —C(═O)(C_1-6alkyl), —CO₂H, —CO₂(C_1-6alkyl), —OC(═O)(C_1-6alkyl), —OCO₂(C_1-6alkyl), —C(═O)NH₂, —C(═O)N(C_1-6alkyl)₂, —OC(═O)NH(C_1-6alkyl), —NHC(═O)(C_1-6alkyl), —N(C_1-6alkyl)C(═O)(C_1-4alkyl), —NHCO₂(C_1-6alkyl), —NHC(═O)N(C_1-6alkyl)₂, —NHC(═O)NH(C_1-6alkyl), —NHC(═O)NH₂, —C(═NH)O(C_1-6alkyl), —OC(═NH)(C_1-6alkyl), —OC(═NH)OC_1-6alkyl, —C(═NH)N(C_1-6alkyl)₂, —C(═NH)NH(C_1-6alkyl), —C(═NH)NH₂, —OC(═NH)N(C_1-6alkyl)₂, —OC(NH)NH(C_1-6alkyl), —OC(NH)NH₂, —NHC(NH)N(C_1-6alkyl)₂, —NHC(═NH)NH₂, —NHSO₂(C_1-6alkyl), —SO₂N(C_1-6alkyl)₂, —SO₂NH(C_1-6alkyl), —SO₂NH₂, —SO₂C_1-6alkyl, —SO₂OC_1-6alkyl, —OSO₂C_1-6alkyl, —SOC_1-6alkyl, —Si(C_1-6alkyl)₃, —OSi(C_1-6alkyl)₃, —C(═S)N(C_1-6alkyl)₂, —C(═S)NH(C_1-6alkyl), —C(═S)NH₂, —C(═O)S(C_1-6alkyl), —C(═S)SC_1-6alkyl, —SC(═S)SC_1-6alkyl, —P(═O)(OC_1-6alkyl)₂, —P(═O)(C_1-6alkyl)₂, —OP(═O)(C_1-6alkyl)₂, —OP(═O)(OC_1-6alkyl)₂, C_1-6alkyl, C_1-6perhaloalkyl, C_2-6alkenyl, C_2-6alkynyl, heteroC_1-6alkyl, heteroC_2-6alkenyl, heteroC_2-6alkynyl, C_3-10carbocyclyl, C_6-10aryl, 3-10 membered heterocyclyl, 5-10 membered heteroaryl; or two geminal R^gg substituents can be joined to form ═O or ═S; wherein X⁻ is a counterion. Alternatively, two geminal hydrogens on a carbon atom are replaced with the group ═O, ═S, ═NN(R^bb)₂, ═NNR^bbC(═O)R^aa, ═NNR^bbC(═O)OR^aa, ═NNR^bbS(═O)₂R^aa, ═NR^bb, or ═NOR^cc; wherein each alkyl, alkenyl, alkynyl, heteroalkyl, heteroalkenyl, heteroalkynyl, carbocyclyl, heterocyclyl, aryl, and heteroaryl is independently substituted with 0, 1, 2, 3, 4, or 5 R^dd groups; wherein X⁻ is a counterion;
- wherein:
- each instance of R^aais, independently, selected from C_1-10alkyl, C_1-10perhaloalkyl, C_2-10alkenyl, C_2-10alkynyl, heteroC_1-10alkyl, heteroC_2-10alkenyl, heteroC_2-10alkynyl, C_3-10carbocyclyl, 3-14 membered heterocyclyl, C_6-14aryl, and 5-14 membered heteroaryl, or two R^aagroups are joined to form a 3-14 membered heterocyclyl or 5-14 membered heteroaryl ring, wherein each alkyl, alkenyl, alkynyl, heteroalkyl, heteroalkenyl, heteroalkynyl, carbocyclyl, heterocyclyl, aryl, and heteroaryl is independently substituted with 0, 1, 2, 3, 4, or 5 R^ddgroups;
- each instance of R^bb is, independently, selected from hydrogen, —OH, —OR^aa, —N(R^cc)₂, —CN, —C(═O)R^aa, —C(═O)N(R^cc)₂, —CO₂R^aa, —SO₂R^aa, —C(═NR^cc)OR^aa, —C(═NR^cc)N(R^cc)₂, —SO₂N(R^cc)₂, —SO₂R^cc, —SO₂OR^cc, —SOR^aa, —C(═S)N(R^cc)₂, —C(═O)SR^cc, —C(═S)SR^cc, —P(═O)(R^aa)₂, —P(═O)(OR^cc)₂, —P(═O)(N(R^cc)₂)₂, C_1-10alkyl, C_1-10perhaloalkyl, C_2-10alkenyl, C_2-10alkynyl, heteroC_1-10alkyl, heteroC_2-10alkenyl, heteroC_2-10alkynyl, C_3-10carbocyclyl, 3-14 membered heterocyclyl, C_6-14aryl, and 5-14 membered heteroaryl, or two R^bb groups are joined to form a 3-14 membered heterocyclyl or 5-14 membered heteroaryl ring, wherein each alkyl, alkenyl, alkynyl, heteroalkyl, heteroalkenyl, heteroalkynyl, carbocyclyl, heterocyclyl, aryl, and heteroaryl is independently substituted with 0, 1, 2, 3, 4, or 5 R^dd groups; wherein X⁻ is a counterion;
- each instance of R^cc is, independently, selected from hydrogen, C_1-10alkyl, C_1-10perhaloalkyl, C_2-10alkenyl, C_2-10alkynyl, heteroC_1-10alkyl, heteroC_2-10alkenyl, heteroC_2-10alkynyl, C_3-10carbocyclyl, 3-14 membered heterocyclyl, C_6-14aryl, and 5-14 membered heteroaryl, or two R^cc groups are joined to form a 3-14 membered heterocyclyl or 5-14 membered heteroaryl ring, wherein each alkyl, alkenyl, alkynyl, heteroalkyl, heteroalkenyl, heteroalkynyl, carbocyclyl, heterocyclyl, aryl, and heteroaryl is independently substituted with 0, 1, 2, 3, 4, or 5 R^dd groups;
- each instance of R^ddis, independently, selected from halogen, —CN, —NO₂, —N₃, —SO₂H, —SO₃H, —OH, —OR^ee, —ON(R^ff)₂, —N(R^ff)₂, —N(R^ff)₃⁺X⁻, —N(OR^ee)R^ff, —SH, —SR^ee, —SSR^ee, —C(═O)R^ee, —CO₂H, —CO₂R^ee, —OC(═O)R^ee, —OCO₂R^ee, —C(═O)N(R^ff)₂, —OC(═O)N(R^ff)₂, —NR^ffC(═O)R^ee, —NR^ffCO₂R^ee, —NR^ffC(═O)N(R^ff)₂, —C(═NR^ff)OR^ee, —OC(═NR^ff)R^ee, —OC(═NR^ff)OR^ee, —C(═NR^ff)N(R^ff)₂, —OC(═NR^ff)N(R^ff)₂, —NR^ffC(═NR^ff)N(R^ff)₂, —NR^ffSO₂R^ee, —SO₂N(R^ff)₂, —SO₂R^ee, —SO₂OR^ee, —OSO₂R^ee, —S(═O)R^ee, —Si(R^ee)₃, —OSi(R^ee)₃, —C(═S)N(R^ff)₂, —C(═O)SR^ee, —C(═S)SR^ee, —SC(═S)SR^ee, —P(═O)(OR^ee)₂, —P(═O)(R^ee)₂, —OP(═O)(R^ee)₂, —OP(═O)(OR^ee)₂, C_1-6alkyl, C_1-6perhaloalkyl, C_2-6alkenyl, C_2-6alkynyl, heteroC_1-6alkyl, heteroC_2-6alkenyl, heteroC_2-6alkynyl, C_3-10carbocyclyl, 3-10 membered heterocyclyl, C_6-10aryl, 5-10 membered heteroaryl, wherein each alkyl, alkenyl, alkynyl, heteroalkyl, heteroalkenyl, heteroalkynyl, carbocyclyl, heterocyclyl, aryl, and heteroaryl is independently substituted with 0, 1, 2, 3, 4, or 5 R^gggroups, or two geminal R^ddsubstituents can be joined to form ═O or ═S; wherein X⁻ is a counterion;
- each instance of R^ee is, independently, selected from C_1-6alkyl, C_1-6perhaloalkyl, C_2-6alkenyl, C_2-6alkynyl, heteroC_1-6alkyl, heteroC_2-6alkenyl, heteroC_2-6alkynyl, C_3-10carbocyclyl, C_6-10aryl, 3-10 membered heterocyclyl, and 3-10 membered heteroaryl, wherein each alkyl, alkenyl, alkynyl, heteroalkyl, heteroalkenyl, heteroalkynyl, carbocyclyl, heterocyclyl, aryl, and heteroaryl is independently substituted with 0, 1, 2, 3, 4, or 5 R^gg groups;
- each instance of R^ffis, independently, selected from hydrogen, C_1-6alkyl, C_1-6perhaloalkyl, C_2-6alkenyl, C_2-6alkynyl, heteroC_1-6alkyl, heteroC_2-6alkenyl, heteroC_2-6alkynyl, C_3-10carbocyclyl, 3-10 membered heterocyclyl, C_6-10aryl and 5-10 membered heteroaryl, or two R^ffgroups are joined to form a 3-10 membered heterocyclyl or 5-10 membered heteroaryl ring, wherein each alkyl, alkenyl, alkynyl, heteroalkyl, heteroalkenyl, heteroalkynyl, carbocyclyl, heterocyclyl, aryl, and heteroaryl is independently substituted with 0, 1, 2, 3, 4, or 5 R^gggroups; and
- each instance of R^gg is, independently, halogen, —CN, —NO₂, —N₃, —SO₂H, —SO₃H, —OH, —OC_1-6alkyl, —ON(C_1-6alkyl)₂, —N(C_1-6alkyl)₂, —N(C_1-6alkyl)₃⁺X⁻, —NH(C_1-6alkyl)₂⁺X⁻, —NH₂(C_1-6alkyl)⁺X⁻, —NH₃⁺X⁻, —N(OC_1-6alkyl)(C_1-6alkyl), —N(OH)(C_1-6alkyl), —NH(OH), —SH, —SC_1-6alkyl, —SS(C_1-6alkyl), —C(═O)(C_1-6alkyl), —CO₂H, —CO₂(C_1-6alkyl), —OC(═O)(C_1-6alkyl), —OCO₂(C_1-6alkyl), —C(═O)NH₂, —C(═O)N(C_1-6alkyl)₂, —OC(═O)NH(C_1-6alkyl), —NHC(═O)(C_1-6alkyl), —N(C_1-6alkyl)C(═O)(C_1-6alkyl), —NHCO₂(C_1-6alkyl), —NHC(═O)N(C_1-6alkyl)₂, —NHC(═O)NH(C_1-6alkyl), —NHC(═O)NH₂, —C(═NH)O(C_1-6alkyl), —OC(═NH)(C_1-6alkyl), —OC(═NH)OC_1-6alkyl, —C(═NH)N(C_1-6alkyl)₂, —C(═NH)NH(C_1-6alkyl), —C(═NH)NH₂, —OC(═NH)N(C_1-6alkyl)₂, —OC(NH)NH(C_1-6alkyl), —OC(NH)NH₂, —NHC(NH)N(C_1-6alkyl)₂, —NHC(═NH)NH₂, —NHSO₂(C_1-6alkyl), —SO₂N(C_1-6alkyl)₂, —SO₂NH(C_1-6alkyl), —SO₂NH₂, —SO₂C_1-6alkyl, —SO₂OC_1-6alkyl, —OSO₂C_1-6alkyl, —SOC_1-6alkyl, —Si(C_1-6alkyl)₃, —OSi(C_1-6alkyl)₃, —C(═S)N(C_1-6alkyl)₂, —C(═S)NH(C_1-6alkyl), —C(═S)NH₂, —C(═O)S(C_1-6alkyl), —C(═S)SC_1-6alkyl, —SC(═S)SC_1-6alkyl, —P(═O)(OC_1-6alkyl)₂, —P(═O)(C_1-6alkyl)₂, —OP(═O)(C_1-6alkyl)₂, —OP(═O)(OC_1-6alkyl)₂, C_1-6alkyl, C_1-6perhaloalkyl, C_2-6alkenyl, C_2-6alkynyl, heteroC_1-6alkyl, heteroC_2-6alkenyl, heteroC_2-6alkynyl, C_3-10carbocyclyl, C_6-10aryl, 3-10 membered heterocyclyl, 5-10 membered heteroaryl; or two geminal R^gg substituents can be joined to form ═O or ═S; wherein X⁻ is a counterion.

A “counterion” or “anionic counterion” is a negatively charged group associated with a positively charged group in order to maintain electronic neutrality. An anionic counterion may be monovalent (i.e., including one formal negative charge). An anionic counterion may also be multivalent (i.e., including more than one formal negative charge), such as divalent or trivalent. Exemplary counterions include halide ions (e.g., F⁻, Cl⁻, Br⁻, I⁻), NO₃⁻, ClO₄⁻, OH⁻, H₂PO₄⁻, HCO₃⁻, HSO₄⁻, sulfonate ions (e.g., methanesulfonate, trifluoromethanesulfonate, p-toluenesulfonate, benzenesulfonate, 10-camphor sulfonate, naphthalene-2-sulfonate, naphthalene-1-sulfonic acid-5-sulfonate, ethan-1-sulfonic acid-2-sulfonate, and the like), carboxylate ions (e.g., acetate, propanoate, benzoate, glycerate, lactate, tartrate, glycolate, gluconate, and the like), BF₄⁻, PF₄⁻, PF₆⁻, AsF₆⁻, SbF₆⁻, B[3,5-(CF₃)₂C₆H₃]₄⁻, B(C₆F₅)₄⁻, BPh₄⁻, Al(OC(CF₃)₃)₄⁻, and carborane anions (e.g., CB₁₁H₁₂⁻ or (HCB₁₁Me₅Br₆)⁻). Exemplary counterions which may be multivalent include CO₃²⁻, HPO₄²⁻, PO₄³⁻, B₄O₇²⁻, SO₄²⁻, S₂O₃²⁻, carboxylate anions (e.g., tartrate, citrate, fumarate, maleate, malate, malonate, gluconate, succinate, glutarate, adipate, pimelate, suberate, azelate, sebacate, salicylate, phthalates, aspartate, glutamate, and the like), and carboranes.
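
By way of illustration only, the neutrality requirement described above can be sketched in Python. The function name `neutral_ratio` and its parameters are hypothetical and are not part of this disclosure; the sketch assumes integer formal charges and simply finds the smallest whole-number ratio of cationic groups to anionic counterions that gives zero net charge.

```python
from math import gcd


def neutral_ratio(cation_charge: int, anion_charge: int) -> tuple:
    """Return the smallest (cation_count, anion_count) giving zero net charge.

    cation_charge: magnitude of the positive formal charge per cationic group.
    anion_charge:  magnitude of the negative formal charge per counterion
                   (1 = monovalent, 2 = divalent, 3 = trivalent).
    """
    if cation_charge <= 0 or anion_charge <= 0:
        raise ValueError("charges must be given as positive magnitudes")
    # The least common multiple of the two charges is the total charge
    # that each side must contribute.
    common = cation_charge * anion_charge // gcd(cation_charge, anion_charge)
    return common // cation_charge, common // anion_charge


print(neutral_ratio(1, 1))  # monovalent counterion, e.g. a halide: (1, 1)
print(neutral_ratio(1, 2))  # divalent counterion, e.g. sulfate: (2, 1)
print(neutral_ratio(1, 3))  # trivalent counterion, e.g. phosphate: (3, 1)
```

For example, two singly charged cationic groups are balanced by one divalent counterion, whereas a monovalent counterion balances a single such group.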

The term “pharmaceutically acceptable salt” refers to those salts which are, within the scope of sound medical judgment, suitable for use in contact with the tissues of humans and lower animals without undue toxicity, irritation, allergic response and the like, and are commensurate with a reasonable benefit/risk ratio. Pharmaceutically acceptable salts are well known in the art. For example, Berge et al., describe pharmaceutically acceptable salts in detail in J. Pharmaceutical Sciences, 1977, 66, 1-19, incorporated by reference. Pharmaceutically acceptable salts of the compounds disclosed in this application include those derived from suitable inorganic and organic acids and bases. Examples of pharmaceutically acceptable, nontoxic acid addition salts are salts of an amino group formed with inorganic acids such as hydrochloric acid, hydrobromic acid, phosphoric acid, sulfuric acid, and perchloric acid or with organic acids such as acetic acid, oxalic acid, maleic acid, tartaric acid, citric acid, succinic acid, or malonic acid or by using other methods known in the art such as ion exchange. Other pharmaceutically acceptable salts include adipate, alginate, ascorbate, aspartate, benzenesulfonate, benzoate, bisulfate, borate, butyrate, camphorate, camphorsulfonate, citrate, cyclopentanepropionate, digluconate, dodecylsulfate, ethanesulfonate, formate, fumarate, glucoheptonate, glycerophosphate, gluconate, hemisulfate, heptanoate, hexanoate, hydroiodide, 2-hydroxy-ethanesulfonate, lactobionate, lactate, laurate, lauryl sulfate, malate, maleate, malonate, methanesulfonate, 2-naphthalenesulfonate, nicotinate, nitrate, oleate, oxalate, palmitate, pamoate, pectinate, persulfate, 3-phenylpropionate, phosphate, picrate, pivalate, propionate, stearate, succinate, sulfate, tartrate, thiocyanate, p-toluenesulfonate, undecanoate, valerate salts, and the like. Salts derived from appropriate bases include alkali metal, alkaline earth metal, ammonium and N⁺(C_1-4alkyl)₄⁻ salts. 
Representative alkali or alkaline earth metal salts include sodium, lithium, potassium, calcium, magnesium, and the like. Further pharmaceutically acceptable salts include, when appropriate, nontoxic ammonium, quaternary ammonium, and amine cations formed using counterions such as halide, hydroxide, carboxylate, sulfate, phosphate, nitrate, lower alkyl sulfonate, and aryl sulfonate.

The term “solvate” refers to forms of a compound that are associated with a solvent, usually by a solvolysis reaction. This physical association may include hydrogen bonding. Conventional solvents include water, methanol, ethanol, acetic acid, DMSO, THF, diethyl ether, and the like. The compounds of Formula (1), (9), (10), and (11) may be prepared, e.g., in crystalline form, and may be solvated. Suitable solvates include pharmaceutically acceptable solvates and further include both stoichiometric solvates and non-stoichiometric solvates. In certain instances, the solvate will be capable of isolation, for example, when one or more solvent molecules are incorporated in the crystal lattice of a crystalline solid. “Solvate” encompasses both solution-phase and isolable solvates. Representative solvates include hydrates, ethanolates, and methanolates.

The term “hydrate” refers to a compound that is associated with water. Typically, the number of the water molecules contained in a hydrate of a compound is in a definite ratio to the number of the compound molecules in the hydrate. Therefore, a hydrate of a compound may be represented, for example, by the general formula R·x H₂O, wherein R is the compound and wherein x is a number greater than 0. A given compound may form more than one type of hydrates, including, e.g., monohydrates (x is 1), lower hydrates (x is a number greater than 0 and smaller than 1, e.g., hemihydrates (R·0.5 H₂O)), and polyhydrates (x is a number greater than 1, e.g., dihydrates (R·2 H₂O) and hexahydrates (R·6 H₂O)).
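
As a non-limiting sketch, the classification of hydrates by the value of x in the general formula R·x H₂O can be expressed in Python as follows. The function name `classify_hydrate` is hypothetical and is used for illustration only.

```python
def classify_hydrate(x: float) -> str:
    """Classify a hydrate R·x H2O by the number x of waters per compound molecule."""
    if x <= 0:
        # The general formula requires x to be greater than 0.
        raise ValueError("x must be greater than 0")
    if x < 1:
        return "lower hydrate"   # e.g., a hemihydrate, x = 0.5
    if x == 1:
        return "monohydrate"
    return "polyhydrate"         # e.g., a dihydrate (x = 2) or hexahydrate (x = 6)


for x in (0.5, 1, 2, 6):
    print(x, classify_hydrate(x))
```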

The term “tautomers” refers to compounds that are interchangeable forms of a particular compound structure, and that vary in the displacement of hydrogen atoms and electrons. Thus, two structures may be in equilibrium through the movement of π electrons and an atom (usually H). For example, enols and ketones are tautomers because they are rapidly interconverted by treatment with either acid or base. Another example of tautomerism is the aci- and nitro-forms of phenylnitromethane, which are likewise formed by treatment with acid or base. Tautomeric forms may be relevant to the attainment of the optimal chemical reactivity and biological activity of a compound of interest.

It is also to be understood that compounds that have the same molecular formula but differ in the nature or sequence of bonding of their atoms or the arrangement of their atoms in space are termed “isomers.” Isomers that differ in the arrangement of their atoms in space are termed “stereoisomers.”

Stereoisomers that are not mirror images of one another are termed “diastereomers” and those that are non-superimposable mirror images of each other are termed “enantiomers.” When a compound has an asymmetric center, for example, it is bonded to four different groups, a pair of enantiomers is possible. An enantiomer can be characterized by the absolute configuration of its asymmetric center and described by the R- and S-sequencing rules of Cahn and Prelog. An enantiomer can also be characterized by the manner in which the molecule rotates the plane of polarized light, and designated as dextrorotatory or levorotatory (i.e., as (+) or (−)-isomers respectively). A chiral compound can exist as either an individual enantiomer or as a mixture of enantiomers. A mixture containing equal proportions of the enantiomers is called a “racemic mixture.”
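
By way of a minimal sketch, the R/S designations available to a compound having a given set of asymmetric centers can be enumerated in Python as shown below. The function name and the center labels are hypothetical. The count of 2ⁿ assignments for n centers is an upper bound only, since symmetry (e.g., meso forms) or ring constraints may reduce the number of distinct stereoisomers.

```python
from itertools import product


def enumerate_configurations(center_labels):
    """List every R/S assignment for the given asymmetric centers."""
    return [
        dict(zip(center_labels, assignment))
        for assignment in product("RS", repeat=len(center_labels))
    ]


# Two labeled chiral atoms give at most four assignments:
# (R,R), (R,S), (S,R), and (S,S).
for config in enumerate_configurations(["*", "**"]):
    print(config)
```

In this sketch, assignments that differ at every center (e.g., (R,R) and (S,S)) correspond to a pair of enantiomers, and assignments that differ at some but not all centers correspond to diastereomers.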

The term “co-crystal” refers to a crystalline structure comprising at least two different components (e.g., a compound described in this application and an acid), wherein each of the components is independently an atom, ion, or molecule. In certain embodiments, none of the components is a solvent. In certain embodiments, at least one of the components is a solvent. A co-crystal of a compound and an acid is different from a salt formed from a compound and the acid. In the salt, a compound described in this application is complexed with the acid in a way that proton transfer (e.g., a complete proton transfer) from the acid to a compound described in this application easily occurs at room temperature. In the co-crystal, however, a compound described in this application is complexed with the acid in a way that proton transfer from the acid to a compound described in this application does not easily occur at room temperature. In certain embodiments, in the co-crystal, there is no proton transfer from the acid to a compound described in this application. In certain embodiments, in the co-crystal, there is partial proton transfer from the acid to a compound described in this application. Co-crystals may be useful to improve the properties (e.g., solubility, stability, and ease of formulation) of a compound described in this application.

The term “polymorphs” refers to a crystalline form of a compound (or a salt, hydrate, or solvate thereof) in a particular crystal packing arrangement. All polymorphs of the same compound have the same elemental composition. Different crystalline forms usually have different X-ray diffraction patterns, infrared spectra, melting points, density, hardness, crystal shape, optical and electrical properties, stability, and solubility. Recrystallization solvent, rate of crystallization, storage temperature, and other factors may cause one crystal form to dominate. Various polymorphs of a compound can be prepared by crystallization under different conditions.

The term “prodrug” refers to compounds, including derivatives of the compounds of Formula (X), (8), (9), (10), or (11), that have cleavable groups and become by solvolysis or under physiological conditions the compounds of Formula (X), (8), (9), (10), or (11) and that are pharmaceutically active in vivo. The prodrugs may have attributes such as, without limitation, solubility, bioavailability, tissue compatibility, or delayed release in a mammalian organism. Examples include, but are not limited to, derivatives of compounds described in this application, including derivatives formed from glycosylation of the compounds described in this application (e.g., glycoside derivatives), carrier-linked prodrugs (e.g., ester derivatives), bioprecursor prodrugs (a prodrug metabolized by molecular modification into the active compound), and the like. Non-limiting examples of glycoside derivatives are disclosed in and incorporated by reference from PCT Publication No. WO2018/208875 and U.S. Patent Publication No. 2019/0078168. Non-limiting examples of ester derivatives are disclosed in and incorporated by reference from U.S. Patent Publication No. US2017/0362195.

Other derivatives of the compounds of this invention have activity in both their acid and acid derivative forms, but the acid sensitive form often offers advantages of solubility, bioavailability, tissue compatibility, or delayed release in a mammalian organism (see, Bundgaard, H., Design of Prodrugs, pp. 7-9, 21-24, Elsevier, Amsterdam 1985). Prodrugs include acid derivatives well known to practitioners of the art, such as, for example, esters prepared by reaction of the parent acid with a suitable alcohol, or amides prepared by reaction of the parent acid compound with a substituted or unsubstituted amine, or acid anhydrides, or mixed anhydrides. Simple aliphatic or aromatic esters, amides, and anhydrides derived from acidic groups pendant on the compounds of this invention are particular prodrugs. In some cases it is desirable to prepare double ester type prodrugs such as (acyloxy)alkyl esters or ((alkoxycarbonyl)oxy)alkyl esters. C₁-C₈alkyl, C₂-C₈alkenyl, C₂-C₈alkynyl, aryl, C₇-C₁₂substituted aryl, and C₇-C₁₂arylalkyl esters of the compounds of Formula (X), (8), (9), (10), or (11) may be preferred.

Cannabinoids

As used in this application, the term “cannabinoid” includes compounds of Formula (X):

embedded image

or a pharmaceutically acceptable salt, co-crystal, tautomer, stereoisomer, solvate, hydrate, polymorph, isotopically enriched derivative, or prodrug thereof, wherein R1 is hydrogen, optionally substituted acyl, optionally substituted alkyl, optionally substituted alkenyl, optionally substituted alkynyl, optionally substituted carbocyclyl, or optionally substituted aryl; R2 and R6 are, independently, hydrogen or carboxyl; R3 and R5 are, independently, hydroxyl, halogen, or alkoxy; and R4 is a hydrogen or an optionally substituted prenyl moiety; or optionally R4 and R3 are taken together with their intervening atoms to form a cyclic moiety, or optionally R4 and R5 are taken together with their intervening atoms to form a cyclic moiety, or optionally both 1) R4 and R3 are taken together with their intervening atoms to form a cyclic moiety and 2) R4 and R5 are taken together with their intervening atoms to form a cyclic moiety. In certain embodiments, R4 and R3 are taken together with their intervening atoms to form a cyclic moiety. In certain embodiments, R4 and R5 are taken together with their intervening atoms to form a cyclic moiety. In certain embodiments, “cannabinoid” refers to a compound of Formula (X), or a pharmaceutically acceptable salt thereof. In certain embodiments, both 1) R4 and R3 are taken together with their intervening atoms to form a cyclic moiety and 2) R4 and R5 are taken together with their intervening atoms to form a cyclic moiety.

In some embodiments, cannabinoids may be synthesized via the following steps: a) one or more reactions to incorporate three additional ketone moieties onto an acyl-CoA scaffold, where the acyl moiety in the acyl-CoA scaffold comprises between four and fourteen carbons; b) a reaction cyclizing the product of step (a); and c) a reaction to incorporate a prenyl moiety to the product of step (b) or a derivative of the product of step (b). In some embodiments, non-limiting examples of the acyl-CoA scaffold described in step (a) include hexanoyl-CoA and butyryl-CoA. In some embodiments, non-limiting examples of the product of step (b) or a derivative of the product of step (b) include olivetolic acid, divarinic acid, and sphaerophorolic acid.
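
As an illustrative sketch only, the carbon bookkeeping implied by step (a) can be written in Python. The names below are hypothetical. The sketch assumes, as is not stated in the passage above, that each of the three ketone moieties is introduced by a two-carbon extension of the acyl chain; under that assumption, a six-carbon acyl moiety (as in hexanoyl-CoA) yields a twelve-carbon scaffold, consistent with olivetolic acid (C₁₂H₁₆O₄), and a four-carbon acyl moiety (as in butyryl-CoA) yields a ten-carbon scaffold, consistent with divarinic acid (C₁₀H₁₂O₄).

```python
EXTENSION_UNITS = 3    # step (a): three additional ketone moieties
CARBONS_PER_UNIT = 2   # assumption: each extension adds two carbons


def scaffold_carbons(acyl_carbons: int) -> int:
    """Carbon count of the extended scaffold entering cyclization step (b)."""
    if not 4 <= acyl_carbons <= 14:
        # Step (a) recites an acyl moiety of between four and fourteen carbons.
        raise ValueError("acyl moiety must have between 4 and 14 carbons")
    return acyl_carbons + EXTENSION_UNITS * CARBONS_PER_UNIT


print(scaffold_carbons(6))  # hexanoyl (6 carbons) -> 12
print(scaffold_carbons(4))  # butyryl (4 carbons)  -> 10
```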

In some embodiments, a cannabinoid compound of Formula (X) is of Formula (X-A), (X-B), or (X-C):

embedded image

or a pharmaceutically acceptable salt, solvate, hydrate, polymorph, co-crystal, tautomer, stereoisomer, isotopically labeled derivative, or prodrug thereof:

- wherein custom-character is a double bond or a single bond, as valency permits;
- R is hydrogen, optionally substituted acyl, optionally substituted alkyl, optionally substituted alkenyl, optionally substituted alkynyl, optionally substituted carbocyclyl, or optionally substituted aryl;
- R^Z1 is hydrogen, optionally substituted acyl, optionally substituted alkyl, optionally substituted alkenyl, optionally substituted alkynyl, optionally substituted carbocyclyl, or optionally substituted aryl;
- R^Z2 is hydrogen, optionally substituted acyl, optionally substituted alkyl, optionally substituted alkenyl, optionally substituted alkynyl, optionally substituted carbocyclyl, or optionally substituted aryl;
- or optionally, R^Z1 and R^Z2 are taken together with their intervening atoms to form an optionally substituted carbocyclic ring;
- R^3A is hydrogen, optionally substituted acyl, optionally substituted alkyl, optionally substituted alkenyl, or optionally substituted alkynyl;
- R^3B is hydrogen, optionally substituted acyl, optionally substituted alkyl, optionally substituted alkenyl, or optionally substituted alkynyl;
- R^Y is hydrogen, optionally substituted acyl, optionally substituted alkyl, optionally substituted alkenyl, or optionally substituted alkynyl; and
- R^Z is hydrogen, optionally substituted acyl, optionally substituted alkyl, optionally substituted alkenyl, or optionally substituted alkynyl.

In certain embodiments, a cannabinoid compound is of Formula (X-A):

embedded image

wherein custom-character is a double bond, and each of R^Z1 and R^Z2 is hydrogen, one of R^3A and R^3B is optionally substituted C_2-6alkenyl, and the other one of R^3A and R^3B is optionally substituted C_2-6alkyl. In some embodiments, a cannabinoid compound of Formula (X) is of Formula (X-A), wherein each of R^Z1 and R^Z2 is hydrogen, one of R^3A and R^3B is a prenyl group, and the other one of R^3A and R^3B is optionally substituted methyl.

In certain embodiments, a cannabinoid compound of Formula (X) of Formula (X-A) is of Formula (11-z):

embedded image

wherein custom-character is a double bond or single bond, as valency permits; one of R^3A and R^3B is C_1-6alkyl optionally substituted with alkenyl, and the other of R^3A and R^3B is optionally substituted C_1-6alkyl. In certain embodiments, in a compound of Formula (11-z), custom-character is a single bond; one of R^3A and R^3B is C_1-6alkyl optionally substituted with prenyl; and the other of R^3A and R^3B is unsubstituted methyl; and R is as described in this application. In certain embodiments, in a compound of Formula (11-z), custom-character is a single bond; one of R^3A and R^3B is

embedded image

and the other of R^3A and R^3B is unsubstituted methyl; and R is as described in this application. In certain embodiments, a cannabinoid compound of Formula (11-z) is of Formula (11a):

embedded image

In certain embodiments, a cannabinoid compound of Formula (X) of Formula (X-A) is of Formula (11a):

embedded image

In certain embodiments, a cannabinoid compound of Formula (X-A) is of Formula (10-z):

embedded image

wherein custom-character is a double bond or single bond, as valency permits; R^Y is hydrogen, optionally substituted acyl, optionally substituted alkyl, optionally substituted alkenyl, or optionally substituted alkynyl; and each of R^3A and R^3B is independently optionally substituted C_1-6alkyl. In certain embodiments, in a compound of Formula (10-z), custom-character is a single bond; each of R^3A and R^3B is unsubstituted methyl, and R is as described in this application. In certain embodiments, a cannabinoid compound of Formula (10-z) is of Formula (10a):

embedded image

In certain embodiments, a compound of Formula (10a)

embedded image

has a chiral atom labeled with * at carbon 10 and a chiral atom labeled with ** at carbon 6. In certain embodiments, in a compound of Formula (10a)

embedded image

the chiral atom labeled with * at carbon 10 is of the R-configuration or S-configuration; and a chiral atom labeled with ** at carbon 6 is of the R-configuration. In certain embodiments, in a compound of Formula (10a)

embedded image

the chiral atom labeled with * at carbon 10 is of the S-configuration; and a chiral atom labeled with ** at carbon 6 is of the R-configuration or S-configuration. In certain embodiments, in a compound of Formula (10a)

embedded image

the chiral atom labeled with * at carbon 10 is of the R-configuration and a chiral atom labeled with ** at carbon 6 is of the R-configuration. In certain embodiments, a compound of Formula (10a)

embedded image

is of the formula:

embedded image

In certain embodiments, in a compound of Formula (10a)

embedded image

the chiral atom labeled with * at carbon 10 is of the S-configuration and a chiral atom labeled with ** at carbon 6 is of the S-configuration. In certain embodiments, a compound of Formula (10a)

embedded image

is of the formula:

embedded image

In certain embodiments, a cannabinoid compound is of Formula (X-B):

embedded image

wherein custom-character is a double bond; R^Y is hydrogen, optionally substituted acyl, optionally substituted alkyl, optionally substituted alkenyl, or optionally substituted alkynyl; and each of R^3A and R^3B is independently optionally substituted C_1-6alkyl. In certain embodiments, in a compound of Formula (X-B), R^Y is optionally substituted C_1-6alkyl; one of R^3A and R^3B is

embedded image

and the other one of R^3A and R^3B is unsubstituted methyl, and R is as described in this application. In certain embodiments, a compound of Formula (X-B) is of Formula (9a):

embedded image

In certain embodiments, a compound of Formula (9a)

embedded image

has a chiral atom labeled with * at carbon 3 and a chiral atom labeled with ** at carbon 4. In certain embodiments, in a compound of Formula (9a)

embedded image

the chiral atom labeled with * at carbon 3 is of the R-configuration or S-configuration; and a chiral atom labeled with ** at carbon 4 is of the R-configuration. In certain embodiments, in a compound of Formula (9a)

embedded image

the chiral atom labeled with * at carbon 3 is of the S-configuration; and a chiral atom labeled with ** at carbon 4 is of the R-configuration or S-configuration. In certain embodiments, in a compound of Formula (9a)

embedded image

the chiral atom labeled with * at carbon 3 is of the R-configuration and a chiral atom labeled with ** at carbon 4 is of the R-configuration. In certain embodiments, a compound of Formula (9a)

embedded image

is of the formula:

embedded image

In certain embodiments, in a compound of Formula (9a)

embedded image

the chiral atom labeled with * at carbon 3 is of the S-configuration and a chiral atom labeled with ** at carbon 4 is of the S-configuration. In certain embodiments, a compound of Formula (9a)

embedded image

is of the formula:

embedded image

In certain embodiments, a cannabinoid compound is of Formula (X-C):

embedded image

wherein R^Z is optionally substituted alkyl or optionally substituted alkenyl. In certain embodiments, a compound of Formula (X-C) is of formula:

embedded image

wherein a is 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10. In certain embodiments, a is 1. In certain embodiments, a is 2. In certain embodiments, a is 3. In certain embodiments, a is 1, 2, or 3 for a compound of Formula (X-C). In certain embodiments, a cannabinoid compound is of Formula (X-C), and a is 1, 2, 3, 4, or 5. In certain embodiments, a compound of Formula (X-C) is of Formula (8a):

embedded image

In some embodiments, cannabinoids of the present disclosure comprise cannabinoid receptor ligands. Cannabinoid receptors are a class of cell membrane receptors in the G protein-coupled receptor superfamily. Cannabinoid receptors include the CB₁ receptor and the CB₂ receptor. In some embodiments, cannabinoid receptors comprise GPR18, GPR55, and PPAR. (See Brain et al. “Activation of GPR18 by cannabinoid compounds: a tale of biased agonism” Br J Pharmacol v. 171(16) (2014); Shi et al. “The novel cannabinoid receptor GPR55 mediates anxiolytic-like effects in the medial orbital cortex of mice with acute stress” Molecular Brain 10, No. 38 (2017); and O'Sullivan, Elizabeth. “An update on PPAR activation by cannabinoids” Br J Pharmacol v. 173(12) (2016)).

In some embodiments, cannabinoids comprise endocannabinoids, which are substances produced within the body, and phytocannabinoids, which are cannabinoids that are naturally produced by plants of genus Cannabis. In some embodiments, phytocannabinoids comprise the acidic and decarboxylated acid forms of the naturally-occurring plant-derived cannabinoids, and their synthetic and biosynthetic equivalents.

Over 94 phytocannabinoids have been identified to date (Berman, Paula, et al. “A new ESI-LC/MS approach for comprehensive metabolic profiling of phytocannabinoids in Cannabis.” Scientific reports 8.1 (2018): 14280; El-Alfy et al., 2010, “Antidepressant-like effect of delta-9-tetrahydrocannabinol and other cannabinoids isolated from Cannabis sativa L”, Pharmacology Biochemistry and Behavior 95 (4): 434-42; Rudolf Brenneisen, 2007, Chemistry and Analysis of Phytocannabinoids; Citti, Cinzia, et al. “A novel phytocannabinoid isolated from Cannabis sativa L. with an in vivo cannabimimetic activity higher than Δ⁹-tetrahydrocannabinol: Δ⁹-Tetrahydrocannabiphorol.” Sci Rep 9 (2019): 20335, each of which is incorporated by reference in this application in its entirety). In some embodiments, cannabinoids comprise Δ⁹-tetrahydrocannabinol (THC) type (e.g., (−)-trans-delta-9-tetrahydrocannabinol or dronabinol, (+)-trans-delta-9-tetrahydrocannabinol, (−)-cis-delta-9-tetrahydrocannabinol, or (+)-cis-delta-9-tetrahydrocannabinol), cannabidiol (CBD) type, cannabigerol (CBG) type, cannabichromene (CBC) type, cannabicyclol (CBL) type, cannabinodiol (CBND) type, or cannabitriol (CBT) type cannabinoids, or any combination thereof (see, e.g., R Pertwee, ed, Handbook of Cannabis (Oxford, UK: Oxford University Press, 2014), which is incorporated by reference in this application in its entirety). A non-limiting list of cannabinoids comprises: cannabiorcol-C1 (CBNO), CBND-C1 (CBNDO), Δ⁹-trans-Tetrahydrocannabiorcolic acid-C1 (Δ⁹-THCO), Cannabidiorcol-C1 (CBDO), Cannabiorchromene-C1 (CBCO), (−)-Δ⁸-trans-(6aR,10aR)-Tetrahydrocannabiorcol-C1 (Δ⁸-THCO), Cannabiorcyclol-C1 (CBLO),
CBG-C1 (CBGO), Cannabinol-C2 (CBN-C2), CBND-C2, Δ⁹-THC-C2, CBD-C2, CBC-C2, Δ⁸-THC-C2, CBL-C2, Bisnor-cannabielsoin-C1 (CBEO), CBG-C2, Cannabivarin-C3 (CBNV), Cannabinodivarin-C3 (CBNDV), (−)-Δ⁹-trans-Tetrahydrocannabivarin-C3 (Δ⁹-THCV), (−)-Cannabidivarin-C3 (CBDV), (±)-Cannabichromevarin-C3 (CBCV), (−)-Δ⁸-trans-THC-C3 (Δ⁸-THCV), (+)-(1aS,3aR,8bR,8cR)-Cannabicyclovarin-C3 (CBLV), 2-Methyl-2-(4-methyl-2-pentenyl)-7-propyl-2H-1-benzopyran-5-ol, Δ⁷-tetrahydrocannabivarin-C3 (Δ⁷-THCV), CBE-C2, Cannabigerovarin-C3 (CBGV), Cannabitriol-C1 (CBTO), Cannabinol-C4 (CBN-C4), CBND-C4, (−)-Δ⁹-trans-Tetrahydrocannabinol-C4 (Δ⁹-THC-C4), Cannabidiol-C4 (CBD-C4), CBC-C4, (−)-trans-Δ⁸-THC-C4, CBL-C4, Cannabielsoin-C3 (CBEV), CBG-C4, CBT-C2, Cannabichromanone-C3, Cannabiglendol-C3 (OH-iso-HHCV-C3), Cannabioxepane-C5 (CBX), Dehydrocannabifuran-C5 (DCBF), Cannabinol-C5 (CBN), Cannabinodiol-C5 (CBND), (−)-Δ⁹-trans-Tetrahydrocannabinol-C5 (Δ⁹-THC), (−)-Δ⁸-trans-(6aR,10aR)-Tetrahydrocannabinol-C5 (Δ⁸-THC), (±)-Cannabichromene-C5 (CBC), (−)-Cannabidiol-C5 (CBD), (±)-(1aS,3aR,8bR,8cR)-Cannabicyclol-C5 (CBL), Cannabicitran-C5 (CBR), (−)-Δ⁹-(6aS,10aR-cis)-Tetrahydrocannabinol-C5 ((−)-cis-Δ⁹-THC), (−)-Δ⁷-trans-(1R,3R,6R)-Isotetrahydrocannabinol-C5 (trans-isoΔ⁷-THC), CBE-C4, Cannabigerol-C5 (CBG), Cannabitriol-C3 (CBTV), Cannabinol methyl ether-C5 (CBNM), CBNDM-C5, 8-OH-CBN-C5 (OH-CBN), OH-CBND-C5 (OH-CBND), 10-Oxo-Δ^6a(10a)-Tetrahydrocannabinol-C5 (OTHC), Cannabichromanone D-C5, Cannabicoumaronone-C5 (CBCON-C5), Cannabidiol monomethyl ether-C5 (CBDM), Δ⁹-THCM-C5, (±)-3″-hydroxy-Δ⁴″-cannabichromene-C5, (5aS,6S,9R,9aR)-Cannabielsoin-C5 (CBE), 2-geranyl-5-hydroxy-3-n-pentyl-1,4-benzoquinone-C5, 5-geranyl olivetolic acid, 5-geranyl olivetolate, 8α-Hydroxy-Δ⁹-Tetrahydrocannabinol-C5 (8α-OH-Δ⁹-THC), 8β-Hydroxy-Δ⁹-Tetrahydrocannabinol-C5 (8β-OH-Δ⁹-THC), 10α-Hydroxy-Δ⁸-Tetrahydrocannabinol-C5 (10α-OH-Δ⁸-THC), 10β-Hydroxy-Δ⁸-Tetrahydrocannabinol-C5 (10β-OH-Δ⁸-THC),
10α-hydroxy-Δ^9,11-hexahydrocannabinol-C5, 9β,10β-Epoxyhexahydrocannabinol-C5, OH-CBD-C5 (OH-CBD), Cannabigerol monomethyl ether-C5 (CBGM), Cannabichromanone-C5, CBT-C4, (±)-6,7-cis-epoxycannabigerol-C5, (±)-6,7-trans-epoxycannabigerol-C5, (−)-7-hydroxycannabichromane-C5, Cannabimovone-C5, (−)-trans-Cannabitriol-C5 ((−)-trans-CBT), (+)-trans-Cannabitriol-C5 ((+)-trans-CBT), (±)-cis-Cannabitriol-C5 ((±)-cis-CBT), (−)-trans-10-Ethoxy-9-hydroxy-Δ^6a(10a)-tetrahydrocannabivarin-C3 [(−)-trans-CBT-OEt], (−)-(6aR,9S,10S,10aR)-9,10-Dihydroxyhexahydrocannabinol-C5 [(−)-Cannabiripsol] (CBR), Cannabichromanone C-C5, (−)-6a,7,10a-Trihydroxy-Δ⁹-tetrahydrocannabinol-C5 [(−)-Cannabitetrol] (CBTT), Cannabichromanone B-C5, 8,9-Dihydroxy-Δ^6a(10a)-tetrahydrocannabinol-C5 (8,9-Di-OHCBT), (±)-4-acetoxycannabichromene-C5, 2-acetoxy-6-geranyl-3-n-pentyl-1,4-benzoquinone-C5, 11-Acetoxy-Δ⁹-Tetrahydrocannabinol-C5 (11-OAc-Δ⁹-THC), 5-acetyl-4-hydroxycannabigerol-C5, 4-acetoxy-2-geranyl-5-hydroxy-3-n-pentylphenol-C5, (−)-trans-10-Ethoxy-9-hydroxy-Δ^6a(10a)-tetrahydrocannabinol-C5 ((−)-trans-CBTOEt), sesquicannabigerol-C5 (SesquiCBG), carmagerol-C5, 4-terpenyl cannabinolate-C5, β-fenchyl-Δ⁹-tetrahydrocannabinolate-C5, α-fenchyl-Δ⁹-tetrahydrocannabinolate-C5, epi-bornyl-Δ⁹-tetrahydrocannabinolate-C5, bornyl-Δ⁹-tetrahydrocannabinolate-C5, α-terpenyl-Δ⁹-tetrahydrocannabinolate-C5, 4-terpenyl-Δ⁹-tetrahydrocannabinolate-C5, 6,6,9-trimethyl-3-pentyl-6H-dibenzo[b,d]pyran-1-ol, 3-(1,1-dimethylheptyl)-6,6a,7,8,10,10a-hexahydro-1-hydroxy-6,6-dimethyl-9H-dibenzo[b,d]pyran-9-one, (−)-(3S,4S)-7-hydroxy-Δ⁶-tetrahydrocannabinol-1,1-dimethylheptyl, (+)-(3S,4S)-7-hydroxy-Δ⁶-tetrahydrocannabinol-1,1-dimethylheptyl, 11-hydroxy-Δ⁹-tetrahydrocannabinol, and Δ⁸-tetrahydrocannabinol-11-oic acid)); certain piperidine analogs (e.g., (−)-(6S,6aR,9R,10aR)-5,6a,7,8,9,10,10a-octahydro-6-methyl-3-[(R)-1-methyl-4-phenylbutoxy]-1,9-phenanthridinediol 1-acetate)), certain aminoalkylindole analogs (e.g.,
(R)-(+)-[2,3-dihydro-5-methyl-3-(4-morpholinylmethyl)-pyrrolo[1,2,3-de]-1,4-benzoxazin-6-yl]-1-naphthalenyl-methanone), certain open pyran ring analogs (e.g., 2-[3-methyl-6-(1-methylethenyl)-2-cyclohexen-1-yl]-5-pentyl-1,3-benzenediol and 4-(1,1-dimethylheptyl)-2,3′-dihydroxy-6′alpha-(3-hydroxypropyl)-1′,2′,3′,4′,5′,6′-hexahydrobiphenyl), tetrahydrocannabiphorol (THCP), cannabidiphorol (CBDP), CBGP, CBCP, their acidic forms, salts of the acidic forms, dimers of any combination of the above, trimers of any combination of the above, polymers of any combination of the above, or any combination thereof.

A cannabinoid described in this application can be a rare cannabinoid. For example, in some embodiments, a cannabinoid described in this application corresponds to a cannabinoid that is naturally produced in conventional Cannabis varieties at concentrations of less than 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.25%, or 0.1% by dry weight of the female flower. In some embodiments, rare cannabinoids include CBGA, CBGVA, THCVA, CBDVA, CBCVA, and CBCA. In some embodiments, rare cannabinoids are cannabinoids that are not THCA, THC, CBDA or CBD.

A cannabinoid described in this application can also be a non-rare cannabinoid.

In some embodiments, the cannabinoid is selected from the cannabinoids listed in Table 1.

TABLE 1

Non-limiting examples of cannabinoids according to the present disclosure.

Structure | Name | Abbreviation
embedded image | Δ⁹-Tetrahydrocannabinol | Δ⁹-THC-C₅
embedded image | Δ⁹-Tetrahydrocannabinol-C₄ | Δ⁹-THC-C₄
embedded image | Δ⁹-Tetrahydrocannabivarin | Δ⁹-THCV-C₃
embedded image | Δ⁹-Tetrahydrocannabiorcol | Δ⁹-THCO-C₁
embedded image | (−)-(6aS,10aR)-Δ⁹-Tetrahydrocannabinol | (−)-cis-Δ⁹-THC-C₅
embedded image | Δ⁹-Tetrahydrocannabinolic acid A | Δ⁹-THCA-C₅A
embedded image | Δ⁹-Tetrahydrocannabinolic acid B | Δ⁹-THCA-C₅B
embedded image | Δ⁹-Tetrahydrocannabinolic acid-C₄ A and/or B | Δ⁹-THCA-C₄A and/or B
embedded image | Δ⁹-Tetrahydrocannabivarinic acid A | Δ⁹-THCVA-C₃A
embedded image | Δ⁹-Tetrahydrocannabiorcolic acid A and/or B | Δ⁹-THCOA-C₁A and/or B
embedded image | (−)-Δ⁸-trans-(6aR,10aR)-Δ⁸-Tetrahydrocannabinol | Δ⁸-THC-C₅
embedded image | (−)-Δ⁸-trans-(6aR,10aR)-Tetrahydrocannabinolic acid A | Δ⁸-THCA-C₅A
embedded image | (−)-Cannabidiol | CBD-C5
embedded image | Cannabidiol monomethyl ether | CBDM-C5
embedded image | Cannabidiol-C4 | CBD-C4
embedded image | Cannabidiolic acid | CBDA-C5
embedded image | Cannabidivarinic acid | CBDVA-C3
embedded image | (−)-Cannabidivarin | CBDV-C3
embedded image | Cannabidiorcol | CBD-C1
embedded image | Cannabigerolic acid A | (E)-CBGA-C₅A
embedded image | Cannabigerol | (E)-CBG-C₅
embedded image | Cannabigerol monomethyl ether | (E)-CBGM-C₅A
embedded image | Cannabinerolic acid A | (Z)-CBGA-C₅A
embedded image | Cannabigerovarin | (E)-CBGV-C₃
embedded image | Cannabigerol | (E)-CBG-C₅
embedded image | Cannabigerolic acid A | (E)-CBGA-C₅A
embedded image | Cannabigerolic acid A monomethyl ether | (E)-CBGAM-C₅A
embedded image | Cannabigerovarinic acid A | (E)-CBGVA-C₃A
embedded image | Cannabinolic acid A | CBNA-C5 A
embedded image | Cannabinol methyl ether | CBNM-C5
embedded image | Cannabinol | CBN-C5
embedded image | Cannabinol-C4 | CBN-C4
embedded image | Cannabivarin | CBN-C3
embedded image | Cannabinol-C2 | CBN-C2
embedded image | Cannabiorcol | CBN-C1
embedded image | (±)-Cannabichromene | CBC-C₅
embedded image | (±)-Cannabichromenic acid A | CBCA-C₅A
embedded image | (±)-Cannabivarichromene, (±)-Cannabichromevarin | CBCV-C₃
embedded image | (±)-Cannabichromevarinic acid A | CBCVA-C₃A
embedded image | (±)-Cannabichromene | CBC-C₅
embedded image | (±)-(1aS,3aR,8bR,8cR)-Cannabicyclol | CBL-C₅
embedded image | (±)-(1aS,3aR,8bR,8cR)-Cannabicyclolic acid A | CBLA-C₅A
embedded image | (±)-(1aS,3aR,8bR,8cR)-Cannabicyclovarin | CBLV-C₃
embedded image | (−)-(9R,10R)-trans-10-O-Ethyl-cannabitriol | (−)-trans-CBT-OEt-C5
embedded image | (±)-(9R,10R/9S,10S)-Cannabitriol-C3 | (±)-trans-CBT-C3
embedded image | (−)-(9R,10R)-trans-Cannabitriol | (−)-trans-CBT-C5
embedded image | (+)-(9S,10S)-Cannabitriol | (+)-trans-CBT-C5
embedded image | (±)-(9R,10S/9S,10R)-Cannabitriol | (±)-cis-CBT-C5
embedded image | (−)-6a,7,10a-Trihydroxy-Δ⁹-tetrahydrocannabinol | (−)-Cannabitetrol
embedded image | 10-Oxo-Δ^6a(10a)-tetrahydrocannabinol | OTHC
embedded image | 8,9-Dihydroxy-Δ^6a(10a)-tetrahydrocannabinol | 8,9-Di-OH-CBT-C5
embedded image | Cannabidiolic acid A cannabitriol ester | CBDA-C5 9-OH-CBT-C5 ester
embedded image | (−)-(6aR,9S,10S,10aR)-9,10-Dihydroxyhexahydrocannabinol, Cannabiripsol | Cannabiripsol-C5
embedded image | (5aS,6S,9R,9aR)-Cannabielsoic acid B | CBEA-C5 B
embedded image | (5aS,6S,9R,9aR)-C3-Cannabielsoic acid B | CBEA-C3 B
embedded image | (5aS,6S,9R,9aR)-Cannabielsoin | CBE-C5
embedded image | (5aS,6S,9R,9aR)-C3-Cannabielsoin | CBE-C3
embedded image | (5aS,6S,9R,9aR)-Cannabielsoic acid A | CBEA-C5 A
embedded image | Cannabiglendol-C3 | OH-iso-HHCV-C3
embedded image | Dehydrocannabifuran | DCBF-C5
embedded image | Cannabifuran | CBF-C5
embedded image | Cannabidiphorol | (CBDP)
embedded image | Tetrahydrocannabiphorol | (THCP)

text missing or illegible when filed

Cannabinoids are often classified by “type”, i.e., by the topological arrangement of their prenyl moieties (See, for example, M. A. Elsohly and D. Slade, Life Sci., 2005, 78, 539-548; and L. O. Hanus et al. Nat. Prod. Rep., 2016, 33, 1357). Generally, each “type” of cannabinoid includes the variations possible for ring substitutions of the resorcinol moiety at the position meta to the two hydroxyl moieties. As used herein, a “CBG-type” cannabinoid is a 3-[(2E)-3,7-dimethylocta-2,6-dienyl]-2,4-dihydroxybenzoic acid optionally substituted at the 6 position of the benzoic acid moiety. As used herein, “CBC-type” cannabinoids refer to 5-hydroxy-2-methyl-2-(4-methylpent-3-enyl)-chromene-6-carboxylic acid optionally substituted at the 7 position of the chromene moiety. As used herein, a “THC-type” cannabinoid is a (6aR,10aR)-1-hydroxy-6,6,9-trimethyl-6a,7,8,10a-tetrahydrobenzo[c]chromene-2-carboxylic acid optionally substituted at the 3 position of the benzo[c]chromene moiety. As used herein, a “CBD-type” cannabinoid is a 2,4-dihydroxy-3-[(1R,6R)-3-methyl-6-prop-1-en-2-ylcyclohex-2-en-1-yl]-benzoic acid optionally substituted at the 6 position of the benzoic acid moiety. In some embodiments, the optional ring substitution for each “type” is an optionally substituted C1-C11 alkyl, an optionally substituted C1-C11 alkenyl, an optionally substituted C1-C11 alkynyl, or an optionally substituted C1-C11 aralkyl.

Biosynthesis of Cannabinoids and Cannabinoid Precursors

Aspects of the present disclosure provide tools, sequences, and methods for the biosynthetic production of cannabinoids in host cells. In some embodiments, the present disclosure teaches expression of enzymes that are capable of producing cannabinoids by biosynthesis.

As a non-limiting example, one or more of the enzymes depicted in FIG. 2 may be used to produce a cannabinoid or cannabinoid precursor of interest. FIG. 1 shows a cannabinoid biosynthesis pathway for the most abundant phytocannabinoids found in Cannabis. See also, de Meijer et al. I, II, III, and IV (I: 2003, Genetics, 163:335-346; II: 2005, Euphytica, 145:189-198; III: 2009, Euphytica, 165:293-311; and IV: 2009, Euphytica, 168:95-112), and Carvalho et al. “Designing Microorganisms for Heterologous Biosynthesis of Cannabinoids” (2017) FEMS Yeast Research June 1:17(4), each of which is incorporated by reference in this application in its entirety for all purposes.

It should be appreciated that a precursor substrate for use in cannabinoid biosynthesis is generally selected based on the cannabinoid of interest. Non-limiting examples of cannabinoid precursors include compounds of Formulae (1)-(8) in FIG. 2. In some embodiments, polyketides, including compounds of Formula (5), could be prenylated. In certain embodiments, the precursor is a precursor compound shown in FIG. 1, 2, or 3. Substrates in which R contains 1-40 carbon atoms are preferred. In some embodiments, substrates in which R contains 3-8 carbon atoms are most preferred.

As used in this application, a cannabinoid or a cannabinoid precursor may comprise an R group. See, e.g., FIG. 2. In some embodiments, R may be a hydrogen. In certain embodiments, R is optionally substituted alkyl. In certain embodiments, R is optionally substituted C1-40 alkyl. In certain embodiments, R is optionally substituted C2-40 alkyl. In certain embodiments, R is optionally substituted C2-40 alkyl, which is straight chain or branched alkyl. In certain embodiments, R is optionally substituted C3-8 alkyl. In certain embodiments, R is optionally substituted C1-C40 alkyl, C1-C20 alkyl, C1-C10 alkyl, C1-C8 alkyl, C1-C5 alkyl, C3-C5 alkyl, C3 alkyl, or C5 alkyl. In certain embodiments, R is optionally substituted C1-C20 alkyl. In certain embodiments, R is optionally substituted C1-C10 alkyl. In certain embodiments, R is optionally substituted C1-C8 alkyl. In certain embodiments, R is optionally substituted C1-C5 alkyl. In certain embodiments, R is optionally substituted C1-C7 alkyl. In certain embodiments, R is optionally substituted C3-C5 alkyl. In certain embodiments, R is optionally substituted C3 alkyl. In certain embodiments, R is unsubstituted C3 alkyl. In certain embodiments, R is n-C3 alkyl. In certain embodiments, R is n-propyl. In certain embodiments, R is n-butyl. In certain embodiments, R is n-pentyl. In certain embodiments, R is n-hexyl. In certain embodiments, R is n-heptyl. In certain embodiments, R is of formula:

embedded image

In certain embodiments, R is optionally substituted C4 alkyl. In certain embodiments, R is unsubstituted C4 alkyl. In certain embodiments, R is optionally substituted C5 alkyl. In certain embodiments, R is unsubstituted C5 alkyl. In certain embodiments, R is optionally substituted C6 alkyl. In certain embodiments, R is unsubstituted C6 alkyl. In certain embodiments, R is optionally substituted C7 alkyl. In certain embodiments, R is unsubstituted C7 alkyl. In certain embodiments, R is of formula:

embedded image

In certain embodiments, R is of formula:

embedded image

In certain embodiments, R is of formula:

embedded image

In certain embodiments, R is of formula:

embedded image

In certain embodiments, R is of formula:

embedded image

In certain embodiments, R is optionally substituted n-propyl. In certain embodiments, R is n-propyl optionally substituted with optionally substituted aryl. In certain embodiments, R is n-propyl optionally substituted with optionally substituted phenyl. In certain embodiments, R is n-propyl substituted with unsubstituted phenyl. In certain embodiments, R is optionally substituted butyl. In certain embodiments, R is optionally substituted n-butyl. In certain embodiments, R is n-butyl optionally substituted with optionally substituted aryl. In certain embodiments, R is n-butyl optionally substituted with optionally substituted phenyl. In certain embodiments, R is n-butyl substituted with unsubstituted phenyl. In certain embodiments, R is optionally substituted pentyl. In certain embodiments, R is optionally substituted n-pentyl. In certain embodiments, R is n-pentyl optionally substituted with optionally substituted aryl. In certain embodiments, R is n-pentyl optionally substituted with optionally substituted phenyl. In certain embodiments, R is n-pentyl substituted with unsubstituted phenyl. In certain embodiments, R is optionally substituted hexyl. In certain embodiments, R is optionally substituted n-hexyl. In certain embodiments, R is optionally substituted n-heptyl. In certain embodiments, R is optionally substituted n-octyl. In certain embodiments, R is alkyl optionally substituted with aryl (e.g., phenyl). In certain embodiments, R is optionally substituted acyl (e.g., —C(═O)Me).

In certain embodiments, R is optionally substituted alkenyl (e.g., substituted or unsubstituted C2-6 alkenyl). In certain embodiments, R is substituted or unsubstituted C2-6 alkenyl. In certain embodiments, R is substituted or unsubstituted C2-5 alkenyl. In certain embodiments, R is of formula:

embedded image

In certain embodiments, R is optionally substituted alkynyl (e.g., substituted or unsubstituted C2-6 alkynyl). In certain embodiments, R is substituted or unsubstituted C2-6 alkynyl. In certain embodiments, R is of formula:

embedded image

In certain embodiments, R is optionally substituted carbocyclyl. In certain embodiments, R is optionally substituted aryl (e.g., phenyl or naphthyl).

The chain length of a precursor substrate can be from C1-C40. Those substrates can have any degree and any kind of branching or saturation or chain structure, including, without limitation, aliphatic, alicyclic, and aromatic. In addition, they may include any functional groups including hydroxy, halogens, carbohydrates, phosphates, methyl-containing or nitrogen-containing functional groups.

In some embodiments, R is H, an optionally substituted C1-C11 alkyl, an optionally substituted C1-C11 alkenyl, an optionally substituted C1-C11 alkynyl, or an optionally substituted C1-C11 aralkyl.

For example, FIG. 3 shows a non-exclusive set of putative precursors for the cannabinoid pathway. Aliphatic carboxylic acids including four to eight total carbons (“C4”-“C8” in FIG. 3) and up to 10-12 total carbons with either linear or branched chains may be used as precursors for the heterologous pathway. Non-limiting examples include methanoic acid, butyric acid, pentanoic acid, hexanoic acid, heptanoic acid, isovaleric acid, octanoic acid, and decanoic acid. Additional precursors may include ethanoic acid and propanoic acid. In some embodiments, in addition to acids, the ester, salt, and acid forms may all be used as substrates. Substrates may have any degree and any kind of branching, saturation, and chain structure, including, without limitation, aliphatic, alicyclic, and aromatic. In addition, they may include any functional modifications or combination of modifications including, without limitation, halogenation, hydroxylation, amination, acylation, alkylation, phenylation, and/or installation of pendant carbohydrates, phosphates, sulfates, heterocycles, or lipids, or any other functional groups.

Substrates for any of the enzymes disclosed in this application may be provided exogenously or may be produced endogenously by a host cell. In some embodiments, the cannabinoids are produced from a glucose substrate, so that compounds of Formula 1 shown in FIG. 2 and CoA precursors are synthesized by the cell. In other embodiments, a precursor is fed into the reaction. In some embodiments, a precursor is a compound selected from Formulae 1-8 in FIG. 2.

Cannabinoids produced by methods disclosed in this application include rare cannabinoids. Due to the low concentrations at which cannabinoids, including rare cannabinoids, occur in nature, producing industrially significant amounts of isolated or purified cannabinoids from the Cannabis plant may become prohibitive, especially in the case of rare cannabinoids, due to, e.g., the large volumes of Cannabis plants, and the large amounts of space, labor, time, and capital requirements to grow, harvest, and/or process the plant materials (see, for example, Crandall, K., 2016. A Chronic Problem: Taming Energy Costs and Impacts from Marijuana Cultivation. EQ Research; Mills, E., 2012. The carbon footprint of indoor Cannabis production. Energy Policy, 46, pp. 58-67; Jourabchi, M. and M. Lahet. 2014. Electrical Load Impacts of Indoor Commercial Cannabis Production. Presented to the Northwest Power and Conservation Council; O'Hare, M., D. Sanchez, and P. Alstone. 2013. Environmental Risks and Opportunities in Cannabis Cultivation. Washington State Liquor and Cannabis Board; 2018. Comparing Cannabis Cultivation Energy Consumption. New Frontier Data; and Madhusoodanan, J., 2019. Can Cannabis go green? Nature Outlook: Cannabis; all of which are incorporated by reference in this disclosure). The disclosure provided in this application represents a potentially efficient method for producing high yields of cannabinoids, including rare cannabinoids. The disclosure provided in this application also represents a potential method for addressing concerns related to agricultural practices and water usage associated with traditional methods of cannabinoid production (Dillis et al. “Water storage and irrigation practices for Cannabis drive seasonal patterns of water extraction and use in Northern California.” Journal of Environmental Management 272 (2020): 110955, incorporated by reference in this disclosure).

Cannabinoids produced by the disclosed methods also include non-rare cannabinoids. Without being bound by a particular theory, the methods described in this application may be advantageous compared with traditional plant-based methods for producing non-rare cannabinoids. For example, methods provided in this application represent potentially efficient means for producing consistent and high yields of non-rare cannabinoids. With traditional methods of cannabinoid production, in which cannabinoids are harvested from plants, maintaining consistent and uniform conditions, including airflow, nutrients, lighting, temperature, and humidity, can be difficult. For example, with plant-based methods, there can be microclimates created by branching, which can lead to inconsistent yields and by-product formation. In some embodiments, the methods described in this application are more efficient at producing a cannabinoid of interest as compared to harvesting cannabinoids from plants. For example, with plant-based methods, seed-to-harvest can take up to half a year, while cutting-to-harvest usually takes about 4 months. Additional steps including drying, curing, and extraction are also usually needed with plant-based methods. In contrast, in some embodiments, the fermentation-based methods described in this application only take about 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 days. In some embodiments, the fermentation-based methods described in this application only take about 3-5 days. In some embodiments, the fermentation-based methods described in this application only take about 5 days. In some embodiments, the methods provided in this application reduce the amount of security needed to comply with regulatory standards. For example, a smaller area may need to be monitored and secured to practice the methods described in this application as compared to the cultivation of plants. In some embodiments, the methods described in this application are advantageous over plant-sourced cannabinoids.

Prenyltransferase (PT)

Aspects of the disclosure relate to prenyltransferase (PT) enzymes. As used in this disclosure, a “PT” refers to an enzyme that is capable of transferring prenyl groups to acceptor molecule substrates. Non-limiting examples of prenyltransferases are described in U.S. Pat. No. 7,544,498 and Kumano et al., Bioorg Med Chem. 2008 Sep. 1; 16(17): 8117-8126 (e.g., NphB), PCT Publication No. WO 2018/200888 (e.g., CsPT4), U.S. Pat. No. 8,884,100 (e.g., CsPT1); CA2718469; Valliere et al., Nat Commun. 2019 Feb. 4; 10(1):565 (e.g., NphB variants); PCT Publication Nos: WO2019/173770, WO2019/183152, and WO2020/210810 (e.g., NphB variants); Luo et al., Nature 2019 March; 567(7746):123-126 (e.g., CsPT4); and WO2021/034848. In some embodiments, a PT is capable of producing cannabigerolic acid (CBGA), cannabigerophorolic acid (CBGPA), cannabigerovarinic acid (CBGVA), a CBG-type cannabinoid, or other cannabinoids or cannabinoid-like substances. In some embodiments, a PT is a cannabigerolic acid synthase (CBGAS). In some embodiments, a PT is cannabigerovarinic acid synthase (CBGVAS).

In some embodiments, the PT is a NphB prenyltransferase. See, e.g., U.S. Pat. No. 7,544,498; and Kumano et al., Bioorg Med Chem. 2008 Sep. 1; 16(17): 8117-8126, which are incorporated by reference in this application in their entireties. In some embodiments, a PT corresponds to NphB from Streptomyces sp. (see, e.g., UniProtKB Accession No. Q4R2T2, see also SEQ ID NO: 2 of U.S. Pat. No. 7,361,483). The protein sequence corresponding to UniProtKB Accession No. Q4R2T2 is provided by SEQ ID NO: 1:

(SEQ ID NO: 1)

MSEAADVERVYAAMEEAAGLLGVACARDKIYPLLSTFQDT

LVEGGSVVVFSMASGRHSTELDFSISVPTSHGDPYATVVE

KGLFPATGHPVDDLLADTQKHLPVSMFAIDGEVTGGFKKT

YAFFPTDNMPGVAELSAIPSMPPAVAENAELFARYGLDKV

QMTSMDYKKRQVNLYFSELSAQTLEAESVLALVRELGLHV

PNELGLKFCKRSFSVYPTLNWETGKIDRLCFAVISNDPTL

VPSSDEGDIEKFHNYATKAPYAYVGEKRTLVYGLTLSPKE

EYYKLGAYYHITDVQRGLLKAFDSLED.

A non-limiting example of a nucleic acid sequence encoding NphB is:

(SEQ ID NO: 2)

atgtcagaagccgcagatgtcgaaagagtttacgccgcta

tggaagaagccgccggtttgttaggtgttgcctgtgccag

agataagatctacccattgttgtctacttttcaagataca

ttagttgaaggggttcagttgttgttttctctatggcttc

aggtagacattctacagaattggatttctctatctcagtt

ccaacatcacatggtgatccatacgctactgttgttgaaa

aaggtttatttccagcaacaggtcatccagttgatgattt

gttggctgatactcaaaagcatttgccagtttctatgttt

gcaattgatggtgaagttactggtggtttcaagaaaactt

acgctttctttccaactgataacatgccaggtgttgcaga

attatctgctattccatcaatgccaccagctgttgcagaa

aatgcagaattatttgctagatacggtttggataaggttc

aaatgacatctatggattacaagaaaagacaagttaattt

gtacttttctgaattatcagcacaaactttggaagctgaa

tcagttttggcattagttagagaattgggtttacatgttc

caaacgaattgggtttgaagttttgtaaaagatctttctc

agtttatccaactttaaactgggaaacaggcaagatcgat

agattatgtttcgcagttatctctaacgatccaacattgg

ttccatcttcagatgaaggtgatatcgaaaagtttcataa

ctacgctactaaagcaccatatgcttacgttggtgaaaag

agaacattagtttatggtttgactttatcaccaaaggaag

aatactacaagttgggtgcttactaccacattaccgacgt

acaaagaggtttattgaaagcattcgatagtttagaagac

taa.
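
As an illustration of how a nucleic acid sequence such as SEQ ID NO: 2 can be checked against the protein it is stated to encode, the following is a minimal sketch in Python. It assumes the standard genetic code and reading frame 1; the function name `translate` and the table-building code are illustrative and are not part of the disclosure. The two inputs are the first 30 and the last 18 nucleotides of SEQ ID NO: 2 as printed above.

```python
# Standard genetic code, with codons enumerated in t, c, a, g order.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a coding sequence in frame 1; '*' marks a stop codon.

    Whitespace and a trailing period (as used in the printed listing)
    are ignored. Any character other than a, c, g, or t is rejected.
    """
    seq = "".join(dna.split()).lower().rstrip(".")
    bad = sorted(set(seq) - set(BASES))
    if bad:
        raise ValueError(f"non-nucleotide characters: {bad}")
    if len(seq) % 3:
        raise ValueError(f"length {len(seq)} is not a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


print(translate("atgtcagaagccgcagatgtcgaaagagtt"))  # MSEAADVERV
print(translate("gatagtttagaagactaa"))              # DSLED*
```

The two results correspond to the first ten and last five residues of SEQ ID NO: 1, followed by the stop codon.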

In other embodiments, a PT is CsPT1, which is disclosed as SEQ ID NO:2 in U.S. Pat. No. 8,884,100, corresponding to SEQ ID NO: 3 in this application:

(SEQ ID NO: 3)

MGLSSVCTFSFQTNYHTLLNPHNNNPKTSLLCYRHPKTPI

KYSYNNFPSKHCSTKSFHLQNKCSESLSIAKNSIRAATTN

QTEPPESDNHSVATKILNFGKACWKLQRPYTIIAFTSCAC

GLFGKELLHNTNLISWSLMFKAFFFLVAILCIASFTTTIN

QIYDLHIDRINKPDLPLASGEISVNTAWIMSIIVALFGLI

ITIKMKGGPLYIFGYCFGIFGGIVYSVPPFRWKQNPSTAF

LLNFLAHIITNFTFYYASRAALGLPFELRPSFTFLLAFMK

SMGSALALIKDASDVEGDTKFGISTLASKYGSRNLTLFCS

GIVLLSYVAAILAGIIWPQAFNSNVMLLSHAILAFWLILQ

TRDFALTNYDPEAGRRFYEFMWKLYYAEYLVYVFI.

In some embodiments, a PT is a truncated CsPT1. In some embodiments, a truncated CsPT1 corresponds to SEQ ID NO: 1185:

(SEQ ID NO: 1185)

MAATTNQTEPPESDNHSVATKILNFGKACWKLQRPYTIIAFTSCACGLF

GKELLHNTNLISWSLMFKAFFFLVAILCIASFTTTINQIYDLHIDRINK

PDLPLASGEISVNTAWIMSIIVALFGLIITIKMKGGPLYIFGYCFGIFG

GIVYSVPPFRWKQNPSTAFLLNFLAHIITNFTFYYASRAALGLPFELRP

SFTFLLAFMKSMGSALALIKDASDVEGDTKFGISTLASKYGSRNLTLFC

SGIVLLSYVAAILAGIIWPQAFNSNVMLLSHAILAFWLILQTRDFALTN

YDPEAGRRFYEFMWKLYYAEYLVYVFI.

In some embodiments, a PT is CsPT4, which is disclosed as SEQ ID NO:1 in WO 2019/071000, corresponding to SEQ ID NO: 4 in this application:

(SEQ ID NO: 4)

MGLSLVCTFSFQTNYHTLLNPHNKNPKNSLLSYQHPKTPIIKSSYDNFP

SKYCLTKNFHLLGLNSHNRISSQSRSIRAGSDQIEGSPHHESDNSIATK

ILNFGHTCWKLQRPYVVKGMISIACGLFGRELFNNRHLFSWGLMWKAFF

ALVPILSFNFFAAIMNQIYDVDIDRINKPDLPLVSGEMSIETAWILSII

VALTGLIVTIKLKSAPLFVFIYIFGIFAGFAYSVPPIRWKQYPFTNFLI

TISSHVGLAFTSYSATTSALGLPFVWRPAFSFIIAFMTVMGMTIAFAKD

ISDIEGDAKYGVSTVATKLGARNMTFVVSGVLLLNYLVSISIGIIWPQV

FKSNIMILSHAILAFCLIFQTRELALANYASAPSRQFFEFIWLLYYAEY

FVYVFI.

In some embodiments, a PT is a truncated CsPT4. In some embodiments, a truncated CsPT4 is provided by SEQ ID NO: 5:

(SEQ ID NO: 5)

MSAGSDQIEGSPHHESDNSIATKILNFGHTCWKLQRPYVVKGMISIACG

LFGRELFNNRHLFSWGLMWKAFFALVPILSFNFFAAIMNQIYDVDIDRI

NKPDLPLVSGEMSIETAWILSIIVALTGLIVTIKLKSAPLFVFIYIFGI

FAGFAYSVPPIRWKQYPFTNFLITISSHVGLAFTSYSATTSALGLPFVW

RPAFSFIIAFMTVMGMTIAFAKDISDIEGDAKYGVSTVATKLGARNMTF

VVSGVLLLNYLVSISIGIIWPQVFKSNIMILSHAILAFCLIFQTRELAL

ANYASAPSRQFFEFIWLLYYAEYFVYVFI.

In some embodiments, a truncated CsPT4 is provided by SEQ ID NO: 6.

(SEQ ID NO: 6)

SAGSDQIEGSPHHESDNSIATKILNFGHTCWKLQRPYVVKGMISIACGL

FGRELFNNRHLFSWGLMWKAFFALVPILSFNFFAAIMNQIYDVDIDRIN

KPDLPLVSGEMSIETAWILSIIVALTGLIVTIKLKSAPLFVFIYIFGIF

AGFAYSVPPIRWKQYPFTNFLITISSHVGLAFTSYSATTSALGLPFVWR

PAFSFIIAFMTVMGMTIAFAKDISDIEGDAKYGVSTVATKLGARNMTFV

VSGVLLLNYLVSISIGIIWPQVFKSNIMILSHAILAFCLIFQTRELALA

NYASAPSRQFFEFIWLLYYAEYFVYVFI.

In some embodiments, a truncated CsPT4 is provided by SEQ ID NO: 7.

(SEQ ID NO: 7)

IEGSPHHESDNSIATKILNFGHTCWKLQRPYVVKGMISIACGLFGRELF

NNRHLFSWGLMWKAFFALVPILSFNFFAAIMNQIYDVDIDRINKPDLPL

VSGEMSIETAWILSIIVALTGLIVTIKLKSAPLFVFIYIFGIFAGFAYS

VPPIRWKQYPFTNFLITISSHVGLAFTSYSATTSALGLPFVWRPAFSFI

IAFMTVMGMTIAFAKDISDIEGDAKYGVSTVATKLGARNMTFVVSGVLL

LNYLVSISIGIIWPQVFKSNIMILSHAILAFCLIFQTRELALANYASAP

SRQFFEFIWLLYYAEYFVYVFI.

In some embodiments, a truncated CsPT4 is provided by SEQ ID NO: 8.

(SEQ ID NO: 8)

HHESDNSIATKILNFGHTCWKLQRPYVVKGMISIACGLFGRELFNNRHL

FSWGLMWKAFFALVPILSFNFFAAIMNQIYDVDIDRINKPDLPLVSGEM

SIETAWILSIIVALTGLIVTIKLKSAPLFVFIYIFGIFAGFAYSVPPIR

WKQYPFTNFLITISSHVGLAFTSYSATTSALGLPFVWRPAFSFIIAFMT

VMGMTIAFAKDISDIEGDAKYGVSTVATKLGARNMTFVVSGVLLLNYLV

SISIGIIWPQVFKSNIMILSHAILAFCLIFQTRELALANYASAPSRQFF

EFIWLLYYAEYFVYVFI.
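
The relationship between a full-length sequence and a truncated variant can be inspected by comparing the two from the C-terminus. The following is a minimal sketch, not a method of the disclosure; the function names are hypothetical, and the inputs are short fragments copied from SEQ ID NOs: 4, 5, and 7 above rather than the complete sequences.

```python
def common_suffix_length(a: str, b: str) -> int:
    """Number of identical residues shared by a and b, counted from the C-terminus."""
    n = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        n += 1
    return n


def describe_truncation(full: str, truncated: str) -> dict:
    """Report the shared C-terminal stretch and any leading residues of the
    truncated sequence that fall outside that stretch."""
    shared = common_suffix_length(full, truncated)
    return {
        "shared_c_terminal": shared,
        "leading_residues": truncated[: len(truncated) - shared],
    }


# Fragments only (illustrative): a stretch of SEQ ID NO: 4 ending in ...SIATK,
# and the N-terminal ends of SEQ ID NO: 5 and SEQ ID NO: 7 up to the same point.
full_fragment = "SQSRSIRAGSDQIEGSPHHESDNSIATK"
seq5_fragment = "MSAGSDQIEGSPHHESDNSIATK"
seq7_fragment = "IEGSPHHESDNSIATK"

print(describe_truncation(full_fragment, seq5_fragment))
print(describe_truncation(full_fragment, seq7_fragment))
```

For these fragments, the comparison reports 21 shared residues with leading residues "MS" for the SEQ ID NO: 5 fragment, and 16 shared residues with no leading residues for the SEQ ID NO: 7 fragment.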

In some embodiments, a PT is CsPT6, which is provided by SEQ ID NO: 9, corresponding to UniProt Accession No. A0A455ZIL7.

(SEQ ID NO: 9)

SSDLPSVLSKGGSNWRRCNLKNVEFSGSYVAYNVSRLRVWKVREKPCSA

VFQPSSLKHCAKGSETFVFYQRPNERFLVKAAGGQPLESEPKNDMNSAK

DALDAFYRFSRPHTVIGTALSIVSVSLLAIEKLSDFSPLFFVGMLEAIV

AALLMNIYIVGLNQLYDIDIDKVNKPYLPLASGEYSIQTGVMIVASFSI

LSFGVGWLVGSWPLFWALFISFVLGTAYSINVPLLRWKRFALVAAMCIL

AVRAVIVQLAFFLHIQTHVFKRPAVFSRPLIFATAFMSFFSVVIALFKD

IPDIDGDRIYGIRSFTVRLGQKRVFWICISLLEIAYTVALLVGASSGFL

WSKVVTVLGHTILASILWTNAKSVDLSSKAAITSFYMFIWKLFYAEYLL

IPLVR.

In other embodiments, a PT is a truncated CsPT6. In some embodiments, a truncated CsPT6 is provided by SEQ ID NO: 701.

(SEQ ID NO: 701)

MSYVAYNVSRLRVWKVREKPCSAVFQPSSLKHCAKGSETFVFYQRPNER

FLVKAAGGQPLESEPKNDMNSAKDALDAFYRFSRPHTVIGTALSIVSVS

LLAIEKLSDFSPLFFVGMLEAIVAALLMNIYIVGLNQLYDIDIDKVNKP

YLPLASGEYSIQTGVMIVASFSILSFGVGWLVGSWPLFWALFISFVLGT

AYSINVPLLRWKRFALVAAMCILAVRAVIVQLAFFLHIQTHVFKRPAVF

SRPLIFATAFMSFFSVVIALFKDIPDIDGDRIYGIRSFTVRLGQKRVFW

ICISLLEIAYTVALLVGASSGFLWSKVVTVLGHTILASILWTNAKSVDL

SSKAAITSFYMFIWKLFYAEYLLIPLVR.

In some embodiments, a PT is CsPT7, which is provided by SEQ ID NO: 10, corresponding to UniProt Accession No. A0A455ZJ77.

(SEQ ID NO: 10)

MELSSICNFSFQTNYHTLLNPHNKNPKSSLLSHQHPKTPIITSSYNNFP

SNYCSNKNFHLQNRCSKSLLIAKNSIRTDTANQTEPPESNTKYSVVTKI

LSFGHTCWKLQRPYTFIGVISCACGLFGRELFHNTNLLSWSLMLKAFSS

LMVILSVNLCTNIINQITDLDIDRINKPDLPLASGEMSIETAWIMSIIV

ALTGLILTIKLNCGPLFISLYCVSILVGALYSVPPFRWKQNPNTAFSSY

FMGLVIVNFTCYYASRAAFGLPFEMSPPFTFILAFVKSMGSALFLCKDV

SDIEGDSKHGISTLATRYGAKNITFLCSGIVLLTYVSAILAAIIWPQAF

KSNVMLLSHATLAFWLIFQTREFALTNYNPEAGRKFYEFMWKLHYAEYL

VYVFI.

In other embodiments, a CsPT is a truncated CsPT7. In some embodiments, a truncated CsPT7 is provided by SEQ ID NO: 702.

(SEQ ID NO: 702)

MSTDTANQTEPPESNTKYSVVTKILSFGHTCWKLQRPYTFIGVISCACG

LFGRELFHNTNLLSWSLMLKAFSSLMVILSVNLCTNIINQITDLDIDRI

NKPDLPLASGEMSIETAWIMSIIVALTGLILTIKLNCGPLFISLYCVSI

LVGALYSVPPFRWKQNPNTAFSSYFMGLVIVNFTCYYASRAAFGLPFEM

SPPFTFILAFVKSMGSALFLCKDVSDIEGDSKHGISTLATRYGAKNITF

LCSGIVLLTYVSAILAAIIWPQAFKSNVMLLSHATLAFWLIFQTREFAL

TNYNPEAGRKFYEFMWKLHYAEYLVYVFI.

a. Chimeric Prenyltransferase

Examples 1-8 describe identification of synthetic PTs that can be functionally expressed in host cells such as S. cerevisiae. Nucleic acid and protein sequences for PTs identified in this application are provided in Tables 13-16 and 19-20.

PTs provided in this disclosure include chimeric PTs. As used in this disclosure, a “chimeric PT” refers to a PT that includes one or more portions of at least two different PT proteins. It has previously been reported that it is difficult to express C. sativa PTs in S. cerevisiae; for example, out of CsPT1-7, only CsPT4 was reported to produce CBGA when expressed heterologously in S. cerevisiae, and only at low titers (Luo et al., Nature 2019 March; 567(7746):123-126). It was surprisingly shown in Examples 1-8 of this disclosure that chimeric PTs, such as PTs that included portions of at least two of CsPT1, CsPT4, CsPT6, and CsPT7, were able to produce CBGA and/or CBGVA.

In some embodiments, chimeric PTs comprise one or more portions of CsPT1 and one or more portions of a non-CsPT1 PT. A portion can include, for example, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, or more than 390 
amino acids. In some embodiments, a non-CsPT1 PT is a PT from C. sativa. In some embodiments, a non-CsPT1 PT is CsPT4, CsPT6, or CsPT7.

In some embodiments, chimeric PTs comprise one or more portions of CsPT4 and one or more portions of a non-CsPT4 PT. A portion can include, for example, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, or more than 390 
amino acids. In some embodiments, a non-CsPT4 PT is a PT from C. sativa. In some embodiments, a non-CsPT4 PT is CsPT1, CsPT6, or CsPT7.

In some embodiments, chimeric PTs comprise one or more portions of CsPT6 and one or more portions of a non-CsPT6 PT. A portion can include, for example, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, or more than 390 
amino acids. In some embodiments, a non-CsPT6 PT is a PT from C. sativa. In some embodiments, a non-CsPT6 PT is CsPT1, CsPT4, or CsPT7.

In some embodiments, chimeric PTs comprise one or more portions of CsPT7 and one or more portions of a non-CsPT7 PT. A portion can include, for example, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, or more than 390 
amino acids. In some embodiments, a non-CsPT7 PT is a PT from C. sativa. In some embodiments, a non-CsPT7 PT is CsPT1, CsPT4, or CsPT6.

As described in Example 1 and FIG. 7, two different approaches were pursued for developing chimeric PTs based on where the cross-over points between different PT proteins occurred. As used in this disclosure, a “cross-over point” for a chimeric PT that contains portions of proteins “A” and “B” refers to the position where the sequence of the chimeric PT changes from protein A to protein B or vice versa. As discussed in Example 1 and as shown in FIG. 7, chimeric PTs can be generated using a “within membrane” approach or a “through membrane” approach. An example of a chimeric PT generated using a “within membrane” approach is shown in FIG. 7A. In this approach, the one or more cross-over points in the chimeric PT occur within the transmembrane helices of the chimeric PT. A “through membrane” approach is shown in FIG. 7B. In this approach, the one or more cross-over points in the chimeric PT occur outside of the transmembrane helices of the chimeric PT. For example, in FIG. 7B a single cross-over point is shown between helices 6 and 7 of the chimeric PT protein. Cross-over points can also occur between other helices, such as between helices 7 and 8 or between helices 8 and 9.
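By way of non-limiting illustration, the notion of a cross-over point can be sketched in Python. The function names, the `(parent, start, end)` segment format, and the toy parent sequences are all hypothetical. The sketch also assumes that the parent sequences are already aligned so that positions correspond, which the disclosure does not specify.

```python
from typing import Dict, List, Tuple

Segment = Tuple[str, int, int]  # (parent name, start, end), 1-based inclusive


def build_chimera(parents: Dict[str, str], segments: List[Segment]) -> str:
    """Concatenate the listed parent slices in order to form a chimeric sequence."""
    parts = []
    for name, start, end in segments:
        seq = parents[name]
        if not 1 <= start <= end <= len(seq):
            raise ValueError(f"segment {name}:{start}-{end} is out of range")
        parts.append(seq[start - 1:end])
    return "".join(parts)


def crossover_points(segments: List[Segment]) -> List[int]:
    """Return chimera positions (1-based) of the last residue before each parent switch."""
    points = []
    position = 0
    for i, (name, start, end) in enumerate(segments):
        position += end - start + 1
        if i + 1 < len(segments) and segments[i + 1][0] != name:
            points.append(position)
    return points


# Toy placeholder sequences, not real PT sequences.
parents = {"A": "AAAAAAAAAA", "B": "GGGGGGGGGG"}
segments = [("A", 1, 4), ("B", 5, 10)]

chimera = build_chimera(parents, segments)
points = crossover_points(segments)
print(chimera, points)
```

In this toy case the sequence changes from protein A to protein B after residue 4, so there is a single cross-over point.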

Chimeric PTs associated with the disclosure include multiple transmembrane helices. As used in this disclosure, “multiple” transmembrane helices refers to more than one transmembrane helix. In some embodiments, chimeric PTs include 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 or more than 15 transmembrane helices. In some embodiments, chimeric PTs include 9 transmembrane helices.

In some embodiments, at least one transmembrane helix includes both a portion of CsPT1 and a portion of a non-CsPT1 PT. In some embodiments, the non-CsPT1 PT is a PT from C. sativa. In some embodiments, the non-CsPT1 PT is CsPT4, CsPT6 or CsPT7. In some embodiments, all the transmembrane helices comprise both a portion of CsPT1 and a portion of a non-CsPT1 PT. In some embodiments, all the transmembrane helices comprise both a portion of CsPT1 and a portion of CsPT4, CsPT6 or CsPT7.

In some embodiments, at least one transmembrane helix includes both a portion of CsPT4 and a portion of a non-CsPT4 PT. In some embodiments, the non-CsPT4 PT is a PT from C. sativa. In some embodiments, the non-CsPT4 PT is CsPT1, CsPT6 or CsPT7. In some embodiments, all the transmembrane helices comprise both a portion of CsPT4 and a portion of a non-CsPT4 PT. In some embodiments, all the transmembrane helices comprise both a portion of CsPT4 and a portion of CsPT1, CsPT6 or CsPT7.

In some embodiments, at least one transmembrane helix includes both a portion of CsPT6 and a portion of a non-CsPT6 PT. In some embodiments, the non-CsPT6 PT is a PT from C. sativa. In some embodiments, the non-CsPT6 PT is CsPT1, CsPT4 or CsPT7. In some embodiments, all the transmembrane helices comprise both a portion of CsPT6 and a portion of a non-CsPT6 PT. In some embodiments, all the transmembrane helices comprise both a portion of CsPT6 and a portion of CsPT1, CsPT4 or CsPT7.

In some embodiments, at least one transmembrane helix includes both a portion of CsPT7 and a portion of a non-CsPT7 PT. In some embodiments, the non-CsPT7 PT is a PT from C. sativa. In some embodiments, the non-CsPT7 PT is CsPT1, CsPT4 or CsPT6. In some embodiments, all the transmembrane helices comprise both a portion of CsPT7 and a portion of a non-CsPT7 PT. In some embodiments, all the transmembrane helices comprise both a portion of CsPT7 and a portion of CsPT1, CsPT4 or CsPT6.

As one of ordinary skill in the art would appreciate, multiple different computational analysis programs may be used to determine secondary structures in proteins, such as CsPT proteins. Different computational analysis programs may define the boundaries of the secondary structures differently. For example, the UniProt entry A0A455ZJC3 (corresponding to CsPT4) uses Phobius to predict that there are 8 sequences therewithin that are highly probable to be transmembrane helices. There is also a portion of the sequence with lower probability to be a transmembrane domain that is not listed on the UniProt entry. As a comparison, for UniProt entry O28625, which is a protein with the highest sequence identity to CsPT4 for which there is a crystal structure (e.g., PDB ID: 4TQ3), the UniProt entry similarly indicates that there are 8 transmembrane helices, while the structure itself shows 9 transmembrane helices. Without being bound by any theory, the lower probability transmembrane domain helix of CsPTs may be an actual transmembrane domain helix that did not meet an arbitrary probability threshold for annotation on UniProt based on the software prediction.

Table 2 provides a non-limiting example of predicted domains within CsPT1-CsPT7. “Inner” means inside the cell, “membrane” means in the cell membrane, and “outer” means outside the cell.

TABLE 2

Predicted domains within CsPT1-CsPT7

Domain | CsPT1 | CsPT2 | CsPT3 | CsPT4 | CsPT5 | CsPT6 | CsPT7
1 | Inner, 1-35 | Inner, 1-34 | Inner, 1-94 | Inner, 1-37 | Inner, 1-34 | Inner, 1-85 | Inner, 1-37
2 | Membrane, 36-53 | Membrane, 35-51 | Membrane, 95-111 | Membrane, 38-55 | Membrane, 35-54 | Membrane, 86-102 | Membrane, 38-55
3 | Outer, 54-57 | Outer, 52-59 | Outer, 112-125 | Outer, 56-69 | Outer, 55-62 | Outer, 103-110 | Outer, 56-69
4 | Membrane, 68-84 | Membrane, 60-82 | Membrane, 126-143 | Membrane, 70-87 | Membrane, 63-80 | Membrane, 111-133 | Membrane, 70-87
5 | Inner, 86-111 | Inner, 83-108 | Inner, 144-169 | Inner, 88-113 | Inner, 81-106 | Inner, 134-159 | Inner, 88-113
6 | Membrane, 112-129 | Membrane, 109-128 | Membrane, 170-188 | Membrane, 114-131 | Membrane, 107-125 | Membrane, 160-179 | Membrane, 114-131
7 | Outer, 130-135 | Outer, 129-132 | Outer, 189-196 | Outer, 132-137 | Outer, 126-129 | Outer, 180-183 | Outer, 132-137
8 | Membrane, 136-153 | Membrane, 133-150 | Membrane, 197-214 | Membrane, 138-155 | Membrane, 130-149 | Membrane, 184-201 | Membrane, 138-155
9 | Inner, 154-165 | Inner, 151-158 | Inner, 215-226 | Inner, 156-167 | Inner, 150-157 | Inner, 202-209 | Inner, 156-167
10 | Membrane, 166-183 | Membrane, 159-181 | Membrane, 227-246 | Membrane, 168-192 | Membrane, 158-177 | Membrane, 210-232 | Membrane, 168-185
11 | Outer, 184-197 | Outer, 182-197 | Outer, 247-254 | Outer, 193-198 | Outer, 178-189 | Outer, 233-248 | Outer, 186-199
12 | Membrane, 200-215 | Membrane, 198-215 | Membrane, 255-272 | Membrane, 199-216 | Membrane, 190-209 | Membrane, 249-266 | Membrane, 200-217
13 | Inner, 216-241 | Inner, 216-241 | Inner, 273-298 | Inner, 217-244 | Inner, 210-237 | Inner, 267-292 | Inner, 218-243
14 | Membrane, 242-259 | Membrane, 242-261 | Membrane, 299-320 | Membrane, 245-264 | Membrane, 238-257 | Membrane, 293-312 | Membrane, 244-263
15 | Outer, 260-265 | Outer, 262-269 | Outer, 321-328 | Outer, 265-270 | Outer, 258-265 | Outer, 313-320 | Outer, 264-267
16 | Membrane, 266-284 | Membrane, 270-287 | Membrane, 329-348 | Membrane, 271-288 | Membrane, 266-285 | Membrane, 321-338 | Membrane, 268-287
17 | Inner, 285-302 | Inner, 288-299 | Inner, 349-360 | Inner, 289-304 | Inner, 286-293 | Inner, 339-350 | Inner, 288-304
18 | Membrane, 303-320 | Membrane, 300-319 | Membrane, 361-378 | Membrane, 305-322 | Membrane, 294-313 | Membrane, 351-370 | Membrane, 305-322
19 | Outer, 321 | Outer, 320 | Outer, 379 | Outer, 323 | Outer, 314-316 | Outer, 371 | Outer, 323
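By way of non-limiting illustration, the predicted domains of Table 2 can be used to check whether a candidate cross-over position falls within or outside a transmembrane helix. The Python sketch below copies the CsPT4 column of Table 2. The function names and the example positions 200 and 230 are hypothetical, and the sketch assumes that the cross-over position uses the same residue numbering as Table 2.

```python
from typing import List, Optional, Tuple

Domain = Tuple[str, int, int]  # (location, first residue, last residue), 1-based inclusive

# CsPT4 column of Table 2, domains 1 through 19.
CSPT4_DOMAINS: List[Domain] = [
    ("inner", 1, 37), ("membrane", 38, 55), ("outer", 56, 69),
    ("membrane", 70, 87), ("inner", 88, 113), ("membrane", 114, 131),
    ("outer", 132, 137), ("membrane", 138, 155), ("inner", 156, 167),
    ("membrane", 168, 192), ("outer", 193, 198), ("membrane", 199, 216),
    ("inner", 217, 244), ("membrane", 245, 264), ("outer", 265, 270),
    ("membrane", 271, 288), ("inner", 289, 304), ("membrane", 305, 322),
    ("outer", 323, 323),
]


def locate(position: int, domains: List[Domain]) -> Optional[Tuple[int, str]]:
    """Return (domain number, location) for a residue, or None if no domain covers it."""
    for number, (location, first, last) in enumerate(domains, start=1):
        if first <= position <= last:
            return number, location
    return None


def crossover_location(position: int, domains: List[Domain]) -> str:
    """Classify a cross-over position as inside or outside a predicted helix."""
    hit = locate(position, domains)
    if hit is None:
        raise ValueError(f"residue {position} is not covered by any listed domain")
    return "within_helix" if hit[1] == "membrane" else "outside_helix"


helix_count = sum(1 for location, _, _ in CSPT4_DOMAINS if location == "membrane")
print(helix_count)
print(crossover_location(200, CSPT4_DOMAINS))  # hypothetical position
print(crossover_location(230, CSPT4_DOMAINS))  # hypothetical position
```

A position classified as "within_helix" corresponds to the “within membrane” approach, and a position classified as "outside_helix" corresponds to the “through membrane” approach.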

In some embodiments, a chimeric PT comprises portions of 1, 2, 3, 4, 5, 6, 7, or more than 7 different PTs. In some embodiments, the chimeric PT comprises one or more portions of CsPT1 and one or more portions of CsPT2, CsPT3, CsPT4, CsPT5, CsPT6, or CsPT7. In some embodiments, the chimeric PT comprises one or more portions of CsPT1 and one or more portions of CsPT4. In some embodiments, the chimeric PT comprises one or more portions of CsPT1 and one or more portions of CsPT6. In some embodiments, the chimeric PT comprises one or more portions of CsPT1 and one or more portions of CsPT7. In some embodiments, the chimeric PT comprises one or more portions of CsPT1, one or more portions of CsPT4, one or more portions of CsPT6, and/or one or more portions of CsPT7.

In some embodiments, the chimeric PT comprises one or more portions of CsPT4 and one or more portions of CsPT1, CsPT2, CsPT3, CsPT5, CsPT6 or CsPT7. In some embodiments, the chimeric PT comprises one or more portions of CsPT4 and one or more portions of CsPT1. In some embodiments, the chimeric PT comprises one or more portions of CsPT4 and one or more portions of CsPT6. In some embodiments, the chimeric PT comprises one or more portions of CsPT4 and one or more portions of CsPT7. In some embodiments, the chimeric PT comprises one or more portions of CsPT4, one or more portions of CsPT1, one or more portions of CsPT6, and/or one or more portions of CsPT7.

In some embodiments, the chimeric PT comprises one or more portions of CsPT6 and one or more portions of CsPT1, CsPT2, CsPT3, CsPT4, CsPT5 or CsPT7. In some embodiments, the chimeric PT comprises one or more portions of CsPT6 and one or more portions of CsPT1. In some embodiments, the chimeric PT comprises one or more portions of CsPT6 and one or more portions of CsPT4. In some embodiments, the chimeric PT comprises one or more portions of CsPT6 and one or more portions of CsPT7. In some embodiments, the chimeric PT comprises one or more portions of CsPT6, one or more portions of CsPT1, one or more portions of CsPT4, and/or one or more portions of CsPT7.

In some embodiments, the chimeric PT comprises one or more portions of CsPT7 and one or more portions of CsPT1, CsPT2, CsPT3, CsPT4, CsPT5 or CsPT6. In some embodiments, the chimeric PT comprises one or more portions of CsPT7 and one or more portions of CsPT1. In some embodiments, the chimeric PT comprises one or more portions of CsPT7 and one or more portions of CsPT4. In some embodiments, the chimeric PT comprises one or more portions of CsPT7 and one or more portions of CsPT6. In some embodiments, the chimeric PT comprises one or more portions of CsPT7, one or more portions of CsPT1, one or more portions of CsPT4, and/or one or more portions of CsPT6.

In some embodiments, at least 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, or 95% of a chimeric PT is derived from CsPT1. In some embodiments, at least 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, or 95% of a transmembrane helix of a chimeric PT is derived from CsPT1.

In some embodiments, at least 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, or 95% of a chimeric PT is derived from CsPT2. In some embodiments, at least 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, or 95% of a transmembrane helix of a chimeric PT is derived from CsPT2.

In some embodiments, at least 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, or 95% of a chimeric PT is derived from CsPT3. In some embodiments, at least 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, or 95% of a transmembrane helix of a chimeric PT is derived from CsPT3.

In some embodiments, at least 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, or 95% of a chimeric PT is derived from CsPT4. In some embodiments, at least 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, or 95% of a transmembrane helix of a chimeric PT is derived from CsPT4.

In some embodiments, at least 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, or 95% of a chimeric PT is derived from CsPT5. In some embodiments, at least 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, or 95% of a transmembrane helix of a chimeric PT is derived from CsPT5.

In some embodiments, at least 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, or 95% of a chimeric PT is derived from CsPT6. In some embodiments, at least 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, or 95% of a transmembrane helix of a chimeric PT is derived from CsPT6.

In some embodiments, at least 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, or 95% of a chimeric PT is derived from CsPT7. In some embodiments, at least 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, or 95% of a transmembrane helix of a chimeric PT is derived from CsPT7.
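By way of non-limiting illustration, the percentage of a chimeric PT derived from each parent PT can be computed from a list of segments. In the Python sketch below, the function name, the `(parent, start, end)` segment format, and the example chimera are all hypothetical. The sketch counts residues by segment length, which is one possible way to express “derived from”.

```python
from collections import Counter
from typing import Dict, List, Tuple

Segment = Tuple[str, int, int]  # (parent name, start, end), 1-based inclusive


def percent_by_parent(segments: List[Segment]) -> Dict[str, float]:
    """Return the percentage of chimera residues contributed by each parent."""
    lengths: Counter = Counter()
    for name, start, end in segments:
        if end < start:
            raise ValueError(f"segment {name}:{start}-{end} has end before start")
        lengths[name] += end - start + 1
    total = sum(lengths.values())
    return {name: 100.0 * count / total for name, count in lengths.items()}


# Hypothetical chimera: residues 1-216 from CsPT4 followed by 105 residues from CsPT1.
example_segments = [("CsPT4", 1, 216), ("CsPT1", 217, 321)]
percentages = percent_by_parent(example_segments)

for parent, value in percentages.items():
    print(f"{parent}: {value:.1f}%")

# Example threshold check against one of the recited values.
print(percentages["CsPT1"] >= 30.0)
```

The same calculation can be restricted to the residues of a single transmembrane helix to obtain the per-helix percentages recited above.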

In some embodiments, a chimeric PT comprises all or part of the active site of CsPT1. In some embodiments, a chimeric PT comprises all or part of the active site of CsPT2. In some embodiments, a chimeric PT comprises all or part of the active site of CsPT3. In some embodiments, a chimeric PT comprises all or part of the active site of CsPT4. In some embodiments, a chimeric PT comprises all or part of the active site of CsPT5. In some embodiments, a chimeric PT comprises all or part of the active site of CsPT6. In some embodiments, a chimeric PT comprises all or part of the active site of CsPT7.

In some embodiments, a chimeric PT includes one or more of the following motifs: MTVMGMT (SEQ ID NO: 11); [EV][LMW][RS]P[SAP]F[ST]F[IL][IL]AF (SEQ ID NO: 12); QFFEFIW (SEQ ID NO: 13); HNTNL (SEQ ID NO: 14); TCWKL (SEQ ID NO: 15); M[IL]LSHAILAFC (SEQ ID NO: 16); HVG[LV][AN]FT[SCF]Y[YS]A[ST][RT][AS]A[LF] (SEQ ID NO: 17); GLIVT (SEQ ID NO: 18); L[YH]YAEY[LF]V (SEQ ID NO: 19); KAFFAL (SEQ ID NO: 20); KLGARNMT (SEQ ID NO: 21); QAF[NK]SN (SEQ ID NO: 22); LIFQT (SEQ ID NO: 23); SIIVALT (SEQ ID NO: 24); MSIETAW (SEQ ID NO: 25); VVSGV (SEQ ID NO: 26); RPYVV (SEQ ID NO: 27); KPDLP (SEQ ID NO: 28); RWKQY (SEQ ID NO: 29); FLITI (SEQ ID NO: 30); DIEGD (SEQ ID NO: 31); and KYGVST (SEQ ID NO: 32).
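By way of non-limiting illustration, the bracket notation used in the motifs above (for example, [EV] meaning E or V at that position) can be read directly as a regular-expression character class. The Python sketch below scans the truncated CsPT4 sequence of SEQ ID NO: 8, as listed above, for a subset of the motifs. The function name `find_motifs` and the choice of subset are hypothetical, and reported positions are numbered relative to SEQ ID NO: 8 rather than SEQ ID NO: 5.

```python
import re
from typing import Dict, List

# A subset of the motifs recited above, keyed by sequence identifier.
MOTIFS: Dict[str, str] = {
    "SEQ ID NO: 11": "MTVMGMT",
    "SEQ ID NO: 12": "[EV][LMW][RS]P[SAP]F[ST]F[IL][IL]AF",
    "SEQ ID NO: 14": "HNTNL",
    "SEQ ID NO: 15": "TCWKL",
    "SEQ ID NO: 16": "M[IL]LSHAILAFC",
    "SEQ ID NO: 31": "DIEGD",
}

# Truncated CsPT4 (SEQ ID NO: 8), copied from the listing above.
SEQ8 = (
    "HHESDNSIATKILNFGHTCWKLQRPYVVKGMISIACGLFGRELFNNRHL"
    "FSWGLMWKAFFALVPILSFNFFAAIMNQIYDVDIDRINKPDLPLVSGEM"
    "SIETAWILSIIVALTGLIVTIKLKSAPLFVFIYIFGIFAGFAYSVPPIR"
    "WKQYPFTNFLITISSHVGLAFTSYSATTSALGLPFVWRPAFSFIIAFMT"
    "VMGMTIAFAKDISDIEGDAKYGVSTVATKLGARNMTFVVSGVLLLNYLV"
    "SISIGIIWPQVFKSNIMILSHAILAFCLIFQTRELALANYASAPSRQFF"
    "EFIWLLYYAEYFVYVFI"
)


def find_motifs(seq: str, motifs: Dict[str, str]) -> Dict[str, List[int]]:
    """Return 1-based start positions of every motif occurrence found in seq."""
    hits: Dict[str, List[int]] = {}
    for label, pattern in motifs.items():
        # A lookahead is used so that overlapping occurrences are also reported.
        starts = [m.start() + 1 for m in re.finditer(f"(?={pattern})", seq)]
        if starts:
            hits[label] = starts
    return hits


hits = find_motifs(SEQ8, MOTIFS)
for label, starts in hits.items():
    print(label, starts)
```

Motifs absent from the scanned sequence, such as HNTNL (SEQ ID NO: 14) in this case, are simply omitted from the result.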

In some embodiments, motifs identified in this disclosure are located at chimeric junctions. Chimeric junctions refer to crossover points in a chimeric sequence. For example, in a chimeric PT that includes portions of CsPT4 and portions of CsPT7, a chimeric junction occurs at a region where a sequence derived from CsPT4 is joined to a sequence derived from CsPT7. A motif located at a chimeric junction therefore includes sequences derived from two or more CsPT proteins.

In some embodiments, a chimeric PT includes the motif MTVMGMT (SEQ ID NO: 11) at or near a chimeric junction. In some embodiments, a chimeric PT includes the motif MTVMGMT (SEQ ID NO: 11) at residues corresponding to residues 207-213 in SEQ ID NO: 5.

In some embodiments, a chimeric PT includes the motif [EV][LMW][RS]P[SAP]F[ST]F[IL][IL]AF (SEQ ID NO: 12) at or near a chimeric junction. In some embodiments, a chimeric PT includes the motif [EV][LMW][RS]P[SAP]F[ST]F[IL][IL]AF (SEQ ID NO: 12) at residues corresponding to residues 195-206 of SEQ ID NO: 5.

In some embodiments, a chimeric PT includes the motif QFFEFIW (SEQ ID NO: 13) at or near a chimeric junction. In some embodiments, a chimeric PT includes the motif QFFEFIW (SEQ ID NO: 13) at residues corresponding to residues 304-310 of SEQ ID NO: 5.

In some embodiments, a chimeric PT includes the motif HNTNL (SEQ ID NO: 14) at residues corresponding to residues 57-61 of SEQ ID NO: 5.

In some embodiments, a chimeric PT includes the motif TCWKL (SEQ ID NO: 15) at residues corresponding to residues 30-34 of SEQ ID NO: 5.

In some embodiments, a chimeric PT includes the motif M[IL]LSHAILAFC (SEQ ID NO: 16) at or near a chimeric junction. In some embodiments, a chimeric PT includes the motif M[IL]LSHAILAFC (SEQ ID NO: 16) at residues corresponding to residues 274-284 of SEQ ID NO: 5.

In some embodiments, a chimeric PT includes the motif HVG[LV][AN]FT[SCF]Y[YS]A[ST][RT][AS]A[LF] (SEQ ID NO: 17) at or near a chimeric junction. In some embodiments, a chimeric PT includes the motif HVG[LV][AN]FT[SCF]Y[YS]A[ST][RT][AS]A[LF] (SEQ ID NO: 17) at residues corresponding to residues 175-190 of SEQ ID NO: 5.

In some embodiments, a chimeric PT includes the motif GLIVT (SEQ ID NO: 18) at or near a chimeric junction. In some embodiments, a chimeric PT includes the motif GLIVT (SEQ ID NO: 18) at residues corresponding to residues 126-130 of SEQ ID NO: 5.

In some embodiments, a chimeric PT includes the motif L[YH]YAEY[LF]V (SEQ ID NO: 19) at or near a chimeric junction. In some embodiments, a chimeric PT includes the motif L[YH]YAEY[LF]V (SEQ ID NO: 19) at residues corresponding to residues 312-319 of SEQ ID NO: 5.

In some embodiments, a chimeric PT includes the motif KAFFAL (SEQ ID NO: 20) at or near a chimeric junction. In some embodiments, a chimeric PT includes the motif KAFFAL (SEQ ID NO: 20) at residues corresponding to residues 69-74 of SEQ ID NO: 5.

In some embodiments, a chimeric PT includes the motif KLGARNMT (SEQ ID NO: 21) at residues corresponding to residues 237-244 of SEQ ID NO: 5.

In some embodiments, a chimeric PT includes the motif QAF[NK]SN (SEQ ID NO: 22) at or near a chimeric junction. In some embodiments, a chimeric PT includes the motif QAF[NK]SN (SEQ ID NO: 22) at residues corresponding to residues 267-272 of SEQ ID NO: 5.

In some embodiments, a chimeric PT includes the motif LIFQT (SEQ ID NO: 23) at or near a chimeric junction. In some embodiments, a chimeric PT includes the motif LIFQT (SEQ ID NO: 23) at residues corresponding to residues 285-289 of SEQ ID NO: 5.

In some embodiments, a chimeric PT includes the motif SIIVALT (SEQ ID NO: 24) at or near a chimeric junction. In some embodiments, a chimeric PT includes the motif SIIVALT (SEQ ID NO: 24) at residues corresponding to residues 119-125 of SEQ ID NO: 5.

In some embodiments, a chimeric PT includes the motif MSIETAW (SEQ ID NO: 25) at or near a chimeric junction. In some embodiments, a chimeric PT includes the motif MSIETAW (SEQ ID NO: 25) at residues corresponding to residues 110-116 of SEQ ID NO: 5.

In some embodiments, a chimeric PT includes the motif VVSGV (SEQ ID NO: 26) at or near a chimeric junction. In some embodiments, a chimeric PT includes the motif VVSGV (SEQ ID NO: 26) at residues corresponding to residues 246-250 of SEQ ID NO: 5.

In some embodiments, a chimeric PT includes the motif RPYVV (SEQ ID NO: 27) at or near a chimeric junction. In some embodiments, a chimeric PT includes the motif RPYVV (SEQ ID NO: 27) at residues corresponding to residues 36-40 of SEQ ID NO: 5.

In some embodiments, a chimeric PT includes the motif KPDLP (SEQ ID NO: 28) at residues corresponding to residues 100-104 of SEQ ID NO: 5.

In some embodiments, a chimeric PT includes the motif RWKQY (SEQ ID NO: 29) at residues corresponding to residues 100-104 of SEQ ID NO: 5.

In some embodiments, a chimeric PT includes the motif FLITI (SEQ ID NO: 30) at or near a chimeric junction. In some embodiments, a chimeric PT includes the motif FLITI (SEQ ID NO: 30) at residues corresponding to residues 168-172 of SEQ ID NO: 5.

In some embodiments, a chimeric PT includes the motif DIEGD (SEQ ID NO: 31) at residues corresponding to residues 222-226 of SEQ ID NO: 5.

In some embodiments, a chimeric PT includes the motif KYGVST (SEQ ID NO: 32) at residues corresponding to residues 228-233 of SEQ ID NO: 5.

The sequence of a chimeric PT associated with the disclosure can comprise the structure: X1-X2-X3-X4-X5-X6-X7-X8-X9-X10. In some embodiments, any one of X1, X2, X3, X4, X5, X6, X7, X8, X9, and X10 can comprise portions of CsPT1, CsPT2, CsPT3, CsPT4, CsPT5, CsPT6 or CsPT7. In some embodiments, X1, X2, X3, X4, X5, X6, X7, X8, X9 and/or X10 comprise portions of CsPT1. In some embodiments, X1, X2, X3, X4, X5, X6, X7, X8, X9 and/or X10 comprise portions of CsPT4. In some embodiments, X1, X2, X3, X4, X5, X6, X7, X8, X9 and/or X10 comprise portions of CsPT6. In some embodiments, X1, X2, X3, X4, X5, X6, X7, X8, X9 and/or X10 comprise portions of CsPT7. In some embodiments, X1, X3, X5, X7, and X9 comprise portions of CsPT4. In some embodiments, X2, X4, X6, X8, and X10 comprise portions of CsPT1, CsPT6 or CsPT7. In some embodiments, one or more of X1, X2, X3, X4, X5, X6, X7, X8, X9 and X10 includes a portion of a transmembrane helix. In some embodiments, each of X1, X2, X3, X4, X5, X6, X7, X8, X9 and X10 includes a portion of a transmembrane helix.

In some embodiments, the sequence of X1 comprises any of SEQ ID NOs: 33-39 or a sequence that comprises no more than 2 amino acid substitutions, insertions, additions or deletions relative to any one of SEQ ID NOs: 33-39. In some embodiments, the sequence of X2 comprises any of SEQ ID NOs: 40-46 or a sequence that comprises no more than 2 amino acid substitutions, insertions, additions or deletions relative to any one of SEQ ID NOs: 40-46. In some embodiments, the sequence of X3 comprises any of SEQ ID NOs: 47-53 or a sequence that comprises no more than 2 amino acid substitutions, insertions, additions or deletions relative to any one of SEQ ID NOs: 47-53. In some embodiments, the sequence of X4 comprises any of SEQ ID NOs: 54-60 or a sequence that comprises no more than 2 amino acid substitutions, insertions, additions or deletions relative to any one of SEQ ID NOs: 54-60. In some embodiments, the sequence of X5 comprises any of SEQ ID NOs: 61-67 or a sequence that comprises no more than 2 amino acid substitutions, insertions, additions or deletions relative to any one of SEQ ID NOs: 61-67. In some embodiments, the sequence of X6 comprises any of SEQ ID NOs: 68-74 or a sequence that comprises no more than 2 amino acid substitutions, insertions, additions or deletions relative to any one of SEQ ID NOs: 68-74. In some embodiments, the sequence of X7 comprises any of SEQ ID NOs: 75-81 or a sequence that comprises no more than 2 amino acid substitutions, insertions, additions or deletions relative to any one of SEQ ID NOs: 75-81. In some embodiments, the sequence of X8 comprises any of SEQ ID NOs: 82-88 or a sequence that comprises no more than 2 amino acid substitutions, insertions, additions or deletions relative to any one of SEQ ID NOs: 82-88. In some embodiments, the sequence of X9 comprises any of SEQ ID NOs: 89-95 or a sequence that comprises no more than 2 amino acid substitutions, insertions, additions or deletions relative to any one of SEQ ID NOs: 89-95. 
In some embodiments, the sequence of X10 comprises any of SEQ ID NOs: 96-102 or a sequence that comprises no more than 2 amino acid substitutions, insertions, additions or deletions relative to any one of SEQ ID NOs: 96-102.

In some embodiments, a chimeric PT comprises a sequence that is at least 5%, at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 71%, at least 72%, at least 73%, at least 74%, at least 75%, at least 76%, at least 77%, at least 78%, at least 79%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or is 100% identical, including all values in between, to any one of SEQ ID NOs: 110-121, 133-144, 757-868, 869-980, 982-1081 or 1083-1182, to any chimeric PT disclosed in Tables 13-16 and 19-20, or to any chimeric PT disclosed in this application.

b. Prenyltransferase Fusions

Further aspects of the disclosure relate to fusion proteins comprising PTs associated with the disclosure, including chimeric PTs. Chimeric PTs that are components of fusion proteins may in some instances be referred to within this disclosure as “chimeric fusions.”

For example, a PT may be linked to one or more genes in the cannabinoid biosynthesis pathway or a metabolic pathway of a host cell. In some embodiments, the one or more genes linked to the PT includes a gene that encodes a polypeptide having enzymatic activity such that its product is a substrate for the PT. In some embodiments, the one or more genes linked to the PT includes a gene that encodes a polypeptide having enzymatic activity such that the product of the PT is a substrate for the downstream polypeptide. In certain embodiments, a PT may be linked to a mutant form of one or more genes in the metabolic pathway of a host cell. In certain embodiments, a PT may be linked to a farnesyl pyrophosphate synthase. The farnesyl pyrophosphate synthase can be linked to the amino terminus or the carboxy terminus of a PT. In some embodiments, the farnesyl pyrophosphate synthase is linked to the amino terminus of the PT, with or without a linker sequence separating the farnesyl pyrophosphate synthase and the PT sequence.

Farnesyl pyrophosphate synthase enzymes convert isopentenyl diphosphate (IPP) and dimethylallyl diphosphate (DMAPP) to geranyl pyrophosphate (GPP) and farnesyl pyrophosphate (FPP) in yeast cells. In some embodiments, a farnesyl pyrophosphate synthase enzyme may produce neryl pyrophosphate (NPP). In some embodiments, the farnesyl pyrophosphate synthase component of a PT fusion protein is the S. cerevisiae ERG20 protein. In some embodiments, the farnesyl pyrophosphate synthase comprises one or more mutations relative to a wild-type farnesyl pyrophosphate synthase. Mutations in a farnesyl pyrophosphate synthase may modulate the ratio of GPP and FPP produced by the enzyme. In some embodiments, the farnesyl pyrophosphate synthase comprises a mutation that increases the production of GPP relative to FPP. In some embodiments, the farnesyl pyrophosphate synthase comprises one or more mutations that reduce the levels of production of FPP and/or increase production of GPP. See Ignea et al. ACS Synth. Biol. (2014) 3: 298-306.

In some embodiments, the farnesyl pyrophosphate synthase is ERG20, corresponding to UniProt Accession No. P08524, provided by SEQ ID NO: 424:

(SEQ ID NO: 424)

MASEKEIRRERFLNVFPKLVEELNASLLAYGMPKEACDWYAHSLNYNTP

GGKLNRGLSVVDTYAILSNKTVEQLGQEEYEKVAILGWCIELLQAYFLV

ADDMMDKSITRRGQPCWYKVPEVGEIAINDAFMLEAAIYKLLKSHFRNE

KYYIDITELFHEVTFQTELGQLMDLITAPEDKVDLSKFSLKKHSFIVTF

KTAYYSFYLPVALAMYVAGITDEKDLKQARDVLIPLGEYFQIQDDYLDC

FGTPEQIGKIGTDIQDNKCSWVINKALELASAEQRKTLDENYGKKDSVA

EAKCKKIFNDLKIEQLYHEYEESIAKDLKAKISQVDESRGFKADVLTAF

LNKVYKRSK.

In some embodiments, the farnesyl pyrophosphate synthase is ERG20 comprising F96W and/or N127W substitutions relative to the wildtype ERG20 sequence. The sequence of ERG20 F96W N127W is provided by SEQ ID NO: 103.

(SEQ ID NO: 103)

MASEKEIRRERFLNVFPKLVEELNASLLAYGMPKEACDWYAHSLNYNTP

GGKLNRGLSVVDTYAILSNKTVEQLGQEEYEKVAILGWCIELLQAYWLV

ADDMMDKSITRRGQPCWYKVPEVGEIAIWDAFMLEAAIYKLLKSHFRNE

KYYIDITELFHEVTFQTELGQLMDLITAPEDKVDLSKFSLKKHSFIVTF

KTAYYSFYLPVALAMYVAGITDEKDLKQARDVLIPLGEYFQIQDDYLDC

FGTPEQIGKIGTDIQDNKCSWVINKALELASAEQRKTLDENYGKKDSVA

EAKCKKIFNDLKIEQLYHEYEESIAKDLKAKISQVDESRGFKADVLTAF

LNKVYKRSK.
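
As an illustration of the substitution nomenclature used above, the following Python sketch applies F96W and N127W to the wild-type ERG20 sequence of SEQ ID NO: 424 using 1-based residue numbering. The function name `apply_substitutions` and its error handling are assumptions made for this example and are not part of the disclosure.

```python
import re

# Wild-type S. cerevisiae ERG20 as given in SEQ ID NO: 424 (352 residues).
ERG20_WT = (
    "MASEKEIRRERFLNVFPKLVEELNASLLAYGMPKEACDWYAHSLNYNTP"
    "GGKLNRGLSVVDTYAILSNKTVEQLGQEEYEKVAILGWCIELLQAYFLV"
    "ADDMMDKSITRRGQPCWYKVPEVGEIAINDAFMLEAAIYKLLKSHFRNE"
    "KYYIDITELFHEVTFQTELGQLMDLITAPEDKVDLSKFSLKKHSFIVTF"
    "KTAYYSFYLPVALAMYVAGITDEKDLKQARDVLIPLGEYFQIQDDYLDC"
    "FGTPEQIGKIGTDIQDNKCSWVINKALELASAEQRKTLDENYGKKDSVA"
    "EAKCKKIFNDLKIEQLYHEYEESIAKDLKAKISQVDESRGFKADVLTAF"
    "LNKVYKRSK"
)


def apply_substitutions(sequence, substitutions):
    """Apply substitutions written as <wild-type><1-based position><new>, e.g. 'F96W'."""
    residues = list(sequence)
    for sub in substitutions:
        parsed = re.fullmatch(r"([A-Z])(\d+)([A-Z])", sub)
        if parsed is None:
            raise ValueError(f"unrecognized substitution: {sub!r}")
        wild_type, position, new = parsed.group(1), int(parsed.group(2)), parsed.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        # Guard against numbering mistakes: the stated wild-type residue must match.
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"expected {wild_type} at position {position}, "
                f"found {residues[position - 1]}"
            )
        residues[position - 1] = new
    return "".join(residues)


ERG20_F96W_N127W = apply_substitutions(ERG20_WT, ["F96W", "N127W"])
```

Checking the stated wild-type residue before substituting is a design choice of this sketch; it catches off-by-one numbering errors early.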

In some embodiments, the farnesyl pyrophosphate synthase comprises a mutation at position K197 of ERG20.

In some embodiments, the farnesyl pyrophosphate synthase comprises a protein sequence that is at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 71%, at least 72%, at least 73%, at least 74%, at least 75%, at least 76%, at least 77%, at least 78%, at least 79%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% identical, or is 100% identical, to SEQ ID NO: 424 or 103. In some embodiments, a farnesyl pyrophosphate synthase does not comprise SEQ ID NO: 103 or SEQ ID NO: 424.
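
For illustration only, percent identity between two sequences that are already aligned to the same length can be computed as in the Python sketch below. The function name `percent_identity`, the use of `-` for gaps, and the choice to divide by the full alignment length are assumptions of this example. The disclosure does not prescribe this particular calculation, and sequences of different lengths would first require an alignment step that is not shown.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over two pre-aligned sequences of equal length.

    Assumption of this sketch: gap characters ('-') never count as matches,
    and the denominator is the full alignment length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Residues 87-100 of SEQ ID NO: 424 and of SEQ ID NO: 103 differ only at position 96.
wild_type_segment = "WCIELLQAYFLVAD"
mutant_segment = "WCIELLQAYWLVAD"
segment_identity = percent_identity(wild_type_segment, mutant_segment)  # 13 of 14 match
```

Applied to the full-length sequences, SEQ ID NO: 424 and SEQ ID NO: 103 differ at 2 of 352 positions, which corresponds to about 99.4% identity under the same assumptions.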

Example 6 describes the identification of ERG20 homologs. In some embodiments, the farnesyl pyrophosphate synthase component of a fusion protein is an ERG20 homolog identified in Example 6, the sequences of which are provided in Table 17. In some embodiments, an ERG20 homolog comprises a tryptophan residue at a residue corresponding to amino acid positions F96 and/or N127 in S. cerevisiae ERG20. In some embodiments, an ERG20 homolog comprises a substitution at a residue corresponding to amino acid position K197 in S. cerevisiae ERG20.

In some embodiments, the farnesyl pyrophosphate synthase comprises a protein or nucleic acid sequence that is at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 71%, at least 72%, at least 73%, at least 74%, at least 75%, at least 76%, at least 77%, at least 78%, at least 79%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% identical, or is 100% identical, to any one of SEQ ID NOs: 426-476, 479-529, 753 or 754, to any sequence provided in Table 17, or to any other ERG20 homolog sequence provided in this disclosure.

Example 6 describes the identification of putative farnesyl pyrophosphate synthases that were effective in producing CBGA when fused with a prenyltransferase. Table 10 provides non-limiting examples of motifs that were identified in the sequences of the putative farnesyl pyrophosphate synthases that were effective in producing CBGA. In some embodiments, a farnesyl pyrophosphate synthase includes one or more of the following motifs, provided in Table 10: NVPGGKLNR (SEQ ID NO: 647), FYLPVALA[LM]H (SEQ ID NO: 648), A[EH]D[IV]LIPLG (SEQ ID NO: 651), LGW[CL][ITV]ELLQA[FY]FL (SEQ ID NO: 655), KKEV[FL][ET][SA]FL[AGN]KIYK (SEQ ID NO: 663), QRK[VI]L[DE]ENYG (SEQ ID NO: 667), VGMIAIWD (SEQ ID NO: 672), TDI[QK]DNKCSW (SEQ ID NO: 673), TAYYSFYLP (SEQ ID NO: 676), GKIGTDI[QK]DNKCSW (SEQ ID NO: 677), ILIP[LM]GEYFQ (SEQ ID NO: 680), IL[VM][EP][ML]G[ET][YF]FQ (SEQ ID NO: 683), AKIYKRSK (SEQ ID NO: 685), DPEVIGKI (SEQ ID NO: 686), RGQPCW[YF]RVP[EQ] (SEQ ID NO: 687), IVKYKTA[YF]Y[ST]FYLP (SEQ ID NO: 689), WC[IV]E[LW]LQA[YF][WF]LV[ALW]D (SEQ ID NO: 692), CSWLV[VN]Q[AC]L[AQ][RI][AC][ST]P[ED]Q (SEQ ID NO: 699).

In some embodiments, a farnesyl pyrophosphate synthase includes the motif NVPGGKLNR (SEQ ID NO: 647) at residues corresponding to residues 47-55 in SEQ ID NO: 424.

In some embodiments, a farnesyl pyrophosphate synthase includes the motif FYLPVALA[LM]H (SEQ ID NO: 648) at residues corresponding to residues 203-212 in SEQ ID NO: 424. In some embodiments, the motif FYLPVALA[LM]H is FYLPVALALH (SEQ ID NO: 649) or FYLPVALAMH (SEQ ID NO: 650).

In some embodiments, a farnesyl pyrophosphate synthase includes the motif A[EH]D[IV]LIPLG (SEQ ID NO: 651) at residues corresponding to residues 225-233 of SEQ ID NO: 424. In some embodiments, the motif A[EH]D[IV]LIPLG (SEQ ID NO: 651) is AEDILIPLG (SEQ ID NO: 652), AHDILIPLG (SEQ ID NO: 653), or AHDVLIPLG (SEQ ID NO: 654).

In some embodiments, a farnesyl pyrophosphate synthase includes the motif LGW[CL][ITV]ELLQA[FY]FL (SEQ ID NO: 655) at residues corresponding to residues 85-97 of SEQ ID NO: 424. In some embodiments, the motif LGW[CL][ITV]ELLQA[FY]FL (SEQ ID NO: 655) is LGWLTELLQAYFL (SEQ ID NO: 656), LGWLTELLQAFFL (SEQ ID NO: 657), LGWCIELLQAYFL (SEQ ID NO: 658), LGWCVELLQAYFL (SEQ ID NO: 659), LGWCVELLQAFFL (SEQ ID NO: 660), LGWCIELLQAFFL (SEQ ID NO: 661), or LGWCTELLQAFFL (SEQ ID NO: 662).

In some embodiments, a farnesyl pyrophosphate synthase includes the motif KKEV[FL][ET][SA]FL[AGN]KIYK (SEQ ID NO: 663) at residues corresponding to residues 336-349 of SEQ ID NO: 424. In some embodiments, the motif KKEV[FL][ET][SA]FL[AGN]KIYK (SEQ ID NO: 663) is KKEVFESFLAKIYK (SEQ ID NO: 664), KKEVFEAFLGKIYK (SEQ ID NO: 665), or KKEVLTSFLNKIYK (SEQ ID NO: 666).

In some embodiments, a farnesyl pyrophosphate synthase includes the motif QRK[VI]L[DE]ENYG (SEQ ID NO: 667) at residues corresponding to residues 279-288 of SEQ ID NO: 424. In some embodiments, the motif QRK[VI]L[DE]ENYG (SEQ ID NO: 667) is QRKVLDENYG (SEQ ID NO: 668), QRKILDENYG (SEQ ID NO: 669), QRKILEENYG (SEQ ID NO: 670), or QRKVLEENYG (SEQ ID NO: 671).

In some embodiments, a farnesyl pyrophosphate synthase includes the motif VGMIAIWD (SEQ ID NO: 672) at residues corresponding to residues 121-128 of SEQ ID NO: 424.

In some embodiments, a farnesyl pyrophosphate synthase includes the motif TDI[QK]DNKCSW (SEQ ID NO: 673) at residues corresponding to residues 257-266 of SEQ ID NO: 424. In some embodiments, the motif TDI[QK]DNKCSW (SEQ ID NO: 673) is TDIQDNKCSW (SEQ ID NO: 674) or TDIKDNKCSW (SEQ ID NO: 675).

In some embodiments, a farnesyl pyrophosphate synthase includes the motif TAYYSFYLP (SEQ ID NO: 676) at residues corresponding to residues 198-206 of SEQ ID NO: 424.

In some embodiments, a farnesyl pyrophosphate synthase includes the motif GKIGTDI[QK]DNKCSW (SEQ ID NO: 677) at residues corresponding to residues 253-266 of SEQ ID NO: 424. In some embodiments, the motif GKIGTDI[QK]DNKCSW (SEQ ID NO: 677) is GKIGTDIQDNKCSW (SEQ ID NO: 678) or GKIGTDIKDNKCSW (SEQ ID NO: 679).

In some embodiments, a farnesyl pyrophosphate synthase includes the motif ILIP[LM]GEYFQ (SEQ ID NO: 680) at residues corresponding to residues 228-237 of SEQ ID NO: 424. In some embodiments, the motif ILIP[LM]GEYFQ (SEQ ID NO: 680) is ILIPLGEYFQ (SEQ ID NO: 681) or ILIPMGEYFQ (SEQ ID NO: 682).

In some embodiments, a farnesyl pyrophosphate synthase includes the motif IL[VM][EP][ML]G[ET][YF]FQ (SEQ ID NO: 683) at residues corresponding to residues 228-237 of SEQ ID NO: 424. In some embodiments, the motif IL[VM][EP][ML]G[ET][YF]FQ (SEQ ID NO: 683) is ILVPMGEYFQ (SEQ ID NO: 684).

In some embodiments, a farnesyl pyrophosphate synthase includes the motif AKIYKRSK (SEQ ID NO: 685) at residues corresponding to residues 345-352 of SEQ ID NO: 424.

In some embodiments, a farnesyl pyrophosphate synthase includes the motif DPEVIGKI (SEQ ID NO: 686) at residues corresponding to residues 248-255 of SEQ ID NO: 424.

In some embodiments, a farnesyl pyrophosphate synthase includes the motif RGQPCW[YF]RVP[EQ] (SEQ ID NO: 687) at residues corresponding to residues 110-120 of SEQ ID NO: 424. In some embodiments, the motif RGQPCW[YF]RVP[EQ] (SEQ ID NO: 687) is RGQPCWYRVPE (SEQ ID NO: 688).

In some embodiments, a farnesyl pyrophosphate synthase includes the motif IVKYKTA[YF]Y[ST]FYLP (SEQ ID NO: 689) at residues corresponding to residues 193-206 of SEQ ID NO: 424. In some embodiments, the motif IVKYKTA[YF]Y[ST]FYLP (SEQ ID NO: 689) is IVKYKTAFYSFYLP (SEQ ID NO: 690) or IVKYKTAYYSFYLP (SEQ ID NO: 691).

In some embodiments, a farnesyl pyrophosphate synthase includes the motif WC[IV]E[LW]LQA[YF][WF]LV[ALW]D (SEQ ID NO: 692) at residues corresponding to residues 87-100 of SEQ ID NO: 424. In some embodiments, the motif WC[IV]E[LW]LQA[YF][WF]LV[ALW]D (SEQ ID NO: 692) is WCIELLQAFFLVAD (SEQ ID NO: 693), WCIELLQAFWLVAD (SEQ ID NO: 694), WCIELLQAYFLVAD (SEQ ID NO: 695), WCIELLQAYWLVAD (SEQ ID NO: 696), WCIEWLQAFFLVAD (SEQ ID NO: 697) or WCVELLQAYFLVAD (SEQ ID NO: 698).

In some embodiments, a farnesyl pyrophosphate synthase includes the motif CSWLV[VN]Q[AC]L[AQ][RI][AC][ST]P[ED]Q (SEQ ID NO: 699) at residues corresponding to residues 264-279 of SEQ ID NO: 424. In some embodiments, the motif CSWLV[VN]Q[AC]L[AQ][RI][AC][ST]P[ED]Q (SEQ ID NO: 699) is CSWLVVQALARATPEQ (SEQ ID NO: 700).
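
The bracketed motif notation used above, in which [QK] denotes either Q or K at that position, happens to coincide with regular-expression character classes. As a minimal sketch, the Python code below searches the wild-type ERG20 sequence of SEQ ID NO: 424 for a motif and reports 1-based residue coordinates. The function name `find_motif` is an assumption of this example. The search covers only exact occurrences within a single sequence; identifying residues in a homolog that "correspond to" residues of SEQ ID NO: 424 would additionally require a sequence alignment, which is not shown.

```python
import re

# Wild-type S. cerevisiae ERG20 as given in SEQ ID NO: 424.
ERG20_WT = (
    "MASEKEIRRERFLNVFPKLVEELNASLLAYGMPKEACDWYAHSLNYNTP"
    "GGKLNRGLSVVDTYAILSNKTVEQLGQEEYEKVAILGWCIELLQAYFLV"
    "ADDMMDKSITRRGQPCWYKVPEVGEIAINDAFMLEAAIYKLLKSHFRNE"
    "KYYIDITELFHEVTFQTELGQLMDLITAPEDKVDLSKFSLKKHSFIVTF"
    "KTAYYSFYLPVALAMYVAGITDEKDLKQARDVLIPLGEYFQIQDDYLDC"
    "FGTPEQIGKIGTDIQDNKCSWVINKALELASAEQRKTLDENYGKKDSVA"
    "EAKCKKIFNDLKIEQLYHEYEESIAKDLKAKISQVDESRGFKADVLTAF"
    "LNKVYKRSK"
)


def find_motif(sequence, motif):
    """Return the (start, end) of the first motif match, 1-based and inclusive.

    Returns None when the motif does not occur in the sequence.
    """
    match = re.search(motif, sequence)
    if match is None:
        return None
    # re uses 0-based, end-exclusive spans; convert to 1-based, end-inclusive.
    return (match.start() + 1, match.end())


print(find_motif(ERG20_WT, "TAYYSFYLP"))                        # (198, 206)
print(find_motif(ERG20_WT, "GKIGTDI[QK]DNKCSW"))                # (253, 266)
print(find_motif(ERG20_WT, "WC[IV]E[LW]LQA[YF][WF]LV[ALW]D"))   # (87, 100)
print(find_motif(ERG20_WT, "NVPGGKLNR"))                        # None
```

The last call returns None because wild-type ERG20 carries NTPGGKLNR at residues 47-55, so the motif NVPGGKLNR matches only at corresponding positions in other synthases.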

In some embodiments of fusion proteins associated with the disclosure, a farnesyl pyrophosphate synthase and a chimeric PT are separated by a linker sequence. In some embodiments, the linker joins a C-terminal residue of the farnesyl pyrophosphate synthase and an N-terminal residue of the PT enzyme. In some embodiments, the linker is a peptide linker. Examples of peptide linkers include, for example SG, GGGS (SEQ ID NO: 104), SGSGSGSGS (SEQ ID NO: 105), GGGSGGGGSGGGGS (SEQ ID NO: 106), GGGSGGGGSGGGGSGGGGSGGGGS (SEQ ID NO: 107), GGGSGGGGSGGGGSGGGGSGGGGSGGGGSGGGGS (SEQ ID NO: 108), and GGGSGGGGSGGGGSGGGGSGGGGSGGGGSGGGGSGGGGSGGGGS (SEQ ID NO: 109).

Any of the PTs provided in this disclosure, including truncated PTs and/or chimeric PTs can be expressed as fusion proteins with any farnesyl pyrophosphate synthase provided in this disclosure.

In some embodiments, fusion proteins associated with the disclosure comprise, from N-terminus to C-terminus, a farnesyl pyrophosphate synthase, a linker, and a chimeric PT enzyme, or truncation thereof. In some embodiments, a fusion protein comprises, from N-terminus to C-terminus, ERG20 F96W N127W provided by SEQ ID NO: 103, a linker, and any of the chimeric PTs described in this disclosure, including truncations thereof. In other embodiments, a fusion protein comprises, from N-terminus to C-terminus, an ERG20 homolog provided by any one of SEQ ID NOs: 426-476, a linker, and any of the chimeric PTs described in this disclosure, including truncations thereof.
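
The N-terminus to C-terminus arrangement described above can be sketched in Python as a simple concatenation of amino acid sequences. The function name `build_fusion`, the dictionary keys, and the short placeholder fragments are hypothetical and chosen only for this example; the linker sequences are those listed in this disclosure. Details such as handling of an internal start methionine are not addressed.

```python
# Linker sequences listed in the text; the keys are labels chosen for this sketch.
LINKERS = {
    "SG": "SG",
    "SEQ ID NO: 104": "GGGS",
    "SEQ ID NO: 105": "SGSGSGSGS",
}


def build_fusion(fpps_seq, pt_seq, linker=""):
    """Join synthase, optional linker, and PT from N-terminus to C-terminus."""
    for name, part in (("fpps_seq", fpps_seq), ("pt_seq", pt_seq)):
        if not part:
            raise ValueError(f"{name} must not be empty")
    return fpps_seq + linker + pt_seq


# Hypothetical placeholder fragments, not sequences from the disclosure.
fpps_placeholder = "MAAAA"
pt_placeholder = "MCCCC"

fusion_with_linker = build_fusion(
    fpps_placeholder, pt_placeholder, LINKERS["SEQ ID NO: 104"]
)
fusion_without_linker = build_fusion(fpps_placeholder, pt_placeholder)
```

In practice the placeholders would be replaced by a full-length synthase sequence, such as SEQ ID NO: 103, and a PT sequence.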

In some embodiments, a fusion protein that includes a farnesyl pyrophosphate synthase and a PT comprises a protein or nucleic acid sequence that is at least 5%, at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 71%, at least 72%, at least 73%, at least 74%, at least 75%, at least 76%, at least 77%, at least 78%, at least 79%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or is 100% identical, including all values in between, to any one of SEQ ID NOs: 122-132, 145-155, 156-225, 226-423, 532-582, 585-635, 704, 710, 724, 729, 735, 749, 755 or 756, or any fusion protein disclosed in Tables 13-14, 16 and 18, or any fusion protein disclosed in this application.

c. Prenyltransferase Mutations

PTs associated with the disclosure, including chimeric PTs and chimeric fusions, may include one or more amino acid substitutions, additions, deletions or insertions corresponding to a reference sequence. In some embodiments, a PT comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, or 323 amino acid substitutions, additions, deletions or insertions relative to a reference sequence. In some embodiments, the reference sequence is SEQ ID NO: 5.

In some embodiments, a PT comprises an amino acid substitution, addition, deletion or insertion at a residue corresponding to position 29, 31, 39, 41, 43, 46, 47, 48, 52, 56, 59, 60, 67, 68, 72, 80, 82, 83, 86, 87, 91, 94, 110, 113, 136, 140, 141, 142, 145, 147, 148, 149, 151, 162, 163, 167, 170, 173, 174, 182, 184, 187, 197, 199, 210, 215, 216, 223, 231, 232, 243, 244, 245, 258, 260, 261, 263, 267, 272, 273, 277, 284, 288, 289, 298, 301, 302, 311, and/or 318 in SEQ ID NO: 5.

In some embodiments, the PT comprises the amino acid D at a residue corresponding to position 29 in SEQ ID NO: 5; the amino acid A at a residue corresponding to position 30 in SEQ ID NO: 5; the amino acid F at a residue corresponding to position 31 in SEQ ID NO: 5; the amino acid F at a residue corresponding to position 34 in SEQ ID NO: 5: the amino acid T at a residue corresponding to position 35 in SEQ ID NO; 5; the amino acid M, T, or A at a residue corresponding to position 39 in SEQ ID NO: 5; the amino acid I at a residue corresponding to position 40 in SEQ ID NO: 5; the amino acid V or I at a residue corresponding to position 41 in SEQ ID NO: 5; the amino acid V, A, or L at a residue corresponding to position 43 in SEQ ID NO: 5; the amino acid L, F, or I at a residue corresponding to position 45 in SEQ ID NO: 5; the amino acid G, C, or A at a residue corresponding to position 46 in SEQ ID NO: 5; the amino acid V or S at a residue corresponding to position 47 in SEQ ID NO: 5; the amino acid T at a residue corresponding to position 48 in SEQ ID NO: 5, the amino acid S or A at a residue corresponding to position 49 in SEQ ID NO: 5; the amino acid L or A at a residue corresponding to position 52 in SEQ ID NO: 5; the amino acid L, T, I at a residue corresponding to position 56 in SEQ ID NO: 5; the amino acid P at a residue corresponding to position 59 in SEQ ID NO: 5; the amino acid E. 
D, or N at a residue corresponding to position 60 in SEQ ID NO: 5; the amino acid I or F at a residue corresponding to position 62 in SEQ ID NO: 5; the amino acid L or I at a residue corresponding to position 67 in SEQ ID NO: 5; the amino acid G or F at a residue corresponding to position 68 in SEQ ID NO: 5; the amino acid E at a residue corresponding to position 72 in SEQ ID NO: 5; the amino acid G at a residue corresponding to position 73 in SEQ ID NO: 5; the amino acid V, L, F, or I at a residue corresponding to position 75 in SEQ ID NO: 5; the amino acid L or C at a residue corresponding to position 79 in SEQ ID NO: 5; the amino acid W at a residue corresponding to position 80 in SEQ ID NO: 5; the amino acid G at a residue corresponding to position 82 in SEQ ID NO: 5; the amino acid Y at a residue corresponding to position 83 in SEQ ID NO: 5; the amino acid N at a residue corresponding to position 85 in SEQ ID NO: 5; the amino acid S, T, A, G, F, V, or C at a residue corresponding to position 86 in SEQ ID NO: 5; the amino acid T, I, C, Q, V, or L at a residue corresponding to position 87 in SEQ ID NO: 5; the amino acid L or F at a residue corresponding to position 91 in SEQ ID NO: 5; the amino acid E at a residue corresponding to position 94 in SEQ ID NO: 5; the amino acid Y at a residue corresponding to position 102 in SEQ ID NO: 5; the amino acid I at a residue corresponding to position 105 in SEQ ID NO: 5; the amino acid A at a residue corresponding to position 106 in SEQ ID NO: 5; the amino acid I or L at a residue corresponding to position 110 in SEQ ID NO: 5; the amino acid R at a residue corresponding to position 113 in SEQ ID NO: 5; the amino acid L at a residue corresponding to position 117 in SEQ ID NO: 5; the amino acid I at a residue corresponding to position 118 in SEQ ID NO: 5; the amino acid A at a residue corresponding to position 119 in SEQ ID NO: 5; the amino acid S at a residue corresponding to position 121 in SEQ ID NO: 5; the amino acid S 
or F at a residue corresponding to position 122 in SEQ ID NO: 5; the amino acid I or L at a residue corresponding to position 129 in SEQ ID NO: 5; the amino acid G at a residue corresponding to position 134 in SEQ ID NO: 5; the amino acid P or S at a residue corresponding to position 136 in SEQ ID NO: 5; the amino acid L or I at a residue corresponding to position 139 in SEQ ID NO: 5; the amino acid L, I, T, or F at a residue corresponding to position 140 in SEQ ID NO: 5; the amino acid L, S, V, A, C, or I at a residue corresponding to position 141 in SEQ ID NO: 5; the amino acid A, L, M, or T at a residue corresponding to position 142 in SEQ ID NO: 5, the amino acid S, I, C, V, L, M, T, or F at a residue corresponding to position 145 in SEQ ID NO: 5; the amino acid L at a residue corresponding to position 147 in SEQ ID NO: 5; the amino acid S, A or L at a residue corresponding to position 148 in SEQ ID NO: 5; the amino acid E, W, C, I, Q, S, T or L at a residue corresponding to position 149 in SEQ ID NO: 5; the amino acid M, G, H, T, I, A, or C at a residue corresponding to position 151 in SEQ ID NO: 5; the amino acid I or L at a residue corresponding to position 152 in SEQ ID NO: 5; the amino acid R at a residue corresponding to position 162 in SEQ ID NO: 5; the amino acid F at a residue corresponding to position 163 in SEQ ID NO: 5; the amino acid A at a residue corresponding to position 167 in SEQ ID NO: 5; the amino acid A at a residue corresponding to position 169 in SEQ ID NO: 5; the amino acid T or C at a residue corresponding to position 170 in SEQ ID NO: 5; the amino acid I at a residue corresponding to position 171 in SEQ ID NO: 5; the amino acid F, L, or V at a residue corresponding to position 172 in SEQ ID NO: 5; the amino acid W, G, L, or T at a residue corresponding to position 173 in SEQ ID NO: 5; the amino acid T at a residue corresponding to position 174 in SEQ ID NO: 5; the amino acid F at a residue corresponding to position 176 in SEQ ID NO: 5; 
the amino acid T, L, A, I, or V at a residue corresponding to position 177 in SEQ ID NO: 5; the amino acid P or N at a residue corresponding to position 179 in SEQ ID NO: 5; the amino acid L, V, F, or S at a residue corresponding to position 182 in SEQ ID NO: 5; the amino acid Y or L at a residue corresponding to position 184 in SEQ ID NO: 5; the amino acid R at a residue corresponding to position 187 in SEQ ID NO: 5; the amino acid L or V at a residue corresponding to position 190 in SEQ ID NO: 5; the amino acid L, I, F, or W at a residue corresponding to position 196 in SEQ ID NO: 5; the amino acid I, A, V, or S at a residue corresponding to position 197 in SEQ ID NO: 5; the amino acid S or A at a residue corresponding to position 199 in SEQ ID NO: 5; the amino acid L at a residue corresponding to position 200 in SEQ ID NO: 5; the amino acid I or T at a residue corresponding to position 204 in SEQ ID NO: 5; the amino acid V at a residue corresponding to position 207 in SEQ ID NO: 5; the amino acid L at a residue corresponding to position 209 in SEQ ID NO: 5; the amino acid Y or F at a residue corresponding to position 210 in SEQ ID NO: 5; the amino acid S, T. 
or A at a residue corresponding to position 211 in SEQ ID NO: 5, the amino acid I or L at a residue corresponding to position 212 in SEQ ID NO: 5; the amino acid V, A, I, or G at a residue corresponding to position 213 in SEQ ID NO: 5; the amino acid Y at a residue corresponding to position 215 in SEQ ID NO: 5; the amino acid I at a residue corresponding to position 216 in SEQ ID NO: 5; the amino acid L at a residue corresponding to position 220 in SEQ ID NO: 5; the amino acid V at a residue corresponding to position 223 in SEQ ID NO: 5: the amino acid R or K at a residue corresponding to position 227 in SEQ ID NO: 5; the amino acid E or A at a residue corresponding to position 228 in SEQ ID NO: 5; the amino acid H or F at a residue corresponding to position 229 in SEQ ID NO: 5; the amino acid N at a residue corresponding to position 230 in SEQ ID NO: 5; the amino acid M, L, or I at a residue corresponding to position 231 in SEQ ID NO: 5; the amino acid R or K at a residue corresponding to position 232 in SEQ ID NO: 5, the amino acid L, F, or M at a residue corresponding to position 234 in SEQ ID NO: 5; the amino acid V at a residue corresponding to position 236 in SEQ ID NO: 5; the amino acid K at a residue corresponding to position 241 in SEQ ID NO: 5; the amino acid T at a residue corresponding to position 242 in SEQ ID NO: 5; the amino acid I, T, L, or A at a residue corresponding to position 243 in SEQ ID NO: 5; the amino acid A at a residue corresponding to position 244 in SEQ ID NO: 5; the amino acid W or R at a residue corresponding to position 245 in SEQ ID NO: 5; the amino acid L, I, M, or F at a residue corresponding to position 246 in SEQ ID NO: 5; the amino acid C, S. 
G, or A at a residue corresponding to position 247 in SEQ ID NO: 5; the amino acid L, T, I, A, or F at a residue corresponding to position 250 in SEQ ID NO: 5; the amino acid N, L, A, or C at a residue corresponding to position 254 in SEQ ID NO: 5, the amino acid V at a residue corresponding to position 256 in SEQ ID NO: 5; the amino acid G or L at a residue corresponding to position 257 in SEQ ID NO: 5; the amino acid A at a residue corresponding to position 258 in SEQ ID NO: 5; the amino acid L, V, A, 1, or F at a residue corresponding to position 260 in SEQ ID NO: 5; the amino acid G at a residue corresponding to position 262 in SEQ ID NO: 5; the amino acid A at a residue corresponding to position 261 in SEQ ID NO: 5: the amino acid A at a residue corresponding to position 263 in SEQ ID NO: 5; the amino acid G at a residue corresponding to position 262 in SEQ ID NO: 5; the amino acid N or F at a residue corresponding to position 264 in SEQ ID NO: 5; the amino acid F at a residue corresponding to position 267 in SEQ ID NO: 5; the amino acid K or L at a residue corresponding to position 271 in SEQ ID NO: 5; the amino acid S at a residue corresponding to position 272 in SEQ ID NO: 5; the amino acid F at a residue corresponding to position 273 in SEQ ID NO: 5; the amino acid I at a residue corresponding to position 275 in SEQ ID NO: 5; the amino acid F at a residue corresponding to position 276 in SEQ ID NO: 5; the amino acid S at a residue corresponding to position 277 in SEQ ID NO: 5; the amino acid L, W, or 1 at a residue corresponding to position 284 in SEQ ID NO: 5; the amino acid S at a residue corresponding to position 283 in SEQ ID NO: 5; the amino acid I or W at a residue corresponding to position 284 in SEQ ID NO: 5; the amino acid F at a residue corresponding to position 286 in SEQ ID NO: 5; the amino acid R at a residue corresponding to position 288 in SEQ ID NO: 5; the amino acid A at a residue corresponding to position 289 in SEQ ID NO: 5; the amino 
acid D at a residue corresponding to position 298 in SEQ ID NO: 5; the amino acid D, G, or T at a residue corresponding to position 301 in SEQ ID NO: 5; the amino acid T at a residue corresponding to position 302 in SEQ ID NO: 5; the amino acid R, N, or K at a residue corresponding to position 311 in SEQ ID NO: 5; and/or the amino acid L at a residue corresponding to position 318 in SEQ ID NO: 5.
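As a non-limiting illustration, a position-by-position residue list of this kind can be represented as a lookup table and compared against a candidate PT. The following Python sketch uses four of the positions recited above (212, 213, 215, and 216 of SEQ ID NO: 5). The function name, the dictionary layout, and the candidate residues are hypothetical and are provided only for illustration; the sketch assumes that the candidate has already been aligned to SEQ ID NO: 5 so that each residue is keyed by its corresponding position.

```python
# Allowed amino acids at selected positions (SEQ ID NO: 5 numbering),
# copied from the residue list above. Only four positions are shown.
ALLOWED_RESIDUES = {
    212: {"I", "L"},
    213: {"V", "A", "I", "G"},
    215: {"Y"},
    216: {"I"},
}


def matching_positions(aligned_residues, allowed=ALLOWED_RESIDUES):
    """Return the sorted positions at which the candidate carries an allowed residue.

    aligned_residues maps a position in SEQ ID NO: 5 numbering to the
    one-letter amino acid found at the corresponding position of the
    candidate (the alignment itself is assumed to have been done elsewhere).
    """
    return sorted(
        position
        for position, residue in aligned_residues.items()
        if residue in allowed.get(position, set())
    )


# Hypothetical candidate residues, for illustration only.
candidate = {212: "L", 213: "G", 215: "Y", 216: "V"}
print(matching_positions(candidate))
```

In this illustration, the hypothetical candidate carries an allowed residue at positions 212, 213, and 215, but not at position 216.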

In some embodiments, one or more substitution mutations are located at residues at or near the active site of a PT protein. The active site of a PT may be defined by generating the three-dimensional structure of the PT and identifying the residues within a particular distance of the GPP substrate binding site and/or the Mg binding site. As a non-limiting example, the structure of a PT may be generated using ROSETTA software. See, e.g., Kaufmann et al., Biochemistry 2010, 49, 2987-2998. As used in this disclosure, a residue is within the active site of a PT enzyme if it is within about 8 angstroms from the GPP substrate binding site and/or the Mg binding site. As used in this disclosure, a residue is near the active site of a PT enzyme if it is within about 8-12 angstroms from the GPP substrate binding site and/or the Mg binding site. In some embodiments, a substitution mutation is present in a residue corresponding to residue M43, F82, F83, I86, M87, S119, V122, F145, I147, or F151 in SEQ ID NO: 5.
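As a non-limiting illustration, the distance-based definitions above can be expressed as a simple classification. The following Python sketch assumes that atomic coordinates (in angstroms) are already available from a structural model and that distance is measured as the minimum atom-to-atom distance between a residue and the binding site. That distance measure, the function names, and the label for residues beyond 12 angstroms are assumptions made for illustration and are not taken from this disclosure.

```python
import math

ACTIVE_SITE_CUTOFF = 8.0  # angstroms: "within about 8 angstroms"
NEAR_SITE_CUTOFF = 12.0   # angstroms: upper bound of "about 8-12 angstroms"


def min_distance(residue_atoms, site_atoms):
    """Smallest atom-to-atom distance between a residue and a binding site."""
    return min(math.dist(a, b) for a in residue_atoms for b in site_atoms)


def classify_residue(residue_atoms, site_atoms):
    """Label a residue as within, near, or outside the active site."""
    distance = min_distance(residue_atoms, site_atoms)
    if distance <= ACTIVE_SITE_CUTOFF:
        return "active site"
    if distance <= NEAR_SITE_CUTOFF:
        return "near active site"
    return "outside"


# Hypothetical coordinates, for illustration only.
binding_site = [(0.0, 0.0, 0.0)]
print(classify_residue([(3.0, 4.0, 0.0)], binding_site))   # 5 angstroms away
print(classify_residue([(0.0, 10.0, 0.0)], binding_site))  # 10 angstroms away
print(classify_residue([(0.0, 0.0, 20.0)], binding_site))  # 20 angstroms away
```

Because the thresholds are recited as approximate ("about"), the fixed cutoffs in this sketch are one possible reading and may be adjusted.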

In some embodiments, one or more substitution mutations are located in an apposing face of a helix that forms part of the active site of a CsPT. For example, in some embodiments, a substitution mutation is present in a residue corresponding to residue I86, F83, or M87 of SEQ ID NO: 5. In some embodiments, one or more substitution mutations are located in residues that are predicted to interact with a residue corresponding to residue I86 of SEQ ID NO: 5. For example, in some embodiments, a substitution mutation is present in a residue corresponding to residue F82, F83, M87, S119, or V122.

Without wishing to be bound by any theory, substitution mutations at a residue corresponding to position 86 in SEQ ID NO: 5 (e.g., I86S, I86G, I86A) may increase activity of the PT enzyme due to the decreased residue size relative to the corresponding residue in the wildtype protein. Reduction in side-chain volume at this position may lead to a slight shift in the helix, which could increase the volume of the olivetolic/divarinic acid binding pocket. Without wishing to be bound by any theory, substitution mutations at a residue corresponding to position 82 (e.g., F82G), 83 (e.g., F83Y), 87 (e.g., M87T, M87I, M87C, M87Q or M87V), 119 (e.g., S119A) and/or 122 (e.g., V122F or V122S) of SEQ ID NO: 5, may impact the olivetolic/divarinic acid binding pocket in a similar manner to that discussed above for position 86 in SEQ ID NO: 5. Without wishing to be bound by any theory, substitution mutations at a residue corresponding to position 82 (e.g., F82G), 94 (e.g., D94E), 147 (e.g., I147L), 227 (e.g., A227K), and/or 254 (e.g., T254N) of SEQ ID NO: 5, may increase CBGA production.

It should be appreciated that any of the PTs provided in this disclosure, including chimeric PTs and fusion proteins, can comprise any of the point mutations provided in this disclosure.
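As a non-limiting illustration, the point-mutation notation used in this disclosure (e.g., I86S, meaning that the isoleucine at position 86 is replaced by serine) can be applied to a sequence programmatically. The following Python sketch is provided only for illustration. The function name and the short example sequence are hypothetical, and the sketch assumes that positions are numbered directly on the input sequence; a residue "corresponding to" a position in SEQ ID NO: 5 in a different PT would first require a sequence alignment, which is not shown.

```python
import re


def apply_point_mutation(sequence, mutation):
    """Apply a substitution written as <wild-type><position><substitute>, e.g. 'I86S'.

    Positions are 1-based. The wild-type letter is checked against the
    sequence so that a mis-numbered mutation is rejected rather than applied.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation: {mutation!r}")
    wild_type, substitute = match.group(1), match.group(3)
    index = int(match.group(2)) - 1
    if index < 0 or index >= len(sequence):
        raise ValueError(f"position out of range for {mutation!r}")
    if sequence[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {index + 1}, found {sequence[index]}"
        )
    return sequence[:index] + substitute + sequence[index + 1:]


# Hypothetical five-residue sequence, for illustration only.
print(apply_point_mutation("MKIFL", "I3S"))
```

Multiple point mutations may be combined by calling the function once per mutation, since a substitution does not change the numbering of the remaining residues.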

A PT described in this disclosure, including a chimeric PT and/or a chimeric fusion, may be capable of producing more CBGA and/or CBGVA relative to a control PT. In some embodiments, a control PT comprises any of SEQ ID NOs: 1-5.

In some embodiments, a PT described in this disclosure, including a chimeric PT and/or a chimeric fusion, that produces more CBGA and/or CBGVA relative to a control PT may be capable of producing at least 1% (e.g., at least 5%, at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 100%, at least 125%, at least 150%, at least 175%, at least 200%, at least 300%, at least 400%, at least 500%, at least 600%, at least 700%, at least 800%, at least 900%, or at least 1,000%) more CBGA and/or CBGVA than a control PT. In some embodiments, a control PT comprises any of SEQ ID NOs: 1-5.

In some embodiments, a PT described in this disclosure, including a chimeric PT and/or a fusion protein, that produces more CBGA and/or CBGVA relative to a control PT may be capable of producing at least 1% (e.g., at least 5%, at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 100%, at least 125%, at least 150%, at least 175%, at least 200%, at least 300%, at least 400%, at least 500%, at least 600%, at least 700%, at least 800%, at least 900%, or at least 1,000%) more CBGA and/or OGOA than a control PT. In some embodiments, a control PT comprises any of SEQ ID NOs: 1-5.

A recombinant host cell that expresses a heterologous gene encoding a PT described in this disclosure, including a chimeric PT and/or a chimeric fusion, may be capable of producing more CBGA and/or CBGVA relative to a host cell that expresses a control PT. In some embodiments, a control PT comprises any of SEQ ID NOs: 1-5.

In some embodiments, a recombinant host cell that expresses a heterologous gene encoding a PT described in this disclosure, including a chimeric PT and/or a chimeric fusion, that produces more CBGA and/or CBGVA relative to a control PT may be capable of producing at least 1% (e.g., at least 5%, at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 100%, at least 125%, at least 150%, at least 175%, at least 200%, at least 300%, at least 400%, at least 500%, at least 600%, at least 700%, at least 800%, at least 900%, or at least 1,000%) more CBGA and/or CBGVA relative to a host cell that expresses a control PT. In some embodiments, a control PT comprises any of SEQ ID NOs: 1-5.

In some embodiments, a recombinant host cell that expresses a heterologous gene encoding a PT described in this disclosure, including a chimeric PT and/or a fusion protein, that produces more CBGA and/or CBGVA relative to a control PT may be capable of producing at least 1% (e.g., at least 5%, at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 100%, at least 125%, at least 150%, at least 175%, at least 200%, at least 300%, at least 400%, at least 500%, at least 600%, at least 700%, at least 800%, at least 900%, or at least 1,000%) more CBGA and/or OGOA relative to a host cell that expresses a control PT. In some embodiments, a control PT comprises any of SEQ ID NOs: 1-5.
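As a non-limiting illustration, the percentage thresholds recited above can be read as a relative increase over the amount produced by the control PT (or by a host cell expressing the control PT). The following Python sketch shows that arithmetic. The function names, the titer values, and the units are hypothetical, and the sketch assumes that both amounts were measured under the same conditions.

```python
def percent_more(test_amount, control_amount):
    """Relative increase of the test amount over the control amount, in percent."""
    if control_amount <= 0:
        raise ValueError("control amount must be positive")
    return (test_amount - control_amount) / control_amount * 100.0


def meets_threshold(test_amount, control_amount, threshold_percent):
    """True if the test produces at least threshold_percent more than the control."""
    return percent_more(test_amount, control_amount) >= threshold_percent


# Hypothetical CBGA amounts in arbitrary but matching units, for illustration only.
print(percent_more(30.0, 20.0))            # 50.0
print(meets_threshold(30.0, 20.0, 25.0))   # True
print(meets_threshold(30.0, 20.0, 100.0))  # False
```

Under this reading, producing "at least 100% more" corresponds to at least twice the control amount, and "at least 1,000% more" corresponds to at least eleven times the control amount.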

PTs for use in producing cannabinoids may be selected based on any one or more desired features, such as substrate selectivity, potential products formed, yield/titer of a product of interest, solubility, and/or localization (e.g., cytosolic localization, intramembrane localization) of the enzyme.

d. Substrate Selectivity

Many prenyltransferases are known to have promiscuity in regard to prenyl donors and acceptors, which may result in a broad spectrum of potential products formed using a particular enzyme (Chen et al. Nat. Chem. Biol. (2017): 13(2): 226-234). Without being bound by a particular theory, promiscuous enzymes may be useful in some embodiments because different products may be produced by the enzyme by varying the substrate. In some embodiments, a promiscuous enzyme may be useful in producing different products from a composition of heterogeneous substrates.

As a non-limiting example, the PT from Streptomyces sp., NphB, has been previously shown to prenylate both olivetol and olivetolic acid (Kuzuyama et al. Nature, 2005). Wild-type NphB has also been reported to display a high degree of both substrate and product promiscuity. Similarly, C. sativa CsPT4 has been previously shown to prenylate both olivetol and olivetolic acid (Luo et al. Nature, 2019).

In some instances, it may be preferable for the prenyltransferase to have high specificity and not be promiscuous. For example, it may be preferable for the prenyltransferase to be specific for a particular substrate, so that the prenyltransferase produces a more homogenous product mix (i.e., greater product purity). Without being bound by a particular theory, an enzyme that has high specificity for a particular substrate may be useful because it may reduce possible by-products due to impurities in the substrate composition. For instance, when an enzyme is used with a host cell, the host cell may have intracellular mechanisms to convert a particular feed substrate into an undesirable substrate. In such instances, an enzyme that is highly specific for the non-converted substrate may be used to produce a product that has a higher purity of a compound of interest. In some instances, a highly specific enzyme may be useful for simplifying downstream processing, e.g., removing the need for further product purification.

In certain embodiments, prenyltransferases may use a resorcinol optionally substituted at the 5-position, a compound of Formula (5), a β-resorcylic acid optionally substituted at the 6-position, or a compound of Formula (6):

embedded image

wherein R is hydrogen, optionally substituted acyl, optionally substituted alkyl, optionally substituted alkenyl, optionally substituted alkynyl, optionally substituted carbocyclyl, or optionally substituted aryl; and a compound comprising a prenyl group (e.g., geranyl diphosphate (GPP), isopentenyl diphosphate (IPP), neryl diphosphate (NPP), farnesyl diphosphate (FPP), and geranylgeranyl diphosphate (GGPP)) as substrates. R is as defined in this disclosure. In some embodiments, R is H, an optionally substituted C1-C11 alkyl, an optionally substituted C1-C11 alkenyl, an optionally substituted C1-C11 alkynyl, or an optionally substituted C1-C11 aralkyl.

In certain embodiments, prenyltransferases may use a compound of Formula (6):

embedded image

wherein R is hydrogen, optionally substituted acyl, optionally substituted alkyl, optionally substituted alkenyl, optionally substituted alkynyl, optionally substituted carbocyclyl, or optionally substituted aryl; and a compound comprising a prenyl group (e.g., geranyl diphosphate (GPP), isopentenyl diphosphate (IPP), farnesyl diphosphate (FPP), and geranylgeranyl diphosphate (GGPP)) as substrates. R is as defined in this disclosure.

A prenyltransferase may have different affinities for a particular substrate based on the R group on the substrate (e.g., the R group on a compound of Formula (5) and/or the R group on a compound of Formula (6)) and/or based on the presence or absence of a carboxylic acid on the substrate. In some embodiments, a particular R group may confer particular physiological effects to a compound. In some embodiments, a prenyltransferase may be chosen based on the ability of the prenyltransferase to use a substrate with a particular R group to produce a cannabinoid or cannabinoid precursor with a particular physiological effect.

In certain embodiments, a compound of Formula (6) is olivetolic acid (OA) (compound 6a of formula:

embedded image

divarinic acid, a 6-acyl-resorcinolic acid derivative, 6-alkyl-resorcinolic acid derivative, or a 2,4 dihydroxy-6-acylbenzoic acid. In certain embodiments, a compound of Formula (6) is olivetolic acid (OA). In certain embodiments, a compound of Formula (6) is of the formula:

embedded image

wherein R is optionally substituted C1-6 alkyl. In certain embodiments, a compound of Formula (6) is of the formula:

embedded image

wherein R is unsubstituted C1-6 alkyl. In certain embodiments, a compound of Formula (6) is divarinic acid. In certain embodiments, a compound of Formula (6) is a 6-acyl-resorcinolic acid derivative. In certain embodiments, a compound of Formula (6) is a 6-alkyl-resorcinolic acid derivative. In certain embodiments, a compound of Formula (6) is a 2,4-dihydroxy-6-acylbenzoic acid. In certain embodiments, in a compound of Formula (6), R is optionally substituted acyl. In some embodiments, orcinol, orsellinic acid, divarinol, divaric acid, olivetol, olivetolic acid, sphaerophorol, sphaeropholic acid, phlorisovalerophenone, naringenin, resveratrol, or a combination thereof are substrates.

In some embodiments, a substrate of the prenyltransferase is a compound of Formula (7′):

embedded image

wherein a is 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10, where examples include, but are not limited to, geranyl diphosphate or geranyl pyrophosphate (GPP), neryl pyrophosphate (NPP) or farnesyl pyrophosphate. In certain embodiments, a prenyltransferase substrate is a compound of Formula (7′):

embedded image

wherein a is 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10. In certain embodiments, a prenyltransferase substrate is a compound of Formula (7′):

embedded image

wherein a is 1, 2, 3, 4, or 5. In certain embodiments, a prenyltransferase substrate is geranyl diphosphate or geranyl pyrophosphate (GPP).

In some embodiments, a is 1. In some embodiments, a is 2. In some embodiments, a is 3. In some embodiments, a is 4. In some embodiments, a is 5. In some embodiments, a is 6. In some embodiments, a is 7. In some embodiments, a is 8. In some embodiments, a is 9. In some embodiments, a is 10. In some embodiments, a is 1, 2, 3, 4, or 5. In some embodiments, a is 1, 2, 3, or 4. In some embodiments, a is 6, 7, 8, 9, or 10.

In some embodiments, a substrate of the prenyltransferase is a compound of Formula (7a):

embedded image

In some embodiments, a PT catalyzes the formation of a compound of one or more of Formula (8a), Formula (8w), Formula (8x), Formula (8′), Formula (8y), and/or Formula (8z):

embedded image

wherein a is 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10.

In some embodiments, PT catalyzes the formation of a compound of Formula (8′):

embedded image

wherein a is 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10.

In some embodiments, PT catalyzes the formation of a compound of Formula (8):

embedded image

In some embodiments, a compound of Formula (8) is a compound of Formula (8a):

embedded image

In some embodiments, PT catalyzes the formation of a compound of Formula (8x):

embedded image

In some embodiments, a compound of Formula (8x) is of Formula (13):

embedded image

In some embodiments, PT catalyzes the formation of a compound of Formula (13):

embedded image

In some embodiments, a compound of Formula (13) is a compound of Formula (8b):

embedded image

In some embodiments, the PT is a cannabigerolic acid synthase (CBGAS). CBGAS catalyzes the formation of CBGA from OA and GPP.

In some embodiments, a PT is a cannabigerovarinic acid synthase (CBGVAS). CBGVAS catalyzes the formation of CBGVA from divarinic acid (DVA) and geranyl pyrophosphate (GPP).

In some embodiments, a PT may be capable of consuming a substrate of a compound of Formula 6 in FIG. 2 at a rate that is at least 1% (e.g., at least 5%, at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 100%, at least 125%, at least 150%, at least 175%, at least 200%, at least 300%, at least 400%, at least 500%, at least 600%, at least 700%, at least 800%, at least 900%, or at least 1,000%) faster or slower relative to a control.

In some embodiments, a control is a wild-type reference PT. A wild-type reference PT can be full-length or truncated. A wild-type reference PT can be part of a fusion protein. In some embodiments, a control is any one of SEQ ID NOs: 1-10. In some embodiments, a control is a fusion protein comprising any one of SEQ ID NOs: 1-10.

e. Prenylation

In addition to promiscuity in regard to potential substrates utilized, many prenyltransferases are known to also be promiscuous as to the products formed due to the ability to prenylate a prenyl acceptor at different sites, further resulting in a broad spectrum of potential products formed using a particular enzyme (Chen et al. Nat. Chem. Biol. (2017): 13(2): 226-234). When tested for activity using geranyl pyrophosphate (GPP) and olivetolic acid (OA) as substrates, NphB and CsPT4 produce multiple prenylation products (Kumano et al. Bioorganic & Medicinal Chemistry, 2008; Luo et al. Nature, 2019). In particular, prenylation may occur on OA at the carbon positions labeled 3 and 5 and at the oxygen positions labeled 2 and 4 in Structure 6a (FIG. 4). Zirpel et al. reported the major prenylation product of wild-type NphB to be 2-O-Geranyl Olivetolic Acid (OGOA, Formula (8b) in FIG. 4), with CBGA produced as the minor product (Formula (8a) in FIG. 1 and FIG. 4; Zirpel et al. Journal of Biotechnology, 2017). Functional expression of NphB and production of CBGA in S. cerevisiae were detected (Zirpel et al. Journal of Biotechnology, 2017).

In some instances, it may be preferable to prenylate at a particular position in Formula (6) or Formula (5). For example, it may be preferable to use a prenyltransferase (e.g., in combination with a terminal synthase) to produce phytocannabinoids, which are commonly prenylated at the C3 position of Formula (6).

In some instances, prenylation at a particular position in Formula (6) or Formula (5) may be used to alter the pharmacokinetic profile of cannabinoid products. For example, prenylation at a particular position in Formula (6) or Formula (5) may allow for the development of a cannabinoid product that crosses the blood brain barrier.

In some embodiments, a PT described in this disclosure transfers one or more prenyl groups to any of positions 2, 3, 4, or 5 in a compound of Formula (5), shown below:

embedded image

In some embodiments, a PT described in this disclosure transfers one or more prenyl groups to position 3 in a compound of Formula (5), shown below:

embedded image

In some embodiments, a PT described in this disclosure transfers one or more prenyl groups to any of positions 1, 2, 3, 4, or 5 in a compound of Formula (6), shown below:

embedded image

In some embodiments, the PT transfers a prenyl group to any of positions 1, 2, 3, 4, or 5 in a compound of Formula (6), shown below:

embedded image

to form a compound of one or more of Formula (8w), Formula (8x), Formula (8′), Formula (8y), and/or Formula (8z):

embedded image

or a pharmaceutically acceptable salt, solvate, hydrate, polymorph, co-crystal, tautomer, stereoisomer, isotopically labeled derivative, or prodrug thereof, wherein a is 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10. In some embodiments, the PT transfers a prenyl group to any of positions 1, 2, 3, 4, or 5 in a compound of Formula (6), shown below:

embedded image

to form a compound of one or more of Formula (8w), Formula (8x), Formula (8′), Formula (8y), Formula (8z), wherein a is 1, 2, 3, 4, or 5. In some embodiments, the PT transfers a prenyl group to any of positions 1, 2, 3, 4, or 5 in a compound of Formula (6), shown below:

embedded image

to form a compound of one or more of Formula (8w), Formula (8x), Formula (8′), Formula (8y), Formula (8z), or a pharmaceutically acceptable salt thereof, wherein a is 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10.