BIOSYNTHESIS OF CANNABINOIDS AND CANNABINOID PRECURSORS

Information

  • Patent Application
  • Publication Number
    20220307060
  • Date Filed
    March 09, 2022
  • Date Published
    September 29, 2022
Abstract
Aspects of the disclosure relate to biosynthesis of cannabinoids and cannabinoid precursors in recombinant cells and in vitro.
Description
REFERENCE TO A SEQUENCE LISTING SUBMITTED AS A TEXT FILE VIA EFS-WEB

The instant application contains a Sequence Listing which has been submitted in text format via EFS-Web and is hereby incorporated by reference in its entirety. Said text copy, created on Oct. 23, 2020, is named G091970030US07-SEQ-FL.txt and is 1,806,581 bytes in size.


FIELD OF INVENTION

The present disclosure relates to the biosynthesis of cannabinoids and cannabinoid precursors in recombinant cells.


BACKGROUND

Cannabinoids are chemical compounds that may act as ligands for endocannabinoid receptors and have multiple medical applications. Traditionally, cannabinoids have been isolated from plants of the genus Cannabis. The use of plants for producing cannabinoids is inefficient, however, because isolated products are restricted to the primary endogenous Cannabis compounds, and the cultivation of Cannabis plants is restricted in many jurisdictions. Cannabinoids can also be produced through chemical synthesis (see, e.g., U.S. Pat. No. 7,323,576 to Souza et al.). However, such methods suffer from low yields and high cost. Production of cannabinoids, cannabinoid analogs, and cannabinoid precursors using engineered organisms may provide an advantageous approach to meet the increasing demand for these compounds.


SUMMARY

Aspects of the present disclosure provide methods for production of cannabinoids and cannabinoid precursors from fatty acid substrates using genetically modified host cells.


Aspects of the present disclosure provide host cells that comprise a heterologous polynucleotide encoding a polyketide synthase (PKS), wherein the PKS comprises a sequence that is at least 90% identical to SEQ ID NO: 7. In some embodiments, relative to the sequence of SEQ ID NO: 7, the PKS comprises an amino acid substitution at a residue corresponding to position 34, 50, 70, 71, 76, 100, 151, 203, 219, 285, 359, and/or 385 in SEQ ID NO: 7. In some embodiments, the PKS comprises: the amino acid Q at a residue corresponding to position 34 in SEQ ID NO: 7; the amino acid N at a residue corresponding to position 50 in SEQ ID NO: 7; the amino acid M at a residue corresponding to position 70 in SEQ ID NO: 7; the amino acid Y at a residue corresponding to position 71 in SEQ ID NO: 7; the amino acid I at a residue corresponding to position 76 in SEQ ID NO: 7; the amino acid P or T at a residue corresponding to position 100 in SEQ ID NO: 7; the amino acid P at a residue corresponding to position 151 in SEQ ID NO: 7; the amino acid K at a residue corresponding to position 203 in SEQ ID NO: 7; the amino acid C at a residue corresponding to position 219 in SEQ ID NO: 7; the amino acid A at a residue corresponding to position 285 in SEQ ID NO: 7; the amino acid M at a residue corresponding to position 359 in SEQ ID NO: 7; and/or the amino acid M at a residue corresponding to position 385 in SEQ ID NO: 7.
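The position-specific residues listed above can be checked programmatically once a candidate PKS has been aligned to SEQ ID NO: 7. The Python sketch below is illustrative only and is not part of the disclosure. It assumes a gap-free alignment, so that position p in SEQ ID NO: 7 corresponds to index p - 1 in the candidate. The name `matching_positions` is hypothetical, and the filler sequence in the usage lines is a placeholder, not SEQ ID NO: 7.

```python
# Allowed residues at 1-based positions numbered as in SEQ ID NO: 7, as listed above.
ALLOWED_SEQ7 = {
    34: {"Q"}, 50: {"N"}, 70: {"M"}, 71: {"Y"}, 76: {"I"},
    100: {"P", "T"}, 151: {"P"}, 203: {"K"}, 219: {"C"},
    285: {"A"}, 359: {"M"}, 385: {"M"},
}


def matching_positions(aligned_seq: str, allowed: dict[int, set[str]]) -> list[int]:
    """Return the positions at which aligned_seq carries one of the allowed residues.

    Assumes aligned_seq is gap-free and numbered like the reference, so that
    position p corresponds to aligned_seq[p - 1].
    """
    hits = []
    for pos, residues in sorted(allowed.items()):
        if pos <= len(aligned_seq) and aligned_seq[pos - 1] in residues:
            hits.append(pos)
    return hits


# Hypothetical 400-residue placeholder carrying M at position 70 and Y at position 71.
demo = ["G"] * 400
demo[69] = "M"
demo[70] = "Y"
print(matching_positions("".join(demo), ALLOWED_SEQ7))  # [70, 71]
```

In practice, the candidate would first be aligned to the reference, because "a residue corresponding to position X" refers to alignment-based correspondence rather than raw string index.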


In some embodiments, the PKS is capable of producing:


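a) a compound of Formula (4):

[Structure of Formula (4): image not reproduced]


b) a compound of Formula (5):

[Structure of Formula (5): image not reproduced]


and/or


c) a compound of Formula (6):

[Structure of Formula (6): image not reproduced]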


In some embodiments,


a) the compound of Formula (4) is the compound of Formula (4a):

[Structure of Formula (4a): image not reproduced]


b) the compound of Formula (5) is the compound of Formula (5a):

[Structure of Formula (5a): image not reproduced]


and/or


c) the compound of Formula (6) is the compound of Formula (6a):

[Structure of Formula (6a): image not reproduced]


In some embodiments, the host cell produces more of a compound of Formula (5) than a host cell that comprises a heterologous polynucleotide encoding a PKS that comprises the sequence of SEQ ID NO: 7. In some embodiments, the PKS comprises one or both of the following amino acid substitutions relative to SEQ ID NO: 7: V71Y and F70M. In some embodiments, the PKS comprises: C at a residue corresponding to position 164 in SEQ ID NO: 7; H at a residue corresponding to position 304 in SEQ ID NO: 7; and/or N at a residue corresponding to position 337 in SEQ ID NO: 7. In some embodiments, the PKS comprises SEQ ID NO: 7. In some embodiments, the PKS comprises SEQ ID NO: 15 or 145. In some embodiments, the heterologous polynucleotide comprises a sequence that is at least 90% identical to SEQ ID NO: 38 or 176.
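Whether a sequence is "at least 90% identical" to a reference depends on the alignment and on how identity is counted. The sketch below starts from an existing pairwise alignment, given as two gapped strings of equal length. It divides the number of identical aligned residues by the ungapped length of the reference. That denominator is one common convention and is an assumption here; where the disclosure defines sequence identity, that definition governs. The alignment itself, for example a global alignment, is not computed in this sketch, and the sequences shown are hypothetical.

```python
def percent_identity(aligned_query: str, aligned_ref: str, gap: str = "-") -> float:
    """Percent identity of a query to a reference, computed from a precomputed alignment.

    Identity = identical aligned residues / ungapped reference length * 100.
    The choice of denominator is an illustrative assumption.
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(
        1 for q, r in zip(aligned_query, aligned_ref) if q == r and q != gap
    )
    ref_len = sum(1 for r in aligned_ref if r != gap)
    return 100.0 * identical / ref_len


# Hypothetical 10-residue alignment: one gap in the query and one mismatch.
ref = "MKTAYIAKQR"
qry = "MKTAYI-KQW"
pid = percent_identity(qry, ref)
print(f"{pid:.1f}% identical; meets 90% threshold: {pid >= 90.0}")
# 80.0% identical; meets 90% threshold: False
```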


Aspects of the present disclosure relate to host cells that comprise a heterologous polynucleotide encoding a polyketide synthase (PKS), wherein the PKS comprises a sequence that is at least 90% identical to SEQ ID NO: 714. In some embodiments, the PKS is capable of producing:


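a) a compound of Formula (4):

[Structure of Formula (4): image not reproduced]


b) a compound of Formula (5):

[Structure of Formula (5): image not reproduced]


and/or


c) a compound of Formula (6):

[Structure of Formula (6): image not reproduced]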


In some embodiments,


a) the compound of Formula (4) is the compound of Formula (4a):

[Structure of Formula (4a): image not reproduced]


b) the compound of Formula (5) is the compound of Formula (5a):

[Structure of Formula (5a): image not reproduced]


and/or


c) the compound of Formula (6) is the compound of Formula (6a):

[Structure of Formula (6a): image not reproduced]


Further aspects of the present disclosure provide host cells that comprise a heterologous polynucleotide encoding a polyketide synthase (PKS), wherein, relative to the sequence of SEQ ID NO: 5, the PKS comprises one or more amino acid substitutions within the active site of the PKS, and wherein the host cell is capable of producing a compound of Formula (4), (5), or (6).


In some embodiments, relative to SEQ ID NO: 5, the PKS comprises an amino acid substitution at one or more of the following positions in SEQ ID NO: 5: 17, 23, 25, 51, 54, 64, 95, 123, 125, 153, 196, 201, 207, 241, 247, 267, 273, 277, 296, 307, 320, 324, 326, 328, 334, 335, and 375. In some embodiments, relative to SEQ ID NO: 5, the PKS comprises: T17K, I23C, L25R, K51R, D54R, F64Y, V95A, T123C, A125S, Y153G, E196K, L201C, I207L, L241I, T247A, M267K, M267G, I273V, L277M, T296A, V307I, D320A, V324I, S326R, H328Y, S334P, S334A, T335C, R375T, or any combination thereof. In some embodiments, relative to SEQ ID NO: 5, the PKS further comprises an amino acid substitution at one or more of the following positions in SEQ ID NO: 5: 284, 100, 116, 278, 108, 348, 71, 92, 128, 135, and 229. In some embodiments, relative to SEQ ID NO: 5, the PKS comprises: I284Y, K100L, K116R, I278E, K108D, L348S, K71R, V92G, T128V, K100M, Y135V, P229A, T128A, T128I, or any combination thereof. In some embodiments,


a) the compound of Formula (4) is the compound of Formula (4a):

[Structure of Formula (4a): image not reproduced]


b) the compound of Formula (5) is the compound of Formula (5a):

[Structure of Formula (5a): image not reproduced]


and/or


c) the compound of Formula (6) is the compound of Formula (6a):

[Structure of Formula (6a): image not reproduced]
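The substitutions recited above for SEQ ID NO: 5 use the conventional wild-type/position/mutant notation. For example, T335C replaces the threonine at position 335 with cysteine. The Python sketch below shows one way to apply such a list to a reference sequence. It is hypothetical tooling, not the disclosure's method. It assumes positions are numbered directly on the reference, and the five-residue toy sequence stands in for SEQ ID NO: 5.

```python
import re

# One-letter wild-type residue, 1-based position, one-letter substituted residue.
SUBSTITUTION = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")


def apply_substitutions(seq: str, mutations: list[str]) -> str:
    """Apply substitutions such as "T335C" to a reference protein sequence.

    Raises ValueError if a mutation is malformed or out of range, or if the
    residue currently at that position does not match the stated wild type.
    """
    residues = list(seq)
    for mut in mutations:
        m = SUBSTITUTION.match(mut)
        if m is None:
            raise ValueError(f"cannot parse substitution {mut!r}")
        wild_type, pos, new = m.group(1), int(m.group(2)), m.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"{mut}: position out of range")
        if residues[pos - 1] != wild_type:
            raise ValueError(f"{mut}: found {residues[pos - 1]} at position {pos}")
        residues[pos - 1] = new
    return "".join(residues)


# Toy reference (hypothetical), not SEQ ID NO: 5.
print(apply_substitutions("MTKLA", ["T2K", "A5S"]))  # MKKLS
```

Each substitution is checked against the residue present at that point, so alternatives recited at the same position (such as M267K and M267G, or T128V, T128A, and T128I) cannot be combined in one call. The second of these would raise `ValueError`.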


In some embodiments, the host cell produces more of a compound of Formula (5) than a host cell that comprises a heterologous polynucleotide encoding a PKS that comprises the sequence of SEQ ID NO: 5. In some embodiments, the PKS comprises a sequence that is at least 90% identical to any one of SEQ ID NOs: 207-249. In some embodiments, the heterologous polynucleotide comprises a sequence that is at least 90% identical to any one of SEQ ID NOs: 250-292.


Further aspects of the present disclosure relate to host cells that comprise a heterologous polynucleotide encoding a polyketide synthase (PKS), wherein, relative to the sequence of SEQ ID NO: 5, the PKS comprises the amino acid substitution T335C. In some embodiments, the PKS is at least 90% identical to SEQ ID NO: 207. In some embodiments, the PKS comprises SEQ ID NO: 207.


Further aspects of the present disclosure relate to host cells that comprise a heterologous polynucleotide encoding a polyketide synthase (PKS), wherein the PKS comprises the amino acid C at a residue corresponding to position 335 of SEQ ID NO: 5, and wherein the host cell is capable of producing more of a compound of Formula (5) than a host cell that comprises a heterologous polynucleotide encoding a PKS comprising SEQ ID NO: 5.


In some embodiments, the PKS comprises a sequence that is at least 90% identical to any one of SEQ ID NOs: 7, 13, 145, 8, and 15. In some embodiments, the PKS comprises a sequence that is at least 90% identical to SEQ ID NO: 5. In some embodiments, the compound of Formula (5) is the compound of Formula (5a):

[Structure of Formula (5a): image not reproduced]


In some embodiments, the heterologous polynucleotide comprises a sequence that is at least 90% identical to SEQ ID NO: 250 or 706.


Further aspects of the present disclosure provide host cells that comprise a heterologous polynucleotide encoding a polyketide synthase (PKS), wherein the PKS is capable of reacting a compound of Formula (2) with a compound of Formula (3):

[Structures of Formula (2) and Formula (3): image not reproduced]


to produce a compound of Formula (6):

[Structure of Formula (6): image not reproduced]


In some embodiments, the PKS comprises a sequence that is at least 90% identical to SEQ ID NO: 6. In some embodiments, the PKS comprises the amino acid W at a residue corresponding to position 339 of SEQ ID NO: 6. In some embodiments, the PKS comprises: C at a residue corresponding to position 164 in SEQ ID NO: 6; H at a residue corresponding to position 304 in SEQ ID NO: 6; and/or N at a residue corresponding to position 337 in SEQ ID NO: 6. In some embodiments, the PKS is capable of producing:


a) a compound of Formula (4):

[Structure of Formula (4): image not reproduced]


or


b) a compound of Formula (5):

[Structure of Formula (5): image not reproduced]


In some embodiments,


the compound of Formula (6) is a compound of Formula (6a):

[Structure of Formula (6a): image not reproduced]


In some embodiments,


a) the compound of Formula (4) is a compound of Formula (4a):

[Structure of Formula (4a): image not reproduced]


and/or


b) the compound of Formula (5) is a compound of Formula (5a):

[Structure of Formula (5a): image not reproduced]


In some embodiments,


a) the compound of Formula (2) is a compound of Formula (2a):

[Structure of Formula (2a): image not reproduced]


and/or


b) the compound of Formula (3) is a compound of Formula (3a):

[Structure of Formula (3a): image not reproduced]


In some embodiments, the host cell produces a ratio of the compound of Formula (6) to the compound of Formula (5) that is higher than the ratio produced by a host cell that comprises a heterologous polynucleotide encoding a PKS that comprises the sequence of SEQ ID NO: 6. In some embodiments, the PKS comprises SEQ ID NO: 6. In some embodiments, the heterologous polynucleotide comprises a sequence that is at least 90% identical to SEQ ID NO: 37 or 186.


Further aspects of the disclosure provide host cells that comprise a heterologous polynucleotide encoding an acyl activating enzyme (AAE), wherein the AAE comprises a sequence that is at least 90% identical to a sequence selected from SEQ ID NOs: 63-69, 141-142, and 707-708.


Further aspects of the disclosure provide host cells that comprise a heterologous polynucleotide comprising a sequence that is at least 90% identical to a sequence selected from SEQ ID NOs: 70-76 and 712-713.


Further aspects of the disclosure provide host cells that comprise a heterologous polynucleotide encoding an acyl activating enzyme (AAE), wherein the AAE comprises: the amino acid sequence SGAAPLG (SEQ ID NO: 114); the amino acid sequence AYLGMSSGTSGG (SEQ ID NO: 115); the amino acid sequence DQPA (SEQ ID NO: 116); the amino acid sequence QVAPAELE (SEQ ID NO: 117); the amino acid sequence VVID (SEQ ID NO: 118); and/or the amino acid sequence SGKILRRLLR (SEQ ID NO: 119). In some embodiments, the host cell produces at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% more hexanoyl-coenzyme A in the presence of hexanoic acid and Coenzyme A relative to a recombinant host cell that does not comprise a heterologous gene encoding an AAE, and/or the host cell produces at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% more butanoyl-coenzyme A in the presence of butyric acid and Coenzyme A relative to a recombinant host cell that does not comprise a heterologous gene encoding an AAE.
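The tiered thresholds above ("at least 10%, 20%, ..., or 100% more" acyl-CoA than a control host cell lacking a heterologous AAE) come down to a percent-increase calculation. The sketch below is illustrative. The function names are hypothetical, and the measured amounts are placeholder values in any consistent unit.

```python
from typing import Optional

# Thresholds recited above, in percent more product than the control.
THRESHOLDS = (10, 20, 30, 40, 50, 60, 70, 80, 90, 100)


def percent_increase(test: float, control: float) -> float:
    """Percent more product made by the test strain than by the control strain."""
    if control <= 0:
        raise ValueError("control amount must be positive")
    return 100.0 * (test - control) / control


def highest_threshold_met(test: float, control: float) -> Optional[int]:
    """Largest recited threshold that is satisfied, or None if the increase is below 10%."""
    increase = percent_increase(test, control)
    met = [t for t in THRESHOLDS if increase >= t]
    return max(met) if met else None


# Hypothetical hexanoyl-CoA amounts (same units) for an AAE strain and a control strain.
print(highest_threshold_met(3.0, 2.0))  # 50
```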


In some embodiments, the AAE comprises: the amino acid sequence SGAAPLG (SEQ ID NO: 114) at residues corresponding to positions 319-325 in SEQ ID NO:64; the amino acid sequence AYLGMSSGTSGG (SEQ ID NO: 115) at residues corresponding to positions 194-205 in SEQ ID NO:64; the amino acid sequence DQPA (SEQ ID NO: 116) at residues corresponding to positions 398-401 in SEQ ID NO:64; the amino acid sequence QVAPAELE (SEQ ID NO: 117) at residues corresponding to positions 495-502 in SEQ ID NO:64; the amino acid sequence VVID (SEQ ID NO: 118) at residues corresponding to positions 564-567 in SEQ ID NO:64; and/or the amino acid sequence SGKILRRLLR (SEQ ID NO: 119) at residues corresponding to positions 574-583 in SEQ ID NO:64.
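The motif spans above are internally consistent: each recited range in SEQ ID NO: 64 is exactly as long as its motif. The Python sketch below records those spans and reports which motifs occupy their expected positions in a candidate sequence. It assumes the candidate is already numbered like SEQ ID NO: 64 (a gap-free alignment). The function name and the placeholder sequence are hypothetical.

```python
# Motifs and their expected 1-based, inclusive spans in SEQ ID NO: 64, as recited above.
MOTIFS_SEQ64 = {
    "SGAAPLG": (319, 325),       # SEQ ID NO: 114
    "AYLGMSSGTSGG": (194, 205),  # SEQ ID NO: 115
    "DQPA": (398, 401),          # SEQ ID NO: 116
    "QVAPAELE": (495, 502),      # SEQ ID NO: 117
    "VVID": (564, 567),          # SEQ ID NO: 118
    "SGKILRRLLR": (574, 583),    # SEQ ID NO: 119
}

# Sanity check: each recited span is exactly as long as its motif.
for _motif, (_start, _end) in MOTIFS_SEQ64.items():
    assert _end - _start + 1 == len(_motif), _motif


def motifs_at_expected_positions(aligned_seq: str) -> dict[str, bool]:
    """For each motif, report whether it occupies its expected span.

    Assumes aligned_seq is numbered like SEQ ID NO: 64 (gap-free alignment).
    """
    return {
        motif: aligned_seq[start - 1:end] == motif
        for motif, (start, end) in MOTIFS_SEQ64.items()
    }


# Hypothetical 600-residue placeholder with only the first motif placed at 319-325.
demo = ["G"] * 600
demo[318:325] = list("SGAAPLG")
result = motifs_at_expected_positions("".join(demo))
print(result["SGAAPLG"], result["VVID"])  # True False
```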


Further aspects of the disclosure provide host cells that comprise a heterologous polynucleotide encoding an acyl activating enzyme (AAE), wherein the AAE comprises: an amino acid sequence with no more than three amino acid substitutions at residues corresponding to positions 428-440 in SEQ ID NO:64; or an amino acid sequence with no more than one amino acid substitution at residues corresponding to positions 482-491 in SEQ ID NO:64, wherein the host cell produces at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% more hexanoyl-coenzyme A in the presence of hexanoic acid and Coenzyme A relative to a recombinant host cell that does not comprise a heterologous gene encoding an AAE; and/or wherein the host cell produces at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% more butanoyl-coenzyme A in the presence of butyric acid and Coenzyme A relative to a recombinant host cell that does not comprise a heterologous gene encoding an AAE.
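The window limits above (no more than three substitutions across positions 428-440, or no more than one across 482-491, of SEQ ID NO: 64) reduce to counting mismatches within a fixed window. The sketch below is illustrative only. The residues of SEQ ID NO: 64 are not reproduced here, so the reference string is a hypothetical placeholder. Both sequences are assumed to be aligned without gaps, and the accompanying acyl-CoA production requirement is not modeled.

```python
def substitutions_in_window(query: str, reference: str, start: int, end: int) -> int:
    """Count positions start..end (1-based, inclusive) at which query differs from reference.

    Both sequences are assumed to be aligned without gaps to SEQ ID NO: 64 numbering.
    """
    width = end - start + 1
    q, r = query[start - 1:end], reference[start - 1:end]
    if len(q) != width or len(r) != width:
        raise ValueError("sequence too short for the requested window")
    return sum(1 for a, b in zip(q, r) if a != b)


def meets_window_criterion(query: str, reference: str) -> bool:
    """No more than three substitutions at 428-440, or no more than one at 482-491."""
    return (
        substitutions_in_window(query, reference, 428, 440) <= 3
        or substitutions_in_window(query, reference, 482, 491) <= 1
    )


# Hypothetical placeholder sequences (not SEQ ID NO: 64).
ref = "G" * 600
qry = list(ref)
for pos in (430, 432, 434, 436):  # four substitutions within 428-440
    qry[pos - 1] = "A"
for pos in (483, 488):  # two substitutions within 482-491
    qry[pos - 1] = "A"
print(meets_window_criterion("".join(qry), ref))  # False
```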


In some embodiments, the AAE comprises: I or V at a residue corresponding to position 432 in SEQ ID NO:64; S or D at a residue corresponding to position 434 in SEQ ID NO:64; K or N at a residue corresponding to position 438 in SEQ ID NO:64; and/or L or M at a residue corresponding to position 488 in SEQ ID NO:64. In some embodiments, the AAE comprises: the amino acid sequence RGPQIMSGYHKNP (SEQ ID NO: 120); the amino acid sequence RGPQVMDGYHNNP (SEQ ID NO: 121); the amino acid sequence RGPQIMDGYHKNP (SEQ ID NO: 122); the amino acid sequence VDRTKELIKS (SEQ ID NO: 123); and/or the amino acid sequence VDRTKEMIKS (SEQ ID NO: 124). In some embodiments, the AAE comprises a sequence that is at least 90% identical to a sequence selected from SEQ ID NOs: 63-69, 141-142, and 707-708. In some embodiments, the AAE comprises at least one conservative substitution relative to the sequence of SEQ ID NO:64. In some embodiments, the host cell further comprises one or more heterologous polynucleotides encoding one or more of: a polyketide synthase (PKS), a polyketide cyclase (PKC), a prenyltransferase (PT), and/or a terminal synthase (TS).
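The residue alternatives above (I/V at 432, S/D at 434, K/N at 438, and L/M at 488) line up with SEQ ID NOs: 120-124. The 13-residue windows start at position 428 and the 10-residue windows start at position 482. This makes them easy to express as degenerate patterns. The Python sketch below is illustrative: the regular expressions are one possible encoding, and they are broader than the five listed sequences. For example, they would also match RGPQVMSGYHKNP, which is not listed. The demo sequence is hypothetical.

```python
import re
from typing import Dict, Optional

# Degenerate motifs encoding the per-position alternatives recited above:
# I/V at 432, S/D at 434, K/N at 438 (window 428-440) and L/M at 488 (window 482-491).
WINDOW_428_440 = re.compile(r"RGPQ[IV]M[SD]GYH[KN]NP")
WINDOW_482_491 = re.compile(r"VDRTKE[LM]IKS")


def find_motif_starts(seq: str) -> Dict[str, Optional[int]]:
    """Return the 1-based start of the first match of each degenerate motif, or None."""
    starts: Dict[str, Optional[int]] = {}
    for label, pattern in (("428-440", WINDOW_428_440), ("482-491", WINDOW_482_491)):
        match = pattern.search(seq)
        starts[label] = match.start() + 1 if match else None
    return starts


# Every listed sequence (SEQ ID NOs: 120-124) fits one of the two patterns.
for listed in ("RGPQIMSGYHKNP", "RGPQVMDGYHNNP", "RGPQIMDGYHKNP", "VDRTKELIKS", "VDRTKEMIKS"):
    assert WINDOW_428_440.fullmatch(listed) or WINDOW_482_491.fullmatch(listed)

# Hypothetical sequence built so that the motifs sit at the recited positions.
demo = "G" * 427 + "RGPQVMDGYHNNP" + "G" * 41 + "VDRTKEMIKS"
print(find_motif_starts(demo))  # {'428-440': 428, '482-491': 482}
```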


Further aspects of the present disclosure provide methods comprising culturing any of the host cells of the disclosure. In some embodiments, the host cell is cultured in media comprising sodium hexanoate. In some embodiments, the host cell is a plant cell, an algal cell, a yeast cell, a bacterial cell, or an animal cell. In certain embodiments, the host cell is a yeast cell. In some embodiments, the yeast cell is a Saccharomyces cell, a Yarrowia cell, a Pichia cell, or a Komagataella cell. In certain embodiments, the Saccharomyces cell is a Saccharomyces cerevisiae cell. In some embodiments, the host cell is a bacterial cell. In certain embodiments, the bacterial cell is an E. coli cell. In some embodiments, the host cell further comprises one or more heterologous polynucleotides encoding one or more of: an acyl activating enzyme (AAE), a polyketide cyclase (PKC), a prenyltransferase (PT), and/or a terminal synthase (TS).


Further aspects of the present disclosure provide non-naturally occurring nucleic acids encoding a polyketide synthase (PKS), wherein the non-naturally occurring nucleic acid comprises a sequence that is at least 90% identical to any one of SEQ ID NOs: 32-62, 93-108, 172-206, 250-292, 421-548, or 628-706. Further aspects of the present disclosure provide vectors comprising non-naturally occurring nucleic acids of the disclosure. Further aspects of the present disclosure provide expression cassettes comprising non-naturally occurring nucleic acids of the disclosure. Further aspects of the present disclosure provide host cells transformed with non-naturally occurring nucleic acids, vectors, or expression cassettes of the present disclosure. Further aspects of the present disclosure provide host cells that comprise non-naturally occurring nucleic acids, vectors, or expression cassettes of the present disclosure.


Each of the limitations of the invention can encompass various embodiments of the invention. It is, therefore, anticipated that each of the limitations of the invention involving any one element or combinations of elements can be included in each aspect of the invention. This disclosure is not limited in its application to the details of construction and the arrangement of components set forth in the following description or illustrated in the drawings. The invention is capable of other embodiments and of being practiced or of being carried out in various ways. Also, the phraseology and terminology used in this application are for the purpose of description and should not be regarded as limiting. The use of “including,” “comprising,” “having,” “containing,” “involving,” and variations thereof is meant to encompass the items listed thereafter and equivalents thereof as well as additional items.





BRIEF DESCRIPTION OF DRAWINGS

The accompanying drawings are not intended to be drawn to scale. In the drawings, each identical or nearly identical component that is illustrated in various figures is represented by a like numeral. For purposes of clarity, not every component may be labeled in every drawing. In the drawings:



FIG. 1 is a schematic depicting the native Cannabis biosynthetic pathway for production of cannabinoid compounds, including five enzymatic steps mediated by: (R1a) acyl activating enzymes (AAE); (R2a) olivetol synthase enzymes (OLS); (R3a) olivetolic acid cyclase enzymes (OAC); (R4a) cannabigerolic acid synthase enzymes (CBGAS); and (R5a) terminal synthase enzymes (TS). Formulae 1a-11a correspond to hexanoic acid (1a), hexanoyl-CoA (2a), malonyl-CoA (3a), 3,5,7-trioxododecanoyl-CoA (4a), olivetol (5a), olivetolic acid (6a), geranyl pyrophosphate (7a), cannabigerolic acid (8a), cannabidiolic acid (9a), tetrahydrocannabinolic acid (10a), and cannabichromenic acid (11a). Hexanoic acid is an exemplary carboxylic acid substrate; other carboxylic acids may also be used (e.g., butyric acid, isovaleric acid, octanoic acid, decanoic acid, etc.; see e.g., FIG. 3 below). The enzymes that catalyze the synthesis of 3,5,7-trioxododecanoyl-CoA and olivetolic acid are shown in R2a and R3a, respectively, and can include multi-functional enzymes that catalyze the synthesis of 3,5,7-trioxododecanoyl-CoA and olivetolic acid. The enzymes cannabidiolic acid synthase (CBDAS), tetrahydrocannabinolic acid synthase (THCAS), and cannabichromenic acid synthase (CBCAS) that catalyze the synthesis of cannabidiolic acid, tetrahydrocannabinolic acid, and cannabichromenic acid, respectively, are shown in step R5a. FIG. 1 is adapted from Carvalho et al. “Designing Microorganisms for Heterologous Biosynthesis of Cannabinoids” (2017) FEMS Yeast Research June 1; 17(4), which is incorporated by reference in its entirety.



FIG. 2 is a schematic depicting a heterologous biosynthetic pathway for production of cannabinoid compounds, including five enzymatic steps mediated by: (R1) acyl activating enzymes (AAE); (R2) polyketide synthase enzymes (PKS) or bifunctional polyketide synthase-polyketide cyclase enzymes (PKS-PKC); (R3) polyketide cyclase enzymes (PKC) or bifunctional PKS-PKC enzymes; (R4) prenyltransferase enzymes (PT); and (R5) terminal synthase enzymes (TS). Any carboxylic acid of varying chain lengths, structures (e.g., aliphatic, alicyclic, or aromatic), and functionalization (e.g., hydroxylic-, keto-, amino-, thiol-, aryl-, or halogeno-) may also be used as precursor substrates (e.g., thiopropionic acid, hydroxy phenyl acetic acid, norleucine, bromodecanoic acid, butyric acid, isovaleric acid, octanoic acid, decanoic acid, etc.).



FIG. 3 is a non-exclusive representation of select putative precursors for the cannabinoid pathway in FIG. 2.



FIG. 4 is a graph showing activity of E. coli strains expressing candidate AAEs as measured by a 5,5′-dithiobis-(2-nitrobenzoic acid) (“DTNB”) assay. Lysates of E. coli expressing candidate AAEs were assayed for ligation activity of free CoA to either butyrate or hexanoate. Activity was quantified by measuring the decrease in absorbance at 412 nm, corresponding to a decrease of free CoA in solution. Error bars represent the standard deviation of 2 independent measurements. Negative control strain t49568 expresses an aldehyde dehydrogenase protein from Y. lipolytica (corresponding to Uniprot ID Q6C5T1).
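The DTNB readout described above reports free CoA through its thiol group: DTNB releases one TNB anion per free thiol, and TNB absorbs at 412 nm. A drop in A412 can therefore be converted into CoA consumed using the Beer-Lambert law. The sketch below assumes the commonly cited TNB extinction coefficient of about 14,150 M^-1 cm^-1 at 412 nm and a 1 cm path length. Neither value is stated in the disclosure. A microplate well's effective path length generally differs from 1 cm and would need to be supplied. The absorbance readings and the function name are hypothetical.

```python
# Extinction coefficient of the TNB anion at 412 nm. 14,150 M^-1 cm^-1 is a commonly
# cited value; it is an assumption here, not a value stated in the disclosure.
EPSILON_TNB_412 = 14_150.0


def coa_consumed_um(a412_control: float, a412_sample: float, path_cm: float = 1.0) -> float:
    """Micromolar free CoA consumed, from the drop in A412 relative to a control.

    Beer-Lambert: delta_c = delta_A / (epsilon * path length). DTNB releases one TNB
    per free thiol, so delta_c equals the loss of free CoA.
    """
    delta_a = a412_control - a412_sample
    return delta_a / (EPSILON_TNB_412 * path_cm) * 1e6


# Hypothetical absorbance readings for a control well and an AAE lysate well.
print(f"{coa_consumed_um(0.850, 0.567):.1f} uM CoA consumed")  # 20.0 uM CoA consumed
```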



FIGS. 5A-5B show a plasmid used to express AAE and OLS proteins in S. cerevisiae. The coding sequence for the enzyme being expressed (labeled “Library gene”) is driven by the GAL1 promoter. The plasmid contains markers for both yeast (URA3) and bacteria (ampR), as well as origins of replication for yeast (2 micron), bacteria (pBR322), and phage (f1).



FIG. 6 is a graph showing activity of S. cerevisiae strains expressing candidate AAEs as measured by a DTNB assay. Lysates of S. cerevisiae expressing candidate AAEs were assayed for ligation activity of free CoA to hexanoate. Activity was quantified by measuring the decrease in absorbance at 412 nm, corresponding to a decrease of free CoA in solution. Error bars represent the standard deviation of 3 independent measurements. Negative control strain t390338 expresses GFP.



FIGS. 7A-7C show a sequence alignment of acyl activating enzymes (AAEs). An alignment of t51477 (SEQ ID NO: 65), t49578 (SEQ ID NO: 63), t49594 (SEQ ID NO: 64), t392878 (SEQ ID NO: 68), t392879 (SEQ ID NO: 69), t55127 (SEQ ID NO: 66), and t55128 (SEQ ID NO: 67) is shown. The sequence alignment was conducted using Clustal Omega. See, e.g., Chojnacki et al., Nucleic Acids Res. 2017 Jul. 3; 45(W1):W550-W553.



FIG. 8 is a graph showing olivetol production by S. cerevisiae strains expressing OLS candidate enzymes. Peak areas obtained via LC/MS quantification were normalized to an internal standard for olivetol. Normalized peak areas were further normalized to a positive control strain (t339582) contained on each plate. As explained in Example 2 and in Table 6, OLS candidate enzymes within the library depicted in this Figure, and the positive control OLSs depicted in this Figure, were later found to contain a deletion in the nucleotide sequence encoding the OLS proteins, which led to the production of truncated proteins. Accordingly, all candidate OLS enzymes in this library, and the positive controls, were also tested independently in a new library containing only full-length OLS sequences, described in Example 3.



FIG. 9 is a graph showing olivetolic acid (OA) production by S. cerevisiae strains expressing OLS candidate enzymes. Peak areas obtained via LC/MS quantification were normalized to an internal standard for OA. Normalized peak areas were further normalized to a positive control strain (t339582) contained on each plate. As explained in Example 2 and in Table 6, OLS candidate enzymes within the library depicted in this Figure, and the positive control OLSs depicted in this Figure, were later found to contain a deletion in the nucleotide sequence encoding the OLS proteins, which led to the production of truncated proteins. Accordingly, all candidate OLS enzymes in this library, and the positive controls, were also tested independently in a new library containing only full-length OLS sequences, described in Example 3.



FIG. 10 is a graph showing normalized OA versus olivetol production by S. cerevisiae strains expressing OLS candidate enzymes. Peak areas obtained via LC/MS quantification for olivetol and OA were normalized to an internal standard for olivetol or OA, respectively. The regression line shown in FIG. 10 represents a 1:1 ratio of olivetol and olivetolic acid. Normalized peak areas were further normalized to the positive control strain (t339582) contained on each plate. The strain t395094 demonstrated significantly increased olivetol production compared to the positive controls, while the strain t393974 showed enhanced production of OA over the positive control strains. The enhanced production of OA over olivetol by t393974 suggests that this enzyme possesses bifunctional PKS-PKC activity. As explained in Example 2 and in Table 6, OLS candidate enzymes within the library depicted in this Figure, and the positive control OLSs depicted in this Figure, were later found to contain a deletion in the nucleotide sequence encoding the OLS proteins, which led to the production of truncated proteins. Accordingly, all candidate OLS enzymes in this library, and the positive controls, were also tested independently in a new library containing only full-length OLS sequences, described in Example 3.
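The two-step normalization described for FIGS. 8-10 can be sketched as follows. Each analyte peak area is divided by its internal-standard peak area, and that value is divided by the same quantity for the plate's positive control. The relative OA and olivetol values are then compared against the 1:1 line. The sketch assumes that replicate positive-control wells are averaged, which the text does not specify. The function names and all peak areas are hypothetical.

```python
from statistics import mean
from typing import Sequence


def relative_to_control(
    sample_area: float,
    sample_is_area: float,
    control_areas: Sequence[float],
    control_is_areas: Sequence[float],
) -> float:
    """Normalize a peak area to its internal standard, then to the plate's positive control.

    The positive-control value is the mean of its internal-standard-normalized
    replicates on the same plate (an assumption about how replicates are combined).
    """
    sample_norm = sample_area / sample_is_area
    control_norm = mean(a / i for a, i in zip(control_areas, control_is_areas))
    return sample_norm / control_norm


def position_vs_unity_line(olivetol_rel: float, oa_rel: float) -> str:
    """Compare relative OA with relative olivetol against the 1:1 reference line."""
    if oa_rel > olivetol_rel:
        return "OA-favoring"
    if oa_rel < olivetol_rel:
        return "olivetol-favoring"
    return "on the 1:1 line"


# Hypothetical peak areas for one candidate strain and two positive-control wells.
olivetol_rel = relative_to_control(750.0, 1000.0, [1500.0, 1500.0], [1000.0, 1000.0])
oa_rel = relative_to_control(3000.0, 1000.0, [1500.0, 1500.0], [1000.0, 1000.0])
print(olivetol_rel, oa_rel, position_vs_unity_line(olivetol_rel, oa_rel))
# 0.5 2.0 OA-favoring
```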



FIGS. 11A-11H show a sequence alignment of olivetol synthases (OLSs). An alignment of t394911 (SEQ ID NO: 28), t393974 (SEQ ID NO: 6), t393720 (SEQ ID NO: 27), t394336 (SEQ ID NO: 8), t393991 (SEQ ID NO: 7), t395011 (SEQ ID NO: 15), t339568 (SEQ ID NO: 5), t339579 (SEQ ID NO: 30), t339582 (SEQ ID NO: 31), t394457 (SEQ ID NO: 10), t394521 (SEQ ID NO: 11), t394436 (SEQ ID NO: 26), t395094 (SEQ ID NO: 17), t394087 (SEQ ID NO: 1), t395023 (SEQ ID NO: 29), t395103 (SEQ ID NO: 18), t394687 (SEQ ID NO: 2), t393835 (SEQ ID NO: 19), t394037 (SEQ ID NO: 22), t394905 (SEQ ID NO: 13), t393563 (SEQ ID NO: 4), t394981 (SEQ ID NO: 14), t394790 (SEQ ID NO: 12), t394797 (SEQ ID NO: 16), t394091 (SEQ ID NO: 21), t394043 (SEQ ID NO: 24), t394404 (SEQ ID NO: 25), t393495 (SEQ ID NO: 3), t394547 (SEQ ID NO: 9), t394115 (SEQ ID NO: 20), and t394279 (SEQ ID NO: 23) is shown. The sequence alignment was conducted using Clustal Omega. See, e.g., Chojnacki et al., Nucleic Acids Res. 2017 Jul. 3; 45(W1):W550-W553.



FIGS. 12A-12B are graphs showing olivetol production (FIG. 12A) and olivetolic acid production (FIG. 12B) from S. cerevisiae strains expressing OLS candidate enzymes. Strains shown were determined to be hits from the primary screen of the library of OLS candidates screened in Example 3.



FIG. 13 is a graph showing olivetol production by S. cerevisiae strains expressing C. sativa OLS (CsOLS) point-mutant variants. Concentrations of olivetol in g/L were determined via LC/MS quantification. The strain t405417 (having a T335C point mutation relative to the CsOLS set forth in SEQ ID NO: 5) demonstrated the highest olivetol production. Error bars represent the standard deviation of 4 independent measurements.



FIGS. 14A-14B are graphs showing olivetol production from S. cerevisiae strains expressing single point mutation and multiple point mutation variants based on a Cymbidium hybrid cultivar OLS template (ChOLS) (FIG. 14A) and a Corchorus olitorius OLS (CoOLS) template (FIG. 14B). Strains shown were screened in a secondary screen as described in Example 5. Olivetol titers were normalized to the mean olivetol titer produced by the positive control strain t527346 (FIG. 14A), and t606797 (FIG. 14B).



FIG. 15 is a graph showing olivetol production by a prototrophic S. cerevisiae strain expressing candidate OLS enzymes. Concentrations of olivetol in g/L were determined via LC/MS quantification. Performance of OLS candidate enzymes exhibiting higher olivetol production than C. sativa OLS positive controls is shown. The strains t485662, t485672, and t496073 demonstrated comparable olivetol production to the CsOLS T335C point-mutant positive control.



FIG. 16 is a three-dimensional homology model showing residues within about 8 angstroms of any of the residues within the catalytic triad of the C. sativa OLS comprising SEQ ID NO: 5 and/or within about 8 angstroms of a docked substrate within the C. sativa OLS comprising SEQ ID NO: 5. Only residues at which an amino acid substitution resulted in production of at least 10 mg/L olivetol are shown with their electron clouds in light gray. The active site was defined to include a docked molecule of hexanoyl-CoA (OLS substrate) plus the catalytic triad. The top model was rotated 90° to produce the bottom model.



FIG. 17 is a three-dimensional homology model showing residues within about 12 angstroms of any of the residues within the catalytic triad of the C. sativa OLS comprising SEQ ID NO: 5 and/or within about 12 angstroms of a docked substrate within the C. sativa OLS comprising SEQ ID NO: 5. Only residues at which an amino acid substitution resulted in production of at least 10 mg/L olivetol are shown with their electron clouds. The active site was defined to include a docked molecule of hexanoyl-CoA (OLS substrate) plus the catalytic triad. The top model was rotated 90° to produce the bottom model.
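The distance-based selections in FIGS. 16 and 17 can be sketched as a cutoff search. A residue is kept if any of its atoms lies within the cutoff (about 8 or about 12 angstroms) of any reference atom, where the reference atoms stand in for the catalytic triad plus the docked hexanoyl-CoA. This Python sketch assumes atom coordinates have already been extracted from the homology model and uses atom-to-atom distances. The disclosure does not state which distance definition was used, and the coordinates below are toy values. The figures' further restriction to positions whose substitutions gave at least 10 mg/L olivetol would be an additional filter against experimental results and is not modeled here.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Coord = Tuple[float, float, float]


def residues_within(
    residue_atoms: Dict[int, List[Coord]],
    reference_atoms: Sequence[Coord],
    cutoff: float,
) -> Set[int]:
    """Residue numbers having any atom within `cutoff` angstroms of any reference atom."""
    return {
        resnum
        for resnum, atoms in residue_atoms.items()
        if any(math.dist(a, r) <= cutoff for a in atoms for r in reference_atoms)
    }


# Toy coordinates in angstroms. The reference atom stands in for the catalytic triad
# plus a docked hexanoyl-CoA.
residues = {1: [(0.0, 0.0, 0.0)], 2: [(10.0, 0.0, 0.0)], 3: [(18.0, 0.0, 0.0)]}
reference = [(7.0, 0.0, 0.0)]
print(sorted(residues_within(residues, reference, 8.0)))   # [1, 2]
print(sorted(residues_within(residues, reference, 12.0)))  # [1, 2, 3]
```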





DETAILED DESCRIPTION

This disclosure provides methods for production of cannabinoids and cannabinoid precursors from fatty acid substrates using genetically modified host cells. Methods include heterologous expression of enzymes including acyl activating enzymes (AAE) and polyketide synthase enzymes (PKS) such as olivetol synthase enzymes (OLS). The disclosure describes identification of AAE and OLS enzymes that can be functionally expressed in eukaryotic (e.g., S. cerevisiae) and prokaryotic (e.g., E. coli) host cells. As demonstrated in Example 1, novel AAE enzymes were identified that are capable of using hexanoate and butyrate as substrates to produce cannabinoid precursors. As demonstrated in Examples 2-3, novel OLS enzymes were identified that are capable of producing olivetol and olivetolic acid. Examples 4-6 further demonstrate enhanced production of olivetol and/or olivetolic acid by protein engineering of OLS enzymes. The novel enzymes described in this disclosure may be useful in increasing the efficiency and purity of cannabinoid production.


Definitions

While the following terms are believed to be well understood by one of ordinary skill in the art, the following definitions are set forth to facilitate explanation of the disclosed subject matter.


The term “a” or “an” refers to one or more of an entity, i.e., can identify a referent as plural. Thus, the terms “a” or “an,” “one or more” and “at least one” are used interchangeably in this application. In addition, reference to “an element” by the indefinite article “a” or “an” does not exclude the possibility that more than one of the elements is present, unless the context clearly requires that there is one and only one of the elements.


The terms “microorganism” or “microbe” should be taken broadly. These terms are used interchangeably and include, but are not limited to, the two prokaryotic domains, Bacteria and Archaea, as well as certain eukaryotic fungi and protists. In some embodiments, the disclosure may refer to the “microorganisms” or “microbes” of lists/tables and figures present in the disclosure. This characterization can refer to not only the identified taxonomic genera of the tables and figures, but also the identified taxonomic species, as well as the various novel and newly identified or designed strains of any organism in the tables or figures. The same characterization holds true for the recitation of these terms in other parts of the specification, such as in the Examples.


The term “prokaryotes” is recognized in the art and refers to cells that contain no nucleus or other cell organelles. The prokaryotes are generally classified in one of two domains, the Bacteria and the Archaea.


“Bacteria” or “eubacteria” refers to a domain of prokaryotic organisms. Bacteria include at least 11 distinct groups as follows: (1) Gram-positive (gram+) bacteria, of which there are two major subdivisions: (a) high G+C group (Actinomycetes, Mycobacteria, Micrococcus, others) and (b) low G+C group (Bacillus, Clostridia, Lactobacillus, Staphylococci, Streptococci, Mycoplasmas); (2) Proteobacteria, e.g., Purple photosynthetic+non-photosynthetic Gram-negative bacteria (includes most “common” Gram-negative bacteria); (3) Cyanobacteria, e.g., oxygenic phototrophs; (4) Spirochetes and related species; (5) Planctomyces; (6) Bacteroides, Flavobacteria; (7) Chlamydia; (8) Green sulfur bacteria; (9) Green non-sulfur bacteria (also anaerobic phototrophs); (10) Radioresistant micrococci and relatives; and (11) Thermotoga and Thermosipho thermophiles.


The term “Archaea” refers to a taxonomic classification of prokaryotic organisms with certain properties that make them distinct from Bacteria in physiology and phylogeny.


The term “Cannabis” refers to a genus in the family Cannabaceae. Cannabis is a dioecious plant. Glandular structures located on female flowers of Cannabis, called trichomes, accumulate relatively high amounts of a class of terpeno-phenolic compounds known as phytocannabinoids (described in further detail below). Cannabis has conventionally been cultivated for production of fibre and seed (commonly referred to as “hemp-type”), or for production of intoxicants (commonly referred to as “drug-type”). In drug-type Cannabis, the trichomes contain relatively high amounts of tetrahydrocannabinolic acid (THCA), which can convert to tetrahydrocannabinol (THC) via a decarboxylation reaction, for example upon combustion of dried Cannabis flowers, to provide an intoxicating effect. Drug-type Cannabis often contains other cannabinoids in lesser amounts. In contrast, hemp-type Cannabis contains relatively low concentrations of THCA, often less than 0.3% THC by dry weight when the potential conversion of THCA to THC is taken into account. Hemp-type Cannabis may contain non-THC and non-THCA cannabinoids, such as cannabidiolic acid (CBDA), cannabidiol (CBD), and other cannabinoids. Presently, there is a lack of consensus regarding the taxonomic organization of the species within the genus. Unless context dictates otherwise, the term “Cannabis” is intended to include all putative species within the genus, such as, without limitation, Cannabis sativa, Cannabis indica, and Cannabis ruderalis, without regard to whether the Cannabis is hemp-type or drug-type.
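Expressing THC content so that it accounts for THCA-to-THC conversion is usually done on a "total THC" basis. The sketch below assumes the convention used, for example, in US hemp testing (total THC = Δ9-THC + 0.877 × THCA), where 0.877 is the ratio of the molar masses of THC and THCA because decarboxylation releases CO2. The function names and the sample values are hypothetical illustrations, not part of the disclosure.

```python
# Sketch: total THC on a dry-weight basis, assuming complete decarboxylation
# of THCA (C22H30O4) to THC (C21H30O2) with loss of CO2.
MW_THC = 314.469   # g/mol, C21H30O2
MW_THCA = 358.478  # g/mol, C22H30O4

# Mass of THC obtainable per unit mass of THCA (about 0.877).
DECARB_FACTOR = MW_THC / MW_THCA


def total_thc_percent(thc_pct: float, thca_pct: float) -> float:
    """Return total THC (% dry weight) from measured THC and THCA (% dry weight)."""
    return thc_pct + DECARB_FACTOR * thca_pct


def below_threshold(thc_pct: float, thca_pct: float, limit_pct: float = 0.3) -> bool:
    """True if total THC is below the threshold (0.3% by default, as in the text)."""
    return total_thc_percent(thc_pct, thca_pct) < limit_pct


if __name__ == "__main__":
    # Hypothetical sample: 0.05% THC and 0.25% THCA by dry weight.
    print(round(total_thc_percent(0.05, 0.25), 3))  # 0.269
    print(below_threshold(0.05, 0.25))              # True
```

A sample with the same THC but 0.30% THCA would exceed 0.3% total THC, which is why THCA cannot be ignored when classifying a plant as hemp-type.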


The term “cyclase activity” in reference to a polyketide synthase (PKS) enzyme (e.g., an olivetol synthase (OLS) enzyme) or a polyketide cyclase (PKC) enzyme (e.g., an olivetolic acid cyclase (OAC) enzyme) refers to the activity of catalyzing the cyclization of an oxo fatty acyl-CoA (e.g., 3,5,7-trioxododecanoyl-CoA, 3,5,7-trioxodecanoyl-CoA) to the corresponding intramolecular cyclization product (e.g., olivetolic acid, divarinic acid). In some embodiments, the PKS catalyzes the C2-C7 aldol condensation of an acyl-CoA with three additional ketide moieties added thereto.
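The pairing of substrates and products in the definition above follows from simple carbon bookkeeping. A minimal sketch, assuming straight-chain starters (hexanoyl-CoA giving olivetolic acid, butanoyl-CoA giving divarinic acid), each extended by three malonyl-CoA-derived C2 units; the helper names and lookup tables are hypothetical and cover only these two examples.

```python
# Carbon bookkeeping for the cyclase examples: an acyl-CoA starter is extended
# by three C2 ketide units, and C2-C7 aldol cyclization of the resulting
# 3,5,7-trioxoacyl-CoA yields a 2,4-dihydroxy-6-alkylbenzoic acid.
STEMS = {10: "dec", 12: "dodec"}        # alkane stems for the two examples only
ALKYL = {3: "propyl", 5: "pentyl"}      # side-chain names for the two examples only
EXTENSION_UNITS = 3                     # ketide units added to the starter


def tetraketide_name(starter_carbons: int) -> str:
    """Name the linear 3,5,7-trioxoacyl-CoA formed from a straight-chain starter."""
    chain = starter_carbons + 2 * EXTENSION_UNITS
    return f"3,5,7-trioxo{STEMS[chain]}anoyl-CoA"


def cyclization_product(starter_carbons: int) -> str:
    """Name the C2-C7 aldol product. The starter's carbonyl carbon becomes C7
    (a ring carbon), so its remaining carbons form the alkyl side chain."""
    side_chain = starter_carbons - 1
    return f"2,4-dihydroxy-6-{ALKYL[side_chain]}benzoic acid"


if __name__ == "__main__":
    for starter, label in [(6, "hexanoyl-CoA"), (4, "butanoyl-CoA")]:
        print(label, "->", tetraketide_name(starter), "->", cyclization_product(starter))
```

2,4-Dihydroxy-6-pentylbenzoic acid is olivetolic acid and 2,4-dihydroxy-6-propylbenzoic acid is divarinic acid, matching the examples in the definition.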


A “cytosolic” or “soluble” enzyme refers to an enzyme that is predominantly localized (or predicted to be localized) in the cytosol of a host cell.


A “eukaryote” is any organism whose cells contain a nucleus and other organelles enclosed within membranes. Eukaryotes belong to the taxon Eukarya or Eukaryota. The defining feature that sets eukaryotic cells apart from prokaryotic cells (i.e., bacteria and archaea) is that they have membrane-bound organelles, especially the nucleus, which contains the genetic material, and is enclosed by the nuclear envelope.


The term “host cell” refers to a cell that can be used to express a polynucleotide, such as a polynucleotide that encodes an enzyme used in biosynthesis of cannabinoids or cannabinoid precursors. The terms “genetically modified host cell,” “recombinant host cell,” and “recombinant strain” are used interchangeably and refer to host cells that have been genetically modified by, e.g., cloning and transformation methods, or by other methods known in the art (e.g., selective editing methods, such as CRISPR). Thus, the terms include a host cell (e.g., bacterial cell, yeast cell, fungal cell, insect cell, plant cell, mammalian cell, human cell, etc.) that has been genetically altered, modified, or engineered, so that it exhibits an altered, modified, or different genotype and/or phenotype, as compared to the naturally-occurring cell from which it was derived. It is understood that in some embodiments, the terms refer not only to the particular recombinant host cell in question, but also to the progeny or potential progeny of such a host cell.


The term “control host cell,” or the term “control” when used in relation to a host cell, refers to an appropriate comparator host cell for determining the effect of a genetic modification or experimental treatment. In some embodiments, the control host cell is a wild type cell. In other embodiments, a control host cell is genetically identical to the genetically modified host cell, except for the genetic modification(s) differentiating the genetically modified or experimental treatment host cell. In some embodiments, the control host cell has been genetically modified to express a wild type or otherwise known variant of an enzyme being tested for activity in other test host cells.


The term “heterologous” with respect to a polynucleotide, such as a polynucleotide comprising a gene, is used interchangeably with the term “exogenous” and the term “recombinant” and refers to a polynucleotide that has been artificially supplied to a biological system, a polynucleotide that has been modified within a biological system, or a polynucleotide whose expression or regulation has been manipulated within a biological system. A heterologous polynucleotide that is introduced into or expressed in a host cell may be a polynucleotide that comes from a different organism or species than the host cell, or may be a synthetic polynucleotide, or may be a polynucleotide that is also endogenously expressed in the same organism or species as the host cell. For example, a polynucleotide that is endogenously expressed in a host cell may be considered heterologous when it is situated non-naturally in the host cell; expressed recombinantly in the host cell, either stably or transiently; modified within the host cell; selectively edited within the host cell; expressed in a copy number that differs from the naturally occurring copy number within the host cell; or expressed in a non-natural way within the host cell, such as by manipulating regulatory regions that control expression of the polynucleotide. In some embodiments, a heterologous polynucleotide is a polynucleotide that is endogenously expressed in a host cell but whose expression is driven by a promoter that does not naturally regulate expression of the polynucleotide. In other embodiments, a heterologous polynucleotide is a polynucleotide that is endogenously expressed in a host cell and whose expression is driven by a promoter that does naturally regulate expression of the polynucleotide, but the promoter or another regulatory region is modified. In some embodiments, the promoter is recombinantly activated or repressed. 
For example, gene-editing based techniques may be used to regulate expression of a polynucleotide, including an endogenous polynucleotide, from a promoter, including an endogenous promoter. See, e.g., Chavez et al., Nat Methods. 2016 July; 13(7): 563-567. A heterologous polynucleotide may comprise a wild-type sequence or a mutant sequence as compared with a reference polynucleotide sequence.


The term “at least a portion” or “at least a fragment” of a nucleic acid or polypeptide means a portion having the minimal size characteristics of such sequences, or any larger fragment of the full length molecule, up to and including the full length molecule. A fragment of a polynucleotide of the disclosure may encode a biologically active portion of an enzyme, such as a catalytic domain. A biologically active portion of a genetic regulatory element may comprise a portion or fragment of a full length genetic regulatory element and have the same type of activity as the full length genetic regulatory element, although the level of activity of the biologically active portion of the genetic regulatory element may vary compared to the level of activity of the full length genetic regulatory element.


A coding sequence and a regulatory sequence are said to be “operably joined” or “operably linked” when the coding sequence and the regulatory sequence are covalently linked and the expression or transcription of the coding sequence is under the influence or control of the regulatory sequence. If the coding sequence is to be translated into a functional protein, the coding sequence and the regulatory sequence are said to be operably joined if induction of a promoter in the 5′ regulatory sequence results in transcription of the coding sequence and if the nature of the linkage between the coding sequence and the regulatory sequence does not (1) result in the introduction of a frame-shift mutation, (2) interfere with the ability of the promoter region to direct the transcription of the coding sequence, or (3) interfere with the ability of the corresponding RNA transcript to be translated into a protein.
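Condition (1) above, that the linkage must not introduce a frame-shift, can be checked mechanically on a joined sequence. The sketch below is a simplified, hypothetical check that assumes an ATG start codon and the standard genetic code; it does not model promoter function or translation efficiency (conditions (2) and (3)).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def split_codons(cds: str) -> list[str]:
    """Split a coding sequence into consecutive triplets from the first base."""
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def frame_preserved(cds: str) -> bool:
    """True if cds begins with ATG, its length is a multiple of 3, and a stop
    codon occurs only as the final codon (no frame shift, no premature stop)."""
    cds = cds.upper()
    if len(cds) < 6 or len(cds) % 3 != 0 or not cds.startswith("ATG"):
        return False
    codons = split_codons(cds)
    return codons[-1] in STOP_CODONS and not any(c in STOP_CODONS for c in codons[:-1])


if __name__ == "__main__":
    upstream, downstream = "ATGGCT", "GGCTAA"  # hypothetical sequence fragments
    for linker in ("", "GGT", "A"):
        print(repr(linker), frame_preserved(upstream + linker + downstream))
    # prints: '' True / 'GGT' True / 'A' False
```

A 3-nt linker keeps the downstream codons in frame, whereas a single inserted base shifts every downstream codon.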


The term “volumetric productivity” or “production rate” refers to the amount of product formed per volume of medium per unit of time. Volumetric productivity can be reported in gram per liter per hour (g/L/h).
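As a minimal sketch of this definition (function names are hypothetical and the numbers are invented for illustration), volumetric productivity can be computed as an average over a whole run or, assuming constant broth volume, between two sampling points from measured titers:

```python
def volumetric_productivity(product_g: float, volume_l: float, time_h: float) -> float:
    """Average volumetric productivity (g/L/h): product formed per litre per hour."""
    if volume_l <= 0 or time_h <= 0:
        raise ValueError("volume and time must be positive")
    return product_g / (volume_l * time_h)


def interval_productivity(titer_start_g_l: float, titer_end_g_l: float,
                          t_start_h: float, t_end_h: float) -> float:
    """Volumetric productivity (g/L/h) between two samples, assuming constant volume."""
    if t_end_h <= t_start_h:
        raise ValueError("end time must be after start time")
    return (titer_end_g_l - titer_start_g_l) / (t_end_h - t_start_h)


if __name__ == "__main__":
    # Hypothetical run: 12 g of product formed in 2 L of medium over 48 h.
    print(volumetric_productivity(12.0, 2.0, 48.0))       # 0.125
    # Hypothetical titers: 1.0 g/L at 24 h and 4.0 g/L at 48 h.
    print(interval_productivity(1.0, 4.0, 24.0, 48.0))    # 0.125
```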


The term “specific productivity” of a product refers to the rate of formation of the product normalized per unit volume, mass, or biomass and has the physical dimension of a quantity of substance per unit time per unit mass or volume [M·T⁻¹·M⁻¹ or M·T⁻¹·L⁻³, where M is mass or moles, T is time, and L is length].


The term “biomass specific productivity” refers to the specific productivity in gram product per gram of cell dry weight (CDW) per hour (g/g CDW/h) or in mmol of product per gram of cell dry weight (CDW) per hour (mmol/g CDW/h). Using the relation of CDW to OD600 for the given microorganism, specific productivity can also be expressed as gram product per liter culture medium per optical density of the culture broth at 600 nm (OD) per hour (g/L/h/OD). Also, if the elemental composition of the biomass is known, biomass specific productivity can be expressed in mmol of product per C-mole (carbon mole) of biomass per hour (mmol/C-mol/h).
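The unit conversions described above can be sketched as follows. All names are hypothetical; the OD600-to-CDW factor is strain-specific and must be measured (0.4 g CDW/L per OD unit below is only a placeholder), and the 24.6 g per C-mol figure assumes an approximate biomass composition of CH1.8O0.5N0.2.

```python
def cdw_from_od600(od600: float, gcdw_per_l_per_od: float) -> float:
    """Estimate cell dry weight (g/L) from OD600 using a strain-specific factor."""
    return od600 * gcdw_per_l_per_od


def specific_productivity_g_per_gcdw_h(vol_prod_g_l_h: float, cdw_g_l: float) -> float:
    """Biomass specific productivity (g product / g CDW / h)."""
    return vol_prod_g_l_h / cdw_g_l


def specific_productivity_per_od(vol_prod_g_l_h: float, od600: float) -> float:
    """Specific productivity expressed per OD unit (g/L/h/OD)."""
    return vol_prod_g_l_h / od600


def to_mmol_per_gcdw_h(spec_g: float, product_mw_g_mol: float) -> float:
    """Convert g/g CDW/h to mmol/g CDW/h using the product molar mass."""
    return spec_g / product_mw_g_mol * 1000.0


def to_mmol_per_cmol_h(spec_g: float, product_mw_g_mol: float,
                       biomass_g_per_cmol: float) -> float:
    """Convert to mmol product / C-mol biomass / h, given grams of biomass per C-mol."""
    return to_mmol_per_gcdw_h(spec_g, product_mw_g_mol) * biomass_g_per_cmol


if __name__ == "__main__":
    # Hypothetical values: 0.125 g/L/h at OD600 = 12.5, an assumed 0.4 g CDW/L per OD,
    # and a product molar mass of 224.25 g/mol.
    cdw = cdw_from_od600(12.5, 0.4)                              # about 5.0 g CDW/L
    spec = specific_productivity_g_per_gcdw_h(0.125, cdw)
    print(round(spec, 4))                                        # 0.025
    print(round(specific_productivity_per_od(0.125, 12.5), 4))   # 0.01
    print(round(to_mmol_per_cmol_h(spec, 224.25, 24.6), 3))      # 2.742
```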


The term “yield” refers to the amount of product obtained per unit weight of a certain substrate and may be expressed as g product per g substrate (g/g) or moles of product per mole of substrate (mol/mol). Yield may also be expressed as a percentage of the theoretical yield. “Theoretical yield” is defined as the maximum amount of product that can be generated from a given amount of substrate, as dictated by the stoichiometry of the metabolic pathway used to make the product, and may be expressed as g product per g substrate (g/g) or moles of product per mole of substrate (mol/mol).
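A worked sketch of these yield definitions, using olivetolic acid formed from hexanoic acid as an illustrative case. It assumes that hexanoic acid supplies only the hexanoyl-CoA starter (1 mol olivetolic acid per mol hexanoic acid at most), with malonyl-CoA derived from another carbon source; molar masses are approximate, and the 0.97 g result is invented for illustration.

```python
MW_HEXANOIC_ACID = 116.16    # g/mol, C6H12O2 (approximate)
MW_OLIVETOLIC_ACID = 224.25  # g/mol, C12H16O4 (approximate)

# Assumed stoichiometry: 1 hexanoyl-CoA + 3 malonyl-CoA -> 1 olivetolic acid.
THEORETICAL_MOL_PER_MOL = 1.0


def yield_g_per_g(product_g: float, substrate_g: float) -> float:
    """Mass yield (g product / g substrate consumed)."""
    return product_g / substrate_g


def yield_mol_per_mol(product_g: float, product_mw: float,
                      substrate_g: float, substrate_mw: float) -> float:
    """Molar yield (mol product / mol substrate consumed)."""
    return (product_g / product_mw) / (substrate_g / substrate_mw)


def percent_of_theoretical(observed: float, theoretical: float) -> float:
    """Observed yield as a percentage of the theoretical yield (same units)."""
    return 100.0 * observed / theoretical


def theoretical_g_per_g() -> float:
    """Theoretical mass yield implied by the assumed molar stoichiometry."""
    return THEORETICAL_MOL_PER_MOL * MW_OLIVETOLIC_ACID / MW_HEXANOIC_ACID


if __name__ == "__main__":
    # Hypothetical result: 0.97 g olivetolic acid from 1.0 g hexanoic acid consumed.
    y_gg = yield_g_per_g(0.97, 1.0)
    y_mm = yield_mol_per_mol(0.97, MW_OLIVETOLIC_ACID, 1.0, MW_HEXANOIC_ACID)
    print(round(y_gg, 2), round(y_mm, 3))                                     # 0.97 0.502
    print(round(percent_of_theoretical(y_mm, THEORETICAL_MOL_PER_MOL), 1))    # 50.2
    print(round(theoretical_g_per_g(), 3))                                    # 1.931
```

The example shows why g/g and mol/mol yields can differ sharply: a mass yield near 1 g/g here corresponds to only about half of the theoretical molar yield.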


The term “titer” refers to the strength of a solution or the concentration of a substance in solution. For example, the titer of a product of interest (e.g., small molecule, peptide, synthetic compound, fuel, alcohol, etc.) in a fermentation broth is described as g of product of interest in solution per liter of fermentation broth or cell-free broth (g/L) or as g of product of interest in solution per kg of fermentation broth or cell-free broth (g/kg).


The term “total titer” refers to the sum of all products of interest produced in a process, including but not limited to the products of interest in solution, the products of interest in gas phase if applicable, and any products of interest removed from the process and recovered relative to the initial volume in the process or the operating volume in the process. For example, the total titer of products of interest (e.g., small molecule, peptide, synthetic compound, fuel, alcohol, etc.) in a fermentation broth is described as g of products of interest in solution per liter of fermentation broth or cell-free broth (g/L) or as g of products of interest in solution per kg of fermentation broth or cell-free broth (g/kg).
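The difference between titer and total titer can be sketched numerically. The function names and the fed-batch numbers below are hypothetical; the sketch assumes product recovered during the run (e.g., by in situ extraction) is known by mass and that the reference volume for total titer is the initial volume.

```python
def titer_g_per_l(product_in_solution_g: float, broth_volume_l: float) -> float:
    """Titer (g/L): product in solution per litre of broth at sampling."""
    return product_in_solution_g / broth_volume_l


def titer_g_per_kg(product_in_solution_g: float, broth_mass_kg: float) -> float:
    """Titer on a mass basis (g/kg of broth)."""
    return product_in_solution_g / broth_mass_kg


def total_titer_g_per_l(in_solution_g: float, gas_phase_g: float,
                        removed_recovered_g: float, reference_volume_l: float) -> float:
    """Total titer (g/L): product in solution, in the gas phase, and removed and
    recovered during the process, relative to the initial or operating volume."""
    return (in_solution_g + gas_phase_g + removed_recovered_g) / reference_volume_l


if __name__ == "__main__":
    # Hypothetical fed-batch run: 2.0 L initial volume, 2.5 L final broth,
    # 8 g product in solution, 4 g recovered by in situ extraction, none in gas phase.
    print(round(titer_g_per_l(8.0, 2.5), 2))            # 3.2
    print(total_titer_g_per_l(8.0, 0.0, 4.0, 2.0))      # 6.0
```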


The term “amino acid” refers to organic compounds that comprise an amino group, —NH2, and a carboxyl group, —COOH. The term “amino acid” includes both naturally occurring and unnatural amino acids. Nomenclature for the twenty common amino acids is as follows: alanine (ala or A); arginine (arg or R); asparagine (asn or N); aspartic acid (asp or D); cysteine (cys or C); glutamine (gln or Q); glutamic acid (glu or E); glycine (gly or G); histidine (his or H); isoleucine (ile or I); leucine (leu or L); lysine (lys or K); methionine (met or M); phenylalanine (phe or F); proline (pro or P); serine (ser or S); threonine (thr or T); tryptophan (trp or W); tyrosine (tyr or Y); and valine (val or V). Non-limiting examples of unnatural amino acids include homo-amino acids, proline and pyruvic acid derivatives, 3-substituted alanine derivatives, glycine derivatives, ring-substituted phenylalanine derivatives, ring-substituted tyrosine derivatives, linear core amino acids, amino acids with protecting groups including Fmoc, Boc, and Cbz, β-amino acids (β3 and β2), and N-methyl amino acids.
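The nomenclature table above maps naturally onto a lookup dictionary. A minimal sketch with hypothetical helper names: converting three-letter codes to one-letter codes and applying a point substitution. Note that `position` here is a 1-based index into the given sequence; identifying the residue that corresponds to a position in a different reference sequence would require a sequence alignment, which this sketch does not perform.

```python
THREE_TO_ONE = {
    "ala": "A", "arg": "R", "asn": "N", "asp": "D", "cys": "C",
    "gln": "Q", "glu": "E", "gly": "G", "his": "H", "ile": "I",
    "leu": "L", "lys": "K", "met": "M", "phe": "F", "pro": "P",
    "ser": "S", "thr": "T", "trp": "W", "tyr": "Y", "val": "V",
}
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_one_letter(residues: list[str]) -> str:
    """Convert three-letter codes (any capitalization) to a one-letter string."""
    return "".join(THREE_TO_ONE[r.lower()] for r in residues)


def substitute(seq: str, position: int, new_aa: str) -> str:
    """Return seq with the residue at 1-based `position` replaced by `new_aa`."""
    if new_aa not in ONE_TO_THREE:
        raise ValueError(f"unknown amino acid code: {new_aa}")
    if not 1 <= position <= len(seq):
        raise IndexError("position out of range")
    return seq[:position - 1] + new_aa + seq[position:]


if __name__ == "__main__":
    print(to_one_letter(["Met", "Ala", "Gln"]))  # MAQ
    print(substitute("MAQ", 2, "N"))             # MNQ
```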


The term “aliphatic” refers to alkyl, alkenyl, alkynyl, and carbocyclic groups. Likewise, the term “heteroaliphatic” refers to heteroalkyl, heteroalkenyl, heteroalkynyl, and heterocyclic groups.


The term “alkyl” refers to a radical of, or a substituent that is, a straight-chain or branched saturated hydrocarbon group having from 1 to 20 carbon atoms (“C1-20 alkyl”). In certain embodiments, the term “alkyl” refers to a radical of, or a substituent that is, a straight-chain or branched saturated hydrocarbon group having from 1 to 10 carbon atoms (“C1-10 alkyl”). In some embodiments, an alkyl group has 1 to 9 carbon atoms (“C1-9 alkyl”). In some embodiments, an alkyl group has 1 to 8 carbon atoms (“C1-8 alkyl”). In some embodiments, an alkyl group has 1 to 7 carbon atoms (“C1-7 alkyl”). In some embodiments, an alkyl group has 1 to 6 carbon atoms (“C1-6 alkyl”). In some embodiments, an alkyl group has 2 to 6 carbon atoms (“C2-6 alkyl”). In some embodiments, an alkyl group has 3 to 5 carbon atoms (“C3-5 alkyl”). In some embodiments, an alkyl group has 5 carbon atoms (“C5 alkyl”). In some embodiments, the alkyl group has 3 carbon atoms (“C3 alkyl”). In some embodiments, an alkyl group has 1 to 5 carbon atoms (“C1-5 alkyl”). In some embodiments, an alkyl group has 1 to 4 carbon atoms (“C1-4 alkyl”). In some embodiments, an alkyl group has 1 to 3 carbon atoms (“C1-3 alkyl”). In some embodiments, an alkyl group has 1 to 2 carbon atoms (“C1-2 alkyl”). In some embodiments, an alkyl group has 1 carbon atom (“C1 alkyl”).


Examples of C1-6 alkyl groups include methyl (C1), ethyl (C2), propyl (C3) (e.g., n-propyl, isopropyl), butyl (C4) (e.g., n-butyl, tert-butyl, sec-butyl, iso-butyl), pentyl (C5) (e.g., n-pentyl, 3-pentanyl, amyl, neopentyl, 3-methyl-2-butanyl, tertiary amyl), and hexyl (C6) (e.g., n-hexyl). Additional examples of alkyl groups include n-heptyl (C7), n-octyl (C8), and the like. Unless otherwise specified, each instance of an alkyl group is independently unsubstituted (an “unsubstituted alkyl”) or substituted (a “substituted alkyl”) with one or more substituents (e.g., halogen, such as F). In certain embodiments, the alkyl group is an unsubstituted C1-10 alkyl (such as unsubstituted C1-6 alkyl, e.g., —CH3 (Me), unsubstituted ethyl (Et), unsubstituted propyl (Pr, e.g., unsubstituted n-propyl (n-Pr), unsubstituted isopropyl (i-Pr)), unsubstituted butyl (Bu, e.g., unsubstituted n-butyl (n-Bu), unsubstituted tert-butyl (tert-Bu or t-Bu), unsubstituted sec-butyl (sec-Bu), unsubstituted isobutyl (i-Bu))). In certain embodiments, the alkyl group is a substituted C1-10 alkyl (such as substituted C1-6 alkyl, e.g., —CF3, benzyl).


The term “acyl” refers to a group having the general formula —C(═O)RX1, —C(═O)ORX1, —C(═O)—O—C(═O)RX1, —C(═O)SRX1, —C(═O)N(RX1)2, —C(═S)RX1, —C(═S)N(RX1)2, —C(═S)S(RX1), —C(═NRX1)RX1, —C(═NRX1)ORX1, —C(═NRX1)SRX1, or —C(═NRX1)N(RX1)2, wherein RX1 is hydrogen; halogen; substituted or unsubstituted hydroxyl; substituted or unsubstituted thiol; substituted or unsubstituted amino; substituted or unsubstituted acyl; cyclic or acyclic, substituted or unsubstituted, branched or unbranched aliphatic; cyclic or acyclic, substituted or unsubstituted, branched or unbranched heteroaliphatic; cyclic or acyclic, substituted or unsubstituted, branched or unbranched alkyl; cyclic or acyclic, substituted or unsubstituted, branched or unbranched alkenyl; substituted or unsubstituted alkynyl; substituted or unsubstituted aryl; substituted or unsubstituted heteroaryl; aliphaticoxy, heteroaliphaticoxy, alkyloxy, heteroalkyloxy, aryloxy, heteroaryloxy, aliphaticthioxy, heteroaliphaticthioxy, alkylthioxy, heteroalkylthioxy, arylthioxy, heteroarylthioxy, mono- or di-aliphaticamino, mono- or di-heteroaliphaticamino, mono- or di-alkylamino, mono- or di-heteroalkylamino, mono- or di-arylamino, or mono- or di-heteroarylamino; or two RX1 groups taken together form a 5- to 6-membered heterocyclic ring. Exemplary acyl groups include aldehydes (—CHO), carboxylic acids (—CO2H), ketones, acyl halides, esters, amides, imines, carbonates, carbamates, and ureas. 
Acyl substituents include, but are not limited to, any of the substituents described in this application that result in the formation of a stable moiety (e.g., aliphatic, alkyl, alkenyl, alkynyl, heteroaliphatic, heterocyclic, aryl, heteroaryl, acyl, oxo, imino, thiooxo, cyano, isocyano, amino, azido, nitro, hydroxyl, thiol, halo, aliphaticamino, heteroaliphaticamino, alkylamino, heteroalkylamino, arylamino, heteroarylamino, alkylaryl, arylalkyl, aliphaticoxy, heteroaliphaticoxy, alkyloxy, heteroalkyloxy, aryloxy, heteroaryloxy, aliphaticthioxy, heteroaliphaticthioxy, alkylthioxy, heteroalkylthioxy, arylthioxy, heteroarylthioxy, acyloxy, and the like, each of which may or may not be further substituted).


“Alkenyl” refers to a radical of, or a substituent that is, a straight-chain or branched hydrocarbon group having from 2 to 20 carbon atoms, one or more carbon-carbon double bonds, and no triple bonds (“C2-20 alkenyl”). In some embodiments, an alkenyl group has 2 to 10 carbon atoms (“C2-10 alkenyl”). In some embodiments, an alkenyl group has 2 to 9 carbon atoms (“C2-9 alkenyl”). In some embodiments, an alkenyl group has 2 to 8 carbon atoms (“C2-8 alkenyl”). In some embodiments, an alkenyl group has 2 to 7 carbon atoms (“C2-7 alkenyl”). In some embodiments, an alkenyl group has 2 to 6 carbon atoms (“C2-6 alkenyl”). In some embodiments, an alkenyl group has 2 to 5 carbon atoms (“C2-5 alkenyl”). In some embodiments, an alkenyl group has 2 to 4 carbon atoms (“C2-4 alkenyl”). In some embodiments, an alkenyl group has 2 to 3 carbon atoms (“C2-3 alkenyl”). In some embodiments, an alkenyl group has 2 carbon atoms (“C2 alkenyl”). The one or more carbon-carbon double bonds can be internal (such as in 2-butenyl) or terminal (such as in 1-butenyl). Examples of C2-4 alkenyl groups include ethenyl (C2), 1-propenyl (C3), 2-propenyl (C3), 1-butenyl (C4), 2-butenyl (C4), butadienyl (C4), and the like. Examples of C2-6 alkenyl groups include the aforementioned C2-4 alkenyl groups as well as pentenyl (C5), pentadienyl (C5), hexenyl (C6), and the like. Additional examples of alkenyl include heptenyl (C7), octenyl (C8), octatrienyl (C8), and the like. Unless otherwise specified, each instance of an alkenyl group is independently optionally substituted, i.e., unsubstituted (an “unsubstituted alkenyl”) or substituted (a “substituted alkenyl”) with one or more substituents. In certain embodiments, the alkenyl group is unsubstituted C2-10 alkenyl. In certain embodiments, the alkenyl group is substituted C2-10 alkenyl.


“Alkynyl” refers to a radical of, or a substituent that is, a straight-chain or branched hydrocarbon group having from 2 to 20 carbon atoms, one or more carbon-carbon triple bonds, and optionally one or more double bonds (“C2-20 alkynyl”). In some embodiments, an alkynyl group has 2 to 10 carbon atoms (“C2-10 alkynyl”). In some embodiments, an alkynyl group has 2 to 9 carbon atoms (“C2-9 alkynyl”). In some embodiments, an alkynyl group has 2 to 8 carbon atoms (“C2-8 alkynyl”). In some embodiments, an alkynyl group has 2 to 7 carbon atoms (“C2-7 alkynyl”). In some embodiments, an alkynyl group has 2 to 6 carbon atoms (“C2-6 alkynyl”). In some embodiments, an alkynyl group has 2 to 5 carbon atoms (“C2-5 alkynyl”). In some embodiments, an alkynyl group has 2 to 4 carbon atoms (“C2-4 alkynyl”). In some embodiments, an alkynyl group has 2 to 3 carbon atoms (“C2-3 alkynyl”). In some embodiments, an alkynyl group has 2 carbon atoms (“C2 alkynyl”). The one or more carbon-carbon triple bonds can be internal (such as in 2-butynyl) or terminal (such as in 1-butynyl). Examples of C2-4 alkynyl groups include, without limitation, ethynyl (C2), 1-propynyl (C3), 2-propynyl (C3), 1-butynyl (C4), 2-butynyl (C4), and the like. Examples of C2-6 alkynyl groups include the aforementioned C2-4 alkynyl groups as well as pentynyl (C5), hexynyl (C6), and the like. Additional examples of alkynyl include heptynyl (C7), octynyl (C8), and the like. Unless otherwise specified, each instance of an alkynyl group is independently optionally substituted, i.e., unsubstituted (an “unsubstituted alkynyl”) or substituted (a “substituted alkynyl”) with one or more substituents. In certain embodiments, the alkynyl group is unsubstituted C2-10 alkynyl. In certain embodiments, the alkynyl group is substituted C2-10 alkynyl.
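The alkyl, alkenyl, and alkynyl definitions differ only in the kinds of carbon-carbon bonds allowed, which fixes the hydrogen count of the unsubstituted radical. A minimal bookkeeping sketch with hypothetical function names: it starts from the saturated acyclic CnH(2n+1) radical and removes 2 H per double bond or ring and 4 H per triple bond. It checks formulas only and does not validate that a structure is chemically reasonable.

```python
def radical_formula(carbons: int, double_bonds: int = 0,
                    triple_bonds: int = 0, rings: int = 0) -> str:
    """Formula of an unsubstituted hydrocarbon radical with one open valence."""
    if carbons < 1:
        raise ValueError("need at least one carbon")
    if carbons < 2 and (double_bonds or triple_bonds):
        raise ValueError("a carbon-carbon multiple bond needs at least two carbons")
    # Saturated acyclic radical CnH(2n+1), minus 2 H per double bond or ring
    # and minus 4 H per triple bond.
    hydrogens = 2 * carbons + 1 - 2 * (double_bonds + rings) - 4 * triple_bonds
    if hydrogens < 0:
        raise ValueError("too many unsaturations for this carbon count")
    c_part = "C" if carbons == 1 else f"C{carbons}"
    h_part = "" if hydrogens == 0 else ("H" if hydrogens == 1 else f"H{hydrogens}")
    return c_part + h_part


def classify_acyclic(double_bonds: int, triple_bonds: int) -> str:
    """Apply the alkyl/alkenyl/alkynyl definitions to an acyclic radical."""
    if triple_bonds > 0:
        return "alkynyl"   # one or more triple bonds; double bonds optional
    if double_bonds > 0:
        return "alkenyl"   # one or more double bonds and no triple bonds
    return "alkyl"         # saturated


if __name__ == "__main__":
    examples = [("methyl", 1, 0, 0), ("n-pentyl", 5, 0, 0), ("ethenyl", 2, 1, 0),
                ("butadienyl", 4, 2, 0), ("ethynyl", 2, 0, 1)]
    for name, c, db, tb in examples:
        print(name, radical_formula(c, db, tb), classify_acyclic(db, tb))
    print("cyclohexyl", radical_formula(6, rings=1))  # C6H11
```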


“Carbocyclyl” or “carbocyclic” refers to a radical of a non-aromatic cyclic hydrocarbon group having from 3 to 10 ring carbon atoms (“C3-10 carbocyclyl”) and zero heteroatoms in the non-aromatic ring system. In some embodiments, a carbocyclyl group has 3 to 8 ring carbon atoms (“C3-8 carbocyclyl”). In some embodiments, a carbocyclyl group has 3 to 6 ring carbon atoms (“C3-6 carbocyclyl”). In some embodiments, a carbocyclyl group has 5 to 10 ring carbon atoms (“C5-10 carbocyclyl”). Exemplary C3-6 carbocyclyl groups include, without limitation, cyclopropyl (C3), cyclopropenyl (C3), cyclobutyl (C4), cyclobutenyl (C4), cyclopentyl (C5), cyclopentenyl (C5), cyclohexyl (C6), cyclohexenyl (C6), cyclohexadienyl (C6), and the like. Exemplary C3-8 carbocyclyl groups include, without limitation, the aforementioned C3-6 carbocyclyl groups as well as cycloheptyl (C7), cycloheptenyl (C7), cycloheptadienyl (C7), cycloheptatrienyl (C7), cyclooctyl (C8), cyclooctenyl (C8), bicyclo[2.2.1]heptanyl (C7), bicyclo[2.2.2]octanyl (C8), and the like. Exemplary C3-10 carbocyclyl groups include, without limitation, the aforementioned C3-8 carbocyclyl groups as well as cyclononyl (C9), cyclononenyl (C9), cyclodecyl (C10), cyclodecenyl (C10), octahydro-1H-indenyl (C9), decahydronaphthalenyl (C10), spiro[4.5]decanyl (C10), and the like. As the foregoing examples illustrate, in certain embodiments, the carbocyclyl group is either monocyclic (“monocyclic carbocyclyl”) or contains a fused, bridged, or spiro ring system such as a bicyclic system (“bicyclic carbocyclyl”) and can be saturated or partially unsaturated. 
“Carbocyclyl” also includes ring systems wherein the carbocyclic ring, as defined above, is fused with one or more aryl or heteroaryl groups wherein the point of attachment is on the carbocyclic ring, and in such instances, the number of carbons continues to designate the number of carbons in the carbocyclic ring system. Unless otherwise specified, each instance of a carbocyclyl group is independently optionally substituted, i.e., unsubstituted (an “unsubstituted carbocyclyl”) or substituted (a “substituted carbocyclyl”) with one or more substituents. In certain embodiments, the carbocyclyl group is unsubstituted C3-10 carbocyclyl. In certain embodiments, the carbocyclyl group is a substituted C3-10 carbocyclyl.


In some embodiments, “carbocyclyl” is a monocyclic, saturated carbocyclyl group having from 3 to 10 ring carbon atoms (“C3-10 cycloalkyl”). In some embodiments, a cycloalkyl group has 3 to 8 ring carbon atoms (“C3-8 cycloalkyl”). In some embodiments, a cycloalkyl group has 3 to 6 ring carbon atoms (“C3-6 cycloalkyl”). In some embodiments, a cycloalkyl group has 5 to 6 ring carbon atoms (“C5-6 cycloalkyl”). In some embodiments, a cycloalkyl group has 5 to 10 ring carbon atoms (“C5-10 cycloalkyl”). Examples of C5-6 cycloalkyl groups include cyclopentyl (C5) and cyclohexyl (C6). Examples of C3-6 cycloalkyl groups include the aforementioned C5-6 cycloalkyl groups as well as cyclopropyl (C3) and cyclobutyl (C4). Examples of C3-8 cycloalkyl groups include the aforementioned C3-6 cycloalkyl groups as well as cycloheptyl (C7) and cyclooctyl (C8). Unless otherwise specified, each instance of a cycloalkyl group is independently unsubstituted (an “unsubstituted cycloalkyl”) or substituted (a “substituted cycloalkyl”) with one or more substituents. In certain embodiments, the cycloalkyl group is unsubstituted C3-10 cycloalkyl. In certain embodiments, the cycloalkyl group is substituted C3-10 cycloalkyl.


“Aryl” refers to a radical of a monocyclic or polycyclic (e.g., bicyclic or tricyclic) 4n+2 aromatic ring system (e.g., having 6, 10, or 14 pi electrons shared in a cyclic array) having 6-14 ring carbon atoms and zero heteroatoms provided in the aromatic ring system (“C6-14 aryl”). In some embodiments, an aryl group has six ring carbon atoms (“C6 aryl”; e.g., phenyl). In some embodiments, an aryl group has ten ring carbon atoms (“C10 aryl”; e.g., naphthyl such as 1-naphthyl and 2-naphthyl). In some embodiments, an aryl group has fourteen ring carbon atoms (“C14 aryl”; e.g., anthracyl). “Aryl” also includes ring systems wherein the aryl ring, as defined above, is fused with one or more carbocyclyl or heterocyclyl groups wherein the radical or point of attachment is on the aryl ring, and in such instances, the number of carbon atoms continues to designate the number of carbon atoms in the aryl ring system. Unless otherwise specified, each instance of an aryl group is independently optionally substituted, i.e., unsubstituted (an “unsubstituted aryl”) or substituted (a “substituted aryl”) with one or more substituents. In certain embodiments, the aryl group is unsubstituted C6-14 aryl. In certain embodiments, the aryl group is substituted C6-14 aryl.
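The “4n+2” electron count in the aryl definition can be checked with a one-line test. A minimal sketch with hypothetical names; it assumes that in the all-carbon aryl examples each sp2 ring carbon contributes one pi electron. Strictly, Hückel's rule is derived for monocyclic rings, so for fused systems this is only the electron-count check that the definition states, not a full aromaticity test.

```python
def satisfies_huckel(pi_electrons: int) -> bool:
    """True if pi_electrons equals 4n + 2 for some integer n >= 0."""
    return pi_electrons >= 2 and (pi_electrons - 2) % 4 == 0


# In the all-carbon aryl examples from the text, each ring carbon contributes
# one pi electron, so the pi count equals the ring carbon count.
ARYL_PI_ELECTRONS = {"phenyl": 6, "naphthyl": 10, "anthracyl": 14}

if __name__ == "__main__":
    for name, pi in ARYL_PI_ELECTRONS.items():
        n = (pi - 2) // 4
        print(f"{name}: {pi} pi electrons, n = {n}, 4n+2: {satisfies_huckel(pi)}")
    print(satisfies_huckel(8))  # False (an 8-pi ring such as cyclooctatetraene)
```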


“Aralkyl” is a subset of alkyl and aryl and refers to an optionally substituted alkyl group substituted by an optionally substituted aryl group. In certain embodiments, the aralkyl is optionally substituted benzyl. In certain embodiments, the aralkyl is benzyl. In certain embodiments, the aralkyl is optionally substituted phenethyl. In certain embodiments, the aralkyl is phenethyl. In certain embodiments, the aralkyl is 7-phenylheptanyl. In certain embodiments, the aralkyl is C7 alkyl substituted by an optionally substituted aryl group (e.g., phenyl). In certain embodiments, the aralkyl is a C7-C10 alkyl group substituted by an optionally substituted aryl group (e.g., phenyl).


“Partially unsaturated” refers to a group that includes at least one double or triple bond. A “partially unsaturated” ring system is further intended to encompass rings having multiple sites of unsaturation but is not intended to include aromatic groups (e.g., aryl or heteroaryl groups) as defined in this application. Likewise, “saturated” refers to a group that does not contain a double or triple bond, i.e., contains all single bonds.


The term “optionally substituted” means substituted or unsubstituted.


Alkyl, alkenyl, alkynyl, carbocyclyl, heterocyclyl, aryl, and heteroaryl groups are optionally substituted (e.g., “substituted” or “unsubstituted” alkyl, “substituted” or “unsubstituted” alkenyl, “substituted” or “unsubstituted” alkynyl, “substituted” or “unsubstituted” carbocyclyl, “substituted” or “unsubstituted” heterocyclyl, “substituted” or “unsubstituted” aryl or “substituted” or “unsubstituted” heteroaryl group). In general, the term “substituted,” whether preceded by the term “optionally” or not, means that at least one hydrogen present on a group (e.g., a carbon or nitrogen atom) is replaced with a permissible substituent, e.g., a substituent which upon substitution results in a stable compound, e.g., a compound which does not spontaneously undergo transformation such as by rearrangement, cyclization, elimination, or other reaction. Unless otherwise indicated, a “substituted” group has a substituent at one or more substitutable positions of the group, and when more than one position in any given structure is substituted, the substituent is either the same or different at each position. The term “substituted” is contemplated to include substitution with all permissible substituents of organic compounds, including any of the substituents described in this application that result in the formation of a stable compound. The present invention contemplates any and all such combinations in order to arrive at a stable compound. For purposes of this invention, heteroatoms such as nitrogen may have hydrogen substituents and/or any suitable substituent as described in this application that satisfies the valencies of the heteroatoms and results in the formation of a stable moiety.


Exemplary carbon atom substituents include, but are not limited to, halogen, —CN, —NO2, —N3, —SO2H, —SO3H, —OH, —ORaa, —ON(Rbb)2, —N(Rbb)2, —N(Rbb)3+X−, —N(ORcc)Rbb, —SH, —SRaa, —SSRcc, —C(═O)Raa, —CO2H, —CHO, —C(ORcc)2, —CO2Raa, —OC(═O)Raa, —OCO2Raa, —C(═O)N(Rbb)2, —OC(═O)N(Rbb)2, —NRbbC(═O)Raa, —NRbbCO2Raa, —NRbbC(═O)N(Rbb)2, —C(═NRbb)Raa, —C(═NRbb)ORaa, —OC(═NRbb)Raa, —OC(═NRbb)ORaa, —C(═NRbb)N(Rbb)2, —OC(═NRbb)N(Rbb)2, —NRbbC(═NRbb)N(Rbb)2, —C(═O)NRbbSO2Raa, —NRbbSO2Raa, —SO2N(Rbb)2, —SO2Raa, —SO2ORaa, —OSO2Raa, —S(═O)Raa, —OS(═O)Raa, —Si(Raa)3, —OSi(Raa)3, —C(═S)N(Rbb)2, —C(═O)SRaa, —C(═S)SRaa, —SC(═S)SRaa, —SC(═O)SRaa, —OC(═O)SRaa, —SC(═O)ORaa, —SC(═O)Raa, —P(═O)(Raa)2, —P(═O)(ORcc)2, —OP(═O)(Raa)2, —OP(═O)(ORcc)2, —P(═O)(N(Rbb)2)2, —OP(═O)(N(Rbb)2)2, —NRbbP(═O)(Raa)2, —NRbbP(═O)(ORcc)2, —NRbbP(═O)(N(Rbb)2)2, —P(Rcc)2, —P(ORcc)2, —P(Rcc)3+X−, —P(ORcc)3+X−, —P(Rcc)4, —P(ORcc)4, —OP(Rcc)2, —OP(Rcc)3+X−, —OP(ORcc)2, —OP(ORcc)3+X−, —OP(Rcc)4, —OP(ORcc)4, —B(Raa)2, —B(ORcc)2, —BRaa(ORcc), C1-10 alkyl, C1-10 perhaloalkyl, C2-10 alkenyl, C2-10 alkynyl, heteroC1-10 alkyl, heteroC2-10 alkenyl, heteroC2-10 alkynyl, C3-10 carbocyclyl, 3-14 membered heterocyclyl, C6-14 aryl, and 5-14 membered heteroaryl;


wherein:


each instance of Raa is, independently, selected from C1-10 alkyl, C1-10 perhaloalkyl, C2-10 alkenyl, C2-10 alkynyl, heteroC1-10 alkyl, heteroC2-10alkenyl, heteroC2-10alkynyl, C3-10 carbocyclyl, 3-14 membered heterocyclyl, C6-14 aryl, and 5-14 membered heteroaryl, or two Raa groups are joined to form a 3-14 membered heterocyclyl or 5-14 membered heteroaryl ring, wherein each alkyl, alkenyl, alkynyl, heteroalkyl, heteroalkenyl, heteroalkynyl, carbocyclyl, heterocyclyl, aryl, and heteroaryl is independently substituted with 0, 1, 2, 3, 4, or 5 Rdd groups;


each instance of Rbb is, independently, selected from hydrogen, —OH, —ORaa, —N(Rcc)2, —CN, —C(═O)Raa, —C(═O)N(Rcc)2, —CO2Raa, —SO2Raa, —C(═NRcc)ORaa, —C(═NRcc)N(Rcc)2, —SO2N(Rcc)2, —SO2Rcc, —SO2ORcc, —SORaa, —C(═S)N(Rcc)2, —C(═O)SRcc, —C(═S)SRcc, —P(═O)(Raa)2, —P(═O)(ORcc)2, —P(═O)(N(Rcc)2)2, C1-10 alkyl, C1-10 perhaloalkyl, C2-10 alkenyl, C2-10 alkynyl, heteroC1-10alkyl, heteroC2-10alkenyl, heteroC2-10alkynyl, C3-10 carbocyclyl, 3-14 membered heterocyclyl, C6-14 aryl, and 5-14 membered heteroaryl, or two Rbb groups are joined to form a 3-14 membered heterocyclyl or 5-14 membered heteroaryl ring, wherein each alkyl, alkenyl, alkynyl, heteroalkyl, heteroalkenyl, heteroalkynyl, carbocyclyl, heterocyclyl, aryl, and heteroaryl is independently substituted with 0, 1, 2, 3, 4, or 5 Rdd groups; wherein X− is a counterion;


each instance of Rcc is, independently, selected from hydrogen, C1-10 alkyl, C1-10 perhaloalkyl, C2-10 alkenyl, C2-10 alkynyl, heteroC1-10 alkyl, heteroC2-10 alkenyl, heteroC2-10 alkynyl, C3-10 carbocyclyl, 3-14 membered heterocyclyl, C6-14 aryl, and 5-14 membered heteroaryl, or two Rcc groups are joined to form a 3-14 membered heterocyclyl or 5-14 membered heteroaryl ring, wherein each alkyl, alkenyl, alkynyl, heteroalkyl, heteroalkenyl, heteroalkynyl, carbocyclyl, heterocyclyl, aryl, and heteroaryl is independently substituted with 0, 1, 2, 3, 4, or 5 Rdd groups;


each instance of Rdd is, independently, selected from halogen, —CN, —NO2, —N3, —SO2H, —SO3H, —OH, —ORee, —ON(Rff)2, —N(Rff)2, —N(Rff)3+X−, —N(ORee)Rff, —SH, —SRee, —SSRee, —C(═O)Ree, —CO2H, —CO2Ree, —OC(═O)Ree, —OCO2Ree, —C(═O)N(Rff)2, —OC(═O)N(Rff)2, —NRffC(═O)Ree, —NRffCO2Ree, —NRffC(═O)N(Rff)2, —C(═NRff)ORee, —OC(═NRff)Ree, —OC(═NRff)ORee, —C(═NRff)N(Rff)2, —OC(═NRff)N(Rff)2, —NRffC(═NRff)N(Rff)2, —NRffSO2Ree, —SO2N(Rff)2, —SO2Ree, —SO2ORee, —OSO2Ree, —S(═O)Ree, —Si(Ree)3, —OSi(Ree)3, —C(═S)N(Rff)2, —C(═O)SRee, —C(═S)SRee, —SC(═S)SRee, —P(═O)(ORee)2, —P(═O)(Ree)2, —OP(═O)(Ree)2, —OP(═O)(ORee)2, C1-6 alkyl, C1-6 perhaloalkyl, C2-6 alkenyl, C2-6 alkynyl, heteroC1-6alkyl, heteroC2-6alkenyl, heteroC2-6alkynyl, C3-10 carbocyclyl, 3-10 membered heterocyclyl, C6-10 aryl, 5-10 membered heteroaryl, wherein each alkyl, alkenyl, alkynyl, heteroalkyl, heteroalkenyl, heteroalkynyl, carbocyclyl, heterocyclyl, aryl, and heteroaryl is independently substituted with 0, 1, 2, 3, 4, or 5 Rgg groups, or two geminal Rdd substituents can be joined to form ═O or ═S; wherein X− is a counterion;


each instance of Ree is, independently, selected from C1-6 alkyl, C1-6 perhaloalkyl, C2-6 alkenyl, C2-6 alkynyl, heteroC1-6 alkyl, heteroC2-6alkenyl, heteroC2-6 alkynyl, C3-10 carbocyclyl, C6-10 aryl, 3-10 membered heterocyclyl, and 3-10 membered heteroaryl, wherein each alkyl, alkenyl, alkynyl, heteroalkyl, heteroalkenyl, heteroalkynyl, carbocyclyl, heterocyclyl, aryl, and heteroaryl is independently substituted with 0, 1, 2, 3, 4, or 5 Rgg groups;


each instance of Rff is, independently, selected from hydrogen, C1-6 alkyl, C1-6 perhaloalkyl, C2-6 alkenyl, C2-6 alkynyl, heteroC1-6alkyl, heteroC2-6alkenyl, heteroC2-6alkynyl, C3-10 carbocyclyl, 3-10 membered heterocyclyl, C6-10 aryl and 5-10 membered heteroaryl, or two Rff groups are joined to form a 3-10 membered heterocyclyl or 5-10 membered heteroaryl ring, wherein each alkyl, alkenyl, alkynyl, heteroalkyl, heteroalkenyl, heteroalkynyl, carbocyclyl, heterocyclyl, aryl, and heteroaryl is independently substituted with 0, 1, 2, 3, 4, or 5 Rgg groups; and


each instance of Rgg is, independently, halogen, —CN, —NO2, —N3, —SO2H, —SO3H, —OH, —OC1-6 alkyl, —ON(C1-6 alkyl)2, —N(C1-6 alkyl)2, —N(C1-6 alkyl)3+X−, —NH(C1-6 alkyl)2+X−, —NH2(C1-6 alkyl)+X−, —NH3+X−, —N(OC1-6 alkyl)(C1-6 alkyl), —N(OH)(C1-6 alkyl), —NH(OH), —SH, —SC1-6 alkyl, —SS(C1-6 alkyl), —C(═O)(C1-6 alkyl), —CO2H, —CO2(C1-6 alkyl), —OC(═O)(C1-6 alkyl), —OCO2(C1-6 alkyl), —C(═O)NH2, —C(═O)N(C1-6 alkyl)2, —OC(═O)NH(C1-6 alkyl), —NHC(═O)(C1-6 alkyl), —N(C1-6 alkyl)C(═O)(C1-6 alkyl), —NHCO2(C1-6 alkyl), —NHC(═O)N(C1-6 alkyl)2, —NHC(═O)NH(C1-6 alkyl), —NHC(═O)NH2, —C(═NH)O(C1-6 alkyl), —OC(═NH)(C1-6 alkyl), —OC(═NH)OC1-6 alkyl, —C(═NH)N(C1-6 alkyl)2, —C(═NH)NH(C1-6 alkyl), —C(═NH)NH2, —OC(═NH)N(C1-6 alkyl)2, —OC(═NH)NH(C1-6 alkyl), —OC(═NH)NH2, —NHC(═NH)N(C1-6 alkyl)2, —NHC(═NH)NH2, —NHSO2(C1-6 alkyl), —SO2N(C1-6 alkyl)2, —SO2NH(C1-6 alkyl), —SO2NH2, —SO2C1-6 alkyl, —SO2OC1-6 alkyl, —OSO2C1-6 alkyl, —SOC1-6 alkyl, —Si(C1-6 alkyl)3, —OSi(C1-6 alkyl)3, —C(═S)N(C1-6 alkyl)2, —C(═S)NH(C1-6 alkyl), —C(═S)NH2, —C(═O)S(C1-6 alkyl), —C(═S)SC1-6 alkyl, —SC(═S)SC1-6 alkyl, —P(═O)(OC1-6 alkyl)2, —P(═O)(C1-6 alkyl)2, —OP(═O)(C1-6 alkyl)2, —OP(═O)(OC1-6 alkyl)2, C1-6 alkyl, C1-6 perhaloalkyl, C2-6 alkenyl, C2-6 alkynyl, heteroC1-6alkyl, heteroC2-6alkenyl, heteroC2-6alkynyl, C3-10 carbocyclyl, C6-10 aryl, 3-10 membered heterocyclyl, 5-10 membered heteroaryl; or two geminal Rgg substituents can be joined to form ═O or ═S; wherein X− is a counterion.


Alternatively, two geminal hydrogens on a carbon atom are replaced with the group ═O, ═S, ═NN(Rbb)2, ═NNRbbC(═O)Raa, ═NNRbbC(═O)ORaa, ═NNRbbS(═O)2Raa, ═NRbb, or ═NORcc, wherein Raa, Rbb, and Rcc are as defined above, and wherein each alkyl, alkenyl, alkynyl, heteroalkyl, heteroalkenyl, heteroalkynyl, carbocyclyl, heterocyclyl, aryl, and heteroaryl is independently substituted with 0, 1, 2, 3, 4, or 5 Rdd groups.
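The substituent definitions above are nested: Raa, Rbb, and Rcc may carry Rdd groups, Rdd may contain Ree and Rff and carry Rgg groups, and Rgg is defined without reference to any further R group, so the nesting always terminates. The Python sketch below is illustrative only. It models each definition solely by the R-group labels it mentions (ignoring the chemistry), and the table `REFERS_TO`, the label "carbon substituent", and the function `longest_chain` are hypothetical names introduced here.

```python
from functools import lru_cache

# Hypothetical, simplified model of the definitions above: each key is a group
# label and each value lists the other labels its definition refers to, either
# inside a named substituent (e.g., ORaa) or as "substituted with 0-5 Rxx groups".
# Rgg refers to no further R groups, so it is terminal.
REFERS_TO = {
    "carbon substituent": ("Raa", "Rbb", "Rcc"),
    "Raa": ("Rdd",),
    "Rbb": ("Raa", "Rcc", "Rdd"),
    "Rcc": ("Rdd",),
    "Rdd": ("Ree", "Rff", "Rgg"),
    "Ree": ("Rgg",),
    "Rff": ("Rgg",),
    "Rgg": (),
}


@lru_cache(maxsize=None)
def longest_chain(label: str) -> tuple:
    """Return the longest chain of nested R-group references starting at label.

    The recursion terminates because the reference table is acyclic:
    no label refers back to a label that refers to it.
    """
    children = REFERS_TO[label]
    if not children:
        return (label,)
    # max() keeps the first of equally long chains, so ties follow tuple order.
    best = max((longest_chain(child) for child in children), key=len)
    return (label,) + best


if __name__ == "__main__":
    print(" -> ".join(longest_chain("carbon substituent")))
```

Under this simplified model the deepest nesting runs carbon substituent, Rbb, Raa, Rdd, Ree, Rgg; the acyclic reference table is what guarantees that the definitions bottom out at Rgg.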


A “counterion” or “anionic counterion” is a negatively charged group associated with a positively charged group in order to maintain electronic neutrality. An anionic counterion may be monovalent (i.e., including one formal negative charge). An anionic counterion may also be multivalent (i.e., including more than one formal negative charge), such as divalent or trivalent. Exemplary counterions include halide ions (e.g., F−, Cl−, Br−, I−), NO3−, ClO4−, OH−, H2PO4−, HCO3−, HSO4−, sulfonate ions (e.g., methanesulfonate, trifluoromethanesulfonate, p-toluenesulfonate, benzenesulfonate, 10-camphor sulfonate, naphthalene-2-sulfonate, naphthalene-1-sulfonic acid-5-sulfonate, ethan-1-sulfonic acid-2-sulfonate, and the like), carboxylate ions (e.g., acetate, propanoate, benzoate, glycerate, lactate, tartrate, glycolate, gluconate, and the like), BF4−, PF4−, PF6−, AsF6−, SbF6−, [B[3,5-(CF3)2C6H3]4]−, B(C6F5)4−, BPh4−, Al(OC(CF3)3)4−, and carborane anions (e.g., CB11H12− or (HCB11Me5Br6)−). Exemplary counterions which may be multivalent include CO32−, HPO42−, PO43−, B4O72−, SO42−, S2O32−, carboxylate anions (e.g., tartrate, citrate, fumarate, maleate, malate, malonate, gluconate, succinate, glutarate, adipate, pimelate, suberate, azelate, sebacate, salicylate, phthalates, aspartate, glutamate, and the like), and carboranes.
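Electronic neutrality fixes the ratio of a charged compound to its counterion, and that ratio depends on whether the counterion is monovalent or multivalent. As a minimal sketch (the function name `neutral_ratio` is hypothetical), the smallest neutral ratio follows from the greatest common divisor of the two charge magnitudes:

```python
from math import gcd


def neutral_ratio(cation_charge: int, anion_charge: int) -> tuple:
    """Smallest (cations, anions) counts whose total charge is zero.

    cation_charge is positive (e.g., +1 for an ammonium group) and
    anion_charge is negative (e.g., -2 for a divalent counterion).
    """
    if cation_charge <= 0 or anion_charge >= 0:
        raise ValueError("expected a positive cation charge and a negative anion charge")
    # Neutrality requires n_cat * z_cat == n_an * |z_an|; dividing both
    # charge magnitudes by their gcd gives the smallest integer solution.
    g = gcd(cation_charge, -anion_charge)
    return (-anion_charge // g, cation_charge // g)


if __name__ == "__main__":
    print(neutral_ratio(1, -1))  # monovalent counterion, e.g. Cl-
    print(neutral_ratio(1, -2))  # divalent counterion, e.g. SO4 2-
```

For example, `neutral_ratio(1, -2)` returns `(2, 1)`: two singly charged cations per divalent counterion such as SO42−, while a monovalent counterion such as Cl− gives `(1, 1)`.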


The term “pharmaceutically acceptable salt” refers to those salts which are, within the scope of sound medical judgment, suitable for use in contact with the tissues of humans and lower animals without undue toxicity, irritation, allergic response and the like, and are commensurate with a reasonable benefit/risk ratio. Pharmaceutically acceptable salts are well known in the art. For example, Berge et al. describe pharmaceutically acceptable salts in detail in J. Pharmaceutical Sciences, 1977, 66, 1-19, incorporated by reference. Pharmaceutically acceptable salts of the compounds disclosed in this application include those derived from suitable inorganic and organic acids and bases. Examples of pharmaceutically acceptable, nontoxic acid addition salts are salts of an amino group formed with inorganic acids such as hydrochloric acid, hydrobromic acid, phosphoric acid, sulfuric acid, and perchloric acid or with organic acids such as acetic acid, oxalic acid, maleic acid, tartaric acid, citric acid, succinic acid, or malonic acid or by using other methods known in the art such as ion exchange. Other pharmaceutically acceptable salts include adipate, alginate, ascorbate, aspartate, benzenesulfonate, benzoate, bisulfate, borate, butyrate, camphorate, camphorsulfonate, citrate, cyclopentanepropionate, digluconate, dodecylsulfate, ethanesulfonate, formate, fumarate, glucoheptonate, glycerophosphate, gluconate, hemisulfate, heptanoate, hexanoate, hydroiodide, 2-hydroxy-ethanesulfonate, lactobionate, lactate, laurate, lauryl sulfate, malate, maleate, malonate, methanesulfonate, 2-naphthalenesulfonate, nicotinate, nitrate, oleate, oxalate, palmitate, pamoate, pectinate, persulfate, 3-phenylpropionate, phosphate, picrate, pivalate, propionate, stearate, succinate, sulfate, tartrate, thiocyanate, p-toluenesulfonate, undecanoate, valerate salts, and the like. Salts derived from appropriate bases include alkali metal, alkaline earth metal, ammonium and N+(C1-4 alkyl)4 salts. Representative alkali or alkaline earth metal salts include sodium, lithium, potassium, calcium, magnesium, and the like. Further pharmaceutically acceptable salts include, when appropriate, nontoxic ammonium, quaternary ammonium, and amine cations formed using counterions such as halide, hydroxide, carboxylate, sulfate, phosphate, nitrate, lower alkyl sulfonate, and aryl sulfonate.


The term “solvate” refers to forms of a compound that are associated with a solvent, usually by a solvolysis reaction. This physical association may include hydrogen bonding. Conventional solvents include water, methanol, ethanol, acetic acid, DMSO, THF, diethyl ether, and the like. The compounds of Formula (1), (9), (10), and (11) may be prepared, e.g., in crystalline form, and may be solvated. Suitable solvates include pharmaceutically acceptable solvates and further include both stoichiometric solvates and non-stoichiometric solvates. In certain instances, the solvate will be capable of isolation, for example, when one or more solvent molecules are incorporated in the crystal lattice of a crystalline solid. “Solvate” encompasses both solution-phase and isolable solvates. Representative solvates include hydrates, ethanolates, and methanolates.


The term “hydrate” refers to a compound that is associated with water. Typically, the number of the water molecules contained in a hydrate of a compound is in a definite ratio to the number of the compound molecules in the hydrate. Therefore, a hydrate of a compound may be represented, for example, by the general formula R·x H2O, wherein R is the compound and wherein x is a number greater than 0. A given compound may form more than one type of hydrate, including, e.g., monohydrates (x is 1), lower hydrates (x is a number greater than 0 and smaller than 1, e.g., hemihydrates (R·0.5 H2O)), and polyhydrates (x is a number greater than 1, e.g., dihydrates (R·2 H2O) and hexahydrates (R·6 H2O)).
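The general formula R·x H2O lends itself to a short worked sketch. The Python below uses hypothetical helper names and takes the molar mass of water as approximately 18.015 g/mol; it classifies a hydrate by x using the categories just defined and computes the mass fraction of water for a compound of given molar mass.

```python
WATER_MOLAR_MASS = 18.015  # g/mol, approximate


def hydrate_type(x: float) -> str:
    """Classify a hydrate R·x H2O using the categories defined above."""
    if x <= 0:
        raise ValueError("x must be greater than 0")
    if x < 1:
        return "lower hydrate"  # e.g., hemihydrate, x = 0.5
    if x == 1:
        return "monohydrate"
    return "polyhydrate"  # e.g., dihydrate (x = 2), hexahydrate (x = 6)


def water_mass_fraction(compound_molar_mass: float, x: float) -> float:
    """Mass fraction of water in R·x H2O, given the molar mass of R in g/mol."""
    water = x * WATER_MOLAR_MASS
    return water / (compound_molar_mass + water)


if __name__ == "__main__":
    print(hydrate_type(0.5), hydrate_type(1), hydrate_type(2))
    print(round(water_mass_fraction(300.0, 1), 4))
```

For a hypothetical compound R with a molar mass of 300 g/mol, the monohydrate (x = 1) contains about 5.7% water by mass.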


The term “tautomers” refers to compounds that are interchangeable forms of a particular compound structure and that vary in the displacement of hydrogen atoms and electrons. Thus, two structures may be in equilibrium through the movement of π electrons and an atom (usually H). For example, enols and ketones are tautomers because they are rapidly interconverted by treatment with either acid or base. Another example of tautomerism is the aci- and nitro-forms of phenylnitromethane, which are likewise formed by treatment with acid or base. Tautomeric forms may be relevant to the attainment of the optimal chemical reactivity and biological activity of a compound of interest.


It is also to be understood that compounds that have the same molecular formula but differ in the nature or sequence of bonding of their atoms or the arrangement of their atoms in space are termed “isomers.” Isomers that differ in the arrangement of their atoms in space are termed “stereoisomers.”


Stereoisomers that are not mirror images of one another are termed “diastereomers” and those that are non-superimposable mirror images of each other are termed “enantiomers.” When a compound has an asymmetric center, for example, an atom bonded to four different groups, a pair of enantiomers is possible. An enantiomer can be characterized by the absolute configuration of its asymmetric center and described by the R- and S-sequencing rules of Cahn and Prelog. An enantiomer can also be characterized by the manner in which the molecule rotates the plane of polarized light, and designated as dextrorotatory or levorotatory (i.e., as (+) or (−)-isomers respectively). A chiral compound can exist either as an individual enantiomer or as a mixture of enantiomers. A mixture containing equal proportions of the enantiomers is called a “racemic mixture.”
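For a compound whose stereocenters are each assigned R or S, and considering stereocenters only, the enantiomer/diastereomer distinction above reduces to a simple comparison: the enantiomer inverts every center, while a diastereomer inverts some but not all of them. A minimal Python sketch, assuming the molecule has no internal symmetry (so no meso forms) and that both descriptor strings list the stereocenters in the same order; `relationship` is a hypothetical helper:

```python
def relationship(a: str, b: str) -> str:
    """Relate two stereoisomers of the same constitution by their stereocenters.

    a and b are strings of CIP descriptors, one "R" or "S" per stereocenter,
    listed in the same atom order (for example "RS"). Assumes no internal
    symmetry (no meso forms), so the mirror image inverts every center.
    """
    if len(a) != len(b) or set(a + b) - {"R", "S"}:
        raise ValueError("descriptors must be equal-length strings of R and S")
    if a == b:
        return "identical"
    mirror = a.translate(str.maketrans("RS", "SR"))  # invert every center
    return "enantiomers" if b == mirror else "diastereomers"


if __name__ == "__main__":
    print(relationship("RR", "SS"))
    print(relationship("RR", "RS"))
```

Here `relationship("RR", "SS")` gives "enantiomers" and `relationship("RR", "RS")` gives "diastereomers".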


The term “co-crystal” refers to a crystalline structure comprising at least two different components (e.g., a compound described in this application and an acid), wherein each of the components is independently an atom, ion, or molecule. In certain embodiments, none of the components is a solvent. In certain embodiments, at least one of the components is a solvent. A co-crystal of a compound and an acid is different from a salt formed from a compound and the acid. In the salt, a compound described in this application is complexed with the acid in a way that proton transfer (e.g., a complete proton transfer) from the acid to a compound described in this application easily occurs at room temperature. In the co-crystal, however, a compound described in this application is complexed with the acid in a way that proton transfer from the acid to a compound described in this application does not easily occur at room temperature. In certain embodiments, in the co-crystal, there is no proton transfer from the acid to a compound described in this application. In certain embodiments, in the co-crystal, there is partial proton transfer from the acid to a compound described in this application. Co-crystals may be useful to improve the properties (e.g., solubility, stability, and ease of formulation) of a compound described in this application.


The term “polymorphs” refers to a crystalline form of a compound (or a salt, hydrate, or solvate thereof) in a particular crystal packing arrangement. All polymorphs of the same compound have the same elemental composition. Different crystalline forms usually have different X-ray diffraction patterns, infrared spectra, melting points, density, hardness, crystal shape, optical and electrical properties, stability, and solubility. Recrystallization solvent, rate of crystallization, storage temperature, and other factors may cause one crystal form to dominate. Various polymorphs of a compound can be prepared by crystallization under different conditions.


The term “prodrug” refers to compounds, including derivatives of the compounds of Formula (X), (8), (9), (10), or (11), that have cleavable groups and that are converted by solvolysis or under physiological conditions into the compounds of Formula (X), (8), (9), (10), or (11), which are pharmaceutically active in vivo. The prodrugs may have attributes such as, without limitation, solubility, bioavailability, tissue compatibility, or delayed release in a mammalian organism. Examples include, but are not limited to, derivatives of compounds described in this application, including derivatives formed from glycosylation of the compounds described in this application (e.g., glycoside derivatives), carrier-linked prodrugs (e.g., ester derivatives), bioprecursor prodrugs (a prodrug metabolized by molecular modification into the active compound), and the like. Non-limiting examples of glycoside derivatives are disclosed in and incorporated by reference from WO2018208875 and US20190078168. Non-limiting examples of ester derivatives are disclosed in and incorporated by reference from US20170362195.


Other derivatives of the compounds of this invention have activity in both their acid and acid derivative forms, but the acid-sensitive form often offers advantages of solubility, bioavailability, tissue compatibility, or delayed release in a mammalian organism (see, Bundgaard, H., Design of Prodrugs, pp. 7-9, 21-24, Elsevier, Amsterdam 1985). Prodrugs include acid derivatives well known to practitioners of the art, such as, for example, esters prepared by reaction of the parent acid with a suitable alcohol, or amides prepared by reaction of the parent acid compound with a substituted or unsubstituted amine, or acid anhydrides, or mixed anhydrides. Simple aliphatic or aromatic esters, amides, and anhydrides derived from acidic groups pendant on the compounds of this invention are particular prodrugs. In some cases it is desirable to prepare double ester type prodrugs such as (acyloxy)alkyl esters or ((alkoxycarbonyl)oxy)alkyl esters. C1-C8 alkyl, C2-C8 alkenyl, C2-C8 alkynyl, aryl, C7-C12 substituted aryl, and C7-C12 arylalkyl esters of the compounds of Formula (X), (8), (9), (10), or (11) may be preferred.


Cannabinoids

As used in this application, the term “cannabinoid” includes compounds of Formula (X):




[Structure of Formula (X)]


or a pharmaceutically acceptable salt, co-crystal, tautomer, stereoisomer, solvate, hydrate, polymorph, isotopically enriched derivative, or prodrug thereof, wherein R1 is hydrogen, optionally substituted acyl, optionally substituted alkyl, optionally substituted alkenyl, optionally substituted alkynyl, optionally substituted carbocyclyl, or optionally substituted aryl; R2 and R6 are, independently, hydrogen or carboxyl; R3 and R5 are, independently, hydroxyl, halogen, or alkoxy; and R4 is a hydrogen or an optionally substituted prenyl moiety; or optionally R4 and R3 are taken together with their intervening atoms to form a cyclic moiety, or optionally R4 and R5 are taken together with their intervening atoms to form a cyclic moiety, or optionally both 1) R4 and R3 are taken together with their intervening atoms to form a cyclic moiety and 2) R4 and R5 are taken together with their intervening atoms to form a cyclic moiety. In certain embodiments, R4 and R3 are taken together with their intervening atoms to form a cyclic moiety. In certain embodiments, R4 and R5 are taken together with their intervening atoms to form a cyclic moiety. In certain embodiments, “cannabinoid” refers to a compound of Formula (X), or a pharmaceutically acceptable salt thereof. In certain embodiments, both 1) R4 and R3 are taken together with their intervening atoms to form a cyclic moiety and 2) R4 and R5 are taken together with their intervening atoms to form a cyclic moiety.


In some embodiments, cannabinoids may be synthesized via the following steps: a) one or more reactions to incorporate three additional ketone moieties onto an acyl-CoA scaffold, where the acyl moiety in the acyl-CoA scaffold comprises between four and fourteen carbons; b) a reaction cyclizing the product of step (a); and c) a reaction to incorporate a prenyl moiety into the product of step (b) or a derivative of the product of step (b). In some embodiments, non-limiting examples of the acyl-CoA scaffold described in step (a) include hexanoyl-CoA and butyryl-CoA. In some embodiments, non-limiting examples of the product of step (b) or a derivative of the product of step (b) include olivetolic acid and divarinic acid.
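Steps (a) to (c) can be followed with simple carbon bookkeeping. The sketch below is hypothetical and rests on assumptions not stated in this passage: that each of the three ketone-bearing units added in step (a) contributes two carbons (as with malonyl-CoA extension in Cannabis), that the cyclization in step (b) retains every carbon of the tetraketide to give a 2,4-dihydroxy-6-alkylbenzoic acid, and that the prenyl moiety in step (c) is a C10 geranyl group. The names `pathway_carbons` and `GERANYL_CARBONS` are illustrative only.

```python
GERANYL_CARBONS = 10  # assumed prenyl moiety: geranyl (C10)


def pathway_carbons(acyl_carbons: int, prenyl_carbons: int = GERANYL_CARBONS) -> dict:
    """Carbon bookkeeping for steps (a)-(c), under the assumptions stated above.

    acyl_carbons is the number of carbons in the acyl moiety of the starter
    acyl-CoA (between 4 and 14, per step (a)).
    """
    if not 4 <= acyl_carbons <= 14:
        raise ValueError("the acyl moiety should have between 4 and 14 carbons")
    # Step (a): three C2 keto units extend the acyl chain to a tetraketide.
    tetraketide = acyl_carbons + 3 * 2
    return {
        "tetraketide_carbons": tetraketide,
        # Step (b): the aromatic ring (6 C) plus the carboxyl carbon (1 C)
        # account for 7 carbons; the remainder forms the alkyl side chain.
        "cyclized_product_carbons": tetraketide,
        "side_chain_carbons": tetraketide - 7,
        # Step (c): one prenyl group is attached to the cyclized product.
        "prenylated_product_carbons": tetraketide + prenyl_carbons,
    }


if __name__ == "__main__":
    for name, n in (("hexanoyl-CoA", 6), ("butyryl-CoA", 4)):
        print(name, pathway_carbons(n))
```

With hexanoyl-CoA (6 acyl carbons) the sketch gives a 12-carbon cyclized product with a pentyl side chain, consistent with olivetolic acid (C12H16O4), and a 22-carbon geranylated product, consistent with cannabigerolic acid (C22H32O4). Butyryl-CoA (4 acyl carbons) gives 10 carbons with a propyl side chain, consistent with divarinic acid (C10H12O4).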


In some embodiments, a cannabinoid compound of Formula (X) is of Formula (X-A), (X-B), or (X-C):




[Structures of Formulae (X-A), (X-B), and (X-C)]


or a pharmaceutically acceptable salt, solvate, hydrate, polymorph, co-crystal, tautomer, stereoisomer, isotopically labeled derivative, or prodrug thereof, wherein custom-character is a double bond or a single bond, as valency permits;


R is hydrogen, optionally substituted acyl, optionally substituted alkyl, optionally substituted alkenyl, optionally substituted alkynyl, optionally substituted carbocyclyl, or optionally substituted aryl;


RZ1 is hydrogen, optionally substituted acyl, optionally substituted alkyl, optionally substituted alkenyl, optionally substituted alkynyl, optionally substituted carbocyclyl, or optionally substituted aryl;


RZ2 is hydrogen, optionally substituted acyl, optionally substituted alkyl, optionally substituted alkenyl, optionally substituted alkynyl, optionally substituted carbocyclyl, or optionally substituted aryl;


or optionally, RZ1 and RZ2 are taken together with their intervening atoms to form an optionally substituted carbocyclic ring;


R3A is hydrogen, optionally substituted acyl, optionally substituted alkyl, optionally substituted alkenyl, or optionally substituted alkynyl;


R3B is hydrogen, optionally substituted acyl, optionally substituted alkyl, optionally substituted alkenyl, or optionally substituted alkynyl;


RY is hydrogen, optionally substituted acyl, optionally substituted alkyl, optionally substituted alkenyl, or optionally substituted alkynyl;


RZ is hydrogen, optionally substituted acyl, optionally substituted alkyl, optionally substituted alkenyl, or optionally substituted alkynyl.


In certain embodiments, a cannabinoid compound is of Formula (X-A):




[Structure of Formula (X-A)]


wherein custom-character is a double bond, and each of RZ1 and RZ2 is hydrogen, one of R3A and R3B is optionally substituted C2-6 alkenyl, and the other one of R3A and R3B is optionally substituted C2-6 alkyl. In some embodiments, a cannabinoid compound of Formula (X) is of Formula (X-A), wherein each of RZ1 and RZ2 is hydrogen, one of R3A and R3B is a prenyl group, and the other one of R3A and R3B is optionally substituted methyl.


In certain embodiments, a cannabinoid compound of Formula (X-A) is of Formula (11-z):




[Structure of Formula (11-z)]


wherein custom-character is a double bond or single bond, as valency permits; one of R3A and R3B is C1-6 alkyl optionally substituted with alkenyl, and the other of R3A and R3B is optionally substituted C1-6 alkyl. In certain embodiments, in a compound of Formula (11-z), custom-character is a single bond; one of R3A and R3B is C1-6 alkyl optionally substituted with prenyl; the other of R3A and R3B is unsubstituted methyl; and R is as described in this application. In certain embodiments, in a compound of Formula (11-z), custom-character is a single bond; one of R3A and R3B is




[Structure of the R3A or R3B substituent]


and the other of R3A and R3B is unsubstituted methyl; and R is as described in this application. In certain embodiments, a cannabinoid compound of Formula (11-z) is of Formula (11a):




text missing or illegible when filed


In certain embodiments, a cannabinoid compound of Formula (X-A) is of Formula (10-z):




[Structure of Formula (10-z)]


wherein custom-character is a double bond or single bond, as valency permits; RY is hydrogen, optionally substituted acyl, optionally substituted alkyl, optionally substituted alkenyl, or optionally substituted alkynyl; and each of R3A and R3B is independently optionally substituted C1-6 alkyl. In certain embodiments, in a compound of Formula (10-z), custom-character is a single bond; each of R3A and R3B is unsubstituted methyl, and R is as described in this application. In certain embodiments, a cannabinoid compound of Formula (10-z) is of Formula (10a):




[Structure of Formula (10a)]


In certain embodiments, a compound of Formula (10a)




[Structure of Formula (10a)]


has a chiral atom labeled with * at carbon 10 and a chiral atom labeled with ** at carbon 6. In certain embodiments, in a compound of Formula (10a)




[Structure of Formula (10a)]


the chiral atom labeled with * at carbon 10 is of the R-configuration or S-configuration, and the chiral atom labeled with ** at carbon 6 is of the R-configuration. In certain embodiments, in a compound of Formula (10a)




[Structure of Formula (10a)]


the chiral atom labeled with * at carbon 10 is of the S-configuration, and the chiral atom labeled with ** at carbon 6 is of the R-configuration or S-configuration. In certain embodiments, in a compound of Formula (10a)




[Structure of Formula (10a)]


the chiral atom labeled with * at carbon 10 is of the R-configuration and the chiral atom labeled with ** at carbon 6 is of the R-configuration. In certain embodiments, a compound of Formula (10a)




[Structure of Formula (10a)]


is of the formula:




[Structure of Formula (10a) with the R-configuration at carbon 10 and at carbon 6]


In certain embodiments, in a compound of Formula (10a)




[Structure of Formula (10a)]


the chiral atom labeled with * at carbon 10 is of the S-configuration and the chiral atom labeled with ** at carbon 6 is of the S-configuration. In certain embodiments, a compound of Formula (10a)




[Structure of Formula (10a)]


is of the formula:




[Structure of Formula (10a) with the S-configuration at carbon 10 and at carbon 6]
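Formula (10a) has two labeled stereocenters (* at carbon 10 and ** at carbon 6), giving four R/S combinations, and each stereochemical embodiment above admits a subset of them. The Python sketch below (hypothetical names `ALL_CONFIGS`, `matching`, `EMBODIMENTS`, `UNLISTED`) enumerates the combinations with `itertools.product` and records which ones each embodiment in this passage permits.

```python
from itertools import product

# Each configuration is a (carbon 10, carbon 6) pair of CIP descriptors,
# following the * (carbon 10) and ** (carbon 6) labels of Formula (10a).
ALL_CONFIGS = set(product("RS", repeat=2))


def matching(c10: str = "RS", c6: str = "RS") -> set:
    """Configurations whose carbon 10 descriptor is in c10 and carbon 6 descriptor is in c6."""
    return {cfg for cfg in ALL_CONFIGS if cfg[0] in c10 and cfg[1] in c6}


# The four stereochemical embodiments described above for Formula (10a).
EMBODIMENTS = {
    "C10 R or S, C6 R": matching(c10="RS", c6="R"),
    "C10 S, C6 R or S": matching(c10="S", c6="RS"),
    "C10 R, C6 R": matching(c10="R", c6="R"),
    "C10 S, C6 S": matching(c10="S", c6="S"),
}

# Combinations not singled out by any of these four embodiments.
UNLISTED = ALL_CONFIGS - set().union(*EMBODIMENTS.values())

if __name__ == "__main__":
    for name, configs in EMBODIMENTS.items():
        print(name, sorted(configs))
    print("not listed:", sorted(UNLISTED))
```

The last computed set holds the combinations not singled out by any of these four embodiments; for this passage it is the (carbon 10 R, carbon 6 S) form.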


In certain embodiments, a cannabinoid compound is of Formula (X-B):




[Structure of Formula (X-B)]


wherein custom-character is a double bond; R is hydrogen, optionally substituted acyl, optionally substituted alkyl, optionally substituted alkenyl, or optionally substituted alkynyl; and each of R3A and R3B is independently optionally substituted C1-6 alkyl. In certain embodiments, in a compound of Formula (X-B), R is optionally substituted C1-6 alkyl; one of R3A and R3B is custom-character; and the other one of R3A and R3B is unsubstituted methyl, and R is as described in this application. In certain embodiments, a compound of Formula (X-B) is of Formula (9a):




[Structure of Formula (9a)]


In certain embodiments, a compound of Formula (9a)




[Structure of Formula (9a)]


has a chiral atom labeled with * at carbon 3 and a chiral atom labeled with ** at carbon 4. In certain embodiments, in a compound of Formula (9a)




[Structure of Formula (9a)]


the chiral atom labeled with * at carbon 3 is of the R-configuration or S-configuration, and the chiral atom labeled with ** at carbon 4 is of the R-configuration. In certain embodiments, in a compound of Formula (9a)




[Structure of Formula (9a)]


the chiral atom labeled with * at carbon 3 is of the S-configuration, and the chiral atom labeled with ** at carbon 4 is of the R-configuration or S-configuration. In certain embodiments, in a compound of Formula (9a)




[Structure of Formula (9a)]


the chiral atom labeled with * at carbon 3 is of the R-configuration and the chiral atom labeled with ** at carbon 4 is of the R-configuration. In certain embodiments, a compound of Formula (9a)




[Structure of Formula (9a)]


is of the formula:




[Structure of Formula (9a) with the R-configuration at carbon 3 and at carbon 4]


In certain embodiments, in a compound of Formula (9a)




[Structure of Formula (9a)]


the chiral atom labeled with * at carbon 3 is of the S-configuration and the chiral atom labeled with ** at carbon 4 is of the S-configuration. In certain embodiments, a compound of Formula (9a)




[Structure of Formula (9a)]


is of the formula:




[Structure of Formula (9a) with the S-configuration at carbon 3 and at carbon 4]


In certain embodiments, a cannabinoid compound is of Formula (X-C):




[Structure of Formula (X-C)]


wherein RZ is optionally substituted alkyl or optionally substituted alkenyl. In certain embodiments, a compound of Formula (X-C) is of formula:




[Structure of a compound of Formula (X-C) bearing the index a]


wherein a is 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10. In certain embodiments, a is 1. In certain embodiments, a is 2. In certain embodiments, a is 3. In certain embodiments, a is 1, 2, or 3 for a compound of Formula (X-C). In certain embodiments, a cannabinoid compound is of Formula (X-C), and a is 1, 2, 3, 4, or 5. In certain embodiments, a compound of Formula (X-C) is of Formula (8a):




[Structure of Formula (8a)]


In some embodiments, cannabinoids of the present disclosure comprise cannabinoid receptor ligands. Cannabinoid receptors are a class of cell membrane receptors in the G protein-coupled receptor superfamily. Cannabinoid receptors include the CB1 receptor and the CB2 receptor. In some embodiments, cannabinoid receptors comprise GPR18, GPR55, and PPAR. (See Console-Bram et al. “Activation of GPR18 by cannabinoid compounds: a tale of biased agonism” Br J Pharmacol v. 171(16) (2014); Shi et al. “The novel cannabinoid receptor GPR55 mediates anxiolytic-like effects in the medial orbital cortex of mice with acute stress” Molecular Brain 10, No. 38 (2017); and O'Sullivan, S. E. “An update on PPAR activation by cannabinoids” Br J Pharmacol v. 173(12) (2016).)


In some embodiments, cannabinoids comprise endocannabinoids, which are substances produced within the body, and phytocannabinoids, which are cannabinoids that are naturally produced by plants of the genus Cannabis. In some embodiments, phytocannabinoids comprise the acidic and decarboxylated acid forms of the naturally-occurring plant-derived cannabinoids, and their synthetic and biosynthetic equivalents.


More than 94 phytocannabinoids have been identified to date (Berman, Paula, et al. “A new ESI-LC/MS approach for comprehensive metabolic profiling of phytocannabinoids in Cannabis.” Scientific Reports 8.1 (2018): 14280; El-Alfy et al., 2010, “Antidepressant-like effect of delta-9-tetrahydrocannabinol and other cannabinoids isolated from Cannabis sativa L”, Pharmacology Biochemistry and Behavior 95 (4): 434-42; Rudolf Brenneisen, 2007, Chemistry and Analysis of Phytocannabinoids, each of which is incorporated by reference in this application in its entirety). In some embodiments, cannabinoids comprise Δ9-tetrahydrocannabinol (THC) type (e.g., (−)-trans-delta-9-tetrahydrocannabinol or dronabinol, (+)-trans-delta-9-tetrahydrocannabinol, (−)-cis-delta-9-tetrahydrocannabinol, or (+)-cis-delta-9-tetrahydrocannabinol), cannabidiol (CBD) type, cannabigerol (CBG) type, cannabichromene (CBC) type, cannabicyclol (CBL) type, cannabinodiol (CBND) type, or cannabitriol (CBT) type cannabinoids, or any combination thereof (see, e.g., R. Pertwee, ed., Handbook of Cannabis (Oxford, UK: Oxford University Press, 2014), which is incorporated by reference in this application in its entirety).
A non-limiting list of cannabinoids comprises: cannabiorcol-C1 (CBNO), CBND-C1 (CBNDO), Δ9-trans-Tetrahydrocannabiorcolic acid-C1 (Δ9-THCO), Cannabidiorcol-C1 (CBDO), Cannabiorchromene-C1 (CBCO), (−)-Δ8-trans-(6aR,10aR)-Tetrahydrocannabiorcol-C1 (Δ8-THCO), Cannabiorcyclol C1 (CBLO), CBG-C1 (CBGO), Cannabinol-C2 (CBN-C2), CBND-C2, Δ9-THC-C2, CBD-C2, CBC-C2, Δ8-THC-C2, CBL-C2, Bisnor-cannabielsoin-C1 (CBEO), CBG-C2, Cannabivarin-C3 (CBNV), Cannabinodivarin-C3 (CBNDV), (−)-Δ9-trans-Tetrahydrocannabivarin-C3 (Δ9-THCV), (−)-Cannabidivarin-C3 (CBDV), (±)-Cannabichromevarin-C3 (CBCV), (−)-Δ8-trans-THC-C3 (Δ8-THCV), (±)-(1aS,3aR,8bR,8cR)-Cannabicyclovarin-C3 (CBLV), 2-Methyl-2-(4-methyl-2-pentenyl)-7-propyl-2H-1-benzopyran-5-ol, Δ7-tetrahydrocannabivarin-C3 (Δ7-THCV), CBE-C2, Cannabigerovarin-C3 (CBGV), Cannabitriol-C1 (CBTO), Cannabinol-C4 (CBN-C4), CBND-C4, (−)-Δ9-trans-Tetrahydrocannabinol-C4 (Δ9-THC-C4), Cannabidiol-C4 (CBD-C4), CBC-C4, (−)-trans-Δ8-THC-C4, CBL-C4, Cannabielsoin-C3 (CBEV), CBG-C4, CBT-C2, Cannabichromanone-C3, Cannabiglendol-C3 (OH-iso-HHCV-C3), Cannabioxepane-C5 (CBX), Dehydrocannabifuran-C5 (DCBF), Cannabinol-C5 (CBN), Cannabinodiol-C5 (CBND), (−)-Δ9-trans-Tetrahydrocannabinol-C5 (Δ9-THC), (−)-Δ8-trans-(6aR,10aR)-Tetrahydrocannabinol-C5 (Δ8-THC), (±)-Cannabichromene-C5 (CBC), (−)-Cannabidiol-C5 (CBD), (±)-(1aS,3aR,8bR,8cR)-Cannabicyclol-C5 (CBL), Cannabicitran-C5 (CBR), (−)-Δ9-(6aS,10aR-cis)-Tetrahydrocannabinol-C5 ((−)-cis-Δ9-THC), (−)-Δ7-trans-(1R,3R,6R)-Isotetrahydrocannabinol-C5 (trans-isoΔ7-THC), CBE-C4, Cannabigerol-C5 (CBG), Cannabitriol-C3 (CBTV), Cannabinol methyl ether-C5 (CBNM), CBNDM-C5, 8-OH-CBN-C5 (OH-CBN), OH-CBND-C5 (OH-CBND), 10-Oxo-Δ6a(10a)-Tetrahydrocannabinol-C5 (OTHC), Cannabichromanone D-C5, Cannabicoumaronone-C5 (CBCON-C5), Cannabidiol monomethyl ether-C5 (CBDM), Δ9-THCM-C5, (±)-3″-hydroxy-Δ4″-cannabichromene-C5, (5aS,6S,9R,9aR)-Cannabielsoin-C5 (CBE), 2-geranyl-5-hydroxy-3-n-pentyl-1,4-benzoquinone-C5, 5-geranyl olivetolic 
acid, 5-geranyl olivetolate, 8α-Hydroxy-Δ9-Tetrahydrocannabinol-C5 (8α-OH-Δ9-THC), 8β-Hydroxy-Δ9-Tetrahydrocannabinol-C5 (8β-OH-Δ9-THC), 10α-Hydroxy-Δ8-Tetrahydrocannabinol-C5 (10α-OH-Δ8-THC), 10β-Hydroxy-Δ8-Tetrahydrocannabinol-C5 (10β-OH-Δ8-THC), 10α-hydroxy-Δ9,11-hexahydrocannabinol-C5, 9β,10β-Epoxyhexahydrocannabinol-C5, OH-CBD-C5 (OH-CBD), Cannabigerol monomethyl ether-C5 (CBGM), Cannabichromanone-C5, CBT-C4, (±)-6,7-cis-epoxycannabigerol-C5, (±)-6,7-trans-epoxycannabigerol-C5, (−)-7-hydroxycannabichromane-C5, Cannabimovone-C5, (−)-trans-Cannabitriol-C5 ((−)-trans-CBT), (+)-trans-Cannabitriol-C5 ((+)-trans-CBT), (±)-cis-Cannabitriol-C5 ((±)-cis-CBT), (−)-trans-1-Ethoxy-9-hydroxy-Δ6a(10a)-tetrahydrocannabivarin-C3 [(−)-trans-CBT-OEt], (−)-(6aR,9S,10S,10aR)-9,10-Dihydroxyhexahydrocannabinol-C5 [(−)-Cannabiripsol] (CBR), Cannabichromanone C-C5, (−)-6a,7,10a-Trihydroxy-Δ9-tetrahydrocannabinol-C5 [(−)-Cannabitetrol] (CBTT), Cannabichromanone B-C5, 8,9-Dihydroxy-Δ6a(10a)-tetrahydrocannabinol-C5 (8,9-Di-OH-CBT), (±)-4-acetoxycannabichromene-C5, 2-acetoxy-6-geranyl-3-n-pentyl-1,4-benzoquinone-C5, 11-Acetoxy-Δ9-Tetrahydrocannabinol-C5 (11-OAc-Δ9-THC), 5-acetyl-4-hydroxycannabigerol-C5, 4-acetoxy-2-geranyl-5-hydroxy-3-n-pentylphenol-C5, (−)-trans-10-Ethoxy-9-hydroxy-Δ6a(10a)-tetrahydrocannabinol-C5 ((−)-trans-CBT-OEt), sesquicannabigerol-C5 (SesquiCBG), carmagerol-C5, 4-terpenyl cannabinolate-C5, β-fenchyl-Δ9-tetrahydrocannabinolate-C5, α-fenchyl-Δ9-tetrahydrocannabinolate-C5, epi-bornyl-Δ9-tetrahydrocannabinolate-C5, bornyl-Δ9-tetrahydrocannabinolate-C5, α-terpenyl-Δ9-tetrahydrocannabinolate-C5, 4-terpenyl-Δ9-tetrahydrocannabinolate-C5, 6,6,9-trimethyl-3-pentyl-6H-dibenzo[b,d]pyran-1-ol, 3-(1,1-dimethylheptyl)-6,6a,7,8,10,10a-hexahydro-1-hydroxy-6,6-dimethyl-9H-dibenzo[b,d]pyran-9-one, (−)-(3S,4S)-7-hydroxy-Δ6-tetrahydrocannabinol-1,1-dimethylheptyl, (+)-(3S,4S)-7-hydroxy-Δ6-tetrahydrocannabinol-1,1-dimethylheptyl, 11-hydroxy-Δ9-tetrahydrocannabinol, and Δ8-tetrahydrocannabinol-11-oic acid; certain piperidine analogs (e.g., (−)-(6S,6aR,9R,10aR)-5,6,6a,7,8,9,10,10a-octahydro-6-methyl-3-[(R)-1-methyl-4-phenylbutoxy]-1,9-phenanthridinediol 1-acetate), certain aminoalkylindole analogs (e.g., (R)-(+)-[2,3-dihydro-5-methyl-3-(4-morpholinylmethyl)-pyrrolo[1,2,3-de]-1,4-benzoxazin-6-yl]-1-naphthalenyl-methanone), certain open pyran ring analogs (e.g., 2-[3-methyl-6-(1-methylethenyl)-2-cyclohexen-1-yl]-5-pentyl-1,3-benzenediol and 4-(1,1-dimethylheptyl)-2,3′-dihydroxy-6′alpha-(3-hydroxypropyl)-1′,2′,3′,4′,5′,6′-hexahydrobiphenyl), tetrahydrocannabiphorol (THCP), cannabidiphorol (CBDP), CBGP, CBCP, their acidic forms, salts of the acidic forms, or any combination thereof.


A cannabinoid described in this application can be a rare cannabinoid. For example, in some embodiments, a cannabinoid described in this application corresponds to a cannabinoid that is naturally produced in conventional Cannabis varieties at concentrations of less than 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, 0.9%, 0.8%, 0.7%, 0.6%, 0.5%, 0.25%, or 0.1% by dry weight of the female flower. In some embodiments, rare cannabinoids include CBGA, CBGVA, THCVA, CBDVA, CBCVA, and CBCA. In some embodiments, rare cannabinoids are cannabinoids that are not THCA, THC, CBDA, or CBD.
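The two criteria above (a natural concentration below a listed threshold in the female flower, and exclusion of THCA, THC, CBDA, and CBD) can be expressed as simple checks. The following is a minimal sketch only: the function names, the default 1% threshold, and the decision to keep the two criteria as separate functions are illustrative assumptions, not definitions taken from the disclosure.

```python
# Hypothetical helpers illustrating the two "rare cannabinoid" criteria
# described above; names and defaults are illustrative only.

# Cannabinoids excluded from the "rare" category in one embodiment.
NON_RARE = frozenset({"THCA", "THC", "CBDA", "CBD"})

# Concentration thresholds listed in the text (% of female-flower dry weight).
THRESHOLDS_PCT = (10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0.9, 0.8, 0.7, 0.6, 0.5, 0.25, 0.1)


def is_rare_by_exclusion(abbreviation: str) -> bool:
    """Rare if the cannabinoid is not THCA, THC, CBDA, or CBD."""
    return abbreviation.strip().upper() not in NON_RARE


def is_rare_by_concentration(percent_dry_weight: float, threshold_pct: float = 1.0) -> bool:
    """Rare if naturally present below the chosen threshold (% dry weight)."""
    if threshold_pct not in THRESHOLDS_PCT:
        raise ValueError(f"threshold {threshold_pct}% is not one of the listed values")
    return percent_dry_weight < threshold_pct


if __name__ == "__main__":
    for name in ("CBGA", "THCVA", "CBD"):
        print(name, is_rare_by_exclusion(name))
    print(is_rare_by_concentration(0.3, threshold_pct=0.5))  # True
```

Keeping the criteria separate mirrors the text, which presents them as alternative embodiments rather than a single combined test.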


A cannabinoid described in this application can also be a non-rare cannabinoid.


In some embodiments, the cannabinoid is selected from the cannabinoids listed in Table 1.









TABLE 1
Non-limiting examples of cannabinoids according to the present disclosure.

Structure | Name | Abbreviation
embedded image | Δ9-Tetrahydrocannabinol | Δ9-THC-C5
embedded image | Δ9-Tetrahydrocannabinol-C4 | Δ9-THC-C4
embedded image | Δ9-Tetrahydrocannabivarin | Δ9-THCV-C3
embedded image | Δ9-Tetrahydrocannabiorcol | Δ9-THCO-C1
embedded image | (−)-(6aS,10aR)-Δ9-Tetrahydrocannabinol | (−)-cis-Δ9-THC-C5
embedded image | Δ9-Tetrahydrocannabinolic acid A | Δ9-THCA-C5 A
embedded image | Δ9-Tetrahydrocannabinolic acid B | Δ9-THCA-C5 B
embedded image | Δ9-Tetrahydrocannabinolic acid-C4 A and/or B | Δ9-THCA-C4 A and/or B
embedded image | Δ9-Tetrahydrocannabivarinic acid A | Δ9-THCVA-C3 A
embedded image | Δ9-Tetrahydrocannabiorcolic acid A and/or B | Δ9-THCOA-C1 A and/or B
embedded image | (−)-Δ8-trans-(6aR,10aR)-Δ8-Tetrahydrocannabinol | Δ8-THC-C5
embedded image | (−)-Δ8-trans-(6aR,10aR)-Tetrahydrocannabinolic |
embedded image | (−)-Cannabidiol | CBD-C5
embedded image | Cannabidiol monomethyl ether | CBDM-C5
embedded image | Cannabidiol-C4 | CBD-C4
embedded image | Cannabidiolic acid | CBDA-C5
embedded image | Cannabidivarinic acid | CBDVA-C3
embedded image | (−)-Cannabidivarin | CBDV-C3
embedded image | Cannabidiorcol | CBD-C1
embedded image | Cannabigerolic acid A | (E)-CBGA-C5 A
embedded image | Cannabigerol | (E)-CBG-C5
embedded image | Cannabigerol monomethyl ether | (E)-CBGM-C5 A
embedded image | Cannabinerolic acid A | (Z)-CBGA-C5 A
embedded image | Cannabigerovarin | (E)-CBGV-C3
embedded image | Cannabigerol | (E)-CBG-C5
embedded image | Cannabigerolic acid A | (E)-CBGA-C5 A
embedded image | Cannabigerolic acid A monomethyl ether | (E)-CBGAM-C5 A
embedded image | Cannabigerovarinic acid A | (E)-CBGVA-C3 A
embedded image | Cannabinolic acid A | CBNA-C5 A
embedded image | Cannabinol methyl ether | CBNM-C5
embedded image | Cannabinol | CBN-C5
embedded image | Cannabinol-C4 | CBN-C4
embedded image | Cannabivarin | CBN-C3
embedded image | Cannabinol-C2 | CBN-C2
embedded image | Cannabiorcol | CBN-C1
embedded image | (±)-Cannabichromene | CBC-C5
embedded image | (±)-Cannabichromenic acid A | CBCA-C5 A
embedded image | (±)-Cannabivarichromene, (±)-Cannabichromevarin | CBCV-C3
embedded image | (±)-Cannabichromevarinic acid A | CBCVA-C3 A
embedded image | (±)-Cannabichromene | CBC-C5
embedded image | (±)-(1aS,3aR,8bR,8cR)-Cannabicyclol | CBL-C5
embedded image | (±)-(1aS,3aR,8bR,8cR)-Cannabicyclolic acid A | CBLA-C5 A
embedded image | (±)-(1aS,3aR,8bR,8cR)-Cannabicyclovarin | CBLV-C3
embedded image | (−)-(9R,10R)-trans-10-O-Ethyl-cannabitriol | (−)-trans-CBT-OEt-C5
embedded image | (±)-(9R,10R/9S,10S)-Cannabitriol-C3 | (±)-trans-CBT-C3
embedded image | (−)-(9R,10R)-trans-Cannabitriol | (−)-trans-CBT-C5
embedded image | (+)-(9S,10S)-Cannabitriol | (+)-trans-CBT-C5
embedded image | (±)-(9R,10S/9S,10R)-Cannabitriol | (±)-cis-CBT-C5
embedded image | (−)-6a,7,10a-Trihydroxy-Δ9-tetrahydrocannabinol | (−)-Cannabitetrol
embedded image | 10-Oxo-Δ6a(10a)-tetrahydrocannabinol | OTHC
embedded image | 8,9-Dihydroxy-Δ6a(10a)-tetrahydrocannabinol | 8,9-Di-OH-CBT-C5
embedded image | Cannabidiolic acid A cannabitriol ester | CBDA-C5 9-OH-CBT-C5 ester
embedded image | (−)-(6aR,9S,10S,10aR)-9,10-Dihydroxyhexahydrocannabinol, Cannabiripsol | Cannabiripsol-C5
embedded image | (5aS,6S,9R,9aR)-Cannabielsoic acid B | CBEA-C5 B
embedded image | (5aS,6S,9R,9aR)-C3-Cannabielsoic acid B | CBEA-C3 B
embedded image | (5aS,6S,9R,9aR)-Cannabielsoin | CBE-C5
embedded image | (5aS,6S,9R,9aR)-C3-Cannabielsoin | CBE-C3
embedded image | (5aS,6S,9R,9aR)-Cannabielsoic acid A | CBEA-C5 A
embedded image | Cannabiglendol-C3 | OH-iso-HHCV-C3
embedded image | Dehydrocannabifuran | DCBF-C5
embedded image | Cannabifuran | CBF-C5
embedded image | Cannabidiphorol | CBDP
embedded image | Tetrahydrocannabiphorol | THCP
text missing or illegible when filed








Biosynthesis of Cannabinoids and Cannabinoid Precursors

Aspects of the present disclosure provide tools, sequences, and methods for the biosynthetic production of cannabinoids in host cells. In some embodiments, the present disclosure teaches expression of enzymes that are capable of producing cannabinoids by biosynthesis.


As a non-limiting example, one or more of the enzymes depicted in FIG. 2 may be used to produce a cannabinoid or cannabinoid precursor of interest. FIG. 1 shows a cannabinoid biosynthesis pathway for the most abundant phytocannabinoids found in Cannabis. See also de Meijer et al. I, II, III, and IV (I: 2003, Genetics, 163:335-346; II: 2005, Euphytica, 145:189-198; III: 2009, Euphytica, 165:293-311; and IV: 2009, Euphytica, 168:95-112), and Carvalho et al. “Designing Microorganisms for Heterologous Biosynthesis of Cannabinoids” (2017) FEMS Yeast Research June 1; 17(4), each of which is incorporated by reference in this application in its entirety for all purposes.


It should be appreciated that a precursor substrate for use in cannabinoid biosynthesis is generally selected based on the cannabinoid of interest. Non-limiting examples of cannabinoid precursors include compounds of Formulae 1-8 in FIG. 2. In some embodiments, polyketides, including compounds of Formula 5, could be prenylated. In certain embodiments, the precursor is a precursor compound shown in FIG. 1, 2, or 3. Substrates containing 1-40 carbon atoms are preferred. In some embodiments, substrates containing 3-8 carbon atoms are most preferred.


As used in this application, a cannabinoid or a cannabinoid precursor may comprise an R group. See, e.g., FIG. 2. In some embodiments, R may be a hydrogen. In certain embodiments, R is optionally substituted alkyl. In certain embodiments, R is optionally substituted C1-40 alkyl. In certain embodiments, R is optionally substituted C2-40 alkyl. In certain embodiments, R is optionally substituted C2-40 alkyl, which is straight chain or branched alkyl. In certain embodiments, R is optionally substituted C3-8 alkyl. In certain embodiments, R is optionally substituted C1-C40 alkyl, C1-C20 alkyl, C1-C10 alkyl, C1-C8 alkyl, C1-C5 alkyl, C3-C5 alkyl, C3 alkyl, or C5 alkyl. In certain embodiments, R is optionally substituted C1-C20 alkyl. In certain embodiments, R is optionally substituted C1-C10 alkyl. In certain embodiments, R is optionally substituted C1-C8 alkyl. In certain embodiments, R is optionally substituted C1-C5 alkyl. In certain embodiments, R is optionally substituted C1-C7 alkyl. In certain embodiments, R is optionally substituted C3-C5 alkyl. In certain embodiments, R is optionally substituted C3 alkyl. In certain embodiments, R is unsubstituted C3 alkyl. In certain embodiments, R is n-C3 alkyl. In certain embodiments, R is n-propyl. In certain embodiments, R is n-butyl. In certain embodiments, R is n-pentyl. In certain embodiments, R is n-hexyl. In certain embodiments, R is n-heptyl. In certain embodiments, R is of formula:




embedded image


In certain embodiments, R is optionally substituted C4 alkyl. In certain embodiments, R is unsubstituted C4 alkyl. In certain embodiments, R is optionally substituted C5 alkyl. In certain embodiments, R is unsubstituted C5 alkyl. In certain embodiments, R is optionally substituted C6 alkyl. In certain embodiments, R is unsubstituted C6 alkyl. In certain embodiments, R is optionally substituted C7 alkyl. In certain embodiments, R is unsubstituted C7 alkyl. In certain embodiments, R is of formula:




embedded image


In certain embodiments, R is of formula:




embedded image


In certain embodiments, R is of formula:




embedded image


In certain embodiments, R is of formula:




embedded image


In certain embodiments, R is of formula:




embedded image


In certain embodiments, R is optionally substituted n-propyl. In certain embodiments, R is n-propyl optionally substituted with optionally substituted aryl. In certain embodiments, R is n-propyl optionally substituted with optionally substituted phenyl. In certain embodiments, R is n-propyl substituted with unsubstituted phenyl. In certain embodiments, R is optionally substituted butyl. In certain embodiments, R is optionally substituted n-butyl. In certain embodiments, R is n-butyl optionally substituted with optionally substituted aryl. In certain embodiments, R is n-butyl optionally substituted with optionally substituted phenyl. In certain embodiments, R is n-butyl substituted with unsubstituted phenyl. In certain embodiments, R is optionally substituted pentyl. In certain embodiments, R is optionally substituted n-pentyl. In certain embodiments, R is n-pentyl optionally substituted with optionally substituted aryl. In certain embodiments, R is n-pentyl optionally substituted with optionally substituted phenyl. In certain embodiments, R is n-pentyl substituted with unsubstituted phenyl. In certain embodiments, R is optionally substituted hexyl. In certain embodiments, R is optionally substituted n-hexyl. In certain embodiments, R is optionally substituted n-heptyl. In certain embodiments, R is optionally substituted n-octyl. In certain embodiments, R is alkyl optionally substituted with aryl (e.g., phenyl). In certain embodiments, R is optionally substituted acyl (e.g., —C(═O)Me).


In certain embodiments, R is optionally substituted alkenyl (e.g., substituted or unsubstituted C2-6 alkenyl). In certain embodiments, R is substituted or unsubstituted C2-6 alkenyl. In certain embodiments, R is substituted or unsubstituted C2-5 alkenyl. In certain embodiments, R is of formula:




embedded image


In certain embodiments, R is optionally substituted alkynyl (e.g., substituted or unsubstituted C2-6 alkynyl). In certain embodiments, R is substituted or unsubstituted C2-6 alkynyl. In certain embodiments, R is of formula:




embedded image


In certain embodiments, R is optionally substituted carbocyclyl. In certain embodiments, R is optionally substituted aryl (e.g., phenyl or naphthyl).


The chain length of a precursor substrate can be from C1-C40. Those substrates can have any degree and any kind of branching or saturation or chain structure, including, without limitation, aliphatic, alicyclic, and aromatic. In addition, they may include any functional groups including hydroxy, halogens, carbohydrates, phosphates, methyl-containing or nitrogen-containing functional groups.


For example, FIG. 3 shows a non-exclusive set of putative precursors for the cannabinoid pathway. Aliphatic carboxylic acids containing four to eight total carbons (“C4”-“C8” in FIG. 3), and up to 10-12 total carbons, with either linear or branched chains may be used as precursors for the heterologous pathway. Non-limiting examples include methanoic acid, butyric acid, pentanoic acid, hexanoic acid, heptanoic acid, isovaleric acid, octanoic acid, and decanoic acid. Additional precursors may include ethanoic acid and propanoic acid. In some embodiments, the acid, ester, and salt forms of these compounds may all be used as substrates. Substrates may have any degree and any kind of branching, saturation, and chain structure, including, without limitation, aliphatic, alicyclic, and aromatic. In addition, they may include any functional modifications or combination of modifications including, without limitation, halogenation, hydroxylation, amination, acylation, alkylation, phenylation, and/or installation of pendant carbohydrates, phosphates, sulfates, heterocycles, or lipids, or any other functional groups.
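To make the link between a linear acid precursor and the side chain R concrete: under the common assumption that the carboxyl carbon of the activated acid ends up in the aromatic ring of the resorcylic acid intermediate, an n-carbon straight-chain acid contributes an (n − 1)-carbon R group. The sketch below applies that rule to the linear acids named above and labels the result with the naming series seen in Table 1 (orcol for C1, varin for C3, parent names for C5, phorol for C7). The lookup tables and the function `side_chain_from_acid` are hypothetical illustrations; branched acids such as isovaleric acid are deliberately not handled.

```python
from typing import Optional, Tuple

# Linear, saturated carboxylic acids named in the text, keyed to total carbons.
LINEAR_ACIDS = {
    "ethanoic acid": 2,
    "propanoic acid": 3,
    "butyric acid": 4,
    "pentanoic acid": 5,
    "hexanoic acid": 6,
    "heptanoic acid": 7,
    "octanoic acid": 8,
    "decanoic acid": 10,
}

# Straight-chain names for the resulting R group.
N_ALKYL = {
    1: "methyl", 2: "ethyl", 3: "n-propyl", 4: "n-butyl", 5: "n-pentyl",
    6: "n-hexyl", 7: "n-heptyl", 8: "n-octyl", 9: "n-nonyl",
}

# Naming series for some side-chain lengths (cf. Table 1); other lengths
# are simply written with a -Cn suffix, so they map to None here.
SERIES = {1: "orcol", 3: "varin", 5: "parent (C5)", 7: "phorol"}


def side_chain_from_acid(acid: str) -> Tuple[int, str, Optional[str]]:
    """Return (R carbons, R name, naming series) for a linear acid precursor.

    Assumes the carboxyl carbon ends up in the aromatic ring, so an
    n-carbon acid yields an (n - 1)-carbon straight-chain R group.
    Raises KeyError for acids not in LINEAR_ACIDS (e.g., branched ones).
    """
    r_carbons = LINEAR_ACIDS[acid] - 1
    return r_carbons, N_ALKYL[r_carbons], SERIES.get(r_carbons)


if __name__ == "__main__":
    for name in ("butyric acid", "hexanoic acid", "octanoic acid"):
        print(name, "->", side_chain_from_acid(name))
```

For example, butyric acid maps to an n-propyl R group, consistent with the varin (C3) entries in Table 1, and hexanoic acid maps to the n-pentyl group of the parent C5 cannabinoids.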


Substrates for any of the enzymes disclosed in this application may be provided exogenously or may be produced endogenously by a host cell. In some embodiments, the cannabinoids are produced from a glucose substrate, so that compounds of Formula 1 shown in FIG. 2 and CoA precursors are synthesized by the cell. In other embodiments, a precursor is fed into the reaction. In some embodiments, a precursor is a compound selected from Formulae 1-8 in FIG. 2.


Cannabinoids produced by methods disclosed in this application include rare cannabinoids. Because rare cannabinoids occur in nature at low concentrations, producing industrially significant amounts of isolated or purified rare cannabinoids from the Cannabis plant may be prohibitive owing to, e.g., the large volumes of Cannabis plants required and the large amounts of space, labor, time, and capital needed to grow, harvest, and/or process the plant materials. The disclosure provided in this application represents a potentially efficient method for producing high yields of cannabinoids, including rare cannabinoids.


Cannabinoids produced by the disclosed methods also include non-rare cannabinoids. Without being bound by a particular theory, the methods described in this application may be advantageous compared with traditional plant-based methods for producing non-rare cannabinoids. For example, methods provided in this application represent potentially efficient means for producing consistent and high yields of non-rare cannabinoids. With traditional methods of cannabinoid production, in which cannabinoids are harvested from plants, maintaining consistent and uniform conditions, including airflow, nutrients, lighting, temperature, and humidity, can be difficult. For example, with plant-based methods, microclimates created by branching can lead to inconsistent yields and by-product formation. In some embodiments, the methods described in this application are more efficient at producing a cannabinoid of interest than harvesting cannabinoids from plants. For example, with plant-based methods, seed-to-harvest can take up to half a year, while cutting-to-harvest usually takes about 4 months. Additional steps including drying, curing, and extraction are also usually needed with plant-based methods. In contrast, in some embodiments, the fermentation-based methods described in this application take only about 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 days. In some embodiments, the fermentation-based methods described in this application take only about 3-5 days. In some embodiments, the fermentation-based methods described in this application take only about 5 days. In some embodiments, the methods provided in this application reduce the amount of security needed to comply with regulatory standards. For example, a smaller area may need to be monitored and secured to practice the methods described in this application than to cultivate plants.
In some embodiments, the methods described in this application are advantageous over plant-sourced cannabinoids.


Cannabinoid Pathway Enzymes

Methods for production of cannabinoids and cannabinoid precursors can include expression of one or more of: an acyl activating enzyme (AAE); a polyketide synthase (PKS) (e.g., OLS); a polyketide cyclase (PKC); a prenyltransferase (PT); and a terminal synthase (TS).
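The order in which these enzyme classes typically act can be modeled as a simple pipeline in which each step consumes the product of the previous one. In the sketch below, the intermediates for the C5 (pentyl) series (hexanoyl-CoA, a tetraketide formed with malonyl-CoA, olivetolic acid, CBGA formed by prenylation with geranyl pyrophosphate, and the acidic cannabinoids THCA, CBDA, or CBCA) follow the widely described Cannabis pathway; the `Step` data structure and the helper functions are hypothetical illustrations, not part of the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    enzyme_class: str   # abbreviation used in the text (AAE, PKS, PKC, PT, TS)
    substrates: tuple   # inputs consumed at this step
    product: str        # main product passed to the next step


# Illustrative C5-series route; intermediate names follow the commonly
# described Cannabis pathway and are not taken from the claims.
C5_PATHWAY = (
    Step("AAE", ("hexanoic acid", "CoA"), "hexanoyl-CoA"),
    Step("PKS", ("hexanoyl-CoA", "3 x malonyl-CoA"), "tetraketide intermediate"),
    Step("PKC", ("tetraketide intermediate",), "olivetolic acid"),
    Step("PT", ("olivetolic acid", "geranyl pyrophosphate"), "CBGA"),
    Step("TS", ("CBGA",), "THCA, CBDA, or CBCA"),
)


def check_chain(steps) -> bool:
    """Return True if each step consumes the product of the step before it."""
    return all(prev.product in nxt.substrates for prev, nxt in zip(steps, steps[1:]))


def describe(steps) -> str:
    """Render the pathway as a single arrow-separated string."""
    parts = [steps[0].substrates[0]]
    for step in steps:
        parts.append(f"[{step.enzyme_class}] {step.product}")
    return " -> ".join(parts)


if __name__ == "__main__":
    assert check_chain(C5_PATHWAY)
    print(describe(C5_PATHWAY))
```

Swapping the starting acid (e.g., butyric acid with butanoyl-CoA) would give the corresponding C3 (varin) intermediates, assuming the downstream enzymes accept those substrates.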


Acyl Activating Enzyme (AAE)

A host cell described in this disclosure may comprise an AAE. As used in this disclosure, an AAE refers to an enzyme that is capable of catalyzing the esterification between a thiol and a substrate (e.g., optionally substituted aliphatic or aryl group) that has a carboxylic acid moiety. In some embodiments, an AAE is capable of using Formula (1):




embedded image


or a salt, solvate, hydrate, polymorph, co-crystal, tautomer, stereoisomer, or isotopically labeled derivative thereof, to produce a product of Formula (2):




embedded image


R is as defined in this application. In certain embodiments, R is hydrogen. In certain embodiments, R is optionally substituted alkyl. In certain embodiments, R is optionally substituted C1-40 alkyl. In certain embodiments, R is optionally substituted C2-40 alkyl. In certain embodiments, R is optionally substituted C2-40 alkyl, which is straight chain or branched alkyl. In certain embodiments, R is optionally substituted C2-10 alkyl, optionally substituted C10-C20 alkyl, optionally substituted C20-C30 alkyl, optionally substituted C30-C40 alkyl, or optionally substituted C40-C50 alkyl, which is straight chain or branched alkyl. In certain embodiments, R is optionally substituted C3-8 alkyl. In certain embodiments, R is optionally substituted C1-C40 alkyl, C1-C20 alkyl, C1-C10 alkyl, C1-C8 alkyl, C1-C5 alkyl, C3-C5 alkyl, C3 alkyl, or C5 alkyl. In certain embodiments, R is optionally substituted C1-C20 alkyl. In certain embodiments, R is optionally substituted C1-C20 branched alkyl. In certain embodiments, R is optionally substituted C1-C20 alkyl, optionally substituted C1-C10 alkyl, optionally substituted C10-C20 alkyl, optionally substituted C20-C30 alkyl, optionally substituted C30-C40 alkyl, or optionally substituted C40-C50 alkyl. In certain embodiments, R is optionally substituted C1-C10 alkyl. In certain embodiments, R is optionally substituted C3 alkyl. In certain embodiments, R is optionally substituted n-propyl. In certain embodiments, R is unsubstituted n-propyl. In certain embodiments, R is optionally substituted C1-C8 alkyl. In some embodiments, R is a C2-C6 alkyl. In certain embodiments, R is optionally substituted C1-C5 alkyl. In certain embodiments, R is optionally substituted C3-C5 alkyl. In certain embodiments, R is optionally substituted C3 alkyl. In certain embodiments, R is optionally substituted C5 alkyl. In certain embodiments, R is of formula:




embedded image


In certain embodiments, R is of formula:




embedded image


In certain embodiments, R is of formula:




embedded image


In certain embodiments, R is of formula:




embedded image


In certain embodiments, R is optionally substituted propyl. In certain embodiments, R is optionally substituted n-propyl. In certain embodiments, R is n-propyl optionally substituted with optionally substituted aryl. In certain embodiments, R is n-propyl optionally substituted with optionally substituted phenyl. In certain embodiments, R is n-propyl substituted with unsubstituted phenyl. In certain embodiments, R is optionally substituted butyl. In certain embodiments, R is optionally substituted n-butyl. In certain embodiments, R is n-butyl optionally substituted with optionally substituted aryl. In certain embodiments, R is n-butyl optionally substituted with optionally substituted phenyl. In certain embodiments, R is n-butyl substituted with unsubstituted phenyl. In certain embodiments, R is optionally substituted pentyl. In certain embodiments, R is optionally substituted n-pentyl. In certain embodiments, R is n-pentyl optionally substituted with optionally substituted aryl. In certain embodiments, R is n-pentyl optionally substituted with optionally substituted phenyl. In certain embodiments, R is n-pentyl substituted with unsubstituted phenyl. In certain embodiments, R is optionally substituted hexyl. In certain embodiments, R is optionally substituted n-hexyl. In certain embodiments, R is optionally substituted n-heptyl. In certain embodiments, R is optionally substituted n-octyl. In certain embodiments, R is alkyl optionally substituted with aryl (e.g., phenyl). In certain embodiments, R is optionally substituted acyl (e.g., —C(═O)Me).


In certain embodiments, R is optionally substituted alkenyl (e.g., substituted or unsubstituted C2-6 alkenyl). In certain embodiments, R is substituted or unsubstituted C2-6 alkenyl. In certain embodiments, R is substituted or unsubstituted C2-5 alkenyl. In certain embodiments, R is of formula:




embedded image


In certain embodiments, R is optionally substituted alkynyl (e.g., substituted or unsubstituted C2-6 alkynyl). In certain embodiments, R is substituted or unsubstituted C2-6 alkynyl. In certain embodiments, R is of formula:




embedded image


In certain embodiments, R is optionally substituted carbocyclyl. In certain embodiments, R is optionally substituted aryl (e.g., phenyl or naphthyl).


In some embodiments, a substrate for an AAE is produced by fatty acid metabolism within a host cell. In some embodiments, a substrate for an AAE is provided exogenously.


In some embodiments, an AAE is capable of catalyzing the formation of hexanoyl-coenzyme A (hexanoyl-CoA) from hexanoic acid and coenzyme A (CoA). In some embodiments, an AAE is capable of catalyzing the formation of butanoyl-coenzyme A (butanoyl-CoA) from butanoic acid and coenzyme A (CoA).
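Because the AAE reaction joins the carboxylic acid to the thiol of coenzyme A as a thioester, the net molecular formula of the acyl-CoA product equals that of the acid plus CoA minus one water. The sketch below does that bookkeeping for hexanoic and butanoic acid. The CoA formula (C21H36N7O16P3S, free thiol form) is supplied as an assumption, the parser handles only simple formulas without parentheses, and cofactors such as ATP used by acyl-activating enzymes are not modeled.

```python
import re
from collections import Counter

COA = "C21H36N7O16P3S"  # coenzyme A, free thiol form (assumed input)
WATER = "H2O"


def parse_formula(formula: str) -> Counter:
    """Parse a simple formula (no parentheses or charges) into element counts."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def format_formula(counts: Counter) -> str:
    """Format counts in Hill order: C, H, then remaining elements alphabetically."""
    order = [e for e in ("C", "H") if counts.get(e)] + sorted(
        e for e in counts if e not in ("C", "H") and counts[e]
    )
    return "".join(f"{e}{counts[e] if counts[e] != 1 else ''}" for e in order)


def acyl_coa_formula(acid_formula: str) -> str:
    """Net formula for R-COOH + CoA-SH -> R-CO-S-CoA + H2O."""
    total = parse_formula(acid_formula) + parse_formula(COA)
    total.subtract(parse_formula(WATER))
    return format_formula(total)


if __name__ == "__main__":
    print("hexanoyl-CoA:", acyl_coa_formula("C6H12O2"))  # C27H46N7O17P3S
    print("butanoyl-CoA:", acyl_coa_formula("C4H8O2"))   # C25H42N7O17P3S
```

The same bookkeeping applies to any of the linear acid substrates discussed above, since each thioester formation removes exactly one water.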


As one of ordinary skill in the art would appreciate, an AAE could be obtained from any source, including naturally occurring sources and synthetic sources (e.g., a non-naturally occurring AAE). In some embodiments, an AAE is a Cannabis enzyme. Non-limiting examples of AAEs include C. sativa hexanoyl-CoA synthetase 1 (CsHCS1) and C. sativa hexanoyl-CoA synthetase 2 (CsHCS2) as disclosed in U.S. Pat. No. 9,546,362, which is incorporated by reference in this application in its entirety.









CsHCS1 has the sequence (SEQ ID NO: 109):

MGKNYKSLDSVVASDFIALGITSEVAETLHGRLAEIVCNYGAATPQTWIN
IANHILSPDLPFSLHQMLFYGCYKDFGPAPPAWIPDPEKVKSTNLGALLE
KRGKEFLGVKYKDPISSFSHFQEFSVRNPEVYWRTVLMDEMKISFSKDPE
CILRRDDINNPGGSEWLPGGYLNSAKNCLNVNSNKKLNDTMIVWRDEGND
DLPLNKLTLDQLRKRVWLVGYALEEMGLEKGCAIAIDMPMHVDAVVIYLA
IVLAGYVVVSIADSFSAPEISTRLRLSKAKAIFTQDHIIRGKKRIPLYSR
VVEAKSPMAIVIPCSGSNIGAELRDGDISWDYFLERAKEFKNCEFTAREQ
PVDAYTNILFSSGTTGEPKAIPWTQATPLKAAADGWSHLDIRKGDVIVWP
TNLGWMMGPWLVYASLLNGASIALYNGSPLVSGFAKFVQDAKVTMLGVVP
SIVRSWKSTNCVSGYDWSTIRCFSSSGEASNVDEYLWLMGRANYKPVIEM
CGGTEIGGAFSAGSFLQAQSLSSFSSQCMGCTLYILDKNGYPMPKNKPGI
GELALGPVMFGASKTLLNGNHHDVYFKGMPTLNGEVLRRHGDIFELTSNG
YYHAHGRADDTMNIGGIKISSIEIERVCNEVDDRVFETTAIGVPPLGGGP
EQLVIFFVLKDSNDTTIDLNQLRLSFNLGLQKKLNPLFKVTRVVPLSSLP
RTATNKIMRRVLRQFSHFE.





CsHCS2 has the sequence (SEQ ID NO: 129):

MEKSGYGRDGIYRSLRPPLHLPNNNNLSMVSFLFRNSSSYPQKPALIDSE
TNQILSFSHFKSTVIKVSHGFLNLGIKKNDVVLIYAPNSIHFPVCFLGII
ASGAIATTSNPLYTVSELSKQVKDSNPKLIITVPQLLEKVKGFNLPTILI
GPDSEQESSSDKVMTFNDLVNLGGSSGSEFPIVDDFKQSDTAALLYSSGT
TGMSKGVVLTHKNFIASSLMVTMEQDLVGEMDNVFLCFLP1VIFHVFGLA
IITYAQLQRGNTVISMARFDLEKMLKDVEKYKVTHLWVVPPVILALSKNS
MVKKFNLSSIKYIGSGAAPLGKDLMEECSKVVPYGIVAQGYGMTETCGIV
SMEDIRGGKRNSGSAGMLASGVEAQIVSVDTLKPLPPNQLGEIWVKGPNM
MQGYFNNPQATKLTIDKKGWVHTGDLGYFDEDGHLYVVDRIKELIKYKGF
QVAPAELEGLLVSHPEILDAVVIPFPDAEAGEVPVAYVVRSPNSSLTEND
VKKFIAGQVASFKRLRKVTFINSVPKSASGKILRRELIQKVRSNM.






In some embodiments, an AAE comprises a sequence that is at least 5%, at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 71%, at least 72%, at least 73%, at least 74%, at least 75%, at least 76%, at least 77%, at least 78%, at least 79%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or is 100% identical, including all values in between, to a sequence (e.g., nucleic acid or amino acid sequence) set forth in SEQ ID NOs:63-69, 141-142, or 707-708.
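Percent identity values like those above are computed over an alignment of the two sequences. The sketch below assumes the alignment has already been produced (for example, by a standard pairwise alignment tool) and is supplied as two equal-length strings with "-" for gaps. It counts identical, non-gap columns and divides by the total number of alignment columns; that denominator is only one of several conventions in use, so the convention applicable to the claims is an assumption here. The function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Gaps are written as '-'. Identity is counted over every alignment column;
    a column with a gap in either sequence counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1 for a, b in zip(aligned_a.upper(), aligned_b.upper()) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if identity is at least `threshold` percent (e.g., 90.0)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # First ten residues of CsHCS1 versus a made-up single-substitution variant.
    print(percent_identity("MGKNYKSLDS", "MGKNYRSLDS"))  # 90.0
```

In practice, identity would be computed over full-length alignments; the ten-residue example only illustrates the arithmetic.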


In some embodiments, an AAE acts on multiple substrates, while in other embodiments, it exhibits substrate specificity. For example, in some embodiments, an AAE exhibits substrate specificity for one or more of hexanoic acid, butyric acid, isovaleric acid, octanoic acid, or decanoic acid. In other embodiments, an AAE exhibits activity on at least two of hexanoic acid, butyric acid, isovaleric acid, octanoic acid, and decanoic acid. AAEs were identified herein that exhibited activity on butyrate and/or hexanoate (FIGS. 5 and 6). Activity on butyrate was unexpected in view of disclosure in Carvalho et al. “Designing Microorganisms for Heterologous Biosynthesis of Cannabinoids” (2017) FEMS Yeast Research June 1; 17(4).


In some embodiments, an AAE described herein comprises: N at a residue corresponding to position 90 in UniProtKB-Q6C577 (SEQ ID NO:64); A at a residue corresponding to position 100 in UniProtKB-Q6C577 (SEQ ID NO:64); G at a residue corresponding to position 105 in UniProtKB-Q6C577 (SEQ ID NO:64); E at a residue corresponding to position 162 in UniProtKB-Q6C577 (SEQ ID NO:64); Y at a residue corresponding to position 195 in UniProtKB-Q6C577 (SEQ ID NO:64); G at a residue corresponding to position 205 in UniProtKB-Q6C577 (SEQ ID NO:64); K at a residue corresponding to position 208 in UniProtKB-Q6C577 (SEQ ID NO:64); P at a residue corresponding to position 243 in UniProtKB-Q6C577 (SEQ ID NO:64); H at a residue corresponding to position 246 in UniProtKB-Q6C577 (SEQ ID NO:64); G at a residue corresponding to position 261 in UniProtKB-Q6C577 (SEQ ID NO:64); F at a residue corresponding to position 270 in UniProtKB-Q6C577 (SEQ ID NO:64); V at a residue corresponding to position 284 in UniProtKB-Q6C577 (SEQ ID NO:64); L at a residue corresponding to position 289 in UniProtKB-Q6C577 (SEQ ID NO:64); V at a residue corresponding to position 290 in UniProtKB-Q6C577 (SEQ ID NO:64); P at a residue corresponding to position 291 in UniProtKB-Q6C577 (SEQ ID NO:64); P at a residue corresponding to position 301 in UniProtKB-Q6C577 (SEQ ID NO:64); A at a residue corresponding to position 321 in UniProtKB-Q6C577 (SEQ ID NO:64); V at a residue corresponding to position 328 in UniProtKB-Q6C577 (SEQ ID NO:64); Y at a residue corresponding to position 356 in UniProtKB-Q6C577 (SEQ ID NO:64); G at a residue corresponding to position 381 in UniProtKB-Q6C577 (SEQ ID NO:64); I at a residue corresponding to position 391 in UniProtKB-Q6C577 (SEQ ID NO:64); P at a residue corresponding to position 400 in UniProtKB-Q6C577 (SEQ ID NO:64); G at a residue corresponding to position 423 in UniProtKB-Q6C577 (SEQ ID NO:64); Y at a residue corresponding to position 436 in UniProtKB-Q6C577 (SEQ ID NO:64); P at a residue corresponding to position 440 in UniProtKB-Q6C577 (SEQ ID NO:64); W at a residue corresponding to position 464 in UniProtKB-Q6C577 (SEQ ID NO:64); G at a residue corresponding to position 468 in UniProtKB-Q6C577 (SEQ ID NO:64); D at a residue corresponding to position 469 in UniProtKB-Q6C577 (SEQ ID NO:64); D at a residue corresponding to position 474 in UniProtKB-Q6C577 (SEQ ID NO:64); G at a residue corresponding to position 477 in UniProtKB-Q6C577 (SEQ ID NO:64); D at a residue corresponding to position 483 in UniProtKB-Q6C577 (SEQ ID NO:64); R at a residue corresponding to position 484 in UniProtKB-Q6C577 (SEQ ID NO:64); I at a residue corresponding to position 489 in UniProtKB-Q6C577 (SEQ ID NO:64); S at a residue corresponding to position 491 in UniProtKB-Q6C577 (SEQ ID NO:64); E at a residue corresponding to position 500 in UniProtKB-Q6C577 (SEQ ID NO:64); E at a residue corresponding to position 502 in UniProtKB-Q6C577 (SEQ ID NO:64); H at a residue corresponding to position 508 in UniProtKB-Q6C577 (SEQ ID NO:64); V at a residue corresponding to position 511 in UniProtKB-Q6C577 (SEQ ID NO:64); A at a residue corresponding to position 515 in UniProtKB-Q6C577 (SEQ ID NO:64); V at a residue corresponding to position 516 in UniProtKB-Q6C577 (SEQ ID NO:64); G at a residue corresponding to position 518 in UniProtKB-Q6C577 (SEQ ID NO:64); A at a residue corresponding to position 531 in UniProtKB-Q6C577 (SEQ ID NO:64); K at a residue corresponding to position 557 in UniProtKB-Q6C577 (SEQ ID NO:64); P at a residue corresponding to position 570 in UniProtKB-Q6C577 (SEQ ID NO:64); G at a residue corresponding to position 575 in UniProtKB-Q6C577 (SEQ ID NO:64); K at a residue corresponding to position 576 in UniProtKB-Q6C577 (SEQ ID NO:64); R at a residue corresponding to position 580 in UniProtKB-Q6C577 (SEQ ID NO:64); and/or L at a residue corresponding to position 582 in UniProtKB-Q6C577 (SEQ ID NO:64).


In some embodiments, an AAE described herein comprises one or more of: an amino acid sequence set forth as SGAAPLG (SEQ ID NO: 114); an amino acid sequence set forth as AYLGMSSGTSGG (SEQ ID NO: 115); an amino acid sequence set forth as DQPA (SEQ ID NO: 116); an amino acid sequence set forth as QVAPAELE (SEQ ID NO: 117); an amino acid sequence set forth as VVID (SEQ ID NO: 118); and/or an amino acid sequence set forth as SGKILRRLLR (SEQ ID NO: 119).


In some embodiments, an AAE described herein comprises: the amino acid sequence set forth as SGAAPLG (SEQ ID NO: 114) at residues corresponding to positions 319-325 in UniProtKB-Q6C577 (SEQ ID NO:64); the amino acid sequence set forth as AYLGMSSGTSGG (SEQ ID NO: 115) at residues corresponding to positions 194-205 in UniProtKB-Q6C577 (SEQ ID NO:64); the amino acid sequence set forth as DQPA (SEQ ID NO: 116) at residues corresponding to positions 398-401 in UniProtKB-Q6C577 (SEQ ID NO:64); the amino acid sequence set forth as QVAPAELE (SEQ ID NO: 117) at residues corresponding to positions 495-502 in UniProtKB-Q6C577 (SEQ ID NO:64); the amino acid sequence set forth as VVID (SEQ ID NO: 118) at residues corresponding to positions 564-567 in UniProtKB-Q6C577 (SEQ ID NO:64); and/or the amino acid sequence set forth as SGKILRRLLR (SEQ ID NO: 119) at residues corresponding to positions 574-583 in UniProtKB-Q6C577 (SEQ ID NO:64).
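
The motif coordinates above and the per-position residues listed earlier for SEQ ID NO: 64 should agree wherever they overlap. The following Python sketch restates only positions given in this section (the names `MOTIFS`, `PER_POSITION`, and `residue_from_motifs` are hypothetical) and checks that each motif spans exactly as many positions as it has residues, and that every single-residue entry falling inside a motif matches that motif.

```python
from typing import Dict, Optional, Tuple

# Motifs and their inclusive, 1-based positions in UniProtKB-Q6C577
# (SEQ ID NO: 64), copied from the paragraph above.
MOTIFS: Dict[str, Tuple[int, int]] = {
    "SGAAPLG": (319, 325),       # SEQ ID NO: 114
    "AYLGMSSGTSGG": (194, 205),  # SEQ ID NO: 115
    "DQPA": (398, 401),          # SEQ ID NO: 116
    "QVAPAELE": (495, 502),      # SEQ ID NO: 117
    "VVID": (564, 567),          # SEQ ID NO: 118
    "SGKILRRLLR": (574, 583),    # SEQ ID NO: 119
}


def residue_from_motifs(position: int) -> Optional[str]:
    """Return the residue a motif places at `position`, or None if no motif covers it."""
    for motif, (start, end) in MOTIFS.items():
        # A motif must occupy exactly len(motif) consecutive positions.
        if len(motif) != end - start + 1:
            raise ValueError(f"{motif} does not fit positions {start}-{end}")
        if start <= position <= end:
            return motif[position - start]
    return None


# Single-residue entries from the per-position list that fall inside a motif.
PER_POSITION = {195: "Y", 205: "G", 321: "A", 400: "P", 500: "E",
                502: "E", 575: "G", 576: "K", 580: "R", 582: "L"}

mismatches = {pos: aa for pos, aa in PER_POSITION.items()
              if residue_from_motifs(pos) != aa}
print(mismatches)  # an empty dict means the two descriptions agree
```

An empty result indicates that the motif-level and residue-level descriptions of SEQ ID NO: 64 are mutually consistent for these positions.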


In some embodiments, an AAE described herein comprises: an amino acid sequence with no more than three amino acid substitutions at residues corresponding to positions 428-440 in UniProtKB-Q6C577 (SEQ ID NO:64); or an amino acid sequence with no more than one amino acid substitution at residues corresponding to positions 482-491 in UniProtKB-Q6C577 (SEQ ID NO:64).


In some embodiments, an AAE described herein comprises: I or V at a residue corresponding to position 432 in UniProtKB-Q6C577 (SEQ ID NO:64); S or D at a residue corresponding to position 434 in UniProtKB-Q6C577 (SEQ ID NO:64); K or N at a residue corresponding to position 438 in UniProtKB-Q6C577 (SEQ ID NO:64); and/or L or M at a residue corresponding to position 488 in UniProtKB-Q6C577 (SEQ ID NO:64).


In some embodiments, an AAE described herein comprises: an amino acid sequence set forth as RGPQIMSGYHKNP (SEQ ID NO: 120); an amino acid sequence set forth as RGPQVMDGYHNNP (SEQ ID NO: 121); an amino acid sequence set forth as RGPQIMDGYHKNP (SEQ ID NO: 122); an amino acid sequence set forth as VDRTKELIKS (SEQ ID NO: 123); and/or an amino acid sequence set forth as VDRTKEMIKS (SEQ ID NO: 124).
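
The 13-residue variants (SEQ ID NOs: 120-122) match the length of the window at positions 428-440, and the 10-residue variants (SEQ ID NOs: 123-124) match the window at positions 482-491. Reading them as occupying those windows is an interpretation, not a statement in the text. On that reading, the short Python sketch below (function names hypothetical) finds where the variants differ and how many substitutions separate any two of them.

```python
from itertools import combinations
from typing import Dict, Set

# Motif variants listed above, keyed by SEQ ID NO (assumed window placement).
WINDOW_428_440 = {120: "RGPQIMSGYHKNP", 121: "RGPQVMDGYHNNP", 122: "RGPQIMDGYHKNP"}
WINDOW_482_491 = {123: "VDRTKELIKS", 124: "VDRTKEMIKS"}


def variable_positions(variants: Dict[int, str], start: int) -> Set[int]:
    """Positions (SEQ ID NO: 64 numbering) at which the variants disagree."""
    seqs = list(variants.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("variants must have equal length")
    return {start + i for i in range(length) if len({s[i] for s in seqs}) > 1}


def max_pairwise_substitutions(variants: Dict[int, str]) -> int:
    """Largest number of mismatched positions between any two variants."""
    return max(sum(a != b for a, b in zip(x, y))
               for x, y in combinations(variants.values(), 2))


print(sorted(variable_positions(WINDOW_428_440, 428)))  # [432, 434, 438]
print(sorted(variable_positions(WINDOW_482_491, 482)))  # [488]
print(max_pairwise_substitutions(WINDOW_428_440))       # 3
print(max_pairwise_substitutions(WINDOW_482_491))       # 1
```

Under this reading, the variants differ only at positions 432, 434, 438, and 488, the same positions for which alternative residues are listed above, and no two variants differ by more than three (positions 428-440) or one (positions 482-491) substitutions.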


A recombinant host cell that expresses a heterologous gene encoding an AAE described herein may be capable of producing at least 1% (e.g., at least 5%, at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 100%, at least 125%, at least 150%, at least 175%, at least 200%, at least 300%, at least 400%, at least 500%, at least 600%, at least 700%, at least 800%, at least 900%, or at least 1,000%) more hexanoyl-CoA and/or butanoyl-CoA relative to a control. In some embodiments, a control is a host cell that does not express a heterologous gene encoding an AAE.
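
The percentages above are relative increases over the control. A minimal Python sketch of that arithmetic follows; the function name and the titers are hypothetical and are used for illustration only.

```python
def percent_increase(test_titer: float, control_titer: float) -> float:
    """Percent more product made by a test strain than by a control strain."""
    if control_titer <= 0:
        # A relative increase is undefined if the control makes no product.
        raise ValueError("control titer must be positive")
    return 100.0 * (test_titer - control_titer) / control_titer


# Hypothetical titers in mg/L, for illustration only.
print(percent_increase(5.0, 2.0))   # 150.0
print(percent_increase(22.0, 2.0))  # 1000.0
```

Because the value is relative to the control, "100% more" corresponds to twice the control level and "1,000% more" to eleven times the control level.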


Polyketide Synthases (PKS)


A host cell described in this application may comprise a PKS. As used in this application, a “PKS” refers to an enzyme that is capable of producing a polyketide. In certain embodiments, a PKS converts a compound of Formula (2) to a compound of Formula (4), (5), and/or (6). In certain embodiments, a PKS converts a compound of Formula (2) to a compound of Formula (4). In certain embodiments, a PKS converts a compound of Formula (2) to a compound of Formula (5). In certain embodiments, a PKS converts a compound of Formula (2) to a compound of Formula (4) and/or (5). In certain embodiments, a PKS converts a compound of Formula (2) to a compound of Formula (5) and/or (6).


In some embodiments, a PKS is a tetraketide synthase (TKS). In certain embodiments, a PKS is an olivetol synthase (OLS). As used in this application, an “OLS” refers to an enzyme that is capable of using a substrate of Formula (2a) to form a compound of Formula (4a), (5a), and/or (6a) as shown in FIG. 1. In some embodiments, an OLS catalyzes the formation of olivetol (Formula (5a)). In some embodiments, an OLS catalyzes the formation of olivetol with minimal production of 3,5,7-trioxoalkanoyl-CoA and/or olivetolic acid. In some instances, an OLS that is capable of catalyzing the formation of olivetol may be useful in providing olivetol as a substrate for a prenyltransferase. As a non-limiting example, NphB can use olivetol as a substrate. See, e.g., Kumano et al., Bioorg Med Chem. 2008 Sep. 1; 16(17): 8117-8126.


In certain embodiments, a PKS is a divarinic acid synthase (DVS).


A non-limiting example of an OLS is provided by UniProtKB-B1Q2B6 from C. sativa. In C. sativa, this OLS uses hexanoyl-CoA and malonyl-CoA as substrates to form 3,5,7-trioxododecanoyl-CoA. OLS (e.g., UniProtKB-B1Q2B6) in combination with olivetolic acid cyclase (OAC) produces olivetolic acid (OA) in C. sativa.
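
The product name can be checked by counting carbons. In a polyketide condensation, each malonyl-CoA extender is decarboxylated and contributes two carbons. A C6 hexanoyl starter extended three times therefore gives a C12 chain with keto groups at C3, C5, and C7, i.e., 3,5,7-trioxododecanoyl-CoA. The small Python sketch below encodes that arithmetic; the function name is hypothetical, and it counts only the acyl chain, not the CoA moiety.

```python
from typing import List, Tuple


def linear_polyketide(starter_carbons: int, extensions: int) -> Tuple[int, List[int]]:
    """Chain length and keto positions after `extensions` malonyl-CoA condensations.

    C1 is the thioester carbon. Each condensation adds two carbons and turns
    the previous thioester carbonyl into a keto group at C3 of the new chain.
    """
    carbons = starter_carbons + 2 * extensions
    keto_positions = [2 * k + 1 for k in range(1, extensions + 1)]
    return carbons, keto_positions


print(linear_polyketide(6, 3))  # (12, [3, 5, 7]) -> 3,5,7-trioxododecanoyl-CoA
```

By the same arithmetic, a C4 butanoyl starter with three extensions would give a C10 chain.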


The amino acid sequence of UniProtKB-B1Q2B6 is:


(SEQ ID NO: 5)

MNHLRAEGPASVLAIGTANPENILLQDEFPDYYFRVTKSEHMTQLKEKFR
KICDKSMIRKRNCFLNEEHLKQNPRLVEHEMQTLDARQDMLVVEVPKLGK
DACAKAIKEWGQPKSKITHLIFTSASTTDMPGADYHCAKLLGLSPSVKRV
MMYQLGCYGGGTVLRIAKDIAENNKGARVLAVCCDIMACLFRGPSESDLE
LLVGQAIFGDGAAAVIVGAEPDESVGERPIFELVSTGQTILPNSEGTIGG
HIREAGLIFDLHKDVPMLISNNIEKCLIEAFTPIGISDWNSIFWITHPGG
KAILDKVEEKLHLKSDKFVDSRHVLSEHGNIVISSSTVLFVMDELRKRSL
EEGKSTTGDGFEWGVLFGFGPGLTVERVVVRSVPIKY.


Structurally, an OLS comprises a triad of conserved residues that have been implicated as catalytic residues; this triad may be referred to as a catalytic triad. See, e.g., Taura et al., FEBS Letters 583 (2009) 2061-2066. The catalytic triad of UniProtKB-B1Q2B6 (SEQ ID NO: 5) comprises C157, H297, and N330. One of ordinary skill in the art would be able to identify corresponding catalytic residues in other PKSs, including OLSs, by aligning the amino acid sequence of interest with UniProtKB-B1Q2B6. A PKS, including an OLS, may comprise the amino acid C at a residue corresponding to position 157 in SEQ ID NO: 5, the amino acid H at a residue corresponding to position 297 in SEQ ID NO: 5, and the amino acid N at a residue corresponding to position 330 in SEQ ID NO: 5. As a non-limiting example, the residues corresponding to positions 157, 297, and 330 in SEQ ID NO: 5 are C164, H304, and N337, respectively, in SEQ ID NO: 6. Similarly, the residues corresponding to positions 157, 297, and 330 in SEQ ID NO: 5 are C164, H304, and N337, respectively, in SEQ ID NO: 7.
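
Identifying "corresponding" residues involves two steps: reading a residue at a 1-based position, and carrying that position through a pairwise alignment. The Python sketch below illustrates both under stated assumptions. The SEQ ID NO: 5 string is copied from the listing above. The aligned strings passed to `map_position` are assumed to come from any pairwise alignment program; the toy alignment shown is hypothetical and is not SEQ ID NO: 6 or 7. Function names are hypothetical.

```python
from typing import Optional

# SEQ ID NO: 5 (UniProtKB-B1Q2B6) as printed above, joined into one string.
SEQ_ID_NO_5 = (
    "MNHLRAEGPASVLAIGTANPENILLQDEFPDYYFRVTKSEHMTQLKEKFR"
    "KICDKSMIRKRNCFLNEEHLKQNPRLVEHEMQTLDARQDMLVVEVPKLGK"
    "DACAKAIKEWGQPKSKITHLIFTSASTTDMPGADYHCAKLLGLSPSVKRV"
    "MMYQLGCYGGGTVLRIAKDIAENNKGARVLAVCCDIMACLFRGPSESDLE"
    "LLVGQAIFGDGAAAVIVGAEPDESVGERPIFELVSTGQTILPNSEGTIGG"
    "HIREAGLIFDLHKDVPMLISNNIEKCLIEAFTPIGISDWNSIFWITHPGG"
    "KAILDKVEEKLHLKSDKFVDSRHVLSEHGNIVISSSTVLFVMDELRKRSL"
    "EEGKSTTGDGFEWGVLFGFGPGLTVERVVVRSVPIKY"
)


def residue(sequence: str, position: int) -> str:
    """Residue at a 1-based position, the numbering used in this document."""
    if not 1 <= position <= len(sequence):
        raise IndexError(position)
    return sequence[position - 1]


def map_position(aligned_ref: str, aligned_other: str, ref_pos: int) -> Optional[int]:
    """Map a 1-based reference position through a gapped pairwise alignment.

    Returns the corresponding 1-based position in the other sequence, or
    None if the reference residue is aligned to a gap.
    """
    if len(aligned_ref) != len(aligned_other):
        raise ValueError("aligned strings must have equal length")
    ref_i = other_i = 0
    for a, b in zip(aligned_ref, aligned_other):
        if a != "-":
            ref_i += 1
        if b != "-":
            other_i += 1
        if a != "-" and ref_i == ref_pos:
            return other_i if b != "-" else None
    raise ValueError("ref_pos exceeds the reference length")


print([residue(SEQ_ID_NO_5, p) for p in (157, 297, 330)])  # ['C', 'H', 'N']
# Toy alignment (hypothetical): two extra N-terminal residues shift numbering by 2.
print(map_position("--MNHL", "MKMNHL", 4))  # 6
```

In the same way, a constant N-terminal offset of seven residues in an alignment would map positions 157, 297, and 330 to 164, 304, and 337.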


The active site of a PKS may be defined by generating a three-dimensional structural model of the PKS and identifying the residues within a particular distance of any of the residues within the catalytic triad and/or within a particular distance of a substrate docked within the PKS (e.g., a compound of Formula (2)). A substrate docks, or binds, in the substrate binding pocket of a PKS, and the substrate binding pocket may comprise the active site of the PKS. As a non-limiting example, a structural model of a PKS may be generated using ROSETTA software. See, e.g., Kaufmann et al., Biochemistry 2010, 49, 2987-2998.


As used herein, a residue is within the active site of an OLS enzyme if it is within about 12 angstroms of any of the residues within the catalytic triad of the OLS enzyme and/or within about 12 angstroms of a docked substrate within the OLS enzyme.


In some embodiments, a residue is within 12 angstroms (Å), within 11 Å, within 10 Å, within 9 Å, within 8 Å, within 7 Å, within 6 Å, within 5 Å, within 4 Å, within 3 Å, within 2 Å, or within 1 Å of any of the residues within the catalytic triad (i.e., residues corresponding to positions 157, 297, and 330 in SEQ ID NO: 5) and/or of a docked substrate (e.g., hexanoyl-CoA).
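
Once a structural model is available, the distance criterion above reduces to a simple geometric filter. The Python sketch below assumes one representative point per residue (for example a Cα atom or side-chain centroid) and a list of reference points (catalytic-triad atoms and/or docked-substrate atoms). A full analysis would more typically use the minimum atom-to-atom distance. All coordinates and residue numbers here are hypothetical.

```python
import math
from typing import Dict, Iterable, Set, Tuple

Point = Tuple[float, float, float]


def residues_within(cutoff: float,
                    residue_points: Dict[int, Point],
                    reference_points: Iterable[Point]) -> Set[int]:
    """Residue positions whose representative point lies within `cutoff` (in Å)
    of at least one reference point (triad and/or docked-substrate atoms)."""
    refs = list(reference_points)
    return {pos for pos, xyz in residue_points.items()
            if any(math.dist(xyz, ref) <= cutoff for ref in refs)}


# Hypothetical coordinates in Å, for illustration only.
triad = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
residues = {10: (5.0, 0.0, 0.0), 20: (0.0, 0.0, 12.0), 30: (20.0, 0.0, 0.0)}

print(sorted(residues_within(12.0, residues, triad)))  # [10, 20]
print(sorted(residues_within(8.0, residues, triad)))   # [10]
```

Tightening the cutoff from 12 Å to 8 Å shrinks the selected set. This mirrors the nested lists of positions given for SEQ ID NO: 5.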


In some embodiments, a residue in a PKS is within 20 Å, within 19 Å, within 18 Å, within 17 Å, within 16 Å, within 15 Å, within 14 Å, within 13 Å, within 12 Å, within 11 Å, within 10 Å, within 9 Å, within 8 Å, within 7 Å, within 6 Å, within 5 Å, within 4 Å, within 3 Å, within 2 Å, and/or within 1 Å from any of the residues within the catalytic triad (i.e., residues in the PKS corresponding to positions 157, 297, and 330 in SEQ ID NO: 5) and/or a docked substrate.


As a non-limiting example, positions 17, 23, 25, 51, 54, 64, 95, 123, 125, 153, 196, 201, 207, 241, 247, 267, 273, 277, 296, 307, 320, 324, 326, 328, 334, 335, and 375 in SEQ ID NO: 5 may be located within the active site of a PKS comprising SEQ ID NO: 5. Positions 51, 54, 123, 125, 201, 207, 241, 247, 267, 273, 296, 307, 324, 326, 328, 334, 335, and 375 in SEQ ID NO: 5 may be located within about 8 Å from any of the residues within the catalytic triad and/or a docked substrate of the PKS comprising SEQ ID NO: 5.


In some embodiments, a PKS comprises an amino acid substitution, insertion, or deletion at a residue that is within the active site of the PKS. In some embodiments, a PKS comprises an amino acid substitution, insertion, or deletion at a residue that is within 12 angstroms (Å), within 11 Å, within 10 Å, within 9 Å, within 8 Å, within 7 Å, within 6 Å, within 5 Å, within 4 Å, within 3 Å, within 2 Å, or within 1 Å of any one of the catalytic triad residues (i.e., positions 157, 297, and 330 in SEQ ID NO: 5) and/or of a docked substrate. In some embodiments, the amino acid substitution, insertion, or deletion is at a residue corresponding to position 17, 23, 25, 51, 54, 64, 95, 123, 125, 153, 196, 201, 207, 241, 247, 267, 273, 277, 296, 307, 320, 324, 326, 328, 334, 335, and/or 375 in SEQ ID NO: 5. In some embodiments, a residue in a PKS corresponding to position 17, 23, 25, 51, 54, 64, 95, 123, 125, 153, 196, 201, 207, 241, 247, 267, 273, 277, 296, 307, 320, 324, 326, 328, 334, 335, and/or 375 in SEQ ID NO: 5 is located within 12 Å of the catalytic triad and/or a docked substrate of the PKS. In some embodiments, a residue in a PKS corresponding to position 51, 54, 123, 125, 201, 207, 241, 247, 267, 273, 296, 307, 324, 326, 328, 334, 335, and/or 375 in SEQ ID NO: 5 is located within 8 Å of the catalytic triad and/or a docked substrate of the PKS. In some embodiments, the PKS comprises one or more of: T17K, I23C, L25R, K51R, D54R, F64Y, V95A, T123C, A125S, Y153G, E196K, L201C, I207L, L241I, T247A, M267K, M267G, I273V, L277M, T296A, V307I, D320A, V324I, S326R, H328Y, S334P, S334A, T335C, and/or R375T relative to SEQ ID NO: 5. In some embodiments, a host cell comprising one or more of these amino acid substitutions relative to SEQ ID NO: 5 is capable of producing at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or more than 15 mg/L olivetol, including all values in between.
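
Substitutions such as T17K use the wild-type residue, the 1-based position, and the replacement residue. The Python sketch below (all names hypothetical) applies a list of such substitutions to a sequence. It checks that each named wild-type residue matches the sequence and refuses two substitutions at one position, since listed alternatives such as M267K and M267G, or S334P and S334A, cannot coexist in a single variant. The example uses the first 60 residues of SEQ ID NO: 5 as printed above.

```python
import re
from typing import Iterable

# One-letter substitution such as "T17K": wild type, 1-based position, mutant.
SUBSTITUTION = re.compile(r"([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])")


def apply_substitutions(sequence: str, substitutions: Iterable[str]) -> str:
    """Return `sequence` with the given substitutions applied.

    Raises ValueError if a substitution is malformed, out of range, names the
    wrong wild-type residue, or targets a position that was already substituted.
    """
    residues = list(sequence)
    used = set()
    for sub in substitutions:
        match = SUBSTITUTION.fullmatch(sub)
        if match is None:
            raise ValueError(f"unrecognized substitution {sub!r}")
        wild_type, pos, mutant = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(sequence):
            raise ValueError(f"{sub}: position outside sequence")
        if sequence[pos - 1] != wild_type:
            raise ValueError(f"{sub}: residue {pos} is {sequence[pos - 1]}, not {wild_type}")
        if pos in used:
            raise ValueError(f"{sub}: position {pos} already substituted")
        used.add(pos)
        residues[pos - 1] = mutant
    return "".join(residues)


# First 60 residues of SEQ ID NO: 5, as printed above.
SEQ5_N_TERMINUS = ("MNHLRAEGPASVLAIGTANPENILLQDEFPDYYFRVTKSEHMTQLKEKFR"
                   "KICDKSMIRK")

variant = apply_substitutions(SEQ5_N_TERMINUS, ["T17K", "I23C", "L25R", "K51R", "D54R"])
print(variant[16], variant[22], variant[24], variant[50], variant[53])  # K C R R R
```

Validating the wild-type residue is a cheap guard against numbering errors, for example when positions from one sequence are applied to a homolog without first mapping them through an alignment.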


In some embodiments, a PKS comprises an amino acid substitution, insertion, or deletion at a residue that is more than 12 angstroms (Å), more than 11 Å, more than 10 Å, more than 9 Å, more than 8 Å, more than 7 Å, more than 6 Å, more than 5 Å, more than 4 Å, more than 3 Å, more than 2 Å, or more than 1 Å away from the catalytic triad (i.e., residues corresponding to positions 157, 297, and 330 in SEQ ID NO: 5) and/or from a docked substrate. In some embodiments, the residue corresponds to position 71, 92, 100, 108, 116, 128, 135, 229, 278, 284, and/or 348 in SEQ ID NO: 5. In some embodiments, the residue in a PKS corresponding to position 71, 92, 100, 108, 116, 128, 135, 229, 278, 284, and/or 348 in SEQ ID NO: 5 is more than 12 Å from the catalytic triad and/or a docked substrate of the PKS. In some embodiments, the PKS comprises one or more of the following substitutions relative to SEQ ID NO: 5: I284Y, K100L, K116R, I278E, K108D, L348S, K71R, V92G, T128V, K100M, Y135V, P229A, T128A, and/or T128I. In some embodiments, a host cell comprising one or more of these substitutions is capable of producing at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or more than 15 mg/L olivetol, including all values in between.


In some embodiments, a PKS comprises the amino acid C at a residue corresponding to position 335 of SEQ ID NO: 5. In some embodiments, a PKS comprises the amino acid substitution T335C relative to a control. In some embodiments, the control is a PKS comprising SEQ ID NO: 5. In some embodiments, a PKS comprises a sequence at least 5%, at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 71%, at least 72%, at least 73%, at least 74%, at least 75%, at least 76%, at least 77%, at least 78%, at least 79%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or is 100% identical, including all values in between, to a sequence (e.g., amino acid or nucleic acid sequence) set forth in any one of SEQ ID NOs: 38, 172, 175, 176, 196, 204, 205, 7, 17, 145, 13, 8, or 15. In some embodiments, a PKS comprises a sequence at most 5%, at most 10%, at most 15%, at most 20%, at most 25%, at most 30%, at most 35%, at most 40%, at most 45%, at most 50%, at most 55%, at most 60%, at most 65%, at most 70%, at most 71%, at most 72%, at most 73%, at most 74%, at most 75%, at most 76%, at most 77%, at most 78%, at most 79%, at most 80%, at most 81%, at most 82%, at most 83%, at most 84%, at most 85%, at most 86%, at most 87%, at most 88%, at most 89%, at most 90%, at most 91%, at most 92%, at most 93%, at most 94%, at most 95%, at most 96%, at most 97%, at most 98%, at most 99%, or is 100% identical, including all values in between, to a sequence (e.g., amino acid or nucleic acid sequence) set forth in any one of SEQ ID NOs: 38, 172, 175, 176, 196, 204, 205, 7, 17, 145, 13, 8, or 15.


In some embodiments, a PKS described herein comprises a sequence that is at least 5%, at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 71%, at least 72%, at least 73%, at least 74%, at least 75%, at least 76%, at least 77%, at least 78%, at least 79%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or is 100% identical, including all values in between, to a sequence (e.g., nucleic acid or amino acid sequence) set forth in UniProtKB-A0A088G5Z5 (SEQ ID NO: 7), SEQ ID NO: 714, SEQ ID NO: 715, or SEQ ID NO: 38.
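
Percent identity depends on how it is normalized, and the text does not fix a convention. The Python sketch below assumes one common choice: identical columns divided by all columns of a pairwise alignment, with gap columns counted in the denominator. Dividing by the length of the shorter sequence, or ignoring gap columns, gives different values for the same alignment. The function name and sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns that hold the same residue in both sequences.

    Convention assumed here: identical columns / all alignment columns,
    with gap ("-") columns included in the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned strings must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    identical = sum(a == b and a != "-" for a, b in zip(aligned_a, aligned_b))
    return 100.0 * identical / len(aligned_a)


print(percent_identity("ACDE", "ACDF"))    # 75.0
print(percent_identity("AC-DE", "ACGDE"))  # 80.0
```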


In some embodiments, relative to the sequence of SEQ ID NO: 7, the PKS comprises an amino acid substitution at a residue corresponding to position 28, 34, 50, 70, 71, 76, 88, 100, 151, 203, 219, 285, 359, and/or 385 in SEQ ID NO: 7. In some embodiments, the PKS comprises: the amino acid P at a residue corresponding to position 28 in SEQ ID NO: 7; the amino acid Q at a residue corresponding to position 34 in SEQ ID NO: 7; the amino acid N at a residue corresponding to position 50 in SEQ ID NO: 7; the amino acid M at a residue corresponding to position 70 in SEQ ID NO: 7; the amino acid Y at a residue corresponding to position 71 in SEQ ID NO: 7; the amino acid I at a residue corresponding to position 76 in SEQ ID NO: 7; the amino acid A at a residue corresponding to position 88 in SEQ ID NO: 7; the amino acid P or T at a residue corresponding to position 100 in SEQ ID NO: 7; the amino acid P at a residue corresponding to position 151 in SEQ ID NO: 7; the amino acid K at a residue corresponding to position 203 in SEQ ID NO: 7; the amino acid C at a residue corresponding to position 219 in SEQ ID NO: 7; the amino acid A at a residue corresponding to position 285 in SEQ ID NO: 7; the amino acid M at a residue corresponding to position 359 in SEQ ID NO: 7; and/or the amino acid M at a residue corresponding to position 385 in SEQ ID NO: 7. In some embodiments, the PKS comprises one or more of the following amino acid substitutions relative to SEQ ID NO: 7: E28P, S34Q, V50N, F70M, V71Y, L76I, D88A, R100P, R100T, N151P, E203K, A219C, E285A, K359M, and/or L385M. In some embodiments, the PKS comprises V71Y and/or F70M. In some embodiments, the PKS comprises C at a residue corresponding to position 164 in SEQ ID NO: 7; H at a residue corresponding to position 304 in SEQ ID NO: 7; and/or N at a residue corresponding to position 337 in SEQ ID NO: 7.


In some embodiments, a host cell with a PKS that comprises an amino acid substitution at a residue corresponding to position 28, 34, 50, 70, 71, 76, 88, 100, 151, 203, 219, 285, 359, and/or 385 in SEQ ID NO: 7 produces at least 1% (e.g., at least 5%, at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 100%, at least 125%, at least 150%, at least 175%, at least 200%, at least 300%, at least 400%, at least 500%, at least 600%, at least 700%, at least 800%, at least 900%, or at least 1,000%) more of a product (e.g., a compound of Formula (4), (5), and/or (6)) relative to a host cell comprising a PKS of SEQ ID NO: 7.


In some embodiments, a PKS described herein comprises: A at a residue corresponding to position 17 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); A at a residue corresponding to position 21 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); I at a residue corresponding to position 22 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); G at a residue corresponding to position 23 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); Q at a residue corresponding to position 33 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); D at a residue corresponding to position 38 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); F at a residue corresponding to position 41 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); L at a residue corresponding to position 52 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); K at a residue corresponding to position 55 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); C at a residue corresponding to position 60 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); R at a residue corresponding to position 68 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); R at a residue corresponding to position 94 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); A at a residue corresponding to position 109 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); A at a residue corresponding to position 113 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); W at a residue corresponding to position 117 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); G at a residue corresponding to position 118 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); S at a residue corresponding to position 122 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); I at a residue corresponding to position 124 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); T at a residue corresponding to position 125 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); H at a residue corresponding to position 126 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); P at a residue corresponding to position 138 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); D at a residue corresponding to position 141 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); L at a residue corresponding to position 150 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); G at a residue corresponding to position 163 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); C at a residue corresponding to position 164 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); G at a residue corresponding to position 168 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); R at a residue corresponding to position 172 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); K at a residue corresponding to position 175 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); E at a residue corresponding to position 179 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); R at a residue corresponding to position 185 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); L at a residue corresponding to position 187 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); V at a residue corresponding to position 189 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); C at a residue corresponding to position 190 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); P at a residue corresponding to position 201 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); F at a residue corresponding to position 215 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); D at a residue corresponding to position 217 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); G at a residue corresponding to position 218 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); G at a residue corresponding to position 225 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); E at a residue corresponding to position 234 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); G at a residue corresponding to position 263 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); P at a residue corresponding to position 273 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); F at a residue corresponding to position 288 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); D at a residue corresponding to position 295 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); N at a residue corresponding to position 297 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); F at a residue corresponding to position 300 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); H at a residue corresponding to position 304 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); G at a residue corresponding to position 306 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); G at a residue corresponding to position 307 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); L at a residue corresponding to position 311 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); G at a residue corresponding to position 336 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); N at a residue corresponding to position 337 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); M at a residue corresponding to position 338 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); V at a residue corresponding to position 343 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); D at a residue corresponding to position 348 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); R at a residue corresponding to position 351 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); G at a residue corresponding to position 363 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); G at a residue corresponding to position 365 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); G at a residue corresponding to position 369 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); G at a residue corresponding to position 375 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); P at a residue corresponding to position 376 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); G at a residue corresponding to position 377 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); E at a residue corresponding to position 381 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); and/or S at a residue corresponding to position 387 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6).


In some embodiments, a PKS described herein comprises: S, T, or G at a residue corresponding to position 18 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); V or I at a residue corresponding to position 19 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); E, P, S, A, or D at a residue corresponding to position 28 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); I, C, S, F, Y, Q, H, A, or V at a residue corresponding to position 30 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); D, S, C, I, or A at a residue corresponding to position 34 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); F or Y at a residue corresponding to position 36 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); Y, F, or V at a residue corresponding to position 39 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); K, N, D, or S at a residue corresponding to position 45 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); K, R, or H at a residue corresponding to position 58 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); F, H, V, Y, or N at a residue corresponding to position 71 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); R, N, C, E, S, or H at a residue corresponding to position 82 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); M, A, D, S, E, or V at a residue corresponding to position 88 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); Q, P, S, N, L, or K at a residue corresponding to position 89 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); T or S at a residue corresponding to position 90 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); M, I, F, L, or V at a residue corresponding to position 97 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); D, E, K, or A at a residue corresponding to position 108 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); C, A, or S at a residue corresponding to position 110 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); T, C, or Y at a residue corresponding to position 130 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); S or T at a residue corresponding to position 131 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); A, S, T, or I at a residue corresponding to position 132 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); L, Q, or H at a residue corresponding to position 162 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); G, A, or S at a residue corresponding to position 166 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); I, L, V, T, M, or Y at a residue corresponding to position 173 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); I, L, F, or V at a residue corresponding to position 177 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); C, S, or A at a residue corresponding to position 191 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); D or E at a residue corresponding to position 192 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); M or T at a residue corresponding to position 194 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); L, C, T, S, M, or N at a residue corresponding to position 197 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); E or D at a residue corresponding to position 207 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); V, L, M, or I at a residue corresponding to position 222 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); I, L, C, S, V, or M at a residue corresponding to position 237 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); T, A, N, or S at a residue corresponding to position 243 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); N, D, E, or G at a residue corresponding to position 250 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); I or L at a residue corresponding to position 299 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); K, P, or R at a residue corresponding to position 308 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); F, L, or M at a residue corresponding to position 325 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); H or Y at a residue corresponding to position 335 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); M or L at a residue corresponding to position 347 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); L, M, I, or T at a residue corresponding to position 350 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); F, L, or M at a residue corresponding to position 366 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6); and/or R or T at a residue corresponding to position 382 in UniProtKB-A0A1R3HSU5 (SEQ ID NO:6).
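
Each entry in the list above allows a small set of residues at a given position of SEQ ID NO: 6. The Python sketch below encodes a few of those entries and reports positions where a candidate PKS carries a residue outside the allowed set. It assumes the candidate's residues have already been mapped to SEQ ID NO: 6 numbering, for example through a pairwise alignment. The dictionary and function names are hypothetical.

```python
from typing import Dict, Set

# A subset of the per-position alternatives listed above,
# in UniProtKB-A0A1R3HSU5 (SEQ ID NO: 6) numbering.
ALLOWED: Dict[int, Set[str]] = {
    18: set("STG"),
    19: set("VI"),
    28: set("EPSAD"),
    36: set("FY"),
    90: set("TS"),
}


def disallowed_residues(observed: Dict[int, str],
                        allowed: Dict[int, Set[str]] = ALLOWED) -> Dict[int, str]:
    """Positions whose observed residue is not among the listed alternatives.

    `observed` maps SEQ ID NO: 6 positions to the residue found at the
    corresponding position of a candidate PKS. Positions without a listed
    constraint are ignored.
    """
    return {pos: aa for pos, aa in observed.items()
            if pos in allowed and aa not in allowed[pos]}


print(disallowed_residues({18: "S", 19: "I", 28: "P", 36: "Y", 90: "T"}))  # {}
print(disallowed_residues({18: "A", 19: "V", 50: "W"}))  # {18: 'A'}
```

Extending `ALLOWED` with the remaining entries from the list gives a complete checker for this embodiment.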


In some embodiments, a PKS comprises a sequence that is at least 5%, at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 71%, at least 72%, at least 73%, at least 74%, at least 75%, at least 76%, at least 77%, at least 78%, at least 79%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or is 100% identical, including all values in between, to a sequence (e.g., nucleic acid or amino acid sequence) set forth in SEQ ID NOs: 1-31, 77-92, 143-171, 207-249, 293-420, 549-627, 32-62, 93-108, 172-206, 250-292, 421-548, 628-705, and 706 or to a sequence selected from Tables 5-6 and 13-16.


In some embodiments, a PKS comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, at least 38, at least 39, at least 40, at least 41, at least 42, at least 43, at least 44, at least 45, at least 46, at least 47, at least 48, at least 49, at least 50, at least 51, at least 52, at least 53, at least 54, at least 55, at least 56, at least 57, at least 58, at least 59, at least 60, at least 61, at least 62, at least 63, at least 64, at least 65, at least 66, at least 67, at least 68, at least 69, at least 70, at least 71, at least 72, at least 73, at least 74, at least 75, at least 76, at least 77, at least 78, at least 79, at least 80, at least 81, at least 82, at least 83, at least 84, at least 85, at least 86, at least 87, at least 88, at least 89, at least 90, at least 91, at least 92, at least 93, at least 94, at least 95, at least 96, at least 97, at least 98, at least 99, at least 100, at least 101, at least 102, at least 103, at least 104, at least 105, at least 106, at least 107, at least 108, at least 109, at least 110, at least 111, at least 112, at least 113, at least 114, at least 115, at least 116, at least 117, at least 118, at least 119, at least 120, at least 121, at least 122, at least 123, at least 124, at least 125, at least 126, at least 127, at least 128, at least 129, at least 130, at least 131, at least 132, at least 133, at least 134, at least 135, at least 136, at least 137, at least 138, at least 139, at least 140, at least 141, at least 142, at least 143, at least 144, at least 145, at least 146, at least 147, at least 148, at least 149, at least 150, at least 151, at least 152, at least 153, at least 154, at least 155, at least 156, at least 157, at least 158, at least 159, at least 160, at least 161, at least 162, at least 163, at least 164, at least 165, at least 166, at least 167, at least 168, at least 169, at least 170, at least 171, at least 172, at least 173, at least 174, at least 175, at least 176, at least 177, at least 178, at least 179, at least 180, at least 181, at least 182, at least 183, at least 184, at least 185, at least 186, at least 187, at least 188, at least 189, at least 190, at least 191, at least 192, at least 193, at least 194, at least 195, at least 196, at least 197, at least 198, at least 199, at least 200, at least 201, at least 202, at least 203, at least 204, at least 205, at least 206, at least 207, at least 208, at least 209, at least 210, at least 211, at least 212, at least 213, at least 214, at least 215, at least 216, at least 217, at least 218, at least 219, at least 220, at least 221, at least 222, at least 223, at least 224, at least 225, at least 226, at least 227, at least 228, at least 229, at least 230, at least 231, at least 232, at least 233, at least 234, at least 235, at least 236, at least 237, at least 238, at least 239, at least 240, at least 241, at least 242, at least 243, at least 244, at least 245, at least 246, at least 247, at least 248, at least 249, at least 250, at least 251, at least 252, at least 253, at least 254, at least 255, at least 256, at least 257, at least 258, at least 259, at least 260, at least 261, at least 262, at least 263, at least 264, at least 265, at least 266, at least 267, at least 268, at least 269, at least 270, at least 271, at least 272, at least 273, at least 274, at least 275, at least 276, at least 277, at least 278, at least 279, at least 280, at least 281, at least 282, at least 283, at least 284, at least 285, at least 286, at least 287, at least 288, at least 289, at least 290, at least 291, at least 292, at least 293, at least 294, at least 295, at least 296, at least 297, at least 298, at least 299, at least 300, at least 301, at least 302, at least 303, at least 304, at least 305, at least 306, at least 307, at least 308, at least 309, at least 310, at least 311, at least 312, at least 313, at least 314, at least 315, at least 316, at least 317, at least 318, at least 319, at least 320, at least 321, at least 322, at least 323, at least 324, at least 325, at least 326, at least 327, at least 328, at least 329, at least 330, at least 331, at least 332, at least 333, at least 334, at least 335, at least 336, at least 337, at least 338, at least 339, at least 340, at least 341, at least 342, at least 343, at least 344, at least 345, at least 346, at least 347, at least 348, at least 349, at least 350, at least 351, at least 352, at least 353, at least 354, at least 355, at least 356, at least 357, at least 358, at least 359, at least 360, at least 361, at least 362, at least 363, at least 364, at least 365, at least 366, at least 367, at least 368, at least 369, at least 370, at least 371, at least 372, at least 373, at least 374, at least 375, at least 376, at least 377, at least 378, at least 379, or at least 380 amino acid substitutions, deletions, or insertions relative to SEQ ID NOs: 1-31, 77-92, 143-171, 207-249, 293-420, and 549-627 or to an amino acid sequence selected from Tables 5-6 and 13-16.


In some embodiments, a PKS comprises at most 1, at most 2, at most 3, at most 4, at most 5, at most 6, at most 7, at most 8, at most 9, at most 10, at most 11, at most 12, at most 13, at most 14, at most 15, at most 16, at most 17, at most 18, at most 19, at most 20, at most 21, at most 22, at most 23, at most 24, at most 25, at most 26, at most 27, at most 28, at most 29, at most 30, at most 31, at most 32, at most 33, at most 34, at most 35, at most 36, at most 37, at most 38, at most 39, at most 40, at most 41, at most 42, at most 43, at most 44, at most 45, at most 46, at most 47, at most 48, at most 49, at most 50, at most 51, at most 52, at most 53, at most 54, at most 55, at most 56, at most 57, at most 58, at most 59, at most 60, at most 61, at most 62, at most 63, at most 64, at most 65, at most 66, at most 67, at most 68, at most 69, at most 70, at most 71, at most 72, at most 73, at most 74, at most 75, at most 76, at most 77, at most 78, at most 79, at most 80, at most 81, at most 82, at most 83, at most 84, at most 85, at most 86, at most 87, at most 88, at most 89, at most 90, at most 91, at most 92, at most 93, at most 94, at most 95, at most 96, at most 97, at most 98, at most 99, at most 100, at most 101, at most 102, at most 103, at most 104, at most 105, at most 106, at most 107, at most 108, at most 109, at most 110, at most 111, at most 112, at most 113, at most 114, at most 115, at most 116, at most 117, at most 118, at most 119, at most 120, at most 121, at most 122, at most 123, at most 124, at most 125, at most 126, at most 127, at most 128, at most 129, at most 130, at most 131, at most 132, at most 133, at most 134, at most 135, at most 136, at most 137, at most 138, at most 139, at most 140, at most 141, at most 142, at most 143, at most 144, at most 145, at most 146, at most 147, at most 148, at most 149, at most 150, at most 151, at most 152, at most 153, at most 154, at most 155, at most 156, at most 157, at most 158, at most 159, at 
most 160, at most 161, at most 162, at most 163, at most 164, at most 165, at most 166, at most 167, at most 168, at most 169, at most 170, at most 171, at most 172, at most 173, at most 174, at most 175, at most 176, at most 177, at most 178, at most 179, at most 180, at most 181, at most 182, at most 183, at most 184, at most 185, at most 186, at most 187, at most 188, at most 189, at most 190, at most 191, at most 192, at most 193, at most 194, at most 195, at most 196, at most 197, at most 198, at most 199, at most 200, at most 201, at most 202, at most 203, at most 204, at most 205, at most 206, at most 207, at most 208, at most 209, at most 210, at most 211, at most 212, at most 213, at most 214, at most 215, at most 216, at most 217, at most 218, at most 219, at most 220, at most 221, at most 222, at most 223, at most 224, at most 225, at most 226, at most 227, at most 228, at most 229, at most 230, at most 231, at most 232, at most 233, at most 234, at most 235, at most 236, at most 237, at most 238, at most 239, at most 240, at most 241, at most 242, at most 243, at most 244, at most 245, at most 246, at most 247, at most 248, at most 249, at most 250, at most 251, at most 252, at most 253, at most 254, at most 255, at most 256, at most 257, at most 258, at most 259, at most 260, at most 261, at most 262, at most 263, at most 264, at most 265, at most 266, at most 267, at most 268, at most 269, at most 270, at most 271, at most 272, at most 273, at most 274, at most 275, at most 276, at most 277, at most 278, at most 279, at most 280, at most 281, at most 282, at most 283, at most 284, at most 285, at most 286, at most 287, at most 288, at most 289, at most 290, at most 291, at most 292, at most 293, at most 294, at most 295, at most 296, at most 297, at most 298, at most 299, at most 300, at most 301, at most 302, at most 303, at most 304, at most 305, at most 306, at most 307, at most 308, at most 309, at most 310, at most 311, at most 312, at most 313, 
at most 314, at most 315, at most 316, at most 317, at most 318, at most 319, at most 320, at most 321, at most 322, at most 323, at most 324, at most 325, at most 326, at most 327, at most 328, at most 329, at most 330, at most 331, at most 332, at most 333, at most 334, at most 335, at most 336, at most 337, at most 338, at most 339, at most 340, at most 341, at most 342, at most 343, at most 344, at most 345, at most 346, at most 347, at most 348, at most 349, at most 350, at most 351, at most 352, at most 353, at most 354, at most 355, at most 356, at most 357, at most 358, at most 359, at most 360, at most 361, at most 362, at most 363, at most 364, at most 365, at most 366, at most 367, at most 368, at most 369, at most 370, at most 371, at most 372, at most 373, at most 374, at most 375, at most 376, at most 377, at most 378, at most 379, or at most 380 amino acid substitutions, deletions, or insertions relative to SEQ ID NOs: 1-31, 77-92, 143-171, 207-249, 293-420, and 549-627 or to an amino acid sequence selected from Tables 5-6 and 13-16.
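
The limits above count amino acid substitutions, deletions, or insertions relative to a reference sequence. One common way to compute such a count is the Levenshtein (edit) distance, sketched below in Python. This is an illustrative assumption, not the counting method specified in this application; the function names are hypothetical, and in practice the count may instead be read from a pairwise alignment.

```python
def edit_distance(reference: str, variant: str) -> int:
    """Levenshtein distance: the minimum number of single-residue substitutions,
    insertions, and deletions that turn `reference` into `variant`."""
    # prev[j] holds the distance between reference[:i-1] and variant[:j].
    prev = list(range(len(variant) + 1))
    for i, ref_aa in enumerate(reference, start=1):
        curr = [i] + [0] * len(variant)
        for j, var_aa in enumerate(variant, start=1):
            cost = 0 if ref_aa == var_aa else 1
            curr[j] = min(
                prev[j] + 1,         # delete ref_aa
                curr[j - 1] + 1,     # insert var_aa
                prev[j - 1] + cost,  # match or substitute
            )
        prev = curr
    return prev[-1]


def within_max_changes(reference: str, variant: str, max_changes: int) -> bool:
    """True if the variant differs from the reference by at most max_changes edits."""
    return edit_distance(reference, variant) <= max_changes


# Toy example: first 20 residues of SEQ ID NO: 125 versus a hypothetical variant
# carrying one substitution (K10R) and one C-terminal deletion.
ref = "MAVKHLIVLKFKDEITEAQK"
variant = "MAVKHLIVLRFKDEITEAQ"
print(edit_distance(ref, variant))  # 2
```
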


As one of ordinary skill in the art would appreciate, a PKS, such as an OLS, could be obtained from any source, including naturally occurring sources and synthetic sources (e.g., a non-naturally occurring PKS). In some embodiments, a PKS is from Cannabis. In some embodiments, a PKS is from Dictyostelium. Non-limiting examples of PKS enzymes may be found in U.S. Pat. No. 6,265,633; WO2019/202510; WO 2018/148848 A1; WO 2018/148849 A1; and US 2018/155748 (granted as U.S. Pat. No. 10,435,727), which are incorporated by reference in this application in their entireties. For example, PKSs include SEQ ID NO: 2 from WO2019/202510, SEQ ID NO: 9 from WO2019/202510, SEQ ID NO: 37 from WO 2018/148848 A1, SEQ ID NO: 38 from WO 2018/148848 A1, SEQ ID NO: 9 from WO 2018/148849 A1, SEQ ID NO: 10 from WO 2018/148849 A1, SEQ ID NO: 13 from WO 2018/148849 A1, and SEQ ID NO: 35 from U.S. Pat. No. 10,435,727.


In certain embodiments, polyketide synthases can use hexanoyl-CoA or any acyl-CoA (or a product of Formula (2)):




embedded image


and three malonyl-CoAs as substrates to form 3,5,7-trioxododecanoyl-CoA or other 3,5,7-trioxo-acyl-CoA derivatives; or to form a compound of Formula (4):




embedded image


wherein R is hydrogen, optionally substituted acyl, optionally substituted alkyl, optionally substituted alkenyl, optionally substituted alkynyl, optionally substituted carbocyclyl, or optionally substituted aryl, depending on the substrate. R is as defined in this application. In some embodiments, R is a C2-C6 optionally substituted alkyl. In some embodiments, R is propyl or pentyl. In some embodiments, R is pentyl. In some embodiments, R is propyl. A PKS may also bind isovaleryl-CoA, octanoyl-CoA, hexanoyl-CoA, and butyryl-CoA. In some embodiments, a PKS is capable of catalyzing the formation of a 3,5,7-trioxoalkanoyl-CoA (e.g., 3,5,7-trioxododecanoyl-CoA). In some embodiments, an OLS is capable of catalyzing the formation of a 3,5,7-trioxoalkanoyl-CoA (e.g., 3,5,7-trioxododecanoyl-CoA).
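
The relationship between the starter acyl-CoA, the 3,5,7-trioxoacyl-CoA product, and the R group can be checked by carbon counting: each of the three malonyl-CoA condensations adds a two-carbon unit (with loss of CO2), and R corresponds to the starter acyl chain minus its thioester carbonyl carbon. The Python sketch below assumes linear, unsubstituted starters only; the lookup tables and function name are illustrative.

```python
# Carbon count of the acyl group of each starter acyl-CoA (linear chains only).
STARTER_CARBONS = {"butyryl-CoA": 4, "hexanoyl-CoA": 6, "octanoyl-CoA": 8}
ALKYL_NAMES = {3: "propyl", 5: "pentyl", 7: "heptyl"}
ACYL_NAMES = {10: "decanoyl", 12: "dodecanoyl", 14: "tetradecanoyl"}

MALONYL_CONDENSATIONS = 3     # three malonyl-CoA extender units per tetraketide
CARBONS_PER_CONDENSATION = 2  # each decarboxylative condensation adds a C2 unit


def tetraketide_product(starter: str) -> tuple[str, str]:
    """Return (3,5,7-trioxoacyl-CoA name, R group) for a linear starter acyl-CoA."""
    starter_c = STARTER_CARBONS[starter]
    chain_c = starter_c + MALONYL_CONDENSATIONS * CARBONS_PER_CONDENSATION
    r_c = starter_c - 1  # R excludes the starter's thioester carbonyl carbon
    return f"3,5,7-trioxo{ACYL_NAMES[chain_c]}-CoA", ALKYL_NAMES[r_c]


for name in STARTER_CARBONS:
    print(name, "->", *tetraketide_product(name))
```

For hexanoyl-CoA this gives 3,5,7-trioxododecanoyl-CoA with R as pentyl, and for butyryl-CoA it gives 3,5,7-trioxodecanoyl-CoA with R as propyl, matching the pentyl and propyl embodiments above.
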


In some embodiments, a PKS uses a substrate of Formula (2) to form a compound of Formula (4):




embedded image


wherein R is unsubstituted pentyl.


A recombinant host cell that expresses a heterologous gene encoding a PKS described herein may be capable of producing at least 1% (e.g., at least 5%, at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 100%, at least 125%, at least 150%, at least 175%, at least 200%, at least 300%, at least 400%, at least 500%, at least 600%, at least 700%, at least 800%, at least 900%, or at least 1,000%) more of a product (e.g., a compound of Formula (4), (5), and/or (6)) relative to a control. In some embodiments, a compound of Formula (4) is a compound of Formula (4a), a compound of Formula (5) is a compound of Formula (5a), and a compound of Formula (6) is a compound of Formula (6a). In some embodiments, a control is a recombinant host cell that expresses a heterologous gene encoding UniProtKB-B1Q2B6. In some embodiments, a control is a recombinant host cell that expresses a heterologous gene encoding a wild-type PKS.
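
The percentages above express how much more product a test host cell makes than a control. A minimal Python sketch of that arithmetic, assuming both titers are measured in the same units under comparable conditions (the function name and titers are hypothetical):

```python
def percent_increase(test_titer: float, control_titer: float) -> float:
    """Percent more product made by a test host cell than by a control."""
    if control_titer <= 0:
        raise ValueError("control titer must be positive")
    return (test_titer - control_titer) / control_titer * 100.0


# Hypothetical titers in mg/L, for illustration only.
print(percent_increase(30.0, 10.0))  # 200.0
```

Under this definition, "at least 100% more" corresponds to at least twice the control amount.
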


A recombinant host cell that expresses a heterologous gene encoding a PKS described herein may be capable of producing at least 0.5 mg/L, at least 1 mg/L, at least 1.5 mg/L, at least 2 mg/L, at least 2.5 mg/L, at least 3 mg/L, at least 3.5 mg/L, at least 4 mg/L, at least 4.5 mg/L, at least 5 mg/L, at least 5.5 mg/L, at least 6 mg/L, at least 6.5 mg/L, at least 7 mg/L, at least 7.5 mg/L, at least 8 mg/L, at least 8.5 mg/L, at least 9 mg/L, at least 9.5 mg/L, at least 10 mg/L, at least 10.5 mg/L, at least 11 mg/L, at least 11.5 mg/L, at least 12 mg/L, at least 12.5 mg/L, at least 13 mg/L, at least 13.5 mg/L, at least 14 mg/L, at least 14.5 mg/L, at least 15 mg/L, at least 15.5 mg/L, at least 16 mg/L, at least 16.5 mg/L, at least 17 mg/L, at least 17.5 mg/L, at least 18 mg/L, at least 18.5 mg/L, at least 19 mg/L, at least 19.5 mg/L, at least 20 mg/L, at least 20.5 mg/L, at least 21 mg/L, at least 21.5 mg/L, at least 22 mg/L, at least 22.5 mg/L, at least 23 mg/L, at least 23.5 mg/L, at least 24 mg/L, at least 24.5 mg/L, at least 25 mg/L, at least 25.5 mg/L, at least 26 mg/L, at least 26.5 mg/L, at least 27 mg/L, at least 27.5 mg/L, at least 28 mg/L, at least 28.5 mg/L, at least 29 mg/L, at least 29.5 mg/L, at least 30 mg/L, at least 30.5 mg/L, at least 31 mg/L, at least 31.5 mg/L, at least 32 mg/L, at least 32.5 mg/L, at least 33 mg/L, at least 33.5 mg/L, at least 34 mg/L, at least 34.5 mg/L, at least 35 mg/L, at least 35.5 mg/L, at least 36 mg/L, at least 36.5 mg/L, at least 37 mg/L, at least 37.5 mg/L, at least 38 mg/L, at least 38.5 mg/L, at least 39 mg/L, at least 39.5 mg/L, at least 40 mg/L, at least 40.5 mg/L, at least 41 mg/L, at least 41.5 mg/L, at least 42 mg/L, at least 42.5 mg/L, at least 43 mg/L, at least 43.5 mg/L, at least 44 mg/L, at least 44.5 mg/L, at least 45 mg/L, at least 45.5 mg/L, at least 46 mg/L, at least 46.5 mg/L, at least 47 mg/L, at least 47.5 mg/L, at least 48 mg/L, at least 48.5 mg/L, at least 49 mg/L, at least 49.5 mg/L, at least 50 mg/L, at least 50.5 mg/L, at least 51 mg/L, at least 51.5 mg/L, at least 52 mg/L, at least 52.5 mg/L, at least 53 mg/L, at least 53.5 mg/L, at least 54 mg/L, at least 54.5 mg/L, at least 55 mg/L, at least 55.5 mg/L, at least 56 mg/L, at least 56.5 mg/L, at least 57 mg/L, at least 57.5 mg/L, at least 58 mg/L, at least 58.5 mg/L, at least 59 mg/L, at least 59.5 mg/L, at least 60 mg/L, at least 60.5 mg/L, at least 61 mg/L, at least 61.5 mg/L, at least 62 mg/L, at least 62.5 mg/L, at least 63 mg/L, at least 63.5 mg/L, at least 64 mg/L, at least 64.5 mg/L, at least 65 mg/L, at least 65.5 mg/L, at least 66 mg/L, at least 66.5 mg/L, at least 67 mg/L, at least 67.5 mg/L, at least 68 mg/L, at least 68.5 mg/L, at least 69 mg/L, at least 69.5 mg/L, at least 70 mg/L, at least 70.5 mg/L, at least 71 mg/L, at least 71.5 mg/L, at least 72 mg/L, at least 72.5 mg/L, at least 73 mg/L, at least 73.5 mg/L, at least 74 mg/L, at least 74.5 mg/L, at least 75 mg/L, at least 75.5 mg/L, at least 76 mg/L, at least 76.5 mg/L, at least 77 mg/L, at least 77.5 mg/L, at least 78 mg/L, at least 78.5 mg/L, at least 79 mg/L, at least 79.5 mg/L, at least 80 mg/L, at least 80.5 mg/L, at least 81 mg/L, at least 81.5 mg/L, at least 82 mg/L, at least 82.5 mg/L, at least 83 mg/L, at least 83.5 mg/L, at least 84 mg/L, at least 84.5 mg/L, at least 85 mg/L, at least 85.5 mg/L, at least 86 mg/L, at least 86.5 mg/L, at least 87 mg/L, at least 87.5 mg/L, at least 88 mg/L, at least 88.5 mg/L, at least 89 mg/L, at least 89.5 mg/L, at least 90 mg/L, at least 90.5 mg/L, at least 91 mg/L, at least 91.5 mg/L, at least 92 mg/L, at least 92.5 mg/L, at least 93 mg/L, at least 93.5 mg/L, at least 94 mg/L, at least 94.5 mg/L, at least 95 mg/L, at least 95.5 mg/L, at least 96 mg/L, at least 96.5 mg/L, at least 97 mg/L, at least 97.5 mg/L, at least 98 mg/L, at least 98.5 mg/L, at least 99 mg/L, at least 99.5 mg/L, or at least 100 mg/L of a product (e.g., a compound of Formula (4), (5), and/or (6)).

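
Titers given in mg/L can be converted to molar concentrations using molar masses. The Python sketch below computes molar masses from molecular formulas (olivetolic acid, C12H16O4; olivetol, C11H16O2) using standard atomic weights, and converts a hypothetical 100 mg/L titer to µM. It also shows that the molar mass of olivetol equals that of olivetolic acid minus CO2, consistent with olivetol being the decarboxylated analog. The helper names are illustrative.

```python
import math

# Standard atomic weights (g/mol), rounded to three decimals.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}

OLIVETOLIC_ACID = {"C": 12, "H": 16, "O": 4}  # Formula (6a)
OLIVETOL = {"C": 11, "H": 16, "O": 2}         # Formula (5a)
CO2 = {"C": 1, "O": 2}


def molar_mass(formula: dict[str, int]) -> float:
    """Molar mass (g/mol) from element counts."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def mg_per_l_to_micromolar(titer_mg_per_l: float, molar_mass_g_per_mol: float) -> float:
    """Convert a mass concentration (mg/L) to a molar concentration (µmol/L)."""
    # (mg/L) / (g/mol) = mmol/L; multiply by 1000 to express in µmol/L.
    return titer_mg_per_l / molar_mass_g_per_mol * 1000.0


mw_oa = molar_mass(OLIVETOLIC_ACID)
mw_olivetol = molar_mass(OLIVETOL)
print(round(mw_oa, 3))                                  # 224.256
print(round(mg_per_l_to_micromolar(100.0, mw_oa), 1))   # 445.9
# Olivetol corresponds to olivetolic acid minus CO2.
print(math.isclose(mw_oa - molar_mass(CO2), mw_olivetol))  # True
```
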
In some instances, OLSs may form triketide (PDAL) and/or tetraketide (HTAL and olivetol) by-products: triketide intermediates convert to PDAL, and tetraketide intermediates convert to HTAL and olivetol rather than to olivetolic acid. In some embodiments, production of by-products is undesirable. In some embodiments, OLS enzymes described herein do not produce by-products or produce minimal by-products relative to a control. In some embodiments, OLS enzymes are selected, at least in part, based on the ratio of olivetolic acid produced relative to olivetol.
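
Selecting OLS enzymes based on the ratio of olivetolic acid to olivetol can be sketched as ranking candidates by that ratio. In this Python sketch the candidate names, titers, and the min_ratio cutoff are hypothetical, and the ratio is computed in whatever units the two measurements share (mass or molar); which basis is used for selection is an assumption not specified here.

```python
import math


def oa_to_olivetol_ratio(olivetolic_acid: float, olivetol: float) -> float:
    """Ratio of olivetolic acid to olivetol; both values must use the same units."""
    if olivetol == 0:
        return math.inf  # no detectable olivetol
    return olivetolic_acid / olivetol


def rank_by_ratio(measurements: dict[str, tuple[float, float]],
                  min_ratio: float = 1.0) -> list[str]:
    """Candidate names sorted from highest to lowest OA:olivetol ratio,
    keeping only those at or above min_ratio."""
    ratios = {name: oa_to_olivetol_ratio(*vals) for name, vals in measurements.items()}
    kept = [name for name, r in ratios.items() if r >= min_ratio]
    return sorted(kept, key=lambda name: ratios[name], reverse=True)


# Hypothetical (olivetolic acid, olivetol) titers in mg/L for three OLS variants.
candidates = {
    "variant_1": (12.0, 6.0),   # ratio 2.0
    "variant_2": (9.0, 1.5),    # ratio 6.0
    "variant_3": (20.0, 20.0),  # ratio 1.0
}
print(rank_by_ratio(candidates))                 # ['variant_2', 'variant_1', 'variant_3']
print(rank_by_ratio(candidates, min_ratio=1.1))  # ['variant_2', 'variant_1']
```
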


It was surprisingly discovered herein that OLSs can exhibit both OLS and OAC activity. PKS enzymes described in this application may or may not have cyclase activity. In some embodiments in which the PKS enzyme does not have cyclase activity, one or more exogenous polynucleotides that encode a polyketide cyclase (PKC) enzyme may also be co-expressed in the same host cells to enable conversion of hexanoic acid, butyric acid, or another fatty acid into olivetolic acid, divarinolic acid, or another cannabinoid precursor. In some embodiments, the PKS enzyme and a PKC enzyme are expressed as separate and distinct enzymes. In some embodiments, a PKS enzyme that lacks cyclase activity and a PKC are linked as part of a fusion polypeptide that is a bifunctional PKS. In some embodiments, a bifunctional PKC is referred to


As used in this application, a bifunctional PKS is an enzyme that is capable of producing a compound of Formula (6):




embedded image


from a compound of Formula (2):




embedded image


and a compound of Formula (3):




embedded image


In some embodiments, a PKS produces more of a compound of Formula (6):




embedded image


as compared to a compound of Formula (5):




embedded image


As a non-limiting example, a compound of Formula (6):




embedded image


is olivetolic acid (Formula (6a)):




embedded image


As a non-limiting example, a compound of Formula (5):




embedded image


is olivetol (Formula (5a)):




embedded image


In some embodiments, a polyketide synthase of the present disclosure is capable of catalyzing the conversion of a compound of Formula (2):




embedded image


and a compound of Formula (3):




embedded image


to produce a compound of Formula (4):




embedded image


and further catalyzes the conversion of a compound of Formula (4):




embedded image


to produce a compound of Formula (6):




embedded image


In some embodiments, the PKS is not a fusion protein. In some embodiments, a PKS that is capable of catalyzing the conversion of a compound of Formula (2):




embedded image


and a compound of Formula (3):




embedded image


to produce a compound of Formula (4):




embedded image


and is also capable of further catalyzing the production of a compound of Formula (6):




embedded image


from the compound of Formula (4):




embedded image


is preferred because it avoids the need for an additional polyketide cyclase to produce a compound of Formula (6):




embedded image


In some embodiments, such a bifunctional PKS eliminates the transport considerations that arise when a separate polyketide cyclase is added, in which case the compound of Formula (4), the product of the PKS, must be transported to the PKC for use as a substrate to be converted into the compound of Formula (6).


In some embodiments, a PKS is capable of producing olivetolic acid in the presence of a compound of Formula (2a):




embedded image


and Formula (3a):



embedded image


In some embodiments, an OLS is capable of producing olivetolic acid in the presence of a compound of Formula (2a):




embedded image


and Formula (3a):



embedded image


Without being bound by a particular theory, the presence of the amino acid W at a residue in a PKS corresponding to position 339 of SEQ ID NO: 6 may confer or enhance bifunctionality of a PKS. In some embodiments, a bifunctional PKS comprises the amino acid W at a residue corresponding to position 339 of SEQ ID NO: 6. In some embodiments, a bifunctional PKS does not comprise the amino acid S at a residue corresponding to position 339 of SEQ ID NO: 6. As a non-limiting example, a PKS may comprise the amino acid substitution S332W relative to SEQ ID NO: 5 (see, e.g., t606899, SEQ ID NO: 298). In some embodiments, a PKS may comprise the amino acid substitution S339W relative to SEQ ID NO: 7 (see, e.g., t607377, SEQ ID NO: 409). In some embodiments, the PKS comprises a sequence that is at least 5%, at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 71%, at least 72%, at least 73%, at least 74%, at least 75%, at least 76%, at least 77%, at least 78%, at least 79%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or is 100% identical, including all values in between, to a sequence (e.g., nucleic acid or amino acid sequence) set forth in SEQ ID NO: 6.
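
Residue positions in this application are numbered from 1, and substitutions are written as original residue, position, new residue (e.g., S339W). The Python sketch below lists such substitutions and computes an ungapped percent identity between two sequences that are assumed to be already aligned and of equal length. Real comparisons typically rely on a pairwise alignment first; the toy sequences and function names here are hypothetical.

```python
def substitutions(reference: str, variant: str) -> list[str]:
    """List substitutions as <ref residue><1-based position><new residue>, e.g. 'S339W'.
    Assumes the two sequences are already aligned and of equal length."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{r}{pos}{v}"
        for pos, (r, v) in enumerate(zip(reference, variant), start=1)
        if r != v
    ]


def residue_at(seq: str, position: int) -> str:
    """Residue at a 1-based position, matching the numbering used in the text."""
    return seq[position - 1]


def percent_identity(reference: str, variant: str) -> float:
    """Ungapped percent identity over equal-length, pre-aligned sequences."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(r == v for r, v in zip(reference, variant))
    return 100.0 * matches / len(reference)


# Toy 10-residue sequences (hypothetical, not from the sequence listing).
ref = "MKTASLGGSA"
var = "MKTAWLGGSA"
print(substitutions(ref, var))     # ['S5W']
print(residue_at(var, 5))          # W
print(percent_identity(ref, var))  # 90.0
```
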


In some embodiments, an OLS is capable of producing olivetolic acid in the presence of a compound of Formula (2a) and Formula (3a). In some embodiments, the OLS produces more olivetolic acid (OA) than olivetol. In some embodiments, the OLS produces at least 1.1 times, 1.2 times, 1.3 times, 1.4 times, 1.5 times, 1.6 times, 1.7 times, 1.8 times, 1.9 times, 2 times, 2.1 times, 2.2 times, 2.3 times, 2.4 times, 2.5 times, 2.6 times, 2.7 times, 2.8 times, 2.9 times, 3 times, 3.1 times, 3.2 times, 3.3 times, 3.4 times, 3.5 times, 3.6 times, 3.7 times, 3.8 times, 3.9 times, 4 times, 5 times, 6 times, 8 times, 9 times, 10 times, 20 times, 30 times, 40 times, 50 times, 60 times, 70 times, 80 times, 90 times, 100 times, 200 times, 300 times, 400 times, 500 times, 600 times, 700 times, 800 times or 1,000 times more olivetolic acid (OA) than olivetol.


Without wishing to be bound by any theory, in some embodiments, bifunctional OLSs differ from other OLSs in the geometry of the substrate binding pocket, an internal substrate holding cavity, and/or a substrate exit tunnel. For example, the substrate binding pocket of the bifunctional OLSs may be wider as compared to the substrate binding pocket of Cannabis sativa OLS (SEQ ID NO: 5). Without wishing to be bound by any theory, this extra space may alleviate steric clashes between the protein and substrate and permit the pro-cyclization configuration.


Polyketide Cyclase (PKC)

A host cell described in this application may comprise a PKC. As used in this application, a “PKC” refers to an enzyme that is capable of cyclizing a polyketide.


In certain embodiments, a polyketide cyclase (PKC) catalyzes the cyclization of an oxo fatty acyl-CoA (e.g., a compound of Formula (4):




embedded image


or 3,5,7-trioxododecanoyl-CoA or 3,5,7-trioxodecanoyl-CoA) to the corresponding intramolecular cyclization product (e.g., a compound of Formula (6), including olivetolic acid and divarinic acid). In some embodiments, the PKC-catalyzed formation of a compound occurs in the presence of a PKS. PKC substrates include trioxoalkanoyl-CoAs, such as 3,5,7-trioxododecanoyl-CoA, or a compound of Formula (4):




embedded image


wherein R is hydrogen, optionally substituted acyl, optionally substituted alkyl, optionally substituted alkenyl, optionally substituted alkynyl, optionally substituted carbocyclyl, or optionally substituted aryl. In certain embodiments, a PKC catalyzes the conversion of a compound of Formula (4):




embedded image


wherein R is hydrogen, optionally substituted acyl, optionally substituted alkyl, optionally substituted alkenyl, optionally substituted alkynyl, optionally substituted carbocyclyl, or optionally substituted aryl; to form a compound of Formula (6):




embedded image


wherein R is hydrogen, optionally substituted acyl, optionally substituted alkyl, optionally substituted alkenyl, optionally substituted alkynyl, optionally substituted carbocyclyl, or optionally substituted aryl. R is as defined in this application. In some embodiments, R is a C2-C6 optionally substituted alkyl. In some embodiments, R is propyl or pentyl. In some embodiments, R is pentyl. In some embodiments, R is propyl. In certain embodiments, a PKC is an olivetolic acid cyclase (OAC). In certain embodiments, a PKC is a divarinic acid cyclase (DAC).


As one of ordinary skill in the art would appreciate, a PKC could be obtained from any source, including naturally occurring sources and synthetic sources (e.g., a non-naturally occurring PKC). In some embodiments, a PKC is from Cannabis. Non-limiting examples of PKCs include those disclosed in U.S. Pat. Nos. 9,611,460 and 10,059,971 and US Pub. No. 2019/0169661, which are incorporated by reference in this application in their entireties.


In some embodiments, a PKC is an OAC. As used in this application, an “OAC” refers to an enzyme that is capable of catalyzing the formation of olivetolic acid (OA). In some embodiments, an OAC is an enzyme that is capable of using a substrate of Formula (4a) (3,5,7-trioxododecanoyl-CoA):




embedded image


to form a compound of Formula (6a) (olivetolic acid):




embedded image


Olivetolic acid cyclase from C. sativa (CsOAC) is a 101-amino-acid enzyme that performs non-decarboxylative cyclization of the tetraketide product of olivetol synthase (FIG. 4 Structure 4a) via aldol condensation to form olivetolic acid (FIG. 4 Structure 6a). CsOAC was identified and characterized by Gagne et al. (PNAS 2012) via transcriptome mining, and its cyclization function was recapitulated in vitro, demonstrating that CsOAC is required for formation of olivetolic acid in C. sativa. A crystal structure of the enzyme was published by Yang et al. (FEBS J. 2016 March; 283(6):1088-106), which revealed that the enzyme is a homodimer and belongs to the dimeric α+β barrel (DABB) superfamily of protein folds. CsOAC is the only known plant polyketide cyclase. Multiple fungal Type III polyketide synthases have been identified that perform both polyketide synthase and cyclization functions (Funa et al., J Biol Chem. 2007 May 11; 282(19):14476-81); however, no such dual-function enzyme has yet been discovered in plants.


A non-limiting example of an amino acid sequence of OAC from C. sativa is provided by UniProtKB-I6WU39 (SEQ ID NO: 125); this enzyme catalyzes the formation of olivetolic acid (OA) from 3,5,7-trioxododecanoyl-CoA.


The sequence of UniProtKB-I6WU39 (SEQ ID NO: 125) is:


MAVKHLIVLKFKDEITEAQKEEFFKTYVNLVNIIPAMKDVYWGKDVTQKN
KEEGYTHIVEVTFESVETIQDYIIHPAHVGFGDVYRSFWEKLLIFDYTPR
K.


A non-limiting example of a nucleic acid sequence encoding C. sativa OAC is:


(SEQ ID NO: 130)
atggcagtgaagcatttgattgtattgaagttcaaagatgaaatcacaga
agcccaaaaggaagaatttttcaagacgtatgtgaatcttgtgaatatca
tcccagccatgaaagatgtatactggggtaaagatgtgactcaaaagaat
aaggaagaagggtacactcacatagttgaggtaacatttgagagtgtgga
gactattcaggactacattattcatcctgcccatgttggatttggagatg
tctatcgttctttctgggaaaaacttctcatttttgactacacaccacga
aag.
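
As a consistency check, the nucleic acid sequence of SEQ ID NO: 130 can be translated with the standard genetic code and compared with the OAC protein sequence of SEQ ID NO: 125. The Python sketch below builds the codon table in the conventional TCAG order; the helper names are illustrative. As listed, SEQ ID NO: 130 ends at the last sense codon with no stop codon, so the trailing-stop handling is only a convenience.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order, '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding sequence in frame 1, dropping trailing stop codon(s)."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("coding sequence length is not a multiple of 3")
    protein = "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))
    return protein.rstrip("*")


OAC_DNA = (  # SEQ ID NO: 130
    "atggcagtgaagcatttgattgtattgaagttcaaagatgaaatcacaga"
    "agcccaaaaggaagaatttttcaagacgtatgtgaatcttgtgaatatca"
    "tcccagccatgaaagatgtatactggggtaaagatgtgactcaaaagaat"
    "aaggaagaagggtacactcacatagttgaggtaacatttgagagtgtgga"
    "gactattcaggactacattattcatcctgcccatgttggatttggagatg"
    "tctatcgttctttctgggaaaaacttctcatttttgactacacaccacga"
    "aag"
)
OAC_PROTEIN = (  # SEQ ID NO: 125
    "MAVKHLIVLKFKDEITEAQKEEFFKTYVNLVNIIPAMKDVYWGKDVTQKN"
    "KEEGYTHIVEVTFESVETIQDYIIHPAHVGFGDVYRSFWEKLLIFDYTPR"
    "K"
)

print(len(OAC_PROTEIN))                     # 101
print(translate(OAC_DNA) == OAC_PROTEIN)    # True
```
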






Prenyltransferase (PT)

A host cell described in this application may comprise a prenyltransferase (PT). As used in this application, a “PT” refers to an enzyme that is capable of transferring prenyl groups to acceptor molecule substrates. Non-limiting examples of prenyltransferases are described in WO2018200888 (e.g., CsPT4), U.S. Pat. No. 8,884,100 (e.g., CsPT1); CA2718469; Valliere et al., Nat Commun. 2019 Feb. 4; 10(1):565; and Luo et al., Nature 2019 March; 567(7746):123-126, which are incorporated by reference in their entireties. In some embodiments, a PT is capable of producing cannabigerolic acid (CBGA), cannabigerovarinic acid (CBGVA), or other cannabinoids or cannabinoid-like substances. In some embodiments, a PT is cannabigerolic acid synthase (CBGAS). In some embodiments, a PT is cannabigerovarinic acid synthase (CBGVAS).


In some embodiments, the PT is an NphB prenyltransferase. See, e.g., U.S. Pat. No. 7,544,498; and Kumano et al., Bioorg Med Chem. 2008 Sep. 1; 16(17): 8117-8126, which are incorporated by reference in this application in their entireties. In some embodiments, a PT corresponds to NphB from Streptomyces sp. (see, e.g., UniProtKB Accession No. Q4R2T2; see also SEQ ID NO: 2 of U.S. Pat. No. 7,361,483). The protein sequence corresponding to UniProtKB Accession No. Q4R2T2 is provided by SEQ ID NO: 131:


(SEQ ID NO: 131)
MSEAADVERVYAAMEEAAGLLGVACARDKIYPLLSTFQDTLVEGGSVVVF
SMASGRHSTELDFSISVPTSHGDPYATVVEKGLFPATGHPVDDLLADTQK
HLPVSMFAIDGEVTGGFKKTYAFFPTDNMPGVAELSAIPSMPPAVAENAE
LFARYGLDKVQMTSMDYKKRQVNLYFSELSAQTLEAESVLALVRELGLHV
PNELGLKFCKRSFSVYPTLNWETGKIDRLCFAVISNDPTLVPSSDEGDIE
KFHNYATKAPYAYVGEKRTLVYGLTLSPKEEYYKLGAYYHITDVQRGLLK
AFDSLED.


A non-limiting example of a nucleic acid sequence encoding NphB is:


(SEQ ID NO: 132)
atgtcagaagccgcagatgtcgaaagagtttacgccgctatggaagaagc
cgccggtttgttaggtgttgcctgtgccagagataagatctacccattgt
tgtctacttttcaagatacattagttgaaggtggttcagttgttgttttc
tctatggcttcaggtagacattctacagaattggatttctctatctcagt
tccaacatcacatggtgatccatacgctactgttgttgaaaaaggtttat
ttccagcaacaggtcatccagttgatgatttgttggctgatactcaaaag
catttgccagtttctatgtttgcaattgatggtgaagttactggtggttt
caagaaaacttacgctttctttccaactgataacatgccaggtgttgcag
aattatctgctattccatcaatgccaccagctgttgcagaaaatgcagaa
ttatttgctagatacggtttggataaggttcaaatgacatctatggatta
caagaaaagacaagttaatttgtacttttctgaattatcagcacaaactt
tggaagctgaatcagttttggcattagttagagaattgggtttacatgtt
ccaaacgaattgggtttgaagttttgtaaaagatctttctcagtttatcc
aactttaaactgggaaacaggcaagatcgatagattatgtttcgcagtta
tctctaacgatccaacattggttccatcttcagatgaaggtgatatcgaa
aagtttcataactacgctactaaagcaccatatgcttacgttggtgaaaa
gagaacattagtttatggtttgactttatcaccaaaggaagaatactaca
agttgggtgcttactaccacattaccgacgtacaaagaggtttattgaaa
gcattcgatagtttagaagactaa.


In other embodiments, a PT corresponds to CsPT1, which is disclosed as SEQ ID NO: 2 in U.S. Pat. No. 8,884,100 (C. sativa; corresponding to SEQ ID NO: 110 in this application):


(SEQ ID NO: 110)
MGLSSVCTFSFQTNYHTLLNPHNNNPKTSLLCYRHPKTPIKYSYNNFPSK
HCSTKSFHLQNKCSESLSIAKNSIRAATTNQTEPPESDNHSVATKILNFG
KACWKLQRPYTIIAFTSCACGLFGKELLHNTNLISWSLMFKAFFFLVAIL
CIASFTTTINQIYDLHIDRINKPDLPLASGEISVNTAWEVISIIVALFGL
IITIKMKGGPLYIFGYCFGIFGGIVYSVPPFRWKQNPSTAFLLNFLAHII
TNFTFYYASRAALGLPFELRPSFTFLLAFMKSMGSALALIKDASDVEGDT
KFGISTLASKYGSRNLTLFCSGIVLLSYVAAILAGIIWPQAFNSNVMLLS
HAILAFWLILQTRDFALTNYDPEAGRRFYEFMWKLYYAEYLVYVFI.


In some embodiments, a PT corresponds to CsPT4, which is disclosed as SEQ ID NO: 1 in WO2019071000, corresponding to SEQ ID NO: 133 in this application:


(SEQ ID NO: 133)
MGLSLVCTFSFQTNYHTLLNPHNKNPKNSLLSYQHPKTPIIKSSYDNFPS
KYCLTKNFHLLGLNSHNRISSQSRSIRAGSDQIEGSPHHESDNSIATKIL
NFGHTCWKLQRPYVVKGMISIACGLFGRELFNNRHLFSWGLMWKAFFALV
PILSFNFFAAIMNQIYDVDIDRINKPDLPLVSGEMSIETAWILSIIVALT
GLIVTIKLKSAPLFVFIYIFGIFAGFAYSVPPIRWKQYPFTNFLITISSH
VGLAFTSYSATTSALGLPFVWRPAFSFITAFMTVMGMTIAFAKDISDIEG
DAKYGVSTVATKLGARNIVITFVVSGVLLLNYLVSISIGIIWPQVFKSNE
VIILSHAILAFCLIFQTRELALANYASAPSRQFFEFIWLLYYAEYFVYVF
I.


In some embodiments, a PT corresponds to a truncated CsPT4, which is provided as SEQ ID NO: 134 herein:


MSAGSDQIEGSPHHESDNSIATKILNFGHTCWKLQRPYVVKGMISIACGL
FGRELFNNRHLFSWGLMWKAFFALVPILSFNFFAAIMNQIYDVDIDRINK
PDLPLVSGEMSIETAWILSIIVALTGLIVTIKLKSAPLFVFIYIFGIFAG
FAYSVPPIRWKQYPFTNFLITISSHVGLAFTSYSATTSALGLPFVWRPAF
SFITAFMTVMGMTIAFAKDISDIEGDAKYGVSTVATKLGARNMTFVVSGV
LLLNYLVSISIGIIWPQVFKSNEVIILSHAILAFCLIFQTRELALANYAS
APSRQFFEFIWLLYYAEYFVYVFI.


Functional expression of paralog C. sativa CBGAS enzymes in S. cerevisiae and production of the major cannabinoid CBGA has been reported (Page and Boubakir US 20120144523, 2012, and Luo et al. Nature, 2019). Luo et al. reported the production of CBGA in S. cerevisiae by expressing a truncated version of a C. sativa CBGAS, CsPT4, with its native signal peptide removed (Luo et al. Nature, 2019). Without being bound by a particular theory, the integral-membrane nature of C. sativa CBGAS enzymes may render functional expression of C. sativa CBGAS enzymes in heterologous hosts challenging. Removal of transmembrane domain(s) or signal sequences or use of prenyltransferases that are not associated with the membrane and are not integral membrane proteins may facilitate increased interaction between the enzyme and available substrate, for example in the cellular cytosol and/or in organelles that may be targeted using peptides that confer localization.


In some embodiments, the PT is a soluble PT. In some embodiments, the PT is a cytosolic PT. In some embodiments, the PT is a secreted protein. In some embodiments, the PT is not a membrane-associated protein. In some embodiments, the PT is not an integral membrane protein. In some embodiments, the PT does not comprise a transmembrane domain or a predicted transmembrane domain. In some embodiments, the PT may be primarily detected in the cytosol (e.g., detected in the cytosol to a greater extent than in association with the cell membrane). In some embodiments, the PT is a protein from which one or more transmembrane domains have been removed and/or mutated (e.g., by truncation, deletions, substitutions, insertions, and/or additions) so that the PT localizes or is predicted to localize in the cytosol of the host cell, or to cytosolic organelles within the host cell, or, in the case of bacterial hosts, in the periplasm. In some embodiments, the PT is a protein from which one or more transmembrane domains have been removed or mutated (e.g., by truncation, deletions, substitutions, insertions, and/or additions) so that the PT has increased localization to the cytosol, organelles, or periplasm of the host cell, as compared to membrane localization.


Within the scope of the term “transmembrane domains” are predicted or putative transmembrane domains in addition to transmembrane domains that have been empirically determined. In general, transmembrane domains are characterized by a region of hydrophobicity that facilitates integration into the cell membrane. Methods of predicting whether a protein is a membrane protein or a membrane-associated protein are known in the art and may include, for example, amino acid sequence analysis, hydropathy plots, and/or protein localization assays.
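
Hydropathy plots of the kind mentioned above are often computed with the Kyte-Doolittle scale by averaging residue hydropathy over a sliding window; a window of about 19 residues with a mean above roughly 1.6 is a commonly used heuristic for a candidate transmembrane segment. The Python sketch below applies that heuristic to a synthetic sequence. The window, threshold, and function names are assumptions for illustration, and dedicated transmembrane predictors are generally more reliable.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of each sliding window; entry i covers seq[i:i + window]."""
    values = [KD[aa] for aa in seq.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def candidate_tm_windows(seq: str, window: int = 19, threshold: float = 1.6) -> list[int]:
    """0-based start positions of windows whose mean hydropathy exceeds threshold."""
    return [i for i, h in enumerate(hydropathy_profile(seq, window)) if h > threshold]


# Synthetic test sequence: a 25-residue hydrophobic stretch flanked by lysines.
toy = "K" * 10 + "L" * 25 + "K" * 10
hits = candidate_tm_windows(toy)
print(hits[0])                                # 5
print(candidate_tm_windows("DEKR" * 10))      # []
```
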


In some embodiments, the PT is a protein from which a signal sequence has been removed and/or mutated so that the PT is not directed to the cellular secretory pathway. In some embodiments, the PT is a protein from which a signal sequence has been removed and/or mutated so that the PT is localized to the cytosol or has increased localization to the cytosol (e.g., as compared to the secretory pathway).


In some embodiments, the PT is a secreted protein. In some embodiments, the PT contains a signal sequence.


In some embodiments, a PT is a fusion protein. For example, a PT may be fused to one or more proteins encoded by genes in the metabolic pathway of a host cell. In certain embodiments, a PT may be fused to mutant forms of one or more such proteins.


In some embodiments, a PT described in this application transfers one or more prenyl groups to any of positions 1, 2, 3, 4, or 5 in a compound of Formula (6), shown below:




embedded image


In some embodiments, the PT transfers a prenyl group to any of positions 1, 2, 3, 4, or 5 in a compound of Formula (6), shown below:




embedded image


to form a compound of one or more of Formula (8w), Formula (8x), Formula (8′), Formula (8y), Formula (8z):




embedded image


or a pharmaceutically acceptable salt, solvate, hydrate, polymorph, co-crystal, tautomer, stereoisomer, isotopically labeled derivative, or prodrug thereof, wherein a is 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10.


Terminal Synthases (TS)

A host cell described in this application may comprise a terminal synthase (TS). As used in this application, a “TS” refers to an enzyme that is capable of catalyzing oxidative cyclization of a prenyl moiety (e.g., terpene) to produce a ring-containing product (e.g., heterocyclic ring-containing product). In certain embodiments, a TS is capable of catalyzing oxidative cyclization of a prenyl moiety (e.g., terpene) to produce a carbocyclic-ring containing product (e.g., cannabinoid). In certain embodiments, a TS is capable of catalyzing oxidative cyclization of a prenyl moiety (e.g., terpene) to produce a heterocyclic-ring containing product (e.g., cannabinoid). In certain embodiments, a TS is capable of catalyzing oxidative cyclization of a prenyl moiety (e.g., terpene) to produce a cannabinoid.


In some embodiments, a TS is a tetrahydrocannabinolic acid synthase (THCAS), a cannabidiolic acid synthase (CBDAS), and/or a cannabichromenic acid synthase (CBCAS). As one of ordinary skill in the art would appreciate, a TS could be obtained from any source, including naturally occurring sources and synthetic sources (e.g., a non-naturally occurring TS).


a. Substrates


A TS may be capable of using one or more substrates. In some instances, the location of the prenyl group and/or the R group differs between TS substrates. For example, a TS may be capable of using as a substrate one or more compounds of Formula (8w), Formula (8x), Formula (8′), Formula (8y), and/or Formula (8z):




embedded image


or a pharmaceutically acceptable salt, solvate, hydrate, polymorph, co-crystal, tautomer, stereoisomer, isotopically labeled derivative, or prodrug thereof, wherein a is 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10.


In certain embodiments, a compound of Formula (8′) is a compound of Formula (8):




embedded image


In some embodiments, a TS catalyzes oxidative cyclization of the prenyl moiety (e.g., terpene) of a compound of Formula (8) described in this application and shown in FIG. 2. In certain embodiments, a compound of Formula (8) is a compound of Formula (8a):




embedded image


b. Products


In embodiments wherein CBGA is the substrate, the TS enzymes CBDAS, THCAS, and CBCAS would generally catalyze the formation of cannabidiolic acid (CBDA), Δ9-tetrahydrocannabinolic acid (THCA), and cannabichromenic acid (CBCA), respectively. However, in some embodiments, a TS can produce more than one product depending on reaction conditions. For example, the pH of the reaction environment may cause a THCAS or a CBDAS to produce CBCA in greater proportions than THCA or CBDA, respectively (see, for example, U.S. Pat. No. 9,359,625 to Winnicki and Donsky, incorporated by reference in its entirety).


A TS may be capable of using one or more substrates described in this application to produce one or more products. Non-limiting examples of TS products are shown in Table 1. In some instances, a TS is capable of using one substrate to produce 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 different products. In some embodiments, a TS is capable of using more than one substrate to produce 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 different products.


In some embodiments, a TS is capable of producing a compound of Formula (X-A) and/or a compound of Formula (X-B):




embedded image


or a pharmaceutically acceptable salt, solvate, hydrate, polymorph, co-crystal, tautomer, stereoisomer, isotopically labeled derivative, or prodrug thereof,


wherein each bond depicted as a solid line accompanied by a dashed line is a double bond or a single bond, as valency permits;


R is hydrogen, optionally substituted acyl, optionally substituted alkyl, optionally substituted alkenyl, optionally substituted alkynyl, optionally substituted carbocyclyl, or optionally substituted aryl;


RZ1 is hydrogen, optionally substituted acyl, optionally substituted alkyl, optionally substituted alkenyl, optionally substituted alkynyl, optionally substituted carbocyclyl, or optionally substituted aryl;


RZ2 is hydrogen, optionally substituted acyl, optionally substituted alkyl, optionally substituted alkenyl, optionally substituted alkynyl, optionally substituted carbocyclyl, or optionally substituted aryl;


or optionally, RZ1 and RZ2 are taken together with their intervening atoms to form an optionally substituted carbocyclic ring;


R3A is hydrogen, optionally substituted acyl, optionally substituted alkyl, optionally substituted alkenyl, or optionally substituted alkynyl;


R3B is hydrogen, optionally substituted acyl, optionally substituted alkyl, optionally substituted alkenyl, or optionally substituted alkynyl; and/or


RY is hydrogen, optionally substituted acyl, optionally substituted alkyl, optionally substituted alkenyl, or optionally substituted alkynyl.


In some embodiments, a compound of Formula (X-A) is:




embedded image


In certain embodiments, a compound of Formula (10)




embedded image


has a chiral atom labeled with * at carbon 10 and a chiral atom labeled with ** at carbon 6. In certain embodiments, in a compound of Formula (10)




embedded image


the chiral atom labeled with * at carbon 10 is of the R-configuration or S-configuration; and a chiral atom labeled with ** at carbon 6 is of the R-configuration. In certain embodiments, in a compound of Formula (10)




embedded image


the chiral atom labeled with * at carbon 10 is of the S-configuration; and a chiral atom labeled with ** at carbon 6 is of the R-configuration or S-configuration. In certain embodiments, in a compound of Formula (10)




embedded image


the chiral atom labeled with * at carbon 10 is of the R-configuration and a chiral atom labeled with ** at carbon 6 is of the R-configuration. In certain embodiments, a compound of Formula (10)




embedded image


is of the formula:




embedded image


In certain embodiments, in a compound of Formula (10)




embedded image


the chiral atom labeled with * at carbon 10 is of the S-configuration and a chiral atom labeled with ** at carbon 6 is of the S-configuration. In certain embodiments, a compound of Formula (10)




embedded image


is of the formula:




embedded image


In certain embodiments, a compound of Formula (10a)




embedded image


has a chiral atom labeled with * at carbon 10 and a chiral atom labeled with ** at carbon 6. In certain embodiments, in a compound of Formula (10a)




embedded image


the chiral atom labeled with * at carbon 10 is of the R-configuration or S-configuration; and a chiral atom labeled with ** at carbon 6 is of the R-configuration. In certain embodiments, in a compound of Formula (10a)




embedded image


the chiral atom labeled with * at carbon 10 is of the S-configuration; and a chiral atom labeled with ** at carbon 6 is of the R-configuration or S-configuration. In certain embodiments, in a compound of Formula (10a)




embedded image


the chiral atom labeled with * at carbon 10 is of the R-configuration and a chiral atom labeled with ** at carbon 6 is of the R-configuration. In certain embodiments, a compound of Formula (10a)




embedded image


is of the formula:




embedded image


In certain embodiments, in a compound of Formula (10a)




embedded image


the chiral atom labeled with * at carbon 10 is of the S-configuration and a chiral atom labeled with ** at carbon 6 is of the S-configuration. In certain embodiments, a compound of Formula (10a)




embedded image


is of the formula:




embedded image


In some embodiments, a compound of Formula (X-A) is:




embedded image


In some embodiments, a compound of Formula (X-A) is:




embedded image


In some embodiments, a compound of Formula (X-B) is:




embedded image


In certain embodiments, a compound of Formula (9)




embedded image


has a chiral atom labeled with * at carbon 3 and a chiral atom labeled with ** at carbon 4. In certain embodiments, in a compound of Formula (9)




embedded image


the chiral atom labeled with * at carbon 3 is of the R-configuration or S-configuration; and a chiral atom labeled with ** at carbon 4 is of the R-configuration. In certain embodiments, in a compound of Formula (9)




embedded image


the chiral atom labeled with * at carbon 3 is of the S-configuration; and a chiral atom labeled with ** at carbon 4 is of the R-configuration or S-configuration. In certain embodiments, in a compound of Formula (9)




embedded image


the chiral atom labeled with * at carbon 3 is of the R-configuration and a chiral atom labeled with ** at carbon 4 is of the R-configuration. In certain embodiments, a compound of Formula (9)




embedded image


is of the formula:




embedded image


In certain embodiments, in a compound of Formula (9)




embedded image


the chiral atom labeled with * at carbon 3 is of the S-configuration and a chiral atom labeled with ** at carbon 4 is of the S-configuration. In certain embodiments, a compound of Formula (9)




embedded image


is of the formula:




embedded image


In certain embodiments, a compound of Formula (9a) (CBDA)




embedded image


has a chiral atom labeled with * at carbon 3 and a chiral atom labeled with ** at carbon 4. In certain embodiments, in a compound of Formula (9a)




embedded image


the chiral atom labeled with * at carbon 3 is of the R-configuration or S-configuration; and a chiral atom labeled with ** at carbon 4 is of the R-configuration. In certain embodiments, in a compound of Formula (9a)




embedded image


the chiral atom labeled with * at carbon 3 is of the S-configuration; and a chiral atom labeled with ** at carbon 4 is of the R-configuration or S-configuration. In certain embodiments, in a compound of Formula (9a)




embedded image


the chiral atom labeled with * at carbon 3 is of the R-configuration and a chiral atom labeled with ** at carbon 4 is of the R-configuration. In certain embodiments, a compound of Formula (9a)




embedded image


is of the formula:




embedded image


In certain embodiments, in a compound of Formula (9a)




embedded image


the chiral atom labeled with * at carbon 3 is of the S-configuration and a chiral atom labeled with ** at carbon 4 is of the S-configuration. In certain embodiments, a compound of Formula (9a)




embedded image


is of the formula:




embedded image


In some embodiments, as shown in FIG. 2, a TS is capable of producing a cannabinoid from the product of a PT, including, without limitation, an enzyme capable of producing a compound of Formula (9), (10), or (11):




embedded image


or a pharmaceutically acceptable salt, solvate, hydrate, polymorph, co-crystal, tautomer, stereoisomer, isotopically labeled derivative, or prodrug thereof, wherein R is hydrogen, optionally substituted acyl, optionally substituted alkyl, optionally substituted alkenyl, optionally substituted alkynyl, optionally substituted carbocyclyl, or optionally substituted aryl; produced from a compound of Formula (8′):




embedded image


wherein a is 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10; and R is hydrogen, optionally substituted acyl, optionally substituted alkyl, optionally substituted alkenyl, optionally substituted alkynyl, optionally substituted carbocyclyl, or optionally substituted aryl; or using any other substrate. In certain embodiments, a compound of Formula (8′) is a compound of Formula (8):




embedded image


In certain embodiments, a compound of Formula (9), (10), or (11) is produced using a TS from a substrate compound of Formula (8′) (e.g., a compound of Formula (8)). Non-limiting examples of substrate compounds of Formula (8′) include cannabigerolic acid (CBGA), cannabigerovarinic acid (CBGVA), and cannabinerolic acid. In certain embodiments, at least one of the hydroxyl groups of the product compounds of Formula (9), (10), or (11) is further methylated. In certain embodiments, a compound of Formula (9) is methylated to form a compound of Formula (12):




embedded image


or a pharmaceutically acceptable salt, solvate, hydrate, polymorph, co-crystal, tautomer, stereoisomer, isotopically labeled derivative, or prodrug thereof.


Tetrahydrocannabinolic Acid Synthase (THCAS)

A host cell described in this application may comprise a TS that is a tetrahydrocannabinolic acid synthase (THCAS). As used in this application, “tetrahydrocannabinolic acid synthase (THCAS)” or “Δ1-tetrahydrocannabinolic acid (THCA) synthase” refers to an enzyme that is capable of catalyzing oxidative cyclization of a prenyl moiety (e.g., terpene) of a compound of Formula (8) to produce a ring-containing product (e.g., heterocyclic ring-containing product, carbocyclic-ring containing product) of Formula (10). In certain embodiments, a THCAS refers to an enzyme that is capable of producing Δ9-tetrahydrocannabinolic acid (Δ9-THCA, THCA), Δ9-tetrahydrocannabivarinic acid A (Δ9-THCVA-C3 A), THCVA, THCP, or a compound of Formula (10a) from a compound of Formula (8). In certain embodiments, a THCAS is capable of producing Δ9-tetrahydrocannabinolic acid (Δ9-THCA, THCA, or a compound of Formula (10a)).


A THCAS may use cannabigerolic acid (CBGA) as a substrate. In some embodiments, the THCAS produces Δ9-THCA from CBGA. In some embodiments, a THCAS may catalyze the oxidative cyclization of other substrates, such as 3-geranyl-2,4-dihydroxy-6-alkylbenzoic acids like cannabigerovarinic acid (CBGVA). In some embodiments, a THCAS exhibits specificity for CBGA substrates. In some embodiments, a THCAS may use cannabidivarinic acid (CBDVA) as a substrate. In some embodiments, the THCAS exhibits specificity for CBDVA substrates. In some embodiments, a THCAS may use cannabidiphorolic acid (CBDP) as a substrate. In some embodiments, the THCAS exhibits specificity for CBDP substrates.


In some embodiments, a THCAS is from C. sativa. C. sativa THCAS performs the oxidative cyclization of the geranyl moiety of cannabigerolic acid (CBGA) (FIG. 4, Structure 8a) to form tetrahydrocannabinolic acid (FIG. 4, Structure 10a), using covalently bound flavin adenine dinucleotide (FAD) as a cofactor and molecular oxygen as the final electron acceptor. THCAS was first discovered and characterized by Taura et al. (J. Am. Chem. Soc. 1995), who extracted the enzyme from the leaf buds of C. sativa and confirmed its THCA synthase activity in vitro upon the addition of CBGA as a substrate. Additional analysis indicated that the enzyme is a monomer and possesses FAD-binding and berberine bridge enzyme (BBE) sequence motifs. A crystal structure of the enzyme published by Shoyama et al. (J. Mol. Biol. 2012 Oct. 12; 423(1):96-105) revealed that the enzyme covalently binds a molecule of the cofactor FAD. See also, e.g., Sirikantaramas et al., J. Biol. Chem. 2004 Sep. 17; 279(38):39767-39774. There are several THCAS isozymes in Cannabis sativa.


In some embodiments, a C. sativa THCAS (UniProtKB Accession No. I1V0C5) comprises the amino acid sequence shown below:

(SEQ ID NO: 135)
MNCSAFSFWFVCKIIFFFLSFNIQISIANPQENFLKCFSEYIPNNPANPK
FIYTQHDQLYMSVLNSTIQNLRFTSDTTPKPLVIVTPSNVSHIQASILCS
KKVGLQIRTRSGGHDAEGMSYISQVPFVVVDLRNMHSIKIDVHSQTAWVE
AGATLGEVYYWINEKNENFSFPGGYCPTVGVGGHFSGGGYGALMRNYGLA
ADNIIDAHLVNVDGKVLDRKSMGEDLFWAIRGGGGENFGIIAAWKIKLVA
VPSKSTIFSVKKNMEIHGLVKLFNKWQNIAYKYDKDLVLMTHFITKNI
TDNHGKNKTTVHGYFSSIFHGGVDSLVDLMNKSFPELGIKKTDCKEFSWI
DTTIFYSGVVNFNTANFKKEILLDRSAGKKTAFSIKLDYVKKPIPETAMV
KILEKLYEEDVGVGMYVLYPYGGIMEEISESAIPFPHRAGIMYELW
YTASWEKQEDNEKHINWVRSVYNFTTPYVSQNPRLAYLNYRDLDLGKTNP
ESPNNYTQARIWGEKYFGKNFNRLVKVKTKADPNNFFRNEQSIPPLPPHH
H.

In some embodiments, a THCAS comprises the sequence shown below:

(SEQ ID NO: 136)
NPQENFLKCFSEYIPNNPANPKFIYTQHDQLYMSVLNSTIQNLRFTSDTT
PKPLVIVTPSNVSHIQASILCSKKVGLQIRTRSGGHDAEGMSYISQVPFV
VVDLRNMHSIKIDVHSQTAWVEAGATLGEVYYWINEKNENFSFPGGYCPT
VGVGGHFSGGGYGALMRNYGLAADNIIDAHLVNVDGKVLDRKSMGEDLFW
AIRGGGGENFGIIAAWKIKLVAVPSKSTIFSVKKNMEIHGLVKLFNKWQN
IAYKYDKDLVLMTHFITKNITDNHGKNKTTVHGYFSSIFHGGVDSLVDLM
NKSFPELGIKKTDCKEFSWIDTTIFYSGVVNFNTANFKKEILLDRSAGKK
TAFSIKLDYVKKPIPETAMVKILEKLYEEDVGVGMYVLYPYGGIMEEIS
ESAIPFPHRAGIMYELWYTASWEKQEDNEKHINWVRSVYNFTTPYVSQN
PRLAYLNYRDLDLGKTNPESPNNYTQARIWGEKYFGKNFNRLVKVKTKAD
PNNFFRNEQSIPPLPPHHH.

A non-limiting example of a nucleotide sequence encoding SEQ ID NO: 136 is:

(SEQ ID NO: 137)
aacccgcaagaaaactttctaaaatgcttttctgaatacattcctaacaa
ccctgccaacccgaagtttatctacacacaacacgatcaattgtatatga
gcgtgttgaatagtacaatacagaacctgaggtttacatccgacacaacg
ccgaaaccgctagtgatcgtcacaccctccaacgtaagccacattcaggc
aagcattttatgcagcaagaaagtcggactgcagataaggacgaggtccg
gaggacacgacgccgaagggatgagctatatctcccaggtaccttttgtg
gtggtagacttgagaaatatgcactctatcaagatagacgttcactccca
aaccgcttgggttgaggcgggagccacccttggtgaggtctactactgga
tcaacgaaaagaatgaaaattttagctttcctgggggatattgcccaact
gtaggtgttggcggccacttctcaggaggcggttatggggccttgatgcg
taactacggacttgcggccgacaacattatagacgcacatctagtgaatg
tagacggcaaagttttagacaggaagagcatgggtgaggatcttttttgg
gcaattagaggcggagggggagaaaattttggaattatcgctgcttggaa
aattaagctagttgcggtaccgagcaaaagcactatattctctgtaaaaa
agaacatggagatacatggtttggtgaagctttttaataagtggcaaaac
atcgcgtacaagtacgacaaagatctggttctgatgacgcattttataac
gaaaaatatcaccgacaaccacggaaaaaacaaaaccacagtacatggct
acttctctagtatatttcatgggggagtcgattctctggttgatttaatg
aacaaatcattcccagagttgggtataaagaagacagactgtaaggagtt
ctcttggattgacacaactatattctattcaggcgtagtcaactttaaca
cggcgaatttcaaaaaagagatccttctggacagatccgcaggtaagaaa
actgcgttctctatcaaattggactatgtgaagaagcctattcccgaaac
cgcgatggtcaagatacttgagaaattatacgaggaagatgtgggagttg
gaatgtacgtactttatccctatggtgggataatggaagaaatcagcgag
agcgccattccatttccccatcgtgccggcatcatgtacgagctgtggta
tactgcgagttgggagaagcaagaagacaacgaaaagcacattaactggg
tcagatcagtttacaatttcaccaccccatacgtgtcccagaatccgcgt
ctggcttacttgaactaccgtgatcttgacctgggtaaaacgaacccgga
gtcacccaacaattacactcaagctagaatctggggagagaaatactttg
ggaagaacttcaacaggttagtaaaggttaaaaccaaggcagatccaaac
aacttttttagaaatgaacaatccattcccccgctacccccgcaccatca
c.
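The correspondence between a nucleotide sequence and the polypeptide it encodes can be checked by translation. The Python sketch below assumes the standard genetic code and an in-frame coding sequence without introns; `translate` is a hypothetical helper, not part of any library referenced in this application. It rejects any character other than a, c, g, or t instead of skipping it, so that a transcription error in a printed sequence surfaces as an error rather than as a silent frame shift.

```python
# Standard genetic code, codons enumerated with bases in t, c, a, g order
# (first codon position outermost). "*" marks stop codons.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence; whitespace and a trailing period are ignored."""
    seq = "".join(dna.split()).lower().rstrip(".")
    invalid = sorted(set(seq) - set("acgt"))
    if invalid:
        # Fail loudly instead of dropping characters, which would shift the reading frame.
        raise ValueError(f"non-nucleotide characters: {invalid}")
    if len(seq) % 3 != 0:
        raise ValueError(f"length {len(seq)} is not a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


if __name__ == "__main__":
    # First 27 nucleotides of SEQ ID NO: 137.
    print(translate("aacccgcaagaaaactttctaaaatgc"))  # NPQENFLKC
```

The nine codons shown translate to the first nine residues of SEQ ID NO: 136 (NPQENFLKC).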






In some embodiments, a C. sativa THCAS comprises the amino acid sequence set forth in UniProtKB-Q8GTB6 (SEQ ID NO: 112):

MNCSAFSFWFVCKIIFFFLSFHIQISIANPRENFLKCFSKHIPNNVANPK
LVYTQHDQLYMSILNSTIQNLRFISDTTPKPLVIVTPSNNSHIQATILCS
KKVGLQIRTRSGGHDAEGMSYISQVPFVVVDLRNMHSIKIDVHSQTAWVE
AGATLGEVYYWINEKNENLSFPGGYCPTVGVGGHFSGGGYGALMRNYGLA
ADNIIDAHLVNVDGKVLDRKSMGEDLFWAIRGGGGENFGIIAAWKIKLVA
VPSKSTIFSVKKNMEIHGLVKLFNKWQNIAYKYDKDLVLMTHFITKNITD
NHGKNKTTVHGYFSSIFHGGVDSLVDLMNKSFPELGIKKTDCKEFSWIDT
TIFYSGVVNFNTANFKKEILLDRSAGKKTAFSIKLDYVKKPIPETAMVKI
LEKLYEEDVGAGMYVLYPYGGIMEEISESAIPFPHRAGIMYELWYTAS
WEKQEDNEKHINWVRSVYNFTTPYVSQNPRLAYLNYRDLDLGKTNHASPN
NYTQARIWGEKYFGKNFNRLVKVKTKVDPNNFFRNEQSIPPLPPHHH.

Additional non-limiting examples of THCAS enzymes may also be found in U.S. Pat. No. 9,512,391 and US Publication No. 2018/0179564, which are incorporated by reference in this application in their entireties.


Cannabidiolic Acid Synthase (CBDAS)

A host cell described in this application may comprise a TS that is a cannabidiolic acid synthase (CBDAS). As used in this application, a “CBDAS” refers to an enzyme that is capable of catalyzing oxidative cyclization of a prenyl moiety (e.g., terpene) of a compound of Formula (8) to produce a compound of Formula (9). In some embodiments, a compound of Formula (9) is a compound of Formula (9a) (cannabidiolic acid (CBDA)), CBDVA, or CBDP. A CBDAS may use cannabigerolic acid (CBGA) or cannabinerolic acid as a substrate. In some embodiments, a cannabidiolic acid synthase is capable of oxidative cyclization of cannabigerolic acid (CBGA) to produce cannabidiolic acid (CBDA). In some embodiments, the CBDAS may catalyze the oxidative cyclization of other substrates, such as 3-geranyl-2,4-dihydroxy-6-alkylbenzoic acids like cannabigerovarinic acid (CBGVA). In some embodiments, the CBDAS exhibits specificity for CBGA substrates.


In some embodiments, a CBDAS is from Cannabis. In C. sativa, CBDAS is encoded by the CBDAS gene and is a flavoenzyme. A non-limiting example of a CBDAS amino acid sequence is provided by UniProtKB-A6P6V9 (SEQ ID NO: 111) from C. sativa:

MKCSTFSFWFVCKIIFFFFSFNIQTSIANPRENFLKCFSQYIPNNATNLK
LVYTQNNPLYMSVLNSTIHNLRFTSDTTPKPLVIVTPSHVSHIQGTILCS
KKVGLQIRTRSGGHDSEGMSYISQVPFVIVDLRNMRSIKIDVHSQTAWVE
AGATLGEVYYWVNEKNENLSLAAGYCPTVCAGGHFGGGGYGPLMRNYGLA
ADNIIDAHLVNVHGKVLDRKSMGEDLFWALRGGGAESFGIIVAWKIRLVA
VPKSTMFSVKKIMEIHELVKLVNKWQNIAYKYDKDLLLMTHFITRNITD
NQGKNKTAIHTYFSSVFLGGVDSLVDLMNKSFPELGIKKTDCRQLSWIDT
IIFYSGVVNYDTDNFNKEILLDRSAGQNGAFKIKLDYVKKPIPESVFVQI
LEKLYEEDIGAGMYALYPYGGIMDEISESAIPFPHRAGILYELWYICSWE
KQEDNEKHLNWIRNIYNFMTPYVSKNPRLAYLNYRDLDIGINDPKNPNNY
TQARIWGEKYFGKNFDRLVKVKTLVDPNNFFRNEQSIPPLPRHRH.

Additional non-limiting examples of CBDAS enzymes may also be found in U.S. Pat. No. 9,512,391 and US Publication No. 2018/0179564, which are incorporated by reference in this application in their entireties.


Cannabichromenic Acid Synthase (CBCAS)

A host cell described in this application may comprise a TS that is a cannabichromenic acid synthase (CBCAS). As used in this application, a “CBCAS” refers to an enzyme that is capable of catalyzing oxidative cyclization of a prenyl moiety (e.g., terpene) of a compound of Formula (8) to produce a compound of Formula (11). In some embodiments, a compound of Formula (11) is a compound of Formula (11a) (cannabichromenic acid (CBCA)), CBCVA, or CBCPA. A CBCAS may use cannabigerolic acid (CBGA) as a substrate. In some embodiments, a CBCAS produces cannabichromenic acid (CBCA) from cannabigerolic acid (CBGA). In some embodiments, the CBCAS may catalyze the oxidative cyclization of other substrates, such as 3-geranyl-2,4-dihydroxy-6-alkylbenzoic acids like cannabigerovarinic acid (CBGVA), or CBCPA. In some embodiments, the CBCAS exhibits specificity for CBGA substrates.


In some embodiments, a CBCAS is from Cannabis. An amino acid sequence of a C. sativa CBCAS is provided by, and incorporated by reference from, SEQ ID NO:2 disclosed in U.S. Patent Publication No. 20170211049. In other embodiments, a CBCAS may be a THCAS described in and incorporated by reference from U.S. Pat. No. 9,359,625. SEQ ID NO:2 disclosed in U.S. Patent Publication No. 20170211049 (corresponding to SEQ ID NO: 113 in this application) has the amino acid sequence:

MNCSTFSFWFVCKIIFFFLSFNIQISIANPQENFLKCFSEYIPNNPANPK
FIYTQHDQLYMSVLNSTIQNLRFTSDTTPKPLVIVTPSNVSHIQASILCS
KKVGLQIRTRSGGHDAEGLSYISQVPFAIVDLRNMHTVKVDIHSQTAWVE
AGATLGEVYYWINEMNENFSFPGGYCPTVGVGGHFSGGGYGALMRNYGLA
ADNIIDAHLVNVDGKVLDRKSMGEDLFWAIRGGGGENFGIIAACKIKLVV
VPSKATIFSVKKNMEIHGLVKLFNKWQNIAYKYDKDLMLTTHFRTRNITD
NHGKNKTTVHGYFSSIFLGGVDSLVDLMNKSFPELGIKKTDCKELSWIDT
TIFYSGVVNYNTANFKKEILLDRSAGKKTAFSIKLDYVKKLIPETAMVKI
LEKLYEEEVGVGMYVLYPYGGIMDEISESAIPFPHRAGIMYELWYTATW
EKQEDNEKHINWVRSVYNFTTPYVSQNPRLAYLNYRDLDLGKTNPESPNN
YTQARIWGEKYFGKNFNRLVKVKTKADPNNFFRNEQSIPPLPPRHH.

Any of the enzymes, host cells, and methods described in this application may be used for the production of cannabinoids and cannabinoid precursors, such as those provided in Table 1. In general, the term “production” is used to refer to the generation of one or more products (e.g., products of interest and/or by-products/off-products), for example, from a particular substrate or reactant. The amount of production may be evaluated at any one or more steps of a pathway, such as a final product or an intermediate product, using metrics familiar to one of ordinary skill in the art. For example, the amount of production may be assessed for a single enzymatic reaction (e.g., conversion of a compound of Formula (8) to a compound of Formula (10) by a TS). Alternatively or in addition, the amount of production may be assessed for a series of enzymatic reactions (e.g., the biosynthetic pathway shown in FIG. 1 and/or FIG. 2). Production may be assessed by any metrics known in the art, for example, by assessing volumetric productivity, enzyme kinetics/reaction rate, specific productivity, biomass-specific productivity, titer, yield, and total titer of one or more products (e.g., products of interest and/or by-products/off-products).


In some embodiments, the metric used to measure production may depend on whether a continuous process is being monitored (e.g., several cannabinoid biosynthesis steps are used in combination) or whether a particular end product is being measured. For example, in some embodiments, metrics used to monitor production by a continuous process may include volumetric productivity, enzyme kinetics and reaction rate. In some embodiments, metrics used to monitor production of a particular product may include specific productivity, biomass-specific productivity, titer, yield, and/or total titer of one or more products (e.g., products of interest and/or by-products/off-products).
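The end-point metrics listed above reduce to simple ratios. The sketch below uses common definitions that are assumptions rather than definitions given in this application: titer as product mass per culture volume, volumetric productivity as titer per unit run time, yield as product mass per mass of substrate consumed, and biomass-specific productivity as product per gram of dry cell weight per hour, computed with end-point biomass (a simplification; time-averaged biomass is often preferred). The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BatchResult:
    """Hypothetical end-of-run measurements for one culture (units in field names)."""
    product_g: float             # product of interest recovered, grams
    broth_volume_l: float        # culture volume, liters
    duration_h: float            # run time, hours
    substrate_consumed_g: float  # substrate consumed, grams
    biomass_dcw_g: float         # end-point biomass as dry cell weight, grams

    def titer_g_per_l(self) -> float:
        return self.product_g / self.broth_volume_l

    def volumetric_productivity_g_per_l_h(self) -> float:
        return self.titer_g_per_l() / self.duration_h

    def yield_g_per_g(self) -> float:
        return self.product_g / self.substrate_consumed_g

    def biomass_specific_productivity_g_per_gdcw_h(self) -> float:
        # Uses end-point biomass; a time-averaged value would be more rigorous.
        return self.product_g / (self.biomass_dcw_g * self.duration_h)


if __name__ == "__main__":
    run = BatchResult(product_g=2.0, broth_volume_l=1.0, duration_h=48.0,
                      substrate_consumed_g=20.0, biomass_dcw_g=10.0)
    print(run.titer_g_per_l(), run.yield_g_per_g())
```

With these illustrative numbers, the titer is 2.0 g/L, the yield is 0.1 g/g, and the volumetric productivity is about 0.042 g/L/h.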


Production of one or more products (e.g., products of interest and/or by-products/off-products) may be assessed indirectly, for example, by determining the amount of a substrate remaining following termination of the reaction/fermentation. For example, for a TS that catalyzes the formation of products (e.g., a compound of Formula (10), including tetrahydrocannabinolic acid (THCA) (Formula (10a)), from a compound of Formula (8), including CBGA (Formula (8a))), production of the products may be assessed by quantifying the compound of Formula (10) directly or by quantifying the amount of substrate remaining following the reaction (e.g., the amount of the compound of Formula (8)).
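A minimal sketch of the indirect approach, assuming concentrations in mM and, by default, a 1:1 molar relationship between substrate consumed and product formed (one substrate molecule cyclized to one product molecule). The function names are hypothetical. Comparing the substrate-based estimate with a direct product measurement gives the amount of consumed substrate not recovered as the measured product, which may reflect by-products/off-products (e.g., CBCA formed by a THCAS) or losses.

```python
def substrate_conversion(initial_mm: float, remaining_mm: float) -> float:
    """Fraction of substrate (e.g., CBGA) consumed, from start and end concentrations."""
    if initial_mm <= 0:
        raise ValueError("initial concentration must be positive")
    return (initial_mm - remaining_mm) / initial_mm


def unaccounted_mm(initial_mm: float, remaining_mm: float,
                   measured_product_mm: float,
                   product_per_substrate: float = 1.0) -> float:
    """Product implied by substrate consumption minus directly measured product.

    A positive result suggests by-products/off-products or losses. The default
    stoichiometry of 1.0 is an assumption, not a value from this application.
    """
    implied_product_mm = (initial_mm - remaining_mm) * product_per_substrate
    return implied_product_mm - measured_product_mm
```

For example, if 1.0 mM substrate falls to 0.25 mM while 0.6 mM of the product of interest is measured, conversion is 75% and about 0.15 mM of consumed substrate is not accounted for by that product.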


Variants

Aspects of the disclosure relate to nucleic acids encoding any of the polypeptides (e.g., AAE, PKS, PKC, PT, or TS) described in this application. In some embodiments, a nucleic acid encompassed by the disclosure is a nucleic acid that hybridizes under high or medium stringency conditions to a nucleic acid encoding an AAE, PKS, PKC, PT, or TS and is biologically active. For example, high stringency conditions of 0.2 to 1×SSC at 65° C. followed by a wash at 0.2×SSC at 65° C. can be used. In some embodiments, a nucleic acid encompassed by the disclosure is a nucleic acid that hybridizes under low stringency conditions to a nucleic acid encoding an AAE, PKS, PKC, PT, or TS and is biologically active. For example, low stringency conditions of 6×SSC at room temperature followed by a wash at 2×SSC at room temperature can be used. Other hybridization conditions include 3×SSC at 40 or 50° C., followed by a wash in 1 or 2×SSC at 20, 30, 40, 50, 60, or 65° C.


Hybridizations can be conducted in the presence of formamide, e.g., 10%, 20%, 30%, 40%, or 50%, which further increases the stringency of hybridization. The theory and practice of nucleic acid hybridization are described, e.g., in S. Agrawal (ed.), Methods in Molecular Biology, volume 20, and in Tijssen (1993), Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization with Nucleic Acid Probes, e.g., Part I, Chapter 2, “Overview of principles of hybridization and the strategy of nucleic acid probe assays,” Elsevier, New York, which provide a basic guide to nucleic acid hybridization.
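The effect of salt concentration and formamide on stringency can be made concrete with an empirical melting-temperature (Tm) estimate for long DNA-DNA hybrids. The Python sketch below uses one commonly cited form of this approximation; the coefficients differ somewhat between sources, so both the equation and the SSC composition (1×SSC taken as 0.15 M NaCl plus 0.015 M trisodium citrate) are assumptions, not conditions specified in this application. Function names are hypothetical.

```python
import math


def ssc_sodium_molar(ssc_x: float) -> float:
    """Na+ molarity of an SSC buffer at strength ssc_x.

    Assumes 1x SSC = 0.15 M NaCl + 0.015 M trisodium citrate (3 Na+ per citrate).
    """
    return ssc_x * (0.150 + 3 * 0.015)


def estimated_tm_celsius(na_molar: float, gc_percent: float,
                         formamide_percent: float, length_bp: int) -> float:
    """Empirical Tm estimate for long DNA-DNA hybrids (coefficients vary between sources)."""
    return (81.5
            + 16.6 * math.log10(na_molar)   # higher salt stabilizes the duplex
            + 0.41 * gc_percent             # G-C pairs add stability
            - 0.61 * formamide_percent      # formamide destabilizes the duplex
            - 500.0 / length_bp)            # short hybrids melt at lower temperature


if __name__ == "__main__":
    for ssc in (0.2, 2.0, 6.0):
        tm = estimated_tm_celsius(ssc_sodium_molar(ssc), gc_percent=45.0,
                                  formamide_percent=0.0, length_bp=1500)
        print(f"{ssc}x SSC: Tm ~ {tm:.1f} C")
```

Because the estimated Tm falls as salt concentration falls or formamide concentration rises, a wash in 0.2×SSC at a given temperature is closer to the hybrid's Tm, and therefore more stringent, than a wash in 2×SSC or 6×SSC at the same temperature.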


Variants of enzyme sequences described in this application (e.g., AAE, PKS, PKC, PT, or TS, including nucleic acid or amino acid sequences) are also encompassed by the present disclosure. A variant may share at least 5%, at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 71%, at least 72%, at least 73%, at least 74%, at least 75%, at least 76%, at least 77%, at least 78%, at least 79%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity with a reference sequence, including all values in between.
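A percent-identity threshold translates into a maximum number of non-identical positions. The sketch below assumes that identity is counted over the full length of the reference sequence, which is only one of the conventions contemplated in this application (identity may also be determined over a region), and that the threshold is a whole-number percentage. The function names are hypothetical.

```python
def min_identical_residues(reference_length: int, percent_identity: int) -> int:
    """Smallest identical-position count meeting percent_identity over reference_length."""
    # Ceiling division in exact integer arithmetic avoids floating-point
    # rounding at the boundary (e.g., 90% of 545 = 490.5 -> 491).
    return -(-reference_length * percent_identity // 100)


def max_differences(reference_length: int, percent_identity: int) -> int:
    """Largest number of non-identical positions that still meets the threshold."""
    return reference_length - min_identical_residues(reference_length, percent_identity)


if __name__ == "__main__":
    # Hypothetical 545-residue reference at several thresholds.
    for pct in (90, 95, 99):
        print(pct, max_differences(545, pct))
```

For a hypothetical 545-residue reference, a 90% identity threshold allows at most 54 non-identical positions.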


Unless otherwise noted, the term “sequence identity,” as known in the art, refers to a relationship between the sequences of two polypeptides or polynucleotides, as determined by sequence comparison (alignment). In some embodiments, sequence identity is determined across the entire length of a sequence (e.g., AAE, PKS, PKC, PT, or TS sequence). In some embodiments, sequence identity is determined over a region (e.g., a stretch of amino acids or nucleic acids, e.g., the sequence spanning an active site) of a sequence (e.g., AAE, PKS, PKC, PT, or TS sequence).


Identity can also refer to the degree of sequence relatedness between two sequences as determined by the number of matches between strings of two or more residues (e.g., nucleic acid or amino acid residues). Identity measures the percent of identical matches between the smaller of two or more sequences, with gap alignments (if any) addressed by a particular mathematical model, algorithm, or computer program.


Identity of related polypeptides or nucleic acid sequences can be readily calculated by any of the methods known to one of ordinary skill in the art. The “percent identity” of two sequences (e.g., nucleic acid or amino acid sequences) may, for example, be determined using the algorithm of Karlin and Altschul Proc. Natl. Acad. Sci. USA 87:2264-68, 1990, modified as in Karlin and Altschul Proc. Natl. Acad. Sci. USA 90:5873-77, 1993. Such an algorithm is incorporated into the NBLAST® and XBLAST® programs (version 2.0) of Altschul et al., J. Mol. Biol. 215:403-10, 1990. BLAST® protein searches can be performed, for example, with the XBLAST program, score=50, wordlength=3 to obtain amino acid sequences homologous to the proteins described in this application. Where gaps exist between two sequences, Gapped BLAST® can be utilized, for example, as described in Altschul et al., Nucleic Acids Res. 25(17):3389-3402, 1997. When utilizing BLAST® and Gapped BLAST® programs, the default parameters of the respective programs (e.g., XBLAST® and NBLAST®) can be used, or the parameters can be adjusted appropriately as would be understood by one of ordinary skill in the art.


Another local alignment technique which may be used, for example, is based on the Smith-Waterman algorithm (Smith, T. F. & Waterman, M. S. (1981) “Identification of common molecular subsequences.” J. Mol. Biol. 147:195-197). A general global alignment technique which may be used, for example, is the Needleman-Wunsch algorithm (Needleman, S. B. & Wunsch, C. D. (1970) “A general method applicable to the search for similarities in the amino acid sequences of two proteins.” J. Mol. Biol. 48:443-453), which is based on dynamic programming.
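A compact Smith-Waterman sketch may help show what "local" means in practice: cell scores are floored at zero, so the best-scoring alignment can start and end anywhere in either sequence. The scoring values (match +2, mismatch −1, linear gap −2) are illustrative assumptions; protein searches in practice typically use a substitution matrix such as BLOSUM62 and affine gap penalties. The function computes the best score only, without traceback, and its name is hypothetical.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score with a linear gap penalty (score only, no traceback)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at 0 lets a local alignment restart anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # The shared core WKIKLV (6 matches x 2) dominates; flanks do not align.
    print(smith_waterman_score("AAAWKIKLVAAA", "GGGWKIKLVGGG"))
```

Keeping only two rows of the dynamic-programming table reduces memory to O(len(b)), at the cost of not being able to recover the alignment itself.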


More recently, a Fast Optimal Global Sequence Alignment Algorithm (FOGSAA) was developed that purportedly produces global alignments of nucleic acid and amino acid sequences faster than other optimal global alignment methods, including the Needleman-Wunsch algorithm. In some embodiments, the identity of two polypeptides is determined by aligning the two amino acid sequences, calculating the number of identical amino acids, and dividing by the length of one of the amino acid sequences. In some embodiments, the identity of two nucleic acids is determined by aligning the two nucleotide sequences, calculating the number of identical nucleotides, and dividing by the length of one of the nucleic acids.
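The calculation described in this paragraph (count identical aligned residues, then divide by the length of one sequence) can be illustrated with a short Python sketch. It assumes the two sequences have already been aligned by some tool and are supplied as equal-length strings with "-" for gaps. The function name and the denominator options are illustrative choices, not part of any specific program.

```python
def percent_identity(aligned_a: str, aligned_b: str, denominator: str = "a") -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    denominator selects the length used for normalization:
      "a" / "b"   ungapped length of the first / second sequence
      "shorter"   ungapped length of the shorter sequence
      "alignment" full alignment length, gaps included
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Count aligned columns where both residues are identical (gap columns never count).
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    len_a = len(aligned_a.replace("-", ""))
    len_b = len(aligned_b.replace("-", ""))
    lengths = {"a": len_a, "b": len_b, "shorter": min(len_a, len_b),
               "alignment": len(aligned_a)}
    return 100.0 * identical / lengths[denominator]


# The same alignment gives different values depending on the denominator.
for choice in ("a", "b", "shorter", "alignment"):
    print(choice, round(percent_identity("GATTACA", "GA-TACA", choice), 1))
```

With this example alignment there are 6 identical positions: dividing by the ungapped length of the first sequence (7 residues) gives about 85.7%, whereas dividing by the second (6 residues) gives 100%. This is one concrete reason why different methods can report different percent identity values for the same pair of sequences.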


For multiple sequence alignments, computer programs including Clustal Omega (Sievers et al., Mol Syst Biol. 2011 Oct. 11; 7:539) may be used.


It should be appreciated that a sequence, including a nucleic acid or amino acid sequence, may be found to have a specified percent identity to a reference sequence, such as a sequence disclosed in this application and/or recited in the claims, using any method known to one of ordinary skill in the art. Different algorithms may yield different percent identity values for a given set of sequences. The claims of this application should be understood to encompass sequences for which percent identity to a reference sequence is calculated using default parameters and/or parameters typically used by the skilled artisan for a given algorithm.


In some embodiments, a sequence, including a nucleic acid or amino acid sequence, is found to have a specified percent identity to a reference sequence, such as a sequence disclosed in this application and/or recited in the claims, when sequence identity is determined using the algorithm of Karlin and Altschul Proc. Natl. Acad. Sci. USA 87:2264-68, 1990, modified as in Karlin and Altschul Proc. Natl. Acad. Sci. USA 90:5873-77, 1993 (e.g., BLAST®, NBLAST®, XBLAST® or Gapped BLAST® programs, using default parameters of the respective programs).


In some embodiments, a sequence, including a nucleic acid or amino acid sequence, is found to have a specified percent identity to a reference sequence, such as a sequence disclosed in this application and/or recited in the claims, when sequence identity is determined using the Smith-Waterman algorithm (Smith, T. F. & Waterman, M. S. (1981) “Identification of common molecular subsequences.” J. Mol. Biol. 147:195-197) or the Needleman-Wunsch algorithm (Needleman, S. B. & Wunsch, C. D. (1970) “A general method applicable to the search for similarities in the amino acid sequences of two proteins.” J. Mol. Biol. 48:443-453).


In some embodiments, a sequence, including a nucleic acid or amino acid sequence, is found to have a specified percent identity to a reference sequence, such as a sequence disclosed in this application and/or recited in the claims, when sequence identity is determined using a Fast Optimal Global Sequence Alignment Algorithm (FOGSAA).


In some embodiments, a sequence, including a nucleic acid or amino acid sequence, is found to have a specified percent identity to a reference sequence, such as a sequence disclosed in this application and/or recited in the claims, when sequence identity is determined using Clustal Omega (Sievers et al., Mol Syst Biol. 2011 Oct. 11; 7:539).


As used in this application, a residue (such as a nucleic acid residue or an amino acid residue) in sequence “X” is referred to as corresponding to a position or residue (such as a nucleic acid residue or an amino acid residue) “Z” in a different sequence “Y” when the residue in sequence “X” is at the counterpart position of “Z” in sequence “Y” when sequences X and Y are aligned using amino acid sequence alignment tools known in the art.
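As a minimal sketch of this definition, the following Python function walks a pairwise alignment column by column and reports which residue of X sits at the counterpart of position Z in Y. This is one way a phrase such as "a residue corresponding to position 34 in SEQ ID NO: 7" can be resolved once an alignment is in hand. The sequences are short hypothetical peptides, the function name is illustrative, and the alignment itself is assumed to come from an alignment tool such as those discussed above.

```python
from typing import Optional


def corresponding_position(aligned_x: str, aligned_y: str, z: int) -> Optional[int]:
    """Map 1-based residue position z of sequence Y onto sequence X.

    Both inputs are rows of the same pairwise alignment ('-' marks a gap).
    Returns None when residue z of Y is aligned to a gap in X, i.e. X has
    no counterpart residue at that position.
    """
    pos_x = pos_y = 0
    for cx, cy in zip(aligned_x, aligned_y):
        if cx != "-":
            pos_x += 1          # running residue number in X
        if cy != "-":
            pos_y += 1          # running residue number in Y
            if pos_y == z:
                return pos_x if cx != "-" else None
    raise ValueError(f"position {z} lies beyond the end of sequence Y")


# Hypothetical alignment: X lacks the Q found at position 3 of Y.
x_row = "MK-LVA"
y_row = "MKQLVA"
print(corresponding_position(x_row, y_row, 4))  # residue L: position 3 in X
print(corresponding_position(x_row, y_row, 3))  # residue Q: no counterpart in X
```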


As used in this application, variant sequences may be homologous sequences. As used in this application, homologous sequences are sequences (e.g., nucleic acid or amino acid sequences) that share a certain percent identity (e.g., at least 5%, at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 71%, at least 72%, at least 73%, at least 74%, at least 75%, at least 76%, at least 77%, at least 78%, at least 79%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity, including all values in between). Homologous sequences include but are not limited to paralogous or orthologous sequences. Paralogous sequences arise from duplication of a gene within a genome of a species, while orthologous sequences diverge after a speciation event.


In some embodiments, a polypeptide variant (e.g., AAE, PKS, PKC, PT, or TS enzyme variant) comprises a domain that shares a secondary structure (e.g., alpha helix, beta sheet) with a reference polypeptide (e.g., a reference AAE, PKS, PKC, PT, or TS enzyme). In some embodiments, a polypeptide variant (e.g., AAE, PKS, PKC, PT, or TS enzyme variant) shares a tertiary structure with a reference polypeptide (e.g., a reference AAE, PKS, PKC, PT, or TS enzyme). As a non-limiting example, a polypeptide variant (e.g., AAE, PKS, PKC, PT, or TS enzyme) may have low primary sequence identity (e.g., less than 80%, less than 75%, less than 70%, less than 65%, less than 60%, less than 55%, less than 50%, less than 45%, less than 40%, less than 35%, less than 30%, less than 25%, less than 20%, less than 15%, less than 10%, or less than 5% sequence identity) compared to a reference polypeptide, but share one or more secondary structures (e.g., including but not limited to loops, alpha helices, or beta sheets), or have the same tertiary structure as a reference polypeptide. For example, a loop may be located between a beta sheet and an alpha helix, between two alpha helices, or between two beta sheets. Homology modeling may be used to compare two or more tertiary structures.


Functional variants of the recombinant AAE, PKS, PKC, PT, or TS enzyme disclosed in this application are encompassed by the present disclosure. For example, functional variants may bind one or more of the same substrates or produce one or more of the same products. Functional variants may be identified using any method known in the art. For example, the algorithm of Karlin and Altschul Proc. Natl. Acad. Sci. USA 87:2264-68, 1990 described above may be used to identify homologous proteins with known functions.


Putative functional variants may also be identified by searching for polypeptides with functionally annotated domains. Databases including Pfam (Sonnhammer et al., Proteins. 1997 July; 28(3):405-20) may be used to identify polypeptides with a particular domain.


Homology modeling may also be used to identify amino acid residues that are amenable to mutation (e.g., substitution, deletion, and/or insertion) without affecting function. A non-limiting example of such a method may include use of a position-specific scoring matrix (PSSM) and an energy minimization protocol.


A position-specific scoring matrix (PSSM) uses a position weight matrix to identify consensus sequences (e.g., motifs). PSSM can be conducted on nucleic acid or amino acid sequences. Sequences are aligned, and the method takes into account the observed frequency of a particular residue (e.g., an amino acid or a nucleotide) at a particular position and the number of sequences analyzed. See, e.g., Stormo et al., Nucleic Acids Res. 1982 May 11; 10(9):2997-3011. The likelihood of observing a particular residue at a given position can be calculated. Without being bound by a particular theory, positions in sequences with high variability may be amenable to mutation (e.g., substitution, deletion, and/or insertion; e.g., PSSM score ≥0) to produce functional homologs.
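A minimal Python sketch of a PSSM calculation follows. It assumes a uniform background distribution and a fixed pseudocount. PSSM-generating tools such as PSI-BLAST additionally apply sequence weighting and non-uniform background frequencies, so this is an illustration of the idea rather than a reimplementation of any published method. Under this sketch, a score ≥0 means the residue is observed at that position at least as often as the background model predicts (after pseudocounts). The alignment shown is a hypothetical toy example.

```python
import math
from collections import Counter


def build_pssm(alignment: list[str], alphabet: str,
               pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Build a log-odds position-specific scoring matrix (log base 2).

    alignment: equal-length aligned sequences; characters outside `alphabet`
    (e.g. '-' gaps) are ignored when counting. A uniform background
    frequency of 1/len(alphabet) is assumed for every residue.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(alphabet)
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        # Pseudocounts keep unobserved residues from receiving a score of -infinity.
        total = sum(counts[r] for r in alphabet) + pseudocount * len(alphabet)
        pssm.append({r: math.log2(((counts[r] + pseudocount) / total) / background)
                     for r in alphabet})
    return pssm


# Toy nucleotide alignment (hypothetical), four sequences of length 4.
toy = ["ACGT", "ACGA", "ACCT", "TCGT"]
matrix = build_pssm(toy, "ACGT")
for position, column in enumerate(matrix, start=1):
    print(position, {r: round(s, 2) for r, s in column.items()})
```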


A PSSM may be paired with calculation of a Rosetta energy function, which determines the difference in calculated stability between the wild-type protein and a single-point mutant. The Rosetta energy function expresses this difference as ΔΔGcalc. With the Rosetta function, the bonding interactions between a mutated residue and the surrounding atoms are used to determine whether an amino acid substitution, deletion, or insertion increases or decreases protein stability. For example, an amino acid substitution, deletion, or insertion that is designated as favorable by the PSSM score (e.g., PSSM score ≥0) can then be analyzed using the Rosetta energy function to determine its potential impact on protein stability. Without being bound by a particular theory, potentially stabilizing amino acid substitutions, deletions, or insertions are desirable for protein engineering (e.g., production of functional homologs). In some embodiments, a potentially stabilizing amino acid substitution, deletion, or insertion has a ΔΔGcalc value of less than −0.1 (e.g., less than −0.2, less than −0.3, less than −0.35, less than −0.4, less than −0.45, less than −0.5, less than −0.55, less than −0.6, less than −0.65, less than −0.7, less than −0.75, less than −0.8, less than −0.85, less than −0.9, less than −0.95, or less than −1.0) Rosetta energy units (R.e.u.). See, e.g., Goldenzweig et al., Mol Cell. 2016 Jul. 21; 63(2):337-346. doi: 10.1016/j.molcel.2016.06.012.
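The two-stage screen described above (a PSSM filter followed by a ΔΔGcalc cutoff) can be sketched as a simple Python filter. This sketch assumes the PSSM scores and ΔΔGcalc values have already been computed elsewhere, for example with a PSSM tool and with Rosetta; it does not compute energies itself. The thresholds default to the PSSM score ≥0 criterion and the ΔΔGcalc < −0.1 R.e.u. cutoff mentioned in this section, and the class name, function name, and candidate values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CandidateMutation:
    position: int        # 1-based position in the reference sequence
    wild_type: str       # one-letter code of the reference residue
    mutant: str          # one-letter code of the proposed residue
    pssm_score: float    # PSSM score of the mutant residue at this position
    ddg_calc: float      # ΔΔGcalc in R.e.u. from an external energy calculation


def select_stabilizing(candidates: list[CandidateMutation],
                       pssm_min: float = 0.0,
                       ddg_max: float = -0.1) -> list[CandidateMutation]:
    """Keep mutations with PSSM score >= pssm_min and ΔΔGcalc < ddg_max.

    Results are sorted from most to least negative ΔΔGcalc, i.e. from the
    mutation predicted to be most stabilizing to the least.
    """
    kept = [c for c in candidates
            if c.pssm_score >= pssm_min and c.ddg_calc < ddg_max]
    return sorted(kept, key=lambda c: c.ddg_calc)


# Hypothetical candidates; the scores are invented for illustration only.
pool = [
    CandidateMutation(10, "A", "S", 1.2, -0.45),
    CandidateMutation(20, "L", "P", -0.8, -1.00),  # fails the PSSM filter
    CandidateMutation(30, "K", "R", 0.3, 0.20),    # predicted destabilizing
    CandidateMutation(40, "V", "I", 0.0, -0.90),
]
for c in select_stabilizing(pool):
    print(f"{c.wild_type}{c.position}{c.mutant}", c.ddg_calc)
```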


In some embodiments, an AAE, PKS, PKC, PT, or TS enzyme coding sequence comprises an amino acid substitution, deletion, and/or insertion at 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100 or more than 100 positions corresponding to a reference (e.g., AAE, PKS, PKC, PT, or TS enzyme) coding sequence. In some embodiments, the AAE, PKS, PKC, PT, or TS enzyme coding sequence comprises an amino acid substitution, deletion, and/or insertion in 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100 or more codons of the coding sequence relative to a reference (e.g., AAE, PKS, PKC, PT, or TS enzyme) coding sequence. As will be understood by one of ordinary skill in the art, a substitution, insertion, or deletion within a codon may or may not change the amino acid that is encoded by the codon due to degeneracy of the genetic code. In some embodiments, the one or more substitutions, insertions, or deletions in the coding sequence do not alter the amino acid sequence of the coding sequence (e.g., AAE, PKS, PKC, PT, or TS enzyme) relative to the amino acid sequence of a reference polypeptide (e.g., AAE, PKS, PKC, PT, or TS enzyme).
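The point about codon degeneracy can be checked mechanically. The Python sketch below assumes the standard genetic code and considers only nucleotide substitutions (not insertions or deletions, which would shift the reading frame). It lists which codons differ between a reference and a variant coding sequence and reports whether the encoded amino acid sequence is unchanged. The example sequences are short hypothetical fragments and the helper names are illustrative.

```python
BASES = "TCAG"
# Standard genetic code, listed in TCAG order for the 1st, 2nd and 3rd codon positions.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a DNA coding sequence with the standard code ('*' = stop)."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def changed_codons(ref_cds: str, var_cds: str) -> list[int]:
    """1-based indices of codons that differ at the nucleotide level (substitutions only)."""
    if len(ref_cds) != len(var_cds):
        raise ValueError("this sketch handles substitutions only, not insertions or deletions")
    return [i // 3 + 1 for i in range(0, len(ref_cds), 3)
            if ref_cds[i:i + 3].upper() != var_cds[i:i + 3].upper()]


def is_synonymous(ref_cds: str, var_cds: str) -> bool:
    """True when the variant encodes exactly the same amino acid sequence."""
    return translate(ref_cds) == translate(var_cds)


reference = "ATGCTGAAA"   # encodes M-L-K
silent = "ATGCTTAAG"      # two codons changed, still M-L-K
missense = "ATGCCGAAA"    # CTG -> CCG changes L to P
print(changed_codons(reference, silent), is_synonymous(reference, silent))
print(changed_codons(reference, missense), is_synonymous(reference, missense))
```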


In some embodiments, the one or more substitutions, deletions, and/or insertions in a recombinant AAE, PKS, PKC, PT, or TS enzyme sequence alter the amino acid sequence of the polypeptide (e.g., AAE, PKS, PKC, PT, or TS enzyme) relative to the amino acid sequence of a reference polypeptide (e.g., AAE, PKS, PKC, PT, or TS enzyme). In some embodiments, the one or more substitutions, insertions, or deletions alter the amino acid sequence of the recombinant polypeptide (e.g., AAE, PKS, PKC, PT, or TS enzyme) relative to the amino acid sequence of a reference polypeptide (e.g., AAE, PKS, PKC, PT, or TS enzyme) and alter (enhance or reduce) an activity of the polypeptide relative to the reference polypeptide.


The activity (e.g., specific activity) of any of the recombinant polypeptides described in this application (e.g., AAE, PKS, PKC, PT, or TS enzyme) may be measured using routine methods. As a non-limiting example, a recombinant polypeptide's activity may be determined by measuring its substrate specificity, product(s) produced, the concentration of product(s) produced, or any combination thereof. As used in this application, “specific activity” of a recombinant polypeptide refers to the amount (e.g., concentration) of a particular product produced for a given amount (e.g., concentration) of the recombinant polypeptide per unit time.
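Following the definition of specific activity given above (amount of product per amount of polypeptide per unit time), a minimal Python sketch of the arithmetic is shown below. It also computes a fold change against a reference polypeptide, which is one way to express an enhanced or reduced activity. The units (µM, minutes), the function names, and the readout values are illustrative assumptions; any consistent units may be used.

```python
def specific_activity(product_um: float, enzyme_um: float, minutes: float) -> float:
    """Product formed per unit of enzyme per unit time.

    With concentrations in µM and time in minutes the result is in
    (µM product) / (µM enzyme) / min, i.e. min^-1. Assumes product
    formation was measured within the linear range of the assay.
    """
    if enzyme_um <= 0 or minutes <= 0:
        raise ValueError("enzyme concentration and time must be positive")
    return product_um / (enzyme_um * minutes)


def fold_change(variant: float, reference: float) -> float:
    """Activity of a variant relative to its reference polypeptide."""
    return variant / reference


# Hypothetical assay readouts for a variant and its reference enzyme.
variant_sa = specific_activity(product_um=50.0, enzyme_um=0.5, minutes=20.0)
reference_sa = specific_activity(product_um=20.0, enzyme_um=0.5, minutes=20.0)
print(variant_sa, reference_sa, fold_change(variant_sa, reference_sa))
```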


The skilled artisan will also realize that insertions, substitutions, or deletions in a recombinant polypeptide (e.g., AAE, PKS, PKC, PT, or TS enzyme) coding sequence may result in conservative amino acid substitutions to provide functionally equivalent variants of the foregoing polypeptides, e.g., variants that retain the activities of the polypeptides. As used in this application, a “conservative amino acid substitution” refers to an amino acid substitution that does not alter the relative charge or size characteristics or functional activity of the protein in which the amino acid substitution is made.


In some instances, an amino acid is characterized by its R group (see, e.g., Table 3). For example, an amino acid may comprise a nonpolar aliphatic R group, a positively charged R group, a negatively charged R group, a nonpolar aromatic R group, or a polar uncharged R group. Non-limiting examples of an amino acid comprising a nonpolar aliphatic R group include alanine, glycine, valine, leucine, methionine, and isoleucine. Non-limiting examples of an amino acid comprising a positively charged R group include lysine, arginine, and histidine. Non-limiting examples of an amino acid comprising a negatively charged R group include aspartate and glutamate. Non-limiting examples of an amino acid comprising a nonpolar, aromatic R group include phenylalanine, tyrosine, and tryptophan. Non-limiting examples of an amino acid comprising a polar uncharged R group include serine, threonine, cysteine, proline, asparagine, and glutamine.


Non-limiting examples of functionally equivalent variants of polypeptides may include conservative amino acid substitutions in the amino acid sequences of proteins disclosed in this application. As used in this application “conservative substitution” is used interchangeably with “conservative amino acid substitution” and refers to any one of the amino acid substitutions provided in Table 2.


In some embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more than 20 residues can be changed when preparing variant polypeptides. In some embodiments, amino acids are replaced by conservative amino acid substitutions.

TABLE 2
Conservative Amino Acid Substitutions

Original Residue    R Group Type                   Conservative Amino Acid Substitutions
Ala                 nonpolar aliphatic R group     Cys, Gly, Ser
Arg                 positively charged R group     His, Lys
Asn                 polar uncharged R group        Asp, Gln, Glu
Asp                 negatively charged R group     Asn, Gln, Glu
Cys                 polar uncharged R group        Ala, Ser
Gln                 polar uncharged R group        Asn, Asp, Glu
Glu                 negatively charged R group     Asn, Asp, Gln
Gly                 nonpolar aliphatic R group     Ala, Ser
His                 positively charged R group     Arg, Tyr, Trp
Ile                 nonpolar aliphatic R group     Leu, Met, Val
Leu                 nonpolar aliphatic R group     Ile, Met, Val
Lys                 positively charged R group     Arg, His
Met                 nonpolar aliphatic R group     Ile, Leu, Phe, Val
Pro                 polar uncharged R group        (none listed)
Phe                 nonpolar aromatic R group      Met, Trp, Tyr
Ser                 polar uncharged R group        Ala, Gly, Thr
Thr                 polar uncharged R group        Ala, Asn, Ser
Trp                 nonpolar aromatic R group      His, Phe, Tyr, Met
Tyr                 nonpolar aromatic R group      His, Phe, Trp
Val                 nonpolar aliphatic R group     Ile, Leu, Met, Thr

Amino acid substitutions in the amino acid sequence of a polypeptide to produce a recombinant polypeptide (e.g., AAE, PKS, PKC, PT, or TS enzyme) variant having a desired property and/or activity can be made by alteration of the coding sequence of the polypeptide (e.g., AAE, PKS, PKC, PT, or TS enzyme). Similarly, conservative amino acid substitutions in the amino acid sequence of a polypeptide to produce functionally equivalent variants of the polypeptide typically are made by alteration of the coding sequence of the recombinant polypeptide (e.g., AAE, PKS, PKC, PT, or TS enzyme).
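Because Table 2 defines "conservative substitution" by enumeration, it can be encoded directly as a lookup. The Python sketch below transcribes Table 2 as written. Note that the table is not symmetric (for example, Thr may be replaced by Asn, but Asn is not listed as replaceable by Thr), and no substitutions are listed for Pro. The helper names are hypothetical, and the sketch enumerates only single-residue variants at the protein level; it does not design the corresponding coding sequence changes.

```python
# Table 2 transcribed as {original residue: allowed conservative replacements}.
CONSERVATIVE_SUBSTITUTIONS: dict[str, set[str]] = {
    "Ala": {"Cys", "Gly", "Ser"}, "Arg": {"His", "Lys"},
    "Asn": {"Asp", "Gln", "Glu"}, "Asp": {"Asn", "Gln", "Glu"},
    "Cys": {"Ala", "Ser"}, "Gln": {"Asn", "Asp", "Glu"},
    "Glu": {"Asn", "Asp", "Gln"}, "Gly": {"Ala", "Ser"},
    "His": {"Arg", "Tyr", "Trp"}, "Ile": {"Leu", "Met", "Val"},
    "Leu": {"Ile", "Met", "Val"}, "Lys": {"Arg", "His"},
    "Met": {"Ile", "Leu", "Phe", "Val"}, "Pro": set(),
    "Phe": {"Met", "Trp", "Tyr"}, "Ser": {"Ala", "Gly", "Thr"},
    "Thr": {"Ala", "Asn", "Ser"}, "Trp": {"His", "Phe", "Tyr", "Met"},
    "Tyr": {"His", "Phe", "Trp"}, "Val": {"Ile", "Leu", "Met", "Thr"},
}

ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}
THREE_TO_ONE = {three: one for one, three in ONE_TO_THREE.items()}


def is_conservative(original: str, replacement: str) -> bool:
    """True if Table 2 lists `replacement` as conservative for `original` (three-letter codes)."""
    return replacement in CONSERVATIVE_SUBSTITUTIONS.get(original, set())


def conservative_point_variants(seq: str, position: int) -> list[str]:
    """Single-residue variants of a one-letter sequence allowed by Table 2 at a 1-based position."""
    original = ONE_TO_THREE[seq[position - 1]]
    return [seq[:position - 1] + THREE_TO_ONE[new] + seq[position:]
            for new in sorted(CONSERVATIVE_SUBSTITUTIONS[original])]


print(is_conservative("Thr", "Asn"), is_conservative("Asn", "Thr"))
print(conservative_point_variants("MKV", 3))
```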


Mutations (e.g., substitutions, insertions, additions, or deletions) can be made in a nucleic acid sequence by a variety of methods known to one of ordinary skill in the art. For example, mutations (e.g., substitutions, insertions, additions, or deletions) can be made by PCR-directed mutation, site-directed mutagenesis according to the method of Kunkel (Kunkel, Proc. Nat. Acad. Sci. U.S.A. 82: 488-492, 1985), by chemical synthesis of a gene encoding a polypeptide, by CRISPR, or by insertions, such as insertion of a tag (e.g., a HIS tag or a GFP tag). Mutations can include, for example, substitutions, insertions, additions, deletions, and translocations, generated by any method known in the art. Methods for producing mutations may be found in references such as Molecular Cloning: A Laboratory Manual, J. Sambrook, et al., eds., Fourth Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 2012, or Current Protocols in Molecular Biology, F. M. Ausubel, et al., eds., John Wiley & Sons, Inc., New York, 2010.


In some embodiments, methods for producing variants include circular permutation (Yu and Lutz, Trends Biotechnol. 2011 January; 29(1):18-25). In circular permutation, the linear primary sequence of a polypeptide can be circularized (e.g., by joining the N-terminal and C-terminal ends of the sequence) and the polypeptide can be severed ("broken") at a different location. Thus, the linear primary sequence of the new polypeptide may have low sequence identity (e.g., less than 80%, less than 75%, less than 70%, less than 65%, less than 60%, less than 55%, less than 50%, less than 45%, less than 40%, less than 35%, less than 30%, less than 25%, less than 20%, less than 15%, less than 10%, or less than 5%, including all values in between) as determined by linear sequence alignment methods (e.g., Clustal Omega or BLAST). Topological analysis of the two proteins, however, may reveal that the tertiary structure of the two polypeptides is similar or dissimilar. Without being bound by a particular theory, a variant polypeptide created through circular permutation of a reference polypeptide and with a similar tertiary structure as the reference polypeptide can share similar functional characteristics (e.g., enzymatic activity, enzyme kinetics, substrate specificity or product specificity). In some instances, circular permutation may alter the secondary structure, tertiary structure or quaternary structure and produce an enzyme with different functional characteristics (e.g., increased or decreased enzymatic activity, different substrate specificity, or different product specificity). See, e.g., Yu and Lutz, Trends Biotechnol. 2011 January; 29(1):18-25.
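A minimal Python sketch of circular permutation follows. It models the permutation purely at the sequence level by rotating the string; in practice the original termini are often joined through a peptide linker, which this sketch omits. The example peptide and function names are hypothetical. Comparing the two sequences position by position, without realignment, shows how a permutation can drive linear identity down even though every residue is retained.

```python
def circular_permutation(seq: str, new_start: int) -> str:
    """Rotate seq so that residue new_start (1-based) becomes the new N-terminus.

    Models joining the original termini and re-opening the chain elsewhere;
    any linker used to join the termini in practice is omitted here.
    """
    if not 1 <= new_start <= len(seq):
        raise ValueError("new_start must lie within the sequence")
    k = new_start - 1
    return seq[k:] + seq[:k]


def positional_identity(a: str, b: str) -> float:
    """Percent of positions with identical residues, comparing a and b without realignment."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


reference = "MKTAYIAKQR"                      # hypothetical peptide
permuted = circular_permutation(reference, 6)
print(permuted, positional_identity(reference, permuted))
```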


It should be appreciated that in a protein that has undergone circular permutation, the linear amino acid sequence of the protein would differ from a reference protein that has not undergone circular permutation. However, one of ordinary skill in the art would be able to determine which residues in the protein that has undergone circular permutation correspond to residues in the reference protein that has not undergone circular permutation by, for example, aligning the sequences and detecting conserved motifs, and/or by comparing the structures or predicted structures of the proteins, e.g., by homology modeling.


In some embodiments, an algorithm that determines the percent identity between a sequence of interest and a reference sequence described in this application accounts for the presence of circular permutation between the sequences. The presence of circular permutation may be detected using any method known in the art, including, for example, RASPODOM (Weiner et al., Bioinformatics. 2005 Apr. 1; 21(7):932-7). In some embodiments, the presence of circular permutation is corrected for (e.g., the domains in at least one sequence are rearranged) prior to calculation of the percent identity between a sequence of interest and a sequence described in this application. The claims of this application should be understood to encompass sequences for which percent identity to a reference sequence is calculated after taking into account potential circular permutation of the sequence.
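One simple way to account for circular permutation before computing percent identity is to try every rotation of the query and keep the one that best matches the reference. The Python sketch below does this by brute force for two ungapped sequences of equal length. It is a naive illustration under those assumptions, not the RASPODOM method cited above; real analyses would realign each rotation (or rearranged domain order) rather than compare positions directly. The function name and example peptides are hypothetical.

```python
def best_rotation(query: str, reference: str) -> tuple[int, float]:
    """Find the rotation of query that maximizes positional identity to reference.

    Returns (offset, percent_identity), where the rotated query is
    query[offset:] + query[:offset]. Both sequences must be ungapped and of
    equal length; the first offset reaching the maximum is reported.
    """
    if len(query) != len(reference) or not query:
        raise ValueError("sequences must be non-empty and of equal length")
    best_offset, best_identity = 0, -1.0
    for offset in range(len(query)):
        rotated = query[offset:] + query[:offset]
        matches = sum(x == y for x, y in zip(rotated, reference))
        identity = 100.0 * matches / len(reference)
        if identity > best_identity:
            best_offset, best_identity = offset, identity
    return best_offset, best_identity


# A hypothetical circularly permuted query versus its unpermuted reference.
result = best_rotation("IAKQRMKTAY", "MKTAYIAKQR")
print(result)
```

Without the correction, the positional identity of these two peptides is 0%; after undoing the permutation, every residue matches.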


Expression of Nucleic Acids in Host Cells

Aspects of the present disclosure relate to recombinant enzymes, functional modifications and variants thereof, as well as their uses. For example, the methods described in this application may be used to produce cannabinoids and/or cannabinoid precursors. The methods may comprise using a host cell comprising an enzyme disclosed in this application, cell lysate, isolated enzymes, or any combination thereof. Methods comprising recombinant expression of genes encoding an enzyme disclosed in this application in a host cell are encompassed by the present disclosure. In vitro methods comprising reacting one or more cannabinoid precursors or cannabinoids in a reaction mixture with an enzyme disclosed in this application are also encompassed by the present disclosure. In some embodiments, the enzyme is a TS.


A nucleic acid encoding any of the recombinant polypeptides (e.g., AAE, PKS, PKC, PT, or TS enzyme) described in this application may be incorporated into any appropriate vector through any method known in the art. For example, the vector may be an expression vector, including but not limited to a viral vector (e.g., a lentiviral, retroviral, adenoviral, or adeno-associated viral vector), any vector suitable for transient expression, any vector suitable for constitutive expression, or any vector suitable for inducible expression (e.g., a galactose-inducible or doxycycline-inducible vector).


A vector encoding any of the recombinant polypeptides (e.g., AAE, PKS, PKC, PT, or TS enzyme) described in this application may be introduced into a suitable host cell using any method known in the art. Non-limiting examples of yeast transformation protocols are described in Gietz et al., Yeast transformation by the LiAc/SS carrier DNA/PEG method. Methods Mol Biol. 2006; 313:107-20, which is hereby incorporated by reference in its entirety. Host cells may be cultured under any suitable conditions, as would be understood by one of ordinary skill in the art. For example, any media, temperature, and incubation conditions known in the art may be used. For host cells carrying an inducible vector, cells may be cultured with an appropriate inducing agent to promote expression.


In some embodiments, a vector replicates autonomously in the cell. In some embodiments, a vector integrates into a chromosome within a cell. A vector can contain one or more endonuclease restriction sites that are cut by a restriction endonuclease to insert and ligate a nucleic acid containing a gene described in this application to produce a recombinant vector that is able to replicate in a cell. Vectors are typically composed of DNA, although RNA vectors are also available. Cloning vectors include, but are not limited to: plasmids, fosmids, phagemids, virus genomes and artificial chromosomes. As used in this application, the terms “expression vector” or “expression construct” refer to a nucleic acid construct, generated recombinantly or synthetically, with a series of specified nucleic acid elements that permit transcription of a particular nucleic acid in a host cell (e.g., microbe), such as a yeast cell. In some embodiments, the nucleic acid sequence of a gene described in this application is inserted into a cloning vector so that it is operably joined to regulatory sequences and, in some embodiments, expressed as an RNA transcript. In some embodiments, the vector contains one or more markers, such as a selectable marker as described in this application, to identify cells transformed or transfected with the recombinant vector. In some embodiments, a host cell has already been transformed with one or more vectors. In some embodiments, a host cell that has been transformed with one or more vectors is subsequently transformed with one or more additional vectors. In some embodiments, a host cell is transformed simultaneously with more than one vector. In some embodiments, the nucleic acid sequence of a gene described in this application is recoded. Recoding may increase production of the gene product by at least 10% (e.g., at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100%, including all values in between) relative to a reference sequence that is not recoded.
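To illustrate recoding, the Python sketch below rewrites a coding sequence codon by codon while preserving the encoded protein, and checks that the translation is unchanged. The preferred-codon table is a hypothetical example chosen only to show the mechanics; it is not derived from measured codon usage for any host, and this sketch makes no prediction about the effect on expression level. It assumes the standard genetic code.

```python
BASES = "TCAG"
# Standard genetic code, listed in TCAG order for the 1st, 2nd and 3rd codon positions.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Hypothetical "preferred" codon for each amino acid ('*' = stop); illustrative only.
PREFERRED_CODON = {
    "A": "GCT", "R": "AGA", "N": "AAC", "D": "GAT", "C": "TGT",
    "Q": "CAA", "E": "GAA", "G": "GGT", "H": "CAC", "I": "ATT",
    "L": "TTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCA",
    "S": "TCT", "T": "ACT", "W": "TGG", "Y": "TAC", "V": "GTT",
    "*": "TAA",
}


def translate(cds: str) -> str:
    """Translate a DNA coding sequence with the standard code ('*' = stop)."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def recode(cds: str) -> str:
    """Replace every codon with the table's preferred synonymous codon."""
    protein = translate(cds)
    recoded = "".join(PREFERRED_CODON[aa] for aa in protein)
    # Recoding must not change the encoded protein.
    if translate(recoded) != protein:
        raise RuntimeError("preferred-codon table is inconsistent with the genetic code")
    return recoded


original = "ATGCTGAAATGA"
print(recode(original))
```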


In some embodiments, the nucleic acid encoding any of the proteins described in this application is under the control of regulatory sequences (e.g., enhancer sequences). In some embodiments, a nucleic acid is expressed under the control of a promoter. The promoter can be a native promoter, e.g., the promoter of the gene in its endogenous context, which provides normal regulation of expression of the gene. Alternatively, a promoter can be a promoter that is different from the native promoter of the gene, e.g., the promoter is different from the promoter of the gene in its endogenous context.


In some embodiments, the promoter is a eukaryotic promoter. Non-limiting examples of eukaryotic promoters include TDH3, PGK1, PKC1, PDC1, TEF1, TEF2, RPL18B, SSA1, TDH2, PYK1, TPI1, GAL1, GAL10, GAL7, GAL3, GAL2, MET3, MET25, HXT3, HXT7, ACT1, ADH1, ADH2, CUP1-1, ENO2, and SOD1, as would be known to one of ordinary skill in the art (see, e.g., Addgene website: blog.addgene.org/plasmids-101-the-promoter-region). In some embodiments, the promoter is a prokaryotic promoter (e.g., bacteriophage or bacterial promoter). Non-limiting examples of bacteriophage promoters include Plslcon, T3, T7, SP6, and PL. Non-limiting examples of bacterial promoters include Pbad, PmgrB, Ptrc2, Plac/ara, Ptac, and Pm.


In some embodiments, the promoter is an inducible promoter. As used in this application, an “inducible promoter” is a promoter controlled by the presence or absence of a molecule. This may be used, for example, to controllably induce the expression of an enzyme. In some embodiments, an inducible promoter linked to a PT and/or a TS may be used to regulate expression of the enzyme(s), for example to reduce cannabinoid production in certain scenarios (e.g., during transport of the genetically modified organism to satisfy regulatory restrictions in certain jurisdictions, or between jurisdictions where cannabinoids may not be shipped). In some embodiments, an inducible promoter linked to a CBGAS and/or a TS may likewise be used to regulate expression of the CBGAS and/or TS, for example to reduce cannabinoid production in the scenarios described above. Non-limiting examples of inducible promoters include chemically regulated promoters and physically regulated promoters. For chemically regulated promoters, the transcriptional activity can be regulated by one or more compounds, such as alcohol, tetracycline, galactose, a steroid, a metal, an amino acid, or other compounds. For physically regulated promoters, transcriptional activity can be regulated by a phenomenon such as light or temperature. Non-limiting examples of tetracycline-regulated promoters include anhydrotetracycline (aTc)-responsive promoters and other tetracycline-responsive promoter systems (e.g., a tetracycline repressor protein (tetR), a tetracycline operator sequence (tetO) and a tetracycline transactivator fusion protein (tTA)). Non-limiting examples of steroid-regulated promoters include promoters based on the rat glucocorticoid receptor, human estrogen receptor, moth ecdysone receptors, and promoters from the steroid/retinoid/thyroid receptor superfamily. Non-limiting examples of metal-regulated promoters include promoters derived from metallothionein genes (metallothioneins are proteins that bind and sequester metal ions). Non-limiting examples of pathogenesis-regulated promoters include promoters induced by salicylic acid, ethylene or benzothiadiazole (BTH). Non-limiting examples of temperature/heat-inducible promoters include heat shock promoters. Non-limiting examples of light-regulated promoters include light responsive promoters from plant cells. In certain embodiments, the inducible promoter is a galactose-inducible promoter. In some embodiments, the inducible promoter is induced by one or more physiological conditions (e.g., pH, temperature, radiation, osmotic pressure, saline gradients, cell surface binding, or concentration of one or more extrinsic or intrinsic inducing agents). Non-limiting examples of an extrinsic inducer or inducing agent include amino acids and amino acid analogs, saccharides and polysaccharides, nucleic acids, protein transcriptional activators and repressors, cytokines, toxins, petroleum-based compounds, metal-containing compounds, salts, ions, enzyme substrate analogs, hormones, or any combination thereof.


In some embodiments, the promoter is a constitutive promoter. As used in this application, a “constitutive promoter” refers to an unregulated promoter that allows continuous transcription of a gene. Non-limiting examples of a constitutive promoter include TDH3, PGK1, PKC1, PDC1, TEF1, TEF2, RPL18B, SSA1, TDH2, PYK1, TPI1, HXT3, HXT7, ACT1, ADH1, ADH2, ENO2, and SOD1.


Other inducible promoters or constitutive promoters, including synthetic promoters, that may be known to one of ordinary skill in the art are also contemplated.


The precise nature of the regulatory sequences needed for gene expression may vary between species or cell types, but generally include, as necessary, 5′ non-transcribed and 5′ non-translated sequences involved with the initiation of transcription and translation respectively, such as a TATA box, capping sequence, CAAT sequence, and the like. In particular, such 5′ non-transcribed regulatory sequences will include a promoter region which includes a promoter sequence for transcriptional control of the operably joined gene. Regulatory sequences may also include enhancer sequences or upstream activator sequences. The vectors disclosed may include 5′ leader or signal sequences. The regulatory sequence may also include a terminator sequence. In some embodiments, a terminator sequence marks the end of a gene in DNA during transcription. The choice and design of one or more appropriate vectors suitable for inducing expression of one or more genes described in this application in a heterologous organism is within the ability and discretion of one of ordinary skill in the art.


Expression vectors containing the necessary elements for expression are commercially available and known to one of ordinary skill in the art (see, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual, Fourth Edition, Cold Spring Harbor Laboratory Press, 2012).


Host Cells

The disclosed cannabinoid biosynthetic methods and host cells are exemplified with S. cerevisiae, but are also applicable to other host cells, as would be understood by one of ordinary skill in the art.


Suitable host cells include, but are not limited to: yeast cells, bacterial cells, algal cells, plant cells, fungal cells, insect cells, and animal cells, including mammalian cells. In one illustrative embodiment, suitable host cells include E. coli (e.g., Shuffle™ competent E. coli available from New England BioLabs in Ipswich, Mass.).


Other suitable host cells of the present disclosure include microorganisms of the genus Corynebacterium. In some embodiments, preferred Corynebacterium strains/species include: C. efficiens, with the deposited type strain being DSM44549; C. glutamicum, with the deposited type strain being ATCC13032; and C. ammoniagenes, with the deposited type strain being ATCC6871. In some embodiments, the preferred host cell of the present disclosure is C. glutamicum.


Suitable host cells of the genus Corynebacterium, in particular of the species Corynebacterium glutamicum, include, in particular, the known wild-type strains: Corynebacterium glutamicum ATCC13032, Corynebacterium acetoglutamicum ATCC15806, Corynebacterium acetoacidophilum ATCC13870, Corynebacterium melassecola ATCC17965, Corynebacterium thermoaminogenes FERM BP-1539, Brevibacterium flavum ATCC14067, Brevibacterium lactofermentum ATCC13869, and Brevibacterium divaricatum ATCC14020; and L-amino acid-producing mutants, or strains, prepared therefrom, such as, for example, the L-lysine-producing strains: Corynebacterium glutamicum FERM-P 1709, Brevibacterium flavum FERM-P 1708, Brevibacterium lactofermentum FERM-P 1712, Corynebacterium glutamicum FERM-P 6463, Corynebacterium glutamicum FERM-P 6464, Corynebacterium glutamicum DM58-1, Corynebacterium glutamicum DG52-5, Corynebacterium glutamicum DSM5714, and Corynebacterium glutamicum DSM12866.


Suitable yeast host cells include, but are not limited to: Candida, Hansenula, Saccharomyces, Schizosaccharomyces, Pichia, Kluyveromyces, and Yarrowia. In some embodiments, the yeast cell is Hansenula polymorpha, Saccharomyces cerevisiae, Saccharomyces carlsbergensis, Saccharomyces diastaticus, Saccharomyces norbensis, Saccharomyces kluyveri, Schizosaccharomyces pombe, Komagataella phaffii (formerly known as Pichia pastoris), Pichia finlandica, Pichia trehalophila, Pichia kodamae, Pichia membranaefaciens, Pichia opuntiae, Pichia thermotolerans, Pichia salictaria, Pichia quercuum, Pichia pijperi, Pichia stipitis, Pichia methanolica, Pichia angusta, Kluyveromyces lactis, Candida albicans, or Yarrowia lipolytica.


In some embodiments, the yeast strain is an industrial polyploid yeast strain. Other non-limiting examples of fungal cells include cells obtained from Aspergillus spp., Penicillium spp., Fusarium spp., Rhizopus spp., Acremonium spp., Neurospora spp., Sordaria spp., Magnaporthe spp., Allomyces spp., Ustilago spp., Botrytis spp., and Trichoderma spp.


In certain embodiments, the host cell is an algal cell, such as Chlamydomonas (e.g., C. reinhardtii) or Phormidium (e.g., P. sp. ATCC29409).


In other embodiments, the host cell is a prokaryotic cell. Suitable prokaryotic cells include Gram-positive, Gram-negative, and Gram-variable bacterial cells. The host cell may be a species of, but not limited to: Agrobacterium, Alicyclobacillus, Anabaena, Anacystis, Acinetobacter, Acidothermus, Arthrobacter, Azotobacter, Bacillus, Bifidobacterium, Brevibacterium, Butyrivibrio, Buchnera, Campestris, Campylobacter, Clostridium, Corynebacterium, Chromatium, Coprococcus, Escherichia, Enterococcus, Enterobacter, Erwinia, Fusobacterium, Faecalibacterium, Francisella, Flavobacterium, Geobacillus, Haemophilus, Helicobacter, Klebsiella, Lactobacillus, Lactococcus, Ilyobacter, Micrococcus, Microbacterium, Mesorhizobium, Methylobacterium, Mycobacterium, Neisseria, Pantoea, Pseudomonas, Prochlorococcus, Rhodobacter, Rhodopseudomonas, Roseburia, Rhodospirillum, Rhodococcus, Scenedesmus, Streptomyces, Streptococcus, Synechococcus, Saccharomonospora, Saccharopolyspora, Staphylococcus, Serratia, Salmonella, Shigella, Thermoanaerobacterium, Tropheryma, Tularensis, Temecula, Thermosynechococcus, Thermococcus, Ureaplasma, Xanthomonas, Xylella, Yersinia, and Zymomonas.


In some embodiments, the bacterial host strain is an industrial strain. Numerous bacterial industrial strains are known and suitable for the methods and compositions described in this application.


In some embodiments, the bacterial host cell is of an Agrobacterium species (e.g., A. radiobacter, A. rhizogenes, A. rubi), an Arthrobacter species (e.g., A. aurescens, A. citreus, A. globiformis, A. hydrocarboglutamicus, A. mysorens, A. nicotianae, A. paraffineus, A. protophormiae, A. roseoparaffinus, A. sulfureus, A. ureafaciens), or a Bacillus species (e.g., B. thuringiensis, B. anthracis, B. megaterium, B. subtilis, B. lentus, B. circulans, B. pumilus, B. lautus, B. coagulans, B. brevis, B. firmus, B. alkalophilus, B. licheniformis, B. clausii, B. stearothermophilus, B. halodurans, and B. amyloliquefaciens). In particular embodiments, the host cell will be an industrial Bacillus strain including but not limited to B. subtilis, B. pumilus, B. licheniformis, B. megaterium, B. clausii, B. stearothermophilus, and B. amyloliquefaciens. In some embodiments, the host cell will be an industrial Clostridium species (e.g., C. acetobutylicum, C. tetani E88, C. lituseburense, C. saccharobutylicum, C. perfringens, C. beijerinckii). In some embodiments, the host cell will be an industrial Corynebacterium species (e.g., C. glutamicum, C. acetoacidophilum). In some embodiments, the host cell will be an industrial Escherichia species (e.g., E. coli). In some embodiments, the host cell will be an industrial Erwinia species (e.g., E. uredovora, E. carotovora, E. ananas, E. herbicola, E. punctata, E. terreus). In some embodiments, the host cell will be an industrial Pantoea species (e.g., P. citrea, P. agglomerans). In some embodiments, the host cell will be an industrial Pseudomonas species (e.g., P. putida, P. aeruginosa, P. mevalonii). In some embodiments, the host cell will be an industrial Streptococcus species (e.g., S. equisimiles, S. pyogenes, S. uberis). In some embodiments, the host cell will be an industrial Streptomyces species (e.g., S. ambofaciens, S. achromogenes, S. avermitilis, S. coelicolor, S. aureofaciens, S. aureus, S. fungicidicus, S. griseus, S. lividans). In some embodiments, the host cell will be an industrial Zymomonas species (e.g., Z. mobilis, Z. lipolytica), and the like.


The present disclosure is also suitable for use with a variety of animal cell types, including mammalian cells, for example, human (including 293, HeLa, WI38, PER.C6 and Bowes melanoma cells), mouse (including 3T3, NS0, NS1, Sp2/0), hamster (CHO, BHK), monkey (COS, FRhL, Vero), insect cells, for example fall armyworm (including Sf9 and Sf21), silkmoth (including BmN), cabbage looper (including BTI-Tn-5B1-4) and common fruit fly (including Schneider 2), and hybridoma cell lines.


In various embodiments, strains that may be used in the practice of the disclosure, including both prokaryotic and eukaryotic strains, are readily accessible to the public from a number of culture collections, such as the American Type Culture Collection (ATCC), the Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH (DSM), the Centraalbureau voor Schimmelcultures (CBS), and the Agricultural Research Service Patent Culture Collection, Northern Regional Research Center (NRRL).


The present disclosure is also suitable for use with a variety of plant cell types. In some embodiments, the plant is of the Cannabis genus in the family Cannabaceae. In certain embodiments, the plant is of the species Cannabis sativa, Cannabis indica, or Cannabis ruderalis. In other embodiments, the plant is of the genus Nicotiana in the family Solanaceae. In certain embodiments, the plant is of the species Nicotiana rustica.


The term “cell,” as used in this application, may refer to a single cell or a population of cells, such as a population of cells belonging to the same cell line or strain. Use of the singular term “cell” should not be construed to refer explicitly to a single cell rather than a population of cells. The host cell may comprise genetic modifications relative to a wild-type counterpart. Reduction of gene expression and/or gene inactivation in a host cell may be achieved through any suitable method, including but not limited to, deletion of the gene, introduction of a point mutation into the gene, selective editing of the gene, and/or truncation of the gene. For example, polymerase chain reaction (PCR)-based methods may be used (see, e.g., Gardner et al., Methods Mol Biol. 2014; 1205:45-78). As a non-limiting example, genes may be deleted through gene replacement (e.g., with a marker, including a selection marker). A gene may also be truncated through the use of a transposon system (see, e.g., Poussu et al., Nucleic Acids Res. 2005; 33(12): e104). A gene may also be edited through the use of gene editing technologies known in the art, such as CRISPR-based technologies.
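As a non-limiting, hypothetical illustration of PCR-based gene replacement with a marker, the following Python sketch designs a primer pair that amplifies a selection marker flanked by short homology arms matching the sequences immediately upstream and downstream of a target gene. The function names, the default 40-nucleotide arm length, and the marker annealing sequences are assumptions for illustration only; they are not taken from the cited references or from the strains described in this application.

```python
# Hypothetical sketch: primers for PCR-based replacement of a gene with a selection marker.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def deletion_primers(upstream: str, downstream: str,
                     marker_fwd_anneal: str, marker_rev_anneal: str,
                     arm_length: int = 40) -> tuple:
    """Return (forward, reverse) primers, each written 5' to 3'.

    upstream: top-strand sequence ending immediately before the target gene.
    downstream: top-strand sequence starting immediately after the target gene.
    marker_fwd_anneal: primer sequence annealing at the 5' end of the marker.
    marker_rev_anneal: primer sequence annealing at the 3' end of the marker,
        already written 5' to 3' on the bottom strand.
    arm_length: length of each homology arm (assumed default of 40 nt).
    """
    if len(upstream) < arm_length or len(downstream) < arm_length:
        raise ValueError("flanking sequence is shorter than the requested homology arm")
    # Forward primer: last arm_length nt upstream of the gene, then the marker primer.
    forward = upstream[-arm_length:] + marker_fwd_anneal
    # Reverse primer: reverse complement of the first arm_length nt downstream
    # of the gene, then the marker's reverse primer.
    reverse = reverse_complement(downstream[:arm_length]) + marker_rev_anneal
    return forward, reverse


# Toy sequences with a 4-nt arm, purely to show how the primers are assembled.
fwd, rev = deletion_primers("AAAACCCC", "GCATTTTT", "ATG", "TTA", arm_length=4)
print(fwd, rev)  # CCCCATG ATGCTTA
```

The amplified product carries the marker between the two homology arms, so that homologous recombination in the host can replace the region between the arms with the marker.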


Culturing of Host Cells

Any of the cells disclosed in this application can be cultured in media of any type (rich or minimal) and any composition prior to, during, and/or after contact and/or integration of a nucleic acid. The conditions of the culture or culturing process can be optimized through routine experimentation as would be understood by one of ordinary skill in the art. In some embodiments, the selected medium is supplemented with various components. In some embodiments, the concentration and amount of a supplemental component is optimized. In some embodiments, other aspects of the media and growth conditions (e.g., pH, temperature, etc.) are optimized through routine experimentation. In some embodiments, the frequency with which the medium is supplemented with one or more supplemental components, and the amount of time that the cell is cultured, are optimized.
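As one non-limiting way to organize such routine experimentation, candidate culture conditions can be enumerated as a full-factorial design, in which every level of every factor is combined with every level of the others. The factor names and levels in this Python sketch are hypothetical placeholders, not conditions used in this disclosure.

```python
from itertools import product


def full_factorial(levels):
    """Enumerate every combination of factor levels as a list of dicts."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*(levels[n] for n in names))]


# Hypothetical factors and levels for a culture-optimization screen.
design = full_factorial({
    "pH": [5.0, 5.5, 6.0],
    "temperature_C": [25, 30],
    "feed_interval_h": [12, 24],
})
print(len(design))  # 3 * 2 * 2 = 12 conditions
print(design[0])    # {'pH': 5.0, 'temperature_C': 25, 'feed_interval_h': 12}
```

A full-factorial design grows multiplicatively with the number of factors; when many factors are screened, fractional-factorial or response-surface designs are common alternatives.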


Culturing of the cells described in this application can be performed in culture vessels known and used in the art. In some embodiments, an aerated reaction vessel (e.g., a stirred tank reactor) is used to culture the cells. In some embodiments, a bioreactor or fermentor is used to culture the cells. Thus, in some embodiments, the cells are used in fermentation. As used in this application, the terms “bioreactor” and “fermentor” are used interchangeably and refer to an enclosure, or partial enclosure, in which a biological, biochemical and/or chemical reaction takes place that involves a living organism or part of a living organism. A “large-scale bioreactor” or “industrial-scale bioreactor” is a bioreactor that is used to generate a product on a commercial or quasi-commercial scale. Large-scale bioreactors typically have volumes in the range of liters, hundreds of liters, thousands of liters, or more.


Non-limiting examples of bioreactors include: stirred tank fermentors, bioreactors agitated by rotating mixing devices, chemostats, bioreactors agitated by shaking devices, airlift fermentors, packed-bed reactors, fixed-bed reactors, fluidized bed bioreactors, bioreactors employing wave induced agitation, centrifugal bioreactors, roller bottles, hollow fiber bioreactors, roller apparatuses (for example benchtop, cart-mounted, and/or automated varieties), vertically-stacked plates, spinner flasks, stirring or rocking flasks, shaken multi-well plates, MD bottles, T-flasks, Roux bottles, multiple-surface tissue culture propagators, modified fermentors, and coated beads (e.g., beads coated with serum proteins, nitrocellulose, or carboxymethyl cellulose to prevent cell attachment).


In some embodiments, the bioreactor includes a cell culture system where the cell (e.g., yeast cell) is in contact with moving liquids and/or gas bubbles. In some embodiments, the cell or cell culture is grown in suspension. In other embodiments, the cell or cell culture is attached to a solid phase carrier. Non-limiting examples of carrier systems include microcarriers (e.g., polymer spheres, microbeads, and microdisks that can be porous or non-porous), cross-linked beads (e.g., dextran) charged with specific chemical groups (e.g., tertiary amine groups), 2D microcarriers including cells trapped in nonporous polymer fibers, 3D carriers (e.g., carrier fibers, hollow fibers, multicartridge reactors, and semi-permeable membranes that can comprise porous fibers), microcarriers having reduced ion exchange capacity, encapsulation cells, capillaries, and aggregates. In some embodiments, carriers are fabricated from materials such as dextran, gelatin, glass, or cellulose.


In some embodiments, industrial-scale processes are operated in continuous, semi-continuous, or non-continuous modes. Non-limiting examples of operation modes are batch, fed batch, extended batch, repetitive batch, draw/fill, rotating-wall, spinning flask, and/or perfusion mode of operation. In some embodiments, a bioreactor allows continuous or semi-continuous replenishment of the substrate stock (for example, a carbohydrate source) and/or continuous or semi-continuous separation of the product from the bioreactor.
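For fed-batch operation, one commonly used textbook approach (an assumption here, not a feeding strategy stated in this disclosure) is an exponential feed that holds the specific growth rate near a chosen value μ. Neglecting maintenance metabolism and holding the residual substrate concentration S constant, a mass balance gives F(t) = μ·X0·V0·exp(μt) / (Y_X/S·(S_F - S)), where X0 and V0 are the biomass concentration and culture volume at the start of feeding, Y_X/S is the biomass yield on substrate, and S_F is the substrate concentration in the feed. The parameter values in this Python sketch are hypothetical.

```python
import math


def exponential_feed_rate(t_h: float, mu: float, x0: float, v0: float,
                          yield_xs: float, s_feed: float, s_residual: float = 0.0) -> float:
    """Substrate feed rate (L/h) that sustains specific growth rate mu (1/h).

    t_h: time since the start of feeding (h)
    x0: biomass concentration at the start of feeding (g/L)
    v0: culture volume at the start of feeding (L)
    yield_xs: biomass yield on substrate (g biomass per g substrate)
    s_feed: substrate concentration in the feed (g/L)
    s_residual: residual substrate concentration held in the reactor (g/L)
    Maintenance consumption is neglected in this simplified mass balance.
    """
    if s_feed <= s_residual:
        raise ValueError("feed must be more concentrated than the residual substrate")
    return mu * x0 * v0 * math.exp(mu * t_h) / (yield_xs * (s_feed - s_residual))


# Hypothetical parameters: mu = 0.1 1/h, 10 g/L biomass in 1 L, Y = 0.5 g/g, 500 g/L glucose feed.
for t in (0, 5, 10):
    print(t, round(exponential_feed_rate(t, 0.1, 10.0, 1.0, 0.5, 500.0), 5))
```

In practice, μ is typically chosen below the strain's maximum growth rate so that substrate stays limiting and overflow metabolism is reduced.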


In some embodiments, the bioreactor or fermentor includes a sensor and/or a control system to measure and/or adjust reaction parameters. Non-limiting examples of reaction parameters include biological parameters (e.g., growth rate, cell size, cell number, cell density, cell type, or cell state, etc.), chemical parameters (e.g., pH, redox-potential, concentration of reaction substrate and/or product, concentration of dissolved gases, such as oxygen concentration and CO2 concentration, nutrient concentrations, metabolite concentrations, concentration of an oligopeptide, concentration of an amino acid, concentration of a vitamin, concentration of a hormone, concentration of an additive, serum concentration, ionic strength, concentration of an ion, relative humidity, molarity, osmolarity, concentration of other chemicals, for example buffering agents, adjuvants, or reaction by-products), and physical/mechanical parameters (e.g., density, conductivity, degree of agitation, pressure, flow rate, shear stress, shear rate, viscosity, color, turbidity, light absorption, mixing rate, conversion rate, as well as thermodynamic parameters, such as temperature, light intensity/quality, etc.). Sensors to measure the parameters described in this application are well known to one of ordinary skill in the relevant mechanical and electronic arts. Control systems to adjust the parameters in a bioreactor based on the inputs from a sensor described in this application are well known to one of ordinary skill in the art of bioreactor engineering.
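As a hypothetical illustration of a control system acting on a sensor input, this Python sketch implements a discrete proportional-integral (PI) controller of the kind that could map a pH reading to a base-pump duty cycle. The gains, the setpoint, and the 0-to-1 duty-cycle range are assumptions; practical bioreactor controllers are tuned to the specific vessel and culture.

```python
from dataclasses import dataclass, field


@dataclass
class PIController:
    """Discrete PI controller with output clamping and simple anti-windup."""
    setpoint: float
    kp: float
    ki: float
    out_min: float = 0.0
    out_max: float = 1.0
    _integral: float = field(default=0.0, init=False, repr=False)

    def update(self, measurement: float, dt: float) -> float:
        """Return the actuator output for one sampling interval of length dt."""
        error = self.setpoint - measurement  # positive when pH is below setpoint
        candidate = self._integral + error * dt
        output = self.kp * error + self.ki * candidate
        if self.out_min <= output <= self.out_max:
            # Accept the integral update only when the actuator is not saturated.
            self._integral = candidate
            return output
        return min(self.out_max, max(self.out_min, output))


# Hypothetical use: hold pH 6.0 by dosing base.
controller = PIController(setpoint=6.0, kp=0.5, ki=0.1)
print(controller.update(5.5, dt=1.0))  # below setpoint: pump partly on
print(controller.update(6.5, dt=1.0))  # above setpoint: pump off (clamped at 0)
```

Freezing the integral term while the output is saturated is one simple anti-windup choice; it keeps the controller from accumulating error it cannot act on.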


In some embodiments, the method involves batch fermentation (e.g., shake flask fermentation). General considerations for batch fermentation (e.g., shake flask fermentation) include the level of oxygen and glucose. For example, batch fermentation (e.g., shake flask fermentation) may be oxygen- and glucose-limited, so in some embodiments, the capability of a strain to perform in a well-designed fed-batch fermentation may be underestimated. Also, the final product (e.g., cannabinoid or cannabinoid precursor) may display some differences from the substrate in terms of solubility, toxicity, cellular accumulation, and secretion, and in some embodiments may have different fermentation kinetics.


In some embodiments, the cells of the present disclosure are adapted to produce cannabinoids or cannabinoid precursors in vivo. In some embodiments, the cells are adapted to secrete one or more enzymes for cannabinoid synthesis (e.g., AAE, PKS, PKC, PT, or TS). In some embodiments, the cells of the present disclosure are lysed, and the remaining lysates are recovered for subsequent use. In such embodiments, the secreted or lysed enzyme can catalyze reactions for the production of a cannabinoid or precursor by bioconversion in an in vitro or ex vivo process. In some embodiments, any and all conversions described in this application can be conducted chemically or enzymatically, in vitro or in vivo.


Purification and Further Processing

In some embodiments, any of the methods described in this application may include isolation and/or purification of the cannabinoids and/or cannabinoid precursors produced (e.g., produced in a bioreactor). For example, the isolation and/or purification can involve one or more of cell lysis, centrifugation, extraction, column chromatography, distillation, crystallization, and lyophilization.


The methods described in this application encompass production of any cannabinoid or cannabinoid precursor known in the art. Cannabinoids or cannabinoid precursors produced by any of the recombinant cells disclosed in this application or any of the in vitro methods described in this application may be identified and extracted using any method known in the art. Mass spectrometry (e.g., LC-MS, GC-MS) is a non-limiting example of a method for identification and may be used to extract a compound of interest.
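As a non-limiting illustration of identification by mass spectrometry, this Python sketch computes the theoretical monoisotopic m/z of the protonated ion [M+H]+ for several candidate compounds from their molecular formulas and reports which candidates fall within a mass tolerance of an observed peak. The 5 ppm tolerance, the restriction to [M+H]+ ions, and the candidate list are assumptions chosen for illustration. Because THCA and CBDA share the formula C22H30O4, mass alone cannot distinguish them; chromatographic retention time or fragmentation data would be needed.

```python
import re

# Monoisotopic masses of the most abundant isotopes (u).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}
PROTON = 1.007276467  # mass added when forming [M+H]+

CANDIDATES = {  # hypothetical search list
    "olivetolic acid": "C12H16O4",
    "CBGA": "C22H32O4",
    "THCA": "C22H30O4",
    "CBDA": "C22H30O4",
    "THC": "C21H30O2",
}


def monoisotopic_mass(formula: str) -> float:
    """Sum monoisotopic masses for a simple formula such as 'C21H30O2'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_mz(observed_mz: float, candidates: dict, tol_ppm: float = 5.0) -> list:
    """Return candidate names whose [M+H]+ m/z lies within tol_ppm of observed_mz."""
    return [name for name, formula in candidates.items()
            if abs(ppm_error(observed_mz, monoisotopic_mass(formula) + PROTON)) <= tol_ppm]


print(match_mz(359.2217, CANDIDATES))  # isomers THCA and CBDA both match
```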


In some embodiments, any of the methods described in this application further comprise decarboxylation of a cannabinoid or cannabinoid precursor. As a non-limiting example, the acid form of a cannabinoid or cannabinoid precursor may be heated (e.g., at least 90° C.) to decarboxylate the cannabinoid or cannabinoid precursor. See, e.g., U.S. Pat. Nos. 10,159,908, 10,143,706, 9,908,832 and 7,344,736. See also, e.g., Wang et al., Cannabis Cannabinoid Res. 2016; 1(1): 262-271.
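Decarboxylation removes one molecule of CO2 per molecule of acid, so even complete conversion reduces the mass of product relative to the starting acid. This Python sketch computes the maximum theoretical mass of THC or CBD (C21H30O2) obtainable from a given mass of THCA or CBDA (C22H30O4), using standard average atomic masses. It assumes 100% conversion with no side reactions or thermal degradation, which heating does not guarantee in practice; actual yields must be measured.

```python
# Standard average atomic masses (g/mol), rounded.
AVERAGE_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}

ACID_FORM = {"C": 22, "H": 30, "O": 4}     # THCA or CBDA, C22H30O4
NEUTRAL_FORM = {"C": 21, "H": 30, "O": 2}  # THC or CBD, C21H30O2


def molar_mass(composition: dict) -> float:
    """Molar mass (g/mol) from a mapping of element to atom count."""
    return sum(AVERAGE_MASS[element] * n for element, n in composition.items())


def max_neutral_mass(acid_mass: float) -> float:
    """Theoretical neutral-cannabinoid mass after complete decarboxylation (same units as input)."""
    return acid_mass * molar_mass(NEUTRAL_FORM) / molar_mass(ACID_FORM)


print(round(molar_mass(ACID_FORM) - molar_mass(NEUTRAL_FORM), 3))  # mass lost as CO2 per mole
print(round(max_neutral_mass(100.0), 2))  # mg of THC/CBD from 100 mg of THCA/CBDA
```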


Compositions, Kits, and Administration

The present disclosure provides compositions, including pharmaceutical compositions, comprising a cannabinoid or a cannabinoid precursor, or pharmaceutically acceptable salt thereof, produced by any of the methods described in this application, and optionally a pharmaceutically acceptable excipient.


In certain embodiments, a cannabinoid or cannabinoid precursor described in this application is provided in an effective amount in a composition, such as a pharmaceutical composition. In certain embodiments, the effective amount is a therapeutically effective amount. In certain embodiments, the effective amount is a prophylactically effective amount.


Compositions, such as pharmaceutical compositions, described in this application can be prepared by any method known in the art. In general, such preparatory methods include bringing a compound described in this application (i.e., the “active ingredient”) into association with a carrier or excipient, and/or one or more other accessory ingredients, and then, if necessary and/or desirable, shaping, and/or packaging the product into a desired single- or multi-dose unit.


Pharmaceutical compositions can be prepared, packaged, and/or sold in bulk, as a single unit dose, and/or as a plurality of single unit doses. A “unit dose” is a discrete amount of the pharmaceutical composition comprising a predetermined amount of the active ingredient. The amount of the active ingredient is generally equal to the dosage of the active ingredient which would be administered to a subject and/or a convenient fraction of such a dosage, such as one-half or one-third of such a dosage.


Relative amounts of the active ingredient, the pharmaceutically acceptable excipient, and/or any additional ingredients in a pharmaceutical composition described in this application will vary, depending upon the identity, size, and/or condition of the subject treated and further depending upon the route by which the composition is to be administered. The composition may comprise between 0.1% and 100% (w/w) active ingredient.
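As a worked, hypothetical example of the unit-dose and w/w relationships described above: if an intended dosage is 50 mg and each unit dose is one-half of that dosage, each unit contains 25 mg of active ingredient; at 10% (w/w) active ingredient, each unit weighs 250 mg, of which 225 mg is excipient. These figures are illustrative only and are not dosing recommendations. The same arithmetic as a Python sketch:

```python
def unit_dose_mg(dosage_mg: float, fraction: float = 1.0) -> float:
    """Active ingredient per unit when each unit is a fraction of the dosage."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return dosage_mg * fraction


def unit_mass_mg(active_mg: float, percent_ww: float) -> float:
    """Total unit mass needed so that active_mg makes up percent_ww of it (w/w)."""
    if not 0 < percent_ww <= 100:
        raise ValueError("percent_ww must be in (0, 100]")
    return active_mg * 100.0 / percent_ww


active = unit_dose_mg(50.0, 0.5)       # active ingredient per unit (mg)
total = unit_mass_mg(active, 10.0)     # total unit mass at 10% w/w (mg)
print(active, total, total - active)   # 25.0 250.0 225.0
```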


Pharmaceutically acceptable excipients used in the manufacture of pharmaceutical compositions include inert diluents, dispersing and/or granulating agents, surface active agents and/or emulsifiers, disintegrating agents, binding agents, preservatives, buffering agents, lubricating agents, and/or oils. Excipients such as cocoa butter and suppository waxes, coloring agents, coating agents, sweetening, flavoring, and perfuming agents may also be present in the composition. Exemplary excipients include diluents, dispersing and/or granulating agents, surface active agents and/or emulsifiers, disintegrating agents, binding agents, preservatives, buffering agents, lubricating agents, and/or oils (e.g., synthetic oils, semi-synthetic oils) as disclosed in this application.


Exemplary diluents include calcium carbonate, sodium carbonate, calcium phosphate, dicalcium phosphate, calcium sulfate, calcium hydrogen phosphate, sodium phosphate, lactose, sucrose, cellulose, microcrystalline cellulose, kaolin, mannitol, sorbitol, inositol, sodium chloride, dry starch, cornstarch, powdered sugar, and mixtures thereof.


Exemplary granulating and/or dispersing agents include potato starch, corn starch, tapioca starch, sodium starch glycolate, clays, alginic acid, guar gum, citrus pulp, agar, bentonite, cellulose, and wood products, natural sponge, cation-exchange resins, calcium carbonate, silicates, sodium carbonate, cross-linked poly(vinyl-pyrrolidone) (crospovidone), sodium carboxymethyl starch (sodium starch glycolate), carboxymethyl cellulose, cross-linked sodium carboxymethyl cellulose (croscarmellose), methylcellulose, pregelatinized starch (starch 1500), microcrystalline starch, water insoluble starch, calcium carboxymethyl cellulose, magnesium aluminum silicate (Veegum), sodium lauryl sulfate, quaternary ammonium compounds, and mixtures thereof.


Exemplary surface active agents and/or emulsifiers include natural emulsifiers (e.g., acacia, agar, alginic acid, sodium alginate, tragacanth, chondrus, cholesterol, xanthan, pectin, gelatin, egg yolk, casein, wool fat, wax, and lecithin), colloidal clays (e.g., bentonite (aluminum silicate) and Veegum (magnesium aluminum silicate)), long chain amino acid derivatives, high molecular weight alcohols (e.g., stearyl alcohol, cetyl alcohol, oleyl alcohol, triacetin monostearate, ethylene glycol distearate, glyceryl monostearate, propylene glycol monostearate, and polyvinyl alcohol), carbomers (e.g., carboxy polymethylene, polyacrylic acid, acrylic acid polymer, and carboxyvinyl polymer), carrageenan, cellulosic derivatives (e.g., carboxymethylcellulose sodium, powdered cellulose, hydroxymethyl cellulose, hydroxypropyl cellulose, hydroxypropyl methylcellulose, methylcellulose), sorbitan fatty acid esters (e.g., polyoxyethylene sorbitan monolaurate (Tween® 20), polyoxyethylene sorbitan monostearate (Tween® 60), polyoxyethylene sorbitan monooleate (Tween® 80), sorbitan monopalmitate (Span® 40), sorbitan monostearate (Span® 60), sorbitan tristearate (Span® 65), glyceryl monooleate, sorbitan monooleate (Span® 80)), polyoxyethylene esters (e.g., polyoxyethylene monostearate (Myrj® 45), polyoxyethylene hydrogenated castor oil, polyethoxylated castor oil, polyoxymethylene stearate, and Solutol®), sucrose fatty acid esters, polyethylene glycol fatty acid esters (e.g., Cremophor®), polyoxyethylene ethers (e.g., polyoxyethylene lauryl ether (Brij® 30)), poly(vinyl-pyrrolidone), diethylene glycol monolaurate, triethanolamine oleate, sodium oleate, potassium oleate, ethyl oleate, oleic acid, ethyl laurate, sodium lauryl sulfate, Pluronic® F-68, poloxamer P-188, cetrimonium bromide, cetylpyridinium chloride, benzalkonium chloride, docusate sodium, and/or mixtures thereof.


Exemplary binding agents include starch (e.g., cornstarch and starch paste), gelatin, sugars (e.g., sucrose, glucose, dextrose, dextrin, molasses, lactose, lactitol, mannitol, etc.), natural and synthetic gums (e.g., acacia, sodium alginate, extract of Irish moss, panwar gum, ghatti gum, mucilage of isabgol husks, carboxymethylcellulose, methylcellulose, ethylcellulose, hydroxyethylcellulose, hydroxypropyl cellulose, hydroxypropyl methylcellulose, microcrystalline cellulose, cellulose acetate, poly(vinyl-pyrrolidone), magnesium aluminum silicate (Veegum®), and larch arabinogalactan), alginates, polyethylene oxide, polyethylene glycol, inorganic calcium salts, silicic acid, polymethacrylates, waxes, water, alcohol, and/or mixtures thereof.


Exemplary preservatives include antioxidants, chelating agents, antimicrobial preservatives, antifungal preservatives, antiprotozoan preservatives, alcohol preservatives, acidic preservatives, and other preservatives. In certain embodiments, the preservative is an antioxidant. In other embodiments, the preservative is a chelating agent.


Exemplary antioxidants include alpha tocopherol, ascorbic acid, ascorbyl palmitate, butylated hydroxyanisole, butylated hydroxytoluene, monothioglycerol, potassium metabisulfite, propionic acid, propyl gallate, sodium ascorbate, sodium bisulfite, sodium metabisulfite, and sodium sulfite.


Exemplary chelating agents include ethylenediaminetetraacetic acid (EDTA) and salts and hydrates thereof (e.g., sodium edetate, disodium edetate, trisodium edetate, calcium disodium edetate, dipotassium edetate, and the like), citric acid and salts and hydrates thereof (e.g., citric acid monohydrate), fumaric acid and salts and hydrates thereof, malic acid and salts and hydrates thereof, phosphoric acid and salts and hydrates thereof, and tartaric acid and salts and hydrates thereof. Exemplary antimicrobial preservatives include benzalkonium chloride, benzethonium chloride, benzyl alcohol, bronopol, cetrimide, cetylpyridinium chloride, chlorhexidine, chlorobutanol, chlorocresol, chloroxylenol, cresol, ethyl alcohol, glycerin, hexetidine, imidurea, phenol, phenoxyethanol, phenylethyl alcohol, phenylmercuric nitrate, propylene glycol, and thimerosal.


Exemplary antifungal preservatives include butyl paraben, methyl paraben, ethyl paraben, propyl paraben, benzoic acid, hydroxybenzoic acid, potassium benzoate, potassium sorbate, sodium benzoate, sodium propionate, and sorbic acid.


Exemplary alcohol preservatives include ethanol, polyethylene glycol, phenol, phenolic compounds, bisphenol, chlorobutanol, hydroxybenzoate, and phenylethyl alcohol.


Exemplary acidic preservatives include vitamin A, vitamin C, vitamin E, beta-carotene, citric acid, acetic acid, dehydroacetic acid, ascorbic acid, sorbic acid, and phytic acid.


Other preservatives include tocopherol, tocopherol acetate, deteroxime mesylate, cetrimide, butylated hydroxyanisole (BHA), butylated hydroxytoluene (BHT), ethylenediamine, sodium lauryl sulfate (SLS), sodium lauryl ether sulfate (SLES), sodium bisulfite, sodium metabisulfite, potassium sulfite, potassium metabisulfite, Glydant® Plus, Phenonip®, methylparaben, Germall® 115, Germaben® II, Neolone®, Kathon®, and Euxyl®.


Exemplary buffering agents include citrate buffer solutions, acetate buffer solutions, phosphate buffer solutions, ammonium chloride, calcium carbonate, calcium chloride, calcium citrate, calcium glubionate, calcium gluceptate, calcium gluconate, D-gluconic acid, calcium glycerophosphate, calcium lactate, propanoic acid, calcium levulinate, pentanoic acid, dibasic calcium phosphate, phosphoric acid, tribasic calcium phosphate, calcium hydroxide phosphate, potassium acetate, potassium chloride, potassium gluconate, potassium mixtures, dibasic potassium phosphate, monobasic potassium phosphate, potassium phosphate mixtures, sodium acetate, sodium bicarbonate, sodium chloride, sodium citrate, sodium lactate, dibasic sodium phosphate, monobasic sodium phosphate, sodium phosphate mixtures, tromethamine, magnesium hydroxide, aluminum hydroxide, alginic acid, pyrogen-free water, isotonic saline, Ringer's solution, ethyl alcohol, and mixtures thereof.


Exemplary lubricating agents include magnesium stearate, calcium stearate, stearic acid, silica, talc, malt, glyceryl behenate, hydrogenated vegetable oils, polyethylene glycol, sodium benzoate, sodium acetate, sodium chloride, leucine, magnesium lauryl sulfate, sodium lauryl sulfate, and mixtures thereof.


Exemplary natural oils include almond, apricot kernel, avocado, babassu, bergamot, black currant seed, borage, cade, camomile, canola, caraway, carnauba, castor, cinnamon, cocoa butter, coconut, cod liver, coffee, corn, cotton seed, emu, eucalyptus, evening primrose, fish, flaxseed, geraniol, gourd, grape seed, hazel nut, hyssop, isopropyl myristate, jojoba, kukui nut, lavandin, lavender, lemon, litsea cubeba, macadamia nut, mallow, mango seed, meadowfoam seed, mink, nutmeg, olive, orange, orange roughy, palm, palm kernel, peach kernel, peanut, poppy seed, pumpkin seed, rapeseed, rice bran, rosemary, safflower, sandalwood, sasanqua, savoury, sea buckthorn, sesame, shea butter, silicone, soybean, sunflower, tea tree, thistle, tsubaki, vetiver, walnut, and wheat germ oils. Exemplary synthetic or semi-synthetic oils include, but are not limited to, butyl stearate, medium chain triglycerides (such as caprylic triglyceride and capric triglyceride), cyclomethicone, diethyl sebacate, dimethicone 360, isopropyl myristate, mineral oil, octyldodecanol, oleyl alcohol, silicone oil, and mixtures thereof. In certain embodiments, exemplary synthetic oils comprise medium chain triglycerides (such as caprylic triglyceride and capric triglyceride).


Liquid dosage forms for oral and parenteral administration include pharmaceutically acceptable emulsions, microemulsions, solutions, suspensions, syrups, and elixirs. In addition to the active ingredients, the liquid dosage forms may comprise inert diluents commonly used in the art such as, for example, water or other solvents, solubilizing agents and emulsifiers such as ethyl alcohol, isopropyl alcohol, ethyl carbonate, ethyl acetate, benzyl alcohol, benzyl benzoate, propylene glycol, 1,3-butylene glycol, dimethylformamide, oils (e.g., cottonseed, groundnut, corn, germ, olive, castor, and sesame oils), glycerol, tetrahydrofurfuryl alcohol, polyethylene glycols and fatty acid esters of sorbitan, and mixtures thereof. Besides inert diluents, the oral compositions can include adjuvants such as wetting agents, emulsifying and suspending agents, and sweetening, flavoring, and perfuming agents. In certain embodiments for parenteral administration, the compounds described in this application are mixed with solubilizing agents such as Cremophor®, alcohols, oils, modified oils, glycols, polysorbates, cyclodextrins, polymers, and mixtures thereof.


Injectable preparations, for example, sterile injectable aqueous or oleaginous suspensions can be formulated according to the known art using suitable dispersing or wetting agents and suspending agents. The sterile injectable preparation can be a sterile injectable solution, suspension, or emulsion in a nontoxic parenterally acceptable diluent or solvent, for example, as a solution in 1,3-butanediol. Among the acceptable vehicles and solvents that can be employed are water, Ringer's solution, U.S.P., and isotonic sodium chloride solution. In addition, sterile, fixed oils are conventionally employed as a solvent or suspending medium. For this purpose, any bland fixed oil can be employed including synthetic mono- or di-glycerides. In addition, fatty acids such as oleic acid are used in the preparation of injectables.


The injectable formulations can be sterilized, for example, by filtration through a bacterial-retaining filter, or by incorporating sterilizing agents in the form of sterile solid compositions which can be dissolved or dispersed in sterile water or other sterile injectable medium prior to use.


In order to prolong the effect of a drug, it is often desirable to slow the absorption of the drug from subcutaneous or intramuscular injection. This can be accomplished by the use of a liquid suspension of crystalline or amorphous material with poor water solubility. The rate of absorption of the drug then depends upon its rate of dissolution, which, in turn, may depend upon crystal size and crystalline form. Alternatively, delayed absorption of a parenterally administered drug form may be accomplished by dissolving or suspending the drug in an oil vehicle.
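The dependence of dissolution rate on crystal size can be illustrated with the Noyes-Whitney relation, a standard approximation that is not specific to this disclosure: dM/dt = D·A·(Cs - C)/h, where D is the diffusion coefficient, A the particle surface area, Cs the solubility, C the bulk concentration, and h the diffusion-layer thickness. For a mass m of spherical particles of radius r and density ρ, A = 3m/(ρr), so halving the particle radius doubles the initial surface area and, other factors being equal, the initial dissolution rate. This Python sketch evaluates the relation under sink conditions (C ≈ 0) with hypothetical parameter values.

```python
def surface_area_spheres(mass_kg: float, density_kg_m3: float, radius_m: float) -> float:
    """Total surface area (m^2) of monodisperse spheres with the given total mass."""
    return 3.0 * mass_kg / (density_kg_m3 * radius_m)


def noyes_whitney_rate(diffusivity_m2_s: float, area_m2: float, solubility_kg_m3: float,
                       bulk_kg_m3: float, layer_m: float) -> float:
    """Dissolution rate (kg/s) from the Noyes-Whitney equation."""
    return diffusivity_m2_s * area_m2 * (solubility_kg_m3 - bulk_kg_m3) / layer_m


# Hypothetical values for a poorly water-soluble drug (100 mg of particles).
for radius_um in (20.0, 10.0):
    area = surface_area_spheres(1e-4, 1200.0, radius_um * 1e-6)
    rate = noyes_whitney_rate(5e-10, area, 1e-3, 0.0, 30e-6)
    print(f"r = {radius_um} um: {rate:.3e} kg/s")
```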


Compositions for rectal or vaginal administration are typically suppositories, which can be prepared by mixing the compounds described in this application with suitable non-irritating excipients or carriers such as cocoa butter, polyethylene glycol, or a suppository wax, which are solid at ambient temperature but liquid at body temperature and therefore melt in the rectum or vaginal cavity and release the active ingredient.


Solid dosage forms for oral administration include capsules, tablets, pills, powders, and granules. In such solid dosage forms, the active ingredient is mixed with at least one inert, pharmaceutically acceptable excipient or carrier such as sodium citrate or dicalcium phosphate and/or (a) fillers or extenders such as starches, lactose, sucrose, glucose, mannitol, and silicic acid, (b) binders such as, for example, carboxymethylcellulose, alginates, gelatin, polyvinylpyrrolidone, sucrose, and acacia, (c) humectants such as glycerol, (d) disintegrating agents such as agar, calcium carbonate, potato or tapioca starch, alginic acid, certain silicates, and sodium carbonate, (e) solution retarding agents such as paraffin, (f) absorption accelerators such as quaternary ammonium compounds, (g) wetting agents such as, for example, cetyl alcohol and glycerol monostearate, (h) absorbents such as kaolin and bentonite clay, and (i) lubricants such as talc, calcium stearate, magnesium stearate, solid polyethylene glycols, sodium lauryl sulfate, and mixtures thereof. In the case of capsules, tablets, and pills, the dosage form may include a buffering agent.


Solid compositions of a similar type can be employed as fillers in soft and hard-filled gelatin capsules using such excipients as lactose or milk sugar as well as high molecular weight polyethylene glycols and the like. The solid dosage forms of tablets, dragees, capsules, pills, and granules can be prepared with coatings and shells such as enteric coatings and other coatings well known in the art of pharmacology. They may optionally comprise opacifying agents and can be of a composition such that they release the active ingredient(s) only, or preferentially, in a certain part of the intestinal tract, optionally, in a delayed manner. Examples of encapsulating compositions which can be used include polymeric substances and waxes.


The active ingredient can be in a micro-encapsulated form with one or more excipients as noted above. The solid dosage forms of tablets, dragees, capsules, pills, and granules can be prepared with coatings and shells such as enteric coatings, release controlling coatings, and other coatings well known in the pharmaceutical formulating art. In such solid dosage forms the active ingredient can be admixed with at least one inert diluent such as sucrose, lactose, or starch. Such dosage forms may comprise, as is normal practice, additional substances other than inert diluents, e.g., tableting lubricants and other tableting aids such as magnesium stearate and microcrystalline cellulose. In the case of capsules, tablets and pills, the dosage forms may comprise buffering agents. They may optionally comprise opacifying agents and can be of a composition such that they release the active ingredient(s) only, or preferentially, in a certain part of the intestinal tract, optionally, in a delayed manner. Examples of encapsulating agents which can be used include polymeric substances and waxes.


Dosage forms for topical and/or transdermal administration of a compound described in this application may include ointments, pastes, creams, lotions, gels, powders, solutions, sprays, inhalants, and/or patches. Generally, the active ingredient is admixed under sterile conditions with a pharmaceutically acceptable carrier or excipient and/or any needed preservatives and/or buffers as can be required. Additionally, the present disclosure contemplates the use of transdermal patches, which often have the added advantage of providing controlled delivery of an active ingredient to the body. Such dosage forms can be prepared, for example, by dissolving and/or dispersing the active ingredient in the proper medium. Alternatively or additionally, the rate of delivery can be controlled by providing a rate-controlling membrane and/or by dispersing the active ingredient in a polymer matrix and/or gel.


Suitable devices for use in delivering intradermal pharmaceutical compositions described in this application include short needle devices. Intradermal compositions can be administered by devices which limit the effective penetration length of a needle into the skin. Alternatively or additionally, conventional syringes can be used in the classical Mantoux method of intradermal administration. Jet injection devices which deliver liquid formulations to the dermis via a liquid jet injector and/or via a needle which pierces the stratum corneum and produces a jet which reaches the dermis are suitable. Ballistic powder/particle delivery devices which use compressed gas to accelerate the compound in powder form through the outer layers of the skin to the dermis are suitable.


Formulations suitable for topical administration include, but are not limited to, liquid and/or semi-liquid preparations such as liniments, lotions, oil-in-water and/or water-in-oil emulsions such as creams, ointments, and/or pastes, and/or solutions and/or suspensions. Topically administrable formulations may, for example, comprise from about 1% to about 10% (w/w) active ingredient, although the concentration of the active ingredient can be as high as the solubility limit of the active ingredient in the solvent. Formulations for topical administration may further comprise one or more of the additional ingredients described in this application.


A pharmaceutical composition described in this application can be prepared, packaged, and/or sold in a formulation suitable for pulmonary administration via the buccal cavity. Such a formulation may comprise dry particles which comprise the active ingredient and which have a diameter in the range from about 0.5 to about 7 nanometers, or from about 1 to about 6 nanometers. Such compositions are conveniently in the form of dry powders for administration using a device comprising a dry powder reservoir to which a stream of propellant can be directed to disperse the powder and/or using a self-propelling solvent/powder dispensing container such as a device comprising the active ingredient dissolved and/or suspended in a low-boiling propellant in a sealed container. Such powders comprise particles wherein at least 98% of the particles by weight have a diameter greater than 0.5 nanometers and at least 95% of the particles by number have a diameter less than 7 nanometers. Alternatively, at least 95% of the particles by weight have a diameter greater than 1 nanometer and at least 90% of the particles by number have a diameter less than 6 nanometers. Dry powder compositions may include a solid fine powder diluent such as sugar and are conveniently provided in a unit dose form.
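The two-part size criterion above (a weight-based lower bound combined with a number-based upper bound) can be checked mechanically. The following Python sketch is illustrative only: it assumes per-particle diameter and mass measurements are available, the function name and inputs are hypothetical, and diameters are compared in whatever unit the thresholds are expressed in.

```python
from typing import Sequence


def meets_size_spec(
    diameters: Sequence[float],
    masses: Sequence[float],
    lower: float = 0.5,
    upper: float = 7.0,
    min_weight_frac_above: float = 0.98,
    min_number_frac_below: float = 0.95,
) -> bool:
    """Return True if a particle population satisfies both size criteria.

    `diameters` and `masses` are parallel per-particle lists (hypothetical
    measurement data); diameters share the unit of `lower` and `upper`.
    """
    if not diameters or len(diameters) != len(masses):
        raise ValueError("need equal-length, non-empty diameter and mass lists")
    total_mass = sum(masses)
    # Weight-based criterion: mass fraction of particles larger than `lower`.
    mass_above = sum(m for d, m in zip(diameters, masses) if d > lower)
    weight_frac_above = mass_above / total_mass
    # Number-based criterion: count fraction of particles smaller than `upper`.
    number_frac_below = sum(1 for d in diameters if d < upper) / len(diameters)
    return (weight_frac_above >= min_weight_frac_above
            and number_frac_below >= min_number_frac_below)


# Hypothetical populations, equal mass per particle for simplicity.
print(meets_size_spec([1.0, 2.0, 3.0, 4.0, 5.0], [1.0] * 5))  # True
print(meets_size_spec([1.0] * 9 + [8.0], [1.0] * 10))  # False: only 90% by number below 7.0
```

The alternative criterion in the text (95% by weight above 1, 90% by number below 6) corresponds to calling the same function with `lower=1.0, upper=6.0, min_weight_frac_above=0.95, min_number_frac_below=0.90`.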


Low boiling propellants generally include liquid propellants having a boiling point of below 65° F. at atmospheric pressure. Generally, the propellant may constitute 50 to 99.9% (w/w) of the composition, and the active ingredient may constitute 0.1 to 20% (w/w) of the composition. The propellant may further comprise additional ingredients such as a liquid non-ionic and/or solid anionic surfactant and/or a solid diluent (which may have a particle size of the same order as particles comprising the active ingredient).


Although the descriptions of pharmaceutical compositions provided in this application are principally directed to pharmaceutical compositions which are suitable for administration to humans, it will be understood by the skilled artisan that such compositions are generally suitable for administration to animals of all sorts. Modification of pharmaceutical compositions suitable for administration to humans in order to render the compositions suitable for administration to various animals is well understood, and the ordinarily skilled veterinary pharmacologist can design and/or perform such modification with ordinary experimentation.


Compounds provided in this application are typically formulated in dosage unit form for ease of administration and uniformity of dosage. It will be understood, however, that the total daily usage of the compositions described in this application will be decided by a physician within the scope of sound medical judgment. The specific therapeutically effective dose level for any particular subject or organism will depend upon a variety of factors including the disease being treated and the severity of the disorder; the activity of the specific active ingredient employed; the specific composition employed; the age, body weight, general health, sex, and diet of the subject; the time of administration, route of administration, and rate of excretion of the specific active ingredient employed; the duration of the treatment; drugs used in combination or coincidental with the specific active ingredient employed; and like factors well known in the medical arts.


The compounds and compositions provided in this application can be administered by any route, including enteral (e.g., oral), parenteral, intravenous, intramuscular, intra-arterial, intramedullary, intrathecal, subcutaneous, intraventricular, transdermal, intradermal, rectal, intravaginal, intraperitoneal, topical (as by powders, ointments, creams, and/or drops), mucosal, nasal, buccal, sublingual; by intratracheal instillation, bronchial instillation, and/or inhalation; and/or as an oral spray, nasal spray, and/or aerosol. Specifically contemplated routes are oral administration, intravenous administration (e.g., systemic intravenous injection), regional administration via blood and/or lymph supply, and/or direct administration to an affected site. In general, the most appropriate route of administration will depend upon a variety of factors including the nature of the agent (e.g., its stability in the environment of the gastrointestinal tract), and/or the condition of the subject (e.g., whether the subject is able to tolerate oral administration).


In some embodiments, compounds or compositions disclosed in this application are formulated and/or administered in nanoparticles. Nanoparticles are particles on the nanoscale. In some embodiments, nanoparticles are less than 1 μm in diameter. In some embodiments, nanoparticles are between about 1 and 100 nm in diameter. Nanoparticles include organic nanoparticles, such as dendrimers, liposomes, or polymeric nanoparticles. Nanoparticles also include inorganic nanoparticles, such as fullerenes, quantum dots, and gold nanoparticles. Compositions may comprise an aggregate of nanoparticles. In some embodiments, the aggregate of nanoparticles is homogeneous, while in other embodiments the aggregate of nanoparticles is heterogeneous.


The exact amount of a compound required to achieve an effective amount will vary from subject to subject, depending, for example, on species, age, and general condition of a subject, severity of the side effects or disorder, identity of the particular compound, mode of administration, and the like. An effective amount may be included in a single dose (e.g., single oral dose) or multiple doses (e.g., multiple oral doses). In certain embodiments, when multiple doses are administered to a subject or applied to a tissue or cell, any two doses of the multiple doses include different or substantially the same amounts of a compound described in this application. In certain embodiments, when multiple doses are administered to a subject or applied to a tissue or cell, the frequency of administering the multiple doses to the subject or applying the multiple doses to the tissue or cell is three doses a day, two doses a day, one dose a day, one dose every other day, one dose every third day, one dose every week, one dose every two weeks, one dose every three weeks, or one dose every four weeks. In certain embodiments, the frequency of administering the multiple doses to the subject or applying the multiple doses to the tissue or cell is one dose per day. In certain embodiments, the frequency of administering the multiple doses to the subject or applying the multiple doses to the tissue or cell is two doses per day. In certain embodiments, the frequency of administering the multiple doses to the subject or applying the multiple doses to the tissue or cell is three doses per day. 
In certain embodiments, when multiple doses are administered to a subject or applied to a tissue or cell, the duration between the first dose and last dose of the multiple doses is one day, two days, four days, one week, two weeks, three weeks, one month, two months, three months, four months, six months, nine months, one year, two years, three years, four years, five years, seven years, ten years, fifteen years, twenty years, or the lifetime of the subject, tissue, or cell. In certain embodiments, the duration between the first dose and last dose of the multiple doses is three months, six months, or one year. In certain embodiments, the duration between the first dose and last dose of the multiple doses is the lifetime of the subject, tissue, or cell. In certain embodiments, a dose (e.g., a single dose, or any dose of multiple doses) described in this application includes independently between 0.1 μg and 1 μg, between 0.001 mg and 0.01 mg, between 0.01 mg and 0.1 mg, between 0.1 mg and 1 mg, between 1 mg and 3 mg, between 3 mg and 10 mg, between 10 mg and 30 mg, between 30 mg and 100 mg, between 100 mg and 300 mg, between 300 mg and 1,000 mg, or between 1 g and 10 g, inclusive, of a compound described in this application. In certain embodiments, a dose described in this application includes independently between 1 mg and 3 mg, inclusive, of a compound described in this application. In certain embodiments, a dose described in this application includes independently between 3 mg and 10 mg, inclusive, of a compound described in this application. In certain embodiments, a dose described in this application includes independently between 10 mg and 30 mg, inclusive, of a compound described in this application. In certain embodiments, a dose described in this application includes independently between 30 mg and 100 mg, inclusive, of a compound described in this application.


Dose ranges as described in this application provide guidance for the administration of provided pharmaceutical compositions to an adult. The amount to be administered to, for example, a child or an adolescent can be determined by a medical practitioner or person skilled in the art and can be lower than or the same as that administered to an adult.


A compound or composition, as described in this application, can be administered in combination with one or more additional pharmaceutical agents (e.g., therapeutically and/or prophylactically active agents). The compounds or compositions can be administered in combination with additional pharmaceutical agents that improve their activity, improve bioavailability, improve safety, reduce drug resistance, reduce and/or modify metabolism, inhibit excretion, and/or modify distribution in a subject or cell. It will also be appreciated that the therapy employed may achieve a desired effect for the same disorder, and/or it may achieve different effects. In certain embodiments, a pharmaceutical composition described in this application including a compound described in this application and an additional pharmaceutical agent shows a synergistic effect that is absent in a pharmaceutical composition including one of the compound and the additional pharmaceutical agent, but not both.


The compound or composition can be administered concurrently with, prior to, or subsequent to one or more additional pharmaceutical agents, which may be useful as, e.g., combination therapies. Pharmaceutical agents include therapeutically active agents. Pharmaceutical agents also include prophylactically active agents. Pharmaceutical agents include small organic molecules such as drug compounds (e.g., compounds approved for human or veterinary use by the U.S. Food and Drug Administration as provided in the Code of Federal Regulations (CFR)), peptides, proteins, carbohydrates, monosaccharides, oligosaccharides, polysaccharides, nucleoproteins, mucoproteins, lipoproteins, synthetic polypeptides or proteins, small molecules linked to proteins, glycoproteins, steroids, nucleic acids, DNAs, RNAs, nucleotides, nucleosides, oligonucleotides, antisense oligonucleotides, lipids, hormones, vitamins, and cells. In certain embodiments, the additional pharmaceutical agent is a pharmaceutical agent useful for treating and/or preventing a disease (e.g., proliferative disease, neurological disease, painful condition, psychiatric disorder, or metabolic disorder). Each additional pharmaceutical agent may be administered at a dose and/or on a time schedule determined for that pharmaceutical agent. The additional pharmaceutical agents may also be administered together with each other and/or with the compound or composition described in this application in a single dose or administered separately in different doses. The particular combination to employ in a regimen will take into account compatibility of the compound described in this application with the additional pharmaceutical agent(s) and/or the desired therapeutic and/or prophylactic effect to be achieved. In general, it is expected that the additional pharmaceutical agent(s) in combination be utilized at levels that do not exceed the levels at which they are utilized individually. 
In some embodiments, the levels utilized in combination will be lower than those utilized individually.


In some embodiments, one or more of the compositions described in this application are administered to a subject. In certain embodiments, the subject is an animal. The animal may be of either sex and may be at any stage of development. In certain embodiments, the subject is a human. In other embodiments, the subject is a non-human animal. In certain embodiments, the subject is a mammal. In certain embodiments, the subject is a non-human mammal. In certain embodiments, the subject is a domesticated animal, such as a dog, cat, cow, pig, horse, sheep, or goat. In certain embodiments, the subject is a companion animal, such as a dog or cat. In certain embodiments, the subject is a livestock animal, such as a cow, pig, horse, sheep, or goat. In certain embodiments, the subject is a zoo animal. In another embodiment, the subject is a research animal, such as a rodent (e.g., mouse, rat), dog, pig, or non-human primate.


Also encompassed by the disclosure are kits (e.g., pharmaceutical packs). The kits provided may comprise a composition, such as a pharmaceutical composition, or a compound described in this application and a container (e.g., a vial, ampule, bottle, syringe, and/or dispenser package, or other suitable container). In some embodiments, provided kits may optionally further include a second container comprising a pharmaceutical excipient for dilution or suspension of a pharmaceutical composition or compound described in this application. In some embodiments, the pharmaceutical composition or compound described in this application provided in the first container and the second container are combined to form one unit dosage form.


Thus, in one aspect, provided are kits including a first container comprising a compound or composition described in this application. In certain embodiments, the kits are useful for treating a disease in a subject in need thereof. In certain embodiments, the kits are useful for preventing a disease in a subject in need thereof. In certain embodiments, the kits are useful for reducing the risk of developing a disease in a subject in need thereof.


In certain embodiments, a kit described in this application further includes instructions for using the kit. A kit described in this application may also include information as required by a regulatory agency such as the U.S. Food and Drug Administration (FDA). In certain embodiments, the information included in the kits is prescribing information. In certain embodiments, the kits and instructions provide for treating a disease in a subject in need thereof. In certain embodiments, the kits and instructions provide for preventing a disease in a subject in need thereof. In certain embodiments, the kits and instructions provide for reducing the risk of developing a disease in a subject in need thereof. A kit described in this application may include one or more additional pharmaceutical agents described in this application as a separate composition.


The present invention is further illustrated by the following Examples, which in no way should be construed as limiting. The entire contents of all of the references (including literature references, issued patents, published patent applications, and co-pending patent applications) cited throughout this application are hereby expressly incorporated by reference. If a reference incorporated in this application contains a term whose definition is incongruous or incompatible with the definition of same term as defined in the present disclosure, the meaning ascribed to the term in this disclosure shall govern. However, mention of any reference, article, publication, patent, patent publication, and patent application cited in this application is not, and should not be taken as an acknowledgment or any form of suggestion that they constitute valid prior art or form part of the common general knowledge in any country in the world.


EXAMPLES
Example 1: Functional Expression of AAE Genes in E. coli and S. cerevisiae

It was reported previously that S. cerevisiae has endogenous AAE activity that allows conversion of hexanoate to hexanoyl-CoA (Gagne et al. 2012). However, in some embodiments, the endogenous AAE activity of S. cerevisiae may be insufficient for industrial-scale synthesis of downstream products. This example validates novel genes with AAE activity that can be used in the cells, reactions, and methods of the present disclosure.


Several DNA sequences with predicted AAE functionality were identified from the genomes of the yeast Yarrowia lipolytica (Y. lipolytica) and the bacterium Rhodopseudomonas palustris (R. palustris). The predicted AAE genes were first codon-optimized in silico for expression in E. coli. The codon-optimized gene sequences were synthesized via standard DNA synthesis techniques and were expressed in recombinant E. coli host cells (FIG. 4). Lysates from the recombinant E. coli host cells were then tested for AAE activity using an assay described below.



FIG. 4 shows the results from the AAE activity assay in E. coli host cells. Three of the four predicted Y. lipolytica AAEs (strains t49578, t49594, and t51477) and both of the predicted R. palustris AAEs (strains t55127 and t55128) exhibited activity on a hexanoate substrate. Strains t49594 and t51477 expressed the candidate AAE enzyme as a fusion protein with an N-terminal MYC tag. The assays also showed that two of the four predicted Y. lipolytica AAEs (strains t49594 and t51477) and both of the predicted R. palustris AAEs (strains t55127 and t55128) demonstrated activity on a butyrate substrate.


The newly described AAEs were also found to be capable of exhibiting AAE activity in eukaryotes. Briefly, the Y. lipolytica AAE that produced the best results in E. coli host cells was selected. This corresponded to the AAE expressed by strain t49594 (which encodes a protein corresponding to the protein provided by Uniprot Accession No. Q6C577 with an N-terminal MYC tag). The gene encoding this AAE was codon-optimized for expression in S. cerevisiae, and the last three residues (peroxisomal targeting signal 1) were removed. Two different codon-optimized versions of this AAE were synthesized in the replicative yeast expression vector shown in FIGS. 5A-5B. The recoded sequences shared only 81.66% sequence identity at the DNA level while encoding the same polypeptide. Both AAE expression constructs were then transformed into a CEN.PK S. cerevisiae strain, and transformants were selected based on ability to grow on media lacking uracil. The transformants were tested for AAE activity with a colorimetric AAE assay (described below). An S. cerevisiae strain expressing GFP was used as a negative control (strain t390338).


The results from the colorimetric AAE assay are shown in FIG. 6. Both of the codon-optimized versions of the Y. lipolytica AAE (strains t392878 and t392879) exhibited AAE activity on a hexanoate substrate, demonstrating that the newly disclosed AAEs could also be used in eukaryotic hosts. These enzymes were thus demonstrated to be capable of catalyzing the first enzymatic step in microbial production of cannabinoids from carboxylic acids.


This Example demonstrates identification of AAEs that are capable of using hexanoate and butyrate as substrates to produce cannabinoid precursors. Detailed results for the AAE activity experiments in E. coli host cells are provided below in Table 3. Sequence information for the strains described in this Example is provided in Table 4 at the end of the Examples section.









TABLE 3

Activity of AAE Enzymes on Hexanoate and Butyrate in E. coli

Strain (E. coli)            | Hexanoate, average | Hexanoate, SD | Butyrate, average | Butyrate, SD
t49568 (negative control)   | −0.0495            | 0.014849      | −0.0105           | 0.04879
t49578                      | 0.222              | 0.016971      | −0.0195           | 0.010607
t49580                      | −0.065             | 0.019799      | −0.055            | 0.007071
t49594                      | 0.458              | 0.005657      | 0.3895            | 0.000707
t51477                      | 0.347              | 0.005657      | 0.046             | 0.005657
t55127                      | 0.395              | 0.011314      | 0.1835            | 0.024749
t55128                      | 0.2495             | 0.012021      | 0.2205            | 0.012021

SD, standard deviation.
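The Table 3 values can also be summarized programmatically. The following Python sketch copies the tabulated averages and standard deviations, subtracts the negative-control (t49568) average from each test strain, and ranks strains per substrate. Background subtraction and ranking are illustrative choices made here, not an analysis described in the source; the helper names are hypothetical.

```python
# Table 3 values as (average, standard deviation) on hexanoate, then on butyrate.
# Units are those of the reported assay read-out; SDs are kept for reference only.
TABLE3 = {
    "t49568": ((-0.0495, 0.014849), (-0.0105, 0.04879)),  # negative control
    "t49578": ((0.222, 0.016971), (-0.0195, 0.010607)),
    "t49580": ((-0.065, 0.019799), (-0.055, 0.007071)),
    "t49594": ((0.458, 0.005657), (0.3895, 0.000707)),
    "t51477": ((0.347, 0.005657), (0.046, 0.005657)),
    "t55127": ((0.395, 0.011314), (0.1835, 0.024749)),
    "t55128": ((0.2495, 0.012021), (0.2205, 0.012021)),
}
CONTROL = "t49568"
HEXANOATE, BUTYRATE = 0, 1


def background_subtracted(substrate: int) -> dict[str, float]:
    """Average activity of each test strain minus the negative-control average."""
    baseline = TABLE3[CONTROL][substrate][0]
    return {
        strain: round(values[substrate][0] - baseline, 4)
        for strain, values in TABLE3.items()
        if strain != CONTROL
    }


def rank(substrate: int) -> list[str]:
    """Test strains ordered from highest to lowest background-subtracted activity."""
    scores = background_subtracted(substrate)
    return sorted(scores, key=lambda strain: scores[strain], reverse=True)


print(rank(HEXANOATE))  # ['t49594', 't55127', 't51477', 't55128', 't49578', 't49580']
print(rank(BUTYRATE))   # ['t49594', 't55128', 't55127', 't51477', 't49578', 't49580']
```

On these numbers t49594 ranks first on both substrates, which matches its selection as the best-performing Y. lipolytica AAE for expression in S. cerevisiae.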









Materials and Methods

AAE Assay for E. coli



E. coli BL21 strains harboring a plasmid that contained AAE genes driven by a T7 promoter were inoculated from glycerol stocks into shake flasks with 25 mL LB and grown overnight at 37° C. with shaking at 250 RPM. The next day, strains were inoculated 1% (v/v) into LB and grown for 3-6 hours until an OD600 of ~0.6 was attained. They were then induced with 1 mM IPTG and incubated overnight at 23° C. with shaking at 250 RPM. The next day, the cultures were harvested and pelleted. Cell pellets were lysed with BugBuster™ reagent (5 mL per g wet pellet) at 18° C. and shaken at 250 RPM for 20 min. Lysates were centrifuged at 4° C. and 4000 RPM for 20 min. The soluble fractions of the lysates were taken for the enzyme assay. The enzyme assay mixture contained 5 mM substrate (sodium hexanoate or sodium butyrate), 3 mM ATP, 1 mM CoA, 5 mM MgCl2, and 100 mM HEPES (pH 7.5). 25 μL of E. coli lysates were added to 500 μL of assay mixture and allowed to react at 30° C., 250 RPM for 20 min. Assays were then quenched by adding 50 μL of the reaction to 50 μL of 2 mM DTNB. Absorbance was measured at 412 nm to quantify the decrease in free CoA.
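DTNB reacts with free thiols such as CoA-SH to release the TNB anion, so the A412 reading can be converted to free CoA with the Beer-Lambert law. The Python sketch below is a minimal illustration, assuming the commonly cited TNB molar absorptivity of 14,150 M⁻¹ cm⁻¹ at 412 nm, a 1 cm path length, and a hypothetical example reading; in practice the plate path length and a lysate blank (lysates contain their own thiols) would need to be accounted for. The helper names are hypothetical.

```python
# Commonly cited molar absorptivity of the TNB2- anion at 412 nm, in M^-1 cm^-1.
EPSILON_TNB_412 = 14_150.0


def final_conc(stock_mM: float, added_uL: float, total_uL: float) -> float:
    """Concentration (mM) of a component after `added_uL` of stock is brought to `total_uL`."""
    return stock_mM * added_uL / total_uL


def free_thiol_mM(a412: float, blank_a412: float = 0.0,
                  path_cm: float = 1.0, dilution_factor: float = 1.0) -> float:
    """Free thiol (e.g., CoA-SH) from a blank-corrected A412 reading.

    Beer-Lambert: c = A / (epsilon * l). `dilution_factor` scales the
    read-out concentration back to the undiluted reaction.
    """
    molar = (a412 - blank_a412) / (EPSILON_TNB_412 * path_cm)
    return molar * 1_000.0 * dilution_factor  # mol/L -> mmol/L


# CoA at the start: 1 mM in 500 uL of assay mixture plus 25 uL lysate (525 uL total).
coa_start_mM = final_conc(1.0, 500.0, 525.0)
# Quench: 50 uL reaction + 50 uL DTNB, i.e., a 2-fold dilution before reading.
example_a412 = 1.0  # hypothetical reading, 1 cm path assumed
coa_left_mM = free_thiol_mM(example_a412, dilution_factor=2.0)
print(round(coa_start_mM, 3), round(coa_left_mM, 3), round(coa_start_mM - coa_left_mM, 3))
# 0.952 0.141 0.811
```

The last value is the estimated CoA consumed, i.e., the "decrease in free CoA" that the assay reports as AAE activity.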


AAE Assay for S. cerevisiae


5 μL/well of thawed glycerol stocks were stamped into 300 μL/well of SC-URA+4% dextrose in half-height deepwell plates, which were sealed with AeraSeal™ film. Samples were incubated at 30° C. and shaken at 1000 RPM in 80% humidity for 2 days. 10 μL/well of resulting precultures were stamped into 300 μL/well of SC-URA+4% dextrose in half-height deepwell plates, which were sealed with AeraSeal™ film. Samples were incubated at 30° C. and shaken at 1000 RPM in 80% humidity for 3 days. 10 μL of resulting production cultures were stamped into 140 μL/well PBS in flat bottom plates. Optical measurements were taken on a plate reader, with absorbance measured at 600 nm and fluorescence at 528 nm with 485 nm excitation.


Production culture plates were centrifuged at 4000 RPM for 10 min. Supernatant was removed, and the plates of pellets were heat-sealed and frozen at −80° C.


Pellets were thawed and 200 μL Y-PER per well was added. Samples were agitated at room temperature for 20 minutes and then pelleted at 3500 RPM for 10 minutes. 50 μL of the clarified lysate was combined with 50 μL of feed buffer or CoA standard in clear bottom plates. Plates were then incubated at 30° C. and shaken at 1000 RPM in 80% humidity for 60 min. 1 μL of DTNB buffer was added to each well to a final concentration of 100 μM DTNB, and samples were agitated at room temperature for 15 minutes. Absorbance was measured at 412 nm to quantify the decrease in free CoA.


Materials included:

    • Feed Buffers:
      • 10 mM MgCl2
      • 1 mM sodium hexanoate
      • 0.5 mM CoA
      • 1 mM ATP
      • 100 mM Tris-HCl pH 7.6
    • DTNB Buffer:
      • 10 mM DTNB (Sigma D8130) (stock of DTNB in DMSO)
      • 100 mM Tris-HCl pH 7.6 (Teknova)
    • Y-PER Yeast Protein Extraction Reagent (Thermo 78990):
      • +1 tablet/50 mL cOmplete™, EDTA-free protease inhibitor cocktail (Sigma, 11873580001)
    • Coenzyme A trilithium salt (Sigma C3019)
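As a cross-check on the volumes in the yeast assay, the final concentrations in each well can be computed directly. This Python sketch assumes that the feed-buffer concentrations listed above are stock concentrations before the 1:1 mix with clarified lysate (the source does not state this explicitly); the helper name is hypothetical.

```python
# Feed buffer stock concentrations (mM), as listed above.
FEED_BUFFER_mM = {
    "MgCl2": 10.0,
    "sodium hexanoate": 1.0,
    "CoA": 0.5,
    "ATP": 1.0,
    "Tris-HCl pH 7.6": 100.0,
}


def diluted(stock: dict[str, float], part_uL: float, total_uL: float) -> dict[str, float]:
    """Final concentrations when `part_uL` of each stock ends up in `total_uL`."""
    return {name: conc * part_uL / total_uL for name, conc in stock.items()}


# 50 uL clarified lysate + 50 uL feed buffer: feed components are diluted 2-fold.
reaction = diluted(FEED_BUFFER_mM, 50.0, 100.0)
print(reaction)  # MgCl2 5.0, hexanoate 0.5, CoA 0.25, ATP 0.5, Tris 50.0 (mM)

# 1 uL of 10 mM DTNB added to the 100 uL reaction gives 101 uL in total.
dtnb_uM = 10.0 * 1.0 / 101.0 * 1000.0
print(round(dtnb_uM, 1))  # 99.0
```

The computed DTNB concentration of about 99 μM is consistent with the stated final concentration of 100 μM.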


Example 2: Functional Expression of OLS Genes in S. cerevisiae

Functional expression of C. sativa olivetol synthase (OLS) and olivetolic acid cyclase (OAC) enzymes in S. cerevisiae was previously reported (Gagne et al. 2012). To identify other OLS genes that can be functionally expressed, a library of approximately 2000 OLS candidate genes was designed. The genes within the library were codon-optimized for expression in S. cerevisiae and synthesized in the replicative yeast expression vector shown in FIGS. 5A-5B. Each candidate OLS was transformed into an auxotrophic S. cerevisiae CEN.PK GAL80 knockout strain, and transformants were selected based on ability to grow on media lacking uracil. The transformants were tested for olivetol and olivetolic acid production from sodium hexanoate in vivo in a high-throughput primary screen, as described in the materials and methods section below. Top olivetol and/or olivetolic acid-producing strains that were identified in the primary screen were subsequently tested in a secondary screen to verify and further quantify olivetol and olivetolic acid production.


Numerous yeast transformants were observed to be capable of producing olivetol in the primary screen (FIG. 8). In particular, two of the top olivetol-producing strains were strain t395094 and strain t393991 (FIG. 8). These two strains were also found to be among the top olivetol-producing strains in the secondary screen.


When the OLS library described in this Example was designed and screened in the primary and secondary screens, it was expected that the strains expressed full-length candidate OLS enzymes. Specifically, strain t395094 was believed to express a full-length OLS protein from Araucaria cunninghamii (Hoop pine) (corresponding to Uniprot Accession No. A0A0D6QTX3) and strain t393991 was believed to express a full-length OLS protein from Cymbidium hybrid cultivar (corresponding to Uniprot Accession No. A0A088G5Z5). However, as explained further in Table 5 at the end of the Examples section, sequencing analysis of strains from the OLS library used for these screens later revealed that there was a 6-nucleotide deletion in the sequences of many of the genes encoding the OLS enzymes in the library. Specifically, this deletion affected all of the candidate OLSs expressed by the strains identified in FIGS. 8-10, including the candidate OLSs expressed by strains t395094 and t393991 (Table 5).


The 6-nucleotide deletion included the first two nucleotides within the start codon of the genes encoding the OLS enzymes. As one of ordinary skill in the art would appreciate, such a deletion may result in the truncation of one or more amino acids from the N-terminus of the proteins encoded by the affected genes, and the truncation could potentially extend to the next in-frame methionine residue in the intended protein sequence. For example, strain t393991 expressed a truncated version of a codon-optimized nucleic acid encoding an OLS protein from Cymbidium hybrid cultivar (Table 5). The full-length Cymbidium hybrid cultivar protein corresponds to SEQ ID NO: 7. If the deletion in the nucleic acid encoding this OLS protein were to result in translation commencing from the next start codon within the same reading frame, this would result in an N-terminally truncated version of the full-length OLS protein from Cymbidium hybrid cultivar. A protein sequence for a truncated protein that commences from the next start codon within the same reading frame is provided by SEQ ID NO: 714. SEQ ID NO: 714 lacks the first 86 amino acids of SEQ ID NO: 7 and is approximately 77.9% identical to SEQ ID NO: 7.
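The relationship between an 86-residue N-terminal truncation and roughly 77.9% identity can be checked with simple arithmetic, assuming identity is counted over the full length of SEQ ID NO: 7 (alignment tools normalize identity differently, so this is an assumption). The Python sketch below also shows how the next in-frame ATG after a start codon can be located. The function names and the toy sequence are hypothetical and are not sequences from this disclosure.

```python
from typing import Optional


def next_inframe_met(cds: str) -> Optional[int]:
    """Codon index of the first ATG after the start codon, in the original frame.

    Translation restarting there would drop that many N-terminal residues.
    Returns None if no downstream in-frame ATG exists.
    """
    cds = cds.upper()
    for i in range(3, len(cds) - 2, 3):
        if cds[i:i + 3] == "ATG":
            return i // 3
    return None


def truncation_identity(full_length: int, n_removed: int) -> float:
    """Identity of an N-terminal truncation to the full-length protein (over full length)."""
    return (full_length - n_removed) / full_length


def implied_full_length(n_removed: int, identity: float) -> float:
    """Invert truncation_identity: full length implied by a truncation and an identity."""
    return n_removed / (1.0 - identity)


# Toy sequence (hypothetical): ATG AAA CCC ATG GGG -> next in-frame Met at codon 3.
print(next_inframe_met("ATGAAACCCATGGGG"))  # 3
# Figures from the text: 86 residues removed, ~77.9% identity.
print(round(implied_full_length(86, 0.779)))  # 389
```

Under the full-length-identity assumption, the two figures in the text imply a full-length protein of roughly 389 residues; a different identity convention would give a different figure.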


Due to the truncation of OLS candidate genes within the library, candidate OLS genes screened in this Example were independently screened again using a new library that expressed only full-length OLS genes (Example 3). As discussed in Example 3, screening with a full-length OLS library independently identified both the OLS protein from Araucaria cunninghamii (Hoop pine) (corresponding to Uniprot Accession No. A0A0D6QTX3) and the OLS protein from Cymbidium hybrid cultivar (corresponding to Uniprot Accession No. A0A088G5Z5), discussed above, verifying the identification of these candidate OLSs as being highly effective for olivetol production in recombinant host cells.


It was determined that the OLS enzymes expressed by positive control strains t339579 and t339582, depicted in FIGS. 8-10, were also affected by the 6-nucleotide deletion discussed above. Accordingly, the low amounts of olivetol and olivetolic acid produced by the strains labelled as positive controls in FIGS. 8-10 may have been caused by disrupted expression of these proteins due to truncation.


Identification of Bifunctional PKS-PKC Enzymes

It was previously observed that S. cerevisiae possesses native OAC activity which enables some amount of the OLS product, 3,5,7-trioxododecanoyl-CoA, to be converted to olivetolic acid instead of undergoing a spontaneous decarboxylative cyclization to olivetol in the absence of OAC activity (FIG. 1). Most strains tested in the primary screen were observed to produce a constant (i.e., fixed) ratio of olivetolic acid to olivetol (FIG. 10). Without wishing to be bound by any theory, the accumulation of olivetolic acid and olivetol in these strains may be due to the reported endogenous S. cerevisiae OAC activity competing with spontaneous conversion of 3,5,7-trioxododecanoyl-CoA to olivetol. Both products, olivetol and olivetolic acid, increase proportionally with their shared precursor.


However, multiple strains were identified in the primary screen that demonstrated olivetolic acid production outside of the constant olivetolic acid to olivetol ratio discussed above. Strain t393974 demonstrated the highest olivetolic acid production (FIG. 9). In particular, strain t393974 was observed in the primary screen to produce substantially more olivetolic acid than olivetol, a quantity of olivetolic acid that was outside of the fixed ratio exhibited by other tested strains (FIG. 10). These data suggested that the OLS enzyme expressed by strain t393974 may be a bifunctional enzyme possessing both polyketide synthase and polyketide cyclase catalytic functions and may be capable of catalyzing both reactions R2 and R3 in FIG. 2, and, at least, both reactions R2a and R3a in FIG. 1 (“Bifunctional PKS-PKC”).
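One way to make the "fixed ratio" criterion above concrete is sketched below in Python. The source does not state a numerical criterion; using the library-wide median olivetolic acid to olivetol ratio as the reference and a 3-fold threshold are assumptions for illustration, and the strain names and titers are hypothetical.

```python
from statistics import median
from typing import Dict, List, Tuple


def flag_ratio_outliers(
    titers: Dict[str, Tuple[float, float]], fold_threshold: float = 3.0
) -> List[str]:
    """Flag strains whose olivetolic acid : olivetol ratio far exceeds the library norm.

    titers maps strain ID -> (olivetolic_acid, olivetol) in the same units.
    Strains with no detectable olivetol are skipped in this sketch.
    """
    ratios = {s: oa / ol for s, (oa, ol) in titers.items() if ol > 0}
    reference = median(ratios.values())
    return sorted(s for s, r in ratios.items() if r > fold_threshold * reference)


# Hypothetical screen data: most strains share a ratio near 0.1.
screen = {
    "strain_a": (10.0, 100.0),
    "strain_b": (20.0, 200.0),
    "strain_c": (5.0, 50.0),
    "strain_d": (30.0, 250.0),
    "strain_x": (150.0, 100.0),  # more olivetolic acid than olivetol
}
print(flag_ratio_outliers(screen))  # ['strain_x']
```

A median is used rather than a mean so that a few bifunctional outliers do not pull the reference ratio toward themselves.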


As discussed above, when the OLS library described in this Example was designed and screened in the primary and secondary screens, it was expected that the strains expressed full-length candidate OLS enzymes. Specifically, strain t393974 was believed to express a full-length OLS protein from Corchorus olitorius (Jute) (corresponding to UniProt Accession No. A0A1R3HSU5). However, as discussed above and explained further in Table 5 at the end of the Examples section, sequencing analysis of strains from the OLS library used for these screens later revealed that there was a 6-nucleotide deletion in the sequences of many of the genes encoding the OLS enzymes in the library, which affected all of the candidate OLSs expressed by the strains identified in FIGS. 8-10, including the candidate OLS expressed by strain t393974 (Table 5). Accordingly, strain t393974 expressed a truncated version of a codon-optimized nucleic acid encoding an OLS protein from Corchorus olitorius (Jute) (Table 5). The full-length Corchorus olitorius (Jute) protein corresponds to SEQ ID NO: 6.


Due to the sequence truncation of OLS candidate genes within the library, candidate OLS genes screened in this Example were independently screened again using a new library that expressed only full-length OLS genes (Example 3). As discussed in Example 3, screening with a full-length OLS library also independently identified the candidate OLS protein from Corchorus olitorius (Jute) (corresponding to UniProt Accession No. A0A1R3HSU5; SEQ ID NO: 6) as an OLS that produced both olivetol and olivetolic acid.


Materials and Methods
OLS Assay

A library of approximately 2000 OLS enzymes was transformed into S. cerevisiae. 5 μL/well of thawed glycerol stocks were stamped into 300 μL/well of SC-URA+4% dextrose in half-height deepwell plates, which were sealed with AeraSeal™ films. Samples were incubated at 30° C. and shaken at 1000 RPM in 80% humidity for 2 days. 10 μL/well of the resulting precultures were stamped into 300 μL/well of SC-URA+4% Dextrose+1 mM sodium hexanoate in half-height deepwell plates, which were sealed with AeraSeal™ films. Samples were incubated at 30° C. and shaken at 1000 RPM in 80% humidity for 4 days. 10 μL/well of the resulting production cultures were stamped into 140 μL/well PBS in flat bottom plates. Optical measurements were taken on a plate reader, with absorbance measured at 600 nm and fluorescence at 528 nm with 485 nm excitation.


30 μL/well of production cultures were stamped into 270 μL/well of 100% methanol containing 300 μg/L 3-(3-Hydroxypropyl)phenol (3HPP) in half-height deepwell plates. Plates were heat sealed and frozen at −80° C. for two hours. Plates were then thawed for 30 minutes and spun down at 4° C. at 4000 rpm for 10 min. 75 μL of supernatant from each well of each plate was stamped into Corning 3694 (half area) plates, which were then submitted for LC-MS quantification of olivetol and olivetolic acid.


The experimental protocol for the secondary screen was the same as described above, except that four replicates per strain were tested and standard curves of both olivetol and olivetolic acid were prepared so that both products could be quantified.
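The quantification step can be sketched as an ordinary least-squares standard curve followed by correction for the extraction dilution (30 μL of culture into 270 μL of methanol, a 10-fold dilution). The linear model, the use of a raw detector response rather than a response normalized to the 3HPP internal standard, and all numbers below are assumptions for illustration; the source does not describe the curve-fitting procedure.

```python
from typing import Sequence, Tuple

# 30 uL culture + 270 uL methanol extraction solvent gives a 10-fold dilution.
DILUTION_FACTOR = (30 + 270) / 30


def fit_line(conc: Sequence[float], response: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of response = slope * conc + intercept."""
    n = len(conc)
    mean_x = sum(conc) / n
    mean_y = sum(response) / n
    sxx = sum((x - mean_x) ** 2 for x in conc)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(conc, response))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def culture_titer(response: float, slope: float, intercept: float,
                  dilution: float = DILUTION_FACTOR) -> float:
    """Back-calculate the concentration in the extract, then undo the extraction dilution."""
    return (response - intercept) / slope * dilution


# Hypothetical olivetol standards (ug/L in extract) and detector responses.
standards = [0.0, 50.0, 100.0, 200.0, 400.0]
responses = [5.0, 105.0, 205.0, 405.0, 805.0]
slope, intercept = fit_line(standards, responses)
print(culture_titer(85.0, slope, intercept))  # 400.0
```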


Example 3: Generation and Screening of Full-Length OLS Library

As discussed in Example 2, sequencing analysis of strains from the OLS library used for the screening described in Example 2 revealed that many of the genes encoding the OLS enzymes in the library were inadvertently truncated N-terminally. This truncation affected all of the strains identified in FIGS. 8-10, including the positive control strains.


Accordingly, a new OLS library was generated to contain only OLS genes that produce full-length OLS enzymes. The full-length OLS library contained full-length versions of the approximately 2000 OLS enzymes from the original library described in Example 2 and also included approximately 900 additional candidate OLS enzymes. All candidate OLS enzymes were codon optimized for expression in S. cerevisiae. Strain t527340, comprising an OLS from C. sativa, was included in the library as a positive control, and strain t527338, comprising GFP, was included in the library as a negative control. A high-throughput primary screen was conducted with the full-length OLS library using the same OLS assay described in Example 2 with the following exceptions: all library members and controls were transformed into an auxotrophic CEN.PK strain that comprised a chromosomally integrated heterologous gene encoding AAE VcsA (UniProt Accession Q6N4N8) from R. palustris; and neither sodium hexanoate nor sodium butyrate was included in the production cultures.


Top olivetol and/or olivetolic acid-producing strains from the high-throughput primary screen were carried over to a secondary screen to verify and further quantify olivetol production. The experimental protocol for the secondary screen was the same as the primary screen except that: four replicates per strain were tested; and one set of cultures was supplemented with 1 mM sodium hexanoate, while the second set of cultures was not supplemented with sodium hexanoate. Olivetol production was normalized to a positive control strain expressing a C. sativa OLS. Table 7 provides results for strains that exhibited average normalized olivetol >1 and/or that produced higher amounts of olivetolic acid than the positive control in samples that were supplemented with sodium hexanoate. FIG. 12A depicts olivetol production in library strains supplemented with sodium hexanoate. FIG. 12B depicts olivetolic acid production in library strains supplemented with sodium hexanoate. Table 8 provides results for strains that exhibited average normalized olivetol >1 and/or that produced higher amounts of olivetolic acid than the positive control in samples that were not supplemented with sodium hexanoate. Table 6 provides sequence information for strains described in Tables 7 and 8. The Average Normalized Olivetol for each strain was calculated as the mean, across all replicates of that strain, of the following ratio: olivetol divided by absorbance at 600 nm, relative to the average olivetol produced by the C. sativa OLS positive control in the same plate.
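The Average Normalized Olivetol calculation described above can be sketched in Python as follows. The sketch assumes that the plate's positive-control value is also expressed as olivetol per unit A600, so that the normalized value is dimensionless; the text does not state this explicitly. Plate IDs and titers are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple


def average_normalized_olivetol(
    replicates: List[Tuple[str, float, float]],
    controls_by_plate: Dict[str, List[Tuple[float, float]]],
) -> float:
    """Mean over replicates of (olivetol / A600) relative to the same plate's C. sativa OLS control.

    replicates: (plate_id, olivetol, a600) for each replicate of one strain.
    controls_by_plate: plate_id -> [(olivetol, a600), ...] for the positive-control wells.
    """
    normalized = []
    for plate_id, olivetol, a600 in replicates:
        control_mean = mean(o / a for o, a in controls_by_plate[plate_id])
        normalized.append((olivetol / a600) / control_mean)
    return mean(normalized)


controls = {"P1": [(100.0, 1.0), (200.0, 2.0)], "P2": [(90.0, 1.0), (110.0, 1.0)]}
strain_reps = [("P1", 150.0, 1.0), ("P1", 100.0, 0.5), ("P2", 200.0, 2.0), ("P2", 50.0, 1.0)]
print(average_normalized_olivetol(strain_reps, controls))  # 1.25
```

A value above 1 indicates more olivetol per unit A600 than the plate's C. sativa OLS control, which corresponds to the >1 threshold used for Tables 7 and 8.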


One of the top olivetol-producing strains identified in this Example was strain t527346, comprising an OLS enzyme from Cymbidium hybrid cultivar (Accession ID: A0A088G5Z5; SEQ ID NO: 7; Tables 6-8). As discussed above, this candidate OLS was also identified in the screening conducted in Example 2. Protein alignments conducted with BLASTP using default parameters identified two other OLS candidates that shared at least 90% identity with the OLS enzyme from Cymbidium hybrid cultivar (Accession ID: A0A088G5Z5; SEQ ID NO: 7). These strains were: strain t598916 (expressing an OLS corresponding to SEQ ID NO: 145) and strain t599231 (expressing an OLS corresponding to SEQ ID NO: 15) (Tables 6-8), which were 93.07% and 91.79% identical to SEQ ID NO: 7, respectively.
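Percent identity values such as those above are conventionally reported as the number of identical aligned positions divided by the alignment length. The Python sketch below applies that convention to two already-aligned sequences (gaps written as '-'); it does not perform the BLASTP search or alignment itself, the toy sequences are hypothetical, and other denominators (for example, the length of the query) would give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns with the same residue in both sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Columns that are gaps in both sequences carry no information and are dropped.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if not (a == "-" and b == "-")]
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


print(percent_identity("MKTAYIAK", "MKSAYI-K"))  # 75.0
```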


Another notable olivetol-producing strain identified in this Example was strain t599285, comprising an OLS enzyme from Araucaria cunninghamii (Accession A0A0D6QTX3; SEQ ID NO: 17). As discussed above, this candidate OLS was also identified in the screening conducted in Example 2.


Consistent with the identification in Example 2 of a candidate OLS from Corchorus olitorius (Jute) as a potentially bifunctional OLS, strain t598084, comprising a full-length version of the Corchorus olitorius (Jute) OLS (corresponding to UniProt Accession No. A0A1R3HSU5; SEQ ID NO: 6) was independently identified in this Example as a candidate OLS that produced more olivetolic acid than a Cannabis OLS positive control, and produced more olivetolic acid than olivetol, based on average olivetol and olivetolic acid produced (Tables 6-8).


Example 4: Generation and Screening of C. sativa OLS Protein Engineering Library in S. cerevisiae

To identify C. sativa OLS (CsOLS) enzyme variants with improved olivetol production, a library comprising approximately 1300 members was designed. The library included CsOLS enzymes containing single or multiple amino acid substitutions or deletions. Nucleotide sequences were codon-optimized for expression in S. cerevisiae and synthesized in the replicative yeast expression vector shown in FIGS. 5A-5B. Each candidate enzyme expression construct was transformed into an auxotrophic S. cerevisiae CEN.PK GAL80 knockout strain. Transformants were selected based on ability to grow on media lacking uracil. Strain t346317, carrying GFP, was included in the library as a negative control.


The library of candidate CsOLS enzyme variants was assayed for activity in a high-throughput primary screen using the OLS assay described in Example 2 (FIG. 13). LC-MS analysis revealed that approximately 95% of library members produced measurable amounts of olivetol.


The top olivetol and/or olivetolic acid-producing strains from the primary screen were carried over to a secondary screen to verify the results. The experimental protocol for the secondary screen was the same as the primary screen, except that four replicates per strain were tested and a standard curve for olivetol and olivetolic acid was generated so that the amount of olivetol and olivetolic acid could be quantified via LC-MS (FIG. 13).


Multiple OLS variants were identified that were capable of producing olivetol (Table 9). In order to investigate where the point mutations in the OLS variants were located relative to the OLS enzyme structure, a 3D model of the wildtype C. sativa OLS protein (corresponding to SEQ ID NO: 5) was generated using Rosetta protein modeling software. The active site of the C. sativa OLS enzyme was identified based on the catalytic triad of residues described in Taura et al. (2009) FEBS Letters for OLS enzymes, consisting of residues H297, C157, and N330 in the C. sativa OLS enzyme. The active site was also defined to include a docked molecule of hexanoyl-CoA (OLS substrate). Residues were considered to be within the active site if they were within about 12 angstroms of any of the residues within the catalytic triad of the OLS enzyme and/or within about 12 angstroms of a docked substrate within the OLS enzyme.
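The distance criterion above can be sketched in Python as selecting every residue that has at least one atom within the cutoff of any reference atom (catalytic-triad residues plus the docked hexanoyl-CoA). Using the minimum atom-to-atom distance is an assumption; the text says only "within about 12 angstroms". Coordinates and residue labels below are hypothetical, and a real analysis would read them from the Rosetta model.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def residues_within(
    residue_atoms: Dict[str, Sequence[Coord]],
    reference_atoms: Sequence[Coord],
    cutoff: float = 12.0,
) -> List[str]:
    """Return residues whose closest atom lies within `cutoff` angstroms of any reference atom."""
    selected = []
    for label, atoms in residue_atoms.items():
        closest = min(math.dist(a, r) for a in atoms for r in reference_atoms)
        if closest <= cutoff:
            selected.append(label)
    return selected


reference = [(0.0, 0.0, 0.0)]  # hypothetical reference atom
residues = {
    "res1": [(3.0, 4.0, 0.0)],                     # 5 angstroms away
    "res2": [(0.0, 0.0, 10.0), (0.0, 0.0, 11.0)],  # closest atom 10 angstroms away
    "res3": [(0.0, 0.0, 20.0)],                    # 20 angstroms away
}
print(residues_within(residues, reference))              # ['res1', 'res2']
print(residues_within(residues, reference, cutoff=8.0))  # ['res1']
```

The same function with `cutoff=8.0` gives the narrower 8-angstrom shell used to define a subset of these mutations.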


A subset of OLS point mutations was identified from strains that produced at least 10 mg/L olivetol, in which the mutation mapped to within the active site. This group of point mutations included: T17K, I23C, L25R, K51R, D54R, F64Y, V95A, T123C, A125S, Y153G, E196K, L201C, I207L, L241I, T247A, M267K, M267G, I273V, L277M, T296A, V307I, D320A, V324I, S326R, H328Y, S334P, S334A, T335C, R375T (Table 9). FIG. 17 provides a schematic of the 3D structure of the C. sativa OLS protein (corresponding to SEQ ID NO: 5), showing the catalytic triad, the bound hexanoyl-CoA substrate, and the cluster of point mutations identified within the active site.


OLS point mutations from strains that produced at least 10 mg/L olivetol and that mapped to within about 8 angstroms of any of the residues within the catalytic triad of the OLS enzyme and/or within about 8 angstroms of a docked substrate within the OLS enzyme included: K51R, D54R, T123C, A125S, L201C, I207L, L241I, T247A, M267K, M267G, I273V, T296A, V307I, V324I, S326R, H328Y, S334P, T335C, and R375T. FIG. 16 provides a schematic of the 3D structure of the C. sativa OLS protein (corresponding to SEQ ID NO: 5), showing the catalytic triad, the bound hexanoyl-CoA substrate, and the cluster of point mutations identified within about 8 angstroms of any of the residues within the catalytic triad of the OLS enzyme and/or within about 8 angstroms of a docked substrate within the OLS enzyme.
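Because the 8-angstrom shell lies inside the active site as defined above, every mutation in the 8-angstrom list should also appear in the active-site list. The short Python check below uses only the mutation lists given in the text; it confirms the subset relationship and extracts the mutations that lie within about 12 angstroms but not within about 8 angstroms. The helper name `parse_mutation` is illustrative.

```python
import re
from typing import Tuple

# Mutation lists copied from the text (C. sativa OLS numbering).
WITHIN_12_A = (
    "T17K I23C L25R K51R D54R F64Y V95A T123C A125S Y153G E196K L201C I207L L241I "
    "T247A M267K M267G I273V L277M T296A V307I D320A V324I S326R H328Y S334P S334A "
    "T335C R375T"
).split()
WITHIN_8_A = (
    "K51R D54R T123C A125S L201C I207L L241I T247A M267K M267G I273V T296A V307I "
    "V324I S326R H328Y S334P T335C R375T"
).split()

_MUTATION = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_mutation(label: str) -> Tuple[str, int, str]:
    """Split a label such as 'T335C' into (wild-type residue, position, new residue)."""
    match = _MUTATION.fullmatch(label)
    if match is None:
        raise ValueError(f"not a point mutation: {label}")
    return match.group(1), int(match.group(2)), match.group(3)


inner = set(WITHIN_8_A)
assert inner <= set(WITHIN_12_A), "8-angstrom set should be a subset of the 12-angstrom set"
outer_shell = sorted((m for m in WITHIN_12_A if m not in inner), key=lambda m: parse_mutation(m)[1])
print(outer_shell)
# ['T17K', 'I23C', 'L25R', 'F64Y', 'V95A', 'Y153G', 'E196K', 'L277M', 'D320A', 'S334A']
```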


The point mutation associated with the highest olivetol production was a T335C mutation in the C. sativa OLS sequence (Table 9). This residue maps to the active site of the OLS enzyme (FIGS. 16-17). In further support of the importance of this residue for olivetol production, at least 5 of the high olivetol-producing candidate OLSs identified in Example 3 contain a C residue at the corresponding position (strain IDs t527346 (SEQ ID NO: 7), t598265 (SEQ ID NO: 13), t598301 (SEQ ID NO: 7), t598916 (SEQ ID NO: 145), t598976 (SEQ ID NO: 8), t599231 (SEQ ID NO: 15)). Strains t527346 and t598301 comprise OLSs with the same amino acid sequence, encoded by different nucleic acid sequences.


Additional C. sativa OLS variants were identified that did not map within the active site, but which were observed to produce more than approximately 13 mg/L olivetol (Table 9). This group of point mutations included: I284Y, K100L, K116R, I278E, K108D, L348S, K71R, V92G, T128V, K100M, Y135V, P229A, T128A, T128I (Table 9).


Table 13 provides sequence information for strains described in Table 9.


Thus, novel variants of the C. sativa OLS protein that may be useful for olivetol production in recombinant host cells were identified.


Example 5. Generation and Screening of C. sativa OLS (CsOLS), Cymbidium OLS (ChOLS), and Corchorus OLS (CoOLS) Protein Engineering Libraries in S. cerevisiae

An additional OLS protein engineering library was generated that included OLS variants based on three different OLS templates: C. sativa OLS (CsOLS), Cymbidium hybrid cultivar OLS (ChOLS), and Corchorus olitorius OLS (CoOLS), which were among the candidate OLSs identified in the screens described in Examples 2 and 3. As discussed above, ChOLS was identified as being one of the strongest olivetol-producing candidate OLS enzymes, while CoOLS was identified as being a potential bifunctional enzyme possessing both polyketide synthase and polyketide cyclase catalytic functions.


The library included approximately 300 variants of ChOLS and approximately 200 variants of CoOLS. Variants of ChOLS and CoOLS included both single and multiple amino acid substitutions. For ChOLS, some of the variants were designed by taking beneficial mutations discovered from screening of CsOLS variants described in Example 4 and mapping the corresponding mutations onto the ChOLS template. Corresponding positions in ChOLS were identified and mutated.
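Mapping a CsOLS mutation onto ChOLS, as described above, amounts to finding which ChOLS residue occupies the same column as the CsOLS residue in a pairwise alignment. A minimal Python sketch of that lookup follows; it assumes an alignment has already been computed (the source does not name the alignment tool), and the short aligned sequences are hypothetical.

```python
from typing import Optional


def map_position(aligned_src: str, aligned_dst: str, src_pos: int) -> Optional[int]:
    """Map a 1-based residue position in the source sequence onto the destination sequence.

    Both inputs are rows of the same pairwise alignment, with '-' for gaps.
    Returns None when the source residue is aligned to a gap in the destination.
    """
    if len(aligned_src) != len(aligned_dst):
        raise ValueError("alignment rows must have equal length")
    src_index = dst_index = 0
    for src_char, dst_char in zip(aligned_src, aligned_dst):
        if src_char != "-":
            src_index += 1
        if dst_char != "-":
            dst_index += 1
        if src_char != "-" and src_index == src_pos:
            return dst_index if dst_char != "-" else None
    raise ValueError(f"position {src_pos} is beyond the end of the source sequence")


# Hypothetical alignment: the destination has one extra residue (Q) after position 3.
print(map_position("MKT-AYS", "MRTQAWS", 5))  # 6
print(map_position("MKTAY", "MK-AY", 3))      # None
```

The same kind of lookup underlies statements such as residue W339 in CoOLS corresponding to residue S332 in CsOLS.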


For CoOLS, some of the variants were designed to investigate whether there were any specific residues that may contribute to conferring or enhancing bifunctionality. The sequence of the bifunctional CoOLS enzyme (SEQ ID NO: 6) was aligned with the sequence of the CsOLS enzyme (SEQ ID NO: 5), which is not bifunctional, and residues that are different between the sequences were considered for mutagenesis in both the CsOLS and CoOLS sequences. The impact of these mutations on bifunctionality was investigated by measuring the ratio of production of olivetolic acid to olivetol. A specific residue that was investigated with respect to bifunctionality was residue W339 in CoOLS, which corresponds to residue S332 in CsOLS.


Nucleotide sequences of the genes within the library were codon-optimized for expression in S. cerevisiae and synthesized in the replicative yeast expression vector shown in FIGS. 5A-5B. Each candidate OLS expression construct was transformed into a S. cerevisiae CEN.PK strain expressing a heterologous AAE VcsA-Q6N4N8 from R. palustris. The library was screened in a high-throughput primary screen in which the OLS assay was conducted as described in Example 2, except that production cultures were not supplemented with either sodium hexanoate or sodium butyrate. Instead, the strains' native pools of hexanoyl-CoA and butyryl-CoA were used as substrates. Top olivetol and/or olivetolic acid producing strains were carried over to a secondary screen to verify production of olivetol and/or olivetolic acid. The experimental protocol for the secondary screen was identical to the primary screen, except that four replicates per strain were tested; and olivetol production was assessed both in production cultures supplemented with sodium hexanoate and in production cultures not supplemented with sodium hexanoate.


Strain t527338, expressing a fluorescent protein, was included in the library as a negative control for enzyme activity. Strain t527340, expressing wild-type CsOLS, was included in the library as a positive control. Strain t527346, expressing wild-type ChOLS, was included in the library as a positive control and was used to establish hit ranking for variants designed using ChOLS as a template. Similarly, strain t606797, expressing wild-type CoOLS, was included in the library as a positive control and was used to establish hit ranking for variants designed using CoOLS as a template. Olivetol was normalized to the mean production of its wild-type template (e.g., olivetol produced by a variant of ChOLS was normalized to the mean olivetol titer produced by strain t527346) except that for variants made to the CoOLS template, olivetol was normalized against each of a CsOLS template and a CoOLS template due to inconsistent activity of the CoOLS wild type control (Tables 10A-B and 11A-B). The Average Normalized Olivetol value for wild-type templates was not necessarily 1.0. For example, for the wild-type C. sativa strain t527340 in Table 10B, the Average Normalized Olivetol value was 1.02159. This was because the mean by which values were normalized was based on library controls that were included on each plate within a screen (e.g., strain t527340). The library further contained additional in-library controls of the same strain (e.g., strain t527340). Those additional in-library controls were not used to calculate the mean. In instances where the average normalized olivetol values for all samples of strain t527340 were calculated, if the in-library controls produced slightly more olivetol than the mean olivetol produced by the library controls that were included on each plate, then the Average Normalized Olivetol value was slightly above 1.0.
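The arithmetic behind an Average Normalized Olivetol slightly above 1.0 for a wild-type template can be reproduced with a few lines of Python. The titers are hypothetical: the reference mean is taken only from the per-plate library controls, while the average is taken over all samples of the strain, including the additional in-library controls.

```python
from statistics import mean

# Hypothetical olivetol titers for one wild-type strain.
plate_controls = [100.0, 100.0, 100.0]  # per-plate library controls; these define the reference mean
in_library_controls = [110.0, 106.0]    # extra wells of the same strain; not used for the mean

reference_mean = mean(plate_controls)
all_samples = plate_controls + in_library_controls
average_normalized = mean(value / reference_mean for value in all_samples)
print(round(average_normalized, 3))  # 1.032
```

Because the extra controls contribute to the average but not to the reference mean, any difference between them and the per-plate controls moves the wild-type value away from exactly 1.0.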


Results from the secondary screen are provided in Tables 10-11. Table 10 provides results for samples that were supplemented with sodium hexanoate, while Table 11 provides results for samples that were not supplemented with sodium hexanoate. In Table 10, strains comprising ChOLS mutants that produced an average normalized olivetol level of at least 0.5 are shown. The performance of multi-mutation ChOLS enzymes is also shown.


For ChOLS, the approach of mapping equivalent variants from the CsOLS sequence led to the identification of multiple variants that exhibited improved olivetol production. These variants included the following point mutations: V71Y, F70M, L385M, E285A, L76I, N151P, E203K, V50N, S34Q, R100P, A219C, K359M, and R100T (Table 11). In samples that were supplemented with sodium hexanoate, variants that exhibited improved olivetol production included the point mutations V71Y and F70M (Table 10).
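Constructing a variant sequence from a list of point mutations such as those above, while checking that each wild-type residue matches the template, can be sketched in Python as follows. The template string is a hypothetical six-residue toy rather than SEQ ID NO: 7, and positions are 1-based; as noted elsewhere in this disclosure, numbering can differ depending on whether the initial methionine is included.

```python
import re
from typing import Iterable

_POINT_MUTATION = re.compile(r"([A-Z])(\d+)([A-Z])")


def apply_mutations(template: str, mutations: Iterable[str]) -> str:
    """Return the template with each point mutation (e.g. 'F70M') applied."""
    residues = list(template)
    for label in mutations:
        match = _POINT_MUTATION.fullmatch(label)
        if match is None:
            raise ValueError(f"not a point mutation: {label}")
        wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{label}: position outside the template")
        if residues[position - 1] != wild_type:
            raise ValueError(f"{label}: template has {residues[position - 1]} at {position}")
        residues[position - 1] = new
    return "".join(residues)


toy_template = "MSVFVL"  # hypothetical; F at position 4, V at position 5
print(apply_mutations(toy_template, ["F4M", "V5Y"]))  # MSVMYL
```

Checking the wild-type residue before substituting catches off-by-one numbering errors, such as those arising from a missing initial methionine.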


For CoOLS, when the mutation W339S was made to the CoOLS template (strain t607112), the ratio of olivetolic acid to olivetol decreased from approximately 1.5 to approximately 0.157 (based on 517.075 μg/L olivetol and 81.225 μg/L olivetolic acid, as shown in Table 10A). However, olivetol levels reported in Table 10A for strain t607112 were within the standard deviation. Accordingly, while the mutation may have had an impact on bifunctionality, it also appears to have more generally affected overall functionality of the enzyme. The reverse mutation was also tested in CsOLS. For CsOLS, S332W (strain t606899) had a significantly negative impact on the function of the enzyme (Table 10A). Similarly, mutation S339W in ChOLS (strain t607377) had a significantly negative impact on the overall function of the enzyme (Table 10A).
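The ratio quoted for strain t607112 follows directly from the Table 10A titers given above, as the short Python calculation below shows; the wild-type CoOLS ratio of approximately 1.5 is taken from the text as stated rather than recalculated.

```python
def oa_to_olivetol_ratio(olivetolic_acid: float, olivetol: float) -> float:
    """Ratio of olivetolic acid to olivetol; both titers must use the same units."""
    return olivetolic_acid / olivetol


W339S_RATIO = oa_to_olivetol_ratio(81.225, 517.075)  # strain t607112, ug/L from Table 10A
WILD_TYPE_RATIO = 1.5                                # approximate CoOLS value stated in the text

print(round(W339S_RATIO, 3))                    # 0.157
print(round(WILD_TYPE_RATIO / W339S_RATIO, 1))  # 9.5
```

In other words, the substitution lowered the olivetolic acid to olivetol ratio roughly 9.5-fold.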


Example 6. Functional Expression of OLS Enzymes in a Prototrophic S. cerevisiae Strain

Examples 2-5 utilized an auxotrophic S. cerevisiae CEN.PK strain as a host chassis for the expression of OLS enzyme candidates from a replicative plasmid. OLS candidate enzymes determined to be active in Examples 2-5 were also assessed in a prototrophic S. cerevisiae CEN.PK strain.


A library of approximately 58 OLS genes under the control of the same genetic regulatory elements shown in FIGS. 5A-5B (GAL1 promoter and CYC1 terminator) was integrated into the genome of a prototrophic S. cerevisiae CEN.PK strain. The parental chassis strain t473139, not expressing a heterologous OLS enzyme, was included as a negative control for enzyme activity. Strain t496084, expressing the CsOLS T335C point mutant, which was the highest-ranking CsOLS point mutant identified in Example 5 based on production of olivetol, was also included. The OLS assay was conducted as described in Example 2 with the following exceptions: glycerol stocks were stamped into YEP+4% glucose; a portion of each resulting culture was then stamped into production cultures containing YEP+4% glucose+1 mM sodium hexanoate; and three bio-replicates were used instead of two.


Despite differences between auxotrophic and prototrophic strains that may impact production of olivetol, candidate OLS enzymes identified in Examples 2-5 through screening in auxotrophic strains were also found to be effective in production of olivetol in a prototrophic strain (Table 12 and FIG. 15). As shown in Table 12, strain t496073, corresponding to a prototrophic S. cerevisiae strain comprising a chromosomally integrated, codon-optimized nucleotide sequence encoding the OLS candidate from Cymbidium hybrid cultivar (Accession ID: A0A088G5Z5), which was identified in Examples 2 and 3, produced the highest olivetol titer of any library member and significantly more olivetol than the C. sativa control (FIG. 15 and Table 12).


Thus, novel candidate OLS enzymes identified in Examples 2-5 were found to be effective for olivetol production when expressed in prototrophic strains as well as auxotrophic strains.


Example 7. Biosynthesis of Cannabinoids in Engineered S. cerevisiae Host Cells

The activation of an organic acid to its CoA-thioester and the subsequent condensation of this thioester with a number of malonyl-CoA molecules, or other similar polyketide extender units, represent the first two steps in the biosynthesis of all known cannabinoids. To demonstrate the biosynthesis of CBGA (FIG. 1, Formula (8a)), CBDA (FIG. 1, Formula (9a)), THCA (FIG. 1, Formula (10a)), and CBCA (FIG. 1, Formula (11a)), the cannabinoid biosynthetic pathway shown in FIG. 1 is assembled in the genome of a prototrophic S. cerevisiae CEN.PK host cell, wherein each enzyme catalyzing reactions R1a-R5a may be present in one or more copies. For example, the S. cerevisiae host cell may express one or more copies of one or more of: an AAE, an OLS, an OAC, a CBGAS, and a TS.


An AAE enzyme expressed heterologously in a host cell may be one or more of the AAE candidates from Y. lipolytica or R. palustris that are shown in Example 1 to be functionally expressed in S. cerevisiae. An OLS enzyme expressed heterologously in a host cell may be an OLS identified and characterized in Examples 2-8, such as a Cymbidium hybrid cultivar OLS (SEQ ID NO: 7) or a Phalaenopsis x Doritaenopsis hybrid cultivar OLS (SEQ ID NO: 15), or an OLS corresponding to SEQ ID NO: 145. The OLS enzyme may also be an engineered OLS such as CsOLS T335C (SEQ ID NO: 207) or an engineered version of any other OLS enzyme described in this disclosure. An OAC enzyme expressed heterologously in a host cell may be a naturally occurring or synthetic OAC that is functionally expressed in S. cerevisiae, or a variant thereof, including an OAC from C. sativa or a variant of an OAC from C. sativa. In instances where a bifunctional OLS, such as Corchorus olitorius OLS (SEQ ID NO: 6), is used, a separate OAC enzyme may be omitted.


A CBGAS enzyme, such as a PT enzyme, expressed heterologously in a host cell may be a naturally occurring or synthetic PT that is functionally expressed in S. cerevisiae, or a variant thereof, including a PT from C. sativa or a variant of a PT from C. sativa. For example, a PT may comprise CsPT4 from C. sativa, or a variant thereof, or NphB from Streptomyces sp. strain CL190, or a variant thereof.


A TS enzyme expressed heterologously in a host cell may be a naturally occurring or synthetic TS that is functionally expressed in S. cerevisiae, or a variant thereof, including a TS from C. sativa or a variant of a TS from C. sativa. The TS enzyme may be a TS that produces one or more of CBDA, THCA, and CBCA as a majority product.
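The enzyme combinations described above can be checked for pathway completeness with a small Python bookkeeping sketch. The assignment of reactions R1a-R5a to AAE, OLS, OAC, CBGAS, and TS is inferred from the order in which those enzymes are listed in this Example and from the earlier statement that a bifunctional PKS-PKC can catalyze reactions R2a and R3a. The dictionary keys are hypothetical labels, and the sketch ignores the native OAC-like activity of S. cerevisiae noted earlier.

```python
from typing import Dict, FrozenSet, Iterable, List

PATHWAY: List[str] = ["R1a", "R2a", "R3a", "R4a", "R5a"]

# Reactions assumed to be catalyzed by each enzyme class (inferred, see lead-in).
CATALYZES: Dict[str, FrozenSet[str]] = {
    "AAE": frozenset({"R1a"}),
    "OLS": frozenset({"R2a"}),
    "bifunctional OLS": frozenset({"R2a", "R3a"}),
    "OAC": frozenset({"R3a"}),
    "CBGAS": frozenset({"R4a"}),
    "TS": frozenset({"R5a"}),
}


def missing_reactions(enzymes: Iterable[str]) -> List[str]:
    """Return pathway reactions that none of the expressed enzymes covers."""
    covered = set().union(*(CATALYZES[name] for name in enzymes))
    return [reaction for reaction in PATHWAY if reaction not in covered]


print(missing_reactions(["AAE", "OLS", "OAC", "CBGAS", "TS"]))        # []
print(missing_reactions(["AAE", "bifunctional OLS", "CBGAS", "TS"]))  # []
print(missing_reactions(["AAE", "OLS", "CBGAS", "TS"]))               # ['R3a']
```

The second call mirrors the case in which a bifunctional OLS, such as Corchorus olitorius OLS (SEQ ID NO: 6), allows a separate OAC enzyme to be omitted.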


The cannabinoid fermentation procedure may be similar to the OLS assay described in the Examples above with the following exceptions: the incubation of production cultures may last from, for example, 48-144 hours, and production cultures may be supplemented with, for example, 4% galactose and 1 mM sodium hexanoate approximately every 24 hours. Titers of CBGA, CBDA, THCA, and CBCA may be quantified via LC-MS.


It should be appreciated that sequences disclosed herein may or may not contain signal sequences. The sequences disclosed herein encompass versions with or without signal sequences. It should also be understood that protein sequences disclosed herein may be depicted with or without the initial methionine (M) encoded by the start codon. Accordingly, in some instances amino acid numbering may correspond to protein sequences containing the initial methionine, while in other instances, amino acid numbering may correspond to protein sequences that do not contain the initial methionine. Aspects of the disclosure encompass host cells comprising any of the sequences described herein, including the sequences within Tables 4-6 and 13-16, and fragments thereof.
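The numbering caveat above can be handled with a simple offset when the only difference between two depictions of a protein is the initial methionine. The Python sketch below assumes exactly that; removal of a longer signal sequence would require a larger offset, and the function name `convert_position` is hypothetical.

```python
def convert_position(position: int, from_has_met: bool, to_has_met: bool) -> int:
    """Convert a 1-based residue position between numbering with and without the initial Met."""
    new_position = position + int(to_has_met) - int(from_has_met)
    if new_position < 1:
        raise ValueError("the initial methionine has no position when it is omitted")
    return new_position


print(convert_position(100, from_has_met=True, to_has_met=False))  # 99
print(convert_position(99, from_has_met=False, to_has_met=True))   # 100
```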


Additional Tables Associated with the Disclosure

TABLE 4

Sequence Information for Strains Described in Example 1

Strain ID
AAE Sequence Information

t49578 (E. coli)
This strain comprises a codon-optimized nucleic acid (SEQ ID NO: 70), which encodes a Yarrowia lipolytica protein (SEQ ID NO: 63). The protein sequence of SEQ ID NO: 63 corresponds to the protein sequence provided by UniProt Accession No. Q6CFE4.

t49594 (E. coli)
This strain comprises a codon-optimized nucleic acid (SEQ ID NO: 71), which encodes a Yarrowia lipolytica protein (SEQ ID NO: 64). The protein sequence of SEQ ID NO: 64 corresponds to the protein sequence provided by UniProt Accession No. Q6C577. This protein was expressed as a fusion protein with an N-terminal MYC tag (SEQ ID NO: 140). SEQ ID NO: 707 corresponds to the fusion protein. SEQ ID NO: 712 is a codon-optimized nucleic acid encoding SEQ ID NO: 707.

t51477 (E. coli)
This strain comprises a codon-optimized nucleic acid (SEQ ID NO: 72), which encodes a Yarrowia lipolytica protein (SEQ ID NO: 65). The protein sequence of SEQ ID NO: 65 corresponds to the protein sequence provided by UniProt Accession No. Q6C650. This protein was expressed as a fusion protein with an N-terminal MYC tag (SEQ ID NO: 140). SEQ ID NO: 708 corresponds to the fusion protein. SEQ ID NO: 713 is a codon-optimized nucleic acid encoding SEQ ID NO: 708.

t392878 (S. cerevisiae)
This strain comprises a codon-optimized nucleic acid (SEQ ID NO: 75), which encodes a Yarrowia lipolytica protein (SEQ ID NO: 141). SEQ ID NO: 141 corresponds to residues 1-595 of SEQ ID NO: 68. The protein sequence of SEQ ID NO: 141 corresponds to the protein sequence provided by UniProt Accession No. Q6C577 except that the last three residues (peroxisomal targeting signal 1) were removed.

t392879 (S. cerevisiae)
This strain comprises a codon-optimized nucleic acid (SEQ ID NO: 76), which encodes a Yarrowia lipolytica protein (SEQ ID NO: 142). SEQ ID NO: 142 corresponds to residues 1-595 of SEQ ID NO: 69. The protein sequence of SEQ ID NO: 142 corresponds to the protein sequence provided by UniProt Accession No. Q6C577 except that the last three residues (peroxisomal targeting signal 1) were removed.

t55127 (E. coli)
This strain comprises a codon-optimized nucleic acid (SEQ ID NO: 73), which encodes a Rhodopseudomonas palustris protein (SEQ ID NO: 66). The protein sequence of SEQ ID NO: 66 corresponds to the protein sequence provided by UniProt Accession No. Q6N948.

t55128 (E. coli)
This strain comprises a codon-optimized nucleic acid (SEQ ID NO: 74), which encodes a Rhodopseudomonas palustris protein (SEQ ID NO: 67). The protein sequence of SEQ ID NO: 67 corresponds to the protein sequence provided by UniProt Accession No. Q6N4N8.

t49580 (E. coli)
This strain comprises a codon-optimized nucleic acid (SEQ ID NO: 72), which encodes a Yarrowia lipolytica protein (SEQ ID NO: 65). The protein sequence of SEQ ID NO: 65 corresponds to the protein sequence provided by UniProt Accession No. Q6C650.

TABLE 5







Sequence Information for Strains Described in Example 2 and FIGS. 8-10








Strain ID
OLS Sequence Information





t394087
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1182 of SEQ ID NO: 32 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 32). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 1. The protein sequence of SEQ ID NO: 1 corresponds to the protein sequence provided by UniProt Accession No. A0A2G5F4L7, from Aquilegia coerulea (Rocky mountain columbine).


t394687
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1179 of SEQ ID NO: 33 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 33). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 2. The protein sequence of SEQ ID NO: 2 corresponds to the protein sequence provided by UniProt Accession No. I6VW41, from Vitis pseudoreticulata (Chinese wild grapevine).


t393495
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1185 of SEQ ID NO: 34 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 34). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 3. The protein sequence of SEQ ID NO: 3 corresponds to the protein sequence provided by UniProt Accession No. M4DVZ4, from Brassica rapa subsp. pekinensis (Chinese cabbage) (Brassica pekinensis).


t393563
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1197 of SEQ ID NO: 35 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 35). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 4. The protein sequence of SEQ ID NO: 4 corresponds to the protein sequence provided by UniProt Accession No. Q8VWQ7, from Sorghum bicolor (Sorghum) (Sorghum vulgare).


t339568
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1158 of SEQ ID NO: 36 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 36). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 5. The protein sequence of SEQ ID NO: 5 corresponds to the protein sequence provided by UniProt Accession No. B1Q2B6, from Cannabis sativa.


t393974
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1191 of SEQ ID NO: 37 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 37). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 6. The protein sequence of SEQ ID NO: 6 corresponds to the protein sequence provided by UniProt Accession No. A0A1R3HSU5, from Corchorus olitorius.


t393991
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1173 of SEQ ID NO: 38 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 38). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 7. The protein sequence of SEQ ID NO: 7 corresponds to the protein sequence provided by UniProt Accession No. A0A088G5Z5, from Cymbidium hybrid cultivar.


t394336
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1185 of SEQ ID NO: 39 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 39). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 8. The protein sequence of SEQ ID NO: 8 corresponds to the protein sequence provided by UniProt Accession No. A0A0A6Z8B1, from Paphiopedilum helenae.


t394547
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1185 of SEQ ID NO: 40 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 40). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 9. The protein sequence of SEQ ID NO: 9 corresponds to the protein sequence provided by UniProt Accession No. A0A078IM49, from Brassica napus (Rape).


t394457
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1191 of SEQ ID NO: 41 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 41). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 10. The protein sequence of SEQ ID NO: 10 corresponds to the protein sequence provided by UniProt Accession No. A0A140KXU1, from Picea jezoensis.


t394521
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1191 of SEQ ID NO: 42 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 42). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 11. The protein sequence of SEQ ID NO: 11 corresponds to the protein sequence provided by UniProt Accession No. P48408, from Pinus strobus (Eastern white pine).


t394790
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1170 of SEQ ID NO: 43 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 43). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 12. The protein sequence of SEQ ID NO: 12 corresponds to the protein sequence provided by UniProt Accession No. I3QQ50, from Arachis hypogaea (Peanut).


t394905
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1296 of SEQ ID NO: 44 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 44). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 13. The protein sequence of SEQ ID NO: 13 corresponds to the protein sequence provided by UniProt Accession No. A0A1S4ATN2, from Nicotiana tabacum (Common tobacco).


t394981
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1197 of SEQ ID NO: 45 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 45). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 14. The protein sequence of SEQ ID NO: 14 corresponds to the protein sequence provided by UniProt Accession No. K3Y7T4, from Setaria italica (Foxtail millet) (Panicum italicum).


t395011
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1173 of SEQ ID NO: 46 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 46). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 15. The protein sequence of SEQ ID NO: 15 corresponds to the protein sequence provided by UniProt Accession No. Q6WJD6, from Phalaenopsis x Doritaenopsis hybrid cultivar.


t394797
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1170 of SEQ ID NO: 47 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 47). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 16. The protein sequence of SEQ ID NO: 16 corresponds to the protein sequence provided by UniProt Accession No. K7XD27, from Arachis hypogaea (Peanut).


t395094
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1179 of SEQ ID NO: 48 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 48). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 17. The protein sequence of SEQ ID NO: 17 corresponds to the protein sequence provided by UniProt Accession No. A0A0D6QTX3, from Araucaria cunninghamii (Hoop pine) (Moreton Bay pine).


t395103
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1182 of SEQ ID NO: 49 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 49). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 18. The protein sequence of SEQ ID NO: 18 corresponds to the protein sequence provided by UniProt Accession No. V7AZ15, from Phaseolus vulgaris (common bean).


t393835
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1179 of SEQ ID NO: 50 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 50). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 19. The protein sequence of SEQ ID NO: 19 corresponds to the protein sequence provided by UniProt Accession No. I6S977, from Vitis quinquangularis.


t394115
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1188 of SEQ ID NO: 51 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 51). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 20. The protein sequence of SEQ ID NO: 20 corresponds to the protein sequence provided by UniProt Accession No. Q9FR69, from Cardamine penzesii.


t394091
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1170 of SEQ ID NO: 52 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 52). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 21. The protein sequence of SEQ ID NO: 21 corresponds to the protein sequence provided by UniProt Accession No. G7IQL2, from Medicago truncatula (Barrel medic) (Medicago tribuloides).


t394037
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1179 of SEQ ID NO: 53 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 53). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 22. The protein sequence of SEQ ID NO: 22 corresponds to the protein sequence provided by UniProt Accession No. I6W888, from Vitis pseudoreticulata (Chinese wild grapevine).


t394279
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1188 of SEQ ID NO: 54 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 54). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 23. The protein sequence of SEQ ID NO: 23 corresponds to the protein sequence provided by UniProt Accession No. P13114, from Arabidopsis thaliana (Mouse-ear cress).


t394043
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1344 of SEQ ID NO: 55 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 55). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 24. The protein sequence of SEQ ID NO: 24 corresponds to the protein sequence provided by UniProt Accession No. A0A251SHA8, from Helianthus annuus (Common sunflower).


t394404
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1170 of SEQ ID NO: 56 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 56). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 25. The protein sequence of SEQ ID NO: 25 corresponds to the protein sequence provided by UniProt Accession No. X5I326, from Vaccinium ashei.


t394436
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1197 of SEQ ID NO: 57 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 57). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 26. The protein sequence of SEQ ID NO: 26 corresponds to the protein sequence provided by UniProt Accession No. A0A164ZDA1, from Daucus carota subsp. sativus.


t393720
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1212 of SEQ ID NO: 58 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 58). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 27. The protein sequence of SEQ ID NO: 27 corresponds to the protein sequence provided by UniProt Accession No. Q58VP7, from Aloe arborescens (Kidachi aloe).


t394911
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1203 of SEQ ID NO: 59 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 59). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 28. The protein sequence of SEQ ID NO: 28 corresponds to the protein sequence provided by UniProt Accession No. A0A2K3P0B5, from Trifolium pratense (Red clover).


t395023
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1200 of SEQ ID NO: 60 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 60). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 29. The protein sequence of SEQ ID NO: 29 corresponds to the protein sequence provided by UniProt Accession No. Q8GZP4, from Hydrangea macrophylla (Bigleaf hydrangea) (Viburnum macrophyllum).


t339579
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1158 of SEQ ID NO: 61 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 61). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 30. The protein sequence of SEQ ID NO: 30 corresponds to the protein sequence provided by UniProt Accession No. B1Q2B6, from C. sativa.


t339582
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1158 of SEQ ID NO: 62 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 62). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 31. The protein sequence of SEQ ID NO: 31 corresponds to the protein sequence provided by UniProt Accession No. B1Q2B6, from C. sativa.


t394396
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1158 of SEQ ID NO: 93 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 93). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 77. The protein sequence of SEQ ID NO: 77 corresponds to the protein sequence provided by UniProt Accession No. B1Q2B6, from C. sativa.


t339546
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1158 of SEQ ID NO: 94 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 94). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 78. The protein sequence of SEQ ID NO: 78 corresponds to the protein sequence provided by UniProt Accession No. B1Q2B6, from C. sativa.


t339549
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1158 of SEQ ID NO: 95 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 95). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 79. The protein sequence of SEQ ID NO: 79 corresponds to the protein sequence provided by UniProt Accession No. B1Q2B6, from C. sativa.


t393360
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1158 of SEQ ID NO: 96 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 96). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 80. The protein sequence of SEQ ID NO: 80 corresponds to the protein sequence provided by UniProt Accession No. F1LKH5, from C. sativa.


t393555
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1188 of SEQ ID NO: 97 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 97). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 81. The protein sequence of SEQ ID NO: 81 corresponds to the protein sequence provided by UniProt Accession No. Q9SEN0, from Fourraea alpina (Rock-cress) (Arabis pauciflora).


t394593
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1197 of SEQ ID NO: 98 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 98). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 82. The protein sequence of SEQ ID NO: 82 corresponds to the protein sequence provided by UniProt Accession No. A0A059VFD5, from Punica granatum (Pomegranate).


t394351
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1167 of SEQ ID NO: 99 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 99). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 83. The protein sequence of SEQ ID NO: 83 corresponds to the protein sequence provided by UniProt Accession No. Q1G6T7, from Cardamine apennina.


t394414
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1068 of SEQ ID NO: 100 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 100). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 84. The protein sequence of SEQ ID NO: 84 corresponds to the protein sequence provided by UniProt Accession No. A0A2T5VUN1, from Mycobacterium sp. YR782.


t393402
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1173 of SEQ ID NO: 101 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 101). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 85. The protein sequence of SEQ ID NO: 85 corresponds to the protein sequence provided by UniProt Accession No. A0A1Q9SCX4, from Kocuria sp. CNJ-770.


t394035
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-675 of SEQ ID NO: 102 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 102). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 86. The protein sequence of SEQ ID NO: 86 corresponds to the protein sequence provided by UniProt Accession No. A0A0K8QHJ1, from Arthrobacter sp. Hiyo1.


t394155
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1176 of SEQ ID NO: 103 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 103). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 87. The protein sequence of SEQ ID NO: 87 corresponds to the protein sequence provided by UniProt Accession No. Q9XJ57, from Citrus sinensis (Sweet orange) (Citrus aurantium var. sinensis).


t394137
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1173 of SEQ ID NO: 104 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 104). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 88. The protein sequence of SEQ ID NO: 88 corresponds to the protein sequence provided by UniProt Accession No. I6R2S0, from Narcissus tazetta var. chinensis.


t393976
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1188 of SEQ ID NO: 105 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 105). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 89. The protein sequence of SEQ ID NO: 89 corresponds to the protein sequence provided by UniProt Accession No. Q2ENA5, from Abies alba (Edeltanne) (European silver fir).


t394689
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1173 of SEQ ID NO: 106 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 106). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 90. The protein sequence of SEQ ID NO: 90 corresponds to the protein sequence provided by UniProt Accession No. A0A022RTH3, from Erythranthe guttata (Yellow monkey flower).


t393400
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1053 of SEQ ID NO: 107 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 107). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 91. The protein sequence of SEQ ID NO: 91 corresponds to the protein sequence provided by UniProt Accession No. A0A2T7T652, from Streptomyces scopuliridis RB72.


t394693
This strain comprises a codon-optimized nucleic acid that corresponds to nucleotides 3-1188 of SEQ ID NO: 108 (due to a truncation of nucleotides 1-2 of SEQ ID NO: 108). Translation of this sequence is expected to produce a truncated version of a protein corresponding to SEQ ID NO: 92. The protein sequence of SEQ ID NO: 92 corresponds to the protein sequence provided by UniProt Accession No. Q2EFK0, from Abies alba (Edeltanne) (European silver fir).

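The Table 5 entries describe each construct as a 1-based, inclusive range of a listed nucleotide sequence (for example, nucleotides 3-1182 of SEQ ID NO: 32). As a minimal sketch, assuming the full nucleotide sequence is held as a plain string, the Python below maps such a range onto a 0-based slice. The function name `subsequence` and the toy sequence are hypothetical and are not taken from the Sequence Listing.

```python
def subsequence(seq: str, start: int, end: int) -> str:
    """Return nucleotides start..end of seq, using 1-based inclusive coordinates."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("range is out of bounds for this sequence")
    # A 1-based inclusive range [start, end] is the 0-based half-open slice [start - 1, end).
    return seq[start - 1:end]


# Hypothetical toy sequence standing in for a listed nucleotide sequence.
toy = "ATGGCTAAA"
trimmed = subsequence(toy, 3, len(toy))  # drop nucleotides 1-2
print(trimmed, len(trimmed))
```

By the same arithmetic, a range such as 3-1182 spans 1182 - 3 + 1 = 1180 nucleotides.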

TABLE 6

Sequence Information for Strains Described in Example 3 and Tables 7 and 8

Strain | OLS Nucleotide Sequence (SEQ ID NO) | OLS Protein Sequence (SEQ ID NO)
t527340 | 62 | 5
t527346 | 38 | 7
t599285 | 172 | 17
t598244 | 173 | 143
t598490 | 174 | 144
t598916 | 175 | 145
t598301 | 176 | 7
t598212 | 177 | 146
t598424 | 178 | 147
t598578 | 179 | 148
t598836 | 180 | 149
t597770 | 181 | 150
t597768 | 182 | 151
t599210 | 183 | 152
t597806 | 184 | 153
t598184 | 185 | 154
t598084 | 186 | 6
t598989 | 187 | 155
t598609 | 188 | 156
t598907 | 189 | 157
t598159 | 190 | 158
t598607 | 191 | 159
t598132 | 192 | 160
t598202 | 193 | 161
t598224 | 194 | 162
t598242 | 195 | 163
t598265 | 196 | 13
t598502 | 197 | 164
t598669 | 198 | 165
t598828 | 199 | 166
t598888 | 200 | 167
t598890 | 201 | 168
t598897 | 202 | 169
t598965 | 203 | 170
t598976 | 204 | 8
t599231 | 205 | 15
t599271 | 206 | 171

TABLE 7

Production of Olivetol and Olivetolic Acid in Secondary Screen of Full-Length OLS Library (with sodium hexanoate supplementation)

Strain | Strain type | Average Olivetol [µg/L] | Standard Deviation Olivetol [µg/L] | Average Olivetolic Acid [µg/L] | Standard Deviation Olivetolic Acid [µg/L] | Average Normalized Olivetol (per OD) | Standard Deviation Normalized Olivetol
t527338 | GFP Negative Control | 0 | 0 | 0 | 0 | 0 | 0
t527340 | Cannabis OLS Positive Control | 19907.37 | 3392.375 | 464.2222 | 119.0458 | 1 | 0.17338
t527346 | Library | 47674.72 | 7310.215 | 1126.656 | 275.7896 | 2.577378 | 0.5328
t599285 | Library | 40719.68 | 5413.028 | 1006.4 | 205.3059 | 1.570539 | 0.12684
t598244 | Library | 30067.48 | 2678.216 | 1328.625 | 391.3286 | 2.322177 | 0.24813
t598490 | Library | 29983.43 | 7816.659 | 895.1 | 368.4323 | 1.423788 | 0.286677
t598916 | Library | 23515.18 | 1702.502 | 680.575 | 66.45968 | 0.938026 | 0.042088
t598301 | Library | 21070.73 | 12453.97 | 512.175 | 274.8916 | 0.881438 | 0.570619
t598212 | Library | 19864.23 | 6981.826 | 582.3 | 129.3529 | 1.315837 | 0.25006
t598424 | Library | 18263.73 | 3738.191 | 661.325 | 148.2432 | 0.782173 | 0.157387
t598578 | Library | 18167.93 | 1534.837 | 614.4 | 62.65115 | 0.733176 | 0.065004
t598836 | Library | 17825.58 | 3298.654 | 614.75 | 139.553 | 0.67585 | 0.216607
t597770 | Library | 16611.23 | 2805.423 | 565.575 | 87.49584 | 1.203593 | 0.740925
t597768 | Library | 16140.08 | 2088.271 | 469.65 | 58.01394 | 0.72298 | 0.123509
t599210 | Library | 3019.65 | 187.1009 | 4913.925 | 344.6286 | 0.0939 | 0.012992
t597806 | Library | 1452.425 | 194.0261 | 872.65 | 58.60117 | 0.096846 | 0.040743
t598184 | Library | 466.15 | 76.41424 | 6016.625 | 5727.47 | 0.033016 | 0.009583
t598084 | Library | 298.6 | 38.96913 | 711.85 | 86.71242 | 0.012889 | 0.003159
t598989 | Library | 192.725 | 3.557504 | 981.225 | 924.6046 | 0.008438 | 0.000605
t598609 | Library | 97.025 | 65.64777 | 490.925 | 484.0728 | 0.003307 | 0.002213
t598907 | Library | 97 | 66.37625 | 539.75 | 809.4711 | 0.004664 | 0.0034
t598159 | Library | 73.825 | 50.2624 | 1014.55 | 98.44669 | 0.003517 | 0.002408
t598607 | Library | 57.9 | 67.11512 | 1006.4 | 227.8435 | 0.0017 | 0.001963

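In Tables 7 and 8 the Cannabis OLS positive control (t527340) has an average normalized olivetol value of exactly 1, which suggests that each strain's olivetol titer was divided by its optical density (OD) and then expressed relative to the positive control's per-OD titer. This section does not spell out the procedure or list the OD readings, so the Python below is only a sketch under that assumption; the function name `normalized_per_od` and all input values are hypothetical.

```python
def normalized_per_od(titer: float, od: float, control_titer: float, control_od: float) -> float:
    """Per-OD titer of a strain divided by the per-OD titer of the positive control.

    Assumed normalization scheme; not confirmed by the source.
    """
    if od <= 0 or control_od <= 0:
        raise ValueError("OD values must be positive")
    return (titer / od) / (control_titer / control_od)


# Hypothetical inputs: a strain at twice the control's titer but 1.5 times its OD.
print(round(normalized_per_od(200.0, 3.0, 100.0, 2.0), 3))
# The positive control normalized against itself gives 1.
print(normalized_per_od(100.0, 2.0, 100.0, 2.0))
```

Under this assumption, a strain can rank differently by normalized value than by raw titer whenever its culture density differs from that of the control.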

TABLE 8

Production of Olivetol and Olivetolic Acid in Secondary Screen of Full-Length OLS Library (without sodium hexanoate supplementation)

Strain | Strain type | Average Olivetol [µg/L] | Standard Deviation Olivetol [µg/L] | Average Olivetolic Acid [µg/L] | Standard Deviation Olivetolic Acid [µg/L] | Average Normalized Olivetol (per OD) | Standard Deviation Normalized Olivetol
t527338 | GFP Negative Control | 0 | 0 | 0 | 0 | 0 | 0
t527340 | Cannabis OLS Positive Control | 233.5102 | 24.98585 | 0.139276 | 0.5909 | 1 | 0.15846
t527346 | Library | 726.4307 | 68.37505 | 0.431449 | 1.830482 | 3.072299 | 0.594866
t597768 | Library | 313.8253 | 41.20191 | 0 | 0 | 1.887013 | 0.355292
t597770 | Library | 600.919 | 84.71989 | 0 | 0 | 2.958839 | 0.883218
t598084 | Library | 41.65336 | 27.90324 | 143.5728 | 17.13177 | 0.203694 | 0.149696
t598132 | Library | 430.7581 | 98.35626 | 0 | 0 | 2.045727 | 0.229508
t598202 | Library | 629.8948 | 43.73374 | 0 | 0 | 2.319367 | 0.145214
t598212 | Library | 439.2199 | 23.32112 | 0 | 0 | 2.133233 | 0.157933
t598224 | Library | 535.1348 | 45.55404 | 4.682865 | 1.486543 | 2.001014 | 0.151286
t598242 | Library | 444.4074 | 36.07498 | 0 | 0 | 2.509074 | 0.247446
t598244 | Library | 780.646 | 10.19476 | 21.86307 | 21.3525 | 4.250859 | 0.503227
t598265 | Library | 399.8005 | 24.89199 | 0 | 0 | 2.005431 | 0.247479
t598301 | Library | 523.4047 | 10.23296 | 0 | 0 | 2.980013 | 0.751203
t598424 | Library | 1039.723 | 20.19861 | 22.23645 | 1.606142 | 6.95869 | 1.19209
t598490 | Library | 1042.654 | 162.3917 | 12.97602 | 6.917579 | 5.733983 | 1.383711
t598502 | Library | 649.3324 | 105.1879 | 0 | 0 | 3.49989 | 0.626512
t598578 | Library | 666.2518 | 167.478 | 0 | 0 | 3.35228 | 0.956232
t598669 | Library | 555.1264 | 124.4665 | 0 | 0 | 3.070987 | 0.828156
t598828 | Library | 362.7059 | 53.2614 | 0 | 0 | 1.702596 | 0.290853
t598836 | Library | 544.9485 | 38.11311 | 0 | 0 | 2.522761 | 0.438238
t598888 | Library | 0 | 0 | 1.036494 | 2.072988 | 0 | 0
t598890 | Library | 216.4236 | 177.1733 | 0 | 0 | 1.067688 | 0.958823
t598897 | Library | 303.5841 | 49.89216 | 0 | 0 | 1.9597 | 0.712481
t598916 | Library | 528.2121 | 40.26965 | 3.166338 | 6.332675 | 2.217625 | 0.308839
t598965 | Library | 259.0065 | 16.53317 | 0 | 0 | 1.82891 | 0.368218
t598976 | Library | 262.3653 | 15.67957 | 0 | 0 | 1.252384 | 0.144212
t599210 | Library | 137.3035 | 10.28164 | 453.0067 | 13.07732 | 0.545126 | 0.127023
t599231 | Library | 575.119 | 59.46478 | 0 | 0 | 2.438902 | 0.437276
t599271 | Library | 644.8095 | 74.40164 | 0 | 0 | 2.902135 | 0.475949
t599285 | Library | 670.6119 | 86.9222 | 0 | 0 | 2.154638 | 0.264646

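Each table reports an average and a standard deviation per strain, but this section does not state the number of replicates or which standard-deviation formula was used. As a hedged illustration, the Python below computes the mean and the sample (n - 1) standard deviation of four hypothetical replicate titers with the standard-library `statistics` module. When exactly one of n replicates is nonzero, the sample standard deviation equals sqrt(n) times the mean. Rows such as t598888 in Table 8 (olivetolic acid: average 1.036494, standard deviation 2.072988) fit that pattern for n = 4, but this is an inference from the numbers, not something the source states.

```python
import math
import statistics


def summarize(replicates: list[float]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) of replicate measurements."""
    return statistics.mean(replicates), statistics.stdev(replicates)


# Hypothetical replicate titers (ug/L): only one of four cultures produced detectable product.
mean, sd = summarize([4.0, 0.0, 0.0, 0.0])
print(mean, sd)
# With one nonzero value among n replicates, sd / mean equals sqrt(n).
print(math.isclose(sd / mean, math.sqrt(4)))
```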

TABLE 9

Results of Secondary Screen of C. sativa OLS Protein Engineering Library

Strain | Strain type | Amino Acid mutations from wild-type Cannabis protein (SEQ ID NO: 5) | Average Olivetol [µg/L] | Standard Deviation Olivetol [µg/L] | Average Olivetolic Acid [µg/L] | Standard Deviation Olivetolic Acid [µg/L]
t346317 | GFP Negative Control | | 0 | 0 | 0 | 0
t405417 | Library | T335C | 29155.65 | 1352.507 | 925.5075 | 84.12739
t404953 | Library | S334P | 16467.14 | 2021.617 | 473.085 | 68.7974
t405220 | Library | Y153G | 15190.5 | 1885.253 | 437.9325 | 84.3569
t404192 | Library | I284Y | 14380.39 | 1956.468 | 390.18 | 39.47375
t404323 | Library | K100L | 14246.96 | 544.9192 | 456.55 | 35.50002
t404196 | Library | K116R | 14068.84 | 2527.921 | 380.4225 | 42.26844
t404209 | Library | I278E | 13888.94 | 2872.84 | 439.1325 | 66.1486
t404164 | Library | K108D | 13824.77 | 2873.633 | 292.71 | 197.9307
t404170 | Library | L348S | 13625.61 | 1648.021 | 291.5625 | 195.0828
t404384 | Library | K71R | 13619.49 | 3039.582 | 372.775 | 42.82276
t405397 | Library | V92G | 13537.85 | 363.1012 | 414.8725 | 26.8083
t405164 | Library | T128V | 13374.17 | 1328.433 | 385.2925 | 59.52873
t404191 | Library | K100M | 13326.69 | 1006.193 | 280.65 | 188.8515
t405340 | Library | Y135V | 13234.48 | 1441.185 | 393.945 | 57.71197
t404421 | Library | P229A | 13099.17 | 2790.466 | 280.175 | 190.3239
t404631 | Library | L241I | 13096.36 | 2072.187 | 425.4825 | 86.29982
t405133 | Library | T128A | 13050.31 | 1267.354 | 408.1175 | 30.00599
t405081 | Library | T128I | 12839.46 | 770.8077 | 409.595 | 60.20914
t404898 | Library | S334A | 12549.31 | 2014.497 | 392.02 | 20.58724
t405017 | Library | S326R | 12437.99 | 1793.811 | 291.3725 | 198.041
t405140 | Library | A125S | 12379.56 | 2038.247 | 579.6825 | 87.43007
t404276 | Library | I273V | 12341.81 | 673.6841 | 344.5 | 243.9399
t404405 | Library | K51R | 12305.55 | 2024.022 | 401.0325 | 52.97083
t405079 | Library | H328Y | 11965.56 | 636.541 | 380.795 | 28.50397
t404978 | Library | F64Y | 11905.29 | 408.7996 | 208.145 | 241.0289
t405347 | Library | T17K | 11875.39 | 1484.666 | 353.8775 | 41.08276
t404855 | Library | I207L | 11774.75 | 1252.291 | 380.1475 | 30.28471
t405362 | Library | V324I | 11489.89 | 2109.012 | 350.79 | 78.35325
t404523 | Library | L25R | 11375.9 | 2136.219 | 212.6875 | 248.5159
t404951 | Library | T296A | 11266.41 | 958.6764 | 269.73 | 181.7056
t405308 | Library | D320A | 11140.11 | 1318.646 | 346.7825 | 18.1635
t405201 | Library | V307I | 11054.69 | 3152.4 | 194.21 | 241.925
t404219 | Library | I23C | 11046.19 | 1061.09 | 380.345 | 25.59692
t404673 | Library | M267K | 11004.45 | 2014.531 | 382.22 | 111.9515
t404274 | Library | L277M | 10942.46 | 930.1003 | 360.4375 | 80.2903
t405042 | Library | T123C | 10940.26 | 899.6448 | 314.7975 | 212.1934
t404528 | Library | M267G | 10521.73 | 2186.36 | 260.765 | 189.9513
t405312 | Library | E196K | 10503.49 | 1491.483 | 321.1425 | 43.88772
t404725 | Library | R375T | 10474.14 | 637.7002 | 181.37 | 209.4388
t405303 | Library | T247A | 10412.99 | 1484.527 | 353.6075 | 15.69949
t405395 | Library | V95A | 10397.39 | 1215.177 | 299.155 | 49.56504
t405326 | Library | D54R | 10116.47 | 1327.968 | 316.9625 | 42.33512
t404599 | Library | L201C | 10033.37 | 1349.986 | 368.615 | 82.31389

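Tables 9 and 10A name each variant by its substitutions relative to the wild-type template, using the conventional notation in which, for example, T335C denotes replacement of the threonine at position 335 by a cysteine. The Python below is a minimal sketch of parsing and applying such substitutions to a protein sequence with 1-based positions. It assumes single-letter amino acid codes and whitespace-separated substitution lists, and it deliberately does not interpret the "d1-8" prefix used in Table 10A, whose precise meaning is not defined in this section (the parser rejects it). The function names and the toy sequence are hypothetical.

```python
import re

# One wild-type residue, a 1-based position, one replacement residue (e.g. "T335C").
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(token: str) -> tuple[str, int, str]:
    """Split a token such as 'T335C' into (wild-type residue, position, new residue)."""
    match = SUBSTITUTION.match(token)
    if match is None:
        raise ValueError(f"not a single-residue substitution: {token!r}")
    wild_type, position, new = match.groups()
    return wild_type, int(position), new


def apply_substitutions(protein: str, mutations: str) -> str:
    """Apply whitespace-separated substitutions, checking each wild-type residue first."""
    residues = list(protein)
    for token in mutations.split():
        wild_type, position, new = parse_substitution(token)
        if not 1 <= position <= len(residues) or residues[position - 1] != wild_type:
            raise ValueError(f"{token} does not match the template sequence")
        residues[position - 1] = new
    return "".join(residues)


# Hypothetical 6-residue template, not a sequence from the Sequence Listing.
print(apply_substitutions("MKTAYV", "K2R V6I"))
```

Checking the wild-type residue before substituting catches a mutation list that is applied to the wrong template, for example a Corchorus olitorius substitution applied to the Cannabis sativa sequence.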















TABLE 10A







Results of Secondary Screen of Protein Engineering Library Using C. sativa,


Cochorus, and Cymbidium Templates (supplemented with sodium hexanoate)



















Standard






Standard
Average
Deviation



Wild-type
Amino Acid
Average
Deviation
Olivetolic
Olivetolic



template
mutations from
Olivetol
Olivetol
Acid
Acid


Strain
used
wild-type
[ug/L]
[ug/L]
[ug/L]
[ug/L]
















t527338


22.72917
79.04841
0
0


GFP


t606794

Cannabis

wild-type
15598.47
5507.375
337.3906
163.6388




sativa




(Hemp) (Marijuana)


t527340 Cannabis OLS | Cannabis sativa | wild-type | 15534.95 | 6243.926 | 359.0063 | 194.9093
t607067 | Cannabis sativa | F367L | 10907.1 | 3397.082 | 310.75 | 114.4806
t607367 | Cannabis sativa | G366A | 8394.975 | 2661.736 | 137.95 | 44.65143
t607391 | Cannabis sativa | P298N | 8381.075 | 1668.07 | 167.15 | 37.8978
t606801 | Cannabis sativa | S334P | 22691 | 2379.928 | 572.025 | 55.64575
t606984 | Cannabis sativa | I248M | 384.975 | 256.6688 | 124.475 | 6.990649
t606899 | Cannabis sativa | S332W | 55.1 | 110.2 | 19.4 | 38.8
t606797 Corchorus OLS | Corchorus olitorius | wild-type | 448.8643 | 190.2768 | 665.7071 | 222.9162
t606807 | Corchorus olitorius | d1-8 Y142C | 8173.95 | 11559.71 | 639.55 | 114.0563
t607179 | Corchorus olitorius | Y301W V302I V303T N305P P308K T309A | 1487.6 | 1289.575 | 528.325 | 51.91142
t607149 | Corchorus olitorius | d1-8 W339S | 925.4 | 1638.079 | 108.825 | 35.84888
t607139 | Corchorus olitorius | d1-8 Y266F | 539.55 | 467.458 | 383.3 | 31.26137
t607112 | Corchorus olitorius | W339S | 517.075 | 737.3064 | 81.225 | 37.55896
t607332 | Corchorus olitorius | d1-8 Y301W V302I V303T N305P P308K T309A | 408.25 | 41.92648 | 393.6 | 66.39794
t607153 | Corchorus olitorius | d1-8 A373G | 334.35 | 19.09633 | 529.5 | 33.9143
t607158 | Corchorus olitorius | A373G | 314.8 | 15.06143 | 423.9 | 57.00158
t607236 | Corchorus olitorius | M255I | 315.325 | 23.9742 | 360.4 | 7.086607
t607141 | Corchorus olitorius | d1-8 Y266F W339S | 168.575 | 337.15 | 69.325 | 8.448422
t607176 | Corchorus olitorius | L374F | 130.45 | 150.9282 | 97.05 | 6.728298
t606930 | Corchorus olitorius | d1-8 M255I | 265.85 | 306.9972 | 735.6 | 53.30647
t607193 | Corchorus olitorius | d1-8 T12Y F39Y Q42R L43A Q47E Q51D Q57K I77L G79E S84C E96D T100E L121K N123K A135V A137M T139G H143Q N146K K151R H152P K156R F158M S174A V182R D183G S184A N231T K232N I241V T253D C260G M287E M353R Q357E S395N | 65.675 | 131.35 | 39.3 | 28.53571
t607006 | Corchorus olitorius | Y266F | 265.975 | 178.2236 | 505.325 | 21.29575
t606993 | Corchorus olitorius | d1-8 N305P | 165.875 | 191.5369 | 427.15 | 17.27937
t606852 | Corchorus olitorius | N305P | 73.375 | 146.75 | 404.3 | 58.06772
t607119 | Corchorus olitorius | d1-8 L374F | 0 | 0 | 111.725 | 18.69962
t607371 | Corchorus olitorius | Y266F W339S | 0 | 0 | 37.25 | 3.421988
t527346 Cymbidium OLS | Cymbidium hybrid cultivar | wild-type | 29779.86 | 10784.18 | 631.1694 | 320.6464
t606952 | Cymbidium hybrid cultivar | V71Y | 40374.05 | 3947.169 | 1177.275 | 107.3831
t607284 | Cymbidium hybrid cultivar | F70M | 18119.03 | 1257.566 | 374.85 | 15.30109
t607262 | Cymbidium hybrid cultivar | L385M | 18869.45 | 2300.831 | 394.425 | 43.38251
t606938 | Cymbidium hybrid cultivar | D88A | 30129.05 | 421.2235 | 773.1 | 17.39483
t607260 | Cymbidium hybrid cultivar | E285A | 15646.4 | 1341.912 | 319.1 | 21.50364
t607159 | Cymbidium hybrid cultivar | L76I | 25322 | 7151.452 | 633.5 | 130.6911
t606946 | Cymbidium hybrid cultivar | N151P | 29738.73 | 5548.193 | 778.9 | 116.4196
t606861 | Cymbidium hybrid cultivar | E203K | 44399.18 | 12437.5 | 1252.525 | 394.7546
t606918 | Cymbidium hybrid cultivar | V50N | 27251.28 | 1711.322 | 732.125 | 88.38052
t607135 | Cymbidium hybrid cultivar | E28P | 24306.43 | 6961.951 | 615.45 | 163.6961
t607286 | Cymbidium hybrid cultivar | S34Q | 13463.55 | 871.8021 | 287.675 | 21.0191
t606942 | Cymbidium hybrid cultivar | R100P | 34937.75 | 2136.806 | 968.7 | 20.64752
t606959 | Cymbidium hybrid cultivar | A219C | 32778.4 | 1462.567 | 812.475 | 54.31368
t607294 | Cymbidium hybrid cultivar | K359M | 13309.98 | 473.943 | 279 | 10.60157
t607282 | Cymbidium hybrid cultivar | R100T | 13723.58 | 1657.273 | 282.825 | 20.36768
t607230 | Cymbidium hybrid cultivar | E116D | 21772.93 | 983.0294 | 469.875 | 34.1937
t606965 | Cymbidium hybrid cultivar | Y142V | 26443.03 | 1993.229 | 918 | 96.78736
t607288 | Cymbidium hybrid cultivar | T289D | 13669.95 | 750.4718 | 290.775 | 34.8278
t607228 | Cymbidium hybrid cultivar | M135I | 21348.6 | 702.6819 | 604.6 | 19.56238
t606909 | Cymbidium hybrid cultivar | W368H | 40713.38 | 2212.329 | 998.95 | 50.99526
t606962 | Cymbidium hybrid cultivar | D229E | 28238.98 | 1771.039 | 765.65 | 43.92285
t607150 | Cymbidium hybrid cultivar | E285K | 25386.95 | 2387.108 | 593.6 | 71.86895
t607361 | Cymbidium hybrid cultivar | E323Q | 20209.88 | 3057.447 | 378.4 | 72.17622
t606932 | Cymbidium hybrid cultivar | S18T | 26673.38 | 4632.723 | 726.825 | 106.151
t606940 | Cymbidium hybrid cultivar | A13S | 29094.85 | 3024.508 | 737.1 | 40.72935
t607269 | Cymbidium hybrid cultivar | A333R | 12947.23 | 883.2885 | 268.075 | 35.74207
t607186 | Cymbidium hybrid cultivar | S180N | 23191.9 | 4783.221 | 594.2 | 135.8793
t607476 | Cymbidium hybrid cultivar | L20F | 17200 | 1998.783 | 350.375 | 25.15451
t607031 | Cymbidium hybrid cultivar | N80H | 25388.63 | 6003.712 | 611.825 | 130.4566
t606916 | Cymbidium hybrid cultivar | R100A | 25282.83 | 1244.898 | 638.4 | 21.58997
t607292 | Cymbidium hybrid cultivar | V331I | 12013.93 | 2802.682 | 241.875 | 58.81765
t606908 | Cymbidium hybrid cultivar | I22M | 32699.68 | 21947.9 | 1120.825 | 82.20671
t607248 | Cymbidium hybrid cultivar | E155C | 20670.43 | 1761.502 | 477.55 | 41.49647
t607023 | Cymbidium hybrid cultivar | V71H | 27924.45 | 1393.419 | 761.2 | 34.79224
t606936 | Cymbidium hybrid cultivar | T111K | 26953.6 | 3174.107 | 689.975 | 120.0151
t607433 | Cymbidium hybrid cultivar | L291V | 18071.78 | 6905.44 | 351.525 | 142.8462
t607600 | Cymbidium hybrid cultivar | R123A | 18038.33 | 2576.146 | 333.575 | 50.84194
t606894 | Cymbidium hybrid cultivar | Y142F | 36097.63 | 2137.695 | 911.05 | 54.27403
t606963 | Cymbidium hybrid cultivar | I147E | 26509.6 | 1811.909 | 657.275 | 14.62882
t607603 | Cymbidium hybrid cultivar | Y142I | 18913.93 | 1120.707 | 596.475 | 92.27106
t607452 | Cymbidium hybrid cultivar | K54E | 16597 | 3291.995 | 329.3 | 60.58185
t607197 | Cymbidium hybrid cultivar | G84C | 19663.7 | 884.8279 | 459.3 | 27.12354
t606996 | Cymbidium hybrid cultivar | T170A | 39067.8 | 2354.185 | 1157.575 | 79.54543
t607043 | Cymbidium hybrid cultivar | N45T | 28747.38 | 934.8057 | 679.875 | 21.52431
t607254 | Cymbidium hybrid cultivar | G262T | 11775.15 | 1040.26 | 246.425 | 18.13861
t607478 | Cymbidium hybrid cultivar | N11R | 15894.55 | 2710.618 | 324.075 | 39.31619
t607132 | Cymbidium hybrid cultivar | D208A | 24881.38 | 1936.584 | 765.75 | 60.52451
t607109 | Cymbidium hybrid cultivar | D61R | 25634.95 | 2078.231 | 613.375 | 100.6135
t607155 | Cymbidium hybrid cultivar | C176D | 22283.45 | 3073.579 | 526.325 | 67.26383
t606956 | Cymbidium hybrid cultivar | L83M | 23728.73 | 2145.982 | 656.075 | 79.72258
t606906 | Cymbidium hybrid cultivar | E28A | 33411.15 | 939.6742 | 850.55 | 34.71894
t607195 | Cymbidium hybrid cultivar | T111R | 17676.68 | 1327.959 | 403.725 | 50.31689
t607449 | Cymbidium hybrid cultivar | E203P | 16032.95 | 2469.127 | 317.75 | 28.30789
t607256 | Cymbidium hybrid cultivar | G373A | 12090.73 | 566.8141 | 281.125 | 10.12073
t607349 | Cymbidium hybrid cultivar | K282D | 18699.3 | 3173.399 | 354.175 | 88.50621
t606960 | Cymbidium hybrid cultivar | N78R | 25400.9 | 667.3651 | 660.875 | 37.90764
t607601 | Cymbidium hybrid cultivar | K121V | 18381.8 | 4015.223 | 314.125 | 78.4602
t607021 | Cymbidium hybrid cultivar | H69Y | 24261.93 | 2128.687 | 571.075 | 61.0977
t606874 | Cymbidium hybrid cultivar | D208S | 28092.05 | 2956.394 | 886.05 | 199.1175
t607320 | Cymbidium hybrid cultivar | T111E | 16500.03 | 4257.055 | 303.275 | 77.26765
t607317 | Cymbidium hybrid cultivar | C82E | 17852.45 | 4825.826 | 325.225 | 76.27321
t607224 | Cymbidium hybrid cultivar | Y142C | 19142.53 | 1455.367 | 540.25 | 54.21098
t606912 | Cymbidium hybrid cultivar | D14P | 31141.75 | 1149.49 | 769.2 | 65.33912
t607602 | Cymbidium hybrid cultivar | R100E | 17901.3 | 2319.379 | 328.5 | 67.41187
t606905 | Cymbidium hybrid cultivar | V388A | 29890.83 | 2522.115 | 747.45 | 83.27435
t607156 | Cymbidium hybrid cultivar | H269S | 21143.2 | 1008.997 | 527.5 | 30.55083
t607474 | Cymbidium hybrid cultivar | G262A | 15068.18 | 1372.284 | 326.075 | 28.24894
t607482 | Cymbidium hybrid cultivar | R24K | 14858.45 | 2175.705 | 313.6 | 58.93963
t606854 | Cymbidium hybrid cultivar | L144I | 35182.43 | 2487.935 | 921.625 | 54.02563
t607032 | Cymbidium hybrid cultivar | K355S | 24649.8 | 1675.56 | 544.4 | 16.12203
t606830 | Cymbidium hybrid cultivar | M135V | 34299.15 | 4732.074 | 977.675 | 108.3528
t606961 | Cymbidium hybrid cultivar | V43M | 25357.95 | 678.9486 | 646.975 | 24.3909
t606868 | Cymbidium hybrid cultivar | L248I | 35144.45 | 1993.999 | 945.425 | 66.1703
t607083 | Cymbidium hybrid cultivar | G84S | 21707.73 | 2407.367 | 537.85 | 68.42819
t606958 | Cymbidium hybrid cultivar | C82N | 22576.8 | 2610.997 | 564.75 | 61.60793
t607273 | Cymbidium hybrid cultivar | V341P | 10381.15 | 857.4199 | 209.325 | 25.9962
t607241 | Cymbidium hybrid cultivar | V388T | 17754.23 | 1010.578 | 444.275 | 22.37668
t606857 | Cymbidium hybrid cultivar | A195V | 36686.03 | 2848.052 | 937.825 | 116.369
t606901 | Cymbidium hybrid cultivar | D208C | 30154 | 1882.794 | 883.325 | 84.4674
t607015 | Cymbidium hybrid cultivar | G84T | 23550.9 | 1835.311 | 559.725 | 25.30183
t607586 | Cymbidium hybrid cultivar | I326R | 13845.4 | 1980.772 | 297.075 | 55.00051
t607122 | Cymbidium hybrid cultivar | F374L | 20425.3 | 2531.711 | 488.175 | 62.87863
t606882 | Cymbidium hybrid cultivar | L222I | 27445.5 | 1536.232 | 692.15 | 54.08453
t607146 | Cymbidium hybrid cultivar | G262T | 18830.75 | 1157.57 | 440.025 | 51.65213
t607585 | Cymbidium hybrid cultivar | L83I | 13348.68 | 1378.214 | 274.425 | 30.01082
t606828 | Cymbidium hybrid cultivar | A107M | 32613.03 | 1865.271 | 860.575 | 52.56104
t607081 | Cymbidium hybrid cultivar | I99G | 20827 | 2377.691 | 521.525 | 35.96761
t606887 | Cymbidium hybrid cultivar | Q274G | 27128.45 | 1010.786 | 675.45 | 25.36697
t606891 | Cymbidium hybrid cultivar | K356Q | 26852.25 | 420.5503 | 655.325 | 41.67072
t607160 | Cymbidium hybrid cultivar | V341A | 19361.23 | 1205.165 | 469.725 | 29.27984
t607194 | Cymbidium hybrid cultivar | F86Y | 17323.08 | 6000.996 | 483.975 | 161.5325
t607377 | Cymbidium hybrid cultivar | S339W | 0 | 0 | 46 | 22.64553
t607265 | Cymbidium hybrid cultivar | E28P F40Y N51D K54E N80H C82E L83I G84C F86Y D88A N89P N92D F97M I99R T111Q R123K L137M I147L N151P C176D S180N K182P Q206L H233N S237F A252E S260G G262A G281E T289K D367E | 5858.45 | 955.8912 | 165.625 | 31.40938
t607000 | Cymbidium hybrid cultivar | G373A F374L | 16989.08 | 11567.45 | 700.525 | 192.4804
t607245 | Cymbidium hybrid cultivar | E28P G84C N89P T111R N151P C176D Q206L S237F A252E G262T G281E | 8507.025 | 5710.02 | 210.4 | 141.3894
t607318 | Cymbidium hybrid cultivar | E28P F40Y N51D K54E N80H C82E L83I G84C F86Y D88A N89P N92D F97M I99R T111E R123K L137M I147L N151P C176D S180N K182P Q206L H233N S237F A252E S260G G262A G281E T289K D367E | 6864.125 | 789.6743 | 243.25 | 102.741
t607435 | Cymbidium hybrid cultivar | E28P N51D K54E N80H C82E L83I G84C F86Y D88A N89P N92D F97M I99R T111E R123K L137M I147L N151P C176D S180N K182P Q206L H233N S237F A252E S260G G262A G281E T289K D367E | 4888.425 | 3390.097 | 163.75 | 117.0815
t607337 | Cymbidium hybrid cultivar | E28P N51E N80H C82E L83I G84C F86Y D88A N89P N92D F97M I99R T111R R123K L137M I147L N151P C176D S180N K182P Q206L H233N S237F A252E S260G G262A G281E T289K D367E | 5250.525 | 764.6064 | 135.05 | 26.46186
t607316 | Cymbidium hybrid cultivar | W301Y P303V P305N R308P A309T | 5549.375 | 462.0676 | 169 | 15.9066
t607124 | Cymbidium hybrid cultivar | I255M A266Y W301Y P303V P305N R308P A309T S339W V341P G373A F374L | 0 | 0 | 0 | 0
t607280 | Cymbidium hybrid cultivar | A266Y P305N S339W V341P G373A F374L | 0 | 0 | 0 | 0
t607290 | Cymbidium hybrid cultivar | A266Y S339W | 0 | 0 | 0 | 0
t607381 | Cymbidium hybrid cultivar | A266Y P305N S339W | 0 | 0 | 0 | 0
















TABLE 10B

Results of Secondary Screen of Protein Engineering Library Using C. sativa, Corchorus, and Cymbidium Templates (supplemented with sodium hexanoate)

Strain | Wild-type template used | Amino Acid mutations from wild-type | Average Olivetol Normalized (per OD) | Standard Deviation Olivetol Normalized (per OD) | Average Olivetol Normalized to t606797_Corchorus OLS | Standard Deviation Olivetol Normalized to t606797_Corchorus OLS
t527338 GFP | | | 0.001903 | 0.006512
t606794 | Cannabis sativa (Hemp) (Marijuana) | wild-type | 1.166072 | 0.265739
t527340 Cannabis OLS | Cannabis sativa | wild-type | 1.02159 | 0.125216
t607067 | Cannabis sativa | F367L | 0.761216 | 0.189784
t607367 | Cannabis sativa | G366A | 0.745615 | 0.241737
t607391 | Cannabis sativa | P298N | 0.708056 | 0.110021
t606801 | Cannabis sativa | S334P | 0.674894 | 0.108203
t606984 | Cannabis sativa | I248M | 0.00995 | 0.006837
t606899 | Cannabis sativa | S332W | 0.001802 | 0.003605
t606797 Corchorus OLS | Corchorus olitorius | wild-type | 0.027033 | 0.012052 | 1 | 0.520764
t606807 | Corchorus olitorius | d1-8 Y142C | 0.275531 | 0.38966 | 29.9306 | 42.32826
t607179 | Corchorus olitorius | Y301W V302I V303T N305P P308K T309A | 0.103413 | 0.091496 | 2.842431 | 2.514883
t607149 | Corchorus olitorius | d1-8 W339S | 0.075566 | 0.133268 | 2.077017 | 3.663029
t607139 | Corchorus olitorius | d1-8 Y266F | 0.038008 | 0.033998 | 1.044701 | 0.934465
t607112 | Corchorus olitorius | W339S | 0.033504 | 0.050586 | 1.29632 | 1.957247
t607332 | Corchorus olitorius | d1-8 Y301W V302I V303T N305P P308K T309A | 0.027999 | 0.00441 | 0.919863 | 0.144887
t607153 | Corchorus olitorius | d1-8 A373G | 0.025872 | 0.001554 | 0.711133 | 0.042715
t607158 | Corchorus olitorius | A373G | 0.0221 | 0.001298 | 0.607436 | 0.035688
t607236 | Corchorus olitorius | M255I | 0.019521 | 0.001797 | 0.701647 | 0.064597
t607141 | Corchorus olitorius | d1-8 Y266F W339S | 0.012974 | 0.025949 | 0.356615 | 0.71323
t607176 | Corchorus olitorius | L374F | 0.010369 | 0.012076 | 0.285013 | 0.33193
t606930 | Corchorus olitorius | d1-8 M255I | 0.009039 | 0.010752 | 0.948726 | 1.128513
t607193 | Corchorus olitorius | d1-8 T12Y F39Y Q42R L43A Q47E Q51D Q57K I77L G79E S84C E96D T100E L121K N123K A135V A137M T139G H143Q N146K K151R H152P K156R F158M S174A V182R D183G S184A N231T K232N I241V T253D C260G M287E M353R Q357E S395N | 0.005613 | 0.011226 | 0.154285 | 0.30857
t607006 | Corchorus olitorius | Y266F | 0.004498 | 0.003084 | 0.58364 | 0.400247
t606993 | Corchorus olitorius | d1-8 N305P | 0.003933 | 0.004545 | 0.510385 | 0.58981
t606852 | Corchorus olitorius | N305P | 0.001714 | 0.003429 | 0.186231 | 0.372463
t607119 | Corchorus olitorius | d1-8 L374F | 0 | 0 | 0 | 0
t607371 | Corchorus olitorius | Y266F W339S | 0 | 0 | 0 | 0
t527346 Cymbidium OLS | Cymbidium hybrid cultivar | wild-type | 0.90183 | 0.165441
t606952 | Cymbidium hybrid cultivar | V71Y | 0.967928 | 0.215533
t607284 | Cymbidium hybrid cultivar | F70M | 0.949145 | 0.075505
t607262 | Cymbidium hybrid cultivar | L385M | 0.890722 | 0.082131
t606938 | Cymbidium hybrid cultivar | D88A | 0.871 | 0.251207
t607260 | Cymbidium hybrid cultivar | E285A | 0.806964 | 0.103286
t607159 | Cymbidium hybrid cultivar | L76I | 0.760304 | 0.251978
t606946 | Cymbidium hybrid cultivar | N151P | 0.754797 | 0.320899
t606861 | Cymbidium hybrid cultivar | E203K | 0.745482 | 0.268545
t606918 | Cymbidium hybrid cultivar | V50N | 0.734491 | 0.069332
t607135 | Cymbidium hybrid cultivar | E28P | 0.731258 | 0.315887
t607286 | Cymbidium hybrid cultivar | S34Q | 0.716576 | 0.071466
t606942 | Cymbidium hybrid cultivar | R100P | 0.712487 | 0.038444
t606959 | Cymbidium hybrid cultivar | A219C | 0.709144 | 0.063379
t607294 | Cymbidium hybrid cultivar | K359M | 0.707812 | 0.023224
t607282 | Cymbidium hybrid cultivar | R100T | 0.703105 | 0.073566
t607230 | Cymbidium hybrid cultivar | E116D | 0.694814 | 0.046965
t606965 | Cymbidium hybrid cultivar | Y142V | 0.690425 | 0.129346
t607288 | Cymbidium hybrid cultivar | T289D | 0.68465 | 0.060985
t607228 | Cymbidium hybrid cultivar | M135I | 0.682179 | 0.118232
t606909 | Cymbidium hybrid cultivar | W368H | 0.676488 | 0.062146
t606962 | Cymbidium hybrid cultivar | D229E | 0.675087 | 0.074269
t607150 | Cymbidium hybrid cultivar | E285K | 0.664425 | 0.076686
t607361 | Cymbidium hybrid cultivar | E323Q | 0.663279 | 0.09616
t606932 | Cymbidium hybrid cultivar | S18T | 0.660218 | 0.135435
t606940 | Cymbidium hybrid cultivar | A13S | 0.657782 | 0.010521
t607269 | Cymbidium hybrid cultivar | A333R | 0.652497 | 0.051357
t607186 | Cymbidium hybrid cultivar | S180N | 0.649029 | 0.165224
t607476 | Cymbidium hybrid cultivar | L20F | 0.647873 | 0.057907
t607031 | Cymbidium hybrid cultivar | N80H | 0.629308 | 0.25658
t606916 | Cymbidium hybrid cultivar | R100A | 0.628769 | 0.063422
t607292 | Cymbidium hybrid cultivar | V331I | 0.628464 | 0.157777
t606908 | Cymbidium hybrid cultivar | I22M | 0.624153 | 0.422076
t607248 | Cymbidium hybrid cultivar | E155C | 0.622639 | 0.035079
t607023 | Cymbidium hybrid cultivar | V71H | 0.620539 | 0.024017
t606936 | Cymbidium hybrid cultivar | T111K | 0.620218 | 0.070587
t607433 | Cymbidium hybrid cultivar | L291V | 0.61928 | 0.21641
t607600 | Cymbidium hybrid cultivar | R123A | 0.617083 | 0.084619
t606894 | Cymbidium hybrid cultivar | Y142F | 0.617067 | 0.042344
t606963 | Cymbidium hybrid cultivar | I147E | 0.616751 | 0.065014
t607603 | Cymbidium hybrid cultivar | Y142I | 0.615776 | 0.03639
t607452 | Cymbidium hybrid cultivar | K54E | 0.609431 | 0.073608
t607197 | Cymbidium hybrid cultivar | G84C | 0.607758 | 0.04262
t606996 | Cymbidium hybrid cultivar | T170A | 0.606281 | 0.062689
t607043 | Cymbidium hybrid cultivar | N45T | 0.601817 | 0.034499
t607254 | Cymbidium hybrid cultivar | G262T | 0.599911 | 0.07071
t607478 | Cymbidium hybrid cultivar | N11R | 0.593189 | 0.06751
t607132 | Cymbidium hybrid cultivar | D208A | 0.592492 | 0.112514
t607109 | Cymbidium hybrid cultivar | D61R | 0.591253 | 0.021319
t607155 | Cymbidium hybrid cultivar | C176D | 0.589853 | 0.117858
t606956 | Cymbidium hybrid cultivar | L83M | 0.588247 | 0.063594
t606906 | Cymbidium hybrid cultivar | E28A | 0.588171 | 0.068304
t607195 | Cymbidium hybrid cultivar | T111R | 0.587434 | 0.080522
t607449 | Cymbidium hybrid cultivar | E203P | 0.583681 | 0.065069
t607256 | Cymbidium hybrid cultivar | G373A | 0.582906 | 0.069959
t607349 | Cymbidium hybrid cultivar | K282D | 0.581651 | 0.053625
t606960 | Cymbidium hybrid cultivar | N78R | 0.571233 | 0.098482
t607601 | Cymbidium hybrid cultivar | K121V | 0.569034 | 0.179725
t607021 | Cymbidium hybrid cultivar | H69Y | 0.563625 | 0.026338
t606874 | Cymbidium hybrid cultivar | D208S | 0.562002 | 0.072513
t607320 | Cymbidium hybrid cultivar | T111E | 0.561861 | 0.128738
t607317 | Cymbidium hybrid cultivar | C82E | 0.559312 | 0.140085
t607224 | Cymbidium hybrid cultivar | Y142C | 0.557739 | 0.033377
t606912 | Cymbidium hybrid cultivar | D14P | 0.557253 | 0.024918
t607602 | Cymbidium hybrid cultivar | R100E | 0.553566 | 0.029107
t606905 | Cymbidium hybrid cultivar | V388A | 0.552755 | 0.038488
t607156 | Cymbidium hybrid cultivar | H269S | 0.552698 | 0.060754
t607474 | Cymbidium hybrid cultivar | G262A | 0.55226 | 0.080327
t607482 | Cymbidium hybrid cultivar | R24K | 0.551229 | 0.059821
t606854 | Cymbidium hybrid cultivar | L144I | 0.550953 | 0.054635
t607032 | Cymbidium hybrid cultivar | K355S | 0.543176 | 0.043908
t606830 | Cymbidium hybrid cultivar | M135V | 0.542887 | 0.080821
t606961 | Cymbidium hybrid cultivar | V43M | 0.535062 | 0.061363
t606868 | Cymbidium hybrid cultivar | L248I | 0.532705 | 0.006774
t607083 | Cymbidium hybrid cultivar | G84S | 0.529811 | 0.024702
t606958 | Cymbidium hybrid cultivar | C82N | 0.525334 | 0.152553
t607273 | Cymbidium hybrid cultivar | V341P | 0.522317 | 0.046732
t607241 | Cymbidium hybrid cultivar | V388T | 0.519025 | 0.016822
t606857 | Cymbidium hybrid cultivar | A195V | 0.518645 | 0.087321
t606901 | Cymbidium hybrid cultivar | D208C | 0.51686 | 0.063709
t607015 | Cymbidium hybrid cultivar | G84T | 0.515084 | 0.042136
t607586 | Cymbidium hybrid cultivar | I326R | 0.514156 | 0.080245
t607122 | Cymbidium hybrid cultivar | F374L | 0.513354 | 0.048872
t606882 | Cymbidium hybrid cultivar | L222I | 0.510184 | 0.011174
t607146 | Cymbidium hybrid cultivar | G262T | 0.506767 | 0.057007
t607585 | Cymbidium hybrid cultivar | L83I | 0.506322 | 0.049736
t606828 | Cymbidium hybrid cultivar | A107M | 0.505604 | 0.006906
t607081 | Cymbidium hybrid cultivar | I99G | 0.504811 | 0.041631
t606887 | Cymbidium hybrid cultivar | Q274G | 0.50406 | 0.082289
t606891 | Cymbidium hybrid cultivar | K356Q | 0.501544 | 0.043785
t607160 | Cymbidium hybrid cultivar | V341A | 0.501469 | 0.056063
t607194 | Cymbidium hybrid cultivar | F86Y | 0.500062 | 0.184291
t607377 | Cymbidium hybrid cultivar | S339W | 0 | 0
t607265 | Cymbidium hybrid cultivar | E28P F40Y N51D K54E N80H C82E L83I G84C F86Y D88A N89P N92D F97M I99R T111Q R123K L137M I147L N151P C176D S180N K182P Q206L H233N S237F A252E S260G G262A G281E T289K D367E | 0.298621 | 0.029421
t607000 | Cymbidium hybrid cultivar | G373A F374L | 0.279083 | 0.189883
t607245 | Cymbidium hybrid cultivar | E28P G84C N89P T111R N151P C176D Q206L S237F A252E G262T G281E | 0.246285 | 0.179183
t607318 | Cymbidium hybrid cultivar | E28P F40Y N51D K54E N80H C82E L83I G84C F86Y D88A N89P N92D F97M I99R T111E R123K L137M I147L N151P C176D S180N K182P Q206L H233N S237F A252E S260G G262A G281E T289K D367E | 0.205093 | 0.035905
t607435 | Cymbidium hybrid cultivar | E28P N51D K54E N80H C82E L83I G84C F86Y D88A N89P N92D F97M I99R T111E R123K L137M I147L N151P C176D S180N K182P Q206L H233N S237F A252E S260G G262A G281E T289K D367E | 0.181914 | 0.136726
t607337 | Cymbidium hybrid cultivar | E28P N51E N80H C82E L83I G84C F86Y D88A N89P N92D F97M I99R T111R R123K L137M I147L N151P C176D S180N K182P Q206L H233N S237F A252E S260G G262A G281E T289K D367E | 0.163476 | 0.021163
t607316 | Cymbidium hybrid cultivar | W301Y P303V P305N R308P A309T | 0.161295 | 0.013439
t607124 | Cymbidium hybrid cultivar | I255M A266Y W301Y P303V P305N R308P A309T S339W V341P G373A F374L | 0 | 0
t607280 | Cymbidium hybrid cultivar | A266Y P305N S339W V341P G373A F374L | 0 | 0
t607290 | Cymbidium hybrid cultivar | A266Y S339W | 0 | 0
t607381 | Cymbidium hybrid cultivar | A266Y P305N S339W | 0 | 0

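Table 10B lists each Cymbidium-template variant's mean olivetol per OD alongside the t527346 wild-type control but does not express the variants as fold changes. A reader who wants that ranking could use a minimal Python sketch like the one below. It is not the authors' analysis: the names `Measurement`, `fold_change`, and `rank_by_fold_change` are hypothetical. The standard deviation of each ratio uses a first-order (delta-method) approximation that assumes the variant and reference measurements are independent, which the source does not state. Simply dividing the Corchorus per-OD means by the t606797 mean does not reproduce the published "Normalized to t606797_Corchorus OLS" values, so this sketch should not be read as the method behind that column.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """Mean and standard deviation for one strain, copied from Table 10B."""
    mean: float
    sd: float


def fold_change(variant: Measurement, reference: Measurement) -> Measurement:
    """Return variant/reference with a first-order (delta-method) SD.

    Assumes independent errors: var(a/b) ~ (sa/b)^2 + (a*sb/b^2)^2.
    This form stays defined when the variant mean is zero.
    """
    if reference.mean == 0:
        raise ValueError("reference mean must be non-zero")
    ratio = variant.mean / reference.mean
    sd = math.hypot(variant.sd / reference.mean,
                    variant.mean * reference.sd / reference.mean ** 2)
    return Measurement(ratio, sd)


def rank_by_fold_change(variants, reference):
    """Sort (name, fold change) pairs from highest to lowest mean ratio."""
    results = {name: fold_change(m, reference) for name, m in variants.items()}
    return sorted(results.items(), key=lambda kv: kv[1].mean, reverse=True)


# Olivetol per OD (mean, SD) for a few Cymbidium-template strains in Table 10B.
WILD_TYPE = Measurement(0.90183, 0.165441)  # t527346 Cymbidium OLS
VARIANTS = {
    "t606952 V71Y": Measurement(0.967928, 0.215533),
    "t607284 F70M": Measurement(0.949145, 0.075505),
    "t607262 L385M": Measurement(0.890722, 0.082131),
    "t607377 S339W": Measurement(0.0, 0.0),
}

if __name__ == "__main__":
    for name, fc in rank_by_fold_change(VARIANTS, WILD_TYPE):
        print(f"{name}: {fc.mean:.3f} +/- {fc.sd:.3f}")
```

With these four rows, V71Y ranks first at roughly 1.07-fold over the wild-type control, and the inactive S339W variant ranks last at zero. Given the large standard deviations in Table 10B, none of these small differences should be read as significant without the replicate data.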















TABLE 11A

Results of Secondary Screen of Protein Engineering Library Using C. sativa, Corchorus, and Cymbidium Templates (not supplemented with sodium hexanoate)

Strain | Wild-type template used | Amino Acid mutations from wild-type | Average Olivetol [ug/L] | Standard Deviation Olivetol [ug/L] | Average Olivetolic Acid [ug/L] | Standard Deviation Olivetolic Acid [ug/L]
t527338 GFP | | | 0.620833 | 3.04145 | 0 | 0
t527340 Cannabis OLS | Cannabis sativa | wild-type | 127.8438 | 14.41426 | 0 | 0
t606801 | Cannabis sativa (Hemp) (Marijuana) | S334P | 220.05 | 20.91355 | 0 | 0
t607067 | Cannabis sativa (Hemp) (Marijuana) | F367L | 160.1 | 15.69777 | 0 | 0
t607367 | Cannabis sativa (Hemp) (Marijuana) | G366A | 142.075 | 3.482695 | 0 | 0
t606794 | Cannabis sativa (Hemp) (Marijuana) | wild-type | 150.8156 | 12.9258 | 0 | 0
t607391 | Cannabis sativa (Hemp) (Marijuana) | P298N | 82.95 | 5.945587 | 0 | 0
t606984 | Cannabis sativa (Hemp) (Marijuana) | I248M | 14.025 | 0.745542 | 0 | 0
t606899 | Cannabis sativa (Hemp) (Marijuana) | S332W | 0 | 0 | 0 | 0
t606797 Corchorus OLS | Corchorus olitorius | wild-type | 39.93571 | 6.066497 | 128.175 | 20.00325
t606807 | Corchorus olitorius | d1-8 Y142C | 170.75 | 207.3944 | 44.3 | 26.72864
t606930 | Corchorus olitorius | d1-8 M255I | 43.925 | 4.811358 | 126.175 | 8.854519
t607179 | Corchorus olitorius | Y301W V302I V303T N305P P308K T309A | 40.725 | 8.259288 | 111.775 | 6.540833
t607332 | Corchorus olitorius | d1-8 Y301W V302I V303T N305P P308K T309A | 43.375 | 4.289814 | 143.675 | 17.99711
t607236 | Corchorus olitorius | M255I | 38.825 | 0.869387 | 101.25 | 4.184495
t607006 | Corchorus olitorius | Y266F | 25.425 | 1.011187 | 84.475 | 1.575595
t606993 | Corchorus olitorius | d1-8 N305P | 24.35 | 1.053565 | 77.375 | 1.817278
t607139 | Corchorus olitorius | d1-8 Y266F | 21.525 | 2.692428 | 62.375 | 2.57083
t607158 | Corchorus olitorius | A373G | 17.725 | 1.705628 | 46.75 | 5.257059
t607153 | Corchorus olitorius | d1-8 A373G | 16.875 | 1.327592 | 51.575 | 2.379601
t606852 | Corchorus olitorius | N305P | 25.675 | 1.613227 | 75.975 | 4.164433
t607112 | Corchorus olitorius | W339S | 0 | 0 | 0 | 0
t607119 | Corchorus olitorius | d1-8 L374F | 0 | 0 | 20.725 | 0.780491
t607141 | Corchorus olitorius | d1-8 Y266F W339S | 0 | 0 | 0 | 0
t607149 | Corchorus olitorius | d1-8 W339S | 0 | 0 | 0 | 0
t607176 | Corchorus olitorius | L374F | 0 | 0 | 17.975 | 0.684957
t607193 | Corchorus olitorius | d1-8 T12Y F39Y Q42R L43A Q47E Q51D Q57K I77L G79E S84C E96D T100E L121K N123K A135V A137M T139G H143Q N146K K151R H152P K156R F158M S174A V182R D183G S184A N231T K232N I241V T253D C260G M287E M353R Q357E S395N | 0 | 0 | 0 | 0
t607371 | Corchorus olitorius | Y266F W339S | 0 | 0 | 0 | 0
t527346 Cymbidium OLS | Cymbidium hybrid cultivar | wild-type | 406.4417 | 80.3967 | 3.658333 | 1.293141
t607221 | Cymbidium hybrid cultivar | N92D | 509.9 | 0 | 6.5 | 0.424264
t607228 | Cymbidium hybrid cultivar | M135I | 524.425 | 26.05281 | 6.825 | 0.861684
t606878 | Cymbidium hybrid cultivar | P303A | 632.95 | 28.30365 | 8.125 | 5.596055
t606986 | Cymbidium hybrid cultivar | E323G | 551.95 | 42.85468 | 8.6 | 1.411855
t606999 | Cymbidium hybrid cultivar | N151P | 508.3 | 23.17873 | 8.4 | 0.365148
t607224 | Cymbidium hybrid cultivar | Y142C | 493.525 | 14.39546 | 7.725 | 0.556028
t606976 | Cymbidium hybrid cultivar | Q274K | 541.475 | 26.72569 | 6.525 | 4.353064
t607241 | Cymbidium hybrid cultivar | V388T | 455.325 | 14.62244 | 5.75 | 0.420317
t607603 | Cymbidium hybrid cultivar | Y142I | 540.975 | 14.56237 | 11.475 | 0.853913
t607222 | Cymbidium hybrid cultivar | N11K | 479.35 | 10.11163 | 6.3 | 0.424264
t607014 | Cymbidium hybrid cultivar | A287M | 511.8 | 26.34755 | 7.975 | 1.388944
t606994 | Cymbidium hybrid cultivar | T289A | 496.475 | 22.44614 | 8.225 | 0.57373
t606982 | Cymbidium hybrid cultivar | V314I | 546.875 | 39.64899 | 8.9 | 1.177568
t606995 | Cymbidium hybrid cultivar | I147L | 480.9 | 23.94006 | 7.225 | 0.7932
t607007 | Cymbidium hybrid cultivar | G281E | 489.6 | 12.27436 | 8.15 | 0.544671
t607008 | Cymbidium hybrid cultivar | I390Q | 491.625 | 18.42107 | 6.575 | 0.125831
t606965 | Cymbidium hybrid cultivar | Y142V | 570.85 | 25.65599 | 10.1 | 0.547723
t607107 | Cymbidium hybrid cultivar | A79E | 431.575 | 22.92108 | 4.575 | 0.877021
t607194 | Cymbidium hybrid cultivar | F86Y | 437.325 | 4.76891 | 4.875 | 0.727438
t606981 | Cymbidium hybrid cultivar | E96R | 512.85 | 23.07993 | 3.825 | 4.439501
t606979 | Cymbidium hybrid cultivar | E32R | 492 | 40.898 | 6.2 | 1.249
t606975 | Cymbidium hybrid cultivar | K182P | 532.825 | 6.97824 | 7.75 | 0.3
t607230 | Cymbidium hybrid cultivar | E116D | 461.35 | 10.56362 | 5.175 | 0.499166
t607004 | Cymbidium hybrid cultivar | L291Y | 487.475 | 6.717825 | 8.25 | 0.619139
t606996 | Cymbidium hybrid cultivar | T170A | 485.075 | 15.50965 | 8 | 0.559762
t607046 | Cymbidium hybrid cultivar | K359R | 527.9 | 18.45842 | 7.775 | 0.556028
t607043 | Cymbidium hybrid cultivar | N45T | 524.225 | 15.59837 | 6.175 | 1.040433
t607021 | Cymbidium hybrid cultivar | H69Y | 530.025 | 26.2045 | 6.95 | 0.635085
t607109 | Cymbidium hybrid cultivar | D61R | 446.85 | 18.05076 | 5.275 | 0.873212
t607036 | Cymbidium hybrid cultivar | K321E | 521.625 | 5.105797 | 8.05 | 0.331662
t606912 | Cymbidium hybrid cultivar | D14P | 538.775 | 35.94546 | 8.975 | 0.910586
t607602 | Cymbidium hybrid cultivar | R100E | 465.225 | 21.39554 | 5.75 | 0.83865
t607361 | Cymbidium hybrid cultivar | E323Q | 453.35 | 6.413787 | 4.775 | 0.411299
t606882 | Cymbidium hybrid cultivar | L222I | 501.55 | 28.07757 | 7.6 | 0.559762
t606962 | Cymbidium hybrid cultivar | D229E | 523.325 | 17.80587 | 7.225 | 0.330404
t607150 | Cymbidium hybrid cultivar | E285K | 418.225 | 12.1393 | 4.225 | 0.330404
t607252 | Cymbidium hybrid cultivar | F40Y | 435.9 | 14.81486 | 5.275 | 0.713559
t607225 | Cymbidium hybrid cultivar | S260G | 418.925 | 13.55221 | 4.175 | 0.170783
t607032 | Cymbidium hybrid cultivar | K355S | 500.8 | 16.68772 | 6.75 | 1.767767
t607248 | Cymbidium hybrid cultivar | E155C | 405.15 | 11.08708 | 3.3 | 1.036018
t607155 | Cymbidium hybrid cultivar | C176D | 423 | 7.037045 | 4.475 | 0.095743
t606958 | Cymbidium hybrid cultivar | C82N | 536.2 | 25.07761 | 7.575 | 0.634429
t607023 | Cymbidium hybrid cultivar | V71H | 511.725 | 10.15755 | 7 | 0.216025
t607027 | Cymbidium hybrid cultivar | T289K | 520.675 | 13.61503 | 8.1 | 0.535413
t606892 | Cymbidium hybrid cultivar | Y142T | 572.6 | 39.21003 | 10.9 | 0.702377
t607035 | Cymbidium hybrid cultivar | Y160G | 513.55 | 23.95308 | 11.825 | 0.93586
t607237 | Cymbidium hybrid cultivar | T289S | 424.05 | 16.8777 | 4.45 | 0.556776
t607189 | Cymbidium hybrid cultivar | N51E | 420 | 11.2116 | 4.55 | 0.619139
t607118 | Cymbidium hybrid cultivar | T289Q | 404.3 | 9.899495 | 4.25 | 0.070711
t607018 | Cymbidium hybrid cultivar | I390N | 493.25 | 17.93813 | 7.025 | 1.297112
t607045 | Cymbidium hybrid cultivar | F30C | 522.175 | 29.45996 | 8 | 0.930949
t606888 | Cymbidium hybrid cultivar | A107L | 521.125 | 20.05731 | 6.85 | 0.404145
t607220 | Cymbidium hybrid cultivar | A13T | 432.725 | 9.603602 | 4.825 | 0.206155
t606830 | Cymbidium hybrid cultivar | M135V | 568.425 | 18.5615 | 7.6 | 0.432049
t606832 | Cymbidium hybrid cultivar | E155Q | 552.975 | 20.81688 | 7.225 | 0.741058
t607601 | Cymbidium hybrid cultivar | K121V | 429.25 | 19.20772 | 5.025 | 0.15
t606857 | Cymbidium hybrid cultivar | A195V | 607.575 | 22.90988 | 8.125 | 0.567891
t607452 | Cymbidium hybrid cultivar | K54E | 436.75 | 12.72962 | 5.325 | 0.623832
t607218 | Cymbidium hybrid cultivar | F97M | 451.85 | 12.7291 | 5 | 0.787401
t607186 | Cymbidium hybrid cultivar | S180N | 435.525 | 20.91274 | 4.8 | 0.535413
t607123 | Cymbidium hybrid cultivar | S237F | 421.475 | 3.484609 | 4.875 | 0.386221
t607286 | Cymbidium hybrid cultivar | S34Q | 452.15 | 17.4135 | 5.35 | 0.412311
t606918 | Cymbidium hybrid cultivar | V50N | 506.275 | 26.08453 | 6.5 | 0.752773
t606916 | Cymbidium hybrid cultivar | R100A | 509.775 | 27.51986 | 7.05 | 0.988264
t606990 | Cymbidium hybrid cultivar | L291W | 457.875 | 21.00831 | 6.925 | 1.001249
t606908 | Cymbidium hybrid cultivar | I22M | 523.45 | 43.08074 | 4.35 | 5.05536
t606963 | Cymbidium hybrid cultivar | I147E | 502.075 | 20.19032 | 6.4 | 0.752773
t607226 | Cymbidium hybrid cultivar | Q115D | 435.7 | 13.37934 | 5.125 | 0.359398
t606961 | Cymbidium hybrid cultivar | V43M | 537.45 | 13.96675 | 6.8 | 0.355903
t607260 | Cymbidium hybrid cultivar | E285A | 454.15 | 7.783101 | 5.2 | 0.244949
t607160 | Cymbidium hybrid cultivar | V341A | 405.65 | 13.91893 | 4.4 | 0.541603
t607156 | Cymbidium hybrid cultivar | H269S | 415.35 | 17.57546 | 3.925 | 0.464579
t607478 | Cymbidium hybrid cultivar | N11R | 448.1 | 9.988994 | 5.05 | 0.772442
t606887 | Cymbidium hybrid cultivar | Q274G | 481.45 | 22.29716 | 5.7 | 3.81663
t606861 | Cymbidium hybrid cultivar | E203K | 594.35 | 17.56407 | 8.25 | 0.82664
t607217 | Cymbidium hybrid cultivar | D367E | 421.1 | 11.77427 | 4.375 | 0.805709
t606894 | Cymbidium hybrid cultivar | Y142F | 512.875 | 39.92705 | 7.9 | 0.752773
t607288 | Cymbidium hybrid cultivar | T289D | 447.875 | 12.13875 | 4.8 | 0.547723
t606952 | Cymbidium hybrid cultivar | V71Y | 514.9 | 31.88239 | 5.225 | 3.545302
t607197 | Cymbidium hybrid cultivar | G84C | 444.2 | 14.33806 | 4.825 | 0.262996
t607146 | Cymbidium hybrid cultivar | G262T | 399.375 | 11.43864 | 3.925 | 0.394757
t607017 | Cymbidium hybrid cultivar | N78E | 517.475 | 8.940311 | 8.9 | 1.75119
t607456 | Cymbidium hybrid cultivar | T111Q | 466.95 | 34.97175 | 5.375 | 0.512348
t606854 | Cymbidium hybrid cultivar | L144I | 581.275 | 44.29946 | 5.925 | 4.19871
t606838 | Cymbidium hybrid cultivar | R123N | 529.65 | 7.364102 | 6.85 | 0.896289
t607213 | Cymbidium hybrid cultivar | T289G | 416.8 | 8.30542 | 4.375 | 0.531507
t606932 | Cymbidium hybrid cultivar | S18T | 495.975 | 39.19034 | 6.075 | 1.189888
t607349 | Cymbidium hybrid cultivar | K282D | 419.1 | 9.083685 | 4.075 | 0.419325
t607585 | Cymbidium hybrid cultivar | L83I | 450.375 | 20.91226 | 5.175 | 0.287228
t606956 | Cymbidium hybrid cultivar | L83M | 475.55 | 9.733276 | 6.3 | 0.535413
t607586 | Cymbidium hybrid cultivar | I326R | 443.525 | 14.63361 | 5.025 | 0.330404
t607025 | Cymbidium hybrid cultivar | I99T | 473.65 | 23.0925 | 6.4 | 0.408248
t607322 | Cymbidium hybrid cultivar | R123K | 425.975 | 8.165935 | 4.6 | 0.616441
t606905 | Cymbidium hybrid cultivar | V388A | 489.825 | 32.71823 | 6.075 | 4.224827
t606835 | Cymbidium hybrid cultivar | Q161F | 562.35 | 16.93999 | 4.05 | 2.763452
t607104 | Cymbidium hybrid cultivar | E323H | 406.55 | 15.15443 | 4.85 | 0.74162
t607135 | Cymbidium hybrid cultivar | E28P | 392.8 | 21.08159 | 4.05 | 0.789515
t606891 | Cymbidium hybrid cultivar | K356Q | 484.225 | 22.33762 | 7.45 | 0.660808
t607031 | Cymbidium hybrid cultivar | N80H | 508.1 | 14.36825 | 7.05 | 0.613732
t607317 | Cymbidium hybrid cultivar | C82E | 472.45 | 13.27466 | 6.4 | 0.909212
t607088 | Cymbidium hybrid cultivar | P303V | 402.225 | 18.42487 | 3.225 | 0.826136
t607262 | Cymbidium hybrid cultivar | L385M | 467.625 | 36.03085 | 4.5 | 1.048809
t606896 | Cymbidium hybrid cultivar | Q115S | 500.675 | 15.10968 | 6.15 | 4.194043
t607269 | Cymbidium hybrid cultivar | A333R | 455.775 | 3.975236 | 5.15 | 0.597216
t607294 | Cymbidium hybrid cultivar | K359M | 441.65 | 4.773887 | 5.125 | 0.25
t607344 | Cymbidium hybrid cultivar | A252E | 421.125 | 19.75135 | 4.675 | 0.670199
t607159 | Cymbidium hybrid cultivar | L76I | 411.4 | 24.21955 | 4.05 | 0.660808
t606890 | Cymbidium hybrid cultivar | K112R | 519.575 | 20.66985 | 8.55 | 0.675771
t607284 | Cymbidium hybrid cultivar | F70M | 422.125 | 9.036731 | 3.925 | 0.419325
t607292 | Cymbidium hybrid cultivar | V331I | 439.375 | 22.82475 | 4.825 | 0.618466
t607476 | Cymbidium hybrid cultivar | L20F | 438.925 | 17.62978 | 4.075 | 0.464579
t606946 | Cymbidium hybrid cultivar | N151P | 484.525 | 10.50599 | 5.875 | 0.699405
t607320 | Cymbidium hybrid cultivar | T111E | 427.525 | 18.14412 | 5.55 | 1.021437
t607083 | Cymbidium hybrid cultivar | G84S | 379.5 | 14.5952 | 4.175 | 1.284199
t607480 | Cymbidium hybrid cultivar | A13V | 437.5 | 12.94656 | 4.85 | 0.759386
t606909 | Cymbidium hybrid cultivar | W368H | 496.825 | 40.68967 | 4.975 | 3.350995
t607449 | Cymbidium hybrid cultivar | E203P | 433.675 | 17.36613 | 4.35 | 0.331662
t606851 | Cymbidium hybrid cultivar | I293V | 514.65 | 43.13626 | 6.35 | 1.369915
t607079 | Cymbidium hybrid cultivar | S34E | 466.325 | 54.18348 | 7.175 | 1.25
t607282 | Cymbidium hybrid cultivar | R100T | 433.175 | 10.15099 | 4.525 | 0.655108
t606967 | Cymbidium hybrid cultivar | M135A | 446.85 | 27.16143 | 2.8 | 3.237283
t606938 | Cymbidium hybrid cultivar | D88A | 483.9 | 1.131371 | 6.05 | 0.070711
t607433 | Cymbidium hybrid cultivar | L291V | 424.875 | 12.5492 | 4.475 | 0.394757
t607357 | Cymbidium hybrid cultivar | T243A | 419.45 | 12.8108 | 4.5 | 0.294392
t607122 | Cymbidium hybrid cultivar | F374L | 381.9 | 18.88686 | 2.9 | 0.678233
t607110 | Cymbidium hybrid cultivar | R317T | 374.025 | 4.807199 | 3.575 | 0.221736
t607015 | Cymbidium hybrid cultivar | G84T | 514.575 | 14.52111 | 7.575 | 1.367175
t607087 | Cymbidium hybrid cultivar | E96A | 393.975 | 20.30195 | 4.375 | 1.004573
t607019 | Cymbidium hybrid cultivar | I99A | 462.175 | 18.38104 | 6.175 | 0.543906
t606839 | Cymbidium hybrid cultivar | D207S | 557.425 | 29.43245 | 7.425 | 0.818026
t606942 | Cymbidium hybrid cultivar | R100P | 461.85 | 7.707464 | 4.05 | 0.212132
t607164 | Cymbidium hybrid cultivar | V327A | 393.05 | 14.87672 | 3.225 | 0.35
t606906 | Cymbidium hybrid cultivar | E28A | 482 | 14.14214 | 6.85 | 0.494975
t606868 | Cymbidium hybrid cultivar | L248I | 555.5 | 18.09586 | 9 | 0.955685
t607089 | Cymbidium hybrid cultivar | L77I | 380.525 | 6.542871 | 4.125 | 1.170114
t606910 | Cymbidium hybrid cultivar | N89P | 471.825 | 23.09998 | 7.025 | 1.611159
t606936 | Cymbidium hybrid cultivar | T111K | 468.95 | 8.328465 | 5.125 | 0.287228
t606856 | Cymbidium hybrid cultivar | V157T | 551.4 | 18.24573 | 8.525 | 0.78475


t607450
Cymbidium
I99R
406.45
13.84786
3.5
0.637704



hybrid



cultivar


t606960
Cymbidium
N78R
456.45
26.30089
4.225
3.001527



hybrid



cultivar


t607600
Cymbidium
R123A
413.025
13.12945
4.675
0.457347



hybrid



cultivar


t606940
Cymbidium
A13S
481.65
28.77925
6.3
0.424264



hybrid



cultivar


t607085
Cymbidium
K55R
349.1
16.14745
3.125
0.543906



hybrid



cultivar


t607474
Cymbidium
G262A
397.225
2.348581
3.4
0.294392



hybrid



cultivar


t606959
Cymbidium
A219C
411.275
14.18506
2.25
0.544671



hybrid



cultivar


t606859
Cymbidium
I99E
511.05
22.442
7.325
0.543906



hybrid



cultivar


t606904
Cymbidium
S10R
467.15
52.82088
7.25
1.626346



hybrid



cultivar


t607195
Cymbidium
T111R
404.8
17.43885
4.25
0.493288



hybrid



cultivar


t607445
Cymbidium
P305N
341.875
14.41281
2.725
0.55



hybrid



cultivar


t607273
Cymbidium
V341P
395.325
9.541619
3.625
0.655108



hybrid



cultivar


t606834
Cymbidium
V157I
516.05
31.53966
6.625
1.192686



hybrid



cultivar


t607254
Cymbidium
G262T
418.575
14.71991
4.35
0.574456



hybrid



cultivar


t606828
Cymbidium
A107M
503.3
16.92986
5.5
1.191638



hybrid



cultivar


t606836
Cymbidium
K104Q
543.5
39.43129
7.1
1.174734



hybrid



cultivar


t607081
Cymbidium
I99G
347.025
12.30864
2.725
0.567891



hybrid



cultivar


t607482
Cymbidium
R24K
405.575
10.66908
4.175
0.434933



hybrid



cultivar


t607132
Cymbidium
D208A
328.9
30.48748
2.675
0.634429



hybrid



cultivar


t606874
Cymbidium
D208S
395.55
19.92327
4.2
0.6733



hybrid



cultivar


t607190
Cymbidium
L137M
292.8
8.173942
1.5
0.216025



hybrid



cultivar


t607028
Cymbidium
D208N
356.3
20.88349
3.125
0.377492



hybrid



cultivar


t607370
Cymbidium
L264F
297.2
7.708869
4.575
0.411299



hybrid



cultivar


t606898
Cymbidium
I102A
371.3
14.25669
2.75
0.834666



hybrid



cultivar


t607216
Cymbidium
Q206L
280.95
22.84637
2.325
0.359398



hybrid



cultivar


t606901
Cymbidium
D208C
351.525
23.67324
1.8
1.275408



hybrid



cultivar


t607256
Cymbidium
G373A
236.425
5.235376
0
0



hybrid



cultivar


t607131
Cymbidium
N51D
191.275
227.01
1.9
2.941655



hybrid



cultivar


t607604
Cymbidium
I99K
203.675
235.3296
2.225



hybrid



cultivar


t606914
Cymbidium
A13N
246.7
285.4356
3.825
4.470925



hybrid



cultivar


t606934
Cymbidium
S10N
230.525
266.3877
2.975
3.435477



hybrid



cultivar


t607312
Cymbidium
I255M
60.1
3.31763
0
0



hybrid



cultivar


t607377
Cymbidium
S339W
0
0
0
0



hybrid



cultivar


t607318
Cymbidium
E28P F40Y
260.2
9.255269
3.3
0.702377



hybrid
N51D K54E



cultivar
N80H C82E




L83I G84C




F86Y D88A




N89P N92D




F97M I99R




T111E




R123K




L137M




I147L N151P




C176D




S180N




K182P




Q206L




H233N




S237F




A252E




S260G




G262A




G281E




T289K




D367E


t607245
Cymbidium
E28P G84C
265.1
7.813237
0.225
0.45



hybrid
N89P T111R



cultivar
N151P




C176D




Q206L




S237F




A252E




G262T




G281E


t607000
Cymbidium
G373A
253.05
22.66282
0
0



hybrid
F374L



cultivar


t607337
Cymbidium
E28P N51E
214.225
6.286162
1.6
0.141421



hybrid
N80H C82E



cultivar
L83I G84C




F86Y D88A




N89P N92D




F97M I99R




T111R




R123K




L137M




I147L N151P




C176D




S180N




K182P




Q206L




H233N




S237F




A252E




S260G




G262A




G281E




T289K




D367E


t607265
Cymbidium
E28P F40Y
225.35
4.919011
1.8
0.374166



hybrid
N51D K54E



cultivar
N80H C82E




L83I G84C




F86Y D88A




N89P N92D




F97M I99R




T111Q




R123K




L137M




I147L N151P




C176D




S180N




K182P




Q206L




H233N




S237F




A252E




S260G




G262A




G281E




T289K




D367E


t607316
Cymbidium
W301Y
177.7
14.90526
2.075
0.801561



hybrid
P303V



cultivar
P305N




R308P




A309T


t607435
Cymbidium
E28P N51D
156.525
104.7564
1.075
0.813941



hybrid
K54E N80H



cultivar
C82E L83I




G84C F86Y




D88A N89P




N92D F97M




I99R T111E




R123K




L137M




I147L N151P




C176D




S180N




K182P




Q206L




H233N




S237F




A252E




S260G




G262A




G281E




T289K




D367E


t607381
Cymbidium
A266Y
4.825
9.65
0
0



hybrid
P305N



cultivar
S339W


t607124
Cymbidium
I255M
3.625
7.25
0
0



hybrid
A266Y



cultivar
W301Y




P303V




P305N




R308P




A309T




S339W




V341P




G373A




F374L


t607280
Cymbidium
A266Y
0
0
0
0



hybrid
P305N



cultivar
S339W




V341P




G373A




F374L


t607290
Cymbidium
A266Y
0
0
0
0



hybrid
S339W



cultivar
















TABLE 11B

Results of Secondary Screen of Protein Engineering Library Using C. sativa, Corchorus, and Cymbidium Templates (not supplemented with sodium hexanoate)

Strain
Wild-type template used
Amino Acid mutations from wild-type
Average Normalized Olivetol (per OD)
Standard Deviation Normalized Olivetol (per OD)
Average Olivetol Normalized to t606797_Corchorus OLS
Standard Deviation Olivetol Normalized to t606797_Corchorus OLS

t527338


0.0048
0.0235




GFP


t527340

Cannabis

wild-type
1.05197
0.13292



Cannabis


sativa



OLS


t606801

Cannabis

S334P
1.9025
0.12934




sativa



t607067

Cannabis

F367L
1.49374
0.07849




sativa



t607367

Cannabis

G366A
1.32919
0.09047




sativa



t606794

Cannabis

wild-type
1.30342
0.19421




sativa




(Hemp) (Marijuana)


t607391

Cannabis

P298N
0.8132
0.03261




sativa



t606984

Cannabis

I248M
0.15063
0.02208




sativa



t606899

Cannabis

S332W
0
0




sativa



t606797
Corchorus
wild-type
0.31178
0.05119
1
0.14396


Corchorus
olitorius


OLS


t606807
Corchorus
d1-8 Y142C
1.36312
1.64905
5.49321
6.64551



olitorius


t606930
Corchorus
d1-8 M255I
0.44815
0.04224
1.40843
0.13275



olitorius


t607179
Corchorus
Y301W
0.42428
0.08973
1.45657
0.30806



olitorius
V302I




V303T




N305P




P308K




T309A


t607332
Corchorus
d1-8 Y301W
0.39884
0.06665
1.15654
0.19327



olitorius
V302I




V303T




N305P




P308K




T309A


t607236
Corchorus
M255I
0.37718
0.03147
1.27535
0.1064



olitorius


t607006
Corchorus
Y266F
0.2237
0.00789
0.72435
0.02554



olitorius


t606993
Corchorus
d1-8 N305P
0.20738
0.02797
0.67149
0.09057



olitorius


t607139
Corchorus
d1-8 Y266F
0.19496
0.02599
0.66932
0.08921



olitorius


t607158
Corchorus
A373G
0.18139
0.0179
0.62271
0.06145



olitorius


t607153
Corchorus
d1-8 A373G
0.16382
0.01353
0.56242
0.04646



olitorius


t606852
Corchorus
N305P
0.16244
0.00836
0.65462
0.03369



olitorius


t607112
Corchorus
W339S
0
0
0
0



olitorius


t607119
Corchorus
d1-8 L374F
0
0
0
0



olitorius


t607141
Corchorus
d1-8 Y266F
0
0
0
0



olitorius
W339S


t607149
Corchorus
d1-8 W339S
0
0
0
0



olitorius


t607176
Corchorus
L374F
0
0
0
0



olitorius


t607193
Corchorus
d1-8 T12Y
0
0
0
0



olitorius
F39Y Q42R




L43A Q47E




Q51D Q57K




I77L G79E




S84C E96D




T100E




L121K




N123K




A135V




A137M




T139G




H143Q




N146K




K151R




H152P




K156R




F158M




S174A




V182R




D183G




S184A




N231T




K232N




I241V




T253D




C260G




M287E




M353R




Q357E




S395N


t607371
Corchorus
Y266F
0
0
0
0



olitorius
W339S


t527346
Cymbidium
wild-type
1.02322
0.20846


Cymbidium
hybrid


OLS
cultivar


t607221
Cymbidium
N92D
1.67744
0.07244



hybrid



cultivar


t607228
Cymbidium
M135I
1.64252
0.01538



hybrid



cultivar


t606878
Cymbidium
P303A
1.56942
0.06675



hybrid



cultivar


t606986
Cymbidium
E323G
1.56671
0.16314



hybrid



cultivar


t606999
Cymbidium
N151P
1.5521
0.09794



hybrid



cultivar


t607224
Cymbidium
Y142C
1.54592
0.10755



hybrid



cultivar


t606976
Cymbidium
Q274K
1.54274
0.07232



hybrid



cultivar


t607241
Cymbidium
V388T
1.53854
0.07404



hybrid



cultivar


t607603
Cymbidium
Y142I
1.53604
0.11347



hybrid



cultivar


t607222
Cymbidium
N11K
1.52325
0.04008



hybrid



cultivar


t607014
Cymbidium
A287M
1.49053
0.07415



hybrid



cultivar


t606994
Cymbidium
T289A
1.46316
0.12925



hybrid



cultivar


t606982
Cymbidium
V314I
1.46045
0.08295



hybrid



cultivar


t606995
Cymbidium
I147L
1.4549
0.07909



hybrid



cultivar


t607007
Cymbidium
G281E
1.45405
0.06194



hybrid



cultivar


t607008
Cymbidium
I390Q
1.45278
0.0047



hybrid



cultivar


t606965
Cymbidium
Y142V
1.42708
0.18261



hybrid



cultivar


t607107
Cymbidium
A79E
1.4258
0.07329



hybrid



cultivar


t607194
Cymbidium
F86Y
1.42238
0.02094



hybrid



cultivar


t606981
Cymbidium
E96R
1.42122
0.04842



hybrid



cultivar


t606979
Cymbidium
E32R
1.416
0.12096



hybrid



cultivar


t606975
Cymbidium
K182P
1.4052
0.03345



hybrid



cultivar


t607230
Cymbidium
E116D
1.40132
0.0774



hybrid



cultivar


t607004
Cymbidium
L291Y
1.40103
0.06311



hybrid



cultivar


t606996
Cymbidium
T170A
1.38574
0.05793



hybrid



cultivar


t607046
Cymbidium
K359R
1.38299
0.01508



hybrid



cultivar


t607043
Cymbidium
N45T
1.38195
0.03083



hybrid



cultivar


t607021
Cymbidium
H69Y
1.37598
0.10301



hybrid



cultivar


t607109
Cymbidium
D61R
1.37011
0.10414



hybrid



cultivar


t607036
Cymbidium
K321E
1.36745
0.04988



hybrid



cultivar


t606912
Cymbidium
D14P
1.36318
0.09371



hybrid



cultivar


t607602
Cymbidium
R100E
1.35489
0.07769



hybrid



cultivar


t607361
Cymbidium
E323Q
1.35416
0.07809



hybrid



cultivar


t606882
Cymbidium
L222I
1.35377
0.07907



hybrid



cultivar


t606962
Cymbidium
D229E
1.34991
0.03688



hybrid



cultivar


t607150
Cymbidium
E285K
1.3445
0.07571



hybrid



cultivar


t607252
Cymbidium
F40Y
1.34329
0.11563



hybrid



cultivar


t607225
Cymbidium
S260G
1.34171
0.08715



hybrid



cultivar


t607032
Cymbidium
K355S
1.34146
0.08572



hybrid



cultivar


t607248
Cymbidium
E155C
1.3346
0.03904



hybrid



cultivar


t607155
Cymbidium
C176D
1.33173
0.01853



hybrid



cultivar


t606958
Cymbidium
C82N
1.32771
0.01591



hybrid



cultivar


t607023
Cymbidium
V71H
1.32703
0.0343



hybrid



cultivar


t607027
Cymbidium
T289K
1.32433
0.08328



hybrid



cultivar


t606892
Cymbidium
Y142T
1.31787
0.21329



hybrid



cultivar


t607035
Cymbidium
Y160G
1.31694
0.07538



hybrid



cultivar


t607237
Cymbidium
T289S
1.31692
0.10049



hybrid



cultivar


t607189
Cymbidium
N51E
1.31519
0.10593



hybrid



cultivar


t607118
Cymbidium
T289Q
1.31279
0.05154



hybrid



cultivar


t607018
Cymbidium
I390N
1.31146
0.00884



hybrid



cultivar


t607045
Cymbidium
F30C
1.31145
0.13007



hybrid



cultivar


t606888
Cymbidium
A107L
1.31126
0.05185



hybrid



cultivar


t607220
Cymbidium
A13T
1.3073
0.0685



hybrid



cultivar


t606830
Cymbidium
M135V
1.30213
0.13463



hybrid



cultivar


t606832
Cymbidium
E155Q
1.29883
0.18596



hybrid



cultivar


t607601
Cymbidium
K121V
1.29806
0.0748



hybrid



cultivar


t606857
Cymbidium
A195V
1.29426
0.3015



hybrid



cultivar


t607452
Cymbidium
K54E
1.29134
0.03462



hybrid



cultivar


t607218
Cymbidium
F97M
1.29082
0.07837



hybrid



cultivar


t607186
Cymbidium
S180N
1.28937
0.10055



hybrid



cultivar


t607123
Cymbidium
S237F
1.28824
0.07181



hybrid



cultivar


t607286
Cymbidium
S34Q
1.28672
0.12162



hybrid



cultivar


t606918
Cymbidium
V50N
1.28475
0.02954



hybrid



cultivar


t606916
Cymbidium
R100A
1.28388
0.03268



hybrid



cultivar


t606990
Cymbidium
L291W
1.27925
0.06685



hybrid



cultivar


t606908
Cymbidium
I22M
1.27921
0.08525



hybrid



cultivar


t606963
Cymbidium
I147E
1.27917
0.09902



hybrid



cultivar


t607226
Cymbidium
Q115D
1.27909
0.05228



hybrid



cultivar


t606961
Cymbidium
V43M
1.27574
0.04471



hybrid



cultivar


t607260
Cymbidium
E285A
1.27099
0.05173



hybrid



cultivar


t607160
Cymbidium
V341A
1.26952
0.05458



hybrid



cultivar


t607156
Cymbidium
H269S
1.26921
0.0896



hybrid



cultivar


t607478
Cymbidium
N11R
1.26856
0.08255



hybrid



cultivar


t606887
Cymbidium
Q274G
1.26839
0.09424



hybrid



cultivar


t606861
Cymbidium
E203K
1.26653
0.2508



hybrid



cultivar


t607217
Cymbidium
D367E
1.26643
0.04114



hybrid



cultivar


t606894
Cymbidium
Y142F
1.26586
0.09867



hybrid



cultivar


t607288
Cymbidium
T289D
1.26389
0.05999



hybrid



cultivar


t606952
Cymbidium
V71Y
1.26229
0.12989



hybrid



cultivar


t607197
Cymbidium
G84C
1.26224
0.07225



hybrid



cultivar


t607146
Cymbidium
G262T
1.26188
0.03879



hybrid



cultivar


t607017
Cymbidium
N78E
1.26073
0.09115



hybrid



cultivar


t607456
Cymbidium
T111Q
1.25899
0.07745



hybrid



cultivar


t606854
Cymbidium
L144I
1.25807
0.22743



hybrid



cultivar


t606838
Cymbidium
R123N
1.25683
0.23657



hybrid



cultivar


t607213
Cymbidium
T289G
1.25405
0.08316



hybrid



cultivar


t606932
Cymbidium
S18T
1.25389
0.12779



hybrid



cultivar


t607349
Cymbidium
K282D
1.25244
0.08104



hybrid



cultivar


t607585
Cymbidium
L83I
1.2509
0.04892



hybrid



cultivar


t606956
Cymbidium
L83M
1.24841
0.05473



hybrid



cultivar


t607586
Cymbidium
I326R
1.24784
0.04812



hybrid



cultivar


t607025
Cymbidium
I99T
1.24421
0.09819



hybrid



cultivar


t607322
Cymbidium
R123K
1.24342
0.08106



hybrid



cultivar


t606905
Cymbidium
V388A
1.24241
0.09318



hybrid



cultivar


t606835
Cymbidium
Q161F
1.24193
0.10262



hybrid



cultivar


t607104
Cymbidium
E323H
1.24084
0.05988



hybrid



cultivar


t607135
Cymbidium
E28P
1.24063
0.06484



hybrid



cultivar


t606891
Cymbidium
K356Q
1.24008
0.09459



hybrid



cultivar


t607031
Cymbidium
N80H
1.23893
0.09737



hybrid



cultivar


t607317
Cymbidium
C82E
1.23849
0.03065



hybrid



cultivar


t607088
Cymbidium
P303V
1.23838
0.07685



hybrid



cultivar


t607262
Cymbidium
L385M
1.23811
0.14991



hybrid



cultivar


t606896
Cymbidium
Q115S
1.2378
0.10758



hybrid



cultivar


t607269
Cymbidium
A333R
1.23132
0.05599



hybrid



cultivar


t607294
Cymbidium
K359M
1.22923
0.04507



hybrid



cultivar


t607344
Cymbidium
A252E
1.22825
0.1572



hybrid



cultivar


t607159
Cymbidium
L76I
1.22743
0.10259



hybrid



cultivar


t606890
Cymbidium
K112R
1.22535
0.04006



hybrid



cultivar


t607284
Cymbidium
F70M
1.22458
0.03416



hybrid



cultivar


t607292
Cymbidium
V331I
1.22387
0.05296



hybrid



cultivar


t607476
Cymbidium
L20F
1.22172
0.06972



hybrid



cultivar


t606946
Cymbidium
N151P
1.22164
0.0856



hybrid



cultivar


t607320
Cymbidium
T111E
1.22064
0.07826



hybrid



cultivar


t607083
Cymbidium
G84S
1.21542
0.11127



hybrid



cultivar


t607480
Cymbidium
A13V
1.21428
0.00581



hybrid



cultivar


t606909
Cymbidium
W368H
1.21403
0.1423



hybrid



cultivar


t607449
Cymbidium
E203P
1.21329
0.09644



hybrid



cultivar


t606851
Cymbidium
I293V
1.21178
0.19419



hybrid



cultivar


t607079
Cymbidium
S34E
1.21014
0.05037



hybrid



cultivar


t607282
Cymbidium
R100T
1.20952
0.03641



hybrid



cultivar


t606967
Cymbidium
M135A
1.20347
0.12224



hybrid



cultivar


t606938
Cymbidium
D88A
1.19819
0.02307



hybrid



cultivar


t607433
Cymbidium
L291V
1.19755
0.06191



hybrid



cultivar


t607357
Cymbidium
T243A
1.19607
0.10095



hybrid



cultivar


t607122
Cymbidium
F374L
1.19551
0.12138



hybrid



cultivar


t607110
Cymbidium
R317T
1.19291
0.05771



hybrid



cultivar


t607015
Cymbidium
G84T
1.19021
0.12242



hybrid



cultivar


t607087
Cymbidium
E96A
1.18792
0.06196



hybrid



cultivar


t607019
Cymbidium
I99A
1.18519
0.12123



hybrid



cultivar


t606839
Cymbidium
D207S
1.17855
0.17153



hybrid



cultivar


t606942
Cymbidium
R100P
1.17753
0.09144



hybrid



cultivar


t607164
Cymbidium
V327A
1.17683
0.0485



hybrid



cultivar


t606906
Cymbidium
E28A
1.1681
0.05241



hybrid



cultivar


t606868
Cymbidium
L248I
1.16589
0.03553



hybrid



cultivar


t607089
Cymbidium
L77I
1.16044
0.02039



hybrid



cultivar


t606910
Cymbidium
N89P
1.15844
0.18953



hybrid



cultivar


t606936
Cymbidium
T111K
1.15338
0.07375



hybrid



cultivar


t606856
Cymbidium
V157T
1.14804
0.06609



hybrid



cultivar


t607450
Cymbidium
I99R
1.13733
0.02723



hybrid



cultivar


t606960
Cymbidium
N78R
1.13483
0.06431



hybrid



cultivar


t607600
Cymbidium
R123A
1.13448
0.08691



hybrid



cultivar


t606940
Cymbidium
A13S
1.13438
0.04862



hybrid



cultivar


t607085
Cymbidium
K55R
1.12814
0.06807



hybrid



cultivar


t607474
Cymbidium
G262A
1.12724
0.09746



hybrid



cultivar


t606959
Cymbidium
A219C
1.1175
0.07755



hybrid



cultivar


t606859
Cymbidium
I99E
1.10483
0.19373



hybrid



cultivar


t606904
Cymbidium
S10R
1.10411
0.13592



hybrid



cultivar


t607195
Cymbidium
T111R
1.10148
0.06594



hybrid



cultivar


t607445
Cymbidium
P305N
1.0924
0.04527



hybrid



cultivar


t607273
Cymbidium
V341P
1.09021
0.01522



hybrid



cultivar


t606834
Cymbidium
V157I
1.07903
0.07146



hybrid



cultivar


t607254
Cymbidium
G262T
1.07703
0.05546



hybrid



cultivar


t606828
Cymbidium
A107M
1.07407
0.12564



hybrid



cultivar


t606836
Cymbidium
K104Q
1.07292
0.1499



hybrid



cultivar


t607081
Cymbidium
I99G
1.0581
0.0688



hybrid



cultivar


t607482
Cymbidium
R24K
1.05409
0.07847



hybrid



cultivar


t607132
Cymbidium
D208A
1.01083
0.07712



hybrid



cultivar


t606874
Cymbidium
D208S
0.95301
0.04077



hybrid



cultivar


t607190
Cymbidium
L137M
0.90492
0.02932



hybrid



cultivar


t607028
Cymbidium
D208N
0.88114
0.04406



hybrid



cultivar


t607370
Cymbidium
L264F
0.87471
0.07388



hybrid



cultivar


t606898
Cymbidium
I102A
0.85753
0.12759



hybrid



cultivar


t607216
Cymbidium
Q206L
0.83138
0.05303



hybrid



cultivar


t606901
Cymbidium
D208C
0.82507
0.10608



hybrid



cultivar


t607256
Cymbidium
G373A
0.65595
0.02379



hybrid



cultivar


t607131
Cymbidium
N51D
0.5808
0.70692



hybrid



cultivar


t607604
Cymbidium
I99K
0.57364
0.66547



hybrid



cultivar


t606914
Cymbidium
A13N
0.56823
0.65663



hybrid



cultivar


t606934
Cymbidium
S10N
0.54561
0.63028



hybrid



cultivar


t607312
Cymbidium
I255M
0.16984
0.01174



hybrid



cultivar


t607377
Cymbidium
S339W
0
0



hybrid



cultivar


t607318
Cymbidium
E28P F40Y
0.78529
0.05769



hybrid
N51D K54E



cultivar
N80H C82E




L83I G84C




F86Y D88A




N89P N92D




F97M I99R




T111E




R123K




L137M




I147L




N151P




C176D




S180N




K182P




Q206L




H233N




S237F




A252E




S260G




G262A




G281E




T289K




D367E


t607245
Cymbidium
E28P G84C
0.77803
0.02842



hybrid
N89P T111R



cultivar
N151P




C176D




Q206L




S237F




A252E




G262T




G281E


t607000
Cymbidium
G373A
0.71402
0.06528



hybrid
F374L



cultivar


t607337
Cymbidium
E28P N51E
0.66352
0.03712



hybrid
N80H C82E



cultivar
L83I G84C




F86Y D88A




N89P N92D




F97M I99R




T111R




R123K




L137M




I147L




N151P




C176D




S180N




K182P




Q206L




H233N




S237F




A252E




S260G




G262A




G281E




T289K




D367E


t607265
Cymbidium
E28P F40Y
0.64379
0.0247



hybrid
N51D K54E



cultivar
N80H C82E




L83I G84C




F86Y D88A




N89P N92D




F97M I99R




T111Q




R123K




L137M




I147L




N151P




C176D




S180N




K182P




Q206L




H233N




S237F




A252E




S260G




G262A




G281E




T289K




D367E


t607316
Cymbidium
W301Y
0.49342
0.01579



hybrid
P303V



cultivar
P305N




R308P




A309T


t607435
Cymbidium
E28P N51D
0.46307
0.30921



hybrid
K54E N80H



cultivar
C82E L83I




G84C F86Y




D88A N89P




N92D F97M




I99R T111E




R123K




L137M




I147L




N151P




C176D




S180N




K182P




Q206L




H233N




S237F




A252E




S260G




G262A




G281E




T289K




D367E


t607381
Cymbidium
A266Y
0.01389
0.02777



hybrid
P305N



cultivar
S339W


t607124
Cymbidium
I255M
0.01165
0.02329



hybrid
A266Y



cultivar
W301Y




P303V




P305N




R308P




A309T




S339W




V341P




G373A




F374L


t607280
Cymbidium
A266Y
0
0



hybrid
P305N



cultivar
S339W




V341P




G373A




F374L


t607290
Cymbidium
A266Y
0
0



hybrid
S339W



cultivar
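The Cymbidium rows of Table 11B include several different substitutions at the same position (for example, six residues at position 99). A minimal Python sketch, assuming each single-substitution label follows the usual wild-type residue, position, new residue convention (e.g. I99T), groups such rows by position and expresses each average as a fold change over the Cymbidium wild-type strain t527346 (1.02322). The helper `fold_over_wild_type` is hypothetical, and the ratios are simple quotients of the table's averages, not values reported in the source.

```python
from collections import defaultdict

# Average normalized olivetol (per OD) of the Cymbidium wild-type template, t527346.
CYMBIDIUM_WILD_TYPE = 1.02322

# (strain, substitution, average normalized olivetol per OD), copied from Table 11B.
POSITION_99_ROWS = [
    ("t607025", "I99T", 1.24421),
    ("t607019", "I99A", 1.18519),
    ("t607450", "I99R", 1.13733),
    ("t606859", "I99E", 1.10483),
    ("t607081", "I99G", 1.0581),
    ("t607604", "I99K", 0.57364),
]


def fold_over_wild_type(rows, wild_type=CYMBIDIUM_WILD_TYPE):
    """Group single substitutions by position, ranked by fold change over wild type.

    Assumes labels of the form <wild-type residue><position><new residue>
    (e.g. "I99T"); combinatorial or "d1-8" labels are not handled.
    """
    by_position = defaultdict(list)
    for strain, substitution, value in rows:
        position = int(substitution[1:-1])  # drop the leading and trailing residue letters
        by_position[position].append((substitution, strain, value / wild_type))
    for entries in by_position.values():
        entries.sort(key=lambda entry: entry[2], reverse=True)
    return dict(by_position)


if __name__ == "__main__":
    for substitution, strain, fold in fold_over_wild_type(POSITION_99_ROWS)[99]:
        print(f"{substitution} ({strain}): {fold:.2f}x wild type")
```

On these rows I99T ranks first (roughly 1.22-fold) and I99K last (roughly 0.56-fold). Because the wild-type standard deviation in Table 11B is 0.20846, small fold changes from this ranking should be read cautiously.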
















TABLE 12

Screening Results in Prototrophic S. cerevisiae Strain

Strain
Strain type
Average Olivetol [μg/L]
Olivetol Standard Deviation [μg/L]

t473139
Negative Control
0
0



t496101

Cannabis OLS variant

20254.13
2236.483




(positive control)



t496102
Library
20566.05
2055.026



t485668
Library
24062.45
4250.129



t496079
Library
29485.08
2786.913



t485662
Library
50257.28
3891.439



t496084
Cannabis OLS
53595.65
7035.556




T335C point mutant



t485672
Library
53606.37
6230.06



t496073
Library
56729.84
4435.122
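Table 12 reports absolute olivetol titers, so comparing each strain with the positive control t496101 requires a ratio. A minimal sketch with values copied from Table 12 and a hypothetical helper `fold_vs_control`; the fold changes are derived here by simple division and are not reported in the source.

```python
# (strain, strain type, average olivetol [ug/L], standard deviation [ug/L]) from Table 12.
TABLE_12 = [
    ("t473139", "Negative Control", 0.0, 0.0),
    ("t496101", "Cannabis OLS variant (positive control)", 20254.13, 2236.483),
    ("t496102", "Library", 20566.05, 2055.026),
    ("t485668", "Library", 24062.45, 4250.129),
    ("t496079", "Library", 29485.08, 2786.913),
    ("t485662", "Library", 50257.28, 3891.439),
    ("t496084", "Cannabis OLS T335C point mutant", 53595.65, 7035.556),
    ("t485672", "Library", 53606.37, 6230.06),
    ("t496073", "Library", 56729.84, 4435.122),
]


def fold_vs_control(rows, control="t496101"):
    """Return (strain, fold change vs. control) pairs, highest first."""
    titers = {strain: mean for strain, _, mean, _ in rows}
    baseline = titers[control]
    folds = [(strain, mean / baseline) for strain, mean in titers.items()]
    return sorted(folds, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for strain, fold in fold_vs_control(TABLE_12):
        print(f"{strain}: {fold:.2f}x")
```

Under this reading, the top library strain, t496073, reaches roughly 2.8 times the positive-control titer. The standard deviations (about 11% of the mean for the control) are not propagated in this sketch.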

















TABLE 13

Sequence Information for Strains Described in Table 9

Strain
Nucleotide Sequence (SEQ ID NO)
Protein Sequence (SEQ ID NO)

t405417
250
207



t404953
251
208



t405220
252
209



t404192
253
210



t404323
254
211



t404196
255
212



t404209
256
213



t404164
257
214



t404170
258
215



t404384
259
216



t405397
260
217



t405164
261
218



t404191
262
219



t405340
263
220



t404421
264
221



t404631
265
222



t405133
266
223



t405081
267
224



t404898
268
225



t405017
269
226



t405140
270
227



t404276
271
228



t404405
272
229



t405079
273
230



t404978
274
231



t405347
275
232



t404855
276
233



t405362
277
234



t404523
278
235



t404951
279
236



t405308
280
237



t405201
281
238



t404219
282
239



t404673
283
240



t404274
284
241



t405042
285
242



t404528
286
243



t405312
287
244



t404725
288
245



t405303
289
246



t405395
290
247



t405326
291
248



t404599
292
249

















TABLE 14

Sequence Information for Strains Described in Tables 10A-10B

Strain
Nucleotide Sequence (SEQ ID NO)
Protein Sequence (SEQ ID NO)

t606794
96
80



t527340
62
5



t607067
421
293



t607367
422
294



t607391
423
295



t606801
424
296



t606984
425
297



t606899
426
298



t606797
37
6



t606807
427
299



t607179
428
300



t607149
429
301



t607139
430
302



t607112
431
303



t607332
432
304



t607153
433
305



t607158
434
306



t607236
435
307



t607141
436
308



t607176
437
309



t606930
438
310



t607193
439
311



t607006
440
312



t606993
441
313



t606852
442
314



t607119
443
315



t607371
444
316



t527346
38
7



t606952
445
317



t607284
446
318



t607262
447
319



t606938
448
320



t607260
449
321



t607159
450
322



t606946
451
323



t606861
452
324



t606918
453
325



t607135
454
326



t607286
455
327



t606942
456
328



t606959
457
329



t607294
458
330



t607282
459
331



t607230
460
332



t606965
461
333



t607288
462
334



t607228
463
335



t606909
464
336



t606962
465
337



t607150
466
338



t607361
467
339



t606932
468
340



t606940
469
341



t607269
470
342



t607186
471
343



t607476
472
344



t607031
473
345



t606916
474
346



t607292
475
347



t606908
476
348



t607248
477
349



t607023
478
350



t606936
479
351



t607433
480
352



t607600
481
353



t606894
482
354



t606963
483
355



t607603
484
356



t607452
485
357



t607197
486
358



t606996
487
359



t607043
488
360



t607254
489
361



t607478
490
362



t607132
491
363



t607109
492
364



t607155
493
365



t606956
494
366



t606906
495
367



t607195
496
368



t607449
497
369



t607256
498
370



t607349
499
371



t606960
500
372



t607601
501
373



t607021
502
374



t606874
503
375



t607320
504
376



t607317
505
377



t607224
506
378



t606912
507
379



t607602
508
380



t606905
509
381



t607156
510
382



t607474
511
383



t607482
512
384



t606854
513
385



t607032
514
386



t606830
515
387



t606961
516
388



t606868
517
389



t607083
518
390



t606958
519
391



t607273
520
392



t607241
521
393



t606857
522
394



t606901
523
395



t607015
524
396



t607586
525
397



t607122
526
398



t606882
527
399



t607146
528
400



t607585
529
401



t606828
530
402



t607081
531
403



t606887
532
404



t606891
533
405



t607160
534
406



t607194
535
407



t607377
537
409



t607265
538
410



t607000
539
411



t607245
540
412



t607318
541
413



t607435
542
414



t607337
543
415



t607316
544
416



t607124
545
417



t607280
546
418



t607290
547
419



t607381
548
420

















TABLE 15

Sequence Information for Strains Described in Tables 11A-11B

Strain
Nucleotide Sequence (SEQ ID NO)
Protein Sequence (SEQ ID NO)

t527340
62
5



t606801
424
296



t607067
421
293



t607367
422
294



t606794
96
80



t607391
423
295



t606984
425
297



t606899
426
298



t606797
37
6



t606807
427
299



t606930
438
310



t607179
428
300



t607332
432
304



t607236
435
307



t607006
440
312



t606993
441
313



t607139
430
302



t607158
434
306



t607153
433
305



t606852
442
314



t607112
431
303



t607119
443
315



t607141
436
308



t607149
429
301



t607176
437
309



t607193
439
311



t607371
444
316



t527346
38
7



t607221
628
549



t607228
463
335



t606878
629
550



t606986
630
551



t606999
631
552



t607224
506
378



t606976
632
553



t607241
521
393



t607603
484
356



t607222
633
554



t607014
634
555



t606994
635
556



t606982
636
557



t606995
637
558



t607007
638
559



t607008
639
560



t606965
461
333



t607107
640
561



t607194
535
407



t606981
641
562



t606979
642
563



t606975
643
564



t607230
460
332



t607004
644
565



t606996
487
359



t607046
645
566



t607043
488
360



t607021
502
374



t607109
492
364



t607036
646
567



t606912
507
379



t607602
508
380



t607361
467
339



t606882
527
399



t606962
465
337



t607150
466
338



t607252
647
568



t607225
648
569



t607032
514
386



t607248
477
349



t607155
493
365



t606958
519
391



t607023
478
350



t607027
649
570



t606892
650
571



t607035
651
572



t607237
652
573



t607189
653
574



t607118
654
575



t607018
655
576



t607045
656
577



t606888
657
578



t607220
658
579



t606830
515
387



t606832
659
580



t607601
501
373



t606857
522
394



t607452
485
357



t607218
660
581



t607186
471
343



t607123
661
582



t607286
455
327



t606918
453
325



t606916
474
346



t606990
662
583



t606908
476
348



t606963
483
355



t607226
663
584



t606961
516
388



t607260
449
321



t607160
534
406



t607156
510
382



t607478
490
362



t606887
532
404



t606861
452
324



t607217
664
585



t606894
482
354



t607288
462
334



t606952
445
317



t607197
486
358



t607146
528
400



t607017
665
586



t607456
666
587



t606854
513
385



t606838
667
588



t607213
668
589



t606932
468
340



t607349
499
371



t607585
529
401



t606956
494
366



t607586
525
397



t607025
669
590



t607322
670
591



t606905
509
381



t606835
671
592



t607104
672
593



t607135
454
326



t606891
533
405



t607031
473
345



t607317
505
377



t607088
673
594



t607262
447
319



t606896
674
595



t607269
470
342



t607294
458
330



t607344
675
596



t607159
450
322



t606890
676
597



t607284
446
318



t607292
475
347



t607476
472
344



t606946
451
323



t607320
504
376



t607083
518
390



t607480
677
598



t606909
464
336



t607449
497
369



t606851
678
599



t607079
679
600



t607282
459
331



t606967
680
601



t606938
448
320



t607433
480
352



t607357
681
602



t607122
526
398



t607110
682
603



t607015
524
396



t607087
683
604



t607019
684
605



t606839
685
606



t606942
456
328



t607164
686
607



t606906
495
367



t606868
517
389



t607089
687
608



t606910
688
609



t606936
479
351



t606856
689
610



t607450
690
611



t606960
500
372



t607600
481
353



t606940
469
341



t607085
691
612



t607474
511
383



t606959
457
329



t606859
692
613



t606904
693
614



t607195
496
368



t607445
694
615



t607273
520
392



t606834
695
616



t607254
489
361



t606828
530
402



t606836
696
617



t607081
531
403



t607482
512
384



t607132
491
363



t606874
503
375



t607190
697
618



t607028
698
619



t607370
699
620



t606898
700
621



t607216
701
622



t606901
523
395



t607256
498
370



t607131
702
623



t607604
703
624



t606914
704
625



t606934
705
626



t607312
536
408



t607377
537
409



t607318
541
413



t607245
540
412



t607000
539
411



t607337
543
415



t607265
538
410



t607316
544
416



t607435
542
414



t607381
548
420



t607124
545
417



t607280
546
418



t607290
547
419
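Several strains appear in more than one sequence table (Tables 14, 15 and 16), so one quick consistency check is to confirm that a strain always maps to the same SEQ ID NO pair and to see which differently named strains share a pair. A minimal sketch over a few entries copied from those tables; the dictionary names and helper functions are hypothetical.

```python
# strain -> (nucleotide SEQ ID NO, protein SEQ ID NO); small subsets copied from the tables.
TABLE_14 = {
    "t527340": (62, 5),
    "t606794": (96, 80),
    "t527346": (38, 7),
    "t607228": (463, 335),
    "t607377": (537, 409),
}
TABLE_15 = {
    "t527340": (62, 5),
    "t606794": (96, 80),
    "t527346": (38, 7),
    "t607228": (463, 335),
    "t607221": (628, 549),
    "t607377": (537, 409),
}
TABLE_16 = {
    "t496101": (62, 5),
    "t496073": (38, 7),
}


def conflicting_strains(first, second):
    """Strains listed in both tables but with different SEQ ID NO pairs."""
    return sorted(s for s in first.keys() & second.keys() if first[s] != second[s])


def shared_sequences(first, second):
    """(strain, other strain) pairs with different names but the same SEQ ID NO pair."""
    by_pair = {}
    for strain, pair in second.items():
        by_pair.setdefault(pair, []).append(strain)
    return sorted(
        (strain, other)
        for strain, pair in first.items()
        for other in by_pair.get(pair, [])
        if other != strain
    )


if __name__ == "__main__":
    print(conflicting_strains(TABLE_14, TABLE_15))
    print(shared_sequences(TABLE_16, TABLE_15))
```

On these subsets the check finds no conflicts, and it pairs the Table 16 strains t496101 and t496073 with t527340 and t527346, which are listed with the same SEQ ID NOs (protein SEQ ID NOs 5 and 7, respectively).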

















TABLE 16

Sequence Information for Strains Described in Table 12

Strain
Nucleotide Sequence (SEQ ID NO)
Amino Acid Sequence (SEQ ID NO)

t496101
62
5



t496102
39
8



t485668
44
13



t496079
47
16



t485662
46
15



t496084
706
627



t485672
48
17



t496073
38
7










EQUIVALENTS

Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the invention described herein. Such equivalents are intended to be encompassed by the following claims.


All references, including patent documents, disclosed herein are incorporated by reference in their entirety.

Claims
  • 1. A host cell that comprises a heterologous polynucleotide encoding a polyketide synthase (PKS), wherein the PKS comprises an amino acid sequence that has at least 90% sequence identity to any one of SEQ ID NOs: 7, 15, 145, and 714, or wherein the PKS comprises a conservatively substituted version of any one of SEQ ID NOs: 7, 15, 145, and 714, and wherein the host cell further comprises one or more heterologous polynucleotides encoding one or more of: a polyketide cyclase (PKC), a prenyltransferase (PT), and/or a terminal synthase (TS).
  • 2. The host cell of claim 1, wherein relative to the sequence of SEQ ID NO: 7, the PKS comprises an amino acid substitution at a residue corresponding to position 28, 34, 50, 70, 71, 76, 88, 100, 151, 203, 219, 285, 359, and/or 385 in SEQ ID NO: 7.
  • 3. The host cell of claim 1, wherein the PKS comprises: a) the amino acid P at a residue corresponding to position 28 in SEQ ID NO: 7; b) the amino acid Q at a residue corresponding to position 34 in SEQ ID NO: 7; c) the amino acid N at a residue corresponding to position 50 in SEQ ID NO: 7; d) the amino acid M at a residue corresponding to position 70 in SEQ ID NO: 7; e) the amino acid Y at a residue corresponding to position 71 in SEQ ID NO: 7; f) the amino acid I at a residue corresponding to position 76 in SEQ ID NO: 7; g) the amino acid A at a residue corresponding to position 88 in SEQ ID NO: 7; h) the amino acid P or T at a residue corresponding to position 100 in SEQ ID NO: 7; i) the amino acid P at a residue corresponding to position 151 in SEQ ID NO: 7; j) the amino acid K at a residue corresponding to position 203 in SEQ ID NO: 7; k) the amino acid C at a residue corresponding to position 219 in SEQ ID NO: 7; l) the amino acid A at a residue corresponding to position 285 in SEQ ID NO: 7; m) the amino acid M at a residue corresponding to position 359 in SEQ ID NO: 7; and/or n) the amino acid M at a residue corresponding to position 385 in SEQ ID NO: 7.
  • 4-71. (canceled)
  • 72. The host cell of claim 1, wherein the host cell is capable of producing a cannabinoid compound or a cannabinoid precursor, wherein the cannabinoid compound is a compound of Formulas 8, 9, 10, or 11:
  • 73. The host cell of claim 72, wherein the host cell is capable of producing 3,5,7-trioxododecanoyl-CoA, olivetol, olivetolic acid, cannabigerolic acid (8a), cannabidiolic acid (9a), tetrahydrocannabinolic acid (10a), and/or cannabichromenic acid (11a).
  • 74. The host cell of claim 73, wherein the host cell produces more 3,5,7-trioxododecanoyl-CoA, olivetol, and/or olivetolic acid than a host cell that: (i) does not comprise the PKS that comprises the amino acid sequence that has at least 90% sequence identity to any one of SEQ ID NOs: 7, 15, 145, and 714 and (ii) comprises a heterologous polynucleotide encoding a PKS that comprises the amino acid sequence of SEQ ID NO: 5.
  • 75. The host cell of claim 1, wherein the PKS comprises one or more of the following amino acid substitutions relative to SEQ ID NO: 7: V71Y and F70M.
  • 76. The host cell of claim 1, wherein the PKS comprises: a) C at a residue corresponding to position 164 in SEQ ID NO: 7; b) H at a residue corresponding to position 304 in SEQ ID NO: 7; and/or c) N at a residue corresponding to position 337 in SEQ ID NO: 7.
  • 77. The host cell of claim 1, wherein the PKS comprises the amino acid sequence of any one of SEQ ID NOs: 7, 15, 145, and 714.
  • 78. The host cell of claim 1, wherein the heterologous polynucleotide comprises a nucleotide sequence that has at least 90% sequence identity to any one of SEQ ID NOs: 38, 175, 176, and 205, or a codon degenerate nucleotide sequence thereof.
  • 79. The host cell of claim 1, wherein the host cell is a yeast cell.
  • 80. The host cell of claim 79, wherein the yeast cell is a Saccharomyces cell, a Yarrowia cell, a Pichia cell or a Komagataella cell.
  • 81. The host cell of claim 80, wherein the Saccharomyces cell is a Saccharomyces cerevisiae cell.
  • 82. The host cell of claim 1, wherein the PKC is an olivetolic acid cyclase (OAC).
  • 83. The host cell of claim 72, wherein R is a straight-chain unsubstituted C1-10 alkyl.
  • 84. The host cell of claim 72, wherein R is a straight-chain unsubstituted C3 or C5 alkyl.
  • 85. A method comprising culturing the host cell of claim 1.
  • 86. A method for producing a cannabinoid compound or a cannabinoid precursor, wherein the method comprises culturing a host cell that comprises a heterologous polynucleotide encoding a polyketide synthase (PKS), wherein the PKS comprises an amino acid sequence that has at least 90% identity to SEQ ID NO: 7, 15, 145, or 714 and wherein the cannabinoid compound is a compound of Formulas 8, 9, 10, or 11:
  • 87. A bioreactor for producing a cannabinoid compound or a cannabinoid precursor containing: a. malonyl-CoA; b. an optionally substituted alkanoic acid; and c. a polyketide synthase (PKS) comprising an amino acid sequence that has at least 90% sequence identity to any one of SEQ ID NOs: 7, 715, 145, and 714, or a conservatively substituted version thereof.
  • 88. The bioreactor of claim 87, further comprising an acyl activating enzyme (AAE), a polyketide cyclase (PKC), a prenyltransferase (PT), and/or a terminal synthase (TS).
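The positional limitations of claims 2 and 3, together with the 90% identity threshold of claim 1, can be illustrated with a minimal Python sketch. It rests on several stated assumptions. The candidate PKS is assumed to be already aligned to SEQ ID NO: 7 without gaps, so that character n of the string is the residue "corresponding to position n" (1-indexed). Percent identity is computed as a simple match fraction over that ungapped alignment, which simplifies what a full alignment tool would report. The helper names (`percent_identity`, `claim3_hits`) and the toy glycine sequences are hypothetical and are not the actual SEQ ID NO: 7 from the Sequence Listing.

```python
# Residues recited in claim 3, keyed by 1-indexed position in SEQ ID NO: 7.
# Position 100 accepts either P or T.
CLAIM3_RESIDUES: dict[int, set[str]] = {
    28: {"P"}, 34: {"Q"}, 50: {"N"}, 70: {"M"}, 71: {"Y"}, 76: {"I"},
    88: {"A"}, 100: {"P", "T"}, 151: {"P"}, 203: {"K"}, 219: {"C"},
    285: {"A"}, 359: {"M"}, 385: {"M"},
}


def percent_identity(aligned_query: str, aligned_ref: str) -> float:
    """Hypothetical helper: percent identity over an ungapped, equal-length alignment.

    Assumes both strings are already aligned position by position; a real
    analysis would first align the sequences with a dedicated tool.
    """
    if len(aligned_query) != len(aligned_ref) or not aligned_ref:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(q == r for q, r in zip(aligned_query, aligned_ref))
    return 100.0 * matches / len(aligned_ref)


def claim3_hits(aligned_query: str) -> dict[int, str]:
    """Hypothetical helper: positions (1-indexed) where the query carries a claim-3 residue."""
    hits = {}
    for pos, allowed in CLAIM3_RESIDUES.items():
        if pos <= len(aligned_query) and aligned_query[pos - 1] in allowed:
            hits[pos] = aligned_query[pos - 1]
    return hits


# Toy example (not real sequences): a 400-residue all-glycine "reference" and a
# query that differs only by carrying M and Y at positions 70 and 71.
toy_ref = "G" * 400
toy_query = toy_ref[:69] + "MY" + toy_ref[71:]

if __name__ == "__main__":
    identity = percent_identity(toy_query, toy_ref)
    print(identity, identity >= 90.0)  # 99.5 True
    print(claim3_hits(toy_query))      # {70: 'M', 71: 'Y'}
```

In this toy case the query differs from the reference at only 2 of 400 positions, so it clears the 90% threshold while still carrying two of the claim 3 residues (M at position 70 and Y at position 71). Because the enumerated residues in claim 3 are joined by "and/or", the sketch reports each position independently rather than requiring all fourteen.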
CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of International Application Serial Number PCT/US2020/019760, filed Feb. 25, 2020, entitled “BIOSYNTHESIS OF CANNABINOIDS AND CANNABINOID PRECURSORS,” which claims the benefit under 35 U.S.C. § 119(e) of U.S. Provisional Application Ser. No. 62/810,367, filed Feb. 25, 2019, entitled “BIOSYNTHESIS OF CANNABINOIDS AND CANNABINOID PRECURSORS,” and U.S. Provisional Application Ser. No. 62/810,938, filed Feb. 26, 2019, entitled “BIOSYNTHESIS OF CANNABINOIDS AND CANNABINOID PRECURSORS,” the disclosure of each of which is incorporated by reference herein in its entirety.

Provisional Applications (2)
Number Date Country
62810938 Feb 2019 US
62810367 Feb 2019 US
Continuations (2)
Number Date Country
Parent 17078872 Oct 2020 US
Child 17690446 US
Parent PCT/US2020/019760 Feb 2020 US
Child 17078872 US