The contents of the electronic sequence listing (YEDA-P-010-PCT.xml; size: 251,312 bytes; and date of creation: Aug. 20, 2023) are herein incorporated by reference in their entirety.
The present invention relates to combinations of enzymes derived from Helichrysum umbraculigerum including polynucleotides encoding same, and methods of using same, such as for producing cannabinoids.
Cannabinoids are terpenophenolic compounds found in Cannabis sativa, an annual plant belonging to the Cannabaceae family. The plant contains more than 400 chemicals and approximately 70 cannabinoids. The latter accumulate mainly in the glandular trichomes. Of the naturally occurring cannabinoids, tetrahydrocannabinol (THC), for example, is used for treating a wide range of medical conditions, including glaucoma, AIDS wasting, neuropathic pain, spasticity associated with multiple sclerosis, fibromyalgia, and chemotherapy-induced nausea. THC is also effective in the treatment of allergies, inflammation, infection, epilepsy, depression, migraine, bipolar disorders, anxiety disorder, drug dependency and drug withdrawal syndromes.
Additional active cannabinoids include cannabidiol (CBD), an isomer of THC, which is a potent antioxidant and anti-inflammatory compound known to provide protection against acute and chronic neuro-degeneration; cannabigerol (CBG), found in high concentrations in hemp, which acts as a high affinity α2-adrenergic receptor agonist, moderate affinity 5-HT1A receptor antagonist and low affinity CB1 receptor antagonist, and possibly has anti-depressant activity; and cannabichromene (CBC), which possesses anti-inflammatory, anti-fungal and anti-viral properties. Many phytocannabinoids have therapeutic potential in a variety of diseases and may play a relevant role in plant defense as well as in pharmacology. Accordingly, biotechnological production of cannabinoids and cannabinoid-like compounds with therapeutic properties is of utmost importance. Thus, cannabinoids are considered to be promising agents for their beneficial effects in the treatment of various diseases.
Despite their known beneficial effects, therapeutic use of cannabinoids is hampered by the high costs associated with growing and maintaining the plants on a large scale and by the difficulty in obtaining high yields of cannabinoids. Extraction, isolation and purification of cannabinoids from plant tissue are particularly challenging as cannabinoids oxidize easily and are sensitive to light and heat.
Therefore, there is a need for developing methodologies that allow large-scale production of cannabinoids for therapeutic use.
According to a first aspect, there is provided an isolated DNA molecule comprising at least a first nucleic acid sequence encoding a first protein and at least a second nucleic acid sequence encoding a second protein, wherein the first protein and the second protein are derived from Helichrysum umbraculigerum and belong to an enzyme family selected from the group consisting of: acyl activating enzyme (AAE), polyketide synthase (PKS), polyketide cyclase (PKC), prenyltransferase (PT), and cannabichromenic acid synthase (CBCAS), and wherein the first protein and the second protein belong to different enzyme families.
According to another aspect, there is provided an artificial nucleic acid molecule comprising the isolated DNA molecule disclosed herein.
According to another aspect, there is provided a plasmid or an agrobacterium comprising the artificial nucleic acid molecule disclosed herein.
According to another aspect, there is provided a transgenic cell comprising: (a) the isolated DNA molecule of the invention; (b) the artificial nucleic acid molecule disclosed herein; (c) the plasmid or agrobacterium disclosed herein; or (d) any combination of (a) to (c).
According to another aspect, there is provided an extract derived from the transgenic cell disclosed herein, or any fraction thereof.
According to another aspect, there is provided a transgenic plant, a transgenic plant tissue or a plant part, comprising: (a) the isolated DNA molecule of the invention; (b) the artificial nucleic acid molecule disclosed herein; (c) the plasmid or agrobacterium disclosed herein; (d) the transgenic cell disclosed herein; or (e) any combination of (a) to (d).
According to another aspect, there is provided a composition comprising: (a) the isolated DNA molecule of the invention; (b) the artificial nucleic acid molecule disclosed herein; (c) the plasmid or agrobacterium disclosed herein; (d) the transgenic cell disclosed herein; (e) the extract disclosed herein; (f) the transgenic plant tissue or plant part disclosed herein; or (g) any combination of (a) to (f), and an acceptable carrier.
According to another aspect, there is provided a method for synthesizing a cannabinoid, a precursor thereof, or any combination thereof, comprising the steps: (a) providing a transgenic cell or a cell transfected with the isolated DNA molecule of the invention or the artificial nucleic acid molecule disclosed herein; and (b) culturing the transgenic cell or the transfected cell from step (a) such that at least the first protein and the second protein encoded by the artificial nucleic acid molecule are expressed, thereby synthesizing the cannabinoid, a precursor thereof, or any combination thereof.
According to another aspect, there is provided an extract of a transgenic cell or a transfected cell obtained according to the herein disclosed method.
According to another aspect, there is provided a composition comprising the extract disclosed herein, and an acceptable carrier.
In some embodiments, the isolated DNA molecule further comprises at least a third nucleic acid sequence encoding a third protein derived from H. umbraculigerum and belonging to an enzyme family selected from the group consisting of: AAE, PKS, PKC, PT, and CBCAS, and wherein the first protein, the second protein, and the third protein, belong to different enzyme families.
In some embodiments, the isolated DNA molecule further comprises at least a fourth nucleic acid sequence encoding a fourth protein derived from H. umbraculigerum and belonging to an enzyme family selected from the group consisting of: AAE, PKS, PKC, PT, and CBCAS, and wherein the first protein, the second protein, the third protein, and the fourth protein, belong to different enzyme families.
In some embodiments, the isolated DNA molecule further comprises at least a fifth nucleic acid sequence encoding a fifth protein derived from H. umbraculigerum and belonging to an enzyme family selected from the group consisting of: AAE, PKS, PKC, PT, and CBCAS, and wherein the first protein, the second protein, the third protein, the fourth protein, and the fifth protein, belong to different enzyme families.
In some embodiments, the isolated DNA further comprises a nucleic acid sequence encoding a protein derived from H. umbraculigerum and belonging to an enzyme family selected from the group consisting of: uridine diphosphate (UDP)-glycosyltransferase (UGT), alcohol acyltransferase (AAT), and both.
In some embodiments: (a) the AAE is encoded by a nucleic acid sequence having at least 89% homology to any one of SEQ ID Nos.: 1-11, and any combination thereof; (b) PKS is encoded by a nucleic acid sequence having at least 83% homology to any one of: SEQ ID Nos.: 23-26, and any combination thereof; (c) PKC is encoded by a nucleic acid sequence having at least 88% homology to any one of: SEQ ID Nos.: 31-38, and any combination thereof; (d) PT is encoded by a nucleic acid sequence having at least 91% homology to any one of: SEQ ID Nos.: 47-58, and any combination thereof; (e) CBCAS is encoded by a nucleic acid sequence having at least 82% homology to any one of: SEQ ID Nos.: 71-79, and any combination thereof; or (f) any combination of (a) to (e).
In some embodiments: (a) the UGT is encoded by a nucleic acid sequence having at least 87% homology to any one of: SEQ ID Nos.: 89-101, and any combination thereof; (b) the AAT is encoded by a nucleic acid sequence having at least 87% homology to any one of: SEQ ID Nos.: 115-129, and any combination thereof; or (c) both (a) and (b).
In some embodiments: (a) AAE comprises an amino acid sequence with at least 93% homology to any one of: SEQ ID Nos.: 12-22; (b) PKS comprises an amino acid sequence with at least 93% homology to any one of: SEQ ID Nos.: 27-30; (c) PKC comprises an amino acid sequence with at least 87% homology to any one of: SEQ ID Nos.: 39-46; (d) PT comprises an amino acid sequence with at least 92% homology to any one of: SEQ ID Nos.: 59-70; (e) CBCAS comprises an amino acid sequence with at least 86% homology to any one of: SEQ ID Nos.: 80-88; or (f) any combination of (a) to (e).
In some embodiments: (a) the UGT comprises an amino acid sequence with at least 90% homology to any one of: SEQ ID Nos.: 102-114; (b) the AAT comprises an amino acid sequence with at least 91% homology to any one of: SEQ ID Nos.: 130-144; or (c) both (a) and (b).
In some embodiments: (a) the AAE consists of an amino acid sequence of any one of SEQ ID Nos.: 12-22; (b) the PKS consists of an amino acid sequence of any one of SEQ ID Nos.: 27-30; (c) the PKC consists of an amino acid sequence of any one of SEQ ID Nos.: 39-46; (d) the PT consists of an amino acid sequence of any one of SEQ ID Nos.: 59-70; (e) the CBCAS consists of an amino acid sequence of any one of SEQ ID Nos.: 80-88; or (f) any combination of (a) to (e).
In some embodiments: (a) the UGT consists of an amino acid sequence of any one of: SEQ ID Nos.: 102-114; (b) the AAT consists of an amino acid sequence of any one of: SEQ ID Nos.: 130-144; or (c) both (a) and (b).
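For orientation, the SEQ ID ranges recited in the preceding embodiments can be collected into a single lookup. The following is a minimal sketch only: the ranges are transcribed from the embodiments above, whereas the names SEQ_ID_RANGES, SeqIdInfo, and classify_seq_id are hypothetical and do not appear in the sequence listing.

```python
from typing import NamedTuple, Optional


class SeqIdInfo(NamedTuple):
    family: str  # enzyme family abbreviation, e.g. "AAE"
    kind: str    # "nucleic acid" or "amino acid"


# (first SEQ ID, last SEQ ID, family, kind), transcribed from the
# ranges recited in the embodiments above. Names are illustrative.
SEQ_ID_RANGES = [
    (1, 11, "AAE", "nucleic acid"),
    (12, 22, "AAE", "amino acid"),
    (23, 26, "PKS", "nucleic acid"),
    (27, 30, "PKS", "amino acid"),
    (31, 38, "PKC", "nucleic acid"),
    (39, 46, "PKC", "amino acid"),
    (47, 58, "PT", "nucleic acid"),
    (59, 70, "PT", "amino acid"),
    (71, 79, "CBCAS", "nucleic acid"),
    (80, 88, "CBCAS", "amino acid"),
    (89, 101, "UGT", "nucleic acid"),
    (102, 114, "UGT", "amino acid"),
    (115, 129, "AAT", "nucleic acid"),
    (130, 144, "AAT", "amino acid"),
]


def classify_seq_id(seq_id: int) -> Optional[SeqIdInfo]:
    """Return the family and sequence type for a SEQ ID, or None if out of range."""
    for first, last, family, kind in SEQ_ID_RANGES:
        if first <= seq_id <= last:
            return SeqIdInfo(family, kind)
    return None
```

For example, classify_seq_id(47) reports a PT nucleic acid sequence and classify_seq_id(130) reports an AAT amino acid sequence.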
In some embodiments, the isolated DNA molecule comprises a plurality of isolated DNA molecule types.
In some embodiments, each type of the plurality of isolated DNA molecule types encodes a protein or a plurality of proteins belonging to a different enzyme family.
In some embodiments, the transgenic cell is any one of: a unicellular organism, a cell of a multicellular organism, and a cell in a culture.
In some embodiments, the unicellular organism comprises a fungus or a bacterium. In some embodiments, the fungus is a yeast cell.
In some embodiments, the transgenic cell is a transgenic Cannabis sativa cell.
In some embodiments, the extract comprises a cannabinoid, a precursor thereof, or a combination thereof.
In some embodiments, the precursor is selected from the group consisting of: acyl coenzyme A (CoA), a polyketide, a resorcinoid precursor, and any combination thereof.
In some embodiments, the acyl is C1-C8 alkyl.
In some embodiments, the acyl CoA is hexanoyl CoA.
In some embodiments, the polyketide is a tetraketide.
In some embodiments, the tetraketide is a linear tetraketide.
In some embodiments, the resorcinoid precursor is olivetolic acid.
In some embodiments, the cannabinoid is cannabigerolic acid (CBGA), CBCA, or both.
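The precursors and cannabinoids listed above follow the order of a stepwise pathway. The sketch below pairs each enzyme family with the compound it is conventionally understood to produce in cannabinoid biosynthesis; that pairing is an assumption of this illustration and is not recited in the embodiments above. The names PATHWAY and furthest_product are hypothetical, and the model ignores any exogenously supplied intermediates.

```python
# Assumed pairing of enzyme family -> product, in pathway order.
# The compounds are those named in the embodiments above; the
# family-to-product assignment is an illustrative assumption.
PATHWAY = [
    ("AAE", "hexanoyl CoA"),
    ("PKS", "linear tetraketide"),
    ("PKC", "olivetolic acid"),
    ("PT", "CBGA"),
    ("CBCAS", "CBCA"),
]


def furthest_product(expressed_families):
    """Return the last compound reachable when every earlier step is expressed.

    Returns None when the first step (AAE) is absent. Feeding of
    intermediates from outside the cell is not modeled.
    """
    expressed = set(expressed_families)
    product = None
    for family, compound in PATHWAY:
        if family not in expressed:
            break
        product = compound
    return product
```

Under these assumptions, a cell expressing AAE, PKS, and PKC would be expected to reach olivetolic acid, while adding PT and CBCAS would extend the chain to CBGA and CBCA.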
In some embodiments, the artificial nucleic acid molecule is an expression vector.
In some embodiments, the transgenic cell or the transfected cell is a prokaryote cell or a eukaryote cell.
In some embodiments, the transgenic cell or the transfected cell is a C. sativa cell.
In some embodiments, the method further comprises a step preceding step (a), comprising introducing or transfecting a cell with the artificial nucleic acid molecule, thereby obtaining the transgenic cell or the transfected cell.
In some embodiments, the method further comprises a step of extracting the transgenic cell or the transfected cell, thereby obtaining an extract from the transgenic cell or the transfected cell.
In some embodiments, the extract comprises a cannabinoid, a precursor thereof, or any combination thereof.
Unless otherwise defined, all technical and/or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the invention pertains. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of embodiments of the invention, exemplary methods and/or materials are described below. In case of conflict, the patent specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and are not intended to be necessarily limiting.
Further embodiments and the full scope of applicability of the present invention will become apparent from the detailed description given hereinafter. However, it should be understood that the detailed description and specific examples, while indicating preferred embodiments of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this detailed description.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
The present invention, in some embodiments, is directed to a DNA molecule comprising at least a first nucleic acid sequence encoding a first protein and at least a second nucleic acid sequence encoding a second protein, wherein the first protein and the second protein are derived from Helichrysum umbraculigerum, including methods of using same.
In some embodiments, any one of the first protein and the second protein belongs to an enzyme family selected from: acyl activating enzyme (AAE), polyketide synthase (PKS), polyketide cyclase (PKC), prenyltransferase (PT), cannabichromenic acid synthase (CBCAS), uridine diphosphate (UDP)-glycosyltransferase (UGT), and alcohol acyltransferase (AAT).
In some embodiments, the DNA molecule further comprises at least a third nucleic acid sequence encoding a third protein derived from H. umbraculigerum and belonging to an enzyme family selected from: AAE, PKS, PKC, PT, CBCAS, UGT, and AAT.
In some embodiments, the DNA molecule further comprises at least a fourth nucleic acid sequence encoding a fourth protein derived from H. umbraculigerum and belonging to an enzyme family selected from: AAE, PKS, PKC, PT, CBCAS, UGT, and AAT.
In some embodiments, the DNA molecule further comprises at least a fifth nucleic acid sequence encoding a fifth protein derived from H. umbraculigerum and belonging to an enzyme family selected from: AAE, PKS, PKC, PT, CBCAS, UGT, and AAT.
In some embodiments, the DNA molecule further comprises at least a sixth nucleic acid sequence encoding a sixth protein derived from H. umbraculigerum and belonging to an enzyme family selected from: AAE, PKS, PKC, PT, CBCAS, UGT, and AAT.
In some embodiments, the DNA molecule further comprises at least a seventh nucleic acid sequence encoding a seventh protein derived from H. umbraculigerum and belonging to an enzyme family selected from: AAE, PKS, PKC, PT, CBCAS, UGT, and AAT.
In some embodiments, the first protein and the second protein belong to different enzyme families.
In some embodiments, the first protein, the second protein, and the third protein belong to different enzyme families.
In some embodiments, the first protein, the second protein, the third protein, and the fourth protein belong to different enzyme families.
In some embodiments, the first protein, the second protein, the third protein, the fourth protein, and the fifth protein belong to different enzyme families.
In some embodiments, the first protein, the second protein, the third protein, the fourth protein, the fifth protein, and the sixth protein belong to different enzyme families.
In some embodiments, the first protein, the second protein, the third protein, the fourth protein, the fifth protein, the sixth protein, and the seventh protein belong to different enzyme families.
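The requirement that the encoded proteins belong to different enzyme families amounts to a uniqueness check over the families present in a construct. A minimal sketch follows, assuming each encoded protein has already been assigned one of the seven family abbreviations used herein; the names ENZYME_FAMILIES and families_are_distinct are hypothetical.

```python
from typing import Iterable

# The seven enzyme families named in the embodiments above.
ENZYME_FAMILIES = {"AAE", "PKS", "PKC", "PT", "CBCAS", "UGT", "AAT"}


def families_are_distinct(families: Iterable[str]) -> bool:
    """True if at least two proteins are given and no family repeats.

    `families` holds one family abbreviation per encoded protein
    (first protein, second protein, and so on).
    """
    families = list(families)
    unknown = [f for f in families if f not in ENZYME_FAMILIES]
    if unknown:
        raise ValueError(f"unrecognized enzyme family: {unknown}")
    return len(families) >= 2 and len(set(families)) == len(families)
```

With seven families available, at most seven proteins can satisfy the check, which matches the first-through-seventh protein embodiments above.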
According to some embodiments: (a) AAE is encoded by a nucleic acid sequence having at least 89% homology or identity to any one of SEQ ID Nos.: 1-11; (b) PKS is encoded by a nucleic acid sequence having at least 83% homology or identity to any one of SEQ ID Nos.: 23-26; (c) PKC is encoded by a nucleic acid sequence having at least 88% homology or identity to any one of SEQ ID Nos.: 31-38; (d) PT is encoded by a nucleic acid sequence having at least 91% homology or identity to any one of SEQ ID Nos.: 47-58; (e) CBCAS is encoded by a nucleic acid sequence having at least 82% homology or identity to any one of SEQ ID Nos.: 71-79; or (f) any combination of (a) to (e).
In some embodiments, the DNA molecule further comprises a nucleic acid sequence derived from Helichrysum umbraculigerum and encoding one or more protein(s) or enzyme(s) belonging to the uridine diphosphate (UDP)-glycosyltransferase (UGT) family, the alcohol acyltransferase (AAT) family, or both.
In some embodiments: (a) UGT is encoded by a nucleic acid sequence having at least 87% homology to any one of: SEQ ID Nos.: 89-101, and any combination thereof; (b) AAT is encoded by a nucleic acid sequence having at least 87% homology to any one of: SEQ ID Nos.: 115-129, and any combination thereof; or (c) both (a) and (b).
In some embodiments, the DNA molecule comprises at least two nucleic acid sequences encoding at least two enzymes, wherein each enzyme belongs to a different family, wherein the at least two families are selected from: AAE, PKS, PKC, PT, CBCAS, UGT, and AAT.
In some embodiments, the DNA molecule is an isolated DNA molecule. In some embodiments, the DNA molecule is a complementary DNA (cDNA) molecule.
As used herein, the term “DNA molecule” refers to a polynucleotide comprising or consisting of deoxyribonucleotides.
As used herein, the terms “isolated polynucleotide” and “isolated DNA molecule” refer to a nucleic acid molecule that is essentially free from contaminating cellular components, such as carbohydrate, lipid, or other proteinaceous impurities associated with the nucleic acid in nature. Typically, a preparation of isolated DNA or RNA contains the nucleic acid in a highly purified form, e.g., at least about 80% pure, at least about 90% pure, at least about 95% pure, greater than 95% pure, or greater than 99% pure. In some embodiments, the isolated polynucleotide is any one of DNA, RNA, and cDNA. In some embodiments, the isolated polynucleotide is a synthesized polynucleotide. Synthesis of polynucleotides is well known in the art and may be performed, for example, by ligating multiple nucleic acid molecules together or by covalently linking them via primer linkers.
The term “nucleic acid” is well known in the art of molecular biology. A “nucleic acid” as used herein will generally refer to any molecule (e.g., a strand) of DNA, RNA or a derivative or analog thereof, comprising nucleotides. Nucleotides are composed of nucleosides and phosphate groups. The nitrogenous bases of nucleosides include, for example, naturally occurring purine or pyrimidine bases as found in DNA (e.g., an adenine “A,” a guanine “G,” a thymine “T” or a cytosine “C”) or RNA (e.g., an A, a G, a uracil “U” or a C).
The term “nucleic acid molecule” includes but is not limited to single-stranded RNA (ssRNA), double-stranded RNA (dsRNA), single-stranded DNA (ssDNA), double-stranded DNA (dsDNA), small RNAs, circular nucleic acids, fragments of genomic DNA or RNA, degraded nucleic acids, amplification products, modified nucleic acids, plasmid or organellar nucleic acids, and artificial nucleic acids such as oligonucleotides.
In some embodiments, the DNA molecule comprises the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 89%, at least 92%, at least 95%, or at least 97% homology or identity to SEQ ID NO: 1, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 89% to 95%, 90% to 97%, 95% to 99%, or 90% to 100% homology or identity to SEQ ID NO: 1. Each possibility represents a separate embodiment of the invention.
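By way of illustration only, a percent identity figure of the kind recited above can be computed from a pairwise alignment. The present text does not specify an alignment algorithm or an identity convention, so the sketch below makes its own assumptions: the two sequences are already aligned to equal length with "-" marking gaps, and identity is the number of matching columns divided by the number of alignment columns. The function names and the toy sequences are hypothetical and are not taken from the sequence listing.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Assumed convention: matching columns / alignment columns * 100,
    ignoring columns in which both sequences carry a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    # A gap never counts as a match, even against another gap.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, minimum_percent: float) -> bool:
    """True if the aligned pair reaches the stated minimum percent identity."""
    return percent_identity(aligned_a, aligned_b) >= minimum_percent
```

For two toy ten-nucleotide sequences differing at a single position, percent_identity gives 90.0, which would satisfy an "at least 89%" threshold but not an "at least 92%" threshold. Other conventions (for example, dividing by the length of the shorter sequence) give different values, so the convention should be stated alongside any reported figure.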
In some embodiments, the DNA molecule comprises the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 79%, at least 83%, at least 87%, at least 89%, at least 92%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 2, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 79% to 85%, 80% to 92%, 82% to 99%, or 80% to 100% homology or identity to SEQ ID NO: 2. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 86%, at least 88%, at least 90%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 3, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 86% to 94%, 88% to 97%, 86% to 100%, or 92% to 99% homology or identity to SEQ ID NO: 3. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 88%, at least 90%, at least 92%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 4, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 88% to 95%, 89% to 99%, 91% to 98%, or 88% to 100% homology or identity to SEQ ID NO: 4. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 88%, at least 90%, at least 92%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 5, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 88% to 95%, 89% to 99%, 91% to 98%, or 88% to 100% homology or identity to SEQ ID NO: 5. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 89%, at least 90%, at least 92%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 6, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 89% to 95%, 90% to 99%, 91% to 98%, or 89% to 100% homology or identity to SEQ ID NO: 6. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 85%, at least 87%, at least 90%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 7, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 85% to 94%, 88% to 97%, 85% to 100%, or 92% to 99% homology or identity to SEQ ID NO: 7. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 84%, at least 87%, at least 90%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 8, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 84% to 94%, 88% to 97%, 84% to 100%, or 92% to 99% homology or identity to SEQ ID NO: 8. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 88%, at least 90%, at least 92%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 9, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 88% to 95%, 89% to 99%, 91% to 98%, or 88% to 100% homology or identity to SEQ ID NO: 9. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 89%, at least 92%, at least 95%, or at least 97% homology or identity to SEQ ID NO: 10, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 89% to 95%, 90% to 97%, 95% to 99%, or 90% to 100% homology or identity to SEQ ID NO: 10. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 89%, at least 92%, at least 95%, or at least 97% homology or identity to SEQ ID NO: 11, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 89% to 95%, 90% to 97%, 95% to 99%, or 90% to 100% homology or identity to SEQ ID NO: 11. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 83%, at least 85%, at least 87%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 23, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 83% to 100%, 88% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 23. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 83%, at least 85%, at least 87%, at least 89%, at least 92%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 24, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 83% to 100%, 87% to 100%, 90% to 100%, or 93% to 100% homology or identity to SEQ ID NO: 24. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 83%, at least 87%, at least 89%, at least 90%, at least 92%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 25, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 83% to 100%, 88% to 100%, 93% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 25. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 82%, at least 85%, at least 89%, at least 92%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 26, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 82% to 100%, 86% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 26. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 72%, at least 75%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 31, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 72% to 95%, 72% to 100%, 75% to 99%, or 80% to 100% homology or identity to SEQ ID NO: 31. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 32, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 50% to 95%, 55% to 98%, 60% to 99%, or 50% to 100% homology or identity to SEQ ID NO: 32. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 67%, at least 72%, at least 78%, at least 85%, at least 92%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 33, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 67% to 95%, 70% to 98%, 75% to 99%, or 67% to 100% homology or identity to SEQ ID NO: 33. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 74%, at least 78%, at least 85%, at least 89%, at least 92%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 34, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 74% to 95%, 78% to 98%, 80% to 99%, or 75% to 100% homology or identity to SEQ ID NO: 34. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 69%, at least 75%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 35, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 69% to 95%, 70% to 100%, 80% to 99%, or 68% to 100% homology or identity to SEQ ID NO: 35. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 73%, at least 75%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 36, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 73% to 95%, 73% to 100%, 80% to 99%, or 80% to 100% homology or identity to SEQ ID NO: 36. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 69%, at least 78%, at least 85%, at least 89%, at least 92%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 37, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 69% to 95%, 70% to 98%, 71% to 99%, or 69% to 100% homology or identity to SEQ ID NO: 37. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 88%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% homology or identity to SEQ ID NO: 38, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 88% to 95%, 88% to 98%, 89% to 99%, or 88% to 100% homology or identity to SEQ ID NO: 38. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 75%, at least 79%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 47, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 75% to 100%, 80% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 47. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 48, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 80% to 100%, 85% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 48. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 75%, at least 79%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 49, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 75% to 100%, 80% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 49. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 91%, at least 92%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 50, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 91% to 100%, 93% to 100%, 95% to 100%, or 97% to 100% homology or identity to SEQ ID NO: 50. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 91%, at least 92%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 51, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 91% to 100%, 93% to 100%, 95% to 100%, or 97% to 100% homology or identity to SEQ ID NO: 51. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 90%, at least 92%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 52, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 90% to 100%, 92% to 100%, 95% to 100%, or 97% to 100% homology or identity to SEQ ID NO: 52. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 77%, at least 79%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 53, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 77% to 100%, 85% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 53. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 89%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% homology or identity to SEQ ID NO: 54, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 89% to 100%, 92% to 100%, 94% to 100%, or 97% to 100% homology or identity to SEQ ID NO: 54. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 76%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, or at least 99% homology or identity to SEQ ID NO: 55, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 76% to 100%, 83% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 55. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, or at least 99% homology or identity to SEQ ID NO: 56, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 75% to 100%, 80% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 56. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 76%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, or at least 99% homology or identity to SEQ ID NO: 57, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 76% to 100%, 85% to 100%, 90% to 100%, or 96% to 100% homology or identity to SEQ ID NO: 57. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 77%, at least 79%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 58, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 77% to 100%, 85% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 58. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 68%, at least 75%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 71, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 68% to 95%, 75% to 100%, 72% to 99%, or 68% to 100% homology or identity to SEQ ID NO: 71. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 71%, at least 77%, at least 85%, at least 93%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 72, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 71% to 95%, 75% to 98%, 80% to 99%, or 71% to 100% homology or identity to SEQ ID NO: 72. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 69%, at least 75%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 73, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 69% to 95%, 75% to 100%, 72% to 99%, or 69% to 100% homology or identity to SEQ ID NO: 73. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 79%, at least 85%, at least 92%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 74, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 79% to 98%, 80% to 99%, 82% to 99%, or 79% to 100% homology or identity to SEQ ID NO: 74. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 82%, at least 87%, at least 92%, at least 96%, or at least 99% homology or identity to SEQ ID NO: 75, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 82% to 98%, 83% to 99%, 85% to 99%, or 82% to 100% homology or identity to SEQ ID NO: 75. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 80%, at least 87%, at least 93%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 76, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 80% to 98%, 81% to 99%, 85% to 99%, or 80% to 100% homology or identity to SEQ ID NO: 76. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 79%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 77, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 79% to 95%, 82% to 97%, 81% to 98%, or 79% to 100% homology or identity to SEQ ID NO: 77. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 80%, at least 85%, at least 89%, at least 92%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 78, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 80% to 95%, 85% to 98%, 89% to 99%, or 80% to 100% homology or identity to SEQ ID NO: 78. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 79%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 79, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 79% to 95%, 82% to 97%, 81% to 98%, or 79% to 100% homology or identity to SEQ ID NO: 79. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 77%, at least 79%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 89, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 77% to 95%, 78% to 100%, 79% to 99%, or 77% to 100% homology or identity to SEQ ID NO: 89. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 76%, at least 77%, at least 85%, at least 93%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 90, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 76% to 95%, 77% to 98%, 80% to 99%, or 76% to 100% homology or identity to SEQ ID NO: 90. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 78%, at least 80%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 91, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 79% to 95%, 78% to 100%, 80% to 99%, or 79% to 100% homology or identity to SEQ ID NO: 91. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 87%, at least 92%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 92, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 87% to 100%, 88% to 99%, 89% to 99%, or 87% to 100% homology or identity to SEQ ID NO: 92. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 87%, at least 92%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 93, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 87% to 100%, 88% to 99%, 89% to 99%, or 87% to 100% homology or identity to SEQ ID NO: 93. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 80%, at least 87%, at least 93%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 94, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 80% to 98%, 81% to 99%, 85% to 99%, or 80% to 100% homology or identity to SEQ ID NO: 94. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 77%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 95, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 77% to 95%, 82% to 97%, 81% to 98%, or 77% to 100% homology or identity to SEQ ID NO: 95. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 82%, at least 85%, at least 89%, at least 92%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 96, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 82% to 95%, 83% to 98%, 82% to 99%, or 82% to 100% homology or identity to SEQ ID NO: 96. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 79%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 97, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 79% to 95%, 82% to 97%, 81% to 98%, or 79% to 100% homology or identity to SEQ ID NO: 97. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 78%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 98, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 78% to 95%, 82% to 97%, 81% to 98%, or 78% to 100% homology or identity to SEQ ID NO: 98. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 82%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 99, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 82% to 95%, 82% to 97%, 83% to 98%, or 82% to 100% homology or identity to SEQ ID NO: 99. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 74%, at least 80%, at least 85%, at least 87%, at least 93%, or at least 99% homology or identity to SEQ ID NO: 100, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 74% to 95%, 75% to 97%, 76% to 98%, or 74% to 100% homology or identity to SEQ ID NO: 100. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 80%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 101, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 80% to 95%, 82% to 97%, 81% to 98%, or 80% to 100% homology or identity to SEQ ID NO: 101. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 84%, at least 87%, at least 90%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 115, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 84% to 100%, 88% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 115. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 77%, at least 85%, at least 93%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 116, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 77% to 100%, 80% to 100%, 85% to 100%, or 93% to 100% homology or identity to SEQ ID NO: 116. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 87%, at least 90%, at least 93%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 117, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 87% to 100%, 90% to 100%, 93% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 117. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 82%, at least 90%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 118, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 82% to 100%, 85% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 118. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 74%, at least 80%, at least 85%, or at least 95% homology or identity to SEQ ID NO: 119, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 74% to 100%, 80% to 100%, 87% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 119. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 79%, at least 87%, at least 93%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 120, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 79% to 100%, 85% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 120. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 82%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 121, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 82% to 100%, 85% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 121. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 83%, at least 85%, at least 89%, at least 92%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 122, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 83% to 100%, 88% to 100%, 92% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 122. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 77%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 123, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 77% to 100%, 82% to 100%, 87% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 123. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 84%, at least 89%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 124, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 84% to 100%, 88% to 100%, 93% to 100%, or 97% to 100% homology or identity to SEQ ID NO: 124. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 82%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 125, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 82% to 100%, 85% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 125. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 72%, at least 80%, at least 85%, at least 87%, at least 93%, or at least 99% homology or identity to SEQ ID NO: 126, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 72% to 100%, 79% to 100%, 86% to 100%, or 91% to 100% homology or identity to SEQ ID NO: 126. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 79%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 127, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 79% to 100%, 85% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 127. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 82%, at least 85%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 128, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 82% to 100%, 88% to 100%, 93% to 100%, or 97% to 100% homology or identity to SEQ ID NO: 128. Each possibility represents a separate embodiment of the invention.
In some embodiments, the DNA molecule comprises or consists of the nucleic acid sequence:
In some embodiments, the DNA molecule comprises a nucleic acid sequence with at least 87%, at least 91%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 129, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the DNA molecule comprises a nucleic acid sequence with 87% to 100%, 90% to 100%, 94% to 100%, or 97% to 100% homology or identity to SEQ ID NO: 129. Each possibility represents a separate embodiment of the invention.
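By way of non-limiting illustration only, the percent-identity thresholds recited above may be checked computationally. The following Python sketch assumes two sequences that have already been aligned to equal length with "-" as the gap character, and counts identity over all aligned columns other than columns in which both sequences carry a gap. The function names, the choice of denominator, and the toy sequences are assumptions made for this example and are not taken from the present disclosure; other conventions for calculating identity exist and may yield different values.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences.

    Assumption: both inputs are already aligned (equal length, '-' for gaps).
    Columns where both sequences have a gap are skipped; a gap opposite a
    residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # column carries no information
        compared += 1
        if x == y:
            matches += 1

    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, minimum_percent: float) -> bool:
    """True if the percent identity is at least `minimum_percent`."""
    return percent_identity(aligned_a, aligned_b) >= minimum_percent


if __name__ == "__main__":
    # Hypothetical toy sequences, not any SEQ ID NO of the disclosure.
    reference = "ATGGCTAAGT"
    candidate = "ATGGCTAAGA"
    print(percent_identity(reference, candidate))
    print(meets_threshold(reference, candidate, 85.0))
```

In practice, the alignment itself would typically be produced by a dedicated alignment tool before a calculation of this kind is applied.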
In some embodiments, the DNA molecule comprises a plurality of nucleic acid sequences. In some embodiments, the polynucleotide comprises a plurality of types of polynucleotides.
As used herein, the term “plurality” comprises any integer equal to or greater than 2.
In some embodiments, the plurality of nucleic acid sequences encodes proteins of different enzymatic functions or families as described herein. In some embodiments, the plurality of nucleic acid sequences encodes at least two proteins of the same enzymatic function or family as described herein. In some embodiments, the plurality of nucleic acid sequences encodes a plurality of proteins of a plurality of different enzymatic functions or families as described herein.
In some embodiments, the DNA molecule encodes a protein characterized by acyl activating enzymatic (AAE) activity. In some embodiments, the DNA molecule encodes an AAE protein. In some embodiments, the AAE is an AAE derived from Helichrysum umbraculigerum. In some embodiments, the DNA molecule encoding a protein characterized by acyl activating enzymatic (AAE) activity comprises a nucleic acid sequence set forth in SEQ ID Nos.: 1-11.
As used herein, the terms “acyl activating enzyme” and “AAE” are interchangeable, and refer to any peptide, polypeptide, or a protein, capable of catalyzing the activation of a carboxylic acid. In some embodiments, AAE activity comprises forming or formation of a thioester bond. In some embodiments, AAE activity comprises coupling a carboxyl group to an amine group. In some embodiments, AAE activity comprises coupling a carboxyl group to an alcohol. In some embodiments, the AAE is an acid-thiol ligase.
In some embodiments, the DNA molecule encodes a protein characterized by polyketide synthesizing activity. In some embodiments, the DNA molecule encodes a protein being a polyketide synthase (PKS). In some embodiments, the PKS is a PKS derived from Helichrysum umbraculigerum. As used herein, the terms “polyketide synthase” and “PKS” encompass any enzyme derived from H. umbraculigerum and having or characterized by being a functional analog of the “olivetol synthase” or “OLS” of Cannabis sativa. In some embodiments, the DNA molecule encoding a protein characterized by polyketide synthesizing activity comprises a nucleic acid sequence set forth in SEQ ID Nos.: 23-26.
As used herein, the terms “polyketide synthase” and “PKS” are interchangeable, and refer to any peptide, polypeptide, or a protein, capable of catalyzing the elongation of a ketide or a polyketide chain. In some embodiments, PKS activity comprises transacylation. In some embodiments, PKS activity comprises Claisen condensation. In some embodiments, PKS activity comprises reduction of a β-keto group to a β-hydroxy group. In some embodiments, PKS activity comprises H2O splitting, thereby obtaining, providing, or resulting in an α,β-unsaturated alkene. In some embodiments, PKS activity comprises reducing an α,β-double bond to a single bond. In some embodiments, PKS activity comprises hydrolyzing a polyketide chain or a completed polyketide chain from an acyl carrier protein domain of the PKS. In some embodiments, PKS activity comprises polymerizing and/or ligating a diketide substrate into a polyketide chain. In some embodiments, PKS activity comprises elongating a diketide to a polyketide chain. In some embodiments, PKS activity comprises elongating a polyketide chain.
In some embodiments, the DNA molecule encodes a protein characterized by polyketide cyclizing activity. In some embodiments, the DNA molecule encodes a protein being a polyketide cyclase (PKC). In some embodiments, the PKC is a PKC derived from Helichrysum umbraculigerum. As used herein, the terms “polyketide cyclase” and “PKC” encompass any enzyme derived from H. umbraculigerum and having or characterized by being a functional analog of the “olivetolic acid cyclase” or “OAC” of Cannabis sativa. In some embodiments, the DNA molecule encoding a protein characterized by polyketide cyclizing activity comprises a nucleic acid sequence set forth in SEQ ID Nos.: 31-38.
As used herein, the terms “polyketide cyclase” and “PKC” are interchangeable, and refer to any peptide, polypeptide, or a protein, capable of folding and/or cyclizing a polyketide. In some embodiments, PKC activity comprises an action of a cyclase subunit. In some embodiments, PKC activity comprises site-specific keto-reductase activity.
In some embodiments, the DNA molecule encodes a protein characterized by prenyl transferring activity. In some embodiments, the DNA molecule encodes a protein being a prenyltransferase (PT). In some embodiments, the PT is a PT derived from Helichrysum umbraculigerum. As used herein, the terms “prenyltransferase” and “PT” encompass any enzyme derived from H. umbraculigerum and having or characterized by being a functional analog of the “geranylpyrophosphate: olivetolate geranyltransferase” or “GOT” of Cannabis sativa. In some embodiments, the GOT is GOT4 or CsGOT4. In some embodiments, the DNA molecule encoding a protein characterized by prenyl transferring activity comprises a nucleic acid sequence set forth in SEQ ID Nos.: 47-58.
As used herein, the terms “prenyltransferase” and “PT” are interchangeable, and refer to any peptide, polypeptide, or a protein, capable of transferring an allylic prenyl group to an acceptor molecule. In some embodiments, PT activity comprises cyclization. In some embodiments, PT activity comprises transferring an allylic prenyl group to an acceptor molecule.
In some embodiments, the DNA molecule encodes a protein characterized by cannabigerolic acid (CBGA) cyclization or cyclizing activity. In some embodiments, the cyclizing activity comprises cyclization of CBGA to CBCA. In some embodiments, the polynucleotide encodes a protein capable of cyclizing or cyclization of CBGA to CBCA. In some embodiments, the DNA molecule encodes a protein characterized by being capable of synthesizing CBCA or being a CBCA synthase (CBCAS). In some embodiments, the CBCAS is a CBCAS derived from Helichrysum umbraculigerum. As used herein, the terms “CBCA synthase” and “CBCAS” encompass any enzyme derived from H. umbraculigerum and having or characterized by being a functional analog of the CBCA synthase of Cannabis sativa (e.g., CsCBCAS). In some embodiments, the DNA molecule encoding a protein characterized by CBGA cyclization or cyclizing activity comprises a nucleic acid sequence set forth in SEQ ID Nos.: 71-79.
In some embodiments, the polynucleotide encodes a protein characterized by catalytic activity of transferring a glucuronic acid component of UDP-glucuronic acid to a small hydrophobic molecule (e.g., a UGT). In some embodiments, the polynucleotide encodes a protein characterized by glycosyltransferase catalytic activity. In some embodiments, the polynucleotide encodes a protein characterized by being capable of transferring a glucuronic acid component of UDP-glucuronic acid to a cannabinoid or a precursor thereof. In some embodiments, the polynucleotide encodes a protein characterized by having a catalytic activity of glycosylating a cannabinoid or a precursor thereof. In some embodiments, the polynucleotide encodes a UGT enzyme.
In some embodiments, the UGT is a UGT derived from Helichrysum umbraculigerum. As used herein, the term “UGT” encompasses any enzyme derived from H. umbraculigerum and having or characterized by having an activity as described herein.
In some embodiments, the UGT protein is encoded by a DNA molecule comprising SEQ ID Nos.: 89-101.
In some embodiments, the DNA molecule encodes a protein characterized by being capable of acting on an acyl group. In some embodiments, the DNA molecule encodes a protein characterized by catalytic activity of transferring an acyl group from a donor molecule to an acceptor molecule. In some embodiments, the acceptor molecule is a hydrophobic molecule, a small molecule, or both. In some embodiments, the donor molecule comprises an acyl group, CoA, or both. In some embodiments, the DNA molecule encodes a protein characterized by acyltransferase catalytic activity. In some embodiments, the DNA molecule encodes a protein characterized by being capable of transferring an acyl group to a cannabinoid. In some embodiments, the DNA molecule encodes a protein characterized by having a catalytic activity of acylating a cannabinoid. In some embodiments, the acyltransferase (AT) is an alcohol acyltransferase (AAT). In some embodiments, the DNA molecule encodes an AT enzyme. In some embodiments, the polynucleotide encodes an AAT enzyme.
In some embodiments, the AAT is an AAT derived from Helichrysum umbraculigerum. As used herein, the term “AAT” encompasses any enzyme derived from H. umbraculigerum and having or characterized by having an activity as described herein.
In some embodiments, the AAT protein is encoded by a DNA molecule comprising or consisting of SEQ ID Nos.: 115-129.
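By way of non-limiting illustration only, the nucleic acid SEQ ID NO ranges recited above for each enzyme family can be collected into a simple lookup, as in the following Python sketch. The ranges are those stated herein for the DNA molecules; the names SEQ_ID_RANGES and family_for_seq_id are hypothetical and are introduced for illustration only.

```python
from typing import Optional

# Nucleic acid SEQ ID NO ranges as recited in the text for each enzyme family.
# Each tuple is (first, last), inclusive.
SEQ_ID_RANGES = {
    "AAE": (1, 11),
    "PKS": (23, 26),
    "PKC": (31, 38),
    "PT": (47, 58),
    "CBCAS": (71, 79),
    "UGT": (89, 101),
    "AAT": (115, 129),
}


def family_for_seq_id(seq_id: int) -> Optional[str]:
    """Return the enzyme family whose nucleic acid range contains seq_id.

    Returns None for identifiers outside the ranges listed above, for example
    identifiers that the text assigns to amino acid sequences.
    """
    for family, (first, last) in SEQ_ID_RANGES.items():
        if first <= seq_id <= last:
            return family
    return None
```

In this sketch, family_for_seq_id(126) resolves to the AAT family, whereas an identifier such as 12, which is recited herein as an amino acid sequence, is not matched.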
In some embodiments, the artificial vector comprises a plasmid. In some embodiments, the artificial vector comprises or is an Agrobacterium comprising the artificial nucleic acid molecule. In some embodiments, the artificial vector is an expression vector. In some embodiments, the artificial vector is a plant expression vector. In some embodiments, the artificial vector is for use in expressing any one of: AAE, PKS, PKC, PT, or CBCAS encoding nucleic acid sequence as disclosed herein, or any combination thereof. In some embodiments, the artificial vector is further for use in expressing UGT, AAT, or both. In some embodiments, the artificial vector is for use in heterologous expression of any one of: AAE, PKS, PKC, PT, or CBCAS encoding nucleic acid sequence as disclosed herein, or any combination thereof, in a cell, a tissue, or an organism. In some embodiments, the artificial vector is further for use in heterologous expression of UGT, AAT, or both in a cell, a tissue, or an organism. In some embodiments, the artificial vector is for use in producing or the production of an acyl-coenzyme A (acyl-CoA), a polyketide, a cannabinoid, e.g., CBGA, CBCA, any precursor thereof, or any combination thereof, in a cell, a tissue, or an organism. In some embodiments, the artificial vector is further for use in producing or the production of a modified acyl-coenzyme A (acyl-CoA), a polyketide, a cannabinoid, e.g., CBGA, CBCA, any precursor thereof, or any combination thereof, in a cell, a tissue, or an organism, wherein the modified compound further comprises an acyl group, a glycan (e.g., is glycosylated), or both.
Expressing a polynucleotide within a cell is well known to one skilled in the art. It can be carried out by, among many methods, transfection, viral infection, or direct alteration of the cell's genome. In some embodiments, the DNA molecule is in an expression vector such as a plasmid or a viral vector. A vector nucleic acid sequence generally contains at least an origin of replication for propagation in a cell and optionally additional elements, such as a heterologous polynucleotide sequence, an expression control element (e.g., a promoter, an enhancer), a selectable marker (e.g., antibiotic resistance), and a poly-Adenine sequence.
The vector may be a DNA plasmid delivered via non-viral methods or via viral methods. The viral vector may be a retroviral vector, a herpesviral vector, an adenoviral vector, an adeno-associated viral vector, a virgaviridae viral vector, or a poxviral vector. The barley stripe mosaic virus (BSMV), the tobacco rattle virus and the cabbage leaf curl geminivirus (CbLCV) may also be used. The promoter may be active in plant cells. The promoter may be a viral promoter.
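By way of non-limiting illustration only, the vector elements listed above can be recorded in a small data structure in which the origin of replication is mandatory and the remaining elements are optional, as in the following Python sketch. The class name VectorDesign, its field names, and the example element labels are hypothetical and do not denote any particular vector of the invention.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VectorDesign:
    """Illustrative record of the elements a vector sequence may contain."""

    origin_of_replication: str                  # required for propagation in a cell
    heterologous_sequences: List[str] = field(default_factory=list)
    promoter: Optional[str] = None              # expression control element
    enhancer: Optional[str] = None              # expression control element
    selectable_marker: Optional[str] = None     # e.g., antibiotic resistance
    poly_a_sequence: Optional[str] = None

    def __post_init__(self) -> None:
        # The text describes the origin of replication as the one element
        # a vector generally contains, so an empty value is rejected here.
        if not self.origin_of_replication:
            raise ValueError("a vector needs an origin of replication")

    def optional_elements_present(self) -> List[str]:
        """Names of the optional elements that have been specified."""
        present = []
        if self.heterologous_sequences:
            present.append("heterologous_sequences")
        for name in ("promoter", "enhancer", "selectable_marker", "poly_a_sequence"):
            if getattr(self, name) is not None:
                present.append(name)
        return present
```

In this sketch, a design such as VectorDesign(origin_of_replication="example-ori", promoter="example-promoter") reports only the promoter as an optional element that is present.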
In some embodiments, the DNA molecule as disclosed herein is operably linked to a promoter. The term “operably linked” is intended to mean that the nucleotide sequence of interest is linked to the regulatory element or elements in a manner that allows for expression of the nucleotide sequence (e.g., in an in vitro transcription/translation system or in a host cell when the vector is introduced into the host cell). In some embodiments, the promoter is operably linked to the polynucleotide of the invention. In some embodiments, the promoter is a heterologous promoter. In some embodiments, the promoter is the endogenous promoter.
In some embodiments, the vector is introduced into the cell by standard methods including electroporation (e.g., as described in Fromm et al., Proc. Natl. Acad. Sci. USA 82, 5824 (1985)), heat shock, infection by viral vectors, high velocity ballistic penetration by small particles with the nucleic acid either within the matrix of small beads or particles, or on the surface (Klein et al., Nature 327, 70-73 (1987)), such as biolistic use of coated particles, and needle-like particles, Agrobacterium Ti plasmids and/or the like. The term “promoter” as used herein refers to a group of transcriptional control modules that are clustered around the initiation site for an RNA polymerase, e.g., RNA polymerase II. Promoters are composed of discrete functional modules, each consisting of approximately 7-20 bp of DNA, and containing one or more recognition sites for transcriptional activator or repressor proteins. The promoter may extend upstream or downstream of the transcriptional start site and may be any size ranging from a few base pairs to several kilo-bases.
In some embodiments, the DNA molecule is transcribed by RNA polymerase II (RNAP II or Pol II). RNAP II is an enzyme found in eukaryotic cells, known to catalyze the transcription of DNA to synthesize precursors of mRNA and most snRNA and microRNA.
In some embodiments, a plant expression vector is used. In one embodiment, the expression of a polypeptide coding sequence is driven by any of a number of promoters. In some embodiments, viral promoters such as the 35S RNA and 19S RNA promoters of CaMV [Brisson et al., Nature 310:511-514 (1984)], or the coat protein promoter of TMV [Takamatsu et al., EMBO J. 6:307-311 (1987)] are used. In another embodiment, plant promoters are used such as, for example, the small subunit of RUBISCO [Coruzzi et al., EMBO J. 3:1671-1680 (1984); and Broglie et al., Science 224:838-843 (1984)] or heat shock promoters, e.g., soybean hsp17.5-E or hsp17.3-B [Gurley et al., Mol. Cell. Biol. 6:559-565 (1986)]. In one embodiment, constructs are introduced into plant cells using Ti plasmid, Ri plasmid, plant viral vectors, direct DNA transformation, microinjection, electroporation and other techniques well known to the skilled artisan. See, for example, Weissbach & Weissbach [Methods for Plant Molecular Biology, Academic Press, NY, Section VIII, pp 421-463 (1988)]. Other expression systems such as insect and mammalian host cell systems, which are well known in the art, can also be used by the present invention.
In some embodiments, expression vectors containing regulatory elements from eukaryotic viruses such as retroviruses are used by the present invention. SV40 vectors include pSVT7 and pMT2. In some embodiments, vectors derived from bovine papilloma virus include pBV-1MTHA, and vectors derived from Epstein-Barr virus include pHEBO, and p205. Other exemplary vectors include pMSG, pAV009/A+, pMTO10/A+, pMAMneo-5, baculovirus pDSVE, and any other vector allowing expression of proteins under the direction of the SV-40 early promoter, SV-40 late promoter, metallothionein promoter, murine mammary tumor virus promoter, Rous sarcoma virus promoter, polyhedrin promoter, or other promoters shown effective for expression in eukaryotic cells.
In some embodiments, recombinant viral vectors, which offer advantages such as systemic infection and targeting specificity, are used for in vivo expression. In one embodiment, systemic infection is inherent in the life cycle of, for example, the retrovirus and is the process by which a single infected cell produces many progeny virions that infect neighboring cells. In one embodiment, the result is that a large area becomes rapidly infected, most of which was not initially infected by the original viral particles. In one embodiment, viral vectors are produced that are unable to spread systemically. In one embodiment, this characteristic can be useful if the desired purpose is to introduce a specified gene into only a localized number of targeted cells.
In some embodiments, plant viral vectors are used. In some embodiments, a wild-type virus is used. In some embodiments, a deconstructed virus such as are known in the art is used. In some embodiments, Agrobacterium is used to introduce the vector of the invention into a virus.
Various methods can be used to introduce the expression vector of the present invention into cells. Such methods are generally described in Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, New York (1989, 1992), in Ausubel et al., Current Protocols in Molecular Biology, John Wiley and Sons, Baltimore, Md. (1989), Chang et al., Somatic Gene Therapy, CRC Press, Ann Arbor, Mich. (1995), Vega et al., Gene Targeting, CRC Press, Ann Arbor, Mich. (1995), Vectors: A Survey of Molecular Cloning Vectors and Their Uses, Butterworths, Boston, Mass. (1988) and Gilboa et al. [Biotechniques 4 (6): 504-512, 1986] and include, for example, stable or transient transfection, lipofection, electroporation, Agrobacterium Ti plasmids and infection with recombinant viral vectors. In addition, see U.S. Pat. Nos. 5,464,764 and 5,487,992 for positive-negative selection methods.
It will be appreciated that other than containing the necessary elements for the transcription and translation of the inserted coding sequence (encoding the polypeptide), the expression construct of the present invention can also include sequences engineered to optimize stability, production, purification, yield, or activity of the expressed polypeptide.
In some embodiments, the artificial vector comprises a polynucleotide encoding a protein comprising an amino acid sequence as described herein.
According to some embodiments, there is provided a protein encoded by: (a) the DNA molecule disclosed herein; (b) the artificial vector disclosed herein; or (c) the plasmid or Agrobacterium disclosed herein.
In some embodiments, the protein is an isolated protein.
As used herein, the terms “peptide”, “polypeptide” and “protein” are interchangeable and refer to a polymer of amino acid residues. In another embodiment, the terms “peptide”, “polypeptide” and “protein” as used herein encompass native peptides, peptidomimetics (typically including non-peptide bonds or other synthetic modifications) and the peptide analogues peptoids and semipeptoids or any combination thereof. In another embodiment, the peptides, polypeptides and proteins described have modifications rendering them more stable while in the organism or more capable of penetrating into cells. In one embodiment, the terms “peptide”, “polypeptide” and “protein” apply to naturally occurring amino acid polymers. In another embodiment, the terms “peptide”, “polypeptide” and “protein” apply to amino acid polymers in which one or more amino acid residue is an artificial chemical analogue of a corresponding naturally occurring amino acid.
As used herein, the term “isolated protein” refers to a protein that is essentially free from contaminating cellular components, such as carbohydrate, lipid, or other proteinaceous impurities associated with the protein in nature. Typically, a preparation of an isolated protein contains the protein in a highly purified form, e.g., at least about 80% pure, at least about 90% pure, at least about 95% pure, greater than 95% pure, or greater than 99% pure. In some embodiments, the isolated protein is a synthesized protein. Synthesis of a protein is well known in the art and may be performed, for example, by heterologous expression in a transformed cell, such as exemplified herein.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 91%, at least 93%, at least 95%, or at least 97% homology or identity to SEQ ID NO: 12, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 91% to 97%, 92% to 99%, 93% to 98%, or 90% to 100% homology or identity to SEQ ID NO: 12. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 83%, at least 85%, at least 90%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 13, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 83% to 95%, 85% to 99%, 83% to 100%, or 84% to 97% homology or identity to SEQ ID NO: 13. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 86%, at least 88%, at least 90%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 14, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 86% to 93%, 86% to 95%, 88% to 97%, or 86% to 100% homology to SEQ ID NO: 14. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 86%, at least 88%, at least 90%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 15, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 86% to 93%, 86% to 95%, 88% to 97%, or 86% to 100% homology to SEQ ID NO: 15. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 89%, at least 92%, at least 94%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 16, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 89% to 95%, 89% to 98%, 90% to 99%, or 89% to 100% homology to SEQ ID NO: 16. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 93%, at least 94%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 17, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 93% to 98%, 93% to 99%, 93% to 100%, or 95% to 100% homology to SEQ ID NO: 17. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 84%, at least 87%, at least 91%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 18, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 84% to 99%, 85% to 99%, 84% to 100%, or 90% to 100% homology to SEQ ID NO: 18. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 82%, at least 87%, at least 91%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 19, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 82% to 99%, 83% to 99%, 82% to 100%, or 85% to 100% homology to SEQ ID NO: 19. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 86%, at least 88%, at least 90%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 20, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 86% to 93%, 86% to 95%, 88% to 97%, or 86% to 100% homology to SEQ ID NO: 20. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 89%, at least 92%, at least 94%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 21, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 89% to 95%, 89% to 98%, 90% to 99%, or 89% to 100% homology to SEQ ID NO: 21. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 88%, at least 92%, at least 94%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 22, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 88% to 95%, 89% to 98%, 90% to 99%, or 88% to 100% homology to SEQ ID NO: 22. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 92%, at least 96%, at least 98%, or at least 99% homology or identity to SEQ ID NO: 27, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 92% to 100%, 95% to 100%, 96% to 100%, or 98% to 100% homology or identity to SEQ ID NO: 27. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 91%, at least 94%, at least 95%, or at least 97% homology or identity to SEQ ID NO: 28, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 91% to 100%, 94% to 100%, 97% to 100%, or 97% to 100% homology or identity to SEQ ID NO: 28. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 93%, at least 95%, or at least 97% homology or identity to SEQ ID NO: 29, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 93% to 100%, 94% to 100%, 96% to 100%, or 98% to 100% homology or identity to SEQ ID NO: 29. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 88%, at least 92%, at least 95%, or at least 97% homology or identity to SEQ ID NO: 30, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 88% to 100%, 91% to 100%, 93% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 30. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 86%, at least 91%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 39, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 86% to 99%, 88% to 98%, 90% to 99%, or 89% to 100% homology or identity to SEQ ID NO: 39. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 45%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or at least 99%, homology or identity to SEQ ID NO: 40, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 45% to 90%, 50% to 99%, 65% to 98%, or 55% to 100% homology or identity to SEQ ID NO: 40. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 71%, at least 80%, at least 90%, or at least 99% homology or identity to SEQ ID NO: 41, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 71% to 97%, 75% to 99%, 80% to 98%, or 71% to 100% homology or identity to SEQ ID NO: 41. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 87%, at least 92%, at least 96%, or at least 97% homology or identity to SEQ ID NO: 42, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 87% to 97%, 88% to 99%, 90% to 98%, or 87% to 100% homology or identity to SEQ ID NO: 42. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 85%, at least 88%, at least 90%, at least 92%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 43, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 85% to 97%, 87% to 99%, 89% to 98%, or 85% to 100% homology or identity to SEQ ID NO: 43. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 79%, at least 82%, at least 85%, at least 90%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 44, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 79% to 95%, 79% to 99%, 80% to 98%, or 79% to 100% homology or identity to SEQ ID NO: 44. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 45, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 50% to 90%, 55% to 99%, 60% to 97%, or 50% to 100% homology or identity to SEQ ID NO: 45. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 87%, at least 93%, at least 95%, or at least 97% homology or identity to SEQ ID NO: 46, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 87% to 97%, 88% to 99%, 89% to 98%, or 87% to 100% homology or identity to SEQ ID NO: 46. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 82%, at least 85%, at least 90%, or at least 99% homology or identity to SEQ ID NO: 59, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 82% to 99%, 85% to 98%, 84% to 99%, or 82% to 100% homology or identity to SEQ ID NO: 59. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 92%, at least 93%, at least 95%, at least 97%, at least 98%, or at least 99% homology or identity to SEQ ID NO: 60, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 92% to 98%, 93% to 99%, 94% to 98%, or 92% to 100% homology or identity to SEQ ID NO: 60. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 89%, at least 90%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 61, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 89% to 97%, 89% to 99%, 90% to 98%, or 89% to 100% homology or identity to SEQ ID NO: 61. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 81%, at least 85%, at least 90%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 62, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 81% to 97%, 83% to 99%, 84% to 98%, or 81% to 100% homology or identity to SEQ ID NO: 62. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 81%, at least 85%, at least 90%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 63, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 81% to 97%, 83% to 99%, 84% to 98%, or 81% to 100% homology or identity to SEQ ID NO: 63. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 92%, at least 93%, at least 95%, at least 97%, at least 98%, or at least 99% homology or identity to SEQ ID NO: 64, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 92% to 98%, 93% to 99%, 94% to 98%, or 92% to 100% homology or identity to SEQ ID NO: 64. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 71%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 65, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 71% to 90%, 75% to 99%, 73% to 97%, or 71% to 100% homology or identity to SEQ ID NO: 65. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 89%, at least 90%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 66, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 89% to 97%, 89% to 99%, 90% to 98%, or 89% to 100% homology or identity to SEQ ID NO: 66. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 68%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 67, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 68% to 97%, 69% to 99%, 70% to 98%, or 68% to 100% homology or identity to SEQ ID NO: 67. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 66%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 68, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 66% to 97%, 67% to 99%, 70% to 98%, or 66% to 100% homology or identity to SEQ ID NO: 68. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 68%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 69, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 68% to 97%, 69% to 99%, 70% to 98%, or 68% to 100% homology or identity to SEQ ID NO: 69. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 71%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 70, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 71% to 100%, 80% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 70. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 69%, at least 75%, at least 85%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 80, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 69% to 99%, 70% to 98%, 75% to 99%, or 69% to 100% homology or identity to SEQ ID NO: 80. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 81%, at least 85%, at least 90%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 81, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 92% to 98%, 93% to 99%, 94% to 98%, or 92% to 100% homology or identity to SEQ ID NO: 81. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 86%, at least 90%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 82, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 86% to 97%, 87% to 99%, 88% to 98%, or 86% to 100% homology or identity to SEQ ID NO: 82. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 69%, at least 75%, at least 80%, at least 85%, at least 92%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 83, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 69% to 97%, 70% to 99%, 75% to 98%, or 69% to 100% homology or identity to SEQ ID NO: 83. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 84%, at least 87%, at least 90%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 84, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 84% to 97%, 86% to 99%, 85% to 98%, or 84% to 100% homology or identity to SEQ ID NO: 84. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 72%, at least 75%, at least 85%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 85, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 72% to 99%, 74% to 98%, 78% to 99%, or 72% to 100% homology or identity to SEQ ID NO: 85. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 69%, at least 75%, at least 85%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 86, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 69% to 99%, 70% to 98%, 75% to 99%, or 69% to 100% homology or identity to SEQ ID NO: 86. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 71%, at least 75%, at least 85%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 87, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 75% to 99%, 74% to 98%, 78% to 99%, or 71% to 100% homology or identity to SEQ ID NO: 87. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 74%, at least 79%, at least 85%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 88, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 74% to 99%, 78% to 98%, 81% to 99%, or 74% to 100% homology or identity to SEQ ID NO: 88. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 75%, at least 85%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 102, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 75% to 99%, 76% to 98%, or 75% to 100% homology or identity to SEQ ID NO: 102. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 76%, at least 85%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 103, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 76% to 99%, 80% to 98%, or 76% to 100% homology or identity to SEQ ID NO: 103. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 77%, at least 85%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 104, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 77% to 99%, 79% to 98%, or 77% to 100% homology or identity to SEQ ID NO: 104. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 88%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 105, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 88% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 105. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 90%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 106, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 90% to 100%, 93% to 100%, 95% to 100%, or 97% to 100% homology or identity to SEQ ID NO: 106. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 77%, at least 85%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 107, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 77% to 100%, 79% to 100%, 80% to 100%, or 90% to 100% homology or identity to SEQ ID NO: 107. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 73%, at least 85%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 108, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 73% to 100%, 77% to 100%, 85% to 100%, or 90% to 100% homology or identity to SEQ ID NO: 108. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 81%, at least 90%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 109, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 81% to 100%, 85% to 100%, 87% to 100%, or 91% to 100% homology or identity to SEQ ID NO: 109. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 74%, at least 85%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 110, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 74% to 100%, 79% to 100%, 85% to 100%, or 90% to 100% homology or identity to SEQ ID NO: 110. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 76%, at least 85%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 111, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 76% to 100%, 80% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 111. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 81%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 112, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 81% to 100%, 85% to 100%, 90% to 100%, or 93% to 100% homology or identity to SEQ ID NO: 112. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 71%, at least 85%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 113, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 71% to 100%, 77% to 100%, 85% to 100%, or 90% to 100% homology or identity to SEQ ID NO: 113. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 78%, at least 85%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 114, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 78% to 100%, 85% to 100%, 90% to 100%, or 93% to 100% homology or identity to SEQ ID NO: 114. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 87%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 130, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 87% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 130. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 72%, at least 80%, at least 89%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 131, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 72% to 100%, 80% to 100%, or 90% to 100% homology or identity to SEQ ID NO: 131. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 90%, at least 92%, at least 95%, at least 97%, or at least 99% homology or identity to SEQ ID NO: 132, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 90% to 100%, 95% to 100%, or 97% to 100% homology or identity to SEQ ID NO: 132. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 86%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 133, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 86% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 133. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 59%, at least 65%, at least 75%, at least 85%, at least 90%, or at least 99% homology or identity to SEQ ID NO: 134, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 59% to 100%, 70% to 100%, 80% to 100%, 90% to 100%, 95% to 100%, or 97% to 100% homology or identity to SEQ ID NO: 134. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 71%, at least 80%, at least 90%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 135, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 71% to 100%, 80% to 100%, 87% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 135. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 88%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 136, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 88% to 100%, 92% to 100%, 97% to 100%, or 99% to 100% homology or identity to SEQ ID NO: 136. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 91%, at least 93%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 137, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 91% to 100%, 93% to 100%, 95% to 100%, or 97% to 100% homology or identity to SEQ ID NO: 137. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 73%, at least 85%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 138, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 73% to 100%, 80% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 138. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 83%, at least 88%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 139, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 83% to 100%, 88% to 100%, 94% to 100%, or 97% to 100% homology or identity to SEQ ID NO: 139. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 76%, at least 84%, at least 92%, or at least 99% homology or identity to SEQ ID NO: 140, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 76% to 100%, 83% to 100%, 90% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 140. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 60%, at least 70%, at least 80%, at least 90%, or at least 95% homology or identity to SEQ ID NO: 141, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 60% to 100%, 70% to 100%, 80% to 100%, or 90% to 100% homology or identity to SEQ ID NO: 141. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 85%, at least 89%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 142, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 85% to 100%, 90% to 100%, 93% to 100%, or 96% to 100% homology or identity to SEQ ID NO: 142. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 82%, at least 85%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 143, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 82% to 100%, 85% to 100%, 90% to 100%, or 93% to 100% homology or identity to SEQ ID NO: 143. Each possibility represents a separate embodiment of the invention.
In some embodiments, the protein comprises or consists of the amino acid sequence:
In some embodiments, the protein comprises an amino acid sequence with at least 88%, at least 92%, at least 95%, or at least 99% homology or identity to SEQ ID NO: 144, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the protein comprises an amino acid sequence with 88% to 100%, 90% to 100%, 93% to 100%, or 95% to 100% homology or identity to SEQ ID NO: 144. Each possibility represents a separate embodiment of the invention.
In some embodiments, a protein comprising an amino acid sequence set forth in SEQ ID Nos.: 12-22 is an AAE.
In some embodiments, a protein comprising an amino acid sequence set forth in SEQ ID Nos.: 27-30 is a PKS.
In some embodiments, a protein comprising an amino acid sequence set forth in SEQ ID Nos.: 39-46 is a PKC.
In some embodiments, a protein comprising an amino acid sequence set forth in SEQ ID Nos.: 59-70 is a PT.
In some embodiments, a protein comprising an amino acid sequence set forth in SEQ ID Nos.: 80-88 is a CBCAS.
In some embodiments, a protein comprising an amino acid sequence set forth in SEQ ID Nos.: 102-114 is a UGT.
In some embodiments, a protein comprising an amino acid sequence set forth in SEQ ID Nos.: 130-144 is an AAT.
The terms “homology” or “identity”, as used interchangeably herein, refer to sequence identity between two amino acid sequences or two nucleic acid sequences, with identity being a stricter comparison. The phrases “percent identity or homology” and “% identity or homology” refer to the percentage of sequence identity found in a comparison of two or more amino acid sequences or nucleic acid sequences. Two or more sequences can be anywhere from 0-100% identical, or any value therebetween. Identity can be determined by comparing a position in each sequence that can be aligned for purposes of comparison to a reference sequence. When a position in the compared sequence is occupied by the same nucleotide base or amino acid, then the molecules are identical at that position. The degree of identity of amino acid sequences is a function of the number of identical amino acids at positions shared by the amino acid sequences. A degree of identity between nucleic acid sequences is a function of the number of identical or matching nucleotides at positions shared by the nucleic acid sequences. A degree of homology of amino acid sequences is a function of the number of amino acids at positions shared by the polypeptide sequences.
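By way of a non-limiting illustration, the position-by-position comparison described above can be sketched as follows. The function name `percent_identity`, the use of `-` as the gap symbol, and the choice of the total number of alignment columns as the denominator are assumptions made for this sketch only; other conventions (for example, dividing by the length of the shorter sequence) are also in use and give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two already-aligned sequences.

    Assumptions (illustrative only): both inputs have the same length,
    gaps are written as '-', and the denominator is the total number of
    alignment columns, including columns that contain a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        return 0.0

    identical = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        # A column counts as identical only if both residues match
        # and neither of them is a gap.
        if res_a == res_b and res_a != gap:
            identical += 1

    return 100.0 * identical / len(aligned_a)


# Hypothetical aligned peptides: four of five columns are identical.
example_value = percent_identity("MK-LV", "MKALV")
```

A value computed in this manner can then be compared against a recited threshold, such as "at least 80%".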
The following is a non-limiting example for calculating homology or sequence identity between two sequences (the terms are used interchangeably herein). The sequences are aligned for optimal comparison purposes (e.g., gaps can be introduced in one or both of a first and a second amino acid or nucleic acid sequence for optimal alignment and non-homologous sequences can be disregarded for comparison purposes). The optimal alignment is determined as the best score using the GAP program in the GCG software package with a BLOSUM62 scoring matrix with a gap penalty of 12, a gap extend penalty of 4, and a frameshift gap penalty of 5. The amino acid residues or nucleotides at corresponding amino acid positions or nucleotide positions are then compared. When a position in the first sequence is occupied by the same amino acid residue or nucleotide as the corresponding position in the second sequence, then the molecules are identical at that position. The percentage identity between the two sequences is a function of the number of identical positions shared by the sequences.
In some embodiments, % homology or identity as described herein is calculated or determined using the basic local alignment search tool (BLAST). In some embodiments, % homology or identity as described herein is calculated or determined using a BLOSUM62 scoring matrix.
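As a non-limiting illustration of the identity calculation described above, the following Python sketch counts identical positions in two sequences that have already been aligned, with gaps written as "-". The function names `percent_identity` and `meets_threshold`, the use of positions occupied in both sequences as the denominator, and the toy sequences in the usage note are assumptions made for illustration only. The sketch does not reproduce the GAP or BLAST implementations, and other denominators (e.g., the full alignment length) are also in common use.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over positions shared by two pre-aligned sequences.

    Assumption: a position counts as "shared" only when neither sequence
    has a gap character at that position.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    shared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == gap or res_b == gap:
            continue  # gapped positions are not shared positions
        shared += 1
        if res_a == res_b:
            identical += 1
    if shared == 0:
        raise ValueError("the alignment has no shared positions")
    return 100.0 * identical / shared


def meets_threshold(aligned_a: str, aligned_b: str, threshold_percent: float) -> bool:
    """True when the percent identity is at least the stated threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold_percent
```

For example, `percent_identity("MKT-LV", "MKTALI")` compares five shared positions, four of which are identical, and `meets_threshold` can then be used to check such a value against a stated lower bound such as 88%.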
In some embodiments, the protein comprises or is characterized by acyl activating enzymatic activity.
In some embodiments, an acyl is selected from: C1-C8 alkyl chain, and alpha-unsaturated phenylalkyl carboxylic acid.
In some embodiments, an acyl is a C1 alkyl chain. In some embodiments, an acyl is a C2 alkyl chain. In some embodiments, an acyl is a C3 alkyl chain. In some embodiments, an acyl is a C4 alkyl chain. In some embodiments, an acyl is a C5 alkyl chain. In some embodiments, an acyl is a C6 alkyl chain. In some embodiments, an acyl is a C7 alkyl chain. In some embodiments, an acyl is a C8 alkyl chain.
In some embodiments, a C1-C8 alkyl chain is hexanoic acid. In some embodiments, an acyl is hexanoic acid.
In some embodiments, an alpha-unsaturated phenylalkyl carboxylic acid comprises cinnamic acid or a derivative thereof.
In some embodiments, a cinnamic acid derivative comprises a hydroxylated derivative of cinnamic acid.
In some embodiments, a hydroxylated derivative of cinnamic acid comprises or is coumaric acid.
In some embodiments, the protein comprises or is characterized by polyketide synthesizing activity, as described herein. In some embodiments, the protein is characterized by having an activity of polymerizing a diketide substrate into a polyketide.
In some embodiments, a diketide substrate is obtained by coupling of an acyl CoA starting unit.
In some embodiments, an acyl CoA starting unit is selected from: acetyl CoA, butyryl CoA, hexanoyl CoA, octanoyl CoA, cinnamoyl CoA, coumaroyl CoA, or any combination thereof.
In some embodiments, an acyl CoA is or comprises hexanoyl CoA, cinnamoyl CoA, or both.
In some embodiments, an acyl CoA is hexanoyl CoA.
In some embodiments, a polyketide comprises a tetraketide. In some embodiments, a polyketide comprises a linear polyketide. In some embodiments, a polyketide comprises a linear tetraketide.
In some embodiments, the protein comprises or is characterized by polyketide cyclization or cyclizing activity, as described herein. In some embodiments, the protein is characterized by having an activity of cyclizing a polyketide.
In some embodiments, polyketide cyclization comprises aldol cyclization, Claisen cyclization, or both.
In some embodiments, a polyketide comprises an acyl group, as described herein.
In some embodiments, the protein comprises or is characterized by prenyl transferring activity, as described herein. In some embodiments, the protein is characterized by being capable of transferring a prenyl group to a substrate molecule. In some embodiments, the protein is characterized by being capable of transferring an allylic prenyl group to an acceptor molecule. In some embodiments, the protein is a prenyl diphosphate synthase. In some embodiments, the protein is a trans-prenyltransferase. In some embodiments, the protein is a cis-prenyltransferase.
In some embodiments, the prenyl group is selected from: dimethylallyl diphosphate, geranyl diphosphate, farnesyl diphosphate, or geranylgeranyl diphosphate.
In some embodiments, the protein is characterized by being capable of synthesizing a compound represented by Formula I:
wherein: (i) R1 is selected from: C1-C8 alkyl, an alpha-unsaturated phenylalkyl carboxylic acid, or an alpha saturated phenylalkyl carboxylic acid; and R2 is OH; or (ii) R1 is OH and R2 is selected from: C1-C8 alkyl, an alpha-unsaturated phenylalkyl carboxylic acid, or an alpha saturated phenylalkyl carboxylic acid.
In some embodiments, the compound is represented by a formula selected from:
wherein R3 is C1-C8 alkyl, and wherein R4 is alpha-unsaturated phenylalkyl carboxylic acid.
In some embodiments, the compound is selected from the group:
In some embodiments, the compound is:
In some embodiments, the protein is characterized by cannabigerolic acid (CBGA) cyclization or cyclizing activity. In some embodiments, cyclizing activity comprises cyclization of CBGA to CBCA. In some embodiments, the protein is characterized by being capable of cyclizing or cyclization of CBGA to CBCA. In some embodiments, the protein is characterized by being capable of synthesizing CBCA or being a CBCA synthase (CBCAS).
In some embodiments, the protein is characterized by being capable of transferring a glucuronic acid component of UDP-glucuronic acid to a cannabinoid or precursor thereof.
In some embodiments, the protein is characterized by being capable of transferring an acyl group from a donor molecule to the cannabinoid.
According to some embodiments, there is provided a transgenic cell comprising: (a) the DNA molecule disclosed herein; (b) the artificial nucleic acid molecule disclosed herein; (c) the plasmid or agrobacterium disclosed herein; (d) the protein disclosed herein; or any combination thereof.
In some embodiments, the cell further comprises a nucleic acid sequence encoding at least one enzyme related to cannabinoidogenesis derived from Cannabis sativa. In some embodiments, the at least one enzyme related to cannabinoidogenesis derived from C. sativa is selected from: olivetol synthase (OLS), olivetolic acid cyclase (OAC), prenyltransferase 1 (PT1/GOT1), PT4/GOT4, or any combination thereof.
In some embodiments, the at least one enzyme related to cannabinoidogenesis derived from C. sativa is selected from: OLS, OAC, or both.
As used herein, the term “transgenic cell” refers to any cell that has undergone human manipulation on the genomic or gene level. In some embodiments, the transgenic cell has had an exogenous polynucleotide, such as the DNA molecule as disclosed herein, introduced into it. In some embodiments, a transgenic cell comprises a cell that has an artificial vector introduced into it. In some embodiments, a transgenic cell is a cell which has undergone genome mutation or modification. In some embodiments, a transgenic cell is a cell that has undergone CRISPR genome editing. In some embodiments, a transgenic cell is a cell that has undergone targeted mutation of at least one base pair of its genome. In some embodiments, the exogenous polynucleotide (e.g., the DNA molecule disclosed herein) or vector is stably integrated into the cell. In some embodiments, the transgenic cell expresses a polynucleotide of the invention. In some embodiments, the transgenic cell expresses a vector of the invention. In some embodiments, the transgenic cell expresses a protein of the invention. In some embodiments, the transgenic cell is a cell that is devoid of a polynucleotide of the invention that has been transformed or genetically modified to include the polynucleotide of the invention. In some embodiments, CRISPR technology is used to modify the genome of the cell, as described herein.
In some embodiments, the cell is a unicellular organism, a cell of a multicellular organism, or a cell in a culture.
In some embodiments, a unicellular organism comprises a fungus or a bacterium.
In some embodiments, the fungus is a yeast cell.
In some embodiments, the cell is an insect cell. In some embodiments, the cell comprises an insect cell line.
Types of insect cell lines suitable for transformation and/or heterologous expression are common and would be apparent to one of ordinary skill in the art. Non-limiting examples of such insect cell lines include Sf-9 cells, SR+ Schneider cells, S2 cells, and others.
According to some embodiments, there is provided an extract derived from a transgenic cell disclosed herein, or any fraction thereof.
In some embodiments, the extract comprises the DNA molecule disclosed herein, a protein as disclosed herein, or any combination thereof.
According to some embodiments, there is provided a homogenate, lysate, or extract derived from a transgenic cell disclosed herein, any combination thereof, or any fraction thereof.
Methods and/or means for extracting, lysing, homogenizing, fractionating, or any combination thereof, a cell or a culture of same, are common and would be apparent to one of ordinary skill in the art of cell biology and biochemistry. Non-limiting examples include pressure lysis (e.g., using a French press), enzymatic lysis, soluble-insoluble phase separation (such as for obtaining a supernatant and a pellet), detergent-based lysis, solvent (e.g., a polar or nonpolar solvent), liquid chromatography mass spectrometry, or others.
According to some embodiments, there is provided a transgenic plant, a transgenic plant tissue or a plant part. In some embodiments, there is provided a transgenic plant, or any portion, seed, tissue, or organ thereof, comprising at least one transgenic plant cell of the invention. In some embodiments, the transgenic plant, transgenic plant tissue or plant part, comprises: (a) the DNA molecule disclosed herein; (b) the artificial nucleic acid molecule disclosed herein; (c) the plasmid or agrobacterium disclosed herein; (d) the protein of the invention; (e) the transgenic cell disclosed herein; or any combination thereof.
In some embodiments, the transgenic plant, transgenic plant tissue, or plant part consists of transgenic plant cells of the invention. In some embodiments, the transgenic plant, transgenic plant tissue, or plant part comprises at least: 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, or 99% transgenic cells of the invention, or any value and range therebetween. Each possibility represents a separate embodiment of the invention. In some embodiments, the transgenic plant, transgenic plant tissue, or plant part comprises 20%-50%, 20%-60%, 20%-70%, 20%-80%, 20%-90%, or 20%-100% transgenic cells of the invention. Each possibility represents a separate embodiment of the invention.
In some embodiments, the transgenic plant, transgenic plant tissue, or plant part is or is derived from a Cannabis sativa plant. In some embodiments, the transgenic plant is a C. sativa plant.
In some embodiments, the transgenic plant, transgenic plant tissue, or plant part is or is derived from hemp. In some embodiments, C. sativa comprises or is hemp.
According to some embodiments, there is provided a composition comprising any one of the herein disclosed: (a) the DNA molecule of the invention; (b) artificial vector; (c) plasmid or agrobacterium; (d) protein of the invention; (e) transgenic cell; (f) extract; (g) transgenic plant tissue or plant part; and (h) any combination of (a) to (g), and an acceptable carrier.
As used herein, the term “carrier”, “excipient”, or “adjuvant” refers to any component of a composition, e.g., pharmaceutical or nutraceutical, that is not the active agent. As used herein, the term “pharmaceutically acceptable carrier” refers to non-toxic, inert solid, semi-solid, or liquid filler, diluent, encapsulating material, formulation auxiliary of any type, or simply a sterile aqueous medium, such as saline. Some examples of the materials that can serve as pharmaceutically acceptable carriers are sugars, such as lactose, glucose and sucrose, starches such as corn starch and potato starch, cellulose and its derivatives such as sodium carboxymethyl cellulose, ethyl cellulose and cellulose acetate; powdered tragacanth; malt, gelatin, talc; excipients such as cocoa butter and suppository waxes; oils such as peanut oil, cottonseed oil, safflower oil, sesame oil, olive oil, corn oil and soybean oil; glycols, such as propylene glycol, polyols such as glycerin, sorbitol, mannitol and polyethylene glycol; esters such as ethyl oleate and ethyl laurate, agar; buffering agents such as magnesium hydroxide and aluminum hydroxide; alginic acid; pyrogen-free water; isotonic saline, Ringer's solution; ethyl alcohol and phosphate buffer solutions, as well as other non-toxic compatible substances used in pharmaceutical formulations. Some non-limiting examples of substances which can serve as a carrier herein include sugar, starch, cellulose and its derivatives, powdered tragacanth, malt, gelatin, talc, stearic acid, magnesium stearate, calcium sulfate, vegetable oils, polyols, alginic acid, pyrogen-free water, isotonic saline, phosphate buffer solutions, cocoa butter (suppository base), emulsifier (e.g. carbomer, hydroxypropyl cellulose, sodium lauryl sulfate) as well as other non-toxic pharmaceutically compatible substances used in other pharmaceutical formulations. 
Wetting agents and lubricants such as sodium lauryl sulfate, as well as coloring agents, flavoring agents, excipients, stabilizers, antioxidants, and preservatives may also be present. Any non-toxic, inert, and effective carrier may be used to formulate the compositions contemplated herein. Suitable pharmaceutically acceptable carriers, excipients, and diluents in this regard are well known to those of skill in the art, such as those described in The Merck Index, Thirteenth Edition, Budavari et al., Eds., Merck & Co., Inc., Rahway, N.J. (2001); the CTFA (Cosmetic, Toiletry, and Fragrance Association) International Cosmetic Ingredient Dictionary and Handbook, Tenth Edition (2004); and the “Inactive Ingredient Guide,” U.S. Food and Drug Administration (FDA) Center for Drug Evaluation and Research (CDER) Office of Management, the contents of all of which are hereby incorporated by reference in their entirety. Examples of pharmaceutically acceptable excipients, carriers, and diluents useful in the present compositions include distilled water, physiological saline, Ringer's solution, dextrose solution, Hank's solution, and DMSO. These additional inactive components, as well as effective formulations and administration procedures, are well known in the art and are described in standard textbooks, such as Goodman and Gilman's: The Pharmacological Bases of Therapeutics, 8th Ed., Gilman et al. Eds. Pergamon Press (1990); Remington's Pharmaceutical Sciences, 18th Ed., Mack Publishing Co., Easton, Pa. (1990); and Remington: The Science and Practice of Pharmacy, 21st Ed., Lippincott Williams & Wilkins, Philadelphia, Pa., (2005), each of which is incorporated by reference herein in its entirety. The presently described composition may also be contained in artificially created structures such as liposomes, ISCOMS, slow-releasing particles, and other vehicles which increase the half-life of the peptides or polypeptides in serum. 
Liposomes include emulsions, foams, micelles, insoluble monolayers, liquid crystals, phospholipid dispersions, lamellar layers, and the like. Liposomes for use with the presently described peptides are formed from standard vesicle-forming lipids which generally include neutral and negatively charged phospholipids and sterol, such as cholesterol. The selection of lipids is generally determined by considerations such as liposome size and stability in the blood. A variety of methods are available for preparing liposomes as reviewed, for example, by Coligan, J. E. et al, Current Protocols in Protein Science, 1999, John Wiley & Sons, Inc., New York, and see also U.S. Pat. Nos. 4,235,871, 4,501,728, 4,837,028, and 5,019,369.
The carrier may comprise, in total, from about 0.1% to about 99.99999% by weight of the pharmaceutical compositions presented herein.
According to some embodiments, there is provided a method for synthesizing a cannabinoid, a precursor thereof, or any combination thereof.
According to some embodiments, there is provided a method for synthesizing acyl coenzyme A (CoA), polyketide, a compound represented by Formula I, a compound represented by Formula II, a cannabinoid, or any combination thereof.
In some embodiments, the method further comprises glycosylating a compound represented by Formula I, a compound represented by Formula II, a cannabinoid, or any combination thereof. In some embodiments, the method further comprises transferring an acyl group to a compound represented by Formula I, a compound represented by Formula II, a cannabinoid, or any combination thereof.
As used herein, the term “cannabinoid” or “cannabinoids” refers to a heterogeneous family of molecules usually exhibiting pharmacological properties by interacting with specific receptors. To date, two membrane receptors for cannabinoids, both coupled to G protein and named CB1 and CB2, have been identified. While CB1 receptors are mainly expressed in the central and peripheral nervous system, CB2 receptors have been reported to be more abundantly detected in cells of the immune system.
In some embodiments, the cannabinoid comprises any compound as presented in
According to some embodiments, the method comprises the steps: (a) providing a transgenic cell or a cell transfected with the DNA molecule of the invention or the artificial nucleic acid molecule disclosed herein; and (b) culturing the transgenic cell or the transfected cell from step (a) such that at least a first protein and a second protein encoded by the DNA molecule or the artificial nucleic acid molecule are expressed, thereby synthesizing the cannabinoid, a precursor thereof, or any combination thereof.
In some embodiments, the precursor is selected from: acyl coenzyme A (CoA), a polyketide, a resorcinoid precursor, or any combination thereof.
In some embodiments, the resorcinoid precursor is olivetolic acid.
In some embodiments, the cannabinoid comprises or is CBGA, CBCA, or both.
According to some embodiments, there is provided a method for obtaining an extract from a transgenic cell or a transfected cell.
In some embodiments, the method comprises culturing a transgenic cell or a transfected cell in a medium and extracting the transgenic cell or the transfected cell.
In some embodiments, the method comprises the steps: (a) culturing a transgenic cell or a transfected cell in a medium; and (b) extracting the transgenic cell or the transfected cell, thereby obtaining an extract from the transgenic cell or the transfected cell.
In some embodiments, the transgenic cell or the transfected cell comprises the DNA molecule of the invention or a plurality thereof, as disclosed herein.
In some embodiments, the transgenic cell or the transfected cell comprises the artificial nucleic acid molecule or vector as disclosed herein.
In some embodiments, the cell is a transgenic cell, or a cell transfected with a DNA molecule as disclosed herein.
In some embodiments, the method further comprises a step preceding step (a), comprising introducing or transfecting the cell with the artificial nucleic acid molecule or vector, disclosed herein.
Methods for introducing or transfecting a cell with an artificial nucleic acid molecule or vector are common and would be apparent to one of ordinary skill in the art.
In some embodiments, introducing or transfecting comprises transferring an artificial nucleic acid molecule or vector comprising the DNA molecule disclosed herein into a cell; or modifying the genome of a cell to include the polynucleotide disclosed herein. In some embodiments, the transferring comprises transfection. In some embodiments, the transferring comprises transformation. In some embodiments, the transferring comprises lipofection. In some embodiments, the transferring comprises nucleofection. In some embodiments, the transferring comprises viral infection.
As used herein, the terms “transfecting” and “introducing” are interchangeable.
In some embodiments, the contacting is in a cell-free system.
Types of suitable cell-free systems for expression and/or synthesis utilizing any one of: the DNA molecule of the invention or a plurality thereof, as disclosed herein, and the protein of the invention, or a plurality thereof, would be apparent to one of ordinary skill in the art.
In some embodiments, the method further comprises a step preceding step (b), comprising separating the cultured transgenic cell or the cultured transfected cell from the medium.
Methods for separating a cell from a medium are common and may include, but are not limited to, centrifugation, ultracentrifugation, or others, as would be apparent to a skilled artisan.
According to some embodiments, there is provided an extract of a transgenic cell, or a transfected cell obtained according to the herein disclosed method.
In some embodiments, the extract comprises a cannabinoid, a precursor thereof, or any combination thereof.
In some embodiments, the extract comprises CBGA, CBCA, or both.
According to some embodiments, there is provided a medium or a portion thereof separated from a cultured transgenic cell or a cultured transfected cell, obtained according to the herein disclosed method.
According to some embodiments, there is provided a composition comprising: (a) the extract disclosed herein; (b) the medium disclosed herein or a portion thereof; or (c) any combination of (a) and (b), and an acceptable carrier, as described herein.
In some embodiments, a portion comprises a fraction or a plurality thereof.
Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limit of that range and any other stated or intervening value in that stated range, is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges, and are also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.
As used herein, the term “about” when combined with a value refers to plus and minus 10% of the reference value. For example, a length of about 1,000 nanometers (nm) refers to a length of 1,000 nm±100 nm.
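By way of a non-limiting illustration, the definition of “about” given above can be written as a short Python sketch. The function names `about_range` and `is_about` and the parameter `tolerance_percent` are hypothetical and serve only to make the arithmetic explicit.

```python
from typing import Tuple


def about_range(value: float, tolerance_percent: float = 10.0) -> Tuple[float, float]:
    """Lower and upper bounds spanned by "about value" (plus and minus 10% by default)."""
    delta = abs(value) * tolerance_percent / 100.0
    return (value - delta, value + delta)


def is_about(candidate: float, reference: float, tolerance_percent: float = 10.0) -> bool:
    """True when candidate lies within the "about" range of the reference value."""
    low, high = about_range(reference, tolerance_percent)
    return low <= candidate <= high
```

Applied to the example in the text, `about_range(1000.0)` spans 900 nm to 1,100 nm.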
It is noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “a polynucleotide” includes a plurality of such polynucleotides and reference to “the polypeptide” includes reference to one or more polypeptides and equivalents thereof known to those skilled in the art, and so forth. It is further noted that the claims may be drafted to exclude any optional element. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as “solely,” “only” and the like in connection with the recitation of claim elements or use of a “negative” limitation.
In those instances where a convention analogous to “at least one of A, B, and C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., “a system having at least one of A, B, and C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). It will be further understood by those within the art that virtually any disjunctive word and/or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase “A or B” will be understood to include the possibilities of “A” or “B” or “A and B”.
It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable sub-combination. All combinations of the embodiments pertaining to the invention are specifically embraced by the present invention and are disclosed herein just as if each and every combination was individually and explicitly disclosed. In addition, all sub-combinations of the various embodiments and elements thereof are also specifically embraced by the present invention and are disclosed herein just as if each and every such sub-combination was individually and explicitly disclosed herein.
Additional objects, advantages, and novel features of the present invention will become apparent to one ordinarily skilled in the art upon examination of the following examples, which are not intended to be limiting. Additionally, each of the various embodiments and aspects of the present invention as delineated hereinabove and as claimed in the claims section below finds experimental support in the following examples.
Generally, the nomenclature used herein, and the laboratory procedures utilized in the present invention include molecular, biochemical, microbiological, and recombinant DNA techniques. Such techniques are thoroughly explained in the literature. See, for example, “Molecular Cloning: A Laboratory Manual” Sambrook et al., (1989); “Current Protocols in Molecular Biology” Volumes I-III Ausubel, R. M., ed. (1994); Ausubel et al., “Current Protocols in Molecular Biology”, John Wiley and Sons, Baltimore, Maryland (1989); Perbal, “A Practical Guide to Molecular Cloning”, John Wiley & Sons, New York (1988); Watson et al., “Recombinant DNA”, Scientific American Books, New York; Birren et al. (eds) “Genome Analysis: A Laboratory Manual Series”, Vols. 1-4, Cold Spring Harbor Laboratory Press, New York (1998); methodologies as set forth in U.S. Pat. Nos. 4,666,828; 4,683,202; 4,801,531; 5,192,659 and 5,272,057; “Cell Biology: A Laboratory Handbook”, Volumes I-III Celis, J. E., ed. (1994); “Culture of Animal Cells-A Manual of Basic Technique” by Freshney, Wiley-Liss, N. Y. (1994), Third Edition; “Current Protocols in Immunology” Volumes I-III Coligan J. E., ed. (1994); Stites et al. (eds), “Basic and Clinical Immunology” (8th Edition), Appleton & Lange, Norwalk, CT (1994); Mishell and Shiigi (eds), “Strategies for Protein Purification and Characterization-A Laboratory Course Manual” CSHL Press (1996); all of which are incorporated by reference. Other general references are provided throughout this document.
Unless otherwise stated, all the analytical metabolites were >95% pure. CBGA 1, CBCA 15, CBDA, acetic acid, propionic acid, butyric acid, pentanoic acid, hexanoic acid, heptanoic acid, octanoic acid, ±2-methyl butyric acid, phenylalanine, hexanoic-D11 acid (D>98%), GPP, IPP, FPP, phloretin 98, naringenin 96, malonyl-CoA (≥90%), acetyl-CoA (≥93%), butyryl-CoA (≥90%), hexanoyl-CoA (≥85%), octanoyl-CoA, iso-valeryl CoA (≥90%), olivetol and sodium hexanoate were purchased from Sigma-Aldrich (Rehovot, Israel). Δ9-THCA was purchased from Silicol Scientific Equipment Ltd. (Or Yehuda, Israel). Acetic-D3 acid (D>99%), propionic-D5 acid (D>99%), butyric-D5 acid (D>98%), pentanoic-D9 acid (D>98%), heptanoic-D5 acid (D>99%), octanoic-D5 acid (D>99%), iso-butyric-D7 acid (D>98%), ±2-methyl butyric-D9 acid (D>99%), iso-valeric-D9 acid (D>98%), and iso-caproic-D11 acid (D>98%) were purchased from C/D/N Isotopes (Quebec, Canada). Phenylalanine-D5 (D>98%) and phenylalanine-13C9,15N1 (13C,15N>99%) were synthesized by Cambridge Isotope Laboratories (Andover, MA). HeliCBGA 2 (NP009525, 90%) was purchased from Analyticon Discovery GmbH (Potsdam, Germany). APHA 3 was reported as an impurity (NP015136, 5%) in the heliCBGA analytical metabolite. OA 92 (>90%), VA (>90%) and iso-butyryl-CoA were purchased from Cayman Chemical (Ann Arbor, MI, USA). PCP 95, naringenin chalcone 97 and pinocembrin chalcone 100 were purchased from Wuhan ChemFaces Biochemical Co Ltd. (Hubei, China). Cinnamoyl-CoA and coumaroyl-CoA were purchased from TransMIT GmbH (Hesse, Germany).
Seeds of H. umbraculigerum (Silverhill seeds, Cape Town, South Africa) were germinated, and grown in a greenhouse in a long-day photoperiod. Plants were propagated by cuttings.
All feeding solutions were prepared as aqueous solutions of 0.5 mg ml−1 of the precursor. The pH of the FA solutions was adjusted to 5.5-6.0. The phenylalanine feeding experiments were performed on leaves from young mother plants excised by cutting at the proximal side of the pedicel with scissors under water, leaving 1-2 cm of the pedicel attached. For the FA feeding experiments, 10 cm young cuttings were obtained from mother plants. The lower leaves were removed, leaving 4-5 leaves on each stem, and the stem was peeled to increase the intake of the labeled solutions. Three to four leaves or the young cuttings were immersed in aqueous solutions [DDW (control), unlabeled or labeled precursors; each group consisted of a minimum of three biological replicates]. All feeding experiments were performed in a controlled environment for 48-96 h at 25° C. under constant fluorescent illumination and humidity, and the tubes were periodically refilled. Upon termination, the fresh leaves were rinsed with a small amount of water, dried gently, flash frozen and stored at −80° C. for extraction.
Unless otherwise stated, 100 mg frozen powdered plant tissue were extracted with 300 μl ethanol, sonicated for 15 min, agitated for 30 min and centrifuged at 14,000 g for 10 min. The supernatant was filtered through a 0.22 μm syringe filter and analyzed at the obtained concentration. Detection was performed using both targeted and non-targeted approaches as described in Berman et al., “Parallel evolution of cannabinoid biosynthesis”; Nature Plants 9 817-831 (2023) using an ultrahigh-performance liquid chromatography-tandem quadrupole time-of-flight (UPLC-qTOF) system comprised of a UPLC (Waters Acquity) with a diode array detector connected either to a XEVO G2-S QTof (Waters) or to Synapt HDMS (Waters). The chromatographic separation was performed on a 100 mm×2.1 mm i.d. (internal diameter), 1.7 μm UPLC BEH C18 column (Waters Acquity). The mobile phase consisted of 0.1% formic acid in acetonitrile:water (5:95, v/v; phase A) and 0.1% formic acid in acetonitrile (phase B). Terpenophenols were analyzed using UPLC Method 1 as follows: Initial conditions were 40% B for 1 min, raised to 100% B until 23 min, held at 100% B for 3.8 min, decreased to 40% B until 27 min, and held at 40% B until 29 min for re-equilibration of the system. The flow rate was 0.3 ml min−1, and the column temperature was kept at 35° C. Intermediates and glucosylated metabolites were analyzed using UPLC Method 2 as follows: Initial conditions were from 0% to 28% B over 22 min, raised to 100% B until 36 min, held at 100% B for 2 min, decreased to 0% B until 38.5 min, and held at 40% B until 40 min for re-equilibration of the system. The flow rate was 0.3 ml min−1, and the column temperature was kept at 35° C. Electrospray ionization (ESI) was used in either positive or negative ionization modes at an m/z range of 50-1,000 Da. Masses were detected with the following settings: capillary 1 kV, source temperature 140° C., desolvation temperature 450° C., and desolvation gas flow 800 l h−1. 
Argon was used as the collision gas. The MS system was calibrated with sodium formate and Leu-enkephalin was used as the lock mass. Data acquisition for untargeted analysis was performed in negative ionization using the MSE mode. The collision energy was set to 4 eV for the low-energy function and to a 15-50 eV ramp for the high-energy function. The R package Miso was run as previously described. Differential metabolites were selected if the fold change was greater than or equal to 10 and the p-value was less than 0.05. MS/MS experiments were performed in positive or negative ionization modes according to the specific protonated or deprotonated masses with the following settings: capillary spray of 1 kV; cone voltage of 30 eV; collision energy ramps of 10-45 eV for positive mode and 15-50 eV for negative mode.
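The selection rule for differential metabolites stated above (fold change of at least 10 and p-value below 0.05) can be illustrated with a minimal Python sketch. It is not the Miso workflow itself; the `Feature` record and the example values are hypothetical and serve only to show how the two thresholds combine.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """Hypothetical record for one detected mass feature."""
    name: str
    fold_change: float
    p_value: float


def select_differential(features, min_fold_change=10.0, max_p_value=0.05):
    """Keep features with fold change >= threshold AND p-value < threshold."""
    return [
        f for f in features
        if f.fold_change >= min_fold_change and f.p_value < max_p_value
    ]


# Illustrative values only; they are not measurements from the study.
features = [
    Feature("feature_A", 12.0, 0.01),   # passes both criteria
    Feature("feature_B", 10.0, 0.05),   # fails: p-value is not below 0.05
    Feature("feature_C", 9.9, 0.001),   # fails: fold change is below 10
    Feature("feature_D", 10.0, 0.049),  # passes: fold change equal to 10 is allowed
]
selected = [f.name for f in select_differential(features)]
print(selected)
```

Note that the fold-change bound is inclusive while the p-value bound is strict, matching the wording of the criterion.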
Fresh samples of leaves (dark and light), flowers, stems and roots were collected from a plant at the flowering stage. Florets and the receptacle of flowers were detached using a scalpel and analyzed separately. All tissues were flash frozen in liquid N2 and ground into fine powder. To measure CBGA 1 content in a dry tissue, fresh leaves were flash frozen, ground and lyophilized. For the extraction, 100 mg of the frozen powders were accurately weighed in triplicates, extracted with 1 ml ethanol, and prepared as previously described. Samples were injected in several dilutions to fit into the linear range of the calibration curves. Injections were performed on a UPLC (Waters) connected to a Triple Quad detector (TQ-S, Waters) in multiple reaction monitoring (MRM) mode. The system was operated with a similar column and mobile phase as for UPLC-qTOF analysis as follows: Initial conditions were 57% B raised to 85% B until 4 min, raised to 100% B until 4.2 min, held at 100% B until 6 min, decreased to 67% B until 6.2 min, and held at 67% B until 7 min for re-equilibration of the system. The flow rate was 0.6 ml min−1, and the column temperature was kept at 40° C. The instrument was operated in negative mode with a capillary voltage of 1.5 kV and a cone voltage of 40 V. Absolute quantification of CBGA 1 was performed by external calibration using two different transitions (359.3>191.2, 32 V for quantification; and 359.3>315.4, 21 V for qualification).
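As an illustration of quantification by external calibration, the following Python sketch fits a straight line to calibration standards and back-calculates the content of a diluted sample. The standard concentrations, peak areas and dilution factor are invented for the example, and the linear least-squares fit is an assumption; the actual calibration model is not specified here. Only the 1 ml extraction volume and the 100 mg tissue mass are taken from the procedure above.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def percent_w_w(peak_area, slope, intercept, dilution_factor,
                extract_ml=1.0, tissue_mg=100.0):
    """Convert an MRM peak area into analyte content as % (w/w) of the tissue."""
    injected_ug_ml = (peak_area - intercept) / slope   # concentration as injected
    extract_ug = injected_ug_ml * dilution_factor * extract_ml
    return extract_ug / (tissue_mg * 1000.0) * 100.0   # mg -> ug, then percent


# Hypothetical calibration standards (ug/ml) and their peak areas.
std_conc = [0.1, 0.5, 1.0, 5.0]
std_area = [2000.0 * c + 50.0 for c in std_conc]
slope, intercept = fit_line(std_conc, std_area)

# Hypothetical sample injected after a 100-fold dilution.
content = percent_w_w(4050.0, slope, intercept, dilution_factor=100.0)
print(round(content, 3))
```

Injecting several dilutions, as described above, ensures that at least one peak area falls within the range spanned by the standards before the back-calculation is applied.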
A total of 86 g of fresh leaves were flash frozen in liquid N2 and ground into fine powder using an electrical grinder, extracted with 600 ml ethanol, sonicated for 20 min, and agitated for 30 min. The supernatant was filtered, evaporated using a rotary evaporator at 40° C. and lyophilized. The extract was reconstituted in 25 ml acetonitrile and used for either direct purification (following ten times dilution) or prefractionation via medium pressure liquid chromatography (MPLC). The Büchi Sepacore MPLC System was equipped with two C-605 pump modules, a C-620 control unit, C-660 fraction collector, C-640 UV photometer (Büchi Labortechnik AG, Switzerland), and a C18 manually packed column. The mobile phase consisted of acetonitrile:water (5:95, v/v; phase A) and acetonitrile (phase B), with the following multistep gradient method: initial conditions were 0% B for 10 min, raised to 99% B until 530 min, and slowly raised to 100% B until 660 min. The flow rate was 15 ml min−1, the injection volume was 15 ml, and the wavelengths were: 210, 224, 270 and 350 nm. Fractions of 100 ml were collected throughout the run and analyzed by UPLC-qTOF to select specific metabolites for purification. The selected fractions were evaporated using a rotary evaporator at 40° C., lyophilized, reconstituted in ethanol or methanol (only for the fraction with Glc-OA 102 and Glc-DHSA 103), and filtered through a 0.22 μm syringe filter. Purification of metabolites was performed on either an Agilent 1290 Infinity II UPLC system (System 1, the general instrument setup was according to Jozwiak et al. 2020); or a UPLC system (Waters Acquity) equipped with a binary pump, an autosampler, a fraction manager and a diode array detector (System 2) with similar mobile phase as for the UPLC-qTOF. Triggering was performed using specific UV wavelengths according to the metabolite.
In System 1, method development was performed by acquisition of both MS and UV signals. MS spectra were acquired in negative full scan mode between m/z 50 and 1,700. HPLC columns were either XBridge (BEH C18, 250×4.6 mm i.d., 5 μm; Waters) or Luna (C18, 250×4.6 mm i.d., 5 μm; Phenomenex), and the conditions were adjusted and optimized for each metabolite. In this system, the eluent with the metabolites of interest was mixed with a makeup flow of 1.8 ml min−1 water and then trapped on solid phase extraction (SPE) cartridges (10×2 mm Hysphere resin GP cartridges). Each cartridge was loaded four times with the same metabolite, and 36-72 cartridges were used for trapping one metabolite, depending on the concentration of the sample injected. After collection, SPE cartridges were dried with a stream of N2, and eluted with 150 μl methanol. Eluents containing the same metabolite were pooled, dried under a stream of N2, and stored at −20° C. until NMR analysis. A UPLC BEH C18 column (100 mm×2.1 mm i.d., 1.7 μm; Waters) was used on System 2, apart from metabolites Glc-OA 102 and Glc-DHSA 103, which were fractionated on a Luna Phenyl-Hexyl column (150 mm×2 mm i.d., 3 μm; Phenomenex). The flow rate was 0.3 ml min−1, and the column temperature was kept at 35° C. All other conditions were adjusted and optimized according to the sample. The eluent with the metabolite of interest was collected in 2 ml HPLC vials. Eluents containing the same metabolite were pooled, dried under a stream of N2, lyophilized, and stored at −20° C. until NMR analysis.
Purified metabolites were resuspended in 300 μl of Methanol-d4, dried under a stream of N2, reconstituted in 70 μl Methanol-d4 with 0.01% of 3-(trimethylsilyl) propionic-2,2,3,3-d4 acid sodium salt (TMSP, used as an internal chemical shift reference for 1H and 13C) and transferred into 1.7 mm micro-NMR test tubes for structure elucidation. NMR spectra were collected on a Bruker AVANCE NEO-600 NMR spectrometer equipped with a 5 mm TCI-xyz CryoProbe. All spectra were acquired at 298 K. The structures of the different metabolites were determined by one dimensional (1D) 1H NMR spectra, as well as various two-dimensional (2D) NMR spectra: 1H-1H Correlation Spectroscopy (COSY), 1H-1H Total Correlation Spectroscopy (TOCSY), 1H-1H Rotating Frame Nuclear Overhauser Spectroscopy (ROESY), 1H-13C Heteronuclear Single Quantum Coherence (HSQC), and 1H-13C Heteronuclear Multiple Bond Correlation (HMBC) spectra.
One-dimensional 1H NMR spectra were collected using 16,384 data points and a recycling delay of 2.5 s. Two-dimensional COSY, TOCSY and ROESY spectra were acquired using 16,384-8,192 (t2) by 400-512 (t1) data points. 2D TOCSY spectra were acquired using isotropic mixing times of 100-300 ms. A T-ROESY (TOCSY-less ROESY) experiment, which effectively suppresses TOCSY transfer in ROESY experiments, was used in this study. T-ROESY spectra were recorded using spin lock pulses of 100-400 ms. 2D HSQC and 2D HMBC spectra were collected using 4,096 (t2) by 400-512 (t1) data points. Multiplicity-edited HSQC enables differentiation between methyl and methine groups, which give rise to positive correlations, and methylene groups, which appear as negative peaks. The HMBC delay for evolution of long-range couplings was set to observe long-range couplings of JH,C=8 Hz. All data were processed and analyzed using TopSpin 4.1.1 software (Bruker).
For the peeling experiment, whole fresh leaves from a young plant were attached onto glass slides using double-sided tape with either the abaxial or adaxial surfaces, gently peeled above/below the midrib using duct tape and desiccated overnight under moderate vacuum. Images were taken using a digital camera. For localization of metabolites to individual trichomes, fresh leaves and flowers were sectioned, and matrix was sprayed as previously described. Sections were imaged with a Nikon DS-Ri2 microscope. MALDI imaging was performed using a 7 T Solarix FT-ICR (Fourier Transform Ion Cyclotron Resonance) mass spectrometer (Bruker Daltonics). The datasets were collected in positive ionization using lock mass calibration (DHB matrix peak: [3DHB+H-3H2O]+, m/z 409.055408 Da) at a frequency of 1 kHz and a laser power of 40%, with 200 laser shots per pixel and 50, 15 or 25 μm pixel size for the peeled trichomes and for the sectioned leaves and flowers, respectively. Each mass spectrum was recorded in the range of m/z 150-3,000 in broadband mode with a Time Domain for Acquisition of 1M, providing an estimated resolving power of 115,000 at m/z 400. The spectra were normalized to root-mean-square intensity and MALDI images were plotted at theoretical m/z±0.005% with pixel interpolation on.
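The plotting window of theoretical m/z±0.005% mentioned above corresponds to a relative tolerance of 50 ppm. A short Python sketch of the arithmetic follows, using the DHB lock mass quoted in the text as the example ion; the function name `mz_window` is hypothetical.

```python
def mz_window(theoretical_mz, tolerance_percent=0.005):
    """Return the (low, high) m/z bounds for a relative tolerance given in percent."""
    half_width = theoretical_mz * tolerance_percent / 100.0
    return theoretical_mz - half_width, theoretical_mz + half_width


# Example ion: the DHB matrix peak used for lock mass calibration.
lock_mass = 409.055408
low, high = mz_window(lock_mass)
half_width = (high - low) / 2.0
tolerance_ppm = half_width / lock_mass * 1e6

print(f"window: {low:.6f} to {high:.6f} (about {tolerance_ppm:.0f} ppm)")
```

Because the tolerance is relative, the absolute width of the window scales with m/z, so heavier ions are plotted over a proportionally wider mass range.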
For cryo scanning electron microscopy (cryo-SEM) analyses, frozen samples were attached to a holder either by mechanical clamping (leaves) or by a glue made of a concentrated PVP solution. The holder with the samples was then plunge-frozen in liquid N2, transferred to a BAF 60 freeze fracture device (Leica Microsystems, Vienna, Austria) using a VCT 100 Vacuum Cryo Transfer device (Leica) and sublimed for 30 min at −95° C. Samples were transferred to an Ultra 55 cryo-SEM (Zeiss, Germany) using a VCT 100 shuttle and were observed at −95° C. without coating using mostly mixed mode of InLens+SE detectors at 1-1.3 kV. For transmission electron microscopy (TEM) analysis, H. umbraculigerum leaves were fixed with 4% paraformaldehyde, 2% glutaraldehyde in 0.1 M cacodylate buffer containing 5 mM CaCl2 (pH 7.4), then postfixed with 1% osmium tetroxide supplemented with 0.5% potassium hexacyanoferrate trihydrate and potassium dichromate in 0.1 M cacodylate (1 h), stained with 2% uranyl acetate in water (1 h), dehydrated in graded ethanol solutions and embedded in Agar 100 epoxy resin (Agar Scientific Ltd., Stansted, UK). Ultrathin sections (70-90 nm) were viewed and photographed with a FEI Tecnai SPIRIT (FEI, Eindhoven, Netherlands) transmission electron microscope operated at 120 kV and equipped with a OneView Gatan camera. Confocal microscopy of trichomes was carried out on a Nikon Eclipse A1 microscope. Transmitted light was used to image the trichomes since they lack fluorescence. Autofluorescence of chlorophyll (chloroplasts) was used as a contrast for better visualization of the trichomes. A far-red laser was used to detect autofluorescence of chlorophyll (excitation: 640 nm; emission: 663-738 nm).
Trichomes were enriched following Bergau et al. guidelines with modifications. Briefly, young leaves were harvested and soaked in ice-cold, distilled water and then abraded using a BeadBeater machine (Biospec Products, Bartlesville, OK). The polycarbonate chamber was filled with 15 g of plant material and filled with half the volume with glass beads (0.5 mm diameter), XAD-4 resin (1 g/g plant material), and ethanol 80% to full volume. Leaves were beaten by 2-4 pulses of operation of 1 min each. This procedure was carried out at 4° C., and after each pulse the chamber was allowed to cool on ice. Following abrasion, the contents of the chamber were first filtered through a kitchen mesh strainer and then through a 100 μm nylon mesh to remove the plant material, glass beads, and XAD-4 resin. The residual plant material and beads were scraped from the mesh and rinsed twice with additional ethanol 80% that was also passed through the 100 μm mesh. The presence of enriched glandular trichome secretory cells was checked by visualization in an inverted optical microscope.
High molecular weight DNA was extracted from young frozen leaves and sequenced at the UC Davis Genome Center. Sequencing was done on a PacBio Sequel II platform with ˜12-kilobase DNA SMRTbell library preparation according to the manufacturer's protocol. Three different SMRT 8M cells were used, yielding a total of 57.8 Gb of HiFi data (˜44× haploid coverage). In addition to PacBio HiFi data, 200 M reads of PE 2×150 Illumina Hi-C data were obtained by the company Phase Genomics. Hifiasm software was used to integrate both PacBio HiFi and Hi-C data to produce chromosome-scale and haplotype-resolved assemblies. Further scaffolding was performed with the Hi-C data, mapping the reads following the Arima Genomics pipeline and the SALSA software. Visualizations of Hi-C heatmaps were performed with Juicer and quality metrics were obtained with the Assemblathon 2 script. Finally, the assembly was softmasked for repetitive elements using EDTA with the --cds flag to incorporate CDS sequences from the transcriptomic data. Parameter details of each of the commands can be found in github.com/Luisitox/Helichrysum_paper.
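The relationship between sequencing yield and coverage quoted above can be checked with simple arithmetic. The Python sketch below derives the haploid genome size implied by 57.8 Gb of HiFi data at approximately 44× coverage; this implied size is a back-calculation from the two quoted figures, not a value reported in the text, and is only as precise as the rounded coverage.

```python
def implied_genome_size_gb(total_yield_gb, coverage):
    """Haploid genome size implied by total sequence yield divided by coverage."""
    return total_yield_gb / coverage


def coverage_from_size(total_yield_gb, genome_size_gb):
    """Sequencing depth obtained for a given yield and genome size."""
    return total_yield_gb / genome_size_gb


hifi_yield_gb = 57.8       # total HiFi yield from three SMRT cells (from the text)
haploid_coverage = 44.0    # approximate haploid coverage (from the text)

genome_size_gb = implied_genome_size_gb(hifi_yield_gb, haploid_coverage)
print(f"implied haploid genome size: about {genome_size_gb:.2f} Gb")
```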
RNA was extracted from seven tissues: young leaves, old leaves, florets and receptacles of flowers, stems, roots and trichomes (Berman et al., “Parallel evolution of cannabinoid biosynthesis”; Nature Plants 9 817-831 (2023)). RNA integrity was checked using a TapeStation instrument. Paired-end Illumina libraries were prepared for five of the tissues and sequenced on Illumina HiSeq 3000 instrument (PE 2×150, ˜40 M reads per sample) and processed following Freedman and Weeks guidelines. Briefly, random sequencing errors were corrected using Rcorrector and uncorrectable reads were removed. Adaptor and quality trimming were performed using TrimGalore! Ribosomal RNA was filtered by discarding reads mapping to SILVA_132_LSURef and SILVA_138_SSURef non-redundant databases using bowtie2. Fastq quality checks on each of the steps were performed using MultiQC. The remaining reads were pooled and used for genome-guided and genome-independent de novo transcriptome assembly using Trinity.
The Iso-Seq data was obtained from four of the tissues (Berman et al., “Parallel evolution of cannabinoid biosynthesis”; Nature Plants 9 817-831 (2023)) and processed with isoseq3. Fused and unspliced transcripts were removed, and only polyA-positive transcripts were kept for a unique set of high-quality isoforms. Iso-Seq and Trinity transcripts were aligned to the assembly using minimap2 and the BAM files were incorporated to the PASA pipeline to generate RNA-based gene model structures. In addition, de novo gene structures were obtained using the software braker2 and the BAM file alignments of long and short reads as extrinsic training evidence. Ab initio and RNA-based gene models were combined using EvidenceModeler followed by a final round of PASA pipeline. Gene functional annotation was performed for the predicted mature transcripts using TransDecoder, which takes into account HMMER hits against PFAM and BLASTP hits against UniProt databases for similarity retention criteria. Further annotation of protein-coding transcripts was performed by taking the best hit of BLASTP searches against other plant protein databases (Uniprot protein fasta files of sunflower id UP000215914_4232, Arabidopsis id UP000006548_3702, tomato id UP000004994_4081, rice id UP000059680_39947 and Cannabis NCBI id GCF_900626175.1_cs10). Signal peptides were predicted with SignalP, transmembrane domains were predicted with TMHMM, and GO and KEGG terms were obtained with Trinotate. The full script used for the functional annotation of the proteins can be found in github.com/Luisitox/Helichrysum_paper. BUSCO was used at multiple stages of the analysis to assess the completeness of the different versions of both the transcriptome and the genome.
UMI-based 3′ RNAseq of three replicates of the seven tissues was obtained similarly as described. Adaptor and quality trimming were performed using TrimGalore! in two steps, including PolyA trimming mode. Reads were mapped to the genome using STAR and UMI-deduplicated using UMI-tools, and counts were obtained with featureCounts. Normalization was performed with the varianceStabilizingTransformation algorithm of DESeq2, and the CEMiTool package was used for co-expression analysis (dissimilarity threshold of 0.6, p-value of 0.1).
Gene and TE densities were calculated by intersecting the corresponding gff files with 0.1 Mb non-overlapping windows using bedtools makewindows and bedtools intersect. True-seq and Tran-seq coverage were calculated using bedtools genomecov in BedGraph format. The circos plot was made with the R circlize package, and the gene cluster plots were made with the gggenes package. The full R scripts can be found at github.com/Luisitox/Helichrysum_paper.
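The window-based density calculation can be illustrated without bedtools by a small pure-Python sketch. It mimics the idea of tiling a chromosome with 0.1 Mb non-overlapping windows and counting the features that overlap each window. The coordinates are hypothetical, intervals are treated as 0-based and half-open (as in BED files), and the exact bedtools options used in the study are not reproduced here.

```python
def make_windows(chrom_length, window_size=100_000):
    """Tile [0, chrom_length) with non-overlapping windows; the last may be shorter."""
    return [
        (start, min(start + window_size, chrom_length))
        for start in range(0, chrom_length, window_size)
    ]


def count_overlaps(windows, features):
    """Count, for each window, the features overlapping it by at least one base."""
    counts = []
    for w_start, w_end in windows:
        n = sum(
            1 for f_start, f_end in features
            if f_start < w_end and f_end > w_start
        )
        counts.append(n)
    return counts


# Hypothetical 250 kb chromosome with three annotated genes.
windows = make_windows(250_000)
genes = [(5_000, 9_000), (95_000, 105_000), (210_000, 220_000)]
density = count_overlaps(windows, genes)
print(list(zip(windows, density)))
```

A feature that spans a window boundary, such as the second gene in the example, is counted in both windows it touches.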
The selection of the proteins for each of the families analyzed in this study was based on functionally tested enzymes according to studies referenced in each Figure. The full list of IDs can be found in Berman et al., “Parallel evolution of cannabinoid biosynthesis”; Nature Plants 9 817-831 (2023). The Maximum Likelihood trees were constructed with 100 bootstrap tests based on a MUSCLE multiple alignment using the MEGA11 software. The evolutionary distances were computed using the JTT matrix-based method.
Proteomes were obtained from all available annotated Asteraceae genomes present in NCBI: GCA_003112345.1 (Artemisia annua), GCA_009363875.1 (Mikania micrantha), GCA_023376185.1 (Cichorium endivia), GCA_023525715.1 (Cichorium intybus), GCA_023525745.1 (Arctium lappa), GCA_023525975.1 (Smallanthus sonchifolius), GCA_024762085.1 (Ambrosia artemisiifolia), GCF_001531365.2 (Cynara cardunculus var. scolymus), GCF_002127325.2 (Helianthus annuus), GCF_002870075.4 (Lactuca sativa), GCF_010389155.1 (Erigeron canadensis) and Cannabis sativa GCA_900626175.1. Orthogroups and their phylogenetic relationship were inferred with Orthofinder. Genomic positions and putative function of all the genes belonging to the orthogroups of HuCoAT6 (OG0014461), HuOLS4 (OG0000313), and HuCBGAS4 (OG0002538) were determined using the corresponding GFF files and the plots were produced with the gggenomes package. Phylogenetic gene trees generated by Orthofinder were plotted with MEGA11.
MPLC fractions (50 ml each) containing Glc-DHSA 103 were evaporated using a rotary evaporator at 40° C., lyophilized and reconstituted in 15 ml McIlvaine buffer (20 mM, pH 5.0). Reactions were performed in separate 20 ml vials incubated at 45° C. for 24 h. Each reaction consisted of 6 ml of McIlvaine buffer (pH 5.0), 3 ml of 0.1 mg ml−1 of an almond β-glucosidase solution in McIlvaine buffer (≥6 U mg−1, Sigma Aldrich), and 1.5 ml of the fractions containing Glc-DHSA 103. The metabolites were extracted using 3 volumes of ethyl acetate:diethyl ether (1:1), evaporated using a rotary evaporator and reconstituted in 5 ml methanol. The products from the reaction contained a mixture of both glucosylated and non-glucosylated metabolites. DHSA 93 was therefore purified using System 2 and reconstituted in 100 μl methanol for the enzymatic assay. The purified DHSA 93 was analyzed via UPLC-qTOF to verify that the purified fraction did not contain Glc-DHSA 103.
AAE, PKS, PKC, UGT and AAT Expression in E. coli and Protein Purification
HuAAE1-6, HuUGT1-13 and HuAAT1-15 coding sequences from H. umbraculigerum, and previously characterized sequences from rice (OsUGT) and stevia (SrUGT), were individually cloned into the pET28b vector digested with EcoRI using the ClonExpress II one step cloning kit (Vazyme, Germany). HuPKS1-4, HuPKC1-5, CsOLS and CsOAC were ligated into the pOPINF vector (digested with HindIII and KpnI) using the ClonExpress II one step cloning kit (Vazyme, Germany). Due to the high sequence similarity of the coding sequences, HuPKS2-4 were synthesized by the company Twist Biosciences. All constructs were expressed in E. coli BL21 (DE3) cells (a complete list of the primers can be found in Berman et al., “Parallel evolution of cannabinoid biosynthesis”; Nature Plants 9 817-831 (2023)). Bacterial starters were grown overnight in LB medium at 37° C., diluted in fresh LB 1:100, and re-incubated at 37° C. When cultures reached A600=0.6, protein expression was induced with 400 μM of isopropyl-1-thio-β-D-galactopyranoside (IPTG) overnight at 15° C. Bacterial cells were lysed by sonication in 50 mM Tris-HCl pH 8.0, 0.5 mM phenylmethylsulfonyl fluoride (PMSF, Sigma Aldrich) solution in isopropanol, 10% glycerol and protease inhibitor cocktail (Sigma Aldrich), and 1 mg ml−1 lysozyme (Sigma Aldrich). The whole-cell extract was either kept for functional activity or used for protein purification. Purification of hexahistidine-tagged proteins was performed on Ni-NTA agarose beads (Adar Biotech). The proteins were eluted with 200 mM imidazole (Fluka) in buffer containing 50 mM NaH2PO4, pH 8.0, and 0.5 M NaCl. Protein concentration of the eluted fractions was measured with Pierce™ 660 nm protein assay reagent (Thermo Scientific).
Recombinant AAE assays were performed in a 20 μl reaction mix that contained 0.1 μg recombinant AAE, 50 mM HEPES pH 9.0, 8 mM ATP, 10 mM MgCl2, 0.5 mM CoA and 4 mM of the sodium salt of the respective acid (acetic, butyric, hexanoic, octanoic, cinnamic and coumaric acids) for 10 min at 40° C. Reactions were terminated with 2 μl of 1 M HCl and stored on ice until analysis. After centrifugation at 15 000 g for 5 min at 4° C., the samples were diluted 1:100 in water and analyzed on the TQ-S system in MRM mode using a similar column as previously described. The system was operated with an aqueous buffer pH 7.0 (10 mM Ammonium Acetate, 5 mM NH4HCO2, phase A) and acetonitrile (phase B). The flow rate was 0.3 ml min−1, and the column temperature was kept at 25° C. Metabolites were analyzed using a 15 min multistep gradient method: initial conditions were 1% B raised to 35% B until 10.5 min, and then raised to 100% B until 11 min, held at 100% B for 1 min, decreased to 1% B until 12.5 min, and held at 1% B until 15 min for re-equilibration of the system. The instrument was operated in positive mode with a capillary voltage of 3.0 kV, and a cone voltage of 50 V. Metabolite identity was confirmed with authentic standards. Two different transitions were used for analysis of: acetyl-CoA (810.52>303.30, 27.0V; 810.52>428.25, 24.0V); butyryl-CoA (838.58>331.30, 28.0 V; 838.58>331.30, 25.0 V); hexanoyl-CoA (866.65>359.40, 28.0 V; 866.65>428.25, 26.0 V); octanoyl-CoA (894.65>387.55, 30.0 V; 894.65>428.25, 28.0 V); coumaroyl-CoA (914.59>407.37, 30.0 V; 914.59>428.25, 28.0 V); cinnamoyl-CoA (898.59>391.37, 30.0 V; 898.59>428.25, 28.0 V).
Individual and coupled HuPKS and PKC (HuOACs or CsOAC) assays were carried out as described by Gagne et al. (2012) with some modifications. Enzyme assays were performed in 50 μL with 20 mM HEPES at pH 7.2, 5 mM DTT, 1.8 mM malonyl CoA and 0.6 mM of hexanoyl-CoA. HuPKSs (5 μg) and PKCs (10 μg), were added either individually or in combination. Reaction mixtures were incubated at 30° C. for 3 h. Reactions were stopped by extraction with 100 μL methanol, vortexing and centrifugation at 15 000 g for 10 min. The supernatant was filtered and analyzed with both UPLC-qTOF and triple-Quad systems. The column and mobile phase were as for the metabolic profiling. Initial conditions were 10% B raised to 70% until 6 min, raised to 100% B until 6.2 min, held at 100% B until 8 min, decreased to 10% B until 8.5 min, and held at 10% B until 11 min for re-equilibration of the system. The flow rate was 0.3 ml min−1, and the column temperature was kept at 35° C. UPLC-qTOF was run in both polarities with MS or MS/MS modes using similar parameters as previously described. The TQ-S system was operated in MRM mode in both positive (for olivetol) and negative modes with a capillary voltage of 3.5 or 1.5 kV, respectively, and a cone voltage of 40 or 20 V, respectively. Two different transitions were used for analysis of: OA 92 (223.1>179.1, 15.0 V; 223.1>137.1, 20.0 V); PDAL (181.2>137.1, 10.0 V; 181.2>97.1, 20.0 V); HTAL (223.1>179.1, 10.0 V; 223.1>125.1, 10.0 V); PCP 95 (223.1>179.1, 20.0 V; 223.1>81.0, 25.0 V); olivetol (181.1>111.0, 10.0 V; 181.1>71.2, 10.0 V). Olivetol, OA 92 and PCP 95 identities were confirmed with authentic standards.
HuPT1-4 genes from H. umbraculigerum were separately cloned into pESC-TRP vector. Microsomal preparations from yeast cells transformed with pESC-TRP vectors were performed as described by Jozwiak et al. (2020). PT enzymatic assays were carried out as described previously for CsPT48 with some modifications. The microsomes from yeasts expressing the HuPTs were resuspended in 3.3 ml buffer (10 mM Tris-HCl, 10 mM MgCl2, pH 8.0, 10% glycerol) and homogenized with a tissue grinder. The enzyme assays were performed in 50 μL with 2 μl of the respective membrane preparations dissolved in the reaction buffer (50 mM Tris-HCl, 10 mM MgCl2, pH 8.0), with 500 μM of the aromatic acceptor [OA 92, VA, DHSA 93, PCP 95, naringenin chalcone 97 or pinocembrin chalcone 100] and 500 μM of the isoprenoid (IPP, GPP or FPP). Samples were incubated for 1 h at 30° C. Kinetic assays were similarly performed with 1 mM of GPP and varying (0.5 μM-1.5 mM) concentrations of OA 92, with 15 min incubation at 30° C. Samples were extracted with 100 μl ethanol followed by vortexing and centrifugation. The supernatant was filtered and analyzed via UPLC-qTOF as for the terpenophenols (UPLC Method 1).
The UGT enzyme assays were performed as described by Cai et al. (2021) with some modifications. UGT assays using different aromatic substrates were performed by mixing 1.5 μl of the UDP-Glc solution (80 mM, final concentration: 2.5 mM), 27.5 μl Tris buffer (100 mM, pH 8.0), 1 μl of each of the substrates (50 mM, final concentration: 1 mM) and 20 μl of the lysate enzyme solution. The reactions were incubated at 30° C. for 1 h. Reactions were stopped by extraction with 100 μl methanol, vortexing and centrifugation at 15,000 g for 10 min. The supernatant was filtered and analyzed via UPLC-qTOF using UPLC Method 2. The assay with the purified UGTs was performed by mixing 2 μl of the cannabinoid acceptors (OA 92, DHSA 93, CBGA 1, heliCBGA 2, CBDA, Δ9-THCA, CBCA 15, olivetol, CBG, CBD or Δ9-THC, PCP 95, naringenin chalcone 97 or pinocembrin chalcone 100) in the presence of 1.5 μl UDP-Glc 80 mM, 46.5 μl Tris buffer (100 mM, pH 8.0) and 1 μl of each enzyme. The metabolites were extracted and analyzed as previously described. Kinetic assays were performed with the purified enzymes (1.5 μg μl−1) dissolved in 45 μl Tris buffer (100 mM, pH 8.0) and substrates were added using varying (0.5 μM-3 mM) and constant (1 mM) concentrations of OA 92 and UDP-Glc and the total reaction volume was 50 μl. To stop the reactions, 100 μl methanol was added to each tube, and the metabolites were extracted and analyzed as previously described.
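The final concentrations in the lysate-based UGT assay follow from the dilution relation C1·V1 = C2·V2. The Python sketch below applies it to the volumes listed above; the helper name `final_concentration` is hypothetical. For UDP-Glc the nominal result is 2.4 mM, close to the approximately 2.5 mM quoted in the text.

```python
def final_concentration(stock_mM, added_ul, total_ul):
    """Final concentration (mM) after diluting a stock into the total reaction volume."""
    return stock_mM * added_ul / total_ul


# Component volumes of the lysate-based UGT assay (ul), as listed in the text.
volumes_ul = {"UDP-Glc": 1.5, "Tris buffer": 27.5, "substrate": 1.0, "lysate": 20.0}
total_ul = sum(volumes_ul.values())

udp_glc_mM = final_concentration(80.0, volumes_ul["UDP-Glc"], total_ul)
substrate_mM = final_concentration(50.0, volumes_ul["substrate"], total_ul)

print(f"total volume: {total_ul} ul")
print(f"UDP-Glc: {udp_glc_mM:.2f} mM, substrate: {substrate_mM:.2f} mM")
```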
Recombinant AAT assays using different donor and acceptor substrates were performed by mixing 7 μl of the cannabinoid acceptors (OA 92, CBGA 1, or heliCBGA 2, 1 mg ml−1) with 58 μl of a potassium phosphate buffer (100 mM, pH 7.4), 5 μl of the acyl-CoA donors (butyryl-CoA, hexanoyl-CoA, iso-valeryl-CoA, or acetyl-CoA, 10 mM) and 30 μl of the enzyme solutions. The reactions were incubated at 30° C. for 3 h. Samples were extracted with 100 μl ethanol followed by vortexing and centrifugation. The supernatant was filtered and used for UPLC-qTOF analysis using a similar column, mobile phase and MS parameters as previously described for terpenophenols. Initial conditions were 40% B for 1 min, raised to 100% B until 14 min, held at 100% B for 3.8 min, decreased to 40% B until 18 min, and held at 40% B until 20 min for re-equilibration of the system. The flow rate was 0.3 ml min−1, and the column temperature was kept at 35° C.
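The multistep gradient used for the AAT assay can be written as a table of time points and interpolated to give the mobile phase composition at any moment of the run. The Python sketch below assumes that each "raised" or "decreased" step is a linear ramp between the stated time points, which is the usual convention but is not stated explicitly; the name `percent_b` is hypothetical.

```python
# (time in min, % phase B) breakpoints of the 20 min AAT gradient described above.
# The 3.8 min hold at 100% B starts at 14 min and therefore ends at 17.8 min.
GRADIENT = [
    (0.0, 40.0),
    (1.0, 40.0),
    (14.0, 100.0),
    (17.8, 100.0),
    (18.0, 40.0),
    (20.0, 40.0),
]


def percent_b(t_min, program=GRADIENT):
    """Linearly interpolate % B at time t_min (assumes linear ramps between points)."""
    if not program[0][0] <= t_min <= program[-1][0]:
        raise ValueError("time is outside the gradient program")
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


for t in (0.5, 7.5, 16.0, 19.0):
    print(f"{t:>5.1f} min: {percent_b(t):.1f}% B")
```

The same table-driven approach applies to the other gradients in this section by replacing the breakpoints.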
The assay with the purified HuCBAT5 enzyme was performed by mixing 2 μl of the cannabinoid acceptors (OA 92, CBGA 1, heliCBGA 2, CBDA, Δ9-THCA or CBCA 15) with 2 μl of the acyl-CoA donors (butyryl-CoA, iso-butyryl-CoA, hexanoyl-CoA, iso-valeryl-CoA, or acetyl-CoA, 10 mM), 44 μl of a potassium phosphate buffer (100 mM, pH 7.4), and 2 μl of the purified HuCBAT5 enzyme solution. The reactions were incubated at 30° C. for 3 h. To stop the reactions, 50 μl ethanol was added to each tube and the acylated metabolites were extracted and analyzed via UPLC-qTOF as for the terpenophenols (UPLC Method 1) in both MS and MS/MS modes. Extracted ion chromatograms using the major products were selected from the LC-MS/MS analyses as follows: cannabinoid acceptors without CoAs: OA 92>179.107, CBGA 1, CBCA 15>191.107, heliCBGA 2>225.092, CBDA, Δ9-THCA>245.154; acylated cannabinoids: OA 92>179.107, CBGA 1>231.102, heliCBGA 2>265.086, CBDA>245.154, Δ9-THCA>245.154, CBCA 15>191.107.
Transient Expression of Selected Genes in N. benthamiana
Overexpression constructs of GFP (as negative control), CsOLS and CsOAC were generated using GoldenBraid cloning as described by Jozwiak et al. 2020 to a final vector of pAlpha2-Ubq10-CCD-Ter10. HuCoAT6, HuTKS4, and HuCBGAS were amplified and cloned in pAlpha2-NPT II-Ubq10-CCD-Ter10 vector digested with BsaI using ClonExpress II One Step Cloning kit (Vazyme). The full list of oligonucleotides used for cloning can be found in Berman et al., “Parallel evolution of cannabinoid biosynthesis”; Nature Plants 9 817-831 (2023). All plasmids were sequenced and transformed into Agrobacterium tumefaciens strain GV3101 by electroporation. A. tumefaciens harboring the overexpression constructs were grown overnight at 28° C. in Luria-Bertani (LB) medium in the presence of kanamycin and gentamycin. Bacterial cells were collected by centrifugation, washed and resuspended in infiltration buffer (10 mM MES, 2 mM MgCl2, 2 mM Na3PO4, 0.5% glucose and 100 mM acetosyringone) to OD600=0.3. Equal volumes of A. tumefaciens suspension with different expression vectors were combined to obtain the desired gene combinations and incubated for 2 h at room temperature. The solutions were infiltrated into 4- or 5-week-old N. benthamiana leaves from the abaxial side using a 1-ml needleless syringe. Substrates (0.5 mM each) were infiltrated into the same leaf areas 2 days after initial infiltration, and leaves were collected for metabolite analysis after 24 h. Leaf samples were flash frozen and extracted as previously described with 300 μl methanol and analyzed on a similar UPLC system connected to an Orbitrap IQ-X Tribrid MS (Thermo Scientific, Bremen, Germany) using UPLC Method 2 in negative mode. The source parameters were: sheath gas flow rate, auxiliary gas flow rate and sweep gas flow rate: 45, 10 and 1 arbitrary units, respectively; vaporizer temperature: 300° C.; ion transfer tube temperature: 275° C.; spray voltage: 2.3 kV.
The instrument was operated in full MS1 with data dependent MS/MS (MS-dd-MS2). Data acquisition in full MS1 mode was 60,000 resolution, the scan range 100-1000 m/z, normalized automatic gain control (AGC) target of 25% and a maximum injection time (IT) of 50 ms. Data acquisition in dd-MS2 mode was with 15,000 resolution, a normalized AGC target of 20%, maximum IT of 150 ms, isolation window of 1.5 m/z and normalized collision energy of 40. Identification of metabolites was performed using analytical standards and/or products from in vitro UGT enzyme assays (
Heterologous Expression in S. cerevisiae
For the expression of HuCoAT6, HuTKS4, CsOAC and HuCBGAS in S. cerevisiae, the CDSs were amplified, and the purified amplicons were inserted into a series of pESC (AmpR) plasmids allowing simultaneous expression of two genes from one plasmid. HuCoAT6 and HuTKS4 were inserted using ClonExpress II One Step Cloning kit (Vazyme) into pESC-HIS plasmid linearized with SalI and SacI restriction enzymes, respectively. HuCBGAS and CsOAC were cloned in the same way into pESC-TRP plasmid linearized with SalI and SacI restriction enzymes, respectively. The full list of primers used for the cloning can be found in Berman et al., “Parallel evolution of cannabinoid biosynthesis”; Nature Plants 9 817-831 (2023). pESC constructs were transformed into S. cerevisiae WAT11 using Yeastmaker yeast transformation system (Clontech). The inventors transformed yeast cells with combinations of pESC vectors allowing expression of all four genes at once. Transformed yeast were grown on SD minimal media supplemented with appropriate amino acids and 2% glucose. Colonies were screened and the presence of the transgene was confirmed by colony PCR. For induction of gene expression, transformed cells were grown in 2 ml minimal medium with 2% glucose and after 24 h transferred to a minimal medium with 2% galactose without additional supplementation or supplemented with GPP (0.21 mM) and either sodium hexanoate (1 mM) or OA 92 (0.2 mM), and grown for an additional 24 h at 30° C. Cultures were transferred to 2 ml Eppendorf tubes and centrifuged at 8,000 g for 1 min. The cell pellet was weighed, glass beads (diameter 500 μm) at double the pellet weight and 500 μl of MeOH were added, and the cells were lysed using a bead beater at 22 Hz for 6 min. Lysed cells were centrifuged at 14,000 r.p.m. for 5 min, and the clear supernatant was collected and dried using a SpeedVac. Dry residues were dissolved in 100 μl of methanol, filtered through a 0.22 μm filter and analyzed by LC-MS as detailed for N. benthamiana samples.
As two earlier reports regarding the presence of cannabinoids, specifically CBGA 1, in H. umbraculigerum were contradictory, the inventors decided to carry out comprehensive chemical profiling of cannabinoids in various H. umbraculigerum tissues. The inventors confirmed that CBGA 1 is a major component of H. umbraculigerum, accumulating up to 4.3% on a dry weight basis in leaves (
The inventors predicted that CBGA 1 and heliCBGA 2 biosynthesis originates from hexanoic acid and phenylalanine, respectively (
The inventors employed various high-resolution imaging technologies to examine if, like Cannabis, H. umbraculigerum develops and accumulates cannabinoids in glandular trichomes. The inventors found that in flowers, the involucral bracts of the capitula had numerous non-glandular and glandular trichomes. In individual florets, glandular trichomes were particularly abundant on the tips of the corolla lobe (
Next, the inventors applied matrix-assisted laser desorption/ionization-mass spectrometry imaging (MALDI-MSI) to spatially localize cannabinoids in H. umbraculigerum. The inventors first analyzed the abaxial and adaxial leaf surfaces following partial removal of trichomes (
Cannabis produces various CBGA-type analogs with aliphatic chains of different lengths (one to seven carbons), derived from different linear short- and medium-chain fatty acids (FAs). The inventors observed in leaves of H. umbraculigerum several of these analogs, including cannabigerovarinic acid (CBGVA 9), cannabigerol butyric acid (CBGBA 10), cannabigerohexolic acid (CBGHA 11), and cannabigerophorolic acid (CBGPA 12), corresponding to three, four, six, and seven carbon-atom chains, respectively (Berman et al., “Parallel evolution of cannabinoid biosynthesis”; Nature Plants 9 817-831 (2023)). The inventors also observed two metabolites with similar masses and fragmentation patterns as CBGA 1 and CBGHA 11, which the inventors assigned as cannabinoids derived from branched FAs (13 and 14, respectively, Berman et al., “Parallel evolution of cannabinoid biosynthesis”; Nature Plants 9 817-831 (2023)). These branched cannabinoids have not been identified in Cannabis. The inventors also found small amounts of CBCA 15 and its aromatic analog helichromenic acid (heliCBCA 16) and their hydroxylated forms (17 and 18, respectively, Berman et al., “Parallel evolution of cannabinoid biosynthesis”; Nature Plants 9 817-831 (2023)), and the isoprenyl-forms of CBGA 1 and heliCBGA 2 according to MS/MS fragmentation (CBPA 19 and heliCBPA 20, respectively, Berman et al., “Parallel evolution of cannabinoid biosynthesis”; Nature Plants 9 817-831 (2023)). The inventors did not detect Δ9-THCA- or CBDA-type cannabinoids in any of the tissues.
Some additional peaks exhibited MS/MS fragments and chemical formulas corresponding to one or two hydroxylations of the metabolites with five-carbon-atom chains, which were labeled following feeding with hexanoic-D11 acid (21-33, Berman et al., “Parallel evolution of cannabinoid biosynthesis”; Nature Plants 9 817-831 (2023)). Interestingly, hydroxylated amorfrutins were observed with fragmentation patterns similar to those of the cannabinoids (with an m/z difference of 33.984 Da), suggesting similar chemical structures and enzymes associated with their metabolism (34-46, Berman et al., “Parallel evolution of cannabinoid biosynthesis”; Nature Plants 9 817-831 (2023)). From this group, the inventors purified metabolite 26 and identified it by NMR spectroscopy as a new tetrahydroxanthane-type cannabinoid (12-OH-cyclocannabigerolic acid 26). According to its MS/MS fragmentation pattern, the inventors also putatively identified cyclocannabigerolic acid (cycloCBGA 47) and analogous amorfrutin types [12-OH-heli-cyclocannabigerolic acid (12-OH-helicycloCBGA 39) and heli-cyclocannabigerolic acid (helicycloCBGA 48), respectively, Berman et al., “Parallel evolution of cannabinoid biosynthesis”; Nature Plants 9 817-831 (2023)].
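The 33.984 Da m/z difference between the amorfrutin and cannabinoid series can be rationalized as the monoisotopic mass difference between a phenethyl and a pentyl side chain (a net change of C3H-2). The sketch below illustrates this with the parent compounds; the neutral formulas C22H32O4 for CBGA 1 and C25H30O4 for heliCBGA 2 are assumptions, as they are not stated in the text.

```python
# Monoisotopic masses of the most abundant isotopes (Da).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}


def monoisotopic_mass(formula: dict) -> float:
    """Sum monoisotopic masses for an element-count mapping such as {"C": 22}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


# ASSUMED neutral formulas (not given in the text).
CBGA = {"C": 22, "H": 32, "O": 4}       # pentyl side chain
HELI_CBGA = {"C": 25, "H": 30, "O": 4}  # phenethyl side chain

delta = monoisotopic_mass(HELI_CBGA) - monoisotopic_mass(CBGA)
print(f"heliCBGA - CBGA = {delta:.3f} Da")  # prints 33.984
```

Because the same side-chain swap applies across the series, the same mass shift would be expected between each hydroxylated amorfrutin and its cannabinoid counterpart.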
According to the current feeding experiments, prenyl-acyl-phloroglucinoids, prenylchalcones, and prenylflavanones were derived from similar precursors as the cannabinoids and amorfrutins (49-91, Berman et al., “Parallel evolution of cannabinoid biosynthesis”; Nature Plants 9 817-831 (2023)). A summary of the identified metabolites 1-91 appears in Berman et al., “Parallel evolution of cannabinoid biosynthesis”; Nature Plants 9 817-831 (2023).
The inventors postulated that the core cannabinoid pathway leading to CBGA 1 in H. umbraculigerum consists of similar types of enzymes and reactions as in Cannabis (
To identify the enzymes responsible for cannabinoid biosynthesis in H. umbraculigerum, the inventors obtained a haplotype-resolved dual genome assembly using 44× PacBio HiFi reads and 200 M reads of Illumina Hi-C chromatin interaction data (haploid size of ˜1.3 Gb, Berman et al., “Parallel evolution of cannabinoid biosynthesis”; Nature Plants 9 817-831 (2023)). After scaffolding, the N50 of the primary assembly was 174 Mb, with eight scaffolds >10 Mb (
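For readers unfamiliar with the N50 statistic quoted for the assembly, the following minimal sketch shows how it is conventionally computed from a list of scaffold lengths. The lengths in the example are hypothetical toy values and do not represent the H. umbraculigerum assembly.

```python
def n50(lengths):
    """Return the N50 of a set of sequence lengths.

    The N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence down, accumulating covered length.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Hypothetical toy scaffold lengths (arbitrary units), for illustration only.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))  # prints 70: 80 + 70 = 150, half of the 300 total
```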
The first step in cannabinoid biosynthesis involves the formation of acyl-CoA thioesters by members of the AAE superfamily. As different acyl moieties are substrates for these enzymes, the inventors tested acetic, butyric, hexanoic, octanoic, cinnamic and coumaric acids. In vitro assays with purified recombinant proteins showed that HuAAE2 and HuAAE4 efficiently produced butyryl-CoA, and that HuAAE2 presented higher activity against acetic acid and formed acetyl-CoA (
In Cannabis, the next step is performed by a coupled enzymatic reaction involving a CsOLS and the accessory protein CsOAC, resulting in the condensation of hexanoyl-CoA with three molecules of malonyl-CoA to yield OA 92. In in vitro assays, derailment of the unstable intermediates occurs, producing additional by-products not naturally identified in plant extracts [olivetol, pentyl acyl diacetic acid lactone (PDAL) and hexanoyl acyl triacetic acid lactone (HTAL), Berman et al., “Parallel evolution of cannabinoid biosynthesis”; Nature Plants 9 817-831 (2023)]. PDAL and HTAL are produced by spontaneous lactonization of the unstable tri- and tetra-ketide intermediates, whereas olivetol is produced by CsOLS in the absence of CsOAC in an aldol decarboxylation cyclization reaction resembling the production of resveratrol by a stilbene synthase (STS). When CsOAC is also present in the reaction, OA 92 is produced at the expense of olivetol. Here, the inventors cloned HuPKS1-4, HuPKC1-5, CsOLS and CsOAC, expressed the enzymes in E. coli, and tested their ability to form OA 92 from hexanoyl-CoA and malonyl-CoA in coupled in vitro assays with all possible combinations (Berman et al., “Parallel evolution of cannabinoid biosynthesis”; Nature Plants 9 817-831 (2023)). In the absence of PKCs, all the HuPKSs produced the PDAL and HTAL by-products, while HuPKS1, HuPKS2 and HuPKS4 also produced olivetol (
In the next step, OA 92 or OA-derivatives are prenylated by aromatic PTs to form CBGA 1 and its derivatives. The inventors expressed four enzymes in yeast and purified the microsomal fractions used for enzymatic assays (HuPT1-4,
To gain more insight into the evolution of the pathway, the inventors searched for orthologous enzymes in Cannabis and in all other Asteraceae species with annotated genomes. To the best of the inventors' knowledge, these species do not accumulate terpenophenols. Similarly to the phylogenetic relationships observed for functionally tested enzymes (i.e., AAEs, PKSs and PTs,
Glycosylated cannabinoids have not been reported to occur naturally in planta. Here the inventors identified glucosylated OA (Glc-OA 102) and glucosylated DHSA (Glc-DHSA 103) as well as glucosylated C3-C6 alkyl-chain intermediates (104-108), glucosylated CBGA (Glc-CBGA 109) and heliCBGA (Glc-heliCBGA 110), and their isoprenylated forms (Glc-CBPA 111 and Glc-heliCBPA 112) (Berman et al., “Parallel evolution of cannabinoid biosynthesis”; Nature Plants 9 817-831 (2023)). All these metabolites exhibited neutral losses of 162.053 Da, corresponding to hexose, and fragments similar to those of the non-glucosylated compounds. Di-glucosylated metabolites were not identified in the extracts. In Arabidopsis thaliana, uridine 5′-diphospho-glucuronosyltransferases (AtUGT89B1, AtUGT71B1, AtUGT75B1 and AtUGT71B2) catalyze the glycosylation of several hydroxybenzoic acids (HBA and DHBAs), which are structurally similar to OA 92 (
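The 162.053 Da neutral loss attributed to hexose corresponds to the monoisotopic mass of an anhydrohexose residue, C6H10O5. The sketch below, offered only as an illustration, computes that value and checks whether a parent/fragment mass pair is consistent with it. The formulas used for OA (C12H16O4), its glucoside (C18H26O9) and a glucuronide counter-example, as well as the 0.005 Da tolerance and the helper names, are assumptions not taken from the text.

```python
# Monoisotopic masses of the most abundant isotopes (Da).
C, H, O = 12.0, 1.00782503, 15.99491462


def mass(c: int, h: int, o: int) -> float:
    """Monoisotopic mass of a CcHhOo formula."""
    return c * C + h * H + o * O


HEXOSE_RESIDUE = mass(6, 10, 5)  # anhydrohexose, C6H10O5


def matches_hexose_loss(parent: float, fragment: float, tol: float = 0.005) -> bool:
    """True if parent - fragment equals the hexose residue mass within tol (Da)."""
    return abs((parent - fragment) - HEXOSE_RESIDUE) <= tol


# ASSUMED neutral formulas, for illustration only.
oa = mass(12, 16, 4)             # olivetolic acid
glc_oa = mass(18, 26, 9)         # OA plus one hexose residue
glucuronide_oa = mass(18, 24, 10)  # OA plus a glucuronic acid residue (C6H8O6)

print(f"hexose residue = {HEXOSE_RESIDUE:.3f} Da")  # prints 162.053
print(matches_hexose_loss(glc_oa, oa))              # prints True
print(matches_hexose_loss(glucuronide_oa, oa))      # prints False
```

The counter-example shows that a glucuronide loss (about 176.032 Da) would not be mistaken for a hexose loss at this tolerance.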
Eleven of the thirteen UGTs from H. umbraculigerum were expressed in E. coli and examined for enzyme activity using OA 92, CBGA 1, and heliCBGA 2 in a reaction including uridine diphosphate glucose (UDP-Glc) as the sugar donor. Eight out of the eleven enzymes showed activity on the different substrates, including HuUGT1-2, HuUGT4-7, HuUGT11, and HuUGT13 (
Previous reports identified isoprenylated O-acylated amorfrutins in H. umbraculigerum, but not geranylated or alkyl-type ones, which are also not found in Cannabis. Here the inventors identified a diverse group of O-acylated cannabinoids and amorfrutins including the O-acylated alkyl (113-130) and aralkyl (131-141) metabolites (Berman et al., “Parallel evolution of cannabinoid biosynthesis”; Nature Plants 9 817-831 (2023)). The inventors hypothesized that the acyl group is derived from short- or medium-chain FAs (
O-Acylation of specialized metabolites in plants is frequently catalyzed by BAHD-type alcohol acyl-transferase (AAT) enzymes. Therefore, the inventors selected fifteen H. umbraculigerum BAHD homologs, four of them co-expressed with other cannabinoid-related enzymes (
The inventors verified the in planta activity of the enzymes towards CBGA 1 by transiently co-expressing different combinations of HuCoAT6, HuTKS4, and HuCBGAS4, and the Cannabis CsOAC and CsOLS in N. benthamiana leaves. Following infiltration of the leaves with sodium hexanoate and GPP, the inventors observed the production of glycosylated forms of OA 92 (HuTKS4+CsOAC or CsOLS+CsOAC) and PCP 95 (only with HuTKS4,
The inventors also reconstituted the cannabinoid pathway by expressing the HuCoAT6, HuTKS4, CsOAC and HuCBGAS4 genes in S. cerevisiae. The inventors observed the production of OA 92, CBGA 1 and PCP 95 without precursor feeding (
Although the invention has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications, and variations will be apparent to those skilled in the art. Accordingly, it is intended to embrace all such alternatives, modifications and variations that fall within the spirit and broad scope of the appended claims.
This application is a ByPass Continuation of PCT Patent Application No. PCT/IL2023/050968 having international filing date Sep. 7, 2023, which claims the benefit of priority of U.S. Provisional Patent Application No. 63/404,645, titled “COMBINATION OF NUCLEIC ACID SEQUENCES ENCODING PROTEINS DERIVED FROM Helichrysum umbraculigerum, AND ANY TRANSGENIC CELL, TISSUE, AND ORGANISM COMPRISING SAME”, filed 8 Sep. 2022, and of U.S. Provisional Patent Application No. 63/453,112, titled “COMBINATION OF NUCLEIC ACID SEQUENCES ENCODING PROTEINS DERIVED FROM Helichrysum umbraculigerum, AND ANY TRANSGENIC CELL, TISSUE, AND ORGANISM COMPRISING SAME”, filed 19 Mar. 2023. The contents of the above applications are all incorporated herein by reference in their entirety.
| Number | Date | Country |
|---|---|---|
| 63404645 | Sep 2022 | US |
| 63453112 | Mar 2023 | US |

| Relation | Number | Date | Country |
|---|---|---|---|
| Parent | PCT/IL2023/050968 | Sep 2023 | WO |
| Child | 19072119 | | US |