CHIMERIC TERPENE SYNTHASES

Information

  • Patent Application
  • Publication Number
    20210147880
  • Date Filed
    February 14, 2019
  • Date Published
    May 20, 2021
Abstract
Described herein are chimeric terpene synthases, methods for making chimeric terpene synthases, and methods for making terpenes using the same.
Description
FIELD OF THE INVENTION

The disclosure relates to chimeric terpene synthases, methods for making chimeric terpene synthases, and methods for making terpenes using the same.


BACKGROUND

Terpenes are a diverse class of organic compounds built from five-carbon building blocks and encompass at least 400 distinct structural families. Given their structural diversity, terpenes have numerous roles, including acting as pheromones, anti-oxidants, and anti-microbial agents. Although terpene synthases produce terpenes in both prokaryotes and eukaryotes, the wide array of terpene isomers often hinders high-yield extraction from naturally occurring sources. Furthermore, the structural complexity of terpenes often limits de novo chemical synthesis.


SUMMARY

Aspects of the disclosure relate to chimeric terpene synthases comprising an amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of: SEQ ID NOs: 1-52. In some embodiments, the chimeric terpene synthase comprises an amino acid sequence at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to an amino acid sequence selected from the group consisting of: SEQ ID NOs: 1-52. In some embodiments, the chimeric terpene synthase comprises an amino acid sequence identical to an amino acid sequence selected from the group consisting of: SEQ ID NOs: 1-52.


Further aspects of the disclosure relate to nucleic acid molecules encoding a chimeric terpene synthase described herein. In some embodiments, a nucleic acid molecule comprises a sequence that is at least 90% identical to a nucleotide sequence selected from the group consisting of SEQ ID NOs: 67-118. In some embodiments, a nucleic acid molecule comprises a sequence selected from the group consisting of SEQ ID NOs: 67-118.


Further aspects of the disclosure relate to vectors comprising a nucleic acid molecule described herein. In some embodiments, the vector is a viral vector, a vector for transient expression, or a vector for inducible expression. In some embodiments, the vector is a lentiviral vector, a retroviral vector, an adenoviral vector, an adeno-associated vector, a galactose-inducible vector, or a doxycycline-inducible vector.


Further aspects of the disclosure relate to host cells comprising a nucleic acid described herein, or a vector described herein.


In some embodiments, the host cell is a fungal cell. In some embodiments, the cell is a yeast cell. In some embodiments, the cell is a Saccharomyces, Pichia, Kluyveromyces, Hansenula, or Yarrowia cell. In some embodiments, the cell is a Saccharomyces cerevisiae cell.


In some embodiments, the host cell is a plant cell.


In some embodiments, the host cell is a bacterial cell.


Further aspects of the disclosure relate to nucleic acid molecules encoding a chimeric terpene synthase, wherein at least 10% of the nucleic acid molecule sequence, or the amino acid sequence, is derived from a rare or extinct plant. In some embodiments, at least 40% of the nucleic acid molecule sequence, or the amino acid sequence, is derived from a rare or extinct plant.


In some embodiments, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, or at least 90% of the nucleic acid molecule sequence, or the amino acid sequence, is derived from a rare or extinct plant. In some embodiments, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, or at least 95% of the nucleic acid molecule sequence, or the amino acid sequence, is derived from a rare or extinct plant.


In some embodiments, the chimeric terpene synthase is a chimeric sesquiterpene synthase. In some embodiments, the rare or extinct plant is selected from the group consisting of: Hibiscadelphus wilderianus, Leucadendron grandiflorum, Macrostylis villosa, Orbexilum stipulatum, Shorea cuspidata, and Wendlandia angustifolia.


Further aspects of the disclosure relate to nucleic acid molecules encoding a chimeric terpene synthase. In some embodiments, at least 10% of the nucleic acid molecule sequence, or the amino acid sequence is derived from a rare or extinct plant. In some embodiments, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, or at least 90% of the nucleic acid molecule sequence is derived from a rare or extinct plant. In some embodiments, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, or at least 95% of the nucleic acid molecule sequence is derived from a rare or extinct plant.


In some embodiments, the nucleic acid molecule further comprises a TATA box sequence.


Further aspects of the disclosure relate to methods of producing one or more sesquiterpenes, wherein the method comprises culturing a host cell described herein under conditions suitable for producing the one or more sesquiterpenes.


Further aspects of the disclosure relate to compositions comprising one or more sesquiterpenes produced by the methods described herein.


In one embodiment, at least one of the one or more sesquiterpenes is an aroma compound.


Further aspects of the disclosure relate to methods of producing a perfume, wherein the method comprises: culturing a host cell described herein under conditions suitable for producing the one or more sesquiterpenes; and extracting the one or more sesquiterpenes.


Each of the limitations of the compositions and methods described herein may encompass various described embodiments. It is, therefore, anticipated that each of the limitations of the invention involving any one element or combinations of elements can be included in each aspect of the invention. The present disclosure is not limited in its application to the details of construction and the arrangement of components set forth in the following description or illustrated in the drawings.





BRIEF DESCRIPTION OF DRAWINGS

The accompanying drawings are not intended to be drawn to scale. The drawings are illustrative only and are not required for enablement of the disclosure. For purposes of clarity, not every component may be labeled in every drawing. In the drawings:



FIG. 1 is a series of pictures depicting structures of identified sesquiterpenes produced using sesquiterpene synthases (SQTSs) containing rare sequences from H. wilderianus.



FIG. 2 is a series of pictures depicting structures of identified sesquiterpenes produced using SQTSs containing rare sequences from L. grandiflorum.



FIG. 3 is a series of pictures depicting structures of sesquiterpenes produced using SQTSs containing rare sequences from M. villosa.



FIG. 4 is a series of pictures depicting structures of sesquiterpenes produced using SQTSs containing rare sequences from O. stipulatum.



FIG. 5 is a series of pictures depicting structures of identified sesquiterpenes produced using SQTSs containing rare sequences from S. cuspidata.



FIG. 6 is a series of pictures depicting structures of identified sesquiterpenes produced using SQTSs containing rare sequences from W. angustifolia.



FIG. 7 is a graph showing chimera product distribution versus plant species. The chimeras are categorized based on the sesquiterpene produced in highest yield.



FIGS. 8A-8F include a series of pictures depicting species of rare plants. FIG. 8A depicts Hibiscadelphus wilderianus (from Radlkofer et al., New and Noteworthy Hawaiian Plants. Hawaiian Board of Agriculture and Forestry Botanical Bulletin. 1911; (1):1-15). FIG. 8B depicts Leucadendron grandiflorum (from Salisbury et al., The Paradisus Londinensis or Coloured Figures of Plants Cultivated in the Vicinity of the Metropolis. 1805; (Volume 1, part 2): 105). FIG. 8C depicts Macrostylis villosa subsp. villosa (from “Red List of South African Plants: Macrostylis villosa subsp. villosa,” 2007). FIG. 8D depicts Orbexilum stipulatum (from Short, “Orbexilum stipulatum collected at Falls of the Ohio,” 1840 from The Philadelphia Herbarium at the Academy of Natural Sciences). FIG. 8E depicts Shorea cuspidata (from “Kew Royal Botanical Gardens: Shorea cuspidata specimen K000700460,” 1962). FIG. 8F depicts Wendlandia angustifolia (from “Kew Royal Botanical Gardens: Wendlandia angustifolia K000030921,” collection date not recorded).



FIG. 9 is a series of pictures depicting selected gas chromatography-mass spectrometry (GC/MS) chromatograms from H. wilderianus chimera screening data (Table 4).



FIG. 10 is a series of pictures depicting selected GC/MS chromatograms from L. grandiflorum chimera screening data (Table 5).



FIG. 11 is a series of pictures depicting selected GC/MS chromatograms from L. grandiflorum chimera screening data (Table 5).



FIG. 12 is a series of pictures depicting selected GC/MS chromatograms from M. villosa chimera screening data (Table 6).



FIG. 13 is a series of pictures depicting selected GC/MS chromatograms from S. cuspidata chimera screening data (Table 8).



FIG. 14 is a series of pictures depicting selected GC/MS chromatograms from W. angustifolia chimera screening data (Table 9).



FIG. 15 is a series of pictures depicting selected GC/MS chromatograms from W. angustifolia chimera screening data (Table 9).





DETAILED DESCRIPTION

Although terpenes are widely used in the fragrance industry, purification of terpenes from natural sources and de novo chemical synthesis often have high production costs and low yield. This disclosure is premised, in part, on the unexpected finding that chimeric terpene synthases comprising a portion of a terpene synthase sequence from at least one rare or extinct plant can be leveraged to produce a diversity of sesquiterpenes. Accordingly, provided herein are chimeric terpene synthases, methods for making chimeric terpene synthases, and methods for making terpenes using the described chimeric terpene synthases. In some embodiments, the chimeric terpene synthases are chimeric sesquiterpene synthases.


This invention is not limited in its application to the details of construction and the arrangement of components set forth in the description. The invention is capable of other embodiments and of being practiced or of being carried out in various ways. Additionally, the phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting. The use of terms such as “including,” “comprising,” “having,” “containing,” “involving,” and/or variations thereof herein, is meant to encompass the items listed thereafter and equivalents thereof as well as additional items.


Chimeric Terpene Synthases

Aspects of the present disclosure relate to chimeric terpene synthases comprising fragments (e.g., sequences) from at least two terpene synthases, wherein at least one of the two or more terpene synthases is from a rare or extinct plant. For example, the sequence of a chimeric terpene synthase may comprise one or more fragments (e.g., one or more portions of the total sequence) from at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine or at least ten terpene synthases. It should be appreciated that chimeric terpene synthases described herein can be synthetic. Accordingly, chimeric terpene synthases, including synthetic chimeric terpene synthases, described herein comprise sequences derived from more than one terpene synthase, wherein at least one of the terpene synthases is from a rare or extinct plant. In some embodiments, the chimeric terpene synthases are chimeric sesquiterpene synthases.
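As a non-limiting illustration, the fragment-based description above can be sketched in Python. The `Fragment` record, the function names, and the toy residues are hypothetical and are introduced here only for illustration; they are not taken from the sequence listing. The sketch assumes a chimera is a simple ordered concatenation of fragments and that the percentage derived from a rare plant is counted per residue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """One contiguous piece of a chimeric sequence (hypothetical model)."""
    source: str            # name of the parent terpene synthase
    from_rare_plant: bool  # True if the parent comes from a rare or extinct plant
    sequence: str          # residues contributed by this parent


def assemble(fragments):
    """Concatenate the fragments, in order, into the chimeric sequence."""
    return "".join(f.sequence for f in fragments)


def percent_from_rare_plant(fragments):
    """Percent of residues in the chimera contributed by rare-plant fragments."""
    total = sum(len(f.sequence) for f in fragments)
    if total == 0:
        raise ValueError("chimera has no residues")
    rare = sum(len(f.sequence) for f in fragments if f.from_rare_plant)
    return 100.0 * rare / total


def is_chimeric(fragments):
    """True if at least two parents contribute and at least one is rare."""
    parents = {f.source for f in fragments}
    return len(parents) >= 2 and any(f.from_rare_plant for f in fragments)


# Toy data: placeholder residues, not real synthase sequences.
toy = [
    Fragment("scaffold_synthase", False, "MASTQ"),
    Fragment("rare_plant_synthase", True, "DDIYD"),
    Fragment("scaffold_synthase", False, "LKRAGEWNSV"),
]

if __name__ == "__main__":
    print(assemble(toy))
    print(percent_from_rare_plant(toy))
    print(is_chimeric(toy))
```

Counting per residue is only one possible convention; the same bookkeeping could be applied to nucleotides if the fragments were stored as coding sequences.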


Terpene synthases are enzymes that catalyze the formation of terpenes from isoprenoid diphosphate substrates. At least two types of terpene synthases have been characterized: classic terpene synthases and isoprenyl diphosphate synthase-type terpene synthases. Classic terpene synthases are found in prokaryotes (e.g., bacteria) and in eukaryotes (e.g., plants, fungi and amoebae), while isoprenyl diphosphate synthase-type terpene synthases have been found in insects (see, e.g., Chen et al., Terpene synthase genes in eukaryotes beyond plants and fungi: Occurrence in social amoebae. Proc Natl Acad Sci USA. 2016; 113(43):12132-12137, which is hereby incorporated by reference in its entirety for this purpose). Several highly conserved structural motifs have been reported in classic terpene synthases, including an aspartate-rich “DDxx(x)D/E” motif and a “NDxxSxxxD/E” (SEQ ID NO: 55) motif, which have both been implicated in coordinating substrate binding (see, e.g., Starks et al., Structural basis for cyclic terpene biosynthesis by tobacco 5-epi-aristolochene synthase. Science. 1997 Sep. 19; 277(5333):1815-20; and Christianson et al., Unearthing the roots of the terpenome. Curr Opin Chem Biol. 2008 April; 12(2):141-50, each of which is hereby incorporated by reference in its entirety for this purpose).
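As a non-limiting illustration, the two motifs named above can be located in a protein sequence with regular expressions. The patterns below are one reading of the motif notation, in which "x" is taken to mean any residue and "(x)" an optional residue; the function name and the toy sequence are hypothetical.

```python
import re

# Patterns transcribed from the motif notation in the text.
# Assumption: "x" matches any single residue and "(x)" is optional.
MOTIFS = {
    "DDxx(x)D/E": re.compile(r"DD.{2,3}[DE]"),
    "NDxxSxxxD/E": re.compile(r"ND.{2}S.{3}[DE]"),
}


def find_motifs(sequence):
    """Return {motif name: [(start, matched text), ...]} using 0-based starts.

    Only non-overlapping matches are reported, as produced by finditer.
    """
    seq = sequence.upper()
    hits = {}
    for name, pattern in MOTIFS.items():
        hits[name] = [(m.start(), m.group()) for m in pattern.finditer(seq)]
    return hits


# Toy sequence (not a real synthase) containing one copy of each motif.
toy_sequence = "MKLDDIYDAAGNDLLSAKKEW"

if __name__ == "__main__":
    for motif, found in find_motifs(toy_sequence).items():
        print(motif, found)
```

A match indicates only that the residue pattern is present; it does not by itself establish that a sequence encodes a functional terpene synthase.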


Terpene synthases may be classified by the type of terpenes they produce. As used herein, unless otherwise indicated, terpenes are organic compounds comprising isoprene (i.e., C5H8) units and derivatives thereof. For example, terpenes include pure hydrocarbons with the molecular formula (C5H8)n, in which n represents the number of isoprene subunits. Terpenes also include oxygenated compounds (often referred to as terpenoids). Terpenes are structurally diverse compounds and, for example, may be cyclic (e.g., monocyclic, multi-cyclic, homocyclic and heterocyclic compounds) or acyclic (e.g., linear and branched compounds). In some embodiments, a terpene may have an odor. As used herein, an aroma compound refers to a compound that has an odor. Any method known in the art, including mass spectrometry (e.g., gas chromatography-mass spectrometry (GC/MS), shown in Example 2 below), may be used to identify a terpene of interest.
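As a non-limiting illustration, the formula (C5H8)n can be expanded for a given number of isoprene units. The sketch below covers only pure terpene hydrocarbons, not oxygenated terpenoids. The function name is hypothetical and the atomic masses are standard rounded values.

```python
# Standard atomic masses in g/mol, rounded to three decimals.
CARBON_MASS = 12.011
HYDROGEN_MASS = 1.008

# Class names for the unit counts that correspond to 10-, 15- and 20-carbon terpenes.
CLASS_BY_UNITS = {2: "monoterpene", 3: "sesquiterpene", 4: "diterpene"}


def terpene_hydrocarbon(n_units):
    """Expand (C5H8)n for a pure terpene hydrocarbon with n isoprene units."""
    if n_units < 1:
        raise ValueError("n_units must be a positive integer")
    carbons = 5 * n_units
    hydrogens = 8 * n_units
    mass = carbons * CARBON_MASS + hydrogens * HYDROGEN_MASS
    return {
        "formula": f"C{carbons}H{hydrogens}",
        "molar_mass": round(mass, 2),
        "class": CLASS_BY_UNITS.get(n_units, "other"),
    }


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, terpene_hydrocarbon(n))
```

For n = 3 the sketch yields C15H24, the formula shared by sesquiterpene hydrocarbons; sesquiterpene alcohols carry additional atoms and fall outside this simple formula.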


Terpene synthases may include, for example, monoterpene synthases, diterpene synthases, and sesquiterpene synthases. Certain non-limiting examples of monoterpene synthases and sesquiterpene synthases may be found, for example, in Degenhardt et al., Monoterpene and sesquiterpene synthases and the origin of terpene skeletal diversity in plants. Phytochemistry. 2009 October-November; 70(15-16):1621-37, which is hereby incorporated by reference in its entirety for this purpose.


Monoterpene synthases catalyze the formation of 10-carbon monoterpenes. Generally, monoterpene synthases use geranyl diphosphate (GPP) as a substrate. Non-limiting examples of monoterpene synthases include myrcene synthase (UniProtKB Identifier: O24474), (R)-limonene synthase (UniProtKB Identifier: Q2XSC6), (E)-beta-ocimene synthase (UniProtKB Identifier: Q5CD81) and limonene synthase (UniProtKB Identifier: Q9FV72). Non-limiting examples of monoterpenes include limonene, sabinene, thujene, carene, borneol, eucalyptol and camphene.


Diterpene synthases promote the formation of 20-carbon diterpenes. Generally, diterpene synthases use geranylgeranyl diphosphate as a substrate. Non-limiting examples of diterpene synthases include cis-abienol synthase (UniProtKB identifier: H8ZM73), sclareol synthase (UniProtKB identifier: K4HYB0) and abietadiene synthase (UniProtKB identifier: Q38710). See, e.g., Gong et al., Diterpene synthases and their responsible cyclic natural products. Nat Prod Bioprospect. 2014; 4(2):59-72, which is hereby incorporated by reference in its entirety for this purpose. Non-limiting examples of diterpenes include cembrene and sclareol.


Sesquiterpene synthases catalyze the formation of 15-carbon sesquiterpenes. Generally, sesquiterpene synthases convert farnesyl diphosphate (FDP) into sesquiterpenes. Non-limiting examples of sesquiterpene synthases include (+)-delta-cadinene synthase (UniProtKB Identifier: Q9SAN0), UniProtKB Identifier: A0A067FTE8, Beta-eudesmol synthase (UniProtKB Identifier: B1B1U4), (+)-delta-cadinene synthase isozyme XC14 (UniProtKB Identifier: Q39760), (+)-delta-cadinene synthase isozyme XC1 (UniProtKB Identifier: Q39761), (+)-delta-cadinene synthase isozyme A (UniProtKB Identifier: Q43714), Sesquiterpene synthase 2 (UniProtKB Identifier: Q9FQ26), Putative delta-guaiene synthase (UniProtKB Identifier: A0A0A0QUT9), Delta-guaiene synthase 1 (UniProtKB Identifier: D0VMR6), Alpha-zingiberene synthase (UniProtKB Identifier: Q5SBP4), (Z)-gamma-bisabolene synthase 1 (UniProtKB Identifier: Q9T0J9), A0A067D5M4, Delta-elemene synthase (UniProtKB Identifier: A0A097ZIE0), ShoBecSQTS1, A0A068UHT0, terpene synthase (UniProtKB Identifier: G5CV47), A0A068VE40 and A0A068VI46.


In some embodiments, a sesquiterpene synthase is an alpha-guaiene synthase. As used herein, an alpha-guaiene synthase is capable of catalyzing the formation of alpha-guaiene. In some embodiments, an alpha-guaiene synthase uses (2E,6E)-farnesyl diphosphate as a substrate. Non-limiting examples of alpha-guaiene synthases include UniProtKB Identifier: D0VMR6, UniProtKB Identifier: D0VMR7, UniProtKB Identifier: D0VMR8, and UniProtKB Identifier: Q49SP3. As disclosed herein, an alpha-guaiene synthase may comprise a sequence that is at least 50% (e.g., at least 55%, 60%, 65%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more than 99%, including all values in between) identical to SEQ ID NO: 17, 22, or 29. In certain embodiments, an alpha-guaiene synthase comprises SEQ ID NO: 17, 22, or 29. In certain embodiments, an alpha-guaiene synthase consists of SEQ ID NO: 17, 22, or 29.


As used herein, unless otherwise indicated, sesquiterpenes include sesquiterpene hydrocarbons and sesquiterpene alcohols (sesquiterpenols). Non-limiting examples of sesquiterpenes include delta-cadinene, epi-cubenol, tau-cadinol, alpha-cadinol, gamma-selinene, 10-epi-gamma-eudesmol, gamma-eudesmol, alpha/beta-eudesmol, juniper camphor, 7-epi-alpha-eudesmol, cryptomeridiol isomer 1, cryptomeridiol isomer 2, cryptomeridiol isomer 3, humulene, alpha-guaiene, delta-guaiene, zingiberene, beta-bisabolene, beta-farnesene, beta-sesquiphellandrene, cubenol, alpha-bisabolol, alpha-curcumene, trans-nerolidol, gamma-bisabolene, beta-caryophyllene, trans-sesquisabinene hydrate, delta-elemene, cis-eudesm-6-en-11-ol, daucene, isodaucene, trans-bergamotene, alpha-zingiberene, sesquisabinene hydrate, and 8-isopropenyl-1,5-dimethyl-1,5-cyclodecadiene.


The present disclosure also encompasses chimeric terpene synthases that are multi-functional (e.g., capable of producing more than one sesquiterpene). In some embodiments, a chimeric terpene synthase is capable of producing delta-cadinene and alpha-cadinol. In some embodiments, a chimeric terpene synthase is capable of producing delta-cadinene, tau-cadinol, and alpha-cadinol. In some embodiments, a chimeric terpene synthase is capable of producing alpha-guaiene and delta-guaiene. In some embodiments, the chimeric terpene synthase is capable of producing beta-caryophyllene and humulene.


In some embodiments, a chimeric terpene synthase (e.g., a chimeric sesquiterpene synthase) of the present disclosure comprises an amino acid sequence at least 50% (e.g., at least 55%, 60%, 65%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more than 99%, including all values in between) identical to a sequence selected from the group consisting of SEQ ID NOs: 1-52. In some embodiments, the chimeric terpene synthase comprises an amino acid sequence provided in SEQ ID NOs: 1-52.


In some embodiments, a chimeric terpene synthase comprises one or more sequences provided in SEQ ID NOs: 119-357.


The term “sequence identity,” as known in the art, refers to a relationship between the sequences of two polypeptides or polynucleotides, as determined by sequence comparison (alignment). In the art, identity also means the degree of sequence relatedness between two sequences as determined by the number of matches between strings of two or more residues (e.g., nucleic acid or amino acid residues). Identity measures the percent of identical matches between the smaller of two or more sequences with gap alignments (if any) addressed by a particular mathematical model or computer program (e.g., “algorithms”).
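As a non-limiting illustration, percent identity can be computed from two sequences that have already been aligned. The sketch below assumes the alignment is supplied as two equal-length strings with "-" marking gaps, and it divides the number of identical columns by the length of the shorter ungapped sequence, following the definition above. Other programs use other denominators, so the function name and this convention are illustrative assumptions only.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical positions relative to the shorter ungapped sequence.

    Both inputs must come from the same alignment and have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Count columns where both sequences carry the same residue (never a gap).
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    length_a = len(aligned_a) - aligned_a.count(gap)
    length_b = len(aligned_b) - aligned_b.count(gap)
    shorter = min(length_a, length_b)
    if shorter == 0:
        raise ValueError("at least one sequence has no residues")
    return 100.0 * matches / shorter


if __name__ == "__main__":
    # Toy alignment: 7 identical columns, shorter sequence has 8 residues.
    print(percent_identity("ACDEFGHIKL", "ACDE--HIRL"))
```

Under this convention, a threshold such as "at least 90% identical" would be checked by comparing the returned value with 90.0.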


Identity of related polypeptides can be readily calculated by any of the methods known to one of ordinary skill in the art. The “percent identity” of two sequences (e.g., nucleic acid or amino acid sequences) may, for example, be determined using the algorithm of Karlin and Altschul Proc. Natl. Acad. Sci. USA 87:2264-68, 1990, modified as in Karlin and Altschul Proc. Natl. Acad. Sci. USA 90:5873-77, 1993. Such an algorithm is incorporated into the NBLAST® and XBLAST® programs (version 2.0) of Altschul et al., J. Mol. Biol. 215:403-10, 1990. BLAST® protein searches can be performed, for example, with the XBLAST program, score=50, wordlength=3 to obtain amino acid sequences homologous to the protein molecules of the invention. Where gaps exist between two sequences, Gapped BLAST® can be utilized, for example, as described in Altschul et al., Nucleic Acids Res. 25(17):3389-3402, 1997. When utilizing BLAST® and Gapped BLAST® programs, the default parameters of the respective programs (e.g., XBLAST® and NBLAST®) can be used, or the parameters can be adjusted appropriately as would be understood by one of ordinary skill in the art.


Another local alignment technique which may be used, for example, is based on the Smith-Waterman algorithm (Smith, T. F. & Waterman, M. S. (1981) “Identification of common molecular subsequences.” J. Mol. Biol. 147:195-197). A general global alignment technique which may be used, for example, is the Needleman-Wunsch algorithm (Needleman, S. B. & Wunsch, C. D. (1970) “A general method applicable to the search for similarities in the amino acid sequences of two proteins.” J. Mol. Biol. 48:443-453), which is based on dynamic programming. More recently, a Fast Optimal Global Sequence Alignment Algorithm (FOGSAA) was developed that purportedly produces global alignment of nucleic acid and amino acid sequences faster than other optimal global alignment methods, including the Needleman-Wunsch algorithm.
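As a non-limiting illustration, a minimal Needleman-Wunsch global alignment by dynamic programming may be sketched as follows. The scoring scheme (match +1, mismatch -1, linear gap penalty -1) is an assumption chosen for brevity. Practical protein alignments typically use a substitution matrix and affine gap penalties, which this sketch does not implement.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + step,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + step:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[len(a)][len(b)], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    total, top, bottom = needleman_wunsch("ACGT", "AGT")
    print(total)
    print(top)
    print(bottom)
```

When several alignments share the optimal score, the traceback above returns only one of them, preferring a diagonal step over a gap.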


The present disclosure also encompasses compositions comprising one or more terpenes (e.g., sesquiterpenes) produced by any one of the chimeric terpene synthases (e.g., sesquiterpene synthases) described herein. In some embodiments, the composition comprises at least one terpene (e.g., sesquiterpene) that is an aroma compound. In some embodiments, the composition is a perfume (e.g., comprising a single fragrance or a mixture of fragrances). In some embodiments, the composition further comprises a fixative (i.e., stabilizer) to reduce volatility of the composition. Non-limiting examples of fixatives include resinoids (e.g., benzoin, olibanum, storax, labdanum, myrrh and tolu balsam) and benzyl benzoate. In some embodiments, the composition further comprises ethyl alcohol. In some embodiments, the composition further comprises distilled water.


In certain embodiments, a terpene synthase (e.g., sesquiterpene synthase) of the present disclosure produces a terpene (e.g., sesquiterpene) composition that comprises at least 1%, at least 2%, at least 3%, at least 4%, at least 5%, at least 6%, at least 7%, at least 8%, at least 9%, at least 10%, at least 15%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or 100%, including any values in between, of a particular terpene, such as a sesquiterpene. Non-limiting examples of sesquiterpenes include delta-cadinene, epi-cubenol, tau-cadinol, alpha-cadinol, gamma-selinene, 10-epi-gamma-eudesmol, gamma-eudesmol, alpha/beta-eudesmol, juniper camphor, 7-epi-alpha-eudesmol, cryptomeridiol isomer 1, cryptomeridiol isomer 2, cryptomeridiol isomer 3, humulene, alpha-guaiene, delta-guaiene, zingiberene, beta-bisabolene, beta-farnesene, beta-sesquiphellandrene, cubenol, alpha-bisabolol, alpha-curcumene, trans-nerolidol, gamma-bisabolene, beta-caryophyllene, trans-sesquisabinene hydrate, delta-elemene, cis-eudesm-6-en-11-ol, daucene, isodaucene, trans-bergamotene, alpha-zingiberene, sesquisabinene hydrate, and 8-isopropenyl-1,5-dimethyl-1,5-cyclodecadiene. As a non-limiting example, a terpene synthase may be heterologously expressed in a host cell, the sesquiterpenes produced by the recombinant host cell may be extracted, and the types of sesquiterpenes in the composition may be determined using gas chromatography-mass spectrometry. In some embodiments, a terpene synthase may be recombinantly expressed and purified. In some embodiments, the sesquiterpenes produced by a purified terpene synthase may be extracted and the types of sesquiterpenes in the composition may be determined using gas chromatography-mass spectrometry.


In certain embodiments, an alpha-guaiene synthase is capable of producing a sesquiterpene composition that comprises at least 1%, at least 2%, at least 3%, at least 4%, at least 5%, at least 6%, at least 7%, at least 8%, at least 9%, at least 10%, at least 15%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or 100%, including any values in between, of alpha-guaiene. In some embodiments, an alpha-guaiene synthase is capable of producing a sesquiterpene composition that comprises between 1% and 10%, between 5% and 20%, between 15% and 20%, between 16% and 20%, between 17% and 20%, between 18% and 20%, between 19% and 20%, between 20% and 25%, between 20% and 24%, between 20% and 23%, between 20% and 22%, between 20% and 21%, between 20% and 30%, between 30% and 40%, between 40% and 50%, between 50% and 60%, between 60% and 70%, between 70% and 80%, between 80% and 90%, or between 90% and 100%, including any values in between, of alpha-guaiene.
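As a non-limiting illustration, the percentage of a given sesquiterpene in a composition might be estimated from integrated GC/MS peak areas. The sketch below assumes that relative peak area is an acceptable proxy for relative abundance, which ignores compound-specific detector response. The function names and the peak areas are hypothetical placeholders and are not data from this disclosure.

```python
def composition_percent(peak_areas):
    """Convert {compound: peak area} into {compound: percent of total area}."""
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {name: 100.0 * area / total for name, area in peak_areas.items()}


def meets_threshold(peak_areas, compound, minimum_percent):
    """True if the compound accounts for at least minimum_percent of the total."""
    share = composition_percent(peak_areas).get(compound, 0.0)
    return share >= minimum_percent


# Placeholder peak areas in arbitrary units, invented for this example only.
toy_areas = {
    "alpha-guaiene": 210.0,
    "delta-guaiene": 640.0,
    "humulene": 150.0,
}

if __name__ == "__main__":
    print(composition_percent(toy_areas))
    print(meets_threshold(toy_areas, "alpha-guaiene", 20.0))
```

With the placeholder values, alpha-guaiene accounts for 21% of the total area, so it would fall within a range such as "between 20% and 25%".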


Rare and Extinct Plants

At least one portion of the sequence of the chimeric terpene synthases disclosed herein is derived from a rare or extinct plant. As used herein, the term “rare plant” or “rare plants” encompasses plants that are uncommon, scarce, infrequently encountered, endangered (e.g., threatened), vulnerable, only available in private collections, not found in the endemic location, only available in cultivation, and/or extinct. In some embodiments, a rare plant is a plant that is infrequently encountered (e.g., only encountered in a few locations such as 1, 2, 3, 4, or 5 locations). In some embodiments, a rare plant is an extinct plant. As used herein, an extinct plant refers to a species of plant: having no living members; classified as having no living members; or predicted by one of ordinary skill in the art to have no living members. As a non-limiting example, the International Union for Conservation of Nature (IUCN) Red list of Threatened Species may be used to determine the conservation status of a plant and identify rare plants. For example, plants classified as extinct, extinct in the wild, critically endangered, endangered, vulnerable, and near threatened on the IUCN Red List may be considered rare plants.
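As a non-limiting illustration, the IUCN Red List example above can be expressed as a lookup on the standard two-letter category codes. The sketch covers only that Red List example. The broader definition of a rare plant given above (e.g., plants only available in private collections or in cultivation) is not captured. The function name is hypothetical.

```python
# IUCN Red List categories that the text treats as indicating a rare plant.
RARE_CATEGORIES = {
    "EX": "Extinct",
    "EW": "Extinct in the Wild",
    "CR": "Critically Endangered",
    "EN": "Endangered",
    "VU": "Vulnerable",
    "NT": "Near Threatened",
}

# Remaining IUCN categories, which the text does not list as rare.
OTHER_CATEGORIES = {
    "LC": "Least Concern",
    "DD": "Data Deficient",
    "NE": "Not Evaluated",
}


def is_rare_by_red_list(category_code):
    """True if the IUCN category code is one the text lists as rare."""
    code = category_code.strip().upper()
    if code not in RARE_CATEGORIES and code not in OTHER_CATEGORIES:
        raise ValueError(f"unknown IUCN category code: {category_code!r}")
    return code in RARE_CATEGORIES


if __name__ == "__main__":
    for code in ("EX", "VU", "LC"):
        print(code, is_rare_by_red_list(code))
```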


Non-limiting examples of rare plants include Leucadendron grandiflorum, Shorea cuspidata, Macrostylis villosa, Orbexilum stipulatum, Myrcia skeldingii, Nesiota elliptica, Wendlandia angustifolia, Erica pyramidalis, Stenocarpus dumbeensis, Pradosia glaziovii, Crassula subulata, Hibiscadelphus wilderianus, and Erica foliacea.


In some embodiments, the rare plant may be Hibiscadelphus wilderianus. The Hibiscadelphus genus belongs to the tribe Hibisceae (Malvaceae) and members of the genus often have petals that form a tubular structure in which the lower petals are often shorter than the upper three petals (see, e.g., Oppenheimer et al., A new species of Hibiscadelphus Rock (Malvaceae, Hibisceae) from Maui, Hawaiian Islands; PhytoKeys, 2014; (39):65-75, which is hereby incorporated by reference in its entirety). The Hibiscadelphus genus is endemic to Hawaii and at least eight species have been described. Four of these species are extinct (including Hibiscadelphus bombycinus, Hibiscadelphus crucibracteatus, Hibiscadelphus wilderianus, and Hibiscadelphus woodii), two of these species only persist in cultivation (Hibiscadelphus giffardianus and Hibiscadelphus hualalaiensis), and two are extant in the wild (Hibiscadelphus distans and Hibiscadelphus stellatus).



Hibiscadelphus wilderianus is an extinct tree species last observed at an elevation of 2,600 feet in 1910 on the lava fields of Auwahi on the island of Maui in Hawaii (see, e.g., Radlkofer et al., New and Noteworthy Hawaiian Plants; Hawaiian Board of Agriculture and Forestry Botanical Bulletin, 1911; (1):1-15; “The IUCN Red List of Threatened Species: Hibiscadelphus wilderianus,” World Conservation Monitoring Centre, 1998, each of which is hereby incorporated by reference in its entirety). A description in Latin of Hibiscadelphus wilderianus can be found in the Radlkofer et al. original report. A photo of a tree branch with leaves and fruit was included in the original Radlkofer et al. report and is reproduced in FIG. 8A.


In some embodiments, the rare plant may be Leucadendron grandiflorum. Leucadendron is a dioecious genus that belongs to the Proteaceae family and is endemic to South Africa. Species in the Leucadendron genus include evergreen shrubs and often have cone-shaped infructescences (seed heads). There are at least 80 species in the Leucadendron genus including L. album, L. arcuatum, L. argenteum, L. barkerae, L. bonum, L. brunioides, L. burchellii, L. cadens, L. chamelaea, L. cinereum, L. comosum, L. concavum, L. conicum, L. coniferum, L. cordatum, L. coriaceum, L. corymbosum, L. cryptocephalum, L. daphnoides, L. diemontianum, L. discolor, L. dregei, L. dubium, L. elimense, L. ericifolium, L. eucalyptifolium, L. flexuosum, L. floridum, L. foedum, L. galpinii, L. gandogeri, L. glaberrimum, L. globosum, L. grandiflorum, L. gydoense, L. immoderatum, L. lanigerum, L. laureolum, L. laxum, L. levisanus, L. linifolium, L. loeriense, L. loranthifolium, L. macowanii, L. meridianum, L. meyerianum, L. microcephalum, L. modestum, L. muirii, L. nervosum, L. nitidum, L. nobile, L. olens, L. orientale, L. osbornei, L. platyspermum, L. pondoense, L. procerum, L. pubescens, L. pubibracteolatum, L. radiatum, L. remotum, L. roodii, L. rourkei, L. rubrum, L. salicifolium, L. salignum, L. sericeum, L. sessile, L. sheilae, L. singulare, L. sorocephalodes, L. spirale, L. spissifolium, L. stellare, L. stelligerum, L. strobilinum, L. teretifolium, L. thymifolium, L. tinctura, L. tradouwense, L. uliginosum, L. verticillatum, and L. xanthoconus.



Leucadendron grandiflorum is also known commonly as Wynberg Conebush and was last observed in 1806 in Clapham, South Africa. Recorded sightings of Leucadendron grandiflorum have occurred on Wynberg Mountain and this species may have existed on the south slopes of Wynberg hill on moister granite soils (see, e.g., T. Rebelo, “Wynberg Conebush—extinct for 200 years,” iSpot, 25 Jul. 2015, which is hereby incorporated by reference in its entirety). Leucadendron grandiflorum has been described and depicted in Salisbury et al., The Paradisus Londinensis or Coloured Figures of Plants Cultivated in the Vicinity of the Metropolis. 1805; (Volume 1, part 2): 105; see www-dot-biodiversitylibrary.org-backslash-ia/mobot31753000575172 #page/248/mode/1up, the contents of each of which is hereby incorporated by reference in its entirety. No modern collections of Leucadendron grandiflorum have been recorded, and it is considered that this species was likely scarce or extinct by the early 1800s (see, e.g., T. Rebelo, “Wynberg Conebush—extinct for 200 years,” iSpot, 25 Jul. 2015; Catalogue of Life: Leucadendron grandiflorum (Salisb.) R. Br., 20 Dec. 2017). Sister species include L. globosum and L. elimense. FIG. 8B depicts Leucadendron grandiflorum.


In some embodiments, the rare plant may be Macrostylis villosa. The Macrostylis genus belongs to the Rutaceae family and includes at least ten species (e.g., Macrostylis barbigera, Macrostylis cassiopoides, Macrostylis cauliflora, Macrostylis crassifolia, Macrostylis decipiens, Macrostylis hirta, Macrostylis ramulosa, Macrostylis squarrosa, Macrostylis tenuis, and Macrostylis villosa).


There are two recognized subspecies of Macrostylis villosa, M. villosa (Thunb.) Sond. subsp. minor and M. villosa (Thunb.) Sond. subsp. villosa. M. villosa (Thunb.) Sond. subsp. minor is classified as extinct as its habitat was converted to agriculture and extensive searches have failed to relocate surviving plants. It was previously found on the Western Cape in South Africa and inhabited gravel and clay soil on slopes (see, e.g., “Red List of South African Plants: Macrostylis villosa subsp. minor,” 2005, which is hereby incorporated by reference in its entirety). M. villosa (Thunb.) Sond. subsp. villosa is considered endangered due to population loss from urban expansion, foreign plant invasions and conversion of habitat to agriculture. A picture of M. villosa (Thunb.) Sond. subsp. villosa is reproduced in FIG. 8C (see, e.g., “Red List of South African Plants: Macrostylis villosa subsp. villosa,” 2007, which is hereby incorporated by reference in its entirety).


In some embodiments, the rare plant may be Orbexilum stipulatum (Psoralea stipulata). Orbexilum belongs to the Fabaceae family and members of this genus often have characteristic pod walls that are rugose and free from hair. Orbexilum also may be distinguished by its “scarcely accrescent calyx” (see, e.g., Turner, Revision of the genus Orbexilum (Fabaceae: Psoraleeae). Lundellia. 2008; (11):1-7, which is hereby incorporated by reference in its entirety). Orbexilum species include O. chiapasanum, O. gracile, O. lupinellum, O. macrophyllum, O. melanocarpum, O. oliganthum, O. onobrychis, O. pedunculatum, O. simplex, O. stipulatum, and O. virgatum.



O. stipulatum, also known as the “Largestipule Leather-root” or as the “Falls-of-the-Ohio Scurfpea,” was only found on Rock Island in Kentucky. The last recorded observation of O. stipulatum was in 1881, prior to resurfacing and flooding of this island. Despite many searches of similar habitats, including intensive searches in 1998, on both the Kentucky and Indiana shores of the Ohio River, this species has not been relocated. Therefore, this species has been classified as extinct (see, e.g., NatureServe Explorer: Orbexilum stipulatum—(Torr. & Gray) Rydb., 2016 and Baskin et al. described herein, each of which is hereby incorporated by reference in its entirety).



O. stipulatum was a perennial herb and had leaves that were divided into 3 leaflets, each about 2 cm in length. The species had a persistent appendage at the base of the leaves and was also described as having a corolla tube that did not extend beyond the calyx. It is likely that this plant bloomed in late May to mid-June, but seeds have not been observed in nature (see, e.g., “NatureServe Explorer: Orbexilum stipulatum—(Torr. & Gray) Rydb.,” 2016; and Baskin et al., Geographical origin of the specimens of Orbexilum stipulatum (T. & G.) Rydb. (Psoralea stipulata T. & G.). Castanea. 1986; (51): 207-210, each of which is hereby incorporated by reference in its entirety). A picture of O. stipulatum (Short, “Orbexilum stipulatum collected at Falls of the Ohio,” 1840, from The Philadelphia Herbarium at the Academy of Natural Sciences) is reproduced in FIG. 8D.


In some embodiments, the rare plant may be Shorea cuspidata. Shorea is a genus in the Dipterocarpaceae family and includes many rainforest trees endemic to southeast Asia. Many Shorea species are angiosperms (flowering plants). Non-limiting examples of Shorea species may include Shorea affinis, Shorea congestiflora, Shorea cordifolia, Shorea disticha, Shorea megistophylla, Shorea trapezifolia, Shorea zeylanica, Shorea acuminatissima, Shorea alutacea, Shorea angustifolia, Shorea bakoensis, Shorea balanocarpoides, Shorea chaiana, Shorea collaris, Shorea cuspidata, Shorea faguetiana, Shorea faguetioides, Shorea gibbosa, Shorea hopeifolia, Shorea iliasii, Shorea induplicata, Shorea kudatensis, Shorea laxa, Shorea longiflora, Shorea longisperma, Shorea macrobalanos, Shorea mujongensis, Shorea multiflora, Shorea obovoidea, Shorea patoiensis, Shorea peltata, Shorea polyandra, Shorea richetia, Shorea subcylindrica, Shorea tenuiramulosa, and Shorea xanthophylla.



S. cuspidata is a tree endemic to Malaysia that is currently classified as extinct on the IUCN Red List (“The IUCN Red List: Shorea cuspidata,” 1998, which is incorporated in its entirety by reference), although there have been a few recorded sightings of S. cuspidata subsequent to this classification in Bako National Park, Lambir National Park, and the Semenggoh Arboretum (Ashton, Shorea cuspidata. Tree Flora of Sabah and Sarawak. 2004; (5):246-247; Ling et al., Diversity of the tree flora in Semenggoh Arboretum, Sarawak, Borneo. Gardens' Bulletin Singapore. 2012; (64):139-169, each of which is incorporated by reference in its entirety). Shorea cuspidata may be considered a rare plant. Shorea cuspidata has been characterized as a medium-sized tree with secund flowers and pale lime-yellow petals (see, e.g., Ashton, Man. Dipt. Brun. 1968: f. 10, pl. 14 (stem-base)). A picture of a Shorea cuspidata specimen is reproduced in FIG. 8E (“Kew Royal Botanical Gardens: Shorea cuspidata specimen K000700460,” 1962, which is hereby incorporated by reference in its entirety).


In some embodiments, the rare plant may be Wendlandia angustifolia. Wendlandia is a genus of flowering plants that belongs to the Rubiaceae family. Non-limiting examples of Wendlandia species may include Wendlandia aberrans, Wendlandia acuminata, Wendlandia amocana, Wendlandia andamanica, Wendlandia angustifolia, Wendlandia appendiculata, Wendlandia arabica, Wendlandia arborescens, Wendlandia augustini, Wendlandia basistaminea, Wendlandia bicuspidata, Wendlandia bouvardioides, Wendlandia brachyantha, Wendlandia brevipaniculata, Wendlandia brevituba, and Wendlandia buddleacea.



W. angustifolia is a plant native to India that is currently classified as extinct in the IUCN Red List (see “The IUCN Red List: Wendlandia angustifolia,” 1998, which is hereby incorporated by reference in its entirety). Subsequent to this classification, W. angustifolia was reportedly observed in Kalakkad Mundanthurai Tiger Reserve in India (Viswanathan et al., Rediscovery of Wendlandia Angustifolia Wight Ex Hook.f. (Rubiaceae), from Tamil Nadu, a Species Presumed Extinct. Journal of The Bombay Natural History Society. 2000; 97(2):311-313, which is hereby incorporated by reference in its entirety). W. angustifolia may be considered a rare plant. W. angustifolia has been described as a shrub or tree with ternately whorled and linear-lanceolate leaves (see, e.g., Viswanathan et al., 2000, cited above). A picture of a specimen is reproduced in FIG. 8F (“Kew Royal Botanical Gardens: Wendlandia angustifolia K000030921,” collection date not recorded, which is hereby incorporated by reference in its entirety).


Methods of Producing Chimeric Terpene Synthases and Terpenes

Also described herein are nucleic acid molecules encoding chimeric terpene synthases. In some embodiments, at least 10% (e.g., at least 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more than 99%, including all values in between) of the nucleic acid molecule encoding such a chimeric terpene synthase may be derived from a rare or extinct plant.


In some instances, a nucleic acid molecule encoding a chimeric terpene synthase comprises a nucleotide sequence that is at least 50% (e.g., at least 55%, 60%, 65%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more than 99%, including all values in between) identical to a sequence selected from the group consisting of SEQ ID NOs: 67-118. In some instances, a nucleic acid molecule encoding a chimeric terpene synthase comprises a nucleotide sequence that is identical to a sequence selected from the group consisting of SEQ ID NOs: 67-118. In some instances, a nucleic acid molecule encoding a chimeric terpene synthase further comprises the nucleotide sequence TATA (TATA box sequence). In some instances, a nucleic acid molecule encoding a chimeric terpene synthase comprises the nucleotide sequence TATA (TATA box sequence) that is located 5′ (upstream) of a sequence selected from the group consisting of SEQ ID NOs: 67-118. In some instances, a nucleic acid molecule encoding a chimeric terpene synthase comprises a nucleotide sequence that encodes a sequence set forth in SEQ ID NOs: 119-357.
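As an illustration of how a percent identity threshold such as those recited above might be evaluated, the following is a minimal Python sketch. It assumes the two sequences have already been aligned (with "-" marking gaps) and that identity is counted over ungapped columns only; the disclosure does not prescribe this denominator, and other conventions exist. The function names and toy sequences are hypothetical and are not any of SEQ ID NOs: 67-118.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Assumption: the inputs are rows of a pairwise alignment of equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns (one possible convention)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 90.0) -> bool:
    """True if the percent identity is at least the given threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy, hypothetical sequences used only to exercise the functions.
reference = "ATGGCCATTG"
candidate = "ATGGCAATTG"
print(percent_identity(reference, candidate))
print(meets_threshold(reference, candidate, threshold=90.0))
```

In practice, the alignment itself would typically come from an established tool, and the identity convention used should be stated alongside any reported percentage.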


In some embodiments, at least 10% (e.g., at least 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more than 99%, including all values in between) of the amino acid sequence of the chimeric terpene synthase (e.g., a chimeric sesquiterpene synthase) may be derived from a rare or extinct plant. In some instances, a chimeric terpene synthase comprises one or more sequences set forth in SEQ ID NOs:119-357.
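The proportion of a chimeric sequence derived from a given source can be illustrated with simple arithmetic. The Python sketch below assumes a chimera is represented as an ordered list of segments, each labeled with its source; the `Segment` type, the source labels, and the segment lengths are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Segment:
    """One contiguous piece of a chimeric sequence and the label of its source."""
    source: str
    sequence: str


def percent_from_source(segments: Sequence[Segment], source: str) -> float:
    """Percent of all residues in the chimera contributed by `source`."""
    total = sum(len(s.sequence) for s in segments)
    if total == 0:
        raise ValueError("chimera has no residues")
    from_source = sum(len(s.sequence) for s in segments if s.source == source)
    return 100.0 * from_source / total


# Hypothetical chimera: 10 of 100 residues come from the "extinct_plant" source.
chimera = [
    Segment("extant_scaffold", "A" * 40),
    Segment("extinct_plant", "G" * 10),
    Segment("extant_scaffold", "L" * 50),
]
print(percent_from_source(chimera, "extinct_plant"))
```

The same calculation applies to nucleotide sequences if each segment holds nucleotides rather than amino acids.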


Also described herein are chimeric terpene synthases that are capable of producing alpha-guaiene. In some embodiments, at least 10% (e.g., at least 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more than 99%, including all values in between) of the nucleic acid molecule encoding such a chimeric terpene synthase may be derived from a rare or extinct plant.


In some embodiments, at least 10% (e.g., at least 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more than 99%, including all values in between) of the amino acid sequence of the chimeric terpene synthase that is capable of producing alpha-guaiene may be derived from a rare or extinct plant.


In some instances, construction of the chimeras may include sequence (e.g., nucleic acid sequence and/or amino acid sequence) alignments between at least two terpene synthases of interest. For example, sequence alignment analysis may be used to identify fragments (e.g., domains) of a particular terpene synthase to include in a chimeric terpene synthase. In some embodiments, the chimeric terpene synthase is a chimeric sesquiterpene synthase. Non-limiting examples of analyses may include the types described in the blastn-mapdamage and tblastn pipelines described in Example 2.


In some embodiments, a chimeric terpene synthase coding sequence comprises a mutation at 1, 2, 3, 4, 5, or more positions corresponding to a reference chimeric terpene synthase coding sequence. In some embodiments, the chimeric terpene synthase coding sequence comprises a mutation in 1, 2, 3, 4, 5, or more codons of the coding sequence relative to a reference chimeric terpene synthase coding sequence. As will be understood by one of ordinary skill in the art, a mutation within a codon may or may not change the amino acid that is encoded by the codon due to degeneracy of the genetic code. In some embodiments, the one or more mutations in the coding sequence do not alter the amino acid sequence of the chimeric terpene synthase relative to the amino acid sequence of a reference chimeric terpene synthase.
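The effect of codon degeneracy can be illustrated with a short Python sketch that compares two coding sequences codon by codon and labels each changed codon as synonymous (same encoded amino acid) or nonsynonymous. The sketch assumes the standard genetic code and equal-length, in-frame sequences; the function names and the toy sequences are hypothetical and are not taken from the sequence listing.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate an in-frame DNA coding sequence ('*' marks a stop codon)."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of three")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def classify_codon_changes(reference: str, variant: str):
    """Return (codon_number, ref_codon, var_codon, kind) for each changed codon.

    `kind` is "synonymous" when both codons encode the same amino acid,
    otherwise "nonsynonymous". Codon numbers are 1-based.
    """
    reference, variant = reference.upper(), variant.upper()
    if len(reference) != len(variant) or len(reference) % 3 != 0:
        raise ValueError("sequences must be in frame and of equal length")
    changes = []
    for i in range(0, len(reference), 3):
        ref_codon, var_codon = reference[i:i + 3], variant[i:i + 3]
        if ref_codon == var_codon:
            continue
        same_aa = CODON_TABLE[ref_codon] == CODON_TABLE[var_codon]
        kind = "synonymous" if same_aa else "nonsynonymous"
        changes.append((i // 3 + 1, ref_codon, var_codon, kind))
    return changes


# Toy example: CTG -> CTA keeps Leu; GAA -> GAT changes Glu to Asp.
for change in classify_codon_changes("ATGCTGGAA", "ATGCTAGAT"):
    print(change)
```

A variant in which every change is labeled synonymous encodes the same amino acid sequence as the reference, which corresponds to the case in which the mutations do not alter the amino acid sequence.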


In some embodiments, the one or more mutations in a chimeric terpene synthase sequence alter the amino acid sequence of the chimeric terpene synthase relative to the amino acid sequence of a reference chimeric terpene synthase. In some embodiments, the one or more mutations alter the amino acid sequence of the chimeric terpene synthase relative to the amino acid sequence of a reference chimeric terpene synthase and alter (enhance or reduce) an activity of the chimeric terpene synthase relative to the reference chimeric terpene synthase.


The skilled artisan will also realize that mutations in a chimeric terpene synthase coding sequence may result in conservative amino acid substitutions to provide functionally equivalent variants of the foregoing polypeptides, e.g., variants that retain the activities of the polypeptides. As used herein, a “conservative amino acid substitution” refers to an amino acid substitution that does not alter the relative charge or size characteristics or functional activity of the protein in which the amino acid substitution is made.


In some instances, an amino acid is characterized by its R group (see, e.g., Table 1). For example, an amino acid may comprise a nonpolar aliphatic R group, a positively charged R group, a negatively charged R group, a nonpolar aromatic R group, or a polar uncharged R group. Non-limiting examples of an amino acid comprising a nonpolar aliphatic R group include alanine, glycine, valine, leucine, methionine, and isoleucine. Non-limiting examples of an amino acid comprising a positively charged R group include lysine, arginine, and histidine. Non-limiting examples of an amino acid comprising a negatively charged R group include aspartic acid and glutamic acid. Non-limiting examples of an amino acid comprising a nonpolar aromatic R group include phenylalanine, tyrosine, and tryptophan. Non-limiting examples of an amino acid comprising a polar uncharged R group include serine, threonine, cysteine, proline, asparagine, and glutamine.


Variants can be prepared according to methods for altering polypeptide sequence known to one of ordinary skill in the art such as are found in references which compile such methods, e.g., Molecular Cloning: A Laboratory Manual, J. Sambrook, et al., eds., Fourth Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 2012, or Current Protocols in Molecular Biology, F. M. Ausubel, et al., eds., John Wiley & Sons, Inc., New York, 2010.


Non-limiting examples of functionally equivalent variants of polypeptides may include conservative amino acid substitutions in the amino acid sequences of proteins disclosed herein. Conservative substitutions of amino acids include substitutions made amongst amino acids within the following groups: (a) M, I, L, V; (b) F, Y, W; (c) K, R, H; (d) A, G; (e) S, T; (f) Q, N; and (g) E, D. In some embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more than 20 residues can be changed when preparing variant polypeptides. In some embodiments, amino acids are replaced by conservative amino acid substitutions.
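The substitution groups (a) through (g) recited above lend themselves to a simple lookup. The Python sketch below is one possible way to check whether a given single-residue substitution falls within one of those groups and to tally the substitutions between a reference and a variant. It assumes equal-length, ungapped sequences in one-letter code; the function names and toy sequences are hypothetical. Pairings that appear only in Table 1 (for example, Cys and Ser) are not covered by these seven groups and are reported here as non-conservative.

```python
# Groups (a) through (g) as recited in the text, in one-letter amino acid code.
CONSERVATIVE_GROUPS = [
    set("MILV"),  # (a)
    set("FYW"),   # (b)
    set("KRH"),   # (c)
    set("AG"),    # (d)
    set("ST"),    # (e)
    set("QN"),    # (f)
    set("ED"),    # (g)
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues differ and belong to the same group."""
    original, replacement = original.upper(), replacement.upper()
    if original == replacement:
        return False  # identical residues are not a substitution
    return any(original in g and replacement in g for g in CONSERVATIVE_GROUPS)


def count_substitutions(reference: str, variant: str):
    """Return (conservative, non_conservative) counts for equal-length sequences."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length")
    conservative = 0
    non_conservative = 0
    for ref_aa, var_aa in zip(reference.upper(), variant.upper()):
        if ref_aa == var_aa:
            continue
        if is_conservative(ref_aa, var_aa):
            conservative += 1
        else:
            non_conservative += 1
    return conservative, non_conservative


# Toy example: K->R and S->T are within groups; E->A is not.
print(count_substitutions("MKLSE", "MRLTA"))
```

A different grouping, such as the pairings in Table 1, could be substituted by replacing `CONSERVATIVE_GROUPS` with a per-residue mapping.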









TABLE 1
Non-limiting Examples of Conservative Amino Acid Substitutions

Original Residue    R Group Type                  Conservative Amino Acid Substitutions
Ala                 nonpolar aliphatic R group    Cys, Gly, Ser
Arg                 positively charged R group    His, Lys
Asn                 polar uncharged R group       Asp, Gln, Glu
Asp                 negatively charged R group    Asn, Gln, Glu
Cys                 polar uncharged R group       Ala, Ser
Gln                 polar uncharged R group       Asn, Asp, Glu
Glu                 negatively charged R group    Asn, Asp, Gln
Gly                 nonpolar aliphatic R group    Ala, Ser
His                 positively charged R group    Arg, Tyr, Trp
Ile                 nonpolar aliphatic R group    Leu, Met, Val
Leu                 nonpolar aliphatic R group    Ile, Met, Val
Lys                 positively charged R group    Arg, His
Met                 nonpolar aliphatic R group    Ile, Leu, Phe, Val
Pro                 polar uncharged R group
Phe                 nonpolar aromatic R group     Met, Trp, Tyr
Ser                 polar uncharged R group       Ala, Gly, Thr
Thr                 polar uncharged R group       Ala, Asn, Ser
Trp                 nonpolar aromatic R group     His, Phe, Tyr, Met
Tyr                 nonpolar aromatic R group     His, Phe, Trp
Val                 nonpolar aliphatic R group    Ile, Leu, Met, Thr









Amino acid substitutions in the amino acid sequence of a polypeptide to produce a chimeric terpene synthase (e.g., chimeric sesquiterpene synthase) variant having a desired property and/or activity can be made by alteration of the coding sequence of the chimeric terpene synthase (e.g., chimeric sesquiterpene synthase). Similarly, conservative amino acid substitutions in the amino acid sequence of a polypeptide to produce functionally equivalent variants of the polypeptide typically are made by alteration of the coding sequence of the chimeric terpene synthase (e.g., chimeric sesquiterpene synthase).


Mutations (e.g., substitutions) can be made in a nucleotide sequence by a variety of methods known to one of ordinary skill in the art. For example, mutations can be made by PCR-directed mutation, site-directed mutagenesis according to the method of Kunkel (Kunkel, Proc. Nat. Acad. Sci. U.S.A. 82: 488-492, 1985), or by chemical synthesis of a gene encoding a polypeptide.


Any suitable method, including circular permutation (Yu and Lutz, Trends Biotechnol. 2011 January; 29(1):18-25), may be used to produce variants. In circular permutation, the linear primary sequence of a polypeptide can be circularized (e.g., by joining the N-terminal and C-terminal ends of the sequence) and the polypeptide can be severed (“broken”) at a different location. Thus, the linear primary sequence of the new polypeptide may have low sequence identity to that of the reference polypeptide (e.g., less than 80%, less than 75%, less than 70%, less than 65%, less than 60%, less than 55%, less than 50%, less than 45%, less than 40%, less than 35%, less than 30%, less than 25%, less than 20%, less than 15%, less than 10%, or less than 5%, including all values in between) as determined by linear sequence alignment methods (e.g., Clustal Omega or BLAST). Topological analysis of the two proteins, however, may reveal that the tertiary structure of the two polypeptides is similar or dissimilar. Without being bound by a particular theory, a variant polypeptide created through circular permutation of a reference polypeptide and with a similar tertiary structure as the reference polypeptide can share similar functional characteristics (e.g., enzymatic activity, enzyme kinetics, substrate specificity or product specificity). In some instances, circular permutation may alter the secondary structure, tertiary structure or quaternary structure and produce an enzyme with different functional characteristics (e.g., increased or decreased enzymatic activity, different substrate specificity, or different product specificity). See, e.g., Yu and Lutz, Trends Biotechnol. 2011 January; 29(1):18-25.


It should be appreciated that in a protein that has undergone circular permutation, the linear amino acid sequence of the protein would differ from a reference protein that has not undergone circular permutation. However, one of ordinary skill in the art would be able to readily determine which residues in the protein that has undergone circular permutation correspond to residues in the reference protein that has not undergone circular permutation by, for example, aligning the sequences and detecting conserved motifs, and/or by comparing the structures or predicted structures of the proteins, e.g., by homology modeling.
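The relationship between a circularly permuted sequence and its reference can be illustrated with a short Python sketch. It assumes the N-terminal and C-terminal ends are joined directly, without a linker, and that the new termini are created at a single chosen position; the function names and the toy sequence are hypothetical. The position-by-position comparison shown is a deliberately crude stand-in for a linear alignment and is not a substitute for tools such as Clustal Omega or BLAST.

```python
def circular_permute(sequence: str, new_start: int) -> str:
    """Return the permutant whose first residue is sequence[new_start] (0-based)."""
    if not 0 <= new_start < len(sequence):
        raise ValueError("new_start must be a valid index into the sequence")
    return sequence[new_start:] + sequence[:new_start]


def reference_position(permuted_index: int, new_start: int, length: int) -> int:
    """Map a 0-based index in the permutant back to the 0-based reference index."""
    return (permuted_index + new_start) % length


def positional_identity(a: str, b: str) -> float:
    """Percent of positions that match when compared index by index, without gaps."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


# Toy, hypothetical sequence: break the circle before residue index 4.
reference = "MKTAYIAKQR"
permuted = circular_permute(reference, 4)
print(permuted)
print(positional_identity(reference, permuted))

# Every residue of the permutant corresponds to exactly one reference residue.
for i, residue in enumerate(permuted):
    assert residue == reference[reference_position(i, 4, len(reference))]
```

The residue composition is unchanged by the permutation, which is why the linear comparison can report low identity even though every residue has an exact counterpart in the reference. When a linker is inserted at the joined termini, the index mapping would need to account for the linker length.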


Aspects of the present disclosure relate to the recombinant expression of genes encoding enzymes, functional modifications and variants thereof, as well as uses relating thereto.


A nucleic acid encoding any of the chimeric terpene synthases described herein may be incorporated into any appropriate vector through any method known in the art. For example, the vector may be an expression vector, including but not limited to a viral vector (e.g., a lentiviral, retroviral, adenoviral, or adeno-associated viral vector), any vector suitable for transient expression, or any vector for inducible expression (e.g., a galactose-inducible or doxycycline-inducible vector). A non-limiting example of a vector for expression of a chimeric terpene synthase (e.g., a chimeric sesquiterpene synthase) is described in Example 2 below.


In some embodiments, a vector replicates autonomously in the cell. A vector can contain one or more endonuclease restriction sites that are cut by a restriction endonuclease to insert and ligate a nucleic acid containing a gene described herein to produce a recombinant vector that is able to replicate in a cell. Vectors are typically composed of DNA, although RNA vectors are also available. Cloning vectors include, but are not limited to: plasmids, fosmids, phagemids, virus genomes and artificial chromosomes. As used herein, the terms “expression vector” or “expression construct” refer to a nucleic acid construct, generated recombinantly or synthetically, with a series of specified nucleic acid elements that permit transcription of a particular nucleic acid in a host cell (e.g., microbe), such as a yeast cell. In some embodiments, the nucleic acid sequence of a gene described herein is inserted into a cloning vector such that it is operably joined to regulatory sequences and, in some embodiments, expressed as an RNA transcript. In some embodiments, the vector contains one or more markers, such as a selectable marker as described herein, to identify cells transformed or transfected with the recombinant vector.


In some embodiments, a vector is capable of integrating into the genome of a host cell.


A coding sequence and a regulatory sequence are said to be “operably joined” or “operably linked” when the coding sequence and the regulatory sequence are covalently linked and the expression or transcription of the coding sequence is under the influence or control of the regulatory sequence. If the coding sequence is to be translated into a functional protein, the coding sequence and the regulatory sequence are said to be operably joined or linked if induction of a promoter in the 5′ regulatory sequence transcribes the coding sequence and if the nature of the linkage between the coding sequence and the regulatory sequence does not (1) result in the introduction of a frame-shift mutation, (2) interfere with the ability of the promoter region to direct the transcription of the coding sequence, or (3) interfere with the ability of the corresponding RNA transcript to be translated into a protein. Thus, a promoter region is operably joined or linked to a coding sequence if the promoter region transcribes the coding sequence and the transcript can be translated into the protein or polypeptide of interest.
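Criterion (1) above, the absence of a frame-shift introduced by the linkage, can be illustrated with a simple sequence-level check. The Python sketch below is only an illustrative assumption about how such a check might look: it reports basic reading-frame problems in an open reading frame and tests whether a downstream coding sequence joined after an upstream coding segment and linker remains in frame. It does not address criteria (2) or (3), and all names and toy sequences are hypothetical.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}


def frame_problems(orf: str) -> List[str]:
    """List basic reading-frame problems found in a DNA open reading frame."""
    orf = orf.upper()
    problems = []
    if len(orf) % 3 != 0:
        problems.append("length is not a multiple of three")
    if not orf.startswith("ATG"):
        problems.append("does not begin with ATG")
    # Only complete codons are examined; a trailing partial codon is ignored.
    usable = len(orf) - len(orf) % 3
    codons = [orf[i:i + 3] for i in range(0, usable, 3)]
    if codons and codons[-1] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("contains a premature in-frame stop codon")
    return problems


def joins_in_frame(upstream_coding: str, linker: str) -> bool:
    """True if a sequence appended after upstream_coding + linker stays in frame."""
    return (len(upstream_coding) + len(linker)) % 3 == 0


# Toy sequences used only to exercise the checks.
print(frame_problems("ATGGCTTAA"))
print(frame_problems("ATGTAAGCTTAA"))
print(joins_in_frame("ATGCAT", "GGA"))
print(joins_in_frame("ATGCAT", "GG"))
```

A linker whose length is not a multiple of three shifts the downstream reading frame, which is the situation criterion (1) excludes.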


In some embodiments, the nucleic acid encoding any of the proteins described herein is under the control of regulatory sequences (e.g., enhancer sequences). In some embodiments, a nucleic acid is expressed under the control of a promoter. The promoter can be a native promoter, e.g., the promoter of the gene in its endogenous context, which provides normal regulation of expression of the gene. Alternatively, a promoter can be a promoter that is different from the native promoter of the gene, e.g., the promoter is different from the promoter of the gene in its endogenous context. As used herein, a “heterologous promoter” or “recombinant promoter” is a promoter that is not naturally or normally associated with or that does not naturally or normally control transcription of a DNA sequence to which it is operably joined or linked. In some embodiments, a nucleotide sequence is under the control of a heterologous promoter.


In some embodiments, the promoter is a eukaryotic promoter. Non-limiting examples of eukaryotic promoters include TDH3, PGK1, PKC1, TDH2, PYK1, TPI1, AT1, CMV, EF1a, SV40, PGK1 (human or mouse), Ubc, human beta actin, CAG, TRE, UAS, Ac5, Polyhedrin, CaMKIIa, GAL1, GAL10, TEF1, GDS, ADH1, CaMV35S, Ubi, H1, and U6, as would be known to one of ordinary skill in the art (see, e.g., Addgene website: blog.addgene.org/plasmids-101-the-promoter-region). In some embodiments, the promoter is a prokaryotic promoter (e.g., bacteriophage or bacterial promoter). Non-limiting examples of bacteriophage promoters include Pls1con, T3, T7, SP6, and PL. Non-limiting examples of bacterial promoters include Pbad, PmgrB, Ptrc2, Plac/ara, Ptac, and Pm.


In some embodiments, the promoter is an inducible promoter. As used herein, an “inducible promoter” is a promoter controlled by the presence or absence of a molecule. Non-limiting examples of inducible promoters include chemically-regulated promoters and physically-regulated promoters. For chemically-regulated promoters, the transcriptional activity can be regulated by one or more compounds, such as alcohol, tetracycline, galactose, a steroid, a metal, or other compounds. For physically-regulated promoters, transcriptional activity can be regulated by a phenomenon such as light or temperature. Non-limiting examples of tetracycline-regulated promoters include anhydrotetracycline (aTc)-responsive promoters and other tetracycline-responsive promoter systems (e.g., a tetracycline repressor protein (tetR), a tetracycline operator sequence (tetO) and a tetracycline transactivator fusion protein (tTA)).


Non-limiting examples of steroid-regulated promoters include promoters based on the rat glucocorticoid receptor, human estrogen receptor, moth ecdysone receptors, and promoters from the steroid/retinoid/thyroid receptor superfamily. Non-limiting examples of metal-regulated promoters include promoters derived from metallothionein (proteins that bind and sequester metal ions) genes. Non-limiting examples of pathogenesis-regulated promoters include promoters induced by salicylic acid, ethylene or benzothiadiazole (BTH). Non-limiting examples of temperature/heat-inducible promoters include heat shock promoters. Non-limiting examples of light-regulated promoters include light responsive promoters from plant cells. In certain embodiments, the inducible promoter is a galactose-inducible promoter. In some embodiments, the inducible promoter is induced by one or more physiological conditions (e.g., pH, temperature, radiation, osmotic pressure, saline gradients, cell surface binding, or concentration of one or more extrinsic or intrinsic inducing agents). Non-limiting examples of an extrinsic inducer or inducing agent include amino acids and amino acid analogs, saccharides and polysaccharides, nucleic acids, protein transcriptional activators and repressors, cytokines, toxins, petroleum-based compounds, metal containing compounds, salts, ions, enzyme substrate analogs, hormones or any combination thereof.


In some embodiments, the promoter is a constitutive promoter. As used herein, a “constitutive promoter” refers to an unregulated promoter that allows continuous transcription of a gene. Non-limiting examples of a constitutive promoter include CP1, CMV, EF1a, SV40, PGK1, Ubc, human beta actin, CAG, Ac5, polyhedrin, TEF1, GDS, CaMV35S, Ubi, H1, and U6.


Other inducible promoters or constitutive promoters known to one of ordinary skill in the art are also contemplated herein.


The precise nature of the regulatory sequences needed for gene expression may vary between species or cell types, but generally include, as necessary, 5′ non-transcribed and 5′ non-translated sequences involved with the initiation of transcription and translation respectively, such as a TATA box, capping sequence, CAAT sequence, and the like. In particular, such 5′ non-transcribed regulatory sequences will include a promoter region which includes a promoter sequence for transcriptional control of the operably joined or linked gene. Regulatory sequences may also include enhancer sequences or upstream activator sequences. The vectors disclosed herein may include 5′ leader or signal sequences. The regulatory sequence may also include a terminator sequence. In some embodiments, a terminator sequence marks the end of a gene in DNA during transcription. The choice and design of one or more appropriate vectors suitable for inducing expression of one or more genes described herein in a heterologous organism is within the ability and discretion of one of ordinary skill in the art.


Expression vectors containing the necessary elements for expression are commercially available and known to one of ordinary skill in the art (see, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual, Fourth Edition, Cold Spring Harbor Laboratory Press, 2012).


Any suitable host cell may be used to produce any of the chimeric terpene synthases disclosed herein, including eukaryotic cells or prokaryotic cells. Suitable host cells include fungal cells (e.g., yeast cells) and bacterial cells (e.g., E. coli cells). Non-limiting examples of genera of yeast for expression include Saccharomyces (e.g., S. cerevisiae), Pichia, Kluyveromyces (e.g., K. lactis), Hansenula and Yarrowia. In some embodiments, the yeast strain is an industrial polyploid yeast strain. Other non-limiting examples of fungal cells include cells obtained from Aspergillus spp., Penicillium spp., Fusarium spp., Rhizopus spp., Acremonium spp., Neurospora spp., Sordaria spp., Magnaporthe spp., Allomyces spp., Ustilago spp., Botrytis spp., and Trichoderma spp.


The term “cell,” as used herein, may refer to a single cell or a population of cells, such as a population of cells belonging to the same cell line or strain. Use of the singular term “cell” should not be construed to refer explicitly to a single cell rather than a population of cells.


A vector encoding any of the chimeric terpene synthases (e.g., chimeric sesquiterpene synthases) described herein may be introduced into a suitable host cell using any method known in the art. Non-limiting examples of yeast transformation protocols are described in Example 2 below and in Gietz et al., Yeast transformation by the LiAc/SS Carrier DNA/PEG method. Methods Mol Biol. 2006; 313:107-20, which is hereby incorporated by reference in its entirety for this purpose. Host cells may be cultured under any suitable conditions, as would be understood by one of ordinary skill in the art. For example, any media, temperature, and incubation conditions known in the art may be used. For host cells carrying an inducible vector, cells may be cultured with an appropriate inducing agent to promote expression.


Any of the cells disclosed herein can be cultured in media of any type (rich or minimal) and any composition prior to, during, and/or after contact and/or integration of a nucleic acid. The conditions of the culture or culturing process can be optimized through routine experimentation as understood by one of ordinary skill in the art. In some embodiments, the selected media is supplemented with various components. In some embodiments, the concentration and amount of a supplemental component is optimized. In some embodiments, other aspects of the media and growth conditions (e.g., pH, temperature, etc.) are optimized through routine experimentation. In some embodiments, the frequency with which the media is supplemented with one or more supplemental components and the amount of time that the cell is cultured are optimized.


Culturing of the cells described herein can be performed in culture vessels known and used in the art. In some embodiments, an aerated reaction vessel (e.g., a stirred tank reactor) is used to culture the cells. In some embodiments, a bioreactor or fermentor is used to culture the cell. Thus, in some embodiments, the cells are used in fermentation. As used herein, the terms “bioreactor” and “fermentor” are interchangeably used and refer to an enclosure, or partial enclosure, in which a biological, biochemical and/or chemical reaction takes place, involving a living organism or part of a living organism. A “large-scale bioreactor” or “industrial-scale bioreactor” is a bioreactor that is used to generate a product on a commercial or quasi-commercial scale. Large scale bioreactors typically have volumes in the range of liters, hundreds of liters, thousands of liters, or more.


In some embodiments, a bioreactor comprises a cell (e.g., a yeast cell) or a cell culture (e.g., a yeast cell culture), such as a cell or cell culture described herein. In some embodiments, a bioreactor comprises a spore and/or a dormant cell type of an isolated microbe (e.g., a dormant cell in a dry state).


Non-limiting examples of bioreactors include: stirred tank fermentors, bioreactors agitated by rotating mixing devices, chemostats, bioreactors agitated by shaking devices, airlift fermentors, packed-bed reactors, fixed-bed reactors, fluidized bed bioreactors, bioreactors employing wave induced agitation, centrifugal bioreactors, roller bottles, hollow fiber bioreactors, roller apparatuses (for example benchtop, cart-mounted, and/or automated varieties), vertically-stacked plates, spinner flasks, stirring or rocking flasks, shaken multi-well plates, MD bottles, T-flasks, Roux bottles, multiple-surface tissue culture propagators, modified fermentors, and coated beads (e.g., beads coated with serum proteins, nitrocellulose, or carboxymethyl cellulose to prevent cell attachment).


In some embodiments, the bioreactor includes a cell culture system where the cell (e.g., yeast cell) is in contact with moving liquids and/or gas bubbles. In some embodiments, the cell or cell culture is grown in suspension. In other embodiments, the cell or cell culture is attached to a solid phase carrier. Non-limiting examples of a carrier system include microcarriers (e.g., polymer spheres, microbeads, and microdisks that can be porous or non-porous), cross-linked beads (e.g., dextran) charged with specific chemical groups (e.g., tertiary amine groups), 2D microcarriers including cells trapped in nonporous polymer fibers, 3D carriers (e.g., carrier fibers, hollow fibers, multicartridge reactors, and semi-permeable membranes that can comprise porous fibers), microcarriers having reduced ion exchange capacity, encapsulation cells, capillaries, and aggregates. In some embodiments, carriers are fabricated from materials such as dextran, gelatin, glass, or cellulose.


In some embodiments, industrial-scale processes are operated in continuous, semi-continuous or non-continuous modes. Non-limiting examples of operation modes are batch, fed batch, extended batch, repetitive batch, draw/fill, rotating-wall, spinning flask, and/or perfusion mode of operation. In some embodiments, a bioreactor allows continuous or semi-continuous replenishment of the substrate stock, for example a carbohydrate source, and/or continuous or semi-continuous separation of the product from the bioreactor.


In some embodiments, the bioreactor or fermentor includes a sensor and/or a control system to measure and/or adjust reaction parameters. Non-limiting examples of reaction parameters include biological parameters (e.g., growth rate, cell size, cell number, cell density, cell type, or cell state), chemical parameters (e.g., pH, redox-potential, concentration of reaction substrate and/or product, concentration of dissolved gases, such as oxygen concentration and CO2 concentration, nutrient concentrations, metabolite concentrations, concentration of an oligopeptide, concentration of an amino acid, concentration of a vitamin, concentration of a hormone, concentration of an additive, serum concentration, ionic strength, concentration of an ion, relative humidity, molarity, osmolarity, concentration of other chemicals, for example buffering agents, adjuvants, or reaction by-products), and physical/mechanical parameters (e.g., density, conductivity, degree of agitation, pressure, flow rate, shear stress, shear rate, viscosity, color, turbidity, light absorption, mixing rate, and conversion rate, as well as thermodynamic parameters, such as temperature and light intensity/quality). Sensors to measure the parameters described herein are well known to one of ordinary skill in the relevant mechanical and electronic arts. Control systems to adjust the parameters in a bioreactor based on the inputs from a sensor described herein are well known to one of ordinary skill in the art in bioreactor engineering.
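As a purely illustrative sketch of how a control system might adjust one reaction parameter from a sensor input, the following Python example implements a simple on/off (deadband) rule for pH. The function name, the setpoint of 5.0, the deadband of 0.1, and the action labels are all assumptions made for illustration; the disclosure does not specify any particular control scheme or set of values.

```python
def ph_control_action(measured_ph, setpoint=5.0, deadband=0.1):
    """Decide one control action from a single pH reading.

    Hypothetical rule: dose base when the reading falls below the
    deadband around the setpoint, dose acid when it rises above it,
    and otherwise do nothing.
    """
    if measured_ph < setpoint - deadband:
        return "add_base"
    if measured_ph > setpoint + deadband:
        return "add_acid"
    return "hold"


# Illustrative sensor readings (not measured data).
readings = [4.7, 4.95, 5.3]
actions = [ph_control_action(r) for r in readings]
print(actions)
```

The deadband keeps the controller from dosing on every small fluctuation around the setpoint; a production system would more likely use a tuned PID loop.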


Terpenes produced by any of the host cells disclosed herein may be extracted using any method known in the art. A non-limiting example of a method for sesquiterpene extraction is provided in Example 2. Any of the terpenes produced from the methods, compositions, or host cells described herein may be used in a suitable composition for topical application to, for example, skin, hair, clothing, or articles in a home (e.g., a perfume). As used herein, the term “perfume” is any fragrance formulation suitable for application to the hair, skin, or clothing of a person or an article in a home. This term includes, but is not limited to: an eau de cologne, eau de toilette, eau de parfum, perfume extract or extrait. In addition to comprising one or more terpenes of the application, such a perfume may include, for example, one or more natural oils, fixatives, emollients, or solvents.


Examples of natural oils which may be used in perfume formulations include, but are not limited to: amyris oil; Angelica seed oil; Angelica root oil; aniseed oil; valerian oil; basil oil; bay oil; mugwort oil; benzoin resin; bergamot oil; birch tar oil; bitter almond oil; savory oil; bucco-leaf oil; Cabreuva oil; cade oil; Calamus oil; camphor oil; Cananga oil; cardamom oil; Cascarilla oil; Cassia oil; Castoreum absolute; cedar-leaf oil; cedarwood oil; cistus oil; citronella oil; lemon oil; copaiba balsam oil; coriander oil; Costus root oil; cumin oil; cypress oil; Davana oil; dill oil; dillseed oil; elemi oil; tarragon oil; eucalyptus citriodora oil; eucalyptus oil; fennel oil; fir oil; galbanum oil; Geranium oil; grapefruit oil; guaiac wood oil; gurjun balsam oil; Helichrysum oil; ginger oil; iris root oil; blue chamomile oil; Roman chamomile oil; carrot-seed oil; pine-needle oil; spearmint oil; caraway oil; labdanum oil; lavandin oil; lavender oil; lemongrass oil; lovage oil; lime oil (e.g., distilled or pressed lime oil); linaloe oil; Litsea cubeba oil; bay leaf oil; mace oil; marjoram oil; mandarin oil; massoi bark oil; ambrette oil; clary sage oil; Myristica oil; myrrh oil; myrtle oil; clove leaf oil; clove flower oil; neroli oil; olibanum oil; Opopanax oil; orange oil; Origanum oil; palmarosa oil; patchouli oil; Perilla oil; Peru balsam oil; parsley leaf oil; parsley seed oil; petitgrain oil; peppermint oil; pepper oil; pimento oil; pine oil; pennyroyal oil; rosewood oil; rose oil; rosemary oil; Dalmatian sage oil; Spanish sage oil; sandalwood oil; celery seed oil; spike lavender oil; Japanese aniseed oil; Styrax oil; Tagetes oil; fir-needle oil; tea-tree oil; turpentine oil; thyme oil; tuberose absolute; vanilla extract; violet leaf absolute; Verbena oil; vetiver oil; juniper oil; wine-lees oil; wormwood oil; wintergreen oil; ylang oil; hyssop oil; civet absolute; cinnamon leaf oil; cinnamon bark oil; as well as fractions thereof or constituents isolated therefrom; and combinations thereof.


Other examples of compounds which may be used in perfume formulations may include: wood moss absolute; beeswax absolute; Cassia absolute; eau de brouts absolute; oakmoss absolute; Galbanum resin; Helichrysum absolute; iris root absolute; jasmine absolute; labdanum absolute; labdanum resin; lavandin absolute; lavender absolute; Mimosa absolute; tincture of musk; myrrh absolute; olibanum absolute; orange blossom absolute; rose absolute; Tolu balsam; Tonka absolute; as well as fractions thereof or constituents isolated therefrom; and combinations thereof.


As used herein, the term “emollient” means a fatty or oleaginous substance which increases tissue moisture content (and may, for example, render skin softer and more pliable). Emollients for use with the instant compounds and methods may include any appropriate animal fats/oils, vegetable oils, and/or waxes. As a non-limiting set of examples, an emollient for use with the instant compositions and methods may be of natural or synthetic origin and may include: cold-pressed almond oil, jojoba oil, sunflower oil, olive oil, hazelnut oil, avocado oil, safflower oil, grapeseed oil, coconut oil, wheat germ oil, apricot kernel oil, natural waxes and “butters” (e.g., unrefined beeswax, shea butter, jojoba butter, and/or cocoa butter), Schercemol™ LL Ester, Schercemol™ 1818 Ester, butylene glycol, capric/caprylic triglyceride, ceteareth-20, one or more fatty alcohols (e.g., cetearyl alcohol, cetyl alcohol, and/or coconut fatty acids), one or more silicones (e.g., cyclomethicone, dimethicone, and/or cyclopentasiloxane), emulsifying wax, petroleum jelly, fatty acids, glyceryl stearate, hydrogenated oils, isopropyl myristate, mineral oil, octyl palmitate, paraffin, squalene, stearic acid, palmitoyl proline, or magnesium palmitoyl glutamate.


As used herein, the term “fixative” means a compound used to equalize the vapor pressures (and thus the volatilities) of one or more compounds in the perfume. As a non-limiting set of examples, a fixative for use with the compounds and perfumes described herein may be: dipropylene glycol, diethyl phthalate, Hedione®, Abalyn™ D-E Methyl Ester of Rosin, Jojoba (such as Floraesters K-100 Jojoba or Floraesters K-20W Jojoba), Sepicide LD, and/or Foralyn™ 5020-F CG Hydrogenated Rosinate.


As used herein, the term “solvent” is the diluent used to create a perfume. As a non-limiting example, the solvent may be an alcohol (e.g., an ethyl alcohol), 1,2-hexanediol, 1,2-heptanediol, a neutral smelling oil (e.g., fractionated coconut oil or jojoba oil), or one or more volatile silicones. As a non-limiting example, Perfumers' Alcohol (a type of ethyl alcohol) may be used. Perfumers' Alcohol is prepared from 200 proof ethyl alcohol which may contain very small amounts of butyl alcohol, denatonium benzoate (Britex), and/or hexylene glycol. Various grades of Perfumers' Alcohol are available including SDA 40B 200 Proof and SDA-B 200 proof.


Additional compounds or fragrance materials for use in the perfume composition according to the disclosure may include any compounds which are customarily used in the field.


The present invention is further illustrated by the following Examples, which in no way should be construed as limiting. The entire contents of all of the references (including literature references, issued patents, published patent applications, and co-pending patent applications) cited throughout this application are hereby expressly incorporated by reference.


EXAMPLES
Example 1. Functional Characterization of Chimeric Terpene Synthases

Genomic DNA from 12 extinct plant samples was sequenced (Table 2). Sesquiterpene synthase (SQTS) fragments were recovered from seven plants (Table 11), but gaps in the sequencing prevented reconstruction of full-length genes. A library comprising 2,738 terpene synthase chimeras (containing sequence from sesquiterpene synthases from extant plants to fill the sequence gaps) was screened. The expression of 52 SQTS chimeras (sequences provided in Table 10) from six rare plants (Table 2) led to the production of sesquiterpenes in the screening strain. Methods and materials for each of the procedures described in this Example may be found in Example 2.









TABLE 2

Rare Plants that were Sequenced (The plants from which functional sesquiterpene chimeras were reconstructed are marked with an asterisk.)

Family            Genus           Species        Continent  Location       Year Extinct
Crassulaceae      Crassula        subulata       AFRICA     South Africa   1900
Ericaceae         Erica           pyramidalis    AFRICA     South Africa   1910
Malvaceae         Hibiscadelphus  wilderianus*   OCEANIA    Hawaii         1910
Proteaceae        Leucadendron    grandiflorum*  AFRICA     South Africa   1806
Rutaceae          Macrostylis     villosa*       AFRICA     South Africa   1980
Myrtaceae         Myrcia          skeldingii     AMERICA    Jamaica        1972
Rhamnaceae        Nesiota         elliptica      AFRICA     St. Helena     2003
Fabaceae          O.              stipulatum*    AMERICA    Kentucky       1881
Sapotaceae        Pradosia        glaziovii      AMERICA    Brazil         1997
Dipterocarpaceae  Shorea          cuspidata*     ASIA       Malaysia       1996
Proteaceae        Stenocarpus     dumbeensis     OCEANIA    New Caledonia  1905
Rubiaceae         Wendlandia      angustifolia*  ASIA       India          1997











The terpenes produced by the functional SQTS chimeras were identified initially based on gas chromatography-mass spectrometry (GC/MS) data. In some cases, authentic standards or essential oils containing characterized sesquiterpenes were available to confirm mass spectrum- and retention time-based identifications. In other cases, standards were not available and structural identifications were made based on mass spectral analysis alone. The different methods used to identify the structures are detailed in Table 3, and the specific methods used to identify each sesquiterpene are indicated in Tables 4-9. In some cases, products were identified only as “sesquiterpene” or “sesquiterpenol.” In one case, a mass spectrum was recovered but did not yield a match in the NIST/internal database. This sesquiterpenol was identified in the product tables as an “unidentified sesquiterpenol” and additional characterization may be used to determine its structure.


Fourteen SQTS chimeras derived from Hibiscadelphus wilderianus produced one or more sesquiterpenes (FIG. 1, Table 4). Seven SQTS chimeras derived from Leucadendron grandiflorum also produced sesquiterpenes (FIG. 2, Table 5), as did six SQTS chimeras from Macrostylis villosa (FIG. 3, Table 6), two from O. stipulatum (FIG. 4, Table 7), six from Shorea cuspidata (FIG. 5, Table 8), and seventeen from Wendlandia angustifolia (FIG. 6, Table 9). The SQTSs were found to produce one to nine different terpenes. When the functional SQTS chimeras were grouped by the terpene produced in highest yield, the product profiles differed among the plants (FIG. 7). Delta-cadinene synthases were the most numerous group of functional chimeras at a total of 22 and were derived from four of the plants. Ten of the 14 synthases from H. wilderianus were of this variety. Alpha-cadinol was frequently detected as a minor product of the delta-cadinene synthases; however, three SQTS chimeras from S. cuspidata yielded more alpha-cadinol than delta-cadinene. The six SQTS chimeras derived from S. cuspidata produced very similar product mixtures (Table 8, FIG. 13).
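The grouping of chimeras by the terpene produced in highest yield can be sketched in a few lines of Python. The example below is only an illustration, not the analysis used to prepare FIG. 7: the percentages are copied from a handful of rows in Tables 4-8, and the function name `group_by_major_product` is hypothetical.

```python
from collections import defaultdict

# Product profiles (% of total sesquiterpenes) copied from Tables 4-8
# for a small subset of chimeras.
profiles = {
    "HibWilSQTS120": {"delta-cadinene": 13, "epi-cubenol": 3,
                      "sesquiterpenol": 2, "tau-cadinol": 82},
    "HibWilSQTS121": {"delta-cadinene": 99, "alpha-cadinol": 1},
    "LeuGraSQTS365": {"alpha-guaiene": 20, "delta-guaiene": 80},
    "MacVolSQTS2198": {"beta-caryophyllene": 85, "Humulene": 15},
    "OrbStiSQTS1414": {"alpha-guaiene": 21, "delta-guaiene": 79},
    "ShoCusSQTS157": {"alpha-cadinol": 59, "Sesquiterpene": 25,
                      "tau-cadinol": 16},
}


def group_by_major_product(profiles):
    """Map each highest-yield terpene to the chimeras that produce it."""
    groups = defaultdict(list)
    for chimera, products in profiles.items():
        # The major product is the terpene with the largest percentage.
        major = max(products, key=products.get)
        groups[major].append(chimera)
    return dict(groups)


groups = group_by_major_product(profiles)
for terpene, chimeras in groups.items():
    print(terpene, chimeras)
```

In this subset, chimeras from two different plants (LeuGraSQTS365 and OrbStiSQTS1414) fall into the same delta-guaiene group, which illustrates how one product class can arise from more than one source plant.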


The screening of the 2,738-member chimeric sesquiterpene synthase library resulted in the successful expression of 52 functional chimeric sesquiterpene synthases (SQTSs). Fourteen synthases were derived from H. wilderianus, a tree which went extinct in Hawaii over 100 years ago. Cadinene, cadinol, and eudesmol-type sesquiterpenes were produced by these chimeras. A few active chimeras were also generated from O. stipulatum, a plant that went extinct in Kentucky in the 1800s. Two guaienes and gamma-bisabolene were produced by these synthases. Seven functional SQTS chimeras were constructed from L. grandiflorum, a plant that went extinct over 200 years ago. Diverse sesquiterpene and sesquiterpenol structures were produced by these chimeras, along with those derived from three other plants.
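The counts reported in this Example can be cross-checked with a short calculation. The sketch below, offered only as a worked illustration, sums the per-plant numbers of functional chimeras stated above and derives the fraction of the 2,738-member library that was functional; the variable names are hypothetical.

```python
# Functional chimeras per source plant, as stated in this Example.
functional = {
    "H. wilderianus": 14,
    "L. grandiflorum": 7,
    "M. villosa": 6,
    "O. stipulatum": 2,
    "S. cuspidata": 6,
    "W. angustifolia": 17,
}
library_size = 2738  # chimeras screened

total_functional = sum(functional.values())
hit_rate = total_functional / library_size

print(f"{total_functional} functional chimeras from {len(functional)} plants")
print(f"hit rate: {hit_rate:.1%}")
```

The per-plant counts sum to the 52 functional chimeras reported, which corresponds to roughly 1.9% of the screened library.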









TABLE 3

The six methods used to identify the sesquiterpenes produced by the sesquiterpene synthases.

Method #  Description                                      Notes
1         Mass spectrum and retention time matched to      High confidence in structure and
          authentic standard                               stereochemistry.
2         Mass spectrum and retention time matched to      High confidence in structure and
          previously characterized compounds in            stereochemistry.
          essential oils from plants.
3         Poor mass spectrum obtained due to low titer;    Fairly high confidence in structure
          retention time and chimera product profile       and stereochemistry.
          were consistent with authentic standards or
          components in essential oils
4         Strong mass spectrum match to compound in        Fairly high confidence in structure,
          NIST/internal database                           could be an isomer.
5         Poor mass spectrum obtained due to low titer;    Fairly high confidence in structure,
          retention time and chimera product profiles      could be an isomer.
          matched to terpenes identified using
          method # 4
6         Poor mass spectrum obtained due to low yields,   Lower confidence based on the mass
          best (closest) identification possible with      spectral data available.
          NIST/internal database
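For readers who want to filter the product tables by identification confidence, the Table 3 ranking can be captured as a small lookup. This is a minimal sketch: the dictionary and the helper `stereochemistry_supported` are hypothetical names, and the confidence strings simply restate the Notes column of Table 3.

```python
# Confidence notes restated from Table 3, keyed by identification method number.
METHOD_CONFIDENCE = {
    1: "High confidence in structure and stereochemistry",
    2: "High confidence in structure and stereochemistry",
    3: "Fairly high confidence in structure and stereochemistry",
    4: "Fairly high confidence in structure, could be an isomer",
    5: "Fairly high confidence in structure, could be an isomer",
    6: "Lower confidence based on the mass spectral data available",
}


def stereochemistry_supported(method):
    """True when the Table 3 note for this method covers stereochemistry."""
    return "stereochemistry" in METHOD_CONFIDENCE[method]


# Example: identifications reported for HibWilSQTS120 in Table 4.
products = [("delta-cadinene", 3), ("epi-cubenol", 5),
            ("sesquiterpenol", 6), ("tau-cadinol", 2)]
well_supported = [name for name, method in products
                  if stereochemistry_supported(method)]
print(well_supported)
```

Methods 1-3 rest on comparison with authentic standards or characterized essential oils, so only those are treated here as supporting a stereochemical assignment.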
















TABLE 4

Functional sesquiterpene synthase chimeras derived from H. wilderianus sequences and their associated products.

Chimera name    % rare sequence  Terpene identification    Identification method [1]  % composition [2]
HibWilSQTS117   49%              delta-cadinene            3                          100%
HibWilSQTS118   50%              delta-cadinene            3                          100%
HibWilSQTS120   46%              delta-cadinene            3                          13%
                                 epi-cubenol               5                          3%
                                 sesquiterpenol            6                          2%
                                 tau-cadinol               2                          82%
HibWilSQTS121   50%              delta-cadinene            2                          99%
                                 alpha-cadinol             3                          1%
HibWilSQTS123   47%              delta-cadinene            2                          99%
                                 alpha-cadinol             3                          1%
HibWilSQTS124   48%              delta-cadinene            2                          98%
                                 alpha-cadinol             3                          2%
HibWilSQTS126   44%              delta-cadinene            2                          97%
                                 alpha-cadinol             3                          3%
HibWilSQTS19    12%              gamma-selinene            4                          1%
                                 10-epi-gamma-eudesmol     2                          2%
                                 gamma-eudesmol            2                          49%
                                 alpha/beta-eudesmol [3]   4                          22%
                                 juniper camphor           6                          1%
                                 7-epi-alpha-eudesmol      4                          1%
                                 cryptomeridiol isomer 1   4                          1%
                                 cryptomeridiol isomer 2   4                          2%
                                 cryptomeridiol isomer 3   4                          21%
HibWilSQTS34    13%              sesquiterpene             6                          6%
                                 10-epi-gamma-eudesmol     3                          15%
                                 gamma-eudesmol            3                          27%
                                 alpha/beta-eudesmol [3]   5                          52%
HibWilSQTS52    12%              delta-cadinene            2                          60%
                                 tau-cadinol               3                          9%
                                 alpha/beta-eudesmol [3]   4                          31%
HibWilSQTS54    13%              delta-cadinene            2                          99%
                                 alpha-cadinol             2                          1%
HibWilSQTS55    12%              delta-cadinene            3                          71%
                                 tau-cadinol               3                          6%
                                 alpha-cadinol             3                          23%
HibWilSQTS63    12%              sesquiterpene             6                          11%
                                 delta-cadinene            2                          29%
                                 sesquiterpenol            6                          15%
                                 sesquiterpenol            6                          5%
                                 tau-cadinol               3                          10%
                                 alpha-cadinol             3                          30%
HibWilSQTS90    25%              sesquiterpene             6                          40%
                                 alpha/beta-eudesmol [3]   5                          60%

[1] The structure identification ranking key is defined in Table 3, with lower numbers indicating a higher degree of confidence.

[2] The composition of total sesquiterpenes from each chimera was a rough estimate based on a common ion count (m/z 204.2). The ratio of metabolites may have been different in the production strains and it is possible other minor metabolites were detected when samples were prepared. Representative GC/MS chromatograms for the chimeras with bold font can be found in FIG. 9.

[3] Co-eluted under these run conditions. The peak was partially resolved under longer run conditions, about 6/4 alpha/beta-eudesmol.
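The composition estimate described in the table footnotes (each product's share of a common ion count at m/z 204.2) amounts to a simple normalization. The following sketch shows that arithmetic under stated assumptions: the peak areas are invented for illustration (chosen so that the result matches the HibWilSQTS55 row), the function names are hypothetical, and the 6/4 split applies the approximate alpha/beta-eudesmol ratio from the co-elution footnote.

```python
def percent_composition(ion_counts):
    """Convert per-product ion counts into rounded percentages of the total."""
    total = sum(ion_counts.values())
    return {name: round(100 * count / total)
            for name, count in ion_counts.items()}


def split_coeluting_peak(percent, ratio=(6, 4)):
    """Apportion a co-eluting peak by an approximate ratio (default 6/4)."""
    first, second = ratio
    whole = first + second
    return percent * first / whole, percent * second / whole


# Hypothetical extracted-ion (m/z 204.2) peak areas; not measured data.
ion_counts = {"delta-cadinene": 7100, "tau-cadinol": 600, "alpha-cadinol": 2300}
composition = percent_composition(ion_counts)
print(composition)

# The 22% alpha/beta-eudesmol peak reported for HibWilSQTS19, split about 6/4.
alpha_share, beta_share = split_coeluting_peak(22)
print(alpha_share, beta_share)
```

Because the estimate relies on a single shared ion, it is best read as a relative ranking of products rather than an absolute quantitation, consistent with the "rough estimate" wording of the footnote.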














TABLE 5

Functional sesquiterpene synthase chimeras derived from L. grandiflorum sequences and their associated products.

Chimera name    % rare sequence  Terpene identification    Identification method [1]  % composition [2]
LeuGraSQTS335   14%              sesquiterpene             6                          1%
                                 10-epi-gamma-eudesmol     3                          1%
                                 gamma-eudesmol            3                          49%
                                 alpha/beta-eudesmol [3]   5                          23%
                                 cryptomeridiol isomer 2   5                          1%
                                 cryptomeridiol isomer 3   5                          25%
LeuGraSQTS345   12%              Humulene                  3                          100%
LeuGraSQTS365   11%              alpha-guaiene             3                          20%
                                 delta-guaiene             3                          80%
LeuGraSQTS377   14%              delta-cadinene            3                          98%
                                 alpha-cadinol             3                          2%
LeuGraSQTS379   12%              delta-cadinene            3                          98%
                                 alpha-cadinol             3                          2%
LeuGraSQTS385   13%              Zingiberene               4                          55%
                                 beta-bisabolene           2                          19%
                                 beta-farnesene            1                          6%
                                 beta-sesquiphellandrene   2                          6%
                                 Cubenol                   5                          5%
                                 alpha-bisabolol           1                          4%
                                 alpha-curcumene           5                          3%
                                 trans-nerolidol           1                          2%
LeuGraSQTS393   10%              gamma-bisabolene          4                          100%

[1] The structure identification ranking key is defined in Table 3, with lower numbers indicating a higher degree of confidence.

[2] The composition of total sesquiterpenes from each chimera was a rough estimate based on a common ion count (m/z 204.2). The ratio of metabolites may have been different in the production strains and other minor metabolites may have been detected when samples were prepared. Representative GC/MS chromatograms for the chimeras with bold font can be found in FIG. 10 and FIG. 11.

[3] Co-eluted under these run conditions. The peak was partially resolved under longer run conditions, about 6/4 alpha/beta-eudesmol.














TABLE 6

Functional sesquiterpene synthase chimeras derived from M. villosa sequences and their associated products.

Chimera name    % rare sequence  Terpene identification        Identification method [1]  % composition [2]
MacVolSQTS1139  14%              alpha-guaiene                 3                          19%
                                 delta-guaiene                 3                          81%
MacVolSQTS2198  62%              beta-caryophyllene            1                          85%
                                 Humulene                      1                          15%
MacVolSQTS2202  69%              beta-caryophyllene            1                          86%
                                 Humulene                      1                          14%
MacVolSQTS2222  69%              beta-caryophyllene            1                          86%
                                 Humulene                      1                          14%
MacVolSQTS2251  65%              beta-caryophyllene            1                          87%
                                 Humulene                      1                          13%
MacVolSQTS2274  38%              unknown sesquiterpene         6                          16%
                                 trans-Sesquisabinene hydrate  5                          14%
                                 delta-elemene                 6                          34%
                                 unknown sesquiterpene         6                          16%

[1] The structure identification ranking key is defined in Table 3, with lower numbers indicating a higher degree of confidence.

[2] The composition of total sesquiterpenes from each chimera was a rough estimate based on a common ion count (m/z 204.2). The ratio of metabolites may have been different in the production strains and other minor metabolites may have been detected when samples were prepared. Representative GC/MS chromatograms for the chimeras with bold font can be found in FIG. 12.














TABLE 7

Functional sesquiterpene synthase chimeras derived from O. stipulatum sequences and their associated products.

Chimera name    % rare sequence  Terpene identification    Identification method [1]  % composition [2]
OrbStiSQTS1368  10%              gamma-bisabolene          5                          100%
OrbStiSQTS1414  42%              alpha-guaiene             3                          21%
                                 delta-guaiene             3                          79%

[1] The structure identification ranking key is defined in Table 3, with lower numbers indicating a higher degree of confidence.

[2] The composition of total sesquiterpenes from each chimera was a rough estimate based on a common ion count (m/z 204.2). The ratio of metabolites may have been different in the production strains and other minor metabolites may have been detected when samples were prepared.














TABLE 8

Functional sesquiterpene synthase chimeras derived from S. cuspidata sequences and their associated products.

Chimera name    % rare sequence  Terpene identification    Identification method [1]  % composition [2]
ShoCusSQTS154   38%              delta-cadinene            3                          41%
                                 Sesquiterpene             6                          41%
                                 alpha-cadinol             3                          18%
ShoCusSQTS155   35%              delta-cadinene            3                          41%
                                 Sesquiterpene             6                          41%
                                 alpha-cadinol             3                          18%
ShoCusSQTS156   36%              alpha-cadinol             2                          34%
                                 delta-cadinene            2                          25%
                                 beta-caryophyllene        1                          10%
                                 tau-cadinol               2                          10%
                                 Sesquiterpene             6                          10%
                                 Sesquiterpene             6                          7%
                                 Humulene                  1                          4%
ShoCusSQTS157   38%              alpha-cadinol             3                          59%
                                 Sesquiterpene             6                          25%
                                 tau-cadinol               3                          16%
ShoCusSQTS160   36%              alpha-cadinol             3                          33%
                                 Sesquiterpene             6                          32%
                                 delta-cadinene            3                          5%
ShoCusSQTS161   37%              delta-cadinene            3                          36%
                                 alpha-cadinol             3                          34%
                                 Sesquiterpene             6                          12%
                                 tau-cadinol               3                          10%
                                 beta-caryophyllene        3                          5%
                                 Sesquiterpene             6                          3%

[1] The structure identification ranking key is defined in Table 3, with lower numbers indicating a higher degree of confidence.

[2] The composition of total sesquiterpenes from each chimera was a rough estimate based on a common ion count (m/z 204.2). The ratio of metabolites may have been different in the production strains and it is possible other minor metabolites were detected when samples were prepared. Representative GC/MS chromatograms for the chimeras with bold font can be found in FIG. 13.














TABLE 9

Functional sesquiterpene synthase chimeras derived from W. angustifolia sequences and their associated products.

Chimera name    % rare sequence  Terpene identification                         Identification method [1]  % composition [2]
WenAngSQTS1007  81%              cis-eudesm-6-en-11-ol                          4                          100%
WenAngSQTS1086  80%              Daucene                                        5                          5%
                                 isodaucene                                     5                          6%
                                 sesquiterpene                                  6                          4%
                                 cis-eudesm-6-en-11-ol                          4                          85%
WenAngSQTS267   11%              gamma-eudesmol                                 3                          66%
                                 alpha/beta-eudesmol [3]                        5                          15%
                                 cryptomeridiol isomer 3                        5                          19%
WenAngSQTS302   17%              sesquiterpene                                  6                          2%
                                 trans-bergamotene                              4                          5%
                                 alpha-zingiberene                              4                          56%
                                 sesquisabinene hydrate                         4                          20%
                                 beta-sesquiphellandrene                        2                          7%
                                 trans-nerolidol                                1                          2%
                                 sesquiterpenol                                 6                          4%
                                 sesquiterpenol                                 6                          4%
WenAngSQTS738   46%              Sesquiterpene                                  6                          6%
                                 sesquiterpene                                  6                          7%
                                 delta-cadinene                                 2                          36%
                                 unidentified sesquiterpenol                    4                          27%
                                 tau-cadinol                                    3                          15%
                                 alpha-cadinol                                  3                          9%
WenAngSQTS760   43%              Sesquiterpene                                  6                          9%
                                 Sesquiterpene                                  6                          4%
                                 Sesquiterpene                                  6                          6%
                                 delta-cadinene                                 2                          41%
                                 sesquiterpenol                                 6                          22%
                                 tau-cadinol                                    3                          11%
                                 alpha/beta-eudesmol [3]                        5                          7%
WenAngSQTS780   41%              sesquiterpene                                  6                          9%
                                 sesquiterpene                                  6                          3%
                                 sesquiterpene                                  6                          6%
                                 delta-cadinene                                 2                          40%
                                 sesquiterpenol                                 6                          24%
                                 tau-cadinol                                    3                          11%
                                 alpha/beta-eudesmol [3]                        5                          7%
WenAngSQTS793   75%              Daucene                                        5                          3%
                                 beta-farnesene                                 1                          2%
                                 8-Isopropenyl-1,5-dimethyl-1,5-cyclodecadiene  4                          5%
                                 sesquiterpene                                  6                          3%
                                 cis-eudesm-6-en-11-ol                          4                          87%
WenAngSQTS805   42%              sesquiterpene                                  6                          5%
                                 sesquiterpene                                  6                          6%
                                 delta-cadinene                                 2                          39%
                                 unidentified sesquiterpenol                    4                          27%
                                 tau-cadinol                                    3                          15%
                                 alpha-cadinol                                  3                          8%
WenAngSQTS826   47%              delta-cadinene                                 3                          42%
                                 sesquiterpenol                                 6                          36%
                                 tau-cadinol                                    3                          22%
WenAngSQTS829   74%              cis-eudesm-6-en-11-ol                          5                          100%
WenAngSQTS843   45%              delta-cadinene                                 3                          53%
                                 sesquiterpenol                                 6                          47%
WenAngSQTS848   84%              cis-eudesm-6-en-11-ol                          5                          100%
WenAngSQTS849   75%              Daucene                                        4                          3%
                                 beta-farnesene                                 1                          1%
                                 isodaucene                                     4                          8%
                                 sesquiterpene                                  6                          2%
                                 cis-eudesm-6-en-11-ol                          4                          86%
WenAngSQTS864   81%              Daucene                                        5                          2%
                                 8-Isopropenyl-1,5-dimethyl-1,5-cyclodecadiene  4                          5%
                                 sesquiterpene                                  6                          3%
                                 cis-eudesm-6-en-11-ol                          4                          90%
WenAngSQTS925   80%              sesquiterpene                                  6                          3%
                                 sesquiterpene                                  6                          8%
                                 sesquiterpene                                  6                          3%
                                 cis-eudesm-6-en-11-ol                          5                          86%
WenAngSQTS960   81%              delta-cadinene                                 2                          99%
                                 alpha-cadinol                                  3                          1%

[1] The structure identification ranking key is defined in Table 3, with lower numbers indicating a higher degree of confidence.

[2] The composition of total sesquiterpenes from each chimera was a rough estimate based on a common ion count (m/z 204.2). The ratio of metabolites may have been different in the production strains and it is possible other minor metabolites were detected when samples were prepared. Representative GC/MS chromatograms for the chimeras with bold font can be found in Appendix FIG. 14 and FIG. 15.

[3] Co-eluted under these run conditions. The peak was partially resolved under longer run conditions, about 6/4 alpha/beta-eudesmol.
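Table 10, which follows, lists each chimera's amino acid sequence beginning after the first encoded amino acid M, alongside its nucleic acid sequence. The relationship between the two columns can be checked by translating the nucleic acid sequence with the standard genetic code and dropping the leading methionine. The sketch below does this for the first 69 nucleotides of the HibWilSQTS117 entry (SEQ ID NO: 67) and the corresponding start of SEQ ID NO: 1; the helper name `translate` is hypothetical, and use of the standard genetic code is an assumption that this short stretch happens to satisfy.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate DNA in frame 1, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# First two nucleotide fragments of HibWilSQTS117 as printed in Table 10.
dna_start = ("atggccagtcaggcttcacaagttttagcatctcc"
             "ccacccagctatatcctctgaaaaccggccaaag")
# Start of the listed amino acid sequence (which omits the initial M).
aa_start = "ASQASQVLASPHPAISSENRPK"

protein = translate(dna_start)
matches = protein[0] == "M" and protein[1:] == aa_start
print(protein, matches)
```

The same check can be run over a full entry by concatenating all nucleotide fragments for that chimera; the final codon (taa) translates to a stop, shown as `*` in this table.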














TABLE 10







Amino acid (AA) and nucleic acid sequences of sesquiterpene chimeras.













Extant

Chimera AA se-














scaf-

quence (beginning




Rare
fold
%
after the first



Chimera
DNA
Uni-
Rare
encoded amino



Name
source
prot #
DNA
acid M)
Chimera Nucleic Acid Sequence





HibWilS

Hibis-

Q9SAN0
49%
ASQASQVLASPHPAISS
atggccagtcaggcttcacaagttttagcatctcc


QTS117

cadel-



ENRPKADFHPGIWGDM
ccacccagctatatcctctgaaaaccggccaaag




phus



FIICPDTDIDAATELQYE
gctgatttccatcctggtatctggggcgacatgttt




wilder-



ELKAQVRKMIMEPVDD
attatctgtccagatacggacattgatgccgctac




ianus



SNQKLPFIDAVQRLGVS
agagctgcaatatgaagaattgaaagcgcaagtc






YHFEKEIEDELENIYRD
cgcaagatgatcatggaaccagtagacgattcta






TNNNDADTDLYTTALR
atcaaaagctaccattcattgacgctgttcaaagg






FRLLREHGFDISCDAFN
ctcggagtgagctaccactttgaaaaagaaattga






KLKDEEGNFKASLTSD
agacgaacttgaaaacatctaccgtgataccaata






VPGLLELYEASYLRVH
acaacgacgcagacactgatctatacactaccgc






GEDILDEAISFATAQLT
cttgagattcagattattgagagagcatggttttgat






LALPTLHHPLSEQVGH
atttcctgcgatgctttcaacaagttgaaagacga






ALKQSIRRGLPRVEARN
agaaggtaatttcaaggcttcgttgacttctgacgt






FISIYQDLESHNKALLQ
tcctggtttgttagaactctatgaggcttcctacttg






FAKIDFNMLQLLHRKE
agagtccacggtgaagatatcctagatgaagcca






LSEICRWWKDLDFTRK
tatctttcgctactgctcagttaaccttggctttgcc






LPFARDRVVEGYFWIM
aactttgcatcacccgctttcagagcaagttggtc






GVYFEPQYSLGRKMLT
acgcattgaagcaaagtatcagaagaggcctgc






KVIAMASIVDDTYDSFA
caagagttgaagccagaaactttatctctatttacc






TYDELIPYTDAIERWDI
aagatttagaatcccacaataaggctttgttgcaat






KCMNQLPNYMQISYKA
tcgccaaaattgactttaacatgttacaattgctaca






LLDVYEEMEQLLADKG
taggaaggagctcagcgaaatttgtagatggtgg






RQYRVEYAKKAMIRLV
aaagatcttgattttaccagaaagttacctttcgctc






QAYLLEAKWTHLNYKP
gtgaccgtgtcgtcgaaggttatttctggattatgg






TFEEFRDNALPTSGYA
gagtttacttcgaaccacaatatagcttgggtaga






MLAITAFVGMGEVITPE
aagatgttgaccaaggttattgctatggcttctatc






TFEWAASDPKIIKASTII
gtcgatgatacatacgattccttcgctacttacgac






CRFMDDIAEHKFNHRR
gaattgataccatatactgacgccatcgaaagatg






EDDCSAIECYMEQYKV
ggacatcaagtgtatgaatcagctgccaaactata






TAQEAYDEFNKHIESS
tgcaaatttcgtacaaagcgttattggatgtatacg






WKDVNEEFLKPTEMPT
aggaaatggaacaattgcttgcagataaaggtcg






PVLCRSLNLARVMDVL
acagtacagagtggaatacgctaagaaagctatg






YREGDGYTHVGKAAK
attcggttggtgcaagcatatttgttagaagcgaa






GGITSLLIDPIQI
gtggacccatttaaactacaagccaactttcgaag






(SEQ ID NO: 1)
aatttagagacaatgctttgccgacatctgggtatg







ccatgctagctataaccgcgttcgttggtatgggt







gaagttatcacgccagaaacctttgaatgggctg







cttctgacccaaagattattaaggcctccactatca







tctgccgctttatggatgatatcgctgagcataagt







tcaaccacagaagggaggatgactgttccgctat







tgaatgttacatggagcaatacaaagtcacagctc







aagaagcatacgacgaatttaacaagcacataga







atcgtcttggaaggacgttaatgaagagttcttga







aaccaactgaaatgcctactccggtactgtgtaga







agtttgaacctagccagagtcatggatgttttgtac







agagaaggtgacggttatactcatgttggaaaag







ccgctaagggtggtataacatcacttcttatcgatc







ccattcaaatctaa (SEQ ID NO: 67)





HibWilS

Hibis-

Q9SAN0
50%
ASQASQVLASPHPAISS
atggccagtcaggcttcacaagttttagcatctcc


QTS118

cadel-



ENRPKADFHPGIWGDM
ccacccagctatatcctctgaaaaccggccaaag




phus



FIICPDTDIDAATELQYE
gctgatttccatcctggtatctggggcgacatgttt




wilder-



ELKAQVRKMIMEPVDD
attatctgtccagatacggacattgatgccgctac




ianus



SNQKLPFIDAVQRLGVS
agagctgcaatatgaagaattgaaagcgcaagtc






YHFEKEIEDELENIYRD
cgcaagatgatcatggaaccagtagacgattcta






TNNNDADTDLYTTALR
atcaaaagctaccattcattgacgctgttcaaagg






FRLLREHGFDISCEAFN
ctcggagtgagctaccactttgaaaaagaaattga






KLKDEEGNFKASLTSD
agacgaacttgaaaacatctaccgtgataccaata






VRGLLELYQASYMRIH
acaacgacgcagacactgatctatacactaccgc






GEDILDEAISFTTAQLTL
cttgagattcagattattgagagagcatggttttgat






ALPTLDPPLSEQVGHAL
atttcctgcgaagctttcaacaagttgaaagacga






KQSIRRGLPRVEARNFI
agagggtaatttcaaggcttcgttgacttctgatgtt






SIYQDLESHNKALLQFA
agaggtttgttagaactctatcaggcttcctacatg






KIDFNMLQLLHRKELSE
agaatccacggtgaagatattcttgatgaagccat






ICRWWKDLDFTRKLPF
atctttcaccactgctcaattaaccttggctttgcct






ARDRVVEGYFWIMGV
actttggatcccccattgtcagagcaagtcggtca






YFEPQYSLGRKMLTKVI
tgccctaaagcagagtataagaagaggcctacc






AMASIVDDTYDSFATY
aagagttgaagccagaaactttatctctatttacca






DELIPYTDAIERWDIKC
agacttggaatcccacaataaggctttattgcaatt






MNQLPNYMQISYKALL
cgctaaaattgactttaacatgttacaattgctacat






DVYEEMEQLLADKGR
aggaaggagctcagcgaaatctgtcgttggtgga






QYRVEYAKKAMIRLVQ
aagatcttgattttactagaaagttgcctttcgcacg






AYLLEAKWTHLNYKPT
ggaccgtgtcgttgaaggttatttctggattatggg






FEEFRDNALPTSGYAM
agtttacttcgaaccacaatatagcttgggtagaa






LAITAFVGMGEVITPET
agatgttgaccaaggttattgctatggcttctatcgt






FEWAASDPKIIKASTIIC
cgatgatacatacgattccttcgctacatacgacg






RFMDDIAEHKFNHRRE
aattgatcccatatactgacgccattgaaagatgg






DDCSAIECYMKQYGAT
gacatcaagtgtatgaatcaactgccaaactatat






AQEAYDEFNKHIESSW
gcaaatttcgtacaaagcattattggatgtatacga






KDVNEEFLKPTEMPTP
ggaaatggaacaattgcttgcggataaaggtcgg






VLCRSLNLARVMDVLY
cagtacagagtggaatacgctaagaaagctatga






REGDGYTHVGKAAKG
ttcgattggtacaagcatatttattagaagcgaagt






GITSLLIDPIQI
ggactcacttgaactacaagccaaccttcgaaga






(SEQ ID NO: 2)
atttagagacaatgctttaccgacatctgggtatgc







tatgcttgctataaccgcgttcgttggtatgggtga







agtcatcacgccagaaacttttgaatgggccgctt







ctgacccgaagattatcaaggcttccactatcatct







gccgctttatggatgatatcgctgagcataagttca







accacagaagggaggatgactgttccgctattga







atgttacatgaagcaatacggtgcaaccgcccaa







gaggcatacgacgaatttaacaaacacatagaat







cgtcttggaaggacgttaatgaagagttcttgaaa







ccaactgaaatgcctactccagtgctgtgtagaag







tttgaaccttgctagagtcatggatgttttgtacaga







gaaggtgacggttatactcatgtcgggaaagccg







ctaagggtggtataacctcattgctaattgatccca







ttcaaatctaa







(SEQ ID NO: 68)





HibWilS QTS120
Hibiscadelphus wilderianus
Q9SAN0
46%
ASQASQVLASPHPAISS
atggccagtcaggcttcacaagttttagcatctcc
ENRPKADFHPGIWGDM
ccacccagctatatcctctgaaaaccggccaaag
FIICPDTDIDAATELQYE
gctgatttccatcctggtatctggggcgacatgttt
ELKAQVRKMIMEPVDD
attatctgtccagatacggacattgatgccgctac



SNQKLPFIDAVQRLGVS
agagctgcaatatgaagaattgaaagcgcaagtc






YHFEKEIEDELENIYRD
cgcaagatgatcatggaaccagtagacgattcta






TNNNDADTDLYTTALR
atcaaaagctaccattcattgacgctgttcaaagg






FRLLREHGFDISCDAFN
ctcggagtgagctaccactttgaaaaagaaattga






KLKDEEGNFKASLTSD
agacgaacttgaaaacatctaccgtgataccaata






VPGLLELYEASYLRVH
acaacgacgcagacactgatctatacactaccgc






GEDILDEAISFATAQLT
cttgagattcagattattgagagagcatggttttgat






LALPTLHHPLSEQVGH
atttcctgcgatgctttcaacaagttgaaagacga






ALKQSIRRGLPRVEARN
agaaggtaatttcaaggcttcgttgacttctgacgt






FISIYQDLESHNKALLQ
tcctggtttgttagaactctatgaggcttcctacttg






FAKIDFNMLQLLHRKE
agagtccacggtgaagatatcctagatgaagcca






LSEICRWWKDLDFTRK
tatctttcgctactgctcagttaaccttggctttgcc






LPFARDRVVEGYFWIM
aactttgcatcacccgctttcagagcaagttggtc






GVYFEPQYSLGRKMLT
acgcattgaagcaaagtatcagaagaggcctgc






KVIAMASIVDDTYDSFA
caagagttgaagccagaaactttatctctatttacc






TYDELIPYTDAIERWDI
aagatttagaatcccacaataaggctttgttgcaat






KCMNQLPNYMQISYKA
tcgccaaaattgactttaacatgttacaattgctaca






LLDVYEEMEQLLADKG
taggaaggagctcagcgaaatttgtagatggtgg






RQYRVEYAKKAMIRLV
aaagatcttgattttaccagaaagttacctttcgctc






QAYLLEAKWTHLNYKP
gtgaccgtgtcgtcgaaggttatttctggattatgg






TFEEFRDNALPTSGYA
gagtttacttcgaaccacaatatagcttgggtaga






MLAITAFVGMGEVITPE
aagatgttgaccaaggttattgctatggcttctatc






TFEWAASDPKIIKASTII
gtcgatgatacatacgattccttcgctacttacgac






CRFMDDIAEHKFNHRR
gaattgataccatatactgacgccatcgaaagatg






EDDCSAIECYMKQYGA
ggacatcaagtgtatgaatcagctgccaaactata






TAQEAYDEFNKHIESS
tgcaaatttcgtacaaagcgttattggatgtatacg






WKDVNEEFLKPTEMPT
aggaaatggaacaattgcttgcagataaaggtcg






PVLCRSLNLARVMDVL
acagtacagagtggaatacgctaagaaagctatg






YREGDGYTHVGKAAK
attcggttggtgcaagcatatttgttagaagcgaa






GGITSLLIDPIQI
gtggacccatttaaactacaagccaactttcgaag






(SEQ ID NO: 3)
aatttagagacaatgctttgccgacatctgggtatg







ccatgctagctataaccgcgttcgttggtatgggt







gaagttatcacgccagaaacctttgaatgggctg







cttctgacccaaagattattaaggcctccactatca







tctgccgctttatggatgatatcgctgagcataagt







tcaaccacagaagggaggatgactgttccgctat







tgaatgttacatgaagcaatacggtgcaacagctc







aagaggcatacgacgaatttaacaaacacataga







atcgtcttggaaggacgtcaatgaagagttcttga







aaccaactgaaatgcctactccggtactgtgtaga







agtttgaacctagccagagtcatggatgttttgtac







agagaaggtgacggttatactcatgttgggaaag







ccgctaagggtggtataacatcacttcttatcgatc







ccattcaaatctaa







(SEQ ID NO: 69)





HibWilS QTS121
Hibiscadelphus wilderianus
Q9SAN0
50%
ASQASQVLASPHPAISS
atggcctcacaggcttcccaagttttagcatctcct
ENRPKADFHPGIWGDM
cacccagctatatcttccgaaaaccgtccaaagg
FIICPDTDIDAATELQYE
ctgatttccatccaggtatctggggcgacatgttta
ELKAQVRKMIMEPVDD
ttatctgtccagatacagacattgatgccgctacc



SNQKLPFIDAVQRLGVS
gagttgcaatatgaagaattgaaagcccaagtca






YHFEKEIEDELENIYRD
gaaagatgatcatggaaccagttgacgattctaat






TNNNDADTDLYTTALR
caaaagttgcctttcattgacgctgtccaaagattg






FRLLREHGFDISCEAFN
ggtgtttcataccactttgaaaaagaaattgaaga






KLKDEEGNFKASLTSD
cgaattagaaaacatctacagagatactaataaca






VRGLLELYQASYMRIH
acgacgcagacactgatttgtacaccactgccttg






GEDILDEAISFTTAQLTL
agattcagattattgcgtgagcatggttttgatattt






ALPTLDPPLSEQVGHAL
cttgcgaagctttcaacaagttgaaagacgaaga






KQSIRRGLPRVEARNFI
gggtaatttcaaggcttccttaacctctgatgtcag






SIYQDLESHNKSLLEFA
aggtttgttggaattgtatcaggcttcctacatgag






KIDFNLLQLLHRKELSEI
aatccacggtgaagatattttggatgaagctatatc






CRWWKDLDFTRKLPFA
tttcacaactgctcaattaactttagctttaccaactt






RDRVVEGYFWIMGVYF
tggatcctccattgtctgagcaagttggtcatgcct






EPQYSLGRKMLTKVIA
tgaagcagtcaatacgtagaggtttgccaagagtt






MASIVDDTYDSFATYD
gaagccagaaactttatctctatttaccaagacttg






ELIPYTDAIERWDIKCM
gaatcccacaataagtctttattagaatttgctaaaa






NQLPNYMQISYKALLD
ttgatttcaacttattgcaattgttacacagaaagga






VYEEMEQLLADKGRQ
gttgtccgaaatctgtagatggtggaaagacttgg






YRVEYAKKAMIRLVQA
attttaccagaaagttacctttcgctagagatcgtgt






YLLEAKWTHLNYKPTF
cgttgaaggttatttctggatcatgggtgtctacttc






EEFRDNALPTSGYAML
gaaccacaatactccttgggtagaaagatgttgac






AITAFVGMGEVITPETF
caaagttattgctatggcctctattgttgacgatact






EWAASDPKIIKASTIICR
tatgactcatttgcaacctacgacgaattgatacca






FMDDIAEHKFNHRRED
tatacagacgctattgaaagatgggatatcaagtg






DCSAIECYMEQYKVTA
tatgaaccaattgccaaattatatgcaaatatcttac






QEAYDEFNKHIESSWK
aaggctttgttagacgtttacgaggaaatggaaca






DVNEEFLKPTEMPTPVL
attgttggctgataagggtagacaatatagagtcg






CRSLNLARVMDVLYRE
agtacgcaaaaaaagccatgatcagattggttca






GDGYTHVGKAAKGGIT
ggcctacttattagaggctaagtggacccatttga






SLLIDPIQI
actacaagcctacttttgaagagttcagagacaat






(SEQ ID NO: 4)
gctttaccaacctccggttatgccatgttggctatc







actgcattcgttggtatgggtgaagtcattacacca







gaaacttttgaatgggctgcctctgatccaaagatt







attaaggcttctactatcatctgccgtttcatggatg







atattgctgaacacaaattcaaccacagaagaga







ggacgattgttccgctattgaatgttacatggaaca







atacaaggttacagcccaagaagcatacgacga







atttaacaagcatatcgaatcatcttggaaggacg







ttaatgaagaatttttaaagcctaccgaaatgccaa







caccagtcttgtgtagatctttgaacttggccagag







ttatggatgtcttgtaccgtgaaggtgatggttata







ctcatgtcggtaaggctgctaaaggtggtatcacc







tccttgttgatcgaccctattcaaatttaa







(SEQ ID NO: 70)





HibWilS QTS123
Hibiscadelphus wilderianus
Q9SAN0
47%
ASQASQVLASPHPAISS
atggcctcacaggcttcccaagttttagcatctcct
ENRPKADFHPGIWGDM
cacccagctatatcttccgaaaaccgtccaaagg
FIICPDTDIDAATELQYE
ctgatttccatccaggtatctggggcgacatgttta
ELKAQVRKMIMEPVDD
ttatctgtccagatacagacattgatgccgctacc



SNQKLPFIDAVQRLGVS
gagttgcaatatgaagaattgaaagcccaagtca






YHFEKEIEDELENIYRD
gaaagatgatcatggaaccagttgacgattctaat






TNNNDADTDLYTTALR
caaaagttgcctttcattgacgctgtccaaagattg






FRLLREHGFDISCDAFN
ggtgtttcataccactttgaaaaagaaattgaaga






KLKDEEGNFKASLTSD
cgaattagaaaacatctacagagatactaataaca






VPGLLELYEASYLRVH
acgacgcagacactgatttgtacaccactgccttg






GEDILDEAISFATAQLT
agattcagattattgcgtgagcatggttttgatattt






LALPTLHHPLSEQVGH
cttgcgatgctttcaacaagttgaaagacgaagaa






ALKQSIRRGLPRVEARN
ggtaatttcaaggcttccttaacctctgacgtccca






FISIYQDLESHNKSLLEF
ggtttgttggaattgtatgaggcttcctacttaaga






AKIDFNLLQLLHRKELS
gttcacggtgaagatatcttggatgaagctatatct






EICRWWKDLDFTRKLP
ttcgccactgctcagttaaccttggctttaccaactt






FARDRVVEGYFWIMGV
tgcatcacccattgtctgagcaagttggtcacgca






YFEPQYSLGRKMLTKVI
ttgaagcaatcaatcagaagaggtttgccaagag






AMASIVDDTYDSFATY
ttgaagctagaaactttatctctatttaccaagattta






DELIPYTDAIERWDIKC
gaatcccacaataagtctttattagaatttgccaaa






MNQLPNYMQISYKALL
attgatttcaacttgttgcaattgttacaccgtaagg






DVYEEMEQLLADKGR
agttgtccgaaatatgtagatggtggaaagactta






QYRVEYAKKAMIRLVQ
gattttacaagaaagttacctttcgctagagataga






AYLLEAKWTHLNYKPT
gtcgttgaaggttatttctggattatgggtgtctactt






FEEFRDNALPTSGYAM
cgaaccacaatactccttgggtagaaagatgttga






LAITAFVGMGEVITPET
ccaaagttattgctatggcttctatcgttgacgatac






FEWAASDPKIIKASTIIC
ttatgactcatttgccacttacgacgaattgatccct






RFMDDIAEHKFNHRRE
tatacagacgctattgaacgttgggatatcaagtgt






DDCSAIECYMEQYKVT
atgaaccagttgccaaattatatgcaaatatcttac






AQEAYDEFNKHIESSW
aaggctttgttagacgtttacgaggaaatggaaca






KDVNEEFLKPTEMPTP
attgttggctgataagggtagacaatatagagtcg






VLCRSLNLARVMDVLY
agtacgccaaaaaagcaatgattagattggttcag






REGDGYTHVGKAAKG
gcctacttattagaggctaagtggacccatttgaa






GITSLLIDPIQI
ctacaagcctacatttgaagagttcagagacaatg






(SEQ ID NO: 5)
ctttaccaacttccggttatgccatgttggctataac







cgcattcgttggtatgggtgaagtcattaccccag







aaacttttgaatgggccgcttctgatccaaagatta







tcaaggcttctactatcatctgccgtttcatggatga







tattgccgaacataaattcaaccacagaagagag







gacgattgttccgctattgaatgttacatggaacaa







tacaaggttacagcccaagaagcttacgacgaat







ttaacaagcacatcgaatcatcttggaaggacgtc







aatgaagaatttttgaagcctaccgaaatgccaac







tccagtcttgtgtagatctttgaacttggcaagagtt







atggatgtcttgtacagagaaggtgatggttatact







catgtcggtaaggctgctaaaggtggtatcacctc







cttgttgatcgaccctattcaaatttaa







(SEQ ID NO: 71)





HibWilS QTS124
Hibiscadelphus wilderianus
Q9SAN0
48%
ASQASQVLASPHPAISS
atggcctcacaggcttcccaagttttagcatctcct
ENRPKADFHPGIWGDM
cacccagctatatcttccgaaaaccgtccaaagg
FIICPDTDIDAATELQYE
ctgatttccatccaggtatctggggcgacatgttta
ELKAQVRKMIMEPVDD
ttatctgtccagatacagacattgatgccgctacc



SNQKLPFIDAVQRLGVS
gagttgcaatatgaagaattgaaagcccaagtca






YHFEKEIEDELENIYRD
gaaagatgatcatggaaccagttgacgattctaat






TNNNDADTDLYTTALR
caaaagttgcctttcattgacgctgtccaaagattg






FRLLREHGFDISCEAFN
ggtgtttcataccactttgaaaaagaaattgaaga






KLKDEEGNFKASLTSD
cgaattagaaaacatctacagagatactaataaca






VRGLLELYQASYMRIH
acgacgcagacactgatttgtacaccactgccttg






GEDILDEAISFTTAQLTL
agattcagattattgcgtgagcatggttttgatattt






ALPTLDPPLSEQVGHAL
cttgcgaagctttcaacaagttgaaagacgaaga






KQSIRRGLPRVEARNFI
gggtaatttcaaggcttccttaacctctgatgtcag






SIYQDLESHNKSLLEFA
aggtttgttggaattgtatcaggcttcctacatgag






KIDFNLLQLLHRKELSEI
aatccacggtgaagatattttggatgaagctatatc






CRWWKDLDFTRKLPFA
tttcacaactgctcaattaactttagctttaccaactt






RDRVVEGYFWIMGVYF
tggatcctccattgtctgagcaagttggtcatgcct






EPQYSLGRKMLTKVIA
tgaagcagtcaatacgtagaggtttgccaagagtt






MASIVDDTYDSFATYD
gaagccagaaactttatctctatttaccaagacttg






ELIPYTDAIERWDIKCM
gaatcccacaataagtctttattagaatttgctaaaa






NQLPNYMQISYKALLD
ttgatttcaacttattgcaattgttacacagaaagga






VYEEMEQLLADKGRQ
gttgtccgaaatctgtagatggtggaaagacttgg






YRVEYAKKAMIRLVQA
attttaccagaaagttacctttcgctagagatcgtgt






YLLEAKWTHLNYKPTF
cgttgaaggttatttctggatcatgggtgtctacttc






EEFRDNALPTSGYAML
gaaccacaatactccttgggtagaaagatgttgac






AITAFVGMGEVITPETF
caaagttattgctatggcctctattgttgacgatact






EWAASDPKIIKASTIICR
tatgactcatttgcaacctacgacgaattgatacca






FMDDIAEHKFNHRRED
tatacagacgctattgaaagatgggatatcaagtg






DCSAIECYMKQYGATA
tatgaaccaattgccaaattatatgcaaatatcttac






QEAYDEFNKHIESSWK
aaggctttgttagacgtttacgaggaaatggaaca






DVNEEFLKPTEMPTPVL
attgttggctgataagggtagacaatatagagtcg






CRSLNLARVMDVLYRE
agtacgcaaaaaaagccatgatcagattggttca






GDGYTHVGKAAKGGIT
ggcctacttattagaggctaagtggacccatttga






SLLIDPIQI
actacaagcctacttttgaagagttcagagacaat






(SEQ ID NO: 6)
gctttaccaacctccggttatgccatgttggctatc







actgcattcgttggtatgggtgaagtcattacacca







gaaacttttgaatgggctgcctctgatccaaagatt







attaaggcttctactatcatctgccgtttcatggatg







atattgctgaacacaaattcaaccacagaagaga







ggacgattgttccgctattgaatgttacatgaaaca







atacggtgctacagcccaagaagcatacgacga







atttaacaagcatatcgaatcatcttggaaggacg







ttaatgaagaatttttaaagcctaccgaaatgccaa







caccagtcttgtgtagatctttgaacttggcaagag







ttatggatgtcttgtaccgtgaaggtgatggttata







ctcatgtcggtaaggctgctaaaggtggcatcac







ctccttgttgatcgaccctattcaaatttaa







(SEQ ID NO: 72)





HibWilS QTS126
Hibiscadelphus wilderianus
Q9SAN0
44%
ASQASQVLASPHPAISS
atggcctcacaggcttcccaagttttagcatctcct
ENRPKADFHPGIWGDM
cacccagctatatcttccgaaaaccgtccaaagg
FIICPDTDIDAATELQYE
ctgatttccatccaggtatctggggcgacatgttta
ELKAQVRKMIMEPVDD
ttatctgtccagatacagacattgatgccgctacc



SNQKLPFIDAVQRLGVS
gagttgcaatatgaagaattgaaagcccaagtca






YHFEKEIEDELENIYRD
gaaagatgatcatggaaccagttgacgattctaat






TNNNDADTDLYTTALR
caaaagttgcctttcattgacgctgtccaaagattg






FRLLREHGFDISCDAFN
ggtgtttcataccactttgaaaaagaaattgaaga






KLKDEEGNFKASLTSD
cgaattagaaaacatctacagagatactaataaca






VPGLLELYEASYLRVH
acgacgcagacactgatttgtacaccactgccttg






GEDILDEAISFATAQLT
agattcagattattgcgtgagcatggttttgatattt






LALPTLHHPLSEQVGH
cttgcgatgctttcaacaagttgaaagacgaagaa






ALKQSIRRGLPRVEARN
ggtaatttcaaggcttccttaacctctgacgtccca






FISIYQDLESHNKSLLEF
ggtttgttggaattgtatgaggcttcctacttaaga






AKIDFNLLQLLHRKELS
gttcacggtgaagatatcttggatgaagctatatct






EICRWWKDLDFTRKLP
ttcgccactgctcagttaaccttggctttaccaactt






FARDRVVEGYFWIMGV
tgcatcacccattgtctgagcaagttggtcacgca






YFEPQYSLGRKMLTKVI
ttgaagcaatcaatcagaagaggtttgccaagag






AMASIVDDTYDSFATY
ttgaagctagaaactttatctctatttaccaagattta






DELIPYTDAIERWDIKC
gaatcccacaataagtctttattagaatttgccaaa






MNQLPNYMQISYKALL
attgatttcaacttgttgcaattgttacaccgtaagg






DVYEEMEQLLADKGR
agttgtccgaaatatgtagatggtggaaagactta






QYRVEYAKKAMIRLVQ
gattttacaagaaagttacctttcgctagagataga






AYLLEAKWTHLNYKPT
gtcgttgaaggttatttctggattatgggtgtctactt






FEEFRDNALPTSGYAM
cgaaccacaatactccttgggtagaaagatgttga






LAITAFVGMGEVITPET
ccaaagttattgctatggcttctatcgttgacgatac






FEWAASDPKIIKASTIIC
ttatgactcatttgccacttacgacgaattgatccct






RFMDDIAEHKFNHRRE
tatacagacgctattgaacgttgggatatcaagtgt






DDCSAIECYMKQYGAT
atgaaccagttgccaaattatatgcaaatatcttac






AQEAYDEFNKHIESSW
aaggctttgttagacgtttacgaggaaatggaaca






KDVNEEFLKPTEMPTP
attgttggctgataagggtagacaatatagagtcg






VLCRSLNLARVMDVLY
agtacgccaaaaaagcaatgattagattggttcag






REGDGYTHVGKAAKG
gcctacttattagaggctaagtggacccatttgaa






GITSLLIDPIQI
ctacaagcctacatttgaagagttcagagacaatg






(SEQ ID NO: 7)
ctttaccaacttccggttatgccatgttggctataac







cgcattcgttggtatgggtgaagtcattaccccag







aaacttttgaatgggccgcttctgatccaaagatta







tcaaggcttctactatcatctgccgtttcatggatga







tattgccgaacataaattcaaccacagaagagag







gacgattgttccgctattgaatgttacatgaaacaa







tacggtgctacagcccaagaagcatacgacgaa







tttaacaagcacatcgaatcatcttggaaggacgt







taatgaagaatttttgaagcctaccgaaatgccaa







ctccagtcttgtgtagatctttgaacttggccagag







ttatggatgtcttgtacagagaaggtgatggttata







ctcatgtcggtaaggctgctaaaggtggcatcac







ctccttgttgatcgaccctattcaaatttaa







(SEQ ID NO: 73)





HibWilS QTS19
Hibiscadelphus wilderianus
A0A067FTE8
12%
SIQVPQISSQNAKSQVM
atgtccatacaggttccccaaatttcttcgcaaaat
RRTANFHPSVWGDRFA
gcaaagtcacaagtaatgcgtagaaccgccaact
NYTAEDKMNHARDLK
ttcatccatctgtgtggggagacagattcgctaact
ELKALKEEVGRKLLAT
acacggctgaggataaaatgaaccacgctcgcg



AGPIQLNLIDAIQRLGV
acttgaaggaacttaaagcgttaaaggaagaagt






GYHFERELEQALQHLY
tggtagaaagctgttggccacagctggcccaatt






NEKYSDDDTEDDLYRIS
caactcaatctaatcgatgctatccaaagattgggt






LRFRLLRQHGYNVSCD
gtcggttatcacttcgaacgagaattggaacaag






KFNMFKDDKGNFKESL
ctttgcaacatttatacaacgagaagtatagcgat






ASDALGMLSLYEAAHL
gacgacactgaagatgatttgtacaggatttctctg






GVHGEDILDEAIAFTTT
agatttagattgttaagacagcacggttacaatgtc






HLKSVATHLSNPLKAQ
tcctgcgacaaattcaacatgtttaaggatgacaa






VRHALRQPLHRGLPRL
aggtaacttcaaggaaagtttggcttctgatgcctt






EHRRYISIYQDDASHYK
gggtatgctctccttatacgaagcggctcatttgg






ALLTLAKLDFNLVQSL
gcgttcacggtgaagatatcttagacgaagctatt






HKKELCEISRWWKDLD
gcatttaccactactcatctaaagtccgtcgctact






FARKLPFARDRMVECY
cacttatctaatcctctaaaggcccaagttcgtcat






FWILGVYFEPNYSLARR
gccttgagacaaccgcttcacagaggtttgccaa






ILTKVIAMTSIIDDIYDV
gattggaacacagaaggtatatcagcatttaccag






YGTPEELKLFTEVIERW
gatgacgcttctcattacaaagctttgttgacccttg






DESSMDQLPEYMQTFF
cgaagttggatttcaatctagttcaatcattgcaca






GALLDLYNEIEKEIANE
aaaaggagctatgtgagatctccagatggtggaa






GWSYRVQYAKEAMKI
ggatttagacttcgctcgtaagttgccttttgctaga






LVEGYYDESKWFHENY
gatagaatggtcgaatgttatttctggatcttgggt






IPKMEEYMRVALVTSG
gtgtatttcgaaccaaactactcactggcccggag






YTMLTTVSFLGMDNIV
aatattgaccaaagttattgctatgacttctattattg






TKETFDWVFSRPKIIRA
atgacatctatgacgtttacgggacaccagaaga






SEIIGRFMDDIKSHKFEQ
attgaagttgttcactgaagtaatcgaacgttggg






ERGHCASAVECYMREH
acgaatcgtcaatggaccaactaccagaatacat






GVSEEEACSELKKQVD
gcaaacgtttttcggtgctcttttagatttatacaatg






NAWKDINHEMIFSETSK
agatagaaaaggaaattgccaacgaaggttggtc






AVPMSVLTRVLNLTRVI
ttacagagtccaatatgcaaaagaagctatgaag






DVVYKEGDGYTHVGN
attttagttgagggttactacgatgaatctaagtggt






EMKQNVAALLIDQVPI
tccatgaaaactacataccaaagatggaggaata






(SEQ ID NO: 8)
tatgcgggtagcattagttaccagcggatacaca







atgttgactaccgtcagttttctggggatggacaa







cattgttactaaggagacatttgattgggttttctcc







agacctaaaatcataagagcatcagaaattatcg







gtagattcatggacgatattaaatctcacaaattcg







aacaggaaagaggtcactgtgcgtccgctgtcg







aatgttatatgagggaacatggcgtgtctgaaga







ggaagcttgcagtgagctcaagaagcaagtcga







taacgcctggaaggacatcaaccacgaaatgatt







ttctccgaaacttctaaggctgttcctatgagcgtg







ctaaccagagttttgaacttgacgagagttattgat







gtcgtctacaaggaaggtgatggttatactcatgt







gggtaatgaaatgaaacaaaacgttgctgctctttt







gatcgaccaagtcccaatttaa







(SEQ ID NO: 74)





HibWilS QTS34
Hibiscadelphus wilderianus
B1B1U4
13%
EKQSLTFDGDEEAKIDR
atggaaaagcagtccttgacatttgatggcgacg
KSSKYHPSIWGDYFIQN
aggaagcaaaaatagatcgtaagtcgtcaaagta
SSLTHAKESTQRMIKRV
ccatcctagtatttggggtgactatttcatccaaaat
EELKVQVKSMFKDTSD
tccagcttaacccacgccaaagaatctactcaaa



LLQLMNLINSIQMLGLD
ggatgatcaagagagttgaagaactaaaggtaca






YHFENEIDEALRLIYEV
agtcaaatctatgttcaaggacacttctgatttgttg






DDKSYGLYETSLRFQLL
caactgatgaacttaattaactctattcaaatgctag






RQHGYHVDGEEAFNM
gacttgactaccactttgaaaatgaaatcgatgag






LKDEEGNFKASLTSDVP
gctctccgcttgatctatgaagttgacgataagtca






GLLELYQASYMRIHGE
tacggtctgtacgaaacgagcttgagattccagtt






DILDEAISFTTAQLTLAL
gttgagacaacatggttaccacgtggatggtgaa






PTLDPPLSAQVSLFLELP
gaagctttcaacatgcttaaagacgaagagggta






LCRRNKILLARKYILIY
actttaaggcgtccttgacctctgatgttccaggttt






QEDAMRNNVILELAKL
attggaattatatcaagctagctacatgagaataca






NFNLLQSLYQEELKKISI
tggtgaagatattttggatgaagccattagtttcact






WWNDLAFAKSLSFTRD
accgctcaattgactttagctcttcccaccttagac






RVVEGYYWVLTIYFEP
ccgccattgtcggcacaagtctctttgttcttggag






QHSRARVICSKVFAFLS
ctaccattatgcagaagaaacaagattttgcttgcc






IMDDIYDNYGILEECTL
agaaaatacatcttgatatatcaagaagatgctatg






LTEAIKRWNPQAIDGLP
cgtaataatgttattctcgagttggctaagcttaact






EYLKDYYLKLLKTFEEF
ttaacttattgcaatccttgtaccaagaagaactga






EDELELNEKYRMLYLQ
agaaaatctctatctggtggaatgacttagcttttg






DEVKALAISYLQEAKW
caaagtctttatctttcactagagatagagtcgttga






GIERHVPSLDEHLHNSL
aggttattactgggtcctaaccatctacttcgaacc






ISSGSSTVICASFVGMG
acagcactcccgagctagggtcatttgttcaaaag






EVATKEVFDWLSSFPK
tttttgcctttttgtccattatggatgacatttatga






VVEACCVIGRLLNDIRS
caactatggaatccttgaagaatgtacattattaacag






HELEQGRDHTASTVES
aagctattaagagatggaacccacaagccatcga






YMKEHDTNVDVACEK
cgggttgcctgaatacctaaaagactattacttga






LREIVEKAWKDLNNES
agttgttgaagactttcgaggaatttgaagatgagt






LNPTKVPRLMIERIVNL
tggaattgaatgagaagtacagaatgctgtatttg






SKSNEEIYKYNDTYTNS
caagatgaagttaaagctctggctatctcatactta






DTTMKDNISLVLVESC
caagaggccaagtggggtattgaaagacacgta






DYFNK
ccatcgttagatgagcatcttcacaattctttgataa






(SEQ ID NO: 9)
gttccggctcttcgactgtgatttgtgctagcttcgt







tggtatgggtgaagttgccacgaaggaagtcttc







gattggttgtcctctttcccaaaggttgtcgaagctt







gttgtgtcatcggtaggctcttgaacgatattcgttc







ccatgaattagagcagggcagagaccacacgg







cttccactgttgaatcttacatgaaggaacacgac







accaatgtggacgttgcctgcgaaaagttgagag







aaatcgtcgaaaaggcgtggaaagatctgaaca







acgaatctctaaaccctactaaggttccaagattg







atgatagaaagaatagtaaacttgtcaaagtccaa







cgaagaaatttacaaatacaacgacacctacact







aattctgatactacaatgaaggacaatattagtcta







gtattggttgagtcctgtgattatttcaacaaataa







(SEQ ID NO: 75)





HibWilS QTS52
Hibiscadelphus wilderianus
Q39760
12%
ASQVSQMPSSSPLSSNK
atggccagtcaggtttcacaaatgccttcctcttct
DEMRPKADFQPSIWGD
ccactatccagcaacaaagatgagatgagacca
LFLNCPDKNIDAETEKR
aaggctgactttcaaccctcgatatggggcgattt
HQQLKEEVRKMIVAPM
gttcctgaattgcccagacaagaacattgatgctg



ANSTQKLAFIDSVQRLG
aaaccgaaaagcgtcatcaacaattgaaagaag






VSYHFTKEIEDELENIY
aagtcagaaagatgatcgtggcaccaatggctaa






HNNNDAENDLYTTSLR
ttctacacaaaagttggctttcattgactctgttcag






FRLLREHGFNVSCDVF
aggcttggagtatcctaccactttactaaagaaatt






NKFKDEQGNFKSSVTS
gaggatgaattagaaaacatctatcacaacaataa






DVRGLLELYQASYLRV
cgacgcagaaaacgatttgtacacgacttcccta






HGEDILDEAISFTTNHLS
agattcagattattgagagaacatggtttcaatgtc






LAVASLDYPLSEEVSHA
tcttgtgacgtttttaacaagtttaaggatgagcaa






LKQSIRRGLPRVEARHY
ggtaatttcaagtcaagtgttacctctgacgtccgc






LSVYQDIESHNKVLLEF
ggtctcttggaattataccaagcgtcgtatttgaga






AKIDFNMVQLLHRKEL
gttcacggtgaagatatcttggacgaagctatttc






SEISRWWKDLDFQRKL
gttcacaactaatcatctctctttggccgttgcttcct






PYARDRVVEGYFWISG
tagattaccctctgtctgaagaggtctctcacgcttt






VYFEPQYSLGRKMLTK
gaagcaaagcataagacgtggtcttccaagagta






VIAMASIVDDTYDSYA
gaagccagacactatttgagcgtttaccaagatat






TYEELIPYTKAIERWDI
cgaatctcataacaaagtcttgttagaatttgctaa






KCIDELPEYMKPSYKAL
gattgacttcaacatggttcaattgctacataggaa






LDVYEEMEQLVAKHG
agagctaagtgaaatttcaagatggtggaaagat






RQYRVEYAKNAMIRLA
ctcgattttcaaagaaagttaccttatgcacgcgac






QSYLVEARWTLQNYKP
cgtgtagtcgaaggttacttctggatctccggggtt






SFEEFKANALPTCGYA
tacttcgaaccacaatacagcttgggtagaaagat






MLAITSFVGMGDIVTPE
gttgactaaggttattgctatggcttctatcgttgat






TFKWAANDPKIIQASTII
gatacctatgactcctacgccacctacgaggaatt






CRFMDDVAEHKFEQER
gatcccatatactaaggccattgaaagatgggac






GHCASAVECYMREHG
atcaagtgtatagacgaactgccagaatatatgaa






VSEEEACSELKKQVDN
gcctagttacaaagctttattggatgtctatgagga






AWKDINHEMIFSETSKA
aatggaacaattggtcgccaaacacggtcgaca






VPMSVLTRVLNLTRVM
gtacagagtggaatacgctaagaatgctatgattc






DVLYREGDGYTYVGK
gattggcgcaatcctacttggttgaagcgagatg






AAKGGITSLLIEPVAL
gactcttcaaaactacaagccatctttcgaagaatt






(SEQ ID NO: 10)
taaggccaatgctttaccgacatgtggatatgctat







gctagctataaccagcttcgttggtatgggtgatat







tgtcacgccagaaacttttaaatgggctgcaaatg







acccgaagattatccaggcttctactatcatctgcc







gatttatggatgatgtagctgagcataagttcgaa







caagaaagggggcactgtgcttccgctgtcgagt







gttacatgagagaacacggtgtgtcagaagaag







aggcatgttctgaattgaaaaagcaagtcgacaa







cgcctggaaggacattaaccatgaaatgattttttc







ggaaacctccaaagctgtcccaatgtcggttctca







ctagagttcttaacttgactagagttatggacgtatt







gtacagagaaggtgatggttatacatatgttggta







aggctgcaaagggcggtatcacctctttattgatt







gaaccagttgccttgtaa







(SEQ ID NO: 76)





HibWilS QTS54
Hibiscadelphus wilderianus
Q39761
13%
ASQVSQMPSSSPLSSNK
atggccagtcaggtttcacaaatgccttcctcttct
DEMRPKADFQPSIWGD
ccactatccagcaacaaagatgagatgagacca
LFLNCPDKNIDAETEKR
aaggctgactttcaaccctcgatatggggcgattt
HQQLKEEVRKMIVAPM
gttcctgaattgcccagacaagaacattgatgctg



ANSTQKLAFIDSVQRLG
aaaccgaaaagcgtcatcaacaattgaaagaag






VSYHFTKEIEDELENIY
aagtcagaaagatgatcgtggcaccaatggctaa






HNNNDAENDLYTTSIRF
ttctacacaaaagttggctttcattgactctgttcag






RLLREHGYHVDGEEAF
aggcttggagtatcctaccactttactaaagaaatt






NMLKDEEGNFKASLTS
gaggatgaattagaaaacatctatcacaacaataa






DVPGLLELYQASYMRI
cgacgcagaaaacgatttgtacacgacttccata






HGEDILDEAISFTTAQL
agattcagattattgagagaacatggttaccacgt






TLALPTLDPPLSEEVSH
cgatggtgaggaagccttcaacatgctcaaggac






ALKQSIRRGLPRVEARH
gaagaaggtaattttaaggcttctttgacctcagac






YLSVYQDIESHNKALLE
gttcctggtttgttagaactatatcaagcctcataca






FAKIDFNMLQFLHRKEL
tgcgaatccatggtgaagatattttggacgaagcg






SEICRWWKDLDFQRKL
atctcttttactactgctcaattaaccttggctttgcc






PYARDRVVEGYFWISG
aaccctggatccaccgctctctgaagaggtcagt






VYFEPQYSLGRKMLTK
cacgcgctaaagcaaagtattagaagaggtttac






VIAMASIVDDTYDSYA
cacgtgtagaagctagacattatctgtccgtttacc






TYEELIPYTNAIERWDI
aagacatcgaatctcacaataaagctctattggaa






KCIDEIPEYMKPSYKAL
tttgccaagattgatttcaacatgttgcagttcctcc






LDVYEEMVQLVAEHG
acagaaaggaactttcagaaatatgtcgttggtgg






RQYRVEYAKNAMIRLA
aaagatttggacttccaacgcaagttaccatatgct






QSYLVEAKWTLQNYKP
agagatcgcgttgtcgagggttacttctggatcag






SFEEFKANALPTCGYA
cggagtttactttgagccacaatacagtttgggtc






MLAITSFVGMGDIVTPE
ggaagatgttaactaaagttattgctatggcttctat






TFKWAASDPKIIQASTII
tgtcgatgacacatatgactcctacgccacctacg






CRFMDDVAEHKFKHRR
aagaattaatcccttatactaacgccatcgaaaga






EDDCSAIECYMEEYGV
tgggacattaagtgtatcgatgaaattccggaata






TAQEAYDVFNKHVESA
catgaaaccatcttacaaagctttgcttgacgtcta






WKDLNQEFLKPTEMPT
cgaagaaatggtacaattggttgctgagcatggta






EVLNRSLNLARVMDVL
ggcaatacagagttgaatatgcaaagaatgccat






YREGDGYTYVGKAAK
gattagattggctcaatcttacttggtggaagcaaa






GGITSLLIEPIAL
gtggacgttgcaaaattacaaacctagctttgagg






(SEQ ID NO: 11)
aatttaaggcgaacgctctgcccacctgtgggtat







gccatgctggcaattacttccttcgttggtatgggc







gacattgtcactcctgaaacattcaaatgggctgc







atccgatccaaagatcattcaagcttcgacgataa







tctgtcgattcatggatgatgtcgctgagcacaag







ttcaagcacaggagagaagatgactgttctgcca







tagaatgttacatggaagaatacggtgttaccgcc







caggaggcttacgatgtcttcaacaagcacgttg







aatccgcgtggaaagatttgaaccaagaatttctc







aagccaactgaaatgccaacagaggtgttgaac







agatcacttaacctcgctcgtgttatggacgtattg







tatagagaaggtgatggttatacttacgttggtaag







gctgctaagggcggtatcacctctttattgatcgaa







ccaatcgctttgtaa







(SEQ ID NO: 77)





HibWilS QTS55
Hibiscadelphus wilderianus
Q43714
12%
ASQASQVLASPHPAISS
atggccagtcaggcttcacaagttttagcatctcc
ENRPKADFHPGIWGDM
ccacccagctatatcctctgaaaaccggccaaag
FIICPDTDIDAATELQYE
gctgatttccatcctggtatctggggcgacatgttt
ELKAQVRKMIMEPVDD
attatctgtccagatacggacattgatgccgctac



SNQKLPFIDAVQRLGVS
agagctgcaatatgaagaattgaaagcgcaagtc






YHFEKEIEDELENIYRD
cgcaagatgatcatggaaccagtagacgattcta






TNNNDADTDLYTTALR
atcaaaagctaccattcattgacgctgttcaaagg






FRLLREHGFDISCDAFN
ctcggagtgagctaccactttgaaaaagaaattga






KFKDEAGNFKASLTSD
agacgaacttgaaaacatctaccgtgataccaata






VQGLLELYEASYMRVH
acaacgacgcagacactgatctatacactaccgc






GEDILDEAISFTTAQLTL
cttgagattcagattattgagagagcatggttttgat






ALPTLHHPLSEQVGHA
atttcctgcgatgctttcaacaagttcaaagacgaa






LKQSIRRGLPRVEARNF
gctggtaatttcaaggcttcgttgacttctgacgttc






ISIYQDLESHNKSLLQF
aaggtttgttggaattgtatgaggcctcctacatga






AKIDFNLLQLLHRKELS
gagtccacggtgaagatatcctagatgaagctat






EICRWWKDLDFTRKLP
atcttttaccactgctcagttaaccttggctttaccta






FARDRVVEGYFWIMGV
ctttgcatcacccgttgtcagagcaagttggtcac






YFEPQYSLGRKMLTKVI
gcactcaagcagagtatcagaagaggcctgcca






AMASIVDDTYDSYATY
agagttgaagccagaaactttatctctatttaccaa






DELIPYTNAIERWDIKC
gatttggaatcccacaataagtccttgttacaattc






MNQLPNYMKISYKALL
gctaaaattgactttaaccttttacaattgctccata






NVYEEMEQLLANQGR
ggaaggaactcagcgaaatttgtagatggtggaa






QYRVEYAKKAMIRLVQ
agatcttgatttcactagaaagttgccttttgcacgt






AYLLEAKWTHQNYKPT
gaccgtgtcgtcgaaggttatttctggattatggga






FEEFRDNALPTSGYAM
gtttacttcgaaccacaatatagcttgggtagaaa






LAITAFVGMGEVITPET
gatgttgaccaaggttattgctatggcttctatcgtc






FKWAASDPKIIKASTIIC
gatgatacatacgattcttacgctacatatgacgaa






RFMDDIAEHKFEQERG
ttgataccatatactaacgccatcgaaagatggga






HCASAVECYMREHGVS
catcaagtgtatgaatcaactgccaaactacatga






EEEACSELKKQVDNAW
agattagttacaaagcattattgaatgtatatgagg






KDINHEMIFSETSKAVP
agatggaacaattgcttgcgaatcaaggtcgaca






MSVLTRVLNLTRVMDV
gtacagagtggaatacgctaagaaagctatgatt






LYREGDGYTHVGKAA
cggttggtgcaagcctacttattagaagcgaagtg






KGGITSLLIDPIQI
gactcatcaaaactacaagccaaccttcgaagaa






(SEQ ID NO: 12)
tttagagacaatgctttgccgacatcagggtatgct







atgctagctataaccgcgttcgttggtatgggtga







agttatcacgccagaaacttttaaatgggccgctt







ctgacccaaagattattaaggcttccactatcatct







gccgctttatggatgatatcgctgagcataagttc







gagcaagaaagggggcactgtgcttccgctgtc







gaatgttacatgagagaacacggtgtctcagaag







aagaggcctgttctgaattgaaaaagcaggtcga







caacgcctggaaggatattaaccatgagatgattt







ttagtgaaacatccaaagctgtcccaatgagtgtt







ctaaccagagttttgaaccttactagagttatggac







gtattgtacagagaaggtgatggttatacgcatgt







cggtaaggctgcaaagggtggtatcacctctttgt







tgattgaccccattcaaatctaa







(SEQ ID NO: 78)





HibWilS QTS63
Hibiscadelphus wilderianus
Q9FQ26
12%
AASFANKCRPLANFHP
atggccgcatcatttgctaacaaatgtagaccttta
TVWGYHFLYYNPEITN
gctaatttccacccaactgtttggggttaccatttct
QEKIEVDEYKETIRKML
tgtattacaacccagagataaccaatcaggaaaa
VEAPEGSEQKLVLIDA
gatcgaagtcgatgaatacaaggaaacaattcgt



MQRLGVAYHFHNEIET
aagatgttggttgaagcccctgaagggtccgagc






SIQNIFDAPKQNNDDNL
aaaaattggtcttaatcgacgctatgcaaagattg






HIVSLRFRLVRQQGHY
ggtgttgcatatcactttcataacgaaattgaaacc






MSSDVFKQFTNQDGKF
tctattcaaaatatcttcgatgctccaaagcaaaac






KETLTNDVQGLLSLYE
aacgacgataacttgcacattgtctctttaagattc






ASHLRVRNEEILEEALT
agattggtccgtcaacagggtcattacatgtcctct






FTTTHLESIVSNLSNKN
gacgtttttaagcaattcactaaccaagatggtaaa






NSLKVEVSEALSQPIRM
ttcaaggaaaccttgactaatgatgtccaaggtttg






TLPRIGARKYISIYENND
ttgtcattatatgaagcttctcacttgagagttagaa






AHNHLLLKFAKLDFNM
atgaagaaatattagaggaagctttgacttttacca






LQKFHQRELSDLTRWW
caactcatttggaatccatcgtttctaacttatcaaa






KDLDFANKIPYARDRL
caaaaataactctttaaaggttgaagtttctgaagc






VECYFWILGVYFEPKYS
tttgtcccaaccaatcagaatgactttgccaagaat






RARKMMTKVLKMTSII
tggtgccagaaagtacatttccatatacgaaaaca






DDTFDAYANFDELVPF
atgacgcccacaaccatttgttgttaaagttcgcta






NDAIQRWDANAIDSIPP
agttggattttaatatgttacaaaagttccaccaaa






YMRPIYQALLDIYGEM
gagaattgtccgacttgaccagatggtggaaaga






DQVLSKEGKLDRVYYA
cttggactttgctaacaagatcccatatgctagag






KYEMKKLVRAYFKESQ
atcgtttagtcgagtgctatttttggattttgggtgttt






WLNDDNHIPKYEEHME
acttcgaacctaaatactctcgtgctagaaagatg






NAIVTVGYMMGATNC
atgaccaaggtcttgaaaatgacatctattattgat






LVGMEEFISKETFEWL
gatacttttgatgcttacgccaatttcgacgaattg






MSEPVIVRASSLIGRAM
gttccattcaatgacgccatccaaagatgggacg






DDIVGHEVEQERGHCA
ctaacgcaatcgattctattccaccatacatgcgtc






SAVECYMREHGVSEEE
caatctaccaggccttgttagatatatatggtgaaa






ACSELKKQVDNAWKDI
tggaccaagttttatccaaagagggtaagttggat






NHEMIFSETSKAVPMSV
agagtctactatgctaagtatgagatgaaaaagtt






LTRVLNLTRVIDTLYQE
ggtcagagcctactttaaggaatctcaatggttaa






EDEYTNAKGKLKNMIH
acgacgataatcatatacctaagtatgaagaacac






SILIESVKI
atggaaaacgctattgttactgtcggttacatgatg






(SEQ ID NO: 13)
ggtgctacaaactgtttggttggtatggaggaattt







atctcaaaagaaaccttcgaatggttgatgtcaga







accagttattgttagagcatcttccttgataggtag







agcaatggatgatatcgtcggtcacgaggttgaa







caagaacgtggtcattgtgcttcagcagtcgaatg







ttacatgagagagcatggtgtttctgaagaagaag







cttgctccgaattaaagaagcaagttgacaacgct







tggaaggacattaaccacgagatgatcttctctga







aacttctaaagctgtcccaatgtctgtcttaaccag







agttttaaacttgacaagagttattgatactttgtac







caggaagaagatgaatacaccaacgctaagggt







aaattaaaaaatatgatccactccatcttgattgagt







cagtcaagatctaa







(SEQ ID NO: 79)





HibWilS QTS90
Hibiscadelphus wilderianus
B1B1U4
25%
EKQSLTFDGDEEAKIDR
atggaaaagcagtctttgacatttgatggtgacga
KSSKYHPSIWGDYFIQN
ggaagcaaaaatagatcgtaagtcatccaagtac
SSLTHAKESTQRMIKRV
catccttctatttggggcgactatttcatccaaaatt
EELKVQVKSMFKDTSD
cctctttaacccacgccaaagaatctactcaaaga



LLQLMNLINSIQMLGLD
atgatcaagagagttgaagaattgaaggtccaag






YHFENEIDEALRLIYEV
ttaaatcaatgttcaaggacacttccgatttattgca






DDKSYGLYETSLRFQLL
attgatgaacttaattaactctattcaaatgttgggtt






RQHGYHVDGEEAFNM
tggactaccactttgaaaatgaaatcgatgaggct






LKDEEGNFKASLTSDVP
ttgagattgatctatgaagtcgacgataagtcctac






GLLELYQASYMRIHGE
ggtttgtacgaaacatcattaagattccagttgttaa






DILDEAISFTTAQLTLAL
gacaacatggttaccacgttgatggtgaagaagc






PTLDPPLSAQVSLFLELP
tttcaacatgttgaaggatgaggaaggtaactttaa






LCRRNKILLARKYILIY
agcttctttaacctccgacgttccaggtttgttaga






QEDAMRNNVILELAKL
gttgtatcaagcctcttacatgcgtattcatggtga






NFNLLQSLYQEELKKISI
agatatattggatgaagctatttcattcactaccgct






WWNDLAFAKSLSFTRD
caattaactttggctttgccaactttagacccaccat






RVVEGYYWVLTIYFEP
tgtccgcacaagtctctttgttcttggagttgccatt






QHSRARVICSKVFAFLS
atgcagaagaaacaagattttgttggccagaaaat






IMDDIYDNYGILEECTL
acatcttgatatatcaagaagatgctatgcgtaata






LTEAIKRWNPQAIDGLP
atgttattttggagttagccaagttgaactttaactta






EYLKDYYLKLLKTFEEF
ttgcaatctttataccaagaagaattgaagaaaatc






EDELELNEKYRMLYLQ
tctatctggtggaatgacttagcttttgctaagtcttt






DEVKALAISYLQEAKW
atctttcaccagagatagagtcgttgaaggttatta






GIERHVPSLDEHLHNSL
ctgggtcttgactatctacttcgaacctcagcactc






ISSGSSTVICASFVGMG
cagagccagagttatttgttccaaagtttttgcttttt






EVATKEVFDWLSSFPK
tgtctattatggatgacatttatgacaactatggtat






VVEACCVIGRLLNDIRS
cttggaagaatgtacattattaaccgaagctattaa






HEFEQERGHCASAVEC
gagatggaacccacaagcaatcgacggtttgcc






YMREHGVSEEEACSEL
agaatacttgaaagactattacttgaagttgttaaa






KKQVDNAWKDINHEMI
gactttcgaggaatttgaagatgaattagaattgaa






FSETSKAVPMSVLTRVL
tgagaagtacagaatgttgtatttgcaagatgaag






NLTRGNEEIYKYNDTY
ttaaagctttggctatctcctacttacaagaggcca






TNSDTTMKDNISLVLVE
agtggggtattgaaagacacgtcccttcattagat






SCDYFNK
gagcatttgcacaattctttgatatcctctggttcttc






(SEQ ID NO: 14)
cactgtcatttgtgcttcattcgttggtatgggtgaa







gttgctaccaaggaagtcttcgattggttgtcctctt







tcccaaaggttgtcgaagcctgttgtgttatcggta







gattgttgaacgatattcgttcccatgaatttgagc







aggaaagaggtcactgcgcttccgctgttgaatgt







tacatgagagaacacggtgtctctgaagaagaag







cctgctcagaattgaagaagcaagttgacaacgc







atggaaagatataaaccatgaaatgatattctctga







aacatctaaggccgttcctatgtcagtcttgacca







gagttttgaacttgacccgtggtaatgaagaaatct







acaagtacaacgatacttatactaattcagacacc







accatgaaagacaacatctccttggtcttggttga







atcttgtgactatttcaacaagtaa







(SEQ ID NO: 80)





LeuGraS

Leucadendron grandiflorum

A0A067FTE8
14%
SIQVPQISSQNAKSQVM
atgtccatacaggttccccaaatttcttcgcaaaat


QTS335





RRTANFHPSVWGDRFA
gcaaagtcacaagtaatgcgtagaaccgccaact








NYTAEDKMNHARDLK
ttcatccatctgtgtggggagacagattcgctaact



ELKALKEEVGRKLLAT
acacggctgaggataaaatgaaccacgctcgcg






AGPIQLNLIDAIQRLGV
acttgaaggaacttaaagcgttaaaggaagaagt






GYHFERELEQALQHLY
tggtagaaagctgttggccacagctggcccaatt






NEKYSDDDTEDDLYRIS
caactcaatctaatcgatgctatccaaagattgggt






LRFRLLRQHGYNVSCD
gtcggttatcacttcgaacgagaattggaacaag






AFNRFKDTKGSFKEDLI
ctttgcaacatttatacaacgagaagtatagcgat






KDVNSMLCLYEATHLR
gacgacactgaagatgatttgtacaggatttctctg






VHGEDILDEALGFTTSQ
agatttagattgttaagacagcacggttacaatgtc






LKSILPKLKPLLASQVM
tcctgcgacgccttcaacagatttaaagataccaa






HALKQPLHRGLPRLEH
gggtagtttcaaggaagacttgatcaaagatgtta






RRYISIYQDDASHYKAL
actctatgctctgtttatacgaagcaactcatttgcg






LTLAKLDFNLVQSLHK
ggttcacggtgaagatattttggacgaagctttgg






KELCEISRWWKDLDFA
gatttacaacttcccaactaaagtccatcttaccta






RKLPFARDRMVECYFW
agttaaaaccattgctggcttctcaagtcatgcatg






ILGVYFEPNYSLARRILT
ccttgaagcaaccgctacaccgtggtttgccaag






KVIAMTSIIDDIYDVYG
actcgaacacagaaggtatattagcatttaccagg






TPEELKLFTEVIERWDE
atgacgcttctcattacaaagccttgttgactcttgc






SSMDQLPEYMQTFFGA
gaagttggatttcaatctagttcaatcattacacaa






LLDLYNEIEKEIANEGW
aaaggagctctgtgagatctccagatggtggaag






SYRVQYAKEAMKILVE
gatttagacttcgctcgtaagttgccttttgctagag






GYYDESKWFHENYIPK
atagaatggtcgaatgttatttctggatcttgggtgt






MEEYMRVALVTSGYT
gtatttcgaaccaaactactcactggctagaagaa






MLTTVSFLGMDNIVTK
tattgaccaaagttattgctatgacctctattatcgat






ETFDWVFSRPKIIRASEI
gacatttatgacgtttacggcactccagaagaatt






IGRFMDDIKSHKFEQER
gaagctattcactgaagtaatcgaacgttgggac






GHAASAVECYMKQHG
gaatcgtcaatggaccaactgccagaatacatgc






LSEQEVCEELYRQVSN
aaacgtttttcggtgctttgttagatttatacaatgag






AWKDINEECLNPTAVP
atagaaaaggaaattgcaaacgaaggttggtctt






MPLLMRALNLARVIDV
acagagtccagtatgcgaaagaagctatgaagat






VYKEGDGYTHVGNEM
tttggttgagggttactacgatgaatctaagtggtt






KQNVAALLIDQVPI
ccatgaaaattacatacccaagatggaggaatat






(SEQ ID NO: 15)
atgcgggtagccttagttaccagcgggtacacaa







tgttgactaccgtcagttttctggggatggacaac







atcgttactaaggagacatttgattgggttttctcca







gacctaagataatccgagccagtgaaattattggt







agattcatggacgatatcaaatctcataagtttgaa







caagagagaggtcacgctgcaagcgctgtcgaa







tgttatatgaagcaacacggtctctcagaacaaga







agtctgtgaagaactttacagacaagtctccaacg







cttggaaggacatcaatgaggaatgcttgaatcc







aaccgctgttccaatgccattgttgatgagagcac







taaacttggcacgcgtaatcgacgtagtttataaa







gaaggtgacggttacactcacgttggtaacgaaa







tgaagcaaaacgtggctgctctacttattgatcaa







gtaccaatctaa







(SEQ ID NO: 81)





LeuGraS

Leucadendron grandiflorum

A0A0A0QUT9
12%
SAAQVSPAPVPAHNAA
atgtccgcagcgcaagtcagtcctgctccagttcc


QTS345





ASKEEVRRSAGYHPSF
agcccacaatgctgctgcttctaaggaagaggtg








WGEFFLTHTSEYAKKD
cgtagatcggccggatatcatccatcattctgggg



DKIQKQHEELKQEVKG
tgaatttttccttactcacacaagcgaatacgctaa






MLVDATTEPTKKLELID
aaaggacgataagattcagaaacaacatgaaga






AILRLGVGYHFEDEIQA
attgaagcaagaggttaagggcatgctagtagat






ELERIHRLGDLDCDLYN
gctacgaccgaacccactaaaaagttagaattga






TCIWFRVLRGQGFTVS
tagacgccatcctgagattgggtgtcggttaccac






AEEFNKFKNSDGNFKE
tttgaagatgagattcaagctgaattggaaaggat






DLINDVSGMLCLYEAT
ccacagactcggtgacttagattgcgacttgtata






HLRVHGEDILDEALEFT
acacctgtatttggttcagagttcttagaggtcaag






TTRLKSILPDLEPPLATQ
gttttactgtctctgctgaagaatttaacaagttcaa






VMHALELPYHKGMQR
aaattccgacggaaacttcaaggaagatttgatca






LEARQYIPIYEADMTKN
atgacgtttctggtatgttgtgtttatacgaagccac






ISLLHFAKLDFNLLQAL
ccatttgcgggttcacggtgaggatattttggatga






HQSEIREITRWWKDLDF
agcgctcgaatttactaccacacgtttaaagtctat






KTRLPYARDRLVECYF
cttaccagacttggaaccgccattggctactcaag






WILGVQYEPQYSMSRL
taatgcacgcactagaactaccttaccataagggt






FLTKVISLASVFDDTYD
atgcagagattggaagcccgacaatacattccaa






IYGTFEELKLLTDAIER
tctatgaagccgatatgactaaaaacatcagcttgt






WEIEATDSLPSYMQILY
tgcatttcgctaagcttgatttcaacctgttacaggc






RALLDVFDEYKDKLIN
tctccaccaatccgaaatcagagagataacccgc






VQGKDYCLYYGKEAM
tggtggaaagatcttgactttaaaactagattgcca






KGLIRSYHTEAVSFHTG
tatgctagagatcgcttagtcgaatgttacttctgg






YVQNFEEYLDNSAVSS
attctaggcgttcaatacgagccacaatacagtat






GYPMLTVEALIGMGHP
gtctcggttgtttttaaccaaggttatttcattggctt






YATKEALDWALKVPR
ctgtcttcgatgacacatatgacatttacggtacctt






VIKASSDICRLVDDLRT
cgaagaattaaagttgttgactgacgccatagaaa






YKVEEERGDAPSGVHC
gatgggagatcgaagcaacagattccttgccgtc






YMRDYNVSEEEACSKI
ttacatgcaaattttatatcgcgctttgctggacgtc






EEMIDLAWKAINEEMQ
ttcgatgaatacaaggataaattgattaacgttcaa






KPGHLPLPILLPALNFTR
gggaaggactattgtttgtattacggtaaagaagc






MMEVLYQNIDGYTNSG
gatgaagggtttgattcgtagctaccacactgaag






GRTKDRITSLLVHPITI
ctgtgtcgtttcataccggctatgttcagaatttcga






(SEQ ID NO: 16)
ggaatacttagacaactccgcagtttcctctggtta







cccaatgctgacggttgaagctttgattggtatgg







gacacccttacgctactaaggaagctttagattgg







gcattgaaggtgccaagagttatcaaggctagttc







agacatctgtagattagtcgatgacttaaggacgt







acaaggtcgaggaggaaagaggtgatgctccct







cgggggtccattgctacatgagagactataatgtc







tcagaagaagaagcatgttctaagatcgaagaaa







tgatcgatctggcctggaaagctataaacgaaga







aatgcaaaagccaggtcatctaccactaccaatct







tgttgcctgccttgaacttcactagaatgatggag







gtcctttaccaaaatattgatggttatacaaattccg







gtggtagaaccaaggacagaatcacctctttgttg







gttcacccaattactatttaa







(SEQ ID NO: 82)





LeuGraS

Leucadendron grandiflorum

D0VMR6
11%
SSAKLGSASEDVNRRD
atgtcctcagcaaaattgggttctgcttctgaagat


QTS365





ANYHPTVWGDFFLTHS
gtcaaccgtagagacgctaattaccatccaaccg








SNFLENNDSILEKHEEL
tttggggagatttctttttaacacactcctctaacttc



KQEVRNLLVVETSDLPS
ttggagaacaatgactcaatattggaaaagcacg






KIQLTDEIIRLGVGYHFE
aagaattgaagcaagaggttagaaacttattggtc






TEIKAQLEKLHDHQLH
gttgaaacttctgacttgccttccaagattcagttga






LNFDLLTTSVWFRLLR
ctgatgaaattatcagattaggtgtcggttatcatttt






GHGFSISSDIFNKFKNSD
gagaccgaaatcaaagcccaattagaaaagttgc






GNFKEDLINDVSGMLC
acgatcatcaattgcacttgaacttcgacttgttga






LYEATHLRVHGEDILDE
ccacatctgtttggttcagattattgagaggtcacg






ALEFTTTRLKSILPDLEP
gtttttccatttcttccgacatcttcaataagttcaaa






PLNECVRDALHIPYHRN
aattcagatggtaactttaaggaagatttaatcaac






VQRLAARQYIPQYDAE
gacgtttctggtatgttgtgcttgtacgaagctactc






PTKIESLSLFAKIDFNML
atttgcgtgtccacggtgaagatattttagacgaa






QALHQRELREASRWW
gccttggaatttactactaccagattgaagtctattt






KEFDFPSKLPYARDRIA
tgccagatttagaaccaccattaaatgaatgtgtca






EGYYWMMGAHFEPKF
gagacgctttgcatattccttatcacagaaacgttc






SLSRKFLNRIIGITSLIDD
aacgtttggctgcaagacaatacataccacagta






TYDVYGTLEEVTLFTE
cgatgccgaaccaacaaaaatcgagtctttgtcat






AVERWDIEAVKDIPKY
tattcgctaagattgatttcaacatgttgcaagcttt






MQVIYTGMLGIFEDFK
gcatcaaagagaattgagagaggcttccagatg






DNLINARGKDYCIDYAI
gtggaaagaatttgacttcccttctaagttaccatat






EVFKEIVRSYQREAEYF
gccagagatcgtatcgctgaaggttactactggat






HTGYVPSYDEYMENSII
gatgggtgcccactttgaaccaaagttctcattgtc






SGGYKMFIILMLIGRGE
tcgtaagttcttaaacagaatcattggtatcacttctt






FELKETLDWASTIPEMV
taattgatgacacctatgatgtttacggtactttgga






EASSLIARYIDDLQTYK
ggaagttactttgtttaccgaagctgttgaaagatg






AEEERGETVSAVRCYM
ggacattgaagctgtcaaggacattccaaaatac






REFGVSEEQACKKMRE
atgcaagtcatctatacaggtatgttaggtatatttg






MIEIEWKRLNKTTLEAD
aagatttcaaagacaacttgataaatgctagaggt






EISSSVVIPSLNFTRVLE
aaggattactgtatcgactatgcaatcgaggttttc






VMYDKGDGYSDSQGV
aaagaaatcgttagatcctaccaaagagaagctg






TKDRIAALLRHAIEI
aatatttccacaccggttacgttccatcctacgatg






(SEQ ID NO: 17)
aatacatggaaaactctattatatctggtggttaca







agatgttcattatcttaatgttaatcggtagaggag







aatttgagttgaaggaaactttggactgggcttcc







actattcctgaaatggtcgaggcatcttccttgatc







gctcgttatattgacgacttgcaaacctataaagct







gaagaagagagaggagaaaccgtctccgcagt







cagatgttacatgcgtgaatttggtgtttcagaaga







acaagcctgtaagaagatgagagagatgatcga







aattgaatggaagagattgaataaaacaactttag







aagctgacgaaatttcttcatctgtcgttattccatc







attgaacttcaccagagttttggaggtcatgtacga







taagggtgatggttactctgattcccaaggtgttac







taaagaccgtatcgccgctttattgagacacgcca







tcgaaatctaa







(SEQ ID NO: 83)





LeuGraS

Leucadendron grandiflorum

Q39760
14%
ASQVSQMPSSSPLSSNK
atggccagtcaggtttcacaaatgccttcctcttct


QTS377





DEMRPKADFQPSIWGD
ccactatccagcaacaaagatgagatgagacca








LFLNCPDKNIDAETEKR
aaggctgactttcaaccctcgatatggggcgattt



HQQLKEEVRKMIVAPM
gttcctgaattgcccagacaagaacattgatgctg






ANSTQKLAFIDSVQRLG
aaaccgaaaagcgtcatcaacaattgaaagaag






VSYHFTKEIEDELENIY
aagtcagaaagatgatcgtggcaccaatggctaa






HNNNDAENDLYTTSLR
ttctacacaaaagttggctttcattgactctgttcag






FRLLREHGFNVSCDAF
aggcttggagtatcctaccactttactaaagaaatt






NRFKDTKGSFKEDLIKD
gaggatgaattagaaaacatctatcacaacaataa






VNSMLCLYEATHLRVH
cgacgcagaaaacgatttgtacacgacttcccta






GEDILDEALGFTTSQLK
agattcagattattgagagaacatggtttcaatgtc






SILPKLKPLLASQVMHA
tcttgtgacgcctttaacagatttaaggataccaaa






LKQPLRRGLPRVEARH
ggttcattcaaggaagacttgatcaaggatgttaat






YLSVYQDIESHNKVLLE
tccatgttgtgtttatacgaagcgactcaccttcga






FAKIDFNMVQLLHRKE
gttcatggtgaggatattttggacgaagctttgggt






LSEISRWWKDLDFQRK
ttcacaacctctcaactcaaatcaatcttacctaagt






LPYARDRVVEGYFWIS
taaagccattgctggcttcgcaagtcatgcacgct






GVYFEPQYSLGRKMLT
ttgaagcaaccgctaagacgtggtttgccaagag






KVIAMASIVDDTYDSY
ttgaagccagacactatttgagcgtttaccaagat






ATYEELIPYTKAIERWD
attgaatctcataacaaagtcttgttggaatttgcta






IKCIDELPEYMKPSYKA
agatcgacttcaacatggttcaacttctccatagga






LLDVYEEMEQLVAKHG
aggagctcagtgaaattagtagatggtggaaaga






RQYRVEYAKNAMIRLA
tttagacttccaacgtaaattgccatacgctagaga






QSYLVEARWTLQNYKP
tcgcgttgtcgaaggttatttttggattagtggggta






SFEEFKANALPTCGYA
tacttcgaaccgcaatattccctgggtagaaagat






MLAITSFVGMGDIVTPE
gttaactaaggttattgccatggcttctatcgtcga






TFKWAANDPKIIQASTII
cgatacctacgattcttacgcaacttatgaggaatt






CRFMDDVAEHKFKHRR
aatcccatacaccaaagctatagaaagatgggat






EDDCSAIECYMEEYGV
ataaagtgtatagacgaattgcctgagtatatgaa






TAQEAYDVFNKHVESA
gccatcatacaaggctttgttggacgtgtacgaag






WKDVNKEFLKPTEMPT
aaatggaacagttagttgccaaacacggtcggca






EVLNRSLNLARVMDVL
atacagagttgaatatgctaagaatgctatgatcc






YREGDGYTYVGKAAK
ggctagcccaatcttatctggtcgaggctagatgg






GGITSLLIEPVAL
actctacaaaactacaagccttccttcgaagaattt






(SEQ ID NO: 18)
aaggctaacgcattgccaacttgtggttacgctat







gttggcgatcacttctttcgttggtatgggcgacat







tgttaccccagaaacatttaagtgggccgcgaac







gatccaaagattattcaagcttcaacgataatctgc







cggtttatggatgacgtcgccgaacacaagttca







aacataggagggaagacgattgttctgctatcga







gtgttatatggaagaatacggagtaactgcccag







gaggcctacgacgtcttcaataagcacgtggaat







cagcttggaaggatgttaataaggaatttttgaag







cccaccgagatgcctacggaagtgctgaacaga







tctttgaacctcgcaagagttatggatgtcttgtac







agagaaggtgatggttatacttatgtgggtaaggc







tgctaaaggtgggattacctccctattgatcgaac







cagtcgctttataa







(SEQ ID NO: 84)





LeuGraS

Leucadendron grandiflorum

Q39761
12%
ASQVSQMPSSSPLSSNK
atggccagtcaggtttcacaaatgccttcctcttct


QTS379





DEMRPKADFQPSIWGD
ccactatccagcaacaaagatgagatgagacca








LFLNCPDKNIDAETEKR
aaggctgactttcaaccctcgatatggggcgattt



HQQLKEEVRKMIVAPM
gttcctgaattgcccagacaagaacattgatgctg






ANSTQKLAFIDSVQRLG
aaaccgaaaagcgtcatcaacaattgaaagaag






VSYHFTKEIEDELENIY
aagtcagaaagatgatcgtggcaccaatggctaa






HNNNDAENDLYTTSIRF
ttctacacaaaagttggctttcattgactctgttcag






RLLREHGYNVSCDIFNK
aggcttggagtatcctaccactttactaaagaaatt






FKNSDGNFKEDLINDVS
gaggatgaattagaaaacatctatcacaacaataa






GMLCLYEATHLRVHGE
cgacgcagaaaacgatttgtacacgacttccata






DILDEALEFTTTRLKSIL
agattcagattattgagagaacatggttacaatgtc






PDLEPPLATQVMHALK
tcttgtgacatctttaacaagttcaagaatagcgat






QSIRRGLPRVEARHYLS
ggtaacttcaaggaagacttgattaatgatgtttca






VYQDIESHNKALLEFA
ggtatgctctgtttatatgaagcgacccacttgcga






KIDFNMLQFLHRKELSE
gttcatggtgaggatatcttagacgaagctttgga






ICRWWKDLDFQRKLPY
atttacaactactcgcctaaaatctattttgcctgac






ARDRVVEGYFWISGVY
ttagaaccacccctggccacccaagtcatgcacg






FEPQYSLGRKMLTKVIA
ctttgaagcaaagcatcagacgtggtcttccaaga






MASIVDDTYDSYATYE
gttgaagccagacactacttgagtgtttatcaagat






ELIPYTNAIERWDIKCID
attgaatctcataacaaagctttgttggaatttgcta






EIPEYMKPSYKALLDV
agattgatttcaacatgttacaattcctacatagga






YEEMVQLVAEHGRQY
aggagctatcggaaatctgtagatggtggaaaga






RVEYAKNAMIRLAQSY
tctcgattttcaaagaaagttaccttacgcacggg






LVEAKWTLQNYKPSFE
accgtgtcgtcgaaggttatttctggatttccgggg






EFKANALPTCGYAMLA
tttacttcgaaccacaatacagtttgggtagaaag






ITSFVGMGDIVTPETFK
atgttgactaaggttattgctatggcttctatcgtcg






WAASDPKIIQASTIICRF
atgacacctacgattcttacgccacctatgaggaa






MDDVAEHKFKHRRED
ttgataccatatactaacgccatcgaaagatggga






DCSAIECYMEEYGVTA
catcaagtgtatagacgagatcccagaatacatg






QEAYDVFNKHVESAW
aagccttcgtataaagctttattggatgtatacgag






KDLNQEFLKPTEMPTE
gaaatggtgcaattggttgccgaacacggtagac






VLNRSLNLARVMDVLY
agtacagagtggaatacgctaagaatgctatgatt






REGDGYTYVGKAAKG
cgccttgcgcaatcctacttggttgaagcgaaatg






GITSLLIEPIAL
gactctccaaaactacaagccatctttcgaagaat






(SEQ ID NO: 19)
ttaaggccaatgctttaccgacatgcggatatgct







atgctagctatcaccagcttcgttggtatgggtgat







attgtcacgccagaaacttttaaatgggctgcatct







gacccaaagattattcaggcttccactatcatctgt







aggttcatggatgatgttgctgaacataagtttaag







cacagaagagaagacgactgttcagctattgaat







gttacatggaagaatacggcgtcaccgcgcaag







aagcctacgacgtattcaacaaacacgtcgagtc







ggcatggaaggatctgaaccaagaatttctaaaa







cccactgagatgccaacagaagttctcaacagaa







gtttgaacttggctagagtaatggacgttttgtata







gagagggtgatggttatacttatgttggtaaagcc







gctaagggtggcattacctcattgcttatcgagcc







aatcgctttgtaa







(SEQ ID NO: 85)





LeuGraS

Leucadendron grandiflorum

Q5SBP4
13%
ESRRSANYQASIWETNF
atggaaagtaggcgttcagcaaattatcaggcttc


QTS385





TNSPLLSKLQNELSVAH
catatgggagacaaactttactaactctccactttta








LEELKLEVKQLIWSTKD
tctaagttgcaaaatgaactgtcggtcgcccatct



PLFLLKFIDSIQRLGVAY
cgaagaattgaaactagaggtgaagcaattaatct






HFEEEIKESLHLVYLEE
ggagcacgaaggatcccttattccttttgaaattca






RNGDHQHYKEKGLHFT
ttgactccattcaaagattgggcgttgcttaccactt






ALRFRILRQDGYHVPQ
tgaagaagaaatcaaggaatctttgcacctggtct






DVFSSFMNKAGDFEES
acctggaagagcgaaacggtgatcatcaacact






LSKDTKGLVSLYEASY
ataaggaaaaaggattgcatttcaccgctttgaga






LSMEGETILDMAKDFSS
ttcagaatattgagacaggacggttaccacgtacc






HHLHKMVEDATDKRV
acaagatgttttttcttcattcatgaataaggctggt






ANQIIHSLEMPLHRRVQ
gactttgaagaaagtttatccaaagacactaaggg






KLEAIWFIQFYECGSDA
tttggtctctttgtacgaagcctcctacctctctatg






NPTLVELAKLDFNMVQ
gaaggtgaaaccattttggatatggccaaggattt






ATYQEELKRLSRWYEE
ctcctctcaccatttacacaagatggttgaagatgc






TGLQEKLSFARHRLAE
tactgacaaaagagttgctaaccaaatcattcata






AFLWSMGIIPEGHFGYG
gcttggagatgcctttgcatagaagagttcaaaag






RMHLMKIGAYITLLDDI
ctagaggctatctggttcatccaattttatgaatgc






YDVYGTLEELQVLTEII
ggttccgacgccaacccgaccttggtcgaattgg






ERWDINLLDQLPEYMQ
cgaaattagattttaatatggtgcaagctacttacc






IFFLYMFNSTNELAYEIL
aagaagaattaaagcgtctatctaggtggtacga






RDQGINVISNLKGLWV
ggaaaccggtctccaagaaaagttgtctttcgctc






ELSQCYFKEATWFHNG
gtcacagattggctgaagctttcttgtggtctatgg






YTPTTEEYLNVACISAS
gcattattcctgaaggtcatttcggatatggcagaa






GPVILFSGYFTTTNPINK
tgcaccttatgaagatcggtgcatacattaccttatt






HELQSLERHAHSLSMIL
ggatgatatttatgacgtttatggtactttggaagaa






RLADDLGTSSDEMKRG
ttgcaagtattgacagaaatcatcgaaagatggg






DVPKAIQCFMNDTGCC
atattaaccttttggaccagttgccagaatacatgc






EEEARQHVKRLIDAEW
aaatattcttcctctacatgtttaactctacaaatgaa






KKMNKDILMEKPFKNF
ctagcttacgaaatcttaagagaccaaggtattaat






CPTAMNLGRISMSFYE
gtcatatccaaccttaaaggtctttgggtcgaactg






HGDGYGGPHSDTKKK
tcacaatgttatttcaaagaagccacgtggttccac






MVSLFVQPMNITI
aacggttataccccaaccactgaggaatacctaa






(SEQ ID NO: 20)
acgttgcttgtatttcagcgtccggtccagttatctt







gttttcgggatactttactactacaaatccaatcaac







aagcatgaattgcaatctttagaaagacacgctca







ctccttaagtatgatcttaagactagcggatgacct







aggtacttcttcggatgagatgaagcggggtgat







gttcctaaggctattcaatgtttcatgaacgacacg







gggtgttgcgaagaagaagccagacagcacgtt







aagagattgattgacgcagaatggaagaagatg







aataaggatatcttgatggagaagccatttaaaaa







cttctgtccaactgcaatgaatttaggccgtatcag







tatgtctttctacgagcacggtgacggttacggcg







gtccacattctgataccaaaaagaagatggtctcg







ttgtttgttcaacccatgaatattaccatttaa







(SEQ ID NO: 86)





LeuGraS

Leucadendron grandiflorum

Q9T0J9
10%
ESQTTFKYESLAFTKLS
atggaatcacagactacattcaaatatgagtcttta


QTS393





HCQWTDYFLSVPIDESE
gcatttaccaagttgtcccattgccaatggactgat








LDVITREIDILKPEVMEL
tacttcttgtctgttccaatagacgaatccgaattgg



LSSQGDDETSKRKVLLI
acgtcatcaccagagaaattgatattttaaagcctg






QLLLSLGLAFHFENEIK
aggttatggaattgttatcttcacaaggtgatgacg






NILEHAFRKIDDITGDE
aaacatctaagcgtaaagtcttgttgatccaattgtt






KDLSTISIMFRVFRTYG
gttatctttgggattagcctttcacttcgaaaacga






HNLPSSIFNKFKNSDGN
gattaagaatatcttggaacacgctttcagaaaga






FKEDLINDVSGMLCLY
ttgatgacatcactggtgacgaaaaggatttgtcc






EATHLRVHGEDILDEAL
accatttccataatgtttagagttttcagaacttacg






EFTTTRLKSILPGGTCRP
gtcataacttgccatcctctatctttaataaattcaaa






HILRLIRNTLYLPQRWN
aactcagatggtaatttcaaggaagacttgataaa






MEAVIAREYISFYEQEE
cgatgtttctggtatgttgtgtttatacgaagctact






DHDKMLLRLAKLNFKL
cacttgagagtccatggtgaagacattttagatga






LQLHYIKELKSFIKWW
agctttagagtttaccactacccgtttgaagtctatc






MELGLTSKWPSQFRERI
ttgccaggtggtacttgtagacctcacattttaaga






VEAWLAGLMMYFEPQ
ttgattagaaacactttatatttgccacaaagatgg






FSGGRVIAAKFNYLLTI
aacatggaagccgtcatcgctcgtgaatacatatc






LDDACDHYFSIHELTRL
cttttacgaacaagaggaagaccacgataagatg






VACVERWSPDGIDTLE
ttattgagattggctaagttgaatttcaaattgttaca






DISRSVFKLMLDVFDDI
gttgcattatattaaggaattgaagtcattcatcaaa






GKGVRSEGSSYHLKEM
tggtggatggaattgggtttaacatctaaatggcc






LEELNTLVRANLDLVK
atctcaatttagagagcgtatcgttgaagcctggtt






WARGIQVPSFEEHVEV
agctggtttgatgatgtactttgaaccacaattctcc






GGIALTSYATLMYSFV
ggtggtagagttattgcagctaagttcaactatttat






GMGETAGKEAYEWVR
tgaccattttggatgatgcttgtgatcactacttctc






SRPRLIKSLAAKGRLMD
aattcatgaattgaccagattggtcgcttgtgttga






DITDFDSDMSNGFAAN
aagatggtctccagacggtatcgatacattggag






AINYYMKQFVVTKEEA
gacatctcccgttctgtctttaagttaatgttggatgt






ILECQRMIVDINKTINEE
ttttgacgatatcggtaagggtgttagatccgaag






LLKTTSVPGRVLKQAL
gttcttcctatcacttgaaagaaatgttggaagaatt






NFGRLLELLYTKSDDIY
aaatactttagttagagcaaatttggacttggttaaa






NCSEGKLKEYIVTLLID
tgggccagaggtatccaagtcccatctttcgaag






PIRL
agcatgttgaggttggtggtattgctttaacatccta






(SEQ ID NO: 21)
cgccactttgatgtactctttcgtcggaatgggtga







aaccgctggtaaggaagcctacgaatgggttcgt







tccagacctcgtttgataaagtctttggcagctaaa







ggtagattgatggacgacattactgattttgattca







gatatgtctaacggtttcgctgctaacgcaattaac







tattacatgaagcaattcgtcgttaccaaggaaga







agccatcttagaatgccagagaatgatcgtcgac







atcaacaagaccattaatgaagagttgttaaaaac







tacatctgttcctggtagagtcttgaagcaagcttt







gaacttcggtagattattggaattgttgtacactaa







atctgacgacatctataattgttccgaaggtaagtt







aaaggaatacattgttactttgttgatcgatccaata







agattgtaa







(SEQ ID NO: 87)





MacVolS

Macrostylis villosa

D0VMR6
14%
SSAKLGSASEDVNRRD
atgtcctcagcaaaattgggttctgcttctgaagat


QTS1139





ANYHPTVWGDFFLTHS
gtcaaccgtagagacgctaattaccatccaaccg



SNFLENNDSILEKHEGL
tttggggagatttctttttaacacactcctctaacttc






EQKIRTMLISPTDTISKK
ttggagaacaatgactcaatattggaaaagcacg






LSLIDAVQRLGVAYHFE
aaggtttggaacaaaagattagaactatgttaatct






KEIEDEIEKLSCKEYND
ctcctaccgatactatctccaagaaattatctttgat






GNDLQTVALRFRLLRQ
tgacgccgttcagagattgggtgtcgcttatcatttt






QGYFVSCDVFKRFKNT
gagaaggaaattgaagatgaaatcgaaaagttat






KGEFETEDARTLWCLY
catgtaaagagtacaacgacggtaatgacttgca






EATHLRVDGEDILEEAI
aaccgtcgccttgagattcagattattgagacaac






QFSRKKLEALLPELSFP
aaggttatttcgtttcctgcgatgtttttaagcgtttc






LNECVRDALHIPYHRN
aagaacactaagggtgaatttgagactgaagatg






VQRLAARQYIPQYDAE
ctagaacattgtggtgtttatacgaagctactcactt






PTKIESLSLFAKIDFNML
gagagttgacggtgaagatattttggaagaagct






QALHQRELREASRWW
atccaattctctcgtaagaaattagaagcattgttg






KEFDFPSKLPYARDRIA
ccagaattatcctttccattgaatgaatgtgttagag






EGYYWMMGAHFEPKF
atgccttgcatatcccataccacagaaacgtccag






SLSRKFLNRIIGITSLIDD
agattggctgcacgtcaatatataccacaatacga






TYDVYGTLEEVTLFTE
cgctgagcctaccaagattgaatccttatctttgttc






AVERWDIEAVKDIPKY
gctaagattgactttaatatgttgcaggccttgcac






MQVIYTGMLGIFEDFK
caaagagaattgagagaagcttccagatggtgg






DNLINARGKDYCIDYAI
aaggagttcgattttccatctaaattgccttatgccc






EVFKEIVRSYQREAEYF
gtgatagaatcgctgaaggttactactggatgatg






HTGYVPSYDEYMENSII
ggtgctcatttcgaaccaaaattttctttgtctcgta






SGGYKMFIILMLIGRGE
agttcttaaacagaatcattggtataacctccttaat






FELKETLDWASTIPEMV
tgatgatacttatgacgtctacggtactttagaaga






EASSLIARYIDDLQTYK
agttaccttgttcaccgaagccgttgaaagatggg






AEEERGETVSAVRCYM
atattgaggctgtcaaagacatcccaaagtacatg






REFGVSEEQACKKMRE
caagttatatacacaggtatgttaggtattttcgaa






MIEIEWKRLNKTTLEAD
gatttcaaagacaatttgattaacgccagaggtaa






EISSSVVIPSLNFTRVLE
ggattattgcatcgattacgctatcgaagttttcaa






VMYDKGDGYSDSQGV
ggagattgtcagatcttaccaaagagaagcagaa






TKDRIAALLRHAIEI
tactttcacactggttacgttccatcttatgacgaat






(SEQ ID NO: 22)
acatggaaaactcaattatctcaggtggttacaaa







atgtttataatcttgatgttaatcggtagaggtgagt







tcgaattgaaagaaaccttagattgggcttcaact







attccagaaatggtcgaagcttcttccttgatagct







agatacatcgacgatttgcaaacatacaaggccg







aagaagaacgtggtgaaacagtttcagcagtcag







atgttacatgagagagtttggtgtttctgaggaaca







agcttgtaagaagatgagagaaatgattgagatc







gaatggaagagattgaacaagactaccttggaag







ctgacgaaatttcttcttccgttgttattccatctttga







actttactagagtcttggaagtcatgtatgacaagg







gagacggttattctgattcccaaggtgttaccaag







gatcgtattgctgctttgttaagacacgccattgag







atataa







(SEQ ID NO: 88)





MacVolS

Macrostylis villosa

A0A067D5M4
62%
RDLKSVLSSKESTKAD
atgcgtgacttgaaatccgtcttatcttcaaaggaa


QTS2198





VNRRSSNYHPSIWGDH
tctacaaaggcagatgttaatagaagatcctctaa



FINVSSNEKYTNTEVEK
ctatcacccttccatctggggtgatcatttcattaac






RFETLKAEIEKLLVSNN
gtttcttcaaatgagaagtacactaacactgaagtc






TAWKTLEEIVAIVNQLQ
gaaaaaagatttgaaaccttgaaggccgaaatag






RLGLAYHFENQIKEAL
aaaagttgttagtttctaacaacaccgcttggaag






QSIYDSHVNGNCDVNY
accttggaggaaattgtcgctatcgttaatcagttg






DHNNDLYIVALRFRLL
caaagattagggttggcttaccacttcgaaaacca






RQHGYKVSADIFKKFR
aatcaaagaagccttgcaatccatttatgactctca






DEKGEFKAMLTNDAK
tgtcaacggtaattgcgacgttaattacgatcaca






GLLCLYEASYLRVQGE
acaacgatttgtacatagtcgctttaagatttcgttt






NILEEACEFSRKHLKSL
gttgagacaacacggttataaagtctctgctgaca






LSHLSTSLAEQVKHSLE
ttttcaagaagtttagagatgaaaagggtgaattta






IPLHRGMPRLEARHYISI
aggctatgttaacaaatgacgccaaaggtttgttgt






YEEDNSSRNELILELAK
gtttatacgaagcatcctatttgagagttcaaggtg






LDFNLLQALHRRELGEI
aaaatatcttagaagaggcttgtgaattttctcgtaa






SRWWKDIDFATKLPFA
gcatttgaagtcattattgtctcacttgtccacctcat






RDRLVECYFWILGVYF
tggctgagcaagttaagcactctttggaaatccca






EPKYSITRKFMTKVIAI
ttacatagaggtatgccaagattggaagctagac






ASVIDDIYDVYGTLEEL
attacatttctatttacgaggaagataactcctctcg






KLFTHAIERWETVAAN
taatgaattgatattagagttggcaaagttggactt






ELPKYMQVCYFALLDV
caacttgttgcaggccttacacagaagagaattg






FKEMEDKLVNKGLLYS
ggtgaaatttctcgttggtggaaagatattgatttc






MPCAKEAVKGLVRAYF
gctactaaattgccattcgccagagacagattagt






VEAEWFNANYMPTFEE
tgaatgttacttctggatcttgggtgtttattttgaac






YMENSTMSSGYPMLAV
ctaaatactccatcactagaaagttcatgactaag






EALIGIEDATISKEAFD
gttatcgctattgcttccgtcatcgatgatatatacg






WAISVPKIIRSCALIARL
acgtttatggtaccttggaggaattgaagttgttca






VDDIHTYKVEQERGDA
ctcatgctattgaaagatgggaaactgtcgctgcc






PSSVECYMQQYDVSEE
aacgaattaccaaagtacatgcaagtttgttacttt






EACNRIKGMVEIEWMN
gctttgttagacgtctttaaggaaatggaagataaa






INEEIQDPNHPPLQWLL
ttagtcaataaaggtttgttatactccatgccatgtg






PSLNLARMMVVLYQN
caaaggaggctgttaaaggtttggttagagcttac






GDNYTNSSGKTKDRIA
ttcgttgaggctgaatggttcaacgctaactatatg






SLLVDPLPM
ccaaccttcgaagaatatatggaaaactcaactat






(SEQ ID NO: 23)
gtcctctggttatccaatgttggctgtcgaagctttg







atcggtattgaagacgcaactatttcaaaggaagc







cttcgattgggcaatatctgttcctaaaattatccgt







tcatgcgcattgatcgccagattggtcgatgacatt







cacacctacaaggtcgaacaagagagaggtgat







gccccatcttccgtcgaatgttacatgcaacaata







cgacgtttctgaggaagaagcctgtaatagaatta







agggtatggttgaaattgaatggatgaatataaac







gaggaaatccaggatccaaaccacccacctttac







aatggttgttgccatctttgaacttagctcgtatgat







ggtcgttttgtaccaaaatggtgacaactatacaa







actcctccggtaaaaccaaggatagaattgcttcc







ttgttggtcgaccctttgccaatgtaa







(SEQ ID NO: 89)





MacVolS

Macrostylis villosa

A0A067D5M4
69%
RDLKSVLSSKESTKAD
atgcgtgacttgaaatccgtcttatcttcaaaggaa


QTS2202





VNRRSSNYHPSIWGDH
tctacaaaggcagatgttaatagaagatcctctaa



FINVSSNEKYTNTEVEK
ctatcacccttccatctggggtgatcatttcattaac






RFETLKAEIEKLLVSNN
gtttcttcaaatgagaagtacactaacactgaagtc






TAWKTLEEIVAIVNQLQ
gaaaaaagatttgaaaccttgaaggccgaaatag






RLGLAYHFENQIKEAL
aaaagttgttagtttctaacaacaccgcttggaag






QSIYDSHVNGNCDVNY
accttggaggaaattgtcgctatcgttaatcagttg






DHNNDLYIVALRFRLL
caaagattagggttggcttaccacttcgaaaacca






RQHGYKVSADIFKKFK
aatcaaagaagccttgcaatccatttatgactctca






DEKGEFKDMIRNDARG
tgtcaacggtaattgcgacgttaattacgatcaca






LLCLYEASHLRVKGEDI
acaacgatttgtacatagtcgctttaagatttcgttt






LEEATEFSRKHLKSLLP
gttgagacaacacggttataaagtctctgctgaca






QLSTSLAEQVKHSLEIP
ttttcaagaagtttaaagatgaaaagggtgaattta






LHRGMPRLEARHYISIY
aggatatgatcagaaatgacgccagaggtttattg






EENNSSRNELLLELAKL
tgtttatacgaagcatcccatttgagagttaagggt






DFNLLQALHRRELGDIS
gaagatattttagaagaggctactgaattttctcgt






RWWKDIDFATKLPFAR
aagcacttgaagtcattgttaccacaattgtccaca






DRLVECYFWILGVYFEP
tcattggctgagcaagttaagcactctttggaaatc






KYSITRKFMTKVIAIAS
ccattacatagaggtatgccaagattggaagcta






VIDDIYDVYGTLEELKL
gacattacatttctatttatgaggaaaacaactcctc






FTHAIERWETVAANELP
tcgtaatgaattgttgttagagttggcaaagttgga






KYMQVCYFALLDVFKE
cttcaacttgttgcaggctttacacagaagagaatt






MEDKLVNKGLLYSMPC
gggtgatatttctcgttggtggaaagacatcgattt






AKEAVKGLVRAYFVEA
cgccactaaattgccattcgccagagacagatta






EWFNANYMPTFEEYME
gttgaatgttacttctggatcttgggtgtttattttga






NSTMSSGYPMLAVEAL
acctaaatactccattactagaaaattcatgaccaa






IGIEDATISKEAFDWAIS
ggttatcgctatagcttctgtcatcgatgatatatac






VPKIIRSCALIARLVDDI
gacgtttacggtaccttggaagaattgaagttgttc






HTYKVEQERGDAPSSV
actcatgctattgagcgttgggaaactgtcgctgc






QCYVQQYGVSEEEACN
taatgaattaccaaagtatatgcaagtttgttacttt






KIKGMVEIEWMNINEEI
gctttgttagacgtctttaaggaaatggaagataaa






QDPNHPPLQWLLPSLN
ttagtcaataaaggtttgttatactccatgccatgtg






LARMMVVLYQNGDNY
caaaggaggctgttaagggtttggttagagccta






TNSSGKTKDRIASLLVD
cttcgttgaggctgaatggttcaacgctaactatat






PLPM
gccaaccttcgaagaatatatggaaaactcaacta






(SEQ ID NO: 24)
tgtcctctggttatcctatgttggctgtcgaagcttt







gatcggtattgaagacgcaactatttcaaaggaa







gccttcgattgggcaatatccgttccaaaaattatc







agatcttgtgcattgatcgccagattggtcgatga







cattcacacctacaaggtcgaacaagagagagg







tgatgccccatcttctgtccaatgctacgttcaaca







atacggtgtctccgaagaagaagcctgtaataaa







attaagggtatggttgagattgaatggatgaatata







aacgaagaaatccaggatccaaaccacccacctt







tacaatggttgttgccatctttgaacttagctcgtat







gatggttgttttgtaccaaaatggtgacaactacac







aaactcctccggtaaaaccaaggatagaattgctt







ccttgttggtcgaccctttgccaatgtaa







(SEQ ID NO: 90)





MacVolS

Macrostylis villosa

A0A067D5M4
69%
RDLKSVLSSKESTKAD
atgcgtgacttgaaatccgtcttatcttcaaaggaa


QTS2222





VNRRSSNYHPSIWGDH
tctacaaaggcagatgttaatagaagatcctctaa



FINVSSNEKYTNTEVEK
ctatcacccttccatctggggtgatcatttcattaac






RFETLKAEIEKLLVSNN
gtttcttcaaatgagaagtacactaacactgaagtc






TAWKTLEEIVAIVNQLQ
gaaaaaagatttgaaaccttgaaggccgaaatag






RLGLAYHFENQIKEAL
aaaagttgttagtttctaacaacaccgcttggaag






QSIYDSHVNGNCDVNY
accttggaggaaattgtcgctatcgttaatcagttg






DHNNDLYIVALRFRLL
caaagattagggttggcttaccacttcgaaaacca






RQHGYKVSADIFKKFK
aatcaaagaagccttgcaatccatttatgactctca






DEKGEFKDMIRNDARG
tgtcaacggtaattgcgacgttaattacgatcaca






LLCLYEASHLRVKGEDI
acaacgatttgtacatagtcgctttaagatttcgttt






LEEATEFSRKHLKSLLP
gttgagacaacacggttataaagtctctgctgaca






QLSTSLAEQVKHSLEIP
ttttcaagaagtttaaagatgaaaagggtgaattta






LHRGMPRLEARHYISIY
aggatatgatcagaaatgacgccagaggtttattg






EENNSSRNELLLELAKL
tgtttatacgaagcatcccatttgagagttaagggt






DFNLLQALHRRELGDIS
gaagatattttagaagaggctactgaattttctcgt






RWWKDIDFATKLPFAR
aagcacttgaagtcattgttaccacaattgtccaca






DRLVECYFWILGVYFEP
tcattggctgagcaagttaagcactctttggaaatc






KYSITRKFMTKVIAIAS
ccattacatagaggtatgccaagattggaagcta






VIDDIYDVYGTLEELKL
gacattacatttctatttatgaggaaaacaactcctc






FTHAIERWETVAANELP
tcgtaatgaattgttgttagagttggcaaagttgga






KYMQVCYFALLDVFKE
cttcaacttgttgcaggctttacacagaagagaatt






MEDKLVNKGLLYSMPC
gggtgatatttctcgttggtggaaagacatcgattt






AKEAVKGLVKAYFVEA
cgccactaaattgccattcgccagagacagatta






KWFHAKYVPTFEEYME
gttgaatgttacttctggatcttgggtgtttattttga






NSTMSSGYPMLAVEAL
acctaaatactccattactagaaaattcatgaccaa






VGLEDMAITKRALDWA
ggttatcgctatagcttctgtcatcgatgatatatac






ISVPKIIRSCALIARLDD
gacgtttacggtaccttggaagaattgaagttgttc






DVHTYKVEQERGDAPS
actcatgctattgagcgttgggaaactgtcgctgc






SVQCYMQQYDVSEEEA
taatgaattaccaaagtatatgcaagtttgttacttt






CNRIKGMVETAWMEIN
gctttgttagacgtctttaaggaaatggaagataaa






GEIQDTNHLPLQWLLPS
ttagtcaataaaggtttgttatactccatgccatgtg






LNLARMMVVLYQNGD
caaaggaggctgttaagggtttggttaaggccta






NYTNSSGKTKDRIASLL
cttcgttgaggctaagtggttccacgctaagtatgt






VDPLPM
cccaaccttcgaagaatatatggaaaactcaacta






(SEQ ID NO: 25)
tgtcctctggttatcctatgttggctgttgaagctttg







gttggtttagaagacatggccattacaaagagag







ctttggattgggcaatatccgttccaaaaattatca







gatcatgtgcattgatcgccagattggacgatgac







gttcacacttacaaggtcgaacaagagagaggtg







atgccccatcttctgtccaatgctacatgcaacaat







acgacgtctccgaagaagaagcatgtaatcgtatt







aagggtatggttgaaactgcttggatggaaatcaa







cggtgagatccaggataccaaccacttgccatta







caatggttgttgccatctttgaacttagctagaatg







atggtcgttttgtaccaaaatggtgacaactacac







caactcctccggtaaaaccaaggatagaattgcc







tctttgttggtcgaccctttgcctatgtaa







(SEQ ID NO: 91)





MacVolSQTS2251 | Macrostylis villosa | A0A067D5M4 | 65%

RDLKSVLSSKESTKAD
atgcgtgacttgaaatccgtcttatcttcaaaggaa

VNRRSSNYHPSIWGDH
tctacaaaggcagatgttaatagaagatcctctaa



FINVSSNEKYTNTEVEK
ctatcacccttccatctggggtgatcatttcattaac






RFETLKAEIEKLLVSNN
gtttcttcaaatgagaagtacactaacactgaagtc






TAWKTLEEIVAIVNQLQ
gaaaaaagatttgaaaccttgaaggccgaaatag






RLGLAYHFENQIKEAL
aaaagttgttagtttctaacaacaccgcttggaag






QSIYDSHVNGNCDVNY
accttggaggaaattgtcgctatcgttaatcagttg






DHNNDLYIVALRFRLL
caaagattagggttggcttaccacttcgaaaacca






RQHGYKVSADIFKKFK
aatcaaagaagccttgcaatccatttatgactctca






DEKGEFKDMIRNDARG
tgtcaacggtaattgcgacgttaattacgatcaca






LLCLYEASHLRVKGEDI
acaacgatttgtacatagtcgctttaagatttcgttt






LEEATEFSRKHLKSLLP
gttgagacaacacggttataaagtctctgctgaca






QLSTSLAEQVKHSLEIP
ttttcaagaagtttaaagatgaaaagggtgaattta






LHRGMPRLEARHYISIY
aggatatgatcagaaatgacgccagaggtttattg






EENNSSRNELLLELAKL
tgtttatacgaagcatcccatttgagagttaagggt






DFNLLQALHRRELGDIS
gaagatattttagaagaggctactgaattttctcgt






RWWKDIDFATKLPFAR
aagcacttgaagtcattgttaccacaattgtccaca






DRLVECYFWILGVYFEP
tcattggctgagcaagttaagcactctttggaaatc






KYSITRKFMTKVIAIAS
ccattacatagaggtatgccaagattggaagcta






VIDDIYDVYGTLEELKL
gacattacatttctatttatgaggaaaacaactcctc






FTHAIERWETVAANELP
tcgtaatgaattgttgttagagttggcaaagttgga






KYMQVCYFALLDVFKE
cttcaacttgttgcaggctttacacagaagagaatt






MEDKLVNKGLLYSMPC
gggtgatatttctcgttggtggaaagacatcgattt






AKEAVKGLVKAYFVEA
cgccactaaattgccattcgccagagacagatta






KWFHAKYVPTFEEYME
gttgaatgttacttctggatcttgggtgtttattttga






NSTMSSGYPMLAVEAL
acctaaatactccattactagaaaattcatgaccaa






VGLEDMAITKRALDWA
ggttatcgctatagcttctgtcatcgatgatatatac






ISVPKIIRSCALIARLDD
gacgtttacggtaccttggaagaattgaagttgttc






DVHTYKVEQERGDAPS
actcatgctattgagcgttgggaaactgtcgctgc






SVECYMQQYDVSEEEA
taatgaattaccaaagtatatgcaagtttgttacttt






CNRIKGMVEIEWMNIN
gctttgttagacgtctttaaggaaatggaagataaa






EEIQDPNHPPLQWLLPS
ttagtcaataaaggtttgttatactccatgccatgtg






LNLARMMVVLYQNGD
caaaggaggctgttaagggtttggttaaggccta






NYTNSSGKTKDRIASLL
cttcgttgaggctaagtggttccacgctaagtatgt






VDPLPM
cccaaccttcgaagaatatatggaaaactcaacta






(SEQ ID NO: 26)
tgtcctctggttatcctatgttggctgttgaagctttg







gttggtttagaagacatggccattacaaagagag







ctttggattgggcaatatccgttccaaaaattatca







gatcatgtgcattgatcgccagattggacgatgac







gttcacacttacaaggtcgaacaagagagaggtg







atgccccatcttctgtcgaatgctacatgcaacaat







acgacgtctccgaagaagaagcatgtaatcgtatt







aagggtatggttgagattgaatggatgaacataaa







cgaagaaatccaggatccaaaccacccaccttta







caatggttgttgccatctttgaacttagctagaatg







atggtcgttttgtaccaaaatggtgacaactacac







caactcctccggtaaaaccaaggatagaattgctt







ctttgttggtcgaccctttgccaatgtaa







(SEQ ID NO: 92)





MacVolSQTS2274 | Macrostylis villosa | A0A097ZIE0 | 38%

SFAVSASPAKFIQNVEK
atgtccttcgcagtttcagcctctcctgctaaatttat

DSTRRSANFHPSIWGDH
acagaatgtcgagaaggattctaccagacgttct



FLQYTCDSQEPDDDGS
gctaacttccacccatccatctggggtgaccatttt






VKHQQLKEEIRKMLTA
ttgcaatacacttgcgactcacaagaaccagatg






ETKLSQKLDLIDAIQRL
atgacgggtctgttaagcatcaacaattaaaggaa






GVAYHFESEIDEILGRV
gaaattagaaaaatgttgacagctgaaactaagtt






HQAYQESDLCVNENDG
gtcccagaagttagatttgattgacgccatccaaa






LYYISLQFRLLRENGYR
gattgggtgtcgcttatcacttcgaatctgaaatcg






ISADVFNKFRDIDGNFK
atgagattttaggtagagttcaccaagcttaccaa






PSLARNVRGMLSLYEA
gaatcagacttgtgtgtcaacgaaaatgacggttt






THLRVHGENILDEAHA
gtattacatttctttgcaattcagattattgcgtgaaa






FATSHLESIATHQISSPL
acggttacagaatatctgccgatgtctttaacaaat






AEQVKHALFQPIHKGV
tcagagatatcgatggtaattttaagccatccttgg






QRLEARNYMPFYQEEA
ctagaaacgttagaggtatgttatccttgtatgaag






SHNEALLTFAKLDFNK
ccacccatttgcgtgttcacggtgaaaacattttgg






LQKLHQKELSEITRWW
acgaagctcacgctttcgcaacttctcatttagaat






KELDFAHNLPFTIRDRI
ctattgccacccaccaaatctcttccccattggctg






AECYFWAVAVYFEPQY
agcaggtcaagcatgctttgttccaaccaattcac






SLGRRMLAKVFPMTSII
aaaggtgttcaaagattagaagcaagaaattacat






DDIYDVYGKFEELELFT
gcctttctatcaagaagaagcttcccacaacgag






SAIERWDISAIDELPEY
gctttgttaacatttgctaagttggactttaacaagtt






MKLCYRALLDVYSEAE
gcaaaagttgcatcagaaagaattgtctgaaatca






KDLASQGKLYHLHYAK
ctcgttggtggaaggaattagatttcgctcacaatt






EAMKNQVKNYFFEAK
tgccatttactattagagatagaatcgcagaatgtt






WCHQNYIPSVDEYMTV
acttctgggctgttgcagtttacttcgagccacaat






ASVTSGYPMLSTTSFVG
attccttaggtagacgtatgttggccaaagtttttcc






MGDIVTKESFEWSLTNP
tatgacctctataattgacgatatctacgacgtcta






RVIRASSVAARLMNDM
cggtaaattcgaagaattagaattgttcacctcag






VSHKFEQSREHVASSIE
ctatcgaaagatgggatatctctgctatcgatgagt






CYMKQYGATEEETCNE
taccagagtatatgaagttgtgttacagagccttgt






LRKQVSNAWKDINEEC
tagatgtctactctgaagccgaaaaggacttagca






LCPTAVPMPLIVRILNL
tcccaaggtaagttgtatcacttgcattacgccaaa






TRFLDVVYRFEDGYTH
gaagctatgaagaatcaggttaagaactactttttc






SGVVLKDFVASLLINPV
gaggctaagtggtgccatcaaaactatattccatc






SI
tgttgatgaatacatgaccgttgcttccgtcacttca






(SEQ ID NO: 27)
ggttacccaatgttgtccactacttcttttgtcggtat







gggtgatattgttacaaaggaatccttcgaatggt







ctttgaccaatcctagagttatcagagcttcctctgt







tgctgctagattaatgaatgacatggtctcacaca







agttcgaacaatctcgtgaacacgtcgcttcttcaa







tagaatgttacatgaaacaatacggtgcaactgag







gaagaaacctgtaacgagttgagaaaacaagttt







ctaacgcttggaaggatattaacgaagaatgtttat







gtccaacagccgtcccaatgcctttgatagtcaga







attttaaatttgactagattcttggacgttgtttatcgt







tttgaagacggttacacccattccggtgtcgtcttg







aaggactttgttgcctctttgttgattaacccagtttc







catctaa







(SEQ ID NO: 93)





OrbStiSQTS1368 | Orbexilum stipulatum | Q9T0J9 | 10%

ESQTTFKYESLAFTKLS
atggaatcacagactacattcaaatatgagtcttta

HCQWTDYFLSVPIDESE
gcatttaccaagttgtcccattgccaatggactgat

LDVITREIDILKPEVMEL
tacttcttgtctgttccaatagacgaatccgaattgg



LSSQGDDETSKRKVLLI
acgtcatcaccagagaaattgatattttaaagcctg






QLLLSLGLAFHFENEIK
aggttatggaattgttatcttcacaaggtgatgacg






NILEHAFRKIDDITGDE
aaacatctaagcgtaaagtcttgttgatccaattgtt






KDLSTISIMFRVFRTYG
gttatctttgggattagcctttcacttcgaaaacga






HNLPAEVFERFKDQHG
gattaagaatatcttggaacacgctttcagaaaga






NFKASLSSDVEGMLSL
ttgatgacatcactggtgacgaaaaggatttgtcc






YEASFLDYEGEDILDEA
accatttccataatgtttagagttttcagaacttacg






KAFTSFHLRGALAGGT
gtcataacttgccagctgaagtctttgaaagattca






CRPHILRLIRNTLYLPQR
aagaccaacacggtaatttcaaagcttctttgtcat






WNMEAVIAREYISFYE
ccgatgttgaaggtatgttgtctttatacgaagcct






QEEDHDKMLLRLAKLN
ctttcttggactatgaaggtgaagatattttagatga






FKLLQLHYIKELKSFIK
agctaaggcctttacttcttttcatttgcgtggtgctt






WWMELGLTSKWPSQF
tggctggtggtacctgtagacctcacatcttaagat






RERIVEAWLAGLMMYF
tgatcagaaacactttatacttgccacaaagatgg






EPQFSGGRVIAAKFNYL
aacatggaggccgtcatagctcgtgaatatatctc






LTILDDACDHYFSIHEL
cttttacgaacaagaggaagaccacgataagatg






TRLVACVERWSPDGID
ttattgagattagctaagttgaatttcaagttgttaca






TLEDISRSVFKLMLDVF
gttgcattacattaaggaattgaaatcattcatcaa






DDIGKGVRSEGSSYHL
gtggtggatggaattgggtttaacatctaaatggc






KEMLEELNTLVRANLD
catctcaatttagagagcgtattgttgaagcttggtt






LVKWARGIQVPSFEEH
agctggtttgatgatgtacttcgaaccacaattctc






VEVGGIALTSYATLMY
cggtggtagagttattgcagccaagtttaactattt






SFVGMGETAGKEAYE
gttaaccattttggatgatgcttgtgatcactatttct






WVRSRPRLIKSLAAKG
caatccatgaattgactagattggtcgcttgtgttg






RLMDDITDFDSDMSNG
aaagatggtctccagacggtatcgataccttgga






FAANAINYYMKQFVVT
ggacatctcccgttctgtctttaagttaatgttggat






KEEAILECQRMIVDINK
gtttttgacgatattggtaaaggtgttagatccgaa






TINEELLKTTSVPGRVL
ggttcttcctaccacttgaaagaaatgttggaaga






KQALNFGRLLELLYTK
attaaataccttagttagagcaaacttggacttggtt






SDDIYNCSEGKLKEYIV
aaatgggccagaggtatccaagtcccatctttcga






TLLIDPIRL
agagcatgttgaggttggtggtattgctttaacatc






(SEQ ID NO: 28)
ctacgcaactttgatgtactctttcgtcggaatggg







tgaaactgctggtaaggaagcatacgaatgggtt







cgttcaagacctcgtttgataaagtctttggccgct







aagggtagattgatggacgacatcactgattttga







ttccgatatgtctaacggtttcgctgctaacgcaatt







aactattacatgaagcagttcgtcgttacaaagga







agaagccatcttagaatgccaaagaatgattgtcg







acatcaataagaccatcaatgaagagttgttaaaa







actacctctgttccaggtagagtcttgaaacaagc







tttgaacttcggtagattattggaattgttgtatacta







agtccgacgacatttacaactgttctgaaggtaaa







ttaaaggaatacatagttactttgttgattgatccaa







taagattgtaa







(SEQ ID NO: 94)





OrbStiSQTS1414 | Orbexilum stipulatum | A0A067FTE8 | 43%

SIQVPQISSQNAKSQVM
atgtccatacaggttccccaaatttcttcgcaaaat

RRTANFHPSVWGDRFA
gcaaagtcacaagtaatgcgtagaaccgccaact

NYTAEDKMNHARDLK
ttcatccatctgtgtggggagacagattcgctaact



ELKALKEEVGRKLLAT
acacggctgaggataaaatgaaccacgctcgcg






AGPIVKLELVDDVKRL
acttgaaggaacttaaagcgttaaaggaagaagt






GIGYRFEKEIVEALHRC
tggtagaaagctgttggccacagctggcccaatt






FISSERFTHRNLHQTAL
gttaagctagagttggtcgatgatgtcaaaagact






SFRLLRECGYDVTCDK
cgggatcggttatagattcgaaaaggaaatcgttg






FNKFTNKEGKFNSKLG
aagctttacaccgttgctttattagttccgaaagatt






ENIKGMIDLYEASQLGI
cactcataggaatttgcaccaaaccgccttgagct






AGEYILAEAGEFSGLVL
tcagattgttacgggaatgtggttacgacgtcactt






KEKVACINNNPLKAQV
gtgataagtttaataagttcactaacaaagagggt






RHALRQPLHRGLPRLE
aagtttaactcaaagttgggtgaaaatatcaaggg






HRRYISIYQDDASHYKA
tatgatagacttgtatgaagctagccaacttggtat






LLTLAKLDFNLVQSLH
tgctggtgaatacatcttggctgaagcaggtgaat






KKELCEISRWWKDLDF
tttcgggcttagttctaaaagaaaaggttgcttgtat






ARKLPFARDRMVECYF
taacaataacccattgaaagcgcaggtcagacat






WILGVYFEPQYSVPRRT
gccctaagacaacctctgcacagaggtctcccaa






TTKVIGLCSVIDDMYD
gattagaacacaggagatacatctctatttaccaa






AYGTIDELELFTNAIER
gatgacgcttctcactataaggctttgttgaccctg






LDTSTMDQLPEYMQTF
gccaagttggatttcaacttggttcaatccctccat






FGALLDLYNEIEKEIAN
aagaaagagctttgcgaaatttccagatggtgga






EGWSYRVQYAKEAMK
aagatcttgacttcgctcggaagttaccttttgcac






ILVEGYYDESRWLKCN
gtgaccgtatggtcgaatgttatttctggatcttgg






HAPTMEEYMKVRGVSS
gagtttacttcgaaccacaatacagtgtaccaaga






GYPLLITISFIGMEDTTE
agaactaccactaaggttattggtttgtgttctgtca






EILTWATSEPMIIRASVI
tcgatgatatgtacgatgcttacggtacaattgacg






VCRLMDDIKSHKFEQE
aattagagctttttactaacgccatcgaaagattgg






RGHAASAVECYMKQH
acacctctactatggatcagctaccagaatatatg






GLSEQEVCEELYRQVS
caaactttctttggtgctttattggatttgtataacga






NAWKDINEECLNPTAV
gatcgaaaaagaaatcgcaaatgaaggttggtcc






PMPLLMRALNLARVID
taccgagtgcaatacgctaaggaagctatgaaaa






VVYKEGDGYTHVGNE
ttttggtggaaggatactatgatgaaagcagatgg






MKQNVAALLIDQVPI
ttgaagtgtaaccacgccccaaccatggaagaat






(SEQ ID NO: 29)
acatgaaggtccgtggtgttagttctggttaccctc







tcttgataaccatatctttcataggtatggaggaca







ctactgaagagatcttaacatgggctacatctgaa







cctatgattatcagagccagtgtcattgtttgtagat







tgatggacgacattaaatcccataagtttgagcaa







gagagggggcatgctgcgagcgctgtagaatgc







tatatgaagcaacacggtctatcagaacaagaag







tttgtgaagaactttacagacaggtctctaatgcat







ggaaggacatcaatgaagaatgtttgaacccgac







cgctgttccaatgccattgttaatgagagcgctga







acttggctcgcgtcattgacgtagtttataaagaag







gtgacggctacacccacgttggtaatgaaatgaa







gcaaaacgtagctgctctcctaatcgatcaagtac







caatctaa







(SEQ ID NO: 95)





ShoCusSQTS154 | Shorea cuspidata | ShoBecSQTS1 | 38%

ALQDSEVPSSILNATAG
atggctttgcaggattcagaagtcccttcttccatat

NRPTASYHPTLWGEKF
taaacgccactgctggtaatcgtccaaccgcatct



LVVSTQSTSGSMKNEPT
taccatccaacattgtggggagagaaattcttagtt






TQGEYDELKQQVTKML
gtttccactcaatctacctctggttccatgaagaac






TDATTNDPSKKLHLID
gaaccaactacacaaggtgaatatgacgaattga






MVQRLGIAYHFEIEIEN
agcaacaagtcaccaagatgttgactgatgctac






ALEKINLGDANYFEYD
cactaacgacccatccaaaaagttgcacttgatcg






LYTIALGFRLLRQQGIK
atatggttcaaagattaggtattgcctaccactttga






VSSEIFKKFMDEKGKFK
gattgaaatcgaaaatgctttggaaaagattaactt






EDVVNDVLGMLNLYE
aggtgacgctaactacttcgaatatgacttgtaca






AAHLRLRGEDILDEAL
ccatcgctttgggttttagattgttgagacaacagg






AFTTSHLESMATKVSPL
gtattaaagtctcatctgaaatcttcaagaagtttat






LAEQIAHALNCPIQKGL
ggatgagaaaggtaagttcaaagaagacgttgtt






PRIEARHYISLYSRETHF
aatgatgtcttaggtatgttgaacttatacgaagca






ASSNAALLRFAKIDFN
gcccatttgagattaagaggtgaagatatcttgga






MVQALHQKEISGITKW
cgaggctttagccttcactacctcccacttggaatc






WKNLDFSTKLPYARDR
tatggctacaaaggtttctcctttgttggctgaaca






IVECYFWIMGAYFEPK
aatagcccatgctttaaattgcccaattcaaaagg






YSLARTFLTKVIAMTSI
gtttaccaagaattgaagccagacactatatctcat






LDDTYDNYGTNKELEL
tgtactcccgtgaaactcactttgcttcttctaacgc






LTKCIERWDIDVIDQLP
tgcattgttgagattcgctaaaattgacttcaacatg






EYMKLVYQALLNVYSE
gttcaagctttgcaccagaaggagatctctggtatt






MEAKVAKEGRSYAIDY
acaaagtggtggaaaaatttggatttctcaactaa






AKESMKKTMKAYLDE
gttgccatacgctagagacagaatcgtcgaatgtt






AKWRQEDYVPPIEEYM
atttttggatcatgggtgcttactttgaacctaagtat






QVARISSAYPMLITNSF
tccttggctagaacttttttgaccaaggttatagcaa






VGMGEVATKEAFDWIS
tgacctctatattagatgatacatacgataactacg






NDPKILKASTTICRLMD
gtactaataaggaattggagttgttaactaaatgta






DITSHEFEQTRDHVASG
ttgaacgttgggacatcgacgttattgatcaattac






VECYMKQYGVSREETV
cagaatatatgaagttggtctaccaagcattgttga






KLFREDVANAWKDINE
acgtttactcagaaatggaagccaaagtcgctaa






GFMKPAIFPMPILTVVL
ggagggtcgttcttacgccattgactatgctaagg






NFARVMDFLYKDGDN
aatccatgaaaaagaccatgaaggcatacttgga






YTNSHMLKDYITSLLV
tgaagctaaatggagacaagaagactacgttcct






NPLLI
ccaatagaagaatatatgcaagtcgctagaatttc






(SEQ ID NO: 30)
ctctgcctacccaatgttaatcactaattccttcgtt







ggtatgggtgaagttgctaccaaagaggcattcg







attggatttccaatgacccaaagattttgaaggctt







ctactactatatgtagattgatggatgatatcacttc







tcatgaatttgaacaaacaagagaccatgttgcct







ctggtgtcgaatgttatatgaaacaatacggtgttt







cacgtgaagaaaccgttaagttattcagagagga







tgtcgctaacgcttggaaagacattaacgagggtt







tcatgaagcctgctatattcccaatgccaatcttga







ctgttgttttgaactttgccagagtcatggatttctta







tacaaggatggtgacaactatactaattctcatatg







ttgaaggattacattacatcattgttggtcaatccat







tattaatctaa







(SEQ ID NO: 96)





ShoCusSQTS155 | Shorea cuspidata | ShoBecSQTS1 | 35%

ALQDSEVPSSILNATAG
atggcattgcaggattctgaagtcccttcctcaata

NRPTASYHPTLWGEKF
ttaaacgccaccgctggtaatagaccaactgcttc



LVVSTQSTSGSMKNEPT
ttatcacccaacattgtggggagagaagttcttgg






TQGEYDELKQQVTKML
ttgtttccactcaatctacctcaggttctatgaaaaa






TDATTNDPSKKLHLID
cgaaccaaccactcaaggtgaatacgacgaatta






MVQRLGIAYHFEIEIEN
aagcaacaagtcacaaagatgttgactgatgcca






ALEKINLGDANYFEYD
ctactaatgacccatccaaaaagttgcatttaatcg






LYTIALGFRLLRQQGIK
atatggttcaacgtttgggtattgcttaccactttga






VSSEIFKKFMDEKGKFK
aattgagatcgaaaacgctttggaaaaaataaact






EDVVNDVLGMLNLYE
taggtgacgctaattatttcgaatacgatttgtacac






AAHLRLRGEDILDEAL
cattgctttaggttttagattgttgagacaacaaggt






AFTTSHLESMATKVSPL
atcaaggtctcttctgagattttcaagaaatttatgg






LAEQIAHALNCPIQKGL
acgaaaagggtaagttcaaagaagatgttgtcaa






PRIEARHYISLYSRETHF
cgatgttttgggtatgttgaacttgtacgaagcagc






ASSNAALLRFAKIDFN
tcatttaagattaagaggtgaagacatcttggacg






MVQALHQKEISGITKW
aagccttggccttcacaacctcccacttagagtca






WKNLDFSTKLPYARDR
atggctactaaggtctctcctttgttggctgaacaa






IVECYFWIMGAYFEPK
attgcccatgctttgaactgcccaatccaaaaggg






YSLARTFLTKVIAMTSI
tttaccacgtattgaagcaagacactatatttctttat






LDDTYDNYGTNKELEL
actccagagaaactcacttcgcttcctctaatgctg






LTKCIERWDIDVIDQLP
ctttgttgagatttgctaagatcgatttcaatatggtt






EYMKLVYQALLNVYSE
caagccttgcatcagaaggaaatatcaggtataa






MEAKVAKEGRSYAIDY
ccaaatggtggaagaacttggacttttccactaaa






AKESMKKTMKAYLDE
ttaccatatgctagagatcgtattgttgaatgttactt






AKWRQEDYVPTIEEYM
ctggatcatgggtgcttactttgaaccaaagtattc






QVALISSAYPMLITNSF
tttagcaagaacattcttgaccaaagtcattgcaat






VGMGEVATKEAFDWIS
gacctctatcttagacgatacttacgacaactacg






NNPKMLKASTIICRLMD
gtactaacaaggaattggagttgttgactaagtgt






DITSHEFEQTRDHVASG
atcgaaagatgggatattgatgttatcgaccagtta






VECYMKQYGVSREETV
cctgagtatatgaagttggtttatcaagctttgttaa






KLFREDVANAWKDINE
atgtttactctgaaatggaagctaaggtcgccaaa






GFMKPAIFPMPILTVVL
gaaggtcgttcctacgccattgactacgcaaaag






NFARVMDFLYKDGDN
aatctatgaagaaaaccatgaaagcctacttgga






YTNSHMLKDYITSLLV
cgaggctaagtggagacaagaagattacgtccct






NPLLI
accattgaagaatatatgcaagttgcattaatatca






(SEQ ID NO: 31)
tccgcttatccaatgttgattacaaactcattcgtcg







gtatgggtgaggtcgctactaaggaagcttttgac







tggatctccaataacccaaagatgttgaaggcttc







tactattatatgtagattgatggatgatatcacttcc







catgaatttgaacagaccagagaccacgttgcct







ctggtgttgaatgttacatgaaacaatacggtgtct







ccagagaagaaaccgttaagttgttcagagaaga







tgttgctaacgcttggaaggacatcaatgaaggtt







tcatgaagccagcaatcttcccaatgcctatcttga







ctgttgtcttgaattttgccagagttatggactttttgt







acaaggatggtgataactatactaactctcatatgt







taaaagactacattacctcattattggttaatccatt







attgatttaa







(SEQ ID NO: 97)





ShoCusSQTS156 | Shorea cuspidata | ShoBecSQTS1 | 36%

ALQDSEVPSSILNATAG
atggctttacaggactccgaggttccttcatctatat

NRPTASYHPTLWGEKF
tgaacgccaccgctggtaatcgtccaactgcatct



LVVSTQSTSGSMKNEPT
tatcatccaacattgtggggtgaaaaattcttggtc






TQGEYDELKQQVTKML
gtttctactcaatccacctctgggtccatgaagaac






TDATTNDPSKKLHLID
gaaccaactacccaaggtgaatacgatgaattaa






MVQRLGIAYHFEIEIEN
agcaacaagtcacaaagatgttgactgatgctac






ALEKINLGDANYFEYD
cactaatgacccatctaaaaagttgcacttgattga






LYTIALGFRLLRQQGIK
catggttcaaagattaggtatcgcctaccactttga






VSSEIFKKFMDEKGKFK
aattgagatcgaaaacgctttggaaaagattaact






EDVVNDVLGMLNLYE
taggtgatgctaattatttcgaatacgatttgtacac






AAHLRLRGEDILDEAL
tatagccttgggttttagattattgagacaacaggg






AFTTSHLESMATKVSPL
tatcaaggtttcatctgaaatcttcaaaaagttcatg






LAEQIAHALNCPIQKGL
gacgagaaaggtaagtttaaggaagacgtcgtta






PRIEARHYISLYSRETHF
acgatgtcttgggtatgttaaacttgtatgaagctg






ASSNAALLRFAKIDFN
cccatttgagattgcgtggtgaagacattttagatg






MVQALHQKEISGITKW
aggctttggcttttaccacatcccacttagaatcaa






WKNLDFSTKLPYARDR
tggcaactaaggtttcacctttgttggctgaacaaa






IVECYFWIMGAYFEPK
tcgcccacgctttaaattgcccaattcaaaaaggtt






YSLARTFLTKVIAMTSI
tgccaagaatagaagccagacattacatttctttgt






LDDTYDNYGTNKELEL
actccagagaaacccacttcgcttcttctaacgca






LTKCIERWDIDVIDQLP
gcattgttgcgtttcgctaagatcgactttaatatgg






EYMKLVYQALLNVYSE
ttcaagcattgcatcagaaagagatttccggtatta






MEAKVAKEGRSYAIDY
ctaagtggtggaagaatttagatttctctacaaaatt






AKESMKKTMKAYLDE
gccatatgctagagatagaatcgtcgaatgttactt






AKWRQEDYVPPIEEYM
ctggattatgggtgcttattttgaaccaaagtactct






QVARISSGYPMLITNSL
ttggccagaacctttttaaccaaagtcattgctatg






VGMGEVATKEAFDLIS
acttctatcttagatgacacatacgacaattacggt






NDPKMLKASTTICRLM
actaacaaggaattggaattgttaaccaagtgtatt






DDITSHEFEQTRDHVAS
gaaagatgggatatagatgttatcgatcaattgcct






GVECYMKQYGVSREET
gaatacatgaagttagtttatcaagctttgttgaac






VKLFREDVANAWKDIN
gtctactccgaaatggaggctaaggtcgctaagg






EGFMKPAIFPMPILTVV
aaggtcgttcctatgccatcgattacgctaaggaa






LNFARVMDFLYKDGD
tccatgaaaaagactatgaaagcctatttggacga






NYTNSHMLKDYITSLL
agctaagtggagacaagaggactacgttccacct






VNPLLI
atcgaagagtacatgcaagttgcaagaatttcttc






(SEQ ID NO: 32)
cggttatccaatgttaattaccaactccttggttggt







atgggtgaagtcgccactaaagaagccttcgattt







gatttctaacgacccaaaaatgttgaaggcttcca







ccactatatgtagattgatggacgatatcacttctc







acgaatttgaacaaactagagatcacgtcgcttca







ggtgttgaatgttatatgaagcaatacggtgtttctc







gtgaggaaaccgttaagttattcagagaagacgt







cgctaacgcatggaaggacattaatgagggtttc







atgaagccagcaatctttccaatgccaatcttgact







gtcgtcttaaacttcgctagagttatggactttttgta







caaagatggtgataattacacaaactctcatatgtt







aaaggattacatcacttcattgttggtcaaccctttg







ttgatttaa







(SEQ ID NO: 98)





ShoCusSQTS157 | Shorea cuspidata | ShoBecSQTS1 | 38%

ALQDSEVPSSILNATAG
atggccttacaggactccgaagttccatcatctatt

NRPTASYHPTLWGEKF
ttgaacgctactgctggtaatagacctacagcatc



LVVSTQSTSGSMKNEPT
ttaccatccaaccttgtggggagagaagtttttggt






TQGEYDELKQQVTKML
cgtttccactcaatctacctccggttctatgaaaaa






TDATTNDPSKKLHLID
cgaaccaactacacaaggtgaatatgatgaatta






MVQRLGIAYHFEIEIEN
aagcaacaagtcaccaagatgttgactgatgcta






ALEKINLGDANYFEYD
ctaccaacgacccatctaaaaagttgcacttaata






LYTIALGFRLLRQQGIK
gatatggttcaacgtttgggtatcgcctaccacttc






VSSEIFKKFMDEKGKFK
gagattgaaatcgaaaatgctttagaaaaaattaa






EDVVNDVLGMLNLYE
cttgggtgacgctaactacttcgaatatgatttgta






AAHLRLRGEDILDEAL
cactatcgcattaggttttagattgttgagacaaca






AFTTSHLESMATKVSPL
gggtattaaggtctcctcagaaattttcaagaagtt






LAEQIAHALNCPIQKGL
catggatgaaaaaggtaagtttaaggaggacgtt






PRIEARHYISLYSRETHF
gtcaatgacgttttaggtatgttgaacttgtatgaag






ASSNAALLRFAKIDFN
ctgctcatttacgtttgagaggtgaagatatcttgg






MVQALHQKEISGITKW
acgaagccttggctttcactacatcacacttggaat






WKNLDFSTKLPYARDR
ctatggctaccaaggtttccccattgttggccgag






IVECYFWIMGAYFEPK
caaatagcacatgccttaaattgtcctattcaaaaa






YSLARTFLTKVIAMTSI
ggtttgccaagaatcgaagctagacactacatctc






LDDTYDNYGTNKELEL
tttatactctcgtgaaactcactttgcttcctctaacg






LTKCIERWDIDVIDQLP
ctgccttgttgagattcgctaagattgattttaatatg






EYMKLVYQALLNVYSE
gttcaagccttgcaccagaaagaaatctctggtat






MEAKVAKEGRSYAIDY
caccaagtggtggaagaatttggacttctccacca






AKESMKKTMKAYLDE
agttgccatatgctagagacagaattgtcgaatgc






AKWRQEDYVPPMDEY
tacttctggataatgggtgcatattttgaacctaagt






MQVALISCGYPMLITNS
actctttagctagaacttttttgactaaagttattgct






FVGMGEVATKEAFDWI
atgacatcaattttggatgatacttacgataactac






SNDPKILKASTTICRLM
ggtactaacaaagaattagaattattgaccaagtg






DDITSHEFEQTRDHVAS
tatcgagagatgggacattgacgtcattgaccaat






GVECYMKQYGVSREET
taccagaatacatgaagttggtttatcaagctttgtt






VKLFREDVANAWKDIN
gaacgtctactccgagatggaagcaaaggttgcc






EGFMKPAIFPMPILTVV
aaggaaggtcgttcttatgctatagattatgctaaa






LNFARVMDFLYKDGD
gaatctatgaaaaagacaatgaaggcatacttgg






NYTNSHMLKDYITSLL
acgaagctaagtggagacaagaggattatgttcc






VNPLLI
tccaatggatgaatacatgcaagttgctttgatatc






(SEQ ID NO: 33)
ctgtggttacccaatgttgatcaccaactctttcgtt







ggtatgggtgaagtcgctaccaaagaagcctttg







attggatctctaatgacccaaagattttgaaagcat







ctaccactatctgtagattaatggatgacattacct







cccatgagttcgaacagacaagagatcacgttgc







ttcaggtgtcgaatgttatatgaagcaatacggtgt







ttctcgtgaagaaactgttaaattattcagagagga







tgttgctaacgcttggaaagacattaatgaaggttt







catgaagcctgctattttcccaatgccaattttgac







cgtcgtcttgaatttcgctagagtcatggattttttat







acaaggacggtgataactacacaaactcacatat







gttgaaagattacatcacttcattattagttaatccat







tgttgatataa







(SEQ ID NO: 99)





ShoCusSQTS160 | Shorea cuspidata | ShoBecSQTS1 | 36%

ALQDSEVPSSILNATAG
atggcattacaggattcagaggtcccatcctctatt

NRPTASYHPTLWGEKF
ttgaacgctactgccggtaatcgtcctaccgcttct



LVVSTQSTSGSMKNEPT
taccacccaacattgtggggtgaaaagtttttagtt






TQGEYDELKQQVTKML
gtttccactcaatctacctccggctctatgaaaaac






TDATTNDPSKKLHLID
gaaccaaccactcaaggtgaatatgacgaattga






MVQRLGIAYHFEIEIEN
agcaacaagtcactaagatgttgacagatgctact






ALEKINLGDANYFEYD
accaatgacccatctaaaaagttgcatttgatagat






LYTIALGFRLLRQQGIK
atggttcaaagattgggtattgcctaccacttcgaa






VSSEIFKKFMDEKGKFK
atcgaaatcgaaaacgctttagaaaagattaattta






EDVVNDVLGMLNLYE
ggtgacgctaactatttcgaatacgatttatacaca






AAHLRLRGEDILDEAL
atcgctttgggttttagattgttgagacagcaaggt






AFTTSHLESMATKVSPL
atcaaggtctcttcagagattttcaaaaagttcatg






LAEQIAHALNCPIQKGL
gatgagaaaggtaagtttaaggaagacgttgtca






PRIEARHYISLYSRETHF
acgacgttttgggtatgttgaatttatatgaagcag






ASSNAALLRFAKIDFN
cccatttgagattgcgtggtgaagatatattggac






MVQALHQKEISGITKW
gaggctttagctttcactacctcccacttggaatct






WKNLDFATMLPYARD
atggcaaccaaagtttccccattgttagctgaaca






RIVECYFWIMGVYFEPK
aattgcccacgctttgaactgtcctatccaaaagg






YSLARTFLTKVIAMTSI
gtttgccaagaattgaagccagacattacatatctt






LDDTYDNYGTNKELEL
tgtattcaagagaaactcacttcgcttcttccaatg






LTKCIERWDIDVIDQLP
ctgctttattaagatttgctaagatcgattttaacatg






EYMKLVYQALLNVYSE
gtccaagctttgcatcaaaaagagatctctggtatt






MEAKVAKEGRSYAIDY
acaaagtggtggaagaacttggacttcgctactat






AKESMKKTMKAYLDE
gttaccatacgccagagatcgtattgttgaatgcta






AKWRQEDYVPTIEEYM
cttctggatcatgggtgtttattttgaaccaaagtac






QVALISSAYPMLITNSF
tccttagctagaaccttcttgaccaaagttattgca






VGMGEVATKEAFDWIS
atgacttctattttagacgatacatacgacaactac






NNPKMLKASTIICRLMD
ggtactaataaggaattggaattgttgactaaatgt






DITSHEFEQTRDHVASG
attgaaagatgggacatcgatgtcattgatcaattg






VECYMKQYGVSREETV
cctgagtatatgaagttggtttatcaggcattattga






KLFREDVANAWKDINE
acgtctactcagaaatggaagctaaggttgccaa






GFMKPAIFPMPILTVVL
agagggtagatcctacgctattgattacgccaaa






NFARVMDFLYKDGDN
gaatctatgaagaagaccatgaaggcctatttgg






YTNSHMLKDYITSLLV
acgaagctaagtggagacaagaagactacgtcc






NPLLI
ctaccatcgaagaatatatgcaagtcgctttaatat






(SEQ ID NO: 34)
cttcagcctacccaatgttaataactaattcatttgt







cggtatgggtgaggttgccactaaggaagcttttg







attggatctctaacaacccaaaaatgttaaaggctt







ccactattatttgtagattgatggatgacatcacctc







ccacgaatttgaacagacccgtgaccacgttgcc







tctggtgttgaatgttatatgaagcaatacggtgttt







cacgtgaggaaaccgtcaagttgttcagagaaga







tgttgctaatgcttggaaagacatcaatgagggttt







catgaagccagcaatcttcccaatgccaattttga







ctgtcgttttgaacttcgcaagagttatggatttctta







tataaggacggcgacaattacactaactctcatat







gttgaaagactacatcacttctttgttggtcaaccc







attgttaatataa







(SEQ ID NO: 100)





ShoCusSQTS161 | Shorea cuspidata | ShoBecSQTS1 | 37%

ALQDSEVPSSILNATAG
atggctttgcaagactctgaagtcccttcctcaattt

NRPTASYHPTLWGEKF
taaacgcaaccgctggtaatagaccaacagcctc



LVVSTQSTSGSMKNEPT
ttaccatccaactttgtggggtgagaaatttttggtt






TQGEYDELKQQVTKML
gtttccactcagtctacctcaggttctatgaagaac






TDATTNDPSKKLHLID
gaaccaactacccaaggtgaatatgatgaattga






MVQRLGIAYHFEIEIEN
agcaacaagtcactaagatgttaacagatgctact






ALEKINLGDANYFEYD
accaatgacccatccaaaaagttgcacttgataga






LYTIALGFRLLRQQGIK
tatggttcaacgtttgggtatcgcctaccacttcga






VSSEIFKKFMDEKGKFK
aatcgagattgaaaacgctttagagaaaatcaact






EDVVNDVLGMLNLYE
tgggcgacgctaattacttcgaatatgatttataca






AAHLRLRGEDILDEAL
ccattgccttaggttttagattgttgagacaacaag






AFTTSHLESMATKVSPL
gtattaaggtttcttccgaaattttcaagaagtttatg






LAEQIAHALNCPIQKGL
gatgaaaaaggtaagttcaaggaagacgtcgtta






PRIEARHYISLYSRETHF
acgacgttttaggtatgttgaacttgtatgaagctg






ASSNAALLRFAKIDFN
cccatttaagattgcgtggtgaagatatcttggatg






MVQALHQKEISGITKW
aagctttagcattcacaacctctcacttggaatctat






WKNLDFATMLPYARD
ggctactaaagtctctccattgttagctgagcagat






RIVECYFWIMGVYFEPK
cgcccacgctttgaattgccctatccaaaagggtt






YSLARTFLTKVIAMTSI
tgccaagaatagaagcaagacattacatttccttgt






LDDTYDNYGTNKELEL
actcaagagaaacacacttcgcttcctctaacgct






LTKCIERWDIDVIDQLP
gctttgttaagatttgctaaaattgactttaatatggt






EYMKLVYQALLNVYSE
tcaagccttacatcaaaaggagatttctggtatcac






MEAKVAKEGRSYAIDY
caagtggtggaagaacttggacttcgcaactatgt






AKESMKKTMKAYLDE
tgccatacgcaagagaccgtattgttgaatgttatt






AKWRQEDYVPPIEEYM
tctggatcatgggtgtctacttcgaacctaagtact






QVARISSGYPMLITNSL
cattggctagaacttttttaactaaagtcatagccat






VGMGEVATKEAFDLIS
gacctccattttggatgacacctacgataactatg






NDPKMLKASTTICRLM
gtactaacaaggaattagagttgttaacaaagtgt






DDITSHEFEQTRDHVAS
atagaaagatgggacattgatgtcatcgatcaatt






GVECYMKQYGVSREET
gcctgaatacatgaagttggtttaccaggctttgtt






VKLFREDVANAWKDIN
aaatgtctactcagaaatggaagctaaggttgcta






EGFMKPAIFPMPILTVV
aagaaggtcgttcttatgcaattgattacgcaaag






LNFARVMDFLYKDGD
gagtctatgaagaaaactatgaaagcttatttgga






NYTNSHMLKDYITSLL
cgaagctaaatggagacaagaagactatgttcca






VNPLLI
ccaatcgaagaatatatgcaagtcgctagaatctc






(SEQ ID NO: 35)
ttccggttacccaatgttgattactaactcattagtc







ggtatgggtgaggttgccactaaggaagctttcg







acttgatttctaatgatccaaagatgttaaaagcct







ccactacaatctgtagattgatggacgacattactt







ctcatgaatttgaacagacacgtgatcacgttgcc







tctggtgtcgagtgctatatgaagcaatacggtgtt







tccagagaagaaaccgtcaagttgtttagagaag







acgttgctaacgcttggaaggatatcaatgaagg







cttcatgaaaccagcaatctttccaatgccaattttg







accgttgttttgaacttcgctagagtcatggacttct







tgtataaggatggcgacaactacactaattcacat







atgttgaaagattacataacttcattattagttaacc







ctttattgatctaa







(SEQ ID NO: 101)





WenAngSQTS1007 | Wendlandia angustifolia | A0A068UHT0 | 81%

ASAQASLPSNNRQETV
atggcctcagcacaagcttccttaccttctaataac

RPLADFPENIWADRIAP
agacaggaaacagtccgtccattggctgacttcc

FTLDKQEYEMCQREIE
cagagaacatctgggctgatagaattgccccattt



MLKAEVASMLLATGKT
accttggataagcaagaatacgaaatgtgtcaaa






MMQRFDFIDKIERLGVS
gagaaatagagatgttaaaagctgaagttgcttct






HHFDIEIENQLQEFFNV
atgttgttggcaactggtaagactatgatgcaaag






YTNLGEYSAYDLSSAA
attcgacttcattgataagatcgaaagattggggg






LQFRLFRQHGFNISCGIF
tctcccaccattttgacattgaaatcgaaaatcaatt






DQFIDAKGKFKESLCN
gcaagagtttttcaacgtttataccaacttaggtga






DIRGLLSLYEAAHVRTH
atactctgcctatgatttgtcatctgctgccttgcag






GDKILEEALAFTTTHMT
ttccgtttatttagacaacacggtttcaatatttcctg






SGGPHLDSSLAKQVKY
cggtattttcgaccaatttatcgacgctaaaggtaa






ALEQPLHKGILRYEAW
gttcaaggaatctttatgtaacgatatcagaggttt






RYISIYEEDESNNKLLL
gttgtctttgtacgaagctgctcatgttagaactca






RLAKLDYHLLQMSYKQ
cggtgataaaattttggaagaagctttagctttcac






ELCEITRWGKGLESVSN
cactactcacatgacctccggtggtccacatttag






FPYARDRFVECYFWAV
attcttcattggccaagcaagttaaatacgcattgg






GTLYEPQYSLARMTFA
aacagccattgcataagggtatattgagatatgaa






KVAALITMIDDIYDAYG
gcttggagatacatatctatctacgaagaggacg






TLDELQILTDSAERWD
aatccaacaataagttattattgcgtttggctaagtt






GSGVDQLSDYIRASYN
ggactatcacttgttacaaatgtcatacaagcaag






TLLKFNKEVGEDLAKK
agttgtgtgaaattacaagatggggtaaaggtttg






QRTYAFDKYIEDWKQY
gaatctgtctccaactttccttatgcccgtgacaga






MRTNFSQSRWFFTKEL
ttcgttgaatgttacttttgggctgtcggtactttgta






PSFADYINNGAITIGAY
cgaaccacaatactcattggctagaatgaccttcg






LVASAAFLYMDSAKED
ctaaggttgctgctttaattactatgatcgatgatatt






VINWMSTNPKLVVAYS
tatgatgcctacggtaccttggacgaattgcaaat






THSRLINDFGGHKFEKE
attaactgactctgccgaaagatgggatggttccg






RGSSTAIECYMKDHNV
gtgtcgatcagttgtctgactatattagagcttccta






SEEEAANKFREMMEDA
taatacattattgaaatttaataaggaggttggtga






WKVMNEECLRPTTIPR
agatttggcaaaaaagcaacgtacctacgctttcg






DGLKMLLNIARVGETV
acaagtacatcgaagattggaaacaatacatgag






YKHRIDGFTQPHAIEEH
aaccaacttctctcaatcaagatggtttttcactaag






IRAMLVDFMSI
gagttgccatctttcgctgattacattaacaacggt






(SEQ ID NO: 36)
gccatcacaatcggtgcatatttggttgcctctgct







gctttcttatatatggactccgcaaaagaagatgtt







atcaactggatgtccacaaaccctaagttggtcgt







tgcttactccactcactctcgtttaattaatgactttg







gtggtcacaagttcgaaaaggagagaggttcctc







tactgctattgaatgctacatgaaggaccataatgt







ctccgaagaagaagccgcaaacaagtttagaga







aatgatggaggacgcttggaaggttatgaatgaa







gaatgtttaagaccaactaccatccctagagacg







ggttgaagatgttgttaaacatagccagagtcggt







gaaactgtttacaagcatagaatcgatggttttacc







caaccacatgctattgaagaacacataagagcca







tgttggtcgatttcatgtctatttaa







(SEQ ID NO: 102)





WenAngSQTS1086 | Wendlandia angustifolia | A0A068UHT0 | 80%

ASAQASLPSNNRQETV
atggcctcagcacaagcttccttaccttctaataac

RPLADFPENIWADRIAP
agacaggaaacagtccgtccattggctgacttcc

FTLDKQEYEMCQREIE
cagagaacatctgggctgatagaattgccccattt



MLKAEVASMLLATGKT
accttggataagcaagaatacgaaatgtgtcaaa






MMQRFDFIDKIERLGVS
gagaaatagagatgttaaaagctgaagttgcttct






HHFDIEIENQLQEFFNV
atgttgttggcaactggtaagactatgatgcaaag






YTNLGEYSAYDLSSAA
attcgacttcattgataagatcgaaagattggggg






LQFRLFRQHGFNISCGIF
tctcccaccattttgacattgaaatcgaaaatcaatt






DQFIDAKGKFKESLCN
gcaagagtttttcaacgtttataccaacttaggtga






DIRGLLSLYEAAHVRTH
atactctgcctatgatttgtcatctgctgccttgcag






GDKILEEALAFTTTHMT
ttccgtttatttagacaacacggtttcaatatttcctg






SGGPHLDSSLAKQVKY
cggtattttcgaccaatttatcgacgctaaaggtaa






ALEQPLHKGILRYEAW
gttcaaggaatctttatgtaacgatatcagaggttt






RYISIYEEDESNNKLLL
gttgtctttgtacgaagctgctcatgttagaactca






RLAKLDYHLLQMSYKQ
cggtgataaaattttggaagaagctttagctttcac






ELCEITRWGKGLESVSN
cactactcacatgacctccggtggtccacatttag






FPYARDRFVECYFWAV
attcttcattggccaagcaagttaaatacgcattgg






GTLYEPQYSLARMTFA
aacagccattgcataagggtatattgagatatgaa






KVAALITMIDDIYDAYG
gcttggagatacatatctatctacgaagaggacg






TLDELQILTDSAERWD
aatccaacaataagttattattgcgtttggctaagtt






GSGVDQLSDYIRASYN
ggactatcacttgttacaaatgtcatacaagcaag






TLLKFNKEVGEDLAKK
agttgtgtgaaattacaagatggggtaaaggtttg






QRTYAFDKYIEDWKQY
gaatctgtctccaactttccttatgcccgtgacaga






MRTNFSQSRWFFTKEL
ttcgttgaatgttacttttgggctgtcggtactttgta






PSFADYINNGAITIGAY
cgaaccacaatactcattggctagaatgaccttcg






LVASAAFLYMDSAKED
ctaaggttgctgctttaattactatgatcgatgatatt






VINWMSTNPKLVVAYS
tatgatgcctacggtaccttggacgaattgcaaat






THSRLINDFGGHKFDKE
attaactgactctgccgaaagatgggatggttccg






RGTGTAIECYMKDHNIS
gtgtcgatcagttgtctgactatattagagcttccta






EEEAAKKFREMIENTW
taatacattattgaaatttaataaggaggttggtga






KVMNEECLRPIPIPRDT
agatttggcaaaaaagcaacgtacctacgctttcg






LKMLLNIARVGETVYK
acaagtacatcgaagattggaaacaatacatgag






HRIDGFTQPHAIEEHIRA
aaccaacttctctcaatcaagatggtttttcactaag






MLVDFMSI
gagttgccatctttcgctgattacattaacaacggt






(SEQ ID NO: 37)
gccatcacaatcggtgcatatttggttgcctctgct







gctttcttatatatggactccgcaaaagaagatgtt







atcaactggatgtccacaaaccctaagttggtcgt







tgcttactccactcactctcgtttaattaatgactttg







gtggtcacaagttcgacaaggagagaggtaccg







gtactgctattgaatgctacatgaaggaccataat







atatccgaagaagaagccgcaaagaagtttaga







gaaatgatcgagaacacctggaaggtcatgaatg







aagaatgtttaagaccaattccaatccctagagac







acattgaagatgttgttaaacatcgccagagttggt







gaaactgtctacaagcatagaatcgatggttttact







caaccacatgctattgaagaacacataagagctat







gttggttgatttcatgtctatttaa







(SEQ ID NO: 103)





WenAngSQTS267
Wendlandia angustifolia
G5CV47
11%

SLLEGNVNHENGIFRPE
atgtccttgttagaaggtaacgttaatcacgagaa

ANFSPSMWGNIFRDSSK
cggaatatttagaccagaagctaatttctcaccttc

DNQISEEVVEEIEALKE
tatgtggggtaacattttccgtgattcttccaaaga



VVKHMIISTTSNAIEQK
caaccaaatctctgaagaagtcgttgaagaaatc






LELVDNLERLGLAYHF
gaggcattgaaggaagtcgttaagcatatgattat






EGQINRLLSSAYNANHE
ttctacaacctccaacgccatcgaacagaaattag






DEGNHKRNKEDLYAA
agttggtcgataatttggaaagattgggtttggctt






ALEFRIFRQHGFNVSSD
accacttcgaaggtcaaatcaacagattattatcat






CFNQFKDTKGKFKKTL
ctgcctataatgctaaccatgaagacgaaggtaa






LIDVKGMLSLYEAAHV
ccacaagagaaataaggaggacttgtacgcagc






REHGDDILEEALIFATF
tgctttggaatttagaattttcagacaacatggtttta






HLERITPNSLDSTLEKQ
acgtttcctctgattgctttaatcaattcaaagatact






VGHALMQSLHRGIPRA
aagggtaagttcaaaaagactttgttgattgatgtc






EAHFNISIYEECGSSNEK
aagggtatgttgtccttgtatgaagctgcccacgtt






LLRLAKLDYNLVQVLH
cgtgaacatggtgacgacatcttagaagaagcttt






KEELSELTKWWKDLDF
gatctttgctaccttccacttagaaagaattactcca






ASKLSYVRDRMVECFF
aattctttggattccacattggaaaaacaagttggt






WTVGVYFEPQYSRARV
cacgcattgatgcaatcattacacagaggtattcc






MLAKCIAMISVIDDTYD
aagagccgaagctcattttaacatatctatttacga






SYGTLDELIIFTEVVDR
agagtgtggttcttctaatgaaaagttgttaagattg






WDISEVDRLPNYMKPI
gctaagttggactacaacttagtccaagtcttgca






YISLLYLFNEYEREINEQ
caaggaggaattatcagaattgaccaaatggtgg






DRFNGVNYVKEAMKEI
aaagatttagacttcgcttctaagttgtcctacgttc






VRSYYIEAEWFIEGKIPS
gtgatagaatggttgaatgttttttctggactgtcgg






FEEYLNNALVTGTYYL
tgtttatttcgaaccacagtactccagagccagag






LAPASLLGMESTSKRTF
ttatgttagctaagtgtattgctatgatctctgttatc






DWMMKKPKILVASAII
gacgatacttacgattcctatggtaccttggacga






GRVIDDIATYKIEKEKG
gttaattatattcactgaagtcgttgatagatgggat






QLVTGIECYMQENNLS
atatccgaggtcgaccgtttgcctaactatatgaa






VEKASAQLSEIAESAW
accaatctacatttctttgttatacttgtttaacgaata






KDLNKECIKTTTSNIPN
tgaaagagaaattaacgaacaagaccgtttcaat






EILMRVVNLTRLIDVVY
ggtgttaactacgttaaggaagctatgaaggaaat






KNNQDGYSNPKNNVKS
cgtcagatcttattacatcgaggccgaatggttcat






VIEALLVNPINM
agaaggtaaaatcccatctttcgaagagtacttga






(SEQ ID NO: 38)
acaatgcattggttacaggtacctattacttattggc







cccagcatctttgttgggtatggaatccacctcaa







agagaacttttgattggatgatgaagaagccaaa







aattttggtcgcttctgctatcattggtagagttattg







atgatattgctacttacaagatagaaaaggaaaag







ggacagttagtcactggtattgaatgctacatgca







agagaacaacttatcagttgaaaaggcctccgct







caattgtctgaaatcgccgagtccgcttggaaag







acttgaataaagaatgtatcaaaactaccacctcc







aacattcctaacgaaatattgatgagagttgtcaac







ttgacaagattaattgacgttgtctacaagaataat







caagatggttattctaaccctaagaacaatgttaag







tcagtcatcgaagctttgttggttaatccaatcaata







tgtaa







(SEQ ID NO: 104)





WenAngSQTS302
Wendlandia angustifolia
Q5SBP4
17%

ESRRSANYQASIWDDN
atggaaagtaggcgttcagcaaattatcaggcttc

FIQSLASPYAGEKYVSQ
catatgggatgacaactttattcaatctcttgcctct

ANELKEQVKMMLDEE
ccttacgctggagagaagtacgtctcgcaagcta



DMKLLDCLELVDNLER
acgaattgaaagaacaagtgaagatgatgttaga






LGLAYHFEGQINRLLSS
cgaagaggatatgaaactgttagattgcttggaat






AYNANHEDEGNHKRN
tggttgacaacttggaaagactaggcttggcttat






KEDLYAAALEFRIFRQH
cacttcgagggtcaaatcaatagactcttgagcag






GFNVPQDVFSSFMNKA
tgcctacaacgctaaccatgaagatgaaggtaat






GDFEESLSKDTKGLVSL
cacaagagaaataaggaagacttatacgcggcg






YEASYLSMEGETILDM
gctttggagttcagaatttttagacaacatggtttca






AKDFSSHHLHKMVEDA
acgttccacaggacgtcttctcttcctttatgaataa






TDKRVANQIIHSLEMPL
ggccggtgattttgaagaatccctttctaaggatac






HRRVQKLEAIWFIQFYE
aaaaggtttggtttcattgtatgaagcttcttacctat






CGSDANPTLVELAKLD
caatggaaggtgaaaccatcttagacatggctaa






FNMVQATYQEELKRLS
ggatttctcctctcaccatttacacaaaatggtcga






RWYEETGLQEKLSFAR
agatgctactgataagcgagttgctaaccaaatca






HRLAEAFLWSMGIIPEG
ttcacagccttgaaatgccattgcacagaagggta






HFGYGRMHLMKIGAYI
caaaaactcgaagcaatatggttcattcaattctac






TLLDDIYDVYGTLEELQ
gaatgtggttctgacgccaaccccactttggtaga






VLTEIIERWDINLLDQLP
attggctaagttagacttcaacatggttcaagctac






EYMQIFFLYMFNSTNEL
gtatcaagaagaactaaagagattgtcgagatgg






AYEILRDQGINVISNLK
tacgaagagaccggactgcaagaaaagttatcttt






GLWVELSQCYFKEATW
tgcacgtcatcgtttggccgaagcttttttgtggtct






FHNGYTPTTEEYLNVA
atgggtatcattccagaaggccatttcggttacgg






CISASGPVILFSGYFTTT
tagaatgcacttgatgaagatcggtgcctatattac






NPINKHELQSLERHAHS
tttattggatgatatttatgatgtctacggtaccttgg






LSMILRLADDLGTSSDE
aagagttgcaagttctaactgaaatcatcgaacgt






MKRGDVPKAIQCFMND
tgggacattaatttgttggaccagctgcctgagta






TGCCEEEARQHVKRLI
catgcaaatcttctttttatacatgttcaattccacaa






DAEWKKMNKDILMEK
acgaattagcttatgagatacttagagatcaagga






PFKNFCPTAMNLGRISM
attaatgttatctctaacctcaaagggttgtgggtc






SFYEHGDGYGGPHSDT
gaattgtcccagtgttattttaaggaagcaacctgg






KKKMVSLFVQPMNITI
tttcataacggttacactccaactacagaggaata






(SEQ ID NO: 39)
cttgaacgttgcttgtattagtgcatctggtccagtg







atccttttctccggttatttcaccacgactaacccga







ttaataagcatgaattacaaagtttagaaagacac







gctcattcactaagcatgattctgagattggctgac







gaccttgggacctcatctgatgaaatgaaacggg







gcgatgtgccaaaggccatccagtgctttatgaat







gacactggttgttgtgaagaagaggcaagacaa







cacgtcaaaagactcatagacgctgaatggaag







aagatgaacaaggacatcttgatggaaaaaccct







ttaagaacttctgtccaactgctatgaatttaggtag







gataagcatgtccttttacgagcacggtgatggtt







acggtggtccacactctgataccaaaaaaaagat







ggttagcttgttcgttcaacctatgaacattaccatc







taa







(SEQ ID NO: 105)





WenAngSQTS738
Wendlandia angustifolia
A0A068VE40
46%

ASTEIAVPLNNQHESVR
atggcctcaacagaaatcgcagttcctttgaataa

QLADFPENIWADRVAS
ccagcacgagtccgtccgtcaattagctgacttcc

FTLDKQGHDMCAKEIE
cagaaaacatttgggctgatagagttgcttctttta



MLKEEVMSMLLEEKP
ccttggataagcaaggtcatgacatgtgtgctaaa






MMEKFNLIDNIERLGIS
gaaatagaaatgttaaaggaagaagtcatgtctat






YHFGDKIEDQLQEYYD
gttgttggaggaaaagccaatgatggaaaaattc






ACTNFEKHAECDLSIAA
aacttgatcgataatattgaaagattaggcatctcc






LQFRLFRQHGFNISCGIF
taccacttcggtgacaagattgaagatcaattaca






DGFLDANGKFKESLCN
agaatattacgacgcctgcactaactttgagaagc






DIKGLLSLYEAAHVRT
atgctgaatgtgatttgtcaatagctgccttgcaatt






HGDKILEEALFFTTTHL
cagattgtttagacaacacggtttcaatatttcttgt






TREIPNVGSTLAKQVKY
ggtatctttgacggtttcttggatgcaaacggtaaa






ALEQPLHKGIPRYEAW
ttcaaggaatctttatgtaatgacattaagggtttgtt






RYISIYEEDESSNKLLLR
gtccttatacgaagccgctcatgttagaactcacg






LAKLDYHLSQMLNKQ
gtgataaaattttggaggaagctttgttttttaccact






DLCEIIRWGKELDIISKV
actcatttgacccgtgaaatcccaaacgttggttct






PYARDRIVECYFWAVA
actttggctaagcaggtcaaatatgctttagagca






TYYEPQYSLARMTLTK
accattgcacaagggtatcccaagatacgaagcc






ATVFAGMIDDTYDAYG
tggagatatatttcaatttacgaagaagacgaatct






TLDELKIFTEAVERWDS
tccaacaagttgttattacgtttggcaaagttggatt






SGIDQLSDYMKAAYTL
accatttgtcccaaatgttgaacaaacaggacttgt






VLNFNKEVGEDLAKKQ
gcgagatcattagatggggtaaggaattagacatt






RTYAFDKYIEEWKQYA
atttctaaggttccttatgctagagatagaatcgtc






RTSFTQSKWFLTNELPS
gaatgttacttctgggctgttgccacatattacgaa






FSDYLSNGMVTSTYYL
ccacaatactccttggctagaatgacattgaccaa






LSAAAFLDMDSASEDVI
agctactgtttttgctggtatgatcgatgatacctat






NWMSTNPKLFVALTTH
gacgcttacggtactttagatgagttgaagatattc






ARLANDVGSHKFEKER
actgaagcagtcgaacgttgggactcttccggtat






GSGTAIECYMKDYHVS
tgaccaattgtcagattacatgaaagcagcttaca






EEEAMKKFEEMCDDA
ccttagtcttaaattttaacaaggaagttggtgaag






WKVMNEECLRSTTIPR
atttagccaagaaacaaagaacttacgccttcgac






EILKVILNLARTCEVVY
aagtacatcgaagaatggaagcaatatgctagaa






KHRGDGFTDQRRIEAHI
cctctttcacccaatctaagtggttcttgaccaatg






NAMLMDSVSI
agttgccatccttttctgattatttgtctaacggtatg






(SEQ ID NO: 40)
gttacttcaacatactacttattgtctgccgctgcctt







cttggacatggattccgcttctgaagacgtcataa







attggatgtctaccaaccctaaattgttcgtcgcttt







gacaactcacgctagattggccaacgacgttggt







tctcataaatttgaaaaggaaagaggttcaggtac







cgcaatagaatgttatatgaaggattaccacgtttc







tgaggaagaagctatgaagaaattcgaggaaat







gtgtgacgatgcttggaaggtcatgaacgaagaa







tgcttgcgttccactacaatcccaagagagattttg







aaggttattttgaacttggcaagaacttgtgaagtc







gtttacaagcatcgtggtgatggcttcaccgatca







aagaagaattgaagctcacatcaacgccatgtta







atggactccgtttccatctaa







(SEQ ID NO: 106)





WenAngSQTS760
Wendlandia angustifolia
A0A068VE40
43%

ASTEIAVPLNNQHESVR
atggcctcaacagaaatcgcagttcctttgaataa

QLADFPENIWADRVAS
ccagcacgagtccgtccgtcaattagctgacttcc

FTLDKQGHDMCAKEIE
cagaaaacatttgggctgatagagttgcttctttta



MLKEEVMSMLLEEKP
ccttggataagcaaggtcatgacatgtgtgctaaa






MMEKFNLIDNIERLGIS
gaaatagaaatgttaaaggaagaagtcatgtctat






YHFGDKIEDQLQEYYD
gttgttggaggaaaagccaatgatggaaaaattc






ACTNFEKHAECDLSIAA
aacttgatcgataatattgaaagattaggcatctcc






LQFRLFRQHGFNISCGIF
taccacttcggtgacaagattgaagatcaattaca






DGFLDANGKFKESLCN
agaatattacgacgcctgcactaactttgagaagc






DIKGLLSLYEAAHVRT
atgctgaatgtgatttgtcaatagctgccttgcaatt






HGDKILEEALFFTTTHL
cagattgtttagacaacacggtttcaatatttcttgt






TREIPNVGSTLAKQVKH
ggtatctttgacggtttcttggatgcaaacggtaaa






ALEQPLHRGIPRYEAYC
ttcaaggaatctttatgtaatgacattaagggtttgtt






FISIYEEDESNNKLLLRL
gtccttatacgaagccgctcatgttagaactcacg






AKLDYHLLQMSYKREL
gtgataaaattttggaggaagctttgttttttaccact






SEIIRWGKELDIISKVPY
actcatttgacccgtgaaatcccaaacgttggttct






ARDRIVECYFWAVATY
actttggctaagcaggtcaaacacgctttagagca






YEPQYSLARMTLTKAT
accattgcacagaggtatcccaagatatgaagcc






VFAGMIDDTYDAYGTL
tactgcttcatttcaatttatgaagaagacgaatcta






DELKIFTEAVERWDSSG
acaacaagttgttattacgtttggcaaagttggatt






IDQLSDYMKAAYTLVL
accatttgttgcaaatgtcctacaaaagagaattgt






NFNKEVGEDLAKKQRT
ccgagatcattagatggggtaaggaattagacatt






YAFDKYIEEWKQYART
atttctaaggttccttatgctagagatagaatcgtc






SFTQSKWFLTNELPSFS
gaatgttacttttgggctgttgccacatattacgag






DYLSNGMVTSTYYLLS
ccacaatactccttggctagaatgacattgaccaa






AATFLGMDGASEDVIN
agctactgttttcgctggtatgatcgatgatacctat






WMSTNPKLFVALTTHA
gacgcttacggtactttagacgaattgaagatattc






RLANDVGSHKFEKERG
actgaagcagtcgaacgttgggattcttccggtat






SGTAIECYMKDYHVSE
tgaccaattgtcagattacatgaaagcagcttaca






EEAMKKFEEMCDDAW
ccttagtcttaaattttaacaaggaagttggtgagg






KVMNEECLRSTTIPREI
atttagccaagaaacaaagaacttacgccttcgac






LKVILNLARTCEVVYK
aagtacatcgaagaatggaagcaatatgctagaa






HRGDGFTDQRRIEAHIN
cctctttcacccaatctaagtggttcttgaccaatg






AMLMDSVSI
aattgccatccttttctgattatttgtctaacggtatg






(SEQ ID NO: 41)
gttacttcaacatactacttattgtctgccgctacatt







cttgggtatggacggtgcttctgaagacgtcataa







attggatgtctactaaccctaaattgttcgtcgcttt







gacaacccatgctagattggccaacgacgttggt







tctcacaagtttgaaaaggaaagaggctccggta







ctgcaatagaatgttatatgaaagattaccacgttt







ctgaggaggaagctatgaagaaattcgaagaaat







gtgtgacgatgcctggaaggtcatgaacgaaga







atgcttgcgttctactaccatcccaagagagatttt







gaaggttattttgaacttggccagaacctgtgaag







tcgtttacaagcatcgtggtgatggtttcactgatc







agagaagaattgaagctcacatcaacgctatgtta







atggactccgtttccatctaa







(SEQ ID NO: 107)





WenAngSQTS780
Wendlandia angustifolia
A0A068VE40
41%

ASTEIAVPLNNQHESVR
atggcctcaacagaaatcgcagttcctttgaataa

QLADFPENIWADRVAS
ccagcacgagtccgtccgtcaattagctgacttcc

FTLDKQGHDMCAKEIE
cagaaaacatttgggctgatagagttgcttctttta



MLKEEVMSMLLEEKP
ccttggataagcaaggtcatgacatgtgtgctaaa






MMEKFNLIDNIERLGIS
gaaatagaaatgttaaaggaagaagtcatgtctat






YHFGDKIEDQLQEYYD
gttgttggaggaaaagccaatgatggaaaaattc






ACTNFEKHAECDLSIAA
aacttgatcgataatattgaaagattaggcatctcc






LQFRLFRQHGFNISCGIF
taccacttcggtgacaagattgaagatcaattaca






DGFLDANGKFKESLCN
agaatattacgacgcctgcactaactttgagaagc






DIKGLLSLYEAAHVRT
atgctgaatgtgatttgtcaatagctgccttgcaatt






HGDKILEEALFFTTTHL
cagattgtttagacaacacggtttcaatatttcttgt






TREIPNVGSTLAKQVKH
ggtatctttgacggtttcttggatgcaaacggtaaa






ALEQPLHRGIPRYEAYC
ttcaaggaatctttatgtaatgacattaagggtttgtt






FISMYEEDESSNKLLLR
gtccttatacgaagccgctcatgttagaactcacg






LAKLDYHLSQMLNKQ
gtgataaaattttggaggaagctttgttttttaccact






DLCEIIRWGKELDIISKV
actcatttgacccgtgaaatcccaaacgttggttct






PYARDRIVECYFWAVA
actttggctaagcaggtcaaacacgctttagagca






TYYEPQYSLARMTLTK
accattgcacagaggtatcccaagatatgaagcc






ATVFAGMIDDTYDAYG
tactgcttcatttcaatgtatgaagaagacgaatctt






TLDELKIFTEAVERWDS
ccaacaagttgttattacgtttggcaaagttggatt






SGIDQLSDYMKAAYTL
accatttgtcccaaatgttgaacaaacaggacttgt






VLNFNKEVGEDLAKKQ
gtgagatcattagatggggtaaggaattagacatt






RTYAFDKYIEEWKQYA
atttctaaggttccttatgctagagatagaattgtcg






RTSFTQSKWFLTNELPS
aatgttacttttgggctgttgccacatactacgaac






FSDYLSNGMVTSTYYL
cacaatattccttggctagaatgacattgaccaaa






LSAATFLGMDGASEDV
gctactgttttcgctggtatgatcgatgatacctatg






INWMSTNPKLFVALTT
acgcttacggtactttagatgagttgaagatattca






HARLANDVGSHKFEKE
ctgaagcagtcgaacgttgggactcttccggtatt






RGSGTAIECYMKDYHV
gaccaattgtcagattacatgaaagcagcttacac






SEEEAMKKFEEMCDDA
cttagtcttaaattttaacaaggaagttggtgaaga






WKVMNEECLRSTTIPR
tttagccaagaaacaaagaacttacgccttcgaca






EILKVILNLARTCEVVY
agtacatcgaagaatggaagcaatatgctagaac






KHRGDGFTDQRRIEAHI
ctctttcacccaatctaagtggttcttgaccaatga






NAMLMDSVSI
gttgccatccttttctgattatttgtctaacggtatgg






(SEQ ID NO: 42)
ttacttcaacatactacttattgtctgccgctacattc







ttgggtatggacggtgcttctgaagatgtcataaat







tggatgtctactaaccctaaattgttcgtcgctttga







caacccatgctagattggccaacgacgttggttct







cacaagtttgaaaaggaaagaggctccggtactg







caatagaatgctatatgaaagattaccacgtttctg







aggaagaagctatgaagaaattcgaggaaatgt







gtgacgatgcctggaaggtcatgaacgaagaat







gtttgcgttctactaccatcccaagagagattttga







aggttattttgaacttggccagaacctgtgaagtc







gtttacaagcatcgtggtgatggtttcactgaccaa







agaagaatcgaagctcacattaacgctatgttaat







ggactccgtttccatctaa







(SEQ ID NO: 108)





WenAngSQTS793
Wendlandia angustifolia
A0A068UHT0
75%

ASAQASLPSNNRQETV
atggcctcagcacaagcttccttaccttctaataac

RPLADFPENIWADRIAP
agacaggaaacagtccgtccattggctgacttcc

FTLDKQEYEMCQREIE
cagagaacatctgggctgatagaattgccccattt



MLKAEVASMLLATGKT
accttggataagcaagaatacgaaatgtgtcaaa






MMQRFDFIDKIERLGVS
gagaaatagagatgttaaaagctgaagttgcttct






HHFDIEIENQLQEFFNV
atgttgttggcaactggtaagactatgatgcaaag






YTNLGEYSAYDLSSAA
attcgacttcattgataagatcgaaagattggggg






LQFRLFRQHGFNISCGIF
tctcccaccattttgacattgaaatcgaaaatcaatt






DQFIDAKGKFKESLCN
gcaagagtttttcaacgtttataccaacttaggtga






DIRGLLSLYEAAHVRTH
atactctgcctatgatttgtcatctgctgccttgcag






GDKILEEALAFTTTHMT
ttccgtttatttagacaacacggtttcaatatttcctg






SGGPHLDSSLAKQVKY
cggtattttcgaccaatttatcgacgctaaaggtaa






ALEQPLHKGILRYEAW
gttcaaggaatctttatgtaacgatatcagaggttt






RYISIYEEDESNNKLLL
gttgtctttgtacgaagctgctcatgttagaactca






RLAKLDYHLLQMSYKQ
cggtgataaaattttggaagaagctttagctttcac






ELCEITRWGKGLESVSN
cactactcacatgacctccggtggtccacatttag






FPYARDRFVECYFWAV
attcttcattggccaagcaagttaaatacgcattgg






GTLYEPQYSLARMTFA
aacagccattgcataagggtatattgagatatgaa






KVAALITMIDDIYDAYG
gcttggagatacatatctatctacgaagaggacg






TLDELQILTDSAERWD
aatccaacaataagttattattgcgtttggctaagtt






GSGVDQLSDYIRASYN
ggactatcacttgttacaaatgtcatacaagcaag






TLLKFNKEVGEDLAKK
agttgtgtgaaattacaagatggggtaaaggtttg






QRTYAFDKYIEDWKQY
gaatctgtctccaactttccttatgcccgtgacaga






MRTSFTQSKWFLTNEL
ttcgttgaatgttacttttgggctgtcggtactttgta






PSFADYISNGAITIGAYL
cgaaccacaatactcattggctagaatgaccttcg






IASAGFLDMDSALEDVI
ctaaggttgctgctttaattactatgatcgatgatatt






NWMSTNPKLMVAYST
tatgatgcctacggtaccttggacgaattgcaaat






HSRLINDYGGHKFDKE
attaactgactctgccgaaagatgggatggttccg






RGSVTALDCYMKDYSV
gtgtcgatcagttgtctgactatattagagcttccta






SEEEAAKKFREMCEDN
taatacattattgaaatttaataaggaggttggtga






WKVMNEECLRPTTIPR
agatttggcaaaaaagcaacgtacctacgctttcg






DGLKMLLNIARVGETV
acaagtacatcgaagattggaaacaatacatgag






YKHRIDGFTQPHAIEEH
aacctctttcactcaatcaaagtggtttttgactaac






IRAMLVDFMSI
gagttgccatctttcgctgattacatttccaacggt






(SEQ ID NO: 43)
gccatcacaatcggtgcatatttaattgcctctgcc







ggttttttggatatggattccgccttggaagacgtt







attaactggatgtctaccaacccaaaattaatggtc







gcttattccacccactcaagattgatcaatgattac







ggtggtcacaagttcgacaaggaaagagggtca







gttactgctttggattgctacatgaaggattactcc







gtctctgaggaagaagctgcaaagaagttcaga







gaaatgtgtgaagacaactggaaggttatgaatg







aagaatgtttgagacctactacaattccaagagat







ggtttgaagatgttgttaaacattgctagagtcggt







gaaactgtttacaaacatagaatcgacggttttact







caacctcatgcaatcgaggagcacattagagcca







tgttagttgacttcatgtctatttaa







(SEQ ID NO: 109)





WenAngSQTS805
Wendlandia angustifolia
A0A068VE40
42%

ASTEIAVPLNNQHESVR
atggcctcaacagaaatcgcagttcctttgaataa

QLADFPENIWADRVAS
ccagcacgagtccgtccgtcaattagctgacttcc

FTLDKQGHDMCAKEIE
cagaaaacatttgggctgatagagttgcttctttta



MLKEEVMSMLLEEKP
ccttggataagcaaggtcatgacatgtgtgctaaa






MMEKFNLIDNIERLGIS
gaaatagaaatgttaaaggaagaagtcatgtctat






YHFGDKIEDQLQEYYD
gttgttggaggaaaagccaatgatggaaaaattc






ACTNFEKHAECDLSIAA
aacttgatcgataatattgaaagattaggcatctcc






LQFRLFRQHGFNISCGIF
taccacttcggtgacaagattgaagatcaattaca






DGFLDANGKFKESLCN
agaatattacgacgcctgcactaactttgagaagc






DIKGLLSLYEAAHVRT
atgctgaatgtgatttgtcaatagctgccttgcaatt






HGDKILEEALFFTTTHL
cagattgtttagacaacacggtttcaatatttcttgt






TREIPNVGSTLAKQVKY
ggtatctttgacggtttcttggatgcaaacggtaaa






ALEQPLHKGIPRYEAW
ttcaaggaatctttatgtaatgacattaagggtttgtt






RYISIYEEDESNNKLLL
gtccttatacgaagccgctcatgttagaactcacg






RLAKLDYHLLQMSYKR
gtgataaaattttggaggaagctttgttttttaccact






ELSEIIRWGKELDIISKV
actcatttgacccgtgaaatcccaaacgttggttct






PYARDRIVECYFWAVA
actttggctaagcaggtcaaatatgctttagagca






TYYEPQYSLARMTLTK
accattgcacaagggtatcccaagatacgaagcc






ATVFAGMIDDTYDAYG
tggagatatatttcaatttacgaagaagacgaatct






TLDELKIFTEAVERWDS
aacaacaagttgttattacgtttggcaaagttggat






SGIDQLSDYMKAAYTL
taccatttgttgcaaatgtcctacaaaagagaattg






VLNFNKEVGEDLAKKQ
tccgagatcattagatggggtaaggaattagacat






RTYAFDKYIEEWKQYA
tatttctaaggttccttatgctagagatagaatcgtc






RTSFTQSKWFLTNELPS
gaatgttatttctgggctgttgccacatactacgag






FSDYLSNGMVTSTYYL
ccacaatactccttggctagaatgacattgaccaa






LSAATFLGMDGASEDV
agctactgtttttgctggtatgatcgatgatacctat






INWMSTNPKLFVALTT
gacgcttacggtactttagacgaattgaagatattc






HARLANDVGSHKFEKE
actgaagcagtcgaacgttgggattcttccggtat






RGSSTAIECYMKDYHV
tgaccaattgtcagattacatgaaagcagcttaca






SEEEAMEKFEEMCDDA
ccttagtcttaaattttaacaaggaagttggtgagg






WKVMNEECLRSTTIPR
atttagccaagaaacaaagaacttacgccttcgac






EILKVILNLARTCEVVY
aagtacatcgaagaatggaagcaatatgctagaa






KHRGDGFTDQRRIEAHI
cctctttcacccaatctaagtggttcttgaccaatg






NAMLMDSVSI
aattgccatccttttctgattatttgtctaacggtatg






(SEQ ID NO: 44)
gttacttcaacatactacttattgtctgccgctacatt







cttgggtatggacggtgcttctgaagacgtcataa







attggatgtctactaaccctaaattgttcgtcgcttt







gacaacccacgctagattggccaacgacgttggt







tctcataaatttgaaaaggaaagaggctcctccac







tgcaatagaatgctatatgaaggattaccacgtttc







tgaggaggaagctatggaaaaattcgaagaaat







gtgtgacgatgcctggaaggtcatgaacgaaga







atgcttgcgttccactaccatcccaagagagatttt







gaaggttattttgaacttggccagaacctgtgaag







tcgtttacaagcatcgtggtgatggtttcactgatc







agagaagaattgaagctcacatcaacgctatgtta







atggactcagtttccatctaa







(SEQ ID NO: 110)





WenAngSQTS826
Wendlandia angustifolia
A0A068VE40
47%

ASTEIAVPLNNQHESVR
atggcctcaacagaaatcgcagttcctttgaataa

QLADFPENIWADRVAS
ccagcacgagtccgtccgtcaattagctgacttcc

FTLDKQGHDMCAKEIE
cagaaaacatttgggctgatagagttgcttctttta



MLKEEVMSMLLEEKP
ccttggataagcaaggtcatgacatgtgtgctaaa






MMEKFNLIDNIERLGIS
gaaatagaaatgttaaaggaagaagtcatgtctat






YHFGDKIEDQLQEYYD
gttgttggaggaaaagccaatgatggaaaaattc






ACTNFEKHAECDLSIAA
aacttgatcgataatattgaaagattaggcatctcc






LQFRLFRQHGFNISCGIF
taccacttcggtgacaagattgaagatcaattaca






DGFLDANGKFKESLCN
agaatattacgacgcctgcactaactttgagaagc






DIKGLLSLYEAAHVRT
atgctgaatgtgatttgtcaatagctgccttgcaatt






HGDKILEEALFFTTTHL
cagattgtttagacaacacggtttcaatatttcttgt






TREIPNVGSTLAKQVKY
ggtatctttgacggtttcttggatgcaaacggtaaa






ALEQPLHKGIPRYEAW
ttcaaggaatctttatgtaatgacattaagggtttgtt






RYISIYEEDESNNKLLL
gtccttatacgaagccgctcatgttagaactcacg






RLAKLDYHLLQMSYKR
gtgataaaattttggaggaagctttgttttttaccact






ELSEIIRWGKELDIISKV
actcatttgacccgtgaaatcccaaacgttggttct






PYARDRIVECYFWAVA
actttggctaagcaggtcaaatatgctttagagca






TYYEPQYSLARMTLTK
accattgcacaagggtatcccaagatacgaagcc






ATVFAGMIDDTYDAYG
tggagatatatttcaatttacgaagaagacgaatct






TLDELKIFTEAVERWDS
aacaacaagttgttattacgtttggcaaagttggat






SGIDQLSDYMKAAYTL
taccatttgttgcaaatgtcctacaaaagagaattg






VLNFNKEVGEDLAKKQ
tccgagatcattagatggggtaaggaattagacat






RTYAFDKYIEEWKQYA
tatttctaaggttccttatgctagagatagaatcgtc






RTSFTQSKWFLTNELPS
gaatgttatttctgggctgttgccacatactacgag






FADYLSNGMVTSTYYL
ccacaatactccttggctagaatgacattgaccaa






LSAAALLDMDSALEDV
agctactgtttttgctggtatgatcgatgatacctat






INWMSTNPKFFVALTT
gacgcttacggtactttagacgaattgaagatattc






HARLTNDVGSHKFEKE
actgaagcagtcgaacgttgggattcttccggtat






RGSGTAIECYMKDYHV
tgaccaattgtcagattacatgaaagcagcttaca






SEEEAMKKFEEMCDDA
ccttagtcttaaattttaacaaggaagttggtgagg






WKVMNEECLRSTTIPR
atttagccaagaaacaaagaacttacgccttcgac






EILKVILNLARTCEVVY
aagtacatcgaagaatggaagcaatatgctagaa






KHRGDGFTDQRRIEAHI
cctctttcacccaatctaagtggttcttgaccaatg






NAMLMDSVSI
aattgccatcctttgcagattatttgtctaacggtat






(SEQ ID NO: 45)
ggttacttcaacatactacttattgtctgctgctgcc







ttgttggacatggactccgctttagaagatgtcata







aattggatgtctaccaaccctaaattcttcgtcgctt







tgacaactcacgctagattgaccaacgacgttggt







tctcataaatttgaaaaggaaagaggttccggtac







tgcaatagaatgctatatgaaggattaccacgtttc







tgaggaggaagctatgaagaaattcgaagaaat







gtgtgacgatgcctggaaggtcatgaacgaaga







atgcttgcgttctactacaatcccaagagagatttt







gaaggttattttgaacttggccagaacctgtgaag







tcgtttacaagcatcgtggtgatggcttcactgacc







agagaagaattgaagctcacatcaacgccatgtt







aatggactccgtttccatctaa







(SEQ ID NO: 111)





WenAngSQTS829
Wendlandia angustifolia
A0A068UHT0
74%

ASAQASLPSNNRQETV
atggccagtgcgcaagcatcattaccttccaataa

RPLADFPENIWADRIAP
cagacaggaaacagtccgtcccctagctgacttc

FTLDKQEYEMCQREIE
ccagagaacatctgggctgataggattgctccatt



MLKAEVASMLLATGKT
taccctggataagcaagaatacgaaatgtgtcaa






MMQRFDFIDKIERLGVS
agagaaatagagatgttgaaagctgaagtggcct






HHFDIEIENQLQEFFNV
ctatgttgcttgccactggaaagactatgatgcaa






YTNLGEYSAYDLSSAA
cgattcgacttcattgataagatcgaaagattggg






LQFRLFRQHGFNISCGIF
cgtatcgcaccattttgacattgaaatcgaaaatca






DQFIDAKGKFKESLCN
actccaagagtttttcaacgtttataccaacttgggt






DIRGLLSLYEAAHVRTH
gaatacagcgcgtatgatctgtcatctgctgcattg






GDKILEEALAFTTTHMT
cagttcagattatttagacaacacggtttcaatattt






SGGPHLDSSLAKQVKY
cctgcggtattttcgaccaatttatcgacgctaaag






ALEQPLHKGILRYEAW
gtaagttcaaggaatctttatgtaacgatatccggg






RYISIYEEDESNNKLLL
gtttgttgtctctctacgaagctgctcatgttagaac






RLAKLDYHLLQMSYKQ
gcacggtgataaaattttggaagaagcattggctt






ELCEITRWGKGLESVSN
ttactactacccatatgacttccggtggtccacacc






FPYARDRFVECYFWAV
tagactctagcttggctaagcaagtcaagtacgc






GTLYEPQYSLARMTFA
gcttgagcaaccattacacaaggggattttgagat






KVAALITMIDDIYDAYG
acgaagcttggcgttatatatccatctacgaagaa






TLDELQILTDSAERWD
gacgaatctaataacaaacttctgttaagattggct






GSGVDQLSDYIRASYN
aaactcgattatcatttgcttcaaatgtcctacaagc






TLLKFNKEVGEDLAKK
aggaattatgtgaaatcacgagatggggcaagg






QRTYAFDKYIEDWKQY
gtttagagtcagtttctaatttcccttacgctagaga






MRTSFTQSKWFLTNEL
tcgttttgttgaatgttatttctgggccgtaggaaca






PSFADYISNGAITIGAYL
ttgtacgaaccgcaatacagtctagccagaatga






IASAGFLDMDSALEDVI
cctttgctaaagttgctgccttgattactatgattga






NWMSTNPKLMVAYST
cgatatctacgatgcctatggtaccttggacgagtt






HSRLINDYGGHKFDKE
acaaatattgaccgattctgctgaaagatgggatg






RGTGTAIECYMKDHNIS
gttcgggagtcgaccaattgtctgactatatacgc






EEEAAKKFREMIENTW
gctagttataacactttgttgaagttcaacaaggaa






KVMNEECLRPIPIPRDT
gtcggtgaggatttagccaaaaagcaaagaacgt






LKMLLNIARVGETVYK
acgcatttgacaaatacatcgaagattggaagca






HRIDGFTQPHAIEEHIRA
atacatgagaacttctttcacccagtccaagtggtt






MLVDFMSI
cctgaccaacgaactcccttccttcgctgactaca






(SEQ ID NO: 46)
tttccaatggggctattacaattggtgcttacttgat







cgccagcgcgggttttttggatatggattctgccct







agaagacgttattaactggatgtctactaacccaa







aattgatggtggcttattcaactcacagcagactta







tcaatgattatggtggtcacaagttcgacaaggaa







agagggacgggtacagctattgaatgctacatga







aggatcataacatctctgaggaagaagctgcaaa







gaagttcagagaaatgatcgagaacacttggaag







gttatgaatgaagaatgtctacggccaattccaatt







ccaagagatactctcaagatgctattgaacattgct







agggtcggtgaaactgtttacaaacacagaatcg







acggttttacccaaccacatgcaatcgaggaaca







catcagggccatgttggtcgacttcatgtcaattta







a







(SEQ ID NO: 112)





WenAngSQTS843
Wendlandia angustifolia
A0A068VE40
45%

ASTEIAVPLNNQHESVR
atggcctcaacagaaatcgcagttcctttgaataa

QLADFPENIWADRVAS
ccagcacgagtccgtccgtcaattagctgacttcc

FTLDKQGHDMCAKEIE
cagaaaacatttgggctgatagagttgcttctttta



MLKEEVMSMLLEEKP
ccttggataagcaaggtcatgacatgtgtgctaaa






MMEKFNLIDNIERLGIS
gaaatagaaatgttaaaggaagaagtcatgtctat






YHFGDKIEDQLQEYYD
gttgttggaggaaaagccaatgatggaaaaattc






ACTNFEKHAECDLSIAA
aacttgatcgataatattgaaagattaggcatctcc






LQFRLFRQHGFNISCGIF
taccacttcggtgacaagattgaagatcaattaca






DGFLDANGKFKESLCN
agaatattacgacgcctgcactaactttgagaagc






DIKGLLSLYEAAHVRT
atgctgaatgtgatttgtcaatagctgccttgcaatt






HGDKILEEALFFTTTHL
cagattgtttagacaacacggtttcaatatttcttgt






TREIPNVGSTLAKQVKH
ggtatctttgacggtttcttggatgcaaacggtaaa






ALEQPLHRGIPRYEAYC
ttcaaggaatctttatgtaatgacattaagggtttgtt






FISIYEEDESNNKLLLRL
gtccttatacgaagccgctcatgttagaactcacg






AKLDYHLLQMSYKREL
gtgataaaattttggaggaagctttgttttttaccact






SEIIRWGKELDIISKVPY
actcatttgacccgtgaaatcccaaacgttggttct






ARDRIVECYFWAVATY
actttggctaagcaggtcaaacacgctttagagca






YEPQYSLARMTLTKAT
accattgcacagaggtatcccaagatatgaagcc






VFAGMIDDTYDAYGTL
tactgcttcatttcaatttatgaagaagacgaatcta






DELKIFTEAVERWDSSG
acaacaagttgttattacgtttggcaaagttggatt






IDQLSDYMKAAYTLVL
accatttgttgcaaatgtcctacaaaagagaattgt






NFNKEVGEDLAKKQRT
ccgagatcattagatggggtaaggaattagacatt






YAFDKYIEEWKQYART
atttctaaggttccttatgctagagatagaatcgtc






SFTQSKWFLTNELPSFS
gaatgttacttttgggctgttgccacatattacgag






DYLSNGMVTSTYYLLS
ccacaatactccttggctagaatgacattgaccaa






AAAFLDMDSASEDVIN
agctactgttttcgctggtatgatcgatgatacctat






WMSTNPKLFVALTTHA
gacgcttacggtactttagacgaattgaagatattc






RLANDVGSHKFEKERG
actgaagcagtcgaacgttgggattcttccggtat






SGTAIECYMKDYNVSE
tgaccaattgtcagattacatgaaagcagcttaca






EEALKKFEEMCEDTWK
ccttagtcttaaattttaacaaggaagttggtgagg






VMNEECLRSTTIPREIL
atttagccaagaaacaaagaacttacgccttcgac






KVILNLARTCEVVYKH
aagtacatcgaagaatggaagcaatatgctagaa






RGDGFTDQRRIEAHINA
cctctttcacccaatctaagtggttcttgaccaatg






MLMDSVSI
aattgccatccttttctgattatttgtctaacggtatg






(SEQ ID NO: 47)
gttacttcaacatactacttattgtctgccgctgcctt







cttggacatggactccgcttctgaagatgtcataa







attggatgtctaccaaccctaaattgttcgtcgcttt







gacaactcatgctagattggccaacgacgttggtt







ctcacaagtttgaaaaggaaagaggttcaggtac







cgcaatagaatgttatatgaaagattacaacgtttc







tgaggaggaagctttgaagaaattcgaagaaatg







tgtgaagatacttggaaggtcatgaacgaagaat







gcttgcgttccactacaatcccaagagagattttg







aaggttattttgaacttggccagaacctgtgaagtc







gtttacaagcatcgtggtgacggcttcactgatca







gagaagaattgaagctcacatcaatgctatgttaa







tggactccgtttccatctaa







(SEQ ID NO: 113)





WenAngSQTS848
Wendlandia angustifolia
A0A068UHT0
84%

ASAQASLPSNNRQETV
atggcctcagcacaagcttccttaccttctaataac

RPLADFPENIWADRIAP
agacaggaaacagtccgtccattggctgacttcc

FTLDKQEYEMCQREIE
cagagaacatctgggctgatagaattgccccattt



MLKAEVASMLLATGKT
accttggataagcaagaatacgaaatgtgtcaaa






MMQRFDFIDKIERLGVS
gagaaatagagatgttaaaagctgaagttgcttct






HHFDIEIENQLQEFFNV
atgttgttggcaactggtaagactatgatgcaaag






YTNLGEYSAYDLSSAA
attcgacttcattgataagatcgaaagattggggg






LQFRLFRQHGFNISCGIF
tctcccaccattttgacattgaaatcgaaaatcaatt






DQFIDAKGKFKESLCN
gcaagagtttttcaacgtttataccaacttaggtga






DIRGLLSLYEAAHVRTH
atactctgcctatgatttgtcatctgctgccttgcag






GDKILEEALAFTTTHMT
ttccgtttatttagacaacacggtttcaatatttcctg






SGGPHLDSSLAKQVKY
cggtattttcgaccaatttatcgacgctaaaggtaa






ALEQPLHKGILRYEAW
gttcaaggaatctttatgtaacgatatcagaggttt






RYISIYEEDESNNKLLL
gttgtctttgtacgaagctgctcatgttagaactca






RLAKLDYHLLQMSYKQ
cggtgataaaattttggaagaagctttagctttcac






ELCEITRWGKGLESVSN
cactactcacatgacctccggtggtccacatttag






FPYARDRFVECYFWAV
attcttcattggccaagcaagttaaatacgcattgg






GTLYEPQYSLARMTFA
aacagccattgcataagggtatattgagatatgaa






KVAALITMIDDIYDAYG
gcttggagatacatatctatctacgaagaggacg






TLDELQILTDSAERWD
aatccaacaataagttattattgcgtttggctaagtt






GSGVDQLSDYIRASYN
ggactatcacttgttacaaatgtcatacaagcaag






TLLKFNKEVGEDLAKK
agttgtgtgaaattacaagatggggtaaaggtttg






QRTYAFDKYIEDWKQY
gaatctgtctccaactttccttatgcccgtgacaga






MRTNFSQSRWFFTKEL
ttcgttgaatgttacttttgggctgtcggtactttgta






PSFADYINNGAITIGAY
cgaaccacaatactcattggctagaatgaccttcg






LVASAAFLYMDSAKED
ctaaggttgctgctttaattactatgatcgatgatatt






VINWMSTNPKLVVAYS
tatgatgcctacggtaccttggacgaattgcaaat






THSRLINDFGGHKFDKE
attaactgactctgccgaaagatgggatggttccg






RGSGTALECYMKDYN
gtgtcgatcagttgtctgactatattagagcttccta






VSEEEAANKFREMMED
taatacattattgaaatttaataaggaggttggtga






AWKVMNEDCLRPTSIP
agatttggcaaaaaagcaacgtacctacgctttcg






RDVSKVLLNVARAGEI
acaagtacatcgaagattggaaacaatacatgag






VYKHRIDGFTEPHIIKD
aaccaacttctctcaatcaagatggtttttcactaag






HIRATLVDFMAIN
gagttgccatctttcgctgattacattaacaacggt






(SEQ ID NO: 48)
gccatcacaatcggtgcatatttggttgcctctgct







gctttcttatatatggactccgcaaaagaagatgtt







atcaactggatgtccacaaaccctaagttggtcgt







tgcttactccactcactctcgtttaattaatgactttg







gtggtcacaagttcgacaaggagagaggttccg







gtactgctttggaatgctacatgaaggactacaat







gtctctgaagaagaagccgcaaacaagtttagag







aaatgatggaggacgcttggaaggttatgaatga







agactgtttaagaccaacttccatccctagagatgt







ctccaaggttttgttaaacgtcgccagagctggtg







aaattgtttacaagcatagaatcgatggttttaccg







aaccacatatcattaaagatcacataagagccac







cttggttgatttcatggctattaattaa







(SEQ ID NO: 114)





WenAngSQTS849
Wendlandia angustifolia
A0A068UHT0
75%

ASAQASLPSNNRQETV
atggccagtgcgcaagcatcattaccttccaataa

RPLADFPENIWADRIAP
cagacaggaaacagtccgtcccctagctgacttc

FTLDKQEYEMCQREIE
ccagagaacatctgggctgataggattgctccatt



MLKAEVASMLLATGKT
taccctggataagcaagaatacgaaatgtgtcaa






MMQRFDFIDKIERLGVS
agagaaatagagatgttgaaagctgaagtggcct






HHFDIEIENQLQEFFNV
ctatgttgcttgccactggaaagactatgatgcaa






YTNLGEYSAYDLSSAA
cgattcgacttcattgataagatcgaaagattggg






LQFRLFRQHGFNISCGIF
cgtatcgcaccattttgacattgaaatcgaaaatca






DQFIDAKGKFKESLCN
actccaagagtttttcaacgtttataccaacttgggt






DIRGLLSLYEAAHVRTH
gaatacagcgcgtatgatctgtcatctgctgcattg






GDKILEEALAFTTTHMT
cagttcagattatttagacaacacggtttcaatattt






SGGPHLDSSLAKQVKY
cctgcggtattttcgaccaatttatcgacgctaaag






ALEQPLHKGILRYEAW
gtaagttcaaggaatctttatgtaacgatatccggg






RYISIYEEDESNNKLLL
gtttgttgtctctctacgaagctgctcatgttagaac






RLAKLDYHLLQMSYKQ
gcacggtgataaaattttggaagaagcattggctt






ELCEITRWGKGLESVSN
ttactactacccatatgacttccggtggtccacacc






FPYARDRFVECYFWAV
tagactctagcttggctaagcaagtcaagtacgc






GTLYEPQYSLARMTFA
gcttgagcaaccattacacaaggggattttgagat






KVAALITMIDDIYDAYG
acgaagcttggcgttatatatccatctacgaagaa






TLDELQILTDSAERWD
gacgaatctaataacaaacttctgttaagattggct






GSGVDQLSDYIRASYN
aaactcgattatcatttgcttcaaatgtcctacaagc






TLLKFNKEVGEDLAKK
aggaattatgtgaaatcacgagatggggcaagg






QRTYAFDKYIEDWKQY
gtttagagtcagtttctaatttcccttacgctagaga






MRTSFTQSKWFLTNEL
tcgttttgttgaatgttatttctgggccgtaggaaca






PSFADYISNGAITIGAYL
ttgtacgaaccgcaatacagtctagccagaatga






IASAGFLDMDSALEDVI
cctttgctaaagttgctgccttgattactatgattga






NWMSTNPKLMVAYST
cgatatctacgatgcctatggtaccttggacgagtt






HSRLINDYGGHKFDKE
acaaatattgaccgattctgctgaaagatgggatg






RGSVTALDCYMKDYSV
gttcgggagtcgaccaattgtctgactatatacgc






SEEEAAKKFREMIENT
gctagttataacactttgttgaagttcaacaaggaa






WKVMNEECLRPIPIPRD
gtcggtgaggatttagccaaaaagcaaagaacgt






TLKMLLNIARVGETVY
acgcatttgacaaatacatcgaagattggaagca






KHRIDGFTEPHIIKDHIR
atacatgagaacttctttcacccagtccaagtggtt






AMLVDFMAIN
cctgaccaacgaactcccttccttcgctgactaca






(SEQ ID NO: 49)
tttccaatggggctattacaattggtgcttacttgat







cgccagcgcgggttttttggatatggattctgccct







agaagacgttattaactggatgtctactaacccaa







aattgatggtggcttattcaactcacagcagactta







tcaatgattatggtggtcacaagttcgacaaggaa







agagggagcgttacagctttggattgctacatgaa







ggattacagtgtctctgaggaagaagctgcaaag







aagttcagagaaatgatcgaaaacacctggaag







gttatgaatgaagaatgtctgcggccaattccaatt







ccaagagatactctaaagatgctattgaacattgct







agggtaggtgaaactgtttacaaacatagaatcg







acggttttactgaaccacatataattaaggaccac







atcagggcaatgttggtcgacttcatggctattaac







taa







(SEQ ID NO: 115)





WenAng

Wend-

A0A068
81%
ASAQASLPSNNRQETV
atggcctcagcacaagcttccttaccttctaataac


SQTS86

landia

UHT0

RPLADFPENIWADRIAP
agacaggaaacagtccgtccattggctgacttcc


4

angust-



FTLDKQEYEMCQREIE
cagagaacatctgggctgatagaattgccccattt




ifolia



MLKAEVASMLLATGKT
accttggataagcaagaatacgaaatgtgtcaaa






MMQRFDFIDKIERLGVS
gagaaatagagatgttaaaagctgaagttgcttct






HHFDIEIENQLQEFFNV
atgttgttggcaactggtaagactatgatgcaaag






YTNLGEYSAYDLSSAA
attcgacttcattgataagatcgaaagattggggg






LQFRLFRQHGFNISCGIF
tctcccaccattttgacattgaaatcgaaaatcaatt






DQFIDAKGKFKESLCN
gcaagagtttttcaacgtttataccaacttaggtga






DIRGLLSLYEAAHVRTH
atactctgcctatgatttgtcatctgctgccttgcag






GDKILEEALAFTTTHMT
ttccgtttatttagacaacacggtttcaatatttcctg






SGGPHLDSSLAKQVKY
cggtattttcgaccaatttatcgacgctaaaggtaa






ALEQPLHKGILRYEAW
gttcaaggaatctttatgtaacgatatcagaggttt






RYISIYEEDESNNKLLL
gttgtctttgtacgaagctgctcatgttagaactca






RLAKLDYHLLQMSYKQ
cggtgataaaattttggaagaagctttagctttcac






ELCEITRWGKGLESVSN
cactactcacatgacctccggtggtccacatttag






FPYARDRFVECYFWAV
attcttcattggccaagcaagttaaatacgcattgg






GTLYEPQYSLARMTFA
aacagccattgcataagggtatattgagatatgaa






KVAALITMIDDIYDAYG
gcttggagatacatatctatctacgaagaggacg






TLDELQILTDSAERWD
aatccaacaataagttattattgcgtttggctaagtt






GSGVDQLSDYIRASYN
ggactatcacttgttacaaatgtcatacaagcaag






TLLKFNKEVGEDLAKK
agttgtgtgaaattacaagatggggtaaaggtttg






QRTYAFDKYIEDWKQY
gaatctgtctccaactttccttatgcccgtgacaga






MRTNFSQSRWFFTKEL
ttcgttgaatgttacttttgggctgtcggtactttgta






PSFADYINNGAITIGAY
cgaaccacaatactcattggctagaatgaccttcg






LVASAAFLYMDSAKED
ctaaggttgctgctttaattactatgatcgatgatatt






VINWMSTNPKLVVAYS
tatgatgcctacggtaccttggacgaattgcaaat






THSRLINDFGGHKFDKE
attaactgactctgccgaaagatgggatggttccg






RGSVTALDCYMKDYSV
gtgtcgatcagttgtctgactatattagagcttccta






SEEEAAKKFREMCEDN
taatacattattgaaatttaataaggaggttggtga






WKVMNEECLRPTTIPR
agatttggcaaaaaagcaacgtacctacgctttcg






DGLKMLLNIARVGETV
acaagtacatcgaagattggaaacaatacatgag






YKHRIDGFTQPHAIEEH
aaccaacttctctcaatcaagatggtttttcactaag






IRAMLVDFMSI
gagttgccatctttcgctgattacattaacaacggt






(SEQ ID NO: 50)
gccatcacaatcggtgcatatttggttgcctctgct







gctttcttatatatggactccgcaaaagaagatgtt







atcaactggatgtccacaaaccctaagttggtcgt







tgcttactccactcactctcgtttaattaatgactttg







gtggtcacaagttcgacaaggagagaggttccgt







tactgctttggactgctacatgaaggactactctgt







ctccgaagaagaagccgcaaagaagtttagaga







aatgtgtgaagacaattggaaggtcatgaatgaa







gagtgtttaagaccaactaccatccctagagatgg







gttgaagatgttgttaaacatagccagagttggtg







aaactgtctacaagcatagaattgatggttttaccc







aaccacatgctatcgaagaacacatcagagctat







gttggttgatttcatgtctatttaa







(SEQ ID NO: 116)





WenAng

Wend-

A0A068
80%
ASAQASLPSNNRQETV
atggcctcagcacaagcttccttaccttctaataac


SQTS92

landia

UHT0

RPLADFPENIWADRIAP
agacaggaaacagtccgtccattggctgacttcc


5

angust-



FTLDKQEYEMCQREIE
cagagaacatctgggctgatagaattgccccattt




ifolia



MLKAEVASMLLATGKT
accttggataagcaagaatacgaaatgtgtcaaa






MMQRFDFIDKIERLGVS
gagaaatagagatgttaaaagctgaagttgcttct






HHFDIEIENQLQEFFNV
atgttgttggcaactggtaagactatgatgcaaag






YTNLGEYSAYDLSSAA
attcgacttcattgataagatcgaaagattggggg






LQFRLFRQHGFNISCGIF
tctcccaccattttgacattgaaatcgaaaatcaatt






DQFIDAKGKFKESLCN
gcaagagtattcaacgtttataccaacttaggtga






DIRGLLSLYEAAHVRTH
atactctgcctatgatttgtcatctgctgccttgcag






GDKILEEALAFTTTHMT
ttccgtttatttagacaacacggtttcaatatttcctg






SGGPHLDSSLAKQVKY
cggtattttcgaccaatttatcgacgctaaaggtaa






ALEQPLHKGILRYEAW
gttcaaggaatctttatgtaacgatatcagaggttt






RYISIYEEDESNNKLLL
gttgtctttgtacgaagctgctcatgttagaactca






RLAKLDYHLLQMSYKQ
cggtgataaaattttggaagaagctttagctttcac






ELCEITRWGKGLESVSN
cactactcacatgacctccggtggtccacatttag






FPYARDRFVECYFWAV
attcttcattggccaagcaagttaaatacgcattgg






GTLYEPQYSLARMTFA
aacagccattgcataagggtatattgagatatgaa






KVAALITMIDDIYDAYG
gcttggagatacatatctatctacgaagaggacg






TLDELQILTDSAERWD
aatccaacaataagttattattgcgtttggctaagtt






GSGVDQLSDYIRASYN
ggactatcacttgttacaaatgtcatacaagcaag






TLLKFNKEVGEDLAKK
agttgtgtgaaattacaagatggggtaaaggtttg






QRTYAFDKYIEDWKQY
gaatctgtctccaactttccttatgcccgtgacaga






MRTNFSQSRWFFTKEL
ttcgttgaatgttacttttgggctgtcggtactttgta






PSFADYINNGAITIGAY
cgaaccacaatactcattggctagaatgaccttcg






LVASAAFLYMDSAKED
ctaaggttgctgctttaattactatgatcgatgatatt






VINWMSTNPKLVVAYS
tatgatgcctacggtaccttggacgaattgcaaat






THSRLINDFGGHKFDKE
attaactgactctgccgaaagatgggatggttccg






RGSVTALDCYMKDYSV
gtgtcgatcagttgtctgactatattagagcttccta






SEEEAAKKFREMIENT
taatacattattgaaatttaataaggaggttggtga






WKVMNEECLRPIPIPRD
agatttggcaaaaaagcaacgtacctacgctttcg






TLKMLLNIARVGETVY
acaagtacatcgaagattggaaacaatacatgag






KHRIDGFTEPHIIKDHIR
aaccaacttctctcaatcaagatggtttttcactaag






AMLVDFMAIN
gagttgccatctttcgctgattacattaacaacggt






(SEQ ID NO: 51)
gccatcacaatcggtgcatatttggttgcctctgct







gctttcttatatatggactccgcaaaagaagatgtt







atcaactggatgtccacaaaccctaagttggtcgt







tgcttactccactcactctcgtttaattaatgactttg







gtggtcacaagttcgacaaggagagaggttccgt







tactgctttggactgctacatgaaggactactctgt







ctccgaagaagaagccgcaaagaagtttagaga







aatgatcgaaaacacctggaaggtcatgaatgaa







gagtgtttaagaccaattccaatccctagagacac







attgaagatgttgttaaacatagccagagttggtg







aaactgtctacaagcatagaattgatggttttactg







aaccacatatcatcaaagatcacatcagagctatg







ttggttgatttcatggctattaattaa







(SEQ ID NO: 117)





WenAng

Wend-

A0A068
81%
YEREIEMLKAEVESML
atgtatgagagagaaatcgaaatgttaaaggctg


SQTS96

landia

VI46

LATGKTMMQRFDFIDK
aagtcgaatctatgttgttggccaccggtaaaaca


0

angust-



IERLGVSHHFDIEIENQL
atgatgcagcgtttcgattttatagacaagattgaa




ifolia



QEFFNVYTNFGEYSAY
agattgggcgtttcccaccatttcgatattgaaatc






DLSSAALQFKQWCDHN
gagaaccaattacaagaatttttcaatgtttacacta






RSLSCSITRGLLSLYEA
acttcggtgaatactcagcttacgacttgtcttccg






AHVRTHGDKILEEALH
cagccttgcaatttaagcaatggtgtgaccacaat






LTSGESHLDSTLAKQV
agatcattatcttgctctattactagaggtttgttatc






KCALEQPLHKGIPRYEA
cttgtatgaggctgctcatgtcagaacccacggtg






WRYISIYEEDESHNKLL
ataagatcttggaagaagctttacacttgacttctg






LRLAKLDYHFLQISYRQ
gtgaatcccatttggactccaccttggctaaacaa






DLCEIIRWDSSGVDQLS
gttaaatgtgcattagaacaaccattgcacaaggg






DYIRAVGEELAKKQRT
tatacctcgttacgaagcctggagatatatttctatc






YAFGTFLGMDGASEDV
tacgaagaggatgaatcacataacaagttgttgtt






INWMSTIPKLMFACSTH
gagattagctaaattggattatcacttcttacagatt






ARLINDFGGHKFDKER
tcttacagacaagatttgtgtgaaatcattcgttgg






GTGTALECYMKDYNVS
gactcatctggtgtcgaccaattatctgattacatc






EEEAANKFREMMEDA
agagcagttggtgaggaattggctaagaagcaa






WKVMNEECLRPTTIPR
agaacatacgctttcggtacttttttaggtatggatg






EILKMLLNIVRVGETTN
gtgcctctgaagatgttattaactggatgtccacta






KHRIDGFTQPHAIEEHIR
tcccaaagttgatgttcgcttgctctacacatgcca






AMLVDFMSV
gattgattaatgactttggtggtcataaattcgataa






(SEQ ID NO: 52)
ggaaagaggtactggtaccgctttagagtgttata







tgaaagactataacgtctccgaagaagaagccg







ccaacaagtttagagaaatgatggaggacgcttg







gaaagttatgaatgaagaatgtttgcgtccaacca







ctattccaagagaaatattaaagatgttgttgaaca







tcgtccgtgttggtgaaactactaataagcacaga







atcgatggtttcacacagcctcacgctattgagga







acacattagagctatgttggttgactttatgtccgtc







taa







(SEQ ID NO: 118)
















TABLE 11







Non-limiting examples of sequence fragment(s) derived from rare plants.













Chimera Name
Source
Ancient DNA Fragments
SEQ ID NO





HibWilSQTS117

Hibiscadelphus

LKDEEGNFKASLTSDVPGLLELYEASYLRVHGEDI
119




wilderianus

LDEAISFA





NKALLQFAKIDFNMLQLLHRKELSEICRWWKDLD
120




FTRKLP





DRVVEGYFWIMGVYFEPQYSLGRKMLTKVIAMA
121




SIVDDTYDSFATYDELIPYTDAIER





YMQISYKALLDVYEEMEQLLADKGRQYRVEY
122




WTHLNYKPTFEEFRDNALPTSGYAMLAIT
123




TFEWAASDPKIIKASTIICRFMDDIAE
124




EDDCSAIECYMEQYKVTAQEAYDEFNKHIESSWK
125




DVNEEFLK






HibWilSQTS118

Hibiscadelphus

EAFNKLKDEEGNFKASLTSDVRGLLELYQASYMR
126




wilderianus

IHGEDILDEAISFTTAQLTLALPTLDPP





NKALLQFAKIDFNMLQLLHRKELSEICRWWKDLD
127




FTRKLP





DRVVEGYFWIMGVYFEPQYSLGRKMLTKVIAMA
128




SIVDDTYDSFATYDELIPYTDAIER





YMQISYKALLDVYEEMEQLLADKGRQYRVEY
129




WTHLNYKPTFEEFRDNALPTSGYAMLAIT
130




TFEWAASDPKIIKASTIICRFMDDIAE
131




SAIECYMKQYGATAQEAYDEFNKHIESSWK
132





HibWilSQTS120

Hibiscadelphus

LKDEEGNFKASLTSDVPGLLELYEASYLRVHGEDI
133




wilderianus

LDEAISFA





NKALLQFAKIDFNMLQLLHRKELSEICRWWKDLD
134




FTRKLP





DRVVEGYFWIMGVYFEPQYSLGRKMLTKVIAMA
135




SIVDDTYDSFATYDELIPYTDAIER





YMQISYKALLDVYEEMEQLLADKGRQYRVEY
136




WTHLNYKPTFEEFRDNALPTSGYAMLAIT
137




TFEWAASDPKIIKASTIICRFMDDIAE
138




SAIECYMKQYGATAQEAYDEFNKHIESSWK
139





HibWilSQTS121

Hibiscadelphus

EAFNKLKDEEGNFKASLTSDVRGLLELYQASYMR
140




wilderianus

IHGEDILDEAISFTTAQLTLALPTLDPP





LLEFAKIDFNLLQLLHRKELSEICRWWKD
141




DRVVEGYFWIMGVYFEPQYSLGRKMLTKVIAMA
142




SIVDDTYDSFATYDELIPYTDAIER





YMQISYKALLDVYEEMEQLLADKGRQYRVEY
143




WTHLNYKPTFEEFRDNALPTSGYAMLAIT
144




TFEWAASDPKIIKASTIICRFMDDIAE
145




EDDCSAIECYMEQYKVTAQEAYDEFNKHIESSWK
146




DVNEEFLK






HibWilSQTS123

Hibiscadelphus

LKDEEGNFKASLTSDVPGLLELYEASYLRVHGEDI
147




wilderianus

LDEAISFA





LLEFAKIDFNLLQLLHRKELSEICRWWKD
148




DRVVEGYFWIMGVYFEPQYSLGRKMLTKVIAMA
149




SIVDDTYDSFATYDELIPYTDAIER





YMQISYKALLDVYEEMEQLLADKGRQYRVEY
150




WTHLNYKPTFEEFRDNALPTSGYAMLAIT
151




TFEWAASDPKIIKASTIICRFMDDIAE
152




EDDCSAIECYMEQYKVTAQEAYDEFNKHIESSWK
153




DVNEEFLK






HibWilSQTS124

Hibiscadelphus

EAFNKLKDEEGNFKASLTSDVRGLLELYQASYMR
154




wilderianus

IHGEDILDEAISFTTAQLTLALPTLDPP





LLEFAKIDFNLLQLLHRKELSEICRWWKD
155




DRVVEGYFWIMGVYFEPQYSLGRKMLTKVIAMA
156




SIVDDTYDSFATYDELIPYTDAIER





YMQISYKALLDVYEEMEQLLADKGRQYRVEY
157




WTHLNYKPTFEEFRDNALPTSGYAMLAIT
158




TFEWAASDPKIIKASTIICRFMDDIAE
159




SAIECYMKQYGATAQEAYDEFNKHIESSWK
160





HibWilSQTS126

Hibiscadelphus

LKDEEGNFKASLTSDVPGLLELYEASYLRVHGEDI
161




wilderianus

LDEAISFA





LLEFAKIDFNLLQLLHRKELSEICRWWKD
162




DRVVEGYFWIMGVYFEPQYSLGRKMLTKVIAMA
163




SIVDDTYDSFATYDELIPYTDAIER





YMQISYKALLDVYEEMEQLLADKGRQYRVEY
164




WTHLNYKPTFEEFRDNALPTSGYAMLAIT
165




TFEWAASDPKIIKASTIICRFMDDIAE
166




SAIECYMKQYGATAQEAYDEFNKHIESSWK
167





HibWilSQTS19

Hibiscadelphus

FEQERGHCASAVECYMREHGVSEEEACSELKKQV
168




wilderianus

DNAWKDINHEMIFSETSKAVPMSVLTRVLNLTR






HibWilSQTS34

Hibiscadelphus

GYHVDGEEAFNMLKDEEGNFKASLTSDVPGLLEL
169




wilderianus

YQASYMRIHGEDILDEAISFTTAQLTLALPTLDPPL





S






HibWilSQTS52

Hibiscadelphus

FEQERGHCASAVECYMREHGVSEEEACSELKKQV
170




wilderianus

DNAWKDINHEMIFSETSKAVPMSVLTRVLNLTR






HibWilSQTS54

Hibiscadelphus

GYHVDGEEAFNMLKDEEGNFKASLTSDVPGLLEL
171




wilderianus

YQASYMRIHGEDILDEAISFTTAQLTLALPTLDPPL





SE






HibWilSQTS55

Hibiscadelphus

FEQERGHCASAVECYMREHGVSEEEACSELKKQV
172




wilderianus

DNAWKDINHEMIFSETSKAVPMSVLTRVLNLTR






HibWilSQTS63

Hibiscadelphus

EQERGHCASAVECYMREHGVSEEEACSELKKQV
173




wilderianus

DNAWKDINHEMIFSETSKAVPMSVLTRVLNLTR






HibWilSQTS90

Hibiscadelphus

GYHVDGEEAFNMLKDEEGNFKASLTSDVPGLLEL
174




wilderianus

YQASYMRIHGEDILDEAISFTTAQLTLALPTLDPPL





S





FEQERGHCASAVECYMREHGVSEEEACSELKKQV
175




DNAWKDINHEMIFSETSKAVPMSVLTRVLNLTRG






LeuGraSQTS335

Leucadendron

DAFNRFKDTKGSFKEDLIKDVNSMLCLYEATHLR
176




grandiflorum

VHGEDILDEALGFTTSQLKSILPKLKPLLASQVMH





ALKQPL






LeuGraSQTS345

Leucadendron

FNKFKNSDGNFKEDLINDVSGMLCLYEATHLRVH
177




grandiflorum

GEDILDEALEFTTTRLKSILPDLEPPLATQVMHA






LeuGraSQTS365

Leucadendron

IFNKFKNSDGNFKEDLINDVSGMLCLYEATHLRV
178




grandiflorum

HGEDILDEALEFTTTRLKSILPDLEPPL






LeuGraSQTS377

Leucadendron

DAFNRFKDTKGSFKEDLIKDVNSMLCLYEATHLR
179




grandiflorum

VHGEDILDEALGFTTSQLKSILPKLKPLLASQVMH





ALKQPL






LeuGraSQTS379

Leucadendron

IFNKFKNSDGNFKEDLINDVSGMLCLYEATHLRV
180




grandiflorum

HGEDILDEALEFTTTRLKSILPDLEPPLATQVMHA






LeuGraSQTS385

Leucadendron

ETNFTNSPLLSKLQNELSVAHLEELKLEVKQLIWS
181




grandiflorum

TKDPLFLLKFIDSIQRLGVAYHFEEEIKESLHLVYL





E






LeuGraSQTS393

Leucadendron

IFNKFKNSDGNFKEDLINDVSGMLCLYEATHLRV
182




grandiflorum

HGEDILDEALEFTTTRLKSILP






MacVolSQTS1139

Macrostylis

EGLEQKIRTMLISPTDTISKKLSLIDAVQRLGVAYH
183




villosa

FEKEIEDEIEKLSCKEYNDGNDLQTVALRFRLLRQ





QGYFVSC






MacVolSQTS2198

Macrostylis

LQRLGLAYHFENQIKEALQSI
184




villosa

LSHLSTSLAEQVKHSLEIPLHRGMPRLEARHYISIY
185




EEDNSS





ELAKLDFNLLQALHRRELGEISRWWKDIDFATKL
186




PFARDRLVECYFWILGVYFEPKYSITRKFMTKVIAI





ASVIDDIYDVYGTLEELKLFTHAIERWETVAANEL





PKYMQVCYFALLDVFKEMEDKLVNKGLLYSMPC





AKEAVKGLVRAYFVEAEWFNANYMPTFEEYMEN





STMSSGYPMLAVEALIGIEDATISKEAFDWAISVP





KIIRSCALIARLVDDIH





DAPSSVECYMQQYDVSEEEACNRIKGMVEIEW
187




NLARMMVVLYQNGDNYTNSSGKTKDRIASLLV
188




LQRLGLAYHFENQIKEALQSI
189





MacVolSQTS2202

Macrostylis

KFKDEKGEFKDMIRNDARGLLCLYEASHLRVKGE
190




villosa

DILEEATEFSRKHLKSLLPQLSTSLAEQVKHSLEIP





LHRGMPRLEARHYISIYEENNSSRNELLLELAKLD





FNLLQALHRRELGDISRWWKDIDFATKLPFARDR





LVECYFWILGVYFEPKYSITRKFMTKVIAIASVIDD





IYDVYGTLEELKLFTHAIERWETVAANELPKYMQ





VCYFALLDVFKEMEDKLVNKGLLYSMPCAKEAV





KGLVRAYFVEAEWFNANYMPTFEEYMENSTMSS





GYPMLAVEALIGIEDATISKEAFDWAISVPKIIRSC





ALIARLVDDIH





KVEQERGDAPSSVQCYVQQ
191




NLARMMVVLYQNGDNYTNSSGKTKDRIASLLV
192




LQRLGLAYHFENQIKEALQSI
193





MacVolSQTS2222

Macrostylis

KFKDEKGEFKDMIRNDARGLLCLYEASHLRVKGE
194




villosa

DILEEATEFSRKHLKSLLPQLSTSLAEQVKHSLEIP





LHRGMPRLEARHYISIYEENNSSRNELLLELAKLD





FNLLQALHRRELGDISRWWKDIDFATKLPFARDR





LVECYFWILGVYFEPKYSITRKFMTKVIAIASVIDD





IYDVYGTLEELKLFTHAIERWETVAANELPKYMQ





VCYFALLDVFKEMEDKLVNKGLLYSMPCAKEAV





YVPTFEEYMENSTMSSGYPMLAVEALV
195




DWAISVPKIIRSCALIA
196




KVEQERGDAPSSVQCYMQQYDVSEEEACNRIKG
197




MVETAWMEINGEIQDTNHL





NLARMMVVLYQNGDNYTNSSGKTKDRIASLLV
198





MacVolSQTS2251

Macrostylis

LQRLGLAYHFENQIKEALQSI
199




villosa

KFKDEKGEFKDMIRNDARGLLCLYEASHLRVKGE
200




DILEEATEFSRKHLKSLLPQLSTSLAEQVKHSLEIP





LHRGMPRLEARHYISIYEENNSSRNELLLELAKLD





FNLLQALHRRELGDISRWWKDIDFATKLPFARDR





LVECYFWILGVYFEPKYSITRKFMTKVIAIASVIDD





IYDVYGTLEELKLFTHAIERWETVAANELPKYMQ





VCYFALLDVFKEMEDKLVNKGLLYSMPCAKEAV





YVPTFEEYMENSTMSSGYPMLAVEALV
201




DWAISVPKIIRSCALIA
202




DAPSSVECYMQQYDVSEEEACNRIKGMVEIEW
203




NLARMMVVLYQNGDNYTNSSGKTKDRIASLLV
204





MacVolSQTS2274

Macrostylis

KFIQNVEKDSTRRSANFHPSIWGDH
205




villosa

DDGSVKHQQLKEEIRKMLTAETKLSQKLDLIDAIQ
206




RLGVAYHFESEIDEIL





SLARNVRGMLSLYEATHLRVHGENILDEA
207




LEARNYMPFYQEEASHNEALLTFAKLDFNKLQKL
208




HQKELSEITR





FEQSREHVASSIECYMKQYGATEEETCNELRKQV
209




SNAWKDINEECLCPTAVPMPLIVRILNLT






OrbStiSQTS1368

Orbexilum

AEVFERFKDQHGNFKASLSSDVEGMLSLYEASFL
210




stipulatum

DYEGEDILDEAKAFTSFHLRGAL






OrbStiSQTS1414

Orbexilum

VKLELVDDVKRLGIGYRFEKEIVEALHRCFISSERF
211




stipulatum

THRNLHQTALSFRLLRECGYDVT





FNKFTNKEGKFNSKLGENIKGMIDLYEASQLGIAG
212




EYILAEAGEFSGLVLKEKVACINN





VYFEPQYSVPRRTTTKVIGLCSVIDDMYDAYGTID
213




ELELFTNAIERLDTST





RWLKCNHAPTMEEYMKVRGVSSGYPLLITISFIG
214




MEDTTEEILTWATSEPMIIRASVIVCRLMDDI






ShoCusSQTS154

Shorea cuspidata

FMDEKGKFKEDVVNDVLGMLNLYEAAHLRLRGE
215




DILDEALAFTTSHLE





WWKNLDFSTKLPYARDRIVECYFWIMGAYFE
216




SLARTFLTKVIAMTSILDDTYDNYG
217




DYVPPIEEYMQVARISSAYPMLITNSFVGMGEVAT
218




KEAFDWISNDPKILKASTTICRLMDD





EFEQTRDHVASGVECYMKQYGVSREETVK
219





ShoCusSQTS155

Shorea cuspidata

FMDEKGKFKEDVVNDVLGMLNLYEAAHLRLRGE
220




DILDEALAFTTSHLE





WWKNLDFSTKLPYARDRIVECYFWIMGAYFE
221




SLARTFLTKVIAMTSILDDTYDNYG
222




YMQVALISSAYPMLITNSFVGMGEVATKEAFDWI
223




SNNPKMLKASTII





EFEQTRDHVASGVECYMKQYGVSREETVK
224





ShoCusSQTS156

Shorea cuspidata

FMDEKGKFKEDVVNDVLGMLNLYEAAHLRLRGE
225




DILDEALAFTTSHLE





WWKNLDFSTKLPYARDRIVECYFWIMGAYFE
226




SLARTFLTKVIAMTSILDDTYDNYG
227




DYVPPIEEYMQVARIS
228




GYPMLITNSLVGMGEVATKEAFDLISNDPKMLKA
229




ST





EFEQTRDHVASGVECYMKQYGVSREETVK
230





ShoCusSQTS157

Shorea cuspidata

FMDEKGKFKEDVVNDVLGMLNLYEAAHLRLRGE
231




DILDEALAFTTSHLE





WWKNLDFSTKLPYARDRIVECYFWIMGAYFE
232




SLARTFLTKVIAMTSILDDTYDNYG
233




VPPMDEYMQVALISCGYPMLITNSFVGMGEVATK
234




EAFDWISNDPKILKASTTICRLMDD





EFEQTRDHVASGVECYMKQYGVSREETVK
235





ShoCusSQTS160

Shorea cuspidata

FMDEKGKFKEDVVNDVLGMLNLYEAAHLRLRGE
236




DILDEALAFTTSHLE





WWKNLDFATMLPYARDRIVECYFWIMGVYFEPK
237




YSLARTFLTKVIAMTSILDDTYDNYG





YMQVALISSAYPMLITNSFVGMGEVATKEAFDWI
238




SNNPKMLKASTII





EFEQTRDHVASGVECYMKQYGVSREETVK
239





ShoCusSQTS161

Shorea cuspidata

FMDEKGKFKEDVVNDVLGMLNLYEAAHLRLRGE
240




DILDEALAFTTSHLE





WWKNLDFATMLPYARDRIVECYFWIMGVYFEPK
241




YSLARTFLTKVIAMTSILDDTYDNYG





DYVPPIEEYMQVARIS
242




GYPMLITNSLVGMGEVATKEAFDLISNDPKMLKA
243




ST





EFEQTRDHVASGVECYMKQYGVSREETVK
244





WenAngSQTS1007

Wendlandia

SNNRQETVRPLADFPENIWADRIAPFT
245




angustifolia

EMCQREIEMLKAEVASMLLATGKTMMQRFDFID
246




KIERLGVSHHFD





IFDQFIDAKGKFKESLCNDIRGLLSLYEAAHVRTH
247




GDKILEEALAFTTTHMTSGGPHLDSSLAKQVKYA





LEQPLHKGILRYEAWRYISIYEEDESNNKLLLRLA





KLDYHLLQMSYKQEL





RWGKGLESVSNFPYARDRFVECYFWAVGTLYEP
248




QYSLARMTFAKVAA





RWDGSGVDQLSDYIRASYNTLLKFNKEVGEDLAK
249




KQRTYAFDKYIEDWKQYMRTNFSQSRWFFTKELP





SFADYINNGAITIGAYLVASAAFLYMDSAKEDVIN





WMSTNPKLVVAYSTHSRLINDFGGHKFEKERGSS





TAIECYMKDHNVSEEEAANKFREMMEDAWKVM





NEECLRPTTI





ETVYKHRIDGFTQPHAIEEHIRAMLVDFMSI
250





WenAngSQTS1086

Wendlandia

SNNRQETVRPLADFPENIWADRIAPFT
251




angustifolia

EMCQREIEMLKAEVASMLLATGKTMMQRFDFID
252




KIERLGVSHHFD





IFDQFIDAKGKFKESLCNDIRGLLSLYEAAHVRTH
253




GDKILEEALAFTTTHMTSGGPHLDSSLAKQVKYA





LEQPLHKGILRYEAWRYISIYEEDESNNKLLLRLA





KLDYHLLQMSYKQEL





RWGKGLESVSNFPYARDRFVECYFWAVGTLYEP
254




QYSLARMTFAKVAA





RWDGSGVDQLSDYIRASYNTLLKFNKEVGEDLAK
255




KQRTYAFDKYIEDWKQYMRTNFSQSRWFFTKELP





SFADYINNGAITIGAYLVASAAFLYMDSAKEDVIN





WMSTNPKLVVAYSTHSRLINDFGGHK





KERGTGTAIECYMKDHN
256




EMIENTWKVMNEECLRPIPIPRDTLKML
257




ETVYKHRIDGFTQPHAIEEHIRAMLVDFMSI
258





WenAngSQTS267

Wendlandia

LELVDNLERLGLAYHFEGQINRLLSSAYNANHED
259




angustifolia

EGNHKRNKEDLYAAALEFRIFRQHGFNV






WenAngSQTS302

Wendlandia

YVSQANELKEQVKMMLDEEDMKLLDCLELVDNL
260




angustifolia

ERLGLAYHFEGQINRLLSSAYNANHEDEGNHKRN





KEDLYAAALEFRIFRQHGFNVPQ






WenAngSQTS738

Wendlandia

NNQHESVRQLADFPENIWADRV
261




angustifolia

QGHDMCAKEIEMLKEEVMSMLLE
262




STLAKQVKYALEQPLHKGIPRYEAWRYISIYEED
263




LAKLDYHLSQMLNKQDLCEI
264




RDRIVECYFWAVATYYEPQYSLARMT
265




EVGEDLAKKQRTYAFDKYIE
266




YARTSFTQSKWFLTNELPSFSDYL
267




AAFLDMDSASEDVINWMSTNPKLFVALTTHARLA
268




NDVGSHKFEKERGSGTAIECYMKDYHVSEEEAM





KKFEEMCDDAWKVMNEE






WenAngSQTS760

Wendlandia

NNQHESVRQLADFPENIWADRV
269




angustifolia

QGHDMCAKEIEMLKEEVMSMLLE
270




QVKHALEQPLHRGIPRYEAYCFISIYEEDESNNKLL
271




LRLAKLDYHLLQMSYKRE





RDRIVECYFWAVATYYEPQYSLARMT
272




EVGEDLAKKQRTYAFDKYIE
273




YARTSFTQSKWFLTNELPSFSDYL
274




TFLGMDGASEDVINWMSTNPKLFVA
275




KFEKERGSGTAIECYMKDYHVSEEEAMKKFEEMC
276




DDAWKVMNEE






WenAngSQTS780

Wendlandia

NNQHESVRQLADFPENIWADRV
277




angustifolia

QGHDMCAKEIEMLKEEVMSMLLE
278




QVKHALEQPLHRGIPRYEAYCF
279




LAKLDYHLSQMLNKQDLCEI
280




RDRIVECYFWAVATYYEPQYSLARMT
281




EVGEDLAKKQRTYAFDKYIE
282




YARTSFTQSKWFLTNELPSFSDYL
283




TFLGMDGASEDVINWMSTNPKLFVA
284




KFEKERGSGTAIECYMKDYHVSEEEAMKKFEEMC
285




DDAWKVMNEE






WenAngSQTS793

Wendlandia

SNNRQETVRPLADFPENIWADRIAPFT
286




angustifolia

EMCQREIEMLKAEVASMLLATGKTMMQRFDFID
287




KIERLGVSHHFD





IFDQFIDAKGKFKESLCNDIRGLLSLYEAAHVRTH
288




GDKILEEALAFTTTHMTSGGPHLDSSLAKQVKYA





LEQPLHKGILRYEAWRYISIYEEDESNNKLLLRLA





KLDYHLLQMSYKQEL





RWGKGLESVSNFPYARDRFVECYFWAVGTLYEP
289




QYSLARMTFAKVAA





RWDGSGVDQLSDYIRASYNTLLKFNKEVGEDLAK
290




KQRTYAFDKYIEDWKQYMRTSFTQSKWFLTNELP





SFADY





LDMDSALEDVINWMSTNPKLMVAY
291




KFDKERGSVTALDCYMKDYSVSEEEAAKKFREM
292




CEDNWKVMNEECLRPTTI





ETVYKHRIDGFTQPHAIEEHIRAMLVDFMSI
293





WenAngSQTS805

Wendlandia

NNQHESVRQLADFPENIWADRV
294




angustifolia

QGHDMCAKEIEMLKEEVMSMLLE
295




STLAKQVKYALEQPLHKGIPRYEAWRYISIYEEDE
296




SNNKLLLRLAKLDYHLLQMSYKRE





RDRIVECYFWAVATYYEPQYSLARMT
297




EVGEDLAKKQRTYAFDKYIE
298




YARTSFTQSKWFLTNELPSFSDYL
299




TFLGMDGASEDVINWMSTNPKLFVA
300




STAIECYMKDYHVSEEEAMEKFEEMCDDAWKVM
301




NEE






WenAngSQTS826

Wendlandia

NNQHESVRQLADFPENIWADRV
302




angustifolia

QGHDMCAKEIEMLKEEVMSMLLE
303




STLAKQVKYALEQPLHKGIPRYEAWRYISIYEEDE
304




SNNKLLLRLAKLDYHLLQMSYKRE





RDRIVECYFWAVATYYEPQYSLARMT
305




EVGEDLAKKQRTYAFDKYIE
306




YARTSFTQSKWFLTNELPSFADYLS
307




AALLDMDSALEDVINWMSTNPKFFVALTTHARLT
308




NDVGSHKFEKERGSGTAIECYMKDYHVSEEEAM





KKFEEMCDDAWKVMNEE






WenAngSQTS829

Wendlandia

SNNRQETVRPLADFPENIWADRIAPFT
309




angustifolia

EMCQREIEMLKAEVASMLLATGKTMMQRFDFID
310




KIERLGVSHHFD





IFDQFIDAKGKFKESLCNDIRGLLSLYEAAHVRTH
311




GDKILEEALAFTTTHMTSGGPHLDSSLAKQVKYA





LEQPLHKGILRYEAWRYISIYEEDESNNKLLLRLA





KLDYHLLQMSYKQEL





RWGKGLESVSNFPYARDRFVECYFWAVGTLYEP
312




QYSLARMTFAKVAA





RWDGSGVDQLSDYIRASYNTLLKFNKEVGEDLAK
313




KQRTYAFDKYIEDWKQYMRTSFTQSKWFLTNELP





SFADY





LDMDSALEDVINWMSTNPKLMVAY
314




KERGTGTAIECYMKDHN
315




EMIENTWKVMNEECLRPIPIPRDTLKML
316




ETVYKHRIDGFTQPHAIEEHIRAMLVDFMSI
317





WenAngSQTS843

Wendlandia

NNQHESVRQLADFPENIWADRV
318




angustifolia

QGHDMCAKEIEMLKEEVMSMLLE
319




QVKHALEQPLHRGIPRYEAYCFISIYEEDESNNKLL
320




LRLAKLDYHLLQMSYKRE





RDRIVECYFWAVATYYEPQYSLARMT
321




EVGEDLAKKQRTYAFDKYIE
322




YARTSFTQSKWFLTNELPSFSDYL
323




AAFLDMDSASEDVINWMSTNPKLFVALTTHARLA
324




NDVGSHK





RGSGTAIECYMKDYNVSEEEALKKFEEMCEDTW
325




KVMNEE






WenAngSQTS848

Wendlandia

SNNRQETVRPLADFPENIWADRIAPFT
326




angustifolia

EMCQREIEMLKAEVASMLLATGKTMMQRFDFID
327




KIERLGVSHHFD





IFDQFIDAKGKFKESLCNDIRGLLSLYEAAHVRTH
328




GDKILEEALAFTTTHMTSGGPHLDSSLAKQVKYA





LEQPLHKGILRYEAWRYISIYEEDESNNKLLLRLA





KLDYHLLQMSYKQEL





RWDGSGVDQLSDYIRASYNTLLKFNKEVGEDLAK
329




KQRTYAFDKYIEDWKQYMRTNFSQSRWFFTKELP





SFADYINNGAITIGAYLVASAAFLYMDSAKEDVIN





WMSTNPKLVVAYSTHSRLINDFGGHKFDKERGSG





TALECYMKDYNVSEEEAANKFREMMEDAWKVM





NEDCLRPTSIPRDVSKVLLNVARAGEIVYKHRIDG





FTEPHIIKDHIRATLVDFMAIN





RWGKGLESVSNFPYARDRFVECYFWAVGTLYEP
330




QYSLARMTFAKVAA






WenAngSQTS849

Wendlandia

SNNRQETVRPLADFPENIWADRIAPFT
331




angustifolia

EMCQREIEMLKAEVASMLLATGKTMMQRFDFID
332




KIERLGVSHHFD





IFDQFIDAKGKFKESLCNDIRGLLSLYEAAHVRTH
333




GDKILEEALAFTTTHMTSGGPHLDSSLAKQVKYA





LEQPLHKGILRYEAWRYISIYEEDESNNKLLLRLA





KLDYHLLQMSYKQEL





RWGKGLESVSNFPYARDRFVECYFWAVGTLYEP
334




QYSLARMTFAKVAA





RWDGSGVDQLSDYIRASYNTLLKFNKEVGEDLAK
335




KQRTYAFDKYIEDWKQYMRTSFTQSKWFLTNELP





SFADY





LDMDSALEDVINWMSTNPKLMVAY
336




KFDKERGSVTALDCYMKDYSVSEEEAAKKFREMI
337




ENTWKVMNEECLRPIPIPRDTLKML





EPHIIKDHIRAMLVDFMAI
338





WenAngSQTS864

Wendlandia

SNNRQETVRPLADFPENIWADRIAPFT
339




angustifolia

EMCQREIEMLKAEVASMLLATGKTMMQRFDFID
340




KIERLGVSHHFD





IFDQFIDAKGKFKESLCNDIRGLLSLYEAAHVRTH
341




GDKILEEALAFTTTHMTSGGPHLDSSLAKQVKYA





LEQPLHKGILRYEAWRYISIYEEDESNNKLLLRLA





KLDYHLLQMSYKQEL





RWGKGLESVSNFPYARDRFVECYFWAVGTLYEP
342




QYSLARMTFAKVAA





RWDGSGVDQLSDYIRASYNTLLKFNKEVGEDLAK
343




KQRTYAFDKYIEDWKQYMRTNFSQSRWFFTKELP





SFADYINNGAITIGAYLVASAAFLYMDSAKEDVIN





WMSTNPKLVVAYSTHSRLINDFGGHKFDKERGSV





TALDCYMKDYSVSEEEAAKKFREMCEDNWKVM





NEECLRPTTI





ETVYKHRIDGFTQPHAIEEHIRAMLVDFMSI
344





WenAngSQTS925

Wendlandia

SNNRQETVRPLADFPENIWADRIAPFT
345




angustifolia

EMCQREIEMLKAEVASMLLATGKTMMQRFDFID
346




KIERLGVSHHFD





IFDQFIDAKGKFKESLCNDIRGLLSLYEAAHVRTH
347




GDKILEEALAFTTTHMTSGGPHLDSSLAKQVKYA





LEQPLHKGILRYEAWRYISIYEEDESNNKLLLRLA





KLDYHLLQMSYKQEL





RWGKGLESVSNFPYARDRFVECYFWAVGTLYEP
348




QYSLARMTFAKVAA





RWDGSGVDQLSDYIRASYNTLLKFNKEVGEDLAK
349




KQRTYAFDKYIEDWKQYMRTNFSQSRWFFTKELP





SFADYINNGAITIGAYLVASAAFLYMDSAKEDVIN





WMSTNPKLVVAYSTHSRLINDFGGHKFDKERGSV





TALDCYMKDYSVSEEEAAKKFREMIENTWKVMN





EECLRPIPIPRDTLKML





EPHIIKDHIRAMLVDFMAI
350





WenAngSQTS960

Wendlandia

EAFNKLKDEEGNFKASLTSDVRGLLELYQASYMR
351




angustifolia

IHGEDILDEAISFTTAQLTLALPTLDPP





NKALLQFAKIDFNMLQLLHRKELSEICRWWKDLD
352




FTRKLP





DRVVEGYFWIMGVYFEPQYSLGRKMLTKVIAMA
353




SIVDDTYDSFATYDELIPYTDAIER





YMQISYKALLDVYEEMEQLLADKGRQYRVEY
354




WTHLNYKPTFEEFRDNALPTSGYAMLAIT
355




TFEWAASDPKIIKASTIICRFMDDIAE
356




EDDCSAIECYMEQYKVTAQEAYDEFNKHIESSWK
357




DVNEEFLK









Example 2. Materials and Methods for Construction of Terpene Synthase Chimeras
Terpene Synthases for Capture-Seq and Chimera Scaffolding

Candidate sesquiterpene synthases (SQTSs) were designed by combining sequence fragments from rare flower genomes (Table 11) with “scaffold” SQTSs from sources including UniProt and GenBank.


For Capture-seq (targeted sequencing of terpene synthases), a subset of 5,171 terpene synthases (TPSs) that had nucleotide sequences in EMBL/GenBank was compiled from UniProt. Oligonucleotide chips were generated to enrich the flower DNA samples for TPS-homologous sequences, and the enriched samples were first subjected to Illumina sequencing. The Capture-seq libraries were also sequenced a second time at higher depth.


For SQTS chimera reconstruction, sequences closer to annotated SQTSs than to annotated mono-, di-, or tri-terpene synthases were selected. This set of 1,521 putative SQTSs was used (in both nucleotide and peptide form) as query sequences for blastn and tblastn in the chimera construction pipeline below.


Chimera Reconstruction

Two methods were used for constructing chimeric SQTSs: 1) the blastn-mapDamage pipeline, and 2) the tblastn pipeline.


Blastn-mapDamage Pipeline

Generally, the blastn-mapDamage pipeline conservatively detects fragments with high nucleotide similarity to the scaffolds, resulting in chimeric terpene synthases (e.g., chimeric sesquiterpene synthases) that are likely very close to the original enzyme sequences in the rare flowers. To detect mutations that may be artifacts of stereotypical rare DNA damage, bam-formatted Illumina read alignments were input into the mapDamage software.


Specifically, the following steps were used to generate alignments of DNA fragments from each flower to various SQTS scaffolds:

    • 1. Illumina reads (fastq files) from genomic capture-seq runs were combined and assembled by SPADES into longer contigs.
    • 2. The 1521-set of SQTS scaffolds were used as queries in a blastn search with default parameters against the SPADES contigs. Relatively few scaffolds had hits, so all of the scaffolds with hits were chosen to serve as references for read alignment in the next step.
    • 3. Combined reads from the sequencing runs were quality-trimmed (using bbduk) and pair-merged (using bbmerge) and aligned to chosen SQTS reference sequences using bwa mem. Results were reformatted to bam, sorted, and indexed.
    • 4. mapDamage was run on the aligned reads. This resulted in a read alignment where SNPs resembling DNA damage were assigned low quality scores.
    • 5. Read alignments were processed as follows: bases with quality <25 were masked (changed to the reference); alignments were reformatted to fasta; SNPs with counts <6 were masked; duplicate reads were removed; SNPs with frequency <0.1 were masked; reads that were exact subsequences of other reads were removed; reads were translated in the frame of the reference; and subsequences were removed again. The quality and SNP frequency thresholds used for masking the alignment were determined empirically by looking at distributions of quality and SNP frequency.
    • 6. Read alignments and SPADES contig alignments (after reference-frame translation) were combined and realigned using Clustal Omega. This was done because some contigs spanned regions of the scaffolds that the reads did not.
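
The masking in step 5 can be illustrated with a minimal sketch. It assumes gap-free reads stored as (offset, sequence) pairs against a single reference, and it applies the SNP count and frequency thresholds in one pass, whereas the steps above apply them separately with duplicate removal in between. The function names and data layout are hypothetical and are not part of the pipeline described; only the thresholds (quality <25, count <6, frequency <0.1) come from the text.

```python
from collections import Counter

def mask_low_quality(read, quals, ref, offset, min_qual=25):
    """Replace bases whose quality is below min_qual with the reference base."""
    return "".join(
        base if q >= min_qual else ref[offset + i]
        for i, (base, q) in enumerate(zip(read, quals))
    )

def mask_rare_snps(reads, ref, min_count=6, min_freq=0.1):
    """reads: list of (offset, sequence). Revert poorly supported SNPs to the reference."""
    snp_counts = Counter()  # (position, base) -> number of reads carrying that SNP
    depth = Counter()       # position -> number of reads covering it
    for offset, seq in reads:
        for i, base in enumerate(seq):
            pos = offset + i
            depth[pos] += 1
            if base != ref[pos]:
                snp_counts[(pos, base)] += 1
    masked = []
    for offset, seq in reads:
        out = []
        for i, base in enumerate(seq):
            pos = offset + i
            n = snp_counts.get((pos, base), 0)
            # Keep reference bases, and SNPs that pass both thresholds.
            keep = base == ref[pos] or (n >= min_count and n / depth[pos] >= min_freq)
            out.append(base if keep else ref[pos])
        masked.append((offset, "".join(out)))
    return masked
```

In this sketch a SNP seen in only one read is reverted to the reference base, while a SNP supported by six or more reads at sufficient frequency is retained.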


The alignments from the above steps were used to construct SQTS chimeras as follows:
    • 1. The alignment was split into “independent subregions” such that each subregion did not contain any fragment (aligned read) overlapping with and differing from a fragment from another subregion (identical overlaps were allowed between subregions).
    • 2. In each subregion, all possible combinations of “compatible fragments” were enumerated. Compatible fragments were defined as fragments that either overlapped identically (and therefore could be merged into a longer fragment) or did not overlap at all (and, e.g., were assumed to come from the same haplotype). Fragment combinations were “max-coverage”—that is, contained as many compatible fragments as possible. Each max-coverage fragment combination was considered to be a possible reconstruction of that region of the alignment, and was merged into a superfragment (which may have contained gaps) and saved.
    • 3. Superfragments from each subregion were downsampled to 90% or 95% identity using a custom, iterative algorithm, and all possible combinations of downsampled superfragments from different subregions were combined. Regions that were shorter than a certain threshold were downsampled to a single sequence. Each combination of superfragments was merged into the scaffold to generate a chimera sequence. The downsampling parameters were varied slightly according to the sample and scaffold to allow >1 but <100 chimeras to be constructed in each case.
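
The compatible-fragment enumeration in step 2 can be sketched as a search for maximal sets of mutually compatible fragments. The sketch below is one possible reading, not the pipeline's actual code: it assumes each fragment is a (start, sequence) pair in alignment coordinates, and the names `compatible`, `max_coverage_sets`, and `merge` are hypothetical. Maximal sets are found with a basic Bron-Kerbosch recursion, which is a substituted technique chosen for brevity.

```python
def compatible(a, b):
    """Fragments are (start, seq). Compatible if disjoint or identical where they overlap."""
    (sa, xa), (sb, xb) = a, b
    lo, hi = max(sa, sb), min(sa + len(xa), sb + len(xb))
    if lo >= hi:
        return True  # no overlap at all
    return xa[lo - sa:hi - sa] == xb[lo - sb:hi - sb]

def max_coverage_sets(frags):
    """Enumerate maximal sets of mutually compatible fragments (Bron-Kerbosch)."""
    n = len(frags)
    nbrs = [{j for j in range(n) if j != i and compatible(frags[i], frags[j])}
            for i in range(n)]
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(sorted(r))  # r cannot be extended any further
            return
        for v in list(p):
            expand(r | {v}, p & nbrs[v], x & nbrs[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(range(n)), set())
    return found

def merge(frags, idxs, length, gap="-"):
    """Merge a compatible set into one superfragment; uncovered columns stay as gaps."""
    out = [gap] * length
    for i in idxs:
        start, seq = frags[i]
        out[start:start + len(seq)] = seq
    return "".join(out)
```

Because pairwise compatibility guarantees agreement at every shared column, each maximal set can be merged without conflict, and any columns left as gaps would be filled from the scaffold when the chimera is assembled.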


After running the above pipeline on each sample, a total of 1136 chimeras were generated. A significant fraction of the chimeras were constructed purely from aligned reads.


A total of 652 sesquiterpene synthase chimeras were created using these methods.


tblastn Pipeline


Generally, the tblastn pipeline maximized the sensitivity of detecting fragments homologous to the SQTS scaffolds, and therefore cast a wide net for potentially usable sequences.


Specifically, the following steps were used to generate alignments of DNA fragments from each flower to various SQTS scaffolds:

    • 1. The 1521-set of SQTS scaffolds were used as protein queries to tblastn to search all-frames translations of the SPADES contigs (described above).
    • 2. Hits (aligned contigs) were filtered to a minimum of 40% identity to the scaffold and a minimum length that depended on hit identity by a heuristic function. The filtering criteria were chosen by inspecting plots of hit length versus identity across all samples.
    • 3. Downsampling scaffolds was performed by hierarchically clustering the scaffolds by the number of identical residues to each hit. The scaffold in each cluster with the greatest number of identities across all of its hits was kept for chimera reconstruction. Downsampling reduced the number of scaffolds by 20-fold. This step was skipped for samples in which fewer than 10 scaffolds had hits.
    • 4. Certain scaffolds were always chosen as a cluster representative because they were previously identified as having activity and/or were known in the literature (even if another sequence had more identities to hits). These preferred scaffolds were not downsampled, and tblastn hits were kept for chimera construction.
    • 5. The aligned portions of all contigs hitting a scaffold were realigned to the scaffold using Clustal Omega. Unaligned portions of contigs were discarded as likely representing introns. This alignment was then used for chimera construction.
    • 6. Chimeras were constructed from aligned tblastn hits using the combinatorial compatible fragments method described above without downsampling in subregions. Both “max-coverage” (as many compatible fragments as possible in each set) and “min-coverage” (only one compatible fragment in each set) chimeras were generated. The min-coverage chimeras may avoid combining fragments from unrelated sequences.
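The hit-filtering rule in step 2 can be illustrated with a short sketch. The 40% identity floor comes from the text; the shape of the length heuristic is not given, so the linear `min_length` function below, its `long_len` and `short_len` parameters, and the `Hit` record are hypothetical placeholders rather than the function actually used.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    contig: str
    scaffold: str
    identity: float  # percent identity to the scaffold, 0-100
    length: int      # aligned length in residues


def min_length(identity, long_len=100, short_len=30):
    """Hypothetical heuristic: the required length falls linearly from
    `long_len` at 40% identity to `short_len` at 100% identity."""
    frac = (identity - 40.0) / 60.0
    return long_len - frac * (long_len - short_len)


def filter_hits(hits, min_identity=40.0):
    """Keep hits that meet the identity floor and the identity-dependent length."""
    return [
        h for h in hits
        if h.identity >= min_identity and h.length >= min_length(h.identity)
    ]
```

The design intent captured here is only that weaker hits must be longer to be trusted; the actual criteria were chosen by inspecting plots of hit length versus identity.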


The tblastn pipeline yielded 10,114 “max-coverage” chimeras and 2,624 “min-coverage” chimeras. Certain max-coverage chimeras were downsampled to 95% identity by CD-HIT. This resulted in 388 sequences (382 after removing sequences with ambiguous amino acids). Certain max-coverage chimeras were filtered to a minimum rare DNA content of 60% and downsampled to 90% identity. This resulted in 1,320 sequences. Certain min-coverage chimeras were filtered to a minimum rare DNA content of 10% and downsampled to 95% identity by CD-HIT.


Encoding and Synthesis Order

Each enzyme was codon-optimized twice: once using a yeast expression-weighted codon table, and once using a yeast expression-weighted codon table after removing codons with <10% frequency. A different random number was used as the seed for each encoding. Encodings for different enzymes were completely independent—no specific procedure was used to preserve codons at residues inherited by chimeras from scaffolds.
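A minimal sketch of the two encodings described above is given here. The codon frequencies in `CODON_TABLE` are made-up illustrative numbers for a few residues, not a real yeast expression-weighted table, and the helper names `prune` and `encode` are assumptions for the example.

```python
import random

# Illustrative relative codon frequencies only; NOT a real yeast table.
CODON_TABLE = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.42, "AAG": 0.58},
    "L": {"TTG": 0.33, "TTA": 0.28, "CTA": 0.14,
          "CTT": 0.12, "CTG": 0.08, "CTC": 0.05},
    "*": {"TAA": 0.50, "TGA": 0.30, "TAG": 0.20},
}


def prune(table, min_freq):
    """Return a copy of the table without codons below `min_freq`."""
    return {
        aa: {c: f for c, f in codons.items() if f >= min_freq}
        for aa, codons in table.items()
    }


def encode(protein, table, seed):
    """Sample one codon per residue, weighted by frequency, from a seeded RNG."""
    rng = random.Random(seed)
    out = []
    for aa in protein:
        codons = list(table[aa])
        weights = [table[aa][c] for c in codons]
        out.append(rng.choices(codons, weights=weights, k=1)[0])
    return "".join(out)


# Two independent encodings of the same protein, each with its own seed.
full_encoding = encode("MKLLK*", CODON_TABLE, seed=1)
pruned_encoding = encode("MKLLK*", prune(CODON_TABLE, 0.10), seed=2)
```

Because each call draws codons independently from its own seed, two enzymes that share residues inherited from the same scaffold will generally not share codons at those positions, which matches the statement that encodings were completely independent.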


Sequences encoding the chimeric enzymes were cloned into the pESC-URA3 screening vector, driven by pGAL1 and terminated by tCYC1.


Chimera Reconstruction Aided by Extant Transcriptome

For one of the extinct flower species, Shorea cuspidata, transcriptome sequencing data were available for an extant relative, Shorea beccariana. This made it possible to construct chimeras using SQTS scaffolds from a related flower. This was done in a 2-step process:

    • 1. The S. beccariana (Sb) transcriptome data were assembled and mined for SQTS homologs. The data were downloaded from the data set SRR687302 from the NCBI SRA database. Assembly was done using Trinity, and ORFs were predicted via Transdecoder. BLAST was used to identify fragments homologous to a set of 1,500 curated SQTS sequences.
    • 2. The identified Sb SQTSs or SQTS fragments were used as scaffold sequences in either the tblastn or blastn-mapDamage pipelines to reconstruct chimeras. If the scaffold was a fragment itself, it was in turn merged into the closest Uniprot-sourced SQTS sequence to generate a full-length chimera.


Screening Strain and Sesquiterpene Synthase Transformation

The chimeric sesquiterpene synthases were cloned into high-copy pESC-URA3-derived expression vectors under the control of the galactose-inducible P(gal1) promoter (Sikorski et al., A system of shuttle vectors and yeast host strains designed for efficient manipulation of DNA in Saccharomyces cerevisiae. Genetics. 1989 May; 122(1):19-27, which is hereby incorporated by reference in its entirety for this purpose).


These vectors were transformed into a haploid Saccharomyces cerevisiae CEN.PK2 strain (MATa ura3-52 trp1-289 leu2-3_112 his3Δ1 MAL2-8C SUC2) that had been modified to increase sesquiterpene flux via integration of two copies of the catalytic region of HMG-CoA reductase 1 under control of convergent P(gal1) promoters at the homothallic switching endonuclease (YDL227C) locus on chromosome 4 (see SEQ ID NO: 53 shown below). See: Entian et al., Yeast Genetic Strain and Plasmid Collections. Methods in Microbiology. 2007; (36): 629-666; tHMG1, Donald et al., Effects of overproduction of the catalytic domain of 3-hydroxy-3-methylglutaryl coenzyme A reductase on squalene synthesis in Saccharomyces cerevisiae. Appl Environ Microbiol. 1997 September; 63(9):3341-4; Özaydin et al., Carotenoid-based phenotypic screen of the yeast deletion collection reveals new genes with roles in isoprenoid production. Metab Eng. 2013 January; 15:174-83, each of which is hereby incorporated by reference in its entirety. Competition for farnesyl pyrophosphate was reduced in these cells by replacing the Erg9 (farnesyl-diphosphate farnesyl transferase) promoter with the methionine-repressible Met3 promoter, as shown below in SEQ ID NO: 54, and incubating in media containing methionine (see: Ro et al., Production of the antimalarial drug precursor artemisinic acid in engineered yeast. Nature. 2006 Apr. 13; 440(7086):940-3; and Asadollahi et al., Production of plant sesquiterpenes in Saccharomyces cerevisiae: effect of ERG9 repression on sesquiterpene biosynthesis. Biotechnol Bioeng. 2008 Feb. 15; 99(3):666-77, each of which is hereby incorporated by reference in its entirety for this purpose). This strain with downregulated Erg9 and containing two copies of galactose-inducible tHMG1 on chromosome 4 was designated t119889.


The chimeric sesquiterpene synthase vectors were transformed into strain t119889 using the chemical transformation techniques described in Gietz et al., Yeast transformation by the LiAc/SS Carrier DNA/PEG method. Methods Mol Biol. 2006; 313:107-20, which is hereby incorporated by reference in its entirety for this purpose.


Sesquiterpene Production and Extraction

Transformant colonies were inoculated into 300 μl of SC-ura medium (Synthetic Complete with 2% dextrose, no uracil added) in 96 deep well plates. The plates were covered with Excel Scientific AeroSeal membranes (BS-25) and incubated for 48 hours at 30° C. in a shaking incubator. 30 μl of the cultures (1:15 dilution) were mixed into 420 μl of SC-ura induction medium containing 1.8% galactose and 0.2% raffinose as the carbon sources, yielding a starting optical density at 600 nm (OD600) of approximately 0.1-0.2. A 0.88% dodecane overlay (4 μl) was added to each well and the plates were covered with AeroSeal membranes and incubated at 30° C. in a shaking incubator for four days. 15 μl of each culture was removed to measure OD600 at the end of the four days. 350 μl of ethyl acetate (250 μM tridecane internal standard) was added directly to each well and mixed (1:1 extraction). The 96-well plates were then centrifuged and the ethyl acetate extractions were stored at −80° C. in glass vials until analysis by GC-MS.
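The dilution and overlay figures quoted above can be reproduced with simple arithmetic. The sketch below assumes, as one reading of the text, that the overlay percentage is volume per total well volume including the overlay itself; the variable names are illustrative.

```python
# Volumes taken from the protocol text, in microliters.
inoculum_ul = 30.0
induction_medium_ul = 420.0
dodecane_ul = 4.0

# 30 ul of culture in a final 450 ul gives the stated 1:15 dilution.
culture_ul = inoculum_ul + induction_medium_ul
dilution_factor = culture_ul / inoculum_ul

# Assumption: overlay percent = overlay volume / (culture + overlay) volume.
overlay_pct = 100.0 * dodecane_ul / (culture_ul + dodecane_ul)

print(f"dilution 1:{dilution_factor:.0f}, overlay {overlay_pct:.2f}% v/v")
```

Under that assumption the overlay works out to 4/454 of the well volume, which rounds to the 0.88% stated in the protocol.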


Sesquiterpene Structure Identification

Ethyl acetate samples (1.0 μL) were injected into the Agilent/Gerstel 7890B GC System, where the GC inlet was set to 250° C. with a split ratio of 2:1. The capillary column was an Agilent DB-5MS (20 m × 0.18 mm × 0.18 μm) with carrier gas (helium) flow set to 1.5 ml/min. The GC oven temperature was set to 100° C. (hold for 0.10 min) with a ramp of 40° C./min to 155° C., followed by a ramp of 15° C./min to 190° C. and a final ramp of 75° C./min to 280° C. (5-minute method). For a more comprehensive analysis of targets, the GC oven temperature was set to 100° C. (hold for 2.0 min) with a 10° C./min ramp to 250° C. (hold for 2.0 min) (20-minute method). The MS source and quadrupole for both methods were set to 230° C. and 180° C., respectively, on the Agilent 5977B MSD (Etune). The mass scan range was set to 40-250 m/z, and spectra and linear retention index calculations were matched against the NIST MS database (2008 version), in addition to available standards and essential oils.
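As a check on the two oven programs, the programmed run time can be computed from the holds and ramps given above. The helper `run_time` and its tuple layout are assumptions for this sketch; instrument equilibration and cool-down time are not included.

```python
def run_time(initial_temp, initial_hold, ramps):
    """Total programmed oven time in minutes.

    ramps: list of (rate in deg C/min, target temp in deg C, hold in min).
    """
    total = initial_hold
    temp = initial_temp
    for rate, target, hold in ramps:
        total += (target - temp) / rate + hold  # ramp time plus any hold
        temp = target
    return total


# 5-minute method: 100 C (0.10 min), 40 C/min to 155 C, 15 C/min to 190 C,
# 75 C/min to 280 C; no final hold is stated in the text.
fast_method = run_time(100, 0.10, [(40, 155, 0), (15, 190, 0), (75, 280, 0)])

# 20-minute method: 100 C (2.0 min), 10 C/min to 250 C (2.0 min hold).
slow_method = run_time(100, 2.0, [(10, 250, 2.0)])

print(f"fast: {fast_method:.2f} min, slow: {slow_method:.2f} min")
```

With the parameters as written, the fast program comes to just over 5 minutes and the slower program to 19 minutes of programmed oven time, in line with the nominal method names.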


Peaks present in the extracted ion chromatogram (204.2 m/z parent mass) were identified in one of six ways (see Table 3). The authentic standards utilized in this screen for verification of products were beta-caryophyllene (Sigma-Aldrich catalog #W225207-SAMPLE-K), beta-farnesene (Sigma-Aldrich catalog #73492-1ML-F), trans-nerolidol (Sigma-Aldrich catalog #18143-100MG-F), and alpha-humulene (Sigma-Aldrich catalog #53675-1ML). Sesquiterpene-rich essential oils used to aid structure identification were derived from the following plants: Rhododendron, Sweet Basil, Black Pepper, Citronella, Ylang, Balsam copaiba, and Patchouli.










ΔHO(YDL227C)::2xP(gal)-tHMG1 integration on chromosome 4. (SEQ ID NO: 53)




AGGGTTCGCAAGTCCTGTTTCTATGCCTTTCTCTTAGTAATTCACGAAATAAACCT








ATGGTTTACGAAATGATCCACGAAAATCATGTTATTATTTACATCAACATATCGCG







AAAATTCATGTCATGTCCACATTAACATCATTGCAGAGCAACAATTCATTTTCATAG







AGAAATTTGCTACTATCACCCACTAGTACTACCATTGGTACCTACTACTTTGAATTG







TACTACCGCTGGGCGTTATTAGGTGTGAAACCACGAAAAGTTCACCATAACTTCGA







ATAAAGTCGCGGAAAAAAGTAAACAGCTATTGCTACTCAAATGAGGTTTGCAGAAG







CTTGTTGAAGCATGATGAAGCGTTCTAAACGCACTATTCATCATTAAATATTTAAA







GCTCATAAAATTGTATTCAATTCCTATTCTAAATGGCTTTTATTTCTATTACAACTA







TTAGCTCGATGCACGAGCGCAACGCTCACAACGCTCGTCCAACGCCGGCGGACCTACG








GATTAGAGCCGCCGAGCGGGTGACAGCCCTCCGAAGGAAGACTCTCCTCCGTGCGTCCTCG









TCTTCACCGGTCGCGTTCCTGAAACGCAGATGTGCCTCGCGCCGCACTGCTCCGAACAATAA









AGATTCTACAATACTAGCTTTTATGGTTATGAAGAGGAAAAATTGGCAGTAACCTGGCCCCACA









AACCTTCAAATGAACGAATCAAATTAACAACCATAGGATGATAATGCGATTAGTTTTTTAGCCTT









ATTTCTGGGGTAATTAATCAGCGAAGCGATGATTTTTGATCTATTAACAGATATATAAATGCAAA








AACTGCATAACCACTTTAACTAATACTTTCAACATTTTCGGTTTGTATTACTTCTTATTCAAATGT








AATAAAAGTATCAACAAAAAATTGTTAATATACCTCTATACTTTAACGTCAAGGAGAAAAAACT
A






TAATGGCTGCAGACCAATTGGTGAAGACTGAAGTCACCAAGAAGTCTTTTACTGCT







CCTGTACAAAAGGCTTCTACACCAGTTTTAACCAATAAAACAGTCATTTCTGGATC









GAAAGTCAAAAGTTTATCATCTGCGCAATCGAGCTCATCAGGACCTTCATCATCTA









GTGAGGAAGATGATTCCCGCGATATTGAAAGCTTGGATAAGAAAATACGTCCTTTA









GAAGAATTAGAAGCATTATTAAGTAGTGGAAATACAAAACAATTGAAGAACAAAGA









GGTCGCTGCCTTGGTTATTCACGGTAAGTTACCTTTGTACGCTTTGGAGAAAAAAT









TAGGTGATACTACGAGAGCGGTTGCGGTACGTAGGAAGGCTCTTTCAATTTTGGC









AGAAGCTCCTGTATTAGCATCTGATCGTTTACCATATAAAAATTATGACTACGACC









GCGTATTTGGCGCTTGTTGTGAAAATGTTATAGGTTACATGCCTTTGCCCGTTGGT









GTTATAGGCCCCTTGGTTATCGATGGTACATCTTATCATATACCAATGGCAACTAC









AGAGGGTTGTTTGGTAGCTTCTGCCATGCGTGGCTGTAAGGCAATCAATGCTGGC









GGTGGTGCAACAACTGTTTTAACTAAGGATGGTATGACAAGAGGCCCAGTAGTCC









GTTTCCCAACTTTGAAAAGATCTGGTGCCTGTAAGATATGGTTAGACTCAGAAGAG









GGACAAAACGCAATTAAAAAAGCTTTTAACTCTACATCAAGATTTGCACGTCTGCA









ACATATTCAAACTTGTCTAGCAGGAGATTTACTCTTCATGAGATTTAGAACAACTA









CTGGTGACGCAATGGGTATGAATATGATTTCTAAGGGTGTCGAATACTCATTAAAG









CAAATGGTAGAAGAGTATGGCTGGGAAGATATGGAGGTTGTCTCCGTTTCTGGTA









ACTACTGTACCGACAAAAAACCAGCTGCCATCAACTGGATCGAAGGTCGTGGTAA









GAGTGTCGTCGCAGAAGCTACTATTCCTGGTGATGTTGTCAGAAAAGTGTTAAAAA









GTGATGTTTCCGCATTGGTTGAGTTGAACATTGCTAAGAATTTGGTTGGATCTGCA









ATGGCTGGGTCTGTTGGTGGATTTAACGCACATGCAGCTAATTTAGTGACAGCTGT









TTTCTTGGCATTAGGACAAGATCCTGCACAAAATGTCGAAAGTTCCAACTGTATAA









CATTGATGAAAGAAGTGGACGGTGATTTGAGAATTTCCGTATCCATGCCATCCATC









GAAGTAGGTACCATCGGTGGTGGTACTGTTCTAGAACCACAAGGTGCCATGTTGG









ACTTATTAGGTGTAAGAGGCCCACATGCTACCGCTCCTGGTACCAACGCACGTCAA









TTAGCAAGAATAGTTGCCTGTGCCGTCTTGGCAGGTGAATTATCCTTATGTGCTGC









CCTAGCAGCCGGCCATTTGGTTCAAAGTTATATGACCCACAACAGGAAACCTGCTG









AACCAACAAAACCTAACAATTTGGACGCCACTGATATAAATCGTTTGAAAGATGGG









TCCGTCACCTGCATTAAATCCTAA
GCTAGCTA
[sequence segment not reproduced]
CGGCCGTACG






AAAATCGTTATTGTCTTGAAGGTGAAATTTCTACTCTTATTAATGGTGAACGTTAAGCTG





ATGCTATGATGGAAGCTGATTGGTCTTAACTTGCTTGTCATCTTGCTAATGGTCATATGG





CTCGTGTTATTACTTAAGTTATTTGTACTCGTTTTGAACGTAATGCTAATGATCATCTTAT





GGAATAATAGTGAACGGCCG
[sequence segment not reproduced]
TAGCTAGCttaggatttaatgcaggtgacggacccatctttcaaa








cgatttatatcagtggcgtccaaattgttaggttttgttggttcagcaggtttcctgttgtgggtcatataactttgaac









caaatggccggctgctagggcagcacataaggataattcacctgccaagacggcacaggcaactattcttgctaattgac









gtgcgttggtaccaggagcggtagcatgtgggcctcttacacctaataagtccaacatggcaccttgtggttctagaaca









gtaccaccaccgatggtacctacttcgatggatggcatggatacggaaattctcaaatcaccgtccacttctttcatcaa









tgttatacagttggaactttcgacattttgtgcaggatcttgtcctaatgccaagaaaacagctgtcactaaattagctg









catgtgcgttaaatccaccaacagacccagccattgcagatccaaccaaattcttagcaatgttcaactcaaccaatttg









gaaacatcactttttaacacttttctgacaacatcaccaggaatagtagcttctgcgacgacactcttaccacgaccttc









gatccagttgatggcagctggttttttgtcggtacagtagttaccagaaacggagacaacctccatatcttcccagccat









actcttctaccatttgctttaatgagtattcgacacccttagaaatcatattcatacccattgcgtcaccagtagttgtt









ctaaatctcatgaagagtaaatctcctgctagacaagtttgaatatgttgcagacgtgcaaatcttgatgtagagttaaa









agcttttttaattgcgttttgtccctcttctgagtctaaccatatcttacaggcaccagatcttttcaaagttgggaaac









ggactactgggcctcttgtcataccatccttagttaaaacagttgttgcaccaccgccagcattgattgccttacagcca









cgcatggcagaagctaccaaacaaccctctgtagttgccattggtatatgataagatgtaccatcgataaccaaggggcc









tataacaccaacgggcaaaggcatgtaacctataacattttcacaacaagcgccaaatacgcggtcgtagtcataatttt









tatatggtaaacgatcagatgctaatacaggagcttctgccaaaattgaaagagccttcctacgtaccgcaaccgctctc









gtagtatcacctaattttttctccaaagcgtacaaaggtaacttaccgtgaataaccaaggcagcgacctctttgttctt









caattgttttgtatttccactacttaataatgcttctaattcttctaaaggacgtattttcttatccaagctttcaatat









cgcgggaatcatcttcctcactagatgatgaaggtcctgatgagctcgattgcgcagatgataaacttttgactttcgat









ccagaaatgactgttttattggttaaaactggtgtagaagccttttgtacaggagcagtaaaagacttcttggtgacttc









agtcttcaccaattggtctgcagccat
TATagttttttctccttgacgttaaagtatagaggtatattaacaattttttg








ttgatacttttattacatttgaataagaagtaatacaaaccgaaaatgttgaaagtattagttaaagtggttatgcagtt









tttgcatttatatatctgttaatagatcaaaaatcatcgcttcgctgattaattaccccagaaataaggctaaaaaacta









atcgcattatcatcctatggttgttaatttgattcgttcatttgaaggtttgtggggccaggttactgccaatttttcct









cttcataaccataaaagctagtattgtagaatctttattgttcggagcagtgcggcgcgaggcacatctgcgtttcagga









acgcgaccggtgaagacgaggacgcacggaggagagtcttccttcggagggctgtcacccgctcggcggcttctaatccg









t
AGGTCCGCCGGCGTTGGACGAGCGTTGTGAGCGTTGCGCTCGTGCATCaatgtgtatattagtttaaaaagttgtatgt







aataaaagtaaaatttaatattttggatgaaaaaaaccatttttagactttttcttaactagaatgctggagtagaaata







cgccatctcaagatacaaaaagcgttaccggcactgatttgtttcaaccagtatatagattattattgggtcttgatcaa






ctttcctcagacatatcagtaacagttatcaagctaaatatttacgcgaaagaaaaacaaatattttaattgtgatactt






gtgaattttattttattaaggatacaaagttaagagaaaacaaaatttatatacaatataagtaatattcatatatatgt







gatgaatgcagtcttaacgagaagacatggccttggtgacaactctcttcaaaccaacttcagcctttctcaattcatca







gcagatgggtcttcgatttgcaaagcagcca






Upper case, bold: HO upstream homology sequence (SEQ ID NO: 56)

Upper case, italicized and underlined: P(gal1) (SEQ ID NO: 57)

Upper case, underlined and bold: tHMG1 (SEQ ID NO: 58)

Upper case, bold and italicized: CYC1 terminator (SEQ ID NO: 59)

Lower case, bold and italicized: CYC1 terminator, reverse complement (SEQ ID NO: 60)

Lower case, underlined and bold: tHMG1, reverse complement (SEQ ID NO: 61)

Lower case, italicized and underlined: P(gal1), reverse complement (SEQ ID NO: 62)





Lower case, bold: HO downstream homology sequence (SEQ ID NO: 63)
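Because the second cassette is listed in the opposite orientation, the lower-case segments of SEQ ID NO: 53 can be checked against the upper-case ones by reverse complementation. The sketch below uses a hypothetical helper, `reverse_complement`, and two short excerpts copied from the listing above: the first 27 bases of the upper-case tHMG1 coding sequence (starting at its ATG) and a 27-base lower-case stretch from later in the same listing.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, preserving case."""
    return seq.translate(COMPLEMENT)[::-1]


# First 27 bases of the forward tHMG1 coding sequence in the listing.
forward_start = "ATGGCTGCAGACCAATTGGTGAAGACT"

# 27-base lower-case stretch from the reverse-orientation copy.
reverse_end = "agtcttcaccaattggtctgcagccat"

matches = reverse_complement(forward_start).lower() == reverse_end
print("reverse-complement copy matches:", matches)
```

The same check can be applied to longer excerpts to confirm that the two tHMG1 copies and their promoters are arranged convergently, as described for the integration construct.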





P(met3) integration upstream of Erg9 with flanking genes included. (SEQ ID NO: 54)





ATGTCCGGTAAATGGAGACTAGTGCTGACTGGGATAGGCAATCCAGAGCCTCAGT










ACGCTGGCACCCGTCACAATGTAGGGCTATATATGCTGGAGCTGCTACGAAAGCG









GCTTGGTCTGCAGGGGAGAACCTATTCCCCTGTGCCTAATACGGGCGGCAAAGTG









CATTATATAGAAGACGAACATTGTACGATACTAAGATCGGATGGCCAGTACATGAA









TCTAAGTGGAGAACAGGTGTGCAAGGTCTGGGCCCGGTACGCCAAGTACCAAGCC









CGACACGTTGTTATTCATGACGAGTTAAGTGTGGCGTGTGGAAAAGTGCAGCTCA









GAGCCCCCAGCACCAGTATTAGAGGTCATAATGGGCTGCGAAGTCTACTGAAATG









CTCCGGAGGCCGTGTACCCTTTGCCAAATTGGCTATTGGAATCGGCAGAGAACCT









GGGTCCCGCTCTAGAGACCCTGCGAGCGTCTCCCGCTGGGTTCTGGGAGCTCTAA









CTCCGCAGGAACTACAAACCTTGCTTACACAGAGTGAACCTGCTGCCTGGCGTGCT









CTGACTCAGTACATTTCATAG
GTTTAACTTGATACTACTAGATTTTTTCTCTTCATTTAT






AAAATTTTTGGTTATAATTGAAGCTTTAGAAGTATGAAAAAATCCTTTTTTTTCATTCTTT





GCAACCAAAATAAGAAGCTTCTTTTATTCATTGAAATGATGAATATAAACCTAACAAAA





GAAAAAGACTCGAATATCAAACATTAAAAAAAAATAAAAGAGGTTATCTGTTTTCCCAT





TTAGTTGGAGTTTGCATTTTCTAATAGATAGAACTCTCAATTAATGTGGATTTAGTTTCT





CTGTTCGTTTTTTTTTGTTTTGTTCTCACTGTATTTACATTTCTATTTAGTATTTAGTTATT





CATATAATCTTAACTTCTCGAGGAGCTCGATCTTGAAACTGAGTAAGATGCTCAGAATA






CCCGTCAAGATAAGAGTATAATGTAGAGTAATATACCAAGTATTCAGCATATTCTCCTC







TTCTTTTGTATAAATCACGGAAGGGATGATTTATAAGAAAAATGAATACTATTACACTT







CATTTACCACCCTCTGATCTAGATTTTCCAACGATATGTACGTAGTGGTATAAGGTGAGG







GGGTCCACAGATATAACATCGTTTAATTTAGTACTAACAGAGACTTTTGTCACAACTAC







ATATAAGTGTACAAATATAGTACAGATATGACACACTTGTAGCGCCAACGCGCATCCTA







CGGATTGCTGACAGAAAAAAAGGTCACGTGACCAGAAAAGTCACGTGTAATTTTGTAA







CTCACCGCATTCTAGCGGTCCCTGTCGTGCACACTGCACTCAACACCATAAACCTTAGC







AACCTCCAAAGGAAATCACCGTATAACAAAGCCACAGTTTTACAACTTAGTCTCTTATG







AAGTGTCTCTCTCTGTCGTAACAGTTGTGATATCGGAAGAAGAGAAAAGACGAAGAGC






AGAAGCGGAAAACGTATACACGTCACATATCACACACACACAatgggaaagctattacaattggcat







tgcatccggtcgagatgaaggcagctttgaagctgaagttttgcagaacaccgctattctccatctatgatcagtccacg









tctccatatctcttgcactgtttcgaactgttgaacttgacctccagatcgtttgctgctgtgatcagagagctgcatcc









agaattgagaaactgtgttactctcttttatttgattttaagggctttggataccatcgaagacgatatgtccatcgaac









acgatttgaaaattgacttgttgcgtcacttccacgagaaattgttgttaactaaatggagtttcgacggaaatgccccc









gatgtgaaggacagagccgttttgacagatttcgaatcgattcttattgaattccacaaattgaaaccagaatatcaaga









agtcatcaaggagatcaccgagaaaatgggtaatggtatggccgactacatcttagatgaaaattacaacttgaatgggt









tgcaaaccgtccacgactacgacgtgtactgtcactacgtagctggtttggtcggtgatggtttgacccgtttgattgtc









attgccaagtttgccaacgaatctttgtattctaatgagcaattgtatgaaagcatgggtcttttcctacaaaaaaccaa









catcatcagagattacaatgaagatttggtcgatggtagatccttctggcccaaggaaatctggtcacaatacgctcctc









agttgaaggacttcatgaaacctgaaaacgaacaactggggttggactgtataaaccacctcgtcttaaacgcattgagt









catgttatcgatgtgttgacttatttggccggtatccacgagcaatccactttccaattttgtgccattccccaagttat









ggccattgcaaccttggctttggtattcaacaaccgtgaagtgctacatggcaatgtaaagattcgtaagggtactacct









gctatttaattttgaaatcaaggactttgcgtggctgtgtcgagatttttgactattacttacgtgatatcaaatctaaa









ttggctgtgcaagatccaaatttcttaaaattgaacattcaaatctccaagatcgaacagtttatggaagaaatgtacca









ggataaattacctcctaacgtgaagccaaatgaaactccaattttcttgaaagttaaagaaagatccagatacgatgatg









aattggttccaacccaacaagaagaagagtacaagttcaatatggttttatctatcatcttgtccgttcttcttgggttt









tattatatatacactttacacagagcgtga







Uppercase, bold and underlined: Upstream sequence PTH1 (YHR189W) (SEQ ID NO: 64)





Uppercase and underlined: P(met3) (SEQ ID NO: 65)





Lowercase, bold and underlined: Erg9 (YHR190W) (SEQ ID NO: 66)






EQUIVALENTS

Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the invention described herein. Such equivalents are intended to be encompassed by the following claims.


All references, including patent documents, disclosed herein are incorporated by reference in their entirety, particularly for the disclosure referenced herein.

Claims
  • 1-41. (canceled)
  • 42. A method for producing one or more terpenes, comprising: culturing a host cell that comprises a nucleic acid molecule encoding a chimeric terpene synthase, wherein the host cell produces one or more of the following terpenes: alpha-guaiene, delta-cadinene, cis-eudesm-6-en-11-ol, beta-caryophyllene, humulene, and/or alpha-cadinol.
  • 43. The method of claim 42, wherein the chimeric terpene synthase comprises sequences from at least two terpene synthases and wherein at least one of the terpene synthases is a plant terpene synthase.
  • 44. The method of claim 43, wherein the plant is selected from the group consisting of: Hibiscadelphus wilderianus, Leucadendron grandiflorum, Macrostylis villosa, Orbexilum stipulatum, Shorea cuspidata, and Wendlandia angustifolia.
  • 45. The method of claim 42, wherein the chimeric terpene synthase is an alpha-guaiene synthase that comprises an amino acid sequence that is at least 95% identical to SEQ ID NO: 17, 22, or 29.
  • 46. The method of claim 45, wherein the alpha-guaiene synthase includes one or more sequences derived from a plant terpene synthase selected from the group consisting of SEQ ID NOs: 178, 183, and 211-214.
  • 47. The method of claim 42, wherein the chimeric terpene synthase produces delta-cadinene and wherein the chimeric terpene synthase comprises an amino acid sequence that is at least 90% identical to SEQ ID NOs: 40, 41, 42, 44, 45, or 47.
  • 48. The method of claim 47, wherein the chimeric terpene synthase includes one or more sequences derived from a plant terpene synthase selected from the group consisting of SEQ ID NOs: 261, 262, 263, 264, 265, 267, 268, 271, 275, 276, 279, 296, 301, 306, 307, 308, 324, and 325.
  • 49. The method of claim 42, wherein the chimeric terpene synthase produces cis-eudesm-6-en-11-ol and wherein the chimeric terpene synthase comprises an amino acid sequence that is at least 90% identical to SEQ ID NOs: 36, 37, 43, 46, 48, 49, 50, or 51.
  • 50. The method of claim 49, wherein the chimeric terpene synthase includes one or more sequences derived from a plant terpene synthase selected from the group consisting of SEQ ID NOs: 245, 246, 247, 248, 249, 255, 256, 257, 258, 290, 291, 292, 329, 337, 338, 343, and 349.
  • 51. The method of claim 42, wherein the chimeric terpene synthase produces beta-caryophyllene and/or humulene and wherein the chimeric terpene synthase comprises an amino acid sequence that is at least 90% identical to SEQ ID NO: 23, 24, 25, or 26.
  • 52. The method of claim 51, wherein the chimeric terpene synthase includes one or more sequences derived from a plant terpene synthase selected from the group consisting of SEQ ID NOs: 184, 185, 186, 187, 188, 190, 191, 194, 195, 196, and 197.
  • 53. The method of claim 42, wherein the chimeric terpene synthase produces alpha-cadinol and wherein the chimeric terpene synthase comprises an amino acid sequence that is at least 90% identical to any one of SEQ ID NOs: 30-35.
  • 54. The method of claim 53, wherein the chimeric terpene synthase includes one or more sequences derived from a plant terpene synthase selected from the group consisting of SEQ ID NOs: 215, 216, 217, 218, 219, 221, 222, 223, 228, 229, 234, and 237.
  • 55. The method of claim 42, wherein the chimeric terpene synthase produces delta-cadinene and wherein the chimeric terpene synthase comprises an amino acid sequence that is at least 98% identical to SEQ ID NOs: 1, 3, 4, 5, 6, 7, or 12.
  • 56. The method of claim 55, wherein the chimeric terpene synthase includes one or more sequences derived from a plant terpene synthase selected from the group consisting of SEQ ID NOs: 119, 120, 121, 122, 123, 124, 125, 139, 140, 141, and 172.
  • 57. The method of claim 42, wherein the chimeric terpene synthase produces delta-cadinene and wherein the chimeric terpene synthase comprises an amino acid sequence that is at least 97% identical to SEQ ID NOs: 11, 18, or 19.
  • 58. The method of claim 57, wherein the chimeric terpene synthase includes one or more sequences derived from a plant terpene synthase selected from the group consisting of SEQ ID NOs: 171, 179, and 180.
  • 59. The method of claim 42, wherein the host cell is a fungal cell, plant cell, or a bacterial cell.
  • 60. The method of claim 42 further comprising extracting the one or more terpenes.
  • 61. The method of claim 42, wherein at least one of the one or more terpenes is an aroma compound.
  • 62. A host cell comprising a nucleic acid molecule encoding a chimeric terpene synthase, wherein at least 10% of the amino acid sequence of the chimeric terpene synthase is derived from an extinct plant.
  • 63. A method for producing an aroma compound comprising culturing a host cell that comprises a nucleic acid molecule encoding a chimeric terpene synthase, wherein the chimeric terpene synthase comprises sequences from at least two terpene synthases and wherein at least one of the terpene synthases is from an extinct plant.
RELATED APPLICATIONS

This application is a national stage filing under 35 U.S.C. § 371 of international application PCT/US2019/018122, entitled “CHIMERIC TERPENE SYNTHASES,” filed Feb. 14, 2019, which was published under PCT Article 21(2) in English and which claims the benefit under 35 U.S.C. § 119(e) of U.S. Provisional Application Ser. No. 62/630,640, entitled “CHIMERIC TERPENE SYNTHASES,” filed on Feb. 14, 2018, the entire disclosures of each of which are herein incorporated by reference in their entireties.

PCT Information
Filing Document Filing Date Country Kind
PCT/US2019/018122 2/14/2019 WO 00
Provisional Applications (1)
Number Date Country
62630640 Feb 2018 US