Methods for making high intensity sweeteners

Information

  • Patent Grant
  • 11060124
  • Patent Number
    11,060,124
  • Date Filed
    Wednesday, May 2, 2018
  • Date Issued
    Tuesday, July 13, 2021
Abstract
Provided herein include methods of making mogroside compounds, e.g., Compound 1, compositions (for example host cells) for making the mogroside compounds, and the mogroside compounds made by the methods disclosed herein, and compositions (for example, cell lysates) and recombinant cells comprising the mogroside compounds (e.g., Compound 1). Also provided herein are novel cucurbitadienol synthases and the use thereof.
Description
REFERENCE TO SEQUENCE LISTING

The present application is being filed along with a Sequence Listing in electronic format. The Sequence Listing is provided as a file entitled 2018-11-19 Substitute Seq Listing_SNMX.044A.TXT, created Nov. 19, 2018, which is 3.64 MB in size. The information in the electronic format of the Sequence Listing is incorporated herein by reference in its entirety.


BACKGROUND
Field

The present disclosure relates to methods, systems and compositions for producing sweet tasting compounds, as well as compositions comprising the sweet tasting compounds.


Background Description

The taste system provides sensory information about the chemical composition of the external world. Taste transduction is one of the most sophisticated forms of chemical-triggered sensation in animals. Signaling of taste is found throughout the animal kingdom, from simple metazoans to the most complex of vertebrates. Mammals are believed to have five basic taste modalities: sweet, bitter, sour, salty, and umami (the taste of monosodium glutamate, a.k.a. savory taste).


For centuries, various natural and unnatural compositions and/or compounds have been added to ingestible compositions, including foods and beverages, and/or orally administered medicinal compositions to improve their taste. Although it has long been known that there are only a few basic types of “tastes,” the biological and biochemical basis of taste perception was poorly understood, and most taste improving or taste modifying agents have been discovered largely by simple trial and error processes.


With respect to sweet taste, diabetes and cardiovascular disease are health concerns on the rise globally, and are growing at alarming rates in the United States. Sugar and calories are key components that can be limited to render a positive nutritional effect on health. High-intensity sweeteners can provide the sweetness of sugar, with various taste qualities. Because they are many times sweeter than sugar, much less of the sweetener is required to replace the sugar.


High-intensity sweeteners have a wide range of chemically distinct structures and hence possess varying properties, such as, without limitation, odor, flavor, mouthfeel, and aftertaste. These properties, particularly flavor and aftertaste, are well known to vary over the time of tasting, such that each temporal profile is sweetener-specific.


There has been significant recent progress in identifying useful natural flavoring agents, for example sweeteners such as sucrose, fructose, glucose, erythritol, isomalt, lactitol, mannitol, sorbitol, xylitol, certain known natural terpenoids, flavonoids, or protein sweeteners. See, e.g., Kinghorn, et al., “Noncariogenic Intense Natural Sweeteners,” Med. Res. Rev. 18 (5) 347-360 (1998) (discussing discovered natural materials that are much more intensely sweet than common natural sweeteners such as sucrose, fructose, and the like). Similarly, there has been recent progress in identifying and commercializing new artificial sweeteners, such as aspartame, saccharin, acesulfame-K, cyclamate, sucralose, and the like. See, e.g., Ager, et al., Angew. Chem. Int. Ed. 37, 1802-1817 (1998). The entire contents of the references identified above are hereby incorporated herein by reference in their entirety.


Sweeteners such as saccharin and 6-methyl-1,2,3-oxathiazin-4(3H)-one-2,2-dioxide potassium salt (acesulfame potassium) are commonly characterized as having bitter and/or metallic aftertastes. Products prepared with 2,4-dihydroxybenzoic acid are claimed to display reduced undesirable aftertastes associated with sweeteners, and do so at concentrations below those concentrations at which their own tastes are perceptible. Also, high intensity sweeteners such as sucralose and aspartame are reported to have sweetness delivery problems, i.e., delayed onset and lingering of sweetness. See S. G. Wiet, et al., J. Food Sci., 58(3):599-602, 666 (1993).


There is a need for new sweetening compounds, sweet taste enhancers, and compositions containing such compounds and enhancers, having improved taste and delivery characteristics. In addition, there is a need for foods containing new sweetening compounds and/or sweet taste enhancers with such desirable characteristics.


SUMMARY

Provided herein include a method of producing Compound 1 having the structure of:




[Embedded image: chemical structure of Compound 1]


In some embodiments, the method comprises contacting mogroside IIIE with a first enzyme capable of catalyzing production of Compound 1 from mogroside IIIE. In some embodiments, contacting mogroside IIIE with the first enzyme comprises contacting mogroside IIIE with a recombinant host cell that comprises a first gene encoding the first enzyme. The first gene can be, for example, heterologous to the recombinant host cell.


In some embodiments, the mogroside IIIE contacts the first enzyme in a recombinant host cell that comprises a first polynucleotide encoding the first enzyme. The mogroside IIIE can be, for example, provided to the recombinant host cell, present in the recombinant host cell, produced by the recombinant host cell, or any combination thereof. In some embodiments, the method comprises cultivating the recombinant host cell in a culture medium under conditions in which the first enzyme is expressed. The first enzyme can be, for example, one or more of UDP glycosyltransferases, cyclomaltodextrin glucanotransferases (CGTases), glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases.


In some embodiments, the first enzyme is a CGTase. In some embodiments, the CGTase comprises an amino acid sequence having, or having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to the sequence of any one of SEQ ID NOs: 1, 3, 78-101, 148, and 154. In some embodiments, the CGTase comprises the amino acid sequence of any one of SEQ ID NOs: 1, 3, 78-101, 148, and 154. In some embodiments, the CGTase consists of the amino acid sequence of any one of SEQ ID NOs: 1, 3, 78-101, 148, and 154.
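The sequence identity thresholds recited above (70%, 80%, and so on) can be illustrated with a minimal sketch. The text here does not specify an alignment algorithm or how gaps are counted, so the sketch assumes the two sequences have already been aligned to equal length (with `-` for gaps) and, as one possible convention, counts identity only over columns where neither sequence has a gap. The function names `percent_identity` and `meets_identity_threshold` and the example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: inputs are pre-aligned, equal-length strings using '-' for gaps.
    Other conventions (e.g. dividing by the full alignment length) give
    different numbers; this is only one illustrative choice.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 70.0) -> bool:
    """True if the identity is at least the given percentage threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Made-up peptide fragments, not sequences from the Sequence Listing.
    reference = "MKTAYIAKQR"
    candidate = "MKTAYLAKQK"
    print(percent_identity(reference, candidate))
    print(meets_identity_threshold(reference, candidate, 70.0))
```

In practice the alignment itself would come from a dedicated alignment tool; the sketch only shows how a threshold such as "at least 70%" might be checked once an alignment is available.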


In some embodiments, the first enzyme is a dextransucrase. For example, the dextransucrase can comprise an amino acid sequence having, or having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of the sequences set forth in SEQ ID NOs: 2, 103, 106-110, 156, 159-162, and 896. In some embodiments, the dextransucrase is encoded by a nucleic acid sequence having at least 70% sequence identity to any one of SEQ ID NOs: 104, 105, 157, 158, and 895.


In some embodiments, the first enzyme is a transglucosidase. For example, the transglucosidase can comprise an amino acid sequence having, or having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to the sequence of any one of SEQ ID NOs: 163-291 and 723. In some embodiments, the transglucosidase comprises an amino acid sequence of any one of SEQ ID NOs: 163-291 and 723.


In some embodiments, the first enzyme is a beta-glucosidase. For example, the beta-glucosidase can comprise an amino acid sequence having, or having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to the sequence set forth in any one of SEQ ID NOs: 102, 292, 354-374, and 678-741.
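The SEQ ID NO lists in the preceding paragraphs mix single identifiers with inclusive ranges (for example "1, 3, 78-101, 148, and 154"). As a reading aid, the following sketch expands such a list into individual identifiers. The helper name `expand_seq_ids` is hypothetical, and it assumes that every hyphenated pair denotes an inclusive range.

```python
import re
from typing import List


def expand_seq_ids(text: str) -> List[int]:
    """Expand a list such as '1, 3, 78-101, 148, and 154' into sorted integers.

    Assumption: 'a-b' always means the inclusive range a..b.
    """
    ids = set()
    # Each match is either a single number or a 'start-end' pair.
    for start, end in re.findall(r"(\d+)(?:\s*-\s*(\d+))?", text):
        lo = int(start)
        hi = int(end) if end else lo
        if hi < lo:
            raise ValueError(f"descending range: {start}-{end}")
        ids.update(range(lo, hi + 1))
    return sorted(ids)


if __name__ == "__main__":
    cgtase_ids = expand_seq_ids("1, 3, 78-101, 148, and 154")
    print(len(cgtase_ids), cgtase_ids[:3], cgtase_ids[-1])
```

The input should contain only the identifier list itself; any other digits in the string would be picked up as identifiers.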


In some embodiments, the method comprises contacting mogroside IIA with an enzyme capable of catalyzing production of mogroside IIIE from mogroside IIA. In some embodiments, contacting mogroside IIA with the enzyme comprises contacting the mogroside IIA with the recombinant host cell to produce mogroside IIIE, wherein the recombinant host cell comprises a gene encoding the enzyme capable of catalyzing production of mogroside IIIE from mogroside IIA. The mogroside IIA can be, for example, provided to the recombinant host cell, produced by the recombinant host cell, present in the recombinant host cell, or any combination thereof. The enzyme capable of catalyzing the production of mogroside IIIE from mogroside IIA can be, for example, one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases.


In some embodiments, the second enzyme is a uridine diphosphate-glucosyl transferase (UGT). For example, the UGT can be UGT73C3 (SEQ ID NO: 4), UGT73C6 (SEQ ID NO: 5), UGT85C2 (SEQ ID NO: 6), UGT73C5 (SEQ ID NO: 7), UGT73E1 (SEQ ID NO: 8), UGT98 (SEQ ID NO: 9 or 407), UGT1576 (SEQ ID NO: 15), UGT SK98 (SEQ ID NO: 16), UGT430 (SEQ ID NO: 17), UGT1697 (SEQ ID NO: 18), or UGT11789 (SEQ ID NO: 19). In some embodiments, the UGT comprises an amino acid sequence having, or having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 4-9, 15-19, 125, 126, 128, 129, 293-307, 407, 409, 411, 413, 439, 441 and 444. In some embodiments, the UGT is encoded by a nucleic acid sequence having, or having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of the sequences set forth in UGT1495 (SEQ ID NO: 10), UGT1817 (SEQ ID NO: 11), UGT5914 (SEQ ID NO: 12), UGT8468 (SEQ ID NO: 13), UGT10391 (SEQ ID NO: 14), SEQ ID NOs: 116-124, 127, 130, 408, 410, 412, 414, 440, 442, 443, and 445.


In some embodiments, the method comprises contacting mogrol with one or more enzymes capable of catalyzing production of mogroside IIIE and/or IE from mogrol. In some embodiments, contacting mogrol with the one or more enzymes comprises contacting mogrol with the recombinant host cell to produce mogroside IIIE and/or mogroside IIE, wherein the recombinant host cell comprises one or more genes encoding one or more enzymes capable of catalyzing production of mogroside IIIE and/or mogroside IE from mogrol. The mogrol can be, for example, provided to the recombinant host cell, produced by the recombinant host cell, present in the recombinant host cell, or any combination thereof.


In some embodiments, at least one of the one or more enzymes capable of catalyzing production of mogroside IE and/or mogroside IIIE from mogrol comprises a sequence having, or having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of the sequences set forth in or is encoded by a sequence having, or having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of the sequences in SEQ ID NOs: 315, 316, 420, 422, 424, 426, 430, 431, 446, 871, 845-949, and 951-1012. In some embodiments, the one or more enzymes capable of catalyzing production of mogroside IIIE from mogrol comprises one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. In some embodiments, at least one of the one or more enzymes is a uridine diphosphate-glucosyltransferase (UGT). For example, the UGT can be UGT73C3, UGT73C6, UGT85C2, UGT73C5, UGT73E1, UGT98, UGT1495, UGT1817, UGT5914, UGT8468, UGT10391, UGT1576, UGT SK98, UGT430, UGT1697, or UGT11789, or comprises an amino acid sequence having, or having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 4-9, 15-19, 125, 126, 128, 129, 293-307, 405, 406, 407, 409, 411, 413, 439, 441, and 444.


In some embodiments, the method comprises contacting a mogroside compound with one or more enzymes capable of catalyzing production of mogroside IIIE from a mogroside compound to produce mogroside IIIE, wherein the mogroside compound is one or more of mogroside IA1, mogroside IE1, mogroside IIA1, mogroside IIE, mogroside IIA, mogroside IIIA1, mogroside IIIA2, mogroside III, mogroside IV, mogroside IVA, mogroside V, and siamenoside. In some embodiments, contacting the mogroside compound with the one or more enzymes capable of catalyzing the production of mogroside IIIE from the mogroside compound comprises contacting the mogroside compound with the recombinant host cell, wherein the recombinant host cell comprises one or more genes encoding the one or more enzymes capable of catalyzing production of mogroside IIIE from the mogroside compound. The mogroside compound can be, for example, provided to the recombinant host cell, produced by the recombinant host cell, present in the recombinant host cell, or any combination thereof. In some embodiments, the one or more enzymes capable of catalyzing production of mogroside IIIE from the mogroside compound comprises one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. In some embodiments, the mogroside compound is mogroside IIE.


In some embodiments, the one or more enzymes capable of catalyzing production of mogroside IIIE from the mogroside compound comprises an amino acid sequence having, or having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 1, 3, 78-101, 106-109, 147, 154, 163-303, 405, 411, 354-405, 447-723, 770, 776, and 782. In some embodiments, the mogroside compound is mogroside IIA or mogroside IIE. In some embodiments, contacting with one or more enzymes produces one or more of mogroside IIIA, mogroside IVE and mogroside V.


In some embodiments, the one or more enzymes comprises an amino acid sequence having, or having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 304, 405, 411, 872, 874, 978, 880, 882, 884, 886, 888, 890, 892, 894 and 896. In some embodiments, the one or more enzymes is encoded by a sequence having, or having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 305, 406, 412, 873, 875, 877, 879, 881, 883, 885, 887, 889, 891, 893 and 895.


In some embodiments, the method comprises contacting mogroside IA1 with the recombinant host cell, wherein the recombinant host cell comprises a gene encoding UGT98 or UGT SK98 enzyme. In some embodiments, the UGT98 or UGT SK98 enzyme comprises an amino acid sequence having at least 70% sequence identity to SEQ ID NO: 9, 407, 16 or 306. In some embodiments, the UGT98 is encoded by a sequence set forth in SEQ ID NO: 307. In some embodiments, the contacting results in production of mogroside IIA in the cell.


In some embodiments, the method comprises contacting 11-hydroxy-24,25 epoxy cucurbitadienol with the recombinant host cell, wherein the recombinant host cell comprises a gene encoding an epoxide hydrolase. In some embodiments, the 11-hydroxy-24,25 epoxy cucurbitadienol is provided to the recombinant host cell, present in the recombinant host cell, produced by the recombinant host cell, or any combination thereof.


In some embodiments, the method comprises contacting 11-hydroxy cucurbitadienol with the recombinant host cell, wherein the recombinant host cell comprises a gene encoding a cytochrome P450 or an epoxide hydrolase. The 11-hydroxy cucurbitadienol can be provided to, produced by, and/or present in, the recombinant host cell.


In some embodiments, the method comprises contacting 3,24,25-trihydroxy cucurbitadienol with the recombinant host cell, wherein the recombinant host cell comprises a gene encoding a cytochrome P450. In some embodiments, the 3,24,25-trihydroxy cucurbitadienol is provided to the recombinant host cell, present in the recombinant host cell, produced by the recombinant host cell, or any combination thereof. In some embodiments, the contacting results in production of mogrol in the recombinant host cell. In some embodiments, the cytochrome P450 comprises an amino acid sequence having, or having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 20, 49, 308, 315, 430, 872, 874, 876, 878, 880, 882, 884, 886, 888, 889, and 892; or the cytochrome P450 is encoded by a nucleic acid sequence having, or having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 31-48, 316, 431, 871, 873, 875, 877, 879, 881, 883, 885, 887, 890, and 891.


In some embodiments, the epoxide hydrolase comprises an amino acid sequence having, or having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 21-30 and 309-314; or the epoxide hydrolase is encoded by a nucleic acid sequence having, or having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 114 and 115.


In some embodiments, the method comprises contacting cucurbitadienol with the recombinant host cell, wherein the recombinant host cell comprises a gene encoding cytochrome P450. In some embodiments, the contacting results in production of 11-hydroxy cucurbitadienol. In some embodiments, the cucurbitadienol is provided to, produced by, and/or present in the recombinant host cell. In some embodiments, the cytochrome P450 is encoded by a nucleic acid sequence having, or having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 31-48, 316, 431, 871, 873, 875, 877, 879, 881, 883, 885, 887, 889, and 892. In some embodiments, the cytochrome P450 comprises an amino acid sequence having, or having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NOs: 20, 31, 49, 308, 315, 430, 872, 874, 876, 878, 880, 882, 884, 886, 888, 890, and 891. In some embodiments, the method further comprises contacting one or more of 2,3-oxidosqualene, dioxidosqualene and diepoxysqualene with the recombinant host cell, wherein the recombinant host cell comprises a gene encoding a polypeptide having cucurbitadienol synthase activity.


In some embodiments, the method comprises contacting a mogroside intermediate with the recombinant host cell, wherein the recombinant host cell comprises a gene encoding a polypeptide having cucurbitadienol synthase activity. In some embodiments, the polypeptide having cucurbitadienol synthase activity is a fusion protein comprising one or more fusion domains fused to a cucurbitadienol synthase. In some embodiments, the fusion protein comprises a fusion domain fused to the N-terminus, the C-terminus, or both of the cucurbitadienol synthase. In some embodiments, the fusion domain is about 3 to about 1000 amino acids long. In some embodiments, the fusion domain is about 5 to about 50 amino acids long. In some embodiments, the fusion domain is a substantial portion or the entire sequence of a functional protein.


In some embodiments, the fusion polypeptide having cucurbitadienol synthase activity comprises an amino acid sequence having, or having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NOs: 70-73, 75-77, 319, 321, 323, 325, 327-333, 417, 420, 422, 424, 426, 446, 902, 904 or 906. In some embodiments, the cucurbitadienol synthase is encoded by a nucleic acid sequence having, or having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NOs: 74, 320, 322, 324, 326, 328, 418, 421, 423, 425, 427, 897, 899, 901, 903 and 905.


In some embodiments, the contacting results in production of cucurbitadienol. In some embodiments, the 2,3-oxidosqualene and diepoxysqualene are provided to, produced by, and/or present in the recombinant host cell. In some embodiments, one or more of the 2,3-oxidosqualene and diepoxysqualene is produced by an enzyme comprising a sequence having, or having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 898 or 900. In some embodiments, the production of one or more of 2,3-oxidosqualene, dioxidosqualene, and diepoxysqualene involves an enzyme encoded by a nucleic acid sequence having, or having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 897 or 899. In some embodiments, the cucurbitadienol synthase comprises an amino acid sequence having at least 70% sequence identity to any one of SEQ ID NOs: 70-73, 75-77, 319, 321, 323, 325, 327, 329-333, 420, 422, 424, 426, 446, 902, 904, and 906. In some embodiments, the polypeptide comprising cucurbitadienol synthase activity is encoded by a gene comprising a nucleic acid sequence having, or having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 74, 320, 322, 324, 326, 328, 418, 421, 423, 425, 427, 897, 899, 901, 903 and 905.


In some embodiments, the 11-hydroxy cucurbitadienol is provided to, produced by, and/or present in the recombinant host cell. In some embodiments, the 11-hydroxy cucurbitadienol is expressed in a cell (for example the recombinant host cell) comprising a gene encoding CYP87D18 and/or SgCPR protein. In some embodiments, the CYP87D18 or SgCPR protein comprises an amino acid sequence having, or having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 872 or 874. In some embodiments, the CYP87D18 or SgCPR protein is encoded by a nucleic acid sequence having, or having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 871 or 873.


In some embodiments, the method comprises contacting squalene with the recombinant host cell, wherein the recombinant host cell comprises a gene encoding a squalene epoxidase. In some embodiments, the contacting results in production of 2,3-oxidosqualene. In some embodiments, the squalene is provided to, produced by, and/or present in the recombinant host cell. In some embodiments, the squalene epoxidase comprises an amino acid sequence having, or having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 50-56, 60, 61, 334 or 335. In some embodiments, squalene epoxidase is encoded by a nucleic acid sequence having, or having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 335.


In some embodiments, the method comprises contacting farnesyl pyrophosphate with the recombinant host cell, wherein the recombinant host cell comprises a gene encoding a squalene synthase. In some embodiments, the contacting results in production of squalene. In some embodiments, the farnesyl pyrophosphate is provided to, produced by, and/or present in the recombinant host cell. In some embodiments, the squalene synthase comprises an amino acid sequence having, or having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 69 and 336. In some embodiments, the squalene synthase is encoded by a sequence comprising a nucleic acid sequence having, or having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 337.


In some embodiments, the method comprises contacting geranyl-PP with the recombinant host cell, wherein the recombinant host cell comprises a gene encoding farnesyl-PP synthase. In some embodiments, the contacting results in production of farnesyl-PP. In some embodiments, the geranyl-PP is provided to, produced by, and/or present in the recombinant host cell. In some embodiments, the farnesyl-PP synthase comprises an amino acid sequence having, or having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 338. In some embodiments, the farnesyl-PP synthase is encoded by a nucleic acid sequence having, or having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 339.
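The upstream conversions described in the paragraphs above can be summarized as a chain of substrate, enzyme class, and product. The sketch below encodes only the steps that the text states explicitly (geranyl-PP to farnesyl-PP, farnesyl-PP to squalene, squalene to 2,3-oxidosqualene, 2,3-oxidosqualene to cucurbitadienol, and cucurbitadienol to 11-hydroxy cucurbitadienol). The `Step` type, the `UPSTREAM_STEPS` table, and the `route` helper are hypothetical names used for illustration; the sketch treats "farnesyl pyrophosphate" and "farnesyl-PP" as the same compound and does not model the alternative substrates (dioxidosqualene, diepoxysqualene) or any later steps.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    substrate: str
    enzyme: str
    product: str


# Linear summary of the conversions stated in the text; not a complete pathway.
UPSTREAM_STEPS = [
    Step("geranyl-PP", "farnesyl-PP synthase", "farnesyl-PP"),
    Step("farnesyl-PP", "squalene synthase", "squalene"),
    Step("squalene", "squalene epoxidase", "2,3-oxidosqualene"),
    Step("2,3-oxidosqualene", "cucurbitadienol synthase", "cucurbitadienol"),
    Step("cucurbitadienol", "cytochrome P450", "11-hydroxy cucurbitadienol"),
]


def route(start: str, target: str,
          steps: List[Step] = UPSTREAM_STEPS) -> List[Step]:
    """Return the ordered steps leading from `start` to `target`.

    Assumes each substrate has exactly one listed step (a simple chain).
    """
    by_substrate = {s.substrate: s for s in steps}
    path: List[Step] = []
    current = start
    seen = set()
    while current != target:
        if current in seen or current not in by_substrate:
            raise ValueError(f"no listed route from {start!r} to {target!r}")
        seen.add(current)
        step = by_substrate[current]
        path.append(step)
        current = step.product
    return path


if __name__ == "__main__":
    for s in route("geranyl-PP", "cucurbitadienol"):
        print(f"{s.substrate} --[{s.enzyme}]--> {s.product}")
```

A table like this is only a bookkeeping aid for which enzyme-encoding gene a host cell would need for a given starting material; it says nothing about yields or conditions.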


In some embodiments, one or more of the genes encoding (1) the first enzyme capable of catalyzing the production of Compound 1 from mogroside IIIE, (2) the enzyme capable of catalyzing the production of mogroside IIIE from mogroside IIA, (3) the epoxide hydrolase, (4) the cytochrome P450, (5) the polypeptide having cucurbitadienol synthase activity, (6) squalene epoxidase, and (7) farnesyl-PP synthase is operably linked to a heterologous promoter. In some embodiments, the heterologous promoter is a CMV, EF1a, SV40, PGK1, human beta actin, CAG, GAL1, GAL10, TEF, GDS, ADH1, CaMV35S, Ubi, T7, T7lac, Sp6, araBAD, trp, lac, Ptac, pL promoter, or a combination thereof. In some embodiments, the promoter is an inducible, repressible, or constitutive promoter. In some embodiments, production of one or more of pyruvate, acetyl-CoA, citrate, and TCA cycle intermediates has been upregulated in the recombinant host cell. In some embodiments, cytosolic localization has been upregulated in the recombinant host cell. In some embodiments, one or more of the first, second, third, fourth, fifth, sixth, seventh, eighth, ninth, and tenth gene comprises at least one sequence encoding a 2A self-cleaving peptide.


In some embodiments, the recombinant host cell is a plant, bivalve, fish, fungus, bacteria, or mammalian cell. In some embodiments, the plant is selected from the group consisting of Siraitia, Momordica, Gynostemma, Cucurbita, Cucumis, Arabidopsis, Artemisia, Stevia, Panax, Withania, Euphorbia, Medicago, Chlorophytum, Eleutherococcus, Aralia, Morus, Medicago, Betula, Astragalus, Jatropha, Camellia, Hypholoma, Aspergillus, Solanum, Huperzia, Pseudostellaria, Corchorus, Hedera, Marchantia, and Morus. In some embodiments, the fungus is selected from the group consisting of Trichophyton, Sanghuangporus, Taiwanofungus, Moniliophthora, Marssonina, Diplodia, Lentinula, Xanthophyllomyces, Pochonia, Colletotrichum, Diaporthe, Histoplasma, Coccidioides, Histoplasma, Sanghuangporus, Aureobasidium, Pochonia, Penicillium, Sporothrix, Metarhizium, Aspergillus, Yarrowia, and Lipomyces. In some embodiments, the fungus is Aspergillus nidulans, Yarrowia lipolytica, or Rhodosporidium toruloides. In some embodiments, the recombinant host cell is a yeast cell. In some embodiments, the yeast is selected from the group consisting of Candida, Saccharomyces, Saccharomycotina, Taphrinomycotina, Schizosaccharomycetes, Komagataella, Basidiomycota, Agaricomycotina, Tremellomycetes, Pucciniomycotina, Aureobasidium, Coniochaeta, Rhodosporidium, and Microbotryomycetes. In some embodiments, the bacteria is selected from the group consisting of Frankia, Actinobacteria, Streptomyces, and Enterococcus. In some embodiments, the recombinant host cell is a Saccharomyces cerevisiae cell or a Yarrowia lipolytica cell. In some embodiments, one or more of the first, second, third, fourth, fifth, sixth, seventh, eighth, ninth, and tenth gene is a codon optimized gene for expression in a bacterial, mammalian, plant, fungal and/or insect cell.
In some embodiments, one or more of the first, second, third, fourth, fifth, sixth, seventh, eighth, ninth, and tenth gene comprises a functional mutation to increase activity of the encoded enzyme. In some embodiments, cultivating the recombinant host cell comprises monitoring the cultivating for pH, dissolved oxygen level, nitrogen level, or a combination thereof of the cultivating conditions.


In some embodiments, the method comprises isolating Compound 1. In some embodiments, isolating Compound 1 comprises lysing the recombinant host cell, and/or isolating Compound 1 from the culture medium. In some embodiments, the method comprises purifying Compound 1. In some embodiments, purifying Compound 1 comprises HPLC, solid phase extraction or a combination thereof. In some embodiments, the purifying comprises harvesting the recombinant host cells; saving the supernatant; and lysing the recombinant host cells. In some embodiments, the lysing comprises subjecting the cells to shear force or detergent washes thereby obtaining a lysate. The shear force can be, for example, from a sonication method, French pressurized cells, or beads. In some embodiments, the lysate is subjected to filtering and purification steps. In some embodiments, the lysate is filtered and purified by solid phase extraction.


In some embodiments, the method further comprises contacting a first mogroside with one or more hydrolases to produce mogroside IIIE before contacting the mogroside IIIE with the first enzyme. The hydrolase can be, for example, a β-glucan hydrolase. In some embodiments, the hydrolase is EXG1 or EXG2. In some embodiments, the hydrolase comprises an amino acid sequence having, or having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 292, 366-368, 372, 376-398, 447-520, 524-845, 1013, 1014, and 1023.


In some embodiments, the first mogroside is mogroside IV, mogroside V, mogroside VI, or a combination thereof. In some embodiments, the first mogroside is mogroside V, siamenoside I, mogroside IVE, mogroside VI, mogroside IVA, or a combination thereof.


In some embodiments, contacting the first mogroside with the one or more hydrolases comprises contacting the first mogroside with a host cell that comprises a gene encoding the hydrolase. In some embodiments, the first mogroside contacts the one or more hydrolases in a host cell that comprises a gene encoding the hydrolase and a gene encoding the enzyme capable of catalyzing production of Compound 1. The host cell can be the recombinant host cell that comprises the first gene encoding the first enzyme capable of catalyzing the production of Compound 1 from mogroside IIIE. In some embodiments, the host cell is not the recombinant host cell comprising the first gene encoding the first enzyme capable of catalyzing the production of Compound 1 from mogroside IIIE. In some embodiments, the first mogroside is provided to, produced by, and/or present in the host cell. The gene encoding the hydrolase can be heterologous or homologous to the host cell. In some embodiments, the gene encoding the hydrolase is expressed at a normal level in the host cell. In some embodiments, the gene encoding the hydrolase is overexpressed in the host cell.


In some embodiments, the recombinant host cell comprises an oxidosqualene cyclase such as a cycloartenol synthase or a beta-amyrin synthase or a nucleic acid sequence encoding an oxidosqualene cyclase such as a cycloartenol synthase or a beta-amyrin synthase, wherein the oxidosqualene cyclase, cycloartenol synthase, or beta-amyrin synthase is modified to produce cucurbitadienol or epoxycucurbitadienol. The oxidosqualene cyclase can, for example, comprise or consist of a sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 341, 343 and 346-347.


In some embodiments, the recombinant host cell comprises cytochrome P450 reductase or a gene encoding cytochrome P450 reductase. In some embodiments, the cytochrome P450 reductase comprises, or consists of, a sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 318.


Disclosed herein include a compound having the structure of Compound 1,




embedded image



wherein the compound is produced by any of the methods disclosed herein.


Disclosed herein include a cell lysate comprising Compound 1 having the structure:




embedded image


Also disclosed herein include a recombinant cell comprising: Compound 1 having the structure:




embedded image



and a gene encoding an enzyme capable of catalyzing production of Compound 1 from mogroside IIIE. In some embodiments, the gene is a heterologous gene to the recombinant cell.


Disclosed herein include a recombinant cell comprising a first gene encoding a first enzyme capable of catalyzing production of Compound 1 having the structure:




embedded image



from mogroside IIIE.


The first enzyme can be, for example, one or more of UDP glycosyltransferases, cyclomaltodextrin glucanotransferases (CGTases), glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. In some embodiments, the first enzyme is a CGTase. For example, the CGTase can comprise an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to the sequence of any one of SEQ ID NOs: 1, 3, 78-101, 148, and 154. In some embodiments, the CGTase comprises, or consists of, the amino acid sequence of any one of SEQ ID NOs: 1, 3, 78-101, 148, and 154. In some embodiments, the first enzyme is a dextransucrase. For example, the dextransucrase can comprise an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of the sequences set forth in SEQ ID NOs: 2, 103, 106-110, 156 and 896. In some embodiments, the dextransucrase is encoded by a nucleic acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 104, 105, 157, 158, and 895. In some embodiments, the first enzyme is a transglucosidase. The transglucosidase can, for example, comprise an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to the sequence of any one of SEQ ID NOs: 163-291 and 723. In some embodiments, the transglucosidase comprises, or consists of, an amino acid sequence of any one of SEQ ID NOs: 3, 95-102 and 163-291 and 723. In some embodiments, the first enzyme is a beta-glucosidase. 
In some embodiments, the beta-glucosidase comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 102, 292, 354-376, and 678-741.


In some embodiments, the cell further comprises a second gene encoding a uridine diphosphate-glucosyl transferase (UGT). In some embodiments, the UGT comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 4-9, 15-19, 125, 126, 128, 129, 293-307, 407, 409, 411, 413, 439, 441, and 444. In some embodiments, the UGT is encoded by a nucleic acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 116-124, 127, 130, 408, 410, 412, 414, 440, 442, 443, and 445. In some embodiments, the UGT is encoded by a sequence set forth in UGT1495 (SEQ ID NO: 10), UGT1817 (SEQ ID NO: 11), UGT5914 (SEQ ID NO: 12), UGT8468 (SEQ ID NO: 13), or UGT10391 (SEQ ID NO: 14). In some embodiments, the cell further comprises a third gene encoding UGT98 or UGT SK98. In some embodiments, the UGT98 or UGT SK98 comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 9, 407, 16 or 306. In some embodiments, the UGT98 is encoded by a nucleic acid sequence set forth in SEQ ID NO: 307.


In some embodiments, the cell comprises a fourth gene encoding an epoxide hydrolase. In some embodiments, the epoxide hydrolase comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 21-30 and 309-314; or is encoded by a nucleic acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 114 and 115.


In some embodiments, the cell comprises a fifth sequence encoding P450. In some embodiments, the P450 comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 20, 49, 308, 315, 430, 872, 874, 876, 878, 880, 882, 884, 886, 888, 890, and 891; or is encoded by a nucleic acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 31-48, 316, 431, 871, 873, 875, 877, 879, 881, 883, 885, 887, 889, and 892. In some embodiments, the P450 is encoded by a gene comprising or consisting of a sequence set forth in any one of SEQ ID NOs: 31-48, 316 and 318.


In some embodiments, the cell comprises a sixth sequence encoding a polypeptide having cucurbitadienol synthase activity. In some embodiments, the polypeptide having cucurbitadienol synthase activity is a fusion protein. In some embodiments, the fusion protein comprises one or more fusion domains fused to the N-terminus, the C-terminus, or both of a cucurbitadienol synthase. The length of the fusion domain can vary, for example from about 3 to about 1000 amino acids long, or about 5 to about 50 amino acids long. In some embodiments, the fusion domain is a substantial portion or the entire sequence of a functional protein. In some embodiments, the polypeptide having cucurbitadienol synthase activity comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 70-73, 75-77, 319, 321, 323, 325, 327-333, 417, 420, 422, 424, 426, 446, 902, 904, 906, 906, 851, 854, 856, 1024, 859, 862, 865, 867, 915, 920, 924, 928, 932, 936, 940, 944, 948, 952, 956, 959, 964, 967, 971, 975, 979, 983, 987, 991, 995, 999, 1003, 1007, and 1011. In some embodiments, the polypeptide having cucurbitadienol synthase activity is encoded by a nucleic acid sequence having at least 70% sequence identity to any one of SEQ ID NOs: 74, 320, 322, 324, 326, 328, 418, 421, 423, 425, 427, 897, 899, 901, 903 and 905.


In some embodiments, the cell further comprises a seventh gene encoding a squalene epoxidase. In some embodiments, the squalene epoxidase comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 50-56, 60, 61, 334, and 335. In some embodiments, the squalene epoxidase is encoded by a nucleic acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 335.


In some embodiments, the cell comprises an eighth gene encoding a squalene synthase. In some embodiments, the squalene synthase comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 69 or 336. In some embodiments, the squalene synthase is encoded by a sequence comprising, or consisting of, a nucleic acid sequence set forth in SEQ ID NO: 337.


In some embodiments, the cell further comprises a ninth gene encoding a farnesyl-PP synthase. In some embodiments, the farnesyl-PP synthase comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 338. In some embodiments, the farnesyl-PP synthase is encoded by a nucleic acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 339.


In some embodiments, the cell is a mammalian, plant, bacterial, fungal, or insect cell. For example, the cell can be a yeast cell. In some embodiments, the yeast is selected from Candida, Saccharomyces, Saccharomycotina, Taphrinomycotina, Schizosaccharomycetes, Komagataella, Basidiomycota, Agaricomycotina, Tremellomycetes, Pucciniomycotina, Aureobasidium, Coniochaeta, and Microboryomycetes. In some embodiments, the plant is selected from the group consisting of Siraitia, Momordica, Gynostemma, Cucurbita, Cucumis, Arabidopsis, Artemisia, Stevia, Panax, Withania, Euphorbia, Medicago, Chlorophytum, Eleutherococcus, Aralia, Morus, Medicago, Betula, Astragalus, Jatropha, Camellia, Hypholoma, Aspergillus, Solanum, Huperzia, Pseudostellaria, Corchorus, Hedera, Marchantia, and Morus. In some embodiments, the fungus is selected from Trichophyton, Sanghuangporus, Taiwanofungus, Moniliophthora, Marssonina, Diplodia, Lentinula, Xanthophyllomyces, Pochonia, Colletotrichum, Diaporthe, Histoplasma, Coccidioides, Histoplasma, Sanghuangporus, Aureobasidium, Pochonia, Penicillium, Sporothrix, and Metarhizium.


In some embodiments, the cell comprises a gene encoding at least one hydrolytic enzyme capable of hydrolyzing Mogroside V. In some embodiments, Compound 1 displays tolerance to hydrolytic enzymes in the recombinant cell, wherein the hydrolytic enzymes display capabilities of hydrolyzing Mogroside VI, Mogroside V, Mogroside IV to Mogroside IIIE.


In some embodiments, the recombinant cell comprises an oxidosqualene cyclase such as a cycloartenol synthase or a beta-amyrin synthase or a nucleic acid sequence encoding an oxidosqualene cyclase such as a cycloartenol synthase or a beta-amyrin synthase, and wherein the oxidosqualene cyclase, cycloartenol synthase, or beta-amyrin synthase are modified to produce cucurbitadienol or epoxycucurbitadienol.


In some embodiments, the oxidosqualene cyclase comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 341, 343 and 346-347. In some embodiments, the cell comprises cytochrome P450 reductase or a gene encoding cytochrome P450 reductase. In some embodiments, the cytochrome P450 reductase regenerates cytochrome P450 activity. In some embodiments, the cytochrome P450 reductase comprises a sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 318.


In some embodiments, the cell comprises a sequence set forth in any one of SEQ ID NOs: 897, 899, 909, 911, 913, 418, 421, 423, 425, 427, 871, 873, 901, 903, and 905. In some embodiments, the cell comprises an enzyme comprising a sequence set forth in or is encoded by any one sequence of SEQ ID NOs: 315, 316, 420, 422, 424, 426, 430, 431, 446, 871, 845-949, and 951-1012.


In some embodiments, the cell comprises a gene encoding a hydrolase capable of hydrolyzing a first mogroside to produce Mogroside IIIE. In some embodiments, the hydrolase is a β-glucan hydrolase. In some embodiments, the hydrolase is EXG1 or EXG2. In some embodiments, the hydrolase comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 292, 366-368, 372, 376-398, 447-520, 524-845, 1013, 1014, and 1023.


In some embodiments, the first mogroside is mogroside IV, mogroside V, mogroside VI, or a combination thereof. In some embodiments, the first mogroside is mogroside V, siamenoside I, mogroside IVE, mogroside VI, mogroside IVA, or a combination thereof.


The gene encoding the hydrolase can be heterologous or homologous to the recombinant host cell. The gene encoding the hydrolase can be expressed at a normal level or overexpressed in the recombinant host cell. In some embodiments, the cell is a yeast cell. In some embodiments, the cell is Saccharomyces cerevisiae or Yarrowia lipolytica.


Also disclosed herein include a method of producing Compound 1 having the structure of:




embedded image



In some embodiments, the method comprises: contacting a first mogroside with one or more hydrolase to produce mogroside IIIE; and contacting the mogroside IIIE with an enzyme capable of catalyzing production of Compound 1 from mogroside IIIE.


In some embodiments, the hydrolase is a β-glucan hydrolase. In some embodiments, the hydrolase is EXG1, for example an EXG1 comprising an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 1012 or 1014. In some embodiments, the hydrolase is EXG2, for example an EXG2 comprising an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 1023.


In some embodiments, the first mogroside is mogroside IV, mogroside V, mogroside VI, or a combination thereof. In some embodiments, the first mogroside is mogroside V, siamenoside I, mogroside IVE, mogroside VI, mogroside IVA, or a combination thereof.
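The two-step route described above (hydrolysis of a first mogroside to mogroside IIIE, followed by enzymatic conversion of mogroside IIIE to Compound 1) can be pictured as a small lookup. The following is only an illustrative sketch: the function name `plan_route` and the data layout are hypothetical and are not part of the disclosure, and the enzyme labels simply restate the enzyme classes named in the text.

```python
# Illustrative sketch only: names and data layout are hypothetical.
# First mogrosides listed in the text as possible starting materials.
FIRST_MOGROSIDES = {
    "mogroside IV", "mogroside V", "mogroside VI",
    "siamenoside I", "mogroside IVE", "mogroside IVA",
}

def plan_route(start):
    """Return the ordered (substrate, enzyme, product) steps toward Compound 1."""
    steps = []
    if start in FIRST_MOGROSIDES:
        # Step 1: a hydrolase (e.g., EXG1 or EXG2) yields mogroside IIIE.
        steps.append((start, "hydrolase (e.g., EXG1 or EXG2)", "mogroside IIIE"))
    elif start != "mogroside IIIE":
        raise ValueError(f"not a recognized starting mogroside: {start}")
    # Step 2: an enzyme capable of producing Compound 1 from mogroside IIIE.
    steps.append(("mogroside IIIE",
                  "enzyme capable of catalyzing production of Compound 1",
                  "Compound 1"))
    return steps

for substrate, enzyme, product in plan_route("mogroside V"):
    print(f"{substrate} --[{enzyme}]--> {product}")
```

Starting from mogroside IIIE itself skips the hydrolysis step, which mirrors embodiments in which mogroside IIIE is contacted directly with the enzyme.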


In some embodiments, contacting the first mogroside with the one or more hydrolase comprises contacting the first mogroside with a recombinant host cell that comprises a first gene encoding the hydrolase and a second gene encoding the enzyme capable of catalyzing production of Compound 1. In some embodiments, the first mogroside contacts the one or more hydrolase in a recombinant host cell that comprises a first gene encoding the hydrolase and a second gene encoding the enzyme capable of catalyzing production of Compound 1. In some embodiments, the first mogroside is produced by the recombinant host cell. In some embodiments, the first gene is native to the recombinant host cell. In some embodiments, the first gene is expressed at a normal level. In some embodiments, the first gene is heterologous to the recombinant host cell. In some embodiments, the first gene is overexpressed. In some embodiments, the second gene is heterologous to the recombinant host cell. In some embodiments, the enzyme capable of catalyzing production of Compound 1 is one or more of UDP glycosyltransferases, cyclomaltodextrin glucanotransferases (CGTases), glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases.


Disclosed herein includes a recombinant cell, comprising: a first gene encoding a hydrolase capable of hydrolyzing a first mogroside to produce mogroside IIIE; and a second gene encoding an enzyme capable of catalyzing production of Compound 1 from Mogroside IIIE, wherein Compound 1 has the structure:




embedded image


In some embodiments, the hydrolase is a β-glucan hydrolase. For example, the hydrolase can be EXG1 or EXG2. In some embodiments, the EXG2 protein comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 1023. In some embodiments, the first mogroside is mogroside IV, mogroside V, mogroside VI, or a combination thereof. In some embodiments, the first mogroside is mogroside V, siamenoside I, mogroside IVE, mogroside VI, mogroside IVA, or a combination thereof. The first gene can be heterologous or homologous to the recombinant host cell. The first gene can be overexpressed or expressed at a normal level in the host cell. The cell can be, for example, a yeast cell. In some embodiments, the cell is Saccharomyces cerevisiae or Yarrowia lipolytica.


Also disclosed herein include a fusion polypeptide having cucurbitadienol synthase activity, wherein the fusion polypeptide comprises a fusion domain fused to a cucurbitadienol synthase. The fusion domain can be fused to the N-terminus, the C-terminus, or both of the cucurbitadienol synthase. The fusion domain can be, for example, about 3 to about 1000 amino acids long, or about 5 to about 50 amino acids long. In some embodiments, the fusion domain comprises a substantial portion or the entire sequence of a functional protein. In some embodiments, the fusion domain is a substantial portion or the entire sequence of a functional protein. In some embodiments, the fusion domain comprises a portion or the entire sequence of a yeast protein. In some embodiments, the fusion domain is a portion or the entire sequence of a yeast protein. In some embodiments, the fusion polypeptide comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 851, 854, 856, 1024, 859, 862, 865, 867, 915, 920, 924, 928, 932, 936, 940, 944, 948, 952, 956, 959, 964, 967, 971, 975, 979, 983, 987, 991, 995, 999, 1003, 1007, and 1011. In some embodiments, the fusion polypeptide comprises, or consists of, an amino acid sequence set forth in any one of SEQ ID NOs: 851, 854, 856, 1024, 859, 862, 865, 867, 915, 920, 924, 928, 932, 936, 940, 944, 948, 952, 956, 959, 964, 967, 971, 975, 979, 983, 987, 991, 995, 999, 1003, 1007, and 1011. In some embodiments, the fusion domain of the fusion polypeptide comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 866, 870, 917, 921, 925, 929, 933, 937, 941, 945, 949, 953, 957, 961, 968, 972, 976, 980, 984, 988, 992, 996, 1000, 1004, 1008, and 1012. 
In some embodiments, the fusion domain of the fusion polypeptide comprises, or consists of, a sequence set forth in any one of SEQ ID NOs: 866, 870, 917, 921, 925, 929, 933, 937, 941, 945, 949, 953, 957, 961, 968, 972, 976, 980, 984, 988, 992, 996, 1000, 1004, 1008, and 1012.


In some embodiments, the cucurbitadienol synthase comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 70-73, 75-77, 319, 321, 323, 325, 327, 329-333, 420, 422, 424, 426, 446, 902, 904, and 906. In some embodiments, the cucurbitadienol synthase comprises, or consists of, an amino acid sequence set forth in any one of SEQ ID NOs: 70-73, 75-77, 319, 321, 323, 325, 327, 329-333, 420, 422, 424, 426, 446, 902, 904, and 906. In some embodiments, the cucurbitadienol synthase is encoded by a gene comprising a nucleic acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 74, 320, 322, 324, 326, 328, 418, 421, 423, 425, 427, 897, 899, 901, 903, and 905. In some embodiments, the cucurbitadienol synthase is encoded by a gene comprising, or consisting of, a nucleic acid sequence set forth in any one of SEQ ID NOs: 74, 320, 322, 324, 326, 328, 418, 421, 423, 425, 427, 897, 899, 901, 903, and 905.


Disclosed herein include a recombinant nucleic acid molecule which comprises a nucleic acid sequence encoding any of the fusion polypeptides disclosed herein and having cucurbitadienol synthase activity. Disclosed herein include a recombinant cell comprising any of the fusion polypeptides disclosed herein and having cucurbitadienol synthase activity and/or any recombinant nucleic acid molecules disclosed herein encoding a fusion polypeptide having cucurbitadienol synthase activity. Also disclosed herein include a method using any of the fusion polypeptides disclosed herein and having cucurbitadienol synthase activity. The method can comprise contacting a substrate for cucurbitadienol synthase with a fusion polypeptide having cucurbitadienol synthase activity. In some embodiments, the contacting results in production of cucurbitadienol, 24,25-epoxy cucurbitadienol, or a combination thereof. In some embodiments, the substrate for cucurbitadienol synthase comprises one or more of 2,3-oxidosqualene, dioxidosqualene and diepoxysqualene. In some embodiments, the contacting comprises contacting the substrate with a recombinant host cell which comprises a nucleic acid sequence encoding the fusion polypeptide. The recombinant host cell can, for example, express the fusion polypeptide. In some embodiments, the substrate is provided to, present in, and/or produced by the recombinant host cell.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 shows HPLC data and mass spectroscopy data (inset) of Compound 1 production after treatment of Mogroside IIIE with CGTase.



FIG. 2 shows HPLC data and mass spectroscopy data (inset) of Compound 1 production after treatment of Mogroside IIIE with Streptococcus mutans Clarke ATCC 25175 Dextransucrase.



FIG. 3 shows HPLC data and mass spectroscopy data (inset) of mogroside glycosylation reaction after treatment with Celluclast in the presence of xylan.



FIGS. 4 and 5 show HPLC data and mass spectroscopy data (inset) of mogroside glycosylation reaction after treatment with UDP-glycosyltransferase.



FIG. 6 shows HPLC data and mass spectroscopy data (inset) of Mogrol after treatment with UDP-glycosyltransferase UGT73C6 to Mogroside I.



FIGS. 7-9 show HPLC data and mass spectroscopy data (inset) of Mogrol after treatment with UDP-glycosyltransferase (338) (SEQ ID NO: 405) to the products Mogroside I, Mogroside IIA, and 2 different Mogroside III products.



FIGS. 10 and 11 show HPLC data and mass spectroscopy data (inset) of Mogroside IIIE after treatment with UDP-glycosyltransferase to produce Siamenoside I and Mogroside V products.



FIGS. 12-14 show HPLC data and mass spectroscopy data (inset) of products of the reaction of Mogrol, Siamenoside I or Compound 1 after treatment with UDP-glycosyltransferase (339) (SEQ ID NO:409) to produce Mogroside I, Isomogroside V and Compound 1 derivative, respectively.



FIGS. 15-20 show HPLC data and mass spectroscopy data (inset) of Mogroside IIIA, Mogroside IVE, and Mogroside V, respectively, which were produced by treating Mogroside IIA, Mogroside IIE, Mogroside IIIE, Mogroside IVA, or Mogroside IVE with UDP-glycosyltransferase (330) (SEQ ID NO: 411).



FIGS. 21 and 22 show mass spectroscopy profile of reaction products Mogroside IVE and Mogroside V.



FIG. 23 shows production of cucurbitadienol with cucurbitadienol synthase (SgCbQ) (SEQ ID NO: 417).



FIG. 24 shows production of cucurbitadienol using the enzyme Cpep2 (SEQ ID NO: 420).



FIG. 25 shows production of cucurbitadienol using the enzyme Cpep4 (SEQ ID NO: 422).



FIG. 26 shows production of dihydroxycucurbitadienol from catalysis by epoxide hydrolase (SEQ ID NO: 428).



FIGS. 27A-B show tolerance of Compound 1 to hydrolysis by microbial enzymes.



FIG. 28 shows UPLC chromatogram of α-mogroside isomers mixture from Hilic_80_20_method.



FIG. 29 shows purity of the sample from UPLC analysis on Hilic_80_20_method.



FIG. 30 shows a flow chart showing a non-limiting exemplary pathway for producing Compound 1.



FIG. 31 shows the UV absorbance of the diepoxysqualene of Step 1 shown in FIG. 30 for boosting oxidosqualene.



FIG. 32 shows production of cucurbitadienol in step 2 using enzymes from Cucumis melo and Cucurbita maxima.



FIG. 33 shows production of cucurbitadienol in step 2 using enzyme from Pisum sativum.



FIG. 34 shows production of cucurbitadienol in step 2 using enzyme from Dictyostelium sp.



FIG. 35 shows the intermediates of Step 3 of the pathway shown in FIG. 30.



FIG. 36 shows the mass spectroscopy data of the intermediates of step 4 of the pathway shown in FIG. 30, mogrol synthesis.



FIG. 37 shows the intermediates of step 7 of the pathway shown in FIG. 30, synthesis of Compound 1.



FIG. 38 is a schematic illustration showing the production of hyper-glycosylated mogrosides through glycosylation enzymes, which may then be hydrolyzed back to Mogroside IIIE.



FIG. 39 is a schematic illustration showing how hydrolysis can be used to hydrolyze hyper-glycosylated mogrosides to produce Mogroside IIIE, which can then be converted to Compound 1.



FIGS. 40A-B show that after 2 days of incubation, substantially all of the mogrosides were converted to Mogroside IIIE in S. cerevisiae or Y. lipolytica.



FIG. 41 shows that no hydrolysis product from Compound 1 was detected in S. cerevisiae or Y. lipolytica.



FIG. 42 shows production of Compound 1 in S. cerevisiae modified to overexpress a dextransucrase (DexT).



FIG. 43 shows enzymatic reactions catalyzed by the 311 enzyme (UDP-glycosyltransferase).





DETAILED DESCRIPTION
Definitions

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as is commonly understood by one of ordinary skill in the art to which this disclosure belongs. All patents, applications, published applications, and other publications are incorporated by reference in their entirety. In the event that there is a plurality of definitions for a term herein, those in this section prevail unless stated otherwise.


“Solvate” refers to the compound formed by the interaction of a solvent and a compound described herein or salt thereof. Suitable solvates are physiologically acceptable solvates including hydrates.


A “sweetener”, “sweet flavoring agent”, “sweet flavor entity”, “sweet compound,” or “sweet tasting compound,” as used herein refers to a compound or physiologically acceptable salt thereof that elicits a detectable sweet flavor in a subject.


As used herein, the term “operably linked” is used to describe the connection between regulatory elements and a gene or its coding region. Typically, gene expression is placed under the control of one or more regulatory elements, for example, without limitation, constitutive or inducible promoters, tissue-specific regulatory elements, and enhancers. A gene or coding region is said to be “operably linked to” or “operatively linked to” or “operably associated with” the regulatory elements, meaning that the gene or coding region is controlled or influenced by the regulatory element. For instance, a promoter is operably linked to a coding sequence if the promoter effects transcription or expression of the coding sequence.


The terms “regulatory element” and “expression control element” are used interchangeably and refer to nucleic acid molecules that can influence the expression of an operably linked coding sequence in a particular host organism. These terms are used broadly to cover all elements that promote or regulate transcription, including promoters, enhancer sequences, response elements, protein recognition sites, inducible elements, protein binding sequences, 5′ and 3′ untranslated regions (UTRs), transcriptional start sites, termination sequences, polyadenylation sequences, introns, core elements required for basic interaction of RNA polymerase and transcription factors, upstream elements, enhancers, response elements (see, e.g., Lewin, “Genes V” (Oxford University Press, Oxford) pages 847-873), and any combination thereof. Exemplary regulatory elements in prokaryotes include promoters, operator sequences and ribosome binding sites. Regulatory elements that are used in eukaryotic cells can include, without limitation, transcriptional and translational control sequences, such as promoters, enhancers, splicing signals, polyadenylation signals, terminators, protein degradation signals, internal ribosome-entry element (IRES), 2A sequences, and the like, that provide for and/or regulate expression of a coding sequence and/or production of an encoded polypeptide in a host cell. In some embodiments, the recombinant cell described herein comprises genes operably linked to regulatory elements.


As used herein, 2A sequences or elements refer to small peptides introduced as a linker between two proteins, allowing autonomous intraribosomal self-processing of polyproteins (See e.g., de Felipe. Genetic Vaccines and Ther. 2:13 (2004); deFelipe et al. Traffic 5:616-626 (2004)). These short peptides allow co-expression of multiple proteins from a single vector. Many 2A elements are known in the art. Examples of 2A sequences that can be used in the methods and system disclosed herein, without limitation, include 2A sequences from the foot-and-mouth disease virus (F2A), equine rhinitis A virus (E2A), Thosea asigna virus (T2A), and porcine teschovirus-1 (P2A) as described in U.S. Patent Publication No. 20070116690.


As used herein, the term “promoter” is a nucleotide sequence that permits binding of RNA polymerase and directs the transcription of a gene. Typically, a promoter is located in the 5′ non-coding region of a gene, proximal to the transcriptional start site of the gene. Sequence elements within promoters that function in the initiation of transcription are often characterized by consensus nucleotide sequences. Examples of promoters include, but are not limited to, promoters from bacteria, yeast, plants, viruses, and mammals (including humans). A promoter can be inducible, repressible, and/or constitutive. Inducible promoters initiate increased levels of transcription from DNA under their control in response to some change in culture conditions, such as a change in temperature.


As used herein, the term “enhancer” refers to a type of regulatory element that can increase the efficiency of transcription, regardless of the distance or orientation of the enhancer relative to the start site of transcription.


As used herein, the term “transgene” refers to any nucleotide or DNA sequence that is integrated into one or more chromosomes of a target cell by human intervention. In some embodiments, the transgene comprises a polynucleotide that encodes a protein of interest. The protein-encoding polynucleotide is generally operatively linked to other sequences that are useful for obtaining the desired expression of the gene of interest, such as transcriptional regulatory sequences. In some embodiments, the transgene can additionally comprise a nucleic acid or other molecule(s) that is used to mark the chromosome where it has integrated.


“Percent (%) sequence identity” with respect to polynucleotide or polypeptide sequences is used herein as the percentage of bases or amino acid residues in a candidate sequence that are identical with the bases or amino acid residues in another sequence, after aligning the two sequences. Gaps can be introduced into the sequence alignment, if necessary, to achieve the maximum percent sequence identity. Conservative substitutions are not considered as part of the sequence identity. Alignment for purposes of determining percent (%) sequence identity can be achieved in various ways that are within the skill in the art, for instance, using publicly available computer methods and programs such as BLAST, BLAST-2, ALIGN, FASTA (available in the Genetics Computing Group (GCG) package, from Madison, Wis., USA), or Megalign (DNASTAR). Those of skill in the art can determine appropriate parameters for measuring alignment, including any algorithms needed to achieve maximal alignment over the full length of the sequences being compared.
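As a non-limiting illustration of the definition above, the sketch below computes percent identity from two sequences that have already been aligned by some external program (the alignment step itself is not shown). The function names, the use of "-" as the gap character, and the toy sequences are assumptions made for this example only. Only exact matches are counted, so conservative substitutions do not contribute to the identity value.

```python
# Illustrative sketch only; assumes the two sequences were aligned beforehand
# (e.g., by a BLAST-type program) and that '-' marks an alignment gap.

# Identity levels recited throughout the disclosure, in percent.
IDENTITY_LEVELS = (70, 80, 85, 90, 95, 98, 99, 100)

def percent_identity(aligned_candidate, aligned_other):
    """Percent of candidate residues identical to the other aligned sequence."""
    if len(aligned_candidate) != len(aligned_other):
        raise ValueError("aligned sequences must have the same length")
    # Count columns where both sequences carry the same residue (not a gap).
    identical = sum(
        1 for a, b in zip(aligned_candidate, aligned_other)
        if a == b and a != "-"
    )
    # Denominator: residues of the candidate sequence, gaps excluded.
    candidate_residues = sum(1 for a in aligned_candidate if a != "-")
    if candidate_residues == 0:
        raise ValueError("candidate sequence contains no residues")
    return 100.0 * identical / candidate_residues

def highest_level_met(identity):
    """Return the highest recited identity level satisfied, or None."""
    met = [level for level in IDENTITY_LEVELS if identity >= level]
    return max(met) if met else None

# Toy example: 4 identical residues out of 5 candidate residues.
identity = percent_identity("MKT-LV", "MKSALV")
print(identity, highest_level_met(identity))
```

The choice of denominator matters; the program-specific conventions described in the following paragraphs use the length of one particular sequence.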


For instance, percent (%) amino acid sequence identity values may be obtained by using the WU-BLAST-2 computer program described in, for example, Altschul et al., Methods in Enzymology, 1996, 266:460-480. Many search parameters in the WU-BLAST-2 computer program can be adjusted by those skilled in the art. For example, some of the adjustable parameters can be set with the following values: overlap span=1, overlap fraction=0.125, word threshold (T)=11, and scoring matrix=BLOSUM62. When WU-BLAST-2 is used, a % amino acid sequence identity value is determined by dividing (a) the number of matching identical amino acid residues between the amino acid sequence of a first protein of interest and the amino acid sequence of a second protein of interest as determined by WU-BLAST-2 by (b) the total number of amino acid residues of the first protein of interest.


Percent amino acid sequence identity may also be determined using the sequence comparison program NCBI-BLAST2 described in, for example, Altschul et al., Nucleic Acids Res., 1997, 25:3389-3402. The NCBI-BLAST2 sequence comparison program may be downloaded from http://www.ncbi.nlm.nih.gov or otherwise obtained from the National Institutes of Health, Bethesda, Md. NCBI-BLAST2 uses several adjustable search parameters. The default values for some of those adjustable search parameters are, for example, unmask=yes, strand=all, expected occurrences=10, minimum low complexity length=15/5, multi-pass e-value=0.01, constant for multi-pass=25, drop-off for final gapped alignment=25 and scoring matrix=BLOSUM62.


In situations where NCBI-BLAST2 is used for amino acid sequence comparisons, the % amino acid sequence identity of a given amino acid sequence A to, with, or against a given amino acid sequence B (which can alternatively be phrased as a given amino acid sequence A that has or comprises a certain % amino acid sequence identity to, with, or against a given amino acid sequence B) is calculated as follows:

100 times the fraction X/Y

where X is the number of amino acid residues scored as identical matches by the sequence alignment program NCBI-BLAST2 in that program's alignment of A and B, and where Y is the total number of amino acid residues in B. It will be appreciated that where the length of amino acid sequence A is not equal to the length of amino acid sequence B, the % amino acid sequence identity of A to B will not equal the % amino acid sequence identity of B to A.
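
The X/Y calculation above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned by an alignment program and are supplied as equal-length strings in which "-" marks a gap. The function name and the toy sequences are hypothetical and are not part of the disclosure.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Return 100 * X / Y for a pairwise alignment of sequences A and B.

    X = number of aligned positions where A and B carry the same residue.
    Y = total number of residues in B (gap characters are not residues).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    x = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    y = sum(1 for b in aligned_b if b != "-")
    if y == 0:
        raise ValueError("sequence B has no residues")
    return 100.0 * x / y


# Toy alignment (hypothetical): A has 8 residues, B has 6 residues.
seq_a = "MKTAYIAK"
seq_b = "MKT-YI-K"
a_to_b = percent_identity(seq_a, seq_b)  # denominator is the length of B
b_to_a = percent_identity(seq_b, seq_a)  # denominator is the length of A
```

Because the denominator is the length of the second sequence, the two calls give different values whenever A and B differ in length, which is the asymmetry noted above.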


As used herein, “isolated” means that the indicated compound has been separated from its natural milieu, such that one or more other compounds or biological agents present with the compound in its natural state are no longer present.


As used herein, “purified” means that the indicated compound is present at a higher amount relative to other compounds typically found with the indicated compound (e.g., in its natural environment). In some embodiments, the relative amount of a purified compound is increased by greater than 1%, 5%, 10%, 20%, 30%, 40%, 50%, 80%, 90%, 100%, 120%, 150%, 200%, 300%, 400%, or 1000%. In some embodiments, a purified compound is present at a weight percent level greater than 1%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 98%, 99%, or 99.5% relative to other compounds combined with the compound. In some embodiments, the Compound 1 produced from the embodiments herein is present at a weight percent level greater than 1%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 98%, 99%, or 99.5% relative to other compounds combined with the compound after production.


“Purification” as described herein, can refer to the methods for extracting Compound 1 from the cell lysate and/or the supernatant, wherein the cell is excreting the product of Compound 1. “Lysate” as described herein, comprises the cellular content of a cell after disruption of the cell wall and cell membranes and can include proteins, sugars, and mogrosides, for example. Purification can involve ammonium sulfate precipitation to remove proteins, salting to remove proteins, hydrophobic separation (HPLC), and use of an affinity column. In view of the products produced by the methods herein, affinity media is contemplated for the removal of specific mogrosides with an adsorbent resin.


“HPLC” as described herein is a form of liquid chromatography that can be used to separate compounds that are dissolved in solution. Without being limiting, the HPLC instruments can comprise a reservoir of mobile phase, a pump, an injector, a separation column, and a detector. Compounds can then be separated by injecting a sample mixture onto the column. The different components in the mixture can pass through the column at different rates due to differences in their partitioning behavior between the mobile liquid phase and the stationary phase. There are several columns that can be used. Without being limiting, the columns can be normal phase columns, reverse phase columns, size exclusion columns, and ion exchange columns.


Also contemplated is the use of solid phase extraction and fractionation, which is useful for desalting proteins and sugar samples. Other methods can include the use of HPLC, liquid chromatography for analyzing samples, and liquid-liquid extraction, described in Aurda Andrade-Eiroa et al. (TrAC Trends in Analytical Chemistry Volume 80, June 2016, Pages 641-654; incorporated by reference in its entirety herein).


“Solid phase extraction” (SPE) for purification, as described herein, refers to a sample preparation process in which compounds that are dissolved or suspended in a liquid mixture are separated from other compounds in the mixture according to their physical and chemical properties. For example, analytical laboratories can use solid phase extraction to concentrate and purify samples for analysis. Solid phase extraction can also be used to isolate analytes of interest from a wide variety of matrices, including urine, blood, water, beverages, soil, and animal tissue, for example. In the embodiments herein, Compound 1 that is in cell lysate or in the cell media can be purified by solid phase extraction.


SPE uses the affinity of solutes dissolved or suspended in a liquid (known as the mobile phase) for a solid through which the sample is passed (known as the stationary phase) to separate a mixture into desired and undesired components. SPE can also be used and applied directly in gas-solid phase and liquid-solid phase, or indirectly to solid samples by using, e.g., thermodesorption with subsequent chromatographic analysis. This can result in either the desired analytes of interest or the undesired impurities in the sample being retained on the stationary phase. The portion that passes through the stationary phase can be collected or discarded, depending on whether it contains the desired analytes or undesired impurities. If the portion retained on the stationary phase includes the desired analytes, they can then be removed from the stationary phase for collection in an additional step, in which the stationary phase is rinsed with an appropriate eluent.


Ways that the solid phase extraction can be performed are not limited. Without being limiting, the procedures may include: Normal phase SPE procedure, Reversed phase SPE, Ion exchange SPE, Anion exchange SPE, Cation exchange, and Solid-phase microextraction. Solid phase extraction is described in Sajid et al., and Plotka-Wasylka J et al. (Anal Chim Acta. 2017 May 1; 965:36-53, Crit Rev Anal Chem. 2017 Apr. 11:1-11; incorporated by reference in its entirety).


In some embodiments, the Compound 1 that is produced by the cell is purified by solid phase extraction. In some embodiments, the purity of Compound 1, for example when purified by solid phase extraction, is 70%, 80%, 90% or 100%, or any level of purity within a range defined by any two of the aforementioned values.


“Fermentation” as described herein, refers broadly to the bulk growth of host cells in a host medium to produce a specific product. In the embodiments herein, the final product produced is Compound 1. This can also include methods that occur with or without air and can be carried out in an anaerobic environment, for example. The whole cells (recombinant host cells) may be in fermentation broth or in a reaction buffer.


Compound 1 and intermediate mogroside compound for the production of Compound 1 can be isolated by collection of intermediate mogroside compounds and Compound 1 from the recombinant cell lysate or from the supernatant. The lysate can be obtained after harvesting the cells and subjecting the cells to lysis by shear force (French press cell or sonication) or by detergent treatment. The lysate can then be filtered and treated with ammonium sulfate to remove proteins, and fractionated on a C18 HPLC (5×10 cm Atlantis prep T3 OBD column, 5 um, Waters) and by injections using an A/B gradient (A=water B=acetonitrile) of 10→30% B over 30 minutes, with a 95% B wash, followed by re-equilibration at 1% (total run time=42 minutes). The runs can be collected in tared tubes (12 fractions/plate, 3 plates per run) at 30 mL/fraction. The lysate can also be centrifuged to remove solids and particulate matter.
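
As a rough illustration of the gradient described above, the following sketch computes the percentage of solvent B at a given time during the 10→30% B ramp, together with the collected volume per run. It assumes the ramp is linear and starts at time zero, and that every tube is filled to 30 mL. The wash and re-equilibration steps are not modeled because their durations are not stated. All names are hypothetical.

```python
def percent_b(t_min: float, start_b: float = 10.0, end_b: float = 30.0,
              ramp_min: float = 30.0) -> float:
    """Percent B (acetonitrile) at time t_min within a linear gradient ramp."""
    if not 0.0 <= t_min <= ramp_min:
        raise ValueError("time is outside the gradient ramp")
    return start_b + (end_b - start_b) * t_min / ramp_min


# Composition halfway through the 30-minute ramp.
midpoint_b = percent_b(15.0)

# Collected volume per run, assuming all tubes are filled:
# 12 fractions/plate x 3 plates/run x 30 mL/fraction.
fractions_per_run = 12 * 3
volume_per_run_ml = fractions_per_run * 30
```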


Plates can then be dried in the Genevac HT12/HT24. The desired compound is expected to be eluted in Fraction 21 along with other isomers. The pooled Fractions can be further fractionated in 47 runs on fluoro-phenyl HPLC column (3×10 cm, Xselect fluoro-phenyl OBD column, 5 um, Waters) using an A/B gradient (A=water, B=acetonitrile) of 15→30% B over 35 minutes, with a 95% B wash, followed by re-equilibration at 15% (total run time=45 minutes). Each run was collected in 12 tared tubes (12 fractions/plate, 1 plate per run) at 30 mL/fraction. Fractions containing the desired peak with the desired purity can be pooled based on UPLC analysis and dried under reduced pressure to give a whitish powdery solid. The pure compound can be re-suspended/dissolved in 10 mL of water and lyophilized to obtain at least a 95% purity.


As used herein, a “glycosidic bond” refers to a covalent bond connecting two furanose and/or pyranose groups together. Generally, a glycosidic bond is the bond between the anomeric carbon of one furanose or pyranose moiety and an oxygen of another furanose or pyranose moiety. Glycosidic bonds are named using the numbering of the connected carbon atoms, and the alpha/beta orientation. α- and β-glycosidic bonds are distinguished based on the relative stereochemistry of the anomeric position and the stereocenter furthest from C1 in the ring. For example, sucrose is a disaccharide composed of one molecule of glucose and one molecule of fructose connected through an alpha 1-2 glycosidic bond, as shown below.




embedded image


An example of a beta 1-4 glycosidic bond can be found in cellulose:




embedded image


As used in the specification and the appended claims, the singular forms “a,” “an” and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “an aromatic compound” includes mixtures of aromatic compounds.


Often, ranges are expressed herein as from “about” one particular value, and/or to “about” another particular value. When such a range is expressed, another embodiment includes from the one particular value and/or to the other particular value. Similarly, when values are expressed as approximations, by use of the antecedent “about,” it will be understood that the particular value forms another embodiment. It will be further understood that the endpoints of each of the ranges are significant both in relation to the other endpoint, and independently of the other endpoint.


“Codon optimization” as described herein, refers to the design process of altering codons to codons known to increase maximum protein expression efficiency. In some alternatives, codon optimization for expression in a cell is described, wherein codon optimization can be performed by using algorithms that are known to those skilled in the art so as to create synthetic genetic transcripts optimized for high mRNA and protein yield in humans. Codons can be optimized for protein expression in a bacterial cell, mammalian cell, yeast cell, insect cell, or plant cell, for example. Programs containing algorithms for codon optimization in humans are readily available. Such programs can include, for example, OptimumGene™ or GeneGPS® algorithms. Additionally codon optimized sequences can be obtained commercially, for example, from Integrated DNA Technologies. In some of the embodiments herein, a recombinant cell for the production of Compound 1 comprises genes encoding enzymes for synthesis, wherein the genes are codon optimized for expression. In some embodiments, the genes are codon optimized for expression in bacterial, yeast, fungal or insect cells.
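
The idea of replacing codons with synonymous codons preferred by a host can be sketched as follows. This is a minimal illustration and not the algorithm used by any of the programs named above. The preference table is hypothetical (it does not contain measured codon usage data), and the function names are assumptions made for the example. Input is assumed to be an uppercase DNA coding sequence.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: DNA codon -> one-letter amino acid ("*" = stop).
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Hypothetical host preferences, for illustration only.
PREFERRED = {"L": "CTG", "K": "AAA", "G": "GGC"}


def translate(dna: str) -> str:
    """Translate a coding sequence codon by codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def recode(dna: str, preferred: dict) -> str:
    """Swap each codon for the preferred synonymous codon, if one is listed."""
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        amino_acid = CODON_TABLE[codon]
        new_codon = preferred.get(amino_acid, codon)
        if CODON_TABLE[new_codon] != amino_acid:
            raise ValueError("preferred codon does not encode " + amino_acid)
        out.append(new_codon)
    return "".join(out)


original = "ATGTTAAAGGGTTAA"          # encodes M-L-K-G-stop
optimized = recode(original, PREFERRED)
```

The recoded sequence differs at the nucleotide level but translates to the same amino acid sequence, which is the property any codon optimization must preserve.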


As used herein, the terms “nucleic acid,” “nucleic acid molecule,” and “polynucleotide” are interchangeable and refer to any nucleic acid, whether composed of phosphodiester linkages or modified linkages such as phosphotriester, phosphoramidate, siloxane, carbonate, carboxymethylester, acetamidate, carbamate, thioether, bridged phosphoramidate, bridged methylene phosphonate, phosphorothioate, methylphosphonate, phosphorodithioate, bridged phosphorothioate or sultone linkages, and combinations of such linkages. The terms “nucleic acid” and “polynucleotide” also specifically include nucleic acids composed of bases other than the five biologically occurring bases (adenine, guanine, thymine, cytosine and uracil).


Non-limiting examples of polynucleotides include deoxyribonucleic acid (DNA) or ribonucleic acid (RNA), oligonucleotides, fragments generated by the polymerase chain reaction (PCR), and fragments generated by any of ligation, scission, endonuclease action, and exonuclease action. Nucleic acid molecules can be composed of monomers that are naturally-occurring nucleotides (such as DNA and RNA), or analogs of naturally-occurring nucleotides (e.g., enantiomeric forms of naturally-occurring nucleotides), or a combination of both. Modified nucleotides can have alterations in sugar moieties and/or in pyrimidine or purine base moieties. Sugar modifications include, for example, replacement of one or more hydroxyl groups with halogens, alkyl groups, amines, and azido groups, or sugars can be functionalized as ethers or esters. Moreover, the entire sugar moiety can be replaced with sterically and electronically similar structures, such as aza-sugars and carbocyclic sugar analogs. Examples of modifications in a base moiety include alkylated purines and pyrimidines, acylated purines or pyrimidines, or other well-known heterocyclic substitutes. Nucleic acid monomers can be linked by phosphodiester bonds or analogs of such linkages. Analogs of phosphodiester linkages include phosphorothioate, phosphorodithioate, phosphoroselenoate, phosphorodiselenoate, phosphoroanilothioate, phosphoranilidate, phosphoramidate, and the like. The term “nucleic acid molecule” also includes so-called “peptide nucleic acids,” which comprise naturally-occurring or modified nucleic acid bases attached to a polyamide backbone. Nucleic acids can be either single stranded or double stranded. In some alternatives, a nucleic acid sequence encoding a fusion protein is provided. In some alternatives, the nucleic acid is RNA or DNA. In some embodiments, the nucleic acid comprises any one of SEQ ID NOs: 1-1023.


“Coding for” and “encoding” are used herein to refer to the property of specific sequences of nucleotides in a polynucleotide, such as a gene, a cDNA, or an mRNA, to serve as templates for synthesis of other macromolecules such as a defined sequence of amino acids. Thus, a gene codes for a protein if transcription and translation of mRNA corresponding to that gene produces the protein in a cell or other biological system. In some embodiments herein, a recombinant cell is provided, wherein the recombinant cell comprises genes encoding enzymes such as dextransucrase, UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, dextranases, and/or UGT. In some embodiments, the transglucosidases comprise an amino acid sequence set forth by any one of SEQ ID NOs: 163-290 and 723. In some embodiments, the CGTases are encoded by or have the sequence of any one of SEQ ID NOs: 1, 3, 78-101, 147 and 154. In some embodiments, the genes encoding the enzymes such as dextransucrase, UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, dextranases, and/or UGT are codon optimized for expression in the host cell. A “nucleic acid sequence coding for a polypeptide” includes all nucleotide sequences that are degenerate versions of each other and that code for the same amino acid sequence.
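
To give a sense of how many degenerate nucleotide sequences can code for the same amino acid sequence, the sketch below counts them under the standard genetic code by multiplying the number of synonymous codons for each residue. The function name and the example peptides are hypothetical.

```python
from collections import Counter
from itertools import product
from math import prod

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: DNA codon -> one-letter amino acid ("*" = stop).
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
# Number of synonymous codons available for each amino acid.
CODONS_PER_AA = Counter(CODON_TABLE.values())


def count_encodings(peptide: str) -> int:
    """Count the distinct nucleotide sequences that encode the peptide."""
    for residue in peptide:
        if residue not in CODONS_PER_AA:
            raise ValueError("unknown residue: " + residue)
    return prod(CODONS_PER_AA[residue] for residue in peptide)


# Met and Trp each have one codon; Leu has six, Lys two, Gly four.
single = count_encodings("MW")
several = count_encodings("MLKG")
```

The count grows multiplicatively with length, so even a short peptide is covered by many degenerate coding sequences.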


Optimization can also be performed to reduce the occurrence of secondary structure in a polynucleotide. In some alternatives of the method, optimization of the sequences in the vector can also be performed to reduce the total GC/AT ratio. Strict codon optimization can lead to unwanted secondary structure or an undesirably high GC content that leads to secondary structure. As such, the secondary structures affect transcriptional efficiency. Programs such as GeneOptimizer can be used after codon usage optimization, for secondary structure avoidance and GC content optimization. These additional programs can be used for further optimization and troubleshooting after an initial codon optimization to limit secondary structures that can occur after the first round of optimization. Alternative programs for optimization are readily available. In some alternatives of the method, the vector comprises sequences that are optimized for secondary structure avoidance and/or the sequences are optimized to reduce the total GC/AT ratio and/or the sequences are optimized for expression in a bacterial or yeast cell.
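
A simple way to inspect GC content before and after optimization is sketched below. It reports the overall GC fraction, the GC/AT ratio, and the GC fraction of successive windows, which can help locate GC-rich stretches. This is an illustrative calculation only and does not predict secondary structure. The function names, the window size, and the toy sequence are assumptions.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence is empty")
    return sum(1 for base in seq if base in "GC") / len(seq)


def gc_at_ratio(seq: str) -> float:
    """Ratio of (G + C) count to (A + T) count."""
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    at = sum(1 for base in seq if base in "AT")
    if at == 0:
        raise ValueError("sequence contains no A or T bases")
    return gc / at


def gc_windows(seq: str, window: int, step: int = 1) -> list:
    """GC fraction of each window of the given size along seq."""
    return [gc_fraction(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


toy = "ATGCGCGCATAT"                      # hypothetical 12-base sequence
overall = gc_fraction(toy)
ratio = gc_at_ratio(toy)
per_window = gc_windows(toy, window=4, step=4)
```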


“Vector,” “Expression vector” or “construct” is a nucleic acid used to introduce heterologous nucleic acids into a cell that has regulatory elements to provide expression of the heterologous nucleic acids in the cell. Vectors include but are not limited to plasmids, minicircles, yeast, and viral genomes. In some alternatives, the vectors are plasmids, minicircles, yeast, or genomes. In some alternatives, the vector is for protein expression in a bacterial system such as E. coli. In some alternatives, the vector is for protein expression in a yeast system. In some embodiments, the vector for expression is a viral vector. In some embodiments the vector is a recombinant vector comprising promoter sequences for upregulation of expression of the genes. “Regulatory elements” can refer to the nucleic acid that has nucleotide sequences that can influence the transcription or translation initiation and rate, stability and mobility of a transcription or translation product.


“Recombinant host” or “recombinant host cell” as described herein is a host, the genome of which has been augmented by at least one incorporated DNA sequence. Said incorporated DNA sequence may be a heterologous nucleic acid encoding one or more polypeptides. Such DNA sequences include but are not limited to genes that are not naturally present, DNA sequences that are not normally transcribed into RNA or translated into a protein (“expressed”), and other genes or DNA sequences which one desires to introduce into the nonrecombinant host. In some embodiments, the recombinant host cell is used to prevent expression problems such as codon-bias. There are commercial hosts for expression of proteins, for example, BL21-CodonPlus™ cells, tRNA-Supplemented Host Strains for Expression of Heterologous Genes, Rosetta™ (DE3) competent strains for enhancing expression of proteins, and commercial yeast expression systems in the genera Saccharomyces, Pichia, Kluyveromyces, Hansenula and Yarrowia.


The recombinant host may be a commercially available cell such as Rosetta cells for expression of enzymes that may have rare codons.


In some embodiments, the recombinant cell comprises a recombinant gene for the production of cytochrome P450 polypeptide comprising the amino acid sequence of any one of CYP533, CYP937, CYP1798, CYP1994, CYP2048, CYP2740, CYP3404, CYP3968, CYP4112, CYP4149, CYP4491, CYP5491, CYP6479, CYP7604, CYP8224, CYP8728, CYP10020, and CYP10285. In some embodiments, the P450 polypeptide is encoded in genes comprising any one of the sequences set forth in SEQ ID NOs: 31-48, 316, 431, 871, 873, 875, 877, 879, 881, 883, 885, 887, and 891.


In some embodiments, the P450 enzyme is aided by at least one CYP activator, such as CPR4497. In some embodiments, the recombinant host cell further comprises a gene encoding CPR4497, wherein the gene comprises a nucleic acid sequence set forth in SEQ ID NO: 112. In some embodiments, the recombinant host cell further comprises a gene encoding CPR4497, wherein the amino acid sequence of CPR4497 is set forth in SEQ ID NO: 113.


In some embodiments, wherein the recombinant host cell is a yeast cell, the recombinant cell has a deletion of the EXG1 gene and/or the EXG2 gene to reduce glucanase activity, which may otherwise lead to deglucosylation of mogrosides.


The type of host cell can vary. For example, the host cell can be selected from a group consisting of Agaricus, Aspergillus, Bacillus, Candida, Corynebacterium, Escherichia, Fusarium/Gibberella, Kluyveromyces, Laetiporus, Lentinus, Phaffia, Phanerochaete, Pichia, Physcomitrella, Rhodotorula, Saccharomyces, Schizosaccharomyces, Sphaceloma, Xanthophyllomyces, Yarrowia, Lentinus tigrinus, Laetiporus sulphureus, Phanerochaete chrysosporium, Pichia pastoris, Physcomitrella patens, Rhodotorula glutinis, Rhodotorula mucilaginosa, Phaffia rhodozyma, Xanthophyllomyces dendrorhous, Fusarium fujikuroi/Gibberella fujikuroi, Candida utilis, Yarrowia lipolytica, Siraitia, Momordica, Gynostemma, Cucurbita, Cucumis, Arabidopsis, Artemisia, Stevia, Panax, Withania, Euphorbia, Medicago, Chlorophytum, Eleutherococcus, Aralia, Morus, Betula, Astragalus, Jatropha, Camellia, Hypholoma, Solanum, Huperzia, Pseudostellaria, Corchorus, Hedera, Marchantia, Trichophyton, Sanghuangporus, Taiwanofungus, Moniliophthora, Marssonina, Diplodia, Lentinula, Pochonia, Colletotrichum, Diaporthe, Histoplasma, Coccidioides, Aureobasidium, Penicillium, Sporothrix, Metarhizium, Lipomyces, Aspergillus nidulans, Rhodosporidium toruloides, Saccharomycotina, Taphrinomycotina, Schizosaccharomycetes, Komagataella, Basidiomycota, Agaricomycotina, Tremellomycetes, Pucciniomycotina, Coniochaeta, Rhodosporidium, Microbotryomycetes, Gibberella fujikuroi, Kluyveromyces lactis, Schizosaccharomyces pombe, Aspergillus niger, Saccharomyces cerevisiae, Escherichia coli, Rhodobacter sphaeroides, and Rhodobacter capsulatus. Methods to enhance product yield have been described, for example, in S. cerevisiae. Methods are known for making recombinant microorganisms.


Methods to prepare recombinant host cells from Aspergillus spp. are described in WO2014086842, incorporated by reference in its entirety herein. Nucleotide sequences of the genomes can be obtained through gene data libraries available publicly and can allow for rational design and modifications of the pathways to enhance and improve product yield.


“Culture media” as described herein, can be a nutrient rich broth for the growth and maintenance of cells during their production phase. A yeast culture for maintaining and propagating various strains, can require specific formulations of complex media for use in cloning and protein expression, and can be appreciated by those of skill in the art. Commercially available culture media can be used from ThermoFisher for example. The media can be YPD broth or can have a yeast nitrogen base. Yeast can be grown in YPD or synthetic media at 30° C.


Lysogeny broth (LB) is typically used for bacterial cells. The bacterial cells used for production of the enzymes and mogrosides can have antibiotic resistance to prevent the growth of other cells in the culture media and contamination. The cells can have antibiotic gene cassettes for resistance to antibiotics such as chloramphenicol, penicillin, kanamycin and ampicillin, for example.


As described herein, a “fusion protein” is a protein created through the joining of two or more nucleic acid sequences that originally coded for a portion or entire amino acid sequence of separate proteins. For example, a fusion protein can contain a functional protein (e.g., an enzyme (including, but not limited to, cucurbitadienol synthase)) and one or more fusion domains. A fusion domain, as described herein, can be a full length or a portion/fragment of a protein (e.g., a functional protein including but not limited to, an enzyme, a transcription factor, a toxin, and translation factor). The location of the one or more fusion domains in the fusion protein can vary. For example, the one or more fusion domains can be at the N- and/or C-terminal regions (e.g., N- and/or C-termini) of the fusion protein. The one or more fusion domains can also be at the central region of the fusion protein. The fusion domain is not required to be located at the terminus of the fusion protein. A fusion domain can be selected so as to confer a desired property. For example, a fusion domain may affect (e.g., increase or decrease) the enzymatic activity of an enzyme that it is fused to, or affect (e.g., increase or decrease) the stability of a protein that it is fused to. A fusion domain may be a multimerizing (e.g., dimerizing and tetramerizing) domain and/or functional domains. In some embodiments, the fusion domain may enhance or decrease the multimerization of the protein that it is fused to. As a non-limiting example, a fusion protein can contain a full length protein A and a fusion domain fused to the N-terminal region and/or C-terminal region of the full length protein A. In some examples, a fusion protein contains a partial sequence of protein A and a fusion domain fused to the N-terminal region and/or C-terminal region (e.g., the N-terminus and C-terminus) of the partial sequence of protein A.
The fusion domain can be, for example, a portion or the entire sequence of protein A, or a portion or the entire sequence of a protein different from protein A. In some embodiments, one or more of the enzymes suitable for use in the methods, systems and compositions disclosed herein can be a fusion protein. In some embodiments, the fusion protein is encoded by a nucleic acid sequence having at least 70%, 80%, 90%, 95%, or 99% sequence identity to one of the nucleic acid sequences listed in Table 1. In some embodiments, the fusion protein comprises an amino acid sequence having at least 70%, 80%, 90%, 95%, or 99% sequence identity to one of the amino acid sequences listed in Table 1. In some embodiments, the fusion protein comprises an amino acid protein sequence having at least 80%, 90%, 95%, or 99% sequence identity to one of the amino acid sequences listed in Table 1, and a fusion domain at N-, C-, or both terminal regions of the fusion protein. In some embodiments, the fusion protein comprises one of the amino acid protein sequences listed in Table 1, and a fusion domain located at N-, C-, or both terminal regions of the fusion protein.


The length of the fusion domain can vary, for example, from 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100, 125, 150, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, or a range between any two of these numbers, amino acids. In some embodiments, the fusion domain is about 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 25, 30, 40, 50, or a range between any two of these numbers, amino acids in length. In some embodiments, the fusion domain is a substantial portion or the entire sequence of a functional protein (for example, an enzyme, a transcription factor, or a translation factor). In some embodiments, the fusion protein is a protein having cucurbitadienol synthase activity.


Techniques for optimizing cell growth and protein expression in culture media are also contemplated. For growth in culture media, cells such as yeast can be sensitive to low pH (Narendranath et al., Appl Environ Microbiol. 2005 May; 71(5): 2239-2243; incorporated by reference in its entirety). During growth, yeast must maintain a constant intracellular pH. There are many enzymes functioning within the yeast cell during growth and metabolism. Each enzyme works best at its optimal pH, which is acidic because of the acidophilic nature of the yeast itself. When the extracellular pH deviates from the optimal level, the yeast cell needs to invest energy to either pump in or pump out hydrogen ions in order to maintain the optimal intracellular pH. As such, media containing buffers to control the pH would be optimal. Alternatively, the cells can also be transferred into a new medium if the monitored pH is high.


Growth optimization of bacterial and yeast cells can also be achieved by the addition of nutrients and supplements into a culture media. Alternatively, the cultures can be grown in a fermenter designed for temperature control, pH control and controlled aeration rates. Dissolved oxygen and nitrogen can be flowed into the media as necessary.


The term “Operably linked” as used herein refers to functional linkage between a regulatory sequence and a heterologous nucleic acid sequence resulting in expression of the latter.


“Mogrosides” and “mogroside compounds” are used interchangeably herein and refer to a family of triterpene glycosides. Non-limiting examples of mogrosides include Mogroside V, Siamenoside I, Mogroside IVE, Iso-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, and Mogroside III, which have been identified from the fruits of Siraitia grosvenorii (Swingle) and are responsible for the sweetness of the fruits. In the embodiments herein, mogroside intermediates can be used in the in vivo, ex vivo, or in vitro production of Compound 1 having the structure of:




embedded image


In some embodiments, a recombinant cell for producing Compound 1 further produces mogrosides and comprises genes encoding enzymes for the production of mogrosides. Recombinant cells capable of the production of mogrosides are further described in WO2014086842, incorporated by reference in its entirety herein. In some embodiments, the recombinant cell is grown in a medium to allow expression of the enzymes and production of Compound 1 and mogroside intermediates. In some embodiments, Compound 1 is obtained by lysing the cell with shear force (e.g., French press cell or sonication) or by detergent lysing methods. In some embodiments, the cells are supplemented in the growth media with precursor molecules such as mogrol to boost production of Compound 1.


“Promoter” as used herein refers to a nucleotide sequence that directs the transcription of a structural gene. In some alternatives, a promoter is located in the 5′ non-coding region of a gene, proximal to the transcriptional start site of a structural gene. Sequence elements within promoters that function in the initiation of transcription are often characterized by consensus nucleotide sequences. Without being limiting, these promoter elements can include RNA polymerase binding sites, TATA sequences, CAAT sequences, differentiation-specific elements (DSEs; McGehee et al., Mol. Endocrinol. 7:551 (1993); hereby expressly incorporated by reference in its entirety), cyclic AMP response elements (CREs), serum response elements (SREs; Treisman et al., Seminars in Cancer Biol. 1:47 (1990); incorporated by reference in its entirety), glucocorticoid response elements (GREs), and binding sites for other transcription factors, such as CRE/ATF (O'Reilly et al., J. Biol. Chem. 267:19938 (1992); incorporated by reference in its entirety), AP2 (Ye et al., J. Biol. Chem. 269:25728 (1994); incorporated by reference in its entirety), SP1, cAMP response element binding protein (CREB; Loeken et al., Gene Expr. 3:253 (1993); hereby expressly incorporated by reference in its entirety) and octamer factors (see, in general, Watson et al., eds., Molecular Biology of the Gene, 4th ed. (The Benjamin/Cummings Publishing Company, Inc. 1987; incorporated by reference in its entirety), and Lemaigre and Rousseau, Biochem. J. 303:1 (1994); incorporated by reference in its entirety). As used herein, a promoter can be constitutively active, repressible or inducible. If a promoter is an inducible promoter, then the rate of transcription increases in response to an inducing agent. In contrast, the rate of transcription is not regulated by an inducing agent if the promoter is a constitutive promoter.


A “ribosome skip sequence” as described herein refers to a sequence that during translation, forces the ribosome to “skip” the ribosome skip sequence and translate the region after the ribosome skip sequence without formation of a peptide bond. Several viruses, for example, have ribosome skip sequences that allow sequential translation of several proteins on a single nucleic acid without having the proteins linked via a peptide bond. As described herein, this is the “linker” sequence. In some alternatives of the nucleic acids provided herein, the nucleic acids comprise a ribosome skip sequence between the sequences for the genes for the enzymes described herein, such that the proteins are co-expressed and not linked by a peptide bond. In some alternatives, the ribosome skip sequence is a P2A, T2A, E2A or F2A sequence. In some alternatives, the ribosome skip sequence is a T2A sequence.


Compound 1


As disclosed herein, Compound 1 is a compound having the structure of:




embedded image



or a salt thereof.


Compound 1 is a high-intensity sweetener that can be used in a wide variety of products in which a sweet taste is desired. Compound 1 provides a low-calorie advantage over other sweeteners such as sucrose or fructose.


In some embodiments, Compound 1 is in an isolated and purified form. In some embodiments, Compound 1 is present in a composition in which Compound 1 is substantially purified.


In some embodiments, Compound 1 or a salt thereof is isolated and in solid form. In some embodiments, the solid form is amorphous. In some embodiments, the solid form is crystalline. In some embodiments, the compound is in the form of a lyophile. In some embodiments, Compound 1 is isolated and within a buffer.


The skilled artisan will recognize that some structures described herein may be resonance forms or tautomers of compounds that may be fairly represented by other chemical structures, even when, kinetically, the artisan recognizes that such structures may only represent a very small portion of a sample of such compound(s). Such compounds are considered within the scope of the structures depicted, though such resonance forms or tautomers are not represented herein.


Isotopes may be present in Compound 1. Each chemical element as represented in a compound structure may include any isotope of said element. For example, in a compound structure a hydrogen atom may be explicitly disclosed or understood to be present in the compound. At any position of the compound that a hydrogen atom may be present, the hydrogen atom can be any isotope of hydrogen, including but not limited to hydrogen-1 (protium) and hydrogen-2 (deuterium). Thus, reference herein to a compound encompasses all potential isotopic forms unless the context clearly dictates otherwise. In some embodiments, compounds described herein are enriched in one or more isotopes relative to the natural prevalence of such isotopes. In some embodiments, the compounds described herein are enriched in deuterium. In some embodiments, greater than 0.0312% of hydrogen atoms in the compounds described herein are deuterium. In some embodiments, greater than 0.05%, 0.08%, or 0.1% of hydrogen atoms in the compounds described herein are deuterium.


In some embodiments, Compound 1 is capable of forming acid and/or base salts by virtue of the presence of amino and/or carboxyl groups or groups similar thereto.


In some embodiments, Compound 1 is substantially isolated. In some embodiments, Compound 1 is substantially purified. In some embodiments, the compound is in the form of a lyophile. In some embodiments, the compound is crystalline. In some embodiments, the compound is amorphous.


Production Compositions


In some embodiments, the production composition contains none, or less than a certain amount, of undesirable compounds. In some embodiments, the composition contains, or does not contain, one or more isomers of Mogroside I, Mogroside II, and Mogroside III. In some embodiments, the composition contains a weight percent of less than 5%, 3%, 2%, 1%, 0.5%, 0.3%, 0.1%, 800 ppm, 500 ppm, 200 ppm, or 100 ppm of all isomers of Mogroside I, Mogroside II, and Mogroside III. In some embodiments, the composition contains, or does not contain, one or more of Mogroside IIIE, 11-oxo-Mogroside IIIE, Mogroside IIIA2, Mogroside IE, Mogroside IIE, and 11-oxo-mogrol. In some embodiments, the composition contains a weight percent of less than 5%, 3%, 2%, 1%, 0.5%, 0.3%, 0.1%, 800 ppm, 500 ppm, 200 ppm, or 100 ppm of one or more of Mogroside IIIE, 11-oxo-Mogroside IIIE, Mogroside IIIA2, Mogroside IE, Mogroside IIE, and 11-oxo-mogrol. In some embodiments, the composition contains a weight percent of less than 5%, 3%, 2%, 1%, 0.5%, 0.3%, 0.1%, 800 ppm, 500 ppm, 200 ppm, or 100 ppm of Mogroside IIIE. In some embodiments, the composition contains a weight percent of less than 5%, 3%, 2%, 1%, 0.5%, 0.3%, 0.1%, 800 ppm, 500 ppm, 200 ppm, or 100 ppm of 11-oxo-Mogroside IIIE. In some embodiments, the composition contains a weight percent of less than 5%, 3%, 2%, 1%, 0.5%, 0.3%, 0.1%, 800 ppm, 500 ppm, 200 ppm, or 100 ppm of 11-oxo-mogrol.


In some embodiments, the production composition is in solid form, which may be crystalline or amorphous. In some embodiments, the composition is in particulate form. The solid form of the composition may be produced using any suitable technique, including but not limited to re-crystallization, filtration, solvent evaporation, grinding, milling, spray drying, spray agglomeration, fluid bed agglomeration, wet or dry granulation, and combinations thereof. In some embodiments, a flowable particulate composition is provided to facilitate use in further food manufacturing processes. In some such embodiments, a particle size between 50 μm and 300 μm, between 80 μm and 200 μm, or between 80 μm and 150 μm is generated.


Some embodiments provide a production composition comprising Compound 1 that is in solution form. For example, in some embodiments a solution produced by one of the production processes described herein is used without further purification. In some embodiments, the concentration of Compound 1 in the solution is greater than 300 ppm, 500 ppm, 800 ppm, 0.1%, 0.5%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, or 20% by weight. In some embodiments, the concentration of all isomers of Mogroside I, Mogroside II, and Mogroside III is less than 5%, 3%, 2%, 1%, 0.5%, 0.3%, 0.1%, 800 ppm, 500 ppm, 200 ppm, or 100 ppm. In some embodiments, the concentration of one or more of Mogroside IIIE, 11-oxo-Mogroside IIIE, Mogroside IIIA2, Mogroside IE, Mogroside IIE, and 11-oxo-mogrol in the production composition is less than 5%, 3%, 2%, 1%, 0.5%, 0.3%, 0.1%, 800 ppm, 500 ppm, 200 ppm, 100 ppm, 50 ppm, 30 ppm, 20 ppm, 10 ppm, 5 ppm, 1 ppm, or 0.1 ppm. In some embodiments, the concentration of Mogroside IIIE is less than 5%, 3%, 2%, 1%, 0.5%, 0.3%, 0.1%, 800 ppm, 500 ppm, 200 ppm, 100 ppm, 50 ppm, 30 ppm, 20 ppm, 10 ppm, 5 ppm, 1 ppm, or 0.1 ppm. In some embodiments, the concentration of 11-oxo-Mogroside IIIE is less than 5%, 3%, 2%, 1%, 0.5%, 0.3%, 0.1%, 800 ppm, 500 ppm, 200 ppm, 100 ppm, 50 ppm, 30 ppm, 20 ppm, 10 ppm, 5 ppm, 1 ppm, or 0.1 ppm. In some embodiments, the concentration of 11-oxo-mogrol is less than 5%, 3%, 2%, 1%, 0.5%, 0.3%, 0.1%, 800 ppm, 500 ppm, 200 ppm, 100 ppm, 50 ppm, 30 ppm, 20 ppm, 10 ppm, 5 ppm, 1 ppm, or 0.1 ppm.
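The limits recited above mix weight percentages and ppm values. The following is a minimal sketch of how they could be compared on one scale. It assumes that ppm means parts per million by weight, so that 1% equals 10,000 ppm; the function names and the sample measurement are hypothetical and are not part of the disclosure.

```python
def to_ppm(value: float, unit: str) -> float:
    """Convert a weight-based concentration to parts per million (ppm)."""
    if unit == "%":
        return value * 10_000.0  # assumption: 1% by weight = 10,000 ppm by weight
    if unit == "ppm":
        return float(value)
    raise ValueError(f"unsupported unit: {unit!r}")


# Upper limits recited for the undesired mogrosides in solid compositions,
# kept in the order in which the text lists them.
LIMITS = [(5, "%"), (3, "%"), (2, "%"), (1, "%"), (0.5, "%"), (0.3, "%"),
          (0.1, "%"), (800, "ppm"), (500, "ppm"), (200, "ppm"), (100, "ppm")]


def limits_met(measured_ppm: float) -> list[str]:
    """Return every recited limit that the measured level falls below."""
    return [f"{v} {u}" for v, u in LIMITS if measured_ppm < to_ppm(v, u)]


# Hypothetical measurement: 650 ppm of Mogroside IIIE.
print(limits_met(650.0))
```

Under these assumptions, a measured level satisfies a recited embodiment whenever it falls below the corresponding limit, so a single measurement can satisfy several of the listed limits at once.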


Methods of Producing Compound 1 and Intermediate Mogroside Compounds


In some embodiments, Compound 1 is produced by contact of various starting and/or intermediate compounds with one or more enzymes. The contact can be in vivo (e.g., in a recombinant cell) or in vitro. The starting and intermediate compounds for producing Compound 1 can include, but are not limited to, Mogroside V, Mogroside IIE, Mogroside IIIE, Siamenoside I, Mogroside VI isomer, Mogroside IIA, Mogroside IVE, or Mogroside IVA.


In some embodiments, Compound 1 as disclosed herein is produced in recombinant host cells in vivo as described herein or by modification of these methods. Ways of modifying the methodology include, among others, temperature, solvent, reagents etc., known to those skilled in the art. The methods shown and described herein are illustrative only and are not intended, nor are they to be construed, to limit the scope of the claims in any manner whatsoever. Those skilled in the art will be able to recognize modifications of the disclosed methods and to devise alternate routes based on the disclosures herein; all such modifications and alternate routes are within the scope of the claims.


In some embodiments, Compound 1 disclosed herein is obtained by purification and/or isolation from a recombinant bacterial cell, yeast cell, plant cell, or insect cell. In some embodiments, the recombinant cell is from Siraitia grosvenorii. In some such embodiments, an extract obtained from Siraitia grosvenorii may be fractionated using a suitable purification technique. In some embodiments, the extract is fractionated using HPLC and the appropriate fraction is collected to obtain the desired compound in isolated and purified form.


In some embodiments, Compound 1 is produced by enzymatic modification of a compound isolated from Siraitia grosvenorii. For example, in some embodiments, Compound 1 isolated from Siraitia grosvenorii is contacted with one or more enzymes to obtain the desired compounds. The contact can be in vivo (e.g., in a recombinant cell) or in vitro. The starting and intermediate compounds for producing Compound 1 can include, but are not limited to, Mogroside V, Mogroside IIE, Mogroside IIIE, Siamenoside I, Mogroside VI isomer, Mogroside IIA, Mogroside IVE, or Mogroside IVA. One or more of these compounds can be made in vivo. Enzymes suitable for use to generate compounds described herein can include, but are not limited to, a pectinase, a β-galactosidase (e.g., Aromase), a cellulase (e.g., Celluclast), a cyclomaltodextrin glucanotransferase (e.g., Toruzyme), an invertase, a glucosyltransferase (e.g., UGT76G1), a dextransucrase, a lactase, an arabanase, a xylanase, a hemicellulase, an amylase, or a combination thereof. In some embodiments, the enzyme is a Toruzyme comprising an amino acid sequence set forth in any one of SEQ ID NOs: 89-94.


Some embodiments provide a method of making Compound 1,




embedded image



wherein the method comprises fractionating an extract of Siraitia grosvenorii on an HPLC column and collecting an eluted fraction comprising Compound 1.


Some embodiments provide a method of making Compound 1,




embedded image



wherein the method comprises treating Mogroside IIIE with the glucose transferase enzyme UGT76G1. In some embodiments, UGT76G1 is encoded by a sequence set forth in SEQ ID NO: 440. In some embodiments, UGT76G1 comprises an amino acid sequence set forth in SEQ ID NO: 439.


Various mogroside compounds can be used as intermediate compounds for producing Compound 1. A non-limiting example of such mogroside compounds is Compound 3 having the structure of:




embedded image



In some embodiments, a method for producing Compound 3 comprises contacting mogroside IIIE with a cell (e.g., a recombinant host cell) that expresses one or more cyclomaltodextrin glucanotransferases. In some embodiments, the cyclomaltodextrin glucanotransferase comprises an amino acid sequence set forth in SEQ ID NO: 95.


Various mogroside compounds can be used as intermediate compounds for producing Compound 1. One non-limiting example of such mogroside compounds is Compound 12 having the structure of:




embedded image



In some embodiments, a method for producing Compound 12 comprises contacting mogroside VI with a cell (e.g., a recombinant host cell) that expresses one or more invertases.


Various mogroside compounds can be used as intermediate compounds for producing Compound 1. One non-limiting example of such mogroside compounds is Compound 5 having the structure of:




embedded image


In some embodiments, a method for producing Compound 5 comprises contacting mogroside IIIE with a cell (e.g., a recombinant host cell) that expresses one or more cyclomaltodextrin glucanotransferases. In some embodiments, the method is performed in the presence of starch.


Various mogroside compounds can be used as intermediate compounds for producing Compound 1. One non-limiting example of such mogroside compounds is Compound 4 having the structure of:




embedded image


In some embodiments, a method for producing Compound 4 comprises contacting mogroside IIIE with a cell (e.g., a recombinant host cell) that expresses one or more cyclomaltodextrin glucanotransferases. In some embodiments, the method is performed in the presence of starch.


Hydrolysis of Hyper-Glycosylated Mogrosides


In some embodiments, one or more hyper-glycosylated mogrosides are hydrolyzed to Mogroside IIIE by contact with one or more hydrolase enzymes. In some embodiments, the hyper-glycosylated mogrosides are selected from a mogroside IV, a mogroside V, a mogroside VI, and combinations thereof. In some embodiments, the hyper-glycosylated mogrosides are selected from Mogroside V, Siamenoside I, Mogroside IVE, Iso-mogroside V, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IVE, and combinations thereof.


It has been surprisingly discovered that Compound 1 displays tolerance to hydrolysis by certain hydrolyzing enzymes, even though such enzymes display capabilities of hydrolyzing hyper-glycosylated mogrosides to Mogroside IIIE. The alpha-linked glycoside present in Compound 1 provides a unique advantage over other mogrosides (e.g., beta-linked glycosides) due to its tolerance to hydrolysis. In some embodiments, during microbial production of Compound 1, the microbial host will hydrolyze unwanted beta-linked mogrosides back to Mogroside IIIE. This will improve the purity of Compound 1 in two ways: 1) the levels of unwanted Mogroside VI, Mogroside V, and Mogroside IV are reduced, and 2) the hydrolysis increases the amount of Mogroside IIIE available to be used as a precursor for production of Compound 1.



FIG. 38 illustrates the production of hyper-glycosylated mogrosides through glycosylation enzymes, which may then be hydrolyzed back to Mogroside IIIE. The result is a mixture of mogrosides with a lower than desirable yield of hyper-glycosylated mogrosides. The hydrolase enzymes can be removed, but a mixture of mogrosides is still obtained and the lifespan of the producing organism may be reduced. However, because Compound 1 is resistant to hydrolysis, the hydrolysis can be used to drive hyper-glycosylated mogrosides to Mogroside IIIE, which can then be converted to Compound 1 (as shown in FIG. 39).
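The reasoning above can be illustrated with a toy mass-balance model. The sketch below assumes simple first-order kinetics and uses made-up rate constants; neither the kinetic form nor the numbers come from the disclosure. Mogroside IIIE is converted either to Compound 1 (alpha-linked, not hydrolyzed) or to beta-linked hyper-glycosylated mogrosides, and a hydrolase returns only the latter to Mogroside IIIE.

```python
def simulate(k_alpha: float, k_beta: float, k_hyd: float,
             steps: int = 20_000, dt: float = 0.01):
    """Explicit Euler integration of a three-pool, first-order model.

    k_alpha: Mogroside IIIE -> Compound 1 (hypothetical rate constant)
    k_beta:  Mogroside IIIE -> hyper-glycosylated mogrosides (hypothetical)
    k_hyd:   hyper-glycosylated mogrosides -> Mogroside IIIE (hypothetical)
    Compound 1 is treated as fully resistant to the hydrolase.
    """
    m3e, hyper, c1 = 1.0, 0.0, 0.0  # start with Mogroside IIIE only
    for _ in range(steps):
        to_c1 = k_alpha * m3e * dt
        to_hyper = k_beta * m3e * dt
        back = k_hyd * hyper * dt
        m3e += back - to_c1 - to_hyper
        hyper += to_hyper - back
        c1 += to_c1
    return m3e, hyper, c1


# Same glycosylation rates, with and without hydrolase activity.
without_hydrolase = simulate(k_alpha=0.05, k_beta=0.05, k_hyd=0.0)
with_hydrolase = simulate(k_alpha=0.05, k_beta=0.05, k_hyd=0.2)
print("Compound 1 fraction without hydrolase:", round(without_hydrolase[2], 3))
print("Compound 1 fraction with hydrolase:   ", round(with_hydrolase[2], 3))
```

In this simplified model, the Compound 1 fraction without hydrolase levels off at k_alpha / (k_alpha + k_beta), whereas with hydrolase the beta-linked pool is recycled and the Compound 1 fraction keeps rising toward the total. Real systems would involve enzyme saturation, multiple mogroside species, and cell physiology that this sketch ignores.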


In some embodiments, the hydrolase is a β-glucan hydrolase. In some embodiments, the hydrolase is EXG1. The EXG1 protein can comprise an amino acid sequence having at least 70%, 80%, 90%, 95%, 98%, 99%, or more sequence identity to SEQ ID NO: 1013 or 1014. In some embodiments, the EXG1 protein comprises, or consists of, an amino acid sequence set forth in SEQ ID NO: 1013 or 1014. In some embodiments, the hydrolase is EXG2. The EXG2 protein can comprise an amino acid sequence having at least 70%, 80%, 90%, 95%, 98%, 99%, or more sequence identity to SEQ ID NO: 1023. In some embodiments, the EXG2 protein comprises, or consists of, an amino acid sequence set forth in SEQ ID NO: 1023. The hydrolase can be, for example, any one of the hydrolases disclosed herein.


Production of Compound 1 from Mogroside IIIE


Compound 1 can be produced from Mogroside IIIE by contact with one or more enzymes capable of converting Mogroside IIIE to Compound 1. In some embodiments, the enzyme capable of catalyzing production of Compound 1 is one or more of UDP glycosyltransferases, cyclomaltodextrin glucanotransferases (CGTases), glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases.


In some embodiments, the enzyme capable of catalyzing the production of Compound 1 is a CGTase. In some embodiments, the CGTase comprises an amino acid sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to the sequence of any one of SEQ ID NOs: 1, 3, 78-101, 148, and 154. In some embodiments, the CGTase comprises, or consists of, the amino acid sequence of any one of SEQ ID NOs: 1, 3, 78-101, 148, and 154. In some embodiments, the enzyme capable of catalyzing the production of Compound 1 is a dextransucrase. In some embodiments, the dextransucrase comprises an amino acid sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to the sequence of any one of SEQ ID NOs: 2, 103, 106-110, 156 and 896. In some embodiments, the dextransucrase comprises, or consists of, an amino acid sequence of any one of SEQ ID NOs: 2, 103, 106-110, 156 and 896. In some embodiments, the dextransucrase is encoded by a nucleic acid sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 104, 105, 157, 158, and 895. In some embodiments, the dextransucrase is encoded by a nucleic acid sequence comprising, or consisting of, any one of SEQ ID NOs: 104, 105, 157, 158, and 895. In some embodiments, the enzyme capable of catalyzing the production of Compound 1 is a transglucosidase. In some embodiments, the transglucosidase comprises an amino acid sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 163-290 and 723. In some embodiments, the transglucosidase comprises, or consists of, an amino acid sequence of any one of SEQ ID NOs: 163-292 and 723. 
Percent sequence identity can be determined with ClustalW software or by BLAST searches (ncbi.nih.gov). The use of these programs can determine conservation between protein homologues.
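As an illustration of what a percent-identity threshold such as "at least 70%" means, the sketch below computes identity over two sequences that have already been aligned (for example, by an alignment program such as those named above). It is an assumption-laden simplification: gapped columns are excluded from the denominator, which is only one of several conventions in use, and the toy sequences are hypothetical rather than taken from the Sequence Listing.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are not counted
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 70.0) -> bool:
    """True if the pair reaches the given percent-identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical ten-residue fragments differing at one position.
print(percent_identity("MKTAYIAKQR", "MKTAHIAKQR"))  # 90.0
print(meets_threshold("MKTAYIAKQR", "MKTAHIAKQR", threshold=95.0))  # False
```

Dedicated alignment tools apply their own scoring and gap handling, so values reported by ClustalW or BLAST for the same pair may differ from this simple count.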


In some embodiments, the enzyme capable of catalyzing the production of Compound 1 is a uridine diphosphate-glucosyl transferase (UGT). The UGT can comprise, for example, an amino acid sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 4-9, 15-19, 125, 126, 128, 129, 293-307, 407, 409, 411, 413, 439, 441, and 444. In some embodiments, the UGT comprises, or consists of, the amino acid sequence of any one of SEQ ID NOs: 4-9, 10-14, 125, 126, 128, 129, 293-304, 306, 407, 409, 411, 413, 439, 441, and 444. In some embodiments, the UGT is encoded by a nucleic acid sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to UGT1495 (SEQ ID NO: 10), UGT1817 (SEQ ID NO: 11), UGT5914 (SEQ ID NO: 12), UGT8468 (SEQ ID NO: 13), UGT10391 (SEQ ID NO: 14), and SEQ ID NOs: 116-124, 127, 130, 408, 410, 412, 414, 440, 442, 443, and 445. In some embodiments, the UGT is encoded by a nucleic acid sequence comprising, or consisting of, any one of the nucleic acid sequences of UGT1495 (SEQ ID NO: 10), UGT1817 (SEQ ID NO: 11), UGT5914 (SEQ ID NO: 12), UGT8468 (SEQ ID NO: 13), UGT10391 (SEQ ID NO: 14), and SEQ ID NOs: 116-124, 127, 130, 408, 410, 412, 414, 440, 442, 443, and 445. In some embodiments, the enzyme can be UGT98 or UGT SK98. For example, as described herein, a recombinant host cell capable of producing Compound 1 can comprise a third gene encoding UGT98 and/or UGT SK98. In some embodiments, the UGT98 or UGT SK98 comprises an amino acid sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 9 or 16. 
In some embodiments, the UGT comprises an amino acid sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to UGT73C3 (SEQ ID NO: 4), UGT73C6 (SEQ ID NO: 5), UGT85C2 (SEQ ID NO: 6), UGT73C5 (SEQ ID NO: 7), UGT73E1 (SEQ ID NO: 8), UGT98 (SEQ ID NO: 9), UGT1576 (SEQ ID NO: 15), UGT SK98 (SEQ ID NO: 16), UGT430 (SEQ ID NO: 17), UGT1697 (SEQ ID NO: 18), and UGT11789 (SEQ ID NO: 19). In some embodiments, the UGT comprises, or consists of, an amino acid sequence of any one of UGT73C3 (SEQ ID NO: 4), UGT73C6 (SEQ ID NO: 5), UGT85C2 (SEQ ID NO: 6), UGT73C5 (SEQ ID NO: 7), UGT73E1 (SEQ ID NO: 8), UGT98 (SEQ ID NO: 9), UGT1576 (SEQ ID NO: 15), UGT SK98 (SEQ ID NO: 16), UGT430 (SEQ ID NO: 17), UGT1697 (SEQ ID NO: 18), and UGT11789 (SEQ ID NO: 19). In some embodiments, the UGT is encoded by a nucleic acid sequence having at least 70%, 80%, 90%, 95%, 98%, 99% or more sequence identity to UGT1495 (SEQ ID NO: 10), UGT1817 (SEQ ID NO: 11), UGT5914 (SEQ ID NO: 12), UGT8468 (SEQ ID NO: 13) or UGT10391 (SEQ ID NO: 14). In some embodiments, the UGT is encoded by a nucleic acid sequence comprising, or consisting of, any one of the sequences of UGT1495 (SEQ ID NO: 10), UGT1817 (SEQ ID NO: 11), UGT5914 (SEQ ID NO: 12), UGT8468 (SEQ ID NO: 13), and UGT10391 (SEQ ID NO: 14). As disclosed herein, the enzyme capable of catalyzing the production of Compound 1 can comprise an amino acid sequence having at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, at least 99%, or more sequence identity to any one of the UGT enzymes disclosed herein. Furthermore, a recombinant host cell capable of producing Compound 1 can comprise an enzyme comprising, or consisting of, a sequence having at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, at least 99%, or more sequence identity to any one of the UGT enzymes disclosed herein. 
In some embodiments, the recombinant host cell comprises an enzyme comprising, or consisting of a sequence of any one of the UGT enzymes disclosed herein.


In some embodiments, the method of producing Compound 1 comprises treating Mogroside IIIE with the glucose transferase enzyme UGT76G1, for example the UGT76G1 of SEQ ID NO: 439 and the UGT76G1 encoded by the nucleic acid sequence of SEQ ID NO: 440.


Enzymes for the Production of Mogroside Compounds and Compound 1


As described herein, the UDP glycosyltransferases, cyclomaltodextrin glucanotransferases (CGTases), glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases can comprise the amino acid sequences described in the table of sequences herein and can also be encoded by the nucleic acid sequences described in the table of sequences. Additionally, the enzymes can also include functional homologues with at least 70% sequence identity to the amino acid sequences described in the table of sequences. Percent sequence identity can be determined with ClustalW software or by BLAST searches (ncbi.nih.gov). The use of these programs can determine conservation between protein homologues.


In some embodiments, the transglucosidases comprise an amino acid sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 163-290 and 723. In some embodiments, the CGTase comprises, or consists of, an amino acid sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 1, 3, 78-101, and 154. In some embodiments, the transglucosidases comprise an amino acid sequence, or are encoded by a nucleic acid sequence, having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 163-290 and 723.


The methods herein also include incorporating genes into the recombinant cells for producing intermediates such as pyruvate, acetyl-CoA, citrate, and other TCA (citric acid cycle) intermediates. Intermediates can be further used to produce mogroside compounds for producing Compound 1. Methods for increasing squalene content are described in Gruchattka et al. and Rodriguez et al. (PLoS One. 2015 Dec. 23; 10(12); Microb Cell Fact. 2016 Mar. 3; 15:48; incorporated by reference in their entireties herein).


Expression of enzymes to produce oxidosqualene and diepoxysqualene is further contemplated. Enzymes used to produce oxidosqualene and diepoxysqualene can also boost squalene synthesis by way of squalene synthase and/or squalene epoxidase. For example, Su et al. describe the gene encoding SgSQS, a 417 amino acid protein from Siraitia grosvenorii for squalene synthase (Biotechnol Lett. 2017 Mar. 28; incorporated by reference in its entirety herein). Genetically engineering the recombinant cell for expression of HMG CoA reductase is also useful for squalene synthesis (Appl Environ Microbiol. 1997 September; 63(9):3341-4.; Front Plant Sci. 2011 Jun. 30; 2:25; FEBS J. 2008 April; 275(8):1852-9.; all incorporated by reference in their entireties herein). In some embodiments, the 2,3-oxidosqualene or diepoxysqualene is produced by an enzyme comprising a sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 898 or 900. In some embodiments, the 2,3-oxidosqualene or diepoxysqualene is produced by an enzyme encoded by a nucleic acid sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 897 or 899.


Expression of enzymes to produce cucurbitadienol/epoxycucurbitadienol is also contemplated. Examples of cucurbitadienol synthases from C. pepo, S. grosvenorii, C. sativus, C. melo, C. moschata, and C. maxima are contemplated for engineering into the recombinant cells by a vector for expression. Oxidosqualene cyclases for triterpene biosynthesis are also contemplated for expression in the recombinant cell, which would lead to the cyclization of an acyclic substrate into various polycyclic triterpenes which can also be used as intermediates for the production of Compound 1 (Org Biomol Chem. 2015 Jul. 14; 13(26):7331-6; incorporated by reference in its entirety herein).


Expression of enzymes that display epoxide hydrolase activities to make hydroxy-cucurbitadienols is also contemplated. In some embodiments herein, the recombinant cells for the production of Compound 1 further comprise genes that encode enzymes that display epoxide hydrolase activities to make hydroxy-cucurbitadienols. Such enzymes are provided in Itkin et al., which is incorporated by reference in its entirety herein. The enzymes described in Itkin et al. are provided in Table 1, table of sequences, provided herein. Itkin et al. also describes enzymes for making key mogrosides, UGS families, glycosyltransferases and hydrolases that can be genetically modified for reverse reactions such as glycosylations.


The expression of enzymes in recombinant cells that hydroxylate mogroside compounds to produce mogrol is also contemplated. These enzymes can include proteins of the CAZY family, UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, transglucosidases, pectinases, dextranases, and yeast and fungal hydrolyzing enzymes. Such enzymes can be used, for example, for hydrolyzing Mogroside V to Mogroside IIIE, in which Mogroside IIIE can be further processed to produce Compound 1, for example in vivo. In some embodiments, fungal lactases comprise an amino acid sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 678-722.


In some embodiments, a mogrol precursor such as squalene or oxidosqualene, mogrol or mogroside is produced. The mogrol precursor can be used as a precursor in the production of Compound 1. Squalene can be produced from farnesyl pyrophosphate using a squalene synthase, and oxidosqualene can be produced from squalene using a squalene epoxidase. The squalene synthase can be, for example, squalene synthase from Gynostemma pentaphyllum (protein accession number C4P9M2), a Cucurbitaceae family plant. The squalene synthase can also comprise a squalene synthase from Arabidopsis thaliana (protein accession number C4P9M3), Brassica napus, Citrus macrophylla, Euphorbia tirucalli (protein accession number B9WZW7), Glycine max, Glycyrrhiza glabra (protein accession number Q42760, Q42761), Glycyrrhiza uralensis (protein accession number D6QX40, D6QX41, D6QX42, D6QX43, D6QX44, D6QX45, D6QX47, D6QX39, D6QX55, D6QX38, D6QX53, D6QX37, D6QX35, B5AID5, B5AID4, B5AID3, C7EDD0, C6KE07, C6KE08, C7EDC9), Lotus japonicus (protein accession number Q84LE3), Medicago truncatula (protein accession number Q8GSL6), Pisum sativum, or Ricinus communis (protein accession number B9RHC3). Various squalene synthases have been described in WO 2016/050890, the content of which is incorporated herein by reference in its entirety.
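Because accession numbers are easily corrupted in transcription (for example, a letter O substituted for a zero), a format check can be a useful first screen. The sketch below assumes that the protein accession numbers recited above are UniProtKB accessions, which the text does not state explicitly, and uses the accession pattern published by UniProt. The helper name is hypothetical, and a passing check confirms only the format, not that the entry exists or belongs to the named organism.

```python
import re

# UniProtKB accession pattern as published in the UniProt documentation.
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_wellformed_accession(text: str) -> bool:
    """Return True if the whole string matches the UniProtKB accession format."""
    return UNIPROT_ACCESSION.fullmatch(text) is not None


# Accessions recited in the text, plus two illustrative malformed strings:
# one ending in the letter O and one starting with a digit.
for candidate in ["C4P9M2", "Q42760", "B9RHC3", "D6QX40", "C7EDDO", "065403"]:
    print(candidate, is_wellformed_accession(candidate))
```

A string that fails the check is a candidate for re-verification against the original source rather than proof of an error.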


Recombinant Host Cells


Any one of the enzymes disclosed herein can be produced in vitro, ex vivo, or in vivo. For example, a nucleic acid sequence encoding the enzyme (including but not limited to any one of UGTs, CGTases, glycotransferases, dextransucrases, cellulases, beta-glucosidases, amylases, transglucosidases, pectinases, dextranases, cytochrome P450, epoxide hydrolases, cucurbitadienol synthases, squalene epoxidases, squalene synthases, hydrolases, and oxidosqualene cyclases) can be introduced into a host recombinant cell, for example in the form of an expression vector containing the coding nucleic acid sequence, in vivo. The expression vectors can be introduced into the host cell by, for example, standard transformation techniques (e.g., heat transformation) or by transfection. The expression systems can produce the enzymes for mogroside and Compound 1 production, in order to produce Compound 1 in the cell in vivo. Useful expression systems include, but are not limited to, bacterial, yeast and insect cell systems. For example, insect cell systems can be infected with a recombinant virus expression system for expression of the enzymes of interest. In some embodiments, the genes are codon optimized for expression in a particular cell. In some embodiments, the genes are operably linked to a promoter to drive transcription and translation of the enzyme protein. As described herein, codon optimization can be obtained, and the optimized sequence can then be engineered into a vector for transforming a recombinant host cell.
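The disclosure does not specify how codon optimization is carried out. As a rough illustration of the idea only, the sketch below back-translates a protein sequence using one codon per amino acid. Every codon in the table encodes the stated amino acid under the standard genetic code, but the choice of which synonymous codon to use is a placeholder and is not derived from any particular host's codon usage; the function name and the example peptide are likewise hypothetical.

```python
# One codon per amino acid (standard genetic code). The particular choices
# are illustrative placeholders, not a real host-specific preference table.
PREFERRED_CODON = {
    "A": "GCT", "R": "AGA", "N": "AAC", "D": "GAT", "C": "TGT",
    "Q": "CAA", "E": "GAA", "G": "GGT", "H": "CAC", "I": "ATT",
    "L": "TTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCA",
    "S": "TCT", "T": "ACT", "W": "TGG", "Y": "TAC", "V": "GTT",
}
STOP_CODON = "TAA"


def back_translate(protein: str, add_stop: bool = True) -> str:
    """Build a coding sequence by mapping each residue to its table codon."""
    codons = []
    for residue in protein.upper():
        if residue not in PREFERRED_CODON:
            raise ValueError(f"unsupported residue: {residue!r}")
        codons.append(PREFERRED_CODON[residue])
    if add_stop:
        codons.append(STOP_CODON)
    return "".join(codons)


# Hypothetical three-residue peptide.
print(back_translate("MKT"))  # ATGAAGACTTAA
```

Practical codon optimization typically also weighs codon frequency tables for the chosen host, GC content, repeats, and restriction sites, none of which this one-codon-per-residue sketch addresses.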


Expression vectors can further comprise transcription or translation regulatory sequences, coding sequences for transcription or translation factors, or various promoters (e.g., GPD1 promoters) and/or enhancers, to promote transcription of a gene of interest in yeast cells.


The recombinant cells as described herein are, in some embodiments, genetically modified to produce Compound 1 in vivo. Additionally, a cell can be fed a mogrol precursor or mogroside precursor during cell growth or after cell growth to boost the rate of production of a particular intermediate for the pathway for producing Compound 1 in vivo. The cell can be in suspension or immobilized. The cell can be in fermentation broth or in a reaction buffer. In some embodiments, a permeabilizing agent is used for transfer of a mogrol precursor or mogroside precursor into a cell. In some embodiments, a mogrol precursor or mogroside precursor can be provided in a purified form or as part of a composition or an extract.


The recombinant host cell can be, for example, a plant, bivalve, fish, fungus, bacteria or mammalian cell. For example, the plant can be selected from Siraitia, Momordica, Gynostemma, Cucurbita, Cucumis, Arabidopsis, Artemisia, Stevia, Panax, Withania, Euphorbia, Medicago, Chlorophytum, Eleutherococcus, Aralia, Morus, Medicago, Betula, Astragalus, Jatropha, Camellia, Hypholoma, Aspergillus, Solanum, Huperzia, Pseudostellaria, Corchorus, Hedera, Marchantia, and Morus. The fungus can be selected from Trichophyton, Sanghuangporus, Taiwanofungus, Moniliophthora, Marssonina, Diplodia, Lentinula, Xanthophyllomyces, Pochonia, Colletotrichum, Diaporthe, Histoplasma, Coccidioides, Histoplasma, Sanghuangporus, Aureobasidium, Pochonia, Penicillium, Sporothrix, Metarhizium, Aspergillus, Yarrowia, and Lipomyces. In some embodiments, the fungus is Aspergillus nidulans, Yarrowia lipolytica, or Rhodosporidium toruloides. In some embodiments, the recombinant host cell is a yeast cell. In some embodiments, the yeast is selected from Candida, Saccharomyces, Saccharomycotina, Taphrinomycotina, Schizosaccharomycetes, Komagataella, Basidiomycota, Agaricomycotina, Tremellomycetes, Pucciniomycotina, Aureobasidium, Coniochaeta, Rhodosporidium, Yarrowia, and Microbotryomycetes. In some embodiments, the bacteria is selected from Frankia, Actinobacteria, Streptomyces, and Enterococcus. In some embodiments, the bacteria is Enterococcus faecalis.


In some embodiments, the recombinant genes are codon optimized for expression in a bacterial, mammalian, plant, fungal or insect cell. In some embodiments, one or more of the genes comprises a functional mutation to increase activity of the encoded enzyme. In some embodiments, cultivating the recombinant host cell comprises monitoring the cultivating conditions for pH, dissolved oxygen level, nitrogen level, or a combination thereof.


Producing Mogrol from Squalene


Some embodiments of the method of producing Compound 1 comprise producing an intermediate for use in the production of Compound 1. The compound having the structure of:




embedded image



is produced in vivo in a recombinant host. In some embodiments, the compound is in the recombinant host cell, is secreted into the medium in which the recombinant cell is growing, or both. In some embodiments, the recombinant cell further produces intermediates such as mogroside compounds in vivo. The recombinant cell can be grown in a culture medium, under conditions in which the genes disclosed herein are expressed. Some embodiments of methods of growing the cell are described herein.


In some embodiments, the intermediate is, or comprises, at least one of squalene, oxidosqualene, cucurbitadienol, mogrol and mogrosides. In some embodiments, the mogroside is Mogroside IIE. As described herein, mogrosides are a family of glycosides that can be naturally isolated from a plant or a fruit, for example. As contemplated herein, the mogrosides can be produced by a recombinant host cell.


In some alternatives of the methods described herein, the recombinant host cell comprises a polynucleotide or a sequence comprising one or more of the following:


a gene encoding squalene epoxidase;


a gene encoding cucurbitadienol synthase;


a gene encoding cytochrome P450;


a gene encoding cytochrome P450 reductase; and


a gene encoding epoxide hydrolase.


In some embodiments, the squalene epoxidase comprises a sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 54. In some embodiments, the squalene epoxidase comprises a sequence from Arabidopsis thaliana (protein accession numbers Q9SM02, O65403, O65402, O65404, O81000, or Q9T064), Brassica napus (protein accession numbers O65727, O65726), Euphorbia tirucalli (protein accession number A7VJN1), Medicago truncatula (protein accession numbers Q8GSM8, Q8GSM9), Pisum sativum, and Ricinus communis (protein accession numbers B9R6V0, B9S7W5, B9S6Y2, B9T0Y3, B9S7T0, B9SX91) and functional homologues of any of the aforementioned sharing at least 70%, such as at least 80%, for example at least 90%, such as at least 95%, for example at least 98% sequence identity therewith. In some embodiments, the squalene epoxidase comprises, or consists of, an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 50-56, 60, 61, 334 or 335.


In some embodiments, the cell comprises genes encoding ERG7 (lanosterol synthase). In some embodiments, lanosterol synthase comprises a sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 111. In some embodiments, the P450 polypeptide is encoded in genes comprising a sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 31-48. In some embodiments, the sequences can be separated by ribosome skip sequences to produce separated proteins.


In some embodiments, the recombinant host cell comprises a gene encoding a polypeptide having cucurbitadienol synthase activity. In some embodiments, the polypeptide having cucurbitadienol synthase activity comprises an amino acid sequence as set forth in any one of SEQ ID NOs: 70-73, 75-77, 319, 321, 323, 325, 327-333, 420, 422, 424, 426, 446, 902, 904, and 906. In some embodiments, the polypeptide having cucurbitadienol synthase activity comprises a sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to the sequence of any one of SEQ ID NO: 70-73, 75-77, 319, 321, 323, 325, 327-333, 420, 422, 424, 426, 446, 902, 904 and 906. In some embodiments, the polypeptide having cucurbitadienol synthase activity comprises a C-terminal portion comprising the sequence set forth in SEQ ID NO: 73. In some embodiments, the gene encoding the polypeptide having cucurbitadienol synthase activity is codon optimized. In some embodiments, the codon optimized gene comprises the nucleic acid sequence set forth in SEQ ID NO: 74.


In some embodiments, the polypeptide having cucurbitadienol synthase activity is a fusion polypeptide comprising a fusion domain fused to a cucurbitadienol synthase. The fusion domain can be fused to, for example, N-terminus or C-terminus of a cucurbitadienol synthase. The fusion domain can be located, for example, at the N-terminal region or the C-terminal region of the fusion polypeptide. The length of the fusion domain can vary. For example, the fusion domain can be, or be about, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 25, 30, 40, 50, 100, 150, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1500, 2000, or a range between any two of these numbers amino acids long. In some embodiments, the fusion domain is 3 to 1000 amino acids long. In some embodiments, the fusion domain is 5 to 50 amino acids long. In some embodiments, the fusion domain comprises a substantial portion or the entire sequence of a functional protein. In some embodiments, the fusion domain comprises a portion or the entire sequence of a yeast protein. For example, the fusion polypeptide having cucurbitadienol synthase activity can comprise an amino acid sequence having, or having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, or 100% sequence identity to any one of SEQ ID NOs: 851, 854, 856, 1024, 859, 862, 865, 867, 915, 920, 924, 928, 932, 936, 940, 944, 948, 952, 956, 959, 964, 967, 971, 975, 979, 983, 987, 991, 995, 999, 1003, 1007, and 1011. In some embodiments, the fusion polypeptide comprises, or consists of, an amino acid sequence set forth in any one of SEQ ID NOs: 851, 854, 856, 1024, 859, 862, 865, 867, 915, 920, 924, 928, 932, 936, 940, 944, 948, 952, 956, 959, 964, 967, 971, 975, 979, 983, 987, 991, 995, 999, 1003, 1007, and 1011. 
In some embodiments, the fusion domain of the fusion polypeptide comprises an amino acid sequence having, or having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, or 100% sequence identity to any one of SEQ ID NOs: 866, 870, 917, 921, 925, 929, 933, 937, 941, 945, 949, 953, 957, 961, 968, 972, 976, 980, 984, 988, 992, 996, 1000, 1004, 1008, and 1012. In some embodiments, the fusion domain of the fusion polypeptide comprises, or consists of, an amino acid sequence set forth in any one of SEQ ID NOs: 866, 870, 917, 921, 925, 929, 933, 937, 941, 945, 949, 953, 957, 961, 968, 972, 976, 980, 984, 988, 992, 996, 1000, 1004, 1008, and 1012. In some embodiments, the cucurbitadienol synthase fused with the fusion domain comprises an amino acid sequence having, or having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, or 100% sequence identity to any one of SEQ ID NOs: 70-73, 75-77, 319, 321, 323, 325, 327, 329-333, 420, 422, 424, 426, 446, 902, 904, and 906. In some embodiments, the cucurbitadienol synthase fused with the fusion domain comprises, or consists of, an amino acid sequence set forth in any one of SEQ ID NOs: 70-73, 75-77, 319, 321, 323, 325, 327, 329-333, 420, 422, 424, 426, 446, 902, 904, and 906. In some embodiments, the cucurbitadienol synthase fused with the fusion domain is encoded by a gene comprising a nucleic acid sequence having, or having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, or 100% sequence identity to any one of SEQ ID NOs: 74, 320, 322, 324, 326, 328, 418, 421, 423, 425, 427, 897, 899, 901, 903, and 905. In some embodiments, the cucurbitadienol synthase fused with the fusion domain is encoded by a gene comprising, or consists of, a nucleic acid sequence set forth in any one of SEQ ID NOs: 74, 320, 322, 324, 326, 328, 418, 421, 423, 425, 427, 897, 899, 901, 903, and 905. Disclosed herein include a recombinant nucleic acid molecule comprising a nucleic acid sequence encoding a fusion polypeptide having cucurbitadienol synthase activity. 
Also disclosed include a recombinant cell comprising a fusion polypeptide having cucurbitadienol synthase activity or a recombinant nucleic acid molecule encoding the fusion polypeptide.


The fusion polypeptides having cucurbitadienol synthase activity disclosed herein can be used to catalyze enzymatic reactions as cucurbitadienol synthases. For example, a substrate for cucurbitadienol synthase can be contacted with one or more of these fusion polypeptides to produce reaction products. Non-limiting examples of the reaction product include cucurbitadienol, 24,25-epoxy cucurbitadienol, and any combination thereof. Non-limiting examples of the substrate for cucurbitadienol synthase include 2,3-oxidosqualene, dioxidosqualene, diepoxysqualene, and any combination thereof. In some embodiments, the substrate can be contacted with a recombinant host cell which comprises a nucleic acid sequence encoding one or more fusion polypeptides having cucurbitadienol synthase activity. The substrate can be provided to the recombinant host cell, present in the recombinant host cell, produced by the recombinant host cell, or any combination thereof.


In some embodiments, the cytochrome P450 is a CYP5491. In some embodiments, the cytochrome P450 comprises an amino acid sequence having, or having at least, 50%, 60%, 70%, 80%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to the sequence set forth in SEQ ID NO: 44 and/or SEQ ID NO: 74. In some embodiments, the P450 reductase polypeptide comprises an amino acid sequence having, or having at least, 50%, 60%, 70%, 80%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 46. In some embodiments, the P450 polypeptide is encoded by a gene comprising a sequence having, or having at least, 50%, 60%, 70%, 80%, 90%, 95%, 98%, 99%, 100% or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 31-48, 316, 431, 871, 873, 875, 877, 879, 881, 883, 885, 887, and 891.


In some embodiments, the epoxide hydrolase comprises a sequence having, or having at least, 70%, 80%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 38 or 40. In some embodiments, the epoxide hydrolase comprises, or consists of, the sequence set forth in SEQ ID NO: 38 or 40.
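The percent sequence identity thresholds recited above (for example, at least 70%) can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned to equal length with "-" as the gap character, and it counts identity over all aligned columns other than gap-to-gap columns. That counting convention, the function names, and the toy sequences are assumptions chosen for illustration only; they are not taken from the Sequence Listing and may differ from the identity calculation intended elsewhere in this disclosure.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns in which both sequences carry a gap are ignored; a column with a
    gap in only one sequence counts as a mismatch (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue  # gap-to-gap column: not compared
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 70.0) -> bool:
    """True when the pair reaches the given percent identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Hypothetical toy peptides, not sequences from this disclosure.
    reference = "MKTAYIAK"
    candidate = "MKTA-IAK"
    print(percent_identity(reference, candidate))
    print(meets_threshold(reference, candidate, 70.0))
```

In practice the alignment itself would come from an alignment tool, and the resulting value depends on the alignment parameters and on how gaps are counted.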


Some Methods of Producing Squalene for Mogrol Production


Squalene is a natural 30-carbon organic molecule that can be produced in plants and animals and is a biochemical precursor to the family of steroids. Additionally, squalene can be used as a precursor in mogrol synthesis in vivo in a recombinant host cell. Oxidation (via squalene monooxygenase) of one of the terminal double bonds of squalene yields 2,3-squalene oxide, which undergoes enzyme-catalyzed cyclization to afford lanosterol, which is then elaborated into cholesterol and other steroids. As described in Gruchattka et al. (“In Vivo Validation of In Silico Predicted Metabolic Engineering Strategies in Yeast: Disruption of α-Ketoglutarate Dehydrogenase and Expression of ATP-Citrate Lyase for Terpenoid Production.” PLOS ONE Dec. 23, 2015; incorporated by reference in its entirety herein), squalene can be synthesized from precursors supplied by glycolysis. The production of squalene can in turn be increased by overexpression of ATP-citrate lyase. Some embodiments disclosed herein include enzymes for producing squalene and/or boosting the production of squalene in recombinant host cells, for example recombinant yeast cells. ATP citrate lyase can also mediate acetyl CoA synthesis, which can be used for squalene and mevalonate production, as was seen in the yeast S. cerevisiae (Rodrigues et al. “ATP citrate lyase mediated cytosolic acetyl-CoA biosynthesis increases mevalonate production in Saccharomyces cerevisiae” Microb Cell Fact. 2016; 15: 48.; incorporated by reference in its entirety). One example of the gene encoding an enzyme for mediating the acetyl CoA synthesis is set forth in SEQ ID NO: 130. In some embodiments herein, the recombinant cell comprises sequences for mediating acetyl CoA synthesis.


Some embodiments disclosed herein provide methods for producing Compound 1 having the structure of:




embedded image


In some embodiments, the method further comprises producing intermediates in the pathway for the production of Compound 1 in vivo. In some embodiments, the recombinant host cell that produces Compound 1 comprises at least one enzyme capable of converting dioxidosqualene to 24,25 epoxy cucurbitadienol, converting oxidosqualene to cucurbitadienol, catalyzing the hydroxylation of 24,25 epoxy cucurbitadienol to 11-hydroxy-24,25 epoxy cucurbitadienol, catalyzing the hydroxylation of cucurbitadienol to 11-hydroxy-cucurbitadienol, catalyzing the epoxidation of cucurbitadienol to 24,25 epoxy cucurbitadienol, catalyzing the epoxidation of 11-hydroxy-cucurbitadienol to 11-hydroxy-24,25 epoxy cucurbitadienol, catalyzing the conversion of 11-hydroxy-24,25 epoxy cucurbitadienol to mogrol, and/or catalyzing the glycosylation of a mogroside precursor to produce a mogroside compound. In some embodiments, the enzyme for glycosylation is encoded by a sequence set forth in any one of SEQ ID NOs: 121, 122, 123, and 124.


In some embodiments, the enzyme for catalyzing the hydroxylation of 24,25 epoxy cucurbitadienol to form 11-hydroxy-24,25 epoxy cucurbitadienol is CYP5491. In some embodiments, the CYP5491 comprises a sequence set forth in SEQ ID NO: 49. In some embodiments, the squalene epoxidase comprises a sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to the sequence of SEQ ID NO: 54.


In some embodiments, the enzyme capable of epoxidation of 11-hydroxycucurbitadienol comprises an amino acid sequence set forth in SEQ ID NO: 74.


In some alternatives, the recombinant cell comprises genes for expression of enzymes capable of converting dioxidosqualene to 24,25 epoxy cucurbitadienol, converting oxidosqualene to cucurbitadienol, hydroxylation of 24,25 epoxy cucurbitadienol to 11-hydroxy-24,25 epoxy cucurbitadienol, hydroxylation of cucurbitadienol to produce 11-hydroxy-cucurbitadienol, epoxidation of cucurbitadienol to produce 24,25 epoxy cucurbitadienol, and/or epoxidation of 11-hydroxycucurbitadienol to produce 11-hydroxy-24,25 epoxy cucurbitadienol. In these embodiments, the intermediates and mogrosides are produced in vivo.


In some embodiments, a method of producing Compound 1 further comprises producing one or more of mogroside compounds and intermediates, such as oxidosqualene, dioxidosqualene, cucurbitadienol, 24,25 epoxy cucurbitadienol, 11-hydroxy-cucurbitadienol, 11-hydroxy-24,25 epoxy cucurbitadienol, mogrol, and mogroside compounds.


Methods for the Production of Mogroside Compounds


Described herein include methods of producing a mogroside compound, for example, one of the mogroside compounds described in WO2014086842 (incorporated by reference in its entirety herein). The mogroside compound can be used as an intermediate by a cell to further produce Compound 1 disclosed herein.


Recombinant hosts such as microorganisms, plant cells, or plants can be used to express polypeptides useful for the biosynthesis of mogrol (the triterpene core) and various mogrol glycosides (mogrosides).


In some embodiments, the production method can comprise one or more of the following steps in any order:


(1) enhancing levels of oxido-squalene


(2) enhancing levels of dioxido-squalene


(3) Oxido-squalene→cucurbitadienol


(4) Dioxido-squalene→24,25 epoxy cucurbitadienol


(5) Cucurbitadienol→11-hydroxy-cucurbitadienol


(6) 24,25 epoxy cucurbitadienol→11-hydroxy-24,25 epoxy cucurbitadienol


(7) 11-hydroxy-cucurbitadienol→mogrol


(8) 11-hydroxy-24,25 epoxy cucurbitadienol→mogrol


(9) mogrol→various mogroside compounds.
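The conversion steps listed above can be read as a small directed graph of substrates and products. The following Python sketch, offered only as an illustration, encodes steps (3) through (9) as edges and enumerates the routes from each squalene-derived precursor to mogrol. Steps (1) and (2) raise precursor levels rather than convert one compound into another, so they are left out. The function names and the data layout are assumptions made for this example and are not part of the disclosed methods.

```python
from collections import defaultdict

# Steps (3)-(9) from the list above as (substrate, product) pairs.
STEPS = [
    ("oxido-squalene", "cucurbitadienol"),                        # (3)
    ("dioxido-squalene", "24,25 epoxy cucurbitadienol"),          # (4)
    ("cucurbitadienol", "11-hydroxy-cucurbitadienol"),            # (5)
    ("24,25 epoxy cucurbitadienol",
     "11-hydroxy-24,25 epoxy cucurbitadienol"),                   # (6)
    ("11-hydroxy-cucurbitadienol", "mogrol"),                     # (7)
    ("11-hydroxy-24,25 epoxy cucurbitadienol", "mogrol"),         # (8)
    ("mogrol", "mogroside compounds"),                            # (9)
]


def build_graph(steps):
    """Map each substrate to the list of products it can be converted into."""
    graph = defaultdict(list)
    for substrate, product in steps:
        graph[substrate].append(product)
    return graph


def routes(graph, start, goal, path=None):
    """Return every route from start to goal as a list of compound names."""
    path = (path or []) + [start]
    if start == goal:
        return [path]
    found = []
    for nxt in graph.get(start, []):
        if nxt not in path:  # guard against revisiting a compound
            found.extend(routes(graph, nxt, goal, path))
    return found


if __name__ == "__main__":
    g = build_graph(STEPS)
    for precursor in ("oxido-squalene", "dioxido-squalene"):
        for route in routes(g, precursor, "mogrol"):
            print(" -> ".join(route))
```

With only the listed steps encoded, each precursor reaches mogrol by a single three-step route; adding further edges, such as the epoxidation steps described elsewhere herein, would add routes between the two branches.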


In the embodiments herein, the oxido-squalene, dioxido-squalene, cucurbitadienol, 24,25 epoxy cucurbitadienol or mogrol may be also produced by the recombinant cell. The method can include growing the recombinant microorganism in a culture medium under conditions in which one or more of the enzymes catalyzing step(s) of the methods of the invention, e.g. synthases, hydrolases, CYP450s and/or UGTs are expressed. The recombinant microorganism may be grown in a fed batch or continuous process. Typically, the recombinant microorganism is grown in a fermenter at a defined temperature(s) for a desired period of time in order to increase the yield of Compound 1.


In some embodiments, mogroside compounds can be produced using whole cells that are fed raw materials that contain precursor molecules to increase the yield of Compound 1. The raw materials may be fed during cell growth or after cell growth. The whole cells may be in suspension or immobilized. The whole cells may be in fermentation broth or in a reaction buffer.


In some embodiments, the recombinant host cell can comprise heterologous nucleic acid(s) encoding an enzyme or mixture of enzymes capable of catalyzing the conversion of oxido-squalene to cucurbitadienol, cucurbitadienol to 11-hydroxy-cucurbitadienol, 11-hydroxy-cucurbitadienol to mogrol, and/or mogrol to mogroside. In some embodiments, the cell can further comprise heterologous nucleic acid(s) encoding an enzyme or mixture of enzymes capable of catalyzing the conversion of dioxido-squalene to 24,25 epoxy cucurbitadienol, 24,25 epoxy cucurbitadienol to 11-hydroxy-24,25 epoxy cucurbitadienol, 11-hydroxy-24,25 epoxy cucurbitadienol to mogrol, and/or mogrol to mogroside.


The host cell can comprise a recombinant gene encoding a cucurbitadienol synthase and/or a recombinant gene encoding a cytochrome P450 polypeptide.


In some embodiments, the cell comprises a protein having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 70-73, 75-77, 319, 321, 323, 325, 327-333, 420, 422, 424, 426, 446, 902, 904, and 906 (cucurbitadienol synthase).


In some embodiments, the conversion of Oxido-squalene to cucurbitadienol is catalyzed by cucurbitadienol synthase of any one of SEQ ID NOs: 70-73, 75-77, 319, 321, 323, 325, 327-333, 420, 422, 424, 426, 446, 902, 904, and 906, or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 90%, such as at least 95%, for example at least 98% sequence identity therewith. In some embodiments, the cucurbitadienol synthase polypeptide comprises a C-terminal portion comprising the sequence set forth in SEQ ID NO: 73. In some embodiments, the gene encoding the cucurbitadienol synthase polypeptide is codon optimized. In some embodiments, the codon optimized gene comprises the nucleic acid sequence set forth in SEQ ID NO: 74.


In some embodiments, the conversion of cucurbitadienol to 11-hydroxy-cucurbitadienol is catalyzed by CYP5491 of SEQ ID NO: 49 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 90%, such as at least 95%, for example at least 98% sequence identity therewith.


In some embodiments, the conversion of 11-hydroxy-cucurbitadienol to mogrol is catalyzed by a polypeptide selected from the group consisting of Epoxide hydrolase 1 of SEQ ID NO: 29, Epoxide hydrolase 2 of SEQ ID NO: 30 and functional homologues of the aforementioned sharing at least 70%, such as at least 80%, for example at least 90%, such as at least 95%, for example at least 98% sequence identity therewith. In some embodiments, the genes encoding epoxide hydrolase 1 and epoxide hydrolase 2 are codon optimized for expression. In some embodiments, the codon optimized genes for epoxide hydrolase comprise a nucleic acid sequence set forth in SEQ ID NO: 114 or 115.


In some embodiments, the epoxide hydrolase comprises an amino acid sequence as set forth in any one of SEQ ID NOs: 21-28 (Itkin et al, incorporated by reference in its entirety herein).


In some embodiments, the conversion of mogrol to mogroside is catalyzed in the host recombinant cell by one or more UGTs selected from the group consisting of UGT1576 of SEQ ID NO: 15, UGT98 of SEQ ID NO: 9, UGT SK98 of SEQ ID NO: 68 and functional homologues of the aforementioned sharing at least 70%, such as at least 80%, for example at least 90%, such as at least 95%, for example at least 98% sequence identity therewith.


In some embodiments, the host recombinant cell comprises a recombinant gene encoding a cytochrome P450 polypeptide, wherein the cytochrome P450 polypeptide is encoded by any one of the sequences in SEQ ID NOs: 31-48, 316, 431, 871, 873, 875, 877, 879, 881, 883, 885, 887, and 891.


In some embodiments, the host recombinant cell comprises a recombinant gene encoding a squalene epoxidase polypeptide comprising the sequence in SEQ ID NO: 50.


In some embodiments, the host recombinant cell comprises a recombinant gene encoding cucurbitadienol synthase polypeptide of any one of SEQ ID NOs: 70-73, 75-77, 319, 321, 323, 325, 327-333, 420, 422, 424, 426, 446, 902, 904, and 906. In some embodiments, the cucurbitadienol synthase polypeptide comprises a C-terminal portion comprising the sequence set forth in SEQ ID NO: 73. In some embodiments, the gene encoding the cucurbitadienol synthase polypeptide is codon optimized. In some embodiments, the codon optimized gene comprises the nucleic acid sequence set forth in SEQ ID NO: 74.
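The SEQ ID NO lists used throughout this section mix single numbers with hyphenated ranges. A reader cross-checking such a list against the Sequence Listing may find it convenient to expand it into individual identifiers. The Python sketch below is a hypothetical helper written for that purpose; the function name and the parsing rules are assumptions and are not part of this disclosure.

```python
import re


def expand_seq_ids(text):
    """Expand a list such as '70-73, 75-77, 319' into sorted individual integers."""
    ids = set()
    # Each match is a number, optionally followed by a hyphen and a second number.
    for start, end in re.findall(r"(\d+)(?:\s*-\s*(\d+))?", text):
        low = int(start)
        high = int(end) if end else low
        if high < low:
            raise ValueError("descending range: %s-%s" % (start, end))
        ids.update(range(low, high + 1))
    return sorted(ids)


if __name__ == "__main__":
    # The cucurbitadienol synthase polypeptide list recited above.
    synthase_ids = expand_seq_ids(
        "70-73, 75-77, 319, 321, 323, 325, 327-333, 420, 422, 424, 426, "
        "446, 902, 904, and 906"
    )
    print(len(synthase_ids))
    print(74 in synthase_ids)
```

Expanded this way, the polypeptide list skips SEQ ID NO: 74, which the text above identifies as a codon optimized nucleic acid sequence rather than a polypeptide sequence.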


Production of Mogroside Compounds from Mogrol


In some embodiments, the method of producing Compound 1 comprises contacting mogroside IIIE with a first enzyme capable of catalyzing production of Compound 1 from mogroside IIIE. In some embodiments, the method is performed in vivo, wherein a recombinant cell comprises a gene encoding the first enzyme capable of catalyzing production of Compound 1 from mogroside IIIE. In some embodiments, the cell further comprises a gene encoding an enzyme capable of catalyzing production of mogroside IE1 from mogrol. In some embodiments, the enzyme comprises a sequence set forth in any one of SEQ ID NOs: 4-8.


In some embodiments, the cell further comprises enzymes to convert mogroside IIE to mogroside IV, mogroside V, 11-oxo-mogroside V, and siamenoside I. In some embodiments, the enzymes for converting mogroside IIE to mogroside IV, mogroside V, 11-oxo-mogroside V, and siamenoside I are encoded by genes that comprise the nucleic acid sequences set forth in SEQ ID NOs: 9-14 and 116-120. In some embodiments, the method of producing Compound 1 comprises treating Mogroside IIIE with the glucose transferase enzyme UGT76G1.


In some embodiments, the method comprises fractionating lysate from a recombinant cell on an HPLC column and collecting an eluted fraction comprising Compound 1.


In some embodiments, the method comprises contacting mogroside IIIE with a first enzyme capable of catalyzing production of Compound 1 from mogroside IIIE. In some embodiments, contacting mogroside IIIE with the first enzyme comprises contacting mogroside IIIE with a recombinant host cell that comprises a first gene encoding the first enzyme. In some embodiments, the first gene is heterologous to the recombinant host cell. In some embodiments, the mogroside IIIE contacts the first enzyme in a recombinant host cell that comprises a first polynucleotide encoding the first enzyme. In some embodiments, the mogroside IIIE is present in the recombinant host cell. In some embodiments, the mogroside IIIE is produced by the recombinant host cell. In some embodiments, the method comprises cultivating the recombinant host cell in a culture medium under conditions in which the first enzyme is expressed. In some embodiments, the first enzyme is one or more of UDP glycosyltransferases, cyclomaltodextrin glucanotransferases (CGTases), glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. In some embodiments, the first enzyme is a CGTase. In some embodiments, the CGTase comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to the sequence of any one of SEQ ID NOs: 1, 3, 78-101, 148 and 154. In some embodiments, the transglucosidases are encoded by any one of SEQ ID NOs: 163-290 and 723. In some embodiments, the CGTase comprises, or consists of, a sequence set forth in any one of SEQ ID NOs: 1, 3, 78-101, 148, and 154. In some embodiments, the first enzyme is a dextransucrase.
In some embodiments, the dextransucrase comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to the sequence of any one of SEQ ID NOs: 2, 103, 106-110, 156 and 896. In some embodiments, the DexT comprises an amino acid sequence set forth in any one of SEQ ID NOs: 2, 103, 106-110, 156 and 896. In some embodiments, the DexT is encoded by a nucleic acid sequence set forth in SEQ ID NO: 104 or 105. In some embodiments, the dextransucrase comprises an amino acid sequence of SEQ ID NO: 2 or 106-110. In some embodiments, the first enzyme is a transglucosidase. In some embodiments, the transglucosidase comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to the sequence of any one of SEQ ID NOs: 163-291 and 723. In some embodiments, the transglucosidase comprises an amino acid sequence of any one of SEQ ID NOs: 163-291 and 723. In some embodiments, the transglucosidases are encoded by any one of SEQ ID NOs: 163-291 and 723. In some embodiments, the transglucosidases comprise an amino acid sequence set forth in any one of SEQ ID NOs: 163-290 and 723. In some embodiments, the genes encode a CGTase comprising any one of the sequences set forth in SEQ ID NOs: 1, 3, 78-101, and 154.


In some embodiments, the method comprises contacting Mogroside IIA with the recombinant host cell to produce mogroside IIIE, wherein the recombinant host cell further comprises a second gene encoding a second enzyme capable of catalyzing production of Mogroside IIIE from Mogroside IIA. In some embodiments, the mogroside IIA is produced by the recombinant host cell. In some embodiments, the second enzyme is one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. In some embodiments, the second enzyme is a uridine diphosphate-glucosyl transferase (UGT). In some embodiments, the transglucosidases comprise an amino acid sequence set forth in any one of SEQ ID NOs: 163-290 and 723. In some embodiments, the genes encode a CGTase comprising an amino acid sequence set forth in any one of SEQ ID NOs: 1, 3, 78-101, 148, and 154. In some embodiments, the UGT is UGT73C3 (SEQ ID NO: 4), UGT73C6 (SEQ ID NO: 5, 444 or 445), UGT85C2 (SEQ ID NO: 6), UGT73C5 (SEQ ID NO: 7), UGT73E1 (SEQ ID NO: 8), UGT98 (SEQ ID NO: 9 or 407), UGT1576 (SEQ ID NO: 15), UGT SK98 (SEQ ID NO: 16), UGT430 (SEQ ID NO: 17), UGT1697 (SEQ ID NO: 18), or UGT11789 (SEQ ID NO: 19) or any one of SEQ ID NOs: 4, 5, 7-9, 15-19, 125, 126, 128, 129, 293-304, 306, 307, 407, 439, 441, and 444. In some embodiments, the UGT is encoded by a gene selected from UGT1495 (SEQ ID NO: 10), UGT1817 (SEQ ID NO: 11), UGT5914 (SEQ ID NO: 12), UGT8468 (SEQ ID NO: 13) or UGT10391 (SEQ ID NO: 14).


In some embodiments, the method comprises contacting mogrol with the recombinant host cell to produce mogroside IIIE, wherein the recombinant host cell further comprises one or more genes encoding one or more enzymes capable of catalyzing production of mogroside IIIE from mogrol. In some embodiments, the mogrol is produced by the recombinant host cell. In some embodiments, the one or more enzymes comprise one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. In some embodiments, the second enzyme is a uridine diphosphate-glucosyl transferase (UGT). In some embodiments, the UGT is UGT73C3, UGT73C6, UGT85C2, UGT73C5, UGT73E1, UGT98, UGT1495, UGT1817, UGT5914, UGT8468, UGT10391, UGT1576, UGT SK98, UGT430, UGT1697, or UGT11789.


In some embodiments, the method comprises contacting a mogroside compound with the recombinant host cell to produce mogroside IIIE, wherein the recombinant host cell further comprises one or more genes encoding one or more enzymes capable of catalyzing production of mogroside IIIE from the mogroside compound, wherein the mogroside compound is one or more of mogroside IA1, mogroside IE1, mogroside IIA1, mogroside IIE, mogroside IIIA1, mogroside IIIA2, mogroside III, mogroside IV, mogroside IVA, mogroside V, or siamenoside. In some embodiments, the mogroside compound is produced by the recombinant host cell. In some embodiments, the one or more enzymes comprise one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. In some embodiments, the transglucosidases comprise an amino acid sequence set forth in any one of SEQ ID NOs: 163-290 and 723. In some embodiments, the genes encode a CGTase comprising an amino acid sequence set forth in any one of SEQ ID NOs: 1, 3, 78-101, 148, and 154. In some embodiments, the method comprises contacting Mogroside IA1 with the recombinant host cell, wherein the recombinant host cell comprises a gene encoding UGT98 or UGT SK98. In some embodiments, the UGT98 or UGT SK98 enzyme comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 9, 407 or 16. In some embodiments, the contacting results in production of Mogroside IIA in the cell. In some embodiments, the one or more enzymes comprise an amino acid sequence set forth in any one of SEQ ID NOs: 1, 3, 78-101, 106-109, 147, 154, 163-303, 405, 411, 354-405, 447-723, 770, 776, and 782.


In some embodiments, the method further comprises contacting 11-hydroxy-24,25 epoxy cucurbitadienol with the recombinant host cell, wherein the recombinant host cell further comprises a third gene encoding an epoxide hydrolase. In some embodiments, the 11-hydroxy-24,25 epoxy cucurbitadienol is produced by the recombinant host cell. In some embodiments, the method further comprises contacting 11-hydroxy-cucurbitadienol with the recombinant host cell, wherein the recombinant host cell comprises a fourth gene encoding a cytochrome P450 or an epoxide hydrolase. In some embodiments, the P450 polypeptide is encoded in genes comprising the sequence set forth in any one of SEQ ID NOs: 31-48, 316, 431, 871, 873, 875, 877, 879, 881, 883, 885, 887, and 891. In some embodiments, the 11-hydroxy-cucurbitadienol is produced by the recombinant host cell.


In some embodiments, the method further comprises contacting 3, 24, 25 trihydroxy cucurbitadienol with the recombinant host cell, wherein the recombinant host cell further comprises a fifth gene encoding a cytochrome P450. In some embodiments, the P450 polypeptide is encoded in genes comprising the sequence set forth in any one of SEQ ID NOs: 31-48, 316 and 318. In some embodiments, the 3, 24, 25 trihydroxy cucurbitadienol is produced by the recombinant host cell. In some embodiments, the contacting results in production of Mogrol in the recombinant host cell. In some embodiments, the cytochrome P450 comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 20, 308 or 315. In some embodiments, the P450 polypeptide is encoded in genes comprising the sequence set forth in any one of SEQ ID NOs: 31-48, 316, 431, 871, 873, 875, 877, 879, 881, 883, 885, 887, and 891. In some embodiments, the epoxide hydrolase comprises an amino acid sequence having at least 70% sequence identity to any one of SEQ ID NOs: 21-30 and 309-314.


In some embodiments, the method further comprises contacting cucurbitadienol with the recombinant host cell, wherein the recombinant host cell comprises a gene encoding cytochrome P450. In some embodiments, the contacting results in production of 11-hydroxy cucurbitadienol. In some embodiments, the 11-hydroxy cucurbitadienol is expressed in cells comprising a gene encoding CYP87D18 or SgCPR protein. In some embodiments, CYP87D18 or SgCPR comprises a sequence set forth in SEQ ID NO: 315, 872 or 874. In some embodiments, the CYP87D18 or SgCPR is encoded by SEQ ID NO: 316, 871 or 873. In some embodiments, the cucurbitadienol is produced by the recombinant host cell. In some embodiments, the gene encoding cytochrome P450 comprises a nucleic acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 31-48, 316, 431, 871, 873, 875, 877, 879, 881, 883, 885, 887, and 891. In some embodiments, the cytochrome P450 comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NOs: 31-48, 316, 431, 871, 873, 875, 877, 879, 881, 883, 885, 887, and 891. In some embodiments, the P450 polypeptide is encoded in genes comprising the sequence set forth in any one of SEQ ID NOs: 31-48, 316, 431, 871, 873, 875, 877, 879, 881, 883, 885, 887, and 891.


In some embodiments, the method further comprises contacting 2,3-oxidosqualene with the recombinant host cell, wherein the recombinant host cell comprises a seventh gene encoding cucurbitadienol synthase. In some embodiments, the cucurbitadienol synthase comprises an amino acid sequence set forth in SEQ ID NO: 70-73, 75-77, 319, 321, 323, 325, 327-333, 420, 422, 424, 426, 446, 902, 904 or 906. In some embodiments, the cucurbitadienol synthase is encoded by any one sequence set forth in SEQ ID NOs: 74, 320, 322, 324, 326, 328, 418, 421, 423, 425, 427, 897, 899, 901, 903 and 905. In some embodiments, the contacting results in production of cucurbitadienol. In some embodiments, the 2,3-oxidosqualene is produced by the recombinant host cell. In some embodiments, the 2,3-oxidosqualene or diepoxysqualene is produced by an enzyme comprising a sequence set forth in SEQ ID NO: 898 or 900. In some embodiments, the 2,3-oxidosqualene or diepoxysqualene is produced by an enzyme encoded by a nucleic acid sequence set forth in SEQ ID NO: 897 or 899.


In some embodiments, the cucurbitadienol synthase is encoded by a gene comprising a sequence set forth in SEQ ID NO: 74. In some embodiments, the cucurbitadienol synthase is encoded by a gene comprising a nucleic acid sequence set forth in any one of SEQ ID NOs: 74, 320, 322, 324, 326, 328, 418, 421, 423, 425, 427, 897, 899, 901, 903, and 905. In some embodiments, 11-hydroxy cucurbitadienol is produced by the cell. In some embodiments, 11-OH cucurbitadienol is expressed in cells comprising a gene encoding CYP87D18 or SgCPR protein. In some embodiments, CYP87D18 or SgCPR comprises a sequence set forth in SEQ ID NO: 315, 872 or 874. In some embodiments, the CYP87D18 or SgCPR is encoded by SEQ ID NO: 316, 871 or 873. In some embodiments, the cucurbitadienol synthase polypeptide comprises a C-terminal portion comprising the sequence set forth in SEQ ID NO: 73. In some embodiments, the gene encoding the cucurbitadienol synthase polypeptide is codon optimized. In some embodiments, the codon optimized gene comprises the nucleic acid sequence set forth in SEQ ID NO: 74. In some embodiments, the cucurbitadienol synthase comprises an amino acid sequence having at least 70% sequence identity to any one of SEQ ID NOs: 70-73, 75-77, 319, 321, 323, 325, 327-333, 420, 422, 424, 426, 446, 902, 904, and 906 (which include, for example, cucurbitadienol synthases from C. pepo, S. grosvenorii, C. sativus, C. melo, C. moschata, and C. maxima).
In some embodiments, the cucurbitadienol synthase comprises the amino acid sequence of a polypeptide from Lotus japonicus (BAE53431), Populus trichocarpa (XP_002310905), Actaea racemosa (ADC84219), Betula platyphylla (BAB83085), Glycyrrhiza glabra (BAA76902), Vitis vinifera (XP_002264289), Centella asiatica (AAS01524), Panax ginseng (BAA33460), or Betula platyphylla (BAB83086), as described in WO 2016/050890, incorporated by reference in its entirety herein.


In some embodiments, the method comprises contacting squalene with the recombinant host cell, wherein the recombinant host cell comprises an eighth gene encoding a squalene epoxidase. In some embodiments, the contacting results in production of 2, 3-oxidosqualene. In some embodiments, the squalene is produced by the recombinant host cell. In some embodiments, the squalene epoxidase comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NOs: 50-56, 60, 61, 334 or 335.
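
The recurring "at least 70%, 80%, ... 100% sequence identity" language can be illustrated with a small calculation. The following is a minimal sketch only: it assumes the two sequences have already been aligned to equal length with "-" marking gaps, and it defines identity as matching positions divided by compared alignment columns, which is one common convention and not necessarily the one intended here. The function names and the toy peptides are hypothetical and are not taken from the Sequence Listing.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over a pre-computed pairwise alignment.

    Assumption: both strings have equal length and use '-' for gaps.
    Columns where both sequences carry a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment has no comparable columns")
    # A column counts as a match only if both residues are present and equal.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, minimum: float = 70.0) -> bool:
    """True if the pair reaches the stated minimum percent identity."""
    return percent_identity(aligned_a, aligned_b) >= minimum


# Toy 10-residue peptides (hypothetical); they differ at two positions.
reference = "MKTAYIAKQR"
candidate = "MKTAYLAKQK"
print(percent_identity(reference, candidate))
print(meets_threshold(reference, candidate, 70.0))
```

In practice the alignment itself would come from a dedicated alignment tool; this sketch covers only the final percentage and the threshold comparison.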


In some embodiments, the method comprises contacting farnesyl pyrophosphate with the recombinant host cell, wherein the recombinant host cell comprises a ninth gene encoding a squalene synthase. In some embodiments, the contacting results in production of squalene. In some embodiments, the farnesyl pyrophosphate is produced by the recombinant host cell. In some embodiments, the squalene synthase comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 69 or 336.


In some embodiments, the method further comprises contacting geranyl-PP with the recombinant host cell, wherein the recombinant host cell comprises a tenth gene encoding farnesyl-PP synthase. In some embodiments, the contacting results in production of farnesyl-PP. In some embodiments, the geranyl-PP is produced by the recombinant host cell. In some embodiments, the farnesyl-PP synthase comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 338. In some embodiments, one or more of the first, second, third, fourth, fifth, sixth, seventh, eighth, ninth, and tenth gene is operably linked to a heterologous promoter. In some embodiments, the heterologous promoter is a CMV, EF1a, SV40, PGK1, human beta actin, CAG, GAL1, GAL10, TEF, GDS, ADH1, CaMV35S, Ubi, T7, T7lac, Sp6, araBAD, trp, Lac, Ptac, pL promoter, or a combination thereof. In some embodiments, the promoter is an inducible, repressible, or constitutive promoter. In some embodiments, production of one or more of pyruvate, acetyl-CoA, citrate, and TCA cycle intermediates has been upregulated in the recombinant host cell. In some embodiments, cytosolic localization has been upregulated in the recombinant host cell. In some embodiments, one or more of the first, second, third, fourth, fifth, sixth, seventh, eighth, ninth, and tenth gene comprises at least one sequence encoding a 2A self-cleaving peptide. As used herein, the terms the first, the second, the third, the fourth, the fifth, the sixth, the seventh, the eighth, the ninth, the tenth, and the like do not imply a particular order and/or a requirement for presence of the earlier number. For example, the recombinant host cell described herein can comprise the first gene and the third gene, but not the second gene. 
As another example, the recombinant host cell can comprise the first gene, the fifth gene, and the tenth gene, but not the second gene, the third gene, the fourth gene, the sixth gene, the seventh gene, the eighth gene, and the ninth gene.
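
The upstream steps described above (geranyl-PP to farnesyl-PP to squalene to 2,3-oxidosqualene to cucurbitadienol), and the point that a host cell need not carry every gene when an intermediate is supplied, can be sketched as a toy lookup. This is an illustrative model only: it encodes just the substrate/enzyme/product relationships stated in the text and ignores kinetics, cofactors, and any endogenous host enzymes. The names `STEPS` and `reachable` are hypothetical.

```python
# (substrate, enzyme, product) triples as stated in the surrounding text.
STEPS = [
    ("geranyl-PP", "farnesyl-PP synthase", "farnesyl-PP"),
    ("farnesyl-PP", "squalene synthase", "squalene"),
    ("squalene", "squalene epoxidase", "2,3-oxidosqualene"),
    ("2,3-oxidosqualene", "cucurbitadienol synthase", "cucurbitadienol"),
]


def reachable(fed, enzymes):
    """Return every compound obtainable from the fed compounds, given
    the set of enzymes the (hypothetical) host cell encodes."""
    pool = set(fed)
    changed = True
    while changed:
        changed = False
        for substrate, enzyme, product in STEPS:
            if substrate in pool and enzyme in enzymes and product not in pool:
                pool.add(product)
                changed = True
    return pool


# A host carrying only two of the genes still reaches cucurbitadienol
# when squalene is supplied, but not when farnesyl-PP is supplied.
host = {"squalene epoxidase", "cucurbitadienol synthase"}
print(sorted(reachable({"squalene"}, host)))
print(sorted(reachable({"farnesyl-PP"}, host)))
```

The second call returns only the fed compound because the host in this example lacks squalene synthase, which mirrors the statement that later-numbered genes do not require earlier-numbered ones.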


The recombinant host cell can be, for example, a plant, bivalve, fish, fungal, bacterial, or mammalian cell. For example, the plant is selected from Siraitia, Momordica, Gynostemma, Cucurbita, Cucumis, Arabidopsis, Artemisia, Stevia, Panax, Withania, Euphorbia, Medicago, Chlorophytum, Eleutherococcus, Aralia, Morus, Medicago, Betula, Astragalus, Jatropha, Camellia, Hypholoma, Aspergillus, Solanum, Huperzia, Pseudostellaria, Corchorus, Hedera, Marchantia, and Morus. In some embodiments, the fungus is selected from Trichophyton, Sanghuangporus, Taiwanofungus, Moniliophthora, Marssonina, Diplodia, Lentinula, Xanthophyllomyces, Pochonia, Colletotrichum, Diaporthe, Histoplasma, Coccidioides, Histoplasma, Sanghuangporus, Aureobasidium, Pochonia, Penicillium, Sporothrix, Metarhizium, Aspergillus, Yarrowia, and Lipomyces. In some embodiments, the fungus is Aspergillus nidulans, Yarrowia lipolytica, or Rhodosporidium toruloides. In some embodiments, the recombinant host cell is a yeast cell. In some embodiments, the yeast is selected from Candida, Saccharomyces, Saccharomycotina, Taphrinomycotina, Schizosaccharomycetes, Komagataella, Basidiomycota, Agaricomycotina, Tremellomycetes, Pucciniomycotina, Aureobasidium, Coniochaeta, Rhodosporidium, and Microbotryomycetes. In some embodiments, the bacteria is selected from Frankia, Actinobacteria, Streptomyces, and Enterococcus. In some embodiments, the bacteria is Enterococcus faecalis. In some embodiments, one or more of the first, second, third, fourth, fifth, sixth, seventh, eighth, ninth, and tenth genes has been codon optimized for expression in a bacterial, mammalian, plant, fungal or insect cell. In some embodiments, one or more of the first, second, third, fourth, fifth, sixth, seventh, eighth, ninth, and tenth genes comprises a functional mutation to increase activity of the encoded enzyme. 
In some embodiments, cultivating the recombinant host cell comprises monitoring the cultivating for pH, dissolved oxygen level, nitrogen level, or a combination thereof of the cultivating conditions. In some embodiments, the method comprises isolating Compound 1. In some embodiments, isolating Compound 1 comprises lysing the recombinant host cell. In some embodiments, isolating Compound 1 comprises isolating Compound 1 from the culture medium. In some embodiments, the method comprises purifying Compound 1. In some embodiments, purifying Compound 1 comprises HPLC, solid phase extraction or a combination thereof. In some embodiments, the purifying comprises harvesting the recombinant cells, saving the supernatant and lysing the cells. In some embodiments, the lysing comprises subjecting the cells to shear force or detergent washes thereby obtaining a lysate. In some embodiments, the shear force is from a sonication method, a French pressure cell, or beads. In some embodiments, the lysate is subjected to filtering and purification steps. In some embodiments, the lysate is filtered and purified by solid phase extraction.


In some embodiments, a compound having the structure of Compound 1,




embedded image



is provided, wherein the compound is produced by the method of any one of the alternative methods provided herein.


In some embodiments, a cell lysate comprising Compound 1 having the structure:




embedded image



is provided.


In some embodiments, a recombinant cell comprising: Compound 1 having the structure:




embedded image



is provided, and a gene encoding an enzyme capable of catalyzing production of Compound 1 from mogroside IIIE. In some embodiments, the gene is a heterologous gene to the recombinant cell.


In some embodiments, a recombinant cell comprising a first gene encoding a first enzyme capable of catalyzing production of Compound 1 having the structure:




embedded image



from mogroside IIIE is provided. In some embodiments, the first enzyme comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 1, 3, 78-101, 148, or 154 (CGTase). In some embodiments, the first enzyme comprises the amino acid sequence of SEQ ID NOs: 1, 3, 78-101, 148, or 154 (CGTase). In some embodiments, the first enzyme is a dextransucrase. In some embodiments, the dextransucrase comprises, or consists of, an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 2, 103, 106-110, 156, and 896. In some embodiments, the dextransucrase comprises, or consists of, the amino acid sequence of SEQ ID NO: 2, 103, 104, or 105. In some embodiments, the dextransucrase comprises, or consists of, the amino acid sequence of any one of SEQ ID NO: 2, 103-110 and 156-162 and 896. In some embodiments, the DexT comprises a nucleic acid sequence set forth in SEQ ID NO: 104 or 105. In some embodiments, the first enzyme is a transglucosidase. In some embodiments, the transglucosidase comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to the sequence of SEQ ID NO: 201 or SEQ ID NO: 291. In some embodiments, the recombinant cell further comprises a second gene encoding a uridine diphosphate-glucosyl transferase (UGT). In some embodiments, the UGT comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 4, 5, 6, 7, 8, 9, 15, 16, 17, 18, and 19. In some embodiments, UGT comprises, or consists of, the amino acid sequence of any one of SEQ ID NOs: 4, 5, 6, 7, 8, 9, 15, 16, 17, and 18. 
In some embodiments, the UGT is encoded by a sequence set forth in UGT1495 (SEQ ID NO: 10), UGT1817 (SEQ ID NO: 11), UGT5914 (SEQ ID NO: 12), UGT8468 (SEQ ID NO: 13), or UGT10391 (SEQ ID NO: 14). In some embodiments, the cell comprises a third gene encoding UGT98 or UGT SK98. In some embodiments, the UGT98 or UGT SK98 comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 9, 407 or 16. In some embodiments, the cell comprises a fourth gene encoding an epoxide hydrolase. In some embodiments, the epoxide hydrolase comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 21-30 and 309-314. In some embodiments, the cell comprises a fifth sequence encoding P450. In some embodiments, the P450 comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NOs: 20, 49, 308, 315 or 317. In some embodiments, P450 is encoded by a gene comprising a sequence set forth in any one of SEQ ID NOs: 31-48, 316, 431, 871, 873, 875, 877, 879, 881, 883, 885, 887, and 891. In some embodiments, the cell further comprises a sixth sequence encoding cucurbitadienol synthase. In some embodiments, the cucurbitadienol synthase comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 70-73, 75-77, 319, 321, 323, 325, 327-333, 420, 422, 424, 426, 446, 902, 904, and 906. In some embodiments, the cucurbitadienol synthase polypeptide comprises a C-terminal portion comprising the sequence set forth in SEQ ID NO: 73. In some embodiments, the gene encoding the cucurbitadienol synthase polypeptide is codon optimized. 
In some embodiments, the codon optimized gene comprises the nucleic acid sequence set forth in SEQ ID NO: 74. In some embodiments, the cell further comprises a seventh gene encoding a squalene epoxidase. In some embodiments, the squalene epoxidase comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 50-56, 60, 61, 334, and 335. In some embodiments, the cell further comprises an eighth gene encoding a squalene synthase. In some embodiments, the squalene synthase comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 69 or SEQ ID NO: 336. In some embodiments, the cell further comprises a ninth gene encoding a farnesyl-PP synthase. In some embodiments, the farnesyl-PP synthase comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 338. In some embodiments, the cell is a mammalian, bacterial, fungal, or insect cell. In some embodiments, the cell is a yeast cell. Non-limiting examples of the yeast include Candida, Saccharomyces, Saccharomycotina, Taphrinomycotina, Schizosaccharomycetes, Komagataella, Basidiomycota, Agaricomycotina, Tremellomycetes, Pucciniomycotina, Aureobasidium, Coniochaeta, and Microbotryomycetes. In some embodiments, the plant is selected from the group consisting of Siraitia, Momordica, Gynostemma, Cucurbita, Cucumis, Arabidopsis, Artemisia, Stevia, Panax, Withania, Euphorbia, Medicago, Chlorophytum, Eleutherococcus, Aralia, Morus, Medicago, Betula, Astragalus, Jatropha, Camellia, Hypholoma, Aspergillus, Solanum, Huperzia, Pseudostellaria, Corchorus, Hedera, Marchantia, and Morus. 
In some embodiments, the fungus is Trichophyton, Sanghuangporus, Taiwanofungus, Moniliophthora, Marssonina, Diplodia, Lentinula, Xanthophyllomyces, Pochonia, Colletotrichum, Diaporthe, Histoplasma, Coccidioides, Histoplasma, Sanghuangporus, Aureobasidium, Pochonia, Penicillium, Sporothrix, or Metarhizium.


In some embodiments, the cell comprises a sequence of an enzyme set forth in any one of SEQ ID NO: 897, 899, 909, 911, 913, 418, 421, 423, 425, 427, 871, 873, 901, 903 or 905. In some embodiments, the enzyme comprises a sequence set forth in or is encoded by a sequence in SEQ ID NO: 420, 422, 424, 426, 446, 872, 874-896, 898, 900, 902, 904, 906, 908, 910, 912, and 951-1012.


In some embodiments, DNA can be obtained through gene synthesis, for example from GenScript or IDT. DNA can be cloned through standard molecular biology techniques into an overexpression vector such as pQE1, pGEX-4t3, pDest-17, pET series, or pFASTBAC. E. coli host strains (e.g., Top10 or BL21 series, with or without codon plus) can be used to produce enzyme using 1 mM IPTG for induction at an OD600 of 1. E. coli strains can be propagated at 37 °C and 250 rpm, and switched to room temperature or 30 °C (150 rpm) during induction. When indicated, some enzymes can also be expressed in SF9 insect cell lines using pFASTBAC and optimized MO. Crude extract containing enzymes can be generated through sonication and used for the reactions described herein. All UDP-glycosyltransferase reactions contain sucrose synthase, which can be obtained from A. thaliana via gene synthesis and expressed in E. coli.


Hydrolysis of Hyper-Glycosylated Mogrosides to Produce Compound 1


In some embodiments, hyper-glycosylated mogrosides can be hydrolyzed to produce Compound 1. Non-limiting examples of hyper-glycosylated mogrosides include Mogroside V, Siamenoside I, Mogroside IVE, Iso-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, and Mogroside III. Enzymes capable of catalyzing the hydrolysis process to produce Compound 1 can be, for example, CGTases (e.g., displays hydrolysis without starch), cellulases, β-glucosidases, transglucosidases, amylases, pectinases, dextranases, and fungal lactases. The amino acid sequences of some of these enzymes and the nucleic acid sequences encoding some of these enzymes can be found in Table 1.


In some embodiments, Compound 1 displays tolerance to hydrolytic enzymes in the recombinant cell, wherein the hydrolytic enzymes are capable of hydrolyzing Mogroside VI, Mogroside V, and Mogroside IV to Mogroside IIIE. The alpha-linked glycoside present in Compound 1 provides a unique advantage over other Mogrosides (beta-linked glycosides) due to its tolerance to hydrolysis. During microbial production of Compound 1, the recombinant host cells (e.g., microbial host cells) can hydrolyze unwanted beta-linked Mogrosides back to Mogroside IIIE. Without being bound by any particular theory, it is believed that the hydrolysis by the host cells can improve the purity of Compound 1 due to 1) a reduction in the levels of unwanted Mogroside VI, Mogroside V, and Mogroside IV, and/or 2) an increase in the amount of Mogroside IIIE available to be used as a precursor for production of Compound 1.


Purification of Mogroside Compounds


Some embodiments comprise isolating mogroside compounds, for example Compound 1. In some embodiments, isolating Compound 1 comprises lysing the recombinant host cell. In some embodiments, isolating Compound 1 comprises isolating Compound 1 from the culture medium. In some embodiments, the method further comprises purifying Compound 1. In some embodiments, purifying Compound 1 comprises HPLC, solid phase extraction or a combination thereof. In some embodiments, the purifying comprises harvesting the recombinant cells, saving the supernatant and lysing the cells. In some embodiments, the lysing comprises subjecting the cells to shear force or detergent washes thereby obtaining a lysate. In some embodiments, the shear force is from a sonication method, a French pressure cell, or beads. In some embodiments, the lysate is subjected to filtering and purification steps. In some embodiments, the lysate is filtered and purified by solid phase extraction. The lysate can then be filtered and treated with ammonium sulfate to remove proteins, and fractionated on a C18 HPLC column (5×10 cm Atlantis prep T3 OBD column, 5 µm, Waters) by injections using an A/B gradient (A=water, B=acetonitrile) of 10→30% B over 30 minutes, with a 95% B wash, followed by re-equilibration at 1% (total run time=42 minutes). The runs can be collected in tared tubes (12 fractions/plate, 3 plates per run) at 30 mL/fraction. The lysate can also be centrifuged to remove solids and particulate matter. Plates can then be dried in the Genevac HT12/HT24. The desired compound is expected to be eluted in Fraction 21 along with other isomers. The pooled fractions can be further fractionated in 47 runs on a fluoro-phenyl HPLC column (3×10 cm, Xselect fluoro-phenyl OBD column, 5 µm, Waters) using an A/B gradient (A=water, B=acetonitrile) of 15→30% B over 35 minutes, with a 95% B wash, followed by re-equilibration at 15% (total run time=45 minutes). 
Each run can be collected in 12 tared tubes (12 fractions/plate, 1 plate per run) at 30 mL/fraction. Fractions containing the desired peak with the desired purity can be pooled based on UPLC analysis and dried under reduced pressure to give a whitish powdery solid. The pure compound can be re-suspended/dissolved in 10 mL of water and lyophilized to obtain at least a 95% purity.
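
The gradient and fraction-collection figures above lend themselves to a short arithmetic check. The sketch below is illustrative only: it models just the linear ramp portion of each gradient (the timing of the 95% B wash and the re-equilibration is not stated, so it is not modeled), and the helper names `percent_b` and `collected_volume_ml` are hypothetical.

```python
def percent_b(t_min: float, start_b: float, end_b: float, ramp_min: float) -> float:
    """%B at time t_min during a linear ramp from start_b to end_b.

    Valid only within the ramp (0 <= t_min <= ramp_min); wash and
    re-equilibration segments are deliberately left out.
    """
    if not 0 <= t_min <= ramp_min:
        raise ValueError("time is outside the linear ramp")
    return start_b + (end_b - start_b) * t_min / ramp_min


def collected_volume_ml(fractions_per_plate: int, plates_per_run: int,
                        ml_per_fraction: float) -> float:
    """Total volume collected per run across all fraction tubes."""
    return fractions_per_plate * plates_per_run * ml_per_fraction


# C18 step: 10 -> 30% B over 30 min; 12 fractions/plate, 3 plates, 30 mL each.
print(percent_b(15, 10, 30, 30))
print(collected_volume_ml(12, 3, 30))

# Fluoro-phenyl step: 15 -> 30% B over 35 min; 12 fractions/plate, 1 plate.
print(percent_b(35, 15, 30, 35))
print(collected_volume_ml(12, 1, 30))
```

Under these figures, the C18 step collects 36 fractions per run and the fluoro-phenyl step collects 12.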


For purification of Compound 1, in some embodiments, the compound can be purified by solid phase extraction, which may remove the need for HPLC. Compound 1 can be purified, for example, to or to about 70%, 80%, 90%, 95%, 98%, 99%, or 100% purity or any level of purity within a range described by any two aforementioned values. In some embodiments, Compound 1 that is purified by solid phase extraction is, or is substantially, identical to the HPLC purified material. In some embodiments, the method comprises fractionating lysate from a recombinant cell on an HPLC column and collecting an eluted fraction comprising Compound 1.


Fermentation


Host cells can be fermented as described herein for the production of Compound 1. This can also include methods that occur with or without air and can be carried out in an anaerobic environment, for example. The whole cells (e.g., recombinant host cells) may be in fermentation broth or in a reaction buffer.


Monk fruit (Siraitia grosvenorii) extract can also be used to contact the cells in order to produce Compound 1. In some embodiments, a method of producing Compound 1 is provided. The method can comprise contacting monk fruit extract with a first enzyme capable of catalyzing production of Compound 1 from a mogroside such as Mogroside V, Siamenoside I, Mogroside IVE, Iso-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, and Mogroside III. In some embodiments, the contacting comprises contacting the monk fruit extract with a recombinant host cell that comprises a first gene encoding the first enzyme. In some embodiments, the first gene is heterologous to the recombinant host cell. In some embodiments, the monk fruit extract contacts the first enzyme in a recombinant host cell that comprises a first polynucleotide encoding the first enzyme. In some embodiments, mogroside IIIE is in the monk fruit extract. In some embodiments, mogroside IIIE is also produced by the recombinant host cell. In some embodiments, the method further comprises cultivating the recombinant host cell in a culture medium under conditions in which the first enzyme is expressed. In some embodiments, the first enzyme is one or more of UDP glycosyltransferases, cyclomaltodextrin glucanotransferases (CGTases), glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. In some embodiments, the first enzyme is a CGTase. For example, the CGTase can comprise an amino acid sequence having at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, at least 99%, or more sequence identity to the sequence of any one of SEQ ID NOs: 1, 3, 78-101, 148, and 154. 
In some embodiments, the CGTase comprises the amino acid sequence of any one of SEQ ID NOs: 1, 3, 78-101, 148, and 154. In some embodiments, the CGTase comprises the amino acid sequence of any one of SEQ ID NOs: 78-101. In some embodiments, the first enzyme is a dextransucrase. In some embodiments, the dextransucrase comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of the sequences set forth in SEQ ID NOs: 2, 103, 106-110, 156 and 896. In some embodiments, the dextransucrase comprises an amino acid sequence of any one of SEQ ID NOs: 2, 103, 106-110, 156 and 896. In some embodiments, the first enzyme is a transglucosidase. In some embodiments, the transglucosidase comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to the sequence of any one of SEQ ID NOs: 163-290 and 723. In some embodiments, the transglucosidase comprises an amino acid sequence of any one of SEQ ID NOs: 163-290 and 723. In some embodiments, the first enzyme is a beta-glucosidase. In some embodiments, the beta-glucosidase comprises an amino acid sequence set forth in SEQ ID NO: 292, or an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 292. In some embodiments, the monk fruit extract comprises Mogroside IIA and the recombinant host cell comprises a second gene encoding a second enzyme capable of catalyzing production of Mogroside IIIE from Mogroside IIA. In some embodiments, mogroside IIA is also produced by the recombinant host cell. In some embodiments, the second enzyme is one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. 
In some embodiments, the second enzyme is a uridine diphosphate-glucosyl transferase (UGT). In some embodiments, the UGT is UGT73C3 (SEQ ID NO: 4), UGT73C6 (SEQ ID NO: 5, 444, or 445), 85C2 (SEQ ID NO: 6), UGT73C5 (SEQ ID NO: 7), UGT73E1 (SEQ ID NO: 8), UGT98 (SEQ ID NO: 9 or 407), UGT1576 (SEQ ID NO: 15), UGT SK98 (SEQ ID NO: 16), UGT430 (SEQ ID NO: 17), UGT1697 (SEQ ID NO: 18), UGT11789 (SEQ ID NO: 19), or comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to UGT73C3 (SEQ ID NO: 4), UGT73C6 (SEQ ID NO: 5, 444 or 445), 85C2 (SEQ ID NO: 6), UGT73C5 (SEQ ID NO: 7), UGT73E1 (SEQ ID NO: 8), UGT98 (SEQ ID NO: 9 or 407), UGT1576 (SEQ ID NO: 15), UGT SK98 (SEQ ID NO: 16), UGT430 (SEQ ID NO: 17), UGT1697 (SEQ ID NO: 18), or UGT11789 (SEQ ID NO: 19). In some embodiments, the UGT is encoded by a gene set forth in UGT1495 (SEQ ID NO: 10), UGT1817 (SEQ ID NO: 11), UGT5914 (SEQ ID NO: 12), UGT8468 (SEQ ID NO: 13) or UGT10391 (SEQ ID NO: 14). In some embodiments, the monk fruit extract comprises mogrol. In some embodiments, the method further comprises contacting the mogrol of the monk fruit extract with the recombinant host cell, wherein the recombinant host cell further comprises one or more genes encoding one or more enzymes capable of catalyzing production of Mogroside IIIE from mogrol. In some embodiments, mogrol is also produced by the recombinant host cell. In some embodiments, the one or more enzymes comprises one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. In some embodiments, the second enzyme is a uridine diphosphate-glucosyl transferase (UGT). 
In some embodiments, the UGT is UGT73C3, UGT73C6, 85C2, UGT73C5, UGT73E1, UGT98, UGT1495, UGT1817, UGT5914, UGT8468, UGT10391, UGT1576, UGT SK98, UGT430, UGT1697, or UGT11789, or comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to those UGTs. In some embodiments, the method further comprises contacting the monk fruit extract with the recombinant host cell to produce mogroside IIIE, wherein the recombinant host cell further comprises one or more genes encoding one or more enzymes capable of catalyzing production of Mogroside IIIE from the mogroside compound, wherein the mogroside compound is one or more of mogroside IA1, mogroside IE1, mogroside IIA1, mogroside IIE, mogroside IIA, mogroside IIIA1, mogroside IIIA2, mogroside III, mogroside IV, mogroside IVA, mogroside V, or siamenoside. In some embodiments, a mogroside compound is also produced by the recombinant host cell. In some embodiments, the one or more enzymes comprises one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. In some embodiments, the mogroside compound is Mogroside IIE. In some embodiments, the one or more enzymes comprises an amino acid sequence set forth by any one of SEQ ID NOs: 293-303. In some embodiments, the mogroside compound is Mogroside IIA or Mogroside IIE, and contacting the monk fruit extract with the recombinant cell expressing the one or more enzymes produces Mogroside IIIA, Mogroside IVE and Mogroside V. In some embodiments, the one or more enzymes comprise an amino acid sequence set forth in SEQ ID NO: 304. In some embodiments, the one or more enzymes is encoded by a sequence set forth in SEQ ID NO: 305. In some embodiments, the monk fruit extract comprises Mogroside IA1. 
In some embodiments, the method further comprises contacting the monk fruit extract with the recombinant host cell, wherein the recombinant host cell comprises a gene encoding UGT98 or UGT SK98. In some embodiments, the UGT98 or UGT SK98 enzyme comprises an amino acid sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 9, 407, 16 or 306. In some embodiments, the UGT98 is encoded by a sequence set forth in SEQ ID NO: 307. In some embodiments, the contacting results in production of Mogroside IIA in the cell. In some embodiments, the monk fruit extract comprises 11-hydroxy-24,25 epoxy cucurbitadienol. In some embodiments, the method further comprises contacting monk fruit extract with the recombinant host cell, wherein the recombinant host cell further comprises a third gene encoding an epoxide hydrolase. In some embodiments, the 11-hydroxy-24,25 epoxy cucurbitadienol is also produced by the recombinant host cell. In some embodiments, the method further comprises contacting monk fruit extract with the recombinant host cell, wherein the recombinant host cell comprises a fourth gene encoding a cytochrome P450 or an epoxide hydrolase. In some embodiments, the 11-hydroxy-cucurbitadienol is also produced by the recombinant host cell. In some embodiments, the monk fruit extract comprises 3, 24, 25 trihydroxy cucurbitadienol. In some embodiments, the method further comprises contacting monk fruit extract with the recombinant host cell, wherein the recombinant host cell further comprises a fifth gene encoding a cytochrome P450. In some embodiments, the 3, 24, 25 trihydroxy cucurbitadienol is also produced by the recombinant host cell. In some embodiments, the contacting with monk fruit extract results in production of Mogrol in the recombinant host cell. 
In some embodiments, the cytochrome P450 comprises an amino acid sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 20 or 308. In some embodiments, the epoxide hydrolase comprises an amino acid sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 21-30 and 309-314. In some embodiments, the monk fruit extract comprises cucurbitadienol. In some embodiments, the method further comprises contacting cucurbitadienol with the recombinant host cell, wherein the recombinant host cell comprises a gene encoding cytochrome P450. In some embodiments, the contacting results in production of 11-hydroxy cucurbitadienol. In some embodiments, the 11-hydroxy cucurbitadienol is expressed in cells comprising a gene encoding CYP87D18 or SgCPR protein. In some embodiments, CYP87D18 or SgCPR comprises a sequence set forth in SEQ ID NO: 315, 872 or 874. In some embodiments, the CYP87D18 or SgCPR is encoded by SEQ ID NO: 316, 871 or 873. In some embodiments, the cucurbitadienol is also produced by the recombinant host cell. In some embodiments, the gene encoding cytochrome P450 comprises a nucleic acid sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 31-48, 316, 431, 871, 873, 875, 877, 879, 881, 883, 885, 887, and 891. In some embodiments, the cytochrome P450 comprises an amino acid sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 20 or 49. In some embodiments, the monk fruit extract comprises 2,3-oxidosqualene. 
In some embodiments, the method further comprises contacting 2,3-oxidosqualene of the monk fruit extract with the recombinant host cell, wherein the recombinant host cell comprises a seventh gene encoding cucurbitadienol synthase. In some embodiments, the cucurbitadienol synthase comprises an amino acid sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 70-73, 75-77, 319, 321, 323, 325, 327-333, 420, 422, 424, 426, 446, 902, 904 or 906. In some embodiments, the cucurbitadienol synthase is encoded by a sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 74, 320, 322, 324, 326, 328, 418, 421, 423, 425, 427, 897, 899, 901, 903, or 905. In some embodiments, the monk fruit extract comprises mogroside intermediates such as Mogroside V, Siamenoside I, Mogroside IVE, ISO-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, and Mogroside III. In some embodiments, the method further comprises contacting a mogroside intermediate with the recombinant host cell, wherein the recombinant host cell comprises a seventh gene encoding cucurbitadienol synthase. In some embodiments, the cucurbitadienol synthase comprises an amino acid sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 70-73, 75-77, 319, 321, 323, 325, 327-333, 420, 422, 424, 426, 446, 902, 904, or 906. 
In some embodiments, the cucurbitadienol synthase is encoded by a sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 74, 320, 322, 324, 326, 328, 418, 421, 423, 425, 427, 897, 899, 901, 903, or 905. In some embodiments, the contacting results in production of cucurbitadienol. In some embodiments, the 2,3-oxidosqualene and diepoxysqualene are also produced by the recombinant host cell. In some embodiments, the 2,3-oxidosqualene or diepoxysqualene is produced by an enzyme comprising a sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 898 or 900, or comprising a sequence set forth in SEQ ID NO: 898 or 900. In some embodiments, the 2,3-oxidosqualene or diepoxysqualene is produced by an enzyme encoded by a nucleic acid sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 897 or 899; or encoded by a nucleic acid set forth in SEQ ID NO: 897 or 899.
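
The sequence identity thresholds recited above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to equal length (with "-" marking gaps) and that percent identity is counted as identical positions divided by aligned columns; this convention, the function names, and the short toy sequences are illustrative assumptions and are not taken from the Sequence Listing.

```python
# Minimal sketch: percent identity between two PRE-ALIGNED sequences.
# Assumption: identity = identical columns / aligned columns (columns that
# are gaps in both sequences are ignored). Other conventions exist.

THRESHOLDS = (70, 80, 85, 90, 95, 98, 99, 100)  # percentages recited in the text


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Return percent identity over the aligned columns of two sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Drop columns where both sequences carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("no aligned columns to compare")
    # A column counts as a match only if both residues are equal and not a gap.
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def highest_threshold_met(identity: float):
    """Return the largest recited threshold satisfied, or None if below 70%."""
    met = [t for t in THRESHOLDS if identity >= t]
    return max(met) if met else None


if __name__ == "__main__":
    # Toy (hypothetical) four-residue sequences differing at one position.
    identity = percent_identity("MKTA", "MKTG")
    print(identity, highest_threshold_met(identity))
```

In practice the alignment itself would come from a dedicated alignment tool, and the denominator used for identity should follow whatever definition the specification adopts.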


In some embodiments, the cucurbitadienol synthase comprises an amino acid sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 70-73, 75-77, 319, 321, 323, 325, 327-333, 420, 422, 424, 426, 446, 902, 904, and 906. In some embodiments, the cucurbitadienol synthase is a cucurbitadienol synthase from C. pepo, S. grosvenorii, C. sativus, C. melo, C. moschata, or C. maxima. In some embodiments, the cucurbitadienol synthase is encoded by a gene comprising a nucleic acid sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 74, 320, 322, 324, 326, 328, 418, 421, 423, 425, 427, 897, 899, 901, 903, and 905, or comprising a nucleic acid sequence set forth in any one of SEQ ID NOs: 74, 320, 322, 324, 326, 328, 418, 421, 423, 425, 427, 897, 899, 901, 903 and 905. In some embodiments, 11-OH cucurbitadienol is produced by the cell. In some embodiments, 11-OH cucurbitadienol is expressed in cells comprising a gene encoding CYP87D18 or SgCPR. In some embodiments, CYP87D18 or SgCPR comprises a sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 315, 872, or 874, or a sequence set forth in SEQ ID NO: 315, 872 or 874. In some embodiments, the CYP87D18 or SgCPR is encoded by a sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 316, 871 or 873, or a sequence set forth in SEQ ID NO: 316, 871 or 873. In some embodiments, the monk fruit extract comprises squalene. 
In some embodiments, the 2,3-oxidosqualene or diepoxysqualene is produced by an enzyme comprising a sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, or more sequence identity to SEQ ID NO: 898 or 900, or a sequence set forth in SEQ ID NO: 898 or 900. In some embodiments, the 2,3-oxidosqualene or diepoxysqualene is produced by an enzyme encoded by a nucleic acid sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 897 or 899, or a sequence set forth in SEQ ID NO: 897 or 899. In some embodiments, the method further comprises contacting squalene with the recombinant host cell, wherein the recombinant host cell comprises an eighth gene encoding a squalene epoxidase. In some embodiments, the contacting results in production of 2,3-oxidosqualene. In some embodiments, the squalene is also produced by the recombinant host cell. In some embodiments, the squalene epoxidase comprises an amino acid sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 50-56, 60, 61, 334 or 335. In some embodiments, the squalene epoxidase is encoded by a nucleic acid sequence set forth in SEQ ID NO: 335. In some embodiments, the monk fruit extract comprises farnesyl pyrophosphate. In some embodiments, the method further comprises contacting farnesyl pyrophosphate with the recombinant host cell, wherein the recombinant host cell comprises a ninth gene encoding a squalene synthase. In some embodiments, the contacting results in production of squalene. In some embodiments, the farnesyl pyrophosphate is also produced by the recombinant host cell. In some embodiments, the squalene synthase comprises an amino acid sequence having at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, at least 99%, or more sequence identity to any one of SEQ ID NOs: 69 and 336. 
In some embodiments, the squalene synthase is encoded by a sequence comprising the nucleic acid sequence set forth in SEQ ID NO: 337. In some embodiments, the monk fruit extract comprises geranyl-PP. In some embodiments, the method further comprises contacting geranyl-PP with the recombinant host cell, wherein the recombinant host cell comprises a tenth gene encoding farnesyl-PP synthase. In some embodiments, the contacting results in production of farnesyl-PP. In some embodiments, the geranyl-PP is also produced by the recombinant host cell. In some embodiments, the farnesyl-PP synthase comprises an amino acid sequence having at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, at least 99%, or more sequence identity to SEQ ID NO: 338. In some embodiments, the farnesyl-PP synthase is encoded by a nucleic acid sequence set forth in SEQ ID NO: 339. In some embodiments, one or more of the first, second, third, fourth, fifth, sixth, seventh, eighth, ninth, and tenth gene is operably linked to a heterologous promoter. In some embodiments, the heterologous promoter is a CMV, EF1a, SV40, PGK1, human beta actin, CAG, GAL1, GAL10, TEF, GDS, ADH1, CaMV35S, Ubi, T7, T7lac, Sp6, araBAD, trp, lac, Ptac, pL promoter, or a combination thereof. In some embodiments, the promoter is an inducible, repressible, or constitutive promoter. In some embodiments, production of one or more of pyruvate, acetyl-CoA, citrate, and TCA cycle intermediates has been upregulated in the recombinant host cell. In some embodiments, cytosolic localization has been upregulated in the recombinant host cell. In some embodiments, one or more of the first, second, third, fourth, fifth, sixth, seventh, eighth, ninth, and tenth gene comprises at least one sequence encoding a 2A self-cleaving peptide. In some embodiments, the recombinant host cell is a plant, bivalve, fish, fungus, bacteria or mammalian cell. 
In some embodiments, the plant is selected from Siraitia, Momordica, Gynostemma, Cucurbita, Cucumis, Arabidopsis, Artemisia, Stevia, Panax, Withania, Euphorbia, Medicago, Chlorophytum, Eleutherococcus, Aralia, Morus, Medicago, Betula, Astragalus, Jatropha, Camellia, Hypholoma, Aspergillus, Solanum, Huperzia, Pseudostellaria, Corchorus, Hedera, Marchantia, and Morus. In some embodiments, the fungus is selected from Trichophyton, Sanghuangporus, Taiwanofungus, Moniliophthora, Marssonina, Diplodia, Lentinula, Xanthophyllomyces, Pochonia, Colletotrichum, Diaporthe, Histoplasma, Coccidioides, Histoplasma, Sanghuangporus, Aureobasidium, Pochonia, Penicillium, Sporothrix, Metarhizium, Aspergillus, Yarrowia, and Lipomyces. In some embodiments, the fungus is Aspergillus nidulans, Yarrowia lipolytica, or Rhodosporidium toruloides. In some embodiments, the recombinant host cell is a yeast cell. In some embodiments, the yeast is selected from Candida, Saccharomyces, Saccharomycotina, Taphrinomycotina, Schizosaccharomycetes, Komagataella, Basidiomycota, Agaricomycotina, Tremellomycetes, Pucciniomycotina, Aureobasidium, Coniochaeta, Rhodosporidium, and Microbotryomycetes. In some embodiments, the bacteria is selected from the group consisting of Frankia, Actinobacteria, Streptomyces, and Enterococcus. In some embodiments, the bacteria is Enterococcus faecalis. In some embodiments, one or more of the first, second, third, fourth, fifth, sixth, seventh, eighth, ninth, and tenth gene has been codon optimized for expression in a bacterial, mammalian, plant, fungal or insect cell. In some embodiments, one or more of the first, second, third, fourth, fifth, sixth, seventh, eighth, ninth, and tenth genes comprises a functional mutation to increase activity of the encoded enzyme. In some embodiments, cultivating the recombinant host cell comprises monitoring the cultivating conditions for pH, dissolved oxygen level, nitrogen level, or a combination thereof. 
In some embodiments, the method comprises isolating Compound 1. In some embodiments, isolating Compound 1 comprises lysing the recombinant host cell. In some embodiments, isolating Compound 1 comprises isolating Compound 1 from the culture medium. In some embodiments, the method further comprises purifying Compound 1. In some embodiments, purifying Compound 1 comprises HPLC, solid phase extraction or a combination thereof. In some embodiments, the purifying further comprises harvesting the recombinant cells, saving the supernatant and lysing the cells. In some embodiments, the lysing comprises subjecting the cells to shear force or detergent washes, thereby obtaining a lysate. In some embodiments, the shear force is from a sonication method, a French pressure cell, or beads. In some embodiments, the lysate is subjected to filtering and purification steps. In some embodiments, the lysate is filtered and purified by solid phase extraction. In some embodiments, the method further comprises second or third additions of monk fruit extract to the growth media of the recombinant host cells. Additionally, the method can be performed by contacting the monk fruit extract with the recombinant cell lysate, wherein the recombinant cell lysate comprises the expressed enzymes listed herein.
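
The precursor conversions recited in the preceding embodiments (geranyl-PP through 11-hydroxy cucurbitadienol) can be summarized as an ordered chain of substrate, enzyme, and product. The sketch below is only an illustration of that ordering: the data structure, the function name, and the simplification that each step needs exactly one enzyme class are assumptions made here, and later steps toward mogrol and the mogrosides are deliberately left out.

```python
# Illustrative sketch of the precursor steps named in the text.
# Each entry: (substrate, enzyme class, product). The linear, one-enzyme-per-step
# layout is a simplifying assumption for illustration.
PATHWAY = [
    ("geranyl-PP", "farnesyl-PP synthase", "farnesyl-PP"),
    ("farnesyl-PP", "squalene synthase", "squalene"),
    ("squalene", "squalene epoxidase", "2,3-oxidosqualene"),
    ("2,3-oxidosqualene", "cucurbitadienol synthase", "cucurbitadienol"),
    ("cucurbitadienol", "cytochrome P450", "11-hydroxy cucurbitadienol"),
]


def reachable_products(start: str, enzymes: set) -> list:
    """List products formed in order from `start`, stopping at the first
    step whose enzyme class is not expressed by the (hypothetical) host cell."""
    products = []
    current = start
    for substrate, enzyme, product in PATHWAY:
        if substrate != current:
            continue  # step lies upstream of the current compound
        if enzyme not in enzymes:
            break  # the chain stops where an enzyme is missing
        products.append(product)
        current = product
    return products


if __name__ == "__main__":
    host = {"squalene epoxidase", "cucurbitadienol synthase"}
    print(reachable_products("squalene", host))
```

Such a table makes it easy to see which starting material a given set of expressed genes can carry forward, which is the question each "contacting ... results in production of ..." embodiment answers.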


In general, compounds as disclosed and described herein, individually or in combination, can be provided in a composition, such as, e.g., an ingestible composition. In one embodiment, compounds as disclosed and described herein, individually or in combination, can provide a sweet flavor to an ingestible composition. In other embodiments, the compounds disclosed and described herein, individually or in combination, can act as a sweet flavor enhancer to enhance the sweetness of another sweetener. In other embodiments, the compounds disclosed herein impart a more sugar-like temporal profile and/or flavor profile to a sweetener composition by combining one or more of the compounds as disclosed and described herein with one or more other sweeteners in the sweetener composition. In another embodiment, compounds as disclosed and described herein, individually or in combination, can increase or enhance the sweet taste of a composition by contacting the composition thereof with the compounds as disclosed and described herein to form a modified composition. In another embodiment, compounds as disclosed and described herein, individually or in combination, can be in a composition that modulates the sweet receptors and/or their ligands expressed in the body other than in the taste buds.


As used herein, an “ingestible composition” includes any composition that, either alone or together with another substance, is suitable to be taken by mouth whether intended for consumption or not. The ingestible composition includes both “food or beverage products” and “non-edible products”. By “Food or beverage products”, it is meant any edible product intended for consumption by humans or animals, including solids, semi-solids, or liquids (e.g., beverages) and includes functional food products (e.g., any fresh or processed food claimed to have health-promoting and/or disease-preventing properties beyond the basic nutritional function of supplying nutrients). The term “non-food or beverage products” or “noncomestible composition” includes any product or composition that can be taken into the mouth by humans or animals for purposes other than consumption or as food or beverage. For example, the non-food or beverage product or noncomestible composition includes supplements, nutraceuticals, pharmaceutical and over the counter medications, oral care products such as dentifrices and mouthwashes, and chewing gum.


Compositions Comprising Mogroside Compounds


Also disclosed herein are compositions, e.g., ingestible compositions, comprising one or more of the mogroside compounds disclosed herein, including but not limited to Compound 1 and the compounds shown in FIG. 43. In some embodiments, an ingestible composition can be a beverage. For example, the beverage can be selected from the group consisting of enhanced sparkling beverages, colas, lemon-lime flavored sparkling beverages, orange flavored sparkling beverages, grape flavored sparkling beverages, strawberry flavored sparkling beverages, pineapple flavored sparkling beverages, ginger-ales, root beers, fruit juices, fruit-flavored juices, juice drinks, nectars, vegetable juices, vegetable-flavored juices, sports drinks, energy drinks, enhanced water drinks, enhanced water with vitamins, near water drinks, coconut waters, tea type drinks, coffees, cocoa drinks, beverages containing milk components, beverages containing cereal extracts and smoothies. In some embodiments, the beverage can be a soft drink.


An “ingestibly acceptable ingredient” is a substance that is suitable to be taken by mouth and can be combined with a compound described herein to form an ingestible composition. The ingestibly acceptable ingredient may be in any form depending on the intended use of a product, e.g., solid, semi-solid, liquid, paste, gel, lotion, cream, foamy material, suspension, solution, or any combinations thereof (such as a liquid containing solid contents). The ingestibly acceptable ingredient may be artificial or natural. Ingestibly acceptable ingredients include many common food ingredients, such as water at neutral, acidic, or basic pH, fruit or vegetable juices, vinegar, marinades, beer, wine, natural water/fat emulsions such as milk or condensed milk, edible oils and shortenings, fatty acids and their alkyl esters, low molecular weight oligomers of propylene glycol, glyceryl esters of fatty acids, and dispersions or emulsions of such hydrophobic substances in aqueous media, salts such as sodium chloride, wheat flours, solvents such as ethanol, solid edible diluents such as vegetable powders or flours, or other liquid vehicles; dispersion or suspension aids; surface active agents; isotonic agents; thickening or emulsifying agents, preservatives; solid binders; lubricants and the like.


Additional ingestibly acceptable ingredients include acids, including but not limited to, citric acid, phosphoric acid, ascorbic acid, sodium acid sulfate, lactic acid, or tartaric acid; bitter ingredients, including, for example caffeine, quinine, green tea, catechins, polyphenols, green robusta coffee extract, green coffee extract, whey protein isolate, or potassium chloride; coloring agents, including, for example caramel color, Red #40, Yellow #5, Yellow #6, Blue #1, Red #3, purple carrot, black carrot juice, purple sweet potato, vegetable juice, fruit juice, beta carotene, turmeric curcumin, or titanium dioxide; preservatives, including, for example sodium benzoate, potassium benzoate, potassium sorbate, sodium metabisulfite, sorbic acid, or benzoic acid; antioxidants including, for example ascorbic acid, calcium disodium EDTA, alpha tocopherols, mixed tocopherols, rosemary extract, grape seed extract, resveratrol, or sodium hexametaphosphate; vitamins or functional ingredients including, for example resveratrol, Co-Q10, omega 3 fatty acids, theanine, choline chloride (citocoline), fibersol, inulin (chicory root), taurine, panax ginseng extract, guarana extract, ginger extract, L-phenylalanine, L-carnitine, L-tartrate, D-glucuronolactone, inositol, bioflavonoids, Echinacea, ginkgo biloba, yerba mate, flax seed oil, garcinia cambogia rind extract, white tea extract, ribose, milk thistle extract, grape seed extract, pyridoxine HCl (vitamin B6), cyanocobalamin (vitamin B12), niacinamide (vitamin B3), biotin, calcium lactate, calcium pantothenate (pantothenic acid), calcium phosphate, calcium carbonate, chromium chloride, chromium polynicotinate, cupric sulfate, folic acid, ferric pyrophosphate, iron, magnesium lactate, magnesium carbonate, magnesium sulfate, monopotassium phosphate, monosodium phosphate, phosphorus, potassium iodide, potassium phosphate, riboflavin, sodium sulfate, sodium gluconate, sodium polyphosphate, sodium bicarbonate, thiamine mononitrate, vitamin D3, vitamin A palmitate, zinc gluconate, zinc lactate, or zinc sulphate; clouding agents, including, for example ester gum, brominated vegetable oil (BVO), or sucrose acetate isobutyrate (SAIB); buffers, including, for example sodium citrate, potassium citrate, or salt; flavors, including, for example propylene glycol, ethyl alcohol, glycerine, gum Arabic (gum acacia), maltodextrin, modified corn starch, dextrose, natural flavor, natural flavor with other natural flavors (natural flavor WONF), natural and artificial flavors, artificial flavor, silicon dioxide, magnesium carbonate, or tricalcium phosphate; and stabilizers, including, for example pectin, xanthan gum, carboxymethylcellulose (CMC), polysorbate 60, polysorbate 80, medium chain triglycerides, cellulose gel, cellulose gum, sodium caseinate, modified food starch, gum Arabic (gum acacia), or carrageenan.


EXAMPLES

Some aspects of the embodiments discussed above are disclosed in further detail in the following examples, which are not in any way intended to limit the scope of the present disclosure.


Example 1: Production of Siamenoside I





As disclosed herein, siamenoside I can be an intermediate mogroside compound for the production of Compound 1 disclosed herein. For example, siamenoside I may be hydrolyzed to produce mogroside IIIE, which can then be used to produce Compound 1. For example, a method for producing siamenoside I can comprise: contacting mogrol with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. For example, a recombinant cell expressing pectinase from Aspergillus aculeatus can be used.


As another example, the method for producing siamenoside I can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IVE, Iso-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. For example, a pectinase from Aspergillus aculeatus can be used.


Example 2: Production of Mogroside IVE





As disclosed herein, Mogroside IVE can be an intermediate mogroside compound for the production of Compound 1 disclosed herein. For example, Mogroside IVE produced from mogroside V can then be used to produce Compound 1. For example, a method for producing Mogroside IVE can comprise: contacting mogroside V with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. As another example, the recombinant cell can comprise a gene encoding pectinase. The pectinase can be encoded by a gene from Aspergillus aculeatus.


As another example, the method for producing Mogroside IVE can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IVE, Iso-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. For example, a pectinase from Aspergillus aculeatus can be used.


Example 3: Production of Mogroside IIIE





As disclosed herein, Mogroside IIIE can be an intermediate mogroside compound for the production of Compound 1 disclosed herein. For example, Mogroside IIA may be glycosylated to produce mogroside IIIE which can then be used to produce Compound 1.


As another example, the method for producing Mogroside IIIE can comprise: contacting one or more of Mogroside V, Mogroside IIA, Siamenoside I, Mogroside IVE, ISO-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. For example, a pectinase from Aspergillus aculeatus can be encoded by a gene within the recombinant host cell.


Example 4: Production of Mogroside IVA





As disclosed herein, Mogroside IVA can be an intermediate mogroside compound for the production of Compound 1 disclosed herein. For example, Mogroside IVA produced from mogroside V can then be used to produce Compound 1.


For example, a method for producing Mogroside IVA can comprise: contacting Mogroside V with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can also be a β-galactosidase from Aspergillus oryzae, for example.


As another example, the method for producing Mogroside IVA can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IVE, Iso-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. For example, a β-galactosidase from Aspergillus oryzae can be used in the method.


Example 5: Production of Mogroside IIA





As disclosed herein, Mogroside IIA can be an intermediate mogroside compound for the production of Compound 1 disclosed herein. For example, a method for producing Mogroside IIA can comprise: contacting Mogroside IA1 with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases.


As another example, the method for producing Mogroside IIA can comprise: contacting one or more of Mogroside IA1, Mogroside V, Siamenoside I, Mogroside IVE, ISO-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. For example, Celluclast can also be used.


Example 6: Production of Mogroside IIIA1 from Aromase





As disclosed herein, Mogroside IIIA1 can be an intermediate mogroside compound for the production of Compound 1 disclosed herein. For example, Mogroside IIIA1 can be an intermediate to produce mogroside IVA, which can then be used as an intermediate to produce Compound 1. For example, a method for producing Mogroside IIIA1 can comprise contacting Siamenoside I with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can also be Aromase, for example. As another example, the method for producing Mogroside IIIA1 can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IVE, Iso-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases.


Example 7: Production of Compound 3





As disclosed herein, Compound 3 can be an intermediate mogroside compound that is produced with Compound 1 disclosed herein. For example, a method for producing Compound 3 can comprise: contacting mogrol with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be Cyclomaltodextrin glucanotransferase from Bacillus licheniformis and/or Toruzyme.


As another example, the method for producing Compound 3 can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IVE, Iso-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. For example, a CGTase enzyme can be used.


Example 8: Production of Compound 4



embedded image


As disclosed herein, Compound 4 can be produced during the production of Compound 1 disclosed herein. For example, a method for producing Compound 4 can also lead to the production of Compound 1; the method can comprise contacting Mogroside IIIE with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases.


As another example, the method for producing Compound 4 can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IVE, Iso-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be cyclomaltodextrin glucanotransferase from Bacillus licheniformis and/or Toruzyme, for example.


Example 9: Production of Compound 5



embedded image


Compound 5 can be an intermediate mogroside compound produced during the production of Compound 1 disclosed herein. For example, a method for producing Compound 5 can also lead to the production of Compound 1; the method can comprise contacting Mogroside IIIE with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases.


As another example, the method for producing Compound 5 can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IVE, Iso-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. For example, a CGTase from Bacillus licheniformis or Toruzyme can be used.


Example 10: Production of Compound 6



embedded image


As disclosed herein, Compound 6 can be an intermediate mogroside compound produced during the production of Compound 1 disclosed herein. For example, a method for producing Compound 6 can also lead to the production of Compound 1; the method can comprise contacting Mogroside IIIE with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases.


As another example, the method for producing Compound 6 can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IVE, Iso-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. For example, a CGTase from Bacillus licheniformis or Toruzyme can be used.


Example 11: Production of Compound 7



embedded image


As disclosed herein, Compound 7 can be an intermediate mogroside compound produced during the production of Compound 1 disclosed herein. For example, a method for producing Compound 7 can also lead to the production of Compound 1; the method can comprise contacting Mogroside IIIE with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases.


As another example, the method for producing Compound 7 can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IVE, Iso-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. For example, a CGTase from Bacillus licheniformis or Toruzyme can be used.


Example 12: Production of Compound 8



embedded image


As disclosed herein, Compound 8 can be an intermediate mogroside compound produced during the production of Compound 1 disclosed herein. For example, a method for producing Compound 8 can also lead to the production of Compound 1; the method can comprise contacting Mogroside IIIE with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases.


As another example, the method for producing Compound 8 can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IVE, Iso-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. For example, a CGTase from Bacillus licheniformis or Toruzyme can be used.


Example 13: Production of Compound 9



embedded image


As disclosed herein, Compound 9 can be an intermediate mogroside compound produced during the production of Compound 1 disclosed herein. For example, a method for producing Compound 9 can also lead to the production of Compound 1; the method can comprise contacting Mogroside IIIE with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases.


As another example, the method for producing Compound 9 can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IVE, Iso-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. For example, a CGTase from Bacillus licheniformis or Toruzyme can be used.


Example 14: Production of Compound 10



embedded image


As disclosed herein, Compound 10 can be an intermediate mogroside compound produced during the production of Compound 1 disclosed herein. For example, a method for producing Compound 10 can also lead to the production of Compound 1; the method can comprise contacting Mogroside IIIE with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases.


As another example, the method for producing Compound 10 can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IVE, Iso-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. For example, a CGTase from Bacillus licheniformis or Toruzyme can be used.


Example 15: Production of Compound 11



embedded image


As disclosed herein, Compound 11 can be an intermediate mogroside compound produced during the production of Compound 1 disclosed herein. For example, a method for producing Compound 11 can also lead to the production of Compound 1; the method can comprise contacting Mogroside IIIE or 11-oxo-Mogroside IIIE with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases.


As another example, the method for producing Compound 11 can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IVE, Iso-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. For example, a CGTase from Bacillus licheniformis or Toruzyme can be used.


Example 16: Production of Compound 12



embedded image


As disclosed herein, Compound 12 can be an intermediate mogroside compound that can be used in the production of Compound 1 disclosed herein. For example, a method for producing Compound 12 can also lead to the production of Compound 1; the method can comprise contacting Mogroside VI isomer with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, invertases and dextranases. The enzyme can be an invertase enzyme from baker's yeast, for example.


As another example, the method for producing Compound 12 can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IVE, Iso-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, Mogroside VI isomer and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, invertases and dextranases.


Example 17: Production of Compound 13



embedded image


As disclosed herein, Compound 13 can be an intermediate mogroside produced during the production of Compound 1 disclosed herein. For example, the method can comprise contacting Mogroside IIIE with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme expressed can also be Celluclast, for example.


As another example, the method for producing Compound 13 can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IVE, Iso-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. For example, Celluclast can be used.


Example 18: Production of Compound 14



embedded image


As disclosed herein, Compound 14 can be an intermediate mogroside compound produced during the production of Compound 1 disclosed herein. For example, the method can comprise contacting Mogroside IIIE with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme expressed can also be Celluclast, for example.


As another example, the method for producing Compound 14 can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IVE, Iso-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. For example, Celluclast can be used. The method can also require the presence of a sugar, such as α-lactose.


Example 19: Production of Compound 15



embedded image


As disclosed herein, Compound 15 can be an intermediate mogroside compound that can be used for the production of Compound 1 disclosed herein. For example, the method can comprise contacting Mogroside IIA with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases.


As another example, the method for producing Compound 15 can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IVE, Iso-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. For example, Toruzyme can be used.


Example 20: Production of Compound 16



embedded image


As disclosed herein, Compound 16 can be an intermediate mogroside compound that can be used for the production of Compound 1 disclosed herein. For example, the method can comprise contacting Mogroside IIA with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases.


As another example, the method for producing Compound 16 can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IVE, Iso-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. For example, Toruzyme can be used.


The enzyme can be Toruzyme, for example. The recombinant cell can further comprise a gene encoding a cyclomaltodextrin glucanotransferase (e.g., Toruzyme), an invertase, or a glucosyltransferase (e.g., UGT76G1), for example.


Example 21: Production of Compound 17



embedded image


As disclosed herein, Compound 17 can be an intermediate mogroside compound for the production of Compound 1 disclosed herein. For example, Compound 17 may be hydrolyzed to produce Mogroside IIIE, which can then be used to produce Compound 1. For example, a method for producing Compound 17 can comprise: contacting Siamenoside I with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, transglucosidases, sucrose synthases, pectinases, and dextranases. For example, a recombinant cell expressing a UDP glycosyltransferase can be used.


As another example, the method for producing Compound 17 can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IVE, Iso-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. For example, a UDP glycosyltransferase can be used.


Example 22: Production of Compound 18



embedded image


As disclosed herein, Compound 18 can be an intermediate mogroside compound produced during the production of Compound 1 disclosed herein. For example, Compound 18 may be hydrolyzed to produce Mogroside IIIE, which can then be used to produce Compound 1. For example, a method for producing Compound 18 can also lead to the production of Compound 1; the method can comprise contacting Mogroside IIIE with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzymes can be Sus1 and UGT76G1, for example.


As another example, the method for producing Compound 18 can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IVE, Iso-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be UGT76G1, for example.


Example 23: Production of Compound 19



embedded image


As disclosed herein, Compound 19 can be an intermediate mogroside compound produced during the production of Compound 1 disclosed herein. Compound 19 can be further hydrolyzed to produce Compound 1, for example. For example, a method for producing Compound 19 can also lead to the production of Compound 1; the method can comprise contacting Mogroside IIIE with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzymes can be Sus1 and UGT76G1, for example.


As another example, the method for producing Compound 19 can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IVE, Iso-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be UGT76G1, for example. The enzyme can also be sucrose synthase Sus1, for example.


Example 24: Production of Compound 20



embedded image


As disclosed herein, Compound 20 can be an intermediate mogroside compound produced during the production of Compound 1 disclosed herein. Compound 20 can be further hydrolyzed to produce Compound 1, for example. For example, a method for producing Compound 20 can also lead to the production of Compound 1; the method can comprise contacting Mogroside IIIE with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzymes can be Sus1 and UGT76G1, for example.


As another example, the method for producing Compound 20 can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IVE, Iso-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be UGT76G1, for example. The enzyme can also be sucrose synthase Sus1, for example. The enzymes can be sucrose synthase Sus1 and UGT76G1, for example.


Example 25: Production of Compound 21



embedded image


As disclosed herein, Compound 21 can be an intermediate mogroside compound produced during the production of Compound 1 disclosed herein. Compound 21 can be further hydrolyzed to produce Compound 1, for example. For example, a method for producing Compound 21 can also lead to the production of Compound 1; the method can comprise contacting Mogroside IVE with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzymes can be Sus1 and UGT76G1, for example.


As another example, the method for producing Compound 21 can comprise: contacting one or more of Mogroside V, Mogroside IVE, Siamenoside I, Iso-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be UGT76G1, for example. The enzyme can also be sucrose synthase Sus1, for example. The enzymes can be sucrose synthase Sus1 and UGT76G1, for example.


Example 26: Production of Compound 22



embedded image


As disclosed herein, Compound 22 can be an intermediate mogroside compound produced during the production of Compound 1 disclosed herein. Compound 22 can be further hydrolyzed to produce Compound 1, for example. For example, a method for producing Compound 22 can also lead to the production of Compound 1; the method can comprise contacting Mogroside IVA with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzymes can be Sus1 and UGT76G1, for example.


As another example, the method for producing Compound 22 can comprise: contacting one or more of Mogroside V, Mogroside IVE, Siamenoside I, Iso-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be UGT76G1, for example. The enzyme can also be sucrose synthase Sus1, for example. The enzymes can be sucrose synthase Sus1 and UGT76G1, for example.


Example 27: Production of Compound 23



embedded image


As disclosed herein, Compound 23 can be an intermediate mogroside compound produced during the production of Compound 1 disclosed herein. Compound 23 can be further hydrolyzed to produce Compound 1, for example. For example, a method for producing Compound 23 can also lead to the production of Compound 1; the method can comprise contacting Mogroside IVE with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be dextransucrase, for example.


As another example, the method for producing Compound 23 can comprise: contacting one or more of Mogroside V, Mogroside IVE, Siamenoside I, Iso-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be dextransucrase, for example, which will hydrolyze the hyperglycosylated Mogroside IVE isomers to the desired Mogroside V isomer.


Examples 28 and 29: Production of Mogroside IIA1 and Mogroside IIA2 from Fungal lactase



embedded image


As disclosed herein, Mogroside IIA1 and Mogroside IIA2 can be intermediate mogroside compounds produced during the production of Compound 1 disclosed herein. Mogroside IIA1 and Mogroside IIA2 can be further hydrolyzed to produce Compound 1, for example. For example, a method for producing Mogroside IIA1 and Mogroside IIA2 can also lead to the production of Compound 1; the method can comprise contacting Mogroside IVE with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be a lactase from a fungus, for example.


As another example, the method for producing Mogroside IIA1 and Mogroside IIA2 can include: contacting one or more of Mogroside V, Mogroside IVE, Siamenoside I, Iso-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases.


Example 30: Production of Mogroside IA from Viscozyme



embedded image


As disclosed herein, Mogroside IA can be an intermediate mogroside compound produced during the production of Compound 1 disclosed herein. Mogroside IA can be further hydrolyzed to produce Compound 1, for example. A method for producing Mogroside IA can also lead to the production of Compound 1; the method can comprise contacting Mogroside IIA with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be Viscozyme, for example.


As another example, the method for producing Mogroside IA can comprise: contacting one or more of Mogroside V, Mogroside IVE, Siamenoside I, Iso-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be Viscozyme, for example.


Example 31: Production of Compound 24



embedded image


As disclosed herein, Compound 24 can be an intermediate mogroside compound produced during the production of Compound 1 disclosed herein. Compound 24 can be further hydrolyzed to produce Compound 1, for example. A method for producing Compound 24 can also lead to the production of Compound 1; the method can comprise contacting Mogroside IIIE with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be dextransucrase DexT, for example.


As another example, the method for producing Compound 24 can comprise: contacting one or more of Mogroside V, Mogroside IVE, Siamenoside I, Iso-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be dextransucrase DexT, for example.


Example 32: Production of Compound 25



embedded image


As disclosed herein, Compound 25 can be an intermediate mogroside compound produced during the production of Compound 1 disclosed herein. Compound 25 can be further hydrolyzed to produce Compound 1, for example. A method for producing Compound 25 can also lead to the production of Compound 1; the method can comprise contacting mogroside IIIE with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be dextransucrase DexT, for example.


As another example, the method for producing Compound 25 can comprise: contacting one or more of Mogroside V, Mogroside IVE, Siamenoside I, Mogroside IVE, ISO-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be dextransucrase DexT, for example.


Example 33: Production of Compound 26



embedded image


As disclosed herein, Compound 26 can be an intermediate mogroside compound produced during the production of Compound 1 disclosed herein. Compound 26 can be further hydrolyzed to produce Compound 1, for example. A method for producing Compound 26 can also lead to the production of Compound 1; the method can comprise contacting mogroside IIIE with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be dextransucrase DexT, for example.


As another example, the method for producing Compound 26 can include: contacting one or more of Mogroside V, Mogroside IVE, Siamenoside I, Mogroside IVE, ISO-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be dextransucrase DexT, for example.


Examples 34 and 35: Production of Mogrol and Mogroside IE from Pectinase



embedded image


As disclosed herein, Mogrol and Mogroside IE can be intermediate mogroside compounds produced during the production of Compound 1 disclosed herein. Mogrol can be used as a substrate for producing Mogroside IA1, which is further hydrolyzed to form Compound 1, and Mogroside IE can be further hydrolyzed to produce Compound 1, for example. A method for producing Mogrol and Mogroside IE can also lead to the production of Compound 1; the method can comprise contacting mogroside V with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be a pectinase enzyme from Aspergillus aculeatus, for example.


As another example, the method for producing Mogrol and Mogroside IE can comprise: contacting one or more of Mogroside V, Mogroside IVE, Siamenoside I, Mogroside IVE, Iso-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases.


Example 36: Production of Mogroside IIE



embedded image


As disclosed herein, Mogroside IE can be an intermediate mogroside compound produced during the production of Compound 1 disclosed herein. Mogroside IE can be further hydrolyzed to produce Compound 1, for example. A method for producing Mogroside IE can also lead to the production of Compound 1; the method can comprise contacting mogroside V with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be a pectinase enzyme from Aspergillus aculeatus, for example.


As another example, the method for producing Mogroside IE can comprise: contacting one or more of Mogroside V, Mogroside IVE, Siamenoside I, Mogroside IVE, ISO-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases.


Examples 37 and 38: Production of Compounds 32 and 33



embedded image


As disclosed herein, Compounds 32 and 33 can be intermediate mogroside compounds produced during the production of Compound 1 disclosed herein. Compounds 32 and 33 can be further hydrolyzed to produce Compound 1, for example. A method for producing Compounds 32 and 33 can also lead to the production of Compound 1, the method can comprise contacting one or more of Mogroside V, Mogroside IVE, Siamenoside I, Mogroside IVE, ISO-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be pectinase enzyme from Aspergillus aculeatus, for example.


As another example, the method for producing Compound 32 and 33 can comprise: contacting one or more of Mogroside V, Mogroside IVE, Siamenoside I, Mogroside IVE, Iso-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases.


Examples 39 and 40: Production of Compounds 34 and 35



embedded image


As disclosed herein, Compounds 34 and 35 can be intermediate mogroside compounds produced during the production of Compound 1 disclosed herein. Compounds 34 and 35 can be further hydrolyzed to produce Compound 1, for example. A method for producing Compounds 34 and 35 can also lead to the production of Compound 1; the method can comprise contacting mogroside IIIE with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be Celluclast, for example.


As another example, the method for producing Compounds 34 and 35 can comprise: contacting one or more of Mogroside V, Mogroside IVE, Siamenoside I, Mogroside IVE, Iso-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases.


Examples 41 and 42: Production of Mogroside IIIA2 and Mogroside III



embedded image


As disclosed herein, Mogroside IIIA2 and Mogroside III can be intermediate mogroside compounds produced during the production of Compound 1 disclosed herein. Mogroside IIIA2 and Mogroside III can be further hydrolyzed to produce Compound 1, for example.


For example, Mogroside IIIA2 and Mogroside III can also be contacted with a UGT to form Mogroside IVA, another mogroside compound that can be used to make Mogroside IIIE, which is further hydrolyzed to form Compound 1.


A method for producing Mogroside IIIA2 and Mogroside III can also lead to the production of Compound 1; the method can comprise contacting mogroside IIIE with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be Celluclast, for example.


As another example, the method for producing Mogroside IIIA2 and Mogroside III can comprise: contacting one or more of Mogroside V, Mogroside IVE, Siamenoside I, Mogroside IVE, Iso-mogroside V, Mogroside IIIE, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IVA, Mogroside IIA, Mogroside IIA1, Mogroside IIA2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside IIIE, 11-oxo-Mogroside IVE, Mogroside IE, Mogrol, 11-oxo-mogrol, Mogroside IIE, Mogroside IIIA2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases.


Example 43: Use of CGT-SL Enzyme to Produce Compound 1

In a 1 ml reaction volume, 5 mg of Mogroside IIIE, 50 mg of soluble starch, 0.1M NaOAc pH 5.0, 125 ul of CGT-SL enzyme (from Geobacillus thermophilus) and water were mixed with a stir bar and incubated at 50° C. Time point samples were taken for HPLC.


HPLC Data: Mass spec of Compound 1 production is shown in FIG. 1. In some embodiments, CGT-SL can comprise a sequence set forth in SEQ ID NO: 3, 148 or 154.


Example 44: Cloning: Gene Encoding for Dextransucrase Enzyme was PCR Amplified from Leuconostoc citreum ATCC11449 and Cloned into pET23a

Growth conditions: BL21 Codon Plus RIL strain was grown in 2×YT at 37 C, 250 rpm until OD600 of 1. 10 mM of lactose was added for induction, incubated at room temperature, 150 rpm overnight. Crude extract used for the reaction was obtained either by sonication or osmotic shock.


In some embodiments, the dextransucrase comprises, or consists of, an amino acid sequence of any one of SEQ ID NOs: 2, 103, 106-110, 156, and 896. In some embodiments, the DexT can comprise an amino acid sequence set forth in SEQ ID NO: 103. In some embodiments, the DexT comprises a nucleic acid sequence set forth in SEQ ID NO: 104 or 105.


Example 45: Reaction of Mogroside IIIE with S. mutans Clarke ATCC25175 Dextransucrase to Produce Compound 1

Growth conditions: The strain indicated above was grown anaerobically with glucose supplementation as indicated in Wenham, Henessey and Cole (1979) to stimulate dextransucrase production. 5 mg/ml Mogroside IIIE was added to the growth media. Time point samples were taken for HPLC. HPLC data is presented as mass spec of Compound 1 production in FIG. 2.


Example 46: Reaction of Mogroside IIIE with CGTase

In a 1 ml reaction volume, 5 mg of Mogroside IIIE, 50 mg of soluble starch, 0.1M NaOAc pH 5.0, 125 ul of enzyme and water were mixed with a stir bar and incubated at 50° C. Time point samples were taken for HPLC. The enzyme used was CGTase. The Compound 1 product is seen in the HPLC data and mass spectroscopy data as shown in FIG. 1. Mass peaks correspond to the size of Compound 1.


Example 47: Reaction of Mogroside IIIE with Celluclast

Celluclast xylosylation was performed on Mogroside IIIE with Celluclast from the native host Trichoderma reesei.


Reaction conditions: 5 mg of Mogroside IIIE, 100 mg xylan, 50 ul Celluclast were mixed in a total volume of 1 ml with 0.1M sodium acetate pH 5.0, incubated at 50 C with stirring. Time point samples were taken for HPLC.


The xylosylated product is highlighted in FIG. 3. Products from xylosylation can be used as intermediates in the production of Compound 1. The sequences for Celluclast are included in Table 1 and are used herein for the production of xylosylated products.


Example 48: Glycosyltransferases (Maltotriosyl Transferase) (Native Host: Geobacillus sp. APC9669)

In this example, glycosyltransferase AGY15763.1 (Amano Enzyme U.S.A. Co., Ltd., Elgin, Ill.; SEQ ID NO: 434, see Table 1) was used. 20 mL d water, 0.6 ml 0.5M MES pH 6.5, 6 g soluble starch, 150 mg Mogroside IIIE, and 3 ml enzyme were added to a 40 ml flat-bottom screw cap vial. The vial was sealed with a black cap, incubated at 30° C. and stirred at 500 rpm using a magnetic bar. Three more identical reactions were set up for a total of 600 mg Mogroside IIIE used as starting material. The reaction was stopped after 24 hours. Insoluble starch was removed by centrifugation (4000 rpm for 5 min, Eppendorf). The supernatant was heated to 80° C. for 30 minutes with stirring (500 rpm), followed by centrifugation (4000 rpm for 10 min, Eppendorf). The supernatant was filtered through a 250 ml, 0.22 micron PES filter and checked by LC-MS (Sweet Naturals 2016-Enzymatic_2016Q4_A.SPL, line 1254) to obtain HPLC data.
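
By way of a non-limiting illustration, the nominal composition of the reaction described above can be tallied as follows. This is a minimal sketch only: the variable and function names are hypothetical, and the calculation assumes that the dissolved or suspended solids do not change the liquid volume, which the example does not state.

```python
# Quantities are taken from the example text; names are illustrative only.
WATER_ML = 20.0          # water added per vial
MES_STOCK_ML = 0.6       # volume of 0.5 M MES pH 6.5 stock
MES_STOCK_M = 0.5        # molarity of the MES stock
ENZYME_ML = 3.0          # enzyme preparation
SUBSTRATE_MG = 150.0     # Mogroside IIIE per vial
STARCH_G = 6.0           # soluble starch per vial
N_VIALS = 4              # first vial plus three identical reactions


def reaction_summary():
    """Return nominal values, assuming solids add no volume (an assumption)."""
    total_ml = WATER_ML + MES_STOCK_ML + ENZYME_ML
    return {
        "total_ml": total_ml,
        "mes_mM": MES_STOCK_M * MES_STOCK_ML / total_ml * 1000.0,
        "substrate_mg_per_ml": SUBSTRATE_MG / total_ml,
        "starch_g_per_100ml": STARCH_G / total_ml * 100.0,
        "substrate_total_mg": SUBSTRATE_MG * N_VIALS,
    }


if __name__ == "__main__":
    for key, value in reaction_summary().items():
        print(f"{key}: {value:.2f}")
```

Under these assumptions the four vials together carry 600 mg of substrate, consistent with the total stated in the example.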


The AGY15763.1 protein (SEQ ID NO: 434) can be encoded by the native gDNA (SEQ ID NO: 437) or by a DNA sequence codon optimized for E. coli (SEQ ID NO: 438).


An example of an additional glycosyltransferase expected to perform similarly is the UGT76G1 protein from Stevia rebaudiana (SEQ ID NO: 439), which can be expressed in E. coli. The native coding sequence for UGT76G1 (SEQ ID NO: 439) is provided in SEQ ID NO: 440.


Example 49: UDP-Glycosyltransferases UGT73C5 in the Presence of Mogrol

Mogrol was reacted with UDP-glycosyltransferases, which produced Mogroside I and Mogroside II. 1 mg/ml of Mogrol was reacted with 200 ul crude extract containing UGT73C5 (A. thaliana) (334), 2 ul crude extract containing sucrose synthase, 5 mM UDP, 1× M221 protease inhibitor, 200 mM sucrose, 0.5 mg/ml spectinomycin, in 0.1M Tris-HCl pH7.0, incubated at 30° C. Samples were taken after 2 days for HPLC. The reaction converted Mogrol to Mogroside I and Mogroside II, as shown in FIGS. 4 and 5.
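
As a hedged illustration of the donor budget in a reaction of this kind, the sketch below converts the 1 mg/ml Mogrol loading to a molar concentration and compares the glucose equivalents needed for complete conversion to Mogroside II with the sucrose and UDP supplied. It assumes that Mogrol has the formula C30H52O4 (about 476.74 g/mol) and that the sucrose synthase serves to regenerate UDP-glucose from sucrose and UDP; neither point is stated in this example, and the function names are hypothetical.

```python
MOGROL_G_PER_MOL = 476.74   # assumption: Mogrol taken as C30H52O4
SUCROSE_MM = 200.0          # from the example
UDP_MM = 5.0                # from the example


def mm_from_mg_per_ml(mg_per_ml, g_per_mol):
    """Convert mg/ml (equal to g/L) to millimolar."""
    return mg_per_ml / g_per_mol * 1000.0


def donor_budget(acceptor_mm, glucoses_per_product):
    """Compare glucose demand with the sucrose and UDP supplied."""
    glucose_needed_mm = acceptor_mm * glucoses_per_product
    return {
        "glucose_needed_mM": glucose_needed_mm,
        "sucrose_fold_excess": SUCROSE_MM / glucose_needed_mm,
        # Values above 1 would mean each UDP must be recycled more than once.
        "udp_turnovers_needed": glucose_needed_mm / UDP_MM,
    }


if __name__ == "__main__":
    mogrol_mm = mm_from_mg_per_ml(1.0, MOGROL_G_PER_MOL)
    print(f"Mogrol: {mogrol_mm:.2f} mM")
    # Two glucose units per Mogrol for complete conversion to Mogroside II.
    for key, value in donor_budget(mogrol_mm, 2).items():
        print(f"{key}: {value:.2f}")
```

On these assumptions sucrose is in large molar excess over the glucose demand, so the donor supply is unlikely to limit conversion.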


The protein sequence of UGT73C5 is shown in SEQ ID NO: 441, the native DNA coding sequence for UGT73C5 (SEQ ID NO: 441) is shown in SEQ ID NO: 442, and the UGT73C5 coding sequence (Codon optimized for E. coli) is shown in SEQ ID NO: 443.


Example 50: UDP-Glycosyltransferases (UGT73C6) in the Presence of Mogrol to Produce Mogroside I

Reaction conditions: 1 mg/ml of Mogrol was reacted with 200 ul crude extract containing UGT73C6, 2 ul crude extract containing sucrose synthase, 5 mM UDP, 1× M221 protease inhibitor, 200 mM sucrose, 0.5 mg/ml spectinomycin, in 0.1M Tris-HCl pH7.5, incubated at 30° C. Samples were taken after 2 days for HPLC. The reaction product was Mogroside I from Mogrol, as shown in the HPLC data and mass spectroscopy data of FIG. 6.


The protein and gDNA sequence encoding A. thaliana UGT73C6 is shown in SEQ ID NO: 444 and SEQ ID NO: 445, respectively.


Example 51: UDP-Glycosyltransferases (338) in the Presence of Mogrol to Produce Mogroside I, Mogroside IIA and Two Different Mogroside III Products

Reaction conditions: 1 mg/ml of Mogrol or Mogroside IIA was reacted with 200 ul crude extract containing 338, 2 ul crude extract containing sucrose synthase, 5 mM UDP, 1× M221 protease inhibitor, 200 mM sucrose, 0.5 mg/ml spectinomycin, in 0.1M Tris-HCl pH8.5, incubated at 30° C. Samples were taken after 2 days for HPLC.


Reaction of Mogrol with Bacillus sp. UDP-glycotransferase (338) (described in Pandey et al., 2014; incorporated by reference in its entirety herein) led to the reaction products Mogroside I, Mogroside IIA, and 2 different Mogroside III products. FIGS. 7-9 show the HPLC and mass spectroscopy data for the products obtained after the reaction. FIG. 8 shows the peaks which correlate to the size of Mogroside IIA.


The protein and gDNA sequence encoding UGT 338 is provided in SEQ ID NO: 405 and SEQ ID NO: 406, respectively.


Example 52: UDP-Glycosyltransferases (301 (UGT98)) in the Presence of Mogroside IIIE to Produce Siamenoside I and Mogroside V

Reaction conditions: 1 mg/ml of Mogroside IIIE was reacted with 200 ul crude extract containing 301, 2 ul crude extract containing sucrose synthase, 5 mM UDP, 1× M221 protease inhibitor, 200 mM sucrose, 0.5 mg/ml spectinomycin, in 0.1 M Tris-HCl pH7.0, incubated at 30° C. Samples were taken after 2 days for HPLC and mass spec analysis. The reaction products from Mogroside IIIE were Siamenoside I and Mogroside V as shown in FIGS. 10-11.


The protein and gDNA sequence encoding S. grosvenorii 301 UGT98 is provided in SEQ ID NO: 407 and SEQ ID NO: 408, respectively.


Example 53: UDP-Glycosyltransferases (339) in the Presence of Mogrol, Siamenoside I or Compound 1 to Produce Mogroside I from Mogrol, Isomogroside V from Siamenoside I and Compound 1 Derivative from Compound 1

Reaction conditions: 1 mg/ml of Mogrol, Siamenoside I or Compound 1 was reacted with 200 ul crude extract containing 339 (described in Itkin et al., incorporated by reference in its entirety herein), 2 ul crude extract containing sucrose synthase, 5 mM UDP, 1× M221 protease inhibitor, 200 mM sucrose, 0.5 mg/ml spectinomycin, in 0.1M Tris-HCl pH7.0, incubated at 30° C. Samples were taken after 2 days for HPLC.


In these reactions, Mogrol led to Mogroside I, Siamenoside I led to Isomogroside V, and Compound 1 led to a Compound 1 derivative (FIGS. 12-14).


The protein and DNA sequence encoding S. grosvenorii UGT339 is provided in SEQ ID NO: 409 and SEQ ID NO: 410, respectively.


Example 54: UDP-Glycosyltransferases (330) in the Presence of Mogroside IIA, Mogroside IIE, Mogroside IIIE, Mogroside IVA, or Mogroside IVE to Produce Mogroside IIIA, Mogroside IVE, and Mogroside V

As described herein, the use of UDP-glycotransferase (330) as described in Noguchi et al. 2008 (incorporated by reference in its entirety herein) led to the reaction products Mogroside IIIA, Mogroside IVE, and Mogroside V. The native host is Sesamum indicum, and the production host was SF9. For the reaction, 1 mg/ml of Mogroside IIA, Mogroside IIE, Mogroside IIIE, Mogroside IVA, or Mogroside IVE was reacted with 200 ul crude extract containing 330, 2 ul crude extract containing sucrose synthase, 5 mM UDP, 1× M221 protease inhibitor, 200 mM sucrose, 0.5 mg/ml spectinomycin, in 0.1M Tris-HCl pH7.0, incubated at 30° C. Samples were taken after 2 days for HPLC.


The reaction led to surprising products such as Mogroside IIIA, Mogroside IVE, Mogroside V. As shown in FIGS. 15-20, the sizes of the compounds produced correspond to Mogroside IIIA, Mogroside IVE, and Mogroside V.
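
To illustrate how product sizes of this kind can be estimated, the following sketch computes approximate average molar masses for Mogrol carrying a given number of glucose units. It is offered under stated assumptions only: Mogrol is taken as C30H52O4, each glycosidic bond is taken to add one anhydroglucose unit (C6H10O5), and the Roman numeral in a mogroside name is taken as its glucose count. The names used are hypothetical, and ions observed by mass spectrometry (adducts, monoisotopic peaks) would differ from these average masses.

```python
# Standard average atomic masses (g/mol), rounded to three decimals.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}

MOGROL = {"C": 30, "H": 52, "O": 4}     # assumption: Mogrol is C30H52O4
GLUCOSYL = {"C": 6, "H": 10, "O": 5}    # glucose minus one water per bond


def molar_mass(formula):
    """Sum atomic masses over an element-count mapping."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def mogroside_mass(n_glucose):
    """Average molar mass of Mogrol bearing n_glucose glucosyl units."""
    return molar_mass(MOGROL) + n_glucose * molar_mass(GLUCOSYL)


if __name__ == "__main__":
    for label, n in [("Mogrol", 0), ("Mogroside I", 1), ("Mogroside II", 2),
                     ("Mogroside III", 3), ("Mogroside IV", 4), ("Mogroside V", 5)]:
        print(f"{label}: {mogroside_mass(n):.2f} g/mol")
```

Each added glucose shifts the expected mass by about 162 g/mol, which is the spacing one would look for between successive glycosylation products.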


The protein and gDNA sequence encoding the S. grosvenorii UGT330 protein is provided in SEQ ID NO: 411 and SEQ ID NO: 412, respectively.


Example 55: UDP-Glycosyltransferases (328) (Described in Itkin et al) in the Presence of Mogroside IIA, Mogroside IIE, Mogroside IIIE, Mogroside IVA, or Mogroside IVE to Produce Mogroside IIIA, Mogroside IVE, and Mogroside V

Reaction conditions: 1 mg/ml of Mogroside IIIE was reacted with 200 ul crude extract containing 328, 2 ul crude extract containing sucrose synthase, 5 mM UDP, 1× M221 protease inhibitor, 200 mM sucrose, 0.5 mg/ml spectinomycin, in 0.1M Tris-HCl pH7.0, incubated at 30° C. Samples were taken after 2 days for HPLC.


The reaction products were Mogroside IVE and Mogroside V. As shown in FIGS. 21-22, the size of the products in the mass spectroscopy data corresponds to Mogroside IVE and Mogroside V. S. grosvenorii UGT328 protein (glycosyltransferase) and coding sequence thereof is provided in SEQ ID NO: 413 and 414, respectively.


The sucrose synthase AtSus1 protein sequence and the gDNA encoding the AtSus1 protein are provided in SEQ ID NO: 415 and 416, respectively.


Example 56: Mogrol Production in Yeast

DNA was obtained through gene synthesis either through GenScript or IDT. For some of the cucurbitadienol synthases, cDNA or genomic DNA was obtained from 10-60 day old seedlings followed by PCR amplification using specific and degenerate primers. DNA was cloned through standard molecular biology techniques into one of the following overexpression vectors: pESC-Ura, pESC-His, or pESC-LEU. Saccharomyces cerevisiae strain YHR072 (heterozygous for erg7) was purchased from GE Dharmacon. Plasmids (pESC vectors) containing Mogrol synthesis genes were transformed/co-transformed by using Zymo Yeast Transformation Kit II. Strains were grown in standard media (YPD or SC) containing the appropriate selection with 2% glucose or 2% galactose for induction of heterologous genes at 30° C., 220 rpm. When indicated, the lanosterol synthase inhibitor Ro 48-8071 (Cayman Chemicals) was added (50 ug/ml). Samples of mogrol and precursors produced by yeast were prepared after 2 days of induction, followed by lysis (Yeast Buster), ethyl acetate extraction, drying, and resuspension in methanol. Samples were analyzed through HPLC.


Production of cucurbitadienol was catalyzed by cucurbitadienol synthase S. grosvenorii SgCbQ in growth conditions with no inhibitor.


Production of cucurbitadienol is shown in the HPLC and mass spectroscopy data which show mass peaks for the indicated product (FIG. 23). The protein sequence and DNA sequence encoding S. grosvenorii SgCbQ are provided in SEQ ID NO: 446 and 418, respectively.


Cpep2 was also used for the production of cucurbitadienol in yeast. FIG. 24 shows the mass spectroscopy profile, with peaks and characteristic fragments that correspond with cucurbitadienol. The protein sequence of Cpep2 and the DNA sequence encoding Cpep2 protein are provided in SEQ ID NO: 420 and 421, respectively.



Cucurbita pepo (Jack O' Lantern) Cpep4 was also used in the production of cucurbitadienol under growth conditions with no inhibitor. Production of cucurbitadienol is shown in the mass spectral data shown in FIGS. 24 and 25. As shown the peaks and fragments correspond to cucurbitadienol. The protein sequence and DNA sequence encoding Cpep4 are provided in SEQ ID NOs: 422 and 423, respectively.


A putative cucurbitadienol synthase protein sequence representing Cmax was obtained from native host Cucurbita maxima. The deduced coding DNA sequence will be used for gene synthesis and expression. The sequences for the cucurbitadienol synthase protein and the DNA encoding it are shown below:


Proteins and DNA coding sequences below were obtained through alignment of genomic DNA PCR product sequence with known cucurbitadienol synthase sequences available through public databases (Pubmed). It is expected that any one of these Cmax proteins may be used in the methods, systems, compositions (e.g., host cells) disclosed herein to produce Compound 1. A non-limiting exemplary Cmax protein is Cmax1 (protein) (SEQ ID NO: 424) encoded by Cmax1 (DNA) (SEQ ID NO: 425).


A putative cucurbitadienol synthase protein sequence representing Cmos1 was obtained from native host Cucurbita moschata. The deduced coding DNA sequence is used for gene synthesis and expression. Protein(s) and DNA coding sequence(s) shown below were obtained through alignment of genomic DNA PCR product sequence with known cucurbitadienol synthase sequences available through public databases (Pubmed). Any one of these Cmos proteins may be used in the methods, systems, compositions (e.g., host cells) disclosed herein to produce Compound 1. A non-limiting exemplary Cmos1 protein is Cmos1 (protein) (SEQ ID NO: 426) encoded by Cmos1 (DNA) (SEQ ID NO: 427).


Example 57: Production of Dihydroxycucurbitadienol in Yeast (Cucurbitadienol Synthase & Epoxide Hydrolase)

The production of dihydroxycucurbitadienol in yeast was considered using cucurbitadienol synthase & epoxide hydrolase. The native host for these enzymes is S. grosvenorii.


Growth conditions: SgCbQ was co-expressed with an epoxide hydrolase (EPH) in the presence of lanosterol synthase inhibitor.


Possible dihydroxycucurbitadienol product is shown in FIG. 26.


The EPH protein sequence and a DNA encoding EPH protein (codon optimized for S. cerevisiae) are provided in SEQ ID NO: 428 and 429, respectively.


Example 58: Production of Mogrol from Cucurbitadienol Synthase, Epoxide Hydrolase, Cytochrome P450 and Cytochrome P450 Reductase

Four enzymes, including Cucurbitadienol synthase, epoxide hydrolase, cytochrome P450, and cytochrome P450 reductase are co-expressed in S. cerevisiae. For the growth conditions SgCbQ, EPH, CYP87D18 and AtCPR (cytochrome P450 reductase from A. thaliana) are co-expressed in the presence of lanosterol synthase inhibitor. Production of mogrol by S. cerevisiae is expected. The protein sequence and DNA sequence encoding SgCbQ, EPH, CYP87D18 and AtCPR (cytochrome P450 reductase from A. thaliana) are: CYP87D18 (protein) (SEQ ID NO: 430), and CYP87D18 (DNA) (SEQ ID NO: 431); and AtCPR (protein) (SEQ ID NO: 432), and AtCPR (DNA) (SEQ ID NO: 433).


Example 59: Compound 1 is Tolerant to Microbial Hydrolysis

Yeast strains Saccharomyces cerevisiae, Yarrowia lipolytica and Candida bombicola, were incubated in YPD supplemented with 1 mg/ml Mogroside V or Compound 1. After 3 days, supernatants were analyzed by HPLC.


As shown in the HPLC data, epoxide hydrolase hydrolyzed Mogroside V to Mogroside IIIE. No hydrolysis products were observed with Compound 1 (FIG. 37).


Example 60: Streptococcus mutans Clarke ATCC 25175 Dextransucrase


Streptococcus mutans Clarke can be grown anaerobically with glucose supplementation. An example of growth conditions can be found in Wenham, Henessey and Cole (1979), in which the method is used to stimulate dextransucrase production, for example. 5 mg/ml Mogroside IIIE was added to the growth media. Time point samples to monitor production can be taken for HPLC, for example. Sequences for various dextransucrases can be found in Table 1, which includes protein sequences for dextransucrases and nucleic acid sequences that encode dextransucrases (for example, SEQ ID NOs: 157-162). In some embodiments, the dextransucrase comprises, or consists of, an amino acid sequence of any one of SEQ ID NOs: 2, 103, 106-110, 156, and 896. In some embodiments, the DexT can comprise an amino acid sequence set forth in SEQ ID NO: 103. In some embodiments, the DexT comprises a nucleic acid sequence set forth in SEQ ID NO: 104 or 105. In some embodiments, the recombinant cell encodes a protein comprising the sequence set forth in any one of SEQ ID NO: 156-162 and/or comprises a nucleic acid encoding dextransucrase comprising a nucleic acid sequence set forth in any one of SEQ ID NOs: 157-162. This example is used to produce Compound 1.


Example 61: 90% Pure Compound 1 Production Procedure and Sensory Evaluation

A fraction containing the mixture of 3 α-mogroside isomers is obtained by treating mogroside IIIE (MIIIE) with a dextransucrase/dextranase enzyme reaction followed by SPE fractionation. Based on UPLC analysis this mixture has 3 isomers, 11-oxo-Compound 1, Compound 1 and mogroside V isomer, in 5:90:5% ratios respectively. These 3 isomers are characterized from the purification of a different fraction/source by LC-MS, 1D and 2D NMR spectra and by comparison with closely related isomers in the mogroside series reported in the literature. This sample is further evaluated in a sensory test by comparing it with a pure Compound 1 sample using a triangle test.


Enzyme Reaction and Purification Procedure


100 mL of pH 5.5 1M sodium acetate buffer, 200 g sucrose, 100 mL dextransucrase DexT (1 mg/ml crude extract, pET23a, BL21-Codon Plus-RIL, grown in 2×YT), 12.5 g of Mogroside IIIE and 600 mL water were added to a 2.8 L shake flask, and the flask was shaken at 30° C., 200 rpm. The progress of the reaction was monitored periodically by LC-MS. After 72 hours, the reaction was treated with 2.5 mL of dextranase (Amano) and shaking of the flask was continued at 30° C. After 24 hours, the reaction mixture was quenched by heating at 80° C. and centrifuged at 5000 rpm for 5 minutes, and the supernatant was filtered, loaded directly onto a 400 g C18 SPE column, and fractionated using a MeOH:H2O 5/25/50/75/100 step gradient. Each step in the gradient was collected in 6 jars, with 225 mL in each jar. The desired products were eluted in the second jar of the 75% MeOH fraction (SPE 75_2) and dried under reduced pressure. The material was further re-suspended/dissolved in 7 mL of H2O, frozen, and lyophilized in the vial for 3 days to give 1.45 g of white solid.
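
For illustration only, the nominal batch composition and the mass recovered in the SPE 75_2 fraction can be worked out as sketched below. The sketch assumes that the liquid volume is simply the sum of the buffer, enzyme, and water volumes (the 200 g of sucrose would in practice add appreciable volume) and applies the 90% UPLC ratio as a mass fraction; both are assumptions, and all names are hypothetical. The recovered-mass ratio is not a molar yield, since it covers a single SPE jar and the product carries an additional glucose unit.

```python
BUFFER_ML, BUFFER_M = 100.0, 1.0   # 1 M sodium acetate, pH 5.5
ENZYME_ML = 100.0                  # dextransucrase DexT crude extract
WATER_ML = 600.0
SUCROSE_G = 200.0
SUBSTRATE_G = 12.5                 # Mogroside IIIE
ISOLATED_G = 1.45                  # white solid from fraction SPE 75_2
COMPOUND_1_FRACTION = 0.90         # UPLC ratio, treated here as a mass fraction


def batch_summary():
    """Nominal concentrations, assuming solids add no volume (an assumption)."""
    total_l = (BUFFER_ML + ENZYME_ML + WATER_ML) / 1000.0
    return {
        "total_l": total_l,
        "acetate_M": BUFFER_M * BUFFER_ML / 1000.0 / total_l,
        "sucrose_g_per_l": SUCROSE_G / total_l,
        "substrate_g_per_l": SUBSTRATE_G / total_l,
        "isolated_per_substrate": ISOLATED_G / SUBSTRATE_G,
        "compound_1_g_estimate": ISOLATED_G * COMPOUND_1_FRACTION,
    }


if __name__ == "__main__":
    for key, value in batch_summary().items():
        print(f"{key}: {value:.3f}")
```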


As per the UPLC analysis (FIGS. 28 and 29), the mixture has 3 characterized α-mogroside isomers; 11-oxo-Compound 1, Compound 1 and mogroside V isomer in 5:90:5% ratios, respectively. No residual solvent and/or structurally unrelated impurities were observed based on 1H and 13C-NMR (Pyridine-d5+D2O) analysis.


Sensory Evaluation


Triangle testing for pure Compound 1 vs. 90% pure Compound 1 was performed on Nov. 10, 2016. Two different compositions: (1) LSB+175 ppm pure Compound 1 (standard) and (2) LSB+175 ppm 90% pure Compound 1 were tested. All samples of compositions were made with Low Sodium Buffer (LSB) pH 7.1 and contain 0% ethanol.


Conclusions: Panelists found that composition (1) LSB+175 ppm pure Compound 1 (standard) was not significantly different from composition (2) LSB+175 ppm 90% pure Compound 1 (test) (p>0.05). Some of the testing analytical results are shown in Tables 2-4.









TABLE 2

Frequency of panelists that correctly selected the different sample. n = 38 (19 panelists × 2 reps).

Samples                              Total
Incorrect                            24
Correct                              14
Total                                38
Correct Sample Selected (p-value)    0.381
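
As a non-authoritative sketch of how a triangle-test p-value of this kind may be obtained, the code below evaluates a one-sided exact binomial tail with a chance probability of 1/3 for 14 correct selections out of 38. The statistical method actually used is not stated here, so the choice of test, the treatment of the 38 evaluations as independent trials, and the function name are all assumptions.

```python
from math import comb


def triangle_test_p_value(n_trials, n_correct, p_chance=1.0 / 3.0):
    """One-sided binomial tail: P(X >= n_correct) under pure guessing."""
    return sum(
        comb(n_trials, k) * p_chance**k * (1.0 - p_chance) ** (n_trials - k)
        for k in range(n_correct, n_trials + 1)
    )


if __name__ == "__main__":
    p = triangle_test_p_value(38, 14)
    print(f"p = {p:.3f}")
    print("significant at 0.05" if p < 0.05 else "not significant at 0.05")
```

If the tabulated value was derived in this way, the computed tail probability should land close to it; in either case a value well above 0.05 supports the conclusion that the panel did not distinguish the two compositions.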

















TABLE 3

Analytical Results: Test Day

Theoretical # (μM)                                   Observed (μM)
175 ppm (155.51 μM) pure Compound 1 (standard)       132.20 ± 1.54 (n = 2)
175 ppm (155.56 μM) 90% pure Compound 1 (test)       157.62 ± 0.63 (n = 2)
















TABLE 4

Analytical results: the day before the testing day

Theoretical # (μM)                                   Observed (μM)
175 ppm (155.51 μM) pure Compound 1 (standard)       134.48 ± 7.31 (n = 2)
175 ppm (155.56 μM) 90% pure Compound 1 (test)       140.69 ± 4.34 (n = 2)
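
The relationship between the ppm and μM values in Tables 3 and 4, and the observed recovery relative to the theoretical concentration, can be sketched as follows. This is an illustrative calculation only: it assumes that ppm means mg/L (solution density near 1 g/mL), and the function names are hypothetical.

```python
def implied_molar_mass(ppm, micromolar):
    """Molar mass (g/mol) implied by a ppm value (taken as mg/L) and its μM value."""
    return ppm / micromolar * 1000.0


def recovery_percent(observed_um, theoretical_um):
    """Observed concentration as a percentage of the theoretical concentration."""
    return observed_um / theoretical_um * 100.0


# (label, theoretical μM, observed μM) as listed in Tables 3 and 4.
RESULTS = [
    ("test day, standard", 155.51, 132.20),
    ("test day, test", 155.56, 157.62),
    ("day before, standard", 155.51, 134.48),
    ("day before, test", 155.56, 140.69),
]

if __name__ == "__main__":
    print(f"implied molar mass: {implied_molar_mass(175.0, 155.51):.1f} g/mol")
    for label, theoretical, observed in RESULTS:
        print(f"{label}: {recovery_percent(observed, theoretical):.1f}% of theoretical")
```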









Example 62: Gene Expression in Recombinant Yeast Cells

DNA was obtained through gene synthesis either through GenScript, IDT, or Genewiz. For some of the cucurbitadienol synthases, cDNA or genomic DNA was obtained from 10-60 day old seedlings followed by PCR amplification using specific and degenerate primers. DNA was cloned through standard molecular biology techniques or through yeast gap repair cloning (Joska et al., 2014) into one of the following overexpression vectors: pESC-Ura, pESC-His, or pESC-LEU. Gene expression was regulated by one of the following promoters: Gal1, Gal10, Tef1, or GDS. Yeast transformation was performed using Zymo Yeast Transformation Kit II. Yeast strains were grown in standard media (YPD or SC) containing the appropriate selection with 2% glucose or 2% galactose for induction of heterologous genes. Yeast strains were grown in shake flasks or 96 well plates at 30° C., 140-250 rpm. When indicated, the lanosterol synthase inhibitor Ro 48-8071 (Cayman Chemicals) was added (50 ug/ml). Samples of mogrol and precursors produced by yeast were prepared through lysis (Yeast Buster), ethyl acetate extraction, drying, and resuspension in methanol. Samples were analyzed through the LCMS methods described below using an A/B gradient (A=H2O, B=acetonitrile):


For analyzing diepoxysqualene, the LCMS method included the use of a C18 2.1×50 mm column, 5% B for 1.5 min, gradient 5% to 95% B for 5.5 min, 95% B for 6 min, 100% B for 3 min, 5% B for 1.5 min, and all at a flow rate of 0.3 ml/min.


For analyzing cucurbitadienol, the first LCMS method included the use of a C4 2.1×100 mm column, gradient 1 to 95% B for 6 minutes, at a flow rate of 0.55 ml/min; and the second LCMS method included the use of a Waters Acquity UPLC Protein BEH C4 2.1×100 mm, 1.7 μm column, with guard, 62 to 67% B for 2 min, 100% B for 1 min, at a flow rate of 0.9 ml/min.


For analyzing 11-OH cucurbitadienol, the LCMS method included the use of a C8 2.1×100 mm column, gradient 60 to 90% B for 6 minutes, at a flow rate of 0.55 ml/min.


For analyzing Mogrol, the LCMS method included the use of a C8 2.1×100 mm column, gradient 50 to 90% B for 6 minutes, at a flow rate of 0.55 ml/min.


For analyzing Mogroside IIIE & Compound 1, the LCMS method included the use of Fluoro-phenyl 2.1×100 mm column, gradient 15 to 30% B for 6 minutes, at flow rate of 0.55 ml/min.
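The gradient programs above can be written down as a list of timed segments, which makes it straightforward to compute total run time, solvent consumption, and the solvent composition at any time point. The following is a minimal sketch for the diepoxysqualene method only. It assumes that the 5% to 95% B ramp lasts 5.5 min, that each gradient segment is linear, and that the final 5% B segment lasts 1.5 min; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    minutes: float      # duration of the segment
    start_pct_b: float  # % B (acetonitrile) at segment start
    end_pct_b: float    # % B at segment end (equal to start for a hold)


# Diepoxysqualene method, transcribed from the text (see assumptions above).
DIEPOXYSQUALENE_METHOD = [
    Segment(1.5, 5, 5),      # hold at 5% B
    Segment(5.5, 5, 95),     # linear ramp, 5% to 95% B
    Segment(6.0, 95, 95),    # hold at 95% B
    Segment(3.0, 100, 100),  # wash at 100% B
    Segment(1.5, 5, 5),      # re-equilibrate at 5% B
]
FLOW_ML_PER_MIN = 0.3


def total_run_time(segments):
    """Total method duration in minutes."""
    return sum(s.minutes for s in segments)


def percent_b_at(segments, t):
    """Programmed % B at time t (minutes), by linear interpolation."""
    elapsed = 0.0
    for s in segments:
        if t <= elapsed + s.minutes:
            fraction = (t - elapsed) / s.minutes
            return s.start_pct_b + fraction * (s.end_pct_b - s.start_pct_b)
        elapsed += s.minutes
    raise ValueError("time is beyond the end of the method")


def solvent_volumes_ml(segments, flow_ml_per_min):
    """Return (volume of A, volume of B) consumed over the whole run."""
    # For a linear segment, the mean % B is the average of its endpoints.
    volume_b = sum(
        s.minutes * flow_ml_per_min * (s.start_pct_b + s.end_pct_b) / 200.0
        for s in segments
    )
    total = total_run_time(segments) * flow_ml_per_min
    return total - volume_b, volume_b


run_minutes = total_run_time(DIEPOXYSQUALENE_METHOD)
volume_a, volume_b = solvent_volumes_ml(DIEPOXYSQUALENE_METHOD, FLOW_ML_PER_MIN)
print(f"run time: {run_minutes} min, A: {volume_a:.2f} ml, B: {volume_b:.2f} ml")
```

Under these assumptions one injection takes 17.5 min and consumes about 5.25 ml of mobile phase in total. The other methods can be encoded the same way by swapping in their segments and flow rates.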


Example 63

Step 1. Boosting Oxidosqualene Availability



Saccharomyces cerevisiae strain YHR072 (heterozygous for lanosterol synthase erg7) was purchased from GE Dharmacon. Expression of the active erg7 gene was reduced by replacing its promoter with that of cup1 (Peng et al., 2015). A truncated yeast HMG-CoA reductase (tHMG-CoA) under the control of the GDS promoter and yeast squalene epoxidase (erg1) under the control of the Tef1 promoter were integrated into the genome. The oxidosqualene boost was monitored by the production of diepoxysqualene, as shown by HPLC and UV absorbance (FIG. 31).

    • tHMG-CoA (protein) SEQ ID NO:898 (pathway 1)
    • tHMG-CoA (DNA) SEQ ID NO:897 (pathway 1)
    • Erg1 (protein) SEQ ID NO: 900; Erg1 (DNA) SEQ ID NO: 899


In some embodiments, tHMG-CoA enzyme is used for the production of diepoxysqualene.


Genes encoding putative squalene epoxidases in S. grosvenorii (Itkin et al., 2016) were selected to test for boosting oxidosqualene/diepoxysqualene production. The amino acid sequences and coding sequences of 3 squalene epoxidases can be found in Table 1 (SEQ ID NO: 50-56, 60, 61, 334 or 335). Additional sequences for squalene epoxidases suitable for use in the methods, systems and compositions disclosed herein for producing oxidosqualene and/or diepoxysqualene, and for boosting the production of oxidosqualene and/or diepoxysqualene, include: SQE1 (protein) SEQ ID NO: 908, SQE1 (DNA) SEQ ID NO: 909; SQE2 (protein) SEQ ID NO: 910, SQE2 (DNA) SEQ ID NO: 911; SQE3 (protein) SEQ ID NO: 912, and SQE3 (DNA) SEQ ID NO: 913.


Step 2. Cucurbitadienol Production


Cucurbitadienol Synthase Enzymes


Plasmids containing the S. grosvenorii cucurbitadienol synthase gene (SgCbQ) were transformed into a yeast strain with oxidosqualene boost. Strains were grown 1-3 days at 30° C., 150-250 rpm. Production of cucurbitadienol is shown in the HPLC and mass spectroscopy data, which show mass peaks for the indicated product (FIG. 23). The SgCbQ protein and the gDNA encoding SgCbQ are provided in SEQ ID NO: 446 and SEQ ID NO: 418, respectively. Cucurbita pepo (Jack O' Lantern) protein Cpep2 was also used for the production of cucurbitadienol in yeast. FIG. 24 shows the mass spectroscopy profile, which contains peaks and characteristic fragments that correspond with cucurbitadienol. The Cpep2 protein and the DNA encoding Cpep2 are provided in SEQ ID NO: 420 and SEQ ID NO: 421, respectively. Cucurbita pepo (Jack O' Lantern) protein Cpep4 was also used in the production of cucurbitadienol. The host cells were cultivated under the growth conditions with no inhibitor. Production of cucurbitadienol is demonstrated in the mass spectral data shown in FIG. 25. As shown, the peaks and fragments correspond to cucurbitadienol. The Cpep4 protein and the DNA encoding Cpep4 are provided in SEQ ID NO: 422 and SEQ ID NO: 423, respectively. The Cucurbita maxima protein Cmax1 was also used for the production of cucurbitadienol in yeast. FIG. 32 shows the mass spectroscopy profile, which contains peaks and characteristic fragments that correspond with cucurbitadienol. The Cmax1 protein sequence is provided in SEQ ID NO: 424, and the coding sequence for Cmax1 (DNA) is provided in SEQ ID NO: 425. Cucumis melo protein Cmelo was also used for the production of cucurbitadienol in yeast. FIG. 32 shows the mass spectroscopy profile, which contains peaks and characteristic fragments that correspond with cucurbitadienol. The Cmelo protein sequence is provided in SEQ ID NO: 902, and the coding sequence for Cmelo (DNA) is provided in SEQ ID NO: 901.
It is expected that Cucurbita moschata protein Cmos1 can also be used for the production of cucurbitadienol in recombinant host cells, for example yeast cells. The Cmos1 sequences, Cmos1 (protein) (SEQ ID NO: 426) and Cmos1 (DNA) (SEQ ID NO: 427), were obtained through alignment of the genomic DNA PCR product sequence with known cucurbitadienol synthase sequences available through public databases (Pubmed). It is expected that Cmos1 protein (SEQ ID NO: 426) can be used for the production of cucurbitadienol in recombinant host cells, for example yeast cells.


Converting Other Oxidosqualene Cyclases into a Cucurbitadienol Synthase


Plasmids containing modified oxidosqualene cyclase genes were transformed into a yeast strain with oxidosqualene boost. Strains were grown 1-3 days at 30° C., 150-250 rpm.


The protein PSX Y118L, derived from the native host Pisum sativum, was also used for the production of cucurbitadienol in yeast. FIG. 33 shows the mass spectroscopy profile, which contains peaks and characteristic fragments that correspond with cucurbitadienol when the tyrosine at position 118 is converted into leucine. The sequences for the protein and DNA encoding the modified oxidosqualene cyclase are: PSXY118L (protein) (SEQ ID NO: 904) and PSXY118L (DNA, codon optimized) (SEQ ID NO: 903).


The oxidosqualene cyclase from Dictyostelium sp. was also used for the production of cucurbitadienol in yeast. As shown in FIG. 34, an HPLC peak of cucurbitadienol is observed when the tyrosine at position 80 is converted into leucine. The sequences for the protein and DNA encoding the modified oxidosqualene cyclase are: DdCASY80L (protein) (SEQ ID NO: 906) and DdCASY80L (DNA) (SEQ ID NO: 905).


Improving Cucurbitadienol Synthase Activities


The gene encoding a cucurbitadienol synthase from Cucumis melo was codon optimized (SEQ ID NO: 907) and used as a starting point for generating a library of modifications. Modifications were introduced through standard molecular biology techniques and consisted of fusion peptides at the N-terminus (i.e., 5′) or C-terminus (i.e., 3′) end of the enzyme. Plasmid libraries of modified cucurbitadienol synthase genes were transformed into a yeast strain with oxidosqualene boost. Enzyme activities were measured by ratios of peak heights or areas of the 409 and 427 positive mass fragments at the expected retention times for cucurbitadienol vs. an internal standard using LCMS method 2 described above. Enzyme performance was scored as the average % activity relative to the average activity of the parent enzyme (n=8). Step 1 sequences of the enzymes and the sequences that encode the enzymes can be found in SEQ ID NOs: 951-1012. Step 1 sequences also include the fusions SS2c-G10, SS2e-A7b, SS2d-G11, SS2e-A7a, SS4d-G5, SS4d-C7, SS3b-D8, and SS2c-A10a as described in Table 1.
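The scoring described above (analyte signal normalized to an internal standard, then averaged and expressed as a percentage of the parent enzyme) can be sketched as follows. This is an illustration only: the peak areas are made-up placeholder values, not data from this disclosure, and the function names are hypothetical.

```python
from statistics import mean


def normalized_response(analyte_area: float, internal_standard_area: float) -> float:
    """Ratio of the cucurbitadienol fragment signal to the internal standard signal."""
    if internal_standard_area <= 0:
        raise ValueError("internal standard area must be positive")
    return analyte_area / internal_standard_area


def percent_of_parent(variant_ratios, parent_ratios) -> float:
    """Average variant response as a percentage of the average parent response."""
    return 100.0 * mean(variant_ratios) / mean(parent_ratios)


# Hypothetical (analyte area, internal standard area) pairs, n = 8 each.
parent_runs = [(980, 500), (1020, 500), (1000, 500), (1010, 500),
               (990, 500), (1005, 500), (995, 500), (1000, 500)]
variant_runs = [(1500, 500), (1480, 500), (1520, 500), (1500, 500),
                (1490, 500), (1510, 500), (1500, 500), (1500, 500)]

parent_ratios = [normalized_response(a, s) for a, s in parent_runs]
variant_ratios = [normalized_response(a, s) for a, s in variant_runs]

score = percent_of_parent(variant_ratios, parent_ratios)
print(f"variant activity: {score:.0f}% of parent")
```

Normalizing each injection to its internal standard before averaging keeps injection-to-injection variation in the LCMS signal from being counted as a difference in enzyme activity.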


Step 3. Production of 11-OH Cucurbitadienol


CYP87D18 (CYP450, S. grosvenorii) and SgCPR (CYP450 reductase, S. grosvenorii) were expressed in an S. cerevisiae strain producing cucurbitadienol. 11-OH cucurbitadienol (i.e., 11-hydroxy cucurbitadienol) was observed using HPLC and mass spectroscopy (FIG. 36). The S. grosvenorii CYP87D18 protein sequence is shown in SEQ ID NO: 872, and the CYP87D18 (codon optimized DNA) coding sequence is shown in SEQ ID NO: 871. The S. grosvenorii SgCPR protein sequence is shown in SEQ ID NO: 874, and the SgCPR1 (codon optimized DNA) coding sequence is shown in SEQ ID NO: 873.


Additional CYP450s from S. grosvenorii and Glycyrrhiza (CYP88D6) were expressed in an S. cerevisiae strain producing cucurbitadienol. Protein sequences and DNA coding sequences for the enzymes are provided in SEQ ID NOs: 875-890.


Step 4. Production of Mogrol


CYP1798 (CYP450 enzyme, S. grosvenorii) and EPH2A (epoxide hydrolase, S. grosvenorii) were expressed in an S. cerevisiae strain producing 11-OH cucurbitadienol. Mogrol was observed using HPLC and mass spectroscopy (FIG. 36). DNA coding sequences and protein sequences for the enzymes are provided in SEQ ID NOs: 891-894.


Epoxidation of Cucurbitadienol and/or 11-OH Cucurbitadienol


Additional CYP450s and SQEs from S. grosvenorii and Glycyrrhiza (CYP88D6) were also expressed in an S. cerevisiae strain producing cucurbitadienol or 11-OH cucurbitadienol to test for epoxidation.


For SQEs, protein and DNA coding sequences for the enzymes are provided in SEQ ID NOs: 882-888. For CYP450s, protein and DNA coding sequences for the enzymes are provided in SEQ ID NOs: 875-890.


Step 7: Production of Compound 1 from Mogroside IIIE in S. cerevisiae.



An S. cerevisiae strain expressing a truncated dextransucrase (tDexT) was incubated in YPD (30° C., 250 rpm) containing 7 mg/ml Mogroside V for 1-2 days, resulting in hydrolysis to Mogroside IIIE. The S. cerevisiae cells were harvested, lysed, and then mixed back with the YPD supernatant containing Mogroside IIIE. To initiate the dextransucrase reaction, sucrose was added to a final concentration of 200 g/L, followed by incubation at 30° C., 250 rpm for 2 days. Production of Compound 1 was observed using HPLC (FIG. 37). The protein sequence for tDexT is shown in SEQ ID NO: 896, and the DNA coding sequence for tDexT is shown in SEQ ID NO: 895.


Example 64


S. cerevisiae or Y. lipolytica was grown in the presence of Mogroside V to allow the hydrolytic enzymes in the yeast to generate Mogroside IIIE. After 1 or 2 days, the cells were lysed and analyzed by HPLC to determine the mogroside content. After 1 day of incubation, S. cerevisiae produced a mixture of Mogroside V, Mogroside IV, and Mogroside IIIE. After 2 days of incubation, substantially all of the mogrosides were converted to Mogroside IIIE, as shown in FIG. 40A.


Similarly, after 2 days of incubation Y. lipolytica produced mostly Mogroside IIIE (shown in FIG. 40B).


Example 65


S. cerevisiae or Y. lipolytica was grown in the presence of Compound 1. Unlike with other mogrosides (see Example 64), no hydrolysis products of Compound 1 were observed, as shown in FIG. 41.


Example 66


S. cerevisiae was modified to overexpress a dextransucrase (DexT). This modified strain was grown in the presence of a mogroside mixture to allow the hydrolytic enzymes in S. cerevisiae to generate Mogroside IIIE. After 2 days of incubation, the cells were lysed to release the DexT enzyme and supplemented with sucrose. After 24 hours, significant amounts of Compound 1 were produced (shown in FIG. 42).


Example 67: Generation of Fusion Proteins Having Cucurbitadienol Synthase Activity

A collection or library of S. cerevisiae in-frame fusion polynucleotides for a cucurbitadienol synthase gene (DNA coding sequence provided in SEQ ID NO: 907, and protein sequence provided in SEQ ID NO: 902) was prepared. The in-frame fusion polynucleotides were cloned into a yeast vector molecule to generate fusion proteins.


Various fusion proteins were generated and tested for cucurbitadienol synthase activity. The testing results for some of the fusion proteins generated in this example are shown in Table 2.









TABLE 2

Cucurbitadienol synthase activities for the fusion proteins

SEQ ID NO for     Activity (as compared    SEQ ID NO for     Activity (as compared
fusion protein    to the parent)           fusion protein    to the parent)
1024              166%                     851               142%
854               135%                     856               123%
859               105%                     862               102%
865               125%                     867               145%
915               124%                     920               124%
924               121%                     928               117%
932               128%                     936               126%
940               109%                     944               107%
948               102%                     952               90%
956               85%                      959               46%
964               74%                      967               72%
971               89%                      975               35%
979               96%                      983               80%
987               111%                     991               114%
995               124%                     999               103%
1003              118%                     1007              97%










Example 68: UDP-Glycosyltransferases (311 Enzyme, SEQ IDs: 436-438) in the Presence of Mogroside IIIE, Mogroside IVE or Mogroside IVA to Produce Mogroside IV and Mogroside V Isomers

Reaction conditions: To a 50 ml Falcon tube with 17 ml water, 3 ml of 1 M Tris-HCl (pH 7.0), 0.12 g UDP (Carbosynth), 3 g sucrose, 300 μl of protease inhibitor 100×M221, 150 μl of Kanamycin (50 mg/ml), 1.185 ml sucrose synthase Sus1 (1 mg/ml crude extract), 150 mg of starting Mogrosides, and 6 ml 311 enzyme (1 mg/ml crude extract) were added and incubated at 30° C., 150 rpm. The progress of the reaction was monitored periodically by LC-MS. After 3 days, the reaction was stopped by heating to 80° C. for 30 minutes with stirring (500 rpm). The reaction was then centrifuged (4000 rpm for 10 min, Eppendorf) and the supernatant was filtered through a 50 ml, 0.22 micron PVDF filter. The reaction products identified are depicted in FIG. 43.
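The approximate final concentrations in the reaction mixture above follow from the listed volumes and masses. The sketch below is a rough estimate only: it assumes the liquid volumes are additive and ignores the volume contributed by the dissolved solids (UDP, sucrose, mogrosides). The variable names are hypothetical.

```python
# Liquid components in ml, transcribed from the reaction conditions.
liquids_ml = {
    "water": 17.0,
    "Tris-HCl, 1 M, pH 7.0": 3.0,
    "protease inhibitor (100x stock)": 0.3,
    "kanamycin (50 mg/ml stock)": 0.15,
    "Sus1 crude extract (1 mg/ml)": 1.185,
    "311 enzyme crude extract (1 mg/ml)": 6.0,
}

# Assumption: volumes are additive and dissolved solids add no volume.
total_ml = sum(liquids_ml.values())

tris_mm = 1000.0 * liquids_ml["Tris-HCl, 1 M, pH 7.0"] / total_ml
kanamycin_mg_per_ml = 50.0 * liquids_ml["kanamycin (50 mg/ml stock)"] / total_ml
enzyme_311_mg_per_ml = 1.0 * liquids_ml["311 enzyme crude extract (1 mg/ml)"] / total_ml
sucrose_g_per_l = 3.0 / total_ml * 1000.0   # 3 g sucrose
mogroside_mg_per_ml = 150.0 / total_ml      # 150 mg starting mogrosides

print(f"total volume:   {total_ml:.3f} ml")
print(f"Tris-HCl:       {tris_mm:.0f} mM")
print(f"kanamycin:      {kanamycin_mg_per_ml:.2f} mg/ml")
print(f"311 enzyme:     {enzyme_311_mg_per_ml:.2f} mg/ml crude extract")
print(f"sucrose:        {sucrose_g_per_l:.0f} g/L")
print(f"mogrosides:     {mogroside_mg_per_ml:.1f} mg/ml")
```

Under these assumptions the mixture is about 27.6 ml, with roughly 109 mM Tris-HCl, about 109 g/L sucrose, and about 5.4 mg/ml starting mogrosides.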









TABLE 1







Some protein and DNA sequences disclosed herein












SEQ



Protein/DNA

ID



Description
Protein/DNA Sequence
NO
Reference













Cyclomaltodextrin
MKEKDKLKVNRNNVNFSKDIIYQIVTDRFHNGCPSYNPKGGLYDESRKNKKKYFGGDWIGIIEK
1



glucanotransferase
LNTNYFTELGVTSLWISQPVENIFTPINDLVGSTSYHGYWARDFKRTNPFFGTFGDFQTLITTA




(CGTase; Bacillus)
HAKDIKIIMDFAPNHTSPALHDDATYAENGRLYDNGLLLGGYDNDYNHYFHHNGGTDFEEYEDG





VYRNLFDLADLNHQNIAIDLYFKEAIKLWLDQGIDGIRVDAVKHMSYGWQKSWLNSIYNYRPVF





IFGEWYINPNEYDHRNVHFANNSGMSLLDFSFAHKVREVFRDGMDSMHGLHKMIEETYQIYNDV





NNLVTFIDNHDMDRFHINGQSKRRIEQSLVFLLTSRGIPSVYYGTEQYMVGNGDPNNRGQMESF





DVNTDNFKIIQSLSSLRSLNYALPYGNTKERYITNDIYVYERYFGSDVVLIALNRNLTEGYEIK





DVKTILPSRKYKDILDGLLDGEAIRVENNNIDSLWLGPGSGQVWHHKGVNSIPLIGTVGHKMTT





VGQIICIEGCGFTSKKGSVLFEEKEAEVVSWSHTSIKVKVPAVNDGKYEITVVTDTGTRSNIYK





HIEVLNTKQVCIRFVIENGYEIPESEVFIMGNTYSLGNMNPCKAVGPFFNQIMYQFPTGYFDIS





VPADTLLEFKFIRKINNTLLIEGGENHKYRTPSFGTGEVVVKWQTAEKTILVES




DexT protein
MPANAPDKQSVTNAPVVPPKHDTDQQDDSLEKQQVLEPSVNSNIPKKQTNQQLAVVTAPANSAP
2




QTKTTAEISAGTELDTMPNVKHVDGKVYFYGDDGQPKKNFTTIIDGKPYYFDKDTGALSNNDKQ





YVSELFSIGNKHNAVYNTSSDNFTQLEGHLTASSWYRPKDILKNGKRWAPSTVTDFRPLLMAWW





PDKSTQVTYLNYMKDQGLLSGTHHFSDNENMRTLTAAAMQAQVNIEKKIGQLGNTDWLKTAMTQ





YIDAQPNWNIDSEAKGDDHLQGGALLYTNSDMSPKANSDYRKLSRTPKNQKGQIADKYKQGGFE





LLLANDVDNSNPVVQAEQLNWLHYMMNIGSILQNDDQANFDGYRVDAVDNVDADLLQIAGEYAK





AAYGVDKNDARANQHLSILEDWGDEDPDYVKAHGNQQITMDFPLHLAIKYALNMPNDKRSGLEP





TREHSLVKRITDDKENVAQPNYSFIRAHDSEVQTIIADIIKDKINPASTGLDSTVTLDQIKQAF





DIYNADELKADKVYTPYNIPASYALLLTNKDTIPRVYYGDMFTDDGQYMAKQSPYYQAIDALLK





ARIKYAAGGQTMKMNYFPDEQSVMTSVRYGKGAMTASDSGNQETRYQGIGLVVNNRPDLKLSDK





DEVKMDMGAAHKNQDYRPVLLTTKSGLKVYSTDANAPVVRTDANGQLTFKADMVYGVNDPQVSG





YIAAWVPVGASENQDARTKSETTQSTDGSVYHSNAALDSQVIYEGFSNFQDFPTTPDEFTNIKI





AQNVNLFKDWGITSFEMAPQYRASSDKSFLDAIVQNGYAFTDRYDIGYNTPTKYGTADNLLDAL





RALHGQGIQAINDWVPDQIYNLPDEQLVTAIRTDGSGDHTYGSVIDHTLYASKTVGGGIYQQQY





GGAFLEQLKTQYPQLFQQKQISTDQPMNPDIQIKSWEAKYFNGSNIQGRGAWYVLKDWGTQQYF





NVSDAQTFLPKQLLGEKAKTGFVTRGKETSFYSTSGYQAKSAFICDNGNWYYFDDKGKMVVGNQ





VINGINYYFLPNGIELQDAYLVHDGMYYYYNNIGKQLHNTYYQDKQKNFHYFFEDGHMAQGIVT





IIQSDGTPVTQYFDENGKQQKGVAVKGSDGHLHYFDGASGNMLFKSWGRLADGSWLYVDEKGNA





VTGKQTINNQTVYFNDDGRQIKNNFKELADGSWLYLNNKGVAVTGEQIINGQTLYFGNDGRQFK





GTTHINATGESRYYDPDSGNMITDRFERVGDNQWAYFGYDGVAVTGDRIIKGQKLYFNQNGIQM





KGHLRLENGIMRYYDADTGELVRNRFVLLSDGSWVYFGQDGVPVTGVQVINGQTLYFDADGRQV





KGQQRVIGNQRYWMDKDNGEMKKITYAAALEHHHHHH




CGTase CGT-SL
MKRWLSVVLSMSLVFSAFFLVSDTQKVTVEAAGNLNKVNFTSDIVYQIVVDRFVDGNTSNNPSG
3




SLFSSGCTNLRKYCGGDWQGIINKINDGYLTEMGVTAIWISQPVENVFAVMNDADGSTSYHGYW





ARDFKKTNPFFGTLSDFQRLVDAAHAKGIKVIIDFAPNHTSPASETNPSYMENGRLYDNGTLIG





GYTNDTNSYFHHNGGTTFSNLEDGIYRNLFDLADFNHQNQFIDKYLKDAIKLWLDMGIDGIRMD





AVKHMPFGWQKSFMDEVYDYRPVFTFGEWFLSENEVDSNNHFFANESGMSLLDFRFGQKLRQVL





RNNSDDWYGFNQMIQDTASAYDEVIDQVTFIDNHDMDRFMADEGDPRKVDIALAVLLTSRGVPN





IYYGTEQYMTGNGDPNNRKMMTSFNKNTRAYQVIQKLSSLRRSNPALSYGDTEQRWINSDVYIY





ERQFGKDVVLVAVNRSLSKSYSITGLFTALPSGTYTDQLGALLDGNTIQVGSNGAVNAFNLGPG





EVGVWTYSAAESVPIIGHIGPMMGQVGHKLTIDGEGFGTNVGTVKFGNTVASVVSWSNNQITVT





VPNIPAGKYNITVQTSGGQVSAAYDNFEVLTNDQVSVRFVVNNANTNWGENIYLVGNVHELGNW





NTSKAIGPLFNQVIYSYPTWYVDVSVPEGKTIEFKFIKKDGSGNVIWESGSNHVYTTPTSTTGT





VNVNWQY




UGT73C3 protein
MATEKTHQFHPSLHFVLFPFMAQGHMIPMIDIARLLAQRGVTITIVTTPHNAARFKNVLNRAIE
4
SEQ ID NO:21



SGLAINILHVKFPYQEFGLPEGKENIDSLDSTELMVPFFKAVNLLEDPVMKLMEEMKPRPSCLI

in



SDWCLPYTSIIAKNFNIPKIVFHGMGCFNLLCMHVLRRNLEILENVKSDEEYFLVPSFPDRVEF

WO2016050890



TKLQLPVKANASGDWKEIMDEMVKAEYTSYGVIVNTFQELEPPYVKDYKEAMDGKVWSIGPVSL

(which is



CNKAGADKAERGSKAAIDQDECLQWLDSKEEGSVLYVCLGSICNLPLSQLKELGLGLEESRRSF

incorporated



IWVIRGSEKYKELFEWMLESGFEERIKERGLLIKGWAPQVLILSHPSVGGFLTHCGWNSTLEGI

by reference



TSGIPLITWPLFGDQFCNQKLVVQVLKAGVSAGVEEVMKWGEEDKIGVLVDKEGVKKAVEELMG

in its



DSDDAKERRRRVKELGELAHKAVEKGGSSHSNITLLLQDIMQLAQFKN

entirety)


UGT73C6 protein
MAFEKNNEPFPLHFVLFPFMAQGHMIPMVDIARLLAQRGVLITIVTTPHNAARFKNVLNRAIES
5
SEQ ID NO: 23



GLPINLVQVKFPYQEAGLQEGQENMDLLTTMEQITSFFKAVNLLKEPVQNLIEEMSPRPSCLIS

in



DMCLSYTSEIAKKFKIPKILFHGMGCFCLLCVNVLRKNREILDNLKSDKEYFIVPYFPDRVEFT

WO2016050890



RPQVPVETYVPAGWKEILEDMVEADKTSYGVIVNSFQELEPAYAKDFKEARSGKAWTIGPVSLC





NKVGVDKAERGNKSDIDQDECLEWLDSKEPGSVLYVCLGSICNLPLSQLLELGLGLEESQRPFI





WVIRGWEKYKELVEWFSESGFEDRIQDRGLLIKGWSPQMLILSHPSVGGFLTHCGWNSTLEGIT





AGLPMLTWPLFADQFCNEKLVVQILKVGVSAEVKEVMKWGEEEKIGVLVDKEGVKKAVEELMGE





SDDAKERRRRAKELGESAHKAVEEGGSSHSNITFLLQDIMQLAQSNN




UGT85C2 sequence
MDAMATTEKKPHVIFIPFPAQSHIKAMLKLAQLLHHKGLQITFVNTDFIHNQFLESSGPHCLDG
6
SEQ ID NO: 25



APGFRFETIPDGVSHSPEASIPIRESLLRSIETNFLDRFIDLVTKLPDPPTCIISDGFLSVFTI

in



DAAKKLGIPVMMYWTLAACGFMGFYHIHSLIEKGFAPLKDASYLTNGYLDTVIDWVPGMEGIRL

WO2016050890



KDFPLDWSTDLNDKVLMFTTEAPQRSHKVSHHIFHTFDELEPSIIKTLSLRYNHIYTIGPLQLL





LDQIPEEKKQTGITSLHGYSLVKEEPECFQWLQSKEPNSVVYVNFGSTTVMSLEDMTEFGWGLA





NSNHYFLWIIRSNLVIGENAVLPPELEEHIKKRGFIASWCSQEKVLKHPSVGGFLTHCGWGSTI





ESLSAGVPMICWPYSWDQLTNCRYICKEWEVGLEMGTKVKRDEVKRLVQELMGEGGHKMRNKAK





DWKEKARIAIAPNGSSSLNIDKMVKEITVLARN




UGT73C5 protein
MVSETTKSSPLHFVLFPFMAQGHMIPMVDIARLLAQRGVIITIVTTPHNAARFKNVLNRAIESG
7
SEQ ID NO: 22



LPINLVQVKFPYLEAGLQEGQENIDSLDTMERMIPFFKAVNFLEEPVQKLIEEMNPRPSCLISD

in



FCLPYTSKIAKKFNIPKILFHGMGCFCLLCMHVLRKNREILDNLKSDKELFTVPDFPDRVEFTR

WO2016050890



TQVPVETYVPAGDWKDIFDGMVEANETSYGVIVNSFQELEPAYAKDYKEVRSGKAWTIGPVSLC





NKVGADKAERGNKSDIDQDECLKWLDSKKHGSVLYVCLGSICNLPLSQLKELGLGLEESQRPFI





WVIRGWEKYKELVEWFSESGFEDRIQDRGLLIKGWSPQMLILSHPSVGGFLTHCGWNSTLEGIT





AGLPLLTWPLFADQFCNEKLVVEVLKAGVRSGVEQPMKWGEEEKIGVLVDKEGVKKAVEELMGE





SDDAKERRRRAKELGDSAHKAVEEGGSSHSNISFLLQDIMELAEPNN




UGT73E1 protein
MSPKMVAPPTNLHFVLFPLMAQGHLVPMVDIARILAQRGATVTIITTPYHANRVRPVISRAIAT
8
SEQ ID NO: 24



NLKIQLLELQLRSTEAGLPEGCESFDQLPSFEYWKNISTAIDLLQQPAEDLLRELSPPPDCIIS

in



DFLFPWTTDVARRLNIPRLVFNGPGCFYLLCIHVAITSNILGENEPVSSNTERVVLPGLPDRIE

WO2016050890



VTKLQIVGSSRPANVDEMGSWLRAVEAEKASFGIVVNTFEELEPEYVEEYKTVKDKKMWCIGPV





SLCNKTGPDLAERGNKAAITEHNCLKWLDERKLGSVLYVCLGSLARISAAQAIELGLGLESINR





PFIWCVRNETDELKTWFLDGFEERVRDRGLIVHGWAPQVLILSHPTIGGFLTHCGWNSTIESIT





AGVPMITWPFFADQFLNEAFIVEVLKIGVRIGVERACLFGEEDKVGVLVKKEDVKKAVECLMDE





DEDGDQRRKRVIELAKMAKIAMAEGGSSYENVSSLIRDVTETVRAPH




UGT98 protein
MDAQRGHTTTILMFPWLGYGHLSAFLELAKSLSRRNFHIYFCSTSVNLDAIKPKLPSSSSSDSI
9
SEQ ID NO: 53



QLVELCLPSSPDQLPPHLHTTNALPPHLMPTLHQAFSMAAQHFAAILHTLAPHLLIYDSFQPWA

in



PQLASSLNIPAINFNTTGASVLTRMLHATHYPSSKFPISEFVLHDYWKAMYSAAGGAVTKKDHK

WO2016050890



IGETLANCLHASCSVILINSFRELEEKYMDYLSVLLNKKVVPVGPLVYEPNQDGEDEGYSSIKN





WLDKKEPSSTVFVSFGSEYFPSKEEMEEIAHGLEASEVHFIWVVRFPQGDNTSAIEDALPKGFL





ERVGERGMVVKGWAPQAKILKHWSTGGFVSHCGWNSVMESMMFGVPIIGVPMHLDQPFNAGLAE





EAGVGVEAKRDSDGKIQREEVAKSIKEVVIEKTREDVRKKAREMGEILRSKGDEKIDELVAEIS





LLRKKAPCSI




UGT1495 gene
ATGCTTCCATGGCTGGCTCACGGCCATGTCTCCCCTTTCTTCGAGCTCGCCAAGTTGCTCGCCG
10
SEQ ID NO: 27


sequence
CTAGAAACTTCCACATATTCTTCTGCTCCACCGCCGTAAACCTCCGCTCCGTCGAACCAAAACT

in



CTCTCAGAAGCTCTCCTCCCACGTGGAGCTGGTGGAGCTCAACCTACCGCCCTCGCCGGAGCTC

WO2016050890



CCTCCGCACCGCCACACCACCGCCGGCCTTCCACCGCACCTCATGTTCTCGCTCAAGCGAGCTT





TCGACATGGCCGCTCCCGCCTTCGCCGCCATCCTCCGCGACCTGAACCCGGACTTGCTCATCTA





CGACTTCCTGCAGCCGTGGGCGGCGGCGGAGGCTCTGTCGGCGGATATTCCGGCCGTGATGTTC





AAAAGCACGGGTGCGCTCATGGCGGCCATGGTCGCGTACGAGCTGACGTTTCCGAACTCTGATT





TTTTCTCGCTTTTCCCTGAGATTCGTCTCTCCGAGTGCGAGATTAAACAGCTGAAGAACTTGTT





TCAATGTTCTGTGAATGATGCGAAAGACAAGCAAAGGATTAAGGGATGTTATGAGAGATCTTGC





GGCATGATTTTGGTGAAATCTTTCAGAGAAATCGAAGGCAAATATATTGATTTTCTCTCTACTC





TGCTGGGCAAGAAGGTTGTTCCAGTTGGTCCACTTGTTCAACAAACAGAAGACGACGTCGTATC





AGGAAGTTTTGACGAATGGCTAAATGGAAAAGATAGATCGTCTTCCATACTCGTGTCTTTCGGA





AGCGAGTTCTACCTGTCCAGAGAAGACATGGAAGAGATCGCGCATGGCTTAGAGCTGAGCCAGG





TGAACTTCATATGGGTCGTCAGGTTTCCGGCGGGAGGAGAGAGAAACACGACAAAGGTGGAAGA





AGAACTGCCAAAAGGGTTTCTAGAGAGAGTTAGAGAGAGAGGGATGGTGGTGGAGGGCTGGGCG





CCGCAGGCTCAGATCTTGAAACATCCAAGCGTCGGCGGATTCCTCAGCCACTGCGGGTGGAGCT





CCGTCGTGGAGAGCATGAAATTCGGCGTTCCGATCATCGCCATGCCGATGCACCTCGACCAGCC





GCTGAATTCCCGGCTGGTCGAGCGGCTCGGCGTCGGCGTAGTGGTGGAGAGAGACGGCCGCCTC





CGGGGAGAGGTGGAGAGAGTTGTCAGAGAGGTGGTGGTGGAGAAAAGTGGAGAGAGAGTGAGGA





AGAAGGTGGAGGAGTTTGCAGAGATCATGAAGAAGAAAAAAGACAATGAAGAGATGGACGTAGT





CGTGGAAGAGTTGGTGACGCTCTGCAGGAAGAAGAAGAAGGAGGAGGATTTACAGAGTAATTAT





TGGTGCAGAACCGCCATTGATGACCATTGTTCTGAAGTCGTGAAGATTGAAGATGCTGCAGCAG





CCGACGAGGAGCCTCTTTGCAAATAA




UGT1817 gene
ATGGCTGTCACTTACAGCCTGCACATAGCAATGTACCCTTGGTTTGCTTTCGGCCACTTGACTC
11
SEQ ID NO: 28


sequence
CATTTCTCCAAGTCTCCAACAAGCTTGCCAAGGAAGGCCACAAAATCTCCTTCTTCATCCCAAC

in



GAAAACGCTAACCAAATTGCAGCCTTTCAATCTCTTTCCAGATCTCATTACCTTTGTCCCCATC

WO2016050890



ACTGTTCCTCATGTTGATGGTCTCCCTCTTGGAGCTGAGACTACTGCTGATGTTTCTCACCCTT





CACAGCTCAGTCTCATCATGACTGCTATGGATTGCACCCAACCCGAAATCGAGTGTCTTCTTCG





AGACATAAAACCTGATGCCATCTTCTTCGATTTCGCGCACTGGGTGCCAAAATTGGCATGTGGA





TTGGGCATTAAGTCGATTGATTACAGTGTCTGTTCTGCAGTATCAATTGGTTATGTTTTGCCCC





TATTAAGGAAAGTTTGTGGACAAGATTTATTAACTGAAGATGATTTTATGCAGCCATCTCCTGG





CTACCCGAGTTCCACCATCAATCTTCAAGCTCATGAGGCTCGATATTTTGCATCTCTGAGCCGC





TGGAGGTTTGGCAGTGATGTCCCTTTCTTTAGTCGCCATCTTACTGCACTTAATGAATGCAATG





CTTTAGCATTCAGGTCATGTAGGGAGATTGAAGGGCCTTTTATAGACTATCCAGAAAGTGAATT





AAAAAAGCCTGTGTTGCTTTCCGGAGCAGTGGATCTACAACCGCCAACCACAACTGTAGAAGAA





AGATGGGCAAAATGGCTATCAGGGTTCAACACCGACTCGGTCGTATATTGTGCATTTGGAAGTG





AGTGTACCTTAGCAAAAGACCAATTCCAAGAACTGCTGTTGGGTTTTGAGCTTTCAAATATGCC





ATTCTTTGCTGCACTTAAACCACCTTTTGGTGTTGACTCGGTTGAAGCAGCCTTGCCTGAAGGT





TTTGAACAGAGAGTTCAGGGAAGAGGGGTGGTCTATGGGGGATGGGTCCAACAGCAGCTCATTT





TGGAGCACCCATCAATTGGATGCTTTGTTACACATTGTGGATCAGGCTCCTTATCAGAGGCGTT





AGTGAAGAAGTGTCAATTAGTGTTGTTACCTCGTATCGGTGACCACTTTTTCCGAGCAAGAATG





TTGAGCAATTATTTGAAAGTTGGTGTGGAGGTAGAGAAAGGAGAAGGAGATGGATCTTTTACAA





AGGAAAGTGTGTGGAAGGCAGTGAAGACAGTGATGGATGAAGAGAATGAAACTGGGAAAGAGTT





CAGAGCGAACCGTGCCAAGATAAGAGAGCTATTGCTCGACGAAGATCTCGAGGAGTCTTATATC





AACAATTTCATCCACAGCCTGCATACTTTGAATGCATGA




UGT5914 gene
ATGGAAGCTAAGAACTGCAAAAAGGTTCTGATGTTCCCATGGCTGGCGCATGGTCACATATCAC
12
SEQ ID NO: 30


sequence
CATTTGTAGAGCTGGCCAAGAAGCTCACAGACAACAACTTCGCCGTTTTTCTATGTTCTTCCCC

in



TGCAAATCTTCAAAACGTCAAGCCAAAACTCCCCCATCACTACTCTGATTCCATTGAACTCGTG

WO2016050890



GAGCTCAACCTTCCATCGTCGCCGGAGCTTCCCCCTCATATGCACACCACCAATGGCCTCCCTT





TGCATTTAGTTCCCACCCTCGTTGACGCCTTGGACATGGCCGCTCCGCACTTCTCCGCCATTTT





ACAGGAACTGAATCCAGATTTTCTCATATTCGACATCTTCCAACCCTGGGCGGCTGAAATCGCT





TCCTCCTTCGGCGTTCCTGCTATTTTGTTGCTTATCGTTGGATCTGCTATAACCGCTTTAGGGG





TTCATTTTGTCCGGAGCTCCGGTACGGAATTCCCCTTTCCCGAGCTTACTAAATCATTCAAGAA





GGAGGACGACCGAAAACCTCCAGGAGATTCCGGCAACGATAGAGGAAAACGGCTATTCAAATGT





CTGCTGGACCTGGAACATTCTTCAGAGACTATTTTGGTGAACAGTTTTACAGAGATAGAGGGCA





AATATATGGACTATCTCTCGGTCTTACTGAAGAAGAAGATCCTTCCGATTGGTCCTTTGGTTCA





GAAAATTGGCTCCGATGACGATGAATCGGGAATCCTCCGGTGGCTTGACAAGAAGAAACCGAAT





TCAACTGTGTACGTTTCGTTCGGGAGTGAGTACTATTTGAGCAAAGAAGACATAGCAGAGCTTG





CGCATGGTCTGGAAATCAGCGGCGTCAATTTCATCTGGATTGTTCGGTTTCCAAAGGGAGAGAA





AATCGCCATTGAAGAGGCATTACCAGATGAATTTCTTGAAAGAGTCGGAGAGAGAGGCGTCGTC





GTTGATGGATGGGCGCCGCAGATGAAAATATTAGGGCATTCGAGCGTCGGCGGGTTTCTGTCTC





ACTGCGGATGGAACTCTGTGCTGGAGAGTCTGGTGCTCGGCGTGCCGATCATATCCCTGCCGAT





ACACCTCGAACAGCCGTGGAACGCCTTGGTAGCGGAGCACGTCGGCGTTTGTGTGAGGGCGAAG





AGAGACGACGGAGGAAATCTTCAAAGAGAGTTGGTGGCGGAGGCCATTAAAGAAGTGGTGGTTG





AGGAAACAGGAGCGGAACTGAGAAGCAAAGCAAGAGTAATTAGTGAAATCTTGAAAAATAAAGA





AGCTGAAACAATACAAGATTTGGTGGCTGAGCTTCACCGGCTTTCTGACGCAAGAAGAGCTTGT





TGA




UGT8468 (gene
ATGGAAAAAAATCTTCACATAGTGATGCTTCCATGGTCGGCGTTCGGCCATCTCATACCATTTT
13
SEQ ID NO: 31


sequence)
TTCACCTCTCCATAGCCTTAGCCAAAGCCAAAGTTTATATCTCCTTCGTCTCCACTCCAAGAAA

in



TATTCAGAGACTYCCCCAAATCCCGCCGGACTTAGCTTCTTTCATAGATTTGGTGGCCATTCCC

WO2016050890



TTGCCGAGACTCGACGACGATCTGTTGCTAGAATCTGCAGAGGCCACTTCTGATATTCCGATCG





ACAAGATTCAGTATTTGAAGCGAGCCGTCGACCTCCTCCGCCACCCCTTCAAGAAGTTTGTCGC





CGAACAATCGCCGGACTGGGTCGTCGTTGATTTTCATGCTTATTGGGCCGGCGAGATCTACCAG





GAGTTTCAAGTTCCCGTCGCCTACTTCTGTATTTTCTCGGCCATCTGTTTGCTTTATCTTGGAC





CTCCAGACGTGTATTCGAAGGATCCTCAGATCATGGCACGAATATCTCCCGTTACCATGACGGT





GCCGCCGGAGTGGGTCGGTTTTCCGTCCGCCGTAGCCTACAACTTGCATGAGGCGACGGTCATG





TACTCTGCTCTCTATGAAACAAATGGGTCTGGAATAAGCGACTGCGAGAGGATTCGCCGGCTCG





TCCTTTCCTGTCAAGCCGTGGCCATTCGAAGCTGCGAGGAGATTGAAGGCGAATACCTTAGGTT





ATGTAAGAAACTGATTCCACCGCAGGGGATTGCCGTCGGCTTGCTTCCGCCGGAAAAGCCACCA





AAATCAGATCACGAGCTCATCAAATGGCTTGACGAGCAAAAGCTCCGATTCGTCGTGTACGTGA





CATTCGGCAGCGAATGCAACCTGACGAAGGACCAAGTTCACGAGATAGCCCACGGGCTGGAACT





GTCGGAGCTGCCATTTTTATGGGCACTGAGGAAACCCAGCTGGGCAGCTGAGGAAGACGATGGG





CTGCCGTCTGGGTTTCGTGAGAGAACGTCCGGGAGAGGGGTGGTGAGCATGGAGTGGGTGCCGC





AGTTGGAGATTCTGGCGCACCAGGCCATCGGCGTCTCTTTAGTTCACGGGGGCTGGGGCTCTAT





TATCGAGTCGCTACAAGCTGGGCACTGTCTGGTTGTGCTGCCGTTTATCATCGACCAGCCGCTG





AACTCAAAGCTTTTGGTGGAGAAAGGGATGGCGCTTGAGATCAGAAGGAACGGTTCTGATGGAT





GGTTTAGTAGAGAAGACATCGCCGGAACTTTGAGAGAAGCTATGCGGTCGTCTGAGGAAGGCGG





GCAGCTGAGGAGCCGTGCAAAAGAGGCGGCGGCCATCGTTGGAGATGAGAAGCTGCAGTGGGAA





CAATACTTCGGCGCGTTCGTACAGTTTCTGAGGGACAAGTCTTGA




UGT10391 (gene
ATGTCCGAGGAGAAAGGCAGAGGGCACAGCTCGTCGACGGAGAGACACACTGCTGCCGCCATGA
14
SEQ ID NO: 32


sequence)
ACGCCGAGAAACGAAGCACCAAAATCTTGATGCTCCCATGGCTGGCTCACGGCCACATATCTCC

in



ATACTTCGAGCTCGCCAAGAGGCTCACCAAGAAAAACTGCCACGTTTACTTGTGTTCTTCGCCT

WO2016050890



GTAAATCTCCAAGGCATCAAGCCGAAACTCTCTGAAAATTACTCTTCCTCCATTGAACTTGTGG





AGCTTCATCTTCCATCTCTCCCCGACCTTCCTCCCCATATGCACACGACCAAAGGCATCCCTCT





ACATCTACAATCCACCCTCATCAAAGCCTTCGACATGGCCGCCCCTGATTTTTCCGACCTGTTG





CAGAAACTCGAGCCGGATCTCGTCATTTCCGATCTCTTCCAGCCATGGGCAGTTCAATTAGCGT





CGTCTCGGAACATTCCCGTCGTCAATTTCGTTGTCACCGGAGTCGCTGTTCTTAGTCGTTTGGC





TCACGTGTTTTGCAACTCCGTTAAGGAATTCCCTTTCCCGGAACTCGATCTAACCGACCATTGG





ATCTCCAAGAGCCGCCGCAAAACGTCCGACGAATTAGGTCGCGAGTGCGCGATGCGATTTTTCA





ACTGCATGAAACAATCTTCAAACATCACTCTAGCCAACACTTTCCCCGAGTTCGAAGAAAAATA





CATCGATTATCTCTCTTCCTCGTTTAAGAAAAAGATTCTTCCGGTTGCTCCTCTAGTTCCTGAA





ATCGACGCAGACGACGAGAAATCGGAAATTATCGAGTGGCTTGACAAGAAGAAACCGAAATCGA





CTGTTTACGTTTCGTTTGGGAGTGAGTATTATCTGACGAAAGAAGACAGGGAAGAGCTCGCCCA





TGGCTTAGAAAAGAGCGGCGTGAATTTCATCTGGGTTATTAGGTTTCCAAAGGGCGAGAAGATC





ACCATTGAAGAGGCTTTACCAGAAGGATTTCTCGAGAGAGTAGGGGACAGGGGAGTGATTATCG





ACGGGTGGGCGCCGCAGTTGAAAATATTGAGGCATTCAAGCGTGGGCGGGTTCGTGTGCCACTG





CGGGTGGAACTCTGTGGTGGAGAGCGTGGTGTTTGGGGTGCCGATCATAGCCTTGCCGATGCAG





CTCGATCAGCCATGGCATGCGAAGGTGGCGGAGGACGGCGGCGTCTGTGCGGAGGCGAAGAGAG





ACGTTGAAGGGAGCGTTCAGAGAGAAGAGGTGGCGAAGGCCATTAAAGAGGTGGTGTTTGAGAA





GAAGGGGGGGGTTCTGAGTGGAAAAGCAAGAGAGATCAGCGAGGCCTTGAGAAAGAGGGAAGGG





GAAATCATAGAGGAATTGGTTGCTGAGTTTCACCAGCTCTGTGAAGCTTGA




UGT1576 protein
MASPRHTPHFLLFPFMAQGHMIPMIDLARLLAQRGVIITIITTPHNAARYHSVLARAIDSGLHI
15
SEQ ID NO: 48



HVLQLQFPCKEGGLPEGCENVDLLPSLASIPRFYRAASDLLYEPSEKLFEELIPRPTCIISDMC

in



LPWTMRIALKYHVPRLVFYSLSCFFLLCMRSLKNNLALISSKSDSEFVTFSDLPDPVEFLKSEL

WO2016050890



PKSTDEDLVKFSYEMGEADRQSYGVILNLFEEMEPKYLAEYEKERESPERVWCVGPVSLCNDNK





LDKAERGNKASIDEYKCIRWLDGQQPSSVVYVSLGSLCNLVTAQIIELGLGLEASKKPFIWVIR





RGNITEELQKWLVEYDFEEKIKGRGLVILGWAPQVLILSHPAIGCFLTHCGWNSSIEGISAGVP





MVTWPLFADQVFNEKLIVQILRIGVSVGTETTMNWGEEEEKGVVVKREKVREAIEIVMDGDERE





ERRERCKELAETAKRAIEEGGSSHRNLTMLIEDIIHGGGLSYEKGSCR




UGT SK98 protein
MDAQRGHTTTILMLPWVGYGHLLPFLELAKSLSRRKLFHIYFCSTSVSLDAIKPKLPPSISSDD
16
SEQ ID NO: 50



SIQLVELRLPSSPELPPHLHTTNGLPSHLMPALHQAFVMAAQHFQVILQTLAPHLLIYDILQPW

in



APQVASSLNIPAINFSTTGASMLSRTLHPTHYPSSKFPISEFVLHNHWRAMYTTADGALTEEGH

WO2016050890



KIEETLANCLHTSCGVVLVNSFRELETKYIDYLSVLLNKKVVPVGPLVYEPNQEGEDEGYSSIK





NWLDKKEPSSTVFVSFGTEYFPSKEEMEEIAYGLELSEVNFIWVLRFPQGDSTSTIEDALPKGF





LERAGERAMVVKGWAPQAKILKHWSTGGLVSHCGWNSMMEGMMFGVPIIAVPMHLDQPFNAGLL





EEAGVGVEAKRGSDGKIQREEVAKSIKEVVIEKTREDVRKKAREMGEILRSKGDEKIDELVAEI





SLLRKKAPCSI




UGT430 protein
MEQAHDLLHVLLFPYPAKGHIKPFLCLAELLCNAGLNVTFLNTDYNHRRLHNLHLLAACFPSLH
17
SEQ ID NO: 62



FESISDGLQPDQPRDILDPKFYISICQVTKPLFRELLLSYKRTSSVQTGRPPITCVITDVIFRF

in



PIDVAEELDIPVFSFCTFSARFMFLYFWIPKLIEDGQLPYPNGNINQKLYGVAPEAEGLLRCKD

WO2016050890



LPGHWAFADELKDDQLNFVDQTTASLRSSGLILNTFDDLEAPFLGRLSTIFKKIYAVGPIHALL





NSHHCGLWKEDHSCLAWLDSRAARSVVFVSFGSLVKITSRQLMEFWHGLLNSGTSFLFVLRSDV





VEGDGEKQVVKEIYETKAEGKWLVVGWAPQEKVLAHEAVGGFLTHSGWNSILESIAAGVPMISC





PKIGDQSSNCTWISKVWKIGLEMEDQYDRATVEAMVRSIMKHEGEKIQKTIAELAKRAKYKVSK





DGTSYRNLEILIEDIKKIKPN




UGT1697 protein
MVQPRVLLFPFPALGHVKPFLSLAELLSDAGIDVVFLSTEYNHRRISNTEALASRFPTLHFETI
18
SEQ ID NO: 68



PDGLPPNESRALADGPLYFSMREGTKPRFRQLIQSLNDGRWPITCIITDIMLSSPIEVAEEFGI

in



PVIAFCPCSARYLSIHFFIPKLVEEGQIPYADDDPIGEIQGVPLFEGLLRRNHLPGSWSDKSAD

WO2016050890



ISFSHGLINQTLAAGRASALILNTFDELEAPFLTHLSSIFNKIYTIGPLHALSKSRLGDSSSSA





SALSGFWKEDRACMSWLDCQPPRSVVFVSFGSTMKMKADELREFWYGLVSSGKPFLCVLRSDVV





SGGEAAELIEQMAEEEGAGGKLGMVVEWAAQEKVLSHPAVGGFLTHCGWNSTVESIAAGVPMMC





WPILGDQPSNATWIDRVWKIGVERNNREWDRLTVEKMVRALMEGQKRVEIQRSMEKLSKLANEK





VVRGGLSFDNLEVLVEDIKKLKPYKF




UGT11789 protein
MDAKEESLKVFMLPWLAHGHISPYLELAKRLAKRKFLVYFCSTPVNLEAIKPKLSKSYSDSIQL
19
SEQ ID NO: 72



MEVPLESTPELPPHYHTAKGLPPHLMPKLMNAFKMVAPNLESILKTLNPDLLIVDILLPWMLPL

in



ASSLKIPMVFFTIFGAMAISFMIYNRTVSNELPFPEFELHECWKSKCPYLFKDQAESQSFLEYL

WO2016050890



DQSSGVILIKTSREIEAKYVDFLTSSFTKKVVTTGPLVQQPSSGEDEKQYSDIIEWLDKKEPLS





TVLVSFGSEYYLSKEEMEEIAYGLESASEVNFIWIVRFPMGQETEVEAALPEGFIQRAGERGKV





VEGWAPQAKILAHPSTGGHVSHNGWSSIVECLMSGVPVIGAPMQLDGPIVARLVEEIGVGLEIK





RDEEGRITRGEVADAIKTVAVGKTGEDFRRKAKKISSILKMKDEEEVDTLAMELVRLCQMKRGQ





ESQD




CYP1798 protein
MEMSSSVAATISIWMVVVCIVGVGWRVVNWVWLRPKKLEKRLREQGLAGNSYRLLFGDLKERAA
20
SEQ ID NO: 74



MEEQANSKPINFSHDIGPRVFPSMYKTIQNYGKNSYMWLGPYPRVHIMDPQQLKTVFTLVYDIQ

in



KPNLNPLIKFLLDGIVTHEGEKWAKHRKIINPAFHLEKLKDMIPAFFHSCNEIVNEWERLISKE

WO2016050890



GSCELDVMPYLQNLAADAISRTAFGSSYEEGKMIFQLLKELTDLVVKVAFGVYIPGWRFLPTKS





NNKMKEINRKIKSLLLGIINKRQKAMEEGEAGQSDLLGILMESNSNEIQGEGNNKEDGMSIEDV





IEECKVFYIGGQETTARLLIWTMILLSSHTEWQERARTEVLKVFGNKKPDFDGLSRLKVVTMIL





NEVLRLYPPASMLTRIIQKETRVGKLTLPAGVILIMPIILIHRDHDLWGEDANEFKPERFSKGV





SKAAKVQPAFFPFGWGPRICMGQNFAMIEAKMALSLILQRFSFELSSSYVHAPTVVFTTQPQHG





AHIVLRKL




EPH1 epoxide
MEKIEHSTIATNGINMHVASAGSGPAVLFLHGFPELWYSWRHQLLYLSSLGYRAIAPDLRGFGD
21
Disclosed in


hydrolase
TDAPPSPSSYTAHHIVGDLVGLLDQLGVDQVFLVGDWGAMMAWYFCLFRPDRVKALVNLSVHFT

the suppl. of



PRNPAISPLDGFRLMLGDDFYVCKFQEPGVAEADFGSVDTATMFKKFLTMRDPRPPIIPNGFRS

Itkin et al.



LATPEALPSWLTEEDIDYFAAKFAKTGFTGGFNYYRAIDLTWELTAPWSGSEIKVPTKFIVGDL

Proc Natl



DLVYHFPGVKEYIHGGGFKKDVPFLEEVVVMEGAAHFINQEKADEINSLIYDFIKQF

Acad Sci USA





2016, 22;





113(47):E7619





-E7628)


EPH2 epoxide
MEKIEHTTISTNGINMHVASIGSGPAVLFLHGFPELWYSWRHQLLFLSSMGYRAIAPDLRGFGD
22
Disclosed in


hydrolase
TDAPPSPSSYTAHHIVGDLVGLLDQLGIDQVFLVGHDWGAMMAWYFCLFRPDRVKALVNLSVHF

the



LRRHPSIKFVDGFRALLGDDFYFCQFQEPGVAEADFGSVDVATMLKKFLTMRDPRPPMIPKEKG

supplement of



FRALETPDPLPAWLTEEDIDYFAGKFRKTGFTGGFNYYRAFNLTWELTAPWSGSEIKVAAKFIV

Itkin et al.



GDLDLVYHFPGAKEYIHGGGFKKDVPLLEEVVVVDGAAHFINQERPAEISSLIYDFIKKF




EPH3 epoxide
MDQIEHITINTNGIKMHIASVGTGPVVLLLHGFPELWYSWRHQLLYLSSVGYRAIAPDLRGYGD
23
Disclosed in


hydrolase
TDSPASPTSYTALHIVGDLVGALDELGIEKVFLVGHDWGAIIAWYFCLFRPDRIKALVNLSVQF

the



IPRNPAIPFIEGFRTAFGDDFYMCRFQVPGEAEEDFASIDTAQLFKTSLCNRSSAPPCLPKEIG

supplement of



FRAIPPPENLPSWLTEEDINYYAAKFKQTGFTGALNYYRAFDLTWELTAPWTGAQIQVPVKFIV

Itkin et al.



GDSDLTYHFPGAKEYIHNGGFKKDVPLLEEVVVVKDACHFINQERPQEINAHIHDFINKF




EPH4 epoxide
MENIEHTTVQTNGIKMHVAAIGTGPPVLLLHGFPELWYSWRHQLLYLSSAGYRAIAPDLRGYGD
24
Disclosed in


hydrolase
TDAPPSPSSYTALHIVGDLVGLLDVLGIEKVFLIGHDWGAIIAWYFCLFRPDRIKALVNLSVQF

the



FPRNPTTPFVKGFRAVLGDQFYMVRFQEPGKAEEEFASVDIREFFKNVLSNRDPQAPYLPNEVK

supplement of



FEGVPPPALAPWLTPEDIDVYADKFAETGFTGGLNYYRAFDRTWELTAPWTGARIGVPVKFIVG

Itkin et al.



DLDLTYHFPGAQKYIHGEGFKKAVPGLEEVVVMEDTSHFINQERPHEINSHIHDFFSKFC




EPH5 epoxide
MEKESEIHSIRHTTVSVNGINMHVAEKGEGPLVLFIHGFPELWYSWRHQILDLASLGYRAVAPD
25
Disclosed in


hydrolase
LRGYGDSDAPPSASSYTSFHIVGDLIALLDAIVGVEEKVFVVAHDWGAIIAWYLCLYRPDRIKA

the



LVNLSVAFIRRNPKGKPVEWIRALYGDDHYMCRCQEPGEIEGEFAEIGTERVLTQFLTYHSPKP

supplement of



LMLPKGKAFGHPLDTPIPLPPWLSHQDIEYYASKFDKKGFTGPVNYYRNLDRNWELNAPFTRAQ

Itkin et al.



VKVPVKFIVGDLDLTYHSFGTKEYIHSGEMKKDVPFLQEVVVMEGVGHFIQSEKPHEISDHIYQ





FIKKF




EPH6 epoxide
MEKIEHTIITTNGINMHVASIGTGPAVLFLHGFPELWYSWRHQLLSFSSLGYRAIAPDLRGYGD
26
Disclosed in


hydrolase
SDAPPSPSSYTVFHIVGDLVGLLDQLGIDQVFLVGHDWGASIAWYFSLLRPDRIKALVNLSVQY

the



FPRNPARNTVEALRALFGDDYYVCRFQEPGEMEEDFASIDTAVIFKIFLSSRDPRPPCIPKAVG

supplement of



FRAFPVPDSLPSWLSEEDISYYASKFSKKGFTGGLNYYRALALNWELTAPWTGTQIKVPTKFIV

Itkin et al.



GDLDLTYHIPGSKEYIHKGGFERDVPSLEEVVVIEGAAHFVNQERPEEISKHIYDFIKKF




EPH7 epoxide
MDAIEHRTVSVNGINMHVAEKGEGPVVLLLHGFPELWYSWRHQILALSSLGYRAVAPDLRGYGD
27
Disclosed in


hydrolase
TDAPGSISSYTCFHIVGLVALVESLGVDRVFVVAHDWGAMIANCLCLFRPEMVKAFVCLSVPFR

the



QRNPKMKPVQSMRAFFGDDYYICRFQNPGEIEEEMAQVGAREVLRGILTSRRPGPPILPKGQAF

supplement of



RARPGASTALPSWLSEKDLSFFASKYDQKGFTGPLNYYRAMDLNWELTASWTGVQVKVPVKYIV

Itkin et al.



GDVDMVFTTPGVKEYVNGGGFKKDVPFLQEVVIMEGVGHFINQEKPEEISSHIHDFISRF




EPH8 epoxide
MDQIQHKFIDIRGLKLHIAEIGTGSPAVVFLHGFPEIWYSWRHQMVAAAAVGYRAISPDLRGYG
28
Disclosed in


hydrolase
FSDPHPQPQNASFDDFVEDTLAILDFLHIPKAFLVGKDFGSWPVYLFSLVHPTRVAGIVSLGVP

the



FLPPNPKRYRDLPEGFYIFRWKESGRAEADFGRFDVKTVLRRIYTLFSRSEIPIAEKDQEIMDM

supplement of



VDESTPPPPWLTDEDLAAYATAYEHSGFESALQVPYRRRHQELGMSNPRVDVPVLLIIGGKDYF

Itkin et al.



LKFPGIEDYIKSEKMREIVPDLEVADLADGTHFMQEQFPAQVNHLLISFLGKRNT




EH1 epoxide
MDAIEHRTVSVNGINMHVAEKGEGPVVLLLHGFPELWYSWRHQILALSSLGYRAVAPDLRGYGD
29
SEQ ID NO: 38


hydrolase 1
TDAPGSISSYTCFHIVGDLVALVESLGMDRVFVVAHDWGAMIANCLCLFRPEMVKAFVCLSVPF

in



RQRNPKMKPVQSMRAFFGDDYYICRFQNPGEIEEEMAQVGAREVLRGILTSRRPGPPILPKGQA

WO2016050890



FRARPGASTALPSWLSEKDLSFFASKYDQKGFTGPLNYYRAMDLNWELTASWTGVQVKVPVKYI





VGDVDMVFTTPGVKEYVNGGGFKKDVPFLQEVVIMEGVGHFINQEKPEEISSHIHDFISKF




EH2 epoxide
MDEIEHITINTNGIKMHIASVGTGPVVLLLHGFPELWYSWRHQLLYLSSVGYRAIAPDLRGYGD
30
SEQ ID NO: 40


hydrolase
TDSPASPTSYTALHIVGDLVGALDELGIEKVFLVGHDWGAIIAWYFCLFRPDRIKALVNLSVQF

in



IPRNPAIPFIEGFRTAFGDDFYICRFQVPGEAEEDFASIDTAQLFKTSLCNRSSAPPCLPKEIG

WO2016050890



FRAIPPPENLPSWLTEEDINFYAAKFKQTGFTGALNYYRAFDLTWELTAPWTGAQIQVPVKFIV





GDSDLTYHFPGAKEYIHNGGFKRDVPLLEEVVVVKDACHFINQERPQEINAHIHDFINKF




CYP533 gene
ATGGAACTCTTCTCTACCAAAACTGCAGCCGAGATCATCGCTGTTGTCTTGTTTTTCTACGCTC
31
SEQ ID NO: 3


(coding sequence)
TCATCCGGCTATTATCTGGAAGATTCAGCTCTCAACAGAAGAGACTGCCACCTGAAGCCGGTGG

in



CGCCTGGCCACTGATCGGCCATCTCCATCTCCTAGGTGGGTCGGAACCTGCACATAAAACCTTG

WO2016050890



GCGAACATGGCGGACGCCTACGGACCAGTTTTTACGTTGAAACTGGGCATGCATACAGCTTTGG





TTATGAGCAGTTGGGAAATAGCGAGAGAGTGCTTTACTAAAAACGACAGAATCTTTGCCTCCCG





CCCCATAGTCACTGCCTCAAAGCTTCTCACCTATAACCATACCATGTTTGGGTTCAGCCAATAT





GGTCCATTCTGGCGCCATATGCGCAAAATAGCCACGCTTCAACTCCTCTCAAACCACCGCCTCG





AGCAGCTCCAACACATCAGAATATCGGAGGTCCAGACTTCGATTAAGAAACTGTACGAGTTGTG





GGTCAACAGCAGAAATAATGGAGGCGAGAAAGTGTTGGTGGAGATGAAGACGTGGTTCGGAGGC





ATAACCTTGAACACCATATTCAGGATGGTGGTCGGAAAGCGATTCTCGACTGCTTTCGAAGGCA





GTGGTGGCGAACGGTATCGGAAGGCGTTGAGGGATTCTCTTGAATGGTTTGGGGCATTCGTTCC





GTCAGATTCATTCCCGTTTTTAAGATGGTTGGATTTGGGAGGATATGAGAAGGCGATGAAGAAG





ACGGCGAGTGTGCTGGACGAGGTGCTTGATAAATGGCTCAAAGAGCATCAGCAGAGGAGAAACT





CCGGTGAACTGGAGACGGAGGAGCACGACTTCATGCACGTGATGCTGTCTATTGTTAAGGATGA





TGAAGAACTATCCGGCTACGATGCCGATACAGTCACAAAAGCTACATGTTTGAATTTAATAGTT





GGTGGATTCGACACTACACAAGTAACTATGACATGGGCTCTTTCTTTGCTTCTCAACAATGAAG





AGGTATTAAAAAAGGCCCAACTTGAACTAGACGAACAAGTTGGAAGAGAGAGGTTTGTGGAAGA





GTCCGATGTTAAAAATCTGTTATATCTCCAGGCCATCGTGAAGGAAACTTTGCGTTTGTACCCT





TCAGCGCCAATCTCGACATTTCATGAGGCCATGGAAGATTGCACTGTTTCTGGCTACCACATCT





TTTCAGGGACGCGTTTGATGGTGAATCTTCAAAAGCTTCAAAGAGATCCACTTGCATGGGAGGA





TCCATGTGACTTTCGACCGGAGAGATTTCTGACAACTCATAAGGATTTCGATCTTAGAGGACAT





AGTCCTCAATTGATACCATTTGGGAGTGGTCGAAGAATATGCCCTGGCATCTCGTTTGCCATTC





AAGTTTTGCATCTTACGCTTGCAAATCTACTTCATGGGTTTGACATTGGAAGGCCATCTCATGA





ACCAATCGATATGCAGGAGAGTAAAGGACTAACGAGTATTAAAACAACTCCACTTGAGGTTGTT





TTAGCTCCACGCCTTGCTGCTCAAGTTTATGAGTGA




CYP937 gene (coding
ATGCCGATCGCAGAICAGTCTCTGATTTGTTTGGTCGCCCACTCTTCTTTGCACTATATG
32
SEQ ID NO: 4


sequence)
ATTGGTTCTTAGAGCATGGATCTGTTTATAAACTTGCCTTTGGACCAAAAGCCTTTGTTGTTGT

in



ATCAGATCCCATTGTGGCAAGATATATTCTTCGAGAAAATGCATTTGGTTATGACAAGGGAGTG

WO2016050890



CTTGCTGATAIITTAGAACCGATAAIGGGTAAAGGACTAATACCACTGATCCTTGGCACTTGGA





AGCAGAGGAACGATTATTGCTCCAGATTCCATGCCTTTACTTGAAGCTTATGACCAAATAAGTA





ATTTGCCAATTGTTCAGAACGATCAATATTGAAATTGGAGAAGCTTCTAGGAGAAGGTGAACTA





CAGGAGAATAAAACCATTGAGTTGGATATGGAAGCAGAGTTTTCAAGTTTGGCTCTTGATATCA





TTGGACTCGGTGTTTTCAACTATGATTTTGGTTCTGTAACCAAAGAATCTCCGGTGATTAAGGC





TGTATATGGGACTCTITTTGAAGCAGAGCATAGATCGACTTTCTATATCCCATATTGGAAAGTA





CCTTTGGCAAGGTGGATAGTCCCAAGGCAGCGTAAATTCCATGGTGACCTTAAGGTTATTAATG





AGTGTCTTGAEGGCCTAATACGCAACGCAAGAGAAACCCGAGACTTGAAACGGATGTTGAAATT





GCAGCAAAGGGACTACTTAAATCTCAAGGATATCAGTCTTTTGCGTTICTZAGTTGATATGCGG





GGAGCTGATGTTGATGATCGCCATTAGGGACGATCTGATGACGATGCTATTCATGCTGGCCATG





AAACAACTGCTGCTGTGCTTAGCTTACATCTTTTTTTTGCTTCACAAAATCCTTCAAAAATGAA





AAAAGCGCAAGCAAGATTGATTATCTTGCATGGAGCCAACTTTTGAATCATACGAATCGTTAAA





GCATTGAAGTACATCAGACTTATCGTTGCAGAGACTCTTCGTTTGTTTCCTCAGCCTCCATTGC





TGATAAGACGAGCTCTCAAATCAGATATATTACCAGGAGGATACAATGGTGACAAAACTGGATA





TGCAATTCCTGCAGGGACTGACATCTTCATCTCTGTTTACAATCTCCACAGATCTCCCTACTTC





TGGGATAATCCTCAAGAATTTGAACCAGAGAGATTTCAAGTAAAGAGGGCAAGCGAGGGAATTG





AAGGATGGGATGGTTTCGACCCATCTAGAAGCCCCGGAGCTCTATACCCGAATGAGATTGTAGC





AGACTTTTCCTTCTTACCATTTGGTGGAGGCCCTAGAAAATGTGTGGGAGATCAATTTGCTCTA





ATGGAGTCAACTATAGCATTGGCCATGTTACTGCAGAAGTTTGATGTGGAGCTAAAAGGAAGTC





CAGAATCTGTAGAACTAGTTACTGGAGCCACAATACATACCAAAAGTGGGTTGTGGTGCAPACT





GAGAAGAAGATCACAAGTAAACTGA




CYP1798
ATGGAAATGTCCTCAAGTGTCGCAGCCACAATCAGTATCTGGATGGTCGTCGTATGTATCGTAG
33
SEQ ID NO: 5


gene (coding
GTGTAGGTTGGAGAGTCGTAAATTGGGTTTGGTTGAGACCAAAGAAATTGGAAAAGAGATTGAG

in


sequence, codon
AGAACAAGGTTEGGCGGGIAATCCTTACAGATTGITGCTCGTGACTTGAAGGATGCGAGCTGCA

WO2016050890


optimized)
ATGGAAGAACAAGCAAATTCAAAGCCTATAAACTTCTCCCATGACATCGGTCCAAGAGCTTTCC





CTTCAATGTACAAGACCATCCAAAACTACGGTAAAAACTCCTACATGTGCTTAGGTCCATACCC





TAGAGTCCACATCATGGATCCACAACAATTGAAGACCGCTTCTACTTTGGTCTACGACATTCAA





AAGCCAAATTTGAACCCGGTTGATTAAATTCTTOTTAGATCGGCGTTACACATGAAGGGTGAAA





AGTGGGCCTAACCACAGAAACATTATTAACCCAGGTTCCATTGGGAAAAGGTGAAGGATATGAT





ACCTGGCTTCTTTCACTCATCTAATGAAATCCTCAACGAATAAAGATTGATTGCTACAAAAGAA





GGTTGCTGCGAATTGGATGGCAATCCCGTATTCACAAAATTEGCCGGTGACGCCATTACAAGAA





CCGCTTTTGTTCTTCATACGAAGAAGAAAGATTGATTGCTAGATCTTCCAATTGTTGAAGGAAT





TTTGCTTCTCAAGCTAGCTTTTGCTCTTTATATTCCACCTTOGAGATICTTGCCTACAAAGAGT





AACAATGAAGGAAATTAATAGAAAAATCAAGGCITTGITGTGGGCTATCATICAAGATGCATTG





GACAAAAGGCAATGGAAGAAGGCGAAOCCGGTCAATCTGATTTOTTGGGIATATTAATOGAAAG





TAATACTAACGAAATCCAAGOTGAAGGTAATAACAAGGAAGAIGGCATGTCTATTGAAGACGTC





ATCGAAGAGTGTAAGGTATATTATATAGGAGGTCAAGAAACTACAGCAAGATTATTGATCTGGA





CTATGATATTTTGTCCAGTCGAATATAGAATGGCAAGAAGAGCCAGAACCGAAGACTTGAAGGT





ATTTGTAATAAGAAACCAGATTTCGACGGTTTGTCPAGATTGAAGCTAGATACTATTGATCTTG





AACGAAGTTGTAAGATTTACCCACCTCCTGCCATGCCTGACAAGPATCATCCAAAAGGAAACAA





GAGTTGCTAAACCTAACCGTGCCAGCAGTCTTATCTTGATAATGCCTATCATCTTGATACATAG





AGATCACGACTTGTGGGGTGAAGATCTAACGAGTTAAACCAGAAAGAATCAGTAAAGCTTCTTG





TCTAGGCACAGCAAAGTCCAACCAGCCTTTTCCCTTTTGGTTGGCCTCGTACCTATTTGCATGG





GACAAAACTTCGCTATGATCGAAGCTAAGATGGCATTGAGTTTGATCTIGCAAAGATTTGCTIT





CGAATAGTCTICATCCTACGTTCATGCACCAACTCTCGACTICACTACACAACCACAACACGGT





GCCCACATCGTATTGAGAAAGTTATGA




CYP1994 gene
ATGGAACCACAACCAAGTGCGAATTCAACTGGAATCACAGCCTAAGCACCGTCCTATCGGTG
34
SEQ ID NO: 6


(coding sequence)
TCATTGCCATTATTTTCTTCCGTTTTCTCGTCAAAAGAGTCACGGCCCGGTGAGCGAAAGGG

in



TCCGAAGCCGCCAAAAGTAGCCGGAGGGTGGCCTCTAATTGGCCACCTCCCTCTCCTCTCGAGGA

WO2016050890



CCTGAACTGCCCCATGTCAAACTGGGTGGGTTGCCTGATAAATATGGTCCAATCTTCTCGATCC





GGCTGGGTGTCCACTCCGCCGTCGTGATAAACAGTTGGGAGGCGGCGAAACAGTTATTAACCAA





CCATGACGTCGCCGTCTCTTCCCGCCCCCAAATGCTCGGCGGAAAACTCCTGGGCTACAACTAC





GCCOTGTiiGGETTCGGACCCTACGGCTOTTACTOGCGCAACATGCGCAAGATAACCACGOAAG





AGCTECTATCCAATAGCAGAATCCAOCTCCTAAGAGACGTTCGAGCGTCAGAAGTGAACCAAGG





CATAAAAGAGCTCTACCAGCACTGGAAAGAAAGAAGAGACGGTCACGACCAAGCCTTGGEGGAA





CATAAAAGAGCTCTACCAGCACTGGAAAGAAAGAAGAGACGGTCACGACCAAGCCTTGGTGGAA





TCTTTGGAGCTGCAGCAACGGTAGACGAGGAAGAGGCGCGACGGAGCCATAAAGCATTGAAGGA





GTTGTTACATTATATGGGGCTTTTTCTACTGGGTGATGCTGTTCCATATCTAGGATGGTTGGAC





GTCGGCGGCCATGTGAAGGCGATGAAGAAAACTTCAAAAGAATTGGACCG7ATGTTAACACAGT





GGTTGGAGGAGCACAAGAAGGAAGGACCCAAGAAAGATCATAAAGACTTCATGGACGTGATGCT





TTCAGTTCTCAATGAAACATCCGATGTTCTTTCAGATAAGACCCATGGCTTCGATGCTGATACC





ATCATCAAAGCTACATGTATGACGATGGTTTTAGGAGGGAGTGATACGACGGCGGTGGTTGTGA





TATGGGCAATCTCGCTGCTGCTGAATAATCGCCCTGCGTTGAGAAAAGTGCAAGAAGAACTGGA





AGCCCATATCGGCCGAGACAGAGAACTGGAGGAATCGGATCTCGGTAAGCTAGTGTATTTGCAG





GCAGTCGTGAAGGAGACATTGCGGCTGTACGGAGCCGGAGGCCTTTTCTTTCGTGAAACCACAG





AGGATGTCACCATCGACGGATTCCATGTCGAGAAAGGGACATGGCTGTTCGTGAACGTGGGGAA





GATCCACAGAGATGGGAAGGTGTGGCCGGAGCCAACGGAGTTCAAACCGGAGAGGTTTCTGACG





ACCCACAAAGATTTTGATCTGAAGGGCCAGCGGTTTGAGCTCATCCCTTTCGGGGGAGGAAGAA





GATCGTGCCCTGGAATGTCTTTTGGSCTCCAAATGCTACAGCTTATTTTGGGTAAACTGCTTCA





GGCTTTTGATATATCGACGCCGGGGGACGCCGCCGTTGATATGACCGGATCCATTGGACTGACG





AACATGAAAGCCACTCCATTGGAAGTGCTCATCACCCCGCGCTTGCCTCTTTCGCTTTACGATT





GA




CYP2048 gene
ATGGAGACTCTTCTTCTTCATCTTCAATCGTTATTTCATCCAATTTCCTTCACTGGTTTCGTTG
35
SEQ ID NO: 7


(coding sequence)
TCCTCTTTAGCTTCCTGTTCCTGCTCCAGAAATGGTTACTGACACGTCCAAACTCTTCATCAGA

in



AGCCTCACCCCCTTCTCCACCAAAGCTTCCCATCTTCGGACACCTTCTAAACCTGGGTCTGCAT

WO2016050890



CCCCACATCACCCTCGGAGCCTACGCTCGCCGCTATGGCCCTCTCTTCCTCCTCCACTTCGGCA





GCAAGCCCACCATCGTCGTCTCTTCTGCCGAAATCGCTCGCGATATCATGAAGACCCACGACCT





CGTCTTCGCCAACCGTCCTAAATCAAGCATCAGCGAAAAGATTCTTTACGGCTCCAAAGATTTA





GCCGCATCTCCTTACGGCGAATACTGGAGGCAGATGAAAAGCGTTGGCGTGCTTCATCTTTTGA





GCAACAAAAGGGTTCAATCCTTTCGCTCTGTCAGAGAAGAAGAAGTCGAACTGATGATCCAGAA





GATCCAACAGAACCCCCTATCAGTTAATTTAAGCGAAATATTCTCTGGACTGACGAACGACATA





GTTTGCAGGGTGGCTTTAGGGAGAAAGTATGGCGTGGGAGAAGACGGAAAGAAGTTCCGGTCTC





TTCTGCTGGAGTTTGGGGAAGTATTGGGAAGTTTCAGTACGAGAGACTTCATCCCGTGGCTGGG





TTGGATTGATCGTATCAGTGGGCTGGACGCCAAAGCCGAGAGGGTAGCCAAAGAGCTCGATGCT





TTCTTTGACAGAGTGATCGAAGATCACATCCATCTAAACAAGAGAGAGAATAATCCCGATGAGC





AGAAGGACTTGGTGGATGTGCTGCTTTGTGTACAGAGAGAAGACTCCATCGGGTTTCCCCTTGA





GATGGATAGCATAAAAGCTTTAATCTTGGACATGTTTGCTGCAGGCACAGACACGACATACACG





GTGTTGGAGTGGGCAATGTCCCAACTGTTGAGACACCCAGAAGCGATGAAGAAACTGCAGAGGG





AGGTCAGAGAAATAGCAGGTGAGAAAGAACACGTAAGTGAGGATGATTTAGAAAAGATGCATTA





CTTGAAGGCAGTAATCAAAGAAACGCTGCGGCTACACCCACCAATCCCACTCCTCGTCCCCAGA





GAATCAACCCAAGACATCAGGTTGAGGGGGTACGATATCAGAGGCGGCACCCGGGTTATGATCA





ATGCATGGGCCATCGGAAGA




CYP2740 gene
ATGTCGATGAGTAGTGAAATTGAAAGCCTCTGGGTTTTCGCGCTGGCTTCTAAATGCTCTGCTT
36
SEQ ID NO: 8


(coding sequence)
TAACTAAAGAAAACATCCTCTGGTCTTTACTCTTCTTTTTCCTAATCTGGGTTTCTGTTTCCAT

in



TCTCCACTGGGCCCATCCGGGCGGCCCGGCTTGGGGCCGCTACTGGTGGCGCCGCCGCCGCAGC

WO2016050890



AATTCCACCGCCGCTGCTATTCCCGGCCCGAGAGGCCTCCCCCTCGTCGGCAGCATGGGCTTGA





TGGCCGACTTGGCCCACCACCGGATTGCCGCCGTGGCTGACTCCTTAAACGCCACCCGCCTCAT





GGCCTTTTCGCTCGGCGACACTCGCGTGATCGTCACATGCAACCCCGACGTCGCCAAAGAGATT





CTCAACAGCTCCCTCTTCGCCGACCGCCCCGTTAAGGAGTCCGCTTACTCCTTGATGTTCAACC





GCGCCATTGGGTTCGCCCCCTATGGCCTTTACTGGCGGACCCTCCGCCGCATCGCTTCCCACCA





CCTCTTCTGCCCCAAGCAAATCAAGTCCTCCCAGTCCCAGCGCCGCCAAATCGCTTCCCAAATG





GTCGCAATGTTCGCAAACCGCGATGCCACACAGAGCCTCTGCGTTCGCGACTCTCTCAAGCGGG





CTTCTCTCAACAACATGATGGGCTCTGTTTTCGGCCGAGTTTACGACCTCTCTGACTCGGCTAA





CAATGACGTCCAAGAACTCCAGAGCCTCGTCGACGAAGGCTACGACTTGC7GGGCCTCCTCAAC





TGGTCCGACCATCTCCCATGGCTCGCCGACTTCGACTCTCAGAAAATCCGGTTCAGATGCTCCC





GACTCGTCCCCAAGGTGAACCACTTCGTCGGCCGGATCATCGCCGAACACCGCGCCAAATCCGA





CAACCAAGTCCTAGATTICGTCGACGTTTTGCTCTCTCTCCAAGAAGCCGACAAACTCTCTGAC





TCCGATATGATCGCCGTTCTTTGGGAAATGATTTTTCGTGGGACGGACACGGTGGCAGTTTTAA





TCGAGTGGATACTGGCCAGGATGGTACTTCACAACGATATCCAAAGGAAAGTTCAAGAGGAGCT





AGATAACGTGGTTGGGAGTACACGCGCCGTCGCGGAATCCGACATTCCGTCGCTGGTGTATCTA





ACGGCTGTGGTTAAGGAAGTTCTGAGGTTACATCCGCCGGGCCCACTCCTGTCGTGGGCCCGCC





TAGCCATCACTGATACAATCATCGATGGGCATCACGTGCCCCGGGGGACCACCGCTATGGTTAA





CATGTGGTCGATAGCGCGGGACCCACAGGTCTGGTCGGACCCACTCGAATTTATGCCCCAGAGG





TTTGTGTCCGACCCCGGTGACGTGGAGTTCTCGGTCATGGGTTCGGATCTCCGGCTGGCTCCGT





TCGGGTCGGGCAGAAGGACCTGCCCCGGGAAGGCCTTCGCCTGGACAACTGTCACCTTCTGGGT





GGCCACGCTTTTACACGACTTCAAATGGTCGCCGTCCGATCAAAACGACGCCGTCGACTTGTCG





GAGGTCCTCAAGCTCTCCTGCGAGATGGCCAATCCCCTCACCGTTAAAGTACACCCAAGGCGCA





GTTTAAGCTTTTAA




CYP3404 gene
ATGGATGGTTTTCTTCCAACAGTGGCGGCGAGCGTGCCTGTGGGAGTGGGTGCAATATTGTTCA
37
SEQ ID NO: 9


(coding sequence)
CGGCGTTGTGCGTCGTCGTGGGAGGGGTTTTGGTTTATTTCTATGGACCTTACTGGGGAGTGAG

in



AAGGGTGCCTGGTCCACCAGCTATTCCACTGGTCGGACATCTTCCCTTGCTGGCTAAGTACGGC

WO2016050890



CCAGACGTTTTCTCTGTCCTTGCCACCCAATATGGCCCTATCTTCAGGTTCCATATGGGTAGGC





AGCCATTGATAATTATAGCAGACCCTGAGCTTTGTAAAGAAGCTGGTATTAAGAAATTCAAGGA





CATCCCAAATAGAAGTGTCCCTTCTCCAATATCAGCTTCCCCTCTTCATCAGAAGGGTCTTTTC





IICACAAGGGATGCAAGATGGTCGACAATGCGGAACACGATATTATCGGTCTATCAGTCCTCCC





ATCTAGCGAGACTAATACCTACTATGCAATCAATCATTGAAACTGCAACTCAAAATCTCCATTC





CTCTGTCCAGGAAGACAICCCTTTCTCCAATCTCTCCCTCAAATTGACCACCGATGTGATTGGA





ACAGCAGCCTTCGGTGTCAACTTTGGGCTCTCTAATCCACAGGCAACCAAAAGTTGTGCTACCA





ACGGCCAAGACAACAAAAATGACGAAGTTTCAGACTTCATCAATCAACACATCTACTCCACAAC





GCAGCTCAAGATGGATTTATCAGGTTCCTTCTCAATCATACTTGGACTGCTTGTCCCTATACTC





CAAGAACCATTTAGACAAGTCCTAAAGAGAATACCATTCACCATGGACTGGAAAGTGGACCGGA





CAAATCAGAAATTAAGTGGTCGGCTTAATGAGATTGTGGAGAAGAGAATGAAGTGTAACGATCA





AGGTTCAAAAGACTTCTTATCGCICATTTTGAGAGCAAGAGAGTCAGAGACAGTATCAAGGAAT





GTCTTCACTCCAGACTACATCAGTGCAGTTACGIATGAACACCTACTTGCIGGGTCGGCTACCA





CGGCGTTTACGTTGTCTTCTATTGTATATTTAGTTGCTGGGCATCCAGAAGTCGAGAAGAAGTT





GCTAGAAGAGATTGACAACTTTGGTCCATCCGArCAGATACCAACAGCTAATGATCTTCATCAG





AAGTTTCCATATCTTGATCAGGTGATTAAAGAGGCTATGAGGTTCTACACTGTTTCCCCTCTAG





TAGCCAGAGAAACAGCTAAAGATGTGGAGATTGGTGGATATCTTCTTCCAAAGGGGACATGGGT





ITGGTTAGCACTTGGAGTTCTTGCCAAGGATCCAAAGAACTTTCCAGAACCAGATAAATTCAAA





CCAGAGAGGTTTGATCCAAATGAAGAAGAGGAGAAACAAAGGCATCCTTATGCTTTAATCCCCT





TTGGAATTGGTCCTCGAGCATGCATIGGTAAAAAATTCGCCCTTCAGGAGTTGAAGCTCTCGTT





GATTCATTTGTACAGGAAGTTTGTATTTCGGCAT




CYP3968 gene
ATGGAAATCATTTTATCATATCTCAACAGCTCCATAGCTGGACTCTTCCTCTTGCTTCTCTTCT
38
SEQ ID NO: 10


(coding sequence)
CGTTTTTTGTTTTGAAAAAGGCTAGAACCTGTAAACGCAGACAGCCTCCTGAAGCAGCCGGCGG

in



ATGGCCGATCATCGGCCACCTGAGACTGCTCGGGGGTTCGCAACTTCCCCATGAAACCTTGGGA

WO2016050890



GCCATGGCCGACAAGTATGGACCAATCTTCAGCATCCGAGTTGGTGTCCACCCATCTCTTGTTA





TAAGCAGTTGGGAAGTGGCTAAAGAGTGCTACACCACCCTCGACTCAGTTGTCTCTTCTCGTCC





CAAGAGTTTGGGTGGAAAGTTGTTGGGCTACAACTTCGCCGCTTTTGGGTTCAGGCCTTATGAT





TCCTTTTACCGGAGTATCCGCAAAACCATAGCCTCCGAGGTGCTGTCGAACCGCCGTCTGGAGT





TGCAGAGACACATTCGAGTTTCTGAGGTGAAGAGATCGGTGAAGGAGCTTTACAATCTGTGGAC





GCAGAGAGAGGAAGGCTCAGACCACATACTTATTGATGCGGATGAATGGATTGGTAATATTAAT





TTGAACGTGATTCTGATGATGGTTTGTGGGAAGCGGTTTCTTGGCGGTTCTGCCAGCGATGAGA





AGGAGATGAGGCGGTGTCTCAAAGTCTCGAGAGATTTCTTCGATTTGACAGGGCAGTTTACGGT





GGGAGATGCCATTCCTTTCCTGCGATGGCTGGATTTGGGTGGATATGCGAAGGCGATGAAGAAA





ACTGCAAAAGAAATGGACTGTCTCGTTGAGGAATGGCTGGAAGAACACCGCCGGAAGAGAGACT





CCGGCGCCACCGACGGTGAACGTGACTTCATGGATGTGATGCTTTCGATTCTTGAAGAGATGGA





CCTTGCTGGCTACGACGCTGACACAGTCAACAAAGCCACATGCCTGAGCATTATTTCTGGGGGA





ATCGATACTATAACGCTAACTCTGACATGGGCGATCTCGTTATTGCTGAACAATCGAGAGGCAC





TGCGAAGGGTTCAAGAGGAGGTGGACATCCATGTCGGAAACAAAAGGCTTGTGGATGAATCAGA





CTTGAGCAAGCTGGTGTATCTCCAAGCCGTCGTGAAAGAGACATTAAGGTTGTACCCAGCAGGG





CCGCTGTCGGGAGCTCGAGAGTTCAGTCGGGACTGCACGGTCGGAGGGTATGACGTGGCCGCCG





GCACACGGCTCATCACAAACCTTTGGAAGATACAGACGGACCCTCGGGTGTGGCCGGAGCCACT





TGAGTTCAGGCCGGAGAGGTTTCTGAGCAGCCACCAGCAGTTGGATGTGAAGGGCCAGAACTTT





GAACTGGCCCCATTTGGTTGTGGAAGAAGAGTGTGCCCTGGGGCGGGGCTTGGGGTTCAGATGA





CGCAGTTGGTGCTGGCGAGTCTGATTCATTCGGTGGAACTTGGAACTCGCTCCGATGAAGCGGT





GGACATGGCTGCTAAGTTTGGACTCACAATGTACAGAGCCACCCCTCTTCAGGCTCTCGTCAAG





CCACGCCTCCAAGCCGGTGCTTATTCATGA




CYP4112 gene
ATGGGTGTATTGTCCATTTTATTATTCAGATATTCCGTCAAGAAGAAGCCATTAAGATGCGGTC
39
SEQ ID NO: 11


(coding sequence)
ACGATCAAAGAAGTACCACAGATAGTCCACCTGGTTCAAGAGGTTTGCCATTGATAGGTGAAAC

in



TTTGCAATTCATGGCTGCTATTAATTCTTTGAACGGTGTATACGATTTCGTTAGAATAAGATGT

WO2016050890



TTGAGATACGGTAGATGCTTTAAGACAAGAATCTTCGGTGAAACCCATGTTTTTGTCTCAACTA





CAGAATCCGCTAAGTTGATCTTGAAGGATGGTGGTGAAAAATTCACCAAAAAGTACATCAGATC





AATCGCTGAATTGGTTGGTGACAGAAGTTTGTTATGTGCATCTCATTTGCAACACAAGAGATTG





AGAGGTTTGTTGACTAATTTGTTTTCTGCCACATTCTTGGCTTCTTTCGTAACTCAATTCGATG





AACAAATCGTTGAAGCTTTTAGATCATGGGAATCCGGTAGTACCATAATCGTTTTGAACGAAGC





ATTGAAGATCACTTGTAAGGCCATGTGCAAAATGGTCATGTCCTTAGAAAGAGAAAACGAATTG





GAAGCTTTGCAAAAGGAATTGGGTCATGTTTGTGAAGCTATGTTGGCATTTCCATGCAGATTCC





CTGGTACAAGATTTCACAATGGTTTGAAGGCAAGAAGAAGAATCATTAAAGTTGTCGAAATGGC





CATTAGAGAAAGAAGAAGATCTGAAGCTCCTAGAGAAGATTTCTTGCAAAGATTGTTGACAGAA





GAAAAGGAAGAAGAAGACGGTGGTGGTGTTTTAAGTGATGCCGAAATTGGTGACAACATATTGA





CAATGATGATCGCAGGTCAAGATACCACTGCCTCTGCTATTACCTGGATGGTCAAGTTTTTGGA





AGAAAACCAAGATGTATTGCAAAACTTAAGAGACGAACAATTCGAAATCATGGGTAAACAAGAA





GGTTGTGGTTCATGCTTCTTGACATTAGAAGATTTGGGTAATATGTCCTATGGTGCAAAAGTAG





TTAAGGAATCATTGAGATTAGCCTCCGTCGTACCATGGTTTCCTAGATTGGTTTTACAAGATTC





TTTGATCCAAGGTTACAAAATTAAAAAGGGTTGGAACGTCAACATAGACGTAAGATCTTTACAT





TCAGATCCATCCTTGTATAATGACCCAACAAAGTTTAACCCTAGTAGATTCGATGACGAAGCTA





AACCTTACTCATTTTTGGCATTCGGTATGGGTGGTAGACAATGTTTGGGTATGAACATGGCAAA





GGCCATGATGTTGGTTTTCTTGCACAGATTGGTCACCTCATTCAGATGGAAGGTTATAGATTCC





GACTCTTCAATCGAAAAATGGGCTTTGTTCTCTAAGTTGAAGTCAGGTTGCCCTATCGTAGTTA





CCCACATCGGTTCCTAA




CYP4149 gene
ATGGATTTCTACTGGATCTGTGTTCTTCTGCTTTGCTTCGCATGGTTTTCCATTTTATCCCTTC
40
SEQ ID NO: 12


(coding sequence)
ACTCGAGAACAAACAGCAGCGGCACTTCCAAACTTCCTCCCGGACCGAAACCCTTGCCGATCAT

in



CGGAAGCCTTTTGGCTCTCGGCCACGAGCCCCACAAGTCTTTGGCTAATCTCGCTAAATCTCAT

WO2016050890



GGCCCTCTTATGACCTTAAAGCTCGGCCAAATCACCACCGTCGTAGTTTCCTCCGCTGCCATGG





CTAAGCAAGTTCTCCAAACGCACGACCAGTTTCTGTCCAGCAGGACCGTTCCAGACGCAATGAC





CTCTCACAACCACGATGCTTTCGCACTCCCATGGATTCCGGTTTCACCCCTCTGGCGAAACCTT





CGACGAATATGCAACAACCAGTTGTTTGCCGGCAAGATTCTCGACGCCAACGAGAATCTCCGGC





GAACCAAAGTGGCCGAGCTCGTATCCGATATCTCGAGAAGTGCATTGAAAGGTGAGATGGTGGA





TTTTGGAAACGTGGTGTTCGTCACTTCGCTCAATCTGCTTTCCAATACGATTTTCTCGGTGGAT





TTCTTCGACCCAAATTCTGAAATTGGGAAAGAGTTCAGGCACGCAGTACGAGGCCTCATGGAAG





AAGCTGCCAAACCAAATTTGGGGGATTATTTCCCTCTGCTGAAGAAGATAGATCTTCAAGGAAT





AAAGAGGAGACAGACCACTTACTTCGATCGGGTTTTTAATGTTTTGGAGCACATGATCGACCAG





CGTCTTCAGCAGCAGAAGACGACGTCTGGTTCTACCTCCAACAACAACAACGACTTACTGCACT





ACCTTCTCAACCTCAGCAACGAAAATAGCGACATGAAATTGGGGAAACTTGAGCTGAAACACTT





CTTATTGGTGCTATTCGTCGCTGGGACTGAAACGAGTTCTGCAACACTGCAATGGGCAATGGCA





GAACTACTAAGAAACCCAGAAAAGTTAGCAAAAGCTCAAGCGGAGACCAGGCGGGTGATTGGGA





AAGGGAACCCAATTGAAGAATCAGACATTTCGAGGCTGCCTTATCTGCAAGCAGTGGTGAAAGA





AACTTTCAGATTGCACACACCAGCGCCATTTCTACTGCCGCGCAAAGCACTACAGGACGTGGAA





ATTGCAGGTTTCACAGTCCCAAAGGACGCTCAGGTACTGGTAAATTTATGGGCTATGAGCAGAG





ATTCAAGCATCTGGGAGAACCCAGAGTGGTTCGAGCCAGAAAGGTTTTTGGAGTCGGAGCTGGA





CGTTAGAGGGAGAGATTTTGAGCTGATCCCGTTCGGCGGTGGGCGGAGGATTTGCCCCGGTCTG





CCGTTGGCGATGAGAATGTTGCATTTGATTTTGGGTTCTCTCATCCACTTCTTTGATTGGAAGC





TTGAAGATGGGTGTCGGCCGGAAGACGTGAAAATGGACGAAAAGCTTGGCCTCACTCTGGAGTT





GGCTTTTCCCCTCACAGCCTTGCCTGTCCTTGTCTAA




CYP4491 gene
ATGTCCTCCTGCGGTGGTCCAACTCCTTTGAATGTTATCGGTATCTTATTACAATCAGAATCCT
41
SEQ ID NO: 13


(coding sequence)
CCAGAGCCTGCAACTCAGACGAAAACTCAAGAATTTTGAGAGATTTCGTAACAAGAGAAGTTAA

in



CGCTTTCTTATGGTTGTCCTTGATCACTATCACAGCAGTTTTGATCAGTAAAGTTGTCGGTTTG

WO2016050890



TTTAGATTGTGGTCTAAGGCAAAGCAATTGAGAGGTCCACCTTGTCCATCATTCTACGGTCATT





CTAAGATCATCTCAAGACAAAATTTGACTGATTTGTTATATGACTCCCACAAAAAGTACGGTCC





AGTAGTTAAATTGTGGTTAGGTCCTATGCAATTGTTAGTCTCCGTAAAGGAACCAAGTTTGTTG





AAGGAAATATTGGTTAAAGCTGAGGATAAGTTGCCTTTAACAGGTAGAGCCTTTAGATTGGCTT





TCGGTAGATCTTCATTATTTGCATCCAGTTTCGAAAAGGTTCAAAACAGAAGACAAAGATTGGC





CGAAAAGTTGAATAAGATCGCATTCCAAAGAGCCAACATCATTCCAGAAAAGGCCGTAGCTTGT





TTCATGGGTAGAGTTCAAGATTTGATGATAGAAGAATCTGTCGACTGTAATAAGGTTTCTCAAC





ATTTGGCTTTTACTTTGTTAGGTTGCACATTGTTTGGTGACGCCTTCTTAGGTTGGTCTAAGGC





TACAATCTATGAAGAATTGTTGATGATGATCGCTAAGGACGCATCCTTTTGGGCTAGTTATAGA





GTTACCCCAATCTGGAAGCAAGGTTTCTGGAGATACCAAAGATTGTGTATGAAGTTGAAGTGCT





TGACTCAAGATATCGTTCAACAATACAGAAAGCATTACAAGTTGTTTTCTCACTCACAAAACCA





AAACTTACACAACGAAACCAAGTCAACTGGTGTTGAAGTCGCTTTTGATATTCCACCTTGTCCT





GCTGCAGACGTTAGAAATTCTTGCTTTTTCTACGGTTTGAACGATCATGTTAACCCAAACGAAG





AACCTTGTGGTAATATTATGGGTGTCATGTTTCACGGTTGCTTGACTACAACCTCTTTGATCGC





ATCAATCTTGGAAAGATTGGCCACTAACCCAGAAATCCAAGAAAAGATTAATTCTGAATTGAAC





TTAGTTCAAAAGGGTCCAGTCAAGGATCATAGAAAGAATGTTGACAACATGCCTTTGTTATTGG





CAACAATCTATGAATCAGCTAGATTATTGCCAGCAGGTCCTTTATTGCAAAGATGTCCTTTGAA





GCAAGATTTGGTTTTGAAAACAGGTATCACCATTCCAGCTGGTACCTTGGTCGTAGTTCCTATT





AAATTGGTTCAAATGGATGACTCTTCATGGGGTTCAGATGCCAATGAGTTTAATCCATACAGAT





TCTTGTCCATGGCTTGTAATGGTATTGACATGATACAAAGAACCCCTTTAGCTGGTGAAAACAT





TGGTGACCAAGGTGAAGGTTCATTTGTCTTGAATGACCCAATTGGTAACGTAGGTTTCTTACCT





TTTGGTTTCGGTGCAAGAGCCTGCGTTGGTCAAAAGTTTATAATCCAAGGTGTCGCTACTTTGT





TCGCAAGTTTGTTGGCCCATTACGAAATTAAATTGCAATCCGAGAGTAAGAATGATTCTAAACC





ATCCAGTAACACCTCTGCCAGTCAAATCGTCCCAAACTCAAAAATCGTATTCGTAAGAAGAAAC





TCATAA




CYP5491 gene
ATGTGGACTGTCGTGCTCGGTTTGGCGACGCTGITTGTCGCCIACTACATCCATTGGATTAACA
42
SEQ ID NO: 14


(coding sequence)
AATGGAGAGATTCCAAGITCAACGGAGTTCTGCCGCCGGGCACCATGGGTTTGCCGCTCATCGG
in




AGAGACGATTCAACTGAGICGACCCAGTGACTCCCTCGACGTTCACCCITTCATCCAGAAAAAA

WO2016050890



GTTGAAAGATACGGGCCGATCTTCAAAACATGTCTGGCCGGAAGGCCGGTGGTGGTGTCGGCGG





ACGCAGAGTTCAACAACTACATAATGCTGCAGGAAGGAAGAGCAGTGGAAATGTGGTATTTGGA





TACGCTCTCCAAATTTTTCGGCCTCGACACCGAGTGGCTCAAAGCTCTGGGCCTCATCCACAAG





TACATCAGAAGCATTACTCTCAATCACTTCGGCGCCGAGGCCCTGCGGGAGAGATTTCTTCCTT





TTATTGAAGCATCCTCCATGGAAGCCClrCACICCTGGTCTACTCAACCTAGCGTCGAAGTCAA





AAATGCCTCCGCTCTCATGGTTTXTAGGACCTCGGTGAATAAGATGTTCGGTGAG6ATGCGAAG





AAGCTATCGGGAAATATCCCTGGGAAGTTCACGAAGCTTCTAGGAGGATTTCTGAGTTTACCAC





TGAATTTTCCCGGCACCACCTACCACAAATGCTTGAAGGATAIGAAGGAAATCCAGAAGAAGCT





AAGAGAGGTTGTAGACGATAGATTGGCTAATGTGGGGCCTGATGTGGAAGATTTCTTGGGGCAA





GCCCTTAAAGATAAGGAATCAGAGAAGTTCATTTCAGAGGAGTTCATCATCCAACTGTTGTTTT





CTATCAGTTTTGCTAGCTTTGAGTCCATCTCCACCACTCTTACTTTGATTCTCAAGCTCCTTGA





TGAACACCCAGAAGTAGTGAAAGAGTTGGAAGCTGAACACGAGGCGATTCGAAAAGCTAGAGCA





GATCCAGATGGACCAATTACTTGGGAAGAATACAAATCCATGACTTTTACATTACAAGTCATCA





ATGAAACCCTAAGGTTGGGGAGTGTCACACCTGCCTTGTTGAGGAAAACAGTTAAAGATCTTCA





AGTAAAAGGATACATAATCCCGGAAGGATGGACAATAATGCTTGTCACCGCTTCACGTCACAGA





GACCCAAAAGTCTATAAGGACCCTCATATCTTCAATCC.ATGGCGTTGGAAGGACTTGGACTCA





TTACCATCCAAAAGAACTTCATGCCTTTTGGGGGAGGCTTAAGGCATTGTGCTGGTGCTGAGTA





CTCTAAAGTCTACTTGTGCACCTTCXTGCACATCCTCTGTACCAAATACCGATGGACCAAACTT





GGGGGAGGAAGGATTGCAAGAGCTCATATATTGAGTTTTGAAGATGGGTTACATGTGAAGTTCA





CACCCAAGGAATGA




CYP6479 gene
ATGAAGATGAAGATGGAATCCATGCGCACCTCCCTGGATATCTCCGACCATGACATACTTCCAA
43
SEQ ID NO: 15


(coding sequence)
GGGTTTATCCTCATGTTCACCTATGGATCAACAAATATGGGAAAAACTTCATTCAGTGGAATGG

in



CAACGTAGCTCAGTTGATTGTTTCGGATCCTGACACGATCAAGGAGATACTCCAAAACCGAGAA

WO2016050890



CAAGCTGTTCCCAAAATAGATCTCAGCGGAGATGCACGGAGGATATTCGGGAATGGGCTTTCGA





CTTCTGACGGTGAAAAATGGGCTAAGGCTCGAAGAATCGCTGATTACGCTTTCCACGGGGATCT





CCTAAGAAATATGGGGCCAACCATGGTTTCCTGTGCTGAGGCAATGGTGGAAAAGTGGAAGCAT





CATCAAGGCAAAGAGCTTGATTTGTTCGAAGAGTTTAAGGTGCTCACTTCAGATATCATTGCAC





ATACAGCCTTTGGAAGCAGTTATTTGGAAGGGAAAGTTATTTTTCAGACTCTAAGTAAGCTGAG





CATGATATTATTTAAGAATCAGTTCAAACGAAGGATTCCTGTTATCAGCAAGTTCTTCAGATCA





AAGGATGCGAGGGAGGGAGAGGAGCTGGAAAGAAGGTTGAAAAATTCCATAATTTCAATAATGG





AAAAGAGAGAAGAGAAGGTGATAAGTGGTGAAGCAGATAACTATGGTAATGATTTTCTTGGATT





ACTTTTGAAGGCAAAGAATGAGCCTGACCAGAGGCAGAGGATTTCTGTTGATGATGTAGTGGAT





GAATGCAAAACAGTTTACTTCGCTGGGCAAGAAACTACAAGTGTTTTGCTTGCTTGGACCGCCT





TTCTTTTAGCAACTCATGAGCATTGGCAAGAAGAAGCAAGAAAGGAAGTGCTGAATATGTTTGG





CAACAAGAATCCAACTTTAGAAGGCATCACAAAATTAAAGATTATGAGCATGATCATCAAGGAA





TCTCTAAGATTATATCCTCCAGCCCCGCCCATGTCAAGGAAGGTTAAAAAGGAAGTCAGATTGG





GGAAGCTGGTTCTCCCCCCCAACATTCAAGTAAGCATCTCAACTATTGCAGTTCATCATGATAC





TGCAATATGGGGTGAAGATGCCCATGTATTCAAACCAGAAAGATTTTCTGAAGGAACAGCTAAA





GATATCCCATCAGCTGCATACATCCCATTTGGCTTTGGTCCTCGAAACTGCATCGGCAATATCT





TGGCCATCAACGAAACTAAGATTGCACTGTCGATGATTCTACAACGATTTTCTTTCACCATCTC





CCCGGCCTACGTCCACGCACCTTTCCAGTTCCTCACTATCTGCCCCCAACACGGGGTTCAGGTA





AAGCTTCAGTCCCTATTAAGTGAAAGGTGA




CYP7604 gene
ATGGAAGCTGAATTTGGTGCCGGTGCTACTATGGTATTATCCGTTGTCGCAATCGTCTTCTTTT
44
SEQ ID NO: 16


(coding sequence)
TCACATTTTTACACTTGTTTGAATCTTTCTTTTTGAAGCCAGATAGATTGAGATCTAAGTTGAG

in



AAAGCAAGGTATTGGTGGTCCATCTCCTTCATTTTTGTTGGGTAATTTGTCAGAAATTAAATCC

WO2016050890



ATCAGAGCTTTGTCTTCACAAGCTAAGAACGCAGAAGATGCCTCTGCTGGTGGTGGTGGTGGTT





CCGCCAGTATAGCTCATGGTTGGACTTCAAATTTGTTTCCTCACTTAGAACAATGGAGAAACAG





ATATGGTCCAATTTTCGTATACTCCAGTGGTACAATCCAAATCTTGTGTATCACAGAAATGGAA





ACCGTTAAGGAAATCTCTTTGTCAACCTCCTTGAGTTTAGGTAAACCTGCTCATTTGTCTAAGG





ATAGAGGTCCATTGTTAGGTTTGGGTATCTTAGCCTCTTCAGGTCCTATTTGGGTTCACCAAAG





AAAGATCATCGCTCCACAATTGTATTTGGATAAAGTAAAGGGTATGACCTCATTGATGGTTGAA





AGTGCAAATTCTATGTTAAGATCCTGGGAAACTAAAGTTGAAAATCATGGTGGTCAAGCCGAAA





TTAACGTCGATGGTGACTTGAGAGCATTAAGTGCCGATATCATTTCTAAGGCTTGCTTTGGTTC





AAACTATTCCGAAGGTGAAGAAATTTTCTTGAAGTTGAGAGCATTGCAAGTTGTCATGAGTAAG





GGTTCTATTGGTATACCTGGTTTTAGATACATACCAACTAAAAATAACAGAGAAATGTGGAAGT





TGGAAAAGGAAATCGAATCAATGATCTTGAAGGTTGCCAACGAAAGAACACAACATTCCAGTCA





CGAACAAGATTTGTTGCAAATGATTTTGGAAGGTGCAAAGTCTTTGGGTGAAGACAATAAGAGT





ATGAACATATCAAGAGACAAGTTTATTGTTGACAATTGTAAGAACATCTATTTCGCTGGTCATG





AAACTACAGCTATAACCGCATCTTGGTGCTTGATGTTGTTAGCTGCACACCCTGATTGGCAAGC





AAGAGCCAGATCTGAAGTTTTACAATGTTGCGATGACAGACCAATCGATGCAGACACAGTCAAA





AATATGAAGACCTTGACTATGGTAATTCAAGAAACTTTGAGATTGTACCCACCTGCTGTATTCG





TTACAAGACAAGCATTAGAAGATATCAGATTCAAAAACATCACAATACCAAAGGGTATGAACTT





TCATATACCAATCCCTATGTTGCAACAAGACTTCCACTTATGGGGTCCTGATGCTTGTTCATTT





GACCCACAAAGATTCTCCAATGGTGTCTTAGGTGCATGCAAAAACCCACAAGCCTATATGCCTT





TTGGTGTTGGTCCAAGAGTCTGTGCCGGTCAACATTTCGCTATGATCGAATTGAAAGTCATCGT





ATCATTGGTTTTGTCCAGATTCGAATTTTCTTTGTCACCTTCCTACAAGCATTCACCAGCCTTC





AGATTAGTTGTCGAACCAGAAAACGGTGTCATATTGCATGTCAGAAAGTTGTGA




CYP8224 gene
ATGGAAGTGGATATCAATATCTTCACCGTCTTTTCCTTCGTATTATGCACAGTCTTCCTCTTCT
45
SEQ ID NO: 17


(coding sequence)
TTCTATCCTTCTTGATCCTCCTCCTCCTCCGAACGCTCGCCGGAAAATCCATAACGAGCTCCGA

in



GTACACGCCAGTGTACGGCACCGTCTACGGTCAGGCTTTCTATTTCAACAACCTGTACGATCAT

WO2016050890



CTAACGGAGGTGGCCAAGAGACATCGAACCTTCCGGCTGCTTGCGCCGGCATACAGCGAGATAT





ACACGACCGATCCGAGAAACATCGAGCATATGTTGAAGACGAAATTCGATAAGTATTCGAAAGG





AAGCAAGGATCAAGAAATCGTTGGGGATCTGTTTGGAGAGGGGATATTTGCAGTCGATGGAGAT





AAGTGGAAGCAGCAGAGGAAGCTGGCTAGCTATGAATTCTCGACGAGGATTCTTAGGGATTTTA





GCTGCTCGGTTTTCAGACGAAGTGCTGCTAAACTTGTTGGAGTTGTTTCGGAGTTTTCCAGCAT





GGGTCGGGTTTTTGATATCCAGGATTTGCTAATGCGGTGCGCTTTGGACTCCATTTTCAAAGTG





GGGTTCGGGGTTGATTTGAATTGCTTGGAGGAATCAAGCAAAGAAGGGAGCGATTTCATGAAAG





CCTTCGATGATTCTAGCGCTCAGATTTTTTGGCGCTATATCGATCCCTTCTGGAAATTGAAGAG





ATTGCTTAACATCGGTTCCGAAGCTTCGTTTAGGAACAACATAAAAACCATAGATGCTTTTGTG





CACCAGTTGATCAGAGACAAGAGAAAATTGCTTCAGCAACCGAATCACAAGAATGACAAAGAGG





ACATACTTTGGAGGTTTCTGATGGAAAGTGAGAAGGATCCAACAAGAATGAATGATCAATATCT





AAGGGATATAGTCCTCAATTTCATGTTGGCTGGCAAAGATTCAAGTGGAGGAACTCTGTCCTGG





TTCTTCTACATGCTATGCAAGAACCCTTTAATACAGGAAAAAGTTGCAGAAGAAGTGAGGCAAA





TTGTTGCGTTTGAAGGGGAAGAAGTTGACATCAATTTGTTCATACAAAACTTAACTGATTCAGC





TCTTGACAAAATGCATTATCTTCATGCAGCATTGACCGAGACTCTGAGGCTATATCCTGCAGTC





CCTTTGGATGGAAGGACTGCAGAAATAGATGACATTCTTCCTGATGGCTATAAACTAAGAAAAG





GGGATGGAGTATACTACATGGCCTATTCCATGGGCAGGATGTCCTCCCTTTGGGGAGAAGATGC





TGAAGATTTTAAACCCGAAAGATGGCTTGAAAGTGGAACTTTTCAACCCGAATCACCTTTCAAA





TTCATCGCTTTTCATGCGGGTCCTCGAATGTGTTTGGGAAAAGAGTTTGCTTATCGACAAATGA





AGATAGTATCTGCTGCTTTGCTTCAATTTTTTCGATTCAAAGTAGCTGATACAACGAGGAATGT





GACTTATAGGATCATGCTTACCCTTCACATTGATGGAGGTCTCCCTCTTCTTGCAATTCCGAGA





ATTAGAAAATTTACCTAA




CYP8728 gene
TTGGATAGTGGAGTTAAAAGAGTGAAACGGCTAGTTGAAGAGAAACGGCGAGCAGAATTGTCTG
46
SEQ ID NO: 18


sequence
CCCGGATTGCCTCTGGAGAATTCACAGTCGAAAAAGCTGGTTTTCCATCTGTATTGAGGAGTGG

in



CTTATCAAAGATGGGTGTTCCCAGTGAGATTCTGGACATATTATTTGGTTTCGTTGATGCTCAA

WO2016050890



GAAGAATATCCCAAGATTCCCGAAGCAAAAGGATCAGTAAATGCAATTCGTAGTGAGGCCTTCT





TCATACCTCTCTATGAGCTTTATCTCACATATGGTGGAATATTTAGGTTGACTTTTGGGCCAAA





GTCATTCTTGATAGTTTCTGATCCTTCCATTGCTAAACATATACTGAAGGATAATCCGAGGAAT





TATTCTAAGGGTATCTTAGCTGAAATTCTAGAGTTTGTCATGGGGAAGGGACTTATACCAGCTG





ACGAGAAGATATGGCGTGTACGAAGGCGGGCTATAGTCCCATCTTTGCATCTGAAGTATGTAGG





TGCTATGATTAATCTTTTTGGAGAAGCTGCAGATAGGCTTTGCAAGAAGCTAGATGCTGCAGCA





TCTGATGGGGTTGATGTGGAAATGGAGTCCCTGTTCTCCCGTTTGACTTTAGATATCATTGGCA





AGGCAGTTTTTAACTATGACTTTGATTCACTTACAAATGACACTGGCATAGTTGAGGCTGTTTA





CACTGTGCTAAGAGAAGCAGAGGATCGCAGTGTTGCACCAATTCCAGTATGGGAAATTCCAATT





TGGAAGGATATTTCACCACGGCAAAAAAAGGTCTCTAAAGCCCTCAAATTGATCAACGACACCC





TCGATCAACTAATTGCTATATGCAAGAGGATGGTTGATGAGGAGGAGCTGCAGTTTCATGAGGA





ATACATGAATGAGCAAGATCCAAGCATCCTTCATTTCCTTTTGGCATCAGGAGATGATGTTTCA





AGCAAGCAGCTTCGTGATGACTTGATGACTATGCTTATAGCTGGGCATGAAACATCTGCTGCAG





TTTTAACATGGACCTTTTATCTTCTTTCCAAGGAGCCGAGGATCATGTCCAAGCTCCAGGAGGA





GGTTGATTCAGTCCTTGGGGATCGGTTTCCAACTATTGAAGATATGAAGAACCTCAAATATGCC





ACACGAATAATTAACGAATCCTTGAGGCTTTACCCACAGCCACCAGTTTTAATACGTCGATCTC





TTGACAATGATATGCTCGGGAAGTACCCCATTAAAAAGGGTGAGGACATATTCATTTCTGTTTG





GAACTTGCATCGCAGTCCAAAACTCTGGGATGATGCGGATAAATTTAATCCTGAAAGGTGGCCT





CTGGATGGACCCAATCCAAATGAGACAAATCAAAATTTCAGATATTTACCTTTTGGTGGCGGAC





CACGGAAATGTGTGGGAGACATGTTTGCTTCGTACGAGACTGTTGTAGCACTTGCAATGCTTGT





TCGGCGATTTGACTTCCAAATGGCACTTGGAGCACCTCCTGTAAAAATGACAACTGGAGCTACA





ATTCACACAACAGATGGATTGAAAATGACAGTTACACGAAGAATGAGACCTCCAATCATACCCA





CATTAGAGATGCCTGCAGTGGTCGTTGACTCGTCTGTCGTGGACTCGTCCGTCGCCATTTTGAA





AGAAGAAACACAAATTGGTTAG




DNA sequence
CAGTTCCTCTCCTGGTCCTCCCAGTTTGGCAAGAGGTTCATCTTCTGGAATGGGATCGAGCCCA
47
SEQ ID NO: 19


encoding CYP10020
GAATGTGCCTCACCGAGACCGATTTGATCAAAGAGCTTCTCTCTAAGTACAGCGCCGTCTCCGG

in



TAAGTCATGGCTTCAGCAACAGGGCTCCAAGCACTTCATCGGCCGCGGTCTCTTAATGGCCAAC

WO2016050890



GGCCAAAACTGGTACCACCAGCGTCACATCGTCGCGCCGGCCTTCATGGGAGACAGACTCAAGA





GTTACGCCGGGTACATGGTGGAATGCACAAAGGAGATGCTTCAGTCAATTGAAAACGAGGTCAA





CTCGGGGCGATCCGAGTTCGAAATCGGTGAGTATATGACCAGACTCACCGCCGATATAATATCA





CGAACCGAGTTCGAAAGCAGCTACGAAAAGGGAAAGCAAATTTTCCATTTGCTCACCGTTTTAC





AGCATCTCTGCGCTCAGGCGAGCCGCCACCTCTGCCTTCCTGGAAGCCGGTTTTTTCCGAGTAA





ATACAACAGAGAGATAAAGGCATTGAAGACGAAGGTGGAGGGGTTGTTAATGGAGATAATACAG





AGCAGAAGAGACTGTGTGGAGGTGGGGAGGAGCAGTTCGTATGGAAATGATCTGTTGGGAATGT





TGCTGAATGAGATGCAGAAGAAGAAAGATGGGAATGGGTTGAGCTTGAATTTGCAGATTATAAT





GGATGAATGCAAGACCTTCTTCTTCGCCGGCCATGAAACCACTGCTCTTTTGCTCACTTGGACT





GTAATGTTATTGGCCAGCAACCCTTCTTGGCAACACAAGGTTCGAGCCGAAGTTATGGCCGTCT





GCAATGGAGGAACTCTCTCTCTTGAACATCTCTCCAAGCTCTCTCTGTTGAGTATGGTGATAAA





TGAATCGTTGAGGCTATACCCGCCAGCAAGTATTCTTCCAAGAATGGCATTTGAAGATATAAAG





CTGGGAGATCTTGAGATCCCAAAAGGGCTGTCGATATGGATCCCAGTGCTTGCAATTCACCACA





GTGAAGAGCTATGGGGCAAAGATGCAAATGAGTTCAACCCAGAAAGATTTGCAAATTCAAAAGC





CTTCACTTCGGGGAGATTCATTCCCTTTGCTTCTGGCCCTCGCAACTGCGTTGGCCAATCATTT





GCTCTCATGGAAACCAAGATCATTTTGGCTATGCTCATCTCCAAGTTTTCCTTCACCATCTCTG





ACAATTATCGCCATGCACCCGTGGTCGTCCTCACTATAAAACCCAAATACGGAGTCCAAGTTTG





CTTGAAGCCTTTCAATTAA




DNA sequence
ATGGAAGACACCTTCCTACTCTATCCTTCCCTCTCTCTTCTCTTTCTTCTTTTTGCTTTCAAGC
48
SEQ ID NO: 20


encoding CYP10285
TCATCCGTCGATCCGGAGGAGTTCGCAGGAACTTACCGCCGAGTCCGCCCTCTCTTCCGGTTAT

in



CGGCCACCTCCATCTCTTGAAAAAGCCACTCCACCGGACTTTCCAGAAACTTTCCGCCAAATAT

WO2016050890



GGTCCTGTTATGTCCCTCCGCCTCGGGTCTCGCCTCGCAGTCATTGTATCGTCGTCGTCGGCGG





TGGACGAGTGTTTCACTAAAAACGACGTCGTGCTCGCCAACCGTCCTCGTTTGCTAATTGGCAA





ACACCTCGGCTACAACTACACTACCATGGTTGGGGCTCCCTACGGCGACCACTGGCGTAGCCTC





CGCCGCATCGGTGCCCTCGAAATCTTCTCTTCATCTCGCCTCAACAAATTCGCCGACATCCGAA





GGGATGAAGTAGAGGGATTGCTTCGCAAACTCTCACGCAATTCGCTCCATCAATTCTCGAAAGT





GGAAGTTCAATCGGCCTTGTCGGAGCTGACGTTCAACATCTCGATGAGAATGGCGGCAGGGAAA





CGGTATTACGGAGATGACGTGACGGACGAGGAAGAGGCGAGAAAGTTCAGAGAGTTAATTAAAC





AGATAGTGGCGCTGGGCGGAGTATCAAATCCAGGGGATTTCGTCCCGATTCTGAATTGGATTCC





GAACGGTTTCGAGAGGAAGTTGATCGAGTGTGGGAAGAAGACGGATGCGTTCTTGCAGGGGCTG





ATCGAGGACCACCGGAGAAAGAAGGAAGAGGGTAGGAACACGATGATCGATCACCTGCTCTCTC





TGCAAGAATCGGAGCCTGCTCACTACGGAGACCAAATAATCAAAGGATTTATACTGGTGTTACT





GACGGCGGGGACCGATACATCGGCCGTGACAATGGAGTGGGCGCTATCTCATCTCCTGAACAAT





CCTGAAGTGCTAAAGAAGGCAAGAGATGAGGTCGACACTGAAATTGGACAAGAACGACTTGTCG





AAGAATCAGACGTAGTATCTAAGTTACCCTATCTTCAAGGGATCATCTCCGAGACTCTCCGGCT





GAATCCCGCCGCTCCGATGTTGTTGCCCCATTACGCCTCGGACGACTGCACGATATGTGGATAC





GACGTGCCACGTGACACAATCGTAATGGTCAATGCATGGGCCATACATAGGGATCCAAACGAAT





GGGAGGAGCCCACGTGTTTCAGACCAGAACGATATGAAAAGTCGTCGTCGGAAGCGGAGGTACA





CAAGTCGGTGAGTTTCGGGGTGGGAAGGCGAGCTTGTCCTGGGTCTGGCATGGCGCAGAGGGTG





ATGGGCTTGACTTTGGCGGCACTGGTTCAGTGCTTCGAGTGGGAGAGAGTTGGAGAAGAAGAAG





TGGACATGAACGAAGGCTCAGGTGCCACAATGCCCAAGATGGTGCCATTGGAGGCCATGTGCAG





AGCTCGTCCCATCGTCCACAACCTTCTTTACTGA




CYP5491 protein
MWTVVLGLATLFVAYYIHWINKWRDSKFNGVLPPGTMGLPLIGETIQLSRPSDSLDVHPFIQKK
49
SEQ ID NO: 44



VERYGPIFKTCLAGRPVVVSADAEFNNYIMLQEGRAVEMWYLDTLSKFFGLDTEWLKALGLIHK

in



YIRSITLNHFGAEALRERFLPFIEASSMEALHSWSTQPSVEVKNASALMVFRTSVNKMFGEDAK

WO2016050890



KLSGNIPGKFTKLLGGFLSLPLNFPGTTYHKCLKDMKEIQKKLREVVDDRLANVGPDVEDFLGQ





ALKDKESEKFISEEFIIQLLFSISFASFESISTTLTLILKLLDEHPEVVKELEAEHEAIRKARA





DPDGPITWEEYKSMTFTLQVINETLRLGSVTPALLRKTVKDLQVKGYIIPEGWTIMLVTASRHR





DPKVYKDPHIFNPWRWKDLDSITIQKNFMPFGGGLRHCAGAEYSKVYLCTFLHILCTKYRWTKL





GGGRIARAHILSFEDGLHVKFTPKE




Squalene epoxidase
MSAVNVAPELINADNTITYDAIVIGAGVIGPCVATGLARKGKKVLIVERDWAMPDRIVGELMQP
50
SEQ ID NO: 54


(S. cerevisiae)
GGVRALRSLGMIQSINNIEAYPVTGYTVFFNGEQVDIPYPYKADIPKVEKLKDLVKDGNDKVLE

in



DSTIHIKDYEDDERERGVAFVHGRFLNNLRNITAQEPNVTRVQGNCIEILKDEKNEVVGAKVDI

WO2016050890



DGRGKVEFKAHLTFICDGIFSRFRKELHPDHVPTVGSSFVGMSLFNAKNPAPMHGHVILGSDHM





PILVYQISPEETRILCAYNSPKVPADIKSWMIKDVQPFIPKSLRPSFDEAVSQGKFRAMPNSYL





PARQNDVTGMCVIGDALNMRHPLTGGGMTVGLHDVVLLIKKIGDLDFSDREKVLDELLDYHFER





KSYDSVINVLSVALYSLFAADSDNLKALQKGCFKYFQRGGDCVNKPVEFLSGVLPKPLQLTRVF





FAVAFYTIYLNMEERGFLGLPMALLEGIMILITAIRVFTPFLFGELIG




Squalene epoxidase
MVDQFSLAFIFASVLGAVAFYYLFLRNRIFRVSREPRRESLKNIATTNGECKSSYSDGDIIIVG
51
SEQ ID NO: 88


(Gynostemma
AGVAGSALAYTLGKDGRRVHVIERDLTEPDRTVGELLQPGGYLKLTELGLEDCVNEIDAQRVYG

in



pentaphyllum)

YALFKDGKDTKLSYPLEKFHSDVSGRSFHNGRFIQRMREKAATLPNVRLEQGTVTSLLEENGII

WO2016050890



KGVQYKSKTGQEMTAYAPLTIVCDGCFSNLRRSLCNPKVDVPSCFVALVLENCELPHANYGHVI





LADPSPILFYPISSTEVRCLVDVPGQKVPSISNGEMANYLKSVVAPQIPPQIYDALRSCYDKGN





IRTMPNRSMPADPYPTPGALLMGDAFNMRHPLTGGGMTVALSDIVVLRDLLKPLRDLHDAPILS





NYLEAFYTLRKPVASTINTLAGALYKVFCASPDQARREMRQACFDYLSLGGVFSNGPVSLLSGL





NPRPLSLVLHFFAVAIYGVGRLLIPFPSPRRVWIGARLISGASGIIFPIIKAEGVRQIFFPATL





PAYYRAPPLVRGR




Squalene epoxidase
MESQLWNWILPLLISSLLISFVAFYGFFVKPKRNGLRHDRKTVSTVTSDVGSVNITGDTVADVI
52
SEQ ID NO: 89


1 (Arabidopsis
VVGAGVAGSALAYTLGKDKRRVHVIERDLSEPDRIVGELLQPGGYLKLLELGIEDCVEEIDAQR

in



thaliana)

VYGYALFKNGKRIRLAYPLEKFHEDVSGRSFHNGRFIQRMREKAASLPNVQLEQGTVLSLLEEN

WO2016050890



GTIKGVRYKNKAGEEQTAFAALTIVCDGCFSNLRRSLCNPQVEVPSCFVGLVLENCNLPYANHG





HVVLADPSPILMYPISSTEVRCLVDVPGQKVPSIANGEMKNYLKTVVAPQMPHEVYDSFIAAVD





KGNIKSMPNRSMPASPYPTPGALLMGDAFNMRHPLTGGGMTVALADIVVLRNLLRPLRDLSDGA





SLCKYLESFYTLRKPVAATINTLANALYQVFCSSENEARNEMREACFDYLGLGGMCTSGPVSLL





SGLNPRPLTLVCHFFAVAVYGVIRLLIPFPSPKRIWLGAKLISGASGIIFPIIKAEGVRQMFFP





ATVPAYYYKAPTVGETKCS




Squalene epoxidase
MTYAWLWTLLAFVLTWMVFHLIKMKKAATGDLEAEAEARRDGATDVIIVGAGVAGASLAYALAK
53
SEQ ID NO: 90


4 (Arabidopsis
DGRRVHVIERDLKEPQRFMGELMQAGGRFMLAQLGLEDCLEDIDAQEAKSLAIYKDGKHATLPF

in



thaliana)

PDDKSFPHEPVGRLLRNGRLVQRLRQKAASLSNVQLEEGTVKSLIEEEGVVKGVTYKNSAGEEI

WO2016050890



TAFAPLTVVCDGCYSNLRRSLVDNTEEVLSYMVGYVTKNSRLEDPHSLHLIFSKPLVCVIYQIT





SDEVRCVAEVPADSIPSISNGEMSTFLKKSMAPQIPETGNLREIFLKGIEEGLPEIKSTATKSM





SSRLCDKRGVIVLGDAFNMRHPIIASGMMVALSDICILRNLLKPLPNLSNTKKVSDLVKSFYII





RKPMSATVNTLASIFSQVLVATTDEAREGMRQGCFNYLARGDFKTRGLMTILGGMNPHPLTLVL





HLVAITLTSMGHLLSPFPSPRRFWHSLRILAWALQMLGAHLVDEGFKEMLIPTNAAAYRRNYIA





TTTV




Squalene epoxidase
MAFTHVCLWTLVAFVLTWTVFYLTNMKKKATDLADTVAEDQKDGAADVIIVGAGVGGSALAYAL
54
SEQ ID NO: 91


6 (Arabidopsis
AKDGRRVHVIERDMREPERMMGEFMQPGGRLMLSKLGLQDCLEDIDAQKATGLAVYKDGKEADA

in



thaliana)

PFPVDNNNFSYEPSARSFHNGRFVQQLRRKAFSLSNVRLEEGTVKSLLEEKGVVKGVTYKNKEG

WO2016050890



EETTALAPLTVVCDGCYSNLRRSLNDDNNAEIMSYIVGYISKNCRLEEPEKLHLILSKPSFTMV





YQISSTDVRCGFEVLPENFPSIANGEMSTFMKNTIVPQVPPKLRKIFLKGIDEGAHIKVVPAKR





MTSTLSKKKGVIVLGDAFNMRHPVVASGMMVLLSDILILRRLLQPLSNLGDANKVSEVINSFYD





IRKPMSATVNTLGNAFSQVLIGSTDEAKEAMRQGVYDYLCSGGFRTSGMMALLGGMNPRPLSLV





YHLCAITLSSIGQLLSPFPSPLRIWHSLKLFGLAMKMLVPNLKAEGVSQMLFPANAAAYHKSYM





AATTL




Squalene epoxidase
MAFTNVCLWTLLAFMLTWTVFYVTNRGKKATQLADAVVEEREDGATDVIIVGAGVGGSALAYAL
55
SEQ ID NO: 92


5 (Arabidopsis
AKDGRRVHVIERDLREPERIMGEFMQPGGRLMLSKLGLEDCLEGIDAQKATGMTVYKDGKEAVA

in



thaliana)

SFPVDNNNFPFDPSARSFHNGRFVQRLRQKASSLPNVRLEEGTVKSLIEEKGVIKGVTYKNSAG

WO2016050890



EETTALAPLTVVCDGCYSNLRRSLNDNNAEVLSYQVGFISKNCQLEEPEKLKLIMSKPSFTMLY





QISSTDVRCVFEVLPNNIPSISNGEMATFVKNTIAPQVPLKLRKIFLKGIDEGEHIKAMPTKKM





TATLSEKKGVILLGDAFNMRHPAIASGMMVLLSDILILRRLLQPLSNLGNAQKISQVIKSFYDI





RKPMSATVNTLGNAFSQVLVASTDEAKEAMRQGCYDYLSSGGFRTSGMMALLGGMNPRPISLIY





HLCAITLSSIGHLLSPFPSPLRIWHSLRLFGLAMKMLVPHLKAEGVSQMLFPVNAAAYSKSYMA





ATAL




Squalene epoxidase
MKPFVIRNLERFQSTLRSSLLYTNHRIPSSRYSLSTRRFTTGATYIRRWKATAAULKLSAVNST
56
SEQ ID NO: 93


2 (Arabidopsis
VMMKPAKIALDQFIASLFTFLLLYILRRSSNKNKKNRGLVVS0NDTVSKNLETEVDSGTDVIIV

in



thaliana)

GAGVAGSALAHTLGKEGRRVHVIERDFSEQDRIVGELLQPGGYLKLIELGLEDCVKKIDAQRVL

WO2016050890



GYVLFKDGKHTKLAYPLETFDSDVAHNGRFVQRMREKALSNVRLEQGTVTSLLEEHGT





IKGVRIRTKEGNEFRSFAFLTIVCDGCFSNLRRSLCKPKVDVPSTFVGLVLENCELPFANHGHV





VLGDPSPILMYPISSSEVRCLVDVPGLPPIANGEMAKYLVAPQVPTKVREAFITKVEKG





NIRTMPNRSMPADPIPTPGALLLGDAFNMRHPLTGGGMTVALADIVVLRDLLRPIRNLNDKEAL





SKYIESFYTLRKPVASTINTLAD.ALYKV7LASSDEARTEMREACFDYLSLGGVFSSGPVALLSG





LNPRPLSLVLHFFAVAIYAVCRLMLPFPSTESFWLGARIISSASSIIFPIIKAEGVRQMFFPRT





IPAIYRAPP




Squalene epoxidase
MAPTIFVDHCILTTTFVASLFAFLLLYVLRRRSKTIHGSVNVRNGTLTVKSGTDVDIIIVGAGV
57
SEQ ID NO: 94


3 (Arabidopsis
AGAALAHTLGKEGRRVUVIERDLTEPDRIVGELLQPGGYLKLIELGLEDCVKDIDAQRVLGYAL

in



thaliana)

FKDGKHTKLSYPLDQFDSDVAGRSFHNGRFVQRMRSKASLLPNVRMEQGTVTSLVEENGIIKGV

WO2016050890



QYKTKDGQELKSFAPLTIVCDGCFSNLRRSLCKPKVEVPSNFVGLVLENCELPFPNHGHWLGD





PSPILFYPISSSEVRCLVDVPGSKLPSVASGEMAHHLKTMVAPQVPPQIRBAFISAVEKGNIRT





MPNRSMPADPIHTPGALLLGDAFNMRHLTGGGMTVALSDIVILRDLLNPLVDLTNKESLSKYI





ESFYTLRKPVASTINTLAGALYKVFLADDARSEMRRACFDYLSLGGVCSSGVALLSGLNPR





PMSLVLKFFAVAIFGVGRLLVPLPSVKRLWLGARLISSASGIIFPIIKAEGVRQMFFPRTIPAI





YRAPPTPSSSSPQ




Squalene
MDLAFPHVCLWTLLAFVLTWTVFYVNNRRKKVAHLPDAATEVRRDGDADVIIVGAGVGGSALAY
58
SEQ ID NO: 95


monooxygenase 1,1
ALAKDGRRVIIVIERDMREPVRMMGEFMQPGGRLLLSKLGLEDCLEGIDEQIATGLAVYKDGQKA

in


(Brassica napus)
LVSFPEDNDFPYEPTGRAFYNGRFVQRLRQKASSLPTVOLEEGTVKSLIEEKGVIKGVTYKNSA

WO2016050890



GEETTAFAPLTVVCDGCYSNLRRSVNDNNAEVISYQVGYVSKNCQLEDPEHLKLIMSKPSTTML





YQISSTDVRCVMEIFTGNIPSISNGEMAVYLKNTMAPOVPPELRKIFLKGIDEGAOIKAMPTKR





MEATLSEKQGVIVLGDAFNMRHPAIASGMMVVLSDILILRRLLQPLRNLSDANKVSEVIKSFYV





IRKPMSATVNTLGNAFSQVIIASTDEAKEAMRQGCFDYLSSGRTSGMMALLGGMNPRPLSLI





FHLCGITLSSIGQLLSPYPSPLGIWHSLRLYGAEGVSQMLSPAYPAAYRKSYMTATAL




Squalene
MDMAFVEVCLRMLLVFVLSWTIFHVNNRKKKKATKLADLATEERKEGGPDVIIVGAGVGGSALA
59
SEQ ID NO: 96


monooxygenase 1,2
YALAKDGRRVIIVIERDMREPVRMMGEFMQPGGRLMLSKLGLQDCLEEIDAQKSTGIRLFKDGKE

in


(Brassica napus)
TVACFPVDTNFTYEPSGRFFHNGRFVQRLROKASSLPNVRLEEGTVRSLIEEKGVVKGVTYKNS

WO2016050890



SGEETTSFAPLTVVCDGCHSNLRRSLNDNNAEVTAYEIGYISRNCRLEQPDKLITLIMAKPSFAM





LYQVSSTDVRCNFELLSKNLPSVSNGEMTSFVRNSIAPQVPLKLRKTFLDEGSHIKITQAK





RIPATLSRKKGATIVLGDAFNMRHPVIA3GMMVLLSDILILSRLLKPLGNLGDENKVSEVMKSFY





ALRKPMSATVNTLGNSFWQVLIASTDEAKEAMRQGCFDYLSSGGFRTSGLMALIGGMNPRPLSL





FYJILFVISLSSIGQLLSPFPTPLRVWHSLRLLDLSLKMLVPHLKAEGIGQMLSPTNAAAYRKSY





MAATVV




Squalene epoxidase
MEVIFDTYIFGTFFASLCAFLLLFILRPKVKKMGKIREISSINTQNDTAITPPKGSGTDVIIVG
60
SEQ ID NO: 97


(Euphorbia
AGVAGAALACTLGKDGRRVEVIERDLKEPDRIVGELLQLKLVELGLQDCVEEIDAQRIVG

in



tirucalli)

YALFMDGNNTKLSYPLEKFDAEVSGKSFHNGRFIQRMREKAASLNVULEQGTVTSLLEENGTI

WO2016050890



KGVQYKTKDGQEHKAYAPLTVVCDGCFSNLRRSLCKPKVDVPSHFVGLVLENCDLPFANHGHVI





LADPSPILFYPISSTEVRCLVDVPGQKLPSIASMAILKTMVAKQIPPVLHDAFVSAIDKGN





IRTMFNRSMPADPLPTPGALLMGDAFNMREPLTGGGNIVALADIVIARDLLKPLRDLNDAFALA





KYLESFYTLRKPVASTINTLAGALYKVFSASPDEARKEMRQACFDYLSLGGECAMGPVSLLSGL





NPSPLTLVLHFFGVAIYGVGRLLIPFPTPKGMWIGARIISSASGIIFPIIKAEGVRQVFFPATV





PAIYRNPPVNGKSVEVPKS




Squalene epoxidase
MTDPYGFGWITCTLITLAALYNFLFSRKNHSDSUTINITTATGECRSFNPNGDVDIIIVGAGV
61
SEQ ID NO: 98


(Medicago
AGSALAYTLGRRVLIIERDLNEPDRIVGELLQPGGYLKLIELGLDDCVEKIDAQKVFGYAL

in



truncatula)

FKDGHTRLSYPLEKFHSDIARSFHNGRFILRMRAASLPWLEQGTVTSLLEENGTIKGV

WO2016050890



QYKTKDAQEFSACAPLTIVCDGCFSNLRRSLCNPKVEVPSCFVGLVLENCELPCADHGHVILGD





PSPVLFYPISSTEIRCLVDVPGOKVPSISNGEMAKYLKTVVAPQVPPELHAAFIAAVDKGHIRT





MPNRSMPADPYPTPGALLMGDAFNMRHPLTGGGMTVALSDIVVLRNLLKPLRDLNDASSLCKYL





ESFYTLRKPVASTINTLAGALYKVFCASPDPARKEMROACFDYLSLGGLFSEGPVSLLSGLNPC





PLSLVLHFFAVAIYGVGRLLLPFTSPKRLWIGIRLIASASGIILPIIKAEGIRQMFFTATVPAY





YRAPPDA




Squalene
MDLYNIGWILSSVLSLFALYNLIFAGKKNYDVNEKVNOREDSVTSTDAGEIKSDKLNGDADVII
62
SEQ ID NO: 99


monooxygenase
VGAGIAGAALAHTLGKDGRRVHIIERDLSEPDRIVGELLQPGGYLKLVELGLQDCVDNIDAORV

in


(Medicago
FGYALFKDGKIITRLSYPLEKFHSDVSGRSFHNGRFIQRMREHAASLPNVNMEQGTVISLLEEKG

WO2016050890



truncatula)

TIKGVOYKNKDOQAL7LAYAPLTIVCDOCFSNLRRSLONPKVDNITSCFVGLILENCELPCANHGH





VILGDPSPILFYPISSTEIROLVDVPOTKVPSISNGDMTKYLKTTVAPOVPPELYDAFIAAVDK





GNIRTMPNRSMPADPRPTPGAVLMGDAFNMRHPLIGGGMTVALSDIVVLRNLLKPMRDLNDAPT





LCHYLESFYILRKPVASTINTLAGALYKVFSASPDEARKEMRQACFDYLSLGGLFSEOPISLLS





OLNPRPLSLVLHFFAVAVFOVORLLLPYPSPKRVNIGARLLSGASGIILPIIKAEGIROMFFPA





TVPAYYRAPPVNAF




Squalene
MADNYLLGWILCSIIGIZOLYYMVYLVVKREEEDNNRKALLQARSDSAKTMSAVSQNGEORSDN
63
SEQ ID NO:


monooxygenase
PADADIIIVGAGVAGSALANTLGKDGRRVHVIERDLTEPDRIVGELLQPGGYLKLIELGLEDCV

100 in


(Ricinus communis)
EEIDAQRVFOYALFMDGKIITQLSYPLEKFHSDVAGRSFHNGRFIQRMREHASSIPNVRLEQGTV

WO2016050890



TSLIEEKGIIRGVVYKTKIGEELTAFAPLTIVCDOCFSNLRRSLONPKVDVPSCFVGLVLEDCK





LPYQYHONVVLADPSPILFWISSIEVRCLVDVPOQKVPSISNGEMAKYLKNVVAPWIPPEIYD





SFVAAVDKGNIRTMPNRSMFASPYPTPGALLMGDAFNMRHPLTGGGMTVALSDIVVLRELLKPL





RDLHDAPTLCRYLESFYTPVASTINTLAGALTKVFCASSDEARNEMRQACFDYLSLGGVFS





TGPISLLSGLNPRPLSLVVHFFAVA+32GVGRLLLPFPSPKRVWVGARLISGASGIIFPIIAEG





VRQMFFETATVPAYYRAPPVECN




Squalene
MEYKLAVAGITASLWALFMLCSLKRKKNITRASFNNYTDETLKSSSKEICQPEIVASPDIIIVG
64
SEQ ID NO:


monooxygenase
AGVAGAALAYALGEDGRQVEVIERDLSEPDRIVGELLO_PLKLIELGLEDCVEKIDAWYFG

101 in


(Ricinus communis)
YAIFKDGKSTKLSYPLDGFUNVSGRSFHNGRFIQRMREKATSLPNLILQQ+32TSLVEKKGTV

WO2016050890



KGVNYRTRNOQEMTAYAPLTIVCDOCFSNLRRSLCNPKVEIPSOFVALVLENCDLPYANHONVI





LADPSPILFYPISSTEVROLVDIPOQKVPSISNGELAQYLKSTVAKQIPSELHDAFISAIEKON





IRTMFNRSMPASPHPTPGALLVGDATE7NM.REPLTOGGNIVALSDIVLLRNLLRPLENLNDASVLC





KYLESFYILRKPMASTINTLAGALYKVFSASTDRARSEMRQACFDYLSLGGVFSNGPIALLSGL





NPRPLNLVLHFFAVAVYGVGRLILPFPSPKSIWDGVKLISGASSVIFPIMKAEGIGQIFFPITK





PPNHKSOTW




Squalene
MGVSREENARDEKCHYYENGISLSEKSMSTDIIIVGAGVAGSALAYTLGKDGRRVHVIERDLSL
65
SEQ ID NO:


monooxygenase
QDRIVGELLQPGGYLKLIELGLEDCVEEIDAQQVFGYALYKNGRSTKLSYPLESFDSDVSGRSF

102 in


(Ricinus communis)
HNGRFIQRMREKAASLPNVRLEEGTVTSLLEVKGTIKGVQYKTKNGEELTASAPLTIVCDGCFS

WO2016050890



NLRRSLCNPKVDIPSCFVALILENSGQKLPSISNGDMANYLKSVVAPQIPPVLSEAFISAIEKG





KIRTMPNRSMPAAPHPTPGALLLGDAFNMRHPLTGGGMTVALSDIVVLRNLLKPLHDLTDASAL





CEYLKSFYSLRKPVASTINTLAGALYKVFSASHDPARNEMRQACFDYLSLGGVFSNGPIALLSG





LNPRPLSLVAHFFAVAIYGVGRLIFPLPSAKGMWMGARMIKVASGIIFPIIRAEGVQHMFFSKT





LSAFSRSQTS




Squalene
MEYQYFVGGIIASALLFVLVCRLAGKRQRRALRDTVDRDEISQNSENGISQSEKNMNTDIIIVG
66
SEQ ID NO:


monooxygenase
AGVAGSTLAYTLGKDGRRVRVIERDLSLQDRIVGELLQPGGYLKLIELGLEDCVEEIDALQVFG

103 in


(Ricinus communis)
YALYKNGRSTKLSYPLDSFDSDVSGRSFHNGRFIQRMREKAASLPNVRMEGGTVTSLLEVKGTI

WO2016050890



KGVQYKNKNGEELIACAPLTIVCDGCFSNLRRSLCNSKVDIPFCFVALILENCELPYPNHGHVI





LADPSPILFYRISISEIRCLVDIPAGQKLPSISNGEMANYLKSVVAPQIPPELSNAFLSAIEKG





KIRTMPKRSMPAAPHPTPGALLLGDAFNMRHPLTGGVMTVALSDIVVLRSLLRPLHDLTDASAL





CEYLKSFYSLRKPMVSTINTLAGALYRVFSASQDPARDEMRQACFDYLSLGGVFSNGPIALLSG





LNPRPLSLIVHFFAVAVYGVGRLIFPLPSAKRMWMQE




Squalene
MEYQYLMGGGIMTLLFVLSYRLKRETRASVENARDEVLQNSENGISQSEKAMNTDIKLLLEQIV
67
SEQ ID NO:


monooxygenase
QKIAMLNSIRLEEGTVTSLLEVKRDIKGVQYKTKNGEELTACAPLTIVSHGCFSNLRLHVTPST

104 in


(Ricinus communis)
SKFKSFIGLEVDIPSSFAALILGNCELPFPNHGHVILADPSSILFYRISSSEICCLVDVPAGQK

WO2016050890



LPSISNGEMANYLKSVVAHQAFKVGLAY




Squalene
MSPISIQLPPRPQLYRSLISSLSLSTYKQPPSPPSFSLTIANSPPQPQPQATVSSKTRTITRLS
68
SEQ ID NO:


monooxygenase
NSSNRVNLLQAEQHPQEPSSDLSYSSSPPHCVSGGYNIKLMEVGTDNYAVIIILGTFFASLFAF

105 in


(Ricinus communis)
VFLSILRYNFKNKNKAKIHDETTLKTQNDNVRLPDNGSGNDVIIVGAGVAGAALAYTLGKDGRR

WO2016050890



VHVIERDLTEPDRIVGELLQPGGYLKLIELGLEDCVQEIDAQRVLGYALFKDGKNTRLSYPLEK





FHADVAGRSFHNGRFIQRMREKAASLPNVKLEQGTVTSLLEENGTIKGVQYKTKDGQEIRAYAP





LTIVCDGCFSNLRRSLCNPKVDVPSCFVGLVLENCQLPFANHGHVVLADPSPILFYPISSTEVR





CLVDVPGQKVPSIANGEMAKYLKNVVAPQIPPVLHDAFISAIDKGNIRTMPNRSMPADPHPTPG





ALLMGDAFNMRHPLTGGGMTVALSDIVVLRDLLKPLRDLNDATSLTKYLESFYTLRKPVASTIN





TLAGALYKVFSASPDQARKEMRQACFDYLSLGGIFSSGPVALLSGLNPRPLSLVMHFFAVAIYG





VGRLLLPFPSPKSVWIGARLISSASGIIFPIIKAEGVRQMFFPATIPAIYRPPPVKDTSDDEQK





SR




ERG9 protein (S.
MGKLLOLALHPVEMKAALKLKFCRTPLFSIYDQSTSPYLLMCFELLNLTSRSFAAVIRELHPEL
69
SEQ ID NO: 87



cerevisiae)

RNCVTLFYLILRALDTIEDDMSIEHDLKIDLLRHFHZKLLLIKWSFDGKAPDVKDRAVLTDFES

in



ILIEFHKLKPEYQEVIKEITEKMGNGMADYILDENYNLNGLCTVHDYDVYCHYVAGLVGDGLTR

WO2016050890



LIVIAKNESLYSNEULYSMGL.DIQKTNIIRDYNEDLVDGRSITWPKEIWSQYAPQLKDITMKP





ENEQLGLDCINHLVLNALSHVIDVLTYLAGIHEQSTFQYCAIPQVMAIATLALVFNNREVLHGN





VKIRKGTTCYLILKSRTLPGCVEIFDYYLRDTKSKIAVQDPNFLKLNIQISKIEQFMEEMYQDK





LPPNVIKPNETPIFLKVKERSRYDDELVPTQQEEEYKFNMVLSIILSVLLGFYYIYTLHRA




Cucurbitadienol
MWRLKVGAESVGENDEKWLKSISNHLGRQVWEFCPDAGTQQQLLQVHKARKAFHDDRFHRKQSS
70
SEQ ID NO: 43


synthase (S.
DLFITIQYGKEVENGGKTAGVKLKEGEEVRKEAVESSLERALSFYSSIQTSDGNWASDLGGPMF

in



grosvenorii)

LLPGLVIALYVTGVLNSVLSKHHRQEMCRYVYNHQNEDGGWGLHIEGPSTMFGSALNYVALRLL

WO2016050890



GEDANAGAMPKARAWILDHGGATGITSWGKLWLSVLGVYEWSGNNPLPPEFWLFPYFLPFHPGR





MWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYAVPYHEIDWNKSRNTCAKEDLYYPHPKM





QDILWGSLHHVYEPLFTRWPAKRLREKALQTAMQHIHYEDENTRYICLGPVNKVLNLLCCWVED





PYSDAFKLHLQRVHDYLWVAEDGMKMQGYNGSQLWDTAFSIQAIVSTKLVDNYGPTLRKAHDFV





KSSQIQQDCPGDPNVWYRHIHKGAWPFSTRDHGWLISDCTAEGLKAALMLSKLPSETVGESLER





NRLCDAVNVLLSLQNDNGGFASYELTRSYPWLELINPAETFGDIVIDYPYVECTSATMEALTLF





KKLHPGHRTKEIDTAIVRAANFLENMQRTDGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCLA





IRKACDFLLSKELPGGGWGESYLSCQNKVYTNLEGNRPHLVNTAWVLMALIEAGQAERDPTPLH





RAARLLINSQLENGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYCHRVLTE




Cucurbitadienol
MWRLKVGAESVGEEDEKWVKSVSNHLGRQVWEFCADAAADTPHQLLQIQNARNHFHHNRFHRKQ
71
Disclosed in


synthase
SSDLFLAIQYEKEIAKGAKGGAVKVKEGEEVGKEAVKSTLERALGFYSAVQTRDGNWASDLGGP

Takase et al.


(UniProtKB-Q6BE24)
LFLLPGLVIALHVTGVLNSVLSKHHRVEMCRYLYNHQNEDGGWGLHIEGTSTMFGSALNYVALR

(Org Biomol



LLGEDADGGDGGAMTKARAWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPPEFWLLPYSLP

Chem. 2015



FHPGRMWCHCRMVYLPMSYLYGKRFVGPITPKVLSLRQELYTIPYHEIDWNKSRNTCAKEDLYY

Jul



PHPKMQDILWGSIYHVYEPLFTRWPGKRLREKALQAAMKHIHYEDENSRYICLGPVNKVLNMLC

14;13(26):7331-6)



CWVEDPYSDAFKLHLQRVHDYLWVAEDGMRMQGYNGSQLWDTAFSIQAIVATKLVDSYAPTLRK

which is



AHDFVKDSQIQEDCPGDPNVWFRHIHKGAWPLSTRDHGWLISDCTAEGLKASLMLSKLPSTMVG

incorporated



EPLEKNRLCDAVNVLLSLQNDNGGFASYELTRSYPWLELINPAETFGDIVIDYPYVECTAATME

by reference



ALTLFKKLHPGHRTKEIDTAIGKAANFLEKMQRADGSWYGCWGVCFTYAGWFGIKGLVAAGRTY

in its



NSCLAIRKACEFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVLMALIEAGQGERD

entirety



PAPLHRAARLLMNSQLENGDFVQQEIMGVFNKNCMITYAAYRNIFPIWALGEYCHRVLTE




Cucurbitadienol
MWRLKVGAESVGEEDEKWVKSVSNHLGRQVWEFCADAAADTPHQLLQIQNARNHFHHNRFHRKQ
72
SEQ ID NO: 1


synthase (C. pepo)
SSDLFLAIQYEKEIAKGAKGGAVKVKEGEEVGKEAVKSTLERALGFYSAVQTRDGNWASDLGGP

of



LFLLPGLVIALHVTGVLNSVLSKHHRVEMCRYLYNHQNEDGGWGLHIEGTSTMFGSALNYVALR

WO2014/086842



LLGEDADGGDGGAMTKARAWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPPEFWLLPYSLP

which is



FHPGRMWCHCRMVYLPMSYLYGKRFVGPITPKVLSLRQELYTIPYHEIDWNKSRNTCAKEDLYY

incorporated



PHPKMQDILWGSIYHVYEPLFTRWPGKRLREKALQAAMKHIHYEDENSRYICLGPVNKVLNMLC

by reference



CWVEDPYSDAFKLHLQRVHDYLWVAEDGMRMQGYNGSQLWDTAFSIQAIVATKLVDSYAPTLRK

in its



AHDFVKDSQIQEDCPGDPNVWFRHIHKGAWPLSTRDHGWLISDCTAEGLKASLMLSKLPSTMVG

entirety.



EPLEKNRLCDAVNVLLSLQNDNGGFASYELTRSYPWLELINPAETFGDIVIDYPYVECTAATME





ALTLFKKLHPGHRTKEIDTAIGKAANFLEKMQRADGSWYGCWGVCFTYAGWFGIKGLVAAGRTY





NSCLAIRKACEFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVLMALIEAGQGERD





PAPLHRAARLLMNSQLENGDFVQQEIMGVFNKNCMITYAAYRNIFPIWALGEYCHRVLTE




C-terminal portion
LEPNRLCDAVNVILSLQNDNGGFASTELTRSYPWLELINPAFTFGDIVTDYPYVECTSATMFAL
73
SEQ ID NO: 2


of S. grosvenorii
TLFKKLHPGHRTKEIDTAIVRAANFLENMORTDGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNN

in


cucurbitadienol
CLAIRKACDFLLSKELPGGGWGESYLSCQNKVYTNLEGNRPHLVNTAWVLMALIEAGQAERDPT

WO2014/086842


synthase
PLHRAARLLINSQLENGDFPQQEIMGVFNKMNJCMITYAAYRNIFPIWALGEYCHRVLTE




Codon optimized
ATGTGGAGATTGAAAGTAGGTGCTGAATCCGTAGGTGAAAACGACGAAAAGTGGTTGAAAAGTA
74
SEQ ID NO: 42


cucurbitadienol
TAAGTAATCATTTGGGTAGACAAGTCTGGGAATTTTGTCCAGATGCAGGTACACAACAACAATT

in


synthase gene from
GTTGCAAGTACATAAGGCTAGAAAGGCATTTCATGATGACAGATTCCACAGAAAGCAATCTTCA

WO2014/086842



Siraitia

GATTTGTTCATCACCATCCAATACGGCAAGGAAGTAGAAAACGGTGGCAAGACTGCTGGTGTTA





grosvenorii

AATTGAAGGAAGGTGAAGAAGTTAGAAAAGAAGCAGTTGAATCCAGTTTGGAAAGAGCCTTGTC





TTTCTACTCTTCAATCCAAACCTCTGATGGTAATTGGGCATCAGACTTGGGTGGTCCAATGTTC





TTGTTACCTGGTTTGGTCATTGCCTTGTACGTAACTGGTGTTTTGAACTCTGTATTGTCAAAGC





ATCACAGACAAGAAATGTGTAGATACGTTTACAACCATCAAAACGAAGATGGTGGTTGGGGTTT





GCACATTGAAGGTCCATCCACTATGTTTGGTAGTGCATTGAATTATGTCGCCTTAAGATTGTTA





GGTGAAGATGCAAACGCCGGTGCTATGCCTAAGGCAAGAGCCTGGATATTAGACCATGGTGGTG





CTACTGGTATCACATCCTGGGGTAAATTGTGGTTAAGTGTCTTAGGTGTATATGAATGGTCTGG





TAATAACCCATTGCCACCTGAATTTTGGTTGTTCCCTTACTTTTTACCATTCCATCCTGGTAGA





ATGTGGTGTCACTGCAGAATGGTTTACTTGCCAATGTCTTACTTGTACGGCAAGAGATTCGTTG





GTCCAATAACACCTATCGTCTTGTCATTGAGAAAGGAATTGTACGCAGTTCCTTACCATGAAAT





CGATTGGAACAAGTCCAGAAACACCTGTGCTAAGGAAGATTTGTATTACCCACACCCTAAAATG





CAAGACATTTTGTGGGGTAGTTTACATCACGTTTACGAACCATTATTTACTAGATGGCCTGCTA





AAAGATTGAGAGAAAAGGCATTACAAACAGCCATGCAACATATCCACTACGAAGATGAAAACAC





CAGATACATCTGCTTGGGTCCAGTTAACAAGGTCTTGAACTTGTTGTGTTGCTGGGTTGAAGAT





CCTTATTCTGACGCTTTCAAGTTGCATTTGCAAAGAGTACACGATTACTTGTGGGTTGCAGAAG





ACGGTATGAAAATGCAAGGTTACAATGGTTCACAATTGTGGGATACAGCTTTTTCCATTCAAGC





AATAGTCAGTACTAAGTTGGTAGATAACTACGGTCCAACATTAAGAAAAGCTCATGACTTCGTA





AAGTCCAGTCAAATACAACAAGATTGTCCAGGTGACCCTAATGTTTGGTATAGACATATCCACA





AAGGTGCATGGCCATTTTCTACCAGAGATCATGGTTGGTTGATTTCAGACTGTACTGCTGAAGG





TTTGAAGGCTGCATTGATGTTGTCTAAGTTGCCATCAGAAACTGTTGGTGAATCCTTGGAAAGA





AATAGATTATGCGATGCCGTTAACGTCTTGTTGAGTTTGCAAAACGACAACGGTGGTTTCGCTT





CTTACGAATTGACTAGATCATACCCATGGTTGGAATTAATTAATCCTGCTGAAACATTCGGTGA





TATCGTCATTGACTATCCATACGTAGAATGTACCTCCGCTACTATGGAAGCATTGACCTTGTTC





AAGAAGTTGCATCCTGGTCACAGAACAAAGGAAATCGATACCGCAATTGTTAGAGCCGCTAATT





TCTTGGAAAACATGCAAAGAACAGACGGTTCTTGGTATGGTTGTTGGGGTGTTTGCTTTACCTA





CGCTGGTTGGTTCGGTATTAAAGGTTTAGTCGCAGCCGGTAGAACATACAATAACTGTTTGGCC





ATAAGAAAAGCTTGCGATTTCTTGTTATCTAAGGAATTACCAGGTGGTGGTTGGGGTGAATCCT





ACTTGAGTTGTCAAAACAAGGTTTACACTAATTTGGAAGGCAACAGACCTCATTTAGTTAACAC





AGCCTGGGTCTTGATGGCTTTAATCGAAGCCGGTCAAGCTGAAAGAGATCCAACTCCTTTGCAT





AGAGCTGCAAGATTGTTGATCAACTCACAATTGGAAAACGGTGATTTTCCACAACAAGAAATCA





TGGGTGTTTTCAACAAGAACTGCATGATAACATATGCCGCTTACAGAAACATTTTTCCTATATG





GGCTTTGGGTGAATACTGCCACAGAGTCTTGACCGAATAA




Cycloartenol
MWKLKIAEGGNPWLRSTNSHVGRQVWEFDPKLGSPQDLAEIETARNNFHDNRFSHKHSSDLLMR
75
Disclosed in


synthase [Lotus
IQFSKENPIGEVLPKVKVKDVEDVTEEAVVTTLRRAISFHSTLQSHDGHWPGDYGGPMFLMPDL

WO2014/086842


japonicus]
VITLSITGALNAVLTDEHRKEMCRYLYNHQNKDGGWGLHIEGPSTMFGSVLNYVTLRLLGEGPN




GenBank Accession
DGQGDMEKARDWILGHGGATYITSWGKMWLSVLGVFEWSGNNPLPPEIWLLPYALPFHPGRMWC




No. BAE53431.1
HCRMVYLPMSYLYGKRFVGPITPTILSLRKELFTIPYHDIDWNQARNLCAKEDLYYPHPLVQDI





LWASLHKVVEPVLMQWPGKKLREKAINSVMEHIHYEDENTRYICIGPVNKVLNMLCCWVEDPNS





EAFKLHLPRIYDYLWIAEDGMKMQGYNGSQLWDTAFAAQAIISTNLIEEYGPTLRKAHTFIKNS





QVLEDCPGDLNKWYRHISKGAWPFSTADHGWPISDCTAEGLKAILSLSKIAPDIVGEPLDAKRL





YDAVNVILSLQNEDGGLATYELTRSYSWLELINPAETFGDIVIDYPYVECTSAAIQALTSFRKL





YPGHRREEIQHSIEKAAAFIEKIQSSDGSWYGSWGVCFTYGTWFGVKGLIAAGKSFSNCSSIRK





ACEFLLSKQLPSGGWGESYLSCQNKVYSNLEGNRPHAVNTGWAMLALIEAEQAKRDPTPLHRAA





LYLINSQMENGDFPQQEIMGVFNKNCMITYAAYRSIFPIWALGEYRCRVLQAR




Hypothetical
MWKLTIGAESVHDNGQSSSWLKSVNNHLGRQVWEFCPQLGSPDELLQLQNVRLSFQAQRFDKKH
76
Disclosed in


protein
SADLLMRFQFEKENPCVNLPQIKVKDDEDVTEEAVTTTLRRAVNFYRKIQAHDGHWPGDYGGPM

WO2014/086842


POPTR_0007s15200g
FLLPGLIITLSITGALNAVLSKEHQREMCRYLYNHQNRDGGWGLHIEGPSTMFGTCLNYVTLRL




[Populus
LGEGAEGGDGEMEKGRKWILDHGGATEITSWGKMWLSVLGVHEWSGNNPLPPEVWLCPYLLPMH





trichocarpa]

PGRMWCHCRMVYLPMSYLYGKRFVGPITPTIQSLRKEIYTVPYHEVDWNTARNTCAKEDLYYPH





PLVQDILWASLHYAYEPILTRWPLNRLREKALHKVMQHIHYEDENTQYICIGPVNKVLNMLCCW





VEDPHSEAFKLHLPRVFDYLWIAEDGMKMQGYNGSQLWDTAFAVQAIVSTNLAEEYSGTLRKAH





KYLKDSQVLEDCPGDLNFWYRHISKGAWPFSTADHGWPISDCTAEGLKAVLLLSKLPTEMVGDP





LGVERLRDAVNVILSLQNADGGFATYELTRSYQWLELINPAETFGDIVIDYPYVECTSAAIQAL





ASFKKLYPGHRREEIDNCIAEAANFIEKIQATDGSWYGSWGVCFTYAGWFGIKGLVAAGMTYNS





SSSIRKACDYMLSKELAGGGWGESYLSCQNKVYTNLKDDRPHIVNTGWAMLALIEAGQAERDPI





PLHRAARVLINSQMENGDFPQEEIMGVFNKNCMISY





SAYRNIFPIWALGEYRCQVLQAL




putative 2,3
MWKLKIAEGEDPWLRSVNNHVGRQVWEFDRNLGTPEELIEVEKAREDFSNHKFEKKHSSDLLMR
77
Disclosed in


oxidosqualene
LQLAKENPCSIDLPRVQVKDTEEVTEEAVTTTLRRGLSFYSTIQGHDGHWPGDYGGPLFLMPGL

WO2014/086842


cyclase [Actaea
VIALSVTGALNAVLSSEHQRETRRYIYNHQNEDGGWGLHIEGSSTMFITTLNYVTLRLLGEGAD





racemosa]

DGEGAMEKARKWILNHGSATATTSWGKMWLSVLGVFEWSGNNPLPPEMWLLPYCLPFHPGRMWC





HCRMVYLPMSYLYGKRFVGPITPTIESLRKELYSVPYHEIDWNQARNLCAKEDLYYPHPLVQDI





LWTSLHYGVEPILTRWPANKLREKSLLTTMQHIHYEDENTRYICIGPVNKVLNMLCCWVEDPNS





EAFKLHIPRIYDYLWVAEDGMKMQGYNGSQLWDTAFAVQAIISTNLFEDYAPTLRKAHKYIKDS





QVLDDCPGDLNFWYRHISKGAWPFSTADHGWPISDCTAEGLKAALLLSKIPSKSVGDPINAKQL





YDAVNVILSLQNGDGGFATYELTRSYPWLELINPAETFGDIVIDYPYVECTAAAIQALTSFKKL





YPGHRREDIENCVEKAVKFLKEIQAPDGSWYGSWGVCFTYGIWFGIKGLVAAGETFTNSSSIRK





ACDFLLSKELDSGGWGESYLSCQNKVYTNLKGNRPHLVNTGWAMSALIDAGQAERDPKPLHRAA





RVLINSQMDNGDFPQEEIMGVFNRNCMISYSAYRNIFPIWALGEYRCRVLKAP




CGTase
MKSRYKRLTSLALSLSMALGISLPAWASPDTSVDNKVNFSTDVIYQIVTDRFADGDRTNNPAGD
78



AAA22298.1
AFSGDRSNLKLYFGGDWQGIIDKINDGYLTGMGVTALWISQPVENITSVIKYSGVNNTSYHGYW





ARDFKQTNDAFGDFADFQNLIDTLTLITSRSDRLRPQPHVSGRAGTNPGFAENGALYDNGSLLG





AYSNDTAGLFHHNGGTDFSTIEDGIYKNLYDLADINHNNNAMDAYFKSAIDLWLGMGVDGIRFD





AVKQYPFGWQKSFVSSIYGGDHPVFTFGEWYLGADQTDGDNIKFANESGMNLLDFEYAQEVREV





FRDKTETMKDLYEVLASTESQYDYINNMVTFIDNHDMDRFQVAGSGTRATEQALALTLTSRGVP





AIYYGTEQYMTGDGDPNNRAMMTSFNTGTTAYKVIQALAPLRKSNPAIAYGTTTERWVNNDVLI





IERKFGSSAALVAINRNSSAAYPISGLLSSLPAGTYSDVLNGLLNGNSITVGSGGAVTNFTLAA





GGTAVWQYTAPETSPAIGNVGPTMGQPGNIVTIDGRGFGGTAGTVYFGTTAVTGSGIVSWEDTQ





IKAVIPKVAAGKTGVSVKTSSGTASNTFKSFNVLTGDQVTVRFLVNQANTNYGTNVYLVGNAAE





LGTWDPNKAIGPMYNQVIAKYPSWYYDVSVPAGTKLDFKFIKKGGGTVTWEGGGNHTYTTPASG





VGTVTVDWQN




CGTase
MKYLLPTAAAGLLLLAAQPAMAMDIGINSDPSPDTSVDNKVNFSTDVIYQIVTDRFADGDRTNN
79



3WMS_A
PAGDAFSGDRSNLKLYFGGDWQGIIDKINDGYLTGMGVTALWISQPVENITSVIKYSGVNNTSY





HGYWARDFKQTNDAFGDFADFQNLIDTAHAHNIKVVIDFAPNHTSPADRDNPGFAENGALYDNG





SLLGAYSNDTAGLFHHNGGTDFSTIEDGIYKNLIDLADINHNNNAMDAYFKSAIDLWLGMGVDG





IRFDAVKHMPFGWQKSFVSSIYGGDHPVFTFGEWYLGADQTDGDNIKFANESGMNLLDFEYAQE





VREVFRDKTETMKDLYEVLASTESQYDYINNMVTFIDNHDMDRFQVAGSGTRATEQALALTLTS





RGVPAIYYGTEQYMTGDGDPNNRAMMTSFNTGTTAYKVIQALAPLRKSNPAIAYGTTTERWVNN





DVLIIERKFGSSAALVAINRNSSAAYPISGLLSSLPAGTYSDVLNGLLNGNSITVGSGGAVTNF





TLAAGGTAVWQYTAPETSPAIGNVGPTMGQPGNIVTIDGRGFGGTAGTVYFGTTAVTGSGIVSW





EDTQIKAVIPKVAAGKTGVSVKTSSGTASNTFKSFNVLTGDQVTMRFLVNQANTNYGTNVYLVG





NAAELGSWDPNKAIGPMYNQVIAKYPSWYYDVSVPAGTKLDFKFIKKGGGTVTWEGGGNHTYTT





PASSVGTVTVDWQNLE




CGTase
SPDTSVDNKVNFSTDVIYQIVTDRFADGDRTNNPAGDAFSGDRSNLKLYFGGDWQGIIDKINDG
80



4JCL_A
YLTGMGVTALWISQPVENITSVIKYSGVNNTSYHGYWARDFKQTNDAFGDFADFQNLIDTAHAH





NIKVVIDFAPNHTSPADRDNPGFAENGGMYDNGSLLGAYSNDTAGLFHHNGGTDFSTIEDGIYK





NLYDLADINHNNNAMDAYFKSAIDLWLGMGVDGIRFDAVKHMPFGWQKSFVSSIYGGDHPVFTF





GEWYLGADQTDGDNIKFANESGMNLLDFEYAQEVREVFRDKTETMKDLYEVLASTESQYDYINN





MVTFIDNHDMDRFQVAGSGTRATEQALALTLTSRGVPAIYYGTEQYMTGDGDPNNRAMMTSFNT





GTTAYKVIQALAPLRKSNPAIAYGTTTERWVNNDVLIIERKFGSSAALVAINRNSSAAYPISGL





LSSLPAGTYSDVLNGLLNGNSITVGSGGAVTNFTLAAGGTAVWQYTAPETSPAIGNVGPTMGQP





GNIVTIDGRGFGGTAGTVYFGTTAVTGSGIVSWEDTQIKAVIPKVAAGKTGVSVKTSSGTASNT





FKSFNVLTGDQVTVRFLVNQANTNYGTNVYLVGNAAELGSWDPNKAIGPMYNQVIAKYPSWYYD





VSVPAGTKLDFKFIKKGGGTVTWEGGGNHTYTTPASGVGTVTVDWQN




CGTase
MKSRYKRLTSLALSLSMALGISLPAWASPDTSVDNKVNFSTDVIYQIVTDRFADGDRTNNPAGD
81



WP_036618292.1
AFSGDRSNLKLYFGGDWQGIIDKINDGYLTGMGVTALWISQPVENITSVIKYSGVNNTSYHGYW





ARDFKQTNDAFGDFADFQNLIDTAHAHNIKVVIDFAPNHTSPADRDNPGFAENGALYDNGSLLG





AYSNDTAGLFHHNGGTDFSTIEDGIYKNLYDLADINHNNNAMDAYFKSAIDLWLGMGVDGIRFD





AVKHMPFGWQKSFVSSIYGGDHPVFTFGEWYLGADQTDGDNIKFANESGMNLLDFEYAQEVREV





FRDKTETMKDLYEVLASTESQYDYINNMVTFIDNHDMDRFQVAGSGTRATEQALALTLTSRGVP





AIYYGTEQYMTGDGDPNNRAMMTSFNTGTTAYKVIQALAPLRKSNPAIAYGTTTERWVNNDVLI





IERKFGSSAALVAINRNSSAAYPISGLLSSLPAGTYSDVLNGLLNGNSITVGSGGAVTNFTLAA





GGTAVWQYTAPETSPAIGNVGPTMGQPGNIVTIDGRGFGGTAGTVYFGTTAVTGSGIVSWEDTQ





IKAVIPKVAAGKTGVSVKTSSGTASNTFKSFNVLTGDQVTVRFLVNQANTNYGTNVYLVGNAAE





LGSWDPNKAIGPMYNQVIAKYPSWYYDVSVPAGTKLDFKFIKKGGGTVTWEGGGNHTYTTPASG





VGTVTVDWQN




CGTase
MKSRYKRLTSLALSLSMALGISLPAWASPDTSVDNKVNFSTDVIYQIVTDRFADGDRTNNPAGD
82



P04830.2
AFSGDRSNLKLYFGGDWQGIIDKINDGYLTGMGVTALWISQPVENITSVIKYSGVNNTSYHGYW





ARDFKQTNDAFGDFADFQNLIDTAHAHNIKVVIDFAPNHTSPADRDNPGFAENGGMYDNGSLLG





AYSNDTAGLFHHNGGTDFSTIEDGIYKNLYDLADINHNNNAMDAYFKSAIDLWLGMGVDGIRFD





AVKHMPFGWQKSFVSSIYGGDHPVFTFGEWYLGADQTDGDNIKFANESGMNLLDFEYAQEVREV





FRDKTETMKDLYEVLASTESQYDYINNMVTFIDNHDMDRFQVAGSGTRATEQALALTLTSRGVP





AIYYGTEQYMTGDGDPNNRAMMTSFNTGTTAYKVIQALAPLRKSNPAIAYGTTTERWVNNDVLI





IERKFGSSAALVAINRNSSAAYPISGLLSSLPAGTYSDVLNGLLNGNSITVGSGGAVTNFTLAA





GGTAVWQYTAPETSPAIGNVGPTMGQPGNIVTIDGRGFGGTAGTVYFGTTAVTGSGIVSWEDTQ





IKAVIPKVAAGKTGVSVKTSSGTASNTFKSFNVLTGDQVTVRFLVNQANTNYGTNVYLVGNAAE





LGSWDPNKAIGPMYNQVIAKYPSWYYDVSVPAGTKLDFKFIKKGGGTVTWEGGGNHTYTTPASG





VGTVTVDWQN




CGTase
MKSRYKRLTSLALSLSMALGISLPAWASPDTSVDNKVNFSTDVIYQIVTDRFADGDRTNNPAGD
83



AAC04359.1
AFSGDRSNLKLYFGGDWQGIIDKINDGYLTGMGVTALWISQPVENITSVIKYSGVNNTSYHGYW





ARDFKQTNDAFGDFADFQNLIDTAHAHNIKVVIDFAPNHTSPADRDNPGFAENGALYDNGSLLG





AYSNDTAGLFHHNGGTDFSTIEDGIYKNLYDLADINHNNNAMDAYFKSAIDLWLGMGVDGIRFD





AVKHMPFGWQKSFVSSIYGGDHPVFTFGEWYLGADQTDGDNIKFANESGMNLLDFEYAQEVREV





FRDKTETMKDLYEVLASTESQYDYINNMVTFIDNHDMDRFQVAGSGTRATEQALALTLTSRGVP





AIYYGTEQYMTGDGDPNNRAMMTSFNTGTTAYKVIQALAPLRKSNPAIAYGTTTERWVNNDVLI





IERKFGSSAALVAINRNSSAAYPISGLLSSLPAGTYSDVLNGLLNGNSITVGSGGAVTNFTLAA





GGTAVWQYTAPETSPAIGNVGPTMGQPGNIVTIDGRGFGGTAGTVYFGTTAVTGSGIVSWEDTQ





IKAVIPKVAAGKTGVSVKTSSGTASNTFKSFNVLTGDQVTVRFLVNQANTNYGTNVYLVGNAAE





LGSWDPNKAIGPMYNQVIAKYPSWYYDVSVPAGTKLDFKFIKKGGGTVTWEGGGNHTYTTPASG





VGTVTVDWQN




CGTase
MKSRYKRLTSLALSLSMALGISLPAWASPDTSVDNKVNFSTDVIYQIVTDRFADGDRTNNPAGD
84



CAA41773.1
AFSGDRSNLKLYFGGDWQGIIDKINDGYLTGMGVTALWISQPVENITSVIKYSGVNNTSYHGYW





ARDFKQTNDAFGDFADFQNLIDTAHAHNIKVVIDFAPNHTSPADRDNPGFAENGGMYDNGSLLG





AYSNDTAGLFHHNGGTDFSTIEDGIYKNLYDLADINHNNNAMDAYFKSAIDLWLGMGVDGIRFD





AVKHMPFGWQKSFVSSIYGGDHPVFTFGEWYLGADQTDGDNIKFANESGMNLLDFEYAQEVREV





FRDKTETMKDLYEVLASTESQYDYINNMVTFIDNHDMDRFQVAGSGTRATEQALALTLTSRGVP





AIYYGTEQYMTGDGDPNNRAMMTSFNTGTTAYKVIQALAPLRKSNPAIAYGTTTERWVNNDVLI





IERKFGSSAALVAINRNSSAAYPISGLLSSLPAGTYSDVLNGLLNGNSITVGSGGAVTNFTLAA





GGTAVWQYTAPETSPAIGNVGPTMGQPGNIVTIDGRGFGGTAGTVYFGTTAVTGSGIVSWEDTQ





IKAVIPKVAAGKTGVSVKTSSGTASNTFKSFNVLTGDQVTVRFLVNQANTNYGTNVYLVGNAAE





LGSWDPNKAIGPMYNQVIAKYPSWYYDVSVPAGTKLDFKFIKKGGGTVTWEGGGNHTYTTPASG





VGTVTVDWQN




CGTase
SPDTSVDNKVNFSTDVIYQIVTDRFADGDRTNNPAGDAFSGDRSNLKLYFGGDWQGIIDKINDG
85



AGT21379.1
YLTGMGVTALWISQPVENITSVIKYSGVNNTSYHGYWARDFKQTNDAFGDFADFQNLIDTAHAH





NIKVVIDFAPNHTSPADRDNPGFAENGALYDNGSLLGAYSNDTAGLFHHNGGTDFSTIEDGIYK





NLYDLADINHNNNAMDAYFKSAIDLWLGMGVDGIRFDAVKHMPFGWQKSFVSSIYGGDHPVFTF





GEWYLGADQTDGDNIKFANESGMNLLDFEYAQEVREVFRDKTETMKDLYEVLASTESQYDYINN





MVTFIDNHDMDRFQVAGSGTRATEQALALTLTSRGVPAIYYGTEQYMTGDGDPNNRAMMTSFNT





GTTAYKVIQALAPLRKSNPAIAYGTTTERWVNNDVLIIERKFGSSAALVAINRNSSAAYPISGL





LSSLPAGTYSDVLNGLLNGNSITVGSGGAVTNFTLAAGGTAVWQYTAPETSPAIGNVGPTMGQP





GNIVTIDGRGFGGTAGTVYFGTTAVTGSGIVSWEDTQIKAVIPKVAAGKTGVSVKTSSGTASNT





FKSFNVLTGDQVTMRFLVNQANTNYGTNVYLVGNAAELGSWDPNKAIGPMYNQVIAKYPSWYYD





VSVPAGTKLDFKFIKKGGGTVTWEGGGNHTYTTPASSVGTVTVDWQN




CGTase
SPDTSVDNKVNFSTDVIYQIVTDRFADGDRTNNPAGDAFSGDRSNLKLYFGGDWQGIIDKINDG
86



AGT95840.1
YLTGMGVTALWISQPVENITSVIKYSGVNNTSYHGYWARDFKQTNDAFGDFADFQNLIDTAHAH





NIKVVIDFAPNHTSPADRDNPGFAENGALYDNGSLPGAYSNDTAGLFHHNGGTDFSTIEDGIYK





NLYDLADINHNNNAMDAYFKSAIDLWLGMGVDGIRFDAVKHMPFGWQKSFVSSIYGGDHPVFTF





GEWYLGADQTDGDNIKFANESGMNLLDFEYAQEVREVFRDKTETMKDLYEVLASTESQYDYINN





MVTFIDNHDMDRFQVAGSGTRATEQALALTLTSRGVPAIYYGTEQYMTGDGDPNNRAMMTSFNT





GTTAYKVIQALAPLRKSNPAIAYGTTTERWVNNDVLIIERKFGSSAALVAINRNSSAAYPISGL





LSSLPAGTYSDVLNGLLNGNSITVGSGGAVTNFTLAXGXTAVWQYTAPETSPAIGDVGPTMGQP





GNIVTIDGRGFGGTAGTVYFGTTAVTGSGIVSWEDTQIKAVIPKVAAGKTGVSVKTSSGTASNT





FKSFNVLTGDQVTVRFLVNQANTNYGTNVYLVGNAAELDSWDPNKAIGPMYNQVIAKYPSWYYD





VSVPAGTKLDFKFIKKGGGTVTWEGGGNHTYTTPASGVGTVTADWQN




CGTase
MKKQVKWLTSVSMSVGIALGAALPVWASPDTSVNNKLNFSTDTVYQIVTDRFVDGNSANNPTGA
87



P31835.1
AFSSDHSNLKLYFGGDWQGITNKINDGYLTGMGITALWISQPVENITAVINYSGVNNTAYHGYW





PRDFKKTNAAFGSFTDFSNLIAAAHSHNIKVVMDFAPNHTNPASSTDPSFAENGALYNNGTLLG





KYSNDTAGLFHHNGGTDFSTTESGIYKNLYDLADINQNNNTIDSYLKESIQLWLNLGVDGIRFD





AVKHMPQGWQKSYVSSIYSSANPVFTFGEWFLGPDEMTQDNINFANQSGMHLLDFAFAQEIREV





FRDKSETMTDLNSVISSTGSSYNYINNMVTFIDNHDMDRFQQAGASTRPTEQALAVTLTSRGVP





AIYYGTEQYMTGNGDPNNRGMMTGFDTNKTAYKVIKALAPLRKSNPALAYGSTTQRWVNSDVYV





YERKFGSNVALVAVNRSSTTAYPISGALTALPNGTYTDVLGGLLNGNSITVNGGTVSNFTLAAG





GTAVWQYTTTESSPIIGNVGPTMGKPGNTITIDGRGFGTTKNKVTFGTTAVTGANIVSWEDTEI





KVKVPNVAAGNTAVTVTNAAGTTSAAFNNFNVLTADQVTVRFKVNNATTALGQNVYLTGNVAEL





GNWTAANAIGPMYNQVEASYPTWYFDVSVPANTALQFKFIKVNGSTVTWEGGNNHTFTSPSSGV





ATVTVDWQN




CGTase
MKSRYKRLTSLALSLSMALGISLPAWASPDTSVDNKVNFSTDVIYQIVTDRFADGDRTNNPAGD
88



KFM94552.1
AFSGDRSNLKLYFGGDWQGIIDKINDGYLTGMGVTALWISQPVENITSVIKYSGVNNTSYHGYW





ARDFKQTNDAFGDFADFQNLIDTAHAHNIKVVIDFAPNHTSPADRDNPGFAENGALYDNGSLLG





AYSNDTAGLFHHNGGTDFSTIEDGIYKNLYDLADINHNNNAMDAYFKSAIDLWLGMGVDGIRFD





AVKHMPFGWQKSFVSSIYGGDHPVFTFGEWYLGADQTDGDNIKFANESGMNLLDFEYAQEVREV





FRDKTETMKDLYEVLASTESQYDYINNMVTFIDNHDMDRFQVAGSGTRATEQALALTLTSRGVP





AIYYGTEQYMTGDGDPNNRAMMTSFNTGTTAYKVIQALAPLRKSNPAIAYGTTTERWVNNDVLI





IERKFGSSAALVAINRNSSAAYPISGLLSSLPAGTYSDVLNGLLNGNSITVGSGGAVTNFTLAA





GGTAVWQYTAPETSPAIGNVGPTMGQPGNIVTIDGRGFGGTAGTVYFGTTAVTGSGIVSWEDTQ





IKAVIPKVAAGKTGVSVKTSSGTASNTFKSFNVLTGDQVTVRFLVNQANTNYGTNVYLVGNAAE





LGSWDPNKAIGPMYNQVIAKYPSWYYDVSVPAGTKLDFKFIKKGGGTVTWEGGGNHTYTTPASG





VGTVTVDWQN




Toruzyme
MKKTLKLLSILLITIALLFSTIPSVPAAPDTSVSNVVNYSTDVIYQIVTDRFLDGNPSNNPTGD
89



AJE25826.1
LYDPTHTSLKKYFGGDWQGIINKINDGYLTGMGITAIWISQPVENIYAVLPDSTFGGSTSYHGY





WARDFKKTNPFFGSFTDFQNLIATAHAHNIKVIIDFAPNHTSPASETDPTYGENGRLYDNGELL





GGYTNDTNGYFHHYGGTNFSSYEDGIYRNLFDLADLDQQNNTIDSYLKAAIKLWLDMGIDGIRM





DAVKHMAFGWQKNFMDSILSYRPVFTFGEWYLGTNEVDPNNTYFANESGMNLLDFRFAQKVRQV





FRDNTDTMYGLDSMIQSTAADYNFINDMVTFIDNHDMDRFYTGGSTRPVEQALAFTLTSRGVPA





IYYGTEQYMTGNGDPYNRAMMTSFNTNTTAYNVIKKLAPLRKSNPAIAYGTQKQRWVNNDVYIY





ERQFGNNVALIAINRNLSTSYNITGLYTALPAGTYSDVLGGLLNGNSITVSSNGSVTSFTLAPG





AVAVWQYVSTTNPPLIGHVGPTMTKAGQTITIDGRGFGTTAGQVLFGTTPATIVSWEDTEVKVK





VPALTPGKYNITLKTASEVTSNSYNNINVLTGNQVCVRFVVNNATTVWGENVYLTGNVAELGNW





DTSKAIGPMFNQVVYQYPTWYYDVSVPAGTTIEFKFIKKNGSTVTWEGGYNHVYTTPTSGTATV





IVNWQN




Toruzyme
MRKNVKLFAAIILFFSLLLTSCGSKDTSSNITPKSDVIYQVMIDRFYNGDKSNDDPKISKGMFD
90



KH062967.1
PTYTNWRMYWGGDLKGLTEKIPYIKGMGVTAIWISPVVDNINKPAIYNGEINAPYHGYWARDFK





RVEEHFGSWEDFDNFVKTAHANGIKVILDFAPNHTSPADKNNPDFAENGALYDDGNLLGTYSND





VNKLFHHNGGITNWNNLKDLQDKNLFDLADLDQSNPIVDKYLKDSIKLWFSHGIDGVRLDAVKH





MPMEWVKSFADTIYGVNKDAILFGEWMLNGPTDPLYGYNIQFANTSGFSVLDFMLNSAIKDVFE





KGYGFDRLNDTIEETNKDYDNPYKLVTFVDNHDMPRFLSVNDDKDKLHEAIAFIMTSRGIPAIY





YGTEQYLHNDTNGGNDPYNRPMMEKFDENTTAYVLIRELSNLRKATQALQYGKTVSRYVSNDVY





IYERQYGKDIVVVAINKGEETTVKNIETSLRKGKYSDYLKGLLKGGNLKVERGNSENDILSITL





PKDSVSIWTNVKVK




Toruzyme
MKKTLKLLSILLITIALLFSSIPSVPAAPDTSVSNVVNYSTDVIYQIVTDRFLDGNPNNNPTGD
91



KH061869.1
LYDPTHTSLKKYFGGDWQGIINKINDGYLTGMGITAIWISQPVENIYAVLPDSTFGGSTSYHGY





WARDFKKTNPFFGSFTDFQNLIATAHAHNIKVIIDFAPNHTSPASETDPTYGENGRLYDNGVLL





GGYTNDTNGYFHHYGGTNFSSYEDGIYRNLFDLADLDQQNNTIDSYLKAAIKLWLDMGIDGIRM





DAVKHMAFGWQKNFMDSILSYRPVFTFGEWYLGTNEVDPNNTYFANESGMSLLDFRFAQKVRQV





FRDNTDTMYGLDSMLQSTAADYNFINDMVTFIDNHDMDRFYTGGSTRPVEQALAFTLTSRGVPA





IYYGTEQYMTGNGDPYNRAMMTSFDTTTTAYNVIKKLAPLRKSNPAIAYGTQKQRWINNDVYIY





ERQFGNNVALVAINRNLSTSYYITGLYTALPAGTYSDVLGGLLNGNNISVASDGSVTPFTLAPG





EVAVWQYVSTTNPPLIGHVGPTMTKAGQTITIDGRGFGTTAGQVLFGTTPATIVSWEDTEVKVK





VPALTPGKYNVTLKTASGVTSNSYNNINVLTGNQVCVRFVVNNASTVWGENVYLTGNVAELGSW





DTSKAIGPMFNQVVYQYPTWYYDVSVPAGTTIEFKFIKKNGSTVTWEGGYNHVYTTPTSGTATV





IVNWQN




Toruzyme
MKKTLKLLSILLITIALLFSSIPSVPAAPDTSVSNVVNYSTDVIYQIVTDRFLDGNPSNNPTGD
92



KH061665.1
LYDPTHTSLKKYFGGDWQGIINKINDGYLTGMGITAIWISQPVENIYAVLPDSTFGGSTSYHGY





WARDFKKTNPFFGSFTDFQNLIATAHAHNIKVIIDFAPNHTSPASETDPTYGENGRLYDNGVLL





GGYTNDTNGYFHHYGGTNFSSYEDGIYRNLFDLADLDQQNNTIDSYLKVAIKLWLNMGIDGIRM





DAVKHMAFGWQKNFMDSILSYRPVFTFGEWYLGTNEVDPNNTYFANESGMSLLDFRFAQKVRQV





FRDNTDTMYGLDSMLQSTAADYNFINDMVTFIDNHDMDRFYTGGSTRPVEQALAFTLTSRGVPA





IYYGTEQYMTGNGDPYNRAMMTSFDTTTTAYNVIKKLAPLRKSNPAIAYGTQKQRWINNDVYIY





ERQFGNNVALVAINRNLSTSYYITGLYTALPAGTYSDVLGGLLNGNNISVASDGSVTPFTLAPG





EVAVWQYVSTTNPPLIGHVGPTMTKAGQTITIDGRGFGTTAGQVLFGTTPATIVSWEDTEVKVK





VPALTPGKYNVTLKTASGVTSNSYNNINVLTGNQVCVRFVVNNASTVWGENVYLTGNVAELGSW





DTSKAIGPMFNQVVYQYPTWYYDVSVPAGTTIEFKFIKKNGSTVTWEGGYNHVYTTPTSGTATV





IVNWQN




Toruzyme
MKKTLKLLSILLITIALLFSSIPSVPAAPDTSVSNVVNYSTDVIYQIVTDRFLDGNPNNNPTGD
93



WP_042834654.1
LYDPTHTSLKKYFGGDWQGIINKINDGYLTGMGITAIWISQPVENIYAVLPDSTFGGSTSYHGY





WARDFKKTNPFFGSFTDFQNLIATAHAHNIKVIIDFAPNHTSPASETDPTYGENGRLYDNGVLL





GGYTNDTNGYFHHYGGTNFSSYEDGIYRNLFDLADLDQQNNTIDSYLKAAIKLWLDMGIDGIRM





DAVKHMAFGWQKNFMDSILSYRPVFTFGEWYLGTNEVDPNNTYFANESGMSLLDFRFAQKVRQV





FRDNTDTMYGLDSMLQSTAADYNFINDMVTFIDNHDMDRFYTGGSTRPVEQALAFTLTSRGVPA





IYYGTEQYMTGNGDPYNRAMMTSFDTTTTAYNVIKKLAPLRKSNPAIAYGTQKQRWINNDVYIY





ERQFGNNVALVAINRNLSTSYYITGLYTALPAGTYSDVLGGLLNGNNISVASDGSVTPFTLAPG





EVAVWQYVSTTNPPLIGHVGPTMTKAGQTITIDGRGFGTTAGQVLFGTTPATIVSWEDTEVKVK





VPALTPGKYNVTLKTASGVTSNSYNNINVLTGNQVCVRFVVNNASTVWGENVYLTGNVAELGSW





DTSKAIGPMFNQVVYQYPTWYYDVSVPAGTTIEFKFIKKNGSTVTWEGGYNHVYTTPTSGTATV





IVNWQN




Toruzyme
MKKTLKLLSILLITIALLFSSIPSVPAAPDTSVSNVVNYSTDVIYQIVTDRFLDGNPSNNPTGD
94



WP_042834464.1
LYDPTHTSLKKYFGGDWQGIINKINDGYLTGMGITAIWISQPVENIYAVLPDSTFGGSTSYHGY





WARDFKKTNPFFGSFTDFQNLIATAHAHNIKVIIDFAPNHTSPASETDPTYGENGRLYDNGVLL





GGYTNDTNGYFHHYGGTNFSSYEDGIYRNLFDLADLDQQNNTIDSYLKVAIKLWLNMGIDGIRM





DAVKHMAFGWQKNFMDSILSYRPVFTFGEWYLGTNEVDPNNTYFANESGMSLLDFRFAQKVRQV





FRDNTDTMYGLDSMLQSTAADYNFINDMVTFIDNHDMDRFYTGGSTRPVEQALAFTLTSRGVPA





IYYGTEQYMTGNGDPYNRAMMTSFDTTTTAYNVIKKLAPLRKSNPAIAYGTQKQRWINNDVYIY





ERQFGNNVALVAINRNLSTSYYITGLYTALPAGTYSDVLGGLLNGNNISVASDGSVTPFTLAPG





EVAVWQYVSTTNPPLIGHVGPTMTKAGQTITIDGRGFGTTAGQVLFGTTPATIVSWEDTEVKVK





VPALTPGKYNVTLKTASGVTSNSYNNINVLTGNQVCVRFVVNNASTVWGENVYLTGNVAELGSW





DTSKAIGPMFNQVVYQYPTWYYDVSVPAGTTIEFKFIKKNGSTVTWEGGYNHVYTTPTSGTATV





IVNWQN




Cyclomaltodextrin
MRRWLSLVLSMSFVFSAIFIVSDTQKVTVEAAGNLNKVNFTSDVVYQIVVDRFVDGNTSNNPSG
95



glucanotransferase
ALFSSGCTNLRKYCGGDWQGIINKINDGYLTDMGVTAIWISQPVENVFSVMNDASGSASYHGYW




[Geobacillus
ARDFKKPNPFFGTLSDFQRLVDAAHAKGIKVIIDFAPNHTSPASETNPSYMENGRLYDNGTLLG





stearothermophilus]

GYTNDANMYFHHNGGTTFSSLEDGIYRNLFDLADLNHQNPVIDRYLKDAVKMWIDMGIDGIRMD




GenBank:
AVKHMPFGWQKSLMDEIDNYRPVFTFGEWFLSENEVDANNHYFANESGMSLLDFRFGQKLRQVL




CAA41770.1
RNNSDNWYGFNQMIQDTASAYDEVLDQVTFIDNHDMDRFMIDGGDPRKVDMALAVLLTSRGVPN





IYYGTEQYMTGNGDPNNRKMMSSFNKNTRAYQVIQKLSSLRRNNPALAYGDTEQRWINGDVYVY





ERQFGKDVVLVAVNRSSSSNYSITGLFTALPAGTYTDQLGGLLDGNTIQVGSNGSVNAFDLGPG





EVGVWAYSATESTPIIGHVGPMMGQVGHQVTIDGEGFGTNTGTVKFGTTAANVVSWSNNQIVVA





VPNVSPGKYNITVQSSSGQTSAAYDNFEVLTNDQVSVRFVVNNATTNLGQNIYIVGNVYELGNW





DTSKAIGPMFNQVVYSYPTWYIDVSVPEGKTIEFKFIKKDSQGNVTWESGSNHVYTTPTNTTGK





IIVDWQN




cyclomaltodextrin
MRRWLSLVLSMSFVFSAIFIVSDTQKVTVEAAGNLNKVNFTSDVVYQIVVDRFVDGNTSNNPSG
96



glucanotransferase
ALFSSGCTNLRKYCGGDWQGIINKINDGYLTDMGVTAIWISQPVENVFSVMNDASGSASYHGYW




[Geobacillus
ARDFKKPNPFFGTLSDFQRLVDAAHAKGIKVIIDFAPNHTSPASETNPSYMENGRLYDNGTLLG





stearothermophilus]

GYTNDANMYFHHNGGTTFSSLEDGIYRNLFDLADLNHQNPVIDRYLKDAVKMWIDMGIDGIRMD




GenBank:
AVKHMPFGWQKSLMDEIDNYRPVFTFGEWFLSENEVDANNHYFANESGMSLLDFRFGQKLRQVL




CAA41771.1
RNNSDNWYGFNQMIQDTASAYDEVLDQVTFIDNHDMDRFMIDGGDPRKVDMALAVLLTSRGVPN





IYYGTEQYMTGNGDPNNRKMMSSFNKNTRAYQVIQKLSSLRRNNPALAYGDTEQRWINGDVYVY





ERQFGKDVVLVAVNRSSSSNYSITGLFTALPAGTYTDQLGGLLDGNTIQVGSNGSVNAFDLGPG





EVGVWAYSATESTPIIGHVGPMMGQVGHQVTIDGEGFGTNTGTVKFGTTAANVVSWSNNQIVVA





VPNVSPGKYNITVQSSSGQTSAAYDNFEVLTNDQVSVRFVVNNATTNLGQNIYIVGNVYELGNW





DTSKAIGPMFNQVVYSYPTWYIDVSVPEGKTIEFKFIKKDSQGNVTWESGSNHVYTTPTNTTGK





IIVDWQN





MRRWLSLVLSMSFVFSAIFIVSDTQKVTVEAAGNLNKVNFTSDVVYQIVVDRFVDGNTSNNPSG
97




ALFSSGCTNLRKYCGGDWQGIINKINDGYLTDMGVTAIWISQPVENVFSVMNDASGSASYHGYW




cyclomaltodextrin
ARDFKKPNPFFGTLSDFQRLVDAAHAKGIKVIIDFAPNHTSPASETNPSYMENGRLYDNGTLLG




glucanotransferase
GYTNDANMYFHHNGGTTFSSLEDGIYRNLFDLADLNHQNPVIDRYLKDAVKMWIDMGIDGIRMD




[Geobacillus
AVKHMPFGWQKSLMDEIDNYRPVFTFGEWFLSENEVDANNHYFANESGMSLLDFRFGQKLRQVL





stearothermophilus]

RNNSDNWYGFNQMIQDTASAYDEVLDQVTFIDNHDMDRFMIDGGDPRKVDMALAVLLTSRGVPN




GenBank:
IYYGTEQYMTGNGDPNNRKMMSSFNKNTRAYQVIQKLSSLRRNNPALAYGDTEQRWINGDVYVY




CAA41772.1
ERQFGKDVVLVAVNRSSSSNYSITGLFTALPAGTYTDQLGGLLDGNTIQVGSNGSVNAFDLGPG





EVGVWAYSATESTPIIGHVGPMMGQVGHQVTIDGEGFGTNTGTVKFGTTAANVVSWSNNQIVVA





VPNVSPGKYNITVQSSSGQTSAAYDNFEVLTNDQVSVRFVVNNATTNLGQNIYIVGNVYELGNW





DTSKAIGPMFNQVVYSYPTWYIDVSVPEGKTIEFKFIKKDSQGNVTWESGSNHVYTTPTNTTGK





IIVDWQN




Chain A,
AGNLNKVNFTSDVVYQIVVDRFVDGNTSNNPSGALFSSGCTNLRKYCGGDWQGIINKINDGYLT
98



Cyclodextrin
DMGVTAIWISQPVENVFSVMNDASGSASYHGYWARDFKKPNPFFGTLSDFQRLVDAAHAKGIKV




Glucanotransferase
IIDFAPNHTSPASETNPSYMENGRLYDNGTLLGGYTNDANMYFHHNGGTTFSSLEDGIYRNLFD




(E.C.2.4.1.19;
LADLNHQNPVIDRYLKDAVKMWIDMGIDGIRMDAVKHMPFGWQKSLMDEIDNYRPVFTFGEWFL




CGTase)
SENEVDANNHYFANESGMSLLDFRFGQKLRQVLRNNSDNWYGFNQMIQDTASAYDEVLDQVTFI




PDB: 1CYG_A
DNHDMDRFMIDGGDPRKVDMALAVLLTSRGVPNIYYGTEQYMTGNGDPNNRKMMSSFNKNTRAY





QVIQKLSSLRRNNPALAYGDTEQRWINGDVYVYERQFGKDVVLVAVNRSSSSNYSITGLFTALP





AGTYTDQLGGLLDGNTIQVGSNGSVNAFDLGPGEVGVWAYSATESTPIIGHVGPMMGQVGHQVT





IDGEGFGTNTGTVKFGTTAANVVSWSNNQIVVAVPNVSPGKYNITVQSSSGQTSAAYDNFEVLT





NDQVSVRFVVNNATTNLGQNIYIVGNVYELGNWDTSKAIGPMFNQVVYSYPTWYIDVSVPEGKT





IEFKFIKKDSQGNVTWESGSNHVYTTPTNTTGKIIVDWQN




Cyclomaltodextrin
MRRWLSLVLSMSFVFSAIFIVSDTQKVTVEAAGNLNKVNFTSDVVYQIVVDRFVDGNTSNNPSG
99



glucanotransferase
ALFSSGCTNLRKYCGGDWQGIINKINDGYLTDMGVTAIWISQPVENVFSVMNDASGSASYHGYW




(also known as
ARDFKKPNPFFGTLSDFQRLVDAAHAKGIKVIIDFAPNHTSPASETNPSYMENGRLYDNGTLLG




Cyclodextrin-
GYTNDANMYFHHNGGTTFSSLEDGIYRNLFDLADLNHQNPVIDRYLKDAVKMWIDMGIDGIRMD




glycosyltransferas
AVKHMPFGWQKSLMDEIDNYRPVFTFGEWFLSENEVDANNHYFANESGMSLLDFRFGQKLRQVL




e; CGTase)
RNNSDNWYGFNQMIQDTASAYDEVLDQVTFIDNHDMDRFMIDGGDPRKVDMALAVLLTSRGVPN




UniProtKB/Swiss-
IYYGTEQYMTGNGDPNNRKMMSSFNKNTRAYQVIQKLSSLRRNNPALAYGDTEQRWINGDVYVY




Prot: P31797.1
ERQFGKDVVLVAVNRSSSSNYSITGLFTALPAGTYTDQLGGLLDGNTIQVGSNGSVNAFDLGPG





EVGVWAYSATESTPIIGHVGPMMGQVGHQVTIDGEGFGTNTGTVKFGTTAANVVSWSNNQIVVA





VPNVSPGKYNITVQSSSGQTSAAYDNFEVLTNDQVSVRFVVNNATTNLGQNIYIVGNVYELGNW





DTSKAIGPMFNQVVYSYPTWYIDVSVPEGKTIEFKFIKKDSQGNVTWESGSNHVYTTPTNTTGK





IIVDWQN




hypothetical
MSRNGAVTPDWQFTVEVQEGETITYKYVKGGSWDQEGLADHTREDDNDDDVSYYGYGAIGTDLK
100



protein
VTVHNEGNNTMIVQDRILRWIDMPVVIEEVQKQGSQVTIKGNAIKNGVLTINGERVPIDGRMAF




AA906_05840
SYTFTPASHQKEVSIHIEPSAESKTAIFNNDGGAIAKNTKDYVLNLETKQLREGKLTTPPSNGD




[Geobacillus
SPESDWPGSETPSHDGGATPGNGTSPGSGGPSDGTSPGGSVPPGGTAPPGNEAPPSRPPQKPSP





stearothermophilus]

SKPKEKPRKPTTPPGQVKKVYWDGVELKKGQIGRLTVQKPINLWKRTKDGRLVFVRILQPGEVY





RVYGYDVRFGGQYAVGGGYYVTDIDTHIRYETPSKEKLKLVNGE




Maltodextrin
MSRNGAVTPDWQFTVEVQEGETITYKYVKGGSWDQEGLADHTREDDNDDDVSYYGYGAIGTDLK
101



glucosidase
VTVHNEGNNTMIVQDRILRWIDMPVVIEEVQKQGSQVTIKGNAIKNGVLTINGERVPIDGRMAF




[Geobacillus
SYTFTPASHQKEVSIHIEPSAESKTAIFNNDGGAIAKNTKDYVLNLETKQLREGKLTTPPSNGD





stearothermophilus]

SPESDWPGSETPSHDGGATPGNGTSPGSGGPSDGTSPGGSVPPGGTAPPGNGAPPSAPPQKPSP




GenBank:
SKPKEKPRKPTTPPSQVKKVYWDGVELKKGQIGRLTVQKPINLWKRAKDGRLVFVRILQPGEVY




KYD32676.1
RVYGYDARFGGQYAVGGGYYVTDIDTHIRYETPSKEKLKLVNGE




Beta-glucosidase
MAMQLRSLLLCVLLLLLGFALADTNAAARIHPPVVCANLSRANFDTLVPGFVFGAATASYQVEG
102



(Almonds)
AANLDGRGPSIWDTFTHKHPEKIADGSNGDVAIDQYHRYKEDVAIMKDMGLESYRFSISWSRVL





PNGTLSGGINKKGIEYYNNLINELLHNGIEPLVTLFHWDVPQTLEDEYGGFLSNRIVNDFEEYA





ELCFKKFGDRVKHWTTLNEPYTFSSHGYAKGTHAPGRCSAWYNQTCFGGDSATEPYLVTHNLLL





AHAAAVKLYKTKYQAYQKGVIGITVVTPWFEPASEAKEDIDAVFRALDFIYGWFMDPLTRGDYP





QSMRSLVGERLPNFTKKESKSLSGSFDYIGINYYSARYASASKNYSGHPSYLNDVNVDVKTELN





GVPIGPQAASSWLYFYPKGLYDLLCYTKEKYNDPIIYITENGVDEFNQPNPKLSLCQLLDDSNR





IYYYYHHLCYLQAAIKEGVKVKGYFAWSLLDNFEWDNGYTVRFGINYVDYDNGLKRHSKHSTHW





FKSFLKKSSRNTKKIRRCGNNNTSATKFVF




DexT protein
MPANAPDKQSVTNAPVVPPKHDTDQQDDSLEKQQVLEPSVNSNIPKKQTNQQLAVVTAPANSAP
103



[Leuconostoc
QTKTTAEISAGTELDTMPNVKHVDGKVYFYGDDGQPKKNFTTIIDGKPYYFDKDTGALSNNDKQ





citreum]

YVSELFSIGNKHNAVYNTSSDNFTQLEGHLTASSWYRPKDILKNGKRWAPSTVTDFRPLLMAWW





PDKSTQVTYLNYMKDQGLLSGTHHFSDNENMRTLTAAAMQAQVNIEKKIGQLGNTDWLKTAMTQ





YIDAQPNWNIDSEAKGDDHLQGGALLYTNSDMSPKANSDYRKLSRTPKNQKGQIADKYKQGGFE





LLLANDVDNSNPVVQAEQLNWLHYMMNIGSILQNDDQANFDGYRVDAVDNVDADLLQIAGEYAK





AAYGVDKNDARANQHLSILEDWGDEDPDYVKAHGNQQITMDFPLHLAIKYALNMPNDKRSGLEP





TREHSLVKRITDDKENVAQPNYSFIRAHDSEVQTIIADIIKDKINPASTGLDSTVTLDQIKQAF





DIYNADELKADKVYTPYNIPASYALLLTNKDTIPRVYYGDMFTDDGQYMAKQSPYYQAIDALLK





ARIKYAAGGQTMKMNYFPDEQSVMTSVRYGKGAMTASDSGNQETRYQGIGLVVNNRPDLKLSDK





DEVKMDMGAAHKNQDYRPVLLTTKSGLKVYSTDANAPVVRTDANGQLTFKADMVYGVNDPQVSG





YIAAWVPVGASENQDARTKSETTQSTDGSVYHSNAALDSQVIYEGFSNFQDFPTTPDEFTNIKI





AQNVNLFKDWGITSFEMAPQYRASSDKSFLDAIVQNGYAFTDRYDIGYNTPTKYGTADNLLDAL





RALHGQGIQAINDWVPDQIYNLPDEQLVTAIRTDGSGDHTYGSVIDHTLYASKTVGGGIYQQQY





GGAFLEQLKTQYPQLFQQKQISTDQPMNPDIQIKSWEAKYFNGSNIQGRGAWYVLKDWGTQQYF





NVSDAQTFLPKQLLGEKAKTGFVTRGKETSFYSTSGYQAKSAFICDNGNWYYFDDKGKMVVGNQ





VINGINYYFLPNGIELQDAYLVHDGMYYYYNNIGKQLHNTYYQDKQKNFHYFFEDGHMAQGIVT





IIQSDGTPVTQYFDENGKQQKGVAVKGSDGHLHYFDGASGNMLFKSWGRLADGSWLYVDEKGNA





VTGKQTINNQTVYFNDDGRQIKNNFKELADGSWLYLNNKGVAVTGEQIINGQTLYFGNDGRQFK





GTTHINATGESRYYDPDSGNMITDRFERVGDNQWAYFGYDGVAVTGDRIIKGQKLYFNQNGIQM





KGHLRLENGIMRYYDADTGELVRNRFVLLSDGSWVYFGQDGVPVTGVQVINGQTLYFDADGRQV





KGQQRVIGNQRYWMDKDNGEMKKITYAAALEHHHHHH




DexT gene sequence
ATGCCAGCAAATGCCCCAGATAAACAATCAGTGACTAATGCACCAGTAGTGCCGCCAAAGCATG
104




ATACGGACCAGCAGGACGATTCACTAGAAAAACAGCAAGTATTAGAACCGAGCGTAAATAGTAA





TATACCAAAAAAGCAGACAAATCAACAGTTAGCGGTTGTTACAGCACCAGCAAATTCAGCACCT





CAAACCAAAACAACAGCAGAAATTTCTGCTGGTACAGAGTTAGACACGATGCCTAATGTTAAGC





ATGTAGATGGCAAAGTTTATTTTTATGGAGATGATGGCCAACCAAAAAAGAATTTTACTACTAT





TATAGATGGTAAACCTTACTACTTTGATAAAGATACAGGGGCACTATCTAATAACGATAAGCAA





TATGTATCGGAATTATTCAGTATTGGCAATAAACATAACGCCGTCTATAACACATCATCAGATA





ATTTTACGCAATTAGAAGGACATCTGACGGCAAGTAGTTGGTATCGTCCAAAAGATATTTTGAA





AAATGGTAAACGTTGGGCACCTTCAACAGTGACTGATTTCAGACCATTATTGATGGCCTGGTGG





CCGGATAAGAGTACGCAAGTCACTTATCTGAATTACATGAAAGATCAGGGCCTCTTGTCTGGTA





CTCATCACTTTTCCGATAATGAAAATATGCGGACCTTAACGGCAGCTGCCATGCAGGCACAGGT





AAACATTGAGAAAAAAATTGGGCAACTTGGCAATACGGATTGGTTGAAAACGGCGATGACGCAA





TACATTGATGCCCAGCCCAATTGGAATATTGACAGTGAGGCGAAAGGAGATGATCATCTACAAG





GTGGTGCACTACTTTATACAAATAGTGATATGTCGCCAAAGGCCAATTCTGATTATCGTAAGCT





GAGCCGTACGCCTAAAAATCAAAAAGGTCAAATTGCTGATAAATATAAGCAAGGTGGGTTTGAA





TTATTACTAGCAAACGATGTCGATAATTCTAATCCAGTTGTGCAAGCAGAACAACTTAATTGGT





TACATTATATGATGAATATCGGTAGTATTTTACAAAATGATGACCAAGCTAATTTTGATGGTTA





CCGTGTTGATGCTGTCGATAATGTGGACGCTGACTTACTACAGATTGCTGGTGAATATGCTAAG





GCTGCCTATGGTGTTGACAAAAATGACGCGAGAGCGAATCAACATTTATCAATTTTGGAAGACT





GGGGAGATGAAGATCCAGACTATGTCAAAGCACATGGCAACCAGCAAATTACAATGGATTTCCC





CTTGCATTTAGCGATTAAATACGCGCTCAACATGCCTAATGATAAGCGGAGTGGCCTTGAGCCA





ACCCGTGAACACAGTTTAGTCAAACGAATTACAGATGATAAAGAAAATGTTGCACAACCAAATT





ATTCATTTATCCGAGCTCATGACAGTGAAGTACAAACGATTATTGCTGATATTATTAAAGATAA





AATCAACCCGGCGTCAACAGGGCTAGATTCAACAGTGACTTTGGATCAAATTAAGCAGGCTTTT





GACATCTATAATGCTGATGAATTGAAAGCAGATAAAGTTTACACACCTTACAATATTCCAGCAT





CATACGCTTTGTTATTGACTAATAAAGACACAATTCCACGTGTTTATTATGGGGATATGTTCAC





GGATGATGGCCAATACATGGCTAAACAATCACCTTACTATCAAGCGATTGATGCGTTGTTGAAA





GCTCGTATCAAGTATGCTGCTGGTGGTCAAACCATGAAAATGAACTATTTTCCAGATGAACAAT





CTGTTATGACATCAGTTCGTTATGGTAAGGGTGCAATGACGGCAAGTGACTCTGGTAACCAAGA





GACACGCTATCAAGGTATTGGACTTGTTGTCAACAATCGCCCAGATTTGAAACTATCTGACAAA





GATGAAGTCAAAATGGATATGGGTGCGGCACATAAAAACCAAGATTATCGCCCAGTTTTGTTGA





CGACAAAATCAGGATTAAAAGTCTACAGCACTGATGCAAATGCACCTGTCGTTCGAACTGACGC





CAATGGCCAATTAACTTTTAAGGCAGACATGGTATATGGTGTAAACGACCCACAAGTGTCAGGG





TACATTGCGGCTTGGGTACCAGTAGGGGCTTCAGAAAATCAAGATGCTCGAACGAAAAGTGAAA





CAACGCAGTCAACTGACGGGAGTGTTTATCATTCTAATGCAGCGTTAGATTCGCAAGTCATTTA





TGAAGGCTTTTCAAATTTTCAAGACTTTCCAACAACACCCGATGAGTTTACGAACATTAAAATT





GCTCAAAATGTTAACTTATTTAAGGATTGGGGTATTACTAGCTTTGAAATGGCGCCACAATATC





GCGCCAGCTCAGATAAAAGTTTCTTAGATGCTATCGTACAAAATGGTTATGCATTTACAGATCG





ATATGATATTGGTTACAACACACCAACAAAGTATGGGACAGCAGATAATTTGTTAGATGCTTTA





CGTGCATTGCATGGTCAGGGTATTCAAGCGATTAACGACTGGGTACCAGATCAAATTTATAATC





TACCCGATGAACAGTTAGTCACGGCTATTCGAACAGACGGTTCAGGTGATCATACTTATGGTTC





AGTTATTGACCATACTTTGTATGCATCAAAGACAGTTGGCGGGGGCATTTATCAGCAACAATAT





GGTGGGGCCTTCTTGGAACAATTAAAAACACAGTACCCGCAACTTTTCCAGCAAAAACAGATTT





CCACAGATCAGCCAATGAACCCAGATATTCAAATTAAGTCATGGGAAGCCAAGTATTTCAACGG





TTCGAACATTCAGGGGCGTGGGGCTTGGTATGTTTTGAAGGACTGGGGCACACAACAGTATTTT





AATGTGTCAGATGCGCAGACCTTCCTTCCAAAGCAATTATTGGGTGAAAAGGCCAAAACTGGTT





TTGTTACGCGTGGTAAGGAGACTTCATTCTATTCCACTAGTGGCTATCAAGCAAAATCTGCCTT





TATTTGTGATAACGGTAATTGGTACTACTTTGATGACAAAGGGAAAATGGTTGTTGGAAACCAA





GTTATCAATGGCATCAATTATTACTTTTTACCGAATGGTATCGAATTACAAGATGCCTATCTAG





TACATGATGGTATGTACTATTATTATAATAATATTGGCAAGCAACTGCACAACACATATTACCA





AGATAAACAAAAAAATTTCCATTACTTCTTTGAAGATGGGCACATGGCACAGGGTATTGTCACC





ATCATTCAAAGTGATGGCACCCCAGTCACACAGTACTTTGATGAGAATGGTAAGCAACAAAAAG





GCGTGGCGGTCAAAGGATCAGATGGTCATTTGCATTACTTTGACGGTGCGTCAGGGAATATGCT





CTTTAAATCATGGGGTAGACTAGCAGATGGCTCTTGGCTATATGTAGACGAGAAAGGTAATGCG





GTTACAGGCAAACAAACCATTAATAATCAAACGGTTTACTTTAATGATGATGGTCGTCAAATCA





AAAATAACTTTAAAGAATTAGCAGATGGTTCTTGGCTTTATCTTAACAATAAAGGTGTTGCAGT





AACAGGAGAGCAAATAATTAATGGGCAGACACTTTATTTTGGTAACGATGGTCGTCAATTTAAA





GGGACAACACATATAAATGCTACTGGTGAAAGCCGTTACTATGACCCAGACTCAGGTAATATGA





TAACTGATCGTTTTGAACGTGTTGGTGATAATCAATGGGCTTATTTTGGTTATGATGGTGTTGC





AGTAACAGGGGACCGAATCATTAAAGGGCAAAAACTCTATTTCAACCAAAATGGTATCCAAATG





AAAGGCCACTTACGTCTTGAAAATGGTATCATGCGTTATTACGATGCTGATACTGGCGAATTAG





TTCGTAATCGATTTGTATTGCTATCTGATGGTTCATGGGTTTACTTTGGCCAAGATGGCGTACC





CGTAACTGGCGTGCAAGTGATTAATGGCCAAACATTATATTTTGACGCAGATGGTAGGCAAGTC





AAAGGGCAGCAACGTGTAATCGGCAATCAACGCTATTGGATGGATAAAGACAATGGTGAAATGA





AAAAAATAACATACGCGGCCGCACTCGAGCACCACCACCACCACCACTGA




DexT gene (DNA
ATGCAAAACGGCGAAGTGTGTCAGCGTAAAAAACTGTACAAGTCAGGGAAGATATTAGTTACAG
105



sequence cloned
CAAGTATTTTTGCTGTTATGGGTTTTGGTACTGCCATGTCACAAGCAAACGCGAGCAGTAGTGA




into pET23a)
TAATGATAGCAAAACACAAACTATTTCAAAAATAGTAAAAAGTAAAGTCGAACCGGCAACTGTT





CAACCAGCGAAACCAGCGGAACCTACTAATAAAATAGTTGACCAAGCAGATATGCATACGGTCA





GCGGGCAAAACAGCGTGCCACCAGTAGTGACTAATCAATCCAATTAACAGGCTGCAAAACCAAC





TACACCTGTTACCGATGTCACAGATACGCATAAAATCGAAGCAAACAACGTCCCTGCTGATGTT





ATGCCAGCAAATGCCCCAGATAAACAATCAGTGACTAATGCACCAGTAGTGCCGCCAAAGCATG





ATACGGACCAGCAGGACGATTCACTAGAAAAACAGCAAGTATTAGAACCGAGCGTAAATAGTAA





TATACCAAAAAAGCAGACAAATCAACAGTTAGCGGTTGTTACAGCACCAGCAAATTCAGCACCT





CAAACCAAAACAACAGCAGAAATTTCTGCTGGTACAGAGTTAGACACGATGCCTAATGTTAAGC





ATGTAGATGGCAAAGTTTATTTTTATGGAGATGATGGCCAACCAAAAAAGAATTTTACTACTAT





TATAGATGGTAAACCTTACTACTTTGATAAAGATACAGGGGCACTATCTAATAACGATAAGCAA





TATGTATCGGAATTATTCAGTATTGGCAATAAACATAACGCCGTCTATAACACATCATCAGATA





ATTTTACGCAATTAGAAGGACATCTGACGGCAAGTAGTTGGTATCGTCCAAAAGATATTTTGAA





AAATGGTAAACGTTGGGCACCTTCAACAGTGACTGATTTCAGACCATTATTGATGGCCTGGTGG





CCGGATAAGAGTACGCAAGTCACTTATCTGAATTACATGAAAGATCAGGGCCTCTTGTCTGGTA





CTCATCACTTTTCCGATAATGAAAATATGCGGACCTTAACGGCAGCTGCCATGCAGGCACAGGT





AAACATTGAGAAAAAAATTGGGCAACTTGGCAATACGGATTGGTTGAAAACGGCGATGACGCAA





TACATTGATGCCCAGCCCAATTGGAATATTGACAGTGAGGCGAAAGGAGATGATCATCTACAAG





GTGGTGCACTACTTTATACAAATAGTGATATGTCGCCAAAGGCCAATTCTGATTATCGTAAGCT





GAGCCGTACGCCTAAAAATCAAAAAGGTCAAATTGCTGATAAATATAAGCAAGGTGGGTTTGAA





TTATTACTAGCAAACGATGTCGATAATTCTAATCCAGTTGTGCAAGCAGAACAACTTAATTGGT





TACATTATATGATGAATATCGGTAGTATTTTACAAAATGATGACCAAGCTAATTTTGATGGTTA





CCGTGTTGATGCTGTCGATAATGTGGACGCTGACTTACTACAGATTGCTGGTGAATATGCTAAG





GCTGCCTATGGTGTTGACAAAAATGACGCGAGAGCGAATCAACATTTATCAATTTTGGAAGACT





GGGGAGATGAAGATCCAGACTATGTCAAAGCACATGGCAACCAGCAAATTACAATGGATTTCCC





CTTGCATTTAGCGATTAAATACGCGCTCAACATGCCTAATGATAAGCGGAGTGGCCTTGAGCCA





ACCCGTGAACACAGTTTAGTCAAACGAATTACAGATGATAAAGAAAATGTTGCACAACCAAATT





ATTCATTTATCCGAGCTCATGACAGTGAAGTACAAACGATTATTGCTGATATTATTAAAGATAA





AATCAACCCGGCGTCAACAGGGCTAGATTCAACAGTGACTTTGGATCAAATTAAGCAGGCTTTT





GACATCTATAATGCTGATGAATTGAAAGCAGATAAAGTTTACACACCTTACAATATTCCAGCAT





CATACGCTTTGTTATTGACTAATAAAGACACAATTCCACGTGTTTATTATGGGGATATGTTCAC





GGATGATGGCCAATACATGGCTAAACAATCACCTTACTATCAAGCGATTGATGCGTTGTTGAAA





GCTCGTATCAAGTATGCTGCTGGTGGTCAAACCATGAAAATGAACTATTTTCCAGATGAACAAT





CTGTTATGACATCAGTTCGTTATGGTAAGGGTGCAATGACGGCAAGTGACTCTGGTAACCAAGA





GACACGCTATCAAGGTATTGGACTTGTTGTCAACAATCGCCCAGATTTGAAACTATCTGACAAA





GATGAAGTCAAAATGGATATGGGTGCGGCACATAAAAACCAAGATTATCGCCCAGTTTTGTTGA





CGACAAAATCAGGATTAAAAGTCTACAGCACTGATGCAAATGCACCTGTCGTTCGAACTGACGC





CAATGGCCAATTAACTTTTAAGGCAGACATGGTATATGGTGTAAACGACCCACAAGTGTCAGGG





TACATTGCGGCTTGGGTACCAGTAGGGGCTTCAGAAAATCAAGATGCTCGAACGAAAAGTGAAA





CAACGCAGTCAACTGACGGGAGTGTTTATCATTCTAATGCAGCGTTAGATTCGCAAGTCATTTA





TGAAGGCTTTTCAAATTTTCAAGACTTTCCAACAACACCCGATGAGTTTACGAACATTAAAATT





GCTCAAAATGTTAACTTATTTAAGGATTGGGGTATTACTAGCTTTGAAATGGCGCCACAATATC





GCGCCAGCTCAGATAAAAGTTTCTTAGATGCTATCGTACAAAATGGTTATGCATTTACAGATCG





ATATGATATTGGTTACAACACACCAACAAAGTATGGGACAGCAGATAATTTGTTAGATGCTTTA





CGTGCATTGCATGGTCAGGGTATTCAAGCGATTAACGACTGGGTACCAGATCAAATTTATAATC





TACCCGATGAACAGTTAGTCACGGCTATTCGAACAGACGGTTCAGGTGATCATACTTATGGTTC





AGTTATTGACCATACTTTGTATGCATCAAAGACAGTTGGCGGGGGCATTTATCAGCAACAATAT





GGTGGGGCCTTCTTGGAACAATTAAAAACACAGTACCCGCAACTTTTCCAGCAAAAACAGATTT





CCACAGATCAGCCAATGAACCCAGATATTCAAATTAAGTCATGGGAAGCCAAGTATTTCAACGG





TTCGAACATTCAGGGGCGTGGGGCTTGGTATGTTTTGAAGGACTGGGGCACACAACAGTATTTT





AATGTGTCAGATGCGCAGACCTTCCTTCCAAAGCAATTATTGGGTGAAAAGGCCAAAACTGGTT





TTGTTACGCGTGGTAAGGAGACTTCATTCTATTCCACTAGTGGCTATCAAGCAAAATCTGCCTT





TATTTGTGATAACGGTAATTGGTACTACTTTGATGACAAAGGGAAAATGGTTGTTGGAAACCAA





GTTATCAATGGCATCAATTATTACTTTTTACCGAATGGTATCGAATTACAAGATGCCTATCTAG





TACATGATGGTATGTACTATTATTATAATAATATTGGCAAGCAACTGCACAACACATATTACCA





AGATAAACAAAAAAATTTCCATTACTTCTTTGAAGATGGGCACATGGCACAGGGTATTGTCACC





ATCATTCAAAGTGATGGCACCCCAGTCACACAGTACTTTGATGAGAATGGTAAGCAACAAAAAG





GCGTGGCGGTCAAAGGATCAGATGGTCATTTGCATTACTTTGACGGTGCGTCAGGGAATATGCT





CTTTAAATCATGGGGTAGACTAGCAGATGGCTCTTGGCTATATGTAGACGAGAAAGGTAATGCG





GTTACAGGCAAACAAACCATTAATAATCAAACGGTTTACTTTAATGATGATGGTCGTCAAATCA





AAAATAACTTTAAAGAATTAGCAGATGGTTCTTGGCTTTATCTTAACAATAAAGGTGTTGCAGT





AACAGGAGAGCAAATAATTAATGGGCAGACACTTTATTTTGGTAACGATGGTCGTCAATTTAAA





GGGACAACACATATAAATGCTACTGGTGAAAGCCGTTACTATGACCCAGACTCAGGTAATATGA





TAACTGATCGTTTTGAACGTGTTGGTGATAATCAATGGGCTTATTTTGGTTATGATGGTGTTGC





AGTAACAGGGGACCGAATCATTAAAGGGCAAAAACTCTATTTCAACCAAAATGGTATCCAAATG





AAAGGCCACTTACGTCTTGAAAATGGTATCATGCGTTATTACGATGCTGATACTGGCGAATTAG





TTCGTAATCGATTTGTATTGCTATCTGATGGTTCATGGGTTTACTTTGGCCAAGATGGCGTACC





CGTAACTGGCGTGCAAGTGATTAATGGCCAAACATTATATTTTGACGCAGATGGTAGGCAAGTC





AAAGGGCAGCAACGTGTAATCGGCAATCAACGCTATTGGATGGATAAAGACAATGGTGAAATGA





AAAAAATAACATACGCGGCCGCACTCGAGCACCACCACCACCACCACTGA




Dextransucrase
MEKKVRFKLRKVKKRWVTVSVASAVVTLTSLSGSLVKADSTDDRQQAVTESQASLVTTSEAAKE
106



(also known as 6-
TLTATDTSTATSATSQPTATVTDNVSTTNQSTNTTANTANFDVKPTTTSEQSKTDNSDKIIATS




glucosyltransferase)
KAVNRLTATGKFVPANNNTAHSRTVTDKIVPIKPKIGKLKQPSSLSQDDIAALGNVKNIRKVNG




UniProtKB/Swiss-
KYYYYKEDGTLQKNYALNINGKTFFFDETGALSNNTLPSKKGNITNNDNTNSFAQYNQVYSTDA




Prot: P13470.2
ANFEHVDHYLTAESWYRPKYILKDGKTWTQSTEKDFRPLLMTWWPDQETQRQYVNYMNAQLGIH





QTYNTATSPLQLNLAAQTIQTKIEEKITAEKNTNWLRQTISAFVKTQSAWNSDSEKPFDDHLQK





GALLYSNNSKLTSQANSNYRILNRTPTNQTGKKDPRYTADRTIGGYEFLLANDVDNSNPVVQAE





QLNWLHFLMNFGNIYANDPDANFDSIRVDAVDNVDADLLQIAGDYLKAAKGIHKNDKAANDHLS





ILEAWSYNDTPYLHDDGDNMINMDNRLRLSLLYSLAKPLNQRSGMNPLITNSLVNRTDDNAETA





AVPSYSFIRAHDSEVQDLIRNIIRAEINPNVVGYSFTMEEIKKAFEIYNKDLLATEKKYTHYNT





ALSYALLLTNKSSVPRVYYGDMFTDDGQYMAHKTINYEAIETLLKARIKYVSGGQAMRNQQVGN





SEIITSVRYGKGALKATDTGDRTTRTSGVAVIEGNNPSLRLKASDRVVVNMGAAHKNQAYRPLL





LTTDNGIKAYHSDQEAAGLVRYTNDRGELIFTAADIKGYANPQVSGYLGVWVPVGAAADQDVRV





AASTAPSTDGKSVHQNAALDSRVMFEGFSNFQAFATKKEEYTNVVIAKNVDKFAEWGVTDFEMA





PQYVSSTDGSFLDSVIQNGYAFTDRYDLGISKPNKYGTADDLVKAIKALHSKGIKVMADWVPDQ





MYALPEKEVVTATRVDKYGTPVAGSQIKNTLYVVDGKSSGKDQQAKYGGAFLEELQAKYPELFA





RKQISTGVPMDPSVKIKQWSAKYFNGTNILGRGAGYVLKDQATNTYFSLVSDNTFLPKSLVNPN





HGTSSSVTGLVFDGKGYVYYSTSGNQAKNAFISLGNNWYYFDNNGYMVTGAQSINGANYYFLSN





GIQLRNAIYDNGNKVLSYYGNDGRRYENGYYLFGQQWRYFQNGIMAVGLTRIHGAVQYFDASGF





QAKGQFITTADGKLRYFDRDSGNQISNRFVRNSKGEWFLFDHNGVAVTGTVTFNGQRLYFKPNG





VQAKGEFIRDADGHLRYYDPNSGNEVRNRFVRNSKGEWFLFDHNGIAVTGTRVVNGQRLYFKSN





GVQAKGELITERKGRIKYYDPNSGNEVRNRYVRTSSGNWYYFGNDGYALIGWHVVEGRRVYFDE





NGVYRYASHDQRNHWDYDYRRDFGRGSSSAVRFRHSRNGFFDNFFRF




Dextransucrase
MDKKVRYKLRKVKKRWVTVSVASAVMTLTTLSGGLVKADSNESKSQISNDSNTSVVTANEESNV
107



(also known as
TTEVTSKQEAASSQTNHTVTTISSSTSVVNPKEVVSNPYTVGETASNGEKLQNQTTTVDKTSEA




Sucrose 6-
AANNISKQTTEADTDVIDDSNAANLQILEKLPNVKEIDGKYYYYDNNGKVRTNFTLIADGKILH




glucosyltransferase)
FDETGAYTDTSIDTVNKDIVTTRSNLYKKYNQVYDRSAQSFEHVDHYLTAESWYRPKYILKDGK




UniProtKB/Swiss-
TWTQSTEKDFRPLLMTWWPSQETQRQYVNYMNAQLGINKTYDDTSNQLQLNIAAATIQAKIEAK




Prot: P08987.3
ITTLKNTDWLRQTISAFVKTQSAWNSDSEKPFDDHLQNGAVLYDNEGKLTPYANSNYRILNRTP





TNQTGKKDPRYTADNTIGGYEFLLANDVDNSNPVVQAEQLNWLHFLMNFGNIYANDPDANFDSI





RVDAVDNVDADLLQIAGDYLKAAKGIHKNDKAANDHLSILEAWSDNDTPYLHDDGDNMINMDNK





LRLSLLFSLAKPLNQRSGMNPLITNSLVNRTDDNAETAAVPSYSFIRAHDSEVQDLIRDIIKAE





INPNVVGYSFTMEEIKKAFEIYNKDLLATEKKYTHYNTALSYALLLTNKSSVPRVYYGDMFTDD





GQYMAHKTINYEAIETLLKARIKYVSGGQAMRNQQVGNSEIITSVRYGKGALKATDTGDRTTRT





SGVAVIEGNNPSLRLKASDRVVVNMGAAHKNQAYRPLLLTTDNGIKAYHSDQEAAGLVRYTNDR





GELIFTAADIKGYANPQVSGYLGVWVPVGAAADQDVRVAASTAPSTDGKSVHQNAALDSRVMFE





GFSNFQAFATKKEEYTNVVIAKNVDKFAEWGVTDFEMAPQYVSSTDGSFLDSVIQNGYAFTDRY





DLGISKPNKYGTADDLVKAIKALHSKGIKVMADWVPDQMYAFPEKEVVTATRVDKFGKPVEGSQ





IKSVLYVADSKSSGKDQQAKYGGAFLEELQAKYPELFARKQISTGVPMDPSVKIKQWSAKYFNG





TNILGRGAGYVLKDQATNTYFNISDNKEINFLPKTLLNQDSQVGFSYDGKGYVYYSTSGYQAKN





TFISEGDKWYYFDNNGYMVTGAQSINGVNYYFLSNGLQLRDAILKNEDGTYAYYGNDGRRYENG





YYQFMSGVWRHFNNGEMSVGLTVIDGQVQYFDEMGYQAKGKFVTTADGKIRYFDKQSGNMYRNR





FIENEEGKWLYLGEDGAAVTGSQTINGQHLYFRANGVQVKGEFVTDRYGRISYYDSNSGDQIRN





RFVRNAQGQWFYFDNNGYAVTGARTINGQHLYFRANGVQVKGEFVTDRHGRISYYDGNSGDQIR





NRFVRNAQGQWFYFDNNGYAVTGARTINGQHLYFRANGVQVKGEFVTDRYGRISYYDSNSGDQI





RNRFVRNAQGQWFYFDNNGYAVTGARTINGQHLYFRANGVQVKGEFVTDRYGRISYYDANSGER





VRIN




Glucosyltransferas
METKRRYKMYKVKKHWVTIAVASGLITLGTTTLGSSVSAETEQQTSDKVVTQKSEDDKAASESS
108



e-S (also known as
QTDAPKTKQAQTEQTQAQSQANVADTSTSITKETPSQNITTQANSDDKTVTNTKSEEAQTSEER




GTF-S,
TKQAEEAQATASSQALTQAKAELTKQRQTAAQENKNPVDLAAIPNVKQIDGKYYYIGSDGQPKK




Dextransucrase,
NFALTVNNKVLYFDKNTGALTDTSQYQFKQGLTKLNNDYTPHNQIVNFENTSLETIDNYVTADS




Sucrose 6-
WYRPKDILKNGKTWTASSESDLRPLLMSWWPDKQTQIAYLNYMNQQGLGTGENYTADSSQESLN




glucosyltransferase)
LAAQTVQVKIETKISQTQQTQWLRDIINSFVKTQPNWNSQTESDTSAGEKDHLQGGALLYSNSD




UniProtKB/Swiss-
KTAYANSDYRLLNRTPTSQTGKPKYFEDNSSGGYDFLLANDIDNSNPVVQAEQLNWLHYLMNYG




Prot: P49331.3
SIVANDPEANFDGVRVDAVDNVNADLLQIASDYLKAHYGVDKSEKNAINHLSILEAWSDNDPQY





NKDTKGAQLPIDNKLRLSLLYALTRPLEKDASNKNEIRSGLEPVITNSLNNRSAEGKNSERMAN





YIFIRAHDSEVQTVIAKIIKAQINPKTDGLTFTLDELKQAFKIYNEDMRQAKKKYTQSNIPTAY





ALMLSNKDSITRLYYGDMYSDDGQYMATKSPYYDAIDTLLKARIKYAAGGQDMKITYVEGDKSH





MDWDYTGVLTSVRYGTGANEATDQGSEATKTQGMAVITSNNPSLKLNQNDKVIVNMGTAHKNQE





YRPLLLTTKDGLTSYTSDAAAKSLYRKTNDKGELVFDASDIQGYLNPQVSGYLAVWVPVGASDN





QDVRVAASNKANATGQVYESSSALDSQLIYEGFSNFQDFVTKDSDYTNKKIAQNVQLFKSWGVT





SFEMAPQYVSSEDGSFLDSIIQNGYAFEDRYDLAMSKNNKYGSQQDMINAVKALHKSGIQVIAD





WVPDQIYNLPGKEVVTATRVNDYGEYRKDSEIKNTLYAANTKSNGKDYQAKYGGAFLSELAAKY





PSIFNRTQISNGKKIDPSEKITAWKAKYFNGTNILGRGVGYVLKDNASDKYFELKGNQTYLPKQ





MTNKEASTGFVNDGNGMTFYSTSGYQAKNSFVQDAKGNWYYFDNNGHMVYGLQHLNGEVQYFLS





NGVQLRESFLENADGSKNYFGHLGNRYSNGYYSFDNDSKWRYFDASGVMAVGLKTINGNTQYFD





QDGYQVKGAWITGSDGKKRYFDDGSGNMAVNRFANDKNGDWYYLNSDGIALVGVQTINGKTYYF





GQDGKQIKGKIITDNGKLKYFLANSGELARNIFATDSQNNWYYFGSDGVAVTGSQTIAGKKLYF





ASDGKQVKGSFVTYNGKVHYYHADSGELQVNRFEADKDGNWYYLDSNGEALTGSQRINGQRVFF





TREGKQVKGDVAYDERGLLRYYDKNSGNMVYNKVVTLANGRRIGIDRWGIARYY




Dextransucrase (EC
METKRRYKMHKVKKHWVTVAVASGLITLGTTTLGSSVSAETEQQTSDKVVTQKSEDDKAASESS
109



2.4.1.5) precursor
QTDAPKTKQAQTEQTQAQSQANVADTSTSITKETPSQNITTQANSDDKTVTNTKSEEAQTSEER




Streptococcus
TKQSEEAQTTASSQALTQAKAELTKQRQTAAQENKNPVDLAAIPNVKQIDGKYYYIGSDGQPKK




mutans
NFALTVNNKVLYFDKNTGALTDTSQYQFKQGLTKLNNDYTPHNQIVNFENTSLETIDNYVTADS




PIR: A45866
WYRPKDILKNGKTWTASSESDLRPLLMSWWPDKQTQIAYLNYMNQQGLGTGENYTADSSQESLN





LAAQTVQVKIETKISQTQQTQWLRDIINSFVKTQPNWNSQTESDTSAGEKDHLQGGALLYSNSD





KTAYANSDYRLLNRTPTSQTGKPKYFEDNSSGGYDFLLANDIDNSNPVVQAEQLNWLHYLMNYG





SIVANDPEANFDGVRVDAVDNVNADLLQIASDYLKAHYGVDKSEKNAINHLSILEAWSDNDPQY





NKDTKGAQLPIDNKLRLSLLYALTRPLEKDASNKNEIRSGLEPVITNSLNNRSAEGKNSERMAN





YIFIRAHDSEVQTVIAKIIKAQINPKTDGLTFTLDELKQAFKIYNEDMRQAKKKYTQSNIPTAY





ALMLSNKDSITRLYYGDMYSDDGQYMATKSPYYDAIDTLLKARIKYAAGGQDMKITYVEGDKSH





MDWDYTGVLTSVRYGTGANEATDQGSEATKTQGMAVITSNNPSLKLNQNDKVIVNMGAAHKNQE





YRPLLLTTKDGLTSYTSDAAAKSLYRKTNDKGELVFDASDIQGYLNPQVSGYLAVWVPVGASDN





QDVRVAASNKANATGQVYESSSALDSQLIYEGFSNFQDFVTKDSDYTNKKIAQNVQLFKSWGVT





SFEMAPQYVSSEDGSFLDSIIQNGYAFEDRYDLAMSKNNKYGSQQDMINAVKALHKSGIQVIAD





WVPDQIYNLPGKEVVTATRVNDYGEYRKDSEIKNTLYAANTKSNGKDYQAKYGGAFLSELAAKY





PSIFNRTQISNGKKIDPSEKITAWKAKYFNGTNILGRGVGYVLKDNASDKYFELKGNQTYLPKQ





MTNKEASTGFVNDGNGMTFYSTSGYQAKNSFVQDAKGNWYYFDNNGHMVYGLQQLNGEVQYFLS





NGVQLRESFLENADGSKNYFGHLGNRYSNGYYSFDNDSKWRYFDASGVMAVGLKTINGNTQYFD





QDGYQVKGAWITGSDGKKRYFDDGSGNMAVNRFANDKNGDWYYLNSDGIALVGVQTINGKTYYF





GQDGKQIKGKIITDNGKLKYFLANSGELARNIFATDSQNNWYYFGSDGVAVTGSQTIAGKKLYF





ASDGKQVKGSFVTYNGKVHYYHADSGELQVNRFEADKDGNWYYLDSNGEALTGSQRINDQRVFF





TREGKQVKGDVAYDERGLLRYYD




Dextranase
MFSAVLLGWLLFQPTVGHAIRQRAGNHTVCNSQLCTWWHDNGEINTASMVQLGNVRQSHKYLVQ
110




VSIAGVNDFYDSFAYESIPRNGRGRIYSPWDPPNSDTLGSDVDDGITIETSAGINMAWSQFEYS





TGVDVKILTRDGSRLPDPSGVKIRPTAISYDIRSSSDGGIVIRVPHDPNGRRFSVEFDNDLYTY





RSDGSRYVSSGGSIVGVEPRNALVIFASPFLPDNMVPRIDGPDTKVMTPGPINQGDWGSSGILY





FPPGVYWMNSNQQGQTPKIGENHIRLHPNTYWAYLAPGAYVKGAIEYSTKSDFYATGHGVLSGE





HYVYQANPATYYQALKSDATSLRMWWHNNLGGGQTWYCQGPTINAPPFNTMDFHGSSDITTRIS





DYKQVGAFFFQTDGPQMYPNSQVHDVFYHVNDDAIKTYYSGVTVTRATIWKAHNDPIIQMGWDT





RDVTGVTLQDLYIIHTRYIKSETYVPSAIIGASPFYMPGRSVDPAKSISMTISNLVCEGLCPAL





MRITPLQNYRDFRIQNVAFPDGLQANSIGTGKSIVPASSGLKFGVAISNWTVGGEQVTMSNFQS





DSLGQLDIDVSYWGQWVIR




Lanosterol
MTEFYSDTIGLPKTDPRLWRLRTDELGRESWEYLTPQQAANDPPSTFTQWLLQDPKFPQPHPER
111
SEQ ID NO: 55


synthase [S.
NKHSPDFSAFDACHNGASFFKLLQEPDSGIFPCQYKGPMFMTIGYVAVNYIAGIEIPEHERIEL

in WO



cerevisiae]

IRYIVNTAHPVDGGWGLHSVDKSTVFGTVLNYVILRLLGLPKDHPVCAKARSTLLRLGGAIGSP

2016/050890



HWGKIWLSALNLYKWEGVNPAPPETWLLPYSLPMHPGRWWVHTRGVYIPVSYLSLVKFSCPMTP





LLEELRNEIYTKPFDKINFSKNRNTVCGVDLYYPHSTTLNIANSLVVFYEKYLRNRFIYSLSKK





KVYDLIKTELQNTDSLCIAPVNQAFCALVTLIEEGVDSEAFQRLQYRFKDALFHGPQGMTIMGT





NGVQTWDCAFAIQYFFVAGLAERPEFYNTIVSAYKFLCHAQFDTECVPGSYRDKRKGAWGFSTK





TQGYTVADCTAEAIKAIIMVKNSPVFSEVHHMISSERLFEGIDVLLNLQNIGSFEYGSFATYEK





IKAPLAMETLNPAEVFGNIMVEYPYVECTDSSVLGLTYFHKYFDYRKEEIRTRIRIAIEFIKKS





QLPDGSWYGSWGICFTYAGMFALEALHTVGETYENSSTVRKGCDFLVSKQMKDGGWGESMKSSE





LHSYVDSEKSLVVQTAWALIALLFAEYPNKEVIDRGIDLLKNRQEESGEWKFESVEGVFNHSCA





IEYPSYRFLFPIKALGMYSRAYETHTL




DNA sequence
ATGAAGGTCTCTCCATTTGAGTTCATGTCGGCAATAATTAAGGGCAGGATGGACCCGTCCAATT
112
SEQ ID NO: 45


encoding S.
CTTCATTTGAGTCGACTGGCGAGGTTGCCTCAGTTATTTTCGAGAACCGTGAGCTGGTTGCGAT

of WO



grosvenorii

CTTAACCACCTCGATCGCCGTCATGATTGGCTGCTTCGTTGTTCTCATGTGGCGAAGAGCCGGC

2016/050890


CPR4497
AGTCGGAAAGTTAAGAACGTGGAGCTACCTAAGCCGTTGATTGTGCACGAGCCGGAGCCCGAAG





TTGAAGACGGCAAGAAGAAGGTTTCAATCTTCTTCGGTACACAGACAGGCACCGCCGAAGGATT





TGCAAAGGCTCTAGCTGACGAGGCGAAAGCACGATACGAGAAGGCCACATTTAGAGTTGTTGAT





TTGGATGATTATGCAGCTGATGACGATCAGTATGAAGAGAAGTTGAAGAACGAGTCTTTCGCTG





TCTTCTTATTGGCAACGTATGGCGATGGAGAGCCCACTGATAATGCCGCAAGATTCTATAAATG





GTTCGCGGAGGGGAAAGAGAGAGGGGAGTGGCTTCAGAACCTTCATTATGCGGTCTTTGGCCTT





GGCAACCGACAGTACGAGCATTTTAATAAGATTGCAAAGGTGGCAGATGAGCTGCTTGAGGCAC





AGGGAGGCAACCGCCTTGTTAAAGTTGGTCTTGGAGATGACGATCAGTGCATAGAGGATGACTT





CAGTGCCTGGAGAGAATCATTGTGGCCTGAGTTGGATATGTTGCTTCGAGATGAGGATGATGCA





ACAACAGTGACCACCCCTTACACAGCTGCCGTATTAGAATATCGAGTTGTATTCCATGATTCTG





CAGATGTAGCTGCTGAGGACAAGAGCTGGATCAATGCAAACGGTCATGCTGTACATGATGCTCA





GCATCCCTTCAGATCTAATGTGGTTGTGAGGAAGGAGCTCCATACGTCCGCATCTGATCGCTCC





TGTAGTCATCTAGAATTTAATATTTCTGGGTCTGCACTCAATTATGAAACAGGGGATCATGTCG





GTGTTTACTGTGAAAACTTAACTGAGACTGTGGACGAGGCACTAAACTTATTGGGTTTGTCTCC





TGAAACGTATTTCTCCATATATACTGATAACGAGGATGGCACTCCACTTGGTGGAAGCTCTTTA





CCACCTCCTTTTCCATCCTGCACCCTCAGAACAGCATTGACTCGATATGCAGATCTCTTGAATT





CACCCAAGAAGTCAGCTTTGCTTGCATTAGCAGCACATGCTTCAAATCCAGTAGAGGCTGACCG





ATTAAGATATCTTGCATCACCTGCCGGGAAGGATGAATACGCCCAGTCTGTGATTGGTAGCCAG





AAAAGCCTTCTTGAGGTCATGGCTGAATTTCCTTCTGCCAAGCCCCCACTTGGTGTCTTCTTCG





CAGCTGTTGCACCGCGCTTGCAGCCTCGATTCTACTCCATATCATCATCTCCAAGGATGGCTCC





ATCTAGAATTCATGTTACTTGTGCTTTAGTCTATGACAAAATGCCAACAGGACGTATTCATAAA





GGAGTGTGCTCAACTTGGATGAAGAATTCTGTGCCCATGGAGAAAAGCCATGAATGCAGTTGGG





CTCCAATTTTCGTGAGACAATCAAACTTCAAGCTTCCTGCAGAGAGTAAAGTGCCCATTATCAT





GGTTGGTCCTGGAACTGGATTGGCTCCTTTCAGAGGTTTCTTACAGGAAAGATTAGCTTTGAAG





GAATCTGGAGTAGAATTGGGGCCTTCCATATTGTTCTTTGGATGCAGAAACCGTAGGATGGATT





ACATATACGAGGATGAGCTGAACAACTTTGTTGAGACTGGTGCTCTCTCTGAGTTGGTTATTGC





CTTCTCACGCGAAGGGCCAACTAAGGAATATGTGCAGCATAAAATGGCAGAGAAGGCTTCGGAT





ATCTGGAATTTGATATCAGAAGGGGCTTACTTATATGTATGTGGTGATGCAAAGGGCATGGCTA





AGGATGTCCACCGAACTCTCCATACTATCATGCAAGAGCAGGGATCTCTTGACAGCTCAAAAGC





TGAGAGCATGGTGAAGAATCTGCAAATGAATGGAAGGTATCTGCGTGATGTCTGGTGA




CPR4497 protein
MKVSPFEFMSAIIKGRMDPSNSSFESTGEVASVIFENRELVAILTTSIAVMIGCFVVLMWRRAG
113
SEQ ID NO: 46


[S. grosvenorii]
SRKVKNVELPKPLIVHEPEPEVEDGKKKVSIFFGTQTGTAEGFAKALADEAKARYEKATFRVVD

of WO



LDDYAADDDQYEEKLKNESFAVFLLATYGDGEPTDNAARFYKWFAEGKERGEWLQNLHYAVFGL

2016/050890



GNRQYEHFNKIAKVADELLEAQGGNRLVKVGLGDDDQCIEDDFSAWRESLWPELDMLLRDEDDA





TTVTTPYTAAVLEYRVVFHDSADVAAEDKSWINANGHAVHDAQHPFRSNVVVRKELHTSASDRS





CSHLEFNISGSALNYETGDHVGVYCENLTETVDEALNLLGLSPETYFSIYTDNEDGTPLGGSSL





PPPFPSCTLRTALTRYADLLNSPKKSALLALAAHASNPVEADRLRYLASPAGKDEYAQSVIGSQ





KSLLEVMAEFPSAKPPLGVFFAAVAPRLQPRFYSISSSPRMAPSRIHVTCALVYDKMPTGRIHK





GVCSTWMKNSVPMEKSHECSWAPIFVRQSNFKLPAESKVPIIMVGPGTGLAPFRGFLQERLALK





ESGVELGPSILFFGCRNRRMDYIYEDELNNFVETGALSELVIAFSREGPTKEYVQHKMAEKASD





IWNLISEGAYLYVCGDAKGMAKDVHRTLHTIMQEQGSLDSSKAESMVKNLQMNGRYLRDVW




Codon optimized
ATGGACGCGATTGAACATAGAACCGTAAGTGTTAATGGTATCAATATGCATGTGGCAGAAAAGG
114
SEQ ID NO: 37


coding sequence of
GAGAGGGACCTGTCGTGTTGTTGCTTCATGGTTTCCCAGAATTGTGGTACAGTTGGAGACATCA

of WO


Epoxide hydrolase
AATATTGGCTCTTTCCTCTTTAGGTTACAGAGCTGTCGCACCAGACTTACGAGGCTACGGGGAT

2016/050890


1 from S.
ACAGATGCCCCAGGGTCAATTTCATCATACACATGCTTTCACATCGTAGGAGATCTCGTGGCTC




grosvenorii
TAGTTGAGTCTCTGGGTATGGACAGGGTTTTTGTTGTAGCCCACGATTGGGGTGCCATGATCGC





TTGGTGTTTGTGTCTGTTTAGACCTGAAATGGTTAAAGCTTTTGTTTGTCTCTCCGTCCCATTC





AGACAGAGAAACCCTAAGATGAAACCAGTTCAAAGTATGAGAGCCTTTTTCGGCGATGATTACT





ATATTTGCAGATTTCAAAATCCTGGGGAAATCGAAGAGGAGATGGCTCAAGTGGGTGCAAGGGA





AGTCTTAAGAGGAATTCTAACATCTCGTCGTCCTGGACCACCAATCTTACCAAAAGGGCAAGCT





TTTAGAGCAAGACCAGGAGCATCCACTGCATTGCCATCTTGGCTATCTGAAAAAGATCTGTCAT





TTTTCGCTTCTAAGTATGATCAAAAGGGCTTTACAGGCCCACTAAACTACTACAGAGCCATGGA





TCTTAATTGGGAATTGACTGCGTCATGGACTGGTGTCCAAGTTAAAGTACCTGTCAAATACATC





GTGGGTGACGTTGACATGGTTTTTACGACTCCTGGTGTAAAGGAATATGTCAACGGCGGTGGTT





TCAAAAAGGACGTTCCATTTTTACAGGAAGTGGTAATCATGGAAGGCGTTGGTCATTTCATTAA





TCAGGAAAAACCTGAGGAGATTTCATCTCATATACACGATTTCATAAGCAAATTCTAA




Codon optimized
ATGGATGAAATCGAACATATTACCATCAATACAAATGGAATCAAAATGCATATTGCGTCAGTCG
115
SEQ ID NO: 39


coding sequence of
GCACAGGACCAGTTGTTCTCTTGCTACACGGCTTTCCAGAATTATGGTACTCTTGGAGACACCA

of WO



S. grosvenorii

ACTACTTTACCTGTCCTCCGTTGGGTACAGAGCAATAGCTCCAGATTTGAGAGGCTATGGCGAT

2016/050890


Epoxide hydrolase
ACTGACAGTCCAGCTAGTCCTACCTCTTATACTGCTCTTCATATTGTAGGTGACCTGGTCGGCG




2
CATTAGACGAATTGGGAATAGAAAAGGTCTTTTTAGTGGGTCATGACTGGGGTGCTATTATCGC





ATGGTACTTTTGTTTGTTTAGACCAGATAGAATTAAAGCACTTGTGAATTTGTCTGTCCAGTTT





ATCCCACGTAACCCAGCAATACCTTTTATAGAAGGTTTCAGAACAGCTTTTGGTGATGACTTCT





ACATTTGTAGATTTCAAGTACCTGGGGAAGCTGAAGAGGATTTCGCGTCTATCGATACTGCTCA





ATTGTTTAAAACTTCATTATGCAATAGAAGCTCAGCCCCTCCTTGTTTGCCTAAAGAGATTGGT





TTTAGGGCTATCCCACCACCAGAAAATCTGCCATCTTGGCTCACAGAGGAAGATATCAACTTCT





ACGCAGCCAAGTTTAAACAAACTGGTTTTACTGGTGCCCTTAACTATTATAGAGCATTCGACTT





GACATGGGAATTAACAGCCCCATGGACAGGAGCCCAGATCCAAGTTCCTGTAAAGTTCATAGTT





GGTGATTCAGATCTCACGTACCATTTCCCTGGTGCTAAGGAATACATCCACAACGGAGGGTTTA





AAAGAGATGTGCCACTATTAGAGGAAGTTGTTGTGGTAAAAGATGCCTGCCACTTCATTAACCA





AGAGCGACCACAAGAGATTAATGCTCATATTCATGACTTCATCAATAAGTTCTAA




UGT3494
ATGGCGGATCGGAAAGAGAGCGTTGTGATGTTCCCGTTCATGGGGCAGGGCCATATCATCCCTT
116
SEQ ID NO: 29


[S. grosvenorii]
TTCTAGCTTTGGCCCTCCAGATTGAGCACAGAAACAGAAACTACGCCATATACTTGGTAAATAC

of WO



TCCTCTCAACGTTAAGAAAATGAGATCTTCTCTCCCTCCAGATTGA

2016/050890


Fragment of S.
TTCTGCTCCACGCCTGTAAATTTGGAAGCCATTAAACCAAAGCTTTCCAAAAGCTACTCTGATT
117
SEQ ID NO: 33



grosvenorii

CGATCCAACTAATGGAGGTTCCTCTCGAATCGACGCCGGAGCTTCCTCCTCACTATCATACAGC

of WO


UGT11789 gene
CAAAGGCCTTCCGCCGCATTTAATGCCCAAACTCATGAATGCCTTTAAAATGGTTGCTCCCAAT

2016/050890


sequence
CTCGAATCGATCCTAAAAACCCTAAACCCAGATCTGCTCATCGTCGACATTCTCCTTCCATGGA





TGCTTCCACTCGCTTCATCGCTCAAAATTCCGATGGTTTTCTTCACTATTTTCGGTGCCATGGC





CATCTCCTTTATGATTTATAATCGAACCGTCTCGAACGAGCTTCCATTTCCAGAATTTGAACTT





CACGAGTGCTGGAAATCGAAGTGCCCCTATTTGTTCAAGGACCAAGCGGAAAGTCAATCGTTCT





TAGAATACTTGGATCAATCTTCAGGCGTAATTTTGATCAAAACTTCCAGAGAGATTGAGGCTAA





GTATGTAGACTTTCTCACTTCGTCGTTTACGAAGAAGGTTGTGACCACCGGTCCCCTGGTTCAG





CAACCTTCTTCCGGCGAAGACGAGAAGCAGTACTCCGATATCATCGAATGGCTAGACAAGAAGG





AGCCGTTATCGACGGTGCTCGTTTCGTTTGGGAGCGAGTATTATCTGTCAAAGGAAGAGATGGA





AGAAATCGCCTACGGGCTGGAGAGCGCCAGCGAGGTGAATTTCATCTGGATTGTTAGGTTTCCG





ATGGGACAGGAAACGGAGGTCGAGGCGGCGCTGCCGGAGGGGTTCATCCAGAGGGCAGGAGAGA





GAGGGAAAGTGGTCGAGGGCTGGGCTCCGCAGGCGAAAATATTGGCGCATCCGAGCACCGGCGG





CCATGTGAGCCACAACGGGTGGAGCTCGATTGTGGAGTGCTTGATGTCCGGTGTACCGGTGATC





GGCGCGCCGATGCAACTTGACGGGCCAATCGTCGCAAGGCTGGTGGAGGAGATCGGCGTGGGTT





TGGAAATCAAGAGAGATGAGGAAGGGAGAATCACGAGGGGCGAAGTTGCCGATGCAATCAAGAC





GGTGGCGGTGGGCAAAACCGGGGAAGATTTTAGAAGGAAAGCAAAAAAAATCAGCAGCATTTTG





AAGATGAAAGATGAAGAAGAGGTTGACACTTTGGCAATGGAATTAGTGAGGTTATGCCAAATGA





AAAGAGGGCAGGAGTCTCAGGACTAA




UGT11999 gene
TCCCGGTCAACGGTAGAGGACTTCACGGAGCTTCGAGAGTGGATGCCTTCTGGATCGAACATGG
118
SEQ ID NO: 34


sequence
TCTACCGGTACCACGAGATTAAAAAATCCTTAGATGGAGCAACCGGCAACGAATCGGGGACGTC

of WO


[S. grosvenorii]
TGATTCGGTCCGATTCGGAATTGTGATTGAGGAGAGTGTTGCTGTGGCTGTAAGAAGCTCCCCT

2016/050890



GAACTGGAACCGGAATGGTTCGATTTGCTCGCGAAGCTTTACCAGAAGCCAGTTGTTCCGGTAG





GATTTCTACCTCCAGTAATTGAAGATGCGGAAGAATTGAGCAGCGATATCAAGGAATGGTTAGA





CAAACAGAGCTCAAACTCGGTCCTTTACGTCGCATTCGGGACCGAGGCGACTCTGAGTCAAGAT





GACGTCACTGAGTTAGCCATGGGGCTTGAGCAATCTGGGATACCATTTTTCTGGGTACTGAGAA





CCTCACCTCGGGACGAGTCAGACATGTTACCGGCCGGGTTCAAGGAGCGAGTCGAAGGTCGAGG





AAGTGTTCACGTGGGATGGGTCTCGCAGGTGAAGATACTGAGTCACGACTCGGTTGGCGGTTGT





TTGACACACTGTGGATGGAACTCGATCATAGAGGGGCTCGGATTCGGGCGCGTTATGGTATTGT





TTCCAGTCGTGAACGACCAGGGATTGAACGCTAGATTGTTGGGGGAGAAGAAGCTCGGGATAGA





GATAGAAAGGGACGAGCGAGATGGATCGTTCACACGCGACTCGGTGTCGGAATCGGTGAGGTCG





GCAATGGCGGAAAGTTCAGGCGAGGCCTTGAGAGTGAGGGCCAGGGAAATGAAGGGGTTGTTTG





GAAACGGAGATGAGAACGAGCATCAACTGAACAAGTTTGTACAATTTCTCGAGGCAAACAGGAA





TAGGCAGTCCGAGTAA




Partial UGT13679
CTGCTGCCGATTCCGCTGCCGAAACCGGCCGCCGATCTCTTGCCGGAAGGTGCAGAGGCGACGG
119
SEQ ID NO: 35


gene sequence
TGGATATTCCGTCCGACAAGATTCCGTATCTGAAATTGGCCCTCGATCTCGCCGAGCAGCCGTT

of WO


[Siraitia
TCGGAAGTTCGTCGTTGATCGTCCGCCGGATTGGATGATCGTCGATTTTAATGCTACTTGGGTC

2016/050890



grosvenorii]

TGCGATATTTCTCGGGAGCTTCAAATCCCAATCGTTTTCTTTCGTGTTCTTTCGCCTGGATTTC





TTGCTTTCTTTGCGCATGTTCTTGGGAGTGGTCTGCCGCTGTCGGAGATCGAAAGCCTGATGAC





TCCGCCGGTGATCGACGGGTCGACGGTGGCGTACCGCCGGCATGAAGCTGCCGTTATTTGTGCT





GGGTTTTTTGAGAAGAACGCTTCTGGTATGAGTGATCGCGATCGGGTAACCAAAATTCTCTCTG





CCAGTCAAGCAATCGCAGTTCGTTCTTGCTACGAATTTGACGTTGAGTATTTGAAATTGTACGA





GAAATATTGTGGAAAAAGAGTGATTCCTCTAGGGTTTCTCCCTCCAGAAAAGCCCCAAAAGTCC





GAGTTCGCCGCCGATTCGCCATGGAAACCGACCTTCGAGTGGCTTGACAAACAAAAGCCCCGAT





CAGTGGTGTTCGTCGGATTCGGCAGCGAATGCAAACTCACGAAAGATGATGTTTACGAGATAGC





GCGCGGGGTGGAGCTGTCGGAGCTGCCATTTTTGTGGGCTCTGAGAAAACCGATCTGGGCGGCG





GCGGACGATTCCGACGCTCTGCCTGCCGGATTCCTCGAGCGGACGGCGGAGAGAGGGATTGTGA





GCATGGGGTGGGCGCCGCAGATGGAGATTTTAACGCACCCGTCGATTGGCGGCTCTCTGTTTCA





CGCCGGGTGGGGATCCGCCATTGAAGCTCTGCAATTCGGGCATTGCCTTGTTCTGTTGCCATTC





ATCGTGGATCAGCCACTGAATGCAAGGCTTCTGGTGGAGAAGGGTGTTGCAGTCGAAGTTGGAA





GAAAGGAAGACGGGTCTTTTAGTGGAGAAGACATAGCTAAAGCTCTGAGAGAAGCTATGGTTTC





AGAAGAAGGTGAGCAGATGAGGAGGCAAGCGAGAAAG




Partial sequence
ATGGAAAACGACGGCGTTTTGCACGTGGTGGTATTCCCATGGCTAGCCTTGGGTCATCTCATTC
120
SEQ ID NO: 36


of S. grosvenorii
CTTTCGCTCGACTCGCCACCTGCTTAGCCCACAAGGGTCTCAGGGTTTCGTTCGTATCAACCAC

of WO


UGT 15423
AAGGAACCTGAGCAGAATTCCCAAAATACCCCCACATCTCTCCTCCTCCGTCAACCTCGTCGGC

2016/050890



TTTCCTCTGCCCCACGTCGACGGCCTTCCGGACGCCGCCGAGGCTTCCTCCGACGTGCCTTACA





ACAAGCAACAGTTACTGAAGAAGGCCTTCGACTCTCTGGAATCACCGCTCGCCGATTTGCTTCG





TGATTTGAATCCCGATTGGATTATCTACGATTACGCCTCTCATTGGCTTCCGCAGCTCGCGGCG





GAGCTCCGTATCTCGTCTGTTTTCTTCAGCCTCTTCACCGCGGCGTTTCTTGCTTTTCTTGGCC





CACCGTCGGCGTTGTCCGGCGACGGCAGTTCCCGGTGA




UGT1576 gene
ATGGCTTCTCCTCGCCACACTCCTCACTTTCTGCTCTTCCCTTTCATGGCTCAAGGCCACATGA
121
SEQ ID NO: 47


sequence [S.
TCCCCATGATTGACCTTGCCAGGCTTCTGGCTCAGCGAGGAGTTATCATCACTATTATCACCAC

of WO



grosvenorii]

GCCCCACAATGCTGCTCGCTACCACTCTGTTCTIATGCTCGCGCCATCGATTCTGGGTTACACATC

2016/050890



CATGTCCTCCAACTGCAGTTTCCATGTAAGGAAGGTGGGCTGCCAGAAGGGTGCGAGAATGTGG





ACTTGCTACCTTCACTTGCTTCCATACCCAGATTCTACAGAGCAGCAAGTGATCTCCTTTACGA





ACCATCTGAAAAACTGTTTGAGGAACTCATCCCCCGGCCGACCTGCATAATCTCCGATATGTGC





CTGCCCTGGACCATGCGAATTGCTCTGAAATATCACGTCCCAAGGCTCGTTTTCTACAGTTTGA





GCTGCTTCTTTCTTCTCTGTATGCGGAGTTTAAAAAACAATCTAGCGCTTATAAGCTCCAAGTC





TGATTCTGAGTTCGTAAOTTTCTCTGACTTGCCTGATCCAGTCGAGTTTCTCAAGTCGGAGCTA





CCTAAATCCACCGATGAAGACTTGGTGAAGTTTAGTTATGAAATGGGGGAGGCCGATCGGCAGT





CATACGGCGTTATTTTAAATCTATTTGAGGAGATGGAACCAAAGTATCTTGCAGAATATGAAAA





GGAAAGAGAATCGCCGGAAAGAGTCTGGTGCGTCGGCCCAGTTTCGCTTTGCAACGACAACAAA





CTCGACAAAGCTGAAAGAGGCAACAAAGCCTCCATCGACGAATACAAATGCATCAGGTGGCTCG





ACGGGCAGCAGCCATCTTCGGTGGTTTACGTCTCTTTAGGAAGCTTGTGCAATCTGGTGACGGC





GCAGATCATAGAGCTGGGTTTGGGTTTGGAGGCATCAAAGAAACCCTTCATTTGGGTCATAAGA





AGAGGAAACATAACAGAGGAGTTACAGAAATGGCTTGTGGAGTACXATTTCGAGGAGAAAATTA





AAGGGAGAGGGCTGGTGIUTCTTGGCTGGGCICCCCAAGTTCTGATACTGTCACACCCTGCAAT





CGGATGCTTTTTGACGCACTGCGGTTGGAACTCAAGCATC6AAGGGATATCGGCCGGCGTGCCA





ATGGTCACCTGGCCGCTTTTTGCGATCAAGTCTTCAACGAGAAGCTAATTGCTACAAATACTCA





GAATCGGCGTAAGTGTAGGCACGGAAACTACTATGAACTGGGGAGAGGAAGAGGAGAAAGGGGT





GGTTGTGAAGAGAGAGAAAGTGAGGGAAGCCATAGAAATAGTGATGGATGGAGATGAGAGAGAA





GAGAGGAGAGAGAGATGCAAAGAGCTTGCTGAAACGGCGAAGAGAGCTATAGAAGAAGGGGGCT





CGTCTCACCGGAACCTCACGATGTTGATTGAAGATATAATTCATGGAGGAGGTTTGAGTTATGA





GAAAGGAAGTTGTCGCTGA




UGT SK98 gene
ATGGATGCCCAGCGAGGTCACACCACCACCATTTTGATGCTTCCATGGGTCGGCTACGGCCATC
122
SEQ ID NO: 49


sequence [S.
TCTTGCCTTTCCTCGAGCTGGCCAAAAGCCTCTCCAGGAGGAAATTATTCCACATCTACTTCTG

of WO 2016/050890



grosvenorii]

TTCAACGTCTGTTAGCCTCGACGCCATTAAACCAAAGCTTCCTCCTTCTATCTCTTCTGATGAT





TCCATCCAACTTGTGGAACTTCGTCTCCCTTCTTCTCCTGAGTTACCTCCTCATCTTCACACAA





CCAACGGCCTTCCCTCTCACCTCATGCCCGCTCTCCACCAAGCCTTCGTCATGGCCGCCCAACA





CTTTCAGGTCATTTTACAAACACTTGCCCCGCATCTCCTCATTTATGACATTCTCCAACCTTGG





GCTCCTCAAGTGGCTTCATCCCTCAACATTCCAGCCATCAACTTCAGTACTACCGGAGCTTCAA





TGCTTTCTCGAACGCTTCACCCTACTCACTACCCAAGTTCTAAATTCCCAATCTCAGAGTTTGT





TCTTCACAATCACTGGAGAGCCATGTACACCACCGCCGATGGGGCTCTTACAGAAGAAGGCCAC





AAAATTGAAGAAACACTTGCGAATTGCTTGCATACTTCTTGCGGGGTAGTTTTGGTCAATAGTT





TCAGAGAGCTTGAGACGAAA6TATATCGATTATCTCTCTGTTCTCTTGAACAAGAAAGTTGTTC





CGGTCGGTCCTTTGGTTTACGAACCGAATCAAGAAGGGGAAGATGAAGGTTATTCAAGCATCAA





AAATTGGCTTGACAAAAAGGAACCGTCCTCAACCGTCTTCGTTTCATTTGGAACCGAATACTTC





CCGTCAAAGGAAGAAATGGAAGAGATAGCGTATGGGTTAGAGCTGAGCGAGGTTAATTTCATCT





GGGTCCTTAGATTTCCTCAAGGAGACAGCACCAGCACCATTGAAGACGCCTTGCCGAAGGGGTT





T





CTGGAGAGAGCGGGAGAGAGGGCGATGGTGGTGAAGGGTTGGGCTCCTCAGGCGAAGATACTGA





AGCATTGGAGCACAGGGGGGCTTGTGAGTCACTGTGGATGGAACTCGATGATGGAGGGCATGAT





GTTTGGCGTACCCATAATAGCGGTCCCGATGCATCTGGACCAGCCCTTTAACGCCGGACTCTTG





GAAGAAGCTGGCGTCGGCGTGGAAGCCAAGCGAGGTTCGGACGGCAAAATTCAAAGAGAAGAAG





TTGCAAAGTCGATCAAAGAAGTGGTGATTGAGAAAACCAGGGAAGACGTGAGGAAGAAAGCAAG





AGAAATGGGTGAGATTTTGAGGAGTAAAGGAGATGAGAAAATTGATGAGTTGGTGGCTGAAATT





TCTCTTTTGCGCAAAAAGGCTCCATGTTCAATTTAA





S. grosvenorii UGT98

ATGGATGCCCAGCGAGGTCACACCACAACCATTTTGATGTTTCCATGGCTCGGCTATGGCCATC
123
SEQ ID NO: 51


gene sequence
TTTCGGCTTTCCTAGAGTTGGCCAAAAGCCTCTCAAGGAGGAACTTCCATATCTACTTCTGTTC

of WO



AACCTCTGTTAACCTCGACGCCATTAAACCAAAGCTTCCTTCTTCTTCCTCTTCTGATTCCATC

2016/050890



CAACTTGTGGAACTTTGTCTTCCATCTTCTCCTGATCAGCTCCCTCCTCATCTTCACACAACCA





ACGCCCTCCCCCCTCACCTCATGCCCACTCTCCACCAAGCCTTCTCCATGGCTGCCCAACACTT





TGCTGCCATTTTACACACACTTGCTCCGCATCTCCTCATTTACGACTCTTTCCAACCTTGGGCT





CCTCAACTAGCTTCATCCCTCAACATTCCAGCCATCAACTTCAATACTACGGGAGCTTCAGTCC





TGACCCGAATGCTTCACGCTACTCACTACCCAAGTTCTAAATTCCCAATTTCAGAGTTTGTTCT





CCACGATTATTGGAAAGCCATGTACAGCGCCGCCGGTGGGGCTGTTACAAAAAAAGACCACAAA





ATTGGAGAAACACTTGCGAATTGCTTGCATGCTTCTTGTAGTGTAATTCTAATCAATAGTTTCA





GAGAGCTCGAGGAGAAATATATGGATTATCTCTCCGTTCTCTTGAACAAGAAAGTTGTTCCGGT





TGGTCCTTTGGTTTACGAACCGAATCAAGACGGGGAAGATGAAGGTTATTCAAGCATCAAAAAT





TGGCTTGACAAAAAGGAACCGTCCTCCACCGTCTTCGTTTCATTTGGAAGCGAATACTTCCCGT





CAAAGGAAGAAATGGAAGAGATAGCCCATGGGTTAGAGGCGAGCGAGGTTCATTTCATCTGGGT





CGTTAGGTTTCCTCAAGGAGACAACACCAGCGCCATTGAAGATGCCTTGCCGAAGGGGTTTCTG





GAGAGGGTGGGAGAGAGAGGGATGGTGGTGAAGGGTTGGGCTCCTCAGGCGAAGATACTGAAGC





ATTGGAGCACAGGGGGATTCGTGAGCCACTGTGGATGGAACTCGGTGATGGAAAGCATGATGTT





TGGCGTTCCCATAATAGGGGTTCCGATGCATCTGGACCAGCCCTTTAACGCCGGACTCGCGGAA





GAAGCTGGCGTCGGCGTGGAAGCCAAGCGAGATTCGGACGGCAAAATTCAAAGAGAAGAAGTTG





CAAAGTCGATCAAAGAAGTGGTGATTGAGAAAACCAGGGAAGACGTGAGGAAGAAAGCAAGAGA





AATGGGTGAGATTTTGAGGAGTAAAGGAGATGAGAAAATTGATGAGTTGGTGGCTGAAATTTCT





CTTTTGCGCAAAAAGGCTCCATGTTCAATTTAA




Codon optimized
CATTTGTCTGCTTTTTTGGAATTGGCCAAGTCCTTGTCTAGAAGAAACTTCCATATCTACTTTT
124
SEQ ID NO: 52


coding sequence
GCTCCACCTCCGTTAATTTGGATGCTATTAAGCCAAAGTTGCCATCCTCTTCATCCTCCGATTC

of WO


for UGT SK98
TATTCAATTGGTTGAATTGTGCTTGCCATCTTCCCCAGATCAATTGCCACCACACTTGCATACA

2016/050890



ACTAATGCTTTACCACCACATTTGATGCCAACATTGCATCAAGCTTTTTCTATGGCTGCTCAAC





ATTTTGCTGCTATCTTGCATACTTGGCTCCTCATTTGTTGATCTACGATTCTTTTCAACCATGG





GCTCCACAATTGGCTTCATCTTTGAATATTCCAGCCATCAACTTCAACACTACTGGTGCTTCAG





TTTTGACCAGAATGTTGCATGCTACTCATTACCCA




UGT protein also
MDAQRGHTTTILMFPWLGYGHLSAFLELAKSLSRRNFHIYFCSTSVNLDAIKPKLPSSSSSDSI
125
Seq ID NO: 6


referred to as
QLVELCLPSSPDQLPPHLHTTNALPPHLMPTLHQAFSMAAQHFAAILHTLAPHLLIYDSFQPWA

of WO


UGT94A9, A9 or
PQLASSLNIPAINFNTTGASVLTRMLHATHYPSSKFPISEFVLHDYWKAMYSAAGGAVTKKDHK

2016038617


UGT94-289-1
IGETLANCLHASCSVILINSFRELEEKYMDYLSVLLNKKVVPVGPLVYEPNQDGEDEGYSSIKN





WLDKKEPSSTVFVSFGSEYFPSKEEMEEIAHGLEASEVHFIWVVRFPQGDNTSAIEDALPKGFL





ERVGERGMVVKGWAPQAKILKHWSTGGFVSHCGWNSVMESMMFGVPIIGVPMHLDQPFNAGLAE





EAGVGVEAKRDPDGKIQRDEVAKLIKEVVVEKTREDVRKKAREMSEILRSKGEEKMDEMVAAIS





LFLKI




UGT protein is
MENIEHTTVQTNGIKMHVAAIGTGPPVLLLHGFPELWYSWRHQLLYLSSAGYRAIAPDLRGYGD
126
Seq ID NO: 18


also referred to
TDAPPSPSSYTALHIVGDLVGLLDVLGIEKVFLIGHDWGAIIAWYFCLFRPDRIKALVNLSVQF

in


as EH1, EPH1 and
FPRNPTTPFVKGFRAVLGDQFYMVRFQEPGKAEEEFASVDIREFFKNVLSNRDPQAPYLPNEVK

WO2016038617


contig 73966.
FEGVPPPALAPWLTPEDIDVYADKFAETGFTGGLNYYRAFDRTWELTAPWTGARIGVPVKFIVG





DLDLTYHFPGAQKYIHGEGFKKAVPGLEEVVVMEDTSHFINQERPHEINSHIHDFFSKFC




UGT85E5 gene
ATGGTGCAACCTCGGGTACTGCTGTTTCCTTTCCCGGCACTGGGCCACGTGAAGCCCTTCTTAT
127
Seq ID NO: 33


coding sequence
CACTGGCGGAGCTGCTTTCCGACGCCGGCATAGACGTCGTCTTCCTCAGCACCGAGTATAACCA

of



CCGTCGGATCTCCAACACTGAAGCCCTAGCCTCCCGCTTCCCGACGCTTCATTTCGAAACTATA

WO2016038617



CCGGATGGCCTGCCGCCTAATGAGTCGCGCGCTCTTGCCGACGGCCCACTGTATTTCTCCATGC





GTGAGGGAACTAAACCGAGATTCCGGCAACTGATTCAATCTCTTAACGACGGTCGTTGGCCCAT





CACCTGTATTATCACTGACATCATGTTATCTTCTCCGATTGAAGTAGCGGAAGAATTTGGGATT





CCAGTAATTGCCTTCTGCCCCTGCAGTGCTCGCTACTTATCGATTCACTTTTTTATACCGAAGC





TCGTTGAGGAAGGTCAAATTCCATACGCAGATGACGATCCGATTGGAGAGATCCAGGGGGTGCC





CTTGTTCGAAGGTCTTTTGCGACGGAATCATTTGCCTGGTTCTTGGTCTGATAAATCTGCAGAT





ATATCTTTCTCGCATGGCTTGATTAATCAGACCCTTGCAGCTGGTCGAGCCTCGGCTCTTATAC





TCAACACCTTCGACGAGCTCGAAGCTCCATTTCTGACCCATCTCTCTTCCATTTTCAACAAAAT





CTACACCATTGGACCCCTCCATGCTCTGTCCAAATCAAGGCTCGGCGACTCCTCCTCCTCCGCT





TCTGCCCTCTCCGGATTCTGGAAAGAGGATAGAGCCTGCATGTCCTGGCTCGACTGTCAGCCGC





CGAGATCTGTGGTTTTCGTCAGTTTCGGGAGTACGATGAAGATGAAAGCCGATGAATTGAGAGA





GTTCTGGTATGGGTTGGTGAGCAGCGGGAAACCGTTCCTCTGCGTGTTGAGATCCGACGTTGTT





TCCGGCGGAGAAGCGGCGGAATTGATCGAACAGATGGCGGAGGAGGAGGGAGCTGGAGGGAAGC





TGGGAATGGTAGTGGAGTGGGCAGCGCAAGAGAAGGTCCTGAGCCACCCTGCCGTCGGTGGGTT





TTTGACGCACTGCGGGTGGAAC





TCAACGGTGGAAAGCATTGCCGCGGGAGTTCCGATGATGTGCTGGCCGATTCTCGGCGACCAAC





CCAGCAACGCCACTTGGATCGACAGAGTGTGGAAAATTGGGGTTGAAAGGAACAATCGTGAATG





GGACAGGTTGACGGTGGAGAAGATGGTGAGAGCATTGATGGAAGGCCAAAAGAGAGTGGAGATT





CAGAGATCAATGGAGAAGCTTTCAAAGTTGGCAAATGAGAAGGTTGTCAGGGGTGGGTTGTCTT





TTGATAACTTGGAAGTTCTCGTTGAAGACATCAAAAAATTGAAACCATATAAATTTTAA




UGT protein
MVQPRVLLFPFPALGHVKPFLSLAELLSDAGIDVVFLSTEYNHRRISNTEALASRFPTLHFETI
128
Seq ID NO: 34



PDGLPPNESRALADGPLYFSMREGTKPRFRQLIQSLNDGRWPITCIITDIMLSSPIEVAEEFGI

of



PVIAFCPCSARYLSIHFFIPKLVEEGQIPYADDDPIGEIQGVPLFEGLLRRNHLPGSWSDKSAD

WO2016038617



ISFSHGLINQTLAAGRASALILNTFDELEAPFLTHLSSIFNKIYTIGPLHALSKSRLGDSSSSA





SALSGFWKEDRACMSWLDCQPPRSVVFVSFGSTMKMKADELREFWYGLVSSGKPFLCVLRSDVV





SGGEAAELIEQMAEEEGAGGKLGMVVEWAAQEKVLSHPAVGGFLTHCGWNSTVESIAAGVPMMC





WPILGDQPSNATWIDRVWKIGVERNNREWDRLTVE




UGT protein
MDAAQQGDTTTILMLPWLGYGHLSAFLELAKSLSRRNFHIYFCSTSVNLDAIKPKLPSSFSDSI
129
Seq ID NO: 38


referred to as
QFVELHLPSSPEFPPHLHTTNGLPPTLMPALHQAFSMAAQHFESILQTLAPHLLIYDSLQPWAP

of


UGT94C9 and UGT94-
RVASSLKIPAINFNTTGVFVISQGLHPIHYPHSKFPFSEFVLHNHWKAMYSTADGASTERTRKR

WO2016038617


289-3
GEAFLYCLHASCSVILINSFRELEGKYMDYLSVLLNKKVVPVGPLVYEPNQDGEDEGYSSIKNW





LDKKEPSSTVFVSFGSEYFPSKEEMEEIAHGLEASEVNFIWVVRFPQGDNTSGIEDALPKGFLE





RAGERGMVVKGWAPQAKILKHWSTGGFVSHCGWNSVMESMMFGVPIIGVPMHVDQPFNAGLVEE





AGVGVEAKRDPDGKIQRDEVAKLIKEVVVEKTREDVRKKAREMSEILRSKGEEKFDEMVAEISL





LLKI




Coding sequence
TCACAACGTTTAGCTTTGACAGATGGTATCGCCGTGACAGACGGGGTCAAAGAAGTTTGGCTTT
130



for enzymes for
CATAAAAAGAGCTAGACATTCAAGCAAAAACTTGAATGTCTAGCTCTTTTGTTGATTGAATCGG




acetyl CoA
GGGATTTAAATACTTAGTTTCGATAAGAGCGAACGGTATTATTAATAGCAGAAATACTATATTT




synthesis
TAATTCATCTTCTAACGTTTGATCAATATCTGTGTCTAAAGTTTCTGCAAACATGGCTTCATAT





TCAGCGATAGAAAGTTCTGTCCGATTATCTAGCAGTGCTAAATGAGTTTCTTTTTGTAAATGAT





TTTGATAACCAGCTACTAATTCACCAGTGAAAAATTCAGCGACAGCACCAGAACCATAACTGAA





TAACCCAATTTGATTGCCTGCGGTTAAAGTCGTTGCATTTTCTAAAAGGGAAATGAGTCCCAGA





TAAAGTGAACCCGTATACAAGTTTCCTACGCGACGACTATAGATGATGCTTTCTTCATAACGGG





CTAAAATTCGTTCCTGTTCTGCTTCAGTTTGGTCGGAGATTTTTGCTAATAAGGCTTTTTTGCC





CATTTTTGTGTAAGGAATATGGAACGCTAAAGCATCATAATCTGCAAAATCAAGACCGGTTCTT





TTTTTATGTTCATCCCAGACTTGGGCAAAAGATTGGATGTAGGTTTCGTTTGACAAAGGACCAT





CGACCATAGGATACGGATGGCCTGTTGGACGCCAAAAGTCATAGATATCTTGCGTCAGCATCAC





ATTATCCTCTTTTAAAGCCAAGATGCGCGGTTCACTAGCAACTAACATTGCAACCGCCCCAGCT





CCTTGTGTAGGCTCACCGCCAGAATTTAATCCATATTTTGCAATATCTGCTGCTACAACCAAGA





CTTTTTTATCTGGATGTAAGGCTACGTGATTCTTAGCTAACTGTAAGCCTGCTGTTGCTCCGTA





ACAAGCTTCCTTGATTTCGAAAGAGCGAGCGAAAGGTTGAATCCCCATTAAACGATGTAAGACA





ACTGCGGCCGCTTTTGACTCATCGATACTGGACTCAGTCCCGACAATCACCATATCAATGGCCT





CTTTATCTTCTTTGGTCAAGATCGCTTCTGCGGCATTGGCTGCAAATGTCACAATATCTTGGCT





GATTGGGTTCACCGCCATTTGGTCTTGCCCAATACCAATATGAAATTTTCCAGGGTCTACATTT





CTGGCTTCAGCCAGTGCCGTCATATCAATATAATAAGGGGGCACAAAAAAACTAATTTTATCAA





TCCCAATTGTCATTTCTTTAACTCCTTTACGATAAATAGATTCATTATATAAAATAGCACGAAA





TGAACCAAAATGGGGAATTTTTGTATTAACTTCATAGATTATTAAAAAATATCTTATAAGTCTG





TTAACATTCAGTAATTGGCACTTGTGATTCTGGGATTTTATGATATATTTCAAGATGGAGGTGC





ATTTAGTTGAAAACAGTAGTTATTATTGATGCATTACGAACACCAATTGGAAAATATAAAGGCA





GCTTAAGTCAAGTAAGTGCCGTAGACTTAGGAACACATGTTACAACACAACTTTTAAAAAGACA





TTCCACTATTTCTGAAGAAATTGATCAAGTAATCTTTGGAAATGTTTTACAAGCTGGAAATGGC





CAAAATCCCGCACGACAAATAGCAATAAACAGCGGTTTATCTCATGAAATTCCCGCAATGACAG





TTAATGAGGTCTGCGGATCAGGAATGAAGGCCGTTATTTTGGCGAAACAATTGATTCAATTAGG





AGAAGCGGAAGTTTTAATTGCTGGCGGGATTGAGAATATGTCCCAAGCACCTAAATTACAACGA





TTTAATTACGAAACAGAAAGCTATGATGCGCCTTTTTCTAGTATGATGTACGATGGGTTAACGG





ATGCCTTTAGTGGTCAAGCAATGGGCTTAACTGCTGAAAATGTGGCCGAAAAGTATCATGTAAC





TAGAGAAGAGCAAGATCAATTTTCTGTACATTCACAATTAAAAGCAGCTCAAGCACAAGCAGAA





GGGATATTCGCTGACGAAATAGCCCCATTAGAAGTATCAGGAACGCTTGTGGAGAAAGATGAAG





GGATTCGCCCTAATTCGAGCGTTGAGAAGCTAGGAACGCTTAAAACAGTTTTTAAAGAAGACGG





TACTGTAACAGCAGGGAATGCATCAACCATTAATGATGGGGCTTCTGCTTTGATTATTGCTTCA





CAAGAATATGCCGAAGCACACGGTCTTCCTTATTTAGCTATTATTCGAGACAGTGTGGAAGTCG





GTATTGATCCAGCCTATATGGGAATTTCGCCGATTAAAGCCATTCAAAAACTGTTAGCGCGCAA





TCAACTTACTACGGAAGAAATTGATCTGTATGAAATCAACGAAGCATTTGCAGCAACTTCAATC





GTGGTCCAAAGAGAACTGGCTTTACCAGAGGAAAAGGTCAACATTTATGGTGGCGGTATTTCAT





TAGGTCATGCGATTGGTGCCACAGGTGCTCGTTTATTAACGAGTTTAAGTTATCAATTAAATCA





AAAAGAAAAGAAATATGGAGTGGCTTCTTTATGTATCGGCGGTGGCTTAGGACTCGCTATGCTA





CTAGAGAGACCTCAGCAAAAAAAAAACAGCCGATTTATCAAATGAGTCCTGAGGAACGCCTGGC





TTCTCTTCTTAATGAAGGCCAGATTTCTGCTGATACAAAAAAAGAATTTGAAAATACGGCTTTA





TCTTCGCAGATTGCCAATCATATGATTGAAAATCAAATCAGTGAAACAGAAGTGCCGATGGGCG





TTGGCTTACATTTAACAGTGGACGAAACTGATTATTTGGTACCAATGGCGACAGAAGAGCCCTC





AGTGATTGCGGCTTTGAGTAATGGTGCAAAAATAGCACAAGGATTTAAAACAGTGAATCAACAA





CGTTTAATGCGTGGACAAATCGTTTTTTACGATGTTGCAGACGCCGAGTCATTGATTGATGAAC





TACAAGTAAGAGAAACGGAAATTTTTCAACAAGCAGAGTTAAGTTATCCATCTATCGTTAAACG





CGGCGGCGGCTTAAGAGATTTGCAATATCGTGCTTTTGATGAATCATTTGTATCTGTCGACTTT





TTAGTAGATGTTAAGGATGCAATGGGGGCAAATATCGTTAACGCTATGTTGGAAGGTGTGGCCG





AGTTGTTCCGTGAATGGTTTGCGGAGCAAAAGATTTTATTCAGTATTTTAAGTAATTATGCCAC





GGAGTCGGTTGTTACGATGAAAACGGCTATTCCAGTTTCACGTTTAAGTAAGGGGAGCAATGGC





CGGGAAATTGCTGAAAAAATTGTTTTAGCTTCACGCTATGCTTCATTAGATCCTTATCGGGCAG





TCACGCATAACAAAGGGATCATGAATGGCATTGAAGCTGTCGTTTTAGCTACAGGAAATGATAC





ACGCGCTGTTAGCGCTTCTTGTCATGCTTTTGCGGTGAAGGAAGGTCGCTACCAAGGTTTGACT





AGTTGGACGCTGGATGGCGAACAACTAATTGGTGAAATTTCAGTTCCGCTTGCGTTAGCCACGG





TTGGCGGTGCCACAAAAGTCTTACCTAAATCTCAAGCAGCTGCTGATTTGTTAGCAGTGACGGA





TGCAAAAGAACTAAGTCGAGTAGTAGCGGCTGTTGGTTTGGCACAAAATTTAGCGGCGTTACGG





GCCTTAGTCTCTGAAGGAATTCAAAAAGGACACATGGCTCTACAAGCACGTTCTTTAGCGATGA





CGGTCGGAGCTACTGGTAAAGAAGTTGAGGCAGTCGCTCAACAATTAAAACGTCAAAAAACGAT





GAACCAAGACCGAGCCTTGGCTATTTTAAATGATTTAAGAAAACAATAAAAAAACAGTTCAGCA





GAAATTATTCTGCTGAACTGTTTTTTTTCACATTAGGTAGCCGTTTCAGGCCACGAATTGGTTT





TACTTTTAAGACATCTAAGAAGAAAGTGAA




CGT-SL
MKRWLSVVLSMSLVFSAFFLVSDTQKVTVEAAGNLNKVNFTSDIVYQIVVDRFVDGNTSNNPSG
148



glucotransferases
SLFSSGCTNLRKYCGGDWQGIINKINDGYLTEMGVTAIWISQPVENVFAVMNDADGSTSYHGYW




AAD00555.1
ARDFKKTNPFFGTLSDFQRLVDAAHAKGIKVIIDFAPNHTSPASETNPSYMENGRLYDNGTLIG





GYTNDTNSYFHHNGGTTFSNLEDGIYRNLFDLADFNHQNQFIDKYLKDAIKLWLDMGIDGIRMD





AVKHMPFGWQKSFMDEVYDYRPVFTFGEWFLSENEVDSNNHFFANESGMSLLDFRFGQKLRQVL





RNNSDDWYGFNQMIQDTASAYDEVIDQVTFIDNHDMDRFMADEGDPRKVDIALAVLLTSRGVPN





IYYGTEQYMTGNGDPNNRKMMTSFNKNTRAYQVIQKLSSLRRSNPALSYGDTEQRWINSDVYIY





ERQFGKDVVLVAVNRSLSKSYSITGLFTALPSGTYTDQLGALLDGNTIQVGSNGAVNAFNLGPG





EVGVWTYSAAESVPIIGHIGPMMGQVGHKLTIDGEGFGTNVGTVKFGNTVASVVSWSNNQITVT





VPNIPAGKYNITVQTSGGQVSAAYDNFEVLTNDQVSVRFVVNNANTNWGENIYLVGNVHELGNW





NTSKAIGPLFNQVIYSYPTWYVDVSVPEGKTIEFKFIKKDGSGNVIWESGSNHVYTTPTSTTGT





VNVNWQY




CGT-SL
MSRNGAVTPDWQFTVEVQEGETITYKYVKGGSWDQEGLADHTREDDNDDDVSYYGYGAIGTDLK
154



glucotransferases
VTVHNEGNNTMIVQDRILRWIDMPVVIEEVQKQGSQVTIKGNAIKNGVLTINGERVPIDGRMAF




KMY60644.1
SYTFTPASHQKEVSIHIEPSAESKTAIFNNDGGAIAKNTKDYVLNLETKQLREGKLTTPPSNGD





SPESDWPGSETPSHDGGATPGNGTSPGSGGPSDGTSPGGSVPPGGTAPPGNEAPPSRPPQKPSP





SKPKEKPRKPTTPPGQVKKVYWDGVELKKGQIGRLTVQKPINLWKRTKDGRLVFVRILQPGEVY





RVYGYDVRFGGQYAVGGGYYVTDIDTHIRYETPSKEKLKLVNGE




DexT protein
MPANAPDKQSVTNAPVVPPKHDTDQQDDSLEKQQVLEPSVNSNIPKKQTNQQLAVVTAPANSAP
156



[Leuconostoc
QTKTTAEISAGTELDTMPNVKHVDGKVYFYGDDGQPKKNFTTIIDGKPYYFDKDTGALSNNDKQ





citreum]

YVSELFSIGNKHNAVYNTSSDNFTQLEGHLTASSWYRPKDILKNGKRWAPSTVTDFRPLLMAWW





PDKSTQVTYLNYMKDQGLLSGTHHFSDNENMRTLTAAAMQAQVNIEKKIGQLGNTDWLKTAMTQ





YIDAQPNWNIDSEAKGDDHLQGGALLYTNSDMSPKANSDYRKLSRTPKNQKGQIADKYKQGGFE





LLLANDVDNSNPVVQAEQLNWLHYMMNIGSILQNDDQANFDGYRVDAVDNVDADLLQIAGEYAK





AAYGVDKNDARANQHLSILEDWGDEDPDYVKAHGNQQITMDFPLHLAIKYALNMPNDKRSGLEP





TREHSLVKRITDDKENVAQPNYSFIRAHDSEVQTIIADIIKDKINPASTGLDSTVTLDQIKQAF





DIYNADELKADKVYTPYNIPASYALLLTNKDTIPRVYYGDMFTDDGQYMAKQSPYYQAIDALLK





ARIKYAAGGQTMKMNYFPDEQSVMTSVRYGKGAMTASDSGNQETRYQGIGLVVNNRPDLKLSDK





DEVKMDMGAAHKNQDYRPVLLTTKSGLKVYSTDANAPVVRTDANGQLTFKADMVYGVNDPQVSG





YIAAWVPVGASENQDARTKSETTQSTDGSVYHSNAALDSQVIYEGFSNFQDFPTTPDEFTNIKI





AQNVNLFKDWGITSFEMAPQYRASSDKSFLDAIVQNGYAFTDRYDIGYNTPTKYGTADNLLDAL





RALHGQGIQAINDWVPDQIYNLPDEQLVTAIRTDGSGDHTYGSVIDHTLYASKTVGGGIYQQQY





GGAFLEQLKTQYPQLFQQKQISTDQPMNPDIQIKSWEAKYFNGSNIQGRGAWYVLKDWGTQQYF





NVSDAQTFLPKQLLGEKAKTGFVTRGKETSFYSTSGYQAKSAFICDNGNWYYFDDKGKMVVGNQ





VINGINYYFLPNGIELQDAYLVHDGMYYYYNNIGKQLHNTYYQDKQKNFHYFFEDGHMAQGIVT





IIQSDGTPVTQYFDENGKQQKGVAVKGSDGHLHYFDGASGNMLFKSWGRLADGSWLYVDEKGNA





VTGKQTINNQTVYFNDDGRQIKNNFKELADGSWLYLNNKGVAVTGEQIINGQTLYFGNDGRQFK





GTTHINATGESRYYDPDSGNMITDRFERVGDNQWAYFGYDGVAVTGDRIIKGQKLYFNQNGIQM





KGHLRLENGIMRYYDADTGELVRNRFVLLSDGSWVYFGQDGVPVTGVQVINGQTLYFDADGRQV





KGQQRVIGNQRYWMDKDNGEMKKITYAAALE




DexT gene (coding
ATGCCAGCAAATGCCCCAGATAAACAATCAGTGACTAATGCACCAGTAGTGCCGCCAAAGCATG
157



sequence)
ATACGGACCAGCAGGACGATTCACTAGAAAAACAGCAAGTATTAGAACCGAGCGTAAATAGTAA





TATACCAAAAAAGCAGACAAATCAACAGTTAGCGGTTGTTACAGCACCAGCAAATTCAGCACCT





CAAACCAAAACAACAGCAGAAATTTCTGCTGGTACAGAGTTAGACACGATGCCTAATGTTAAGC





ATGTAGATGGCAAAGTTTATTTTTATGGAGATGATGGCCAACCAAAAAAGAATTTTACTACTAT





TATAGATGGTAAACCTTACTACTTTGATAAAGATACAGGGGCACTATCTAATAACGATAAGCAA





TATGTATCGGAATTATTCAGTATTGGCAATAAACATAACGCCGTCTATAACACATCATCAGATA





ATTTTACGCAATTAGAAGGACATCTGACGGCAAGTAGTTGGTATCGTCCAAAAGATATTTTGAA





AAATGGTAAACGTTGGGCACCTTCAACAGTGACTGATTTCAGACCATTATTGATGGCCTGGTGG





CCGGATAAGAGTACGCAAGTCACTTATCTGAATTACATGAAAGATCAGGGCCTCTTGTCTGGTA





CTCATCACTTTTCCGATAATGAAAATATGCGGACCTTAACGGCAGCTGCCATGCAGGCACAGGT





AAACATTGAGAAAAAAATTGGGCAACTTGGCAATACGGATTGGTTGAAAACGGCGATGACGCAA





TACATTGATGCCCAGCCCAATTGGAATATTGACAGTGAGGCGAAAGGAGATGATCATCTACAAG





GTGGTGCACTACTTTATACAAATAGTGATATGTCGCCAAAGGCCAATTCTGATTATCGTAAGCT





GAGCCGTACGCCTAAAAATCAAAAAGGTCAAATTGCTGATAAATATAAGCAAGGTGGGTTTGAA





TTATTACTAGCAAACGATGTCGATAATTCTAATCCAGTTGTGCAAGCAGAACAACTTAATTGGT





TACATTATATGATGAATATCGGTAGTATTTTACAAAATGATGACCAAGCTAATTTTGATGGTTA





CCGTGTTGATGCTGTCGATAATGTGGACGCTGACTTACTACAGATTGCTGGTGAATATGCTAAG





GCTGCCTATGGTGTTGACAAAAATGACGCGAGAGCGAATCAACATTTATCAATTTTGGAAGACT





GGGGAGATGAAGATCCAGACTATGTCAAAGCACATGGCAACCAGCAAATTACAATGGATTTCCC





CTTGCATTTAGCGATTAAATACGCGCTCAACATGCCTAATGATAAGCGGAGTGGCCTTGAGCCA





ACCCGTGAACACAGTTTAGTCAAACGAATTACAGATGATAAAGAAAATGTTGCACAACCAAATT





ATTCATTTATCCGAGCTCATGACAGTGAAGTACAAACGATTATTGCTGATATTATTAAAGATAA





AATCAACCCGGCGTCAACAGGGCTAGATTCAACAGTGACTTTGGATCAAATTAAGCAGGCTTTT





GACATCTATAATGCTGATGAATTGAAAGCAGATAAAGTTTACACACCTTACAATATTCCAGCAT





CATACGCTTTGTTATTGACTAATAAAGACACAATTCCACGTGTTTATTATGGGGATATGTTCAC





GGATGATGGCCAATACATGGCTAAACAATCACCTTACTATCAAGCGATTGATGCGTTGTTGAAA





GCTCGTATCAAGTATGCTGCTGGTGGTCAAACCATGAAAATGAACTATTTTCCAGATGAACAAT





CTGTTATGACATCAGTTCGTTATGGTAAGGGTGCAATGACGGCAAGTGACTCTGGTAACCAAGA





GACACGCTATCAAGGTATTGGACTTGTTGTCAACAATCGCCCAGATTTGAAACTATCTGACAAA





GATGAAGTCAAAATGGATATGGGTGCGGCACATAAAAACCAAGATTATCGCCCAGTTTTGTTGA





CGACAAAATCAGGATTAAAAGTCTACAGCACTGATGCAAATGCACCTGTCGTTCGAACTGACGC





CAATGGCCAATTAACTTTTAAGGCAGACATGGTATATGGTGTAAACGACCCACAAGTGTCAGGG





TACATTGCGGCTTGGGTACCAGTAGGGGCTTCAGAAAATCAAGATGCTCGAACGAAAAGTGAAA





CAACGCAGTCAACTGACGGGAGTGTTTATCATTCTAATGCAGCGTTAGATTCGCAAGTCATTTA





TGAAGGCTTTTCAAATTTTCAAGACTTTCCAACAACACCCGATGAGTTTACGAACATTAAAATT





GCTCAAAATGTTAACTTATTTAAGGATTGGGGTATTACTAGCTTTGAAATGGCGCCACAATATC





GCGCCAGCTCAGATAAAAGTTTCTTAGATGCTATCGTACAAAATGGTTATGCATTTACAGATCG





ATATGATATTGGTTACAACACACCAACAAAGTATGGGACAGCAGATAATTTGTTAGATGCTTTA





CGTGCATTGCATGGTCAGGGTATTCAAGCGATTAACGACTGGGTACCAGATCAAATTTATAATC





TACCCGATGAACAGTTAGTCACGGCTATTCGAACAGACGGTTCAGGTGATCATACTTATGGTTC





AGTTATTGACCATACTTTGTATGCATCAAAGACAGTTGGCGGGGGCATTTATCAGCAACAATAT





GGTGGGGCCTTCTTGGAACAATTAAAAACACAGTACCCGCAACTTTTCCAGCAAAAACAGATTT





CCACAGATCAGCCAATGAACCCAGATATTCAAATTAAGTCATGGGAAGCCAAGTATTTCAACGG





TTCGAACATTCAGGGGCGTGGGGCTTGGTATGTTTTGAAGGACTGGGGCACACAACAGTATTTT





AATGTGTCAGATGCGCAGACCTTCCTTCCAAAGCAATTATTGGGTGAAAAGGCCAAAACTGGTT





TTGTTACGCGTGGTAAGGAGACTTCATTCTATTCCACTAGTGGCTATCAAGCAAAATCTGCCTT





TATTTGTGATAACGGTAATTGGTACTACTTTGATGACAAAGGGAAAATGGTTGTTGGAAACCAA





GTTATCAATGGCATCAATTATTACTTTTTACCGAATGGTATCGAATTACAAGATGCCTATCTAG





TACATGATGGTATGTACTATTATTATAATAATATTGGCAAGCAACTGCACAACACATATTACCA





AGATAAACAAAAAAATTTCCATTACTTCTTTGAAGATGGGCACATGGCACAGGGTATTGTCACC





ATCATTCAAAGTGATGGCACCCCAGTCACACAGTACTTTGATGAGAATGGTAAGCAACAAAAAG





GCGTGGCGGTCAAAGGATCAGATGGTCATTTGCATTACTTTGACGGTGCGTCAGGGAATATGCT





CTTTAAATCATGGGGTAGACTAGCAGATGGCTCTTGGCTATATGTAGACGAGAAAGGTAATGCG





GTTACAGGCAAACAAACCATTAATAATCAAACGGTTTACTTTAATGATGATGGTCGTCAAATCA





AAAATAACTTTAAAGAATTAGCAGATGGTTCTTGGCTTTATCTTAACAATAAAGGTGTTGCAGT





AACAGGAGAGCAAATAATTAATGGGCAGACACTTTATTTTGGTAACGATGGTCGTCAATTTAAA





GGGACAACACATATAAATGCTACTGGTGAAAGCCGTTACTATGACCCAGACTCAGGTAATATGA





TAACTGATCGTTTTGAACGTGTTGGTGATAATCAATGGGCTTATTTTGGTTATGATGGTGTTGC





AGTAACAGGGGACCGAATCATTAAAGGGCAAAAACTCTATTTCAACCAAAATGGTATCCAAATG





AAAGGCCACTTACGTCTTGAAAATGGTATCATGCGTTATTACGATGCTGATACTGGCGAATTAG





TTCGTAATCGATTTGTATTGCTATCTGATGGTTCATGGGTTTACTTTGGCCAAGATGGCGTACC





CGTAACTGGCGTGCAAGTGATTAATGGCCAAACATTATATTTTGACGCAGATGGTAGGCAAGTC





AAAGGGCAGCAACGTGTAATCGGCAATCAACGCTATTGGATGGATAAAGACAATGGTGAAATGA





AAAAAATAACATACGCGGCCGCACTCGAGCACCACCACCACCACCACTGA




DexT gene (coding
ATGCAAAACGGCGAAGTGTGTCAGCGTAAAAAACTGTACAAGTCAGGGAAGATATTAGTTACAG
158



sequence is cloned
CAAGTATTTTTGCTGTTATGGGTTTTGGTACTGCCATGTCACAAGCAAACGCGAGCAGTAGTGA




into pET23a)
TAATGATAGCAAAACACAAACTATTTCAAAAATAGTAAAAAGTAAAGTCGAACCGGCAACTGTT





CAACCAGCGAAACCAGCGGAACCTACTAATAAAATAGTTGACCAAGCAGATATGCATACGGTCA





GCGGGCAAAACAGCGTGCCACCAGTAGTGACTAATCAATCCAATTAACAGGCTGCAAAACCAAC





TACACCTGTTACCGATGTCACAGATACGCATAAAATCGAAGCAAACAACGTCCCTGCTGATGTT





ATGCCAGCAAATGCCCCAGATAAACAATCAGTGACTAATGCACCAGTAGTGCCGCCAAAGCATG





ATACGGACCAGCAGGACGATTCACTAGAAAAACAGCAAGTATTAGAACCGAGCGTAAATAGTAA





TATACCAAAAAAGCAGACAAATCAACAGTTAGCGGTTGTTACAGCACCAGCAAATTCAGCACCT





CAAACCAAAACAACAGCAGAAATTTCTGCTGGTACAGAGTTAGACACGATGCCTAATGTTAAGC





ATGTAGATGGCAAAGTTTATTTTTATGGAGATGATGGCCAACCAAAAAAGAATTTTACTACTAT





TATAGATGGTAAACCTTACTACTTTGATAAAGATACAGGGGCACTATCTAATAACGATAAGCAA





TATGTATCGGAATTATTCAGTATTGGCAATAAACATAACGCCGTCTATAACACATCATCAGATA





ATTTTACGCAATTAGAAGGACATCTGACGGCAAGTAGTTGGTATCGTCCAAAAGATATTTTGAA





AAATGGTAAACGTTGGGCACCTTCAACAGTGACTGATTTCAGACCATTATTGATGGCCTGGTGG





CCGGATAAGAGTACGCAAGTCACTTATCTGAATTACATGAAAGATCAGGGCCTCTTGTCTGGTA





CTCATCACTTTTCCGATAATGAAAATATGCGGACCTTAACGGCAGCTGCCATGCAGGCACAGGT





AAACATTGAGAAAAAAATTGGGCAACTTGGCAATACGGATTGGTTGAAAACGGCGATGACGCAA





TACATTGATGCCCAGCCCAATTGGAATATTGACAGTGAGGCGAAAGGAGATGATCATCTACAAG





GTGGTGCACTACTTTATACAAATAGTGATATGTCGCCAAAGGCCAATTCTGATTATCGTAAGCT





GAGCCGTACGCCTAAAAATCAAAAAGGTCAAATTGCTGATAAATATAAGCAAGGTGGGTTTGAA





TTATTACTAGCAAACGATGTCGATAATTCTAATCCAGTTGTGCAAGCAGAACAACTTAATTGGT





TACATTATATGATGAATATCGGTAGTATTTTACAAAATGATGACCAAGCTAATTTTGATGGTTA





CCGTGTTGATGCTGTCGATAATGTGGACGCTGACTTACTACAGATTGCTGGTGAATATGCTAAG





GCTGCCTATGGTGTTGACAAAAATGACGCGAGAGCGAATCAACATTTATCAATTTTGGAAGACT





GGGGAGATGAAGATCCAGACTATGTCAAAGCACATGGCAACCAGCAAATTACAATGGATTTCCC





CTTGCATTTAGCGATTAAATACGCGCTCAACATGCCTAATGATAAGCGGAGTGGCCTTGAGCCA





ACCCGTGAACACAGTTTAGTCAAACGAATTACAGATGATAAAGAAAATGTTGCACAACCAAATT





ATTCATTTATCCGAGCTCATGACAGTGAAGTACAAACGATTATTGCTGATATTATTAAAGATAA





AATCAACCCGGCGTCAACAGGGCTAGATTCAACAGTGACTTTGGATCAAATTAAGCAGGCTTTT





GACATCTATAATGCTGATGAATTGAAAGCAGATAAAGTTTACACACCTTACAATATTCCAGCAT





CATACGCTTTGTTATTGACTAATAAAGACACAATTCCACGTGTTTATTATGGGGATATGTTCAC





GGATGATGGCCAATACATGGCTAAACAATCACCTTACTATCAAGCGATTGATGCGTTGTTGAAA





GCTCGTATCAAGTATGCTGCTGGTGGTCAAACCATGAAAATGAACTATTTTCCAGATGAACAAT





CTGTTATGACATCAGTTCGTTATGGTAAGGGTGCAATGACGGCAAGTGACTCTGGTAACCAAGA





GACACGCTATCAAGGTATTGGACTTGTTGTCAACAATCGCCCAGATTTGAAACTATCTGACAAA





GATGAAGTCAAAATGGATATGGGTGCGGCACATAAAAACCAAGATTATCGCCCAGTTTTGTTGA





CGACAAAATCAGGATTAAAAGTCTACAGCACTGATGCAAATGCACCTGTCGTTCGAACTGACGC





CAATGGCCAATTAACTTTTAAGGCAGACATGGTATATGGTGTAAACGACCCACAAGTGTCAGGG





TACATTGCGGCTTGGGTACCAGTAGGGGCTTCAGAAAATCAAGATGCTCGAACGAAAAGTGAAA





CAACGCAGTCAACTGACGGGAGTGTTTATCATTCTAATGCAGCGTTAGATTCGCAAGTCATTTA





TGAAGGCTTTTCAAATTTTCAAGACTTTCCAACAACACCCGATGAGTTTACGAACATTAAAATT





GCTCAAAATGTTAACTTATTTAAGGATTGGGGTATTACTAGCTTTGAAATGGCGCCACAATATC





GCGCCAGCTCAGATAAAAGTTTCTTAGATGCTATCGTACAAAATGGTTATGCATTTACAGATCG





ATATGATATTGGTTACAACACACCAACAAAGTATGGGACAGCAGATAATTTGTTAGATGCTTTA





CGTGCATTGCATGGTCAGGGTATTCAAGCGATTAACGACTGGGTACCAGATCAAATTTATAATC





TACCCGATGAACAGTTAGTCACGGCTATTCGAACAGACGGTTCAGGTGATCATACTTATGGTTC





AGTTATTGACCATACTTTGTATGCATCAAAGACAGTTGGCGGGGGCATTTATCAGCAACAATAT





GGTGGGGCCTTCTTGGAACAATTAAAAACACAGTACCCGCAACTTTTCCAGCAAAAACAGATTT





CCACAGATCAGCCAATGAACCCAGATATTCAAATTAAGTCATGGGAAGCCAAGTATTTCAACGG





TTCGAACATTCAGGGGCGTGGGGCTTGGTATGTTTTGAAGGACTGGGGCACACAACAGTATTTT





AATGTGTCAGATGCGCAGACCTTCCTTCCAAAGCAATTATTGGGTGAAAAGGCCAAAACTGGTT





TTGTTACGCGTGGTAAGGAGACTTCATTCTATTCCACTAGTGGCTATCAAGCAAAATCTGCCTT





TATTTGTGATAACGGTAATTGGTACTACTTTGATGACAAAGGGAAAATGGTTGTTGGAAACCAA





GTTATCAATGGCATCAATTATTACTTTTTACCGAATGGTATCGAATTACAAGATGCCTATCTAG





TACATGATGGTATGTACTATTATTATAATAATATTGGCAAGCAACTGCACAACACATATTACCA





AGATAAACAAAAAAATTTCCATTACTTCTTTGAAGATGGGCACATGGCACAGGGTATTGTCACC





ATCATTCAAAGTGATGGCACCCCAGTCACACAGTACTTTGATGAGAATGGTAAGCAACAAAAAG





GCGTGGCGGTCAAAGGATCAGATGGTCATTTGCATTACTTTGACGGTGCGTCAGGGAATATGCT





CTTTAAATCATGGGGTAGACTAGCAGATGGCTCTTGGCTATATGTAGACGAGAAAGGTAATGCG





GTTACAGGCAAACAAACCATTAATAATCAAACGGTTTACTTTAATGATGATGGTCGTCAAATCA





AAAATAACTTTAAAGAATTAGCAGATGGTTCTTGGCTTTATCTTAACAATAAAGGTGTTGCAGT





AACAGGAGAGCAAATAATTAATGGGCAGACACTTTATTTTGGTAACGATGGTCGTCAATTTAAA





GGGACAACACATATAAATGCTACTGGTGAAAGCCGTTACTATGACCCAGACTCAGGTAATATGA





TAACTGATCGTTTTGAACGTGTTGGTGATAATCAATGGGCTTATTTTGGTTATGATGGTGTTGC





AGTAACAGGGGACCGAATCATTAAAGGGCAAAAACTCTATTTCAACCAAAATGGTATCCAAATG





AAAGGCCACTTACGTCTTGAAAATGGTATCATGCGTTATTACGATGCTGATACTGGCGAATTAG





TTCGTAATCGATTTGTATTGCTATCTGATGGTTCATGGGTTTACTTTGGCCAAGATGGCGTACC





CGTAACTGGCGTGCAAGTGATTAATGGCCAAACATTATATTTTGACGCAGATGGTAGGCAAGTC





AAAGGGCAGCAACGTGTAATCGGCAATCAACGCTATTGGATGGATAAAGACAATGGTGAAATGA





AAAAAATAACATACGCGGCCGCACTCGAGCACCACCACCACCACCACTGA




Transglucosidase
MSFRSLLALSGLVCTGLANVISKRATLDSWLSNEATVARTAILNNIGADGAWVSGADSGIVVAS
163



CAA25303.1 GI:2343
PSTDNPDYFYTWTRDSGLVLKTLVDLFRNGDTSLLSTIENYISAQAIVQGISNPSGDLSSGAGL





GEPKFNVDETAYTGSWGRPQRDGPALRATAMIGFGQWLLDNGYTSTATDIVWPLVRNDLSYVAQ





YWNQTGYDLWEEVNGSSFFTIAVQHRALVEGSAFATAVGSSCSWCDSQAPEILCYLQSFWTGSF





ILANFDSSRSGKDANTLLGSIHTFDPEAACDDSTFQPCSPRALANHKEVVDSFRSIYTLNDGLS





DSEAVAVGRYPEDTYYNGNPWFLCTLAAAEQLYDALYQWDKQGSLEVTDVSLDFFKALYSDAAT





GTYSSSSSTYSSIVDAVKTFADGFVSIVETHAASNGSMSEQYDKSDGEQLSARDLTWSYAALLT





ANNRRNSVVPASWGETSASSVPGTCAATSAIGTYSSVTVTSWPSIVATGGTTTTATPTGSGSVT





STSKTTATASKTSTSTSSTSCTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWETSDGIALS





ADKYTSSDPLWYVTVTLPAGESFEYKFIRIESDDSVEWESDPNREYTVPQACGTSTATVTDTWR





EYTVPQACGTSTATVTDTWR




Glucoamylase G1
ATLDSWLSNEATVARTAILNNIGADGAWVSGADSGIVVASPSTDNPTYFYTRDSGLVLKTLVDL
164



1008149A
FRNGDTSLLSTIENYISAQAIVQGISNPSGDLSSGAGLGEPKFNVDETAYTGSWGRPQRDGPAL




GI:224027
RATAMIGFGQWLLDNGYTSTATDIVWPLVRNDLSYVAQYWNQTGYDLWEEVNGSSFFTIAVQHR





ALVEGSAFATAVGSSCSWCDSQAPEILCYLQSFWTGSFILANFDSSRSGKDANTLLGSIHTFDP





EAACDDSTFQPCSPRALANHKEVVDSFRSIYTLNDGLSDSEAVAVGRYPEDTYYNGNPWFLCTL





AAAEQLYDALYQWDKQGSLEVTDVSLDFFKALYSDAATGTYSSSSSTYSSIVDAVKTFADGFVS





IVETHAASNGSMSEQYDKSDGEQLSARDLTWSYAALLTANNRRNSVVPASWGETSASSVPGTCA





ATSAIGTYSSVTVTSWPSIVATGGTTTTATPTGSGSVTSTSKTTATASKTSTSTSSTSCTTPTA





VAVTFDLTATTTYGENIYLVGSISQLGDWETSDGIALSADKYTSSDPLWYVTVTLPAGESFEYK





FIRIESDDSVEWESDPNREYTVPQACGTSTATVTDTWR




Transglucosidase
SLLAPSQPQFXIPASAAVGAQLIANIDDPQAADAQSVCPGYKASKVQHNSRGFTASLQLAGRPC
165




NVYGTDVESLTLSVEYQDSDRLNIQILPTHVDSTXASWYFLSENLVPRPKASLXASVSQSDLFV





SWSNEPSFNFKVIRKATGDALFSTEGTVLVYENQFIEFVTALPEEYNLYGLGEHITQFRLQRNA





XLTIYPSDDGTPIDQNLYGQHPFYLDTRYYKGDRQ




Transglucosidase
SQDYISLSHGVFLRNSHGLEILLRSQKLIWRTLGGGIDLTFYSGPAPADVTRQYLTSTVGLPAM
166



AAB23581.1
QQYNTLGFHQCRWGYNXWSDLADVVANFEKFEIPLEYIWTDIDYMHGYRNFDNDQHRFSYSEGD




GI:257187
EFLSKLHESGRYYVPIVDAALYIPNPEXASDAYATYDRGAADDVFLKNPDGSLYIGAVWPGYTV





FPDWHHPKAVDFWANELVIWSKKVAFDGVWYDMSEVSSFCVGSCGTGXLTLNPAHPSFLLPGEP





GDIIYDYPEAFXITXATEAASAXAGASXQAAATATTXXXXVSYLRTTPXPGVRNVEHPPYVINH





DQEGHDLSVHAVSPXATHVDGVEEYDVHGLYGHQGLXATYQGLLEVWSHKRRPFIIGRSTFAGS





GKWAGHWGGDNYSKWWSMYYSISQALSFSLFGIPMFGADTCGFNGNSDEELCNRWMQLSAFFPF





YRNHNELSTIPQEPYRWASVIEATKSAMRIRYAILPYFYTLFDLAHTTGSTVMRALSWEFPNDP





TLAAVETQFMVGPAIMVVPVLEPLVNTVKGVFPGVGHGEVWYDWYTQAAVDAKPGVXTTISAPL





GHIPVYVRGGNILPMQEPALTTREARQTPWALLAALGSXGTASGQLYLDDGEXIYPXATLHVDF





TASRSSLRSSAQGRWKERNPLAMVTVLGVNKEPSAVTLNGQAVFPGSVTYXSTSQVLFVGGLQX





LTKGGAWAENWVLEW




Transglucosidase
MSFRSLLALSGLVCTGLANVISKRATLDSWLSNEATVARTAILNNIGADGAWVSGADSGIVVAS
167



CAA25219.1
PSTDNPDYFYTWTRDSGLVLKTLVDLFRNGDTSLLSTIENYISAQAIVQGISNPSGDLSSGAGL





GEPKFNVDETAYTGSWGRPQRDGPALRATAMIGFGQWLLDNGYTSTATDIVWPLVRNDLSYVAQ





YWNQTGYDLWEEVNGSSFFTIAVQHRALVEGSAFATAVGSSCSWCDSQAPEILCYLQSFWTGSF





ILANFDSSRSGKDANTLLGSIHTFDPEAACDDSTFQPCSPRALANHKEVVDSFRSIYTLNDGLS





DSEAVAVGRYPEDTYYNGNPWFLCTLAAAEQLYDALYQWDKQGSLEVTDVSLDFFKALYSDAAT





GTYSSSSSTYSSIVDAVKTFADGFVSIVETHAASNGSMSEQYDKSDGEQLSARDLTWSYAALLT





ANNRRNSVVPASWGETSASSVPGTCAATSAIGTYSSVTVTSWPSIVATGGTTTTATPTGSGSVT





STSKTTATASKTSTSTSSTSCTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWETSDGIALS





ADKYTSSDPLWYVTVTLPAGESFEYKFIRIESDDSVEWESDPNREYTVPQACGTSTATVTDTWR




Transglucosidase
MVKLTHLLARAWLVPLAYGASQSLLSTTAPSQPQFTIPASADVGAQLIANIDDPQAADAQSVCP
168



BAA23616.1
GYKASKVQHNSRGFTASLQLAGRPCNVYGTDVESLTLSVEYQDSDRLNIQILPTHVDSTNASWY





FLSENLVPRPKASLNASVSQSDLFVSWSNEPSFNFKVIRKATGDALFSTEGTVLVYENQFIEFV





TALPEEYNLYGLGEHITQFRLQRNANLTIYPSDDGTPIDQNLYGQHPFYLDTRYYKGDRQNGSY





IPVKSSEADASQDYISLSHGVFLRNSHGLEILLRSQKLIWRTLGGGIDLTFYSGPAPADVTRQY





LTSTVGLPAMQQYNTLGFHQCRWGYNNWSDLADVVANFEKFEIPLEYIWTDIDYMHGYRNFDND





QHRFSYSEGDEFLSKLHESGRYYVPIVDAALYIPNPENASDAYATYDRGAADDVFLKNPDGSLY





IGAVWPGYTVFPDWHHPKAVDFWANELVIWSKKVAFDGVWYDMSEVSSFCVGSCGTGNLTLNPA





HPSFLLPGEPGDIIYDYPEAFNITNATEAASASAGASSQAAATATTTSTSVSYLRTTPTPGVRN





VEHPPYVINHDQEGHDLSVHAVSPNATHVDGVEEYDVHGLYGHQGLNATYQGLLEVWSHKRRPF





IIGRSTFAGSGKWAGHWGGDNYSKWWSMYYSISQALSFSLFGIPMFGADTCGFNGNSDEELCNR





WMQLSAFFPFYRNHNELSTIPQEPYRWASVIEATKSAMRIRYAILPYFYTLFDLAHTTGSTVMR





ALSWEFPNDPTLAAVETQFMVGPAIMVVPVLEPLVNTVKGVFPGVGHGEVWYDWYTQAAVDAKP





GVNTTISAPLGHIPVYVRGGNILPMQEPALTTREARQTPWALLAALGSNGTASGQLYLDDGESI





YPNATLHVDFTASRSSLRSSAQGRWKERNPLANVTVLGVNKEPSAVTLNGQAVFPGSVTYNSTS





QVLFVGGLQNLTKGGAWAENWVLEW




Transglucosidase
MVKLTHLLARAWLVPLAYGASQSLLSTTAPSQPQFTIPASADVGAQLIANIDDPQAADAQSVCP
169



P56526.1
GYKASKVQHNSRGFTASLQLAGRPCNVYGTDVESLTLSVEYQDSDRLNIQILPTHVDSTNASWY





FLSENLVPRPKASLNASVSQSDLFVSWSNEPSFNFKVIRKATGDALFSTEGTVLVYENQFIEFV





TALPEEYNLYGLGEHITQFRLQRNANLTIYPSDDGTPIDQNLYGQHPFYLDTRYYKGDRQNGSY





IPVKSSEADASQDYISLSHGVFLRNSHGLEILLRSQKLIWRTLGGGIDLTFYSGPAPADVTRQY





LTSTVGLPAMQQYNTLGFHQCRWGYNNWSDLADVVANFEKFEIPLEYIWTDIDYMHGYRNFDND





QHRFSYSEGDEFLSKLHESGRYYVPIVDAALYIPNPENASDAYATYDRGAADDVFLKNPDGSLY





IGAVWPGYTVFPDWHHPKAVDFWANELVIWSKKVAFDGVWYDMSEVSSFCVGSCGTGNLTLNPA





HPSFLLPGEPGDIIYDYPEAFNITNATEAASASAGASSQAAATATTTSTSVSYLRTTPTPGVRN





VEHPPYVINHDQEGHDLSVHAVSPNATHVDGVEEYDVHGLYGHQGLNATYQGLLEVWSHKRRPF





IIGRSTFAGSGKWAGHWGGDNYSKWWSMYYSISQALSFSLFGIPMFGADTCGFNGNSDEELCNR





WMQLSAFFPFYRNHNELSTIPQEPYRWASVIEATKSAMRIRYAILPYFYTLFDLAHTTGSTVMR





ALSWEFPNDPTLAAVETQFMVGPAIMVVPVLEPLVNTVKGVFPGVGHGEVWYDWYTQAAVDAKP





GVNTTISAPLGHIPVYVRGGNILPMQEPALTTREARQTPWALLAALGSNGTASGQLYLDDGESI





YPNATLHVDFTASRSSLRSSAQGRWKERNPLANVTVLGVNKEPSAVTLNGQAVFPGSVTYNSTS





QVLFVGGLQNLTKGGAWAENWVLEW




Transglucosidase
MSFRSLLALSGLVCTGLANVISKRATWDSWLSNEATVARTAILNNIGADGAWVSGADSGIVVAS
170



AAP04499.1
PSTDNPDYFYTWTRDSGLVLKTLVDLFRNGDTSLLSTIENYISAQAIVQGISNPSGDLSSGAGL





GEPKFNVDETAYTGSWGRPQRDGPALRATAMIGFGQWLLDNGYTSTATDIVWPLVRNDLSYVAQ





YWNQTGYDLWEVNGSSFFTIAVQHRALVEGSAFATAVGSSCSWCDSQAPEILCYLQSFWTGSFI





LANFDSSRSAKDANTLLLGSIHTFDPEAACDDSTFQPCSPRALANHKEVVDSFRSIYTLNDGLS





DSEAVAVGRYPEDTYYNGNPWFLCTLAAAEQLYDALYQWDKQGSLEVTDVSLDFFKALYSDATG





TYSSSSSTYSSIVDAVKTFADGFVSIVETHAASNGSMSEQYDKSDGEQLSARDLTWSYAALLTA





NNRRNVVPSASWGETSASSVPGTCAATSAIGTYSSVTVTSWPSIVATGGTTTTATPTGSGSVTS





TSKTTATASKTSTSTSSTSCTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWETSDGIALSA





DKYTSSDPLWYVTVTLPAGESFEYKFIRIESDDSVEWESDPNREYTVPQACGTSTATVTDTWR




Transglucosidase
SSTSCTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWDTSDGIALSADKYTSSNPLWYVTVT
171



AAM18050.2
LPAGESFEYKFIRIESDDSVEWESDPNREYTVPQVCGESTATVTDTWR




Transglucosidase
MSFRSLLALSGLVCTGLANVISKRATLDSWLSNEATVARTAILNNIGADGAWVSGADSGIVVAS
172



AAT67041.1
PSTDNPDYFYTWTRDSGLVLKTLVDLFRNGDTSLLSTIENYISAQAIVQGISNPSGDLSSGAGL





GEPKFNVDETAYTGSWGRPQRDGPALRATAMIGFRQWLLDNGYTSTATDIVWPLVRNDLSYVAQ





YWNQTGYDLWEEVNGSSFFTIAVQHRALVEGSAFATAVGSSCSWCDSQAPEILCYLQSFWTGSF





ILANFDSSRSGKDANTLLGSIHTFDPEAACDDSTFQPCSPRALANHKEVVDSFRSIYTLNDGLS





DSEAVAVGRYPEDTYYNGNPWFLCTLAAAEQLYDALYQWDKQGSLEVTDVSLDFFKALYSDAAT





GTYSSSSSTYSSIVDAVKTFADGFVSIVETHAASNGSMSEQYDKSDGEQLSARDLTWSYAALLT





ANNRRNSVVPASWGETSASSVPGTCAATSAIGTYSSVTVTSWPSIVATGGTTTTATPTGSGSVT





STSKTTATASKTSTSTSSTSCTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWETSDGIALS





ADKYTSGDPLWYVTVTLPAGESFEYKFIRIESDDSVEWESDPNREYTVPQACGTSTATVTDTWR




Transglucosidase
MSFRSLLALSGLVCTGLANVISKRATLDSWLSNEATVARTAILNNIGADGAWVSGADSGIVVAS
173



P69328.1
PSTDNPDYFYTWTRDSGLVLKTLVDLFRNGDTSLLSTIENYISAQAIVQGISNPSGDLSSGAGL





GEPKFNVDETAYTGSWGRPQRDGPALRATAMIGFGQWLLDNGYTSTATDIVWPLVRNDLSYVAQ





YWNQTGYDLWEEVNGSSFFTIAVQHRALVEGSAFATAVGSSCSWCDSQAPEILCYLQSFWTGSF





ILANFDSSRSGKDANTLLGSIHTFDPEAACDDSTFQPCSPRALANHKEVVDSFRSIYTLNDGLS





DSEAVAVGRYPEDTYYNGNPWFLCTLAAAEQLYDALYQWDKQGSLEVTDVSLDFFKALYSDAAT





GTYSSSSSTYSSIVDAVKTFADGFVSIVETHAASNGSMSEQYDKSDGEQLSARDLTWSYAALLT





ANNRRNSVVPASWGETSASSVPGTCAATSAIGTYSSVTVTSWPSIVATGGTTTTATPTGSGSVT





STSKTTATASKTSTSTSSTSCTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWETSDGIALS





ADKYTSSDPLWYVTVTLPAGESFEYKFIRIESDDSVEWESDPNREYTVPQACGTSTATVTDTWR




Transglucosidase
MVKLTHLLARAWLVPLAYGASQSLLSTTAPSQPQFTIPASADVGAQLIANIDDPQAADAQSVCP
174



BAF37801.1
GYKASKVQHNSRGFTASLQLAGRPCNVYGTDVESLTLSVEYQDSDRLNIQILPTHVDSTNASWY





FLSENLVPRPKASLNASVSQSDLFVSWSNEPSFNFKVIRKATGDALFSTEGTVLVYENQFIEFV





TALPEEYNLYGLGEHITQFRLQRNANLTIYPSDDGTPIDQNLYGQHPFYLDTRYYKGDRQNGSY





IPVKSSEADASQDYISLSHGVFLRNSHGLEILLRSQKLIWRTLGGGIDLTFYSGPAPADVTRQY





LTSTVGLPAMQQYNTLGFHQCRWGYNNWSDLADVVANFEKFEIPLEYIWTDIDYMHGYRNFDND





QHRFSYSEGDEFLSKLHESGRYYVPIVDAALYIPNPENASDAYATYDRGAADDVFLKNPDGSLY





IGAVWPGYTVFPDWHHPKAVDFWANELVIWSKKVAFDGVWYDMSEVSSFCVGSCGTGNLTLNPA





HPSFLLPGEPGDIIYDYPEAFNITNATEAASASAGASSQAAATATTTSTSVSYLRTTPTPGVRN





VEHPPYVINHDQEGHDLSVHAVSPNATHVDGVEEYDVHGLYGHQGLNATYQGLLEVWSHKRRPF





IIGRSTFAGSGKWAGHWGGDNYSKWWSMYYSISQALSFSLFGIPMFGADTCGFNGNSDEELCNR





WMQLSAFFPFYRNHNELSTIPQEPYRWASVIEATKSAMRIRYAILPYFYTLFDLAHTTGSTVMR





ALSWEFPNDPTLAAVETQFMVGPAIMVVPVLEPLVNTVKGVFPGVGHGEVWYDWYTQAAVDAKP





GVNTTISAPLGHIPVYVRGGNILPMQEPALTTREARQTPWALLAALGSNGTASGQLYLDDGESI





YPNATLHVDFTASRSSLRSSAQGRWKERNPLANVTVLGVNKEPSAVTLNGQAVFPGSVTYNSTS





QVLFVGGLQNLTKGGAWAENWVLEW




Transglucosidase
MLYAEDNKLIFRFDDHLLWIQPWGENALRVRATKLASMPTEDWALSSKVTSIEPTISIEEHKDS
175



CAK37022.1
SITNGKIKATVSQRGKITIYNQKGEKLLEEYARNRRDLKDPKCSALEVEARELRPILGGDFHLT





MRFESLDPKEKIYGMGQYQQPFLNLKGVDIELAHRNSQASVPFALSSLGYGFLWNNPAIGRAVL





GTNTMSFEAYSTKVLDYWVVAGDSPAEIEEAYSKVTGYVPMMPEYGLGFWQCKLRYWNQEQLLD





VAREYKRRNIPLDLIVVDFFHWKHQGEWSFDPEFWPDPDAMIKELQSLNVELMVSVWPTVENAS





TNYPEMLEKGLLIRHDRGLRVSMQCNGDITHFDATNPSARAYVWSKAKQNYYDKGIKVFWLDEA





EPEYSVYDFDLYRYHAGSNLQIGNIFPKEYARGFYEGMESAGQTNIVNLLRCAWAGSQKYGALV





WSGDIASSWSSFRNQLAAGLNMGLAGIPWWTTDIGGFHGGDPSDPAFRELFTRWFQWGAFCPVM





RLHGDREPKPENRPTDSGSDNEIWSYGEEVYEICKKYIGIREELRDYTRGLMKEAHEKGTPVMR





TLFYEFPADKKAWDVETEHLFGSKYLVVPVFEAGKRSVEVYLPAGASWKVWGQEDVIHEGGKEI





QVDCPIETMPVFVRV




Transglucosidase
MSSPQQVYLLPLKDDGSPDVPGGYIYLPAPTNPPYLLRFVIEGSSSICREGALWVNIPEKGESF
176



CAK37087.1
NRSAFRSFSLSPDFNKNIQIDVPITSAGSFAFYVTFSPLPEFSVLSTPTPEPTRTPTHYIDVSP





KLTLRGQDLPLNALSIYSVISKFMGQYPKEWEKHLNGISQRNYNMVHFTPLMKRGASNSPYSIF





DQLQFDDAVFPNGEDDVARLISKMENEYGLLSLTDVVWNHTAHNSKWLEEHPEAGYSVETAPWL





EAALELDTALLKFGQDLQNLGLPTEFQTVDELMKVMNVMRDKVIAGIRLWEFYAIDVKSDTHKI





LDKWKTSKDIDLTDTNWAQLNLQDYKNWTLKQQATFIRDHAIPTSKQVLGRFSRAVDLQFGAAI





LTALFGPHNPSTSDTSIVEESLSKILDEVNLPFYEEYDGDVSEIMNQVFNRIKYLRIDDHGPKL





GAVTAQSPLIETYFTRLPLNDVTKKHKKEALALVNNGWIWNADALRDNAGPDSRAYLRREVIVW





GDCVKLRYGSCRDDNPFLWDFMTDYTRLMAKYFSGFRIDNCHSTPLVVAEYLLDEARKVRPNLT





VFAELFTGSEEADYIFVKRLGINALIREAMQAWSTGELSRLVHRHGGRPIGSFDLDLPSSGSSH





AIASSGLDSGKEKVVHIRPTPVQALFMDCTHDNEMPAQKRTAKDTLPNGALVAMCASAIGSVIG





YDEVYPRLVDLVHEHRLYFSEFSEAPETGLNSLEGGIGGIKKLLNELHTKMGIEGYDETHIHHD





GEYITVHRVHPRTRKGVFLIAHTAFPGQDSRSVLAPTHLVGTQVKHIGTWLLEVDTSQTTKERI





QADKSYLRGLPSQVKTFEGTKIEESGKDTIISVLNSFVAGSIALFETSMPSVEHASGLDNYITE





GVDHAFSDLSLVDLNFALYRCEAEERDSSKGQDGAYDIPGHGPLVYAGLQGWWSVLENIIKYNE





LGHPLCDHLRNGQWALDYIVARLEKLSHKEEHPALGRPAAWLQEKFQAVRQLPSFLLPRYFAII





VQVAYNAAWKRGIQLLGPHIQKGQEFIHQLGMVSVQQTGYVNSASLWPTKKVPSLAAGLPHFAV





DWARCWGRDVFISLRGLLLCTGRFEDAKEHITAFASVLKHGMIPNLLSSGKLPRYNSRDSVWFF





LQSIQDYTEMAPDGLEILDHKVPRRFLPYDDVWFPFDDPRAYSQQSTISEIIQEVFQRHAQGLS





FREYNAGPDLDMQMTQDGFQIDVKVDWETGLIFGGSQYNCGTWQDKMGESAKAGNKGVPGTPRD





GAAIEITGLVYSALTWVAKLHERGIYKHDGVDIGGGKSISFEDWASRIRANFERCYYVPLQPKD





DGQYDIDANIINRRGIYKDLYRSGKPYEDYQLRSNFPIAMTVAPDLFTASKALAALALADEVLV





GPVGMATLDPSDLNYRPNYNNSEDSTDFATAKGRNYHQGPEWVWQRGYFLRAFLHFDLARRTTP





AERTETYQQITRRLEGCKRALRESPWKGLTELTNKNGAYCADSSPTQAWSAGCLLDLYYDASRH





SQS




Transglucosidase
MWSSWLLSALLATEALAVPYEEYILAPSSRDLAPASVRQVNGSVTNAAALTGAGGQATFNGVSS
177



CAK43781.1
VTYDFGINVAGIVSVDVASASSDSAFIGVTFTESSMWISSEACDATQDAGLDTPLWFAVGQGAG





LYTVEKKYNRGAFRYMTVVSNTTATVSLNSVKINYTASPTQDLRAYTGYFHSNDELLNRIWYAG





AYTLQLCSIDPTTGDALVGLGVITSSETISLPQTDKWWTNYTITNGSSTLTDGAKRDRLVWPGD





MSIALESVAVSTEDLYSVRTALESLYALQKPDGRLPYAGKPFFDTVSFTYHLHSLVGAASYYQY





TGDRAWLTRYWGQYKKGVQWALSSVDSTGLANITASADWLRFGMGAHNIEANAILYYVLNDAIS





LAQTLNDNAPIRNWTTTAARIKTVANELLWDDKNGLYTDNETTTLHPQDGNSWAVKANLTLSAN





QSAIVSESLAARWGPYGAPAPEAGATVSPFIGGFELQAHYQAGQPDRALDLLRLQWGFMLDDPR





MTNSTFIEGYSTDGSLAYAPYTNTPRVSHAHGWATGPTSALTIYTAGLRVTGPAGATWLYKPQP





GNLTQVEAGFSTRLGSFASSFSRSGGRYQELSFSTPNGTTGSVELGDVSGQLVSDRGVKVQLVG





GKASGLQGGKWKLSNN




Transglucosidase
MSSDSQLSRSHFLAPPTVIPAPSYIASSAAAQIITADQEFNAADFVADDEGHDSSASALVTPEA
178



CAK37133.1
LSSLNAFLDNILFNILAAAKSTQLVKIRPAVAEVLKPRLAKEMVSAADDELSEYLGGPEDEQLE





FRSGQTSIGEFDLVRSWKLTRLRCMVYTRLGDMEEDDEEEYINQEIIGEDGGGLRRLASHVGHI





TPAASIFLTSIIEHMGEQALIIAGEIARSRLSANLEDEDDLAGTGANRASMDRLVVEDHDMERL





ALNPTLGRLWRTWRKRVRGSNLSRAVSRESLRNRQSLVFGPGSRKSSAITIDEISPRTASSRSV





NEPLPETEDEVDPASVPLPMSEHDIQEIEIPCFLPELDTGDIQTMQAVVAHKVRPHSLMVLTLP





SPRSPSSNGNSPITPRLVNIKSPRHVRSRSLPNTAPADEQPSEVEQPAERTSPTPSEERRRLET





MYEHDEDDERHGEAATKPEAVEQNEPVVPSAGQGAAATPSTSVASVEVAMSDASSTPVSSPSLS





DRDYPETDEVEKHERVEKAQLAPGVETAPGPLAPRTQGVVDSTPAQPTAAADQDASKAADCDQS





TPEDSTPPTVPSSAEDTAVEKASRPVSTSGESAISDSSRSLPGKRGSSVPGVQHQYGRSSPGIA





SVSSGVERAAVQRLPARPSTSVASSVYSKSRRSGSFSSSREKRPVTAGSTTSQVSSKLKGLIGR





PADTGSLRLRTSSEVSRVSTRESAYDDTSGLDELIRSEETIHFTLTPRSMREMELPDSPRWRAQ





QASTDPTDLPKSVEPIPDDMSRSRHSTTSSKSTVDLPPVPKYIQSKPKSIEIPTTGLQQKPAVG





QARDAKHSMESTRDFANFLKSTGPNTPTTPATVDGSPAKSSRLRRLSDATEISKKLSRPASSTV





SVANSARSGPRLEARSAVAPRGDQTSDLIDFIREGPPTAGAHRIPRTVAPFRDTMDSDELQAIE





PGRTAKGAPSVASTQSVAETSLVSVGSRTGLLESTSRTSTPTALAKETKTTFAAPVSVSDDHRP





PRTRRRVPDPYAIDLDDDDELDELLEEPKPKRDEESLIDFLRNVPPPEPTPPQQPLAATANSRR





GSASVKARLRRNTASEKTLMAKPSKTSLHQQPDNYMGGASNYTVKVGMERNAGAMNGAYDLKTP





SVRQTETSALADFLKNTGPPEPPVTKAPAATKSKDSGFSRLFMRRKKVEA




Transglucosidase
MAALVQTIPQQSGTVSVLQTRPSSSSGTFTTSSQPGQQQNPRNSTMSWNPYNNSGSSGNYRVGH
179



CAK37219.1
QVVAPYAFTSTPNPSNPTNMQSRQSLSPHLRPEHRTSSAPSVPQGSASPANVGVNSRFAHPAAG





SVSTSSSNSSVHSYMSKDDSAIPTRQIRTDAPLRPLSTVNLPSPSSSNFMNISSPTVARPSPDR





YRRGNRRPENAAGAQPASTQPNGPAPARSATLATDDSSLHMSTPGLAGVSLDAPRRPGHVRVPS





ADDTTRADKPQTELAKRYRRRSWGNIDNAGLINMQLHLPTSSPTPTAGGHDYFDQSMRPRSAQS





HREVQGSIPSAHSSTSSVRDAGHSESASSSKSGPKTDDSKRPNKPSPLSQPVEVDAKPPTPKAP





QPSTTPQPAPTESLATQRLAEITKGDPKRPGKSRLRRAFSFGSASELLKASSQNKREAMATERA





RRELLQEELGPEQAAIAEQQEASGLGESIYSSHHQGRIFNSSTDNLSVSSTASSASIMLRKMGK





GMKRSTRSLVGLFRPKSVVSTSSTDGVMIEPMAPQLSVVNIEAERKSTAVTSDSQDHALGSSLF





SKVETDAANAVSHEDGALDKSRKSIVGGDRERAEVLAAVRKGILKKTYSDPANQTYVLKSSENL





NSNDSPHSSVPSTPEDQTRSGNRRSDAVKIAGEDDYLSEGRFQTSESKSAPITPQAMMPKSLVF





SPRIQFHETWPSGEYDRRGDIATCNRLTPLLAQQIKEELNNFKMVRNLLPLLPST




Transglucosidase
MFVYCSNCSFALLFLMLLSLLSFTASRTLAFTTSITQDGLLCFPSALEFLLRTEITVPYWAPSG
180



CAK37226.1
SILRPTAALHAYDCSVCLDPPSKIQEARKSVLMSANALSEIIDIPVSDGGFIHGVIRFYARGDH





LRWLQPPTRDAKFLAPDPYLHSLMIESWRQTLGEMHFWTRAGYLFDVVLAEVKRSEPDNYEFLN





WSGTNYMPTCPYYYVSMSPMVQPVVRSAQGSVDAAQRSSQISQDPSNLVQTPLHSPEFSDDSTR





SSFNTAQTASSCQSFSSPVSQSPCHEDVIQQNVQTSQVSLPFVRMDPSVTLDDPFVIEPISAEG





SWSMQDHIADMKRQFRLPGPMVRNASPSFDSPTPSTTERISAREINRRRDSEKPYDPTPLANDT





SASCDDETWSMEDASEKDASESSFKDAVEAHSDSASSTATGPVVASKNDDDQSLPKCNTNDTQP





TTCTTVNPSLLMFESSHKTYPTIEPSYEVAASRPRSLSPVQNLENELQVGSIGGKDAEDAGSLS





FNDEMKEGSEMDLFSASLDQYTAEQLASRQLTRTPELEESNPNEYGLGFGFQHNLFDGFDFFLP





EDQSELPLESNMIM




Transglucosidase
MLGSLLLLLPLVGAAVIGPRANSQSCPGYKASNVQKQARSLTADLTLAGTPCNSYGKDLEDLKL
181



CAK37273.1
LVEYQTDERLHVMIYDADEEVYQVPESVLPRVGSDEDSEDSVLEFDYVEEPFSFTISKGDEVLF





DSSASPLVFQSQYVNLRTWLPDDPYVYGLGEHSDPMRLPTYNYTRTLWNRDAYGTPNNTNLYGS





HPVYYDHRGKSGTYGVFLLNSNGMDIKINQTTDGKQYLEYNLLGGVLDFYFFYGEDPKQASMEY





SKIVGLPAMQSYWTFGVCPPPPNPITVRVVVYNYSQAKIPLETMWTDIDYMDKRRVFTLDPQRF





PLEKMRELVTYLHNHDQHYIVMVDPAVSVSNNTAYITGVRDDVFLHNQNGSLYEGAVWPGVTVF





PDWFNEGTQDYWTAQFQQFFDPKSGVDIDALWIDMNEASNFCPYPCLDPAAYAISADLPPAAPP





VRPSSPIPLPGFPADFQPSSKRSVKRAQGDKGKKVGLPNRNLTDPPYTIRNAAGVLSMSTIETD





LIHAGEGYAEYDTHNLYGTRLVMSSASRTAMQARRPDVRPLVITRSTFAGAGAHVGHWLGDNFS





DWVHYRISIAQILSFASMFQIPMVGADVCGFGSNTTEELCARWASLGAFYTFYRNHNELGDISQ





EFYRWPTVAESARKAIDIRYKLLDYIYTALHRQSQTGEPFLQPQFYLYPEDSNTFANDRQFFYG





DALLVSPVLNEGSTSVDAYFPDDIFYDWYTGAVVRGHGENITLSNINITHIPLHIRGGNIIPVR





TSSGMTTTEVRKQGFELIIAPDLDDTASGSLYLDDGDSLNPSSVTELEFTYSKGELHVKGTFGQ





KAVPKVEKCTLLGKSARTFKGFALDAPVNFKLK




Transglucosidase
MSLSFSSDVALNATEAAVFLSERDVAGQIPINFVTTSAVSLRAACFGDNIYDRDAAGRCISNLL
182



CAK96369.1
VVGYRRFLVDLYWSSDQRDWMFCPLSLSPDVPVVTVSSISPASSTTTTATSGITATTTATTTTT





TTSETIKATVTAVARSSGSVLYELGPYRCSLDFDLSDLINVFRGFFQAYSSELTVFTRYISLNL





HAAGSATSPDEPASTVTGSQLPTSSEFVSYQFDEHLSSYIYTPSSLASERANLNQSWYQVEDGY





KPITEYFTIHEEPNGDQSTPDGWPCVKYLQLAQEKRLLIDYGTIDSQLQDYNFSYMSDVIFPPN





YLTSTVSVSLDSDGSVDTGCFYDSGATTVSQANNSWAISDYIPIPEGLSENSTIAAMSLVASNL





TACGLTPALNNTLFNQTADTHPQPYTDISLSSSWAWSIGQPANADSSSASFSATDRCAVIDLTN





SGHWRAINCSQVRYAACRVGNNPFTWQLSPTPYTFRDAYDHGCPENTSMAVPRTGLENTYLYQY





LLTRTDVLDPTSAIPNKTKVWLNMNCIDVESCWVTGGPDQECPYASDPQQLERRTVLVAAIAGI





VICIIAALTLFVKCNANRRNSRRNKRVIKGWEYEGVPS




Transglucosidase
MADSKPKPSSIPPWQQSNNASNTDNSSESTSSPTSDDTSRSTLIEQASKFLEDESIRDAPTDRK
183



CAK96386.1
VSFLQSKGLREDEINSLLGISATSTASDTTEEEKAASPDTTTPSSTEPAPAPEPTDNASASSNQ





STPSSSITTPTPSPSTTTPKTNNTRDVPPIITYPEFLSTPTKPPPLVTLRSVLYTLYGAAGISA





SFYGASEYLIKPMLSNLTSARQELASTATSNLQKLNEKLEQNVSVIPESLKNKTANVENDSSST





DTESITSDPTELFHRDVATQTATSDFAATYNNSNKTGTDKDTPADPTAAVTDHLKRLESIRSQL





RECSDTEKESGTLESGMRTRLNELHHYLDGLIYSKPGFNPLSGYGMYSTPGIDSGSGAATGVGK





GEEDAIANFRAEIRGVKGALLSARNFPAGRGGRIGGVAGSIPTGLMRMNRVVNGIGSARPKKER





YKHSPTTFRYYQ




Transglucosidase
MGVGDYVHSKEAGQPRPRTTEVSNQSRQAVAAQARIDVPPTNLVAPVPLPINKSIPLEHYSTPA
184



CAK47557.1
FSEQMPQAPAENGVHRDMFDTDVEGIDESTIAATSVMGAEDAPLQFQLRPATVPQYQEAAPVVD





ERPLHPSRLPRRAYDGKWYENFGDKAMKSAGFDSEDADDASQLTSMAGDDERSDTTEDANYARR





YRSSTEEPLSKRLQSFWSASRRSYKNPEPQAYPEPSKTAASAAPPLLRQSTSDARLSKQALPNR





KVTLPRSMTATPRTRFSPPKPSLLEQLDITPTRRTSGPRPQPGKEPGITSTTTHQHHRHNSDDN





HLFNTSRDSLPPLSTFDMTNIDDLDVDDDNDPINDPFARRNSVQRIVSDPDFQPNKSTITSSSS





QNKRRNLESDYPPEILRQKSFKDLQSEPFDHTPTATASAPVKTTTPAPTPGPNATSDEKMDFLM





NSEDKDRRDFFSTLTMNEWEDYGDLLIDQFSDALSKMKDLRHARRKTAALFEAEIARRNEVVEE





QSADLTRKLEEMRSGGAEVLKGRTP




Transglucosidase
MQAIEQAGSIFTGWISSCLFCLSGRGDDESFHHQQAMKQKGVEREMRVCHTQPHLVPPMNLTDY
185



CAK47704.1
DDLPSPSSQPRVSSLQSWVVEGRTRASRASNRASMSLKRKSTAPVRISGPSEFRRVSMFLTELD





EYRPLELSFNTPGNRLPDLPRFEDFPLDHDRKQVISRPPRALSSTEMGRISQPRTHRPSSSFQL





ARKPVGSGSRRSSLPTQEQLQLLEKNTPITSPLIPHFSQRSSAVTGLTANAVPSTTPRLDLSGG





STSLARNELHRDTHEAPTSTVPRTPTKPSLQDRPLPSIPTEEDSPSSGSTYHPPTTPSESRPPT





TPSENPNQTPTRSGRVTQWLFQTPNKPNFLFSNPSKISDKGPFRIRSRTLSGSTLASTTTNITG





GHKTTPSLASGTTVAPTMQASSTESRNFDLPLGSPFSPKQTFPTVTEEHTYPTIHEGEQQQHQE





PEVFDDMLTQYYEYRHSAVGLAF




Transglucosidase
MKIFILLAIWLLASLGYATSFVGNMAEVYHEITDVLRGPHHAAHFAKLNGGKPKPDKPKGVGKT
186



CAK44239.1
LDDVVTLTVCKTDVSLAPTVGTFSLPTIVTAATLAPTVVVTGVVPTEISIGLPTEFAPEIQTGV





LTGESHSTDTTVVPTNSVVSGLSSFLTGSQTVSEITGSTENHTITTEINASAVTSTFAHTTFPT





ANAGNDHEGMASSAVALLVALIFSLVRI




Transglucosidase
MRTRSQQASPGGFVSLDENAPRRTRSAKNAAQQEPATSEQPPTRSKSQRAPKKTTTTTTTKKST
187



CAK44326.1
AKAQTRKATTTKRTTRQSTRKTDQPVSNEDAQTTHTEEDFATDNTTAEKNTPREVPETVDPTPA





SSENENPDRESLPHVSHPFMVPQPPKESQEIDCFDGPDPRGIGAASCLKSLIDELSSVGSPLSE





RSKTPSWTSEDGTEAALAPRPAQESGATNVETSTTTERVSLPTPAAEERTVEPPAEPTVAEPRG





VAIVTSSEQFNTVSSASAAGSGGVVVVEDERVGALIASFARLSLDDLAPRSSNEAAATLMESTT





ASFGEPVEPAHVTRRSLRASRQEWILGWAQQVPSTGYFHPITGQLVEGPAASVELVGDPASNRG





PPTRRIGVRDYILRRRRREVQVMSPLQEEPQSSPGPGSRAVALNAVPPRGIRAQRAQNTKRLTK





PPVTRKRARTESSDEEAPGPQTPAANKRRNLGPPGSTPYRPATRPRSLTANITPYSERLRRRAA





EKDGRIHSTSLRVSQLLAQQEADRRRQAAESSAPPCSELPRTTFDFSLDDAHETSQGQEQSQSL





QEQSSTPQPPATPERQSGWNIRGLLNSVPRTFTRILPSFRRTPEPTQVQAPPEPSSERISRTQP





PQSSSISQSQAQNSRRSSEEPPQKRRRKSWSLFAQPFDRSLYLGDIPKKDSATSSSAPLESRPV





AKLSAEATTPQESATSDAKKDVAAEGEDSRGREIEEQKQKKRKRSPSPDVIPNPPGCSYGLDLD





YFCYSSESEDEQEPPLPRTEPNKFGRLTRTAVRGALRSERHSSKKVRFDASPEDTPSKLRLRAR





ATDPYRGRHFIGMGNDSEIATPDSPTPAPHAADESSSRRPGFVPNVQGTFQLDYDAFSDDSDSS





GASASANVSASAPIPAPSSATVTQASISESVPSTESRQTPRQAAPAPSTPAKIDEEALARARSQ





AEKYKPKTPSGLRTASRYSSPMTATPDTVSAPVIAPAITPTPSTSQTAPASAPEPEQQTTEDFG





DDEFAREAQWLYENCPSGDLNDLVWPQPITYEEEGFSPEVIDLVNEIWDPSTVDYAYTNIWTPG





LDAFKRELETGASEAAQA




Transglucosidase
MAKSASQIHRAWWKECSVYQIWPASYKDSNDDGIGDIPGIISKLDYIKNIGVDIVWLCPSYKSP
188



CAK47737.1
QVDMGYDIADYYSIADEYGTVADVEKLIQGCHERGMKLLMDLVVNHTSDQNEWFKQSRSSKDNK





YRNWYVWKPARYDEQGNRHPPNNWVSHFQGSAWEWDEHTGEYYLHLYATEQPDLNWEHPPVRKA





VHDIMRFWLDKGADGFRMDVINFISKDQRFPDAPVKDPRTPWQWGDKYYANGPRLHEYLQDLGK





ILKEYDAFSVGEMPFVRDTEEVLRAVRYDRNEINMIFNFEHVDIDHGTYDKFEPGSWKLTDLKA





FFETWQKFMYNNDGWNALYWENHDQPRSIDRYAQAKEEFRTEAGKMLATVLALQSGTPFVYQGQ





EIGMRNVPVEWDMNEYKDIDCLNHWHRLLKHRPDDIEAQKSARQEYQKKSRDNGRTPVQWSSAP





NGGFTGPNAKPWMSVSPDYVRFNAEAQVNDPNSIYHYWAAVLGLRKKYLDIFVYGDYDLVDKDS





QEIFAYARQYENKKALVLTNWTEKTLEWDATTNGVKGVKDVLLNSYESAEAAKGRFSGQKWSLR





PYEAVVLLVEA




Transglucosidase
MAYYEPQGWQAPAARQASWEQPAPPSRSGSSSVSQRDEIPAFSSQFDEVDRAIDNLVKSGKLWA
189



CAK47819.1
APRRDSMPMMMGRPYPDYDPRMVNSMSQRHHSISEFDSRMHPSPNVQGFYASQRFQGRPNEVEQ





MMQAKRRMAAQRERELRNYHQEQQYNRSLLAEMSGNKSDRSLSPAAMSEESRRELLARQHRALY





GNDSPAFFPPAGLADDGTRSESQAGGTPTSSTGVRGASPRNVDPFGLAQTPVQAGADSLGQTAA





SAASLQSPSRANSTSSPSSAINPVFGKYDSADQPVTSTSSPGGADSPSSRQAPSKSMAGPIGSV





GPIGTRPLPQPHAGQVSNPALNKRSTTPLPSPLGFGFTPGDAASDRSVPSVSTAPTTAAATASV





KDTSGGVGLGWGNGSGVWGSKNGLGVQASVWG




Transglucosidase
MLSKMQLAQLAAFAMTLATSEAAYQGFNYGNKFSDESSKFQADFEAEFKAAKNLVGTSGFTSAR
190



CAK49181.1
LYTMIQAYSTSDVIEAIPAAIAQDTSLLLGLWASGGGMDNEITALKTAISQYGEELGKLVVGIS





VGSEDLYRNSVEGAEADAGVGVNPDELVEYIKEVRSVIAGTALADVSIGHVDTWDSWTNSSNSA





VVEAVDWLGFDGYPFFQSSMANSIDNAKTLFEESVAKTKAVAGDKEVWITETGWPVSGDSQGDA





VASIANAKTFWDEVGCPLFGNVNTWWYILQDASPTTPNPSFGIVGSTLSTTPLFDLSCKNSTTS





SSSAVVSAAASSAAGSKAVGSSQASSGAAAWATSASGSAKPTFTVGRPGVNGTVFGNGTYPLRP





SGSASARPSAGAISSGSGSSSSGSGSSGSTGTSATSGQSSSSGSSAAAGSSSPAAFSGASTLSG





SLFGAVVAVFMTLAAL




Transglucosidase
MPCVQAAAETDKSFVQIANADIEELIKQLTLDEKVALLTGDDFWHTVPIPRLGIPSIRLSDGPN
191



CAK49185.1
GVRGTRFFGSVPAACLPCGTAIGATFDRNLAVQVGHLLAAEAKAKGAHVILGPTINIQRGPLGG





RGFESFSEDPLLSGIIAGHYCKGLKEDNIVATLKHFVCNDQEHERMAVNSILTDRALREIYLLP





FMIAIALGKPEAIMTAYNKVNGLHASESPALLQGILREEWGWEGLLMSDWFGTYSTSEAIHAGL





DLEMPGPTRWRGGALTHAITANKIPMATVNARVRAVLRLVQQASRSGIPERALELQLNRAEDRQ





LLRKIASEAVVLLKNDDNILPLDKTKKIAVIGPNSKIATYCGGGSAALNPYEAVTPFEGISNSA





SGGVEFAQGIYGHQNLPLLGKRLRTQDGLTGFTLRIFNDPPTVANRVPLEERHETDSMVFFLDY





NHPKLQPVWFADAEGYFVPEESGMYDFGLCVQGTGKLFVDGKLLVNNANVQRPGPSFLGSGTME





ERGTLELTAGRQYKVHVQWGCAKTSTFKVPGVVDFGHGGFRFGACRQLSPHTGIEEAVQLAASV





DQVVLVAGLSAEWESEGEDRTSMGLPPHTDELISRVLEVNPDTVVVLQSGTPVEMPWIQNAKAV





LHAWYGGNETGNGLADVIFGDVNPSGKLPLTFPRHVKNNPTYFNHRSEGGRVLYGEDVYVGYRF





YDEIEIDPLFPFGHGLSYTTFELSGLSFERDSNSLHAICTLRNTGSRAGAEVIQLYVAPVSPPI





KRPQKELKEFRKVWLEPGAEDVVQIPLDLVRATSFWDEKSSSWCSHSGTYRIMLGTSSRGAFLE





SPIELSETTFWSGL




Transglucosidase
MSFRSLLALSGLVCTGLANVISKRATLDSWLSNEATVARTAILNNIGADGAWVSGADSGIVVAS
192



CAK38411.1
PSTDNPDYFYTWTRDSGLVLKTLVDLFRNGDTSLLSTIENYISAQAIVQGISNPSGDLSSGAGL





GEPKFNVDETAYTGSWGRPQRDGPALRATAMIGFGQWLLDNGYTSTATDIVWPLVRNDLSYVAQ





YWNQTGYDLWEEVNGSSFFTIAVQHRALVEGSAFATAVGSSCSWCDSQAPEILCYLQSFWTGSF





ILANFDSSRSGKDANTLLGSIHTFDPEAACDDSTFQPCSPRALANHKEVVDSFRSIYTLNDGLS





DSEAVAVGRYPEDTYYNGNPWFLCTLAAAEQLYDALYQWDKQGSLEVTDVSLDFFKALYSDAAT





GTYSSSSSTYSSIVDAVKTFADGFVSIVETHAASNGSMSEQYDKSDGEQLSARDLTWSYAALLT





ANNRRNSVVPASWGETSASSVPGTCAATSAIGTYSSVTVTSWPSIVATGGTTTTATPTGSGSVT





STSKTTATASKTSTSTSSTSCTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWETSDGIALS





ADKYTSSDPLWYVTVTLPAGESFEYKFIRIESDDSVEWESDPNREYTVPQACGTSTATVTDTWR




Transglucosidase
MDPNTSDRLKRLQMENLGTARREYISTADDRRHKKARLEDIQAIRMTNVPGSAAQRMATVNQGR
193



CAK47899.1
LEDWANVHKTIGDTEDLENLDSLLDGQSHRLQLSAIIRESGGQKYIGSQRRSGSAPATRSLHPV





GRGGGVIGTRGRGTRQSSLPPSNPTPRGHVTQPPKRSHGDAALDDNDFYRTAREGNASRKEEAM





RAQNRRSSVRRPTSSTTRPRRPVSQVDYSSMLSQPQSFLAAARSLVSARTTPAAPTPASQISRD





GGRSEASRKSSSPMDTTDRPTVQKTQQEPKPQMAEPPKPFTRPVVQLPAIPPRCTAVQQESTTK





ADEPLPVLPSSAVLDATSQDQGSASMSLGSTEDGQSISGIPESQTGTQEKVNSDLSTAAAPVPD





IKDTSPKAATDVKEAILLDFSYTPPEQSIHGQSPAPTEVLTPSLEDLRGLDFKQDIHPKFPTRR





RVDFDMSSDKREASTNVHDLMPTKQYDKAEASEDLHRQINMLCELLQSTSLSGEHRESLKQCKT





ALEGKLHGAYDSTGKRTQTQGDPFLGKPVLETLEAAAEPDEQPQSTISGLGIQNVSMDNFTPNK





AETIVEPDTQATPSVGEMIMPTSVENARLAMASPSPSSRLNVTAPPFVPKTPFRAQSNSFSSDS





NATCVPETPCPHRRVSMPEGHIIGDHLLPGRRRETISSGTEPLAAKQPPANEVTEPRFKFSIPP





KISRKLTIKTPVREGFGKEETPGPVAPRLASGNIPKPAPKPSAALQQSVHAPKAKPSSVLGGLE





SSRYASPSSNKPFR




Transglucosidase
MPPSTVFAYWRREHRRSSASPVSPSLQPTSKAPVTSNPPQLPGLSSTRPNNLTALGASSVESSS
194



CAK38738.1
PQVPNNPHEDYYDATKKTIAVSVAPANAPGSSSANLAIPSSSSDSHTRPLSISDEQDVTTTTSQ





SNYSQSSIAPPRSDQSDGDSPKPSSPFRLSLGKSLLNSHNLTSDHYNKRSSTPGLSSSGHFRFR





TSPDISPGDRMALSHKDKDKEYKYEGAGNRRSADRDGSSEQAHHKSGRTRLHLLNPMSLLARRR





SSNLASLRTEDTRVGARNIVPAIPDDYDPRIRGNIVHDFSAPRPRRNLSTAPVLMHDVNNQSSS





ADVTYNGTGNFAHGNDQSAQSGEQRKRHTQYSPVFREHFEDDQKVLQVESKAYLQSSLLTAQTN





AENDPHTLPVFARKLPSKIPEQEVSPEVPSDQTTLKQDSQHSPPNNSRELAQEDTDTIEVIPHQ





PSGLPKHLKSNASRFSFDMNGVESSAQEKLLEEKHKEKEAARRAKARMEGTSFSDGEDDFDEDL





LDDMDDLEEKIPGVNVDADEDDDFSGFSGPGNALNKPWLAPELSPIIASPLPTGSTNSQNVQEL





AQGPLAGISAPLPVSDPAVSDVTTNFQALSVATIAPNNAPQVAMGSHPPAPQPIEDDDDLYFDD





GEFGDLSTEDMGEKFDESIFDDETSHLYERKPVVQQPVPAPVPPPDNGTGSTNPLDVTAEHDEF





TPEPDYDGGLRHVPSMASDYRKGSIRVYGQTRESLANLGSAKAQGGVLSEHNLEAFHNALAKAA





SEAAASDRFGREASISEQSLGQESTAQTMDTPSGLVSDDSRLSQTVDMAAFEEVFEDFSYDDND





DALFDDPIIAAANAEALENDDEGFYGQEFGFYAQAHGGCNGELTNGGYFGPRGVEGVNRSFSSR





GKFREPSLTPITERSEWSTRNSVISLTAHGAAHSNPIASPGLAQLVDLGAMDDEMSLSALMKLR





RGAWGGSNGSLRSSSGSPPLLHSTSNRASFISDASPTVYTAPPDAFGGSATESPIRESDKFRWS





LNNTEQRVGQSAAGEREP




Transglucosidase
MLVEPLIRTDWPVWACKPHPHLVGPEAVAKNRNRSALQPSLAPSQSLVVLAQSNLPPAFQPSAL
195



CAK38790.1
SHGSVFGWPCILPWRSSGNAGDEPPSGPYSYSGWPTPLTSSNQPSPSRREHAVQPPPLTTSLGG





HQFQGLGLALGSGYSSTPLSSTSLSSPFTQGQSPAVGSPGGAAIGSSPMASRQYNVPYNPQDWG





PVGSGSMNAGQATYTPPNSMLRIVSQPRSTGPHSDVSLSPPPPPYSPPSQQHQRENVSQNTSSM





GSTSPSISSSYNGAVRAGVDAPSEYRQRRLPRTRPLSMFVGSESSHNRRVSLPPPPPLPPGLSS





RSSSQNRSETYREPASVMAGPGPHIVVSPDNLHSTQLSDDSNMLEPTQPFDTDRPPAARRAVSA





GPAVNSASSSRAHSQSGARSPPGTSWEPGMPLPPPPPGPPPATRSQSVNGLSDSSSSRNSQGPV





RGGRARPPPVLGTSLDSIPPTPAGWVDETIDVKPRTERQPLTIDTATTSNTNGPESLESSRASH





NPNSGGLFRSPAIKDPNAKGIRERRIERRNRQSQVLDSLSAVSMSSNPWAEALEQLKPSNLVLG





ESSVDTDNGRNPASAKAAPLSTRSISSDGPQITSRSRASSGGLFSNRSCSTPKPEPSPQAPTSN





SRFAQTPPFSPGTERSSAFPKRTSPALPPKALPTPPLQSGSETTPSRPGSKEERPVSHILHLPN





EPVTTVSPLAPRRVSAQQNPSLDSVIKRDDDYVRNAIQRHREFIEKEAGTMDEKEALRLFADFI





ISESQIRRERYAKVWDLDSFDVESVRRKLFVSPPKTAPVPQTSQVVPGPSSRRASNPTAPKLDI





PQVRPESAWWNNYQPCLSPIASLGLSNDEMSSRGRAPSRWWESKTGSSSEGGERRVQRSKRETK





YMGLSRGALLWEESQGSSDTGNAGTSNGGNQYAAYGPDEYPPEKVGWHEEPALEDYSNNVRLGS





SRRFEEVQRMDVSRLITLPPPYPRHYPAVNNSHPDLVTYRTLVRSITDLSEIKTARQRHQTEMD





GLFQDHQARVREGRRQFKANIQSQIQQGSITFAEAAEAEAALIVEENRLERDLIKGGLDTYQES





VFKPMRAILADRIDRATACIDELRGRLFDDARSETPDQTQEEGDEKPELLEKLTQLKWLFEARE





QLHREVFDLISDRDEKYRAVVLLPYKQASNEDKVRETNEFFVKDALDRRVDYEANALARLESFL





DVIEGNVARGVEIQLSAFWDIAPSLSELVQQIPEKLRGFTVQIPANEYEENPSYRAHPLQYLYT





LVSHAEKSSYQYIESQINLFCLLHEVKSAVMRASCNLMEAERIRLGESEGKVQQEMQETRTDEE





RTLTSDLKDKVATVEGQWAEALGSGIQRLRERVKEQLMVEDGWEDLEQLEQA




Transglucosidase
MVTQSSLLDRVWLYTHKRSPILLALHPPSQSSLISSEGHLEEQTEPTVQSRRYIPNTIMNTNMH
196



CAK38810.1
PQSLDSVRSGEEAKSENEDSNTRSGAISLIRARTISRSSTPRDTQEGCSSEQADDSQQAVSIPR





IVVMEPPGDSKAKRNKMKKNKYKKKKLLLGDSESETSGSQGSGLNGKPTAESNTSNDTSSHMKE





IEDLSHSAWMPAKGLNSGGCANKEKPMSGSNVSERCTVDSDKKRDLIESLSKLDIKGKQRVWQV





NAVTGNDTLEEINTQRGPQPVDRARRTVLAPSYATVLSGKRASIEGPSSMPNNSDSLLPTTMEF





PKLTDKTSAGQDSSPEARSGHAKLSPIPEISGEFAGDNSNDIPLSQTDLETAGSSSLDNPISTP





VSTVSTGWSSTALQISPQTTEPSSEPSSSKAVSHRHATSLHHAHPLPPTPPSSTHSHSLSTANA





TITNAQGTLSAQKPEGFFWQLDSHGFPCAKAHCEKRCNLWDGATVICPRCGPYSEVRYCSRAHL





LEDIKPHWLYCGQMVFQHPCRETSIPRRVRAGPPLIPCLHHYDMPERHRQAVHFNMNAREGDYF





IFTDWLDFVTAGLPGDKTAIRCSNRIMYVVKFDDAAEKDRFRRVLAACLFMTIELPDLTDYLYR





LIRDKLRLANAPNHIEPSLRYQFLQEFNVTIQERITGSRHACETDWDGRNRRNCQDPVCRAEYR





RLLGSVGGRGYSRMIETLESTYWILRAARTTHPSVKDAMKRMMGEGYAEVAEEDRRAFRRGDGW





DGAGSGDMEIEGFNEGDE




Transglucosidase
MLATPMTPQASHPSSSNMVCSLASTTTTTSSSSSSSSSSSSSSATQQTTISSRPKLTLQTTSLP
197



CAK38817.1
RTFGTSSTGLSLSIAAGTASPTVRNTFKNAYEVTGPSSATASPSSKHPSNLRFSKPSSPFTTHN





PYQLPLGVKSILRNSPLEPTCRRRAGSVATTGPNGGPSARRVFFPAKKQVSYRNPLEEEIQTVH





YTARHSDLHDDPEPALEPQSQPQQPEVTSSDEDSDSNASGCPSDTSTSEDEPETGLGKTTSSPI





KRKKRKHSNAERQVRAVALMDGIAGPSNPDSLTPQTPRRKRAKRRCEWRWTLGPLENRDKLLHP





VQDETGPTSSASQPETIPHESETETPSSDPPLSSASTTLYHSSPSSSVSSDVETENDEWQTHTT





HELECAHADQ




Transglucosidase
MAFWGVAEREVIERAVALEWADAAQVDERKESPNIRGVLSAGPSQPSRGDASEIKPGFGFSSAL
198



CAK38846.1
LWGAIFGAFGWTRVLRPVGRIPTRDSCSDRSDGTSWKRYLDLTLLSLDEPPTKGTKELEGQRKS





QRARETKWALGSRGEKWALPELIILDD




Transglucosidase
MVKLTHLLARAWLVPLAYGASQSLLSTTAPSQPQFTIPASADVGAQLIANIDDPQAADAQSVCP
199



CAK44692.1
GYKASKVQHNSRGFTASLQLAGRPCNVYGTDVESLTLSVEYQDSDRLNIQILPTHVDSTNASWY





FLSENLVPRPKASLNASVSQSDLFVSWSNEPSFNFKVIRKATGDALFSTEGTVLVYENQFIEFV





TALPEEYNLYGLGEHITQFRLQRNANLTIYPSDDGTPIDQNLYGQHPFYLDTRYYKGDRQNGSY





IPVKSSEADASQDYISLSHGVFLRNSHGLEILLRSQKLIWRTLGGGIDLTFYSGPAPADVTRQY





LTSTVGLPAMQQYNTLGFHQCRWGYNNWSDLADVVANFEKFEIPLEYIWTDIDYMHGYRNFDND





QHRFSYSEGDEFLSKLHESGRYYVPIVDAALYIPNPENASDAYATYDRGAADDVFLKNPDGSLY





IGAVWPGYTVFPDWHHPKAVDFWANELVIWSKKVAFDGVWYDMSEVSSFCVGSCGTGNLTLNPA





HPSFLLPGEPGDIIYDYPEAFNITNATEAASASAGASSQAAATATTTSTSVSYLRTTPTPGVRN





VEHPPYVINHDQEGHDLSVHAVSPNATHVDGVEEYDVHGLYGHQGLNATYQGLLEVWSHKRRPF





IIGRSTFAGSGKWAGHWGGDNYSKWWSMYYSISQALSFSLFGIPMFGADTCGFNGNSDEELCNR





WMQLSAFFPFYRNHNELSTIPQEPYRWASVIEATKSAMRIRYAILPYFYTLFDLAHTTGSTVMR





ALSWEFPNDPTLAAVETQFMVGPAIMVVPVLEPLVNTVKGVFPGVGHGEVWYDWYTQAAVDAKP





GVNTTISAPLGHIPVYVRGGNILPMQEPALTTREARQTPWALLAALGSNGTASGQLYLDDGESI





YPNATLHVDFTASRSSLRSSAQGRWKERNPLANVTVLGVNKEPSAVTLNGQAVFPGSVTYNSTS





QVLFVGGLQNLTKGGAWAENWVLEW




Transglucosidase
MLANSLVVLAAIVASILNPVLGAPALDVGVTEPQAEPKYVFAHFMVGIVENYQLEDWITDMKAA
200



CAK44966.1
QAIGIDAFALNCASIDKYTPTQLALAYQAAQQVNFKVFISFDFAYWSNGDTGKITAYMQQYANH





PAQMQYRGGAVVSTFVGDSFNWSPVKQATSHPIHAVPNLQDPAAASSNSQRGADGAFSWYAWPT





DGGNSIIKGPMTTIWDDRYIKDLAGTTYMAPVSPWFATHFNSKNWVFICENLPTLRWEQMLSLK





PSLVEIISWNDYGESHYIGPYSANHSDDGSSKWANGMPHDAWRDLYKPYIAAYKSGDSKPTIPQ





EGLVYWYRPTPKGVNCPEDNMPAPNGFQMLSDSIFVATMLSSPATLTVTSGSLGPVKVDVPAGI





VTTNVTMGIGAQTFQISRNGQVILSGKGGLDVADRSKYYNFNVFVGSVMGSSAAGNASRMLLLL





HTTLLKVLLSGDKQVNVCSTTGSKGVICHLLIPDQVTSLIIPKFLQHIVQGKKYN




Transglucosidase
MKRLMYLLVVLLLSYVVCALPYDDHGKKRDLGPLSDLPGGDVIVWVDQAGNALANNVVGGGNSD
201



CAK47997.1
PTATADNSPTTLPPILSTLDGDLDLSPAVPLPASTNLPKTGNYRRFGISYSPYNNDGSCKSQDQ





VDEDLDKLAQYGFVRIYGVDCDQTNKVTKAARQRNLKVFAGVFDLQNFPSSLDYITGAANGDWS





VFHTINIGNELVNDGKNSAADVVNAVNTARSKLRAAGYQGPVVTVDAFSVMIQHPELCQASDYC





AANCHAFFDNNNTPDKAGQYVKDQANKVSKAARGKKTLISESGWPHNGQPNGKAVPSSLNQQKA





IASLQQTFTGEDELVLFTAFDDLWKQDSSGTFGAEKFWGIQKH




Transglucosidase
MPHEERVSSHVRQLLQSLTLEEKVALLAGKNMWETVNIDRLHIPSLKMTDGPAGVRGSKWTYGS
202



CAK4847.1
LTTWIPCGISLAATFDPAMVEQVGSVLGQEARRKGCQVLLAPTMNLSRSPLGGRNFESYGEDPY





LVGVIATAMIRGIQAHGVGACMKHFILNDTETRRFNVDQTIDERTLREVYMKPFTMVLNDPAST





PWTAMVSYPKINGLHADISPHILPRLLRQELQFDRLVMSDWGGLNSTAESLRATTDLEMPGPAV





RRGERLLAAIRAGEVEVAAHVDPSVRRFLQLLERTGLLGDATKSAEHSEAATDDPIFHRIARDA





AQSGLVLLKNDKGILPLKPTTLQRVAIIGPNACQPTAGGAGSAAVNPFYVSTPESCLRDVLHAA





NSELQVSYEPGIPSSLRPPLLGKLLTVPDGSRKGWQVSFFEGHALEGPVVASSMWDDSLIYLFS





DGDVPAVLDDRPYSYRATGVVTPQESGRYTWSLANTGKAKLFVNDELLIDNTEWTGLTGGFLGC





SSADKTASVYLEAGRAYQLRVDNVVTLPVVEAFDNTLFPRISGVRVGLALEQDEPEMLAQAVAA





ARQADVAVVVVGHNKDSEGEGGDRATMQLPGRTDELVAAVCAANPNTVVVVQSASAVAMPWVDA





ASGLVMAWYQGQENGHALAAALLGDCDFSGRLPITFPRRLKDHGSHAWFPGEAAQDRNTFGEGV





RVGYRHFDAQGIPALWPFGFGLSYTRFQLTNIRVCGRVEGRSPESQPVLIQARVCNVGGRDGQE





VVQVYVAPSAGIREAGEMSFPKTLGGFCKISVPAGDSREVSIPIRGSELSWYDARVAQWRLDAG





KYACWVGRSSSHIDAELEIEVAEGEDTRQGTLE




Transglucosidase
MQVLRCGIHGFHEVLGIDVDEIRFYWTIETDDKHASQLAYRVVLSTDEAAVQGDAIVESKLAWD
203



CAK39248.1
SGRVMSNEQRNIICKPDNGFQSTCSYYWRVTLWDQSERPHHSAVNHFFTAYPRSHLLPPYSMNQ





TYMPHTSLIFRSWFEDEPNRWKAVWIGDGGDKPIYLRKAFDLAQPPARAIMFASGLGHFNMTVN





GSPASDHRLDPGWTNYHRRVQFTAYDVTAQLQTGANVLGAHLGNGFYAGDKGEDRFFWPMYEDN





TYVRYGNELCFFSELHLFYPDGSHTTMISDPSWRVRRSATSLANIYASENHDRRQYPTGWDTPD





FDDADWAFAKPLTGPRGHIYYQTQPPVVLHETFQPVKITEPRPGIVCYDLGQNASTMVRVEVEG





PRGSEIIVRYSETIQEDGTVLMPDPLFKEYETGVFSRIHLAGTGAPETWEPDFSFTSARYIQVE





GVSLDGSDGRPVIRSVVGRHISSAARRLGTMQTDKEDVNQLLSALSWTFSSNLFSYHTDCPQIE





KFGWLEVTHLLAPATQYVRDMEALYTKILDDILDTQEPSGLVPTMAPEIRYMCGPLHDTITWGC





AVCLLPDILREYYGSTHVIAKVFPAAVRYMEYMRTKERRGGLIEHGLGDWGRGIAFGNNQANIE





TAIYYRCLQCVAMMARELGEMQKAKEFEQWAARIYAVYNRHLLVTDDASRPYAYYTSLDNYPAR





DRDAIAQAMALQFGLVPEQHRKDVMAAFLDDVADGRIRAGEIGLRFLFNTLADAKRPDLVLQMA





RQEEHPSYMRFLRRGETTLLEFWQDECRSKCHDMLGTIYEWFYAAVLGLKPTGPAYRTFVVDPP





YNAEFKHVKGSVDCPYGTIAVEFTRNEQGQAVVNVRVPFGTTAIVKLPRSGKSSAYCREGEESR





AVDGGEVSLSHGVYSIIEG




Transglucosidase
MPSTYLGALATLAVFPCLGQARSTWPLGSGLELSYQASQHQISIHQDNQTIFSTIPGQPFLSAS
204



CAK39259.1
AGKDQFVEDSGNFNITNVNQARCRGQNITQLAGIPRSDSVKNQVAVRGYLLDCGGEDIAYGMNF





WVPRRFSDRVAFEASVDSEANASVPVDRLYLTFASHALEDFYGLGAQASFASMKNRSIPIFSRE





QGVGRGDQPYTAIEDSQGFFSGGDQYTTYTAIPQYVSSDGRVFYLDENDTAYAVFDFQRSDAVT





VRYDSLSVHGHLMQADTMLDAITMLTEYTGRMPTLPEWVDHGALLGIQGGQEKVNRIVKQGFEH





DCPVAGVWLQDWSGTHLQSAPYGNMNISRLWWNWESDTSLYPTWAEFVQTLREQHGVRTLAYVN





PFLANVSSKSDGYRRNLFLEASQHRYMVQNTTTNSTAIISSGKGIDAGILDLTNEDTRAWFADV





LRTQVWSANISGCMWDFGEYTPITPDTSLANISTSAFFYHNQYPRDWAAYQRSVAAEMPLFHEM





VTFHRSASMGANRHMNLFWVGDQATLWTRNDGIKSVVTIQGQMGISGYAHSHSDIGGYTTVFEP





PTTSNSSGAIPRSAELLGRWGELGAVSSAVFRSHEGNVPSVNAQFYSNSTTYAYFAYNARLFRS





LGPYRRRILNTESQRRGWPLLRMPVLYHPEDLRARQISYESFFLGRDLYVAPVLDEGHKSVEVY





FPGHSANRTYTHVWTGQTYRAGQTAKVSAPFGKPAVFLVDGASSPELDVFLDFVRKENGTVLYA




Transglucosidase
MAGVNRSFSYSRGDDALLRDDEREISPLRSAEDGLYSTSYGDVSPLSAGVQAQNRPFDRGLVSV
205



CAK39383.1
PEGQTLERHMTSTPGMDNLGPASVGGGISDVPVRNLPAERDFNTTGSDNPYIPAPPDGDIYPSS





EAVRYRDSYSSHTGLGAGAPFAEHSTPGTTPSQRSFFDSPYQGVDAGPYQRHSAYSSHDYPLVI





NPDDIADDGDDGFPVHPKGAADYRSNANVPGTGVAGAAAAGGFLGKFRALFKREEPSPFYDSDI





GGGLGGAEKAQGGRHIIGGGSRKRGWIVGLILAAVIVAAIVGGAVGGILGHQEHDGDTSSSSSS





SSSSGTGSGGSDKGDGLLDKDSDEIKALMNNKNLHKVFPGVDYTPWGVQYPLCLQYPPSQNNVT





RDLAVLTQLTNTIRLYGTDCNQTEMVLEAIDRLQLTNMKLWLGVWIDTNTTTTDRQISQLYKIV





ENANDTSIFKGAIVGNEALYRAGSDVASAETNLIGYINDVKDHFKDKNIDLPVGTSDLGDNWNA





QLVSAADFVMSNIHPFFGGVEIDDAASWTWTFWQTHDTPLTAGTNKQQIISEVGWPTGGGNDCG





SDNKCQNDKQGAVAGIDELNQFLSEWVCQALDNGTEYFWFEAFDEPWKVQYNTPGQEWEDKWGL





MDSARNLKPGVKIPDCGGKTIT




Transglucosidase
MSRNFHPVNTNFPSTTTLTADPDIIPSPEDNRNNYTSTQSYFCRQDVSTNPTFNSNDQALLDNC
206



CAK96650.1
SNIDVSQLVTEANCFWGSRSSTLPKNEGYPSVEEYPSTYLMGNHGHGYADENMHETSAPYGPCD





HLQVAGAGMDMEMRMTGNVMGKVDETYEANQAAMLAALGMTHLDEGVSHAADDDAKTSSSVTVD





DDPEWKEWKKWKEREANGGVRLPPEERMNQADVAAWLGMDSKEEHGVFQTQEEDDYDEDETISY





VSDDDMMEMEEGMEDDEVVVNVVEGPSRQVLISSQTGAHAEYDTASFGCDFEEQEEQEDSEEQE





NEERWGGDGQPPPSLFPRFYSDNAETGLIIELSSADSDAEETMSDPSPSSLSSSSSSSPSETPL





QPHLQEAYALPWTTSPSNTSTITSTTSTPTDTSPVPTPFQPDPHPHPHPHPTTQHYLPPPSSQT





TPPFTLLTDLHTHLTHTPQHRASHLRNLASEISNTMYFVNSHTISGDFSAEDAAPVDRVMRIVR





EGELKMRYQERYERQKGKLVKRERRVAEREDRVSKREEMVSRREMGVAGWWEVWRERGREVWEG





VEGEMMGTMEGNGYRRNGMDTLQRTVRVTREVVRRMDRIVDEGEVDGDGDVEGGVEVRGAYKID





YNCDDDIVGSTSYRIEICNPPQRSSAS




Transglucosidase
MSSTPDDKPQRATAAQLANRKIKEVRRRRPNSAAPSAPAPSFGGGPFASIDPNTVSSTPASSQT
207



CAK40060.1
ASNGFTFGQSQNQSFPGASSAPSQNGGTPFAFGSGGGGSSSSSFNFSSSFGGTSSASNPFASMN





TTTSDQSSKPASFSGFQGSLFNIPPGGNQSPAQQPLPSGSIFGTSSQSNASTGGLFGASTNNGP





SASGAATPASGSIFGQNNAAAAASTPSTNLFGQSSVKKPSPFGQSSAFSGDDSMQTSPDAKGSG





SQQKPSIFSSAPPAQPSFAGAGSTSLFGASASSAAETPSKPVFDFKATTPSTSLFGGAATPTPS





ASTPAPAAVTTAAPASSSLFGAASPAKPSTPFQNPFQSSNLFQTTPSTSQKPDEEKKEEPKPAD





SQPKSPFQFSASTTTPGSLFAKSDAPASPAPAAPSTGLFQTSSTKNLFEPKPPATADAEQTKAP





ANPFGGLFAKPATPSKPAGEQQPLPSSSTPFQGLFSKPSTSNDAAKTSEPEKQATPGPMSFAPS





SGGFSQTSNLFSPKPAASPAPAAETQATAASTSAAPVDSPIKVNGANSSPSVFTNGNTAPSTFG





QMQTPKLAGKTTDPKTSEDAEMLYRMRTLNECFQRELAKLDPSSQNFDAAVQFYMRVRATLGAS





VGSKRKASGEAEDAVATKKARPFGIPSEKADTPKENSTTPAVSAQSSTPFKGFGTSQASPASSK





RKSIDEGDDNSPAKRVNGDSSTANIFAQSFSKSKTEAEKPEPSVVKPSTPESTKPALFSTTPTT





APAKPLFSLSDSASKGSASTSLFSSSMSSATSTSAASGGNAPKNPFVLKPTSSEGSSTGSAGGT





DFFAQFKARSFEDAEKEKEKRKAEDFDSEEEDEAEWERRDAEEQRKKQEQFGTQTQKRAKFVPG





KGFVFEDESNESPAKKAEDSSSTTPGAGTIFSSQNNTAVKSNNIFGHLSATPSEAEDNDNDADD





TEEASTPGDESDDAAENAVAADKKAESADSSAKEPEAGGRSLFDRVQYGEDGKPKRQGEEEPKG





NVSTLFGSSNFSSSFNTPTSLTPASSGESNLAAPKPATTNLFGAPSTTSSIFGTPLSGSGNSTP





SIFNAAQNATKSTGDNTWKPDSPIKFASDSASASSSKPDSGSATPALEAPKPFSNLFGAPPSLT





KSSTSKDAQPSLGFTFGTPGQSSPSVFAPSTLTSAAPSRSTTPGGASDTGAEESGDGDGAESLP





QLDLTRGGAGEENEDLVAESRARAMKHTTGTGWESQGVGFLRVLKDRTTSRGRIVVRADPSGKV





ILNTRLMKEIRYSVAKNSVQFLVPQSEGPPQMWALRVKTNADAERLCKSMEETKN




Transglucosidase
MSNRWTLLLSLVILLGCLVIPGVTVKHENFKTCSQSGFCKRNRAFADDAAAQGSSWASPYELDS
208



CAK40395.1
SSIQFKDGQLHGTILKSVSPNEKVKLPLVVSFLESGAARVVVDEEKRMNGDIQLRHDSKARKER





YNEAEKWVLVGGLELSKTATLRPETESGFTRVLYGPDNQFEAVIRHAPFSADFKRDGQTHVQLN





NKGYLNMEHWRPKVEVEGEGEQQTQEDESTWWDESFGGNTDTKPRGPESVGLDITFPGYKHVFG





IPEHADSLSLKETRGGEGNHEEPYRMYNADVFEYELSSPMTLYGAIPFMQAHRKDSTVGVFWLN





AAETWVDIVKSTSSPNPLALGVGATTDTQSHWFSESGQLDVFVFLGPTPQEISKTYGELTGYTQ





LPQHFAIAYHQCRWNYITDEDVKEVDRNFDKYQIPYDVIWLDIEYTDDRKYFTWDPLSFPDPIS





MEEQLDESERKLVVIIDPHIKNQDKYSIVQEMKSKDLATKNKDGEIYDGWCWPGSSHWIDTFNP





AAIKWWVSLFKFDKFKGTLSNVFIWNDMNEPSVFNGPETTMPKDNLHHGNWEHRDIHNVHGITL





VNATYDALLERKKGEIRRPFILTRSYYAGAQRMSAMWTGDNQATWEHLAASIPMVLNNGIAGFP





FAGADVGGFFQNPSKELLTRWYQAGIWYPFFRAHAHIDTRRREPYLIAEPHRSIISQAIRLRYQ





LLPAWYTAFHEASVNGMPIVRPQYYAHPWDEAGFAIDDQLYLGSTGLLAKPVVSEEATTADIYL





ADDEKYYDYFDYTVYQGAGKRHTVPAPMETVPLLMQGGHVIPRKDRPRRSSALMRWDPYTLVVV





LDKNGQADGSLYVDDGETFDYERGAYIHRRFRFQESALVSEDVGTKGPKTAEYLKTMANVRVER





VVVVDPPKEWQGKTSVTVIEDGASAASTASMQYHSQPDGKAAYAVVKNPNVGIGKTWRIEF




Transglucosidase
MPLRPPASPLSLETISPSPPDLDSDPLIASDDDLDDDDRAARDQRIEKLAQAYCHGTPLFILSA
209



CAK96888.1
SLRGPFENGWANPWKKDRRTTGGGHVGIHSEHSERPIIPETIPQKRPLYRESLGISRSKSAVPL





SDPTYSSKKRESQGGTAGEPSSKRPRDSRGRTSSSNNTPKPIALHKRTIETSGHSDALTPFQHT





EQSWLKRDRAEINFRKVDPPTSPTTTVSSRHREGGHYTIQVPGTDYRVTTKNILRARTDSRDGT





TFDSHVPDSVKAQLTSRHPGVRPETEDHENSICVLSSTSHLSKFEFRRRKRSADPEHGTTSPVP





MQLENEQVSHQTTVVEDSRVSSQPAPVPPNTTSMQAIEVDMQVNEDNSHHPNVSERSGTVDHQG





NSQSRKVSSTTTAEGVNISDKYPSAQRVSVNPALAENVTSLQTISAVKPNSECDNDTIPDLHFN





TQAALLHAQKSFQNDLESQVPNPGETHNQPSSPANDITPFHRMNTSRIGKYSRAIHPGTAHMPM





STQCIIDAVTPFTFSTEKKARSRFISPQMPSSSRIDRGTATPNTGSPLSSESEDEDEDEDPTIL





PHKSPAAQQQTPDDTQEGSALPMALSHSHPTTVQDGQGVAPGPDSFNLSQAIADAGSWLQQSFD





INHEIQQCRSSAKSRPSSSAGISRSASGTILGHSTLDSVLLC




Transglucosidase
MPGHSRSRDRLSPSSELDDADPVYSPSVYQREHYYNNDSLFDSADDDYTRTPRNVYSYETHDEY
210



CAK45960.1
HDDDDDDDDVHEHDHDHEYDDKFEEPWVPLRAQVEGDQWREGFETAIPKEEDVTQAKEYQYQMS





GALGDDGPPPLPSDALGRGKGKKRLDRETRRQRRKERLAAFFKHKNGSASAGLVSGDALAKLLG





SQDGDEDCLSHLGTERADSMSQKNLEGGRQRKLPVLSEEPMMLRPFPAVAPTGQTQGRVVSGAQ





LEEGGPGMEMRHRGGGGPPAEGLLQKEGDWDGSTKGSSTSARPSFWKRYHKTFIFFAILIVLAA





IAIPVGIIEARRLHGTSGGDNSSNSNLKGISRDSIPAYARGTYLDPFTWYDTTDFNVTFTNATV





GGLSIMGLNSTWNDSAQANENVPPLNEKFPYGSQPIRGVNLGGWLSIEPFIVPSLFDTYTSSEG





IIDEWTLSEKLGDSAASVIEKHYATFITEQDFADIRDAGLDHVRIQFSYWAIKTYDGDPYVPKI





AWRYLLRAIEYCRKYGLRVNLDPHGIPGSQNGWNHSGRQGTIGWLNGTDGELNRQRSLEMHDQL





SQFFAQDRYKNVVTIYGLVNEPLMLSLPVEKVLNWTTEATNLVQKNGIKAWVTVHDGFLNLDKW





DKMLKTRPSNMMLDTHQYTVFNTGEIVLNHTRRVELICESWYSMIQQINITSTGWGPTICGEWS





QADTDCAQYVNNVGRGTRWEGTFSLTDSTQYCPTASEGTCSCTQANAVPGVYSEGYKTFLQTYA





EAQMSAFESAMGWFYWTWATESAAQWSYRTAWKNGYMPKKAYSPSFKCGDTIPSFGNLPEYY




Transglucosidase
STSYGGTRTPDSSSTDVSRPSDLRTGPATRAGSGLTPSLDPSSRPLASRPANRDRIPPPPPKSH
211



CAK40856.1
HGKRIAPSPGVTPSLTQTTPGKATNRFSFHGSPSEPSYSPRPPQSGSDYFSAKPKDEPPSTEQS





TESLRRSQSQHKRPPTPPLSRRHSQMRRSKTTMSKVNPLRLSIHVAQASSAASSSSSPPPSPSG





WSLNPARTRESRTGSTPSEEPMHTATSTLRPEPSAAAPVSPTETSQSTSTGSSTKRTSLYNPLP





PPPPPRRSRGSSNHSIDSSGQSLRSGKPADETTAAPAAPAAQDEFVPHPSNAHDILADLSRLQK





EVDDLRGHYESRKASQ




Transglucosidase
MYISKVLLVTCAAFAPFASAAVQAKPTDTPVPVSSTHVASSPLAPTPTPVSPSPLHTASSSVII
212



CAK40944.1
SSSSSSVRFHPSSSASPSHMASSSRRISSSAISSSAIASSSASFTRSYITKASARPTTTTSTDA





DSKSNSNSGSDSESATAAAASATHSGAAAPAVQLSGGMAAGVLAAAGFIML




Transglucosidase
MPRSTVDQTSSAEAPYSGPRKLVLCLDGTGNQFMGFERDSNIVKIYQMLEKNTPGQFHYYQPGI
213



CAK41060.1
GTYVEGQSSSSGLLRYPRKLQSNIITTIDQGVGTTFESHVLAAYRFIMRYYSPGDHIYIFGFSR





GAYTARFLAEMIHELGLLSQGNEEMIHFAWETFSNFQQARGKTDRTAKDEALISYMKKFNTTFC





RPQVQIHFLGLFDCVNSVGQFEIPFHRKSHQYLVSPAARHIRHAVSIHERRLKFKPALVLLDKT





KPVDLKEVWFAGNHGDVGGGWSLAPGQFHLLSDTPLNWMLQEVLHLENSESKLSFHTLNVADVV





ERENAFPGKEEPGTTAYDVRKRTNQPHDMLMFNRGATFLMVIFWWILEILPLFTRLELEHGKWV





PRQWPPNMGAPRDIPEDAVIHQSVHEMVRAGILDPKSIPPRGGNNSHLPSTARITGAWKAMRKN





QEKQISSLLQKKPAGALRKEFDGKAD




Transglucosidase
MDPANEYCGLEDYGLVGDMHTCALVSKNGSVDSMCWPVFDSPSIFCRILDKEKGGHFSITPDRR
214



CAK41144.1
LKNPLSKQRYRPYTNMLETRWIHEEGVMNILDYFPIAKPKPHVACRSGMVRKAECVRGEMEIEI





ELFPAFNYARDSHVAQQSSASDDAIQVYHFQAESQNLVVSVLGDRGDISGDDSDLSIEFELSDR





PGHLGPGLVGKVTLKEGQSITMLLHDQESITCNVEDLAPYLQQIERTTGDFWSDWTSKCTFRGH





YREQVERSLLVLKLLTYKPTGAIVAAPTFSLPEHIGGSRNWDYRYSWVRDAAFTVYVFLKNGYP





EEAESYINFIFERIFPPMDKNPKPGEPFLPIMITIHGEREIPEMELEHLEGYRGSRPVRIGNGA





ATHIQLDIYGELMDSIYLYNKHAADISYDQWRAIRRMIDFVIQIRHQPDQSIWEVRGPPQNFVY





SKIMLWVALDRGLRLAEKRSNLPCPDRARWMHERDALYDEIMTKGYNSEKGFFCMSYENQDAMD





AAVLIAPLVFFVAPNDPRLLSTIQKITEVPAKGGLSVANMVSRYDTGKVDDGVGGNEGAFLMVT





FWLVEAMMRQLRKTATSQFDSILSFANHLGMFSEEVATSGEQIGNMPQAFSHLACQYMFNVIVK





RKSLETCEYHIALDVEKRILQICYNEDISIFNKWLSISGFTYRQSFTISQLVVV




Transglucosidase
MPIKLPKGFARRKSSSNALEEVQNPTQSSFRVFERPTGDKKSFSDGNLVAKRLSEGQPLDSPSE
215



CAK46428.1
DDNNIFALHNSQPARHQYEPPLSPDEYLTPFAHPDGPQPESQSPHTRNLYDIPIPPLSGAIRAA





GRTFSFGGRFSKASAPTPPPQPSTPGPSRSRGMTTSTSSTATPPKLPDTELRIGKIDDDFQNMF





GDIGKRYYGSKDASLDQPVDLDSSGPSSRPDALPRKDERVSRPTPIDTDRSREVEPSPYSWDSR





HSEEGLLTTLDSPNEQPATQPYQRNIDPVSVGDRRKSIPLPGTTPLATTSHRSLAKPRTTADKG





LRRSGVYSNRRDSVPVEDEDAKLIMESLYSKRSSQVPFMADHGASDAENDGPLFDQPGANSSHP





DRRESIQKDSLSSPVLSDHLDPSIAAHARLAAQYEKAQPVTVSSTNKVMTPSQFEHYRQQQELR





RSNSDASKSENSAESDYDDDDEVEKDREAERQ





RRKQEAHLSVYRQQMMKVTGQESPAPAMRTELDQASKSAPNLLQPGTTLGSGKSSDGDDDEEIP





LGILAAHGFPNRNRPPSRLAPSSSIPNLRASFQQPYLSSPSPASVAERDPNNRGSLPVFARNLP





RDPYFGASLVNQSNRESLALGGGASVHGGPSPALPPGGLVGVIATEERARAMRRGSPNTQAMYD





YQGGMPVPPVHPRGIPRPYTMMSLSSPNAGGVQPTISATEQAQIELSQQMTQMVQMQMQWMQQM





IHMQGGQSTQLMPPGGPPPTLGANLNARPSSMPSAGNMNNPHAGYSGDQRTLSMLDPNVSSRLN





SAAMPHVSGGLRPSTPAGQGYAPSIAPSERSNVGLAPRYRPVSTIQPDLGNVGSPSIPKSWSDE





NRKSSLSAAVPAAPQASQMSHRPMPSNSKPIRASKLNVGADQDDEDDDEGWAEMMKKRENKRNN





WKMKKETSSFGDLLNAVH




Transglucosidase
MYGSQSGHQSSAPPQPEWRLPSSTQQPASSRHHPPQPSWRASSPPPPPPPPRPTTTTTSSSSPF
216



CAK41498.1
NPTVYGQISNPPSTVNNYPGTSVSPVSAVAGSETTSWGVKYNRHQLHAQSPPPLPPRPSSTAQS





PQAQSPVVSPLDPNKPLPAAPGWATQSADNTSYQQWPSNPPYAPQQSVSASSLQPPPPPPAIST





GYQSSSAQQSNPWQQPPAVPPPPYSGVPLGQYQDSSVQQPATLPSNQQITGHNAPNPAIPQAPS





PKPSTGLHYESQPLPGPPQAPVAATLSPTHGTPPVVPPPVPPKTSPITVPTSASVLATGGPSDW





EHLSPIPGSIDDLGAFGSRPQDGSSSEPLSQASQSRPPNIGEPVRKDESVSPITPPNNAPQMTS





QTEGPLASQTVVRGNPHQPVRMGSTGSVSSDISTSETPESIDGIIEAWNRPISSQPSAEQNPQS





SASGVRAGILPPSRKQSPIGTPTPRQESIIPR





KQVRSGSSSVESSSVTNGPTTDKRTILPAFVPLDPYDDLDPWSKSSLERYVAMLRKEAVADSDA





ERYNIFTAFMAKETKLREILFNIEPESTRVGENPKVSSRQPTPILRASTSVSNDDTESGLIPVE





TEGGHVVSTTDDADSEDGSYSPGGRPILPRIQTPGATKLQRSASHTVSNKYNTDHVAHATSSRA





TSVPPSMLGDARHEHALPPLTTNPPQPIYIPFRYTEGPQRGSDVLVFDRPAYQAYSDLRQASAE





SGRVMSNAPAPTPGERPDSAVPSRRNEHDETFIGLIREKSVAYRKRAPRKTSSPPPLPAALRHG





KPASPVDDLRSMASSPLSKQSESSWNMTTRKDLENYSSDFSYIREAVKSWEISSKSRREQLDKE





RIHRQEVSEKRIDALFNGKEIGYADINLLEEEFRQKEARAQLDEERQELDKFVAEVFEPLDQRL





KEEIAALQALYEAALAQLDHENGRTKSATTDR





YNLSHTMRTVNEIYRKLELRYQKRLEIALDRERRRKKAERRPLVFMGDSVALKGVDQEFDQMEK





RNILEAARERDHRANRLMDSFDDAIMHGLGENQSLLDEVAAKVAKVDTATIRSSGLPESEVEQL





LKSVYNLIESLRKDSESILHNFNMADSVLNDADYSVSVAEARYSDADADVFRRLDDEKRKEDTK





IQTDLKTKLESIRSGPANIVTSINGLLESLGKPPIIDQTGPSSQMPADTPASVSQHLPAEIAPQ





KPQEDPEHQERLRKALENAKRRNAARVNTEISRP




Transglucosidase
MCNKSNYSSPKWWKESVVYQVYPASFNCGKSTTNTNGWGDVTGIIEKVPYLESLGVDISQTSRE
217



CAK41767.1
QCLTSLSLVYTSPQVDMGYDIADYESIDPRYGTLADVDLLIKTLKDHDMKLMMDLVVNHTSDQH





SWFVESANSKDSPKRDWYIWRPAKGFDEAGNPVPPNNWAQILGDTLSAWTWHAETQEFYLTLHT





SAQAELNWENPDVVTAVYDVMEFWLRRGICGFRMDVINFISKDQSFPDAPIIDPASKYQPGEQF





YTNGPRFHEFMHGIYDNVLSKYDTITVGETPYVTDMKEIIKTVGSTAKELNMAFNFDHMEIEDI





KTKGESKWSLRDWKLTELKGILSGWQKRMREWDGWNAIFLECHDQARSVSRYTNDSDEFRDRGA





KLLALLETTLGGTIFLYQGQEIGMRNFPVEWDPDTEYKDIESVNFWKKSKELHPVGSEGLAQAR





TLLQKKARDHARTPMQWSADPHAGFTVPDATPWMRVNDDYGTVNVEAQMSFPWEMKGELSVWQY





WQQALQRRKLHKGAFVYGDFEDLDYHNELVFAYSRTSADGKETWLVAMNWTTDAVEWTVPSGIH





VTRWVSSTLQTAPLMAGQSTVTLRALEGVVGCCS




Transglucosidase
MPLFNAKTLLAGLCAASIVSPSLGLPQSSHPGSSAVANTQGRASSESHPSWTLESDSTAATVST
218



CAK41979.1
SNTPLYSPSSSATASLGATQGSLTDDHDGASKSSTRSYGVTTISYSSAPVNNPQSAHDASASPS





TPSGPHSHTTTLSSSSAGVPAQSQTTSSSRSHSVVSKETSTRSPSSLFTPSASTETPTNPSHTP





STYTISTHSFSSSEHASSESATSFHAVSTSKHTHTHTPTSSTSSSNTPTRSSALTQHETSTSSS





TPTRSHTHTASTPASSKANTSSSIKTHTTHSHTEDTTSSSASHTPTSSKSSSSSAEDVSSSPAS





HSPTPSTHSITTTTNTDTSQSASITSGPSTTPNSTITTTSSTTTSSVDVYAIIKHLYKLVKDTY





PVIKKWKEDPKSVKASDLIKPLKRVIPVADDVLDVLGAPSSLSSSSGDSESVLDSCSSGGGLLG





DLIGIASCISSTADEAVSILGSSSDSDSSDESTLSSYFDAFETEGSSLSAVGVTATGSTASSTG





TTTTTSSGDTSSKSTKTATSTNTDSDTSTQSTKTKTSTKSTTTSDSSSSAGSGGHSASASASPT





STKSSTSTKESSTSTKTKTKSNTESSSKAASSSAAASKTDSSSSSAKSTSTDSTSTKKTTSTKS





TATSSSAPLSASSSAHTSSIATTNTTSTSTNSKSSTSTDTTTIIIHTHSGTASGTTTHHTSTVT





PSPSRNQTATTLVTTTSSYTPPLCYNHADPDNGAGNVCICTRSNGDYTTLSELPSGSGCSYTSI





PTPTTTSTTKTTSTKTTSDPPFTVTELNSDVIVCATSTLSYFSTFTYTQCAGSSSTIYTAPTPT





PTAQVVIAYLSDVYSFWSFFTPDIGSSIDFCNDAEAGELEASGSIKVIDPPYPDGTKELDFEIH





DMKDCVYKGTSDEPGTFTCPDLAKTVDCESYGENKVHDCYGALSDGGVVYEEGIDCASIGSVEV




Transglucosidase
MHLSKISAILTPVLNAAAVLSSQAPADDLSVLSSEVARANNQSLLWGPYKPNLYFGVRPRIPNS
219



CAL00956.1
LFAGLMWAKVDNYATAQQNFRHTCEQNEGMAGYGWDEYDIRKGGRQTIHDAGNSLDLTIDFVKV





PGGQHGGSWAARVKGVPRGDADPDQPTSVLFYAGLEGLGNLGVEGEPEDPRGFTGDVKLGGFTT





DLGDFSIDVTSGPESNEYPEHGHPTYDEKPLDRTLVSSLTMHPEQLWQTKVIMFTQMKKEVDEM





VEKYGSENPPPPYQLFTIKNEPGDGNMHLVQKVFKGSFEFDILFSSASSPQPMTSELLTEQISS





ASLEFSERFESVHPPQAPFDTAEYTEFSKSMLSNLVGGIGFFHGTDIVDRSAAPEYDEENEGFW





EETAEARGRAQPILEGPKDLFTCVPSRPFFPRGFLWDEGFHLIPVIDWDTDLALEIVKSWLSLM





DEDGWIAREQILGSEARSKVPPEFTIQYPHYANPPTLFIILEAFIDKLDAKKNASMQTYADSGV





TGNLRSIFVDQPELGEAFIRSIYPLLKKHYYWYRSTQKGDIKSYDREAYSTREAYRWRGRSIQH





ILTSGLDDYPRPQPPHPGELHVDLMSWMGMMTRALRRIAVTIGETEDAEVFKTYETAIERNIDD





LHWDDDARTYCDATIDEYEEHVHVCHKGYISIFPFLTGMLGPDSPRLKAILDLIGDPEELWSDY





GIRSLSKKDQFYGTAENYWRSPIWVNINYLVLKNLYDIAIVSGPHKEQARELYSNLRKNLVENV





FQEWKKTGFAWEQYNPETGSGQRTQHFTGWTSMVVKMMSMPDLPASEQKGHDEL




Transglucosidase
MEVMDMPKRNSPVSQPTAVAMASTSPHPEKKASDAPYIVDDDFPGDDDDDDDVSISPISERAPP
220



CAL00976.1
WSGTRWARFFPELSSHFSLASPTNSTNPPFPQPLTKGPPHIDGPSQQPERRSKGPSSLSSEDVA





DNRSSSYTSRSSLTSQGSEATSPVHKLVDSLHIKSPTKAGVFDESKFAHQIPPPFPSRSSIAQS





KDKPLPQEPPIELTPLSIRHKTPQIPDRPGYLSRLDPPPRSKKHASHHHPTLSQACTDLERTLA





GLAEQQHSPAQLSPRSPLQILDGPLQISRGNMDMVATRPAPRPPASVHDNRQIHKAKSREDMKQ





TKKLLKNKPSFSFTVPAFGRKLSRVHHRSTSNTSSKSEPESYRASVLHQPAVAELGDSEVAELQ





GSSVIGFRERPSSAGGEKELRMRLPRLQTKEMGAPGRKRDNIHSTHEGPEQLRRPNGRARGASV





GEKYFVSYSKLDGMPVHSTRQHQPTSQTSCMVYELEGGSTQPPAELQGDTTSPIDVVPVRISVG





VSSVGAMPGTLPDRVILTVLEHITSLDDLFNVAVSRKDFYRVFKMHELKLIRIAVFAMSAPAWE





LREMSPPWDTEWHFVLDPDAPVPEYTPSCYLKRYAEDIYTLAHLKSLILARCGTFLRPETIRGL





SGTDDIRAAEVDDAFWRVWTFCRLFGSGKGREGDIAGQLDWLRGGEVARNRRVSEFTSIADPYD





ANSVLFEPPTGFGDGNNGGLSKEQLLDMTEIWTCLGVLLQPMHEEWAYYILTLGLSAVLVLGSI





HPYDNTTAVFQRAHSMGLTNWEASDTGASRSSFLREAVSKACQPRGSSTSQASMRSSGFSSQPS





GSHDVSQTSNVRGEREPSPDFHRRRQAAYSAQLRIQRQQQPSPPNPMLAEERPISHYATIMSRL





EGLPPAPQPPMSVSRIEIPPTTHSYMTNVSYMQPLQSVTPVYYPPQVRDPVDHAIDIMVRELGF





GEEDAKWALKITDSGEGINVNAAISLLTRERKTHEQSSRGFSLRKRKSFLSSVINSPESRHSGW





KWA




Transglucosidase
MEPLRRSQSSRSMRRSHHSSQSTEPFDPELARFQATTAASRAMLRSKCSYDVLGGPSKMAVPQR
221



CAK42352.1
QHRPAGAALNATNAPVEDVDLRRSVLDKTSDLSPPAGLPSIREFGRLDAGIATLPSSYRRLRKT





RSMFTNWQRSSHVPRGLSSPGCPTHNILTRREPQDVLRAPGTLRRSMSFFRGDTQNSDSLRYAR





GQDVAIEMARSHYQQPEIYPTELRKSSLTVPKSRPFKKTLRSVVSDAESASVSPAIQRSTNVIS





YGKARSLSSSLKKGLKKVLELSRPSSARISLGKSSSNDQQRIQGSPSTISAKHSDPLNSGVEGN





ALTHSPDTEGTVVYTGVKKSESSESLATSRSRVTSWADSTIANTVITYRADDHSSLSVIDEHDS





SCLKPSSSEDVSLTTCRTPKPNCTIDSQRLYSALMKRIDGNKTENASKEIVLGHVREHRAIPTP





VSSMYTRRSRKTIRLIASDESLQSPGSYTTADVGTVTPCEPAQRQAQRTHGQKHLQDIRFGSAN





LASSKHTIKEDSRDETGNMAMGRSQSPEEDEDSPSMYSRSTGGTSPKTTDPKMGESDPEVANEP





GVATIYASQRAIYSSPKRNADQGPEAMQRPSADWQQWVRTQMERIEYLTPTRRHYREDAQIQDE





TADLAYRTPSRDRRGFWSGSPDEDLRTTCKVTARNNFSRPFSRSSSVRTTVIAPKEQADTLVPP





PPPSDSTPKVLSSSSGRSLFINQVETRPDGMALSPVPALLNNRYRAPESPTPRRDATDKARWRA





GGRRYGRQPSRLLPEAQDSKASQIRSSRVPQENRRLTDENVRLENGYQEVASKDSQLQNMYSPI





SSKRMVEMFLESRRRRMGTEMSDGAPSKDDGTKLSSDSVYDRHHIESLFYSLAPMYETPTN




Transglucosidase
MANIIWLALVLVALSIHVQAKDVFAHFILANAENFTQTHWTRDISAAKAAQIDAFALNTGYGAA
222



CAK42453.1
NTDQLLTDAFTVAAAHDFKLFLSLDYSGDGHWPPDQVLKVLQGYANHTAYYRVDNKHPLVSTFE





GYEALADWSTIKEKLPNIYFMPEWSVRTPQELASEDAVDGLLSWSAWPYGTTPMNTSTDEQYIS





ALKAKDKPYIMPVSPWFYTDMVRYHKNWVWQGDGLWHTRWKQVLDLQPQFVEILTWNDFGESHY





IGPLHENELGIFSFGQAPFNYASGMVHDAWREFLPYVVGEYKNGSGKGVIDKEGVVVWYRVTPA





WACKAGLTTGNSVTQGQQTMPPGQVLKDEVFFQALLEDTADVEVSIGGGENKSVGWTDTPSGGS





SGGGRGLYFGSVPMDNRTGEVVVTLSRNGKFVAQMIGEKITTQCPDKLTNWNAWVGTAMSNVSN





ASTSRASLSEENGAASVRVGGGRGMDMWMGALWMVVVVGIRADRSIPTKWACGSQINEEVLIQR





RERLPLVEGDRVYLDSSSKAFRRRRRTCTR




Transglucosidase
MKVPADHALLLSSLLLAPSVGASTCQEPINHPGEPFSFVQPLNTSILTPYGGSPPVFPSPETKG
223



CAK42457.1
KGGWEKAMAQAKNWVSQLTVEEKAWMATGQPGPCVGNILPIPRLNFTGLCLQNGPQCIQQGDYS





SVFVSGVSAAASWDRKLLYDRGYAMATEHKGKGTHVVLGPIGGPLGRSPYDGRTWEGFAADPYL





TGVCMEETILGIQDAGVQANAKHFIANEQETQRNPTYAPDANATTYIQDSVSSNLDDRTLHEIY





MWPFANAARARVASFMCSYNRVNGSHSCQNSYLLNHLLKTELGFQGYVMSDWGATHSGVASAES





GMDMTMPGGFTVYGELWTEGSYFGKNLTEAINNGTITTDRIDDMIVRIMTPYFWLGQDKNYPSV





DASVGPLNVDSPPDTWLYDWKFTGPSNRDVRGNNSAMIREHGAASTVLLKNERNALPLRKPRNI





VIVGNDAGSDTQGPSTQTDFEYGVLANAGGSGTCRFSYLSTPQDAITTRARQYGGRVQTWLNNT





LITEKSMPELWNPEQPDVCLVFLKSWSEENVDRTYLTLDWNGNAVVEAVAKYCNNTVVVTHSAG





VNVLPFADHPNVTAILAAHYPGEEAGNAIADLLYGDANPSAKLPYVIAYNESDYNAPLTTAVAT





NGTYDWQSWFDEELEVGYRYFDAHNIPVRYEFGFGLSYTTYNLTKLVAAKPVASNLTALPEQRA





VQPGGNPALWDTVYTLTAQVSNTGSVDGYAIPQLYVGFPDTAPAGTPPSQLRGFDKIWLEAGET





KKVTFELMRRDVSYWDVTAQDWRIPAGEFTFKAGFSSRDFHANATATFFRK




Transglucosidase
MHLRRIFVLTVLSYVTALPSDINLGVALRGCDVEACDMECRMAGSIGGNCGGNPALNLLGLPLL
224



CAK42741.1
STNTPNSSSAGPVTLAATETLTDVESTTTTKTTTDRESVTATETTTDIESTTATQTVTDTHLLT





VTKSITEKQPTTATQTATDTKFLRTTQTINNTLTATQTTTDIESLRVTKTKNNTITATQTTTDI





EPTTATQTINNTLTATQTTTDTESYTAMEITTATASFTTTQTTTDTESITATQNITDTQTIHHT





ESLTATRTITDTDSVTATATPTTVTDTQTSISTTTATQTATPTPEVGACFCCTEQVRLPYELNG





NCDNIPVSNSTDGCPSGDDPKRNHLLCCDSSGYCTQLS




Transglucosidase
MGSYTFTWPYNANEVFVTGTFDDWGKTVKLDRVGDVFEKEVPLPVTDEKVHYKFVVDGIWTTDN
225



CAK46804.1
RAPEEDDGSSNINNVLYPDQILKDSTTPLLNGTAAMAGVTPGSTTAALAAGVPKESSSKHGQNG





YYPTISSAAPGSTTAALGQDVPLEQRANVPGSFPVTPASEADKFSVNPIPASSGAGNPIKLNPG





EKVPDSSTFNTNTISSTARTDRAGYEQGTSGGFPGSPAYDASAFAIPPVSKNMIPESSLPMGEN





QGATEPTYTIQSAAPTSTTAGLAAAVPLESQRQTSSGAPTRDVPDVVRQSMSEAHRDPEAATNK





EAVDEKKEMEEELRRKVPVDNSTGAPAPTTVAGLGTSSGLGFTAGAAPSTNLGPSTGLDVATGM





GTTTGLDSVSGPTAAQSFQKETTSGLPAHDVPDVVKQSISEAHKDPEAAGVEEAVGEKREVEEE





LQQKVPVSNQSGTPAPVITAATSETAPGSGAE





PASERAPRATGGGPASAQISPRATTPTDGPTVTTGVATSKAPEESGPGASGREETTEIPTKPAA





GATGASATKTVDSGVESGIAPEDTTSAPTAGATGASATKTADPTETSGAPTSGASKPAESAPTN





NAAATSKPATNNAAGAATNGKEEKKKKGFFSRLKEKLKSV




Transglucosidase
MARVDFWHTASIPRLNIPALRMSDGPNGVRGTRFFNGIPAACFPCATALGATWDAHLLHEVGQL
226



CAK97412.1
MGDESIAKGSHIVLGPTINIQRSPLGGRGFESFAEDGVLSGILAGNYCKGLQEKGVAATLKHFV





CNDQEHERLAVSSIVTMRALREIYLLPFQLAMRICPTACVMTAYNKVNGTHVSENKELITDILR





KEWNWDGLVMSDWFGTYTTSDAINAGLDLEMPGKTRWRGSALAHAVSSNKVAEFVLDDRVRNIL





NLVNWVEPLGIPEHAPEKALNRPQDRDLLRRAAAESVVLMKNEDNILPLRKDKPILVIGPNAQI





AAYCGGGSASLDPYYTVSPFEGVTAKATSEVQFSQGVYSHKELPLLGPLLKTQDGKPGFTFRVY





NEPPSHKDRTLVDELHLLRSSGFLMDYINPKIHSFTFFVDMEGYFTPTESGVYDFGVTVVGTGR





LLIDNETVVDNTKNQRQGTAFFGNATVEERGSKHLNAGQTYKVVLEFGSAPTSDLDTRGIVVFG





PGGFRFGAARQVSQEELISNAVSQASQASQVIIFAGLTSEWETEGNDREHMDLPPGTDEMISRV





LDANPDNTVVCLQSGTPVTMPWVHKAKALVHAWFGGNECGNGIADVLFGDVNPSAKLPVTFPVR





LQDNPSYLNFRSERGRVLYGEDVYVGYRYYEKTNVKPLYPFGHGLSYTTFSRSDLKITTSPEKS





TLTDGEPITATVQVKNTGTVAGAEIVQLWVLPPKTEVNRPVRELKGFTKVFLQPGEEKQVEIVV





EKKLATSWWDEQRGKWASEKGTYGVSVTGTGEEELSGEFGVERTRYWVGL




Transglucosidase
MAVRRSARLRSRQATEPEAPADPVVTDNNAPCDTNNHNNSENTSEIDTTMARLGKQPERLPPVV
227



CAK43189.1
EHEEPADAAKDVPVQRSRKKTKTETASKRRSKVEAKEPVAEVTPVIAESQSNTDTTEPAVAETE





KKPTPKAVPEPAVAETEKNPTPKALPETASKLSTPKKSIPTLKGTPVHRNTPVHRSTPVHKSTP





TRTPSSTLVRPSHQEMHPSKVRQSTTKQADSGLILGFKPIKKDAEGKVIKDTLADNTPTKAKAS





PAPYYGTPAFEFKFSCESQLSDEAKKLMETVREDAAKIKAQMTLEPDQNRAEAADRKIVQPKGK





ASRFSDVHMAEFKKMDSIAGHASAFRATPGRFQPVVKTLKRTNSKARLDESDRNSPSPSKIARP





SPAIVAPASNKRVKHDKADDASTRRPTAASPPKPVQPRPRSTVRSSLMTPTRSSAARASSVTAR





PPRTSMIPSLVRSPAAKPADVPRTPQTEFNPRLKSNLPTLGNLKSILRRHQPLFSKDPSKIAAG





THVAAPDFTSNLLFGSRGTTEEPAQTPSPKKRVEFTPSVKARHEEVMFSPSPSKVPVASPSRTT





SDVVYPTLPVLTPEQNRVSAKSPAQATTPTIRHVRPSDVHANPLPEVAGVPHGIGHKKRTRESG





EDTKTNDLPEVAGVPHGIGQKKRNRAALEDETDTENVPPVDLTADARSAKRMKMTSPSPLKAPT





LSARKVAAPSPTKAATPSPTKPRSHTPLRSATTSRMSTPGISTPASVRARNRGVLSVSRLNMLA





QPKNRG




Transglucosidase
MSLRWRKKTHWPPSPCVEDEVVSLSRELHGLSQIREMPGLEGVCSRGSVDQYPVLVDVFSYSSY
228



CAK43257.1
EETTVIYEDFSRDSSSEDNVGPPTPVDEKQDPMLYLVGDDQAVSLSAPLATREQSQDPSKTPAD





NEQGSTTRGRPRADTRAQRDSPSKKDTAQSSRDASRASNIRTPAVTQSKSTPSLPRRFGSVKHG





RSADALTTKSGYQSDSATVKSKAKPETVDKSAAPQTDKKSPTGLTVAERLEEKIRQRQELRAKE





SSGDAPKTPSPPSTDQPSVPVGRAAEPSITPAATAAPKSTAPKTRPRSTSTPKEPTNDHRVDGP





EASSSAAAPALQLPPRPGLASKPAGRSVSSNDAKSTAQLPARRAVSFLDDVPQRSSSLPRTPED





IPEPPPLPRRRSSSQDAVRHRSSSQDAVRQTSPKRPFFLPPCPRSTPIAGYQDWHTVKGLPHLN





ICPSCMKQMRKSEFRDHFVLASPRSRGEKIRCSMSEPWTRLAWMQTLKKQLDHLELLHQITRPP





LSIKPCPGRIITEQHWYRIVDPETNMYLPQFNVCSACVRNLRVLMPQHRDTFKRSSTKQERACD





FLTDSPRFVRYIDYLDIAANRADQENMLRPDVTEFLSYARRKVVLRDCRRDRRILSTWHYMPQL





PELTVCEDCYDDVVWPLVRAKQPIARKFSTSMRLLPGDGPSRCREASCQLYSPRMRAKFAEAVQ





SNDLMYLKQVALRRRDAEQRYRDREEELLEDASRGYDVEGEMRRNVEEWKRNE




Transglucosidase
MSLLPVILVTFFLVFCCAAGPIAPFASKRDLESLSPSPSLTTPYVHVASSSVDDDDDNEINALT
229



CAK97469.1
VVPVTPSSLPQTASSSSTTIEPALSSSAAFIQESSIVAATTSSLSESSSAVTHSFAPSTSSDST





VNTLSQTLTSTTTTTTTTSSPTTLGEPSSAPFSPLSSAVRSSHTTSSSSHIMHITTPSTTLSES





SQIVTPSIIIPGGPMEASSSHAPGTATSSHITHETSSSAVRVSHSSSAAAEMSKSSPSHQGTLN





SSSRLLHSSTAALLPSSTAPESAPETSSTRTSETTTSTSLTIGVIVPLPEASTSPSTMLEMSSS





TSQSSESITTTADDTSSASSTKVLSTPTESETTTPTSHTASPFIGVTIPTTTKTTQPADPADIT





TTTTTPTSEAEDATTTSAPTPVVVLVTPEGSTTVIGTSSFIAPNPDITSTSTSTSSLTTTIEPT





PTTSTTEPPTTLHATSTTIQTVYVVITDTPTPEPTWDSTTIATAIITVYDKSTSETVPTTTTTS





RAGEEMESTTTTPLDTEQTSTQSTEDEIPTSTPIADTETATAIITSYPSTSTLSGETGDVIRIV





PVTPTGPITVTVTVTEKERETVTKTETVTERVTETESVTT




Transglucosidase
MPQKEFVPKTYQESSTGAQSSSSVHLRSSPEERSFDFSFEPIRENLFRVTFSSQDHPLPPYPSV
230



CAK97480.1
KPATSLDGVHVSATGGSNQKTIEVGDVTASVEWSNTPVVSLSWKGTEKPLYRDLPLRSYVADS





TGIAHYTEHDRDCLHVGLGEKRAPMDLTGRHFQLSATDSFGYDVYNTDPLYKHIPLLIKASPDG





CVAIFSTTHGRGTWSVGSEVDGLWGHFKVYRQDYGGLEQYLIVGKTLKDVVRSYAELVGLPILV





PRWAYGYISGGYKYTMLDDPPAHEALMEFADKLEEHGIPCSAHQMSSGYSIAETEPKVRNVFTW





NKYRFPNPEEWIAKYHGRGIRLLSNIKPFLLASHPDFQKLIDGNGFFKDPESSKPGYMRLWSAG





GATGGDGCHIDFSSAVAFKWWYDGVQSLKRAGIDAMWNDNNEYTLPDDDWKLALDEPTVSDAVK





KGVENSVGQWGRAMHTELMGKASHDALLNIEPNHRPFVLTRSATAGTMRYAASTWSGDNVTSWE





GMKGANALSLSAGISLLQCCGHDIGGFEGPQPSPELLLRWIQLGIHSPRFAINCFKTSPGNSSV





GDVIEPWMYPEITPLVRDTIKRRYEILPYIYSLGLESHLTASPPQRWVGWGYESDPEVWTKALK





SGDEQFWFGDTIMVGGVYEPGVSVAKLYLPRKANDQFDFGYVNMNEPYNYLASGQWVEVPSEWR





KSIPLLARIGGAIPVGKPVHTRVPGDDTPASVAVKEVDDYRGVEIFPPLGSSHGQVFSTTWFED





DGISLEARISEYTVTYSSTEEKVIVGFSRDEKSGFVPAWTDLDIILHNGDERRVVSDIGKTVEY





KGKGSRGRVVYTLKN




Transglucosidase
MNEGRLAHPQFNQYSFKAGASTVQAEAAPALNYEDASTHNAAKNATKRSGRKGDQTYTYSIPPE
231



CAK47332.1
LAEAARLVAEASPQPVPTDYGVDISLVVSKYRKYDNNDTNVPKQKYVEPNGLDGYVHTGQPEDS





PEIHTELKKRATTDFWLTQMGDSGSSPYAPDGYKVWRNVRDYGAKGDGITDDTAAINKAISDGG





RCGAECGSSTIYPAFVYFPAGKYLVSSPIIQYYNTEFYGNPFDYPTILAASSFVGLGVITSDVY





TGDDTEWYINQNNFLRSIRNFKMDITRTDPNAYVCAIHWQVAQGTSLENIEFYMMQDGLTTQQG





IYMENGSGGFLTNLTFVGGNFGYVYAFSQRCTPLSDLPSGHTLATQFTSTSLTFMNCKTALQVH





WDWAWTMQDVVVENCTNGIVIVGGAGGPKSTGQSVGSLILVDAVIAHTQTGIVTTLLAENSTSF





LLQGVVFIEVDTAILDSAQGKTLMAGGSNVPVFSWGFGRVVTTGAESTFYNGQDIPRTNRSVPL





TTIGYIEPNFYLRRRPTYRDIGMSQVINVKDWGAAGDGKTDDTAVLNSILDRAANMSSIVFFPY





GVYIIRDTLRVPVNSRIMGQVWSQIMATGPKFQDEQNPHIAVQVGQVGDRGIVEIQSLMFTVSG





PTAGAVLMEWNVHQVIQGSAGMWDSHFRVGGATGSQLQADECPKGSGVVLPACKAASLLLHLTS





QSSAYLENIWLWVADHDLDLQDQAQIDVYSARGLLVESQGPTWLYGTASEHNVLYQYQVSQARD





LYMGMIQTESPYFQNVPPAPSPFSPGLFPNDPTFSDCDSDSQTCPVSWALRIIDSTSVYSMGAG





IYSWFSAYSQDCLDTESCQQHAVGISQSTNTWLYNLVTKGIAEMVTPTNEHPTLSADNVNGFMS





SILAWVRLANTTIGARKFPGFQLYQPKWLDGLTDTCKTALSQKILCHPYLEMKFSNPGIGQYID





NNTLADEVCDQGCGESLQMWTTNVANSCLNQTIDDTDPVAAGGYIYAGYNLTCLRDPHTKKYCP





DVLSHFTIVDSVRSMTLAEMCSYCFTTSLEMRQASPYAAYTDVDKDALETVNAECGLSGPTDLH





KPLYTEDEVDRPICMSGITHTTSEGDTCDLLAYKYHVASAVIQLANPMLVNDCSELIPGRQLCM





PLSCDTQYTLQDNDTCLSIEWAQPIGFGEVRRYNPWLNVDCTNLQTTRQVHGSVLCLSPQGGSH





NVTGTGSPCPGISDGYTNVVQYAPTNSTIAKGTTCYCGKWYTVQQGDSCATICIKQGIPSSLFL





AVNPSLSTSDCDTSLQVGYTYCVGPDTHWDDTDNFWGEFACEAY




Transglucosidase
CTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWETSDGIALSADKYTSSDPLWYVTVTLPAG
232



1ACZ_A
ESFEYKFIRIESDDSVEWESDPNREYTVPQACGTSTATVTDTWX




Transglucosidase
MSNRWTLLLSLVILLGCLVIPGVTVKHENFKTCSQSGFCKRNRAFADDAAAQGSSWASPYELDS
233



ACF60497.1
SSIQFKDGQLHGTILKSVSPNEKVKLPLVVSFLESGAARVVVDEEKRMNGDIQLRHDSKARKER





YNEAEKWVLVGGLELSKTATLRPETESGFTRVLYGPDNQFEAVIRHAPFSADFKRDGQTHVQLN





NKGYLNMEHWRPKVEVEGEGEQQTQEDESTWWDESFGGNTDTKPRGPESVGLDITFPGYKHVFG





IPEHADSLSLKETRGGEGNHEEPYRMYNADVFEYELSSPMTLYGAIPFMQAHRKDSTVGVFWLN





AAETWVDIVKSTSSPNPLALGVGATTDTQSHWFSESGQLDVFVFLGPTPQEISKTYGELTGYTQ





LPQHFAIAYHQCRWNYITDEDVKEVDRNFDKYQIPYDVIWLDIEYTDDRKYFTWDPLSFPDPIS





MEEQLDESERKLVVIIDPHIKNQDKYSIVQEMKSKDLATKNKDGEIYDGWCWPGSSHWIDTFNP





AAIKWWVSLFKFDKFKGTLSNVFIWNDMNEPSVFNGPETTMPKDNLHHGNWEHRDIHNVHGITL





VNATYDALLERKKGEIRRPFILTRSYYAGAQRMSAMWTGDNQATWEHLAASIPMVLNNGIAGFP





FAGADVGGFFQNPSKELLTRWYQAGIWYPFFRAHAHIDTRRREPYLIAEPHRSIISQAIRLRYQ





LLPAWYTAFHEASVNGMPIVRPQYYAHPWDEAGFAIDDQLYLGSTGLLAKPVVSEEATTADIYL





ADDEKYYDYFDYTVYQGAGKRHTVPAPMETVPLLMQGGHVIPRKDRPRRSSALMRWDPYTLVVV





LDKNGQADGSLYVDDGETFDYKRGAYIHRRFRFQESALVSEDVGTKGPKTAEYLKTMANVRVER





VVVVDPPKEWQGKTSVTVIEDGASAASTASMQYHSQPDGKAAYAVVKNPNVGIGKTWRIEF




Transglucosidase
MSFRSLLALSGLVCTGLANVISKRATLDSWLSNEATVARTAILNNIGADGAWVSGADSGIVVAS
234



CAY05387.1
PSTDNPDYFYTWTRDSGLVLKTLVDLFRNGDTSLLSTIENYISAQAIVQGISNPSGDLSSGAGL





GEPKFNVDETAYTGSWGRPQRDGPALRATAMIGFRQWLLDNGYTSTATDIVWPLVRNDLSYVAQ





YWNQTGYDLWEEVNGSSFFTIAVQHRALVEGSAFATAVGSSCSWCDSQAPEILCYLQSFWTGSF





ILANFDSSRSGKDANTLLGSIHTFDPEAACDDSTFQPCSPRALANHKEVVDSFRSIYTLNDGLS





DSEAVAVGRYPEDTYYNGNPWFLCTLAAAEQLYDALYQWDKQGSLEVTDVSLDFFKALYSDAAT





GTYSSSSSTYSSIVDAVKTFADGFVSIVETHAASNGSMSEQYDKSDGEQLSARDLTWSYAALLT





ANNRRNSVVPASWGETSASSVPGTCAATSAIGTYSSVTVTSWPSIVATGGTTTTATPTGSGSVT





STSKTTATASKTSTSTSSTSCTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWETSDGIALS





ADKYTSGDPLWYVTVTLPAGESFEYKFIRIESDDSVEWESDPNREYTVPQACGTSTATVTDTWR




Transglucosidase
MSFRSLLALSGLVCTGLANVISKRATWDSWLSNEATVARTAILNNIGADGAWVSGADSGIVVAS
235



CAY05391.1
PSTDNPDYFYTWTRDSGLVLKTLVDLFRNGDTSLLSTIENYISAQAIVQGISNPSGDLSSGAGL





GEPKFNVDETAYTGSWGRPQRDGPALRATAMIGFGQWLLDNGYTSTATDIVWPLVRNDLSYVAQ





YWNQTGYDLWEVNGSSFFTIAVQHRALVEGSAFATAVGSSCSWCDSQAPEILCYLQSFWTGSFI





LANFDSSRSAKDANTLLLGSIHTFDPEAACDDSTFQPCSPRALANHKEVVDSFRSIYTLNDGLS





DSEAVAVGRYPEDTYYNGNPWFLCTLAAAEQLYDALYQWDKQGSLEVTDVSLDFFKALYSDATG





TYSSSSSTYSSIVDAVKTFADGFVSIVETHAASNGSMSEQYDKSDGEQLSARDLTWSYAALLTA





NNRRNVVPSASWGETSASSVPGTCAATSAIGTYSSVTVTSWPSIVATGGTTTTATPTGSGSVTS





TSKTTATASKTSTSTSSTSCTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWETSDGIALSA





DKYTSSDPLWYVTVTLPAGESFEYKFIRIESDDSVEWESDPNREYTVPQACGTSTATVTDTWR




Transglucosidase
CTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWETSDGIALSADKYTSSDPLWYVTVTLPAG
236



1AC0_A
ESFEYKFIRIESDDSVEWESDPNREYTVPQACGTSTATVTDTWR




Transglucosidase
MSNRWTLLLSLVILLGCLVIPGVTVKHENFKTCSQSGFCKRNRAFADDAAAQGSSWASPYELDS
237



CAS97680.1
SSIQFKDGQLHGTILKSVSPNEKVKLPLVVSFLESGAARVVVDEEKRMNGDIQLRHDSKARKER





YNEAEKWVLVGGLELSKTATLRPETESGFTRVLYGPDNQFEAVIRHAPFSADFKRDGQTHVQLN





NKGYLNMEHWRPKVEVEGEGEQQTQEDESTWWDESFGGNTDTKPRGPESVGLDITFPGYKHVFG





IPEHADSLSLKETRGGEGNHEEPYRMYNADVFEYELSSPMTLYGAIPFMQAHRKDSTVGVFWLN





AAETWVDIVKSTSSPNPLALGVGATTDTQSHWFSESGQLDVFVFLGPTPQEISKTYGELTGYTQ





LPQHFAIAYHQCRWNYITDEDVKEVDRNFDKYQIPYDVIWLDIEYTDDRKYFTWDPLSFPDPIS





MEEQLDESERKLVVIIDPHIKNQDKYSIVQEMKSKDLATKNKDGEIYDGWCWPGSSHWIDTFNP





AAIKWWVSLFKFDKFKGTLSNVFIWNDMNEPSVFNGPETTMPKDNLHHGNWEHRDIHNVHGITL





VNATYDALLERKKGEIRRPFILTRSYYAGAQRMSAMWTGDNQATWEHLAASIPMVLNNGIAGFP





FAGADVGGFFQNPSKELLTRWYQAGIWYPFFRAHAHIDTRRREPYLIAEPHRSIISQAIRLRYQ





LLPAWYTAFHEASVNGMPIVRPQYYAHPWDEAGFAIDDQLYLGSTGLLAKPVVSEEATTADIYL





ADDEKYYDYFDYTVYQGAGKRHTVPAPMETVPLLMQGGHVIPRKDRPRRSSALMRWDPYTLVVV





LDKNGQADGSLYVDDGETFDYERGAYIHRRFRFQESALVSEDVGTKGPKTAEYLKTMANVRVER





VVVVDPPKEWQGKTSVTVIEDGASAASTASMQYHSQPDGKAAYAVVKNPNVGIGKTWRIEF




Transglucosidase
CTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWETSDGIALSADKYTSSDPLWYVTVTLPAG
239



1KUL_A
ESFEYKFIRIESDDSVEWESDPNREYTVPQACGTSTATVTDTWR




Transglucosidase
CTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWETSDGIALSADKYTSSDPLWYVTVTLPAG
240



1KUM_A
ESFEYKFIRIESDDSVEWESDPNREYTVPQACGTSTATVTDTWR




Transglucosidase
MSFRSLLALSGLVCTGLANVISKRATLDSWLSNEATVARTAILSNIGADGAWVSGADSGIVVAS
241



ADO32576.1
PSTDNPDYFYTWTRDSGLVLKTLVDLFRNEDTSLLSTIENYISAQAIVQGISNPSGDLSSGAGL





GEPKFNVDETAYTGSWGRPQRDGPALRATAMIGFGQWLLDNGYTSTATDIVWPLVRNDLSYVAQ





YWNQTGYDLWEEVNGSSFFTIAVQHRALVEGSAFATAVGSSCSWCDSQAPEILCYLQSFWTGSF





ILANFDSSRSGKDANTPLGSIHTFDPEAACDDSTFQPCSPRALANHKEVVDSFRSIYTLNDGLS





DSEAVAVGRYPEDTYYNGNPWFLCTLAAAEQLYDALYQWDKQGSLEVTDVSLDFFKALYSDAAT





GTYSSSSSTYSSIVDAVKTFADGFVSIVETHAASNGSMSEQYDKSDGEQLSARDLTWSYAALLT





ANNRRNSVVPASWGETSASSVPGTCAATSAIGTYSSVTVTSWPSIVATGGTTTTATPTGSGSVT





STSKTTATASKTSTSTSSTSCTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWETSDGIALS





ADKYTSSDPLWYVTVTLPAGESFEYKFIRIESDDSVEWESDPNREYTVPQACGTSTATVTDTWR




Transglucosidase
MSFRSLLALSGLVCTGLANVISKRATLDSWLSNEATVARTAILNNIGADGAWVSGADSGIVVAS
242



ADX86749.1
PSTDNPDYFYTWTRDSGLVLKTLVDLFRNGDTSLLSTIENYISAQAIVQGISNPSGDLSSGAGL





GEPKFNVDETAYTGSWGRPQRDGPALRATAMIGFGQWLLDNGYTSTATDIVWPLVRNDLSYVAQ





YWNQTGYDLWEEVNGSSFFTIAVQHRALVEGSAFATAVGSSCSWCDSQAPEILCYLQSFWTGSF





ILANFDSSRSGKDANTLLGSIHTFDPEAACDDSTFQPCSPRALANHKEVVDSFRSIYTLNDGLS





DSEAVAVGRYPEDTYYNGNPWFLCTLAAAEQLYDALYQWDKQGSLEVTDVSLDFFKALYSDAAT





GTYSSSSSTYSSIVDAVKTFADGFVSIVETHAASNGSMSEQYDKSDGEQLSARDLTWSYAALLT





ANNRRNSVVPASWGETSASSVPGTCAATSAIGTYSSVTVTSWPSIVATGGTTTTATPTGSGSVT





STSKTTATASKTSTSTSSTSCTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWETSDGIALS





ADKYTSSDPLWYVTVTLPAGESFEYKFIRIESDDSVEWESDPNREYTVPQACGTSTATVTDTWR




Transglucosidase
MSFRSLLALSGLVCTGLANVISKRATLDSWLSNEATVARTAILNNIGADGAWVSGADSGIVVAS
243



AEE60909.1
PSTDNPDYFYTWTRDSGLVLKTLVDLFRNGDTSLLSTIENYISAQAIVQGISNPSGDLSSGAGL





GEPKFNVDETAYTGSWGRPQRDGPALRATAMIGFGQWLLDNGYTSTATDIVWPLVRNDLSYVAQ





YWNQTGYDLWEEVNGSSFFTIAVQHRALVEGSAFATAVGSSCSWCDSQAPEILCYLQSFWTGSF





ILANFDSSRSGKDANTLLGSIHTFDPEAACDDSTFQPCSPRALANHKEVVDSFRSIYTLNDGLS





DSEAVAVGRYPEDTYYNGNPWFLCTLAAAEQLYDALYQWDKQGSLEVTDVSLDFFKALYSDAAT





GTYSSSSSTYSSIVDAVKTFADGFVSIVETHAASNGSMSEQYDKSDGEQLSARDLTWSYAALLT





ANNRRNSVVPASWGETSASSVPGTCAATSAIGTYSSVTVTSWPSIVATGGTTTTATPTGSGSVT





STSKTTATASKTSTSTSSTSCTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWETSDGIALS





ADKYTSSDPLWYVTVTLPAGESFEYKFIRIESDDSVEWESDPNREYTVPQACGTSTATVTDTWR




Transglucosidase
CTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWETSDGIALSADKYTSSDPLWYVTVTLPAG
244



AFJ52556.1
ESFEYKFIRIESDDSVEWESDPNREYTVPQACGTSTATVTDTWR




Transglucosidase
MSNRWTLLLSLVILLGCLVIPGVTVKHENFKTCSQSGFCKRNRAFADDAAAQGSSWASPYELDS
245



CCO73840.1
SSIQFKDGQLHGTILKSVSPNEKVKLPLVVSFLESGAARVVVDEEKRMNGDIQLRHDSKARKER





YNEAEKWVLVGGLELSKTATLRPETESGFTRVLYGPDNQFEAVIRHAPFSADFKRDGQTHVQLN





NKGYLNMEHWRPKVEVEGEGEQQTQEDESTWWDESFGGNTDTKPRGPESVGLDITFPGYKHVFG





IPEHADSLSLKETRGGEGNHEEPYRMYNADVFEYELSSPMTLYGAIPFMQAHRKDSTVGVFWLN





AAETWVDIVKSTSSPNPLALGVGATTDTQSHWFSESGQLDVFVFLGPTPQEISKTYGELTGYTQ





LPQHFAIAYHQCRWNYITDEDVKEVDRNFDKYQIPYDVIWLDIEYTDDRKYFTWDPLSFPDPIS





MEEQLDESERKLVVIIDPHIKNQDKYSIVQEMKSKDLATKNKDGEIYDGWCWPGSSHWIDTFNP





AAIKWWVSLFKFDKFKGTLSNVFIWNDMNEPSVFNGPETTMPKDNLHHGNWEHRDIHNVHGITL





VNATYDALLERKKGEIRRPFILTRSYYAGAQRMSAMWTGDNQATWEHLAASIPMVLNNGIAGFP





FAGADVGGFFQNPSKELLTRWYQAGIWYPFFRAHAHIDTRRREPYLIAEPHRSIISQAIRLRYQ





LLPAWYTAFHEASVNGMPIVRPQYYAHPWDEAGFAIDDQLYLGSTGLLAKPVVSEEATTADIYL





ADDEKYYDYFDYTVYQGAGKRHTVPAPMETVPLLMQGGHVIPRKDRPRRSSALMRWDPYTLVVV





LDKNGQADGSLYVDDGETFDYERGAYIHRRFRFQESALVSEDVGTKGPKTAEYLKTMANVRVER





VVVVDPPKEWQGKTSVTVIEDGASAASTASMQYHSQPDGKAAYAVVKNPNVGIGKTWRIEF




Transglucosidase
MVKLTHLLARAWLVPLAYGASQSLLSTTAPSQPQFTIPASADVGAQLIANIDDPQAADAQSVCP
246



BAM72725.1
GYKASKVQHNSRGFTASLQLAGRPCNVYGTDVESLTLSVEYQDSDRLNIQILPTHVDSTNASWY





FLSENLVPRPKASLNASVSQSDLFVSWSNEPSFNFKVIRKATGDALFSTEGTVLVYENQFIEFV





TALPEEYNLYGLGEHITQFRLQRNANLTIYPSDDGTPIDQNLYGQHPFYLDTRYYKGDRQNGSY





IPVKSSEADASQDYISLSHGVFLRNSHGLEILLRSQKLIWRTLGGGIDLTFYSGPAPADVTRQY





LTSTVGLPAMQQYNTLGFHQCRWGYNNWSDLADVVANFEKFEIPLEYIWTDIDYMHGYRNFDND





QHRFSYSEGDEFLSKLHESGRYYVPIVDAALYIPNPENASDAYATYDRGAADDVFLKNPDGSLY





IGAVWPGYTVFPDWHHPKAVDFWANELVIWSKKVAFDGVWYDMSEVSSFCVGSCGTGNLTLNPA





HPSFLLPGEPGDIIYDYPEAFNITNATEAASASAGASSQAAATATTTSTSVSYLRTTPTPGVRN





VEHPPYVINHDQEGHDLSVHAVSPNATHVDGVEEYDVHGLYGHQGLNATYQGLLEVWSHKRRPF





IIGRSTFAGSGKWAGHWGGDNYSKWWSMYYSISQALSFSLFGIPMFGADTCGFNGNSDEELCNR





WMQLSAFFPFYRNHNELSTIPQEPYRWASVIEATKSAMRIRYAILPYFYTLFDLAHTTGSTVMR





ALSWEFPNDPTLAAVETQFMVGPAIMVVPVLEPLVNTVKGVFPGVGHGEVWYDWYTQAAVDAKP





GVNTTISAPLGHIPVYVRGGNILPMQEPALTTREARQTPWALLAALGSNGTASGQLYLDDGESI





YPNATLHVDFTASRSSLRSSAQGRWKERNPLANVTVLGVNKEPSAVTLNGQAVFPGSVTYNSTS





QVLFVGGLQNLTKGGAWAENWVLEW




Transglucosidase
MWSSWLLSALLATEALAVPYEEYILAPSSRDLAPASVRQVNGSVTNAAALTGAGGQATFNGVSS
247



AGN92963.1
VTYDFGINVAGIVSVDVASASSESAFIGVTFTESSMWISNEACDATQDAGLDTPLWFAVGQGAG





VYSVGKKYTRGAFRYMTVVSNTTATVSLNSVKINYTASPIQDLRAYTGYFHSSDELLNRIWYAG





AYTLQLCSIDPTTGDALVGLGAITSSETITLPQTDKWWTNYTITNGSSTLTDGAKRDRLVWPGD





MSIALESVAVSTEDLYSVRTALESLYALQKADGQLPYAGKPFYDTVSFTYHLHSLVGAASYYQY





TGDRAWLTRYWGQYKKGVQWALSGVDSTGLANITASADWLRFGMGAHNIEANAILYYVLNDAIS





LAQSLNDNAPIRNWTATAARIKTVANELLWDDKNGLYTDNETTTLHPQDGNSWAVKANLTLSAN





QSAIISESLAARWGPYGAPAPEAGATVSPFIG





GFELQAHYQAGQPDRALDLLRLQWGFMLDDPRMTNSTFIEGYSTDGSLVYAPYTNRPRVSHAHG





WSTGPTSALTIYTAGLRVTGPAGATWLYKPQPGNLTQVEAGFSTRLGSFASSFSRSGGRYQELS





FTTPNGTTGSVELGDVSGQLVSEGGVKVQLVGGKASGLQGGKWRLNV




Transglucosidase
MVKLTHLLARAWLVPLAYGASQSLLSTTAPSQPQFTIPASADVGAQLIANIDDPQAADAQSVYP
248



AIY23066.1
GYKASKVQHNSRGFTASLQLAGRPCNVYGTDVESLTLSVEYQDSDRLNIQILPTHVDSTNASWY





FLSENLVPRPKASLNASVSQSDLFVSWSNEPSFNFKVIRKATGDALFSTEGTVLVYENQFIEFV





TALPEEYNLYGLGEHITQFRLQRNANLTIYPSDDGTPIDQNLYGQHPFYLDTRYYKGDRQNGSY





IPVKSSEADASQDYISLSHGVFLRNSHGLEILLRSQKLIWRTLGGGIDLTFYSGPAPADVTRQY





LTSTVGLPAMQQYNTLGFHQCRWGYNNWSDLADVVANFEKFEIPLEYIWTDIDYMHGYHNFDND





QHRFSYSEGDEFLSKLHESGRYYVPIVDAALYIPNPENASDAYATYDRGAADDVFLKNPDGSLY





IGAVWPGYTVFPDWHHPKAVDFWANELVIWSKKVAFDGVWYDMSEVSSFCVGSCGTGNLTLNPA





HPSFLLPGEPGDIIYDYPEAFNITNATEAASASAGASSQAAATATTTSTSVSYLRTTPTPGVRN





VEHPPYVINHDQEGHDLSVHAVSPNATHVDGVEEYDVHGLYGHQGLNATYQGLLEVWSHKRRPF





IIGRSTFAGSGKWAGHWGGDNYSKWWSMYYSISQALSFSLFGIPMFGADTCGFNGNSDEELCNR





WMQLSAFFPFYRNHNELSTIPQEPYRWASVIEATKSAMRIRYAILPYFYTLFDLAHTSGSTVMR





ALSWEFPNDPTLAAVETQFMVGPAIMVVPVLEPLVNTVKGVFPGVGHGEVWYDWYTQAAVDAKP





GVTTTISAPLGHIPVYVRGGNILPMQEPALTTREARQTPWALLAALGSNGAASGQLYLDDGESI





YPNATLHVDFTASRSSLRSSAQGRWKERNPLANVTVLGVNKEPSAVTLNGQAVFPGSVTYNSTS





QVLFVGGLQNLTKGGAWAENWVLEW




Transglucosidase
MSFRSLLALSGLVCTGLANVISKRATLDSWLSNEATVARTAILNNIGADGAWVSGADSGIVVAS
249



AIY23067.1
PSTDNPDYFYTWTRDSGLVLKTLVDLFRNGDTSLLSTIENYISAQAIVQGISNPSGDLSSGAGL





GEPKFNVDETAYTGSWGRPQRDGPALRATAMIGFGQWLLDNGYTSTATDIVWPLVRNDLSYVAQ





YWNQTGYDLWEEVNGSSFFTIAVQHRALVEGSAFATAVGSSCSWCDSQAPEILCYLQSFWTGSF





ILANFDSSRSGKDANTLLGSIHTFDPEAACDDSTFQPCSPRALANHKEVVDSFRSIYTLNDGLS





DSEAVAVGRYPEDTYYNGNPWFLCTLAAAEQLYDALYQWDKQGSLEVTDVSLDFFKALYSDAAT





GTYSSSSSTYSSIVDAVKTFADGFVSIVETHAASNGSMSEQYDKSDGEQLSARDLTWSYAALLT





ANNRRNSVVPASWGETSASSVPGTCAASSAIGTYSSVTVTSWPSIVATGGTTTTATPTGSGSVT





STSKTTATASKTSTSTSSTSCTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWETSDGIALS





ADKYTSSDPLWYVTVTLPAGESFEYKFIRIESDDSVEWESDPNREYTVPQACGTSTATVTDTWR




Transglucosidase
MDPANEYCGLEDYGLVGDMHTCALVSKNGSVDSMCWPVFDSPSIFCRILDKEKGGHFSITPDRR
250



GAQ47522.1
LKNPLSKQRYRPYTNMLETRWIHEEGVVNILDYFPIAKPKPHVVEKGLPQWCRCYQNKGSAQQE





ACRSGMVRKAECVRGEMEIEIELFPAFNYARDSHIAQQSSASDDGIQVYHFQSESQNLVVSVLG





DKGDISEDDSDLSIEFELSDRPGHLGPGLIGKVTLKEGQSVTMLLHDQESITCNAQDLAPYLQQ





IERTTGDFWSDWTSKCTFRGHYREQVERSLLVLKLLTYKPTGAIVAAPTFSLPEHIGGSRNWDY





RYSWVRDAAFTVYVFLKNGYPEEAESYINFIFERIFPPMDKNPKPGEPFLPIMITIHGEREIPE





MELDHLEGYRGSRPVRIGNGAATHIQLDIYGELMDSIYLYNKHAADISYDQWRAIRRMIDFVIQ





IRHQPDQSIWEVRGPPQNFVYSKIMLWVALDRGVRLAEKRSNLPCPDRARWMHERDALYDEIMT





KGYNAEKGFFCMSYENQDAMDAAVLIAPLVFFVAPNDPRLLSTIQKITDVPAKGGLSVANMVSR





YDTGKVDDGVGGNEGAFLMVTFWLVEAMMRAAKSKAYLPHDPFFQQLRKTATSQFDSILSFANH





LGMFSEEVATSGEQIGNMPQAFSHLACVSAAMNLGGGGDR




Transglucosidase
MSFRSLLALSGLVCSGLASVISKRATLDSWLSNEATVARTAILNNIGADGAWVSGADSGIVVAS
251



GAQ47133.1
PSTDNPDYFYTWTRDSGLVIKTLVDLFRNGDTDLLSTIENYISSQAIVQGISNPSGDLSSGGLG





EPKFNVDETAYTGSWGRPQRDGPALRATAMIGFGQWLLLTGQDNGYTSAATEIVWPLVRNDLSY





VAQYWNQTGYDLWEEVNGSSFFTIAVQHRALVEGSAFATAVGSSCSWCDSQAPQILCYLQSFWT





GEYILANFDSSRSGKDTNTLLGSIHTFDPEAGCDDSTFQPCSPRALANHKEVVDSFRSIYTLND





GLSDSEAVAVGRYPEDSYYNGNPWFLCTLAAAEQLYDALYQWDKQGSLEITDVSLDFFQALYSD





AATGTYSSSSSTYSSIVDAVKTFADGFVSIVETHAASNGSLSEQYDKSDGDELSARDLTWSYAA





LLTANNRRNSVVPPSWGETSASSVPGTCAATSASGTYSSVTVTSWPSIVATGGTTTTATTTGSG





SVTSTSKTTTTASKTSTTTSSTSCTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWDTSDGI





ALSADKYTSSNPLWYVTVTLPAGESFEYKFIRIESDDSVEWESDPNREYTVPQACGESTATVTD





TWR




Transglucosidase
MAKSASQIHRAWWKECSVYQIWPASYKDSNDDGIGDIPGIISKLDYIKNIGVDIVWLCPSYKSP
252



GAQ46031.1
QVDMGYDIADYYSIADEYGTVADVEKLIQGCHERGMKLLMDLVVNHTSDQNEWFKQSRSSKDNE





YRNWYVWKPARYDEQGNRHPPNNWVSHFQGSAWEWDEHTQEYYLHLYAVEQPDLNWEHPPVRKA





VHDIMRFWLDKGADGFRMDVINFVSKDQRFPDAPVKDPRTPWQWGDKYYANGPRLHEYLADLGK





ILKEYDAFSVGEMPFVRDTEEVLRAVRYDRNEINMIFNFEHVDIDHGTYDKFEPGSWKLTDLKA





FFETWQKFMYNNDGWNALYWENHDQPRSIDRYAQAKEEFRTEAGKMLATVLALQSGTPFVYQGQ





EIGMRNVPIEWDMNEYKDIDCLNHWHRLLKHRPDDIEAQKSARQEYQKKSRDNGRTPVQWSSAP





NGGFTGPNAKPWMSVNPDYVRFNAEAQVNDPNSIYHYWAAVLGLRKKYLDIFVYGDYDLVDKDS





QEIFAYSRQYEDQKALVLTNWTENTLEWDATANGVKGVKDVVLNSYESAEAAKGRFSGQKWSLR





PYEAVVLLVEA




Transglucosidase
MRCHRLLSGVLAFLPLSVAQSCWRNTTCSGPTDSAFSGPWEKNIFAPSSRTLNPEKLFLITQPD
253



GAQ44395.1
KTEDYIPFALHGNGSLVVYDFGKEVGGIVSVNFSSTGSGALGVAFTEAKNYIGEWSDSSNGGFK





GPDGALYGNFTEAGSHYYVMPDKSLRGGFRYLTLFLITSDNSTIHIEDVSLEIGFQPTWSNLKA





YQGYFHSNDDLLNKIWYTGAYTLQTNEVPTDTGRQIPAMAEGWANNCTLGPGDTIIVDGAKRDR





AVWPGDMGIAVPSAFVSLGDLDSVKNALQVMYDTQDNSTGAFDESGPPLSQKDSDTYHMWTMVG





TYNYMLYTNDSDFLEQNWEGYQKAMDYIYGKVTYPSGLLNVTGTRDWARWQQGYNNSEAQMILY





HTLNTGAELATWAGDSGDLSSTWTSRAEKLRQAINEYCWDESYGAFKDNATDTTLHPQDANSMA





LLFGVVDADRAASISERLTDNWTPIGAVAPELPENISPFISSFEIQGHLTVGQPQRALELIRRS





WGWYYNNANGTQSTVIEGYLQNGTFGYRGSRGYYYDTAYVSHSHGWSSGPTSALTNYIVGITVT





SPLGATWRIAPQFVDLQSAEGGFTTSLGKFQAGWSKTDKGYTLDFTVPHGTQGNLTLPFVGTAK





PSIKIDGTEITRGVQYANSTATVTVSGGGTYKVVVQ




Transglucosidase
MRFRDGMWLVDPSKSLQYAEDIYSINASPDNRSLNLLCPTRHIFSRGNTLNLSTLHINLESHFD
254



GAQ43928.1
GVISLEVQHWLGARKGTPDFELFPDGEGPKLSEERIGISKSERGTTLKSGALSVTVSPDQHDFS





IRFHSSDDYDWEVTSLLNRSVGLAYDPPISNGKQVEDLVQGQSGSRKHYIFTQTELDIGESVHG





LGERFGPFNRLGQHVEIWNEDGGTSSDQAYKNVSFWMSSKGYGVFIDTPEKVDLEIGSERCCRV





QTSVEGQRLKWYIIYGPSPKEVLTKYSVLTGRAPMVPAWSFGLWLTTSFTTNYDEATVTDFLQQ





MSDRSIPVEVFHYDSFWMRAFHWCDFVFSPDHFPDPKGSIARIKHAGLTNKVCVWINPYLGQAS





PVFLEAAEKGYLLKRTNGDVWQWDLWQTGMGLVDFTNPEAVRWYEGCLERLFDVGIESIKTDFG





ERIPTKGVKWHDESVDPARMHNYYAFIYNKIVYNALTRRYGDGQAVLFARSACAGVQRFPLCWG





GDCESTPAALAESVRGGLSIGLSSFSFWSCDIGGFEGTPPPWIYKRWVAFGLLCSHSRLHGSDS





YRVPWLIDNDDAGPQGSTAVLRTFVRLKRRLMPYLYTQAVQSTRMGWPLSLRATALEFPHDPTA





WAACDRQFFVGENLLVAPVFTEHGDVEFYLPEGQWTSLWDEKKVVSGPGWRREKHGFGTLPIYV





REGAVIVMGKEQGEGGFAYDWCEAPEVRLYQTKQGDCATVVDASGKEVGTLTVQDDGSLKGLEC





FRGDVTVRRIE




Transglucosidase
MDFFQTFWSSSPLSKIPSSSFNQTFMCTMCPKSNYTTPKWWKEAVVYQVYPASFNCGKPTSKTN
255



GAQ43980.1
GWGDVTGIIDKVPYLKSLGVDIVWLSPIYTSPHVDMGYDIADYKSIDPRYGTLADVDLLIKALR





NHDMKLMMDLVVNHTSDQHSWFVESASSKYSPKRDWYIWRPAKGFDDDGNPVPPNNWAQILGDA





LSAWTWHEETREFYLTLHTSAQVELNWENPEVVAAVYDVMEFWLRRGICGFRMDVINLISKDQS





FPDAPIIDPTSKYQPGEQFYTNGPRFHEFMHGIYDNVLSKYDTITVGETPYVTDIEEIIKTVGS





TAKELNMAFNFDHMEIEDVKTKGDSKWSLRDWKLTELKGILSGWQKRMKKWDGWNAIFLECHDQ





ARSVSRYTIDSDEFRERGAKLLALLETTLGGTIFLYQGQEIGMRNFPLEWDPDIEYKDVESVNF





LNKSKELHPVGTEGLAKARTLLQKKARDHARTPMQWSAAPHAGFTVPDATPWMRVNDDYETVNV





ETQMSFPWQSKGELSVWQYWQQAIQHRKLYKNAFVYGGFEDLDYHNEKVFAYLRTSADGNDSWL





VAMNWTTSAVEWTVPSDIHVTRWVSSTLQTAPPVASKTVITLRAFEGVLGCCN




Transglucosidase
MAGTRPMSNRWTLLLSLVILLGCLVIPGVTVKHENFKTCSQSGFCKRNRAFADDASAQGPSWTS
256



GAQ43844.1
PYELDSSSIQFKDGQLHGTILKSVSANEKVKLPLVVSFLESGAARVVVDEEKRLNGEIQLRHDS





KARKERYNEAEKWVLVGGLELSKTATLKPETETGFTRVLYGPDNQFEAVIRHAPFSADFKRDGQ





THVQLNNKGYLNMEHWRPKVEVEGEGEQQTQEDESTWWDESFGGNTDTKPRGPESVGLDITFPG





YKHVFGIPEHADSLSLKETRGGEGNHEEPYRMYNADVFEYELNSPMTLYGAIPFMQAHRKDSTV





GVFWLNAAETWVDIVKSTSSPNPLALGVGATTDTQSHWFSESGQLDVFVFLGPTPQEISKTYGE





LTGYTQLPQHFAIAYHQCRWNYITDEDVKEVDRNFDKYQIPYDVIWLDIEYTDDRKYFTWDPLT





FPDPISMEEQLDESERKLVVIIDPHIKNQDKYSISQEMTSKDLATKNKDGEIYDGWCWPGSSHW





IDTFNPAAIKWWISLFKFDKFKGTLSNVFIWNDMNEPSVFNGPETTMPKDNLHHGNWEHRDIHN





VHGITLVNATYDALLERKKGEVRRPFILTRSYYAGAQRMSAMWTGDNQATWEHLAASIPMVLNN





GIAGFPFAGADVGGFFHNPSKELLTRWYQAGIWYPFFRAHAHIDTRRREPYLIAEPHRSIISQA





IRLRYQLLPAWYTAFHEASVNGMPIVRPQYYAHPTDEAGFAIDDQLYLGSTGLLAKPVVSEEAT





TADIYLADDEKYYDYFDYTVYQGAGKRHTVPAPMETVPLLMQGGHVIPRKDRPRRSSALMRWDP





YTLVVVLDKNGQADGSLYVDDGETFDYERGAYIHRRFRFQESALVSEDVGTKGPKTAEYLKTMA





NVRVERVVVVDPPKEWQGKTSVTVIEDGASAASTAPMQYHSQSDGKAAYAVVKNPNVGIGKTWR





IEF




Transglucosidase
MLLHLLAYAALSSVVTAASLQPRLQDGLALTPQMGWNTYNHYSCSPNETIVRSNAQALVDLGLS
257



GAQ42954.1
SLGYRYVTTDCGWTVADRLPDGSLTWNETLFPQGFPAMGDFLHDLGLLFGVYQDSGILLCGSPP





NETGSLYHEAQDARTFASWNVDSLKYDNCYSDAATNYPNVNYAPSTSPEPRFANMSHALLQQNR





TILFQICEWGISFPAGWAPALGHSWRIGNDIIPAWRTIFRIINQAAPQTDFAGPGQWPDLDMLE





VGNNIFSLPEEQTHFSLWAILKSPLIIGAALKDELTAINDASLAVLKQKDVVAFNQDALGKSAS





LRRRWTEEGYEVWSGPLSNGRTVAAVINWRNESRDLTLDLPDIGLQHAGTVKNIWDGTTAQNVV





TSYTATVAGHGTMLLELQNTTAVGVYPRDVFGESSGQTTTFENIYAVTTSAKYTVSVYFSQPAS





SAETISIGSNANQSIISVQVPASSTLVSANIPLTAGSSNTVTINTSIPIDAIHITAPNGTYYPC





TNFTLAGSTTLTTCGSGYCQPVGSKIGYISPSGTAKATISATTSGSKYLEIDWINNEIAFDSSW





GWGSNSRNLTVTVNSEEPVRIEVPLSGRHSELFGPGLGWWDTATLGLLTSGWKEGLNEVIVGNV





GGDEGFQSYGADFEILRSNFEKMKVSLTLYAAALQLADAAVVQKRTVDVAELEHYWSYGRSEPV





YPTPETSGSGDWEEAFTKAKSLVAQMTNDEKNNITYGYTSTTNGCSGMSGGVPRLGYPGMCLQD





AASGVRGTDMVNAYASGLHIGASWNRDLAYEHAHYMGAEFKRKGANVALGPVVGPLGRMARGGR





NWEGYSNDPYLSGSLVQNTIRGLQESVIACVKHFIGNEQETNRNTPQLLEDSYNQSVSSNIDDK





TIHELYLWPFQDAVKAGAGAVMCSYNRINNSYGCQNSKNLNGLLKGELGFQGFVVSDWNAQQSG





IASAAAGLDMVMPDSVYWENGNLSLAVRNGSLSSTRLDDMATRIVAAWYKYAELEDPGFGMPIS





LLEPHDPVDARDPASKATILQEAIEGHVLVKNTDNALPLKEPKFLSLFGYDAIAAQRNTMDDLS





WSLWTMGLDNTLSYPNGTAVDPSHLKYMFLSSTNPSENGPGVSLNGTMISGGGSGASTPSYIDA





PFDAFQRQAYEDNTFLAWDFASQSPVVNPASEACLVFINEAAAEGWDRPYVADPYSDTLVENVA





SQCNNTMVIIHNAGIRLVDRWVDNPNVTAVIYGHLPGQDSGRALVEIMYGKQSPSGRLPYTVAK





NASDYGALLSPVVPEGTKDLYYPQDNFTEGVYIDYKAFEQKNITPRYEFGYGLTYSTFDYSGLK





ISIHTGVNTDYLPPNSTIEEGGIPALWDVVATVTCSVANTGSVAAAEVAQLYLGIPGGPAKVLR





GFEKKLIQPGHHTKVQFDLTRRDLSSWDVVNQAWVLQKGDYSVYVGKSVLDTQLTGTLTI




Transglucosidase
MAVSASSPEPLGANIDERTPLNSSAQHATPSANTPDYSSITKGLTSDVHSSHGSDEEQPLINPP
258



GAQ42198.1
ESPGKDVTALTSISTVIGVLLLGEFISNADATLVMAATGRISSEFNRLRDASWLSTAYTLGLCA





AQPMYGKLSDIYGRKPLLLWAYFLFGVGCVISGIGPDMATVILGRAISGIGGAGTMAMGSIIIT





DIVPRRDVAHWRAYINIAMTLGRSAGGPVGGWLTDTIGWRWSFIIQGPLAAVAALLVVWKLKLV





HPVTEKSIRRVDFLGTFLLATGIITITVIMDQAGQSFAWASLSTAILSTLSLSAFVAFVLVELY





VAPEPIFELRMLRKPNVTPSYLIGSLQITAQVGMMFSVPLYFQVTSKASATVAGGHLVPAVIGN





TLGGLIAGAFIRRTGQFKVLLILAGLVASVAYLLLFLRWNGHTGFWESLYIIPGGMGTGFCSAA





AFVSMTAFLMPQEVAMATGGYFLLFSFAMTAGVTVTNSLLGTVFKRQMEQHLTGPGAKKIIERA





LSDTSYINGLQGHVRDVVVKGYVAGLRYTYPMRVSISSLALSVYLFGKLALGLSAAEWRTQSIY





FLLTDRFGRADNSTTATCDTGDQIYCGGSWQGIINHLDYIQGMGFTAIWISPITEQLPQDTSDG





EAYHGYWQQKIYDVNSNFGTADDLKSLSDALHARGMYLMVDVVPNHMGYAGNGNDVDYSVFDPF





DSSSYFHPYCLITDWDNLTMVQDCWEGDTIVSLPDLNTTETAVRTIWYDWVADLVSNYSVDGLR





IDSVLEVEPDFFPGYQEAAGVYCVGEVDNGNPALDCPYQDYLDGVLNYPIYWQLLYAFESSSGS





ISDLYNMIKSVASDCSDPTLLGNFIENHDNPRFASYTSDYSQAKNVLSYIFLSDGIPIVYAGEE





QHYSGGDVPYNREATWLSGYDTSAELYTWIATTNAIRKLAISADSDYITYANDPIYTDSNTIAM





RKGTSGSQVITVLSNKGSSGSSYTLSLSGSGYTSGTELIEAYTCTSVTVDSNGDIPVPMASGLP





RVLLPASVVDSSSLCGGSGSTTTTAATSTSTSKATSSTTTTTAITTTSSSCTATSTTLPITFEE





LVTTTYGEEIYLSGSISQLGEWDTSDAVKLSADDYTSSNPEWYVTVSLPVGTTFEYKFIKVEED





GSVTWESDPNREYTVPECGSGETVVDTWR




Transglucosidase
MPLTYLGALAMLTALPSLGQARSTWPLGSGLELSYQASQHQISIHQDNQTIFSTLPGQPFLSAG
259



GAQ39994.1
AGKDQIVEDSGNFNITNVAQARCQGQNITQLAGIPRRDSVKNQVAVRGYLLDCGGEDIAYAMNF





WVPKTLSDRVAFEATVDSDANASVPVERLYLTFASHAREDFYGLGAQASFASMKNRSIPIFSRE





QGVGRGDQPYTAIEDSQGFFSGGDQYTTYTAIPQYVSSDGRVFYLDENDTAYAVFDFQRPDAVT





VRYDSITVHGHLMQADNMLDAITMLTEYTGRMPALPEWVDHGALLGIQGGQEKVNRIVKQGFEH





DCPVAGVWLQDWSGTHLQSAPYGNMNISRLWWNWESDTSLYPTWAEFVQALREQHGVRTLAYVN





PFLADVSSKSDGYRRNLFQEASKHRYMVQNTTTNSTAIISSGKGIDAGILDLTNEETRAWFADV





LRTQVWSANISGCMWDFGEYTPITADTSLANISTSAFFYHNQYPRDWAAYQRSVAAEMPLFHEM





VTFHRSASMGANRHMNLFWVGDQATLWTPNDGIKSVVTIQGQMGISGYAHSHSDIGGYTTVFEP





PTTSNSSGAIPRSAELLGRWGELGAVSSAVFRSHEGNVPSVNAQFYSNSTTYAYFAYNARMFRS





LGPYRRRILNTESQRRGWPLLRMPVLYHPEDLRARQISYESFFLGRDLYVAPVLDEGRKSVEVY





FPGHSANRTYTHVWSGQTYRGGQTAQVSAPFGKPAVFVVDGASSPELDVFLDFVRKENGTVLRA




Transglucosidase
MVKLTDLLARAWLVPLAYGASQSRLSTTTSSQPQFTIPASADVGAQLIANIDDPQAANAQSVCP
260



GAQ38166.1
GYKASKVQHNSRGFTASLQLAGRPCNVYGTDVDSLTLSVEYQDSDRLNIQILPTHVDSTNASWY





FLSENLVPRPKASLNASVSDSDFSVSWSNEPSFNFKVIRKATGDALFSTEGTVLVYEDQFIEFV





TALPEEYNLYGLGEHITQFRLQRDANLTIYPSDDGTPIDKNIYGQHPFYLDTRYYKGDRQNGSY





VPVKSSETDASQKYISLSHGVFLRNSHGLEILLRPQKLIWRTLGGGIDLTFYSGPNPADVTRQY





LTSTVGLPAMQQYSTLGFHQCRWGYNNWSDLADVVANFEKFEIPLEYIWTDIDYMHGYRNFDND





QNRFSYSEGDEFLSKLHESGRYYVPIVDAALYIPNPENASDAYATYDRGAADDVFLKNPDGSLY





IGAVWPGYTVFPDWHHPKAVEFWANELVIWSKKVAFDGVWYDMSEVSSFCVGSCGTGNLTLNPA





HPPFLLPGEPGDIIYDYPEAFNITNATEAASASAGASSQAAATATSTSTSVSYLRTTPTPGVRN





VEHPPYVINHDQEGHDLSVHAVSPNATHIDGVEEYDVHGLYGHQGLNATYHGLLEVWSHERRPF





IIGRSTFAGSGKWAGHWGGDNYSKWWSMYYSISQALSFSLFGIPMFGADTCGFTGNSDEELCNR





WMQLSAFFPFYRNHNELSTIPQEPYRWASVIEATKSAMRIRYAILPYFYTLFDLAHTTGSTVMR





ALSWEFPNDPTLAAVETQFMVGPAIMVIPVLEPLVNTVKGVFPGVGHGEVWYDWYTQAAVDAKP





GVNTTISAPLGHIPVYVRGGNILPMQEPALTTRGARQTPWALLAALGSNGTASGQLYLDDGESI





YPNATLRVGFTASRSSLRSSAQGRWKERNPLANVTVLGVNKEPSAVTLNGKTVSPGSITYNSTS





QVLFVGGLQNLTNGGAWAENWVLEW




Transglucosidase
MLGSLLFLLPLVGAAVIGPRAGSQSCPGYKASNVQKSARSLTADLTLAGAPCNSYGKDVEDLKL
261



GAQ36312.1
LVEYQTDERLHVMIYDADEEVYQVPESVLPRVGSDKDSQDSVLEFDYVEEPFSFTISKGDEVLF





DSSASTLIFQSQYVRLRTWLPDDPYVYGLGEHSDPMRLPTYNYTRTLWNRDAYGTPNNTNLYGS





HPVYYDHRGKSGTHGVFLLNSNGMDIKINQTTDGKQYLEYNLLGGVLDFYFFYGEDPKQASMEY





SKIVGLPAMQSYWTFGFHQCRYGYRDVYELAEVVYNYSQAKIPLETMWTDIDYMDKRRVFTLDP





QRFPLEKMRELVTYLHNHDQHYIVMVDPAVSVSNNSAYLTGVRDNVFLHNQNGSLYEGAVWPGV





TVFPDWFNEDTQDYWTAQFQQFFDPKSGVDIDALWIDMNEASNFCPYPCLDPAAFAISDDLPPA





APPVRPSSPIPLPGFPADFQPSSKRSVKRAQGDKGKKVGLPNRNLTDPPYTIRNAAGVLSMSTI





ETDLIHAGEGYAEYDTHNLYGTMMSSASRTAMQARRPDVRPLVITRSTFAGAGAHVGHWLGDNL





SDWVHYRISIAQILSFASMFQIPMVGADVCGFGSNTTEELCGRWASLGAFYTFYRNHNELGDIP





QEFYRWPTVAESARKAIDIRYRLLDYIYTALHRQSQTGEPFLQPQFYLYPEDSNTFANDRQFFY





GDALLVSPVLNEGSTSVDAYFPDDIFYDWYTGAVVRGHGENITLSNINITHIPLHIRGGNIIPV





RMSSGMTTTEVRKQGFELIIAPDLDGTASGSLYLDDGDSLNPSSVTELEFTYSNGELHVQGTFG





QKAVPKVEKCTLLGKSARTFKGFALDAPVNLKLK




Transglucosidase
MSSPQQVYLLPLKDDGSPDVPGGYLYLPSPTDPPYLLRFVIEGSSSICREGALWVNIPEKGESF
262



GAQ33831.1
NRSAFRSFSLSPDFNKNIQIDIPITSAGSFAFYVTFSPLPEFSVLSTPTPEPTRTPTHYIDVSP





KLTLRGQDLPLNALSIYSVISKFMGQYPKDWEKHLNGISQRNYNMVHFTPLMKRGASNSPYSIF





DQLQFDDAVFPNGEDDVARLVSKMEDEYGLLSLTDVVWNHTAHNSKWLEEHPEAGYSVETAPWL





EAALELDTALLKFGQELSTLGLPTEFHTVDELMEVMNAMRDKVISGIRLWEFYAIDVKADTQRI





LDQWKTSKDLNLTDKKWAQLNLSDYKNWTLKQQATFIREYAIPTSKQVLGRFSRAVDLHFGAAI





LTALFGPHDSPTSDTNTVEESLSKILDEVNLPFYEEYDGDVSEIMNQVFNRIKYLRIDDHGPKL





GAVTAQSPLIETYFTRLPLNDVTKKHKKGALALVNNGWIWNADALRDNAGPDSRAYLRREVIVW





GDCVKLRYGSCRDDNPFLWDFMTDYTRLMAKYFSGFRIDNCHSTPLVVAEYLLDEARKVRPNLT





VFAELFTGSEEADYIFVKRLGINALIREAMQAWSTGELSRLVHRHGGRPIGSFDLDLPSSGSSH





AIASSGLDSGKEKVAHIRPTPVQALFMDCTHDNEMPAQKRTAKDTLPNGALVAMCASAIGSVIG





YDEVYPRLVDLVHEHRLYFSEFSEAPETGLNSLEGGIGGIKKLLNDLHTRMGVEEYDETHIHHD





GEYITVHRVHPRTRKGVFLIAHTAFSGQDGKSVLAPTHLVGTHVKHIGTWLLEVDASQTTKERI





QTDKSYLRGLPSQVKTFEGTKIEESGKDTIISVLDSFVAGSIALFETSMPSVEHASGLDNYITE





GVDHAFSDLSLVDLNFALYRCEAEERDSSKGQDGVYDIPGHGPLVYAGLQGWWSVLENIIKYNE





LGHPLCDHLRNGQWALDYIVARLEKLGHTDEHTALGRPAAWLQEKFQAVRQLPSFLLPRYFAII





VQVAYNAAWKRGIQLLGPHIQNGQEFIHQLGMVSVQQTGYVNSASLWPTKKVPSLAAGLPHFAV





DWARCWGRDVFISLRGLLLCTSRFEDAKEHITAFASVLKHGMIPNLLSSGKLPRYNSRDSVWFF





LQSIQDYTKMAPDGLRLLDHNVPRRFLPYDDVWFPYDDPRAYSQHSTISEIIQEVLQRHAQGLS





FREYNAGPDLDMQMTQEGFQIDVKVDWETGLIFGGSQYNCGTWQDKMGESAKAGNKGVPGTPRD





GAAIEITGLVYSALTWVAELHERGLYKHDGVDIGDGKSISFKEWASRIQANFERCYYVPLQPKD





DGQYDIDANIINRRGIYKDLYRSGKPYEDYQLRSNFPIAMTVAPDLFTSSKALAALALADEVLV





GPVGMATLDPSDLNYRPNYNNSEDSTDFATAKGRNYHQGPEWVWQRGYFLRAFLHFDLARRTTP





AERTETYQQITRRLEGCKRALRESPWKGLTELTNKNGAHCADSNICFWSVMSLTESAAAAMLTA





AVVVDSPIDVHEPRWDDQTRGRDLYTQLVISLLVGLSAFFSFCVLRPKWTELYAARRRQRCAAS





YLPELPDSFFGWIPVLYRITDEQVLESAGLDAFVFLTFLKFAIRFLSAIFFFALVIILPTHYKN





TGKSGVPGWDDDDDETFDGDKDKKKIISDPNYLWMYVIFTYIFTGLAVYMLIQETNKVIRTRQK





YLGSQTSTTDRTIRLSGIPPDLGTEEKIKDFMEGLKVGKVESVTLCRDWRELDHLIDERLKLLR





NLERAWTRHLGYKRVKASPNALTLMHQQPRGSSIVSDGESERIQLLSEGGRDHVTDYAHKRPTV





RIWYGPFKLRYKNIDAIDYYEEKLRRLDEKIQVARQKEYPPTEVAFVTMESIAASQMVVQAILD





PHPMQLLARLAPAPADVVWKNTYLPRSRRMMQSWFITVVIGFLTVFWSVLLIPVAYLLEYETLH





KVFPQLADALARNPLAKSLVQTGLPTLVLSLLTVAVPYLYNWLSNQQGMMSRGDIELSVISKTF





FFSFFNLFLVFTVFGTATTFYGFWENLRDAFKDATTIAFALAKTLENFAPFYINFLCLQGIGLF





PFRLLEFGSVAMYPINFLAAKTPRDYAELSTPPTFSYGYSIPQTVLSLIICVVYSVFPSSWLIC





LFGLIYFTIGKFIYKYQLLYAMDHQQHSTGRAWPMICSRILMGLIVFQLAMIGVLALRRAITRS





LLIVPLLMATVWFSYFFARTYEPLMKFIALKSIDRERPGGGDISPSPSSTFSPPSGLDRDSFPI





RIGGQELGLRLRKYVNPSLILPLHDAWLPGRTMVPELQGELEHRNSENNAADESV




Transglucosidase
MWSSWLLSALLATEALAVPYEEYILAPSSRDLAPASVRQVNGSVTNAAALTGAGGQATFNGVSS
263



GAQ33901.1
VTYDFGINVAGIVSVDVASASSESAFIGVTFTESSMWISNEACDATQDAGLDTPLWFAVGQGAG





VYSVGKKYTRGAFRYMTVVSNTTATVSLNSVKINYTASPIQDLRAYTGYFHSSDELLNRIWYAG





AYTLQLCSIDPTTGDALVGLGVITSSETITLPQTDKWWTNYTITNGSSTLTDGAKRDRLVWPGD





MSIALESVAVSTEDLYSVRTALESLYALQKADGQLPYAGKPFYDTVSFTYHLHSLVGAASYYQY





TGDRAWLTRYWGQYKKGVQWALSSVDSTGLANITASADWLRFGMGAHNIEANAILYYVLNDAIS





LAQSLNDNAPIRNWTATAARIKTVANELLWDDKNGLYTDNETTTLHPQDGNSWAVKANLTLSAN





QSAIISESLAARWGPYGAPAPEAGATVSPFIGGFELQAHYQAGQPDRALDLLRLQWGFMLDDPR





MTNSTFIEGYSTDGSLVYAPYTNRPRVSHAHGWSTGPTSALTIYTAGLRVTGPAGATWLYKPQP





GNLTQVEAGFSTRLGSFASSFSRSGGRYQELSFTTPNGTTGSVELGDVSGQLVSEGGVKVQLVG





GKASGLQGGKWRLNV




Transglucosidase
MRWHKLLPGVLALLPLSVAQSCWRNTTCSGPTESAFSGPWEKNIFAPSSRTVNPEKLFLITQPD
264



EHA19108.1
KTEEYSPFALHGNGSLVVYDFGKEVGGIVSVNFSSTGSGALGVAFTEAKNWIGEWSDSSNGGFK





GPDGALYGNFTEAGSHYYVMPDKSLRGGFRYLTLFLITSDNSTIQIEDVNLEIGFQPTWSNLKA





YQGYFHSNDDLLNKIWYTGAYTLQTNEVPTDTGRQIPAMAVGWANNCTLGPGDTIIVDGAKRDR





AVWPGDMGIAVPSAFVSLGDLDSVKNALQVMYDTQNNSTGAFDESGPPLSQKDSDTYHMWTMVG





TYNYMLFTNDSDFLERNWEGYQKAMDYIYGKVTYPSGLLNVTGTRDWARWQQGYNNSEAQMILY





HTLNTGAELATWAGDSGDLSSTWTSRAEKLRQAINEYCWDDSYGAFKDNATDTTLHPQDANSMA





LLFGVVDADRAASISERLTDNWTPIGAVAPELPENISPFISSFEIQGHLTVGQPQRALELIRRS





WGWYYNNANGTQSTVIEGYLQNGTFGYRSDRGYYYDTAYVSHSHGWSSGPTSALTNYIVGISVT





SPLGATWRIAPQFVDLQSAEGGFTTSLGKFQAGWSKTDKGYTLDFTVPHGTQGNLTLPFVSAAK





PSIKIDGTEISRGVQYANSTATVTVSGGGTYKVEVQ




Transglucosidase
MPQKEFVPKTYQESSTGAQSSSSVHLRSSPEERSFDFSFEPIRENLFRVTFSSQDHPLPPYPSV
265



EHA19157.1
TKPATSLDGVHVSATGGSNQKTIEVGDVTASVEWSNTPVVSLSWKGTEKPLYRDLPLRSYVADS





TGIAHYTEHDRDCLHVGLGEKRAPMDLTGRHFQLSATDSFGYDVYNTDPLYKHIPLLIKASPDG





CVAIFSTTHGRGTWSVGSEVDGLWGHFKVYRQDYGGLEQYLIVGKTLKDVVRSYAELVGLPILV





PRWAYGYISGGYKYTMLDDPPAHEALMEFADKLEEHGIPCSAHQMSSGYSIAETEPKVRNVFTW





NKYRFPNPEEWIAKYHGRGIRLLSNIKPFLLASHPDFQKLIDGNGFFKDPESSKPGYMRLWSAG





GATGGDGCHIDFSSAVAFKWWYDGVQSLKRAGIDAMWNDNNEYTLPDDDWKLALDEPTVSDAVK





KGVENSVGQWGRAMHTELMGKASHDALLNIEPNHRPFVLTRSATAGTMRYAASTWSGDNVTSWE





GMKGANALSLSAGISLLQCCGHDIGGFEGPQPSPELLLRWIQLGIHSPRFAINCFKTSPGNSSV





GDVIEPWMYPEITPLVRDTIKRRYEILPYIYSLGLESHLTASPPQRWVGWGYESDPEVWTKALK





SGDEQFWFGDTIMVGGVYEPGVSVAKLYLPRKANDQFDFGYVNMNEPYNYLASGQWVEVPSEWR





KSIPLLARIGGAIPVGKPVHTRVPGDDTPASVAVKEVDDYRGVEIFPPLGSSHGQVFSTTWFED





DGISLEARISEYTVTYSSTEEKVIVGFSRDEKSGFVPAWTDLDIILHNGDERRVVSDIGKTVEY





KGKGSRGRVVYTLKN




Transglucosidase
MRLSTSSLLLSVSLLGKLALGLSAAEWRTQSIYFLLTDRFGRTDNSTTATCNTGDQIYCGGSWQ
266



EHA19519.1
GIINHLDYIQGMGFTAIWISPITEQLPQDTADGEAYHGYWQQKIYDVNSNFGTADDLKSLSDAL





HARGMYLMVDVVPNHMGYAGNGNDVDYSVFDPFDSSSYFHPYCLITDWDNLTMVQDCWEGDTIV





SLPDLNTTETAVRTIWYDWVADLVSNYSVDGLRIDSVLEVEPDFFPGYQEAAGVYCVGEVDNGN





PALDCPYQEYLDGVLNYPIYWQLLYAFESSSGSISDLYNMIKSVASDCSDPTLLGNFIENHDNP





RFASYTSDYSQAKNVLSYIFLSDGIPIVYAGEEQHYSGGKNDAFYTDSNTIAMRKGTSGSQVIT





VLSNKGSSGSSYTLTLSGSGYTSGTKLIEAYTCTSVTVDSSGDIPVPMASGLPRVLLPASVVDS





SSLCGGSGSNSSTTTTTTATSSSTATSKSASTSSTSTACTATSTSLAVTFEELVTTTYGEEIYL





SGSISQLGDWDTSDAVKMSADDYTSSNPEWSVTVTLPVGTTFEYKFIKVESDGTVTWESDPNRE





YTVPECGSGETVVDTWR




Transglucosidase
MVKLTHLLARAWLVPLAYGASQSLLSTTAPSQPQFTIPASADVGAQLIANIDDPQAADAQSVCP
267



EHA20839.1
GYKASKVQHNSRGFTASLQLAGRPCNVYGTDVESLTLSVEYQDSDRLNIQILPTHVDSTNASWY





FLSENLVPRPKASLNASVSQSDLFVSWSNEPSFNFKVIRKATGDALFSTEGTVLVYENQFIEFV





TALPEEYNLYGLGEHITQFRLQRNANLTIYPSDDGTPIDQNLYGQHPFYLDTRYYKGDRQNGSY





IPVKSSEADASQDYISLSHGVFLRNSHGLEILLRSQKLIWRTLGGGIDLTFYSGPAPADVTRQY





LTSTVGLPAMQQYNTLGFHQCRWGYNNWSDLADVVANFEKFEIPLEYIWTDIDYMHGYRNFDND





QHRFSYSEGDEFLSKLHESGRYYVPIVDAALYIPNPENASDAYATYDRGAADDVFLKNPDGSLY





IGAVWPGYTVFPDWHHPKAVDFWANELVIWSKKVAFDGVWYDMSEVSSFCVGSCGTGNLTLNPA





HPSFLLPGEPGDIIYDYPEAFNITNATEAASASAGASSQAAATATTTSTSVSYLRTTPTPGVRN





VEHPPYVINHDQEGHDLSVHAVSPNATHVDGVEEYDVHGLYGHQGLNATYQGLLEVWSHKRRPF





IIGRSTFAGSGKWAGHWGGDNYSKWWSMYYSISQALSFSLFGIPMFGADTCGFNGNSDEELCNR





WMQLSAFFPFYRNHNELSTIPQEPYRWASVIEATKSAMRIRYAILPYFYTLFDLAHTTGSTVMR





ALSWEFPNDPTLAAVETQFMVGPAIMVVPVLEPLVNTVKGVFPGVGHGEVWYDWYTQAAVDAKP





GVNTTISAPLGHIPVYVRGGNILPMQEPALTTREARQTPWALLAALGSNGTASGQLYLDDGESI





YPNATLHVDFTASRSSLRSSAQGRWKERNPLANVTVLGVNKEPSAVTLNGQAVFPGSVTYNSTS





QVLFVGGLQNLTKGGAWAENWVLEW




Transglucosidase
MSFRSLLALSGLVCTGLANVISKRATLDSWLSNEATVARTAILNNIGADGAWVSGADSGIVVAS
268



EHA21384.1
PSTDNPDYFYTWTRDSGLVLKTLVDLFRNGDTSLLSTIENYISAQAIVQGISNPSGDLSSGAGL





GEPKFNVDETAYTGSWGRPQRDGPALRATAMIGFGQWLLDNGYTSTATDIVWPLVRNDLSYVAQ





YWNQTGYDLWEEVNGSSFFTIAVQHRALVEGSAFATAVGSSCSWCDSQAPEILCYLQSFWTGSF





ILANFDSSRSGKDANTLLGSIHTFDPEAACDDSTFQPCSPRALANHKEVVDSFRSIYTLNDGLS





DSEAVAVGRYPEDTYYNGNPWFLCTLAAAEQLYDALYQWDKQGSLEVTDVSLDFFKALYSDAAT





GTYSSSSSTYSSIVDAVKTFADGFVSIVETHAASNGSMSEQYDKSDGEQLSARDLTWSYAALLT





ANNRRNSVVPASWGETSASSVPGTCAATSAIGTYSSVTVTSWPSIVATGGTTTTATPTGSGSVT





STSKTTATASKTSTSTSSTSCTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWETSDGIALS





ADKYTSSDPLWYVTVTLPAGESFEYKFIRIESDDSVEWESDPNREYTVPQACGTSTATVTDTWR




Transglucosidase
MAKSASQIHRAWWKECSVYQIWPASYKDSNDDGIGDIPGIISKLDYIKNIGVDIVWLCPSYKSP
269



EHA23512.1
QVDMGYDIADYYSIADEYGTVADVEKLIQGCHERGMKLLMDLVVNHTSDQNEWFKQSRSSKDNE





YRNWYVWKPARYDEQGNRHPPNNWVSHFQGSAWEWDEHTGEYYLHLYATEQPDLNWEHPPVRKA





VHDIMRFWLDKGADGFRMDVINFISKDQRFPDAPVKDPRTPWQWGDKYYANGPRLHEYLQDLGK





ILKEYDAFSVGEMPFVRDTEEVLRAVRYDRNEINMIFNFEHVDIDHGTYDKFEPGSWKLTDLKA





FFETWQKFMYNNDGWNALYWENHDQPRSIDRYAQAKEEFRTEAGKMLATVLALQSGTPFVYQGQ





EIGMRNVPVEWDMNEYKDIDCLNHWHRLLKHRPDDIEAQKSARQEYQKKSRDNGRTPVQWSSAP





NGGFTGPNAKPWMSVNPDYVRFNAEAQVNDPNSIYHYWAAVLGLRKKYLDIFVYGDYDLVDKDS





QEVFAYARQFENQKALVLTNWTEKTLEWDATANGVKGIKDVLLNSYESAEAAKERFTGQKWSLR





PYEAVVLLVEA




Transglucosidase
MPSTYLGALATLAVFPCLGQARSTWPLGSGLELSYQASQHQISIHQDNQTIFSTIPGQPFLSAS
270



EHA23680.1
AGKDQFVEDSGNFNITNVNQARCRGQNITQLAGIPRSDSVKNQVAVRGYLLDCGGEDIAYGMNF





WVPRRFSDRVAFEATVDSEANASVPVDRLYLTFASHALEDFYGLGAQASFASMKNRSIPIFSRE





QGVGRGDQPYTAIEDSQGFFSGGDQYTTYTAIPQYVSSDGRVFYLDENDTAYAVFDFQRSDAVT





VRYDSLSVHGHLMQADTMLDAITMLTEYTGRMPTLPEWVDHGALLGIQGGQEKVNRIVKQGFEH





DCPVAGVWLQDWSGTHLQSAPYGNMNISRLWWNWESDTSLYPTWAEFVQTLREQHGVRTLAYVN





PFLANVSSKSDGYRRNLFLEASQHRYMVQNTTTNSTAIISSGKGIDAGILDLTNEDTRAWFADV





LRTQVWSANISGCMWDFGEYTPITPDTSLANISTSAFFYHNQYPRDWAAYQRSVAAEMPLFHEM





VTFHRSASMGANRHMNLFWVGDQATLWTRNDGIKSVVTIQGQMGISGYAHSHSDIGGYTTVFEP





PTTSNSSGAIPRSAELLGRWGELGAVSSAVFRSHEGNVPSVNAQFYSNSTTYAYFAYNARLFRS





LGPYRRRILNTESQRRGWPLLRMPVLYHPEDLRARQISYESFFLGRDLYVAPVLDEGHKSVDVY





FPGHGANRTYTHVWTGQTYRAGQTAKVSAPFGKPAVFLVNGASSPELDVFLNFVRKENGTVLHA




Transglucosidase
MDPANEYCGLEDYGLVGDMHTCALVSKNGSVDSMCWPVFDSPSIFCRILDKEKGGHFSITPDRR
271



EHA25759.1
LKNPLSKQRYRPYTNMLETRWIHEEGVMNILDYFPIAKPKPHVVEKGLPQWCRCYQNKGSAQYQ





ACRSGMVRKAECVRGEMEIEIELFPAFNYARDSHVAQQSSASDDAIQVYHFQAESQNLVVSVLG





DKGDISEDDSDLSIEFELSDRPGHLGPGLVGKVILKEGQSITMLLHDQESITCDVDDLAPYLQQ





IERTTGDFWSDWTSKCTFRGHYREQVERSLLVLKLLTYKPTGAIVAAPTFSLPEHIGGSRNWDY





RYSWVRDAAFTVYVFLKNGYPEEAESYINFIFERIFPPMDKNPKPGEPFLPIMITIHGEREIPE





MELDHLEGYRGSRPVRIGNGAATHIQLDIYGELMDSIYLYNKHAADISYDQWRAIRRMIDFVIQ





IRHQPDQSIWEVRGPPQNFVYSKIMLWVALDRGLRLAEKRSNLPCPDRARWMHERDALYDEIMT





KGYNAEKGFFCMSYENQDAMDAAVLIAPLVFFVAPNDPRLLSTIQRITEVPAKGGLSVANMVSR





YDTGKVDDGVGGNEGAFLMVTFWLVEAMMRAARSKSYLPHDPFFQQLRKTATSQFDSILSFANH





LGMFSEEVATSGEQIGNMPQAFSHLACVSAAMNLGGGGDR





Transglucosidase
MSSPQQVYLLPLKDDGSPDVPGGYIYLPAPTNPPYLLRFVIEGSSSICREGALWVNIPEKGESF
272



EHA26514.1
NRSAFRSFSLSPDFNKNIQIDVPITSAGSFAFYVTFSPLPEFSVISTPTPEPTRTPTHYIDVSP





KLTLRGQDLPLNALSIYSVISKFMGQYPKEWEKHLNGISQRNYNMVHFTPLMKRGASNSPYSIF





DQLQFDDAVFPNGEDDVARLISKMENEYGLLSLTDVVWNHTAHNSKWLEEHPEAGYSVETAPWL





EAALELDTALLKFGQDLQNLGLPTEFQTVDELMKVMNVMRDKVIAGIRLWEFYAIDVKSDTHKI





LDKWKTSKDIDLTDTNWAQLNLQDYKNWTLKQQATFIRDHAIPTSKQVLDRFSRAVDLQFGAAI





LTALFGPHNPSTSDTSIVEESLSKILDEVNLPFYEEYDGDVSEIMNQVFNRIKYLRIDDHGPKL





GAVTAQSPLIETYFTRLPLNDVTKKHKKEALALVNNGWIWNADALRDNAGPDSRAYLRREVIVW





GDCVKLRYGSCRDDNPFLWDFMTDYTRLMAKYFSGFRIDNCHSTPLVVAEYLLDEARKVRPNLT





VFAELFTGSEEADYIFVKRLGINALIREAMQAWSTGELSRLVHRHGGRPIGSFDLDLPSSGSSH





AIASSGLDSGKEKVVHIRPTPVQALFMDCTHDNEMPAQKRTAKDTLPNGALVAMCASAIGSVIG





YDEVYPRLVDLVHEHRLYFSEFSEAPETGLNSLEGGIGGIKKLLNELHTKMGIEGYDETHIHHD





GEYITVHRVHPRTRKGVFLIAHTAFPGQDSRSVLAPTHLVGTQVKHIGTWLLEVDTSQTTKERI





QADKSYLRGLPSQVKTFEGTKIEESGKDTIISVLNSFVAGSIALFETSMPSVEHASGLDNYITE





GVDHAFSDLSLVDLNFALYRCEAEERDSSKGQDGAYDIPGHGPLVYAGLQGWWSVLENIIKYNE





LGHPLCDHLRNGQWALDYIVARLEKLSHKEEHPALGRPAAWLQEKFQAVRQLPSFLLPRYFAII





VQVAYNAAWKRGIQLLGSHIQKGQEFIHQLGMVSVQQTGYVNSASLWPTKKVPSLAAGLPHFAV





DWARCWGRDVFISLRGLLLCTGRFEDAKEHITAFASVLKHGMIPNLLSSGKLPRYNSRDSVWFF





LQSIQDYTEMAPDGLEILDHKVPRRFLPYDDVWFPFDDPRAYSQQSTISEIIQEVFQRHAQGLS





FREYNAGPDLDMQMTQDGFQIDVKVDWETGLIFGGSQYNCGTWQDKMGESAKAGNKGVPGTPRD





GAAIEITGLVYSALTWVAKLHERGIYKHDGVDIGGGKSISFEDWASRIRANFERCYYVPLQPKD





DGQYDIDANIINRRGIYKDLYRSGKPYEDYQLRSNFPIAMTVAPDLFTASKALAALALADEVLV





GPVGMATLDPSDLNYRPNYNNSEDSTDFATAKGRNYHQGPEWVWQRGYFLRAFLHFDLARRTTP





AERTETYQQITRRLEGCKRALRESPWKGLTELTNKNGAYCADSSPTQAWSAGCLLDLYYDASRH





SQS




Transglucosidase
MWSSWLLSALLATEALAVPYEEYILAPSSRDLAPASVRQVNGSVTNAAALTGAGGQATFNGVSS
273



EHA26552.1
VTYDFGINVAGIVSVDVASASSESAFIGVTFTESSMWISSEACDATQDAGLDTPLWFAVGQGAG





LYTVEKKYNRGAFRYMTVVSNTTATVSLNSVKINYTASPTQDLRAYTGYFHSNDELLNRIWYAG





AYTLQLCSIDPTTGDALVGLGVITSSETISLPQTDKWWTNYTITNGSSTLTDGAKRDRLVWPGD





MSIALESVAVSTEDLYSVRTALESLYALQKPDGRLPYAGKPFFDTVSFTYHLHSLVGAASYYQY





TGDRAWLTRYWGQYKKGVQWALSSVDSTGLANITASADWLRFGMGAHNIEANAILYYVLNDAIS





LAQTLNDNAPIRNWTTTAARIKTVANELLWDDKNGLYTDNETTTLHPQDGNSWAVKANLTLSAN





QSAIVSESLAARWGPYGAPAPEAGATVSPFIGGFELQAHYQAGQPDRALDLLRLQWGFMLDDPR





MTNSTFIEGYSTDGSLAYAPYTNTPRVSHAHGWATGPTSALTIYTAGLRVTGPAGATWLYKPQP





GNLTQVEAGFSTRLGSFASSFSRSGGRYQELSFSTPNGTTGSVELGDVSGQLVSDRGVKVQLVG





GKASGLQGGKWKLSNN




Transglucosidase
MLGSLLLLLPLVGAAVIGPRANSQSCPGYKASNVQKQARSLTADLTLAGTPCNSYGKDLEDLKL
274



EHA26885.1
LVEYQTDERLHVMIYDADEEVYQVPESVLPRVGSDEDSEDSVLEFDYVEEPFSFTISKGDEVLF





DSSASPLVFQSQYVNLRTWLPDDPYVYGLGEHSDPMRLPTYNYTRTLWNRDAYGTPNNTNLYGS





HPVYYDHRGKSGTYGVFLLNSNGMDIKINQTTDGKQYLEYNLLGGVLDFYFFYGEDPKQASMEY





SKIVGLPAMQSYWTFGFHQCRYGYRDVYELAEVVYNYSQAKIPLETMWTDIDYMDKRRVFTLDP





QRFPLEKMRELVTYLHNHDQHYIVMVDPAVSVSNNTAYITGVRDDVFLHNQNGSLYEGAVWPGV





TVFPDWFNEGTQDYWTAQFQQFFDPKSGVDIDALWIDMNEASNFCPYPCLDPAAYAISADLPPA





APPVRPSSPIPLPGFPADFQPSSKRSVKRAQGDKGKKVGLPNRNLTDPPYTIRNAAGVLSMSTI





ETDLIHAGEGYAEYDTHNLYGTTHIPMVGADVCGFGSNTTEELCARWASLGAFYTFYRNHNELG





DISQEFYRWPTVAESARKAIDIRYKLLDYIYTALHRQSQSGEPFLQPQFYLYPEDSNTFANDRQ





FFYGDALLVSPVLNEGSTSVDAYFPDDIFYDWYTGAVVRGHGENITLSNINITHIPLHIRGGNI





IPVRTSSGMTTTEVRKQGFELIIAPDLDDTASGSLYLDDGDSLNPSSVTELEFTYSKGELHVKG





TFGQKAVPKVEKCTLLGKSART




Transglucosidase
MCHKSNYSSPKWWKESVVYQVYPASFNCGKSTTTTNGWGDVTGIIEKVPYLKSLGVDIVWLSPI
275



EHA27488.1
YTSPQVDMGYDIADYKSIDPRYGTLADVDLLIKSLKDHDMRLMMDLVVNHTSDQHSWFVESASS





KDSPKRDWYIWRPAKGFDEAGNPVPPNNWAQILGDTLSAWTWHEETQEFYLTLHTSAQAELNWE





NPDVVTAVYDVMEFWLRRGICGFRMDVINFISKDQSFPDAPIIDPASKYQPGEQFYTNGPRFHE





FMHGIYDNVLSKYDTITVGETPYVTDMKEIIKTVGSTAKELNMAFNFDHMEIEDIKTKGESKWS





LRDWKLTELKGILSGWQKRMREWDGWNAIFLECHDQARSVSRYTNDSDEFRDRGAKLLALLETT





LGGTIFLYQGQEIGMRNFPVEWGPDTEYKDIESVNFWKKSKELHPVGSEGLAQARTLLQKKARD





HARTPMQWSADPHAGFTVPDATPWMRVNDDYRTVNVEAQMSFPWEMKGELSVWQYWQQALQRRK





LHKGAFVYGDFEDLDYHNESVFAYSRTSADGKETWLPVPSWSLPKKRTLS




Transglucosidase
MSNRWTLLLSLVILLGCLVIPGVTVKHENFKTCSQSGFCKRNRAFADDAAAQGSSWASPYELDS
276



EHA28539.1
SSIQFKDGQLHGTILKSVSPNEKVKLPLVVSFLESGAARVVVDEEKRMNGDIQLRHDSKARKER





YNEAEKWVLVGGLELSKTATLRPETESGFTRVLYGPDNQFEAVIRHAPFSADFKRDGQTHVQLN





NKGYLNMEHWRPKVEVEGEGEQQTQEDESTWWDESFGGNTDTKPRGPESVGLDITFPGYKHVFG





IPEHADSLSLKETRGGEGNHEEPYRMYNADVFEYELSSPMTLYGAIPFMQAHRKDSTVGVFWLN





AAETWVDIVKSTSSPNPLALGVGATTDTQSHWFSESGQLDVFVFLGPTPQEISKTYGELTGYTQ





LPQHFAIAYHQCRWNYITDEDVKEVDRNFDKYQIPYDVIWLDIEYTDDRKYFTWDPLSFPDPIS





MEEQLDESERKLVVIIDPHIKNQDKYSIVQEMKSKDLATKNKDGEIYDGWCWPGSSHWIDTFNP





AAIKWWVSLFKFDKFKGTLSNVFIWNDMNEPSVFNGPETTMPKDNLHHGNWEHRDIHNVHGITL





VNATYDALLERKKGEIRRPFILTRSYYAGAQRMSAMWTGDNQATWEHLAASIPMVLNNGIAGFP





FAGADVGGFFQNPSKELLTRWYQAGIWYPFFRAHAHIDTRRREPYLIAEPHRSIISQAIRLRYQ





LLPAWYTAFHEASVNGMPIVRPQYYAHPWDEAGFAIDDQLYLGSTGLLAKPVVSEEATTADIYL





ADDEKYYDYFDYTVYQGAGKRHTVPAPMETVPLLMQGGHVIPRKDRPRRSSALMRWDPYTLVVV





LDKNGQADGSLYVDDGETFDYKRGAYIHRRFRFQESALVSEDVGTKGPKTAEYLKTMANVRVER





VVVVDPPKEWQGKTSVTVIEDGASAASTASMQYHSQPDGKAAYAVVKNPNVGIGKTWRIEF




Transglucosidase
MWSSWLLSALLATEALAVPYEEYILAPSSRDLAPASVRQVNGSVTNAAALTGAGGQATFNGVSS
277



XP_001389086.1
VTYDFGINVAGIVSVDVASASSDSAFIGVTFTESSMWISSEACDATQDAGLDTPLWFAVGQGAG





LYTVEKKYNRGAFRYMTVVSNTTATVSLNSVKINYTASPTQDLRAYTGYFHSNDELLNRIWYAG





AYTLQLCSIDPTTGDALVGLGVITSSETISLPQTDKWWTNYTITNGSSTLTDGAKRDRLVWPGD





MSIALESVAVSTEDLYSVRTALESLYALQKPDGRLPYAGKPFFDTVSFTYHLHSLVGAASYYQY





TGDRAWLTRYWGQYKKGVQWALSSVDSTGLANITASADWLRFGMGAHNIEANAILYYVLNDAIS





LAQTLNDNAPIRNWTTTAARIKTVANELLWDDKNGLYTDNETTTLHPQDGNSWAVKANLTLSAN





QSAIVSESLAARWGPYGAPAPEAGATVSPFIGGFELQAHYQAGQPDRALDLLRLQWGFMLDDPR





MTNSTFIEGYSTDGSLAYAPYTNTPRVSHAHGWATGPTSALTIYTAGLRVTGPAGATWLYKPQP





GNLTQVEAGFSTRLGSFASSFSRSGGRYQELSFSTPNGTTGSVELGDVSGQLVSDRGVKVQLVG





GKASGLQGGKWKLSNN




Transglucosidase
MAKSASQIHRAWWKECSVYQIWPASYKDSNDDGIGDIPGIISKLDYIKNIGVDIVWLCPSYKSP
278



XP_001400455.1
QVDMGYDIADYYSIADEYGTVADVEKLIQGCHERGMKLLMDLVVNHTSDQNEWFKQSRSSKDNK





YRNWYVWKPARYDEQGNRHPPNNWVSHFQGSAWEWDEHTGEYYLHLYATEQPDLNWEHPPVRKA





VHDIMRFWLDKGADGFRMDVINFISKDQRFPDAPVKDPRTPWQWGDKYYANGPRLHEYLQDLGK





ILKEYDAFSVGEMPFVRDTEEVLRAVRYDRNEINMIFNFEHVDIDHGTYDKFEPGSWKLTDLKA





FFETWQKFMYNNDGWNALYWENHDQPRSIDRYAQAKEEFRTEAGKMLATVLALQSGTPFVYQGQ





EIGMRNVPVEWDMNEYKDIDCLNHWHRLLKHRPDDIEAQKSARQEYQKKSRDNGRTPVQWSSAP





NGGFTGPNAKPWMSVSPDYVRFNAEAQVNDPNSIYHYWAAVLGLRKKYLDIFVYGDYDLVDKDS





QEIFAYARQYENKKALVLTNWTEKTLEWDATTNGVKGVKDVLLNSYESAEAAKGRFSGQKWSLR





PYEAVVLLVEA




Transglucosidase
MSFRSLLALSGLVCTGLANVISKRATLDSWLSNEATVARTAILNNIGADGAWVSGADSGIVVAS
279



XP_001390530.1
PSTDNPDYFYTWTRDSGLVLKTLVDLFRNGDTSLLSTIENYISAQAIVQGISNPSGDLSSGAGL





GEPKFNVDETAYTGSWGRPQRDGPALRATAMIGFGQWLLDNGYTSTATDIVWPLVRNDLSYVAQ





YWNQTGYDLWEEVNGSSFFTIAVQHRALVEGSAFATAVGSSCSWCDSQAPEILCYLQSFWTGSF





ILANFDSSRSGKDANTLLGSIHTFDPEAACDDSTFQPCSPRALANHKEVVDSFRSIYTLNDGLS





DSEAVAVGRYPEDTYYNGNPWFLCTLAAAEQLYDALYQWDKQGSLEVTDVSLDFFKALYSDAAT





GTYSSSSSTYSSIVDAVKTFADGFVSIVETHAASNGSMSEQYDKSDGEQLSARDLTWSYAALLT





ANNRRNSVVPASWGETSASSVPGTCAATSAIGTYSSVTVTSWPSIVATGGTTTTATPTGSGSVT





STSKTTATASKTSTSTSSTSCTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWETSDGIALS





ADKYTSSDPLWYVTVTLPAGESFEYKFIRIESDDSVEWESDPNREYTVPQACGTSTATVTDTWR




Transglucosidase
MSNRWTLLLSLVILLGCLVIPGVTVKHENFKTCSQSGFCKRNRAFADDAAAQGSSWASPYELDS
280



XP_001393899.1
SSIQFKDGQLHGTILKSVSPNEKVKLPLVVSFLESGAARVVVDEEKRMNGDIQLRHDSKARKER





YNEAEKWVLVGGLELSKTATLRPETESGFTRVLYGPDNQFEAVIRHAPFSADFKRDGQTHVQLN





NKGYLNMEHWRPKVEVEGEGEQQTQEDESTWWDESFGGNTDTKPRGPESVGLDITFPGYKHVFG





IPEHADSLSLKETRGGEGNHEEPYRMYNADVFEYELSSPMTLYGAIPFMQAHRKDSTVGVFWLN





AAETWVDIVKSTSSPNPLALGVGATTDTQSHWFSESGQLDVFVFLGPTPQEISKTYGELTGYTQ





LPQHFAIAYHQCRWNYITDEDVKEVDRNFDKYQIPYDVIWLDIEYTDDRKYFTWDPLSFPDPIS





MEEQLDESERKLVVIIDPHIKNQDKYSIVQEMKSKDLATKNKDGEIYDGWCWPGSSHWIDTFNP





AAIKWWVSLFKFDKFKGTLSNVFIWNDMNEPSVFNGPETTMPKDNLHHGNWEHRDIHNVHGITL





VNATYDALLERKKGEIRRPFILTRSYYAGAQRMSAMWTGDNQATWEHLAASIPMVLNNGIAGFP





FAGADVGGFFQNPSKELLTRWYQAGIWYPFFRAHAHIDTRRREPYLIAEPHRSIISQAIRLRYQ





LLPAWYTAFHEASVNGMPIVRPQYYAHPWDEAGFAIDDQLYLGSTGLLAKPVVSEEATTADIYL





ADDEKYYDYFDYTVYQGAGKRHTVPAPMETVPLLMQGGHVIPRKDRPRRSSALMRWDPYTLVVV





LDKNGQADGSLYVDDGETFDYERGAYIHRRFRFQESALVSEDVGTKGPKTAEYLKTMANVRVER





VVVVDPPKEWQGKTSVTVIEDGASAASTASMQYHSQPDGKAAYAVVKNPNVGIGKTWRIEF




Transglucosidase
MPQKEFVPKTYQESSTGAQSSSSVHLRSSPEERSFDFSFEPIRENLFRVTFSSQDHPLPPYPSV
281



XP_001399012.1
TKPATSLDGVHVSATGGSNQKTIEVGDVTASVEWSNTPVVSLSWKGTEKPLYRDLPLRSYVADS





TGIAHYTEHDRDCLHVGLGEKRAPMDLTGRHFQLSATDSFGYDVYNTDPLYKHIPLLIKASPDG





CVAIFSTTHGRGTWSVGSEVDGLWGHFKVYRQDYGGLEQYLIVGKTLKDVVRSYAELVGLPILV





PRWAYGYISGGYKYTMLDDPPAHEALMEFADKLEEHGIPCSAHQMSSGYSIAETEPKVRNVFTW





NKYRFPNPEEWIAKYHGRGIRLLSNIKPFLLASHPDFQKLIDGNGFFKDPESSKPGYMRLWSAG





GATGGDGCHIDFSSAVAFKWWYDGVQSLKRAGIDAMWNDNNEYTLPDDDWKLALDEPTVSDAVK





KGVENSVGQWGRAMHTELMGKASHDALLNIEPNHRPFVLTRSATAGTMRYAASTWSGDNVTSWE





GMKGANALSLSAGISLLQCCGHDIGGFEGPQPSPELLLRWIQLGIHSPRFAINCFKTSPGNSSV





GDVIEPWMYPEITPLVRDTIKRRYEILPYIYSLGLESHLTASPPQRWVGWGYESDPEVWTKALK





SGDEQFWFGDTIMVGGVYEPGVSVAKLYLPRKANDQFDFGYVNMNEPYNYLASGQWVEVPSEWR





KSIPLLARIGGAIPVGKPVHTRVPGDDTPASVAVKEVDDYRGVEIFPPLGSSHGQVFSTTWFED





DGISLEARISEYTVTYSSTEEKVIVGFSRDEKSGFVPAWTDLDIILHNGDERRVVSDIGKTVEY





KGKGSRGRVVYTLKN




Transglucosidase
MVKLTHLLARAWLVPLAYGASQSLLSTTAPSQPQFTIPASADVGAQLIANIDDPQAADAQSVCP
282



XP_001402053.1
GYKASKVQHNSRGFTASLQLAGRPCNVYGTDVESLTLSVEYQDSDRLNIQILPTHVDSTNASWY





FLSENLVPRPKASLNASVSQSDLFVSWSNEPSFNFKVIRKATGDALFSTEGTVLVYENQFIEFV





TALPEEYNLYGLGEHITQFRLQRNANLTIYPSDDGTPIDQNLYGQHPFYLDTRYYKGDRQNGSY





IPVKSSEADASQDYISLSHGVFLRNSHGLEILLRSQKLIWRTLGGGIDLTFYSGPAPADVTRQY





LTSTVGLPAMQQYNTLGFHQCRWGYNNWSDLADVVANFEKFEIPLEYIWTDIDYMHGYRNFDND





QHRFSYSEGDEFLSKLHESGRYYVPIVDAALYIPNPENASDAYATYDRGAADDVFLKNPDGSLY





IGAVWPGYTVFPDWHHPKAVDFWANELVIWSKKVAFDGVWYDMSEVSSFCVGSCGTGNLTLNPA





HPSFLLPGEPGDIIYDYPEAFNITNATEAASASAGASSQAAATATTTSTSVSYLRTTPTPGVRN





VEHPPYVINHDQEGHDLSVHAVSPNATHVDGVEEYDVHGLYGHQGLNATYQGLLEVWSHKRRPF





IIGRSTFAGSGKWAGHWGGDNYSKWWSMYYSISQALSFSLFGIPMFGADTCGFNGNSDEELCNR





WMQLSAFFPFYRNHNELSTIPQEPYRWASVIEATKSAMRIRYAILPYFYTLFDLAHTTGSTVMR





ALSWEFPNDPTLAAVETQFMVGPAIMVVPVLEPLVNTVKGVFPGVGHGEVWYDWYTQAAVDAKP





GVNTTISAPLGHIPVYVRGGNILPMQEPALTTREARQTPWALLAALGSNGTASGQLYLDDGESI





YPNATLHVDFTASRSSLRSSAQGRWKERNPLANVTVLGVNKEPSAVTLNGQAVFPGSVTYNSTS





QVLFVGGLQNLTKGGAWAENWVLEW




Transglucosidase
MFVESAKKALLALSLLAASAQAVPRVRRQGASSSFDYKSQIVRGVNLGGWLVTEPWITPSLYDS
283



A2RAR6.1
TGGGAVDEWTLCQILGKDEAQAKLSSHWSSFITQSDFDRMAQAGLNHVRIPIGYWAVAPIDGEP





YVSGQIDYLDQAVTWARAAGLKVLVDLHGAPGSQNGFDNSGHRGPIQWQQGDTVNQTMTAFDAL





ARRYAQSDTVTAIEAVNEPNIPGGVNEDGLKNYYYGALADVQRLNPSTTLFMSDGFQPVESWNG





FMQGSNVVMDTHHYQVFDTGLLSMSIDDHVKTACSLATQHTMQSDKPVVVGEWTGALTDCAKYL





NGVGNAARYDGTYMSTTKYGDCTGKSTGSVADFSADEKANTRRYIEAQLEAYEMKSGWLFWTWK





TEGAPGWDMQDLLANQLFPTSPTDRQYPHQCS




Transglucosidase
MPGHSRSRDRLSPSSELDDADPVYSPSVYQREHYYNNDSLFDSADDDYTRTPRNVYSYETHDEY
284



A2QX52.1
HDDDDDDDDVHEHDHDHEYDDKFEEPWVPLRAQVEGDQWREGFETAIPKEEDVTQAKEYQYQMS





GALGDDGPPPLPSDALGRGKGKKRLDRETRRQRRKERLAAFFKHKNGSASAGLVSGDALAKLLG





SQDGDEDCLSHLGTERADSMSQKNLEGGRQRKLPVLSEEPMMLRPFPAVAPTGQTQGRVVSGAQ





LEEGGPGMEMRHRGGGGPPAEGLLQKEGDWDGSTKGSSTSARPSFWKRYHKTFIFFAILIVLAA





IAIPVGIIEARRLHGTSGGDNSSNSNLKGISRDSIPAYARGTYLDPFTWYDTTDFNVTFTNATV





GGLSIMGLNSTWNDSAQANENVPPLNEKFPYGSQPIRGVNLGGWLSIEPFIVPSLFDTYTSSEG





IIDEWTLSEKLGDSAASVIEKHYATFITEQDFADIRDAGLDHVRIQFSYWAIKTYDGDPYVPKI





AWRYLLRAIEYCRKYGLRVNLDPHGIPGSQNGWNHSGRQGTIGWLNGTDGELNRQRSLEMHDQL





SQFFAQDRYKNVVTIYGLVNEPLMLSLPVEKVLNWTTEATNLVQKNGIKAWVTVHDGFLNLDKW





DKMLKTRPSNMMLDTHQYTVFNTGEIVLNHTRRVELICESWYSMIQQINITSTGWGPTICGEWS





QADTDCAQYVNNVGRGTRWEGTFSLTDSTQYCPTASEGTCSCTQANAVPGVYSEGYKTFLQTYA





EAQMSAFESAMGWFYWTWATESAAQWSYRTAWKNGYMPKKAYSPSFKCGDTIPSFGNLPEYY




Transglucosidase
MSSPQQVYLLPLKDDGSPDVPGGYIYLPAPTNPPYLLRFVIEGSSSICREGALWVNIPEKGESF
285



XP_001389036.2
NRSAFRSFSLSPDFNKNIQIDVPITSAGSFAFYVTFSPLPEFSVLSTPTPEPTRTPTHYIDVSP





KLTLRGQDLPLNALSIYSVISKFMGQYPKEWEKHLNGISQRNYNMVHFTPLMKRGASNSPYSIF





DQLQFDDAVFPNGEDDVARLISKMENEYGLLSLTDVVWNHTAHNSKWLEEHPEAGYSVETAPWL





EAALELDTALLKFGQDLQNLGLPTEFQTVDELMKVMNVMRDKVIAGIRLWEFYAIDVKSDTHKI





LDKWKTSKDIDLTDTNWAQLNLQDYKNWTLKQQATFIRDHAIPTSKQVLGRFSRAVDLQFGAAI





LTALFGPHNPSTSDTSIVEESLSKILDEVNLPFYEEYDGDVSEIMNQVFNRIKYLRIDDHGPKL





GAVTAQSPLIETYFTRLPLNDVTKKHKKEALALVNNGWIWNADALRDNAGPDSRAYLRREVIVW





GDCVKLRYGSCRDDNPFLWDFMTDYTRLMAKYFSGFRIDNCHSTPLVVAEYLLDEARKVRPNLT





VFAELFTGSEEADYIFVKRLGINALIREAMQAWSTGELSRLVHRHGGRPIGSFDLDLPSSGSSH





AIASSGLDSGKEKVVHIRPTPVQALFMDCTHDNEMPAQKRTAKDTLPNGALVAMCASAIGSVIG





YDEVYPRLVDLVHEHRLYFSEFSEAPETGLNSLEGGIGGIKKLLNELHTKMGIEGYDETHIHHD





GEYITVHRVHPRTRKGVFLIAHTAFPGQDSRSVLAPTHLVGTQVKHIGTWLLEVDTSQTTKERI





QADKSYLRGLPSQVKTFEGTKIEESGKDTIISVLNSFVAGSIALFETSMPSVEHASGLDNYITE





GVDHAFSDLSLVDLNFALYRCEAEERDSSKGQDGAYDIPGHGPLVYAGLQGWWSVLENIIKYNE





LGHPLCDHLRNGQWALDYIVARLEKLSHKEEHPALGRPAAWLQEKFQAVRQLPSFLLPRYFAII





VQVAYNAAWKRGIQLLGPHIQKGQEFIHQLGMVSVQQTGYVNSASLWPTKKVPSLAAGLPHFAV





DWARCWGRDVFISLRGLLLCTGRFEDAKEHITAFASVLKHGMIPNLLSSGKLPRYNSRDSVWFF





LQSIQDYTEMAPDGLEILDHKVPRRFLPYDDVWFPFDDPRAYSQQSTISEIIQEVFQRHAQGLS





FREYNAGPDLDMQMTQDGFQIDVKVDWETGLIFGGSQYNCGTWQDKMGESAKAGNKGVPGTPRD





GAAIEITGLVYSALTWVAKLHERGIYKHDGVDIGGGKSISFEDWASRIRANFERCYYVPLQPKD





DGQYDIDANIINRRGIYKDLYRSGKPYEDYQLRSNFPIAMTVAPDLFTASKALAALALADEVLV





GPVGMATLDPSDLNYRPNYNNSEDSTDFATAKGRNYHQGPEWVWQRGYFLRAFLHFDLARRTTP





AERTETYQQITRRLEGCKRALRESPWKGLTELTNKNGAYCADSSPTQAWSAGCLLDLYYDASRH





SQMRIWYGPFKLRYKNIDAIDYYEEKLRRLDEKIQVARQKEYPPTEVAFVTMESIAASQMVVQA





ILDPHPMQLLARLAPAPADVVWKNTYLPRSRRMMQSWFITVVIGFLTVFWSVLLIPVAYLLEYE





TLHKVFPQLADALARNPLAKSLVQTGLPTLVLSLLTVAVPYLYNWLSNQQGMMSRGDIELSVIS





KTFFFSFFNLFLVFTVFGTATTFYGFWENLRDAFKDATTIAFALAKTLENFAPFYINFLCLQGI





GLFPFRLLEFGSVAMYPINFLAAKTPRDYAELSTPPTFSYGYSIPQTVLSLIICVVYSVFPSSW





LICLFGLIYFTIGKFIYKYQLLYAMDHQQHSTGRAWPMICSRILMGLMVFQLAMIGVLALRRAI





TRSLLIVPLLMATVWFSYFFARTYEPLMKFIALKSIDRERPGGGDISPSPSSTFSPPSGLDRDS





FPIRIGGQELGLRLRKYVNPSLILPLHDAWLPGRTMVPELQGELEHRNPGNNAADESV




Transglucosidase
MLGSLLLLLPLVGAAVIGPRANSQSCPGYKASNVQKQARSLTADLTLAGTPCNSYGKDLEDLKL
286



XP_001389510.2
LVEYQTDERLHVMIYDADEEVYQVPESVLPRVGSDEDSEDSVLEFDYVEEPFSFTISKGDEVLF





DSSASPLVFQSQYVNLRTWLPDDPYVYGLGEHSDPMRLPTYNYTRTLWNRDAYGTPNNTNLYGS





HPVYYDHRGKSGTYGVFLLNSNGMDIKINQTTDGKQYLEYNLLGGVLDFYFFYGEDPKQASMEY





SKIVGLPAMQSYWTFGFHQCRYGYRDVYELAEVVYNYSQAKIPLETMWTDIDYMDKRRVFTLDP





QRFPLEKMRELVTYLHNHDQHYIVMVDPAVSVSNNTAYITGVRDDVFLHNQNGSLYEGAVWPGV





TVFPDWFNEGTQDYWTAQFQQFFDPKSGVDIDALWIDMNEASNFCPYPCLDPAAYAISADLPPA





APPVRPSSPIPLPGFPADFQPSSKRSVKRAQGDKGKKVGLPNRNLTDPPYTIRNAAGVLSMSTI





ETDLIHAGEGYAEYDTHNLYGTMMSSASRTAMQARRPDVRPLVITRSTFAGAGAHVGHWLGDNF





SDWVHYRISIAQILSFASMFQIPMVGADVCGFGSNTTEELCARWASLGAFYTFYRNHNELGDIS





QEFYRWPTVAESARKAIDIRYKLLDYIYTALHRQSQTGEPFLQPQFYLYPEDSNTFANDRQFFY





GDALLVSPVLNEGSTSVDAYFPDDIFYDWYTGAVVRGHGENITLSNINITHIPLHIRGGNIIPV





RTSSGMTTTEVRKQGFELIIAPDLDDTASGSLYLDDGDSLNPSSVTELEFTYSKGELHVKGTFG





QKAVPKVEKCTLLGKSARTFKGFALDAPVNFKLK




Transglucosidase
MPSTYLGALATLAVFPCLGQARSTWPLGSGLELSYQASQHQISIHQDNQTIFSTIPGQPFLSAS
287



XP_001391128.2
AGKDQFVEDSGNFNITNVNQARCRGQNITQLAGIPRSDSVKNQVAVRGYLLDCGGEDIAYGMNF





WVPRRFSDRVAFEASVDSEANASFASMKNRSIPIFSREQGVGRGDQPYTAIEDSQGFFSGGDQY





TTYTAIPQYVSSDGRVFYLDENDTAYAVFDFQRSDAVTVRYDSLSVHGHLMQADTMLDAITMLT





EYTGRMPTLPEWVDHGALLGIQGGQEKVNRIVKQGFEHDCPVAGVWLQDWSGTHLQSAPYGNMN





ISRLWWNWESDTSLYPTWAEFVQTLREQHGVRTLAYVNPFLANVSSKSDGYRRNLFLEASQHRY





MVQNTTTNSTAIISSGKGIDAGILDLTNEDTRAWFADVLRTQVWSANISGCMWDFGEYTPITPD





TSLANISTSAFFYHNQYPRDWAAYQRSVAAEMPLFHEMVTFHRSASMGANRHMNLFWVGDQATL





WTRNDGIKSVVTIQGQMGISGYAHSHSDIGGYTTVFEPPTTSNSSGAIPRSAELLGRWGELGAV





SSAVFRSHEGNVPSVNAQFYSNSTTYAYFAYNARLFRSLGPYRRRILNTESQRRGWPLLRMPVL





YHPEDLRARQISYESFFLGRDLYVAPVLDEGHKSVEVYFPGHSANRTYTHVWTGQTYRAGQTAK





VSAPFGKPAVFLVDGASSPELDVFLDFVRKENGTVLYA




Transglucosidase
MDPANEYCGLEDYGLVGDMHTCALVSKNGSVDSMCWPVFDSPSIFCRILDKEKGGHFSITPDRR
288



XP_001395384.2
LKNPLSKQRYRPYTNMLETRWIHEEGVMNILDYFPIAKPKPHVVEKGLPQWCRCYQNKGSAQYQ





ACRSGMVRKAECVRGEMEIEIELFPAFNYARDSHVAQQSSASDDAIQVYHFQAESQNLVVSVLG





DRGDISGDDSDLSIEFELSDRPGHLGPGLVGKVTLKEGQSITMLLHDQESITCNVEDLAPYLQQ





IERTTGDFWSDWTSKCTFRGHYREQVERSLLVLKLLTYKPTGAIVAAPTFSLPEHIGGSRNWDY





RYSWVRDAAFTVYVFLKNGYPEEAESYINFIFERIFPPMDKNPKPGEPFLPIMITIHGEREIPE





MELEHLEGYRGSRPVRIGNGAATHIQLDIYGELMDSIYLYNKHAADISYDQWRAIRRMIDFVIQ





IRHQPDQSIWEVRGPPQNFVYSKIMLWVALDRGLRLAEKRSNLPCPDRARWMHERDALYDEIMT





KGYNSEKGFFCMSYENQDAMDAAVLIAPLVFFVAPNDPRLLSTIQKITEVPAKGGLSVANMVSR





YDTGKVDDGVGGNEGAFLMVTFWLVEAMMRAARSKAYLPHDPFFQQLRKTATSQFDSILSFANH





LGMFSEEVATSGEQIGNMPQAFSHLACVSAAMNLGGGGDR




Transglucosidase
MCNKSNYSSPKWWKESVVYQVYPASFNCGKSTTNTNGWGDVTGIIEKVPYLESLGVDIVWLSPI
289



XP_001396506.2
YTSPQVDMGYDIADYESIDPRYGTLADVDLLIKTLKDHDMKLMMDLVVNHTSDQHSWFVESANS





KDSPKRDWYIWRPAKGFDEAGNPVPPNNWAQILGDTLSAWTWHAETQEFYLTLHTSAQAELNWE





NPDVVTAVYDVMEFWLRRGICGFRMDVINFISKDQSFPDAPIIDPASKYQPGEQFYTNGPRFHE





FMHGIYDNVLSKYDTITVGETPYVTDMKEIIKTVGSTAKELNMAFNFDHMEIEDIKTKGESKWS





LRDWKLTELKGILSGWQKRMREWDGWNAIFLECHDQARSVSRYTNDSDEFRDRGAKLLALLETT





LGGTIFLYQGQEIGMRNFPVEWDPDTEYKDIESVNFWKKSKELHPVGSEGLAQARTLLQKKARD





HARTPMQWSADPHAGFTVPDATPWMRVNDDYGTVNVEAQMSFPWEMKGELSVWQYWQQALQRRK





LHKGAFVYGDFEDLDYHNELVFAYSRTSADGKETWLVAMNWTTDAVEWTVPSGIHVTRWVSSTL





QTAPLMAGQSTVTLRALEGVVGCCS




Transglucosidase
MGLCVGWRWILLCVVMGAAVCGTDKTATMRWHKLLPGVLALLPLSVAQSCWRNTTCSGPTESAF
290



XP_001398938.2
SGPWEKNIFAPSSRTVNPEKLFLITQPDKTEEYSPFALHGNGSLVVYDFGKEVGGIVSVNFSST





GSGALGVAFTEAKNWIGEWSDSSNGGFKGPDGALYGNFTEAGSHYYVMPDKSLRGGFRYLTLFL





ITSDNSTIQIEDVNLEIGFQPTWSNLKAYQGYFHSNDDLLNKIWYTGAYTLQTNEVPTDTGRQI





PAMAVGWANNCTLGPGDTIIVDGAKRDRAVWPGDMGIAVPSAFVSLGDLDSVKNALQVMYDTQN





NSTGAFDESGPPLSQKDSDTYHMWTMVGTYNYMLFTNDSDFLERNWEGYQKAMDYIYGKVTYPS





GLLNVTGTRDWARWQQGYNNSEAQMILYHTLNTGAELATWAGDSGDLSSTWTSRAEKLRQAINE





YCWDDSYGAFKDNATDTTLHPQDANSMALLFGVVDADRAASISERLTDNWTPIGAVAPELPENI





SPFISSFEIQGHLTVGQPQRALELIRRSWGWYYNNANGTQSTVIEGYLQNGTFGYRSDRGYYYD





TAYVSHSHGWSSGPTSALTNYIVGISVTSPLGATWRIAPQFVDLQSAEGGFTTSLGKFQAGWSK





TDKGYTLDFTVPHGTQGNLTLPFVSAAKPSIKIDGTEISRGVQYANSTATVTVSGGGTYKVEVQ




Beta-glucosidase
MAMQLRSLLLCVLLLLLGFALADTNAAARIHPPVVCANLSRANFDTLVPGFVFGAATASYQVEG
292



Protein ID:
AANLDGRGPSIWDTFTHKHPEKIADGSNGDVAIDQYHRYKEDVAIMKDMGLESYRFSISWSRVL




H9ZGE3|H9ZGE3
PNGTLSGGINKKGIEYYNNLINELLHNGIEPLVTLFHWDVPQTLEDEYGGFLSNRIVNDFEEYA





ELCFKKFGDRVKHWTTLNEPYTFSSHGYAKGTHAPGRCSAWYNQTCFGGDSATEPYLVTHNLLL





AHAAAVKLYKTKYQAYQKGVIGITVVTPWFEPASEAKEDIDAVFRALDFIYGWFMDPLTRGDYP





QSMRSLVGERLPNFTKKESKSLSGSFDYIGINYYSARYASASKNYSGHPSYLNDVNVDVKTELN





GVPIGPQAASSWLYFYPKGLYDLLCYTKEKYNDPIIYITENGVDEFNQPNPKLSLCQLLDDSNR





IYYYYHHLCYLQAAIKEGVKVKGYFAWSLLDNFEWDNGYTVRFGINYVDYDNGLKRHSKHSTHW





FKSFLKKSSRNTKKIRRCGNNNTSATKFVF




UGT73-251_5
MDSPPQKPHFLLFPFMAQGHMIPMIDLAKLLAQRGAIITVVTTPHNAARYHSVLARAIDSGLHI
293
Disclosed in



HVLQLQFPCNEGGLPEGCENFDLLPSLGSASTFFRATFLLYEPSEKVFEELIPRPTCIISDMCL

Itkin et al.,



PWTVRLAQKYHVPRLVFYSLSCFFLLCMRSLKNNQALISSKSDSELVTFSDLPDPVEFLKSQLP

2016, and WO



KSNDEEMAKFGYEIGEADRQSHGVIVNVFEEMEPKYLAEYRKERESPEKVWCVGPVSLCNDNKL

2016/038617



DKAQRGNKASIDERECIEWLDGQQPSSVVYVSLGSLCNLVTAQLIELGLGLEASNKPFIWVIRK





GNITEELQKWLVEYDFEEKTKGRGLVILGWAPQVLILSHPAIGCFLTHCGWNSSIEGISAGMPM





ITWPLFADQVFNEKLIVEILRIGVSVGMETAMHWGEEEEKGVVVKREKVREAIERAMDGDEREE





RRERCKELAEMAKRAVEEGGSSHRNLTLLTEDILVNGGGQERMDDADDFPTIVN




UGT73-251-6
MDSPPHRPHFLLFPFMAQGHMIPMIDLAKLLAQRGAIVTILTTPHNAARTHSVLARAIDSGLQI
294
Disclosed in



RVRPLQFPCKEAGLPEGCENLDLLPSLGSASTFFRATCLLYDPSEKLFEELSPRPTCIISDMCL

Itkin et al.,



PWTIRLAQKYHVPRLVFYSLSCFFLLCMRSLKNNPALISSKSDSEFVTFSDLPDPVEFLKSELP

2016, and WO



KSTDEDLVKFSYEMGEADRKSYGVILNIFEEMEPKYLAEYGNERESPEKVWCVGPVSLCNDNKL

2016/038617



DKAQRGNKASIDERECIKWLGGQQPSSVVYASLGSLCNLVTAQFIELGLGLEASNKPFIWVIRK





GNITEELQKWLVEYDFEEKTKGRGLVILGWAPQVLILSHPSIGCFLTHCGWNSSIEGISAGVPM





VTWPLFSDQVFNEKLIVQILRIGVSVGAETAMNWGEEEEKGVVVKREKVREAIERMMDGDEREE





RRERCKELAETAKRAIEEGGSSHRNLTLLIEDIGTSLRRL




UGT73-327-2
MGSAGVELKVAFLPFAAPGHMIPLMNIARLFAMHGADVTFITTPATASRFQNVVDSDLRRGHKI
295
Disclosed in



KLHTFQLPSAEAGLPPGVESFNECTSKEMTEKLFGAFEMLNGDIEQFLKGAKVDCIVSDTILVW

Itkin et al.,



TLDAAARLGIPRIAFRSSGFFSECIHHSLRCHKPHKKVGSDTEPFIFPGLPHKIEITRLNIPQW

2016, and WO



YSEEGYIQHIEKMKEMDKKSYAVLLNTFYELEADYVEYFESVIGLKTWIVGPVSLWANEGGGKN

2016/038617



DSRTENNNAELMEWLDSKQPNSVLYVSFGSMTKFPSAQVLEIAHGLEDSGCHFIWVVRKMNESE





AADEEFPEGFEERVRESKRGLIIRDWAPQELILNHAAVGGFVTHCGWNSILESVCAGRPIIAWP





LSAEQFFNEKFVTRVLKVGVSIGVRKWWGSTSSETLDVVKRDRIAEAVARLMGDDREVVEMRDG





VRELSHAAKRAIKEGGSSHSTLLSLIHELKTMKFKRQSSNVDG




UGT74-345-2
MDETTVNGGRRASDVVVFAFPRHGHMSPMLQFSKRLVSKGLRVTFLITTSATESLRLNLPPSSS
296
Itkin et al.,



LDLQVISDVPESNDIATLEGYLRSFKATVSKTLADFIDGIGNPPKFIVYDSVMPWVQEVARGRG

2016, and WO



LDAAPFFTQSSAVNHILNHVYGGSLSIPAPENTAVSLPSMPVLQAEDLPAFPDDPEVVMNFMTS

2016/038617



QFSNFQDAKWIFFNTFDQLECKVVNWMADRWPIKTVGPTIPSAYLDDGRLEDDRAFGLNLLKPE





DGKNTRQWQWLDSKDTASVLYISFGSLAILQEEQVKELAYFLKDTNLSFLWVLRDSELQKLPHN





FVQETSHRGLVVNWCSQLQVLSHRAVSCFVTHCGWNSTLEALSLGVPMVAIPQWVDQTTNAKFV





ADVWRVGVRVKKKDERIVTKEELEASIRQVVQGEGRNEFKHNAIKWKKLAKEAVDEGGSSDKNI





EEFVKTIA




UGT75-281-2
MMRNHHFLLVCFPSQGYINPSLQLARRLISLGVNVTFATTVLAGRRMKNKTHQTATTPGLSFAT
297
Itkin et al.,



FSDGFDDETLKPNGDLTHYFSELRRCGSESLTHLITSAANEGRPITFVIYSLLLSWAADIASTY

2016, and WO



DIPSALFFAQPATVLALYFYYFHGYGDTICSKLQDPSSYIELPGLPLLTSQDMPSFFSPSGPHA

2016/038617



FILPPMREQAEFLGRQSQPKVLVNTFDALEADALRAIDKLKMLAIGPLIPSALLGGNDSSDASF





CGDLFQVSSEDYIEWLNSKPDSSVVYISVGSICVLSDEQEDELVHALLNSGHTFLWVKRSKENN





EGVKQETDEEKLKKLEEQGKMVSWCRQVEVLKHPALGCFLTHCGWNSTIESLVSGLPVVAFPQQ





IDQATNAKLIEDVWKTGVRVKANTEGIVEREEIRRCLDLVMGSRDGQKEEIERNAKKWKELARQ





AIGEGGSSDSNLKTFLWEIDLEI




UGT85-269-4
MAEQAHDLLHVLLFPFPAEGHIKPFLCLAELLCNAGFHVTFLNTDYNHRRLHNLHLLAARFPSL
298
Itkin et al.,



HFESISDGLPPDQPRDILDPKFFISICQVTKPLFRELLLSYKRISSVQTGRPPITCVITDVIFR

2016, and WO



FPIDVAEELDIPVFSFCTFSARFMFLYFWIPKLIEDGQLPYPNGNINQKLYGVAPEAEGLLRCK

2016/038617



DLPGHWAFADELKDDQLNFVDQTTASSRSSGLILNTFDDLEAPFLGRLSTIFKKIYAVGPIHSL





LNSHHCGLWKEDHSCLAWLDSRAAKSVVFVSFGSLVKITSRQLMEFWHGLLNSGKSFLFVLRSD





VVEGDDEKQVVKEIYETKAEGKWLVVGWAPQEKVLAHEAVGGFLTHSGWNSILESIAAGVPMIS





CPKIGDQSSNCTWISKVWKIGLEMEDRYDRVSVETMVRSIMEQEGEKMQKTIAELAKQAKYKVS





KDGTSYQNLECLIQDIKKLNQIEGFINNPNFSDLLRV




UGT85-269-1
MVQPRVLLFPFPALGHVKPFLSLAELLSDAGIDVVFLSTEYNHRRISNTEALASRFPTLHFETI
300
Itkin et al.,



PDGLPPNESRALADGPLYFSMREGTKPRFRQLIQSLNDGRWPITCIITDIMLSSPIEVAEEFGI

2016, and WO



PVIAFCPCSARYLSIHFFIPKLVEEGQIPYADDDPIGEIQGVPLFEGLLRRNHLPGSWSDKSAD

2016/038617



ISFSHGLINQTLAAGRASALILNTFDELEAPFLTHLSSIFNKIYTIGPLHALSKSRLGDSSSSA





SALSGFWKEDRACMSWLDCQPPRSVVFVSFGSTMKMKADELREFWYGLVSSGKPFLCVLRSDVV





SGGEAAELIEQMAEEEGAGGKLGMVVEWAAQEKVLSHPAVGGFLTHCGWNSTVESIAAGVPMMC





WPILGDQPSNATWIDRVWKIGVERNNREWDRLTVE




UGT94-289-1
MDAQRGHTTTILMFPWLGYGHLSAFLELAKSLSRRNFHIYFCSTSVNLDAIKPKLPSSSSSDSI
301
Itkin et al.,



QLVELCLPSSPDQLPPHLHTTNALPPHLMPTLHQAFSMAAQHFAAILHTLAPHLLIYDSFQPWA

2016, and WO



PQLASSLNIPAINFNTTGASVLTRMLHATHYPSSKFPISEFVLHDYWKAMYSAAGGAVTKKDHK

2016/038617



IGETLANCLHASCSVILINSFRELEEKYMDYLSVLLNKKVVPVGPLVYEPNQDGEDEGYSSIKN





WLDKKEPSSTVFVSFGSEYFPSKEEMEEIAHGLEASEVHFIWVVRFPQGDNTSAIEDALPKGFL





ERVGERGMVVKGWAPQAKILKHWSTGGFVSHCGWNSVMESMMFGVPIIGVPMHLDQPFNAGLAE





EAGVGVEAKRDPDGKIQRDEVAKLIKEVVVEKTREDVRKKAREMSEILRSKGEEKMDEMVAAIS





LFLKI




UGT94-289-2
MDAQQGHTTTILMLPWVGYGHLLPFLELAKSLSRRKLFHIYFCSTSVSLDAIKPKLPPSISSDD
302
Itkin et al.,



SIQLVELRLPSSPELPPHLHTTNGLPSHLMPALHQAFVMAAQHFQVILQTLAPHLLIYDILQPW

2016, and WO



APQVASSLNIPAINFSTTGASMLSRTLHPTHYPSSKFPISEFVLHNHWRAMYTTADGALTEEGH

2016/038617



KIEETLANCLHTSCGVVLVNSFRELETKYIDYLSVLLNKKVVPVGPLVYEPNQEGEDEGYSSIK





NWLDKKEPSSTVFVSFGTEYFPSKEEMEEIAYGLELSEVNFIWVLRFPQGDSTSTIEDALPKGF





LERAGERAMVVKGWAPQAKILKHWSTGGLVSHCGWNSMMEGMMFGVPIIAVPMHLDQPFNAGLV





EEAGVGVEAKRDSDGKIQREEVAKSIKEVVIEKTREDVRKKAREMGEILRSKGDEKIDELVAEI





SLLRKKAPCSI




UGT94-289-3
MDAAQQGDTTTILMLPWLGYGHLSAFLELAKSLSRRNFHIYFCSTSVNLDAIKPKLPSSFSDSI
303
Itkin et al.,



QFVELHLPSSPEFPPHLHTTNGLPPTLMPALHQAFSMAAQHFESILQTLAPHLLIYDSLQPWAP

2016, and WO



RVASSLKIPAINFNTTGVFVISQGLHPIHYPHSKFPFSEFVLHNHWKAMYSTADGASTERTRKR

2016/038617



GEAFLYCLHASCSVILINSFRELEGKYMDYLSVLLNKKVVPVGPLVYEPNQDGEDEGYSSIKNW





LDKKEPSSTVFVSFGSEYFPSKEEMEEIAHGLEASEVNFIWVVRFPQGDNTSGIEDALPKGFLE





RAGERGMVVKGWAPQAKILKHWSTGGFVSHCGWNSVMESMMFGVPIIGVPMHVDQPFNAGLVEE





AGVGVEAKRDPDGKIQRDEVAKLIKEVVVEKTREDVRKKAREMSEILRSKGEEKFDEMVAEISL





LLKI




UDP-
MDTRKRSIRILMFPWLAHGHISAFLELAKSLAKRNFVIYICSSQVNLNSISKNMSSKDSISVKL
304
Disclosed in


glycotransferase
VELHIPTTILPPPYHTTNGLPPHLMSTLKRALDSARPAFSTLLQTLKPDLVLYDFLQSWASEEA

Noguchi et


(330)
ESQNIPAMVFLSTGAAAISFIMYHWFETRPEEYPFPAIYFREHEYDNFCRFKSSDSGTSDQLRV

al.,



SDCVKRSHDLVLIKTFRELEGQYVDFLSDLTRKRFVPVGPLVQEVGCDMENEGNDIIEWLDGKD

2008) (Plant



RRSTVFSSFGSEYFLSANEIEEIAYGLELSGLNFIWVVRFPHGDEKIKIEEKLPEGFLERVEGR

J. 2008



GLVVEGWAQQRRILSHPSVGGFLSHCGWSSVMEGVYSGVPIIAVPMHLDQPFNARLVEAVGFGE

May;54(3):415



EVVRSRQGNLDRGEVARVVKKLVMGKSGEGLRRRVEELSEKMREKGEEEIDSLVEELVTVVRRR

-27)



ERSNLKSENSMKKLNVMDDGE




UDP-
ATGGATACAAGAAAGAGAAGCATCAGGATTCTAATGTTCCCATGGCTTGCTCATGGCCATATCT
305
Disclosed in


glycotransferase
CAGCATTCCTCGAGCTGGCGAAGTCACTTGCCAAAAGAAACTTCGTCATTTACATTTGTTCTTC

Noguchi et


(330)
ACAAGTAAATCTAAATTCCATCAGCAAGAACATGTCATCAAAAGACTCCATTTCCGTAAAACTT

al., 2008



GTTGAGCTTCACATTCCCACCACCATACTTCCCCCTCCTTACCACACCACCAATGGCCTCCCAC





CCCACCTCATGTCCACCCTCAAGAGAGCCCTCGACAGTGCCCGGCCCGCCTTCTCCACCCTCCT





CCAAACCCTCAAGCCCGACTTGGTTTTATACGATTTCCTCCAGTCGTGGGCCTCGGAGGAGGCC





GAGTCGCAGAATATACCAGCCATGGTGTTTCTGAGTACCGGAGCTGCAGCGATTTCTTTTATTA





TGTACCATTGGTTTGAGACCAGACCGGAGGAGTACCCTTTTCCGGCTATATACTTCCGGGAACA





CGAGTATGATAACTTCTGCCGTTTTAAGTCTTCCGACAGCGGTACTAGTGATCAATTGAGAGTC





AGCGATTGCGTTAAACGGTCGCACGATTTGGTTCTGATCAAGACATTCCGTGAACTGGAAGGAC





AATACGTAGATTTTCTCTCCGACTTGACTCGGAAGAGATTCGTACCAGTTGGCCCCCTTGTTCA





GGAGGTAGGTTGTGATATGGAGAATGAAGGAAATGACATCATCGAATGGCTCGACGGGAAAGAC





CGTCGTTCGACGGTTTTCTCCTCATTCGGGAGCGAGTACTTCTTGTCTGCCAATGAGATCGAAG





AGATAGCTTATGGGCTGGAGCTAAGCGGGCTTAACTTCATCTGGGTTGTTAGGTTTCCTCATGG





CGACGAGAAAATCAAGATTGAGGAGAAACTGCCGGAAGGGTTTCTTGAGAGAGTGGAAGGAAGA





GGGTTGGTGGTGGAGGGATGGGCACAGCAGAGGAGAATATTGTCACATCCGAGTGTTGGAGGGT





TTTTGAGCCACTGTGGGTGGAGTTCTGTGATGGAAGGGGTGTATTCCGGTGTGCCGATTATTGC





CGTGCCGATGCATCTTGACCAGCCGTTCAATGCTAGGTTGGTGGAGGCGGTGGGGTTTGGGGAG





GAGGTGGTGAGGAGTAGACAAGGAAATCTTGACAGAGGAGAGGTGGCGAGGGTGGTGAAGAAGC





TGGTTATGGGGAAAAGTGGGGAGGGGTTACGGCGGAGGGTGGAGGAGTTGAGTGAGAAGATGAG





AGAGAAAGGGGAGGAGGAGATTGATTCACTGGTGGAGGAATTGGTGACGGTGGTTAGGAGGAGA





GAGAGATCGAATCTCAAGTCTGAGAATTCTATGAAGAAATTGAATGTGATGGATGATGGAGAAT





AG




UGT98 protein [S.
MDAQRGHTTTILMFPWLGYGHLSAFLELAKSLSRRNFHIYFCSTSVNLDAIKPKLPSSSSSDSI
306




grosvenorii]

QLVELCLPSSPDQLPPHLHTTNALPPHLMPTLHQAFSMAAQHFAAILHTLAPHLLIYDSFQPWA





PQLASSLNIPAINFNTTGASVLTRMLHATHYPSSKFPISEFVLHDYWKAMYSAAGGAVTKKDHK





IGETLANCLHASCSVILINSFRELEEKYMDYLSVLLNKKVVPVGPLVYEPNQDGEDEGYSSIKN





WLDKKEPSSTVFVSFGSEYFPSKEEMEEIAHGLEASEVHFIWVVRFPQGDNTSAIEDALPKGFL





ERVGERGMVVKGWAPQAKILKHWSTGGFVSHCGWNSVMESMMFGVPIIGVPMHLDQPFNAGLAE





EAGVGVEAKRDSDGKIQREEVAKSIKEVVIEKTREDVRKKAREMGEILRSKGDEKIDELVAEIS





LLRKKAPCSIAAALEHHHHHH




UGT98 gene [S.
CTCGAATTCATGGATGCCCAGCGAGGTCACACCACAACCATTTTGATGTTTCCATGGCTCGGCT
307




grosvenorii]

ATGGCCATCTTTCGGCTTTCCTAGAGTTGGCCAAAAGCCTCTCAAGGAGGAACTTCCATATCTA





CTTCTGTTCAACCTCTGTTAACCTCGACGCCATTAAACCAAAGCTTCCTTCTTCTTCCTCTTCT





GATTCCATCCAACTTGTGGAACTTTGTCTTCCATCTTCTCCTGATCAGCTCCCTCCTCATCTTC





ACACAACCAACGCCCTCCCCCCTCACCTCATGCCCACTCTCCACCAAGCCTTCTCCATGGCTGC





CCAACACTTTGCTGCCATTTTACACACACTTGCTCCGCATCTCCTCATTTACGACTCTTTCCAA





CCTTGGGCTCCTCAACTAGCTTCATCCCTCAACATTCCAGCCATCAACTTCAATACTACGGGAG





CTTCAGTCCTGACCCGAATGCTTCACGCTACTCACTACCCAAGTTCTAAATTCCCAATTTCAGA





GTTTGTTCTCCACGATTATTGGAAAGCCATGTACAGCGCCGCCGGTGGGGCTGTTACAAAAAAA





GACCACAAAATTGGAGAAACACTTGCGAATTGCTTGCATGCTTCTTGTAGTGTAATTCTAATCA





ATAGTTTCAGAGAGCTCGAGGAGAAATATATGGATTATCTCTCCGTTCTCTTGAACAAGAAAGT





TGTTCCGGTTGGTCCTTTGGTTTACGAACCGAATCAAGACGGGGAAGATGAAGGTTATTCAAGC





ATCAAAAATTGGCTTGACAAAAAGGAACCGTCCTCCACCGTCTTCGTTTCATTTGGAAGCGAAT





ACTTCCCGTCAAAGGAAGAAATGGAAGAGATAGCCCATGGGTTAGAGGCGAGCGAGGTTCATTT





CATCTGGGTCGTTAGGTTTCCTCAAGGAGACAACACCAGCGCCATTGAAGATGCCTTGCCGAAG





GGGTTTCTGGAGAGGGTGGGAGAGAGAGGGATGGTGGTGAAGGGTTGGGCTCCCCAGGCGAAGA





TACTGAAGCATTGGAGCACAGGGGGATTCGTGAGCCACTGTGGATGGAACTCGGTGATGGAAAG





CATGATGTTTGGCGTTCCCATAATAGGGGTTCCGATGCATCTGGACCAGCCCTTTAACGCCGGA





CTCGCGGAAGAAGCTGGCGTCGGCGTGGAAGCCAAGCGAGATTCGGACGGCAAAATTCAAAGAG





AAGAAGTTGCAAAGTCGATCAAAGAAGTGGTGATTGAGAAAACCAGGGAAGACGTGAGGAAGAA





AGCAAGAGAAATGGGTGAGATTTTGAGGAGTAAAGGAGATGAGAAAATTGATGAGTTGGTGGCT





GAAATTTCTCTTTTGCGCAAAAAGGCCCCATGTTCAATTGCGGCCGCACTCGAGCACCACCACC





ACCACCACTGA




CYP 1798
MEMSSSVAATISIWMVVVCIVGVGWRVVNWVWLRPKKLEKRLREQGLAGNSYRLLFGDLKERAA
308




MEEQANSKPINFSHDIGPRVFPSMYKTIQNYGKNSYMWLGPYPRVHIMDPQQLKTVFTLVYDIQ





KPNLNPLIKFLLDGIVTHEGEKWAKHRKIINPAFHLEKLKDMIPAFFHSCNEIVNEWERLISKE





GSCELDVMPYLQNLAADAISRTAFGSSYEEGKMIFQLLKELTDLVVKVAFGVYIPGWRFLPTKS





NNKMKEINRKIKSLLLGIINKRQKANMEEGEAGQSDLLGILMESNSNEIQGEGNNKEDGMSIED





VIEECKVFYIGGQETTARLLIWTMILLSSHTEWQERARTEVLKVFGNKKPDFDGLSRLKVVTMI





LNEVLRLYPPASMLTRIIQKETRVGKLTLPAGVILIMPIILIHRDHDLWGEDANEFKPERFSKG





VSKAAKVQPAFFPFGWGRICMGQNFAMIEAKMALSLILQRFSFELSSSYVHAPTVVFTTQPQHG





AHIVLRKL




Epoxide hydrolase
MDAIEHRTVSVNGINSHVAEKGEGPVVLLLHGFPELWYSWRHQILALSSLGYRAVAPDLRGYGD
309




TDAPGSISSYTCFHIVGDLVALVESLGMDRVFVVAHDWGAMIAWCLCLFRPEMVKAFVCLSVPF





RQRNPKMKPVQSMRAFFGDDYYICRFQNPGEIEEEMAQVGAREVLRGILTSRRPGPPILPKGQA





FRARPGASTALPSWLSEKDLSFFASKYDQKGFTGPLNYYRAMDLNWELTASWTGVQVKVPVKYI





VGDVDMVFTTPGVKEYVNGGGFKKDVPFLQEVVIMEGVGHFINQEKQEISSHIMDFISKF




Epoxide hydrolase
MDEIEHITINTNGIKMHIASVGTGPVVLLLHGFPELWYSWRHQLLYLSSVGYRAIAPDLRGYGD
310




TDSPASPTSYTALHIVGDLVGALDELGIEKVFLVGHDWGAIIAWYFCLFRPDRIKALVNLSVQF





IPRNPAIPFIEGFRTAFGDDFYICRFQVPGEAEEDFASIDTAQLFKTSLCNRSSAPPCLPKEIG





FRAIPPPENLPSWLTEEDINFYAAKFKQTGFTGALNYYRAFDLTWELTAPWTGAQIQVPVKFIV





GDSDLTYHFPGAKEYIHNGGFKRDVPLLEEVVVVKDACHFFNQERPQEINAHIHDFINKF




Epoxide hydrolase
MENIEHTTVQTNGIKMHVAAIGTGPPVLLLHGFPELWYSWRHQLLYLSSAGYRAIAPDLRGYGD
311




TDAPPSPSSYTALHIVGDLVGLLDVLGIEKVFLIGHDWGAIIAWYFCLFRPDRIKALVNLSVQF





FPRNPTTPFVKGFRAVLGDQFYMVRFQEPGKAEEEFASVDIREFFKNVLSNRDPQAPYLPNEVK





FEGVPPPALAPWLTPEDIDVYADKFAETGFTGGLNYYRAFDRTWELTAPWTGARIGVPVKFIVG





DLDLTYHFPGAQKYIHGEGFKKAVPGLEEVVVMEDTSHFINQERPHEINSHIHDFFSKFC




Epoxide hydrolase
MDQIEHITINTNGIKMHIASVGTGPVVLLLHGFPELWYSWRHQLLYLSSVGYRAIAPDLRGYGD
312




TDSPASPTSYTALHIVGDLVGALDELGIEKVFLVGHDWAAIIAWYFCLFRPDRIKALVNLSVQF





IPRNPAIPFIEGFRTAFGDDFYMCRFQVPGEAEEDFASIDTAQLFKTSLCNRSSAPPCLPKEIG





FRAIPPPENLPSWLTEEDINYYAAKFKQTGFTGALNYYRAFDLTWELTAPWTGAQIQVPVKFIV





GDSDLTYHFPGAKEYIHNGGFKKDVPLLEEVVVVKDACHFINQERPQEINAHIHDFINKF




Epoxide hydrolase
MEKIEHSTIATNGINMHVASAGSGPAVLFLHGFPELWYSWRHQLLYLSSLGYRAIAPDLRGFGD
313




TDAPPSPSSYTAHHIVGDLVGLLDQLGVDQVFLVGHDWGAMMAWYFCLFRPDRVKALVNLSVHF





TPRNPAISPLDGFRLMLGDDFYVCKFQEPGVAEADFGSVDTATMFKKFLTMRDPRPPIIPNGFR





SLATPEALPSWLTEEDIDYFAAKFAKTGFTGGFNYYRAIDLTWELTAPWSGSEIKVPTKFIVGD





LDLVYHFPGVKEYIHGGGFKKDVPFLEEVVVMEGAAHFINQEKADEINSLIYDFIKQF




Epoxide hydrolase
MEKIEHTTISTNGINMHVASIGSGPAVLFLHGFPELWYSWRHQLLFLSSMGYRAIAPDLRGFGD
314




TDAPPSPSSYTAHHIVGDLVGLLDQLGIDQVFLVGHDWGAMMAWYFCLFRPDRVKALVNLSVHF





LRRHPSIKFVDGFRALLGDDFYFCQFQEPGVAEADFGSVDVATMLKKFLTMRDPRPPMIPKEKG





FRALETPDPLPAWLTEEDIDYFAGKFRKTGFTGGFNYYRAFNLTWELTAPWSGSEIKVAAKFIV





GDLDLVYHFPGAKEYIHGGGFKKDVPLLEEVVVVDGAAHFINQERPAEISSLIYDFIKKF




CYP87D18
MWTVVLGLATLFVAYYIHWINKWRDSKFNGVLPPGTMGLPLIGETIQLSRPSDSLDVHPFIQKK
315




VERYGPIFKTCLAGRPVVVSADAEFNNYIMLQEGRAVEMWYLDTLSKFFGLDTEWLKALGLIHK





YIRSITLNHFGAEALRERFLPFIEASSMEALHSWSTQPSVEVKNASALMVFRTSVNKMFGEDAK





KLSGNIPGKFTKLLGGFLSLPLNFPGTTYHKCLKDMKEIQKKLREVVDDRLANVGPDVEDFLGQ





ALKDKESEKFISEEFIIQLLFSISFASFESISTTLTLILKLLDEHPEVVKELEAEHEAIRKARA





DPDGPITWEEYKSMTFTLQVINETLRLGSVTPALLRKTVKDLQVKGYIIPEGWTIMLVTASRHR





DPKVYKDPHIFNPWRWKDLDSITIQKNFMPFGGGLRHCAGAEYSKVYLCTFLHILCTKYRWTKL





GGGRIARAHILSFEDGLHVKFTPKE




CYP87D18 gene
ATGTGGACTGTCGTGCTCGGTTTGGCGACGCTGTTTGTCGCCTACTACATCCATTGGATTAACA
316



sequence
AATGGAGAGATTCCAAGTTCAACGGAGTTCTGCCGCCGGGCACCATGGGTTTGCCGCTCATCGG





AGAGACGATTCAACTGAGTCGACCCAGTGACTCCCTCGACGTTCACCCTTTCATCCAGAAAAAA





GTTGAAAGATACGGGCCGATCTTCAAAACATGTCTGGCCGGAAGGCCGGTGGTGGTGTCGGCGG





ACGCAGAGTTCAACAACTACATAATGCTGCAGGAAGGAAGAGCAGTGGAAATGTGGTATTTGGA





TACGCTCTCCAAATTTTTCGGCCTCGACACCGAGTGGCTCAAAGCTCTGGGCCTCATCCACAAG





TACATCAGAAGCATTACTCTCAATCACTTCGGCGCCGAGGCCCTGCGGGAGAGATTTCTTCCTT





TTATTGAAGCATCCTCCATGGAAGCCCTTCACTCCTGGTCTACTCAACCTAGCGTCGAAGTCAA





AAATGCCTCCGCTCTCATGGTTTTTAGGACCTCGGTGAATAAGATGTTCGGTGAGGATGCGAAG





AAGCTATCGGGAAATATCCCTGGGAAGTTCACGAAGCTTCTAGGAGGATTTCTCAGTTTACCAC





TGAATTTTCCCGGCACCACCTACCACAAATGCTTGAAGGATATGAAGGAAATCCAGAAGAAGCT





AAGAGAGGTTGTAGACGATAGATTGGCTAATGTGGGCCCTGATGTGGAAGATTTCTTGGGGCAA





GCCCTTAAAGATAAGGAATCAGAGAAGTTCATTTCAGAGGAGTTCATCATCCAACTGTTGTTTT





CTATCAGTTTTGCTAGCTTTGAGTCCATCTCCACCACTCTTACTTTGATTCTCAAGCTCCTTGA





TGAACACCCAGAAGTAGTGAAAGAGTTGGAAGCTGAACACGAGGCGATTCGAAAAGCTAGAGCA





GATCCAGATGGACCAATTACTTGGGAAGAATACAAATCCATGACTTTTACATTACAAGTCATCA





ATGAAACCCTAAGGTTGGGGAGTGTCACACCTGCCTTGTTGAGGAAAACAGTTAAAGATCTTCA





AGTAAAAGGATACATAATCCCGGAAGGATGGACAATAATGCTTGTCACCGCTTCACGTCACAGA





GACCCAAAAGTCTATAAGGACCCTCATATCTTCAATCCATGGCGTTGGAAGGACTTGGACTCAA





TTACCATCCAAAAGAACTTCATGCCTTTTGGGGGAGGCTTAAGGCATTGTGCTGGTGCTGAGTA





CTCTAAAGTCTACTTGTGCACCTTCTTGCACATCCTCTGTACCAAATACCGATGGACCAAACTT





GGGGGAGGAAGGATTGCAAGAGCTCATATATTGAGTTTTGAAGATGGGTTACATGTGAAGTTCA





CACCCAAGGAATGA




AtCPR protein
MTSALYASDLFKQLKSIMGTDSLSDDVVLVIATTSLALVAGFVVLLWKKTTADRSGELKPLMIP
317




KSLMAKDEDDDLDLGSGKTRVSIFFGTQTGTAEGFAKALSEEIKARYEKAAVKDDYAADDDQYE





EKLKKETLAFFCVATYGDGEPTDNAARFYKWFTEENERDIKLQQLAYGVFALGNRQYEHFNKIG





IVLDEELCKKGAKRLIEVGLGDDDQSIEDDFNAWKESLWSELDKLLKDEDDKSVATPYTAVIPE





YRVVTHDPRFTTQKSMESNVANGNTTIDIHHPCRVDVAVQKELHTHESDRSCIHLEFDISRTGI





TYETGDHVGVYAENHVEIVEEAGKLLGHSLDLVFSIHADKEDGSPLESAVPPPFPGPCTLGTGL





ARYADLLNPPRKSALVALAAYATEPSEAEKLKHLTSPDGKDEYSQWIVASQRSLLEVMAAFPSA





KPPLGVFFAAIAPRLQPRYYSISSSPRLAPSRVHVTSALVYGPTPTGRIHKGVCSTWMKNAVPA





EKSHECSGAPIFIRASNFKLPSNPSTPIVMVGPGTGLAPFRGFLQERMALKEDGEELGSSLLFF





GCRNRQMDFIYEDELNNFVDQGVISELIMAFSREGAQKEYVQHKMMEKAAQVWDLIKEEGYLYV





CGDAKGMARDVHRTLHTIVQEQEGVSSSEAEAIVKKLQTEGRYLRDVW.




AtCPR gene
ATGACTTCTGCTTTGTATGCTTCCGATTTGTTTAAGCAGCTCAAGTCAATTATGGGGACAGATT
318



sequence
CGTTATCCGACGATGTTGTACTTGTGATTGCAACGACGTCTTTGGCACTAGTAGCTGGATTTGT





GGTGTTGTTATGGAAGAAAACGACGGCGGATCGGAGCGGGGAGCTGAAGCCTTTGATGATCCCT





AAGTCTCTTATGGCTAAGGACGAGGATGATGATTTGGATTTGGGATCCGGGAAGACTAGAGTCT





CTATCTTCTTCGGTACGCAGACTGGAACAGCTGAGGGATTTGCTAAGGCATTATCCGAAGAAAT





CAAAGCGAGATATGAAAAAGCAGCAGTCAAAGATGACTATGCTGCCGATGATGACCAGTATGAA





GAGAAATTGAAGAAGGAAACTTTGGCATTTTTCTGTGTTGCTACTTATGGAGATGGAGAGCCTA





CTGACAATGCTGCCAGATTTTACAAATGGTTTACGGAGGAAAATGAACGGGATATAAAGCTTCA





ACAACTAGCATATGGTGTGTTTGCTCTTGGTAATCGCCAATATGAACATTTTAATAAGATCGGG





ATAGTTCTTGATGAAGAGTTATGTAAGAAAGGTGCAAAGCGTCTTATTGAAGTCGGTCTAGGAG





ATGATGATCAGAGCATTGAGGATGATTTTAATGCCTGGAAAGAATCACTATGGTCTGAGCTAGA





CAAGCTCCTCAAAGACGAGGATGATAAAAGTGTGGCAACTCCTTATACAGCTGTTATTCCTGAA





TACCGGGTGGTGACTCATGATCCTCGGTTTACAACTCAAAAATCAATGGAATCAAATGTGGCCA





ATGGAAATACTACTATTGACATTCATCATCCCTGCAGAGTTGATGTTGCTGTGCAGAAGGAGCT





TCACACACATGAATCTGATCGGTCTTGCATTCATCTCGAGTTCGACATATCCAGGACGGGTATT





ACATATGAAACAGGTGACCATGTAGGTGTATATGCTGAAAATCATGTTGAAATAGTTGAAGAAG





CTGGAAAATTGCTTGGCCACTCTTTAGATTTAGTATTTTCCATACATGCTGACAAGGAAGATGG





CTCCCCATTGGAAAGCGCAGTGCCGCCTCCTTTCCCTGGTCCATGCACACTTGGGACTGGTTTG





GCAAGATACGCAGACCTTTTGAACCCTCCTCGAAAGTCTGCGTTAGTTGCCTTGGCGGCCTATG





CCACTGAACCAAGTGAAGCCGAGAAACTTAAGCACCTGACATCACCTGATGGAAAGGATGAGTA





CTCACAATGGATTGTTGCAAGTCAGAGAAGTCTTTTAGAGGTGATGGCTGCTTTTCCATCTGCA





AAACCCCCACTAGGTGTATTTTTTGCTGCAATAGCTCCTCGTCTACAACCTCGTTACTACTCCA





TCTCATCCTCGCCAAGATTGGCGCCAAGTAGAGTTCATGTTACATCCGCACTAGTATATGGTCC





AACTCCTACTGGTAGAATCCACAAGGGTGTGTGTTCTACGTGGATGAAGAATGCAGTTCCTGCG





GAGAAAAGTCATGAATGTAGTGGAGCCCCAATCTTTATTCGAGCATCTAATTTCAAGTTACCAT





CCAACCCTTCAACTCCAATCGTTATGGTGGGACCTGGGACTGGGCTGGCACCTTTTAGAGGTTT





TCTGCAGGAAAGGATGGCACTAAAAGAAGATGGAGAAGAACTAGGTTCATCTTTGCTCTTCTTT





GGGTGTAGAAATCGACAGATGGACTTTATATACGAGGATGAGCTCAATAATTTTGTTGATCAAG





GCGTAATATCTGAGCTCATCATGGCATTCTCCCGTGAAGGAGCTCAGAAGGAGTATGTTCAACA





TAAGATGATGGAGAAGGCAGCACAAGTTTGGGATCTAATAAAGGAAGAAGGATATCTCTATGTA





TGCGGTGATGCTAAGGGCATGGCGAGGGACGTCCACCGAACTCTACACACCATTGTTCAGGAGC





AGGAAGGTGTGAGTTCGTCAGAGGCAGAGGCTATAGTTAAGAAACTTCAAACCGAAGGAAGATA





CCTCAGAGATGTCTGGTGA
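The protein and gene rows in this table come in pairs, so a gene entry can be checked against its protein entry by translating it codon by codon. The following is a minimal sketch of that check, assuming the standard genetic code and reading frame 1; the function name `translate` and the variable names are illustrative and are not part of the patent. The two fragments are copied from the first and last lines of the AtCPR gene sequence above.

```python
# Standard genetic code (NCBI translation table 1), with bases ordered T, C, A, G.
# Assumption: the listed genes are translated with the standard code in frame 1.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a coding sequence in frame 1; '*' marks a stop codon."""
    # Remove whitespace so a sequence pasted across several lines still works.
    dna = "".join(dna.split()).upper()
    # Ignore a trailing partial codon, if any.
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# First 30 bases and last 12 bases of the AtCPR gene sequence (SEQ ID NO: 318).
gene_start = "ATGACTTCTGCTTTGTATGCTTCCGATTTG"
gene_end = "GATGTCTGGTGA"

print(translate(gene_start))  # MTSALYASDL
print(translate(gene_end))    # DVW*
```

The translated fragments agree with the start (MTSALYASDL...) and end (...DVW) of the AtCPR protein entry (SEQ ID NO: 317). Any character other than A, C, G, or T raises a `KeyError` in this sketch, which is a convenient way to spot stray digits or letters left in a sequence by text extraction.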




cucurbitadienol
MWRLKVGAESVGENDEKWLKSISNHLGRQVWEFCPDAGTQQQLLQVHKARKAFHDDRFHRKQSS
319



synthase [S.
DLFITIQYGKEVENGGKTAGVKLKEGEEVRKEAVESSLERALSFYSSIQTSDGNWASDLGGPMF





grosvenorii]

LLPGLVIALYVTGVLNSVLSKHHRQEMCRYVYNHQNEDGGWGLHIEGPSTMFGSALNYVALRLL




Seq 59, SgCbQ
GEDANAGAMPKARAWILDHGGATGITSWGKLWLSVLGVYEWSGNNPLPPEFWLFPYFLPFHPGR




protein
MWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYAVPYHEIDWNKSRNTCAKEDLYYPHPKM





QDILWGSLHHVYEPLFTRWPAKRLREKALQTAMQHIHYEDENTRYICLGPVNKVLNLLCCWVED





PYSDAFKLHLQRVHDYLWVAEDGMKMQGYNGSQLWDTAFSIQAIVSTKLVDNYGPTLRKAHDFV





KSSQIQQDCPGDPNVWYRHIHKGAWPFSTRDHGWLISDCTAEGLKAALMLSKLPSETVGESLER





NRLCDAVNVLLSLQNDNGGFASYELTRSYPWLELINPAETFGDIVIDYPYVECTSATMEALTLF





KKLHPGHRTKEIDTAIVRAANFLENMQRTDGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCLA





IRKACDFLLSKELPGGGWGESYLSCQNKVYTNLEGNRPHLVNTAWVLMALIEAGQAERDPTPLH





RAARLLINSQLENGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYCHRVLTE




cucurbitadienol
ATGTGGAGGTTAAAGGTCGGAGCAGAAAGCGTTGGGGAGAATGATGAGAAATGGTTGAAGAGCA
320



synthase SgCbQ
TAAGCAATCACTTGGGACGCCAGGTGTGGGAGTTCTGTCCGGATGCCGGCACCCAACAACAGCT




gene sequence
CTTGCAAGTCCACAAAGCTCGTAAAGCTTTCCACGATGACCGTTTCCACCGAAAGCAATCTTCC





GATCTCTTTATCACTATTCAGTATGGAAAGGAAGTAGAAAATGGTGGAAAGACAGCGGGAGTGA





AATTGAAAGAAGGGGAAGAGGTGAGGAAAGAGGCAGTAGAGAGTAGCTTAGAGAGGGCATTAAG





TTTCTACTCAAGCATCCAGACAAGCGATGGGAACTGGGCTTCGGATCTTGGGGGGCCCATGTTT





TTACTTCCGGGTCTGGTGATTGCCCTCTACGTTACAGGCGTCTTGAATTCTGTTTTATCCAAGC





ACCACCGGCAAGAGATGTGCAGATATGTTTACAATCACCAGAATGAAGATGGGGGGTGGGGTCT





CCACATCGAGGGCCCAAGCACCATGTTTGGTTCCGCACTGAATTATGTTGCACTCAGGCTGCTT





GGAGAAGACGCCAACGCCGGGGCAATGCCAAAAGCACGTGCTTGGATCTTGGACCACGGTGGCG





CCACCGGAATCACTTCCTGGGGCAAATTGTGGCTTTCTGTACTTGGAGTCTACGAATGGAGTGG





CAATAATCCTCTTCCACCCGAATTTTGGTTATTTCCTTACTTCCTACCATTTCATCCAGGAAGA





ATGTGGTGCCATTGTCGAATGGTTTATCTACCAATGTCATACTTATATGGAAAGAGATTTGTTG





GGCCAATCACACCCATAGTTCTGTCTCTCAGAAAAGAACTCTACGCAGTTCCATATCATGAAAT





AGACTGGAATAAATCTCGCAATACATGTGCAAAGGAGGATCTGTACTATCCACATCCCAAGATG





CAAGATATTCTGTGGGGATCTCTCCACCACGTGTATGAGCCCTTGTTTACTCGTTGGCCTGCCA





AACGCCTGAGAGAAAAGGCTTTGCAGACTGCAATGCAACATATTCACTATGAAGATGAGAATAC





CCGATATATATGCCTTGGCCCTGTCAACAAGGTACTCAATCTGCTTTGTTGTTGGGTTGAAGAT





CCCTACTCCGACGCCTTCAAACTTCATCTTCAACGAGTCCATGACTATCTCTGGGTTGCTGAAG





ATGGCATGAAAATGCAGGGTTATAATGGGAGCCAGTTGTGGGACACTGCTTTCTCCATCCAAGC





AATCGTATCCACCAAACTTGTAGACAACTATGGCCCAACCTTAAGAAAGGCACACGACTTCGTT





AAAAGTTCTCAGATTCAGCAGGACTGTCCTGGGGATCCTAATGTTTGGTACCGTCACATTCATA





AAGGTGCATGGCCATTTTCAACTCGAGATCATGGATGGCTCATCTCTGACTGTACAGCAGAGGG





ATTAAAGGCTGCTTTGATGTTATCCAAACTTCCATCCGAAACAGTTGGGGAATCATTAGAACGG





AATCGCCTTTGCGATGCTGTAAACGTTCTCCTTTCTTTGCAAAACGATAATGGTGGCTTTGCAT





CATATGAGTTGACAAGATCATACCCTTGGTTGGAGTTGATCAACCCCGCAGAAACGTTTGGAGA





TATTGTCATTGATTATCCGTATGTGGAGTGCACCTCAGCCACAATGGAAGCACTGACGTTGTTT





AAGAAATTACATCCCGGCCATAGGACCAAAGAAATTGATACTGCTATTGTCAGGGCGGCCAACT





TCCTTGAAAATATGCAAAGGACGGATGGCTCTTGGTATGGATGTTGGGGGGTTTGCTTCACGTA





TGCGGGGTGGTTTGGCATAAAGGGATTGGTGGCTGCAGGAAGGACATATAATAATTGCCTTGCC





ATTCGCAAGGCTTGCGATTTTTTACTATCTAAAGAGCTGCCCGGCGGTGGATGGGGAGAGAGTT





ACCTTTCATGTCAGAATAAGGTATACACAAATCTTGAAGGAAACAGACCGCACCTGGTTAACAC





GGCCTGGGTTTTAATGGCCCTCATAGAAGCTGGCCAGGCTGAGAGAGACCCAACACCATTGCAT





CGTGCAGCAAGGTTGTTAATCAATTCCCAGTTGGAGAATGGTGATTTCCCCCAACAGGAGATCA





TGGGAGTCTTTAATAAAAATTGCATGATCACATATGCTGCATACCGAAACATTTTTCCCATTTG





GGCTCTTGGAGAGTATTGCCATCGGGTTTTGACTGAATAA




cucurbitadienol
MWRLKVGAESVGEKDEKWVKSVSNHLGRQVWEFCAADAAAVTPHQLLQIQNARNHFHRNRFHRK
321



synthase Cpep2
QSSDLFLAIQYEKEIAKGGKGKEAVKVKEGEEVGKEAVKSTLERALSFYTAVQTSDGNWASDLG




protein
GPMFLLPGLVIALYVTGVLNSVLSKHHRVEMCRYIYNHQNEDGGWGLHIEGTSTMFGSALNYVA





LRLLGEDADGGDDGAMTKARAWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPPEFWLLPYS





LPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPKVLSLRQELYTVPYHEIDWNKSRNTCAKEDL





YYPHPKMQDILWGSIYHVYEPLFTRWPGKRLREKALQTAMKHIHYEDENSRYICLGPVNKVLNM





LCCWVEDPYSDAFKLHLQRVHDYLWVAEDGMRMQGYNGSQLWDTAFSIQAIVATKLVDSYAPTL





RKAHDFVKDSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLMLSKLPSTM





VGEPLEKNRLCDAVNVLLSLQNDNGGFASYELTRSYPWLELINPAETFGDIVIDYPYVECTAAT





MEALTLFKKLHPGHRTKEIDTAIGKAANFLEKMQRADGSWYGCWGVCFTYAGWFGIKGLVAAGR





TYNSCLAIRKACEFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVLMALIEAGQGE





RDPAPLHRGARLVMNSQLENGDFVQQEIMGVFNKNCMITYAAYRNIFPIWALGEYCHRVLTE.




cucurbitadienol
ATGTGGAGGCTGAAGGTGGGAGCAGAGAGCGTTGGGGAGAAGGATGAGAAATGGGTGAAGAGCG
322



synthase Cpep2
TAAGCAATCACTTGGGCCGCCAAGTTTGGGAGTTCTGTGCCGCCGACGCCGCCGCCGTCACTCC




gene sequence
TCACCAGTTACTACAAATTCAGAATGCTCGCAACCACTTCCATCGCAATCGTTTCCACCGGAAG





CAGTCTTCCGATCTCTTTCTCGCTATTCAGTATGAAAAGGAAATAGCGAAGGGCGGAAAAGGGA





AAGAGGCGGTGAAAGTGAAAGAAGGGGAGGAGGTGGGGAAAGAGGCGGTGAAGAGTACGTTAGA





GAGGGCACTAAGTTTCTACACAGCCGTGCAGACGAGCGATGGGAATTGGGCCTCGGATCTTGGA





GGGCCCATGTTTTTACTTCCGGGTCTCGTGATTGCCCTTTATGTCACAGGCGTGTTGAATTCAG





TTTTGTCCAAGCACCACCGCGTAGAGATGTGCAGATATATTTACAATCACCAGAATGAAGATGG





AGGGTGGGGTCTACATATTGAGGGCACAAGCACCATGTTTGGTTCGGCACTCAATTATGTTGCA





CTTAGGCTGCTTGGAGAAGACGCCGATGGCGGAGACGATGGTGCAATGACAAAAGCACGTGCTT





GGATCTTGGAGCGCGGCGGCGCCACTGCGATCACTTCGTGGGGAAAATTGTGGCTGTCCGTGCT





TGGAGTGTACGAATGGAGTGGCAACAACCCTCTTCCGCCTGAGTTTTGGCTTCTCCCTTACAGC





CTACCATTTCATCCAGGACGAATGTGGTGCCATTGTCGAATGGTTTATCTTCCCATGTCTTACT





TATATGGGAAGAGATTTGTTGGCCCAATCACTCCCAAAGTTCTTTCTCTAAGACAAGAGCTCTA





CACGGTTCCTTATCATGAAATAGACTGGAATAAATCCCGCAATACATGTGCAAAGGAGGATCTA





TACTATCCACATCCCAAGATGCAAGACATACTATGGGGATCTATCTACCATGTATATGAGCCAT





TGTTCACTCGTTGGCCTGGGAAACGCCTGAGGGAAAAGGCTTTACAAACTGCAATGAAACATAT





TCACTATGAAGATGAAAATAGTCGCTATATATGTCTTGGCCCAGTCAACAAGGTACTCAACATG





CTTTGTTGTTGGGTTGAAGATCCCTACTCAGACGCCTTCAAACTTCACCTTCAACGCGTCCATG





ACTATCTCTGGGTTGCTGAAGATGGCATGAGAATGCAGGGTTACAATGGCAGCCAGTTGTGGGA





CACTGCTTTCTCCATCCAAGCCATTGTAGCTACCAAACTTGTAGACAGCTATGCCCCAACTTTA





AGAAAAGCACATGACTTTGTTAAGGATTCTCAGATCCAGGAGGACTGTCCTGGGGATCCTAATG





TTTGGTTCCGTCATATTCATAAAGGTGCTTGGCCATTTTCGACTCGAGATCATGGATGGCTCAT





CTCTGACTGCACGGCTGAGGGATTGAAGGCTTCTTTGATGTTATCCAAACTTCCATCCACAATG





GTTGGGGAGCCATTAGAAAAGAATCGCCTTTGTGATGCTGTTAATGTTCTCCTTTCTTTGCAAA





ATGATAACGGTGGATTTGCATCATACGAGTTGACGAGATCATACCCTTGGTTGGAGTTGATCAA





CCCAGCAGAAACATTCGGAGACATTGTCATCGACTATCCGTATGTGGAGTGCACCGCAGCAACA





ATGGAAGCACTGACGTTATTTAAGAAGCTACATCCAGGCCATAGGACCAAAGAGATTGACACAG





CTATTGGCAAGGCAGCCAACTTCCTTGAGAAAATGCAAAGGGCGGATGGCTCTTGGTATGGGTG





TTGGGGGGTTTGTTTCACGTATGCGGGGTGGTTTGGCATCAAGGGATTGGTGGCTGCAGGAAGA





ACATATAATAGCTGCCTTGCCATCCGCAAGGCTTGTGAGTTTCTGCTATCTAAAGAGCTGCCCG





GCGGTGGATGGGGGGAGAGTTACCTTTCATGTCAGAATAAGGTGTACACCAATCTTGAGGGAAA





CAAGCCACACTTGGTTAACACTGCCTGGGTTTTAATGGCTCTCATTGAAGCCGGCCAGGGTGAG





AGAGACCCAGCACCATTGCACCGTGGAGCAAGGTTGGTAATGAATTCTCAACTGGAGAATGGTG





ATTTCGTGCAACAGGAGATCATGGGAGTGTTCAATAAGAACTGCATGATCACATATGCTGCATA





CCGAAACATCTTCCCCATTTGGGCGCTTGGAGAGTATTGCCATCGGGTTCTTACTGAATGA




cucurbitadienol
MWRLKVGAESVGEKDEKWVKSVSNHLGRQVWEFCAADAAAVTPHQLLQIQNARNHFHRNRFHRK
323



synthase Cpep4
QSSDLFLAIQYEKEIAKGGKGKEAVKVKEGEEVGKEAVKSTLERALSFYTAVQTSDGNWASDLG




protein
GPMFLLPGLVIALYVTGVLNSVLSKHHRVEMCRYIYNHQNEDGGWGLHIEGTSTMFGSALNYVA





LRLLGEDADGGDDGAMTKARAWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPPEFLLLPYS





LPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPKVLSLRQELYTVPYHEIDWNKSRNTCAKEDL





YYPHPKMQDILWGSIYHVYEPLFTRWPGKRLREKALQTAMKHIHYEDENSRYICLGPVNKVLNM





LCCWVEDPYSDAFKLHLQRVHDYLWVAEDGMRMQGYNGSQLWDTAFSIQAIVATKLVDSYAPTL





RKAHDFVKDSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLMLSKLPSTM





VGEPLEKNRLCDAVNVLLSLQNDNGGFASYELTRSYPWLELINPAETFGDIVIDYSYVECTAAT





MEALTLFKKLHPGHRTKEIDTAIGKAANFLEKMQRADGSWYGCWGVCFTYAGWFGIKGLVAAGR





TYNSCLAIRKACEFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVLMALIEAGQGE





RDPAPLHRAARLVMNSQLENGDFVQQEIMGVFNKNCMITYAAYRNIFPIWALGEYCHRVLTE.




cucurbitadienol
ATGTGGAGGCTGAAGGTGGGAGCAGAGAGCGTTGGGGAGAAGGATGAGAAATGGGTGAAGAGCG
324



synthase Cpep4
TAAGCAATCACTTGGGCCGCCAAGTTTGGGAGTTCTGTGCCGCCGACGCCGCCGCCGTCACTCC




gene sequence
TCACCAGTTACTACAAATTCAGAATGCTCGCAACCACTTCCATCGCAATCGTTTCCACCGGAAG





CAGTCTTCCGATCTCTTTCTCGCTATTCAGTATGAAAAGGAAATAGCGAAGGGCGGAAAAGGGA





AAGAGGCGGTGAAAGTGAAAGAAGGGGAGGAGGTGGGGAAAGAGGCGGTGAAGAGTACGTTAGA





GAGGGCACTAAGTTTCTACACAGCCGTGCAGACGAGCGATGGGAATTGGGCCTCGGATCTTGGA





GGGCCCATGTTTTTACTTCCGGGTCTCGTGATTGCCCTTTATGTCACAGGCGTGTTGAATTCAG





TTTTGTCCAAGCACCACCGCGTAGAGATGTGCAGATATATTTACAATCACCAGAATGAAGATGG





AGGGTGGGGTCTACATATTGAGGGCACAAGCACCATGTTTGGTTCGGCACTCAATTATGTTGCA





CTTAGGCTGCTTGGAGAAGACGCCGATGGCGGAGACGATGGTGCAATGACAAAAGCACGTGCTT





GGATCTTGGAGCGCGGCGGCGCCACTGCGATCACTTCGTGGGGAAAATTGTGGCTGTCCGTGCT





TGGAGTGTACGAATGGAGTGGCAACAACCCTCTTCCGCCTGAGTTTTTGCTTCTCCCTTACAGC





CTACCATTTCATCCAGGACGAATGTGGTGCCATTGTCGAATGGTTTATCTTCCCATGTCTTACT





TATATGGGAAGAGATCTGTTCGCCCAATCACTCCCAAAGTTCTTTCTCTAAGACAAGAGCTCTA





CACGGTTCCTTATCATGAAATAGACTGGAATAAATCCCGCAATACATGTGCAAAGGAGGATCTA





TACTATCCACATCCCAAGATGCAAGACATACTATGGGGATCTATCTACCATGTATATGAGCCAT





TGTTCACTCGTTGGCCTGGGAAACGCCTGAGGGAAAAGGCTTTACAAACTGCAATGAAACATAT





TCACTATGAAGATGAAAATAGTCGCTATATATGTCTTGGCCCAGTCAACAAGGTACTCAACATG





CTTTGTTGTTGGGTTGAAGATCCCTACTCAGACGCCTTCAAACTTCACCTTCAACGCGTCCATG





ACTATCTCTGGGTTGCTGAAGATGGCATGAGAATGCAGGGTTACAATGGCAGCCAGTTGTGGGA





CACTGCTTTCTCCATCCAAGCCATTGTAGCTACCAAACTTGTAGACAGCTATGCCCCAACTTTA





AGAAAAGCACATGACTTTGTTAAGGATTCTCAGATCCAGGAGGACTGTCCTGGGGATCCTAATG





TTTGGTTCCGTCATATTCATAAAGGTGCTTGGCCATTTTCGACTCGAGATCATGGATGGCTCAT





CTCTGACTGCACGGCTGAGGGATTGAAGGCTTCTTTGATGTTATCCAAACTTCCATCCACAATG





GTTGGGGAGCCATTAGAAAAGAATCGCCTTTGTGATGCTGTTAATGTTCTCCTTTCTTTGCAAA





ATGATAACGGCGGATTTGCATCATACGAGTTGACGAGATCATACCCTTGGTTGGAGTTGATCAA





CCCAGCAGAAACATTCGGAGACATTGTCATCGACTATTCGTATGTGGAGTGCACCGCAGCAACA





ATGGAAGCACTGACGTTATTTAAGAAGCTACATCCAGGCCATAGGACCAAAGAGATTGACACAG





CTATTGGCAAGGCAGCCAACTTCCTTGAGAAAATGCAAAGGGCGGATGGCTCTTGGTATGGGTG





TTGGGGGGTTTGTTTCACGTATGCGGGGTGGTTTGGCATAAAGGGATTGGTGGCTGCAGGAAGA





ACATATAATAGCTGTCTTGCCATCCGCAAGGCTTGTGAGTTTCTGCTATCTAAAGAGCTGCCCG





GCGGTGGATGGGGGGAGAGTTACCTTTCATGTCAGAATAAGGTGTACACCAATCTTGAGGGAAA





CAAGCCACACTTGGTTAACACTGCCTGGGTTTTAATGGCTCTCATTGAAGCTGGCCAGGGTGAG





AGAGACCCAGCACCATTGCACCGTGCAGCAAGGTTGGTAATGAATTCTCAACTGGAGAATGGCG





ATTTCGTGCAACAGGAGATCATGGGAGTGTTCAATAAGAACTGCATGATCACATATGCTGCATA





CCGAAACATCTTCCCCATTTGGGCGCTTGGAGAGTATTGCCATCGGGTTCTTACTGAATGA




cucurbitadienol
MWRLKVGAESVGEKDEKWVKSVSNHLGRQVWEFCADAAADTPHQLLQIQNARNHFHHNRFHRKQ
325



synthase Cmax1
SSDLFLAIQYEKEIAKGAKGGAVKVKEGEEVGKEAVKSTLESALGFYSAVQTSDGNWASDLGGP




protein
MFLLPGLVIALHVTGVLNSVLSKHHRVEMCRYLYNHQNEDGGWGLHIEGTSTMFGSALNYVALR





LLGEDADGGDGGAMTKARAWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPPEFWLLPYSLP





FHPGRMWCHCRMVYLPMSYLYGKRFVGPITPKVLSLRQELYTIPYHEIDWNKSRNTCAKEDLYY





PHPKMQDILWGSIYHVYEPLFTRWPGKRLREKALQAAMKHIHYEDENSRYICLGPVNKVLNMLC





CWVEDPYSDAFKLHLQRVHDYLWVAEDGMRMQGYNGSQLWDTAFSIQAIVATKLVDSYAPTLRK





AHDFVKDSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLMLSKLPSAMVG





EPLEKNRLCDAVNVLLSLQNDNGGFASYELTRSYPWLELINPAETFGDIVIDYPYVECTAATME





ALTLFKKLHPGHRTKEIDTAIGKAANFLEKMQRADGSWYGCWGVCFTYAGWFGIKGLVAAGRTY





NSCLAIRKACEFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVLMALIEAGQGERD





PAPLHRAARLLMNSQLENGDFVQQEIMGVFNKNCMITYAAYRNIFPIWALGEYCHRVLTE




cucurbitadienol
ATGTGGAGGCTGAAGGTGGGAGCAGAGAGCGTTGGGGAGGAGGATGAGAAATGGGTGAAGAGCG
326



synthase gene
TAAGCAATCACTTGGGCCGCCAAGTTTGGGAGTTCTGTGCCGACGCCGCCGCCGACACTCCTCA





CCAGTTACTACAAATTCAGAATGCTCGCAACCACTTCCATCACAATCGTTTCCACCGGAAGCAG





TCTTCCGATCTCTTTCTGGCTATTCAATATGAAAAGGAAATAGCAAAGGGCGCAAAAGGTGGAG





CGGTGAAAGTGAAAGAAGGGGAGGAGGTGGGGAAAGAGGCGGTGAAGAGTACGTTAGAAAGGGC





ACTCGGTTTCTACTCGGCCGTGCAGACAAGAGATGGGAATTGGGCCTCGGATCTTGGAGGGCCC





TTGTTTTTACTTCCGGGTCTCGTGATTGCCCTTCATGTCACAGGCGTCTTGAATTCAGTTTTGT





CCAAGCACCACCGCGTAGAGATGTGCAGATATCTTTACAATCACCAGAATGAAGATGGAGGGTG





GGGTCTACATATTGAGGGCACAAGCACCATGTTTGGTTCGGCACTGAATTACGTTGCACTAAGG





CTGCTTGGAGAAGACGCCGATGGCGGAGACGGTGGCGCAATGACAAAAGCACGTGCTTGGATCT





TGGAGCGCGGCGGCGCCACTGCGATCACTTCGTGGGGAAAATTGTGGCTGTCCGTACTTGGAGT





GTACGAATGGAGTGGCAACAACCCTCTTCCGCCTGAGTTTTGGCTTCTCCCTTACAGCCTACCA





TTTCATCCAGGAAGAATGTGGTGCCATTGTCGAATGGTTTATCTTCCAATGTCTTACTTATATG





GGAAGAGATTTGTTGGGCCAATCACTCCCAAAGTTCTTTCTCTAAGGCAAGAGCTCTACACAAT





TCCTTATCATGAAATAGACTGGAATAAATCCCGCAATACATGTGCAAAGGAGGATCTGTACTAT





CCACATCCCAAGATGCAAGACATTCTATGGGGATCCATCTACCATGTATATGAGCCATTGTTCA





CTCGTTGGCCTGGGAAACGCCTGAGGGAAAAGGCTTTACAAGCTGCAATGAAACATATTCACTA





TGAAGATGAAAATAGTCGATATATATGTCTTGGCCCAGTCAACAAGGTACTCAACATGCTTTGT





TGTTGGGTTGAAGATCCCTACTCAGACGCCTTCAAACTTCACCTTCAACGCGTCCATGACTATC





TCTGGGTTGCTGAAGATGGCATGAGAATGCAGGGCTACAATGGCAGCCAGTTGTGGGACACTGC





TTTCTCCATCCAAGCCATCGTAGCCACCAAACTTGTAGACAGCTATGCCCCAACTTTAAGAAAA





GCACATGACTTTGTTAAGGATTCTCAGATCCAGGAGGACTGTCCTGGGGATCCTAATGTTTGGT





TCCGTCATATTCATAAAGGTGCTTGGCCACTTTCGACACGAGATCATGGATGGCTCATCTCCGA





CTGTACAGCTGAGGGATTGAAGGCTTCTTTGATGTTATCCAAACTTCCATCCACAATGGTTGGG





GAGCCATTAGAAAAGAATCGCCTTTGTGATGCTGTTAATGTTCTCCTTTCTTTGCAAAATGATA





ATGGTGGATTTGCATCATACGAGTTGACGAGATCATACCCTTGGTTGGAGTTGATCAACCCAGC





TGAAACATTCGGAGACATTGTCATTGACTATCCGTATGTGGAGTGCACCGCAGCAACAATGGAA





GCACTGACGTTATTTAAGAAGCTACATCCAGGCCATAGGACCAAAGAGATTGACACAGCTATTG





GCAAGGCAGCCAACTTCCTTGAGAAAATGCAGAGGGCGGATGGCTCTTGGTACGGGTGTTGGGG





GGTTTGTTTTACGTATGCGGGTTGGTTTGGCATAAAGGGATTGGTGGCTGCAGGAAGAACATAT





AATAGCTGCCTTGCCATTCGCAAGGCTTGTGAGTTTCTGCTATCTAAAGAGCTGCCCGGCGGTG





GATGGGGGGAGAGTTACCTTTCATGTCAGAATAAGGTGTACACCAATCTTGAGGGGAACAAGCC





ACACTTGGTTAACACTGCCTGGGTTTTAATGGCTCTCATTGAAGCTGGCCAGGGTGAGAGAGAC





CCAGCACCATTGCACCGTGCAGCAAGGTTGCTAATGAATTCCCAATTGGAGAATGGCGATTTCG





TGCAACAGGAGATCATGGGAGTGTTCAATAAGAACTGCATGATCACATATGCTGCATACCGAAA





CATCTTCCCCATTTGGGCGCTTGGAGAGTATTGCCATCGGGTTCTTACTGAATGA




cucurbitadienol
MWRLKVGAESVGEKDEKWVKSVSNHLGRQVWEFCADAAAAATPRQLLQIQNARNHFHRNRFHRK
327



synthase Cmos1
QSSDLFLAIQYEKEIAEGGKGGAVKVKEEEEVGKEAVKSTLERALSFYSAVQTSDGNWASDLGG




protein
PMFLLPGLVIALYVTGVLNSVLSKHHRVEMCRYLYNHQNEDGGWGLHIEGTSTMFGSALNYVAL





RLLGEDADGGDDGAMTKARAWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPPEFWLLPYSL





PFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPKVLSLRQELYTVPYHEIDWNKSRNTCAKEDLY





YPHPKMQDILWGSIYHVYEPLFTRWPGKRLREKALQTAMKHIHYEDENSRYICLGPVNKVLNML





CCWVEDPYSDAFKLHLQRVHDYLWVAEDGMRMQGYNGSQLWDTAFSIQAIVATKLVDSYAPTLR





KAHDFVKDSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLMLSKLPSAMV





GEPLEKNRLCDAVNVLLSLQNDNGGFASYELTRSYPWLELINPAETFGDIVIDYPYVECTAATM





EALTLFKKLHPGHRTKEIDTAIGKAANFLEKMQRADGSWYGCWGVCFTYAGWFGIKGLVAAGRT





YNSCLAIRKACEFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVLMALIEAGQGER





DPAPLHRAARLLMNSQLENGDFVQQEIMGVFNKNCMITYAAYRNIFPIWALGEYCHRVLTE




cucurbitadienol
ATGTGGAGGTTGAAGGTGGGAGCAGAGAGCGTTGGGGAGAAGGATGAGAAATGGGTGAAGAGCG
328



synthase Cmos1
TAAGCAATCACTTGGGCCGCCAAGTTTGGGAGTTCTGTGCCGACGCCGCCGCCGCCGCCACTCC




gene sequence
TCGCCAGTTACTACAAATTCAGAATGCTCGCAACCACTTCCATCGCAATCGTTTCCACCGGAAG





CAGTCTTCCGATCTCTTTCTCGCTATTCAGTATGAAAAGGAAATAGCAGAGGGCGGAAAAGGTG





GAGCGGTGAAAGTGAAAGAAGAGGAGGAGGTGGGGAAAGAGGCGGTGAAGAGTACGTTAGAAAG





GGCACTAAGTTTCTACTCAGCCGTGCAGACAAGCGATGGGAATTGGGCCTCGGATCTTGGAGGG





CCCATGTTTTTACTTCCGGGTCTCGTGATTGCCCTTTATGTCACAGGCGTGTTGAATTCAGTTT





TGTCCAAGCACCACCGCGTAGAGATGTGCAGATATCTTTACAATCACCAGAATGAAGATGGAGG





GTGGGGTCTACATATTGAGGGCACAAGCACCATGTTTGGTTCGGCACTCAATTACGTTGCACTA





AGGCTGCTTGGAGAAGACGCGGATGGCGGAGACGATGGCGCAATGACAAAAGCACGTGCTTGGA





TCTTGGAGCGCGGCGGCGCCACTGCGATCACTTCGTGGGGAAAGTTGTGGCTGTCCGTGCTTGG





AGTGTACGAATGGAGTGGCAACAACCCTCTTCCGCCTGAGTTTTGGCTTCTCCCTTACAGCCTA





CCATTTCATCCAGGAAGAATGTGGTGCCATTGTCGAATGGTTTATCTTCCCATGTCTTACTTAT





ATGGGAAGAGATTTGTTGGGCCAATCACTCCCAAAGTTCTATCGCTAAGACAAGAGCTTTACAC





GGTTCCTTATCATGAAATAGACTGGAACAAATCCCGCAATACATGTGCAAAGGAGGATCTATAC





TATCCACATCCCAAGATGCAAGACATTCTATGGGGATCCATCTACCATGTGTATGAGCCATTGT





TCACTCGTTGGCCTGGGAAACGCCTGAGGGAAAAGGCTTTACAAACTGCAATGAAACATATTCA





CTATGAAGATGAAAATAGTCGATATATATGTCTTGGCCCAGTCAACAAGGTACTCAACATGCTT





TGTTGTTGGGTTGAAGATCCCTACTCAGACGCCTTCAAACTTCACCTTCAACGCGTCCATGACT





ATCTCTGGGTTGCTGAAGATGGCATGAGAATGCAGGGCTACAATGGCAGCCAGTTGTGGGACAC





TGCTTTCTCCATCCAAGCCATCGTAGCCACCAAACTTGTAGACAGCTATGCCCCAACTTTAAGA





AAAGCACATGACTTTGTTAAGGATTCTCAGATCCAGGAGGACTGTCCTGGGGATCCTAATGTTT





GGTTCCGTCATATTCATAAAGGTGCTTGGCCATTTTCGACTCGAGATCATGGATGGCTCATCTC





CGACTGTACAGCTGAGGGATTGAAGGCTTCTTTGATGTTATCCAAACTTCCATCCGCAATGGTT





GGGGAGCCATTAGAAAAGAATCGCCTTTGTGATGCTGTTAATGTTCTCCTTTCTTTGCAAAATG





ATAATGGTGGATTTGCATCATACGAGTTGACGAGATCATACCCTTGGTTGGAGTTGATCAACCC





AGCAGAAACATTCGGAGACATTGTCATCGACTATCCGTATGTGGAGTGCACCGCAGCAACAATG





GAAGCACTGACGTTATTTAAGAAGCTACATCCAGGCCATAGGACCAAAGAGATTGACACAGCTA





TTGGCAAGGCAGCCAACTTCCTTGAGAAAATGCAGAGGGCGGATGGCTCTTGGTATGGGTGTTG





GGGGGTTTGTTTCACGTATGCGGGGTGGTTTGGCATAAAGGGATTGGTGGCTGCAGGAAGAACA





TATAATAGCTGCCTTGCCATCCGCAAGGCTTGTGAGTTTCTGCTATCTAAAGAGCTGCCCGGCG





GTGGATGGGGGGAGAGTTACCTTTCATGTCAGAATAAGGTGTACACCAATCTTGAGGGAAACAA





GCCACACTTGGTTAACACTGCCTGGGTTTTAATGGCTCTCATTGAAGCTGGCCAGGGTGAGAGA





GACCCAGCACCATTGCACCGTGCAGCAAGGTTGCTAATGAATTCCCAATTGGAGAATGGCGATT





TCGTGCAACAGGAGATCATGGGAGTGTTCAATAAGAACTGCATGATCACATATGCTGCATACCG





AAACATCTTCCCCATTTGGGCGCTTGGAGAGTATTGCCATCGGGTTCTGACTGAAT




cucurbitadienol
MWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDEAIAVANNSASKFENARNHFR
329



synthase [Cucumis
NNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKEAVKNTLERALSFYSAVQTSD





melo]

GNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYNHQNEDGGWGLHIEGSSTMF





GSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPP





EFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRSR





NTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAMKIAMEHIHYEDENSRYICLG





PVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGYNGSQLWDTAFSIQAIISTKL





IDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLM





LSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDYS





YVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKTDGSWYGCWGVCFTYAGWFGI





KGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVMMA





LIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYS





HRVLDM.




cucurbitadienol
MWRLKVGAESVGEKEEKWLKSISNHLGRQVWEFCADQPTASPNHLQQIDNARKHFRNNRFHRKQ
330



synthase
SSDLFLAIQNEKEIANGTKGGGIKVKEEEDVRKETVKNTVERALSFYSAIQTNDGNWASDLGGP




[Citrullus
MFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYLYNHQNEDGGWGLHIEGTSTMFGSALNYVALR





colocynthis]

LLGEDADGGEGGAMTKARGWILDRGGATAITSWGKLWLSVLGVYEWSGNNPLPPEFWLLPYCLP





FHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNKSRNTCAKEDLYY





PHPKMQDILWGSIYHLYEPLFTRWPGKRLREKALQMAMKHIHYEDENSRYICLGPVNKVLNMLC





CWVEDPYSDAFKFHLQRVPDYLWIAEDGMRMQGYNGSQLWDTAFSVQAIISTKLIDSFGTTLKK





AHDFVKDSQIQQDFPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLMLSKLPSKIVG





EPLEKSRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDYPYVECTSATME





ALTLFKKLHPGHRTKEIDTAVAKAANFLENMQRTDGSWYGCWGVCFTYAGWFGIKGLVAAGRTY





STCVAIRKACDFLLSKELPGGGWGESYLSCQNKVYTNLEGNRPHLVNTAWVLMALIEAGQAERD





PAPLHRAARLLINSQLENGDFPQEEIMGVFNKNCMITYAAYRNIFPIWALGEYFHRVLTE.




cucurbitadienol
MWRLKVGAESVGEEDEKWVKSVSNHLGRQVWEFCADAAADTPHQLLQIQNARNHFHHNRFHRKQ
331



synthase
SSDLFLAIQYEKEIAKGAKGGAVKVKEGEEVGKEAVKSTLERALGFYSAVQTRDGNWASDLGGP




[Cucurbita pepo]
LFLLPGLVIALHVTGVLNSVLSKHHRVEMCRYLYNHQNEDGGWGLHIEGTSTMFGSALNYVALR





LLGEDADGGDGGAMTKARAWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPPEFWLLPYSLP





FHPGRMWCHCRMVYLPMSYLYGKRFVGPITPKVLSLRQELYTIPYHEIDWNKSRNTCAKEDLYY





PHPKMQDILWGSIYHVYEPLFTRWPGKRLREKALQAAMKHIHYEDENSRYICLGPVNKVLNMLC





CWVEDPYSDAFKLHLQRVHDYLWVAEDGMRMQGYNGSQLWDTAFSIQAIVATKLVDSYAPTLRK





AHDFVKDSQIQEDCPGDPNVWFRHIHKGAWPLSTRDHGWLISDCTAEGLKASLMLSKLPSTMVG





EPLEKNRLCDAVNVLLSLQNDNGGFASYELTRSYPWLELINPAETFGDIVIDYPYVECTAATME





ALTLFKKLHPGHRTKEIDTAIGKAANFLEKMQRADGSWYGCWGVCFTYAGWFGIKGLVAAGRTY





NSCLAIRKACEFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVLMALIEAGQGERD





PAPLHRAARLLMNSQLENGDFVQQEIMGVFNKNCMITYAAYRNIFPIWALGEYCHRVLTE




cucurbitadienol
MWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCAENDDDDDDEAVIHVVANSSKHLLQQQRRQ
332



synthase [Cucumis
SSFENARKQFRNNRFHRKQSSDLFLTIQYEKEIARNGAKNGGNTKVKEGEDVKKEAVNNTLERA





sativus]

LSFYSAIQTSDGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYNHQNEDGGW





GLHIEGSSTMFGSALNYVALRLLGEDANGGECGAMTKARSWILERGGATAITSWGKLWLSVLGV





YEWSGNNPLPPEFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITHMVLSLRKELYTI





PYHEIDWNRSRNTCAQEDLYYPHPKMQDILWGSIYHVYEPLFNGWPGRRLREKAMKIAMEHIHY





EDENSRYIYLGPVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGYNGSQLWDTA





FSIQAILSTKLIDTFGSTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISD





CTAEGLKASLMLSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPA





ETFGDIVIDYSYVECTSATMEALALFKKLHPGHRTKEIDAALAKAANFLENMQRTDGSWYGCWG





VCFTYAGWFGIKGLVAAGRTYNNCVAIRKACHFLLSKELPGGGWGESYLSCQNKVYTNLEGNRP





HLVNTAWVLMALIEAGQGERDPAPLHRAARLLINSQLENGDFPQQEIMGVFNKNCMITYAAYRN





IFPIWALGEYSHRVLTE




cucurbitadienol
DGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYLYNHQNEDGGWGLHIEGTSTM
333



synthase
FGSALNYVALRLLGEDADGGEGGAMTKARSWILDRGGATAITSWGKLWLSVLGVYEWSGNNPLP




[Citrullus
PEFWLLPYCLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRS





lanatus] (partial)

RNTCAKEDLYYPHPKMQDILWGSIYHLYEPLFTRWPGKRLREKALQMAMKHIHYEDENSRYICL





GPVNKVLNMLCCWVEDPYSDAFKFHLQRVPDYLWVAEDGMRMQGYNGSQLWDTAFSVQAIISTK





LIDSFGTTLKKAHDFVKDSQIQQDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASL





MLSKLPSEIVGEPLEKSRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDY





PYVECTSATMEALTLFKKLHPGRRTKEIDIAVARAANFLENMQRTDGSWYGCWGVCFTYAGWFG





IKGLVAAGRTYNSCVAIRKACDFLLSKELPGGGWGESYLSCQNKVYTNLEGNRPHLVNTAWVLM





ALIEAGQAERDPAPLHRAARLLINSQLENGDFPQEEIMGVFNKNCMITYAAYRNIFPIWALGEY





FHRVLTE




Squalene
MSAVNVAPELINADNTITYDAIVIGAGVIGPCVATGLARKGKKVLIVERDWAMPDRIVGELMQP
334



epoxidase/
GGVRALRSLGMIQSINNIEAYPVTGYTVFFNGEQVDIPYPYKADIPKVEKLKDLVKDGNDKVLE




squalene
DSTIHIKDYEDDERERGVAFVHGRFLNNLRNITAQEPNVTRVQGNCIEILKDEKNEVVGAKVDI




monooxygenase
DGRGKVEFKAHLTFICDGIFSRFRKELHPDHVPTVGSSFVGMSLFNAKNPAPMHGHVILGSDHM





PILVYQISPEETRILCAYNSPKVPADIKSWMIKDVQPFIPKSLRPSFDEAVSQGKFRAMPNSYL





PARQNDVTGMCVIGDALNMRHPLTGGGMTVGLHDVVLLIKKIGDLDFSDREKVLDELLDYHFER





KSYDSVINVLSVALYSLFAADSDNLKALQKGCFKYFQRGGDCVNKPVEFLSGVLPKPLQLTRVF





FAVAFYTIYLNMEERGFLGLPMALLEGIMILITAIRVFTPFLFGELIG




Squalene
ATGTCTGCTGTTAACGTTGCACCTGAATTGATTAATGCCGACAACACAATTACCTACGATGCGA
335



epoxidase/
TTGTCATCGGTGCTGGTGTTATCGGTCCATGTGTTGCTACTGGTCTAGCAAGAAAGGGTAAGAA




squalene
AGTTCTTATCGTAGAACGTGACTGGGCTATGCCTGATAGAATTGTTGGTGAATTGATGCAACCA




monooxygenase gene
GGTGGTGTTAGAGCATTGAGAAGTCTGGGTATGATTCAATCTATCAACAACATCGAAGCATATC




sequence
CTGTTACCGGTTATACCGTCTTTTTCAACGGCGAACAAGTTGATATTCCATACCCTTACAAGGC





CGATATCCCTAAAGTTGAAAAATTGAAGGACTTGGTCAAAGATGGTAATGACAAGGTCTTGGAA





GACAGCACTATTCACATCAAGGATTACGAAGATGATGAAAGAGAAAGGGGTGTTGCTTTTGTTC





ATGGTAGATTCTTGAACAACTTGAGAAACATTACTGCTCAAGAGCCAAATGTTACTAGAGTGCA





AGGTAACTGTATTGAGATATTGAAGGATGAAAAGAATGAGGTTGTTGGTGCCAAGGTTGACATT





GATGGCCGTGGCAAGGTGGAATTCAAAGCCCACTTGACATTTATCTGTGACGGTATCTTTTCAC





GTTTCAGAAAGGAATTGCACCCAGACCATGTTCCAACTGTCGGTTCTTCGTTTGTCGGTATGTC





TTTGTTCAATGCTAAGAATCCTGCTCCTATGCACGGTCACGTTATTCTTGGTAGTGATCATATG





CCAATCTTGGTTTACCAAATCAGTCCAGAAGAAACAAGAATCCTTTGTGCTTACAACTCTCCAA





AGGTCCCAGCTGATATCAAGAGTTGGATGATTAAGGATGTCCAACCTTTCATTCCAAAGAGTCT





ACGTCCTTCATTTGATGAAGCCGTCAGCCAAGGTAAATTTAGAGCTATGCCAAACTCCTACTTG





CCAGCTAGACAAAACGACGTCACTGGTATGTGTGTTATCGGTGACGCTCTAAATATGAGACATC





CATTGACTGGTGGTGGTATGACTGTCGGTTTGCATGATGTTGTCTTGTTGATTAAGAAAATAGG





TGACCTAGACTTCAGCGACCGTGAAAAGGTTTTGGATGAATTACTAGACTACCATTTCGAAAGA





AAGAGTTACGATTCCGTTATTAACGTTTTGTCAGTGGCTTTGTATTCTTTGTTCGCTGCTGACA





GCGATAACTTGAAGGCATTACAAAAAGGTTGTTTCAAATATTTCCAAAGAGGTGGCGATTGTGT





CAACAAACCCGTTGAATTTCTGTCTGGTGTCTTGCCAAAGCCTTTGCAATTGACCAGGGTTTTC





TTCGCTGTCGCTTTTTACACCATTTACTTGAACATGGAAGAACGTGGTTTCTTGGGATTACCAA





TGGCTTTATTGGAAGGTATTATGATTTTGATCACAGCTATTAGAGTATTCACCCCATTTTTGTT





TGGTGAGTTGATTGGTTAA




Squalene synthase
MGKLLQLALHPVEMKAALKLKFCRTPLFSIYDQSTSPYLLHCFELLNLTSRSFAAVIRELHPEL
336



Erg9
RNCVTLFYLILRALDTIEDDMSIEHDLKIDLLRHFHEKLLLTKWSFDGNAPDVKDRAVLTDFES





ILIEFHKLKPEYQEVIKEITEKMGNGMADYILDENYNLNGLQTVHDYDVYCHYVAGLVGDGLTR





LIVIAKFANESLYSNEQLYESMGLFLQKTNIIRDYNEDLVDGRSFWPKEIWSQYAPQLKDFMKP





ENEQLGLDCINHLVLNALSHVIDVLTYLAGIHEQSTFQFCAIPQVMAIATLALVFNNREVLHGN





VKIRKGTTCYLILKSRTLRGCVEIFDYYLRDIKSKLAVQDPNFLKLNIQISKIEQFMEEMYQDK





LPPNVKPNETPIFLKVKERSRYDDELVPTQQEEEYKFNMVLSIILSVLLGFYYIYTLHRA




Squalene synthase
ATGGGAAAGCTATTACAATTGGCATTGCATCCGGTCGAGATGAAGGCAGCTTTGAAGCTGAAGT
337



Erg9 gene sequence
TTTGCAGAACACCGCTATTCTCCATCTATGATCAGTCCACGTCTCCATATCTCTTGCACTGTTT





CGAACTGTTGAACTTGACCTCCAGATCGTTTGCTGCTGTGATCAGAGAGCTGCATCCAGAATTG





AGAAACTGTGTTACTCTCTTTTATTTGATTTTAAGGGCTTTGGATACCATCGAAGACGATATGT





CCATCGAACACGATTTGAAAATTGACTTGTTGCGTCACTTCCACGAGAAATTGTTGTTAACTAA





ATGGAGTTTCGACGGAAATGCCCCCGATGTGAAGGACAGAGCCGTTTTGACAGATTTCGAATCG





ATTCTTATTGAATTCCACAAATTGAAACCAGAATATCAAGAAGTCATCAAGGAGATCACCGAGA





AAATGGGTAATGGTATGGCCGACTACATCTTAGATGAAAATTACAACTTGAATGGGTTGCAAAC





CGTCCACGACTACGACGTGTACTGTCACTACGTAGCTGGTTTGGTCGGTGATGGTTTGACCCGT





TTGATTGTCATTGCCAAGTTTGCCAACGAATCTTTGTATTCTAATGAGCAATTGTATGAAAGCA





TGGGTCTTTTCCTACAAAAAACCAACATCATCAGAGATTACAATGAAGATTTGGTCGATGGTAG





ATCCTTCTGGCCCAAGGAAATCTGGTCACAATACGCTCCTCAGTTGAAGGACTTCATGAAACCT





GAAAACGAACAACTGGGGTTGGACTGTATAAACCACCTCGTCTTAAACGCATTGAGTCATGTTA





TCGATGTGTTGACTTATTTGGCCGGTATCCACGAGCAATCCACTTTCCAATTTTGTGCCATTCC





CCAAGTTATGGCCATTGCAACCTTGGCTTTGGTATTCAACAACCGTGAAGTGCTACATGGCAAT





GTAAAGATTCGTAAGGGTACTACCTGCTATTTAATTTTGAAATCAAGGACTTTGCGTGGCTGTG





TCGAGATTTTTGACTATTACTTACGTGATATCAAATCTAAATTGGCTGTGCAAGATCCAAATTT





CTTAAAATTGAACATTCAAATCTCCAAGATCGAACAGTTTATGGAAGAAATGTACCAGGATAAA





TTACCTCCTAACGTGAAGCCAAATGAAACTCCAATTTTCTTGAAAGTTAAAGAAAGATCCAGAT





ACGATGATGAATTGGTTCCAACCCAACAAGAAGAAGAGTACAAGTTCAATATGGTTTTATCTAT





CATCTTGTCCGTTCTTCTTGGGTTTTATTATATATACACTTTACACAGAGCGTGA




Farnesyl PP
MASEKEIRRERFLNVFPKLVEELNASLLAYGMPKEACDWYAHSLNYNTPGGKLNRGLSVVDT
338



synthase
YAILSNKTVEQLGQEEYEKVAILGWCIELLQAYFLVADDMMDKSITRRGQPCWYKVPEVGEIAI





NDAFMLEAAIYKLLKSHFRNEKYYIDITELFHEVTFQTELGQLMDLITAPEDKVDLSKFSLKKH





SFIVTFKTAYYSFYLPVALAMYVAGITDEKDLKQARDVLIPLGEYFQIQDDYLDCFGTPEQIGK





IGTDIQDNKCSWVINKALELASAEQRKTLDENYGKKDSVAEAKCKKIFNDLKIEQLYHEYEESI





AKDLKAKISQVDESRGFKADVLTAFLNKVYKRSK




Farnesyl PP
ATGGCTTCAGAAAAAGAAATTAGGAGAGAGAGATTCTTGAACGTTTTCCCTAAATTAGTAGAGG
339



synthase gene
AATTGAACGCATCGCTTTTGGCTTACGGTATGCCTAAGGAAGCATGTGACTGGTATGCCCACTC




sequence
ATTGAACTACAACACTCCAGGCGGTAAGCTAAATAGAGGTTTGTCCGTTGTGGACACGTATGCT





ATTCTCTCCAACAAGACCGTTGAACAATTGGGGCAAGAAGAATACGAAAAGGTTGCCATTCTAG





GTTGGTGCATTGAGTTGTTGCAGGCTTACTTCTTGGTCGCCGATGATATGATGGACAAGTCCAT





TACCAGAAGAGGCCAACCATGTTGGTACAAGGTTCCTGAAGTTGGGGAAATTGCCATCAATGAC





GCATTCATGTTAGAGGCTGCTATCTACAAGCTTTTGAAATCTCACTTCAGAAACGAAAAATACT





ACATAGATATCACCGAATTGTTCCATGAGGTCACCTTCCAAACCGAATTGGGCCAATTGATGGA





CTTAATCACTGCACCTGAAGACAAAGTCGACTTGAGTAAGTTCTCCCTAAAGAAGCACTCCTTC





ATAGTTACTTTCAAGACTGCTTACTATTCTTTCTACTTGCCTGTCGCATTGGCCATGTACGTTG





CCGGTATCACGGATGAAAAGGATTTGAAACAAGCCAGAGATGTCTTGATTCCATTGGGTGAATA





CTTCCAAATTCAAGATGACTACTTAGACTGCTTCGGTACCCCAGAACAGATCGGTAAGATCGGT





ACAGATATCCAAGATAACAAATGTTCTTGGGTAATCAACAAGGCATTGGAACTTGCTTCCGCAG





AACAAAGAAAGACTTTAGACGAAAATTACGGTAAGAAGGACTCAGTCGCAGAAGCCAAATGCAA





AAAGATTTTCAATGACTTGAAAATTGAACAGCTATACCACGAATATGAAGAGTCTATTGCCAAG





GATTTGAAGGCCAAAATTTCTCAGGTCGATGAGTCTCGTGGCTTCAAAGCTGATGTCTTAACTG





CGTTCTTGAACAAAGTTTACAAGAGAAGCAAATAG




cycloartenol
MWKLKVAEGGTPWLRTLNNHVGRQVWEFDPHSGSPQDLDDIETARRNFHDNRFTHKHSDDLLMR
340



synthase
LQFAKENPMNEVLPKVKVKDVEDVTEEAVATTLRRGLNFYSTIQSHDGHWPGDYGGPMFLMPGL





VITLSVTGALNAVLTDEHRKEMRRYLYNHQNKDGGWGLHIEGPSTMFGSVLCYVTLRLLGEGPN





DGEGDMERGRDWILEHGGATYITSWGKMWLSVLGVFEWSGNNPMPPEWLLPYALPVHPGRMWCH





CRMVYLPMSYLYGKRFVGPITPTVLSLRKELFTVPYHDIDWNQARNLCAKEDLYYPHPLVQDIL





WATLHKFVEPVFMNWPGKKLREKAIKTAIEHIHYEDENTRYICIGPVNKVLNMLCCWVEDPNSE





AFKLHLPRIYDYLWVAEDGMKMQGYNGSQLWDTAFAAQAIISTNLIDEFGPTLKKAHAFIKNSQ





VSEDCPGDLSKWYRHISKGAWPFSTADHGWPISDCTAEGLKAVLLLSKIAPEIVGEPLDSKRLY





DAVNVILSLQNENGGLATYELTRSYTWLEIINPAETFGDIVIDCPYVECTSAAIQALATFGKLY





PGHRREEIQCCIEKAVAFIEKIQASDGSWYGSWGVCFTYGTWFGIKGLIAAGKNFSNCLSIRKA





CEFLLSKQLPSGGWAESYLSCQNKVYSNLEGNRSHVVNTGWAMLALIEAEQAKRDPTPLHRAAV





CLINSQLENGDFPQEEIMGVFNKNCMITYAAYRCIFPIWALGEYRRVLQAC




oxidosqualene
MWKLKVAEGGTPWLRTLNNHVGRQVWEFDPHSGSPQDLDDIETARRNFHDNRFTHKHSDDLLMR
341
PMID:


cyclases
LQFAKENPMNEVLPKVKVKDVEDVTEEAVATTLRRGLNFYSTIQSHDGHWPGDLGGPMFLMPGL

26058429



VITLSVTGALNAVLTDEHRKEMRRYLYNHQNKDGGWGLHIEGPSTMFGSVLCYVTLRLLGEGPN

Takase et



DGEGDMERGRDWILEHGGATYITSWGKMWLSVLGVFEWSGNNPMPPEIWLLPYALPVHPGRMWC

al., 2015



HCRMVYLPMSYLYGKRFVGPITPTVLSLRKELFTVPYHDIDWNQARNLCAKEDLYYPHPLVQDI





LWATLHKFVEPVFMNWPGKKLREKAIKTAIEHIHYEDENTRYICIGPVNKVLNMLCCWVEDPNS





EAFKLHLPRIYDYLWVAEDGMKMQGYNGSQLWDTAFAAQAIISTNLIDEFGPTLKKAHAFIKNS





QVSEDCPGDLSKWYRHISKGAWPFSTADHGWPISDCTAEGLKAVLLLSKIAPEIVGEPLDSKRL





YDAVNVILSLQNENGGLATYELTRSYTWLEIINPAETFGDIVIDCPYVECTSAAIQALATFGKL





YPGHRREEIQCCIEKAVAFIEKIQASDGSWYGSWGVCFTYGTWFGIKGLIAAGKNFSNCLSIRK





ACEFLLSKQLPSGGWAESYLSCQNKVYSNLEGNRSHVVNTGWAMLALIEAEQAKRDPTPLHRAA





VCLINSQLENGDFPQEEIMGVFNKNCMITYAAYRCIFPIWALGEYRRVLQAC




oxidosqualene
ATGTGGAGGCTAACAATAGGTGAGGGCGGCGGTCCGTGGCTGAAGTCGAACAATGGCTTCCTTG
342



cyclases gene
GCCGCCAAGTGTGGGAGTACGACGCCGATGCCGGCACGCCGGAAGAGCGTGCCGAGGTTGAGAG




sequence
GGTGCGTGCGGAATTCACAAAGAACAGGTTCCAGAGGAAGGAGTCACAGGACCTTCTTCTACGC





TTGCAGTACGCAAAAGACAACCCTCTTCCGGCGAATATTCCGACAGAAGCCAAGCTTGAAAAGA





GTACAGAGGTCACTCACGAGACTATCTACGAATCATTGATGCGAGCTTTACATCAATATTCCTC





TCTACAAGCAGACGATGGGCATTGGCCTGGTGATTACAGTGGGATTCTCTTCATTATGCCTATC





ATTATATTCTCTTTATATGTTACTAGATCACTTGACACCTTTTTATCTCCGGAACATCGTCATG





AGATATGTCGCTACATTTACAATCAACAGAATGAAGATGGTGGTTGGGGAAAAATGGTTCTTGG





CCCAAGTACCATGTTTGGATCGTGTATGAATTATGCAACCTTAATGATTCTTGGCGAGAAGCGA





AATGGTGATCATAAGGATGCATTGGAAAAAGGGCGTTCTTGGATTTTATCTCATGGAACTGCAA





CTGCAATACCACAGTGGGGAAAAATATGGTTGTCGATAATTGGCGTTTACGAATGGTCAGGAAA





CAATCCTATTATACCTGAATTGTGGTTGGTTCCACATTTTCTTCCGATTCACCCAGGTCGTTTT





TGGTGTTTTACCCGGTTGATATACATGTCAATGGCATATCTCTATGGTAAGAAATTTGTTGGGC





CTATTAGTCCTACAATATTAGCTCTGCGACAAGACCTCTATAGTATACCTTACTGCAACATTAA





TTGGGACAAGGCGCGTGATTATTGTGCAAAGGAGGACCTTCATTACCCACGCTCACGGGCACAA





GATCTTATATCTGGTTGCCTAACGAAAATTGTGGAGCCAATTTTGAATTGGTGGCCAGCAAACA





AGCTAAGAGATAGAGCTTTAACTAACCTCATGGAGCATATCCATTATGACGACGAATCAACCAA





ATATGTGGGCATTTGCCCTATTAACAAGGCATTGAACATGATTTGTTGTTGGGTAGAAAACCCA





AATTCGCCTGAATTCCAACAACATCTTCCACGATTCCATGACTATTTGTGGATGGCGGAGGATG





GAATGAAGGCACAGGTATATGATGGATGTCATAGCTGGGAACTAGCGTTCATAATTCATGCCTA





TTGTTCCACGGATCTTACTAGCGAGTTTATCCCGACTCTAAAAAAGGCGCACGAGTTCATGAAG





AACTCACAGGTTCTTTTCAACCACCCAAATCATGAAAGCTATTATCGCCACAGATCAAAAGGCT





CATGGACCCTTTCAAGTGTAGATAATGGTTGGTCTGTATCTGATTGTACTGCGGAAGCTGTTAA





GGCATTGCTACTATTATCAAAGATATCCGCTGACCTTGTTGGCGATCCAATAAAACAAGACAGG





TTGTATGATGCCATTGATTGCATCCTATCTTTCATGAATACAGATGGAACATTTTCTACCTACG





AATGCAAACGGACATTCGCTTGGTTAGAGGTTCTCAACCCTTCTGAGAGTTTTCGGAACATTGT





CGTGGACTATCCATCTGTTGAATGCACATCATCTGTGGTTGATGCTCTCATATTATTTAAAGAG





ACGAATCCACGATATCGAAGAGCAGAGATAGATAAATGCATTGAAGAAGCTGTTGTATTTATTG





AGAACAGTCAAAATAAGGATGGTTCATGGTATGGCTCATGGGGTATATGTTTCGCATATGGATG





CATGTTTGCAGTAAGGGCGTTGGTTGCTACAGGAAAAACCTACGACAATTGTGCTTCTATCAGG





AAATCATGCAAATTTGTCTTATCAAAGCAACAAACAACAGGTGGATGGGGTGAAGACTATCTTT





CTAGTGACAATGGGGAATATATTGATAGCGGTAGGCCTAATGCTGTGACCACCTCATGGGCAAT





GTTGGCTTTAATTTATGCTGGACAGGTTGAACGTGACCCAGTACCACTGTATAATGCTGCAAGA





CAGCTAATGAATATGCAGCTAGAAACAGGTGACTTCCCCCAACAGGAACACATGGGTTGCTTCA





ACTCCTCCTTGAACTTCAACTACGCCAACTACCGCAATCTATACCCGATTATGGCTCTTGGGGA





ACTTCGCCGTCGACTTCTTGCGATTAAGAGCTGA




cycloartenol
MWKLKVAEGGTPWLRTLNNHVGRQVWEFDPHSGSPQDLDDIETARRNFHDNRFTHKHSDDLLMR
343
PMID:


synthase
LQFAKENPMNEVLPKVKVKDVEDVTEEAVATTLRRGLNFYSTIQSHDGHWPGDLGGPMFLMPGL

26058429



VITLSVTGALNAVLTDEHRKEMRRYLYNHQNKDGGWGLHIEGPSTMFGSVLCYVTLRLLGEGPN

Takase et



DGEGDMERGRDWILEHGGATYITSWGKMWLSVLGVFEWSGNNPMPPEIWLLPYALPVHPGRMWC

al., 2015



HCRMVYLPMSYLYGKRFVGPITPTVLSLRKELFTVPYHDIDWNQARNLCAKEDLYYPHPLVQDI





LWATLHKFVEPVFMNWPGKKLREKAIKTAIEHIHYEDENTRYICIGPVNKVLNMLCCWVEDPNS





EAFKLHLPRIYDYLWVAEDGMKMQGYNGSQLWDTAFAAQAIISTNLIDEFGPTLKKAHAFIKNS





QVSEDCPGDLSKWYRHISKGAWPFSTADHGWPISDCTAEGLKAVLLLSKIAPEIVGEPLDSKRL





YDAVNVILSLQNENGGLATYELTRSYTWLEIINPAETFGDIVIDCPYVECTSAAIQALATFGKL





YPGHRREEIQCCIEKAVAFIEKIQASDGSWYGSWGVCFTYGTWFGIKGLIAAGKNFSNCLSIRK





ACEFLLSKQLPSGGWAESYLSCQNKVYSNLEGNRSHVVNTGWAMLALIEAEQAKRDPTPLHRAA





VCLINSQLENGDFPQEEIMGVFNKNCMITYAAYRCIFPIWALGEYRRVLQAC




beta-amyrin
MWRLTIGEGGGPWLKSNNGFLGRQVWEYDADAGTPEERAEVERRAEFTKNRFQRKESQDLLLRL
344



synthase
QYAKDNPLPANIPTEAKLEKSTEVTHETIYESLMRALHQYSSLQADDGHWPGDYSGILFIMPII





IFSLYVTRSLDTFLSPEHRHEICRYIYNQQNEDGGWGKMVLGPSTMFGSCMNYATLMILGKRNG





DHKDALEKGRSWILSHGTATAIPQWGKIWLSIIGVYEWSGNNPIIPELWLVPHFLPIHPGRFWC





FTRLIYMSMAYLYGKKFVGPISPTILALRQDLYSIPYCNINWDKARDYCAKEDLHYPRSRAQDL





ISGCLTKIVEPILNWWPANKLRDRALTNLMEHIHYDDESTKYVGICPINKALNMICCWVENPNS





PEFQQHLPRFHDYLWMAEDGMKAQVYDGCHSWELAFIIHAYCSTDLTSEFIPTLKKAHEFMKNS





QVLFNHPNHESYYRHRSKGSWTLSSVDNGWSVSDCTAEAVKALLLLSKISADLVGDPIKQDRLY





DAIDCILSFMNTDGTFSTYECKRTFAWLEVLNPSESFRNIVVDYPSVECTSSVVDALILFKETN





PRYRRAEIDKCIEEAVVFIENSQNKDGSWYGSWGICFAYGCMFAVRALVATGKTYDNCASIRKS





CKFVLSKQQTTGGWGEDYLSSDNGEYIDSGRPNAVTTSWAMLALIYAGQVERDPVPLYNAARQL





MNMQLETGDFPQQEHMGCFNSSLNFNYANYRNLYPIMALGELRRRLLAIKS




beta-amyrin
ATGTGGAGGCTAACAATAGGTGAGGGCGGCGGTCCGTGGCTGAAGTCGAACAATGGCTTCCTTG
345



synthase gene
GCCGCCAAGTGTGGGAGTACGACGCCGATGCCGGCACGCCGGAAGAGCGTGCCGAGGTTGAGAG




sequence
GGTGCGTGCGGAATTCACAAAGAACAGGTTCCAGAGGAAGGAGTCACAGGACCTTCTTCTACGC





TTGCAGTACGCAAAAGACAACCCTCTTCCGGCGAATATTCCGACAGAAGCCAAGCTTGAAAAGA





GTACAGAGGTCACTCACGAGACTATCTACGAATCATTGATGCGAGCTTTACATCAATATTCCTC





TCTACAAGCAGACGATGGGCATTGGCCTGGTGATTACAGTGGGATTCTCTTCATTATGCCTATC





ATTATATTCTCTTTATATGTTACTAGATCACTTGACACCTTTTTATCTCCGGAACATCGTCATG





AGATATGTCGCTACATTTACAATCAACAGAATGAAGATGGTGGTTGGGGAAAAATGGTTCTTGG





CCCAAGTACCATGTTTGGATCGTGTATGAATTATGCAACCTTAATGATTCTTGGCGAGAAGCGA





AATGGTGATCATAAGGATGCATTGGAAAAAGGGCGTTCTTGGATTTTATCTCATGGAACTGCAA





CTGCAATACCACAGTGGGGAAAAATATGGTTGTCGATAATTGGCGTTTACGAATGGTCAGGAAA





CAATCCTATTATACCTGAATTGTGGTTGGTTCCACATTTTCTTCCGATTCACCCAGGTCGTTTT





TGGTGTTTTACCCGGTTGATATACATGTCAATGGCATATCTCTATGGTAAGAAATTTGTTGGGC





CTATTAGTCCTACAATATTAGCTCTGCGACAAGACCTCTATAGTATACCTTACTGCAACATTAA





TTGGGACAAGGCGCGTGATTATTGTGCAAAGGAGGACCTTCATTACCCACGCTCACGGGCACAA





GATCTTATATCTGGTTGCCTAACGAAAATTGTGGAGCCAATTTTGAATTGGTGGCCAGCAAACA





AGCTAAGAGATAGAGCTTTAACTAACCTCATGGAGCATATCCATTATGACGACGAATCAACCAA





ATATGTGGGCATTTGCCCTATTAACAAGGCATTGAACATGATTTGTTGTTGGGTAGAAAACCCA





AATTCGCCTGAATTCCAACAACATCTTCCACGATTCCATGACTATTTGTGGATGGCGGAGGATG





GAATGAAGGCACAGGTATATGATGGATGTCATAGCTGGGAACTAGCGTTCATAATTCATGCCTA





TTGTTCCACGGATCTTACTAGCGAGTTTATCCCGACTCTAAAAAAGGCGCACGAGTTCATGAAG





AACTCACAGGTTCTTTTCAACCACCCAAATCATGAAAGCTATTATCGCCACAGATCAAAAGGCT





CATGGACCCTTTCAAGTGTAGATAATGGTTGGTCTGTATCTGATTGTACTGCGGAAGCTGTTAA





GGCATTGCTACTATTATCAAAGATATCCGCTGACCTTGTTGGCGATCCAATAAAACAAGACAGG





TTGTATGATGCCATTGATTGCATCTATCTTTCATGAATACAGATGGAACATTTTCTACCTACGA





ATGCAAACGGACATTCGCTTGGTTAGAGGTTCTCAACCCTTCTGAGAGTTTTCGGAACATTGTC





GTGGACTATCCATCTGTTGAATGCACATCATCTGTGGTTGATGCTCTCATATTATTTAAAGAGA





CGAATCCACGATATCGAAGAGCAGAGATAGATAAATGCATTGAAGAAGCTGTTGTATTTATTGA





GAACAGTCAAAATAAGGATGGTTCATGGTATGGCTCATGGGGTATATGTTTCGCATATGGATGC





ATGTTTGCAGTAAGGGCGTTGGTTGCTACAGGAAAAACCTACGACAATTGTGCTTCTATCAGGA





AATCATGCAAATTTGTCTTATCAAAGCAACAAACAACAGGTGGATGGGGTGAAGACTATCTTTC





TAGTGACAATGGGGAATATATTGATAGCGGTAGGCCTAATGCTGTGACCACCTCATGGGCAATG





TTGGCTTTAATTTATGCTGGACAGGTTGAACGTGACCCAGTACCACTGTATAATGCTGCAAGAC





AGCTAATGAATATGCAGCTAGAAACAGGTGACTTCCCCCAACAGGAACACATGGGTTGCTTCAA





CTCCTCCTTGAACTTCAACTACGCCAACTACCGCAATCTATACCCGATTATGGCTCTTGGGGAA





CTTCGCCGTCGACTTCTTGCGATTAAGAGCTGA




Modified sequence
MWRLTIGEGGGPWLKSNNGFLGRQVWEYDADAGTPEERAEVERVRAEFTKNRFQRKESQDLLLR
346
PMID:


of beta-amyrin
LQYAKDNPLPANIPTEAKLEKSTEVTHETIYESLMRALHQYSSLQADDGHWPGDYSGILFIMPI

27412861


synthase from
IIFSLYVTRSLDTFLSPEHRHEICRYIYNQQNEDGGWGKMVLGPSTMFGSCMNYATLMILGEKR

(Salmon et


Avena strigosa
NGDHKDALEKGRSWILSHGTATAIPQWGKIWLSIIGVYEWSGNNPIIPELWLVPHFLPIHPGRF

al., 2016)


(AJ311789), which
WCFTRLIYMSMAYLYGKKFVGPISPTILALRQDLYSIPYCNINWDKARDYCAKEDLHYPRSRAQ




reacts
DLISGCLTKIVEPILNWWPANKLRDRALTNLMEHIHYDDESTKYVGICPINKALNMICCWVENP




preferentially
NSPEFQQHLPRFHDYLWMAEDGMKAQVYDGCHSWELAFIIHAYCSTDLTSEFIPTLKKAHEFMK




with
NSQVLFNHPNHESYYRHRSKGSWTLSSVDNGWSVSDCTAEAVKALLLLSKISADLVGDPIKQDR




diepoxysqualene
LYDAIDCILSFMNTDGTFSTYECKRTFAWLEVLNPSESFRNIVVDYPSVECTSSVVDALILFKE





TNPRYRRAEIDKCIEEAVVFIENSQNKDGSWYGSWGICFAYGCMFAVRALVATGKTYDNCASIR





KSCKFVLSKQQTTGGWGEDYLSSDNGEYIDSGRPNAVTTSWAMLALIYAGQVERDPVPLYNAAR





QLMNMQLETGDFPQQEHMGCFNSFLNFNYANYRNLYPIMALGELRRRLLAIKS




Modified sequence
MWKLKIGKGNGEDPHLFSSNNFVGRQTWKFDHKAGSPEERAAVEEARRGFLDNRFRVKGCSDLL
347
PMID:


of from
WRMQFLREKKFEQGIPQLKATNIEEITYETTTNALRRGVRYFTALQASDGHWPGEITGPLFFLP

27412861


Arabidopsis
PLIFCLYITGHLEEVFDAEHRKEMLRHIYCHQNEDGGWGLHIESKSVMFCTVLNYICLRMLGEN

(Salmon et


thaliana (AtLup1,
PEQDACKRARQWILDRGGVIFIPSWGKFWLSILGVYDWSGTNPTPPELLMLPSFLPIHPGKILC

al., 2016)


Q9C5M3.1), which
YSRMVSIPMSYLYGKRFVGPITPLILLLREELYLEPYEEINWKKSRRLYAKEDMYYAHPLVQDL




reacts
LSDTLQNFVEPLLTRWPLNKLVREKALQLTMKHIHYEDENSHYITIGCVEKVLCMLACWVENPN




preferentially
GDYFKKHLARIPDYMWVAEDGMKMQSFGCQLWDTGFAIQALLASNLPDETDDALKRGHNYIKAS




with
QVRENPSGDFRSMYRHISKGAWTFSDRDHGWQVSDCTAEALKCCLLLSMMSADIVGQKIDDEQL




diepoxysqualene
YDSVNLLLSLQSGNGGVNAWEPSRAYKWLELLNPTEFMANTMVEREFVECTSSVIQALDLFRKL





YPDHRKKEINRSIEKAVQFIQDNQTPDGSWYGNWGVCFIYATWFALGGLAAAGETYNDCLAMRN





GVHFLLTTQRDDGGWGESYLSCSEQRYIPSEGERSNLVQTSWAMMALIHTGQAERDLIPLHRAA





KLIINSQLENGDFPQQEIVGAFMNFCMLHYATYRNTFPLWALAEYRKVVFIVN




XP_001396506.2
MCNKSNYSSPKWWKESVVYQVYPASFNCGKSTTNTNGWGDVTGIIEKVPYLESLGVDIVWLSPI
352




YTSPQVDMGYDIADYESIDPRYGTLADVDLLIKTLKDHDMKLMMDLVVNHTSDQHSWFVESANS





KDSPKRDWYIWRPAKGFDEAGNPVPPNNWAQILGDTLSAWTWHAETQEFYLTLHTSAQAELNWE





NPDVVTAVYDVMEFWLRRGICGFRMDVINFISKDQSFPDAPIIDPASKYQPGEQFYTNGPRFHE





FMHGIYDNVLSKYDTITVGETPYVTDMKEIIKTVGSTAKELNMAFNFDHMEIEDIKTKGESKWS





LRDWKLTELKGILSGWQKRMREWDGWNAIFLECHDQARSVSRYTNDSDEFRDRGAKLLALLETT





LGGTIFLYQGQEIGMRNFPVEWDPDTEYKDIESVNFWKKSKELHPVGSEGLAQARTLLQKKARD





HARTPMQWSADPHAGFTVPDATPWMRVNDDYGTVNVEAQMSFPWEMKGELSVWQYWQQALQRRK





LHKGAFVYGDFEDLDYHNELVFAYSRTSADGKETWLVAMNWTTDAVEWTVPSGIHVTRWVSSTL





QTAPLMAGQSTVTLRALEGVVGCCS




beta-glucosidase
MTSFHDGVKLSTVTCVLSGLVALGSAGPTAASANAQVAAAAAAQAWVPDGYYVPPYYPAPYGGW
354



[Trichoderma reesei]
VEDWQESYTKAKALVDSMTLAEKTNITAGTGIYMGERCAGNTGSAFRVSFPQLCLNDSPAGVRH




GenBank: BAP5915.1
ADNVTAFPDGITVGATFDKALMYKRGVAIGKENRGKGVNVWLGPTVGPIGRKPKGGRNWEGFGA





DPVLQAVGARETIKGVQEQGVIATIKHFIGNEQEMYRMYNPFQYAYSSNIDDRTLHEVYAWPFA





EGIRAGVGAVMMAYNAVNGTACSQHPYLMSALLKDEMGFQGFIMTDWLAHMSGVASAIAGLDMD





MPGDVQIPFFGGSYWMYELTRSALNGSVPMDRINDAATRIAAAWYKMGQDKGFPATNFDTNSRA





AFNPLYPAALPLSPFGITNEFVPVQDDHDVIARQISQEAITLLKNDGDILPLSPSQHLKVFGTD





AQKNPDGINSCTDRNCNKGTLGQGWGSGTVDYPYLDDPISAITAEADNVTFYNTDKFPSVGEVS





DSDVAIVFVNSDAGENTYTVEGNHGDRDKSGLYAWHDGDKLVQDAASKFSNVIVVIHTVGPLIL





EKWIDLPSVKAVLVAHLPGQEAGKSLTNVLFGHASPCGHLPYSITKEEDDLPKSVTTLIDSEFL





NQPQDTYTEGLYIDYRWLNKNKTKPRYAFGHGLSYTNFTFKAASIKQVARLSAYPPARPAKGST





PDFAQSIPSASEAVAPSGFGKIPRYIYSWLSQGDANRAISDGKTGKYPYPDGYSTTQKPGARAG





GGEGGNPALWDVAYSLTVTVQNTGDEYAGKASVQAYLQFPDDIDYDTPIIQLRDFEKTKELKPG





ETTTVTLTLTRKDVSVWDVVAQDWKVPAVDGGYKVWIGDASDSLSIVCHTDTLECETGVVGPV




Beta-glucosidase
MMGFDVEDVLSQLSQNEKIALLSGIDFWHTYPIPKYNVPSVRLTDGPNGIRGTKFFAGIPAACL
355



[Trichoderma reesei]
PCGTALASTWDKQLLKKAGKLLGDECIAKGAHCWLGPTINTPRSPLGGRGFESFSEDPYLSGIL




BAP59014.1
AASMILGCESTGVISAVKHFVANDQEHERRAVDCLITQRALREVYLRPFQIVARDARPGALMTS




GI:690966588
YNKVNGKHVADSAEFLQGILRTEWNWDPLIVSDWYGTYTTIDAIKAGLDLEMPGVSRYRGKYIE





SALQARLLKQSTIDERARRVLRFAQKASHLKVSEVEQGRDFPEDRVLNRQICGSSIVLLKNENS





ILPLPKSVKKVALVGSHVRLPAISGGGSASLVPYYAISLYDAVSEVLAGATITHEVGAYAHQML





PVIDAMISNAVIHFYNDPIDVKDRKLLGSENVSSTSFQLMDYNNIPTLNKAMFWGTLVGEFIPT





ATGIWEFGLSVFGTADLYIDNELVIENTTHQTRGTAFFGKGTTEKVATRRMVAGSTYKLRLEFG





SANTTKMETTGVVNFGGGAVHLGACLKVDPQEMIARAVKAAADADYTIICTGLSGEWESEGFDR





PHMDLPPGVDTMISQVLDAAPNAVVVNQSGTPVTMSWAHKAKAIVQAWYGGNETGHGISDVLFG





NVNPSGKLSLSWPVDVKHNPAYLNYASVGGRVLYGEDVYVGYKFYDKTEREVLFPFGHGLSYAT





FKLPDSTVRTVPETFHPDQPTVAIVKIKNTSSVPGAQVLQLYISAPNSPTHRPVKELHGFEKVY





LEAGEEKEVQIPIDQYATSFWDEIESMWKSERGIYDVLVGFSSQEISGKGKLIVPETRFWMGL




beta-glucosidase
MLPKDFQWGFATAAYQIEGAVDQDGRGPSIWDTFCAQPGKIADGSSGVTACDSYNRTAEDIALL
356



reesei
KSLGAKSYRFSISWSRIIPEGGRGDAVNQAGIDHYVKFVDDLLDAGITPFITLFHWDLPEGLHQ




[Trichoderma reesei]
RYGGLLNRTEFPLDFENYARVMFRALPKVRNWITFNEPLCSAIPGYGSGTFAPGRQSTSEPWTV




AHK23047.1
GHNILVAHGRAVKAYRDDFKPASGDGQIGIVLNGDFTYPWDAADPADKEAAERRLEFFTAWFAD




GI:588294532
PIYLGDYPASMRKQLGDRLPTFTPEERALVHGSNDFYGMNHYTSNYIRHRSSPASADDTVGNVD





VLFTNKQGNCIGPETQSPWLRPCAAGFRDFLVWISKRYGYPPIYVTENGTSIKGESDLPKEKIL





EDDFRVKYYNEYIRAMVTAVELDGVNVKGYFAWSLMDNFEWADGYVTRFGVTYVDYENGQKRFP





KKSAKSLKPLFDELIAAA




beta-glucosidase
MLPKDFQWGFATAAYQIEGAVDQDGRGPSIWDTFCAQPGKIADGSSGVTACDSYNRTAEDIALL
357



[Trichoderma reesei]
KSLGAKSYRFSISWSRIIPEGGRGDAVNQAGIDHYVKFVDDLLDAGITPFITLFHWDLPEGLHQ




BAA74959.1
RYGGLLNRTEFPLDFENYARVMFRALPKVRNWITFNEPLCSAIPGYGSGTFAPGRQSTSEPWTV




GI:4249562
GHNILVAHGRAVKAYRDDFKPASGDGQIGIVLNGDFTYPWDAADPADKEAAERRLEFFTAWFAD





PIYLGDYPASMRKQLGDRLPTFTPEERALVHGSNDFYGMNHYTSNYIRHRSSPASADDTVGNVD





VLFTNKQGNCIGPETQSPWLRPCAAGFRDFLVWISKRYGYPPIYVTENGTSIKGESDLPKEKIL





EDDFRVKYYNEYIRAMVTAVELDGVNVKGYFAWSLMDNFEWADGYVTRFGVTYVDYENGQKRFP





KKSAKSLKPLFDELIAAA




beta-glucosidase
MLPKDFQWGFATAAYQIEGAVDQDGRGPSIWDTFCAQPGKIADGSSGVTACDSYNRTAEDIALL
358



[Trichoderma
KSLGAKSYRFSISWSRIIPEGGRGDAVNQAGIDHYVKFVDDLLDAGITPFITLFHWDLPEGLHQ





reesei RUT C-3

RYGGLLNRTEFPLDFENYARVMFRALPKVRNWITFNEPLCSAIPGYGSGTFAPGRQSTSEPWTV




ETS5552.1
GHNILVAHGRAVKAYRDDFKPASGDGQIGIVLNGDFTYPWDAADPADKEAAERRLEFFTAWFAD




GI:572282538
PIYLGDYPASMRKQLGDRLPTFTPEERALVHGSNDFYGMNHYTSNYIRHRSSPASADDTVGNVD





VLFTNKQGNCIGPETQSPWLRPCAAGFRDFLVWISKRYGYPPIYVTENGTSIKGESDLPKEKIL





EDDFRVKYYNEYIRAMVTAVELDGVNVKGYFAWSLMDNFEWADGYVTRFGVTYVDYENGQKRFP





KKSAKSLKPLFDELIAAA




Chain D, Crystal
MHHHHHHMLPKDFQWGFATAAYQIEGAVDQDGRGPSIWDTFCAQPGKIADGSSGVTACDSYNRT
359



Structure Of Beta-
AEDIALLKSLGAKSYRFSISWSRIIPEGGRGDAVNQAGIDHYVKFVDDLLDAGITPFITLFHWD




Glucosidase 2 From
LPEGLHQRYGGLLNRTEFPLDFENYARVMFRALPKVRNWITFNEPLCSAIPGYGSGTFAPGRQS




Fungus Trichoderma
TSEPWTVGHNILVAHGRAVKAYRDDFKPASGDGQIGIVLNGDFTYPWDAADPADKEAAERRLEF





Reesei In Complex

FTAWFADPIYLGDYPASMRKQLGDRLPTFTPEERALVHGSNDFYGMNHYTSNYIRHRSSPASAD




With Tris
DTVGNVDVLFTNKQGNCIGPETQSPWLRPCAAGFRDFLVWISKRYGYPPIYVTENGTSIKGESD




3AHY_D
LPKEKILEDDFRVKYYNEYIRAMVTAVELDGVNVKGYFAWSLMDNFEWADGYVTRFGVTYVDYE




GI:303324838
NGQKRFPKKSAKSLKPLFDELIAAA




Chain C, Crystal
MHHHHHHMLPKDFQWGFATAAYQIEGAVDQDGRGPSIWDTFCAQPGKIADGSSGVTACDSYNRT
360



Structure Of Beta-
AEDIALLKSLGAKSYRFSISWSRIIPEGGRGDAVNQAGIDHYVKFVDDLLDAGITPFITLFHWD




Glucosidase 2 From
LPEGLHQRYGGLLNRTEFPLDFENYARVMFRALPKVRNWITFNEPLCSAIPGYGSGTFAPGRQS




Fungus Trichoderma
TSEPWTVGHNILVAHGRAVKAYRDDFKPASGDGQIGIVLNGDFTYPWDAADPADKEAAERRLEF





Reesei In Complex

FTAWFADPIYLGDYPASMRKQLGDRLPTFTPEERALVHGSNDFYGMNHYTSNYIRHRSSPASAD




With Tris
DTVGNVDVLFTNKQGNCIGPETQSPWLRPCAAGFRDFLVWISKRYGYPPIYVTENGTSIKGESD




3AHY_C GI:33324837
LPKEKILEDDFRVKYYNEYIRAMVTAVELDGVNVKGYFAWSLMDNFEWADGYVTRFGVTYVDYE





NGQKRFPKKSAKSLKPLFDELIAAA




Chain B, Crystal
MHHHHHHMLPKDFQWGFATAAYQIEGAVDQDGRGPSIWDTFCAQPGKIADGSSGVTACDSYNRT
361



Structure Of Beta-
AEDIALLKSLGAKSYRFSISWSRIIPEGGRGDAVNQAGIDHYVKFVDDLLDAGITPFITLFHWD




Glucosidase 2 From
LPEGLHQRYGGLLNRTEFPLDFENYARVMFRALPKVRNWITFNEPLCSAIPGYGSGTFAPGRQS




Fungus Trichoderma
TSEPWTVGHNILVAHGRAVKAYRDDFKPASGDGQIGIVLNGDFTYPWDAADPADKEAAERRLEF





Reesei In Complex

FTAWFADPIYLGDYPASMRKQLGDRLPTFTPEERALVHGSNDFYGMNHYTSNYIRHRSSPASAD




With Tris
DTVGNVDVLFTNKQGNCIGPETQSPWLRPCAAGFRDFLVWISKRYGYPPIYVTENGTSIKGESD




3AHY_B
LPKEKILEDDFRVKYYNEYIRAMVTAVELDGVNVKGYFAWSLMDNFEWADGYVTRFGVTYVDYE




GI:303324836
NGQKRFPKKSAKSLKPLFDELIAAA




Chain A, Crystal
MHHHHHHMLPKDFQWGFATAAYQIEGAVDQDGRGPSIWDTFCAQPGKIADGSSGVTACDSYNRT
362



Structure Of Beta-
AEDIALLKSLGAKSYRFSISWSRIIPEGGRGDAVNQAGIDHYVKFVDDLLDAGITPFITLFHWD




Glucosidase 2 From
LPEGLHQRYGGLLNRTEFPLDFENYARVMFRALPKVRNWITFNEPLCSAIPGYGSGTFAPGRQS




Fungus Trichoderma
TSEPWTVGHNILVAHGRAVKAYRDDFKPASGDGQIGIVLNGDFTYPWDAADPADKEAAERRLEF




Reesei In Complex
FTAWFADPIYLGDYPASMRKQLGDRLPTFTPEERALVHGSNDFYGMNHYTSNYIRHRSSPASAD




With Tris
DTVGNVDVLFTNKQGNCIGPETQSPWLRPCAAGFRDFLVWISKRYGYPPIYVTENGTSIKGESD




3AHY_A
LPKEKILEDDFRVKYYNEYIRAMVTAVELDGVNVKGYFAWSLMDNFEWADGYVTRFGVTYVDYE




GI:303324835
NGQKRFPKKSAKSLKPLFDELIAAA




beta-glucosidase-
MRLKHWKTAAFAAASIVSQVEAGFWNFGRDTSSSTRPPTKDQFIESLISKLTLEDLVLQLHLMF
363



like protein
ADDIVGAASHNELYDQTMHLSPKSPIGTIHDWYPMNKSYFNVLQKLQLDNSHVKIPMMLVEECL




[Trichoderma reesei]
HGVGSFKQSIFPQNIAMAASFDTDIVYRVGRAIGTEARSIGIHGCFSPVLDLAQDPRWGRVQED




reesei
FGEDKILTSHIGSAYSSGLSKNKTWSDPDAVFPIMKHFAAHGAAQAGHNTAPFTGLGPRQIKQD




BAP59016.1
LLVPFKANYDLGGARGVMMAYNEIDGVPSCVNPMLYEVLDDWGYDGIVIGDDTAMRNLLTQHRV




GI:690966592
TTSEADTLQQWYNAGGQIDFYDFDLDSKINITKALVANGTVPLKTLQSHVRKILGVKWDLGLFE





NPYIPEHIDPLAVVASHQDVALEAAHKSIILLKNDNRTLPLSSPKKIALIGPFADTINLGDYSG





ALGQYPAKYTQTLREGVLRHANKSGHTVRTSWGTNSWEYNNQYVIPGYLLSTNGKPGGLKATYY





AHTNFTSPKATRVEVPAQDWGLYPPPGLSSNNFSAVWEGELESPTDLDVNGWIGLAIGPNSTSK





LYVDGKLISSKGYSGSGNLLGTIEGYAWTQANSTLPPQGGVEFTFKKNAKHHVRIEFQSWNNYK





KTANVNSVNSQLIFWWNLVSPNGKALDQAVSIAKDSDVVILAVGAAWNSDGESGDRGTLGLAPS




extracellular beta-
QDELAREVFALGKPVVLVLEGGRPSAIPDHYGNSSAVLSTFFGGQAGGQAIADVLFGDFNPGAR
364



glucosidase
VPITVPWSVGQIPAYYNYKPSARAAQYLDIPSEPIYPFGYGLSYTTFSTSSPTASVSGSSKRSS




1713235A GI:227874
VDAQTSQSFGSGDWITFSVTVKNTGSVAGSYVAQVYLLGRVSTITQPVKQLVGFQRVYLEAGQK





KTANIQLEVDRYLKIINRKDEWELEKGSYTFALLEHGGSNADTSKNVTLQCVG





MRYRTAAALALATGPFARADSHSTSGASAEAVVPPAGTPWGTAYDKAKAALAKLNLQDKVGIVS





GVGWNGGPCVGNTSPASKISYPSLCLQDGPLGVRYSTGSTAFTPGVQAASTWDVNLIRERGQFI





GEEVKASGIHVILGPVAGPLGKTPQGGRNWEGFGVDPYLTGIAMGQTINGIQSVGVQATAKHYI





LNEQELNRETISSNPDDRTLHELYTWPFADAVQANVASVMCSYNKVNTTWACEDQYTLQTVLKD





QLGFPGYVMTDWDAQHTTVQSANSGLDMSMPGTDFNGNNRLWGPALTNAVNSNQVPTSRVDDMV





TRILAAWYLTGQDQAGYPSFNISRNVQGNHKTNVRAIARDGIVLLKNDANILPLKKPASIAVVG





SAAIIGNHARNSPSCNDKGCDDGALGMGWGSGAVNYPYFVAPYDAINTRASSQGTQVTLSNTDN





TSSGASAARGKDVAIVFITADSGEGYITVEGNAGDRNNLDPWHNGNALVQAVAGANSNVIVVVH





SVGAIILEQILALPQVKAVVWAGLPSQESGNALVDVLWGDVSPSGKLVYTIAKSPNDYNTRIVS





GGSDSFSEGLFIDYKHFDDANITPRYEFGYGLSYTKFNYSRLSVLSTAKSGPATGAVVPGGPSD





LFQNVATVTVDIANSGQVTGAEVAQLYITYPSSAPRTPPKQLRGFAKLNLTPGQSGTATFNIRR





RDLSYWDTASQKWVVPSGSFGISVGASSRDIRLTSTLSVA




Chain A, Crystal
VVPPAGTPWGTAYDKAKAALAKLNLQDKVGIVSGVGWNGGPCVGNTSPASKISYPSLCLQDGPL
365



Structure Of A
GVRYSTGSTAFTPGVQAASTWDVNLIRERGQFIGEEVKASGIHVILGPVAGPLGKTPQGGRNWE




Glycoside
GFGVDPYLTGIAMGQTINGIQSVGVQATAKHYILNEQELNRETISSNPDDRTLHELYTWPFADA




Hydrolase Family 3
VQANVASVMCSYNKVNTTWACEDQYTLQTVLKDQLGFPGYVMTDWNAQHTTVQSANSGLDMSMP




Beta-glucosidase,
GTDFNGNNRLWGPALTNAVNSNQVPTSRVDDMVTRILAAWYLTGQDQAGYPSFNISRNVQGNHK




Bgl1 From Hypocrea
TNVRAIARDGIVLLKNDANILPLKKPASIAVVGSAAIIGNHARNSPSCNDKGCDDGALGMGWGS





Jecorina

GAVNYPYFVAPYDAINTRASSQGTQVTLSNTDNTSSGASAARGKDVAIVFITADSGEGYITVEG




3ZZ1_A
NAGDRNNLDPWHNGNALVQAVAGANSNVIVVVHSVGAIILEQILALPQVKAVVWAGLPSQESGN




GI:429544273
ALVDVLWGDVSPSGKLVYTIAKSPNDYNTRIVSGGSDSFSEGLFIDYKHFDDANITPRYEFGYG





LSYTKFNYSRLSVLSTAKSGPATGAVVPGGPSDLFQNVATVTVDIANSGQVTGAEVAQLYITYP





SSAPRTPPKQLRGFAKLNLTPGQSGTATFNIRRRDLSYWDTASQKWVVPSGSFGISVGASSRDI





RLTSTLSVA




Chain A, Crystal
VVPPAGTPWGTAYDKAKAALAKLNLQDKVGIVSGVGWNGGPCVGNTSPASKISYPSLCLQDGPL
366



Structure Of A
GVRYSTGSTAFTPGVQAASTWDVNLIRERGQFIGEEVKASGIHVILGPVAGPLGKTPQGGRNWE




Glycoside
GFGVDPYLTGIAMGQTINGIQSVGVQATAKHYILNEQELNRETISSNPDDRTLHELYTWPFADA




Hydrolase Family 3
VQANVASVMCSYNKVNTTWACEDQYTLQTVLKDQLGFPGYVMTDWNAQHTTVQSANSGLDMSMP




Beta-glucosidase,
GTDFNGNNRLWGPALTNAVNSNQVPTSRVDDMVTRILAAWYLTGQDQAGYPSFNISRNVQGNHK




Bgl1 from Hypocrea
TNVRAIARDGIVLLKNDANILPLKKPASIAVVGSAAIIGNHARNSPSCNDKGCDDGALGMGWGS





Jecorina

GAVNYPYFVAPYDAINTRASSQGTQVTLSNTDNTSSGASAARGKDVAIVFITADSGEGYITVEG




3ZYZ_A
NAGDRNNLDPWHNGNALVQAVAGANSNVIVVVHSVGAIILEQILALPQVKAVVWAGLPSQESGN




GI:429544272
ALVDVLWGDVSPSGKLVYTIAKSPNDYNTRIVSGGSDSFSEGLFIDYKHFDDANITPRYEFGYG





LSYTKFNYSRLSVLSTAKSGPATGAVVPGGPSDLFQNVATVTVDIANSGQVTGAEVAQLYITYP





SSAPRTPPKQLRGFAKLNLTPGQSGTATFNIRRRDLSYWDTASQKWVVPSGSFGISVGASSRDI





RLTSTLSVA




Chain B, Crystal
AVVPPAGTPWGTAYDKAKAALAKLNLQDKVGIVSGVGWNGGPCVGNTSPASKISYPSLCLQDGP
367



Structure Of Beta-
LGVRYSTGSTAFTPGVQAASTWDVNLIRERGQFIGEEVKASGIHVILGPVAGPLGKTPQGGRNW




d-glucoside
EGFGVDPYLTGIAMGQTINGIQSVGVQATAKHYILNEQELNRETISSNPDDRTLHELYTWPFAD




glucohydrolase
AVQANVASVMCSYNKVNTTWACEDQYTLQTVLKDQLGFPGYVMTDWNAQHTTVQSANSGLDMSM




from Trichoderma
PGTDFNGNNRLWGPALTNAVNSNQVPTSRVDDMVTRILAAWYLTGQDQAGYPSFNISRNVQGNH





Reesei

KTNVRAIARDGIVLLKNDANILPLKKPASIAVVGSAAIIGNHARNSPSCNDKGCDDGALGMGWG




4I8D_B
SGAVNYPYFVAPYDAINTRASSQGTQVTLSNTDNTSSGASAARGKDVAIVFITADSGEGYITVE




GI:430801090
GNAGDRNNLDPWHNGNALVQAVAGANSNVIVVVHSVGAIILEQILALPQVKAVVWAGLPSQESG





NALVDVLWGDVSPSGKLVYTIAKSPNDYNTRIVSGGSDSFSEGLFIDYKHFDDANITPRYEFGY





GLSYTKFNYSRLSVLSTAKSGPATGAVVPGGPSDLFQNVATVTVDIANSGQVTGAEVAQLYITY





PSSAPRTPPKQLRGFAKLNLTPGQSGTATFNIRRRDLSYWDTASQKWVVPSGSFGISVGASSRD





IRLTSTLSVA




Chain A, Crystal
AVVPPAGTPWGTAYDKAKAALAKLNLQDKVGIVSGVGWNGGPCVGNTSPASKISYPSLCLQDGP
368



Structure Of Beta-
LGVRYSTGSTAFTPGVQAASTWDVNLIRERGQFIGEEVKASGIHVILGPVAGPLGKTPQGGRNW




d-glucoside
EGFGVDPYLTGIAMGQTINGIQSVGVQATAKHYILNEQELNRETISSNPDDRTLHELYTWPFAD




Glucohydrolase
AVQANVASVMCSYNKVNTTWACEDQYTLQTVLKDQLGFPGYVMTDWNAQHTTVQSANSGLDMSM




from Trichoderma
PGTDFNGNNRLWGPALTNAVNSNQVPTSRVDDMVTRILAAWYLTGQDQAGYPSFNISRNVQGNH





Reesei

KTNVRAIARDGIVLLKNDANILPLKKPASIAVVGSAAIIGNHARNSPSCNDKGCDDGALGMGWG




4I8D_A
SGAVNYPYFVAPYDAINTRASSQGTQVTLSNTDNTSSGASAARGKDVAIVFITADSGEGYITVE




GI:430801089
GNAGDRNNLDPWHNGNALVQAVAGANSNVIVVVHSVGAIILEQILALPQVKAVVWAGLPSQESG





NALVDVLWGDVSPSGKLVYTIAKSPNDYNTRIVSGGSDSFSEGLFIDYKHFDDANITPRYEFGY





GLSYTKFNYSRLSVLSTAKSGPATGAVVPGGPSDLFQNVATVTVDIANSGQVTGAEVAQLYITY





PSSAPRTPPKQLRGFAKLNLTPGQSGTATFNIRRRDLSYWDTASQKWVVPSGSFGISVGASSRD





IRLTSTLSVA




Cel3d protein
MILGCESTGVISAVKHFVANDQEHERRAVDCLITQRALREVYLRPFQIVARDARPGALMTSYNK
369



[Trichoderma
VNGKHVADSAEFLQGILRTEWNWDPLIVSDWYGTYTTIDAIKAGLDLEMPGVSRYRGKYIESAL





reesei]

QARLLKQSTIDERARRVLRFAQKASHLKVSEVEQGRDFPEDRVLNRQICGSSIVLLKNENSILP




AAP57759.1
LPKSVKKVALVGSHVRLPAISGGGSASLVPYYAISLYDAVSEVLAGATITHEVGAYAHQMLPVI




GI:31747172
DAMISNAVIHFYNDPIDVKDRKLLGSENVSSTSFQLMDYNNIPTLNKAMFWGTLVGEFIPTATG





IWEFGLSVFGTADLYIDNELVIENTTHQTRGTAFFGKGTTEKVATRRMVAGSTYKLRLEFGSAN





TTKMETTGVVNFGGGAVHLGACLKVDPQEMIARAVKAAADADYTIICTGLSGEWESEGFDRPHM





DLPPGVDTMISQVLDAAPNAVVVNQSGTPVTMSWAHKAKAIVQAWYGGNETGHGISDVLFGNVN





PSGKLSLSWPVDVKHNPAYLNYASVGGRVLYGEDVYVGYKFYDKTEREVLFPFGHGLSYATFKL





PDSTVRTVPETFHPDQPTVAIVKIKNTSSVPGAQVLQLYISAPNSPTHRPVKELHGFEKVYLEA





GEEKEVQIPIDQYATSFWDEIESMWKSERGIYDVLVGFSSQEISGKGKLIVPETRFWMGL




Cel3c protein
MADIDVEAILKKLTLAEKVDLLAGIDFWHTKALPKHGVPSLRFTDGPNGVRGTKFFNGVPAACF
370



[Trichoderma
PCGTSLGSTFNQTLLEEAGKMMGKEAIAKSAHVILGPTINMQRSPLGGRGFESIGEDPFLAGLG





reesei]

AAALIRGIQSTGVQATIKHFLCNDQEDRRMMVQSIVTERALREIYALPFQIAVRDSQPGAFMTA




AAP57756.1
YNGINGVSCSENPKYLDGMLRKEWGWDGLIMSDWYGTYSTTEAVVAGLDLEMPGPPRFRGETLK




GI:31747168
FNVSNGKPFIHVIDQRAREVLQFVKKCAASGVTENGPETTVNNTPETAALLRKVGNEGIVLLKN





ENNVLPLSKKKKTLIVGPNAKQATYHGGGSAALRAYYAVTPFDGLSKQLETPPSYTVGAYTTVP





PILGEQCLTPDGAPGMRWRVFNEPPGTPNRQHIDELFFTKTDMHLVDYYHPKAADTWYADMEGT





YTADEDCTYELGLVVCGTAKAYVDDQLVVDNATKQVPGDAFFGSATREETGRINLVKGNTYKFK





IEFGSAPTYTLKGDTIVPGHGSLRVGGCKVIDDQAEIEKSVALAKEHDQVIICAGLNADWETEG





ADRASMKLPGVLDQLIADVAAANPNTVVVMQTGTPEEMPWLDATPAVIQAWYGGNETGNSIADV





VFGDYNPSGKLSLSFPKRLQDNPAFLNFRTEAGRTLYGEDVYVGYRYYEFADKDVNFPFGHGLS





YTTFAFSNLSVSHKDGKLSVSLSVKNTGSVPGAQVAQLYVKPLQAAKINRPVKELKGFAKVELQ





PGETKAVTIEEQEKYVAAYFDEERDQWCVEKGDYEVIVSDSSAAKDGVALRGKFTVGETYWWSG





V




Cel3b protein
MKTLSVFAAALLAAVAEANPYPPPHSNQAYSPPFYPSPWMDPSAPGWEQAYAQAKEFVSGLTLL
371



[Trichoderma
EKVNLTTGVGWMGEKCVGNVGTVPRLGMRSLCMQDGPLGLRFNTYNSAFSVGLTAAASWSRHLW





reesei]

VDRGTALGSEAKGKGVDVLLGPVAGPLGRNPNGGRNVEGFGSDPYLAGLALADTVTGIQNAGTI




AAP57755.1
ACAKHFLLNEQEHFRQVGEANGYGYPITEALSSNVDDKTIHEVYGWPFQDAVKAGVGSFMCSYN




GI:31747166
QVNNSYACQNSKLINGLLKEEYGFQGFVMSDWQAQHTGVASAVAGLDMTMPGDTAFNTGASYFG





SNLTLAVLNGTVPEWRIDDMVMRIMAPFFKVGKTVDSLIDTNFDSWTNGEYGYVQAAVNENWEK





VNYGVDVRANHANHIREVGAKGTVIFKNNGILPLKKPKFLTVIGEDAGGNPAGPNGCGDRGCDD





GTLAMEWGSGTTNFPYLVTPDAALQSQALQDGTRYESILSNYAISQTQALVSQPDAIAIVFANS





DSGEGYINVDGNEGDRKNLTLWKNGDDLIKTVAAVNPKTIVVIHSTGPVILKDYANHPNISAIL





WAGAPGQESGNSLVDILYGKQSPGRTPFTWGPSLESYGVSVMTTPNNGNGAPQDNFNEGAFIDY





RYFDKVAPGKPRSSDKAPTYEFGFGLSWSTFKFSNLHIQKNNVGPMSPPNGKTIAAPSLGSFSK





NLKDYGFPKNVRRIKEFIYPYLSTTTSGKEASGDAHYGQTAKEFLPAGALDGSPQPRSAASGEP





GGNRQLYDILYTVTATITNTGSVMDDAVPQLYLSHGGPNEPPKVLRGFDRIERIAPGQSVTFKA





DLTRRDLSNWDTKKQQWVITDYPKTVYVGSSSRDLPLSARLP




beta-D-glucoside
MRYRTAAALALATGPFARADSHSTSGASAEAVVPPAGTPWGTAYDKAKAALAKLNLQDKVGIVS
372



glucohydrolase
GVGWNGGPCVGNTSPASKISYPSLCLQDGPLGVRYSTGSTAFTPGVQAASTWDVNLIRERGQFI




[Trichoderma
GEEVKASGIHVILGPVAGPLGKTPQGGRNWEGFGVDPYLTGIAMGQTINGIQSVGVQATAKHYI





reesei]

LNEQELNRETISSNPDDRTLHELYTWPFADAVQANVASVMCSYNKVNTTWACEDQYTLQTVLKD




AAA18473.1
QLGFPGYVMTDWNAQHTTVQSANSGLDMSMPGTDFNGNNRLWGPALTNAVNSNQVPTSRVDDMV




GI:493580
TRILAAWYLTGQDQAGYPSFNISRNVQGNHKTNVRAIARDGIVLLKNDANILPLKKPASIAVVG





SAAIIGNHARNSPSCNDKGCDDGALGMGWGSGAVNYPYFVAPYDAINTRASSQGTQVTLSNTDN





TSSGASAARGKDVAIVFITADSGEGYITVEGNAGDRNNLDPWHNGNALVQAVAGANSNVIVVVH





SVGAIILEQILALPQVKAVVWAGLPSQESGNALVDVLWGDVSPSGKLVYTIAKSPNDYNTRIVS





GGSDSFSEGLFIDYKHFDDANITPRYEFGYGLSYTKFNYSRLSVLSTAKSGPATGAVVPGGPSD





LFQNVATVTVDIANSGQVTGAEVAQLYITYPSSAPRTPPKQLRGFAKLNLTPGQSGTATFNIRR





RDLSYWDTASQKWVVPSGSFGISVGASSRDIRLTSTLSVA




putative beta-
MTSFHDGVKLSTVTYGYYVPPYYPAPYGGWVEDWQESYTKAKALVDSMTLAEKTNITAGTGIYM
373



glucosidase 1
GEWRCAGNTGSAFRVSFPQLCLNDSPAGVRHADNVTAFPDGITVGATFDKALMYKRGVAIGKEN




precursor
RGKGVNVWLGPTVGPIGRKPKGGRNWEGFGADPVLQAVGARETIKGVQEQGVIATIKHFIGNEQ




[Trichoderma
EMYRMYNPFQYAYSSNIDDRTLHEVYAWPFAEGIRAGVGAVMMAYNAVNGTACSQHPYLMSALL





reesei RUT C-30]

KDEMGFQGFIMTDWLAHMSGVASAIAGLDMDMPGDVQIPFFGGSYWMYELTRSALNGSVPMDRI




ETS02983.1
NDAATRIAAAWYKMGQDKGFPATNFDTNSRAAFNPLYPAALPLSPFGITNEFVPVQDDHDVIAR




GI:572279864
QISQEAITLLKNDGDILPLSPSQHLKVFGTDAQKNPDGINSCTDRNCNKGTLGQGWGSGTVDYP





YLDDPISAITAEADNVTFYNTDKFPSVGEVSDSDVAIVFVNSDAGENTYTVEGNHGDRDKSGLY





AWHDGDKLVQDAASKFSNVIVVIHTVGPLILEKWIDLPSVKAVLVAHLPGQEAGKSLTNVLFGH





ASPCGHLPYSITKEEDDLPKSVTTLIDSEFLNQPQDTYTEGLYIDYRWLNKNKTKPRYAFGHGL





SYTNFTFKAASIKQVARLSAYPPARPAKGSTPDFAQSIPSASEAVAPSGFGKIPRYIYSWLSQG





DANRAISDGKTGKYPYPDGYSTTQKPGARAGGGEGGNPALWDVAYSLTVTVQNTGDEYAGKASV





QAYLQFPDDIDYDTPIIQLRDFEKTKELKPGETTTVTLTLTRKDVSVWDVVAQDWKVPAVDGGY





KVWIGDASDSLSIVCHTDTLECETGVVGPV




putative beta-
MVAVKQIALLAGLAHWADAAEKVITNDTHFYGQSPPVYPSPEMTGGNEWEAAYQKAKAFVGQLT
374



glucosidase
LEEKVNLTAGVPPNTTCSGVIPAIERLKFPGMCLSDAGNGLRNTDFVSGFPSGIHVGASWSKDL




[Trichoderma
AFRRAVAMGAEFRKKGVNVLLGPVVGPAGRTVRGGRNWEGFSVDPWLAGVLVSETVSGIQEQGV





reesei RUT C-30]

ITSTKHYILNEQETHRMPEANVSAVSSNIDDKTMHEYYLWPFQDAVRAGSGNIMCSYQRINNSY




ETS01786.1
GCSNSKTLNGLLKTELGFQGFVVSDWSAQHAGVASAEAGMDMAMPGPAEFWGEHLVEAVKNGSL




GI:572278616
PESRITDMATRIIATWYQFDQDNGIPKPGIGMPSNVLDSHEIVDARDPAAVPVLLNGAIEGHVL





VKNTKNTLPLKKPRKLSLFGYSATTPDFFSPSRDEQLSDSWIFGKEAYNSNYLSPDGFATFGRN





GTTFGGCGSGAITPALAISPFEALKWRAAQDGTATFNNFLSDKPDVDPTSDACIVFGNAYACEG





NDRPAIQDDYTDDLIKAVASQCNKTIVVLHNAGIRLVDGFVDHPNITAVIFAHLPGQESGPALT





SLLYGETSPSGRLPYTVAKNDTDYGVVLDPAQATGEFAYFPQADFKEGVYLDYRYFDKEGIEPR





YEFGFGLSYTTFAYLNLSVDHVSGANTYPWPGGPIVSGGQTDLWDAIATVSVDIRNTGSVASYE





VAQLYIGIPGAPAKQLRGFEKPFLRPNESQSVTFHLTRRDLSVWSVERQKWQLQQGTYKIYVGS





SSRRLHVNGTLDI




predicted protein,
HLRSHTVESPNSIVKRGTCAFPTDDPNLVAVTPDAENAGWAMSPDQPCKPGHYCPIACKPGMVM
375



partial
AQWSPDSSYSYPSSMDGGLYCDEDGEVHKPFPNKPYCVEGTGAVVAVNKCGEPMSWCQTVLPGN




[Trichoderma
EAMLIPTLVEDQATIAVPDTSYWCETAAHYYINPPGSSVADCVWGVSSKPVGNWSPYVAGANTD





reesei QM6a]

GDGNTFVKLGWNPIWQDSALKSTLPSFGVKIECPDGGCNGLPCEISPNSDGSVDSKESAVGAGN




EGR51923.1
AAFCVVTVPKGGVANIVAYNVDGSSGGSDSDSDSDSGSSSSAAPSSTAHGLKAGGFAALAEKPT




GI:340521689
STTAAPSSTEVSTTAAASTTEVASTTAAAESTTTAAESTAAETTDASATATTKAAHSTTGGKAS





STARARPSVNPGMFHENGTSPHQTTAAPSGPSATQADSAPVTTTTKKGEAGRQQGSTAFAGLIV





AFVAAACFL




glycoside
MVAVKQIALLAGLAHWADAAEKVITNDTHFYGQSPPVYPSPEMTGGNEWEAAYQKAKAFVGQLT
376



hydrolase family 3
LEEKVNLTAGVPPNTTCSGVIPAIERLKFPGMCLSDAGNGLRNTDFVSGFPSGIHVGASWSKDL




protein
AFRRAVAMGAEFRKKGVNVLLGPVVGPAGRTVRGGRNWEGFSVDPWLAGVLVSETVSGIQEQGV




[Trichoderma
ITSTKHYILNEQETHRMPEANVSAVSSNIDDKTMHEYYLWPFQDAVRAGSGNIMCSYQRINNSY





reesei QM6a]

GCSNSKTLNGLLKTELGFQGFVVSDWSAQHAGVASAEAGMDMAMPGPAEFWGEHLVEAVKNGSL




EGR50829.1
PESRITDMATRIIATWYQFDQDNGIPKPGIGMPSNVLDSHEIVDARDPAAVPVLLNGAIEGHVL




GI:340520593
VKNTKNTLPLKKPRKLSLFGYSATTPDFFSPSRDEQLSDSWIFGKEAYNSNYLSPDGFATFGRN





GTTFGGCGSGAITPALAISPFEALKWRAAQDGTATFNNFLSDKPDVDPTSDACIVFGNAYACEG





NDRPAIQDDYTDDLIKAVASQCNKTIVVLHNAGIRLVDGFVDHPNITAVIFAHLPGQESGPALT





SLLYGETSPSGRLPYTVAKNDTDYGVVLDPAQATGEFAYFPQADFKEGVYLDYRYFDKEGIEPR





YEFGFGLSYTTFAYLNLSVDHVSGANTYPWPGGPIVSGGQTDLWDAIATVSVDIRNTGSVASYE





VAQLYIGIPGAPAKQLRGFEKPFLRPNESQSVTFHLTRRDLSVWSVERQKWQLQQGTYKIYVGS





SSRRLHVNGTLDI




cell wall protein
MLACITRATLPTVVAATPSHHHHHAHRHAKKHAAARVEKRAPDVVTEVVVGATATVFELDGKIV
377



[Trichoderma
DAATAKAGLAEGEYIIVGETTPTFVPPPPPPPATSSAAPLRAQFVEEPISSPAAPTTTSAPPPP





reesei QM6a]

PTTTAQATTSSAPPPPKTSKPAQSSPSSGATGLDADFPSGKISCKTFPSEYGAVALDWLGTGGW




EGR50785.1
SGLQFVPNYSPDAQSISDIITGIAGQTCSKGAMCSYACPPGYQKTQWPKAQGATLQSIGGLYCN




GI:340520549
EDGFLELTRPDHPKLCEAGAGGVTIKNDLDDSVCTCRTDYPGIESMVIPACTSAGETIELTNPD





ETDYYVWDGKTTSAQYYVNKKGYAVEDACVWNSPLDPRGAGNWSPINIGTGKTADGITWLSIFE





NLPTSSAKLDFNIEITGDVNSKCSYIDGAWTGGDKGCTTAMPSGGKAVIRYF




glycoside
MILGCESTGVISAVKHFVANDQEHERRAVDCLITQRALREVYLRPFQIVARDARPGALMTSYNK
378



hydrolase family 3
VNGKHVADSAEFLQGILRTEWNWDPLIVSDWYGTYTTIDAIKAGLDLEMPGVSRYRGKYIESAL




protein
QARLLKQSTIDERARRVLRFAQKASHLKVSEVEQGRDFPEDRVLNRQICGSSIVLLKNENSILP




[Trichoderma
LPKSVKKVALVGSHVRLPAISGGGSASLVPYYAISLYDAVSEVLAGATITHEVGAYAHQMLPVI





reesei QM6a]

DAMISNAVIHFYNDPIDVKDRKLLGSENVSSTSFQLMDYNNIPTLNKAMFWGTLVGEFIPTATG




EGR49878.1
IWEFGLSVFGTADLYIDNELVIENTTHQTRGTAFFGKGTTEKVATRRMVAGSTYKLRLEFGSAN




GI:340519640
TTKMETTGVVNFGGGAVHLGACLKVDPQEMIARAVKAAADADYTIICTGLSGEWESEGFDRPHM





DLPPGVDTMISQVLDAAPNAVVVNQSGTPVTMSWAHKAKAIVQAWYGGNETGHGISDVLFGNVN





PSGKLSLSWPVDVKHNPAYLNYASVGGRVLYGEDVYVGYKFYDKTEREVLFPFGHGLSYATFKL





PDSTVRTVPETFHPDQPTVAIVKIKNTSSVPGAQVLQLYISAPNSPTHRPVKELHGFEKVYLEA





GEEKEVQIPIDQYATSFWDEIESMWKSERGIYDVLVGFSSQEISGKGKLIVPETRFWMGL




glycoside
MRYRTAAALALATGPFARADSHSTSGASAEAVVPPAGTPWGTAYDKAKAALAKLNLQDKVGIVS
379



hydrolase family 3
GVGWNGGPCVGNTSPASKISYPSLCLQDGPLGVRYSTGSTAFTPGVQAASTWDVNLIRERGQFI




protein
GEEVKASGIHVILGPVAGPLGKTPQGGRNWEGFGVDPYLTGIAMGQTINGIQSVGVQATAKHYI




[Trichoderma
LNEQELNRETISSNPDDRTLHELYTWPFADAVQANVASVMCSYNKVNTTWACEDQYTLQTVLKD





reesei QM6a]

QLGFPGYVMTDWNAQHTTVQSANSGLDMSMPGTDFNGNNRLWGPALTNAVNSNQVPTSRVDDMV




EGR49703.1
TRILAAWYLTGQDQAGYPSFNISRNVQGNHKTNVRAIARDGIVLLKNDANILPLKKPASIAVVG




GI:340519465
SAAIIGNHARNSPSCNDKGCDDGALGMGWGSGAVNYPYFVAPYDAINTRASSQGTQVTLSNTDN





TSSGASAARGKDVAIVFITADSGEGYITVEGNAGDRNNLDPWHNGNALVQAVAGANSNVIVVVH





SVGAIILEQILALPQVKAVVWAGLPSQESGNALVDVLWGDVSPSGKLVYTIAKSPNDYNTRIVS





GGSDSFSEGLFIDYKHFDDANITPRYEFGYGLSYTKFNYSRLSVLSTAKSGPATGAVVPGGPSD





LFQNVATVTVDIANSGQVTGAEVAQLYITYPSSAPRTPPKQLRGFAKLNLTPGQSGTATFNIRR





RDLSYWDTASQKWVVPSGSFGISVGASSRDIRLTSTLSVA




glycoside
MTLAEKTNITAGTGIYMGERCAGNTGSAFRVSFPQLCLNDSPAGVRHADNVTAFPDGITVGATF
380



hydrolase family 3
DKALMYKRGVAIGKENRGKGVNVWLGPTVGPIGRKPKGGRNWEGFGADPVLQAVGARETIKGVQ




protein
EQGVIATIKHFIGNEQEMYRMYNPFQYAYSSNIDDRTLHEVYAWPFAEGIRAGVGAVMMAYNAV




[Trichoderma
NGTACSQHPYLMSALLKDEMGFQGFIMTDWLAHMSGVASAIAGLDMDMPGDVQIPFFGGSYWMY





reesei QM6a]

ELTRSALNGSVPMDRINDAATRIAAAWYKMGQDKGFPATNFDTNSRAAFNPLYPAALPLSPFGI




814 aa protein
TNEFVPVQDDHDVIARQISQEAITLLKNDGDILPLSPSQHLKVFGTDAQKNPDGINSCTDRNCN




EGR49559.1
KGTLGQGWGSGTVDYPYLDDPISAITAEADNVTFYNTDKFPSVGEVSDSDVAIVFVNSDAGENT




GI:340519320
YTVEGNHGDRDKSGLYAWHDGDKLVQDAASKFSNVIVVIHTVGPLILEKWIDLPSVKAVLVAHL





PGQEAGKSLTNVLFGHASPCGHLPYSITKEEDDLPKSVTTLIDSEFLNQPQDTYTEGLYIDYRW





LNKNKTKPRYAFGHGLSYTNFTFKAASIKQVARLSAYPPARPAKGSTPDFAQSIPSASEAVAPS





GFGKIPRYIYSWLSQGDANRAISDGKTGKYPYPDGYSTTQKPGARAGGGEGGNPALWDVAYSLT





VTVQNTGDEYAGKASVQAYLQFPDDIDYDTPIIQLRDFEKTKELKPGETTTVTLTLTRKDVSVW





DVVAQDWKVPAVDGGYKVWIGDASDSLSIVCHTDTLECETGVVGPV




glycoside hydrolase
MKTLSVFAAALLAAVAEANPYPPPHSNQAYSPPFYPSPWMDPSAPGWEQAYAQAKEFVSGLTLL
381



family 3 protein
EKVNLTTGVGWMGEKCVGNVGTVPRLGMRSLCMQDGPLGLRFNTYNSAFSVGLTAAASWSRHLW




[Trichoderma
VDRGTALGSEAKGKGVDVLLGPVAGPLGRNPNGGRNVEGFGSDPYLAGLALADTVTGIQNAGTI





reesei QM6a]

ACAKHFLLNEQEHFRQVGEANGYGYPITEALSSNVDDKTIHEVYGWPFQDAVKAGVGSIMCSYN




EGR48517.1
QVNNSYACQNSKLINGLLKEEYGFQGFVMSDWQAQHTGVASAVAGLDMTMPGDTAFNTGASYFG




GI:340518276
SNLTLAVLNGTVPEWRIDDMVMRIMAPFFKVGKTVDSLIDTNFDSWTNGEYGYVQAAVNENWEK





VNYGVDVRANHANHIREVGAKGTVIFKNNGILPLKKPKFLTVIGEDAGGNPAGPNGCGDRGCDD





GTLAMEWGSGTTNFPYLVTPDAALQSQALQDGTRYESILSNYAISQTQALVSQPDAIAIVFANS





DSGEGYINVDGNEGDRKNLTLWKNGDDLIKTVAAVNPKTIVVIHSTGPVILKDYANHPNISAIL





WAGAPGQESGNSLVDILYGKQSPGRTPFTWGPSLESYGVSVMTTPNNGNGAPQDNFNEGAFIDY





RYFDKVAPGKPRSSDKAPTYEFGFGLSWSTFKFSNLHIQKNNVGPMSPPNGKTIAAPSLGSFSK





NLKDYGFPKNVRRIKEFIYPYLSTTTSGKEASGDAHYGQTAKEFLPAGALDGSPQPRSAASGEP





GGNRQLYDILYTVTATITNTGSVMDDAVPQLYLSHGGPNEPPKVLRGFDRIERIAPGQSVTFKA





DLTRRDLSNWDTKKQQWVITDYPKTVYVGSSSRDLPLSARLP




glycoside hydrolase
MANSIGGSSADKFDLDPLWQNLDWAIGQMMLMGWDGTQVTPQIRSLIEDHHLGSIILTAKNLKS
382



family 3 protein
AHHTALLVQELQMIAKNSGHPQPLLIAVDQENGGVNSLFDEDFVCQFPSAMAIAATGSLELSYE




[Trichoderma
VNKATATEISACGVNLMLGPVLDVLNNARYQVIGVRASGDDPQEVSQYGLAALSGIRDAGVASC





reesei QM6a]

GKHFPSYGNLDFLGSNLDVPIITQTLEELSLSALVPFRNAIASGKLDAMFIGGCGISNPSMNVS




EGR47352.1
HACLSDQVVDDLLRNELGFKGVAISECLEMEALSQDLGVQNGVVMAVEAGCDIVLLCRAYDVQL




GI:340517106
EAIKGLKLGYENGIITKERIFTSLKRIFHLKSTCTSWAKALNPPGINLLSQIRPSHLALSRRAY





DDSITIVRDKEKLLPLSLSMHPGEELLLLTPLVKPLPASSLTKSLLESKNDPSLVSTEHDRWNH





QIRERSAIMSGEGVFREFGKTLARYRNEKLLHTSYTANGVRPVHENLINRASCIIIFTADANRN





LYQAGFTKHVDMMCSMLRSRGQKKQLIVVAVSSPYDFAMDKSIGTYICTYDFTENAMAALVRAL





VGDSNPVGTMPGTLRKSKKVLKSRQHWLVEEFDSSRDRKGLNDLIRAVHRASDQDFRYLQTATA





DTFLLANQNIKETHFVVRNSSTQALYGFAATYFVQNVGILGALIVDPTKRNMSIGRSLHRRAIK





SLTQQRGIKKVQLGSCFPALFLGIPLDIEVTTTKEWFSNSGWDTQFPRRLTNMVIQDLSAWYAP





EGLSQSIQRANISFDLIYGVESGDTVMHHVRTHANPEVLELYRTALEESKACGIVRAKDAAGNL





LGTIIICRPNSPLARYVPPLVSLGQDIGGLLAPIVPPAPLSTLVLQGLALMGVRQNKGHKATKS





VLSWVVDDAYEPLVAMGFDVLQAFEEITNSPETFQT




carbohydrate
MLPRRMRKSRCCIAVLAVIAIIVMLLAAAGAFGYKKLKITPLDGKSPPWYPTPKGGSVRQWADS
383



esterase family 4
YQKAAEMVARMTLPEKVNITTGTGWSMGLAVGNTAPALLVGFPALALQDGPLGIRFADNATALP




protein
AGVTVGATWNRHLMYEHGRVHALEARGKGINALLGPCVGPLGRMPAGGRNWEGFGADPYLQGVA




[Trichoderma
GYETIKGIQDQGVMATIKHFVANEQEHFRQAWEWVLPNALSSNIDDRTMHEIYAWPFGDAVKAG





reesei QM6a]

VASVMCSYNMVNNSYACGNSKLLNGILKDELGFQGFVMSDWLAQRSGVGSALAGLDMSMPGDGL




EGR46266.1
RWQDGNSLWGPNLSRAVLNGSLPLERLNDMVVRIVAAWYQLGQDDEKLFDRKGPNFSSWTNDRM




GI:340516015
GVTAPASSSPQEKVVVNQFVNVQANHSILARQIAAEGTVLLKNEGVLPLSVDGLLGGGGGSNST





KREGQVRIGIFGEDAGPGKGPNYCEDRSCNQGTLASGWGSGAVEFPYLVSPIEALRKKFNKDKV





KLTEHLKNDELDTGVIKSQDICMVFINSDSGEGYRAWEGVRGDRNDLKPQKGGVGLVTHVGLNC





GNGSGTTIVVLHSVGPVVVDPWIDMPGIKALISANLPGQESGNALASILFGEENPSGKLPYTVG





KSLSDYGPGGQVMYLPNGAVPQQDFSEGLYIDYRHFDKFNIEPRYEFGFGLSYTNFDYKNLKIT





ETKPRSPLPDERPAAEVEPPSFDTTIPQAEEALFPSGIRRLKKYVYPYIESVKDIKEGQYSYPD





GYDKEQPLSGAGGGEGGNPSLWDSHVIVSVEVTNTGKLGGKAVPQLYLSYPASETVDFPVRVLR





GFDKVYIGKGETKTVEFSLTRRDLSYWDVERQNWVIPEGEYTFAVGESSRDLRVSGTW




glycoside hydrolase
QLFAVGFHGREINQEITTLIRDYGVGAIVLFKRNENGLVTRISPPIASQQPGPMTLGAAGSLEY
384



family 3 protein,
AYEVAKATAEMLRYFGINMNYAPVGDVNSEPLNPVIGVRSPSDQAETVSKFAAACTKGLREHKV




partial
VPCIKHFPGHGDTAVDSHYGLPVINKSRADMEKLELIPFRDAVADNIEMVMTAHISLPQLAKDG




[Trichoderma
LPATLSPDTIGILRNEWKYEGVIMTDCLEMDGIRATYGTVEGSLMAFQAGVDNVMICHTFDVQA





reesei QM6a]

AAVDYICGAIESGKLSQERVDQSLERLRKLKERYTNWDIALHAEPPEALEAINERGEKLARQVY




EGR44807.1
ADATTLVRAQEGLLPLKATAKIAFVSPGPDVPIGGAVDSGTLPTRVPWIADTFGEQIRKRAPEM




GI:340514546
SDVRFTSSNLTEEQWEQIDEADVVILATRNARESKYQKELGLEVAKRRGSRPLISIATCAPYDF





LDDEEIRTYIAVYEPTVEAFSAAVDILFGDAQPRGKLPVAH




glycoside hydrolase
MADIDVEAILKKLTLAEKVDLLAGIDFWHTKALPKHGVPSLRFTDGPNGVRGTKFFNGVPAACF
385



family 3 protein
PCGTSLGSTFNQTLLEEAGKMMGKEAIAKSAHVILGPTINMQRSPLGGRGFESIGEDPFLAGLG




[Trichoderma
AAALIRGIQSTGVQATIKHFLCNDQEDRRMMVQSIVTERALREIYALPFQIAVRDSQPGAFMTA





reesei QM6a]

YNGINGVSCSENPKYLDGMLRKEWGWDGLIMSDWYGTYSTTEAVVAGLDLEMPGPPRFRGETLK




EGR44527.1
FNVSNGKPFIHVIDQRAREVLQFVKKCAASGVTENGPETTVNNTPETAALLRKVGNEGIVLLKN




GI:340514262
ENNVLPLSKKKKTLIVGPNAKQATYHGGGSAALRAYYAVTPFDGLSKQLETPPSYTVGAYTHRF





LPILGEQCLTPDGAPGMRWRVFNEPPGTPNRQHIDELFFTKTDMHLVDYYHPKAADTWYADMEG





TYTADEDCTYELGLVVCGTAKAYVDDQLVVDNATKQVPGDAFFGSATREETGRINLVKGNTYKF





KIEFGSAPTYTLKGDTIVPGHGSLRVGGCKVIDDQAEIEKSVALAKEHDQVIICAGLNADWETE





GADRASMKLPGVLDQLIADVAAANPNTVVVMQTGTPEEMPWLDATPAVIQAWYGGNETGNSIAD





VVFGDYNPSGKLSLSFPKRLQDNPAFLNFRTEAGRTLYGEDVYVGYRYYEFADKDVNFPFGHGL





SYTTFAFSNLSVSHKDGKLSVSLSVKNTGSVPGAQVAQLYVKPLQAAKINRPVKELKGFAKVEL





QPGETKAVTIEEQEKYVAAYFDEERDQWCVEKGDYEVIVSDSSAAKDGVALRGKFTVGETYWWS





GV




glycoside
MADIDVEAILKKLTLAEKVDLLAGIDFWHTKALPKHGVPSLRFTDGPNGVRGTKFFNGVPAACF
386



hydrolase family 3
PCGTSLGSTFNQTLLEEAGKMMGKEAIAKSAHVILGPTINMQRSPLGGRGFESIGEDPFLAGLG




protein
AAALIRGIQSTGVQATIKHFLCNDQEDRRMMVQSIVTERALREIYALPFQIAVRDSQPGAFMTA




[Trichoderma
YNGINGVSCSENPKYLDGMLRKEWGWDGLIMSDWYGTYSTTEAVVAGLDLEMPGPPRFRGETLK





reesei QM6a]

FNVSNGKPFIHVIDQRAREVLQFVKKCAASGVTENGPETTVNNTPETAALLRKVGNEGIVLLKN




XP_006969529.1
ENNVLPLSKKKKTLIVGPNAKQATYHGGGSAALRAYYAVTPFDGLSKQLETPPSYTVGAYTHRF




GI:589115013
LPILGEQCLTPDGAPGMRWRVFNEPPGTPNRQHIDELFFTKTDMHLVDYYHPKAADTWYADMEG





TYTADEDCTYELGLVVCGTAKAYVDDQLVVDNATKQVPGDAFFGSATREETGRINLVKGNTYKF





KIEFGSAPTYTLKGDTIVPGHGSLRVGGCKVIDDQAEIEKSVALAKEHDQVIICAGLNADWETE





GADRASMKLPGVLDQLIADVAAANPNTVVVMQTGTPEEMPWLDATPAVIQAWYGGNETGNSIAD





VVFGDYNPSGKLSLSFPKRLQDNPAFLNFRTEAGRTLYGEDVYVGYRYYEFADKDVNFPFGHGL





SYTTFAFSNLSVSHKDGKLSVSLSVKNTGSVPGAQVAQLYVKPLQAAKINRPVKELKGFAKVEL





QPGETKAVTIEEQEKYVAAYFDEERDQWCVEKGDYEVIVSDSSAAKDGVALRGKFTVGETYWWS





GV




glycoside
QLFAVGFHGREINQEITTLIRDYGVGAIVLFKRNENGLVTRISPPIASQQPGPMTLGAAGSLEY
387



hydrolase family 3
AYEVAKATAEMLRYFGINMNYAPVGDVNSEPLNPVIGVRSPSDQAETVSKFAAACTKGLREHKV




protein, partial
VPCIKHFPGHGDTAVDSHYGLPVINKSRADMEKLELIPFRDAVADNIEMVMTAHISLPQLAKDG




[Trichoderma
LPATLSPDTIGILRNEWKYEGVIMTDCLEMDGIRATYGTVEGSLMAFQAGVDNVMICHTFDVQA





reesei QM6a]

AAVDYICGAIESGKLSQERVDQSLERLRKLKERYTNWDIALHAEPPEALEAINERGEKLARQVY




XP_006969215.1
ADATTLVRAQEGLLPLKATAKIAFVSPGPDVPIGGAVDSGTLPTRVPWIADTFGEQIRKRAPEM




GI:589114385
SDVRFTSSNLTEEQWEQIDEADVVILATRNARESKYQKELGLEVAKRRGSRPLISIATCAPYDF





LDDEEIRTYIAVYEPTVEAFSAAVDILFGDAQPRGKLPVAH




carbohydrate
MLPRRMRKSRCCIAVLAVIAIIVMLLAAAGAFGYKKLKITPLDGKSPPWYPTPKGGSVRQWADS
388



esterase family 4
YQKAAEMVARMTLPEKVNITTGTGWSMGLAVGNTAPALLVGFPALALQDGPLGIRFADNATALP




protein
AGVTVGATWNRHLMYEHGRVHALEARGKGINALLGPCVGPLGRMPAGGRNWEGFGADPYLQGVA




[Trichoderma
GYETIKGIQDQGVMATIKHFVANEQEHFRQAWEWVLPNALSSNIDDRTMHEIYAWPFGDAVKAG





reesei QM6a]

VASVMCSYNMVNNSYACGNSKLLNGILKDELGFQGFVMSDWLAQRSGVGSALAGLDMSMPGDGL




XP_006967911.1
RWQDGNSLWGPNLSRAVLNGSLPLERLNDMVVRIVAAWYQLGQDDEKLFDRKGPNFSSWTNDRM




GI:589111777
GVTAPASSSPQEKVVVNQFVNVQANHSILARQIAAEGTVLLKNEGVLPLSVDGLLGGGGGSNST





KREGQVRIGIFGEDAGPGKGPNYCEDRSCNQGTLASGWGSGAVEFPYLVSPIEALRKKFNKDKV





KLTEHLKNDELDTGVIKSQDICMVFINSDSGEGYRAWEGVRGDRNDLKPQKGGVGLVTHVGLNC





GNGSGTTIVVLHSVGPVVVDPWIDMPGIKALISANLPGQESGNALASILFGEENPSGKLPYTVG





KSLSDYGPGGQVMYLPNGAVPQQDFSEGLYIDYRHFDKFNIEPRYEFGFGLSYTNFDYKNLKIT





ETKPRSPLPDERPAAEVEPPSFDTTIPQAEEALFPSGIRRLKKYVYPYIESVKDIKEGQYSYPD





GYDKEQPLSGAGGGEGGNPSLWDSHVIVSVEVTNTGKLGGKAKGETKTVEFSLTRRDLSYWDVE





RQNWVIPEGEYTFAVGESSRDLRVSGTW




glycoside
MANSIGGSSADKFDLDPLWQNLDWAIGQMMLMGWDGTQVTPQIRSLIEDHHLGSIILTAKNLKS
389



hydrolase family 3
AHHTALLVQELQMIAKNSGHPQPLLIAVDQENGGVNSLFDEDFVCQFPSAMAIAATGSLELSYE




protein
VNKATATEISACGVNLMLGPVLDVLNNARYQVIGVRASGDDPQEVSQYGLAALSGIRDAGVASC




[Trichoderma
GKHFPSYGNLDFLGSNLDVPIITQTLEELSLSALVPFRNAIASGKLDAMFIGGCGISNPSMNVS





reesei QM6a]

HACLSDQVVDDLLRNELGFKGVAISECLEMEALSQDLGVQNGVVMAVEAGCDIVLLCRAYDVQL




XP_006966911.1
EAIKGLKLGYENGIITKERIFTSLKRIFHLKSTCTSWAKALNPPGINLLSQIRPSHLALSRRAY




GI:589109777
DDSITIVRDKEKLLPLSLSMHPGEELLLLTPLVKPLPASSLTKSLLESKNDPSLVSTEHDRWNH





QIRERSAIMSGEGVFREFGKTLARYRNEKLLHTSYTANGVRPVHENLINRASCIIIFTADANRN





LYQAGFTKHVDMMCSMLRSRGQKKQLIVVAVSSPYDFAMDKSIGTYICTYDFTENAMAALVRAL





VGDSNPVGTMPGTLRKSKKVLKSRQHWLVEEFDSSRDRKGLNDLIRAVHRASDQDFRYLQTATA





DTFLLANQNIKETHFVVRNSSTQALYGFAATYFVQNVGILGALIVDPTKRNMSIGRSLHRRAIK





SLTQQRGIKKVQLGSCFPALFLGIPLDIEVTTTKEWFSNSGWDTQFPRRLTNMVIQDLSAWYAP





EGLSQSIQRANISFDLIYGVESGDTVMHHVRTHANPEVLELYRTALEESKACGIVRAKDAAGNL





LGTIIICRPNSPLARYVPPLVSLGQDIGGLLAPIVPPAPLSTLVLQGLALMGVRQNKGHKATKS





VLSWVVDDAYEPLVAMGFDVLQAFEEITNSPETFQT




glycoside
MKTLSVFAAALLAAVAEANPYPPPHSNQAYSPPFYPSPWMDPSAPGWEQAYAQAKEFVSGLTLL
390



hydrolase family 3
EKVNLTTGVGWMGEKCVGNVGTVPRLGMRSLCMQDGPLGLRFNTYNSAFSVGLTAAASWSRHLW




protein
VDRGTALGSEAKGKGVDVLLGPVAGPLGRNPNGGRNVEGFGSDPYLAGLALADTVTGIQNAGTI




[Trichoderma
ACAKHFLLNEQEHFRQVGEANGYGYPITEALSSNVDDKTIHEVYGWPFQDAVKAGVGSIMCSYN





reesei QM6a]

QVNNSYACQNSKLINGLLKEEYGFQGFVMSDWQAQHTGVASAVAGLDMTMPGDTAFNTGASYFG




XP_006965281.1
SNLTLAVLNGTVPEWRIDDMVMRIMAPFFKVGKTVDSLIDTNFDSWTNGEYGYVQAAVNENWEK




GI:589106517
VNYGVDVRANHANHIREVGAKGTVIFKNNGILPLKKPKFLTVIGEDAGGNPAGPNGCGDRGCDD





GTLAMEWGSGTTNFPYLVTPDAALQSQALQDGTRYESILSNYAISQTQALVSQPDAIAIVFANS





DSGEGYINVDGNEGDRKNLTLWKNGDDLIKTVAAVNPKTIVVIHSTGPVILKDYANHPNISAIL





WAGAPGQESGNSLVDILYGKQSPGRTPFTWGPSLESYGVSVMTTPNNGNGAPQDNFNEGAFIDY





RYFDKVAPGKPRSSDKAPTYEFGFGLSWSTFKFSNLHIQKNNVGPMSPPNGKTIAAPSLGSFSK





NLKDYGFPKNVRRIKEFIYPYLSTTTSGKEASGDAHYGQTAKEFLPAGALDGSPQPRSAASGEP





GGNRQLYDILYTVTATITNTGSVMDDAVPQLYLSHGGPNEPPKVLRGFDRIERIAPGQSVTFKA





DLTRRDLSNWDTKKQQWVITDYPKTVYVGSSSRDLPLSARLP




glycoside
MTLAEKTNITAGTGIYMGERCAGNTGSAFRVSFPQLCLNDSPAGVRHADNVTAFPDGITVGATF
391



hydrolase family 3
DKALMYKRGVAIGKENRGKGVNVWLGPTVGPIGRKPKGGRNWEGFGADPVLQAVGARETIKGVQ




protein
EQGVIATIKHFIGNEQEMYRMYNPFQYAYSSNIDDRTLHEVYAWPFAEGIRAGVGAVMMAYNAV




[Trichoderma
NGTACSQHPYLMSALLKDEMGFQGFIMTDWLAHMSGVASAIAGLDMDMPGDVQIPFFGGSYWMY





reesei QM6a]

ELTRSALNGSVPMDRINDAATRIAAAWYKMGQDKGFPATNFDTNSRAAFNPLYPAALPLSPFGI




XP_006964430.1
TNEFVPVQDDHDVIARQISQEAITLLKNDGDILPLSPSQHLKVFGTDAQKNPDGINSCTDRNCN




GI:589104815
KGTLGQGWGSGTVDYPYLDDPISAITAEADNVTFYNTDKFPSVGEVSDSDVAIVFVNSDAGENT





YTVEGNHGDRDKSGLYAWHDGDKLVQDAASKFSNVIVVIHTVGPLILEKWIDLPSVKAVLVAHL





PGQEAGKSLTNVLFGHASPCGHLPYSITKEEDDLPKSVTTLIDSEFLNQPQDTYTEGLYIDYRW





LNKNKTKPRYAFGHGLSYTNFTFKAASIKQVARLSAYPPARPAKGSTPDFAQSIPSASEAVAPS





GFGKIPRYIYSWLSQGDANRAISDGKTGKYPYPDGYSTTQKPGARAGGGEGGNPALWDVAYSLT





VTVQNTGDEYAGKASVQAYLQFPDDIDYDTPIIQLRDFEKTKELKPGETTTVTLTLTRKDVSVW





DVVAQDWKVPAVDGGYKVWIGDASDSLSIVCHTDTLECETGVVGPV




glycoside
MRYRTAAALALATGPFARADSHSTSGASAEAVVPPAGTPWGTAYDKAKAALAKLNLQDKVGIVS
392



hydrolase family 3
GVGWNGGPCVGNTSPASKISYPSLCLQDGPLGVRYSTGSTAFTPGVQAASTWDVNLIRERGQFI




protein
GEEVKASGIHVILGPVAGPLGKTPQGGRNWEGFGVDPYLTGIAMGQTINGIQSVGVQATAKHYI




[Trichoderma
LNEQELNRETISSNPDDRTLHELYTWPFADAVQANVASVMCSYNKVNTTWACEDQYTLQTVLKD





reesei QM6a]

QLGFPGYVMTDWNAQHTTVQSANSGLDMSMPGTDFNGNNRLWGPALTNAVNSNQVPTSRVDDMV




XP_006964076.1
TRILAAWYLTGQDQAGYPSFNISRNVQGNHKTNVRAIARDGIVLLKNDANILPLKKPASIAVVG




GI:589104107
SAAIIGNHARNSPSCNDKGCDDGALGMGWGSGAVNYPYFVAPYDAINTRASSQGTQVTLSNTDN





TSSGASAARGKDVAIVFITADSGEGYITVEGNAGDRNNLDPWHNGNALVQAVAGANSNVIVVVH





SVGAIILEQILALPQVKAVVWAGLPSQESGNALVDVLWGDVSPSGKLVYTIAKSPNDYNTRIVS





GGSDSFSEGLFIDYKHFDDANITPRYEFGYGLSYTKFNYSRLSVLSTAKSGPATGAVVPGGPSD





LFQNVATVTVDIANSGQVTGAEVAQLYITYPSSAPRTPPKQLRGFAKLNLTPGQSGTATFNIRR





RDLSYWDTASQKWVVPSGSFGISVGASSRDIRLTSTLSVA




glycoside
MILGCESTGVISAVKHFVANDQEHERRAVDCLITQRALREVYLRPFQIVARDARPGALMTSYNK
393



hydrolase family 3
VNGKHVADSAEFLQGILRTEWNWDPLIVSDWYGTYTTIDAIKAGLDLEMPGVSRYRGKYIESAL




protein
QARLLKQSTIDERARRVLRFAQKASHLKVSEVEQGRDFPEDRVLNRQICGSSIVLLKNENSILP




[Trichoderma
LPKSVKKVALVGSHVRLPAISGGGSASLVPYYAISLYDAVSEVLAGATITHEVGAYAHQMLPVI





reesei QM6a]

DAMISNAVIHFYNDPIDVKDRKLLGSENVSSTSFQLMDYNNIPTLNKAMFWGTLVGEFIPTATG




XP_006964050.1
IWEFGLSVFGTADLYIDNELVIENTTHQTRGTAFFGKGTTEKVATRRMVAGSTYKLRLEFGSAN




GI:589104055
TTKMETTGVVNFGGGAVHLGACLKVDPQEMIARAVKAAADADYTIICTGLSGEWESEGFDRPHM





DLPPGVDTMISQVLDAAPNAVVVNQSGTPVTMSWAHKAKAIVQAWYGGNETGHGISDVLFGNVN





PSGKLSLSWPVDVKHNPAYLNYASVGGRVLYGEDVYVGYKFYDKTEREVLFPFGHGLSYATFKL





PDSTVRTVPETFHPDQPTVAIVKIKNTSSVPGAQVLQLYISAPNSPTHRPVKELHGFEKVYLEA





GEEKEVQIPIDQYATSFWDEIESMWKSERGIYDVLVGFSSQEISGKGKLIVPETRFWMGL




glycoside
MVAVKQIALLAGLAHWADAAEKVITNDTHFYGQSPPVYPSPEMTGGNEWEAAYQKAKAFVGQLT
394



hydrolase family 3
LEEKVNLTAGVPPNTTCSGVIPAIERLKFPGMCLSDAGNGLRNTDFVSGFPSGIHVGASWSKDL




protein
AFRRAVAMGAEFRKKGVNVLLGPVVGPAGRTVRGGRNWEGFSVDPWLAGVLVSETVSGIQEQGV




[Trichoderma
ITSTKHYILNEQETHRMPEANVSAVSSNIDDKTMHEYYLWPFQDAVRAGSGNIMCSYQRINNSY





reesei QM6a]

GCSNSKTLNGLLKTELGFQGFVVSDWSAQHAGVASAEAGMDMAMPGPAEFWGEHLVEAVKNGSL




XP_006963375.1
PESRITDMATRIIATWYQFDQDNGIPKPGIGMPSNVLDSHEIVDARDPAAVPVLLNGAIEGHVL




GI:589102705
VKNTKNTLPLKKPRKLSLFGYSATTPDFFSPSRDEQLSDSWIFGKEAYNSNYLSPDGFATFGRN





GTTFGGCGSGAITPALAISPFEALKWRAAQDGTATFNNFLSDKPDVDPTSDACIVFGNAYACEG





NDRPAIQDDYTDDLIKAVASQCNKTIVVLHNAGIRLVDGFVDHPNITAVIFAHLPGQESGPALT





SLLYGETSPSGRLPYTVAKNDTDYGVVLDPAQATGEFAYFPQADFKEGVYLDYRYFDKEGIEPR





YEFGFGLSYTTFAYLNLSVDHVSGANTYPWPGGPIVSGGQTDLWDAIATVSVDIRNTGSVASYE





VAQLYIGIPGAPAKQLRGFEKPFLRPNESQSVTFHLTRRDLSVWSVERQKWQLQQGTYKIYVGS





SSRRLHVNGTLDI




cell wall protein
MLACITRATLPTVVAATPSHHHHHAHRHAKKHAAARVEKRAPDVVTEVVVGATATVFELDGKIV
395



[Trichoderma
DAATAKAGLAEGEYIIVGETTPTFVPPPPPPPATSSAAPLRAQFVEEPISSPAAPTTTSAPPPP




reesei QM6a]
PTTTAQATTSSAPPPPKTSKPAQSSPSSGATGLDADFPSGKISCKTFPSEYGAVALDWLGTGGW




XP_006963339.1
SGLQFVPNYSPDAQSISDIITGIAGQTCSKGAMCSYACPPGYQKTQWPKAQGATLQSIGGLYCN




GI:589102633
EDGFLELTRPDHPKLCEAGAGGVTIKNDLDDSVCTCRTDYPGIESMVIPACTSAGETIELTNPD





ETDYYVWDGKTTSAQYYVNKKGYAVEDACVWNSPLDPRGAGNWSPINIGTGKTADGITWLSIFE





NLPTSSAKLDFNIEITGDVNSKCSYIDGAWTGGDKGCTTAMPSGGKAVIRYF




predicted protein
HLRSHTVESPNSIVKRGTCAFPTDDPNLVAVTPDAENAGWAMSPDQPCKPGHYCPIACKPGMVM
396



partial
AQWSPDSSYSYPSSMDGGLYCDEDGEVHKPFPNKPYCVEGTGAVVAVNKCGEPMSWCQTVLPGN




[Trichoderma
EAMLIPTLVEDQATIAVPDTSYWCETAAHYYINPPGSSVADCVWGVSSKPVGNWSPYVAGANTD





reesei QM6a]

GDGNTFVKLGWNPIWQDSALKSTLPSFGVKIECPDGGCNGLPCEISPNSDGSVDSKESAVGAGN




XP_006962014.1
AAFCVVTVPKGGVANIVAYNVDGSSGGSDSDSDSDSGSSSSAAPSSTAHGLKAGGFAALAEKPT




GI:589099983
STTAAPSSTEVSTTAAASTTEVASTTAAAESTTTAAESTAAETTDASATATTKAAHSTTGGKAS





STARARPSVNPGMFHENGTSPHQTTAAPSGPSATQADSAPVTTTTKKGEAGRQQGSTAFAGLIV





FVAAACFL




hypothetical
MKVTDVQAALASAVVLLSLPAGSVASSHKRFHQLPNKKHTHLRSHTVESPNSIVKRGTCAFPTD
397



protein
DPNLVAVTPDAENAGWAMSPDQPCKPGHYCPIACKPGMVMAQWSPDSSYSYPSSMDGGLYCDED




M419DRAFT_70331
GEVHKPFPNKPYCVEGTGAVVAVNKCGEPMSWCQTVLPGNEAMLIPTLVEDQATIAVPDTSYWC




[Trichoderma
ETAAHYYINPPGSSVADCVWGVSSKPVGNWSPYVAGANTDGDGNTFVKLGWNPIWQDSALKSTL





reesei RUT C-30]

PSFGVKIECPDGGCNGLPCEISPNSDGSVDSKESAVGAGNAAFCVVTVPKGGVANIVAYNKPTS




ETS05514.1
TTAAPSSTEVSTTAAASTTEVASTTAAAESTTTAAESTAAETTDASATATTKAAHSTTGGKASS




GI:572282500
TARARPSVNPGMFHENGTSPHQTTAAPSGPSATQADSAPVTTTTKKGEAGRQQGSTAFAGLIVA





FVAAACFL




beta-D-glucoside
MRYRTAAALALATGPFARADSHSTSGASAEAVVPPAGTPWGTAYDKAKAALAKLNLQDKVGIVS
398



glucohydrolase I
GVGWNGGPCVGNTSPASKISYPSLCLQDGPLGVRYSTGSTAFTPGVQAASTWDVNLIRERGQFI




[Trichoderma
GEEVKASGIHVILGPVAGPLGKTPQGGRNWEGFGVDPYLTGIAMGQTINGIQSVGVQATAKHYI





reesei RUT C-30]

LNEQELNRETISSNPDDRTLHELYTWPFADAVQANVASVMCSYNKVNTTWACEDQYTLQTVLKD




ETS03194.1
QLGFPGYVMTDWNAQHTTVQSANSGLDMSMPGTDFNGNNRLWGPALTNAVNSNQVPTSRVDDMV




GI:572280097
TRILAAWYLTGQDQAGYPSFNISRNVQGNHKTNVRAIARDGIVLLKNDANILPLKKPASIAVVG





SAAIIGNHARNSPSCNDKGCDDGALGMGWGSGAVNYPYFVAPYDAINTRASSQGTQVTLSNTDN





TSSGASAARGKDVAIVFITADSGEGYITVEGNAGDRNNLDPWHNGNALVQAVAGANSNVIVVVH





SVGAIILEQILALPQVKAVVWAGLPSQESGNALVDVLWGDVSPSGKLVYTIAKSPNDYNTRIVS





GGSDSFSEGLFIDYKHFDDANITPRYEFGYGLSYTKFNYSRLSVLSTAKSGPATGAVVPGGPSD





LFQNVATVTVDIANSGQVTGAEVAQLYITYPSSAPRTPPKQLRGFAKLNLTPGQSGTATFNIRR





RDLSYWDTASQKWVVPSGSFGISVGASSRDIRLTSTLSVA




hypothetical protein
MILGCESTGVISAVKHFVANDQEHERRAVDCLITQRALREVYLRPFQIVARDARPGALMTSYNK
399



M419DRAFT_122639
VNGKHVADSAEFLQGILRTEWNWDPLIVSDWYGTYTTIDAIKAGLDLEMPGVSRYRGKYIESAL




[Trichoderma
QARLLKQSTIDERARRVLRFAQKASHLKVSEVEQGRDFPEDRVLNRQICGSSIVLLKNENSILP





reesei RUT C-30]

LPKSVKKVALVGSHVRLPAISGGGSASLVPYYAISLYDAVSEVLAGATITHEVGAYAHQMLPVI




ETS03170.1
DAMISNAVIHFYNDPIDVKDRKLLGSENVSSTSFQLMDYNNIPTLNKAMFWGTLVGEFIPTATG




GI:572280073
IWEFGLSVFGTADLYIDNELVIENTTHQTRGTAFFGKGTTEKVATRRMVAGSTYKLRLEFGSAN





TTKMETTGVVNFGGGAVHLGACLKVDPQEMIARAVKAAADADYTIICTGLSGEWESEGFDRPHM





DLPPGVDTMISQVLDAAPNAVVVNQSGTPVTMSWAHKAKAIVQAWYGGNETGHGISDVLFGNVN





PSGKLSLSWPVDVKHNPAYLNYASVGGRVLYGEDVYVGYKFYDKTEREVLFPFGHGLSYATFKL





PDSTVRTVPETFHPDQPTVAIVKIKNTSSVPGAQVLQLYISAPNSPTHRPVKELHGFEKVYLEA




SUN-domain-
GEEKEVQIPIDQYATSFWDEIESMWKSERGIYDVLVGFSSQEISGKGKLIVPETRFWMGL
400



containing protein
MLACITRATLPTVVAATPSHHHHHAHRHAKKHAAARVEKRAPDVVTEVVVGATATVFELDGKIV




[Trichoderma
DAATAKAGLAEGEYIIVGETTPTFVPPPPPPPATSSAAPLRAQFVEEPISSPAAPTTTSAPPPP





reesei RUT C-30]

PTTTAQATTSSAPPPPKTSKPAQSSPSSGATGLDADFPSGKISCKTFPSEYGAVALDWLGTGGW




ETS01671.1
SGLQFVPNYSPDAQSISDIITGIAGQTCSKGAMCSYACPPGYQKTQWPKAQGATLQSIGGLYCN




GI:572278501
EDGFLELTRPDHPKLCEAGAGGVTIKNDLDDSVCTCRTDYPGIESMVIPACTSAGETIELTNPD





ETDYYVWDGKTTSAQYYVNKKGYAVEDACVWNSPLDPRGAGNWSPINIGTGKTADGITWLSIFE





NLPTSSAKLDFNIEITGDVNSKCSYIDGAWTGGDKGCTTAMPSGGKAVIRYF




hypothetical protein
MKTLSVFAAALLAAVAEANPYPPPHSNQAYSPPFYPSPWMDPSAPGWEQAYAQAKEFVSGLTLL




M419DRAFT_25095
EKVNLTTGVGWMGEKCVGNVGTVPRLGMRSLCMQDGPLGLRFNTYNSAFSVGLTAAASWSRHLW
401



[Trichoderma
VDRGTALGSEAKGKGVDVLLGPVAGPLGRNPNGGRNVEGFGSDPYLAGLALADTVTGIQNAGTI





reesei RUT C-30]

ACAKHFLLNEQEHFRQVGEANGYGYPITEALSSNVDDKTIHEVYGWPFQDAVKAGVGSIMCSYN




ET01349.1
QVNNSYACQNSKLINGLLKEEYGFQGFVMSDWQAQHTGVASAVAGLDMTMPGDTAFNTGASYFG




GI:572278157
SNLTLAVLNGTVPEWRIDDMVMRIMAPFFKVGKTVDSLIDTNFDSWTNGEYGYVQAAVNENWEK





VNYGVDVRANHANHIREVGAKGTVIFKNNGILPLKKPKFLTVIGEDAGGNPAGPNGCGDRGCDD





GTLAMEWGSGTTNFPYLVTPDAALQSQALQDGTRYESILSNYAISQTQALVSQPDAIAIVFANS





DSGEGYINVDGNEGDRKNLTLWKNGDDLIKTVAAVNPKTIVVIHSTGPVILKDYANHPNISAIL





WAGAPGQESGNSLVDILYGKQSPGRTPFTWGPSLESYGVSVMTTPNNGNGAPQDNFNEGAFIDY





RYFDKVAPGKPRSSDKAPTYEFGFGLSWSTFKFSNLHIQKNNVGPMSPPNGKTIAAPSLGSFSK





NLKDYGFPKNVRRIKEFIYPYLSTTTSGKEASGDAHYGQTAKEFLPAGALDGSPQPRSAASGEP





GGNRQLYDILYTVTATITNTGSVMDDAVPQLYLSHGGPNEPPKVLRGFDRIERIAPGQSVTFKA





DLTRRDLSNWDTKKQQWVITDYPKTVYVGSSSRDLPLSARLP




beta-N-
MANSIGGSSADKFDLDPLWQNLDWAIGQMMLMGWDGTQVTPQIRSLIEDHHLGSIILTAKNLKS
402



acetylglucosaminidase
AHHTALLVQELQMIAKNSGHPQPLLIAVDQENGGVNSLFDEDFVCQFPSAMAIAATGSLELSYE




[Trichoderma
VNKATATEISACGVNLMLGPVLDVLNNARYQVIGVRASGDDPQEVSQYGLAALSGIRDAGVASC





reesei RUT C-30]

GKHFPSYGNLDFLGSNLDVPIITQTLEELSLSALVPFRNAIASGKLDAMFIGGCGISNPSMNVS




ET00749.1
HACLSDQVVDDLLRNELGFKGVAISECLEMEALSQDLGVQNGVVMAVEAGCDIVLLCRAYDVQL




GI:572277491
EAIKGLKLGYENGIITKERIFTSLKRIFHLKSTCTSWAKALNPPGINLLSQIRPSHLALSRRAY





DDSITIVRDKEKLLPLSLSMHPGEELLLLTPLVKPLPASSLTKSLLESKNDPSLVSTEHDRWNH





QIRERSAIMSGEGVFREFGKTLARYRNEKLLHTSYTANGVRPVHENLINRASCIIIFTADANRN





LYQAGFTKHVDMMCSMLRSRGQKKQLIVVAVSSPYDFAMDKSIGTYICTYDFTENAMAALVRAL





VGDSNPVGTMPGTLRKSKKVLKSRQHWLVEEFDSSRDRKGLNDLIRAVHRASDQDFRYLQTATA





DTFLLANQNIKETHFVVRNSSTQALYGFAATYFVQNVGILGALIVDPTKRNMSIGRSLHRRAIK





SLTQQRGIKKVQLGSCFPALFLGIPLDIEVTTTKEWFSNSGWDTQFPRRLTNMVIQDLSAWYAP





EGLSQSIQRANISFDLIYGVESGDTVMHHVRTHANPEVLELYRTALEESKACGIVRAKDAAGNL





LGTIIICRPNSPLARYVPPLVSLGQDIGGLLAPIVPPAPLSTLVLQGLALMGVRQNKGHKATKS





VLSWVVDDAYEPLVAMGFDVLQAFEEITNSPETFQT




hypothetical
MLPRRMRKSRCCIAVLAVIAIIVMLLAAAGAFGYKKLKITPLDGKSPPWYPTPKGGSVRQWADS
403



protein
YQKAAEMVARMTLPEKVNITTGTGWSMGLAVGNTAPALLVGFPALALQDGPLGIRFADNATALP




M419DRAFT_86704
AGVTVGATWNRHLMYEHGRVHALEARGKGINALLGPCVGPLGRMPAGGRNWEGFGADPYLQGVA




[Trichoderma
GYETIKGIQDQGVMATIKHFVANEQEHFRQAWEWVLPNALSSNIDDRTMHEIYAWPFGDAVKAG





reesei RUT C-30]

VASVMCSYNMVNNSYACGNSKLLNGILKDELGFQGFVMSDWLAQRSGVGSALAGLDMSMPGDGL




ETR99336.1
RWQDGNSLWGPNLSRAVLNGSLPLERLNDMVVRIVAAWYQLGQDDEKLFDRKGPNFSSWTNDRM




GI:572275968
GVTAPASSSPQEKVVVNQFVNVQANHSILARQIAAEGTVLLKNEGVLPLSVDGLLGGGGGSNST





KREGQVRIGIFGEDAGPGKGPNYCEDRSCNQGTLASGWGSGAVEFPYLVSPIEALRKKFNKDKV





KLTEHLKNDELDTGVIKSQDICMVFINSDSGEGYRAWEGVRGDRNDLKPQKGGVGLVTHVGLNC





GNGSGTTIVVLHSVGPVVVDPWIDMPGIKALISANLPGQESGNALASILFGEENPSGKLPYTVG





KSLSDYGPGGQVMYLPNGAVPQQDFSEGLYIDYRHFDKFNIEPRYEFGFGLSYTNFDYKNLKIT





ETKPRSPLPDERPAAEVEPPSFDTTIPQAEEALFPSGIRRLKKYVYPYIESVKDIKEGQYSYPD





GYDKEQPLSGAGGGEGGNPSLWDSHVIVSVEVTNTGKLGGKAVPQLYLSYPASETVDFPVRVLR





GFDKVYIGKGETKTVEFSLTRRDLSYWDVERQNWVIPEGEYTFAVGESSRDLRVSGTW





MPSPEQRRKIGQLFAVGFHGREINQEITTLIRDYGVGAIVLFKRNVLDAAQLQALCLGLQKIAR
404



putative beta-N-
DAGHNQPLFIGIDQENGLVTRISPPIASQQPGPMTLGAAGSLEYAYEVAKATAEMLRYFGINMN




acetylglucosaminidase
YAPVGDVNSEPLNPVIGVRSPSDQAETVSKFAAACTKGLREHKVVPCIKHFPGHGDTAVDSHYG




[Trichoderma
LPVINKSRADMEKLELIPFRDAVADNIEMVMTAHISLPQLAKDGLPATLSPDTIGILRNEWKYE





reesei RUT C-30]

GVIMTDCLEMDGIRATYGTVEGSLMAFQAGVDNVMICHTFDVQAAAVDYICGAIESGKLSQERV




ETR97676.1
DQSLERLRKLKERYTNWDIALHAEPPEALEAINERGEKLARQVYADATTLVRAQEGLLPLKATA




GI:572274120
KIAFVSPGPDVPIGGAVDSGTLPTRVPWIADTFGEQIRKRAPEMSDVRFTSSNLTEEQWEQIDE





ADVVILATRNARESKYQKELGLEVAKRRGSRPLISIATCAPYDFLDDEEIRTYIAVYEPTVEAF





SAAVDILFGDAQPRGKLPVAH




UDP-
MGHKHIAIFNIPAHGHINPTLALTASLVKRGYRVTYPVTDEFVKAVEETGAEPLNYRSTLNIDP
405
Pandey et


glycotransferase
QQIRELMKNKKDMSQAPLMFIKEMEEVLPQLEALYENDKPDLILFDFMAMAGKLLAEKFGIEAV

al., 2014


(338)
RLCSTYAQNEHFTFRSISEEFKIELTPEQEDALKNSNLPSFNFEDMFEPAKLNIVFMPRAFQPY





GETFDERFSFVGPSLAKRKFQEKETPIISDSGRPVMLISLGTAFNAWPEFYHMCIEAFRDTKWQ





VIMAVGTTIDPESFDDIPENFSIHQRVPQLEILKKAELFITHGGMNSTMEGLNAGVPLVAVPQM





PEQEITARRVEELGLGKHLQPEDTTAASLREAVSQTDGDPHVLKRIQDMQKHIKQAGGAEKAAD





EIEAFLAPAGVK.




UDP-
ATGGGACATAAACATATCGCGATTTTTAATATTCCGGCTCACGGCCATATTAATCCAACGCTAG
406
Pandey et


glycotransferase
CTTTAACGGCAAGCCTTGTCAAACGCGGTTATCGGGTAACATATCCGGTGACGGATGAGTTTGT

al., 2014


338 (gDNA, native)
GAAGGCTGTTGAGGAAACTGGGGCAGAGCCGCTCAACTACCGCTCAACTTTAAATATCGATCCG





CAGCAAATTCGGGAGCTGATGAAAAATAAAAAAGATATGTCGCAGGCTCCGCTGATGTTTATCA





AAGAAATGGAGGAGGTTCTTCCTCAGCTTGAAGCGCTCTATGAGAATGACAAGCCAGACCTTAT





CCTTTTTGACTTTATGGCCATGGCGGGAAAACTGCTGGCTGAGAAGTTTGGAATAGAGGCGGTC





CGCCTTTGTTCTACATATGCACAGAACGAACATTTTACATTCAGATCCATTTCTGAAGAGTTTA





AGATCGAGCTGACGCCTGAGCAAGAGGATGCTTTGAAAAATTCGAATCTTCCGTCATTTAACTT





TGAGGATATGTTCGAGCCTGCAAAATTGAACATTGTCTTTATGCCTCGTGCTTTTCAGCCTTAC





GGCGAAACGTTTGATGAGCGGTTCTCTTTTGTTGGTCCTTCTCTTGCCAAACGCAAGTTTCAGG





AAAAAGAAACGCCGATTATTTCGGACAGCGGCCGTCCTGTCATGCTGATATCTTTAGGGACGGC





GTTCAATGCCTGGCCGGAATTTTATCATATGTGCATAGAAGCATTCAGGGACACGAAGTGGCAG





GTTATCATGGCTGTTGGCACGACAATCGATCCTGAAAGCTTTGATGACATACCTGAGAACTTTT





CGATTCATCAGCGCGTTCCTCAGCTGGAGATCCTGAAGAAAGCGGAGCTGTTCATCACCCATGG





GGGTATGAACAGTACGATGGAAGGGTTGAATGCCGGTGTACCGCTCGTTGCCGTTCCGCAAATG





CCTGAACAGGAAATCACTGCCCGCCGCGTCGAAGAGCTTGGGCTTGGCAAGCATTTGCAGCCGG





AAGACACAACAGCAGCTTCACTGCGGGAAGCCGTCTCTCAGACGGATGGTGACCCGCATGTCCT





GAAACGGATACAGGACATGCAAAAGCACATTAAACAAGCCGGAGGGGCCGAGAAAGCCGCAGAT





GAAATTGAGGCATTTTTAGCACCCGCAGGAGTAAAATAA




301 UGT98
MDAQRGHTTTILMFPWLGYGHLSAFLELAKSLSRRNFHIYFCSTSVNLDAIKPKLPSSSSSDSI
407




QLVELCLPSSPDQLPPHLHTTNALPPHLMPTLHQAFSMAAQHFAAILHTLAPHLLIYDSFQPWA





PQLASSLNIPAINFNTTGASVLTRMLHATHYPSSKFPISEFVLHDYWKAMYSAAGGAVTKKDHK





IGETLANCLHASCSVILINSFRELEEKYMDYLSVLLNKKVVPVGPLVYEPNQDGEDEGYSSIKN





WLDKKEPSSTVFVSFGSEYFPSKEEMEEIAHGLEASEVHFIWVVRFPQGDNTSAIEDALPKGFL





ERVGERGMVVKGWAPQAKILKHWSTGGFVSHCGWNSVMESMMFGVPIIGVPMHLDQPFNAGLAE





EAGVGVEAKRDSDGKIQREEVAKSIKEVVIEKTREDVRKKAREMGEILRSKGDEKIDELVAEIS





LLRKKAPCSIAAALEHHHHHH




301 UGT98 (gDNA,
CTCGAATTCATGGATGCCCAGCGAGGTCACACCACAACCATTTTGATGTTTCCATGGCTCGGCT
408



native)
ATGGCCATCTTTCGGCTTTCCTAGAGTTGGCCAAAAGCCTCTCAAGGAGGAACTTCCATATCTA





CTTCTGTTCAACCTCTGTTAACCTCGACGCCATTAAACCAAAGCTTCCTTCTTCTTCCTCTTCT





GATTCCATCCAACTTGTGGAACTTTGTCTTCCATCTTCTCCTGATCAGCTCCCTCCTCATCTTC





ACACAACCAACGCCCTCCCCCCTCACCTCATGCCCACTCTCCACCAAGCCTTCTCCATGGCTGC





CCAACACTTTGCTGCCATTTTACACACACTTGCTCCGCATCTCCTCATTTACGACTCTTTCCAA





CCTTGGGCTCCTCAACTAGCTTCATCCCTCAACATTCCAGCCATCAACTTCAATACTACGGGAG





CTTCAGTCCTGACCCGAATGCTTCACGCTACTCACTACCCAAGTTCTAAATTCCCAATTTCAGA





GTTTGTTCTCCACGATTATTGGAAAGCCATGTACAGCGCCGCCGGTGGGGCTGTTACAAAAAAA





GACCACAAAATTGGAGAAACACTTGCGAATTGCTTGCATGCTTCTTGTAGTGTAATTCTAATCA





ATAGTTTCAGAGAGCTCGAGGAGAAATATATGGATTATCTCTCCGTTCTCTTGAACAAGAAAGT





TGTTCCGGTTGGTCCTTTGGTTTACGAACCGAATCAAGACGGGGAAGATGAAGGTTATTCAAGC





ATCAAAAATTGGCTTGACAAAAAGGAACCGTCCTCCACCGTCTTCGTTTCATTTGGAAGCGAAT





ACTTCCCGTCAAAGGAAGAAATGGAAGAGATAGCCCATGGGTTAGAGGCGAGCGAGGTTCATTT





CATCTGGGTCGTTAGGTTTCCTCAAGGAGACAACACCAGCGCCATTGAAGATGCCTTGCCGAAG





GGGTTTCTGGAGAGGGTGGGAGAGAGAGGGATGGTGGTGAAGGGTTGGGCTCCCCAGGCGAAGA





TACTGAAGCATTGGAGCACAGGGGGATTCGTGAGCCACTGTGGATGGAACTCGGTGATGGAAAG





CATGATGTTTGGCGTTCCCATAATAGGGGTTCCGATGCATCTGGACCAGCCCTTTAACGCCGGA





CTCGCGGAAGAAGCTGGCGTCGGCGTGGAAGCCAAGCGAGATTCGGACGGCAAAATTCAAAGAG





AAGAAGTTGCAAAGTCGATCAAAGAAGTGGTGATTGAGAAAACCAGGGAAGACGTGAGGAAGAA





AGCAAGAGAAATGGGTGAGATTTTGAGGAGTAAAGGAGATGAGAAAATTGATGAGTTGGTGGCT





GAAATTTCTCTTTTGCGCAAAAAGGCCCCATGTTCAATTGCGGCCGCACTCGAGCACCACCACC





ACCACCACTGA




UDP-
MVQPRVLLFPFPALGHVKPFLSLAELLSDAGIDVVFLSTEYNHRRISNTEALASRFPTLHFETI
409



glycosyltransferases
PDGLPPNESRALADGPLYFSMREGTKPRFRQLIQSLNDGRWPITCIITDIMLSSPIEVAEEFGI




(339)
PVIAFCPCSARYLSIHFFIPKLVEEGQIPYADDDPIGEIQGVPLFEGLLRRNHLPGSWSDKSAD





ISFSHGLINQTLAAGRASALILNTFDELEAPFLTHLSSIFNKIYTIGPLHALSKSRLGDSSSSA





SALSGFWKEDRACMSWLDCQPPRSVVFVSFGSTMKMKADELREFWYGLVSSGKPFLCVLRSDVV





SGGEAAELIEQMAEEEGAGGKLGMVVEWAAQEKVLSHPAVGGFLTHCGWNSTVESIAAGVPMMC





WPILGDQPSNATWIDRVWKIGVERNNREWDRLTVEKMVRALMEGQKRVEIQRSMEKLSKLANEK





VVRGGLSFDNLEVLVEDIKKLKPYKF




UDP-
ATGGTGCAACCTCGGGTACTGCTGTTTCCTTTCCCGGCACTGGGCCACGTGAAGCCCTTCTTAT
410



glycosyltransferases
CACTGGCGGAGCTGCTTTCCGACGCCGGCATAGACGTCGTCTTCCTCAGCACCGAGTATAACCA




(339) (gDNA)
CCGTCGGATCTCCAACACTGAAGCCCTAGCCTCCCGCTTCCCGACGCTTCATTTCGAAACTATA





CCGGATGGCCTGCCGCCTAATGAGTCGCGCGCTCTTGCCGACGGCCCACTGTATTTCTCCATGC





GTGAGGGAACTAAACCGAGATTCCGGCAACTGATTCAATCTCTTAACGACGGTCGTTGGCCCAT





CACCTGTATTATCACTGACATCATGTTATCTTCTCCGATTGAAGTAGCGGAAGAATTTGGGATT





CCAGTAATTGCCTTCTGCCCCTGCAGTGCTCGCTACTTATCGATTCACTTTTTTATACCGAAGC





TCGTTGAGGAAGGTCAAATTCCATACGCAGATGACGATCCGATTGGAGAGATCCAGGGGGTGCC





CTTGTTCGAAGGTCTTTTGCGACGGAATCATTTGCCTGGTTCTTGGTCTGATAAATCTGCAGAT





ATATCTTTCTCGCATGGCTTGATTAATCAGACCCTTGCAGCTGGTCGAGCCTCGGCTCTTATAC





TCAACACCTTCGACGAGCTCGAAGCTCCATTTCTGACCCATCTCTCTTCCATTTTCAACAAAAT





CTACACCATTGGACCCCTCCATGCTCTGTCCAAATCAAGGCTCGGCGACTCCTCCTCCTCCGCT





TCTGCCCTCTCCGGATTCTGGAAAGAGGATAGAGCCTGCATGTCCTGGCTCGACTGTCAGCCGC





CGAGATCTGTGGTTTTCGTCAGTTTCGGGAGTACGATGAAGATGAAAGCCGATGAATTGAGAGA





GTTCTGGTATGGGTTGGTGAGCAGCGGGAAACCGTTCCTCTGCGTGTTGAGATCCGACGTTGTT





TCCGGCGGAGAAGCGGCGGAATTGATCGAACAGATGGCGGAGGAGGAGGGAGCTGGAGGGAAGC





TGGGAATGGTAGTGGAGTGGGCAGCGCAAGAGAAGGTCCTGAGCCACCCTGCCGTCGGTGGGTT





TTTGACGCACTGCGGGTGGAACTCAACGGTGGAAAGCATTGCCGCGGGAGTTCCGATGATGTGC





TGGCCGATTCTCGGCGACCAACCCAGCAACGCCACTTGGATCGACAGAGTGTGGAAAATTGGGG





TTGAAAGGAACAATCGTGAATGGGACAGGTTGACGGTGGAGAAGATGGTGAGAGCATTGATGGA





AGGCCAAAAGAGAGTGGAGATTCAGAGATCAATGGAGAAGCTTTCAAAGTTGGCAAATGAGAAG





GTTGTCAGGGGTGGGTTGTCTTTTGATAACTTGGAAGTTCTCGTTGAAGACATCAAAAAATTGA





AACCATATAAATTTTAA




UDP-
MDTRKRSIRILMFPWLAHGHISAFLELAKSLAKRNFVIYICSSQVNLNSISKNMSSKDSISVKL
411



glycosyltransferases
VELHIPTTILPPPYHTTNGLPPHLMSTLKRALDSARPAFSTLLQTLKPDLVLYDFLQSWASEEA




(330)
ESQNIPAMVFLSTGAAAISFIMYHWFETRPEEYPFPAIYFREHEYDNFCRFKSSDSGTSDQLRV




(protein)
SDCVKRSHDLVLIKTFRELEGQYVDFLSDLTRKRFVPVGPLVQEVGCDMENEGNDIIEWLDGKD





RRSTVFSSFGSEYFLSANEIEEIAYGLELSGLNFIWVVRFPHGDEKIKIEEKLPEGFLERVEGR





GLVVEGWAQQRRILSHPSVGGFLSHCGWSSVMEGVYSGVPIIAVPMHLDQPFNARLVEAVGFGE





EVVRSRQGNLDRGEVARVVKKLVMGKSGEGLRRRVEELSEKMREKGEEEIDSLVEELVTVVRRR





ERSNLKSENSMKKLNVMDDGE




UDP-
ATGGATACAAGAAAGAGAAGCATCAGGATTCTAATGTTCCCATGGCTTGCTCATGGCCATATCT
412



glycosyltransferases
CAGCATTCCTCGAGCTGGCGAAGTCACTTGCCAAAAGAAACTTCGTCATTTACATTTGTTCTTC




(330) (gDNA,
ACAAGTAAATCTAAATTCCATCAGCAAGAACATGTCATCAAAAGACTCCATTTCCGTAAAACTT




native)
GTTGAGCTTCACATTCCCACCACCATACTTCCCCCTCCTTACCACACCACCAATGGCCTCCCAC





CCCACCTCATGTCCACCCTCAAGAGAGCCCTCGACAGTGCCCGGCCCGCCTTCTCCACCCTCCT





CCAAACCCTCAAGCCCGACTTGGTTTTATACGATTTCCTCCAGTCGTGGGCCTCGGAGGAGGCC





GAGTCGCAGAATATACCAGCCATGGTGTTTCTGAGTACCGGAGCTGCAGCGATTTCTTTTATTA





TGTACCATTGGTTTGAGACCAGACCGGAGGAGTACCCTTTTCCGGCTATATACTTCCGGGAACA





CGAGTATGATAACTTCTGCCGTTTTAAGTCTTCCGACAGCGGTACTAGTGATCAATTGAGAGTC





AGCGATTGCGTTAAACGGTCGCACGATTTGGTTCTGATCAAGACATTCCGTGAACTGGAAGGAC





AATACGTAGATTTTCTCTCCGACTTGACTCGGAAGAGATTCGTACCAGTTGGCCCCCTTGTTCA





GGAGGTAGGTTGTGATATGGAGAATGAAGGAAATGACATCATCGAATGGCTCGACGGGAAAGAC





CGTCGTTCGACGGTTTTCTCCTCATTCGGGAGCGAGTACTTCTTGTCTGCCAATGAGATCGAAG





AGATAGCTTATGGGCTGGAGCTAAGCGGGCTTAACTTCATCTGGGTTGTTAGGTTTCCTCATGG





CGACGAGAAAATCAAGATTGAGGAGAAACTGCCGGAAGGGTTTCTTGAGAGAGTGGAAGGAAGA





GGGTTGGTGGTGGAGGGATGGGCACAGCAGAGGAGAATATTGTCACATCCGAGTGTTGGAGGGT





TTTTGAGCCACTGTGGGTGGAGTTCTGTGATGGAAGGGGTGTATTCCGGTGTGCCGATTATTGC





CGTGCCGATGCATCTTGACCAGCCGTTCAATGCTAGGTTGGTGGAGGCGGTGGGGTTTGGGGAG





GAGGTGGTGAGGAGTAGACAAGGAAATCTTGACAGAGGAGAGGTGGCGAGGGTGGTGAAGAAGC





TGGTTATGGGGAAAAGTGGGGAGGGGTTACGGCGGAGGGTGGAGGAGTTGAGTGAGAAGATGAG





AGAGAAAGGGGAGGAGGAGATTGATTCACTGGTGGAGGAATTGGTGACGGTGGTTAGGAGGAGA





GAGAGATCGAATCTCAAGTCTGAGAATTCTATGAAGAAATTGAATGTGATGGATGATGGAGAAT





AG




UDP-
MDAAQQGDTTTILMLPWLGYGHLSAFLELAKSLSRRNFHIYFCSTSVNLDAIKPKLPSSFSDSI
413



glycosyltransferases
QFVELHLPSSPEFPPHLHTTNGLPPTLMPALHQAFSMAAQHFESILQTLAPHLLIYDSLQPWAP




(328) described
RVASSLKIPAINFNTTGVFVISQGXHPIHYPHSKFPFSEFVLHNHWKAMYSTADGASTERTRKR




in Itkin et al.
GEAFLYCLHASCSVILINSFRELEGKYMDYLSVLLNKKVVPVGPLVYEPNQDGEDEGYSSIKNW




(protein)
LDKKEPSSTVFVSFGSEYFPSKEEMEEIAHGLEASEVNFIWVVRFPQGDNTSGIEDALPKGFLE





RAGERGMVVKGWAPQAKILKHWSTGGFVSHCGWNSVMESMMFGVPIIGVPMHVDQPFNAGLVEE





AGVGVEAKRDPDGKIQRDEVAKLIKEVVVEKTREDVRKKAREMSEILRSKGEEKFDEMVAEISL





LLKI




UDP-
ATGGATGCTGCCCAACAAGGTGACACCACAACCATTTTGATGCTTCCATGGCTCGGCTATGGCC
414



glycosyltransferases
ATCTTTCAGCTTTTCTCGAGCTGGCCAAAAGCCTCTCAAGGAGGAACTTCCATATCTACTTCTG




(328) (gDNA,
TTCAACCTCTGTTAATCTTGACGCCATTAAACCAAAGCTTCCTTCTTCTTTCTCTGATTCCATT




native)
CAATTTGTGGAGCTCCATCTCCCTTCTTCTCCTGAGTTCCCTCCTCATCTTCACACAACCAACG





GCCTTCCCCCTACCCTCATGCCCGCTCTCCACCAAGCCTTCTCCATGGCTGCCCAGCACTTTGA





GTCCATTTTACAAACACTTGCCCCGCACCTTCTCATTTATGACTCTCTTCAACCTTGGGCTCCT





CGGGTAGCTTCATCCCTCAAAATTCCGGCCATCAACTTCAATACCACGGGAGTTTTCGTCATTT





CTCAAGGGYTTCACCCTATTCACTACCCACATTCTAAATTCCCATTCTCAGAGTTCGTTCTTCA





CAATCATTGGAAAGCCATGTACTCCACTGCCGATGGAGCTTCTACCGAAAGAACCCGCAAACGT





GGAGAAGCGTTTCTGTATTGCTTGCATGCTTCTTGTAGTGTAATTCTAATCAATAGTTTCAGAG





AGCTCGAGGGGAAATATATGGATTATCTCTCTGTTCTCTTGAACAAGAAAGTTGTTCCGGTTGG





TCCTTTGGTTTACGAACCGAATCAAGACGGGGAAGATGAAGGTTATTCAAGCATCAAAAATTGG





CTTGACAAAAAGGAACCGTCCTCCACCGTCTTCGTGTCATTTGGAAGCGAATACTTCCCGTCAA





AGGAAGAAATGGAAGAGATAGCCCATGGGTTAGAGGCGAGCGAGGTTAATTTCATCTGGGTCGT





TAGGTTTCCTCAAGGAGACAACACCAGCGGCATTGAAGATGCCTTGCCGAAGGGTTTTCTGGAG





AGGGCGGGAGAGAGAGGGATGGTGGTGAAGGGTTGGGCTCCTCAGGCGAAGATACTGAAGCATT





GGAGCACAGGGGGATTCGTGAGCCACTGTGGATGGAACTCGGTGATGGAGAGCATGATGTTTGG





CGTTCCCATAATAGGGGTTCCGATGCATGTGGACCAGCCCTTTAACGCCGGACTCGTGGAAGAA





GCTGGCGTCGGCGTGGAGGCCAAGCGAGATCCAGACGGCAAAATTCAAAGAGACGAAGTTGCAA





AGTTGATCAAAGAAGTGGTGGTTGAGAAAACCAGAGAAGATGTGCGGAAGAAAGCAAGAGAAAT





GAGTGAGATTTTGAGGAGCAAGGGAGAGGAGAAGTTTGATGAGATGGTCGCTGAAATTTCTCTC





TTGCTTAAAATATGA




AtSus1 protein
MKHHHHHHQLHAGAHAAAGTMANAERMITRVHSQRERLNETLVSERNEVLALLSRVEAKGKGIL
415




QQNQIIAEFEALPEQTRKKLEGGPFFDLLKSTQEAIVLPPWVALAVRPRPGVWEYLRVNLHALV





VEELQPAEFLHFKEELVDGVKNGNFTLELDFEPFNASIPRPTLHKYIGNGVDFLNRHLSAKLFH





DKESLLPLLKFLRLHSHQGKNLMLSEKIQNLNTLQHTLRKAEEYLAELKSETLYEEFEAKFEEI





GLERGWGDNAERVLDMIRLLLDLLEAPDPCTLETFLGRVPMVFNVVILSPHGYFAQDNVLGYPD





TGGQVVYILDQVRALEIEMLQRIKQQGLNIKPRILILTRLLPDAVGTTCGERLERVYDSEYCDI





LRVPFRTEKGIVRKWISRFEVWPYLETYTEDAAVELSKELNGKPDLIIGNYSDGNLVASLLAHK





LGVTQCTIAHALEKTKYPDSDIYWKKLDDKYHFSCQFTADIFAMNHTDFIITSTFQEIAGSKET





VGQYESHTAFTLPGLYRVVHGIDVFDPKFNIVSPGADMSIYFPYTEEKRRLTKFHSEIEELLYS





DVENKEHLCVLKDKKKPILFTMARLDRVKNLSGLVEWYGKNTRLRELANLVVVGGDRRKESKDN





EEKAEMKKMYDLIEEYKLNGQFRWISSQMDRVRNGELYRYICDTKGAFVQPALYEAFGLTVVEA





MTCGLPTFATCKGGPAEIIVHGKSGFHIDPYHGDQAADTLADFFTKCKEDPSHWDEISKGGLQR





IEEKYTWQIYSQRLLTLTGVYGFWKHVSNLDRLEARRYLEMFYALKYRPLAQAVPLAQDD




AtSus1 (gDNA)
ATGAAACATCACCATCACCATCACCAGCTGCATGCGGGAGCTCATGCGGCCGCGGGTACCATGG
416




CAAACGCTGAACGTATGATAACGCGCGTCCACAGCCAACGTGAGCGTTTGAACGAAACGCTTGT





TTCTGAGAGAAACGAAGTCCTTGCCTTGCTTTCCAGGGTTGAAGCCAAAGGTAAAGGTATTTTA





CAACAAAACCAGATCATTGCTGAATTCGAAGCTTTGCCTGAACAAACCCGGAAGAAACTTGAAG





GTGGTCCTTTCTTTGACCTTCTCAAATCCACTCAGGAAGCAATTGTGTTGCCACCATGGGTTGC





TCTAGCTGTGAGGCCAAGGCCTGGTGTTTGGGAATACTTACGAGTCAATCTCCATGCTCTTGTC





GTTGAAGAACTCCAACCTGCTGAGTTTCTTCATTTCAAGGAAGAACTCGTTGATGGAGTTAAGA





ATGGTAATTTCACTCTTGAGCTTGATTTCGAGCCATTCAATGCGTCTATCCCTCGTCCAACACT





CCACAAATACATTGGAAATGGTGTTGACTTCCTTAACCGTCATTTATCGGCTAAGCTCTTCCAT





GACAAGGAGAGTTTGCTTCCATTGCTTAAGTTCCTTCGTCTTCACAGCCACCAGGGCAAGAACC





TGATGTTGAGCGAGAAGATTCAGAACCTCAACACTCTGCAACACACCTTGAGGAAAGCAGAAGA





GTATCTAGCAGAGCTTAAGTCCGAAACACTGTATGAAGAGTTTGAGGCCAAGTTTGAGGAGATT





GGTCTTGAGAGGGGATGGGGAGACAATGCAGAGCGTGTCCTTGACATGATACGTCTTCTTTTGG





ACCTTCTTGAGGCGCCTGATCCTTGCACTCTTGAGACTTTTCTTGGAAGAGTACCAATGGTGTT





CAACGTTGTGATCCTCTCTCCACATGGTTACTTTGCTCAGGACAATGTTCTTGGTTACCCTGAC





ACTGGTGGACAGGTTGTTTACATTCTTGATCAAGTTCGTGCTCTGGAGATAGAGATGCTTCAAC





GTATTAAGCAACAAGGACTCAACATTAAACCAAGGATTCTCATTCTAACTCGACTTCTACCTGA





TGCGGTAGGAACTACATGCGGTGAACGTCTCGAGAGAGTTTATGATTCTGAGTACTGTGATATT





CTTCGTGTGCCCTTCAGAACAGAGAAGGGTATTGTTCGCAAATGGATCTCAAGGTTCGAAGTCT





GGCCATATCTAGAGACTTACACCGAGGATGCTGCGGTTGAGCTATCGAAAGAATTGAATGGCAA





GCCTGACCTTATCATTGGTAACTACAGTGATGGAAATCTTGTTGCTTCTTTATTGGCTCACAAA





CTTGGTGTCACTCAGTGTACCATTGCTCATGCTCTTGAGAAAACAAAGTACCCGGATTCTGATA





TCTACTGGAAGAAGCTTGACGACAAGTACCATTTCTCATGCCAGTTCACTGCGGATATTTTCGC





AATGAACCACACTGATTTCATCATCACTAGTACTTTCCAAGAAATTGCTGGAAGCAAAGAAACT





GTTGGGCAGTATGAAAGCCACACAGCCTTTACTCTTCCCGGATTGTATCGAGTTGTTCACGGGA





TTGATGTGTTTGATCCCAAGTTCAACATTGTCTCTCCTGGTGCTGATATGAGCATCTACTTCCC





TTACACAGAGGAGAAGCGTAGATTGACTAAGTTCCACTCTGAGATCGAGGAGCTCCTCTACAGC





GATGTTGAGAACAAAGAGCACTTATGTGTGCTCAAGGACAAGAAGAAGCCGATTCTCTTCACAA





TGGCTAGGCTTGATCGTGTCAAGAACTTGTCAGGTCTTGTTGAGTGGTACGGGAAGAACACCCG





CTTGCGTGAGCTAGCTAACTTGGTTGTTGTTGGAGGAGACAGGAGGAAAGAGTCAAAGGACAAT





GAAGAGAAAGCAGAGATGAAGAAAATGTATGATCTCATTGAGGAATACAAGCTAAACGGTCAGT





TCAGGTGGATCTCCTCTCAGATGGACCGGGTAAGGAACGGTGAGCTGTACCGGTACATCTGTGA





CACCAAGGGTGCTTTTGTCCAACCTGCATTATATGAAGCCTTTGGGTTAACTGTTGTGGAGGCT





ATGACTTGTGGTTTACCGACTTTCGCCACTTGCAAAGGTGGTCCAGCTGAGATCATTGTGCACG





GTAAATCGGGTTTCCACATTGACCCTTACCATGGTGATCAGGCTGCTGATACTCTTGCTGATTT





CTTCACCAAGTGTAAGGAGGATCCATCTCACTGGGATGAGATCTCAAAAGGAGGGCTTCAGAGG





ATTGAGGAGAAATACACTTGGCAAATCTATTCACAGAGGCTCTTGACATTGACTGGTGTGTATG





GATTCTGGAAGCATGTCTCGAACCTTGACCGTCTTGAGGCTCGCCGTTACCTTGAAATGTTCTA





TGCATTGAAGTATCGCCCATTGGCTCAGGCTGTTCCTCTTGCACAAGATGATTGA




SgCbQ protein
MWRLKVGAESVGENDEKWLKSISNHLGRQVWEFCPDAGTQQQLLQVHKARKAFHDDRFHRKQSS
417




DLFITIQYGKEVENGGKTAGVKLKEGEEVRKEAVESSLERALSFYSSIQTSDGNWASDLGGPMF





LLPGLVIALYVTGVLNSVLSKHHRQEMCRYVYNHQNEDGGWGLHIEGPSTMFGSALNYVALRLL





GEDANAGAMPKARAWILDHGGATGITSWGKLWLSVLGVYEWSGNNPLPPEFWLFPYFLPFHPGR





MWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYAVPYHEIDWNKSRNTCAKEDLYYPHPKM





QDILWGSLHHVYEPLFTRWPAKRLREKALQTAMQHIHYEDENTRYICLGPVNKVLNLLCCWVED





PYSDAFKLHLQRVHDYLWVAEDGMKMQGYNGSQLWDTAFSIQAIVSTKLVDNYGPTLRKAHDFV





KSSQIQQDCPGDPNVWYRHIHKGAWPFSTRDHGWLISDCTAEGLKAALMLSKLPSETVGESLER





NRLCDAVNVLLSLQNDNGGFASYELTRSYPWLELINPAETFGDIVIDYPYVECTSATMEALTLF





KKLHPGHRTKEIDTAIVRAANFLENMQRTDGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCLA





IRKACDFLLSKELPGGGWGESYLSCQNKVYTNLEGNRPHLVNTAWVLMALIEAGQAERDPTPLH





RAARLLINSQLENGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYCHRVLTE




SgCbQ (gDNA)
ATGTGGAGGTTAAAGGTCGGAGCAGAAAGCGTTGGGGAGAATGATGAGAAATGGTTGAAGAGCA
418




TAAGCAATCACTTGGGACGCCAGGTGTGGGAGTTCTGTCCGGATGCCGGCACCCAACAACAGCT





CTTGCAAGTCCACAAAGCTCGTAAAGCTTTCCACGATGACCGTTTCCACCGAAAGCAATCTTCC





GATCTCTTTATCACTATTCAGTATGGAAAGGAAGTAGAAAATGGTGGAAAGACAGCGGGAGTGA





AATTGAAAGAAGGGGAAGAGGTGAGGAAAGAGGCAGTAGAGAGTAGCTTAGAGAGGGCATTAAG





TTTCTACTCAAGCATCCAGACAAGCGATGGGAACTGGGCTTCGGATCTTGGGGGGCCCATGTTT





TTACTTCCGGGTCTGGTGATTGCCCTCTACGTTACAGGCGTCTTGAATTCTGTTTTATCCAAGC





ACCACCGGCAAGAGATGTGCAGATATGTTTACAATCACCAGAATGAAGATGGGGGGTGGGGTCT





CCACATCGAGGGCCCAAGCACCATGTTTGGTTCCGCACTGAATTATGTTGCACTCAGGCTGCTT





GGAGAAGACGCCAACGCCGGGGCAATGCCAAAAGCACGTGCTTGGATCTTGGACCACGGTGGCG





CCACCGGAATCACTTCCTGGGGCAAATTGTGGCTTTCTGTACTTGGAGTCTACGAATGGAGTGG





CAATAATCCTCTTCCACCCGAATTTTGGTTATTTCCTTACTTCCTACCATTTCATCCAGGAAGA





ATGTGGTGCCATTGTCGAATGGTTTATCTACCAATGTCATACTTATATGGAAAGAGATTTGTTG





GGCCAATCACACCCATAGTTCTGTCTCTCAGAAAAGAACTCTACGCAGTTCCATATCATGAAAT





AGACTGGAATAAATCTCGCAATACATGTGCAAAGGAGGATCTGTACTATCCACATCCCAAGATG





CAAGATATTCTGTGGGGATCTCTCCACCACGTGTATGAGCCCTTGTTTACTCGTTGGCCTGCCA





AACGCCTGAGAGAAAAGGCTTTGCAGACTGCAATGCAACATATTCACTATGAAGATGAGAATAC





CCGATATATATGCCTTGGCCCTGTCAACAAGGTACTCAATCTGCTTTGTTGTTGGGTTGAAGAT





CCCTACTCCGACGCCTTCAAACTTCATCTTCAACGAGTCCATGACTATCTCTGGGTTGCTGAAG





ATGGCATGAAAATGCAGGGTTATAATGGGAGCCAGTTGTGGGACACTGCTTTCTCCATCCAAGC





AATCGTATCCACCAAACTTGTAGACAACTATGGCCCAACCTTAAGAAAGGCACACGACTTCGTT





AAAAGTTCTCAGATTCAGCAGGACTGTCCTGGGGATCCTAATGTTTGGTACCGTCACATTCATA





AAGGTGCATGGCCATTTTCAACTCGAGATCATGGATGGCTCATCTCTGACTGTACAGCAGAGGG





ATTAAAGGCTGCTTTGATGTTATCCAAACTTCCATCCGAAACAGTTGGGGAATCATTAGAACGG





AATCGCCTTTGCGATGCTGTAAACGTTCTCCTTTCTTTGCAAAACGATAATGGTGGCTTTGCAT





CATATGAGTTGACAAGATCATACCCTTGGTTGGAGTTGATCAACCCCGCAGAAACGTTTGGAGA





TATTGTCATTGATTATCCGTATGTGGAGTGCACCTCAGCCACAATGGAAGCACTGACGTTGTTT





AAGAAATTACATCCCGGCCATAGGACCAAAGAAATTGATACTGCTATTGTCAGGGCGGCCAACT





TCCTTGAAAATATGCAAAGGACGGATGGCTCTTGGTATGGATGTTGGGGGGTTTGCTTCACGTA





TGCGGGGTGGTTTGGCATAAAGGGATTGGTGGCTGCAGGAAGGACATATAATAATTGCCTTGCC





ATTCGCAAGGCTTGCGATTTTTTACTATCTAAAGAGCTGCCCGGCGGTGGATGGGGAGAGAGTT





ACCTTTCATGTCAGAATAAGGTATACACAAATCTTGAAGGAAACAGACCGCACCTGGTTAACAC





GGCCTGGGTTTTAATGGCCCTCATAGAAGCTGGCCAGGCTGAGAGAGACCCAACACCATTGCAT





CGTGCAGCAAGGTTGTTAATCAATTCCCAGTTGGAGAATGGTGATTTCCCCCAACAGGAGATCA





TGGGAGTCTTTAATAAAAATTGCATGATCACATATGCTGCATACCGAAACATTTTTCCCATTTG





GGCTCTTGGAGAGTATTGCCATCGGGTTTTGACTGAATAA





ATGTGGAGGTTAAAGGTCGGAGCAGAAAGCGTTGGGGAGAATGATGAGAAATGGTTGAAGAGCA
419




TAAGCAATCACTTGGGACGCCAGGTGTGGGAGTTCTGTCCGGATGCCGGCACCCAACAACAGCT





CTTGCAAGTCCACAAAGCTCGTAAAGCTTTCCACGATGACCGTTTCCACCGAAAGCAATCTTCC





GATCTCTTTATCACTATTCAGTATGGAAAGGAAGTAGAAAATGGTGGAAAGACAGCGGGAGTGA





AATTGAAAGAAGGGGAAGAGGTGAGGAAAGAGGCAGTAGAGAGTAGCTTAGAGAGGGCATTAAG





TTTCTACTCAAGCATCCAGACAAGCGATGGGAACTGGGCTTCGGATCTTGGGGGGCCCATGTTT





TTACTTCCGGGTCTGGTGATTGCCCTCTACGTTACAGGCGTCTTGAATTCTGTTTTATCCAAGC





ACCACCGGCAAGAGATGTGCAGATATGTTTACAATCACCAGAATGAAGATGGGGGGTGGGGTCT





CCACATCGAGGGCCCAAGCACCATGTTTGGTTCCGCACTGAATTATGTTGCACTCAGGCTGCTT





GGAGAAGACGCCAACGCCGGGGCAATGCCAAAAGCACGTGCTTGGATCTTGGACCACGGTGGCG





CCACCGGAATCACTTCCTGGGGCAAATTGTGGCTTTCTGTACTTGGAGTCTACGAATGGAGTGG





CAATAATCCTCTTCCACCCGAATTTTGGTTATTTCCTTACTTCCTACCATTTCATCCAGGAAGA





ATGTGGTGCCATTGTCGAATGGTTTATCTACCAATGTCATACTTATATGGAAAGAGATTTGTTG





GGCCAATCACACCCATAGTTCTGTCTCTCAGAAAAGAACTCTACGCAGTTCCATATCATGAAAT





AGACTGGAATAAATCTCGCAATACATGTGCAAAGGAGGATCTGTACTATCCACATCCCAAGATG





CAAGATATTCTGTGGGGATCTCTCCACCACGTGTATGAGCCCTTGTTTACTCGTTGGCCTGCCA





AACGCCTGAGAGAAAAGGCTTTGCAGACTGCAATGCAACATATTCACTATGAAGATGAGAATAC





CCGATATATATGCCTTGGCCCTGTCAACAAGGTACTCAATCTGCTTTGTTGTTGGGTTGAAGAT





CCCTACTCCGACGCCTTCAAACTTCATCTTCAACGAGTCCATGACTATCTCTGGGTTGCTGAAG





ATGGCATGAAAATGCAGGGTTATAATGGGAGCCAGTTGTGGGACACTGCTTTCTCCATCCAAGC





AATCGTATCCACCAAACTTGTAGACAACTATGGCCCAACCTTAAGAAAGGCACACGACTTCGTT





AAAAGTTCTCAGATTCAGCAGGACTGTCCTGGGGATCCTAATGTTTGGTACCGTCACATTCATA





AAGGTGCATGGCCATTTTCAACTCGAGATCATGGATGGCTCATCTCTGACTGTACAGCAGAGGG





ATTAAAGGCTGCTTTGATGTTATCCAAACTTCCATCCGAAACAGTTGGGGAATCATTAGAACGG





AATCGCCTTTGCGATGCTGTAAACGTTCTCCTTTCTTTGCAAAACGATAATGGTGGCTTTGCAT





CATATGAGTTGACAAGATCATACCCTTGGTTGGAGTTGATCAACCCCGCAGAAACGTTTGGAGA





TATTGTCATTGATTATCCGTATGTGGAGTGCACCTCAGCCACAATGGAAGCACTGACGTTGTTT





AAGAAATTACATCCCGGCCATAGGACCAAAGAAATTGATACTGCTATTGTCAGGGCGGCCAACT





TCCTTGAAAATATGCAAAGGACGGATGGCTCTTGGTATGGATGTTGGGGGGTTTGCTTCACGTA





TGCGGGGTGGTTTGGCATAAAGGGATTGGTGGCTGCAGGAAGGACATATAATAATTGCCTTGCC





ATTCGCAAGGCTTGCGATTTTTTACTATCTAAAGAGCTGCCCGGCGGTGGATGGGGAGAGAGTT





ACCTTTCATGTCAGAATAAGGTATACACAAATCTTGAAGGAAACAGACCGCACCTGGTTAACAC





GGCCTGGGTTTTAATGGCCCTCATAGAAGCTGGCCAGGCTGAGAGAGACCCAACACCATTGCAT





CGTGCAGCAAGGTTGTTAATCAATTCCCAGTTGGAGAATGGTGATTTCCCCCAACAGGAGATCA





TGGGAGTCTTTAATAAAAATTGCATGATCACATATGCTGCATACCGAAACATTTTTCCCATTTG





GGCTCTTGGAGAGTATTGCCATCGGGTTTTGACTGAATAA




Cpep2 protein
MWRLKVGAESVGEKDEKWVKSVSNHLGRQVWEFCAADAAAVTPHQLLQIQNARNHFHRNRFHRK
420




QSSDLFLAIQYEKEIAKGGKGKEAVKVKEGEEVGKEAVKSTLERALSFYTAVQTSDGNWASDLG





GPMFLLPGLVIALYVTGVLNSVLSKHHRVEMCRYIYNHQNEDGGWGLHIEGTSTMFGSALNYVA





LRLLGEDADGGDDGAMTKARAWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPPEFWLLPYS





LPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPKVLSLRQELYTVPYHEIDWNKSRNTCAKEDL





YYPHPKMQDILWGSIYHVYEPLFTRWPGKRLREKALQTAMKHIHYEDENSRYICLGPVNKVLNM





LCCWVEDPYSDAFKLHLQRVHDYLWVAEDGMRMQGYNGSQLWDTAFSIQAIVATKLVDSYAPTL





RKAHDFVKDSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLMLSKLPSTM





VGEPLEKNRLCDAVNVLLSLQNDNGGFASYELTRSYPWLELINPAETFGDIVIDYPYVECTAAT





MEALTLFKKLHPGHRTKEIDTAIGKAANFLEKMQRADGSWYGCWGVCFTYAGWFGIKGLVAAGR





TYNSCLAIRKACEFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVLMALIEAGQGE





RDPAPLHRGARLVMNSQLENGDFVQQEIMGVFNKNCMITYAAYRNIFPIWALGEYCHRVLTE




Cpep2 gene sequence
ATGTGGAGGCTGAAGGTGGGAGCAGAGAGCGTTGGGGAGAAGGATGAGAAATGGGTGAAGAGCG
421




TAAGCAATCACTTGGGCCGCCAAGTTTGGGAGTTCTGTGCCGCCGACGCCGCCGCCGTCACTCC





TCACCAGTTACTACAAATTCAGAATGCTCGCAACCACTTCCATCGCAATCGTTTCCACCGGAAG





CAGTCTTCCGATCTCTTTCTCGCTATTCAGTATGAAAAGGAAATAGCGAAGGGCGGAAAAGGGA





AAGAGGCGGTGAAAGTGAAAGAAGGGGAGGAGGTGGGGAAAGAGGCGGTGAAGAGTACGTTAGA





GAGGGCACTAAGTTTCTACACAGCCGTGCAGACGAGCGATGGGAATTGGGCCTCGGATCTTGGA





GGGCCCATGTTTTTACTTCCGGGTCTCGTGATTGCCCTTTATGTCACAGGCGTGTTGAATTCAG





TTTTGTCCAAGCACCACCGCGTAGAGATGTGCAGATATATTTACAATCACCAGAATGAAGATGG





AGGGTGGGGTCTACATATTGAGGGCACAAGCACCATGTTTGGTTCGGCACTCAATTATGTTGCA





CTTAGGCTGCTTGGAGAAGACGCCGATGGCGGAGACGATGGTGCAATGACAAAAGCACGTGCTT





GGATCTTGGAGCGCGGCGGCGCCACTGCGATCACTTCGTGGGGAAAATTGTGGCTGTCCGTGCT





TGGAGTGTACGAATGGAGTGGCAACAACCCTCTTCCGCCTGAGTTTTGGCTTCTCCCTTACAGC





CTACCATTTCATCCAGGACGAATGTGGTGCCATTGTCGAATGGTTTATCTTCCCATGTCTTACT





TATATGGGAAGAGATTTGTTGGCCCAATCACTCCCAAAGTTCTTTCTCTAAGACAAGAGCTCTA





CACGGTTCCTTATCATGAAATAGACTGGAATAAATCCCGCAATACATGTGCAAAGGAGGATCTA





TACTATCCACATCCCAAGATGCAAGACATACTATGGGGATCTATCTACCATGTATATGAGCCAT





TGTTCACTCGTTGGCCTGGGAAACGCCTGAGGGAAAAGGCTTTACAAACTGCAATGAAACATAT





TCACTATGAAGATGAAAATAGTCGCTATATATGTCTTGGCCCAGTCAACAAGGTACTCAACATG





CTTTGTTGTTGGGTTGAAGATCCCTACTCAGACGCCTTCAAACTTCACCTTCAACGCGTCCATG





ACTATCTCTGGGTTGCTGAAGATGGCATGAGAATGCAGGGTTACAATGGCAGCCAGTTGTGGGA





CACTGCTTTCTCCATCCAAGCCATTGTAGCTACCAAACTTGTAGACAGCTATGCCCCAACTTTA





AGAAAAGCACATGACTTTGTTAAGGATTCTCAGATCCAGGAGGACTGTCCTGGGGATCCTAATG





TTTGGTTCCGTCATATTCATAAAGGTGCTTGGCCATTTTCGACTCGAGATCATGGATGGCTCAT





CTCTGACTGCACGGCTGAGGGATTGAAGGCTTCTTTGATGTTATCCAAACTTCCATCCACAATG





GTTGGGGAGCCATTAGAAAAGAATCGCCTTTGTGATGCTGTTAATGTTCTCCTTTCTTTGCAAA





ATGATAACGGTGGATTTGCATCATACGAGTTGACGAGATCATACCCTTGGTTGGAGTTGATCAA





CCCAGCAGAAACATTCGGAGACATTGTCATCGACTATCCGTATGTGGAGTGCACCGCAGCAACA





ATGGAAGCACTGACGTTATTTAAGAAGCTACATCCAGGCCATAGGACCAAAGAGATTGACACAG





CTATTGGCAAGGCAGCCAACTTCCTTGAGAAAATGCAAAGGGCGGATGGCTCTTGGTATGGGTG





TTGGGGGGTTTGTTTCACGTATGCGGGGTGGTTTGGCATCAAGGGATTGGTGGCTGCAGGAAGA





ACATATAATAGCTGCCTTGCCATCCGCAAGGCTTGTGAGTTTCTGCTATCTAAAGAGCTGCCCG





GCGGTGGATGGGGGGAGAGTTACCTTTCATGTCAGAATAAGGTGTACACCAATCTTGAGGGAAA





CAAGCCACACTTGGTTAACACTGCCTGGGTTTTAATGGCTCTCATTGAAGCCGGCCAGGGTGAG





AGAGACCCAGCACCATTGCACCGTGGAGCAAGGTTGGTAATGAATTCTCAACTGGAGAATGGTG





ATTTCGTGCAACAGGAGATCATGGGAGTGTTCAATAAGAACTGCATGATCACATATGCTGCATA





CCGAAACATCTTCCCCATTTGGGCGCTTGGAGAGTATTGCCATCGGGTTCTTACTGAATGA




Cpep4 protein
MWRLKVGAESVGEKDEKWVKSVSNHLGRQVWEFCAADAAAVTPHQLLQIQNARNHFHRNRFHRK
422




QSSDLFLAIQYEKEIAKGGKGKEAVKVKEGEEVGKEAVKSTLERALSFYTAVQTSDGNWASDLG





GPMFLLPGLVIALYVTGVLNSVLSKHHRVEMCRYIYNHQNEDGGWGLHIEGTSTMFGSALNYVA





LRLLGEDADGGDDGAMTKARAWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPPEFLLLPYS





LPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPKVLSLRQELYTVPYHEIDWNKSRNTCAKEDL





YYPHPKMQDILWGSIYHVYEPLFTRWPGKRLREKALQTAMKHIHYEDENSRYICLGPVNKVLNM





LCCWVEDPYSDAFKLHLQRVHDYLWVAEDGMRMQGYNGSQLWDTAFSIQAIVATKLVDSYAPTL





RKAHDFVKDSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLMLSKLPSTM





VGEPLEKNRLCDAVNVLLSLQNDNGGFASYELTRSYPWLELINPAETFGDIVIDYSYVECTAAT





MEALTLFKKLHPGHRTKEIDTAIGKAANFLEKMQRADGSWYGCWGVCFTYAGWFGIKGLVAAGR





TYNSCLAIRKACEFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVLMALIEAGQGE





RDPAPLHRAARLVMNSQLENGDFVQQEIMGVFNKNCMITYAAYRNIFPIWALGEYCHRVLTE.




Cpep4 gene sequence
ATGTGGAGGCTGAAGGTGGGAGCAGAGAGCGTTGGGGAGAAGGATGAGAAATGGGTGAAGAGCG
423




TAAGCAATCACTTGGGCCGCCAAGTTTGGGAGTTCTGTGCCGCCGACGCCGCCGCCGTCACTCC





TCACCAGTTACTACAAATTCAGAATGCTCGCAACCACTTCCATCGCAATCGTTTCCACCGGAAG





CAGTCTTCCGATCTCTTTCTCGCTATTCAGTATGAAAAGGAAATAGCGAAGGGCGGAAAAGGGA





AAGAGGCGGTGAAAGTGAAAGAAGGGGAGGAGGTGGGGAAAGAGGCGGTGAAGAGTACGTTAGA





GAGGGCACTAAGTTTCTACACAGCCGTGCAGACGAGCGATGGGAATTGGGCCTCGGATCTTGGA





GGGCCCATGTTTTTACTTCCGGGTCTCGTGATTGCCCTTTATGTCACAGGCGTGTTGAATTCAG





TTTTGTCCAAGCACCACCGCGTAGAGATGTGCAGATATATTTACAATCACCAGAATGAAGATGG





AGGGTGGGGTCTACATATTGAGGGCACAAGCACCATGTTTGGTTCGGCACTCAATTATGTTGCA





CTTAGGCTGCTTGGAGAAGACGCCGATGGCGGAGACGATGGTGCAATGACAAAAGCACGTGCTT





GGATCTTGGAGCGCGGCGGCGCCACTGCGATCACTTCGTGGGGAAAATTGTGGCTGTCCGTGCT





TGGAGTGTACGAATGGAGTGGCAACAACCCTCTTCCGCCTGAGTTTTTGCTTCTCCCTTACAGC





CTACCATTTCATCCAGGACGAATGTGGTGCCATTGTCGAATGGTTTATCTTCCCATGTCTTACT





TATATGGGAAGAGATCTGTTCGCCCAATCACTCCCAAAGTTCTTTCTCTAAGACAAGAGCTCTA





CACGGTTCCTTATCATGAAATAGACTGGAATAAATCCCGCAATACATGTGCAAAGGAGGATCTA





TACTATCCACATCCCAAGATGCAAGACATACTATGGGGATCTATCTACCATGTATATGAGCCAT





TGTTCACTCGTTGGCCTGGGAAACGCCTGAGGGAAAAGGCTTTACAAACTGCAATGAAACATAT





TCACTATGAAGATGAAAATAGTCGCTATATATGTCTTGGCCCAGTCAACAAGGTACTCAACATG





CTTTGTTGTTGGGTTGAAGATCCCTACTCAGACGCCTTCAAACTTCACCTTCAACGCGTCCATG





ACTATCTCTGGGTTGCTGAAGATGGCATGAGAATGCAGGGTTACAATGGCAGCCAGTTGTGGGA





CACTGCTTTCTCCATCCAAGCCATTGTAGCTACCAAACTTGTAGACAGCTATGCCCCAACTTTA





AGAAAAGCACATGACTTTGTTAAGGATTCTCAGATCCAGGAGGACTGTCCTGGGGATCCTAATG





TTTGGTTCCGTCATATTCATAAAGGTGCTTGGCCATTTTCGACTCGAGATCATGGATGGCTCAT





CTCTGACTGCACGGCTGAGGGATTGAAGGCTTCTTTGATGTTATCCAAACTTCCATCCACAATG





GTTGGGGAGCCATTAGAAAAGAATCGCCTTTGTGATGCTGTTAATGTTCTCCTTTCTTTGCAAA





ATGATAACGGCGGATTTGCATCATACGAGTTGACGAGATCATACCCTTGGTTGGAGTTGATCAA





CCCAGCAGAAACATTCGGAGACATTGTCATCGACTATTCGTATGTGGAGTGCACCGCAGCAACA





ATGGAAGCACTGACGTTATTTAAGAAGCTACATCCAGGCCATAGGACCAAAGAGATTGACACAG





CTATTGGCAAGGCAGCCAACTTCCTTGAGAAAATGCAAAGGGCGGATGGCTCTTGGTATGGGTG





TTGGGGGGTTTGTTTCACGTATGCGGGGTGGTTTGGCATAAAGGGATTGGTGGCTGCAGGAAGA





ACATATAATAGCTGTCTTGCCATCCGCAAGGCTTGTGAGTTTCTGCTATCTAAAGAGCTGCCCG





GCGGTGGATGGGGGGAGAGTTACCTTTCATGTCAGAATAAGGTGTACACCAATCTTGAGGGAAA





CAAGCCACACTTGGTTAACACTGCCTGGGTTTTAATGGCTCTCATTGAAGCTGGCCAGGGTGAG





AGAGACCCAGCACCATTGCACCGTGCAGCAAGGTTGGTAATGAATTCTCAACTGGAGAATGGCG





ATTTCGTGCAACAGGAGATCATGGGAGTGTTCAATAAGAACTGCATGATCACATATGCTGCATA





CCGAAACATCTTCCCCATTTGGGCGCTTGGAGAGTATTGCCATCGGGTTCTTACTGAATGA




Cmax1 protein
MWRLKVGAESVGEKDEKWVKSVSNHLGRQVWEFCADAAADTPHQLLQIQNARNHFHHNRFHRKQ
424




SSDLFLAIQYEKEIAKGAKGGAVKVKEGEEVGKEAVKSTLESALGFYSAVQTSDGNWASDLGGP





MFLLPGLVIALHVTGVLNSVLSKHHRVEMCRYLYNHQNEDGGWGLHIEGTSTMFGSALNYVALR





LLGEDADGGDGGAMTKARAWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPPEFWLLPYSLP





FHPGRMWCHCRMVYLPMSYLYGKRFVGPITPKVLSLRQELYTIPYHEIDWNKSRNTCAKEDLYY





PHPKMQDILWGSIYHVYEPLFTRWPGKRLREKALQAAMKHIHYEDENSRYICLGPVNKVLNMLC





CWVEDPYSDAFKLHLQRVHDYLWVAEDGMRMQGYNGSQLWDTAFSIQAIVATKLVDSYAPTLRK





AHDFVKDSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLMLSKLPSAMVG





EPLEKNRLCDAVNVLLSLQNDNGGFASYELTRSYPWLELINPAETFGDIVIDYPYVECTAATME





ALTLFKKLHPGHRTKEIDTAIGKAANFLEKMQRADGSWYGCWGVCFTYAGWFGIKGLVAAGRTY





NSCLAIRKACEFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVLMALIEAGQGERD





PAPLHRAARLLMNSQLENGDFVQQEIMGVFNKNCMITYAAYRNIFPIWALGEYCHRVLTE




Cmax1 gene sequence
ATGTGGAGGCTGAAGGTGGGAGCAGAGAGCGTTGGGGAGGAGGATGAGAAATGGGTGAAGAGCG
425




TAAGCAATCACTTGGGCCGCCAAGTTTGGGAGTTCTGTGCCGACGCCGCCGCCGACACTCCTCA





CCAGTTACTACAAATTCAGAATGCTCGCAACCACTTCCATCACAATCGTTTCCACCGGAAGCAG





TCTTCCGATCTCTTTCTGGCTATTCAATATGAAAAGGAAATAGCAAAGGGCGCAAAAGGTGGAG





CGGTGAAAGTGAAAGAAGGGGAGGAGGTGGGGAAAGAGGCGGTGAAGAGTACGTTAGAAAGGGC





ACTCGGTTTCTACTCGGCCGTGCAGACAAGAGATGGGAATTGGGCCTCGGATCTTGGAGGGCCC





TTGTTTTTACTTCCGGGTCTCGTGATTGCCCTTCATGTCACAGGCGTCTTGAATTCAGTTTTGT





CCAAGCACCACCGCGTAGAGATGTGCAGATATCTTTACAATCACCAGAATGAAGATGGAGGGTG





GGGTCTACATATTGAGGGCACAAGCACCATGTTTGGTTCGGCACTGAATTACGTTGCACTAAGG





CTGCTTGGAGAAGACGCCGATGGCGGAGACGGTGGCGCAATGACAAAAGCACGTGCTTGGATCT





TGGAGCGCGGCGGCGCCACTGCGATCACTTCGTGGGGAAAATTGTGGCTGTCCGTACTTGGAGT





GTACGAATGGAGTGGCAACAACCCTCTTCCGCCTGAGTTTTGGCTTCTCCCTTACAGCCTACCA





TTTCATCCAGGAAGAATGTGGTGCCATTGTCGAATGGTTTATCTTCCAATGTCTTACTTATATG





GGAAGAGATTTGTTGGGCCAATCACTCCCAAAGTTCTTTCTCTAAGGCAAGAGCTCTACACAAT





TCCTTATCATGAAATAGACTGGAATAAATCCCGCAATACATGTGCAAAGGAGGATCTGTACTAT





CCACATCCCAAGATGCAAGACATTCTATGGGGATCCATCTACCATGTATATGAGCCATTGTTCA





CTCGTTGGCCTGGGAAACGCCTGAGGGAAAAGGCTTTACAAGCTGCAATGAAACATATTCACTA





TGAAGATGAAAATAGTCGATATATATGTCTTGGCCCAGTCAACAAGGTACTCAACATGCTTTGT





TGTTGGGTTGAAGATCCCTACTCAGACGCCTTCAAACTTCACCTTCAACGCGTCCATGACTATC





TCTGGGTTGCTGAAGATGGCATGAGAATGCAGGGCTACAATGGCAGCCAGTTGTGGGACACTGC





TTTCTCCATCCAAGCCATCGTAGCCACCAAACTTGTAGACAGCTATGCCCCAACTTTAAGAAAA





GCACATGACTTTGTTAAGGATTCTCAGATCCAGGAGGACTGTCCTGGGGATCCTAATGTTTGGT





TCCGTCATATTCATAAAGGTGCTTGGCCACTTTCGACACGAGATCATGGATGGCTCATCTCCGA





CTGTACAGCTGAGGGATTGAAGGCTTCTTTGATGTTATCCAAACTTCCATCCACAATGGTTGGG





GAGCCATTAGAAAAGAATCGCCTTTGTGATGCTGTTAATGTTCTCCTTTCTTTGCAAAATGATA





ATGGTGGATTTGCATCATACGAGTTGACGAGATCATACCCTTGGTTGGAGTTGATCAACCCAGC





TGAAACATTCGGAGACATTGTCATTGACTATCCGTATGTGGAGTGCACCGCAGCAACAATGGAA





GCACTGACGTTATTTAAGAAGCTACATCCAGGCCATAGGACCAAAGAGATTGACACAGCTATTG





GCAAGGCAGCCAACTTCCTTGAGAAAATGCAGAGGGCGGATGGCTCTTGGTACGGGTGTTGGGG





GGTTTGTTTTACGTATGCGGGTTGGTTTGGCATAAAGGGATTGGTGGCTGCAGGAAGAACATAT





AATAGCTGCCTTGCCATTCGCAAGGCTTGTGAGTTTCTGCTATCTAAAGAGCTGCCCGGCGGTG





GATGGGGGGAGAGTTACCTTTCATGTCAGAATAAGGTGTACACCAATCTTGAGGGGAACAAGCC





ACACTTGGTTAACACTGCCTGGGTTTTAATGGCTCTCATTGAAGCTGGCCAGGGTGAGAGAGAC





CCAGCACCATTGCACCGTGCAGCAAGGTTGCTAATGAATTCCCAATTGGAGAATGGCGATTTCG





TGCAACAGGAGATCATGGGAGTGTTCAATAAGAACTGCATGATCACATATGCTGCATACCGAAA





CATCTTCCCCATTTGGGCGCTTGGAGAGTATTGCCATCGGGTTCTTACTGAATGA




Cmos1 protein
MWRLKVGAESVGEKDEKWVKSVSNHLGRQVWEFCADAAAAATPRQLLQIQNARNHFHRNRFHRK
426




QSSDLFLAIQYEKEIAEGGKGGAVKVKEEEEVGKEAVKSTLERALSFYSAVQTSDGNWASDLGG





PMFLLPGLVIALYVTGVLNSVLSKHHRVEMCRYLYNHQNEDGGWGLHIEGTSTMFGSALNYVAL





RLLGEDADGGDDGAMTKARAWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPPEFWLLPYSL





PFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPKVLSLRQELYTVPYHEIDWNKSRNTCAKEDLY





YPHPKMQDILWGSIYHVYEPLFTRWPGKRLREKALQTAMKHIHYEDENSRYICLGPVNKVLNML





CCWVEDPYSDAFKLHLQRVHDYLWVAEDGMRMQGYNGSQLWDTAFSIQAIVATKLVDSYAPTLR





KAHDFVKDSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLMLSKLPSAMV





GEPLEKNRLCDAVNVLLSLQNDNGGFASYELTRSYPWLELINPAETFGDIVIDYPYVECTAATM





EALTLFKKLHPGHRTKEIDTAIGKAANFLEKMQRADGSWYGCWGVCFTYAGWFGIKGLVAAGRT





YNSCLAIRKACEFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVLMALIEAGQGER





DPAPLHRAARLLMNSQLENGDFVQQEIMGVFNKNCMITYAAYRNIFPIWALGEYCHRVLTE




Cmos1 gene sequence
ATGTGGAGGTTGAAGGTGGGAGCAGAGAGCGTTGGGGAGAAGGATGAGAAATGGGTGAAGAGCG
427




TAAGCAATCACTTGGGCCGCCAAGTTTGGGAGTTCTGTGCCGACGCCGCCGCCGCCGCCACTCC





TCGCCAGTTACTACAAATTCAGAATGCTCGCAACCACTTCCATCGCAATCGTTTCCACCGGAAG





CAGTCTTCCGATCTCTTTCTCGCTATTCAGTATGAAAAGGAAATAGCAGAGGGCGGAAAAGGTG





GAGCGGTGAAAGTGAAAGAAGAGGAGGAGGTGGGGAAAGAGGCGGTGAAGAGTACGTTAGAAAG





GGCACTAAGTTTCTACTCAGCCGTGCAGACAAGCGATGGGAATTGGGCCTCGGATCTTGGAGGG





CCCATGTTTTTACTTCCGGGTCTCGTGATTGCCCTTTATGTCACAGGCGTGTTGAATTCAGTTT





TGTCCAAGCACCACCGCGTAGAGATGTGCAGATATCTTTACAATCACCAGAATGAAGATGGAGG





GTGGGGTCTACATATTGAGGGCACAAGCACCATGTTTGGTTCGGCACTCAATTACGTTGCACTA





AGGCTGCTTGGAGAAGACGCGGATGGCGGAGACGATGGCGCAATGACAAAAGCACGTGCTTGGA





TCTTGGAGCGCGGCGGCGCCACTGCGATCACTTCGTGGGGAAAGTTGTGGCTGTCCGTGCTTGG





AGTGTACGAATGGAGTGGCAACAACCCTCTTCCGCCTGAGTTTTGGCTTCTCCCTTACAGCCTA





CCATTTCATCCAGGAAGAATGTGGTGCCATTGTCGAATGGTTTATCTTCCCATGTCTTACTTAT





ATGGGAAGAGATTTGTTGGGCCAATCACTCCCAAAGTTCTATCGCTAAGACAAGAGCTTTACAC





GGTTCCTTATCATGAAATAGACTGGAACAAATCCCGCAATACATGTGCAAAGGAGGATCTATAC





TATCCACATCCCAAGATGCAAGACATTCTATGGGGATCCATCTACCATGTGTATGAGCCATTGT





TCACTCGTTGGCCTGGGAAACGCCTGAGGGAAAAGGCTTTACAAACTGCAATCAAACATATTCA





CTATGAAGATGAAAATAGTCGATATATATGTCTTGGCCCAGTCAACAAGGTACTCAACATGCTT





TGTTGTTGGGTTGAAGATCCCTACTCAGACGCCTTCAAACTTCACCTTCAACGCGTCCATGACT





ATCTCTGGGTTGCTGAAGATGGCATGAGAATGCAGGGCTACAATGGCAGCCAGTTGTGGGACAC





TGCTTTCTCCATCCAAGCCATCGTAGCCACCAAACTTGTAGACAGCTATGCCCCAACTTTAAGA





AAAGCACATGACTTTGTTAAGGATTCTCAGATCCAGGAGGACTGTCCTGGGGATCCTAATGTTT





GGTTCCGTCATATTCATAAAGGTGCTTGGCCATTTTCGACTCGAGATCATGGATGGCTCATCTC





CGACTGTACAGCTGAGGGATTGAAGGCTTCTTTGATGTTATCCAAACTTCCATCCGCAATGGTT





GGGGAGCCATTAGAAAAGAATCGCCTTTGTGATGCTGTTAATGTTCTCCTTTCTTTGCAAAATG





ATAATGGTGGATTTGCATCATACGAGTTGACGAGATCATACCCTTGGTTGGAGTTGATCAACCC





AGCAGAAACATTCGGAGACATTGTCATCGACTATCCGTATGTGGAGTGCACCGCAGCAACAATG





GAAGCACTGACGTTATTTAAGAAGCTACATCCAGGCCATAGGACCAAAGAGATTGACACAGCTA





TTGGCAAGGCAGCCAACTTCCTTGAGAAAATGCAGAGGGCGGATGGCTCTTGGTATGGGTGTTG





GGGGGTTTGTTTCACGTATGCGGGGTGGTTTGGCATAAAGGGATTGGTGGCTGCAGGAAGAACA





TATAATAGCTGCCTTGCCATCCGCAAGGCTTGTGAGTTTCTGCTATCTAAAGAGCTGCCCGGCG





GTGGATGGGGGGAGAGTTACCTTTCATGTCAGAATAAGGTGTACACCAATCTTGAGGGAAACAA





GCCACACTTGGTTAACACTGCCTGGGTTTTAATGGCTCTCATTGAAGCTGGCCAGGGTGAGAGA





GACCCAGCACCATTGCACCGTGCACCAAGGTTGCTAATGAATTCCCAATTGGAGAATGGCGATT





TCGTGCAACAGGAGATCATGGGAGTGTTCAATAAGAACTGCATGATCACATATGCTGCATACCG





AAACATCTTCCCCATTTGGGCGCTTGGAGAGTATTGCCATCGGGTTCTGACTGAAT




EPH protein
MEKIEHSTIATNGINMHVASAGSGPAVLFLHGFPELWYSWRHQLLYLSSLGYRAIAPDLRGFGD
428




TDAPPSPSSYTAHHIVGDLVGLLDQLGVDQVFLVGHDWGAMMAWYFCLFRPDRVKALVNLSVHF





TPRNPAISPLDGFRLMLGDDFYVCKFQEPGVAEADFGSVDTATMFKKFLTMRDPRPPIIPNGFR





SLATPEALPSWLTEEDIDYFAAKFAKTGFTGGFNYYRAIDLTWELTAPWSGSEIKVPTKFIVGD





LDLVYHFPGVKEYIHGGGFKKDVPFLEEVVVMEGAAHFINQEKADEINSLIYDFIKQF.




EPH gene sequence (codon optimized, E. coli)
ATGGAGAAGATTGAACACTCTACTATCGCTACTAATGGTATCAATATGCACGTTGCCTCTGCTG
429




GTTCTGGTCCAGCTGTTTTGTTTTTGCACGGTTTCCCAGAATTATGGTATTCCTGGAGACACCA





ATTGTTGTACTTGTCTTCTTTGGGTTACAGAGCTATTGCTCCAGATTTGAGAGGTTTCGGTGAC





ACCGATGCTCCACCATCTCCATCCTCCTACACCGCCCACCACATCGTTGGTGATTTGGTCGGTT





TGTTGGATCAATTAGGTGTCGATCAAGTCTTTTTGGTTGGTCATGATTGGGGTGCTATGATGGC





CTGGTACTTCTGTTTGTTCCGTCCAGACAGAGTCAAGGCCTTAGTTAATTTATCTGTCCACTTC





ACCCCACGTAACCCAGCTATCTCTCCATTAGATGGTTTCCGTTTGATGTTGGGTGATGATTTCT





ACGTTTGTAAGTTTCAAGAACCAGGTGTCGCTGAAGCCGATTTCGGTTCTGTTGATACTGCCAC





TATGTTTAAAAAGTTCTTGACCATGAGAGATCCACGTCCACCTATTATTCCAAACGGTTTCAGA





TCCTTGGCCACCCCAGAAGCTTTGCCATCCTGGTTGACTGAAGAGGATATCGATTACTTTGCTG





CCAAATTCGCTAAGACTGGTTTTACTGGTGGTTTCAACTACTACAGAGCTATCGACTTGACCTG





GGAGTTGACTGCTCCATGGTCCGGTTCTGAAATCAAGGTTCCAACTAAGTTTATTGTTGGTGAC





TTAGACTTGGTTTACCATTTCCCAGGTGTTAAGGAATACATTCACGGTGGTGGTTTCAAGAAGG





ACGTTCCATTCTTGGAAGAAGTTGTCGTCATGGAAGGTGCTGCTCATTTTATCAACCAAGAAAA





AGCTGACGAAATTAATTCTTTGATCTATGACTTCATTAAACAATTCTAG




CYP87D18 protein
MWTVVLGLATLFVAYYIHWINKWRDSKFNGVLPPGTMGLPLIGETIQLSRPSDSLDVHPFIQKK
430




VERYGPIFKTCLAGRPVVVSADAEFNNYIMLQEGRAVEMWYLDTLSKFFGLDTEWLKALGLIHK





YIRSITLNHFGAEALRERFLPFIEASSMEALHSWSTQPSVEVKNASALMVFRTSVNKMFGEDAK





KLSGNIPGKFTKLLGGFLSLPLNFPGTTYHKCLKDMKEIQKKLREVVDDRLANVGPDVEDFLGQ





ALKDKESEKFISEEFIIQLLFSISFASFESISTTLTLILKLLDEHPEVVKELEAEHEAIRKARA





DPDGPITWEEYKSMTFTLQVINETLRLGSVTPALLRKTVKDLQVKGYIIPEGWTIMLVTASRHR





DPKVYKDPHIFNPWRWKDLDSITIQKNFMPFGGGLRHCAGAEYSKVYLCTFLHILCTKYRWTKL





GGGRIARAHILSFEDGLHVKFTPKE.




CYP87D18 gene sequence
ATGTGGACTGTCGTGCTCGGTTTGGCGACGCTGTTTGTCGCCTACTACATCCATTGGATTAACA
431




AATGGAGAGATTCCAAGTTCAACGGAGTTCTGCCGCCGGGCACCATGGGTTTGCCGCTCATCGG





AGAGACGATTCAACTGAGTCGACCCAGTGACTCCCTCGACGTTCACCCTTTCATCCAGAAAAAA





GTTGAAAGATACGGGCCGATCTTCAAAACATGTCTGGCCGGAAGGCCGGTGGTGGTGTCGGCGG





ACGCAGAGTTCAACAACTACATAATGCTGCAGGAAGGAAGAGCAGTGGAAATGTGGTATTTGGA





TACGCTCTCCAAATTTTTCGGCCTCGACACCGAGTGGCTCAAAGCTCTGGGCCTCATCCACAAG





TACATCAGAAGCATTACTCTCAATCACTTCGGCGCCGAGGCCCTGCGGGAGAGATTTCTTCCTT





TTATTGAAGCATCCTCCATGGAAGCCCTTCACTCCTGGTCTACTCAACCTAGCGTCGAAGTCAA





AAATGCCTCCGCTCTCATGGTTTTTAGGACCTCGGTGAATAAGATGTTCGGTGAGGATGCGAAG





AAGCTATCGGGAAATATCCCTGGGAAGTTCACGAAGCTTCTAGGAGGATTTCTCAGTTTACCAC





TGAATTTTCCCGGCACCACCTACCACAAATGCTTGAAGGATATGAAGGAAATCCAGAAGAAGCT





AAGAGAGGTTGTAGACGATAGATTGGCTAATGTGGGCCCTGATGTGGAAGATTTCTTGGGGCAA





GCCCTTAAAGATAAGGAATCAGAGAAGTTCATTTCAGAGGAGTTCATCATCCAACTGTTGTTTT





CTATCAGTTTTGCTAGCTTTGAGTCCATCTCCACCACTCTTACTTTGATTCTCAAGCTCCTTGA





TGAACACCCAGAAGTAGTGAAAGAGTTGGAAGCTGAACACGAGGCGATTCGAAAAGCTAGAGCA





GATCCAGATGGACCAATTACTTGGGAAGAATACAAATCCATGACTTTTACATTACAAGTCATCA





ATGAAACCCTAAGGTTGGGGAGTGTCACACCTGCCTTGTTGAGGAAAACAGTTAAAGATCTTCA





AGTAAAAGGATACATAATCCCGGAAGGATGGACAATAATGCTTGTCACCGCTTCACGTCACAGA





GACCCAAAAGTCTATAAGGACCCTCATATCTTCAATCCATGGCGTTGGAAGGACTTGGACTCAA





TTACCATCCAAAAGAACTTCATGCCTTTTGGGGGAGGCTTAAGGCATTGTGCTGGTGCTGAGTA





CTCTAAAGTCTACTTGTGCACCTTCTTGCACATCCTCTGTACCAAATACCGATGGACCAAACTT





GGGGGAGGAAGGATTGCAAGAGCTCATATATTGAGTTTTGAAGATGGGTTACATGTGAAGTTCA





CACCCAAGGAATGA




AtCPR protein
MTSALYASDLFKQLKSIMGTDSLSDDVVLVIATTSLALVAGFVVLLWKKTTADRSGELKPLMIP
432




KSLMAKDEDDDLDLGSGKTRVSIFFGTQTGTAEGFAKALSEEIKARYEKAAVKDDYAADDDQYE





EKLKKETLAFFCVATYGDGEPTDNAARFYKWFTEENERDIKLQQLAYGVFALGNRQYEHFNKIG





IVLDEELCKKGAKRLIEVGLGDDDQSIEDDFNAWKESLWSELDKLLKDEDDKSVATPYTAVIPE





YRVVTHDPRFTTQKSMESNVANGNTTIDIHHPCRVDVAVQKELHTHESDRSCIHLEFDISRTGI





TYETGDHVGVYAENHVEIVEEAGKLLGHSLDLVFSIHADKEDGSPLESAVPPPFPGPCTLGTGL





ARYADLLNPPRKSALVALAAYATEPSEAEKLKHLTSPDGKDEYSQWIVASQRSLLEVMAAFPSA





KPPLGVFFAAIAPRLQPRYYSISSSPRLAPSRVHVTSALVYGPTPTGRIHKGVCSTWMKNAVPA





EKSHECSGAPIFIRASNFKLPSNPSTPIVMVGPGTGLAPFRGFLQERMALKEDGEELGSSLLFF





GCRNRQMDFIYEDELNNFVDQGVISELIMAFSREGAQKEYVQHKMMEKAAQVWDLIKEEGYLYV





CGDAKGMARDVHRTLHTIVQEQEGVSSSEAEAIVKKLQTEGRYLRDVW




AtCPR gene sequence
ATGACTTCTGCTTTGTATGCTTCCGATTTGTTTAAGCAGCTCAAGTCAATTATGGGGACAGATT
433




CGTTATCCGACGATGTTGTACTTGTGATTGCAACGACGTCTTTGGCACTAGTAGCTGGATTTGT





GGTGTTGTTATGGAAGAAAACGACGGCGGATCGGAGCGGGGAGCTGAAGCCTTTGATGATCCCT





AAGTCTCTTATGGCTAAGGACGAGGATGATGATTTGGATTTGGGATCCGGGAAGACTAGAGTCT





CTATCTTCTTCGGTACGCAGACTGGAACAGCTGAGGGATTTGCTAAGGCATTATCCGAAGAAAT





CAAAGCGAGATATGAAAAAGCAGCAGTCAAAGATGACTATGCTGCCGATGATGACCAGTATGAA





GAGAAATTGAAGAAGGAAACTTTGGCATTTTTCTGTGTTGCTACTTATGGAGATGGAGAGCCTA





CTGACAATGCTGCCAGATTTTACAAATGGTTTACGGAGGAAAATGAACGGGATATAAAGCTTCA





ACAACTAGCATATGGTGTGTTTGCTCTTGGTAATCGCCAATATGAACATTTTAATAAGATCGGG





ATAGTTCTTGATGAAGAGTTATGTAAGAAAGGTGCAAAGCGTCTTATTGAAGTCGGTCTAGGAG





ATGATGATCAGAGCATTGAGGATGATTTTAATGCCTGGAAAGAATCACTATGGTCTGAGCTAGA





CAAGCTCCTCAAAGACGAGGATGATAAAAGTGTGGCAACTCCTTATACAGCTGTTATTCCTGAA





TACCGGGTGGTGACTCATGATCCTCGGTTTACAACTCAAAAATCAATGGAATCAAATGTGGCCA





ATGGAAATACTACTATTGACATTCATCATCCCTGCAGAGTTGATGTTGCTGTGCAGAAGGAGCT





TCACACACATGAATCTGATCGGTCTTGCATTCATCTCGAGTTCGACATATCCAGGACGGGTATT





ACATATGAAACAGGTGACCATGTAGGTGTATATGCTGAAAATCATGTTGAAATAGTTGAAGAAG





CTGGAAAATTGCTTGGCCACTCTTTAGATTTAGTATTTTCCATACATGCTGACAAGGAAGATGG





CTCCCCATTGGAAAGCGCAGTGCCGCCTCCTTTCCCTGGTCCATGCACACTTGGGACTGGTTTG





GCAAGATACGCAGACCTTTTGAACCCTCCTCGAAAGTCTGCGTTAGTTGCCTTGGCGGCCTATG





CCACTGAACCAAGTGAAGCCGAGAAACTTAAGCACCTGACATCACCTGATGGAAAGGATGAGTA





CTCACAATGGATTGTTGCAAGTCAGAGAAGTCTTTTAGAGGTGATGGCTGCTTTTCCATCTGCA





AAACCCCCACTAGGTGTATTTTTTGCTGCAATAGCTCCTCGTCTACAACCTCGTTACTACTCCA





TCTCATCCTCGCCAAGATTGGCGCCAAGTAGAGTTCATGTTACATCCGCACTAGTATATGGTCC





AACTCCTACTGGTAGAATCCACAAGGGTGTGTGTTCTACGTGGATGAAGAATGCAGTTCCTGCG





GAGAAAAGTCATGAATGTAGTGGAGCCCCAATCTTTATTCGAGCATCTAATTTCAAGTTACCAT





CCAACCCTTCAACTCCAATCGTTATGGTGGGACCTGGGACTGGGCTGGCACCTTTTAGAGGTTT





TCTGCAGGAAAGGATGGCACTAAAAGAAGATGGAGAAGAACTAGGTTCATCTTTGCTCTTCTTT





GGGTGTAGAAATCGACAGATGGACTTTATATACGAGGATGAGCTCAATAATTTTGTTGATCAAG





GCGTAATATCTGAGCTCATCATGGCATTCTCCCGTGAAGGAGCTCAGAAGGAGTATGTTCAACA





TAAGATGATGGAGAAGGCAGCACAAGTTTGGGATCTAATAAAGGAAGAAGGATATCTCTATGTA





TGCGGTGATGCTAAGGGCATGGCGAGGGACGTCCACCGAACTCTACACACCATTGTTCAGGAGC





AGGAAGGTGTGAGTTCGTCAGAGGCAGAGGCTATAGTTAAGAAACTTCAAACCGAAGGAAGATA





CCTCAGAGATGTCTGGTGA




AGY15763.1 protein
MWKVPKFIKQSYLVFLLALLLYSSFGFSFSRTEATTSTGALGPVTPKDTIYQIVTDRFFDGDPS
434




NNKPPGFDPTLFDDPDGNNQGNGKDLKLYQGGDFQGIIDKIPYLKNMGITAVWISAPYENRDTV





IEDYQSDGSINRWTSFHGYHARNYFATNKHFGTMKDFIRLRDALHQNGIKLVIDFVSNHSSRWQ





NPTLNFAPEDGKLYEPDKDANGNYVFDANGEPADYNGDGKVENLLADPHNDVNGFFHGLGDRGN





DTSRFGYRYKDLGSLADYSQENALVVEHLEKAAKFWKSKGIDGFRHDATLHMNPAFVKGFKDAI





DSDAGGPVTHFGEFFIGRPDPKYDEYRTFPERTGVNNLDFEYFRAATNAFGNFSETMSSFGDMM





IKTSNDYIYENQTVTFLDNHDVTRFRYIQPNDKPYHAALAVLMTSRGIPNIYYGTEQYLMPSDS





SDIAGRMFMQTSTNFDENTTAYKVIQKLSNLRKNNEAIAYGTTEILYSTNDVLVFKRQFYDKQV





IVAVNRQPDQTFTIPELDTTLPVGTYSDVLGGLLYGSSMSVNNVNGQNKISSFTLSGGEVNVWS





YNPSLGTLTPRIGDVISTMGRPGNTVYIYGTGLGGSVTVKFGSTVATVVSNSDQMIEAIVPNTN





PGIQNITVTKGSVTSDPFRYEVLSGDQVQVIFHVNATTNWGENIYVVGNIPELGSWDPNQSSEA





MLNPNYPEWFLPVSVPKGATFEFKFIKKDNNGNVIWESRSNRVFTAPNSSTGTIDTPLYFWDN




AGY15764.1 protein
TTSTGALGPVTPKDTIYQIVTDRFFDGDPSNNKPPGFDPTLFDDPDGNNQGNGKDLKLYQGGDF
435




QGIIDKIPYLKNMGITAVWISAPYENRDTVIEDYQSDGSINRWTSFHGYHARNYFATNKHFGTM





KDFIRLRDALHQNGIKLVIDFVSNHSSRWQNPTLNFAPEDGKLYEPDKDANGNYVFDANGEPAD





YNGDGKVENLLADPHNDVNGFFHGLGDRGNDTSRFGYRYKDLGSLADYSQENALVVEHLEKAAK





FWKSKGIDGFRHDATLHMNPAFVKGFKDAIDSDAGGPVTHFGEFFIGRPDPKYDEYRTFPERTG





VNNLDFEYFRAATNAFGNFSETMSSFGDMMIKTSNDYIYENQTVTFLDNHDVTRFRYIQPNDKP





YHAALAVLMTSRGIPNIYYGTEQYLMPSDSSDIAGRMFMQTSTNFDENTTAYKVIQKLSNLRKN





NEAIAYGTTEILYSTNDVLVFKRQFYDKQVIVAVNRQPDQTFTIPELDTTLPVGTYSDVLGGLL





YGSSMSVNNVNGQNKISSFTLSGGEVNVWSYNPSLGTLTPRIGDVISTMGRPGNTVYIYGTGLG





GSVTVKFGSTVATVVSNSDQMIEAIVPNTNPGIQNITVTKGSVTSDPFRYEVLSGDQVQVIFHV





NATTNWGENIYVVGNIPELGSWDPNQSSEAMLNPNYPEWFLPVSVPKGATFEFKFIKKDNNGNV





IWESRSNRVFTAPNSSTGTIDTPLYFWDN




Glycosyltransferase (311)
MDSGYSSSYAAAAGMHVVICPWLAFGHLLPCLDLAQRLASRGHRVSFVSTPRNISRLPPVRPAL
436




APLVAFVALPLPRVEGLPDGAESTNDVPHDRPDMVELHRRAFDGLAAPFSEFLGTACADWVIVD





VFHHWAAAAALEHKVPCAMMLLGSAHMIASIADRRLERAETESPAAAGQGRPAAAPTFEVARMK





LIRTKGSSGMSLAERFSLTLSRSSLVVGRSCVEFEPETVPLLSTLRGKPITFLGLMPPLHEGRR





EDGEDATVRWLDAQPAKSVVYVALGSEVPLGVEKVHELALGLELAGTRFLWALRKPTGVSDADL





LPAGFEERTRGRGVVATRWVPQMSILAHAAVGAFLTHCGWNSTIEGLMFGHPLIMLPIFGDQGP





NARLIEAKNAGLQVARNDGDGSFDREGVAAAIRAVAVEEESSKVFQAKAKKLQEIVADMACHER





YIDGFIQQLRSYKDAAALE




Glycosyltransferase (311) (gDNA native)
ATGGACTCCGGCTACTCCTCCTCCTACGCCGCCGCCGCCGGGATGCACGTCGTGATCTGCCCGT
437




GGCTCGCCTTCGGCCACCTGCTCCCGTGCCTCGACCTCGCCCAGCGCCTCGCGTCGCGGGGCCA




CCGCGTGTCGTTCGTCTCCACGCCGCGGAACATATCCCGCCTCCCGCCGGTGCGCCCCGCGCTC





GCGCCGCTCGTCGCCTTCGTGGCGCTGCCGCTCCCGCGCGTCGAGGGGCTCCCCGACGGCGCCG





AGTCCACCAACGACGTCCCCCACGACAGGCCGGACATGGTCGAGCTCCACCGGAGGGCCTTCGA





CGGGCTCGCCGCGCCCTTCTCGGAGTTCTTGGGCACCGCGTGCGCCGACTGGGTCATCGTCGAC





GTCTTCCACCACTGGGCCGCAGCCGCCGCTCTCGAGCACAAGGTGCCATGTGCAATGATGTTGT





TGGGCTCTGCACATATGATCGCTTCCATAGCAGACAGACGGCTCGAGCGCGCGGAGACAGAGTC





GCCTGCGGCTGCCGGGCAGGGACGCCCAGCGGCGGCGCCAACGTTCGAGGTGGCGAGGATGAAG





TTGATACGAACCAAAGGCTCATCGGGAATGTCCCTCGCCGAGCGCTTCTCCTTGACGCTCTCGA





GGAGCAGCCTCGTCGTCGGGCGGAGCTGCGTGGAGTTCGAGCCGGAGACCGTCCCGCTCCTGTC





GACGCTCCGCGGTAAGCCTATTACCTTCCTTGGCCTTATGCCGCCGTTGCATGAAGGCCGCCGC





GAGGACGGCGAGGATGCCACCGTCCGCTGGCTCGACGCGCAGCCGGCCAAGTCCGTCGTGTACG





TCGCGCTAGGCAGCGAGGTGCCACTGGGAGTGGAGAAGGTCCACGAGCTCGCGCTCGGGCTGGA





GCTCGCCGGGACGCGCTTCCTCTGGGCTCTTAGGAAGCCCACTGGCGTCTCCGACGCCGACCTC





CTCCCCGCCGGCTTCGAGGAGCGCACGCGCGGCCGCGGCGTCGTGGCGACGAGATGGGTTCCTC





AGATGAGCATACTGGCGCACGCCGCCGTGGGCGCGTTCCTGACCCACTGCGGCTGGAACTCGAC





CATCGAGGGGCTCATGTTCGGCCACCCGCTTATCATGCTGCCGATCTTCGGCGACCAGGGACCG





AACGCGCGGCTAATCGAGGCGAAGAACGCCGGATTGCAGGTGGCAAGAAACGACGGCGATGGAT





CGTTCGACCGAGAAGGCGTCGCGGCGGCGATTCGTGCAGTCGCGGTGGAGGAAGAAAGCAGCAA





AGTGTTTCAAGCCAAAGCCAAGAAGCTGCAGGAGATCGTCGCGGACATGGCCTGCCATGAGAGG





TACATCGACGGATTCATTCAGCAATTGAGATCTTACAAGGATTGA




Glycosyltransferase (311) (gDNA codon optimized, E. coli)
ATGGATAGCGGTTATAGCAGCAGCTATGCAGCAGCAGCCGGTATGCATGTTGTTATTTGTCCGT
438




GGCTGGCATTTGGTCATCTGCTGCCGTGTCTGGATCTGGCACAGCGTCTGGCAAGCCGTGGTCA





TCGTGTTAGCTTTGTTAGCACACCGCGTAATATTAGCCGTCTGCCTCCGGTTCGTCCGGCACTG





GCACCGCTGGTTGCATTTGTTGCACTGCCGCTGCCTCGTGTTGAAGGTCTGCCGGATGGTGCAG





AAAGCACCAATGATGTTCCGCATGATCGTCCGGATATGGTTGAACTGCATCGTCGTGCATTTGA





TGGTCTGGCAGCACCGTTTAGCGAATTTCTGGGCACCGCATGTGCAGATTGGGTTATTGTTGAT





GTTTTTCATCATTGGGCAGCCGCAGCAGCACTGGAACATAAAGTTCCGTGTGCAATGATGCTGC





TGGGTAGCGCACATATGATTGCAAGCATTGCAGATCGTCGTCTGGAACGTGCAGAAACCGAAAG





TCCTGCGGCAGCAGGTCAGGGTCGTCCTGCAGCCGCACCGACCTTTGAAGTTGCACGTATGAAA





CTGATTCGTACCAAAGGTAGCAGCGGTATGAGCCTGGCAGAACGTTTTAGTCTGACCCTGAGCC





GTAGCAGCCTGGTTGTTGGTCGTAGCTGTGTTGAATTTGAACCGGAAACCGTTCCGCTGCTGAG





CACCCTGCGTGGTAAACCGATTACCTTTCTGGGTCTGATGCCTCCGCTGCATGAAGGTCGTCGC





GAAGATGGTGAAGATGCAACCGTTCGTTGGCTGGATGCACAGCCTGCAAAAAGCGTTGTTTATG





TTGCCCTGGGTAGTGAAGTTCCGCTGGGTGTTGAAAAAGTGCATGAACTGGCACTGGGTTTAGA





ACTGGCAGGCACCCGTTTTCTGTGGGCACTGCGTAAACCGACCGGTGTTAGTGATGCCGATCTG





CTTCCGGCAGGTTTTGAAGAACGTACCCGTGGTCGTGGTGTTGTTGCAACCCGTTGGGTTCCGC





AGATGAGCATTCTGGCACATGCAGCAGTGGGTGCATTTCTGACCCATTGTGGTTGGAATAGCAC





CATTGAAGGCCTGATGTTTGGCCATCCGCTGATTATGCTGCCGATTTTTGGTGATCAGGGTCCG





AATGCACGTCTGATTGAAGCAAAAAATGCAGGTCTGCAGGTTGCCCGTAATGATGGTGATGGTA





GCTTTGATCGTGAAGGTGTTGCAGCAGCCATTCGTGCAGTTGCAGTTGAAGAAGAAAGCAGCAA





AGTTTTTCAGGCCAAAGCCAAAAAACTGCAAGAAATTGTTGCAGATATGGCCTGCCATGAACGT





TATATTGATGGTTTTATTCAGCAGCTGCGTAGCTACAAAGAT




UGT76G1 protein
MENKTETTVRRRRRIILFPVPFQGHINPILQLANVLYSKGFSITIFHTNFNKPKTSNYPHFTFR
439




FILDNDPQDERISNLPTHGPLAGMRIPIINEHGADELRRELELLMLASEEDEEVSCLITDALWY





FAQSVADSLNLRRLVLMTSSLFNFHAHVSLPQFDELGYLDPDDKTRLEEQASGFPMLKVKDIKS





AYSNWQILKEILGKMIKQTKASSGVIWNSFKELEESELETVIREIPAPSFLIPLPKHLTASSSS





LLDHDRTVFQWLDQQPPSSVLYVSFGSTSEVDEKDFLEIARGLVDSKQSFLWVVRPGFVKGSTW





VEPLPDGFLGERGRIVKWVPQQEVLAHGAIGAFWTHSGWNSTLESVCEGVPMIFSDFGLDQPLN





ARYMSDVLKVGVYLENGWERGEIANAIRRVMVDEEGEYIRQNARVLKQKADVSLMKGGSSYESL





ESLVSYISSL




UGT76G1 gene
ATGGAAAATAAAACGGAGACCACCGTTCGCCGGCGCCGGAGAATAATATTATTCCCGGTACCAT
440



sequence
TTCAAGGCCACATTAACCCAATTCTTCAGCTAGCCAATGTGTTGTACTCTAAAGGATTCAGTAT





CACCATCTTTCACACCAACTTCAACAAACCCAAAACATCTAATTACCCTCACTTCACTTTCAGA





TTCATCCTCGACAACGACCCACAAGACGAACGCATTTCCAATCTACCGACTCATGGTCCGCTCG





CTGGTATGCGGATTCCGATTATCAACGAACACGGAGCTGACGAATTACGACGCGAACTGGAACT





GTTGATGTTAGCTTCTGAAGAAGATGAAGAGGTATCGTGTTTAATCACGGATGCTCTTTGGTAC





TTCGCGCAATCTGTTGCTGACAGTCTTAACCTCCGACGGCTTGTTTTGATGACAAGCAGCTTGT





TTAATTTTCATGCACATGTTTCACTTCCTCAGTTTGATGAGCTTGGTTACCTCGATCCTGATGA





CAAAACCCGTTTGGAAGAACAAGCGAGTGGGTTTCCTATGCTAAAAGTGAAAGACATCAAGTCT





GCGTATTCGAACTGGCAAATACTCAAAGAGATATTAGGGAAGATGATAAAACAAACAAAAGCAT





CTTCAGGAGTCATCTGGAACTCATTTAAGGAACTCGAAGAGTCTGAGCTCGAAACTGTTATCCG





TGAGATCCCGGCTCCAAGTTTCTTGATACCACTCCCCAAGCATTTGACAGCCTCTTCCAGCAGC





TTACTAGACCACGATCGAACCGTTTTTCAATGGTTAGACCAACAACCGCCAAGTTCGGTACTGT





ATGTTAGTTTTGGTAGTACTAGTGAAGTGGATGAGAAAGATTTCTTGGAAATAGCTCGTGGGTT





GGTTGATAGCAAGCAGTCGTTTTTATGGGTGGTTCGACCTGGGTTTGTCAAGGGTTCGACGTGG





GTCGAACCGTTGCCAGATGGGTTCTTGGGTGAAAGAGGACGTATTGTGAAATGGGTTCCACAGC





AAGAAGTGCTAGCTCATGGAGCAATAGGCGCATTCTGGACTCATAGCGGATGGAACTCTACGTT





GGAAAGCGTTTGTGAAGGTGTTCCTATGATTTTCTCGGATTTTGGGCTCGATCAACCGTTGAAT





GCTAGATACATGAGTGATGTTTTGAAGGTAGGGGTGTATTTGGAAAATGGGTGGGAAAGAGGAG





AGATAGCAAATGCAATAAGAAGAGTTATGGTGGATGAAGAAGGAGAATACATTAGACAGAATGC





AAGAGTTTTGAAACAAAAGGCAGATGTTTCTTTGATGAAGGGTGGTTCGTCTTACGAATCATTA





GAGTCTCTAGTTTCTTACATTTCATCGTTGTAA




UGT73C5 protein
MSPILGYWKIKGLVQPTRLLLEYLEEKYEEHLYERDEGDKWRNKKFELGLEFPNLPYYIDGDVK
441




LTQSMAIIRYIADKHNMLGGCPKERAEISMLEGAVLDIRYGVSRIAYSKDFETLKVDFLSKLPE





MLKMFEDRLCHKTYLNGDHVTHPDFMLYDALDVVLYMDPMCLDAFPKLVCFKKRIEAIPQIDKY





LKSSKYIAWPLQGWQATFGGGDHPPKSDLVPRGSVSETTKSSPLHFVLFPFMAQGHMIPMVDIA





RLLAQRGVIITIVTTPHNAARFKNVLNRAIESGLPINLVQVKFPYLEAGLQEGQENIDSLDTME





RMIPFFKAVNFLEEPVQKLIEEMNPRPSCLISDFCLPYTSKIAKKFNIPKILFHGMGCFCLLCM





HVLRKNREILDNLKSDKELFTVPDFPDRVEFTRTQVPVETYVPAGDWKDIFDGMVEANETSYGV





IVNSFQELEPAYAKDYKEVRSGKAWTIGPVSLCNKVGADKAERGNKSDIDQDECLKWLDSKKHG





SVLYVCLGSICNLPLSQLKELGLGLEESQRPFIWVIRGWEKYKELVEWFSESGFEDRIQDRGLL





IKGWSPQMLILSHPSVGGFLTHCGWNSTLEGITAGLPLLTWPLFADQFCNEKLVVEVLKAGVRS





GVEQPMKWGEEEKIGVLVDKEGVKKAVEELMGESDDAKERRRRAKELGDSAHKAVEEGGSSHSN





ISFLLQDIMELAEPNNAAAS




UGT73C5 gene
ATGTCCCCTATACTAGGTTATTGGAAAATTAAGGGCCTTGTGCAACCCACTCGACTTCTTTTGG
442



sequence
AATATCTTGAAGAAAAATATGAAGAGCATTTGTATGAGCGCGATGAAGGTGATAAATGGCGAAA





CAAAAAGTTTGAATTGGGTTTGGAGTTTCCCAATCTTCCTTATTATATTGATGGTGATGTTAAA





TTAACACAGTCTATGGCCATCATACGTTATATAGCTGACAAGCACAACATGTTGGGTGGTTGTC





CAAAAGAGCGTGCAGAGATTTCAATGCTTGAAGGAGCGGTTTTGGATATTAGATACGGTGTTTC





GAGAATTGCATATAGTAAAGACTTTGAAACTCTCAAAGTTGATTTTCTTAGCAAGCTACCTGAA





ATGCTGAAAATGTTCGAAGATCGTTTATGTCATAAAACATATTTAAATGGTGATCATGTAACCC





ATCCTGACTTCATGTTGTATGACGCTCTTGATGTTGTTTTATACATGGACCCAATGTGCCTGGA





TGCGTTCCCAAAATTAGTTTGTTTTAAAAAACGTATTGAAGCTATCCCACAAATTGATAAGTAC





TTGAAATCCAGCAAGTATATAGCATGGCCTTTGCAGGGCTGGCAAGCCACGTTTGGTGGTGGCG





ACCATCCTCCAAAATCGGATCTGGTTCCGCGTGGATCCGTTTCCGAAACAACCAAATCTTCTCC





ACTTCACTTTGTTCTCTTCCCTTTCATGGCTCAAGGCCACATGATTCCCATGGTTGATATTGCA





AGGCTCTTGGCTCAGCGTGGTGTGATCATAACAATTGTCACGACGCCTCACAATGCAGCGAGGT





TCAAGAATGTCCTAAACCGTGCCATTGAGTCTGGCTTGCCCATCAACTTAGTGCAAGTCAAGTT





TCCATATCTAGAAGCTGGTTTGCAAGAAGGACAAGAGAATATCGATTCTCTTGACACAATGGAG





CGGATGATACCTTTCTTTAAAGCGGTTAACTTTCTCGAAGAACCAGTCCAGAAGCTCATTGAAG





AGATGAACCCTCGACCAAGCTGTCTAATTTCTGATTTTTGTTTGCCTTATACAAGCAAAATCGC





CAAGAAGTTCAATATCCCAAAGATCCTCTTCCATGGCATGGGTTGCTTTTGTCTTCTGTGTATG





CATGTTTTACGCAAGAACCGTGAGATCTTGGACAATTTAAAGTCAGATAAGGAGCTTTTCACTG





TTCCTGATTTTCCTGATAGAGTTGAATTCACAAGAACGCAAGTTCCGGTAGAAACATATGTTCC





AGCTGGAGACTGGAAAGATATCTTTGATGGTATGGTAGAAGCGAATGAGACATCTTATGGTGTG





ATCGTCAACTCATTTCAAGAGCTCGAGCCTGCTTATGCCAAAGACTACAAGGAGGTAAGGTCCG





GTAAAGCATGGACCATTGGACCCGTTTCCTTGTGCAACAAGGTAGGAGCCGACAAAGCAGAGAG





GGGAAACAAATCAGACATTGATCAAGATGAGTGCCTTAAATGGCTCGATTCTAAGAAACATGGC





TCGGTGCTTTACGTTTGTCTTGGAAGTATCTGTAATCTTCCTTTGTCTCAACTCAAGGAGCTGG





GACTAGGCCTAGAGGAATCCCAAAGACCTTTCATTTGGGTCATAAGAGGTTGGGAGAAGTACAA





AGAGTTAGTTGAGTGGTTCTCGGAAAGCGGCTTTGAAGATAGAATCCAAGATAGAGGACTTCTC





ATCAAAGGATGGTCCCCTCAAATGCTTATCCTTTCACATCCATCAGTTGGAGGGTTCCTAACAC





ACTGTGGTTGGAACTCGACTCTTGAGGGGATAACTGCTGGTCTACCGCTACTTACATGGCCGCT





ATTCGCAGACCAATTCTGCAATGAGAAATTGGTCGTTGAGGTACTAAAAGCCGGTGTAAGATCC





GGGGTTGAACAGCCTATGAAATGGGGAGAAGAGGAGAAAATAGGAGTGTTGGTGGATAAAGAAG





GAGTGAAGAAGGCAGTGGAAGAATTAATGGGTGAGAGTGATGATGCAAAAGAGAGAAGAAGAAG





AGCCAAAGAGCTTGGAGATTCAGCTCACAAGGCTGTGGAAGAAGGAGGCTCTTCTCATTCTAAC





ATCTCTTTCTTGCTACAAGACATAATGGAACTGGCAGAACCCAATAATGCGGCCGCATCGTGA




UGT73C5 gene
ATGAGGCATGGATCCGTTAGCGAAACCACCAAAAGCAGTCCGCTGCATTTTGTTCTGTTTCCGT
443



sequence (Codon
TTATGGCACAGGGTCATATGATTCCGATGGTTGATATTGCACGTCTGCTGGCACAGCGTGGTGT




optimized, E.
GATTATTACCATTGTTACCACACCGCATAATGCAGCACGCTTTAAAAACGTTCTGAATCGTGCA





coli)

ATTGAAAGCGGTCTGCCGATTAATCTGGTTCAGGTTAAATTTCCGTATCTGGAAGCAGGTCTGC





AAGAAGGTCAAGAAAATATTGATAGCCTGGATACCATGGAACGCATGATTCCGTTTTTCAAAGC





CGTGAATTTTCTGGAAGAACCGGTGCAGAAACTGATCGAAGAAATGAATCCGCGTCCGAGCTGT





CTGATTAGCGATTTTTGTCTGCCGTATACCAGCAAAATCGCCAAAAAATTCAACATCCCGAAAA





TCCTGTTTCATGGTATGGGTTGTTTTTGcctgctgtgtatgcatgttcTGCGTAAAAATCGTGA





AATCCTGGATAACCTGAAAAGCGATAAAGAACTGTTTACCGTTCCGGATTTTCCGGATCGTGTG





GAATTTACCCGTACACAGGTTCCGGTTGAAACCTATGTTCCGGCAGGCGATTGGAAAGATATTT





TTGATGGTATGGTGGAAGCCAACGAAACCAGCTATGGTGTTATTGTGAATAGCTTTCAAGAACT





GGAACCGGCATATGCGAAAGATTACAAAGAAGTTCGTAGCGGTAAAGCATGGACCATTGGTCCG





GTTAGCCTGTGTAATAAAGTTGGTGCAGATAAAGCAGAACGCGGTAATAAAAGTGATATCGATC





AGGATGAATGCCTGAAATGGCTGGATAGCAAAAAACATGGTAGCGTTCTGTATGTTTGTCTGGG





TAGCATTTGCAATCTGCCGCTGAGCCAGCTGAAAGAATTAGGTCTGGGTTTAGAAGAAAGCCAG





CGTCCGTTTATTTGGGTTATTCGTGGTTGGGAGAAATACAAAGAACTGGTTGAATGGTTTAGCG





AAAGCGGTTTTGAAGATCGTATTCAGGATCGTGGCCTGCTGATTAAAGGTTGGAGTCCGCAGAT





GCTGATTCTGAGCCATCCGAGCGTTGGTGGCTTTCTGACCCATTGTGGTTGGAATAGCACCCTG





GAAGGTATTACAGCTGGCCTGCCGCTGCTGACCTGGCCTCTGTTTGCAGATCAGTTTTGTAATG





AAAAACTGGTGGTGGAAGTTCTGAAAGCCGGTGTGCGTAGCGGTGTTGAACAGCCGATGAAATG





GGGTGAAGAAGAAAAAATTGGCGTCCTGGTTGATAAAGAAGGTGTTAAAAAAGCCGTGGAAGAA





CTGATGGGTGAAAGTGATGATGCAAAAGAACGTCGTCGTCGTGCAAAAGAGCTGGGCGATAGCG





CACATAAAGCAGTTGAAGAAGGTGGTAGCAGCCATAGCAATATTAGCTTTCTGCTGCAGGATAT





TATGGAACTGGCAGAACCGAATAACTAAGCGGCCGCTGAA




UGT73C6 protein
MAFEKNNEPFPLHFVLFPFMAQGHMIPMVDIARLLAQRGVLITIVTTPHNAARFKNVLNRAIES
444




GLPINLVQVKFPYQEAGLQEGQENMDLLTTMEQITSFFKAVNLLKEPVQNLIEEMSPRPSCLIS





DMCLSYTSEIAKKFKIPKILFHGMGCFCLLCVNVLRKNREILDNLKSDKEYFIVPYFPDRVEFT





RPQVPVETYVPAGWKEILEDMVEADKTSYGVIVNSFQELEPAYAKDFKEARSGKAWTIGPVSLC





NKVGVDKAERGNKSDIDQDECLEWLDSKEPGSVLYVCLGSICNLPLSQLLELGLGLEESQRPFI





WVIRGWEKYKELVEWFSESGFEDRIQDRGLLIKGWSPQMLILSHPSVGGFLTHCGWNSTLEGIT





AGLPMLTWPLFADQFCNEKLVVQILKVGVSAEVKEVMKWGEEEKIGVLVDKEGVKKAVEELMGE





SDDAKERRRRAKELGESAHKAVEEGGSSHSNITFLLQDIMQLAQSNN




UGT73C6 (gDNA,
ATGGCTTTCGAAAAAAACAACGAACCTTTTCCTCTTCACTTTGTTCTCTTCCCTTTCATGGCTC
445



native)
AAGGCCACATGATTCCCATGGTTGATATTGCAAGGCTCTTGGCTCAGCGAGGTGTGCTTATAAC





AATTGTCACGACGCCTCACAATGCAGCAAGGTTCAAGAATGTCCTAAACCGTGCCATTGAGTCT





GGTTTGCCCATCAACCTAGTGCAAGTCAAGTTTCCATATCAAGAAGCTGGTCTGCAAGAAGGAC





AAGAAAATATGGATTTGCTTACCACGATGGAGCAGATAACATCTTTCTTTAAAGCGGTTAACTT





ACTCAAAGAACCAGTCCAGAACCTTATTGAAGAGATGAGCCCGCGACCAAGCTGTCTAATCTCT





GATATGTGTTTGTCGTATACAAGCGAAATCGCCAAGAAGTTCAAAATACCAAAGATCCTCTTCC





ATGGCATGGGTTGCTTTTGTCTTCTGTGTGTTAACGTTCTGCGCAAGAACCGTGAGATCTTGGA





CAATTTAAAGTCTGATAAGGAGTACTTCATTGTTCCTTATTTTCCTGATAGAGTTGAATTCACA





AGACCTCAAGTTCCGGTGGAAACATATGTTCCTGCAGGCTGGAAAGAGATCTTGGAGGATATGG





TAGAAGCGGATAAGACATCTTATGGTGTTATAGTCAACTCATTTCAAGAGCTCGAACCTGCGTA





TGCCAAAGACTTCAAGGAGGCAAGGTCTGGTAAAGCATGGACCATTGGACCTGTTTCCTTGTGC





AACAAGGTAGGAGTAGACAAAGCAGAGAGGGGAAACAAATCAGATATTGATCAAGATGAGTGCC





TTGAATGGCTCGATTCTAAGGAACCGGGATCTGTGCTCTACGTTTGCCTTGGAAGTATTTGTAA





TCTTCCTCTGTCTCAGCTCCTTGAGCTGGGACTAGGCCTAGAGGAATCCCAAAGACCTTTCATC





TGGGTCATAAGAGGTTGGGAGAAATACAAAGAGTTAGTTGAGTGGTTCTCGGAAAGCGGCTTTG





AAGATAGAATCCAAGATAGAGGACTTCTCATCAAAGGATGGTCCCCTCAAATGCTTATCCTTTC





ACATCCTTCTGTTGGAGGGTTCTTAACGCACTGCGGATGGAACTCGACTCTTGAGGGGATAACT





GCTGGTCTACCAATGCTTACATGGCCACTATTTGCAGACCAATTCTGCAACGAGAAACTGGTCG





TACAAATACTAAAAGTCGGTGTAAGTGCCGAGGTTAAAGAGGTCATGAAATGGGGAGAAGAAGA





GAAGATAGGAGTGTTGGTGGATAAAGAAGGAGTGAAGAAGGCAGTGGAAGAACTAATGGGTGAG





AGTGATGATGCAAAAGAGAGAAGAAGAAGAGCCAAAGAGCTTGGAGAATCAGCTCACAAGGCTG





TGGAAGAAGGAGGCTCCTCTCATTCTAATATCACTTTCTTGCTACAAGACATAATGCAACTAGC





ACAGTCCAATAAT




SgCbQ protein
MWRLKVGAESVGENDEKWLKSISNHLGRQVWEFCPDAGTQQQLLQVHKARKAFHDDRFHRKQSS
446




DLFITIQYGKEVENGGKTAGVKLKEGEEVRKEAVESSLERALSFYSSIQTSDGNWASDLGGPMF





LLPGLVIALYVTGVLNSVLSKHHRQEMCRYVYNHQNEDGGWGLHIEGPSTMFGSALNYVALRLL





GEDANAGAMPKARAWILDHGGATGITSWGKLWLSVLGVYEWSGNNPLPPEFWLFPYFLPFHPGR





MWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYAVPYHEIDWNKSRNTCAKEDLYYPHPKM





QDILWGSLHHVYEPLFTRWPAKRLREKALQTAMQHIHYEDENTRYICLGPVNKVLNLLCCWVED





PYSDAFKLHLQRVHDYLWVAEDGMKMQGYNGSQLWDTAFSIQAIVSTKLVDNYGPTLRKAHDFV





KSSQIQQDCPGDPNVWYRHIHKGAWPFSTRDHGWLISDCTAEGLKAALMLSKLPSETVGESLER





NRLCDAVNVLLSLQNDNGGFASYELTRSYPWLELINPAETFGDIVIDYPYVECTSATMEALTLF





KKLHPGHRTKEIDTAIVRAANFLENMQRTDGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCLA





IRKACDFLLSKELPGGGWGESYLSCQNKVYTNLEGNRPHLVNTAWVLMALIEAGQAERDPTPLH





RAARLLINSQLENGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYCHRVLTE




glycoside
MNKSVAPLLLAASILYGGAAAQQTVWGQCGGIGWSGPTNCAPGSACSTLNPYYAQCIPGATTIT
447



hydrolase family 5
TSTRPPSGPTTTTRATSTSSSTPPTSSGVRFAGVNIAGFDFGCTTDGTCVTSKVYPPLKNFTGS




protein
NNYPDGIGQMQHFVNDDGMTIFRLPVGWQYLVNNNLGGNLDSTSISKYDQLVQGCLSLGAYCIV




[Trichoderma
DIHNYARWNGGIIGQGGPTNAQFTSLWSQLASKYASQSRVWFGIMNEPHDVNINTWAATVQEVV





reesei QM6a]

TAIRNAGATSQFISLPGNDWQSAGAFISDGSAAALSQVTNPDGSTTNLIFDVHKYLDSDNSGTH




GenBank: EGR512.1
AECTTNNIDGAFSPLATWLRQNNRQAILTETGGGNVQSCIQDMCQQIQYLNQNSDVYLGYVGWG





AGSFDSTYVLTETPTGSGNSWTDTSLVSSCLARK




glycoside
MKSCAILAALGCLAGSVLGHGQVQNFTINGQYNQGFILDYYYQKQNTGHFPNVAGWYAEDLDLG
448



hydrolase family
FISPDQYTTPDIVCHKNAAPGAISATAAAGSNIVFQWGPGVWPHPYGPIVTYVAECSGSCTTVN




61 protein [T.
KNNLRWVKIQEAGINYNTQVWAQQDLINQGNKWTVKIPSSLRPGNYVFRHELLAAHGASSANGM





reesei QM6a]

QNYPQCVNIAVTGSGTKALPAGTPATQLYKPTDPGILFNPYTTITSYTIPGPALWQG




EGR50392.1





GI:340520155





glycoside
SAHTTFTTLFIDKKNQGDGTCVRMPYDDKTATNPVKPITSSDMACGRNGGDPVPFICSAKKGSL
449



hydrolase family
LTFEFRLWPDAQQPGSIDPGHLGPCAVYLKKVDNMFSDSAAGGGWFKIWEDGYDSKTQKWCVDR




61 protein,
LVKNNGLLSVRLPRGLPAGYYIVRPEILALHWAAHRDDPQFYLGCAQIFVDSDVRGPLEIPRRQ




partial
QATIPGYVNAKTPGLTFDIYQDKLPPYPMPGPKVYIPPAKGNKPNQDLNAGRLVQTDGLIPKDC




[Trichoderma
LIKKANWCGRPVEPYSSARMCWRAVNDCYAQSKKCRESSPPIGLTNCDRWSDHCGKMDALCEQE





reesei QM6a]

KYKGPP




EGR49821.1





GI:340519583





glycoside
MAPSVTLPLTTAILAIARLVAAQQPGTSTPEVHPKLTTYKCTKSGGCVAQDTSVVLDWNYRWMH
450



hydrolase family 7
DANYNSCTVNGGVNTTLCPDEATCGKNCFIEGVDYAASGVTTSGSSLTMNQYMPSSSGGYSSVS




protein [T. reesei
PRLYLLDSDGEYVMLKLNGQELSFDVDLSALPCGENGSLYLSQMDENGGANQYNTAGANYGSGY




QM6a]
CDAQCPVQTWRNGTLNTSHQGFCCNEMDILEGNSRANALTPHSCTATACDSAGCGFNPYGSGYK




EGR48251.1
SYYGPGDTVDTSKTFTIITQFNTDNGSPSGNLVSITRKYQQNGVDIPSAQPGGDTISSCPSASA




GI:340518009
YGGLATMGKALSSGMVLVFSIWNDNSQYMNWLDSGNAGPCSSTEGNPSNILANNPNTHVVFSNI





RWGDIGSTTNSTAPPPPPASSTTFSTTRRSSTTSSSPSCTQTHWGQCGGIGYSGCKTCTSGTTC





QYSNDYYSQCL




glycoside
MKATLVLGSLIVGAVSAYKATTTRYYDGQEGACGCGSSSGAFPWQLGIGNGVYTAAGSQALFDT
451



hydrolase family
AGASWCGAGCGKCYQLTSTGQAPCSSCGTGGAAGQSIIVMVTNLCPNNGNAQWCPVVGGTNQYG




45 protein [T.
YSYHFDIMAQNEIFGDNVVVDFEPIACPGQAASDWGTCLCVGQQETDPTPVLGNDTGSTPPGSS





reesei QM6a]

PPATSSSPPSGGGQQTLYGQCGGAGWTGPTTCQAPGTCKVQNQWYSQCLP




EGR47058.1





GI:340516811





glycoside
MRATSLLAAALAVAGDALAGKIKYLGVAIPGIDFGCDIDGSCPTDTSSVPLLSYKGGDGAGQMK
452



hydrolase family 5
HFAEDDGLNVFRISATWQFVLNNTVDGKLDELNWGSYNKVVNACLETGAYCMIDMHNFARYNGG
453



protein [T. reesei
IIGQGGVSDDIFVDLWVQIAKYYEDNDKIIFGLMNEPHDLDIEIWAQTCQKVVTAIRKAGATSQ




QM6a]
MILLPGTNFASVETYVSTGSAEALGKITNPDGSTDLLYFDVHKYLDINNSGSHAECTTDNVDAF




EGR44174.1
NDFADWLRQNKRQAIISETGASMEPSCMTAFCAQNKAISENSDVYIGFVGWGAGSFDTSYILTL




GI:340513898
TPLGKPGNYTDNKLMNECILDQFTLDEKYRPTPTSISTAAEETATATATSDGDAPSTTKPIFRE





ETASPTPNAVTKPSPDTSDSSDDDKDSAASMSAQGLTGTVLFTVAALGYMLVAF




glycoside
MKATLVLGSLIVGAVSAYKATTTRYYDGQEGACGCGSSSGAFPWQLGIGNGVYTAAGSQALFDT
454



hydrolase family
AGASWCGAGCGKCYQLTSTGQAPCSSCGTGGAAGQSIIVMVTNLCPNNGNAQWCPVVGGTNQYG




45 protein [T.
YSYHFDIMAQNEIFGDNVVVDFEPIACPGQAASDWGTCLCVGQQETDPTPVLGNDTGSTPPGSS





reesei QM6a]

PPATSSSPPSGGGQQTLYGQCGGAGWTGPTTCQAPGTCKVQNQWYSQCLP




XP_006967072.1





GI:589110099





glycoside
MAPSVTLPLTTAILAIARLVAAQQPGTSTPEVHPKLTTYKCTKSGGCVAQDTSVVLDWNYRWMH
455



hydrolase family 7
DANYNSCTVNGGVNTTLCPDEATCGKNCFIEGVDYAASGVTTSGSSLTMNQYMPSSSGGYSSVS




protein [T. reesei
PRLYLLDSDGEYVMLKLNGQELSFDVDLSALPCGENGSLYLSQMDENGGANQYNTAGANYGSGY




QM6a]
CDAQCPVQTWRNGTLNTSHQGFCCNEMDILEGNSRANALTPHSCTATACDSAGCGFNPYGSGYK




XP_006965674.1
SYYGPGDTVDTSKTFTIITQFNTDNGSPSGNLVSITRKYQQNGVDIPSAQPGGDTISSCPSASA




GI:589107303
YGGLATMGKALSSGMVLVFSIWNDNSQYMNWLDSGNAGPCSSTEGNPSNILANNPNTHVVFSNI





RWGDIGSTTNSTAPPPPPASSTTFSTTRRSSTTSSSPSCTQTHWGQCGGIGYSGCKTCTSGTTC





QYSNDYYSQCL




glycoside
SAHTTFTTLFIDKKNQGDGTCVRMPYDDKTATNPVKPITSSDMACGRNGGDPVPFICSAKKGSL
456



hydrolase family
LTFEFRLWPDAQQPGSIDPGHLGPCAVYLKKVDNMFSDSAAGGGWFKIWEDGYDSKTQKWCVDR




61 protein,
LVKNNGLLSVRLPRGLPAGYYIVRPEILALHWAAHRDDPQFYLGCAQIFVDSDVRGPLEIPRRQ




partial [T. reesei
QATIPGYVNAKTPGLTFDIYQDKLPPYPMPGPKVYIPPAKGNKPNQDLNAGRLVQTDGLIPKDC




QM6a]
LIKKANWCGRPVEPYSSARMCWRAVNDCYAQSKKCRESSPPIGLTNCDRWSDHCGKMDALCEQE




XP_006964038.1
KYKGPP




GI:589104031





glycoside
MKSCAILAALGCLAGSVLGHGQVQNFTINGQYNQGFILDYYYQKQNTGHFPNVAGWYAEDLDLG
457



hydrolase family
FISPDQYTTPDIVCHKNAAPGAISATAAAGSNIVFQWGPGVWPHPYGPIVTYVAECSGSCTTVN




61 protein [T.
KNNLRWVKIQEAGINYNTQVWAQQDLINQGNKWTVKIPSSLRPGNYVFRHELLAAHGASSANGM





reesei QM6a]

QNYPQCVNIAVTGSGTKALPAGTPATQLYKPTDPGILFNPYTTITSYTIPGPALWQG




XP_006963879.1





GI:589103713





glycoside
MNKSVAPLLLAASILYGGAAAQQTVWGQCGGIGWSGPTNCAPGSACSTLNPYYAQCIPGATTIT
458



hydrolase family 5
TSTRPPSGPTTTTRATSTSSSTPPTSSGVRFAGVNIAGFDFGCTTDGTCVTSKVYPPLKNFTGS




protein [T. reesei
NNYPDGIGQMQHFVNDDGMTIFRLPVGWQYLVNNNLGGNLDSTSISKYDQLVQGCLSLGAYCIV




QM6a]
DIHNYARWNGGIIGQGGPTNAQFTSLWSQLASKYASQSRVWFGIMNEPHDVNINTWAATVQEVV




XP_006962583.1
TAIRNAGATSQFISLPGNDWQSAGAFISDGSAAALSQVTNPDGSTTNLIFDVHKYLDSDNSGTH




GI:589101121
AECTTNNIDGAFSPLATWLRQNNRQAILTETGGGNVQSCIQDMCQQIQYLNQNSDVYLGYVGWG





AGSFDSTYVLTETPTGSGNSWTDTSLVSSCLARK




glycoside
MIQKLSNLLVTALAVATGVVGHGHINDIVINGVWYQAYDPTTFPYESNPPIVVGWTAADLDNGF
459



hydrolase family
VSPDAYQNPDIICHKNATNAKGHASVKAGDTILFQWVPVPWPHPGPIVDYLANCNGDCETVDKT




61 protein [T.
TLEFFKIDGVGLLSGGDPGTWASDVLISNNNTWVVKIPDNLAPGNYVLRHEIIALHSAGQANGA





reesei QM6a]

QNYPQCFNIAVSGSGSLQPSGVLGTDLYHATDPGVLINIYTSPLNYIIPGPTVVSGLPTSVAQG




XP_006961567.1
SSAATATASATVPGGGSGPTSRTTTTARTTQASSRPSSTPPATTSAPAGGPTQTLYGQCGGSGY




GI:589099089
SGPTRCAPPATCSTLNPYYAQCLN




Endoglucanase-7;
MKSCAILAALGCLAGSVLGHGQVQNFTINGQYNQGFILDYYYQKQNTGHFPNVAGWYAEDLDLG
460



also known as
FISPDQYTTPDIVCHKNAAPGAISATAAAGSNIVFQWGPGVWPHPYGPIVTYVVECSGSCTTVN




Cellulase-61B
KNNLRWVKIQEAGINYNTQVWAQQDLINQGNKWTVKIPSSLRPGNYVFRHELLAAHGASSANGM




(Cel61B), Endo-1,
QNYPQCVNIAVTGSGTKALPAGTPATQLYKPTDPGILFNPYTTITSYTIPGPALWQG




4-beta-glucanase





(EGVII);





Endoglucanase VII;





Endoglucanase-61B;





Q7Z9M7.3





GI:43314396





xylanase
MVSFTSLLAASPPSRASCRPAAEVESVAVEKRQTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPG
461



[Trichoderma
GQFSVNWSNSGNFVGGKGWQPGTKNKVINFSGSYNPNGNSYLSVYGWSRNPLIEYYIVENFGTY





reesei]

NPSTGATKLGEVTSDGSVYDIYRTQRVNQPSIIGTATFYQYWSVRRNHRSSGSVNTANHFNAWA




CAA49293.1
QQGLTLGTMDYQIVAVEGYFSSGSASITVS




GI:396564





xylanase [T.
MVAFSSLICALTSIASTLAMPTGLEPESSVNVTERGMYDFVLGAHNDHRRRASINYDQNYQTGG
462



reesei]
QVSYSPSNTGFSVNWNTQDDFVVGVGWTTGSSAPINFGGSFSVNSGTGLLSVYGWSTNPLVEYY




CAA49294.1
IMEDNHNYPAQGTVKGTVTSDGATYTIWENTRVNEPSIQGTATFNQYISVRNSPRTSGTVTVQN




GI:396566
HFNAWASLGLHLGQMNYQVVAVEGWGGSGSASQSVSN




beta-xylanase
MVSFTSLLAGVAAISGVLAAPAAEVEPVAVEKRQTIQPGTGYNNGYFHSYWNDGHGGVTYTNGP
463



precursor
GGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSGSYNPNGNSYLSVYGWSRNPLIEYYIVGNFGT




[Trichoderma
YNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSIIGTATFYQYWSVRRNHRSSGSVNTANHFNAW




reesei]
AQQGLTLGTMDYQIVAVEGYFSSGSASITVS




AAB5278.1





GI:78816





Chain A,
XTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG
464



Structural
SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI




Comparison Of Two
IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS




Major Endo-1,4-





Beta-Xylanases





From Trichoderma






Reesei






1XYP_A GI:112721





Chain B,
XTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG
465



Structural
SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI




Comparison Of Two
IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS




Major Endo-1,4-





Beta-Xylanases





From Trichoderma






Reesei






1XYP_B GI:1127211





Chain A,
XTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG
466



Structural
SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI




Comparison Of Two
IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS




Major Endo-1,4-





Beta-Xylanases





From Trichoderma






Reesei






1XYO_A GI:1127212





Chain B,
XTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG
467



Structural
SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI




Comparison Of Two
IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS




Major Endo-1,4-





Beta-Xylanases





From Trichoderma






Reesei






1XYO_B GI:1127213





Chain A,
XTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG
468



Structural
SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI




Comparison Of Two
IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS




Major Endo-1,4-





Beta-Xylanases





From Trichoderma






Reesei






1ENX_A GI:1127272





Chain B,
XTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG
469



Structural
SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI




Comparison Of Two
IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS




Major Endo-1,4-





Beta-Xylanases





From Trichoderma






Reesei






1ENX_B GI:1127273





Chain A, Endo-1,4-
XTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG
470



beta-xylanase Ii
SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI




Complex With 4,5-
IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS




epoxypentyl-beta-





D-xyloside





1RED_A GI:1942592





Chain B, Endo-1,4-
XTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG
471



beta-xylanase Ii
SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI




Complex With 4,5-
IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS




epoxypentyl-beta-





D-xyloside





1RED_B GI:1942593





Chain A, Endo-1,4-
XTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG
472



Beta-Xylanase Ii
SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI




Complex With 3,4-
IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS




Epoxybutyl-Beta-D-





Xyloside





1REE_A GI:1942594





Chain B, Endo-1,4-
XTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG
473



Beta-Xylanase Ii
SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI




Complex With 3,4-
IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS




Epoxybutyl-Beta-





D-Xyloside





1REE_B GI:1942595





Chain A, Endo-1,4-
XTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG
474



Beta-Xylanase Ii
SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI




Complex With 2,3-
IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS




Epoxypropyl-Beta-





D-Xyloside





1REF_A GI:1942596





Chain B, Endo-1,4-
XTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG
475



Beta-Xylanase Ii
SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI




Complex With 2,3-
IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS




Epoxypropyl-Beta-





D-Xyloside





1REF_B GI:1942597





xylanase III [T.






reesei]






BAA89465.2





GI:7328936





xylanase, partial
SYLSVYGWXTDPLIEYYIVESYGDYNPGSGGTYKGTVTSDGSVYDIYTATRTNAASIQGTATFT
476



[T. reesei]
QYWSVRR




AAG01167.1





GI:9858850





Transcription
MDLRQACDRCHDKKLRCPRISGSPCCSRCAKANVACVFSPPSRPFRPHEPLNHSHEHSHSHSHN
477



factor ACEII [T.
HNGVGVSFDWLDLMSLEQQQEQQQGQPQHPPPPVQTLSERLAALLCALDRMLQAVPSSLDMHHV





reesei]

SRQQLREYADTVGTGFDLQSTLDSLLHHAQDLASLYSEAVPASFNKRTTAAEADALCAVPDCVH




AAK69383.1
QDRTSLHTTPLPKLDHALLNLVMACHIRLLDVMDTLAEHGRMCAFMVATLPPDYDPKFAVPEIR




GI:14581734
VGTFVAPTDTAASMLLSVVVELQTVLVARVKDLVAMVDQVKDDARAAREAKVVRLQCGILLERA





ESTLGEWSRFKDGLVSARLLK




xylanase regulator
MLSNPLRRYSAYPDISSASFDPNYHGSQSHLHSINVNTFGNSHPYPMQHLAQHAELSSSRMIRA
478



1, partial
SPVQPKQRQGSLIAARKNSTGTAGPIRRRISRACDQCNQLRTKCDGLHPCAHCIEFGLGCEYVR




[Trichoderma
ERKKRGKASRKDIAAQQAAAAAAQHSGQVQDGPEDQHRKLSRQQSESSRGSAELAQPAHDPPHG





reesei]

HIEGSVSSFSDNGLSQHAAMGGMDGLEDHHGHVGVDPALGRTQLEASSAMGLGAYGEVHPGYES




AAO33577.1
PGMNGHVMVPPSYGAQTTMAGYSGISYAAQAPSPATYSSDGNFRLTGHIHDYPLANGSSPSWGV




GI:28194501
SLASPSLRYHVLRPVLLDVRNIYPVSLACDQMDMYFSSSSSAQMRPMSPYVEGFVFRKRSFLHP





TDPRRCQPALLASMLWVAAQTSEASFLTSLPSARSKVCQKLLELTVGLLQPLIHTGTNSPSPKT





SPVVGAAALGVLGVAMPGSLNMDSLAGETGAFGAIGSLDDVIAYVRLATVVSASEYKGASLRWW





GAAWSLARELKLGRELPPGNPPANQEDGEGLSEDVDEHDLNRNNTRLGRKRSAKSDAITEEERE





ERRRAWWLVYIVDRHLALCYNRPLFLLDSECSDLYHPMDDIKWQAGKFRSHDAGNSSINIDSSM





TDEFGDSPRAARGAHYECRGRSIFGYFLSLMTILGEIVDVHHAKSHPRFGVGFRSARDWDEQVA





EITRHLDMYEESLKRFVAKHLPLSSKDKEQHEMHDSGAVTDMQSPLSVRTNASSRMTESEIQAS





IVVAYSTHVMHVLHILLADKWDPINLLDDDDLWISSEGFVTATSHAVSAAEAISQILEFDPGLE





FMPFFFGIYLLQGSFLLLLIADKLQAEASPSVIKACETIVRAHEACVVTLSTEYQRNFSKVMRS





ALALIRGRVPEDLAEQQQRRRELLALYRWTGNGTGLAL




Transcription
MDLRQACDRCHDKKLRCPRISGSPCCSRCAKANVACVFSPPSRPFRPHEPLNHSHEHSHSHSHN
479



factor ACEII
HNGVGVSFDWLDLMSLEQQQEQQQGQPQHPPPPVQTLSERLAALLCALDRMLQAVPSSLDMHHV




protein
SRQQLREYADTVGTGFDLQSTLDSLLHHAQDLASLYSEAVPASFNKRTTAAEADALCAVPDCVH




Q96WN6.1
QDRTSLHTTPLPKLDHALLNLVMACHIRLLDVMDTLAEHGRMCAFMVATLPPDYDPKFAVPEIR




GI:50400614
VGTFVAPTDTAASMLLSVVVELQTVLVARVKDLVAMVDQVKDDARAAREAKVVRLQCGILLERA





ESTLGEWSRFKDGLVSARLLK




Chain A, Structure
TIQPGTGXNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSGS
480



Of Vi1-Xylanase
YNPNGNSXLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSII




2D97_A
GTATFXQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGXFSSGSASITVS




GI:112490431





Chain A, Structure
TIQPGTGXNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSGS
481



Of Vi1 (Extra KiI2
YNPNGNSXLSVYGWSRNPLIEYYIVENFGTXNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSII




ADDED)-Xylanase
GTATFXQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDXQIVAVEGXFSSGSASITVS




2D98_A,





GI:112490433





Chain A, Xylanase
XTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG
482



Ii From Trichoderma
SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI





Reesei At 100k

IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS




2DFB_A





GI:112490475





Chain A, Xylanase
XTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG
483



Ii From T. Reesei
SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI




At 293k
IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS




190 aa protein





2DFC_A





GI:112490477





TPA_inf: chitinase
MPSLTALAGLLALVPSALAGWNPDSKQNIAVYWGQNSANSQSTQQRLSFYCNDDNINVIEIAFL
484



18-12 [T. reesei]
NGINPPMTNFANAGDRCTPFSDNPWLLSCPEIEADIKTCQANGKTILLSLGGDTYSQGGWASPE




DAA05860.1
AAQDAAAQVWAMFGPVQSDSSAPRPFGDAVVDGFDFDFESTTNNLVAFGAQLRTLSDAAATDSN




GI:126032263
KKFYLAAAPQCFFPDAAVGPLINAVPMDWIQIQFYNNPCGVSAYTPGSEQQNNYNYQTWEDWAK





TSPNPNVKLLVGIPAGPNAGHGYVSDAQLKSVFEYSKKFDTFAGAMMWDMSQLYQNSGFEDQVV





DALK




TPA_inf: chitinase
MPSLTALAGLLALVPSALAGWNPDSKQNIAVYWGQNSANSQSTQQRLSFYCNDDNINVIEIAFL
485



18-12 [T. reesei]
NGINPPMTNFANAGDRCTPFSDNPWLLSCPEIEADIKTCQANGKTILLSLGGDTYSQGGWASPE




DAA05860.1
AAQDAAAQVWAMFGPVQSDSSAPRPFGDAVVDGFDFDFESTTNNLVAFGAQLRTLSDAAATDSN




GI:126032263
KKFYLAAAPQCFFPDAAVGPLINAVPMDWIQIQFYNNPCGVSAYTPGSEQQNNYNYQTWEDWAK





TSPNPNVKLLVGIPAGPNAGHGYVSDAQLKSVFEYSKKFDTFAGAMMWDMSQLYQNSGFEDQVV





DALK




TPA_inf: chitinase
MFFSKALAAAGLLATAAYAAPTMEKRAAGGKLVVYWGAEDDSTTLANVCADSSYDIVNLAFLSR
486



18-13 [T. reesei]
FFAGGGYPELSLSTLGGPSAAQRAAGATNLQDGTSLIPAIQACQAAGKLVILSMGGAVDFSAVT




DAA5861.1
TSKKYYLTAAPQCPFPDASEPLNVCQLADYIWVQFYNNGNCNIAQSGFNNAVKNWSKSIGNATL




GI:126032265
FIGALASGADGDQGYVSASSLLSAYQGVSALNLPNIGGIMLWEAQLAVKNGNFQKTVKAGIASG





TTPPPPPPTGGCSWAGHCAGASCSTDNDCSDDLTCNGGVCGTAGSTPAPTCSWEGHCLGASCGN





DNDCSDPYSCKNGVCSN




TPA_inf: chitinase
MFFTKAVGGLGLLASLASSAPNPIARRQAPGAQNVVYWGQNGGGTVENNDLSAYCTPTSGIDII
487



18-14 [T. reesei]
VLSFLYQWGQGSSALGGTIGQSCGITTSGEPQNCDALTAAITKCKTAGVKIILSLGGASAFSSF




DAA05862.1
QTADQAAQAGQYLWNAYGGGSGVTRPLGNNVMDGFDLDIESNPGTNENYAALVSALRSNFASDP




GI:126032267
SRQYVISGAPQCPLPEPNMGVIIQNAQFDYLWVQFYNNNEYPGDPCSLGLPGDAPFNFNNWTTF





IQSTPSKDAKVFVGVPAAPLAANGAPSGEVYYATPSQLADIVNDVKSNPAFGGIMMWSAGFSDT





NVNDGCNYAQEAKNILLTGSPCSSGPVSVSRPPVSSPTITSSPPGTSPAPPSQTGSVPQWGQCG





GNGYTGPTQCVAPFKCVATSEWWSQCE




TPA_inf: chitinase
MLSRTLLTALGLTTIAAAAPSQTVKTRQAPGGQNAVYWGATNNENDNLSTYCTASSGIDIVILS
488



18-16 [T. reesei]
FLDIYGATGNFPSGNMGNSCYVGTNGVPQLCDDLASSIATCQAAGIKVIISLGGAASSYSLQSQ




DAA05864.1
SQAVAIGQYLWNAYGNSGNTTVQRPFGNVFVNGFDFDIELNAGSQYYQYLISTLRSNFANDPKN




GI:126032269
TYYITGAPQCPIPEPNMGEIISTSQFDYLWVQFYNNNPVCSLGLPGDAPFNFNDWVSFISTTPS





KNAKLFVGAPASTLGANGNAGGAKYYATPEQLAGIVNSVKSSPFFGGIMLWDAGYSDSNVNNGC





NYAQEAKNILLTGTACGGESSPPPSTTTTAVPPPASSTPSNPSGGSVPQWGQCGGDGYTGPTQC





VAPYKCVATSEWWSSCQ




TPA_inf: chitinase
MVSASAGLAAVGLLNGYWGQYTTTEGLRPHCDSGVDSITLGFVNGAPDASGYPSLNFGPNCWAE
489



18-18 [T. reesei]
SYPGNLGLPSKLLSHCMSLQSDIPYCRSKGVKVILSIGGVYNALTSNYFVGDNGTATDFATFLY




DAA05866.1
NAFGPYNASYTGPRPFDDITTGLPTSVDGFDFDIEADFPNGPYIKMIETFRSLDSSMLITGAPQ




GI:0126032275
CPTNPQYFVMKDMIQQAAFDKLFIQFYNNPVCDAIPGNTAGDKFNYDDWEAVIAGSAKSKSAKL





YIGLPAIQEPNESGYIDPIAMKNLVCQYKDRPHFGGLSLWDLSRGLVNNINGTSFNQWALDALQ





YGCNPIPTTTTTTSTVSSTTAASSTTASSTTASTTKASSTSKASSTSKASSTSKASSTSKASST





SKASSTSKASSTSKASTTSKASTTSKVSTTSKASSTSKASSSTKASTTSKASSTSKASTTSKAS





TTSKASTTSKASTTSKASTTSKASSTSKASSSTKASTTSKASSTSKASTTSKASTTSKASTTSK





ASTTSKASTTSKASSTSKASSSTKASTTSKASSTSKASTTSKASTTSKASTTSKASTTSKASTT





SKASTTSKASTTSKASTTSKASTTSKASTTSKASTTSKASTTSKVSTTSKASTTSKASTTSKAS





STSKVSTTSKASTTSKVSAKATTSTKASTTVKPSTTSKASTTSKASTTSKASTTSKASTTSKAS





TTSKASTTSKASTTSKAATTSVKPTSKTSTSSKPNVSASSSNVGRDATSLVEASTSTSAAVLYP





TTTSRWSNSTITRSSSLTTPIVSDPASLTTSVVYTTSVHTVTKCPAYVTDCPAGGYVTTETIPL





YTTVCPISEATQTAAPTVTTEAPQPWTTSTVYTTRVYTITSCAPGVVDCPANQVTTETIPWYTT





VCPVTATATPVGPGSVVFPQNTEVGQPSLVGPVVEAAYPTASSSLQTLVKPATSVGVPQGSPAG





SSVAPGSSSKPTAPAGPPSYPTGGSGNASPSGSWSGVPVGPSSVPGIPEANAASVMSASLFGLV





IVMAAQVFVL




Chain A,
ASINYDQNYQTGGQVSYSPSNTGFSVNWNTQDDFVVGVGWTTGSSAPINFGGSFSVNSGTGLLS
490



Structural
VYGWSTNPLVEYYIMEDNHNYPAQGTVKGTVTSDGATYTIWENTRVNEPSIQGTATFNQYISVR




Comparison Of Two
NSPRTSGTVTVQNHFNAWASLGLHLGQMNYQVVAVEGWGGSGSASQSVSN




Major Endo-1,4-





Beta-Xylanases





From T. Reesei





1XYN_A





GI:157834272





xylanase, partial
QTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG
491



[Trichoderma
SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI





reesei]

IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS




ACB38137.1





GI:170786291





Chain A, Xylanase
XTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG
492



Ii From T. Reesei
SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI




Cocrystallized
IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS




With Tris-





Dipicolinate





Europium





3LGR_A





GI:319443539





Chain A, Crystal
TIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSGS
493



Structures Of
YNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSII




Mutant Endo-1,4-
GTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVQGYFSSGSASITVS




xylanase Ii





Complexed With





Substrate (1.15 A)





And Products





(1.6A)





4HK8_A





GI:572153255





Chain A, Crystal
IQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGHFVGGKGWQPGTKNKVINFSGSY
494



Structures Of
NPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSIIG




Mutant Endo-beta-
TATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS




1,4-xylanase Ii





Complexed With





Substrate (1.15 A)





And Products





(1.6A)





4HK9_A





GI:572153256





Chain A, Crystal
TIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGHFVGGKGWQPGTKNKVINFSGS
495



Structures Of
YNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSII




Mutant Endo-beta-
GTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS




1,4-xylanase Ii





Complexed With





Substrate (1.15 A)





And Products





(1.6A)





4HKL_A





GI:572153257





Chain A, Crystal
TIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINNPLI
496



Structures Of
EYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSIIGTATFYQYWSVRRNHRSSGS




Mutant Endo-beta-
VNTANHFNAWAQQGLTLGTMDYQIVAVQGYFSSGSASITVS




1,4-xylanase Ii





(e177p) In Apo





Form





4HKO_A





GI:572153258





Chain A, Crystal
XTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG
497



Structures Of
SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI




Mutant Endo-beta-
IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS




1,4-xylanase Ii





Complexed With





Substrate And





Products





4HKW_A





GI:572153259





Chain A, Joint X-
XTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG
498



ray/neutron
SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI




Structure Of
IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS





Trichoderma Reesei






Xylanase Ii In





Complex With Mes





At Ph 5.7





4S2D_A





GI:929984639





Chain A, Joint X-
XTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG
499



ray/neutron
SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI




Structure Of T.
IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS




Reesei Xylanase Ii





At Ph 4.4





4S2F_A





GI:929984640





Chain A, Joint X-
XTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG
500



ray/neutron
SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI




Structure Of T.
IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS




Reesei Xylanase Ii





At Ph 5.8





4S2G_A





GI:929984641





Chain A, Joint X-
XTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG
501



ray/neutron
SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI




Structure Of T.
IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS




Reesei Xylanase Ii





At Ph 8.5





4S2H_A





GI:929984642





Chain A, X-ray
TIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGDFVGGKGWQPGTKNKVINFSGS
502



Structure Analysis
YNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSII




Of Xylanase-N44d
GTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS




4XQ4_A





GI:929984784





Chain B, X-ray
TIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGDFVGGKGWQPGTKNKVINFSGS
503



Structure Analysis
YNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSII




Of Xylanase-N44d
GTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS




4XQ4_B





GI:929984785





Chain A, X-ray
TIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSGS
504



Structure Analysis
YNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSII




Of Xylanase-wt At
GTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS




Ph4.0





4XQD_A





GI:929984786





Chain B, X-ray
TIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSGS
505



Structure Analysis
YNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSII




Of Xylanase-wt At
GTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS




Ph4.0





4XQD_B





GI:929984787





Chain A, X-ray
TIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGEFVGGKGWQPGTKNKVINFSGS
506



Structure Analysis
YNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSII




Of Xylanase-n44e
GTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS




With Mes At
Ph6.0




4XQW_A





GI:929984788





Chain A, Neutron
TIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGDFVGGKGWQPGTKNKVINFSGS
507



And X-ray
YNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSII




Structure Analysis
GTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS




Of Xylanase: N44d





At Ph6





4XPV_A





GI:931139811





truncated xylanase
MVSFSSLVVALVGIASSWALWRNPSTQT
508



5 [T. reesei]





ANW825841





GI:1048222282





truncated xylanase
MVSFSSLVVALVGIASSWALWRNPSTQT
509



5 [T. reesei]





ANW825851





GI:1048222284





xylanase [T.
MVSFSSLVVALVGIASSWAAPLEESPNANITERGPSNFVLGGHNAVRRAAINYNQDYTTGGDVV
510




reesei]

YTHSNTGFAVNWSYPNDFVVGVGWNPGGSAPINFSGNFGVGSGVGLLSVYGWSTNPLVEYYVVE




ANX99792.1
DNFGFSSGGTVKGSVTSDGSSYTIWENTRVNEPSIVGTATFNQYISIRNSKRSSGTVTVANHFN




GI:1049178838
AWKSLGMNLGTMNYQVIAVEGWGGQGGVQQTVSN




xylanase [T.
MVSFSSLVVALVGIASSLAAPLEESLNANITERGPNNFVLGGHNAVRRAAINYNQDYTTGGDVV
511




reesei]

YTHSNTGFAVNWSYPNDFVVGVGWNPGGSAPINFSGNFGVGSGVGLLSVYGWSTNPLVEYYVVE




ANX99793.1
DNFGFSSGGTVKGSVTSDGSSYTIWENTRVNEPSIVGTATFNQYISIRNSKRSSGTVTIANHFN




GI:1049178840
AWKSLGMNLGTLNYQVIAVEGWGGQGGVQQTVSN




xylanase [T.
MVSFSSLVVALVGIASSLAAPLEESLNANITERGPNNFVLGGHNAVRRAAINYNQDYTTGGDVV
512




reesei]

YTHSNTGFAVNWSYPNDFVVGVGWNPGGSAPINFSGNFGVGSGVGLLSVYGWSTNPLVEYYVVE




ANX99794.1
DNFGFSSGGTVKGSVTSDGSSYTIWENTRVNEPSIVGTATFNQYISIRNSKRSSGTVTIANHFN




GI:1049178842
AWKSLGMNLGTLNYQVIAVEGWGGQGGVQQTVSN




xylanase [T.
MVSFSSLVVALVGIASSLAAPLEESLNANITERGPNNFVLGGHNAVRRAAINYNQDYTTGGDVV
513




reesei]

YTHSNTGFAVNWSYPNDFVVGVGWNPGGSAPINFSGNFGVGSGVGLLSVYGWSTNPLVEYYVVE




ANX99795.1
DNFGFSSGGTVKGSVTSDGSSYTIWENTRVNEPSIVGTATFNQYISIRNSKRSSGTVTIANHFN




GI:1049178844
AWKSLGMNLGTLNYQVIAVEGWGGQGGVQQTVSN




xylanase 2,
QTIQPGTGYNNGYCYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWCPGTKNKVINFSG
514



partial [T.
SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI





reesei]

IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS




APU51339.1





GI:1130479396





xylanase 2,
QTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG
515



partial [T.
SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI





reesei]

IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS




APU51340.1





GI:1130479398





Chain A, Microed
QTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG
516



Structure Of
SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI




Xylanase At 2.3 A
IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS




Resolution





5K7P_A





GI:1175128641





glycoside
MFFTKAVGGLGLLASLASSAPNPIARRQAPGAQNVVYWGQNGGGTVENNDLSAYCTPTSGIDII
517



hydrolase family
VLSFLYQWGQGSSALGGTIGQSCGITTSGEPQNCDALTAAITKCKTAGVKIILSLGGASAFSSF




18 protein,
QTADQAAQAGQYLWNAYGGGSGVTRPLGNNVMDGFDLDIESNPGTNENYAALVSALRSNFASDP




chitinase [T.
SRQYVISGAPQCPLPEPNMGVIIQNAQFDYLWVQFYNNNEYPGDPCSLGLPGDAPFNFNNWTTF





reesei] QM6a

IQSTPSKDAKVFVGVPAAPLAANGAPSGEVYYATPSQLADIVNDVKSNPAFGGIMMWSAGFSDT




EGR44650.1
NVNDGCNYAQEAKNILLTGSPCSSGPVSVSRPPVSSPTITSSPPGTSPAPPSQTGSVPQWGQCG




GI:340514387
GNGYTGPTQCVAPFKCVATSEWWSQCE




glycoside
MKSSISVVLALLGHSAAWSYATKSQYRANIKINARQTYQTMIGGGCSGAFGIACQQFGSSGLSP
518



hydrolase family 5
ENQQKVTQILFDENIGGLSIVRNDIGSSPGTTILPTCPATPQDKFDYVWDGSDNCQFNLTKTAL




protein [T. reesei]
KYNPNLYVYADAWSAPGCMKTVGTENLGGQICGVRGTDCKHDWRQAYADYLVQYVRFYKEEGID




QM6a
ISLLGAWNEPDFNPFTYESMLSDGYQAKDFLEVLYPTLKKAFPKVDVSCCDATGARQERNILYE




EGR44819.1
LQQAGGERYFDIATWHNYQSNPERPFNAGGKPNIQTEWADGTGPWNSTWDYSGQLAEGLQWALY




GI:340514558
MHNAFVNSDTSGYTHWWCAQNTNGDNALIRLDRDSYEVSARLWAFAQYFRFARPGSVRIGATSD





VENVYVTAYVNKNGTVAIPVINAAHFPYDLTIDLEGIKKRKLSEYLTDNSHNVTLQSRYKVSGS





SLKVTVEPRAMKTFWLEPQSTFAVI




glycoside
MVSFTSLLAGVAAISGVLAAPAAEVESVAVEKRQTIQPGTGYNNGYFYSYWNDGHGGVTYTNGP
519



hydrolase family
GGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSGSYNPNGNSYLSVYGWSRNPLIEYYIVENFGT




11 [T. reesei]
YNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSIIGTATFYQYWSVRRNHRSSGSVNTANHFNAW




QM6a
AQQGLTLGTMDYQIVAVEGYFSSGSASITVS




EGR45030.1





GI:340514771





glycoside
MLSRTLLTALGLTTIAAAAPSQTVKTRQAPGGQNAVYWGATNNENDNLSTYCTASSGIDIVILS
520



hydrolase family
FLDIYGATGNFPSGNMGNSCYVGTNGVPQLCDDLASSIATCQAAGIKVIISLGGAASSYSLQSQ




18, chitinase [T.
SQAVAIGQYLWNAYGNSGNTTVQRPFGNVFVNGFDFDIELNAGSQYYQYLISTLRSNFANDPKN





reesei QM6a]

TYYITGAPQCPIPEPNMGEIISTSQFDYLWVQFYNNNPVCSLGLPGDAPFNFNDWVSFISTTPS




EGR45486.1
KNAKLFVGAPASTLGANGNAGGAKYYATPEQLAGIVNSVKSSPFFGGIMLWDAGYSDSNVNNGC




GI:340515230
NYAQEAKNILLTGTACGGESSPPPSTTTTAVPPPASSTPSNPSGGSVPQWGQCGGDGYTGPTQC





VAPYKCVATSEWWSSCQ




xylanase regulator
MLSNPLRRYSAYPDISSASFDPNYHGSQSHLHSINVNTFGNSHPYPMQHLAQHAELSSSRMIRA
521



1 [T. reesei QM6a]
SPVQPKQRQGSLIAARKNSTGTAGPIRRRISRACDQCNQLRTKCDGLHPCAHCIEFGLGCEYVR




EGR48040.1
ERKKRGKASRKDIAAQQAAAAAAQHSGQVQDGPEDQHRKLSRQQSESSRGSAELAQPAHDPPHG




GI:340517797
HIEGSVSSFSDNGLSQHAAMGGMDGLEDHHGHVGVDPALGRTQLEASSAMGLGAYGEHPGYESP





GMNGHVMVPPSYGAQTTMAGYSGISYAAQAPSPATYSSDGNFRLTGHIHDYPLANGSSPSWGQS





DLRYPVLEPLLPHLGNILPVSLACDLIDLYFSSSSSAQMHPMSPYVLGFVFRKRSFLHPTNPRR





CQPALLASMLWVAAQTSEASFLTSLPSARSKVCQKLLELTVGLLQPLIHTGTNSPSPKTSPVVG





AAALGVLGVAMPGSLNMDSLAGETGAFGAIGSLDDVITYVHLATVVSASEYKGASLRWWGAAWS





LARELKLGRELPPGNPPANQEDGEGLSEDVDEHDLNRNNTRFVTEEEREERRRAWWLVYIVDRH





LALCYNRPLFLLDSECSDLYHPMDDIKWQAGKFRSHDAGNSSINIDSSMTDEFGDSPRAARGAH





YECRGRSIFGYFLSLMTILGEIVDVHHAKSHPRFGVGFRSARDWDEQVAEITRHLDMYEESLKR





FVAKHLPLSSKDKEQHEMHDSGAVTDMQSPLSVRTNASSRMTESEIQASIVVAYSTHVMHVLHI





LLADKWDPINLLDDDDLWISSEGFVTATSHAVSAAEAISQILEFDPGLEFMPFFYGVYLLQGSF





LLLLIADKLQAEASPSVIKACETIVRAHEACVVTLSTEYQRNFSKVMRSALALIRGRVPEDLAE





QQQRRRELLALYRWTGNGTGLAL




predicted protein,
LRILPVGDSITYGFLSDQDGGDGNGYRLQLRQHLSKDRVVFAGTETSGNMTDGYYLIVSSSLHR
522



partial [T. reesei
QAAWNGKTIQYISDHVTPSLEQRPNIILLHAGTNDMNPNGAISREGHDPVAASERLGSLVDKMT




QM6a]
TLCPDAVILVAMIIGTCNDEQAPQTKVFQSLIPNVVAPRLESGKHVLAVDFSTFPLDKLRDCIH




EGR49987.1
PTNEGYHLLGYYWYDFIAQIPRDWITAPVGEDPQRPEEQNLAMRLETDL




GI:340519749





Transcription
MSFSNPRRRTPVTRPGTDCEHGLSLKTTMTLRKGATFHSPTSPSASSAAGDFVPPTLTRSQSAF
523



factor [T. reesei
DDVVDASRRRIAMTLNDIDEALSKASLSDKSPRPKPLRDTSLPVPRGFLEPPVVDPAMNKQEPE




QM6a]
RRVLRPRSVRRTRNHASDSGIGSSVVSTNDKAGAADSTKKPQASALTRSAASSTTAMLPSLSHR




EGR51484.1
AVNRIREHTLRPLLEKPTLKEFEPIVLDVPRRIRSKEIICLRDLEKTLIFMAPEKAKSAALYLD




GI:340521249
FCLTSVRCIQATVEYLTDREQVRPGDRPYTNGYFIDLKEQIYQYGKQLAAIKEKGSLADDMDID





PSDEVRLYGGVAENGRPAELIRVKKDGTAYSMATGKIVDMTESPTPLKRSLSEQREDEEEIMRS





MARRKKNATPEELAPKKCREPGCTKEFKRPCDLTKHEKTHSRPWKCPIPTCKYHEYGWPTEKEM





DRHINDKHSDAPAMYECLFKPCPYKSKRESNCKQHMEKAHGWTYVRTKTNGKKAPSQNGSTAQQ





TPPLANVSTPSSTPSYSVPTPPQDQVMSTDFPMYPADDDWLATYGAQPNTIDAMDLGLENLSPA





SAASSYEQYPPYQNGSTFIINDEDIYAAHVQIPAQLPTPEQVYTKMMPQQMPVYHVQQEPCTTV





PILGEPQFSPNAQQNAVLYTPTSLREVDEGFDESYAADGADFQLFPATVDKTDVFQSLFTDMPS





ANLGFSQTTQPDIFNQIDWSNLDYQGFQE




Glycoside
MKANVILCLLAPLVAALPTETIHLDPELAALRANLTERTADLWDRQASQSIDQLIKRKGKLYFG
524



hydrolase family
TATDRGLLQREKNAAIIQADLGQVTPENSMKWQSLENNQGQLNWGDADYLVNFAQQNGKSIRGH




10 protein [T.
TLIWHSQLPAWVNNINNADTLRQVIRTHVSTVVGRYKGKIRAWDVVNEIFNEDGTLRSSVFSRL





reesei QM6a]

LGEEFVSIAFRAARDADPSARLYINDYNLDRANYGKVNGLKTYVSKWISQGVPIDGIGSQSHLS




EGR52056.1
GGGGSGTLGALQQLATVPVTELAITELDIQGAPTTDYTQVVQACLSVSKCVGITVWGISDKDSW




GI:340521822
RASTNPLLFDANFNPKPAYNSIVGILQ




Glycoside
MFFSKALAAAGLLATAAYAAPTMEKRAAGGKLVVYWGAEDDSTTLANVCADSSYDIVNLAFLSR
525



hydrolase family
FFAGGGYPELSLSTLGGPSAAQRAAGATNLQDGTSLIPAIQACQAAGKLVILSMGGAVDFSAVT




18 protein,
LSSDAQGQQLADTVWNLFLGGTANPTLRPFGSVKLDGVDLDNETGNPTGYLAMAQRFKSNFAKD




chitinase [T.
TSKKYYLTAAPQCPFPDASEPLNVCQLADYIWVQFYNNGNCNIAQSGFNNAVKNWSKSIGNATL





reesei QM6a]

FIGALASGADGDQGYVSASSLLSAYQGVSALNLPNIGGIMLWEAQLAVKNGNFQKTVKAGIASG




EGR52465.1
TTPPPPPPTGGCSWAGHCAGASCSTDNDCSDDLTCNGGVCGTAGSTPAPTCSWEGHCLGASCGN




GI:340522232
DNDCSDPYSCKNGVCSN




Glycoside
MPSLTALAGLLALVPSALAGWNPDSKQNIAVYWGQNSANSQSTQQRLSFYCNDDNINVIEIAFL
526



hydrolase family
NGINPPMTNFANAGDRCTPFSDNPWLLSCPEIEADIKTCQANGKTILLSLGGDTYSQGGWASPE




18 protein [T.
AAQDAAAQVWAMFGPVQSDSSAPRPFGDAVVDGFDFDFESTTNNLVAFGAQLRTLSDAAATDSN





reesei QM6a]

KKFYLAAAPQCFFPDAAVGPLINAVPMDWIQIQFYNNPCGVSAYTPGSEQQNNYNYQTWEDWAK




EGR52759.1
TSPNPNVKLLVGIPAGPNAGHGYVSDAQLKSVFEYSKKFDTFAGAMMWDMSQLYQNSGFEDQVV




GI:340522526
DALK




Glycoside
MVAFSSLICALTSIASTLAMPTGLEPESSVNVTERGMYDFVLGAHNDHRRRASINYDQNYQTGG
527



hydrolase family
QVSYSPSNTGFSVNWNTQDDFVVGVGWTTGSSAPINFGGSFSVNSGTGLLSVYGWSTNPLVEYY




11 protein [T.
IMEDNHNYPAQGTVKGTVTSDGATYTIWENTRVNEPSIQGTATFNQYISVRNSPRTSGTVTVQN





reesei QM6a]

HFNAWASLGLHLGQMNYQVVAVEGWGGSGSASQSVSN




EGR52985.1





GI:340522752





Glycoside
MPSLTALAGLLALVPSALAGWNPDSKQNIAVYWGQNSANSQSTQQRLSFYCNDDNINVIEIAFL
528



hydrolase family
NGINPPMTNFANAGDRCTPFSDNPWLLSCPEIEADIKTCQANGKTILLSLGGDTYSQGGWASPE




18 protein [T.
AAQDAAAQVWAMFGPVQSDSSAPRPFGDAVVDGFDFDFESTTNNLVAFGAQLRTLSDAAATDSN





reesei QM6a]

KKFYLAAAPQCFFPDAAVGPLINAVPMDWIQIQFYNNPCGVSAYTPGSEQQNNYNYQTWEDWAK




XP_006961069.1
TSPNPNVKLLVGIPAGPNAGHGYVSDAQLKSVFEYSKKFDTFAGAMMWDMSQLYQNSGFEDQVV




GI:589098093
DALK




Glycoside
MFFSKALAAAGLLATAAYAAPTMEKRAAGGKLVVYWGAEDDSTTLANVCADSSYDIVNLAFLSR
529



hydrolase family
FFAGGGYPELSLSTLGGPSAAQRAAGATNLQDGTSLIPAIQACQAAGKLVILSMGGAVDFSAVT




18 protein,
LSSDAQGQQLADTVWNLFLGGTANPTLRPFGSVKLDGVDLDNETGNPTGYLAMAQRFKSNFAKD




chitinase [T.
TSKKYYLTAAPQCPFPDASEPLNVCQLADYIWVQFYNNGNCNIAQSGFNNAVKNWSKSIGNATL





reesei QM6a]

FIGALASGADGDQGYVSASSLLSAYQGVSALNLPNIGGIMLWEAQLAVKNGNFQKTVKAGIASG




XP_006961376.1
TTPPPPPPTGGCSWAGHCAGASCSTDNDCSDDLTCNGGVCGTAGSTPAPTCSWEGHCLGASCGN




GI:589098707
DNDCSDPYSCKNGVCSN




Glycoside
MVAFSSLICALTSIASTLAMPTGLEPESSVNVTERGMYDFVLGAHNDHRRRASINYDQNYQTGG
530



hydrolase family
QVSYSPSNTGFSVNWNTQDDFVVGVGWTTGSSAPINFGGSFSVNSGTGLLSVYGWSTNPLVEYY




11 protein [T.
IMEDNHNYPAQGTVKGTVTSDGATYTIWENTRVNEPSIQGTATFNQYISVRNSPRTSGTVTVQN





reesei QM6a]

HFNAWASLGLHLGQMNYQVVAVEGWGGSGSASQSVSN




XP_006961811.1





GI:589099577





Glycoside
MKANVILCLLAPLVAALPTETIHLDPELAALRANLTERTADLWDRQASQSIDQLIKRKGKLYFG
531



hydrolase family
TATDRGLLQREKNAAIIQADLGQVTPENSMKWQSLENNQGQLNWGDADYLVNFAQQNGKSIRGH




10 [T. reesei
TLIWHSQLPAWVNNINNADTLRQVIRTHVSTVVGRYKGKIRAWDVVNEIFNEDGTLRSSVFSRL




QM6a]
LGEEFVSIAFRAARDADPSARLYINDYNLDRANYGKVNGLKTYVSKWISQGVPIDGIGSQSHLS




XP_006962419.1
GGGGSGTLGALQQLATVPVTELAITELDIQGAPTTDYTQVVQACLSVSKCVGITVWGISDKDSW




GI:589100793
RASTNPLLFDANFNPKPAYNSIVGILQ




Transcription
MSFSNPRRRTPVTRPGTDCEHGLSLKTTMTLRKGATFHSPTSPSASSAAGDFVPPTLTRSQSAF
532



factor protein [T.
DDVVDASRRRIAMTLNDIDEALSKASLSDKSPRPKPLRDTSLPVPRGFLEPPVVDPAMNKQEPE





reesei QM6a]

RRVLRPRSVRRTRNHASDSGIGSSVVSTNDKAGAADSTKKPQASALTRSAASSTTAMLPSLSHR




XP_006962963.1
AVNRIREHTLRPLLEKPTLKEFEPIVLDVPRRIRSKEIICLRDLEKTLIFMAPEKAKSAALYLD




GI:589101881
FCLTSVRCIQATVEYLTDREQVRPGDRPYTNGYFIDLKEQIYQYGKQLAAIKEKGSLADDMDID





PSDEVRLYGGVAENGRPAELIRVKKDGTAYSMATGKIVDMTESPTPLKRSLSEQREDEEEIMRS





MARRKKNATPEELAPKKCREPGCTKEFKRPCDLTKHEKTHSRPWKCPIPTCKYHEYGWPTEKEM





DRHINDKHSDAPAMYECLFKPCPYKSKRESNCKQHMEKAHGWTYVRTKTNGKKAPSQNGSTAQQ





TPPLANVSTPSSTPSYSVPTPPQDQVMSTDFPMYPADDDWLATYGAQPNTIDAMDLGLENLSPA





SAASSYEQYPPYQNGSTFIINDEDIYAAHVQIPAQLPTPEQVYTKMMPQQMPVYHVQQEPCTTV





PILGEPQFSPNAQQNAVLYTPTSLREVDEGFDESYAADGADFQLFPATVDKTDVFQSLFTDMPS





ANLGFSQTTQPDIFNQIDWSNLDYQGFQE




Predicted protein,
LRILPVGDSITYGFLSDQDGGDGNGYRLQLRQHLSKDRVVFAGTETSGNMTDGYYLIVSSSLHR
533



partial [T. reesei
QAAWNGKTIQYISDHVTPSLEQRPNIILLHAGTNDMNPNGAISREGHDPVAASERLGSLVDKMT




QM6a]
TLCPDAVILVAMIIGTCNDEQAPQTKVFQSLIPNVVAPRLESGKHVLAVDFSTFPLDKLRDCIH




XP_006964048.1
PTNEGYHLLGYYWYDFIAQIPRDWITAPVGEDPQRPEEQNLAMRLETDL




GI:589104051





Xylanase regulator
MLSNPLRRYSAYPDISSASFDPNYHGSQSHLHSINVNTFGNSHPYPMQHLAQHAELSSSRMIRA
534



1 protein [T.
SPVQPKQRQGSLIAARKNSTGTAGPIRRRISRACDQCNQLRTKCDGLHPCAHCIEFGLGCEYVR





reesei QM6a]

ERKKRGKASRKDIAAQQAAAAAAQHSGQVQDGPEDQHRKLSRQQSESSRGSAELAQPAHDPPHG




XP_006966092.1
HIEGSVSSFSDNGLSQHAAMGGMDGLEDHHGHVGVDPALGRTQLEASSAMGLGAYGEVHPGYES




GI:589108139
PGMNGHVMVPPSYGAQTTMAGYSGISYAAQAPSPATYSSDGNFRLTGHIHDYPLANGSSPSWGQ





SDLRYPVLEPLLPHLGNILPVSLACDLIDLYFSSSSSAQMHPMSPYVLGFVFRKRSFLHPTNPR





RCQPALLASMLWVAAQTSEASFLTSLPSARSKVCQKLLELTVGLLQPLIHTGTNSPSPKTSPVV





GAAALGVLGVAMPGSLNMDSLAGETGAFGAIGSLDDVITYVHLATVVSASEYKGASLRWWGAAW





SLARELKLGRELPPGNPPANQEDGEGLSEDVDEHDLNRNNTRFVTEEEREERRRAWWLVYIVDR





HLALCYNRPLFLLDSECSDLYHPMDDIKWQAGKFRSHDAGNSSINIDSSMTDEFGDSPRAARGA





HYECRGRSIFGYFLSLMTILGEIVDVHHAKSHPRFGVGFRSARDWDEQVAEITRHLDMYEESLK





RFVAKHLPLSSKDKEQHEMHDSGAVTDMQSPLSVRTNASSRMTESEIQASIVVAYSTHVMHVLH





ILLADKWDPINLLDDDDLWISSEGFVTATSHAVSAAEAISQILEFDPGLEFMPFFYGVYLLQGS





FLLLLIADKLQAEASPSVIKACETIVRAHEACVVTLSTEYQRNFSKVMRSALALIRGRVPEDLA





EQQQRRRELLALYRWTGNGTGLAL




Glycoside
MLSRTLLTALGLTTIAAAAPSQTVKTRQAPGGQNAVYWGATNNENDNLSTYCTASSGIDIVILS
535



hydrolase family
FLDIYGATGNFPSGNMGNSCYVGTNGVPQLCDDLASSIATCQAAGIKVIISLGGAASSYSLQSQ




18 protein,
SQAVAIGQYLWNAYGNSGNTTVQRPFGNVFVNGFDFDIELNAGSQYYQYLISTLRSNFANDPKN




chitinase [T.
TYYITGAPQCPIPEPNMGEIISTSQFDYLWVQFYNNNPVCSLGLPGDAPFNFNDWVSFISTTPS





reesei QM6a]

KNAKLFVGAPASTLGANGNAGGAKYYATPEQLAGIVNSVKSSPFFGGIMLWDAGYSDSNVNNGC




XP_006968673.1
NYAQEAKNILLTGTACGGESSPPPSTTTTAVPPPASSTPSNPSGGSVPQWGQCGGDGYTGPTQC




GI:589113301
VAPYKCVATSEWWSSCQ




Glycoside
MVSFTSLLAGVAAISGVLAAPAAEVESVAVEKRQTIQPGTGYNNGYFYSYWNDGHGGVTYTNGP
536



hydrolase family
GGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSGSYNPNGNSYLSVYGWSRNPLIEYYIVENFGT




11 [T. reesei





QM6a]
YNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSIIGTATFYQYWSVRRNHRSSGSVNTANHFNAW




XP_006968947.1
AQQGLTLGTMDYQIVAVEGYFSSGSASITVS




GI:589113849





Glycoside
MKSSISVVLALLGHSAAWSYATKSQYRANIKINARQTYQTMIGGGCSGAFGIACQQFGSSGLSP
537



hydrolase family 5
ENQQKVTQILFDENIGGLSIVRNDIGSSPGTTILPTCPATPQDKFDYVWDGSDNCQFNLTKTAL




protein [T. reesei
KYNPNLYVYADAWSAPGCMKTVGTENLGGQICGVRGTDCKHDWRQAYADYLVQYVRFYKEEGID




QM6a]
ISLLGAWNEPDFNPFTYESMLSDGYQAKDFLEVLYPTLKKAFPKVDVSCCDATGARQERNILYE




XP_006969226.1
LQQAGGERYFDIATWHNYQSNPERPFNAGGKPNIQTEWADGTGPWNSTWDYSGQLAEGLQWALY




GI:589114407
MHNAFVNSDTSGYTHWWCAQNTNGDNALIRLDRDSYEVSARLWAFAQYFRFARPGSVRIGATSD





VENVYVTAYVNKNGTVAIPVINAAHFPYDLTIDLEGIKKRKLSEYLTDNSHNVTLQSRYKVSGS





SLKVTVEPRAMKTFWLEPQSTFAVI




Glycoside
MFFTKAVGGLGLLASLASSAPNPIARRQAPGAQNVVYWGQNGGGTVENNDLSAYCTPTSGIDII
538



hydrolase family
VLSFLYQWGQGSSALGGTIGQSCGITTSGEPQNCDALTAAITKCKTAGVKIILSLGGASAFSSF




18 protein,
QTADQAAQAGQYLWNAYGGGSGVTRPLGNNVMDGFDLDIESNPGTNENYAALVSALRSNFASDP




chitinase [T.
SRQYVISGAPQCPLPEPNMGVIIQNAQFDYLWVQFYNNNEYPGDPCSLGLPGDAPFNFNNWTTF





reesei QM6a]

IQSTPSKDAKVFVGVPAAPLAANGAPSGEVYYATPSQLADIVNDVKSNPAFGGIMMWSAGFSDT




XP_006969397.1
NVNDGCNYAQEAKNILLTGSPCSSGPVSVSRPPVSSPTITSSPPGTSPAPPSQTGSVPQWGQCG




GI:589114749
GNGYTGPTQCVAPFKCVATSEWWSQCE




Chain A, Crystal
XASQSIDQLIKRKGKLYFGTATDRGLLQREKNAAIIQADLGQVTPENSMKWQSLENNQGQLNWG
539



Structure Of An
DADYLVNFAQQNGKSIRGHTLIWHSQLPAWVNNINNADTLRQVIRTHVSTVVGRYKGKIRAWDV




Endo-beta-1,4-
VNEIFNEDGTLRSSVFSRLLGEEFVSIAFRAARDADPSARLYINDYNLDRANYGKVNGLKTYVS




xylanase
KWISQGVPIDGIGSQSHLSGGGGSGTLGALQQLATVPVTELAITELDIQGAPTTDYTQVVQACL




(glycoside
SVSKCVGITVWGISDKDSWRASTNPLLFDANFNPKPAYNSIVGILQ




Hydrolase Family





10/gh10) Enzyme





From T. Reesei





4XVO_A





GI:756143139





Endo-1,4-beta-
MVAFSSLICALTSIASTLAMPTGLEPESSVNVTERGMYDFVLGAHNDHRRRASINYDQNYQTGG
540



xylanase 1 (also
QVSYSPSNTGFSVNWNTQDDFVVGVGWTTGSSAPINFGGSFSVNSGTGLLSVYGWSTNPLVEYY




known as EX 1;
IMEDNHNYPAQGTVKGTVTSDGATYTIWENTRVNEPSIQGTATFNQYISVRNSPRTSGTVTVQN




Xylanase 1; 1,4-
HFNAWASLGLHLGQMNYQVVAVEGWGGSGSASQSVSN




beta-D-xylan





xylanohydrolase 1;





Acidic endo-beta-





1,4-xylanase)





GOR947.1





GI:1042851765





Endo-1,4-beta-
MVSFTSLLAGVAAISGVLAAPAAEVESVAVEKRQTIQPGTGYNNGYFYSYWNDGHGGVTYTNGP
541



xylanase 2 (also
GGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSGSYNPNGNSYLSVYGWSRNPLIEYYIVENFGT




known as Xylanase
YNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSIIGTATFYQYWSVRRNHRSSGSVNTANHFNAW




2; 1,4-beta-D-
AQQGLTLGTMDYQIVAVEGYFSSGSASITVS




xylan





xylanohydrolase 2;





Alkaline endo-





beta-1,4-xylanase)





GRUP7.1





GI:1042851766





Endo-1,4-beta-
MKANVILCLLAPLVAALPTETIHLDPELAALRANLTERTADLWDRQASQSIDQLIKRKGKLYFG
542



xylanase 3 (also
TATDRGLLQREKNAAIIQADLGQVTPENSMKWQSLENNQGQLNWGDADYLVNFAQQNGKSIRGH




known as Xylanase
TLIWHSQLPAWVNNINNADTLRQVIRTHVSTVVGRYKGKIRAWDVVNEIFNEDGTLRSSVFSRL




3; 1,4-beta-D-
LGEEFVSIAFRAARDADPSARLYINDYNLDRANYGKVNGLKTYVSKWISQGVPIDGIGSQSHLS




xylan
GGGGSGTLGALQQLATVPVTELAITELDIQGAPTTDYTQVVQACLSVSKCVGITVWGISDKDSW




xylanohydrolase 3)
RASTNPLLFDANFNPKPAYNSIVGILQ




GORA32.1





GI:1042851767





Endo-1,4-beta-
MVAFSSLICALTSIASTLAMPTGLEPESSVNVTERGMYDFVLGAHNDHRRRASINYDQNYQTGG
543



xylanase 1 (also
QVSYSPSNTGFSVNWNTQDDFVVGVGWTTGSSAPINFGGSFSVNSGTGLLSVYGWSTNPLVEYY




known as EX 1;
IMEDNHNYPAQGTVKGTVTSDGATYTIWENTRVNEPSIQGTATFNQYISVRNSPRTSGTVTVQN




Xylanase 1; 1,4-
HFNAWASLGLHLGQMNYQVVAVEGWGGSGSASQSVSN




beta-D-xylan





xylanohydrolase 1;





Acidic endo-beta-





1,4-xylanase)





P36218.1 GI:549460





Hypothetical
MFFTKAVGGLGLLASLASSAPNPIARRQAPGAQNVVYWGQNGGGTVENNDLSAYCTPTSGIDII
544



protein
VLSFLYQWGQGSSALGGTIGQSCGITTSGEPQNCDALTAAITKCKTAGVKIILSLGGASAFSSF




M419DRAFT_104468
QTADQAAQAGQYLWNAYGGGSGVTRPLGNNVMDGFDLDIESNPGTNENYAALVSALRSNFASDP




[T. reesei RUT 
SRQYVISGAPQCPLPEPNMGVIIQNAQFDYLWVQFYNNNEYPGDPCSLGLPGDAPFNFNNWTTF




C-30] ETR97430.1
IQSTPSKDAKVFVGVPAAPLAANGAPSGEVYYATPSQLADIVNDVKSNPAFGGIMMWSAGFSDT




GI:572273844
NVNDGCNYAQEAKNILLTGSPCSSGPVSVSRPPVSSPTITSSPPGTSPAPPSQTGSVPQWGQCG





GNGYTGPTQCVAPFKCVATSEWWSQCE




Endo beta-1,4-
MVAFSSLICALTSIASTLAMPTGLEPESSVNVTERGMYDFVLGAHNDHRRRASINYDQNYQTGG
545



xylanase isotype 2
QVSYSPSNTGFSVNWNTQDDFVVGVGWTTGSSAPINFGGSFSVNSGTGLLSVYGWSTNPLVEYY




[T. reesei RUT 
IMEDNHNYPAQGTVKGTVTSDGATYTIWENTRVNEPSIQGTATFNQYISVRNSPRTSGTVTVQN




C-30] ETR98398.1
HFNAWASLGLHLGQMNYQVVAVEGWGGSGSASQSVSN




GI:572274931





Hypothetical
MFFSKALAAAGLLATAAYAAPTMEKRAAGGKLVVYWGAEDDSTTLANVCADSSYDIVNLAFLSR
546



protein
FFAGGGYPELSLSTLGGPSAAQRAAGATNLQDGTSLIPAIQACQAAGKLVILSMGGAVDFSAVT




M419DRAFT_114979
LSSDAQGQQLADTVWNLFLGGTANPTLRPFGSVKLDGVDLDNETGNPTGYLAMAQRFKSNFAKD




[T. reesei RUT 
TSKKYYLTAAPQCPFPDASEPLNVCQLADYIWVQFYNNGNCNIAQSGFNNAVKNWSKSIGNATL




C-30] ETR98463.1
FIGALASGADGDQGYVSASSLLSAYQGVSALNLPNIGGIMLWEAQLAVKNGNFQKTVKAGIASG




GI:572274996
TTPPPPPPTGGCSWAGHCAGASCSTDNDCSDDLTCNGGVCGTAGSTPAPTCSWEGHCLGASCGN





DNDCSDPYSCKNGVCSN




Hypothetical
MLSRTLLTALGLTTIAAAAPSQTVKTRQAPGGQNAVYWGATNNENDNLSTYCTASSGIDIVILS
547



protein
FLDIYGATGNFPSGNMGNSCYVGTNGVPQLCDDLASSIATCQAAGIKVIISLGGAASSYSLQSQ




M419DRAFT_133349
SQAVAIGQYLWNAYGNSGNTTVQRPFGNVFVNGFDFDIELNAGSQYYQYLISTLRSNFANDPKN




[T. reesei RUT 
TYYITGAPQCPIPEPNMGEIISTSQFDYLWVQFYNNNPVCSLGLPGDAPFNFNDWVSFISTTPS




C-30] ETR98658.1
KNAKLFVGAPASTLGANGNAGGAKYYATPEQLAGIVNSVKSSPFFGGIMLWDAGYSDSNVNNGC




GI:572275208
NYAQEAKNILLTGTACGGESSPPPSTTTTAVPPPASSTPSNPSGGSVPQWGQCGGDGYTGPTQC





VAPYKCVATSEWWSSCQ




Glycoside
MVSASAGLAAVGLLNGYWGQYTTTEGLRPHCDSGVDSITLGFVNGAPDASGYPSLNFGPNCWAE
548



hydrolase [T.
SYPGNLGLPSKLLSHCMSLQSDIPYCRSKGVKVILSIGGVYNALTSNYFVGDNGTATDFATFLY





reesei RUT C-30]

NAFGPYNASYTGPRPFDDITTGLPTSVDGFDFDIEADFPNGPYIKMIETFRSLDSSMLITGAPQ




ETS00190.1
CPTNPQYFVMKDMIQQAAFDKLFIQFYNNPVCDAIPGNTAGDKFNYDDWEAVIAGSAKSKSAKL




GI:572276883
YIGLPAIQEPNESGYIDPIAMKNLVCQYKDRPHFGGLSLWDLSRGLVNNINGTSFNQWALDALQ





YGCNPIPTTTTTTSTASSTSKASTTSKASTTSKASTTSKASTTSKASTTSKASTTSKASTTSKA





STTSKASTTSKASTTSKASTTSKASTTSKVSTTSKASTTSKASTTSKASSTSKVSTTSKASTTS





KVSAKATTSTKASTTVKPSTTSKASTTSKASTTSKASTTSKASTTSKASTTSKASTTSKASTTS





KAATTSVKPTSKTSTSSKPNVSASSSNVGRDATSLVEASTSTSAAVLYPTTTSRWSNSTITRSS





SLTTPIVSDPASLTTSVVYTTSVHTVTKCPAYVTDCPAGGYVTTETIPLYTTVCPISEATQTAA





PTVTTEAPQPWTTSTVYTTRVYTITSCAPGVVDCPANQVTTETIPWYTTVCPVTATATPVGPGS





VVFPQNTEVGQPSLVGPVVEAAYPTASSSLQTLVKPATSVGVPQGSPAGSSVAPGSSSKPTAPA





GPPSYPTGGSGNASPSGSWSGVPVGPSSVPGIPEANAASVMSASLFGLVIVMAAQVFVL




Xylanase regulator
MLSNPLRRYSAYPDISSASFDPNYHGSQSHLHSINVNTFGNSHPYPMQHLAQHAELSSSRMIRA
549



[T. reesei RUT
SPVQPKQRQGSLIAARKNSTGTAGPIRRRISRACDQCNQLRTKCDGLHPCAHCIEFGLGCEYVR




C-30]
ERKKRGKASRKDIAAQQAAAAAAQHSGQVQDGPEDQHRKLSRQQSESSRGSAELAQPAHDPPHG




ET502023.1
HIEGSVSSFSDNGLSQHAAMGGMDGLEDHHGHVGVDPALGRTQLEASSAMGLGAYGEVHPGYES




GI:572278872
PGMNGHVMVPPSYGAQTTMAGYSGISYAAQAPSPATYSSDGNFRLTGHIHDYPLANGSSPSWGQ





SDLRYPVLEPLLPHLGNILPVSLACDLIDLYFSSSSSAQMHPMSPYVLGFVFRKRSFLHPTNPR





RCQPALLASMLWVAAQTSEASFLTSLPSARSKVCQKLLELTVGLLQPLIHTGTNSPSPKTSPVV





GAAALGVLGVAMPGSLNMDSLAGETGAFGAIGSLDDVITYVHLATVVSASEYKGASLRWWGAAW





SLARELKLGRELPPGNPPANQEDGEGLSEDVDEHDLNRNNTRFVTEEEREERRRAWWLVYIVDR





HLALCYNRPLFLLDSECSDLYHPMDDIKWQAGKFRSHDAGNSSINIDSSMTDEFGDSPRAARGA





HYECRGRSIFGYFLSLMTILGEIVDVHHAKSHPRFGVGFRSARDWDEQVAEITRHLDMYEESLK





RFVAKHLPLSSKDKEQHEMHDSGAVTDMQSPLSVRTNASSRMTESEIQASIVVAYSTHVMHVLH





ILLADKWDPINLLDDDDLWISSEGFVTATSHAVSAAEAISQILEFDPGLEFMPFFYGVYLLQGS





FLLLLIADKLQAEASPSVIKACETIVRAHEACVVTLSTEYQRNFSKVMRSALALIRGRVPEDLA





EQQQRRRELLALYRWTGNGTGLAL




SGNH hydrolase [T.
MLLVQVRPSSSPAIDLIRGTELRILPVGDSITYGFLSDQDGGDGNGYRLQLRQHLSKDRVVFAG
550




reesei RUT C-30]

TETSGNMTDGYYAAWNGKTIQYISDHVTPSLEQRPNIILLHAGTNDMNPNGAISREGHDPVAAS




ET503411.1
ERLGSLVDKMTTLCPDAVILVAMIIGTCNDEQAPQTKVFQSLIPNVVAPRLESGKHVLAVDFST




GI:572280314
FPLDKLRDCIHPTNEGYHLLGYYWYDFIAQIPRDWITAPVGEDPQRPEEQNLAMRLETDLLLLG





LLGLLVVLMYA




Xylanase III [T.
MKANVILCLLAPLVAALPTETIHLDPELAALRANLTERTADLWDRQASQSIDQLIKRKGKLYFG
551




reesei RUT C-30]

TATDRGLLQREKNAAIIQADLGQVTPENSMKWQSLENNQGQLNWGDADYLVNFAQQNGKSIRGH




ETS05245.1
TLIWHSQLPAWVNNINNADTLRQVIRTHVSTVVGRYKGKIRAWDVVNEIFNEDGTLRSSVFSRL




GI:572282231
LGEEFVSIAFRAARDADPSARLYINDYNLDRANYGKVNGLKTYVSKWISQGVPIDGIGSQSHLS





GGGGSGTLGALQQLATVPVTELAITELDIQGAPTTDYTQVVQACLSVSKCVGITVWGISDKDSW





RASTNPLLFDANFNPKPAYNSIVGILQ




Hypothetical
MPSLTALAGLLALVPSALAGWNPDSKQNIAVYWGQNSANSQSTQQRLSFYCNDDNINVIEIAFL
552



protein
NGINPPMTNFANAGDRCTPFSDNPWLLSCPEIEADIKTCQANGKTILLSLGGDTYSQGGWASPE




M419DRAFT_94061
AAQDAAAQVWAMFGPVQSDSSAPRPFGDAVVDGFDFDFESTTNNLVAFGAQLRTLSDAAATDSN




[T. reesei RUT C-
KKFYLAAAPQCFFPDAAVGPLINAVPMDWIQIQFYNNPCGVSAYTPGSEQQNNYNYQTWEDWAK




30]
TSPNPNVKLLVGIPAGPNAGHGYVSDAQLKSVFEYSKKFDTFAGAMMWDMSQLYQNSGFEDQVV




ETS6436.1
DALK




GI:572283462





Endo-1,4-beta-
MVSFTSLLAGVAAISGVLAAPAAEVESVAVEKRQTIQPGTGYNNGYFYSYWNDGHGGVTYTNGP
553



xylanase 2 (also
GGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSGSYNPNGNSYLSVYGWSRNPLIEYYIVENFGT




known as EX 2;
YNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSIIGTATFYQYWSVRRNHRSSGSVNTANHFNAW




Xylanase 2; 1,4-
AQQGLTLGTMDYQIVAVEGYFSSGSASITVS




beta-D-xylan





xylanohydrolase 2;





Alkaline endo-





beta-1,4-xylanase)





P36217.2





GI:1042782319





Endo-1,4-beta-
MKANVILCLLAPLVLPTETIHLDPELLRANLTERTADLWDRQASQSIDQLIKRKGKLYFG
554



xylanase 3 (also
TATDRGLLQREKNAAIIQADLGQVTPENSMKWQSLENNQGQLNWGDADYLVNFAQQNGKSIRGH




known as Xylanase;
TLIWHSQLPAWVNNINNADTLRQVIRTHVSTVVGRYKGKIRAWDVVNEIFNEDGTLRSSVFSRL




1,4-beta-D-xylan
LGEEFVSIAFRAARDADPSARLYINDYNLDRANYGKVNGLKTYVSKWISQGVPIDGIGSQSHLS




xylanohydrolase 3)
GGGGSGTLGALQQLATVPVTELAITELDIQGAPTTDYTQVVQACLSVSKCVGITVWGISDKDSW




A0A024SIB3.1
RASTNPLLFDANFNPKPAYNSIVGILQ




GI:1042851768





TPA_Inf: chitinase
MFFSKALAAAGLLATAAYAAPTMEKRAAGGKLVVYWGAEDDSTTLANVCADSSYDIVNLAFLSR
580



18-13 [T. reesei]
FFAGGGYPELSLSTLGGPSAAQRAAGATNLQDGTSLIPAIQACQAAGKLVILSMGGAVDFSAVT




DAA5861.1
LSSDAQGQQLADTVWNLFLGGTANPTLRPFGSVKLDGVDLDNETGNPTGYLAMAQRFKSNFAKD




GI:1232265
TSKKYYLTAAPQCPFPDASEPLNVCQLADYIWVQFYNNGNCNIAQSGFNNAVKNWSKSIGNATL





FIGALASGADGDQGYVSASSLLSAYQGVSALNLPNIGGIMLWEAQLAVKNGNFQKTVKAGIASG





TTPPPPPPTGGCSWAGHCAGASCSTDNDCSDDLTCNGGVCGTAGSTPAPTCSWEGHCLGASCGN





DNDCSDPYSCKNGVCSN




GI:572280314

673





674





675





676



94 RecName:
MKANVILCLLAPLVAALPTETIHLDPELAALRANLTERTADLWDRQASQSIDQLIKRKGKLYFG
677



Full=Endo-1,4-
TATDRGLLQREKNAAIIQADLGQVTPENSMKWQSLENNQGQLNWGDADYLVNFAQQNGKSIRGH




beta-xylanase 3;
TLIWHSQLPAWVNNINNADTLRQVIRTHVSTVVGRYKGKIRAWDVVNEIFNEDGTLRSSVFSRL




Short=Xylanase 3;
LGEEFVSIAFRAARDADPSARLYINDYNLDRANYGKVNGLKTYVSKWISQGVPIDGIGSQSHLS




AltName: Full=1,4-
GGGGSGTLGALQQLATVPVTELAITELDIQGAPTTDYTQVVQACLSVSKCVGITVWGISDKDSW




beta-D-xylan
RASTNPLLFDANFNPKPAYNSIVGILQ




xylanohydrolase 3;





Flags: Precursor





347 aa protein





Beta-galactosidase
MKLQSILSCWAILVAQIWATTDGLTDLVAWDPYSLTVNGNRLFVYSGEFHYPRLPVPEMWLDVF
678



[Aspergillus
QKMRAHGFNAVSLYFFWDYHSPINGTYDFETGAHNIQRLFDYAQEAGIYIIARAGPYCNAEFNG





niger]

GGLALYLSDGSGGELRTSDATYHQAWTPWIERIGKIIAENSITNGGPVILNQIENELQETTHSA




A0V94178.1
SNTLVEYMEQIEEAFRAAGVDVPFTSNEKGQRSRSWSTDYEDVGGAVNVYGLDSYPGGLSCTNP




GI:1078570522
STGFSVLRNYYQWFQNTSYTQPEYLPEFEGGWFSAWGADSFYDQCTSELSPQFADVYYKNIIGQ





RVTLQNLYMLYGGTNWGHLAAPVVYTSYDYSAPLRETRQIRDKLSQTKLVGLFTRVSSGLLGVE





MEGNGTSYTSTTSAYTWVLRNPNTTAGFYVVQQDTTSSQTDITFSLNVNTSAGAFTLPNINLQG





RQSKVISTDYPLGHSTLLYVSTDIATYGTFGDTDVVVLYARSGQEVSFSFKNTTKLTFEEYGDS





VNLTSSSGNRTITSYTYTQGSGTSVVKFSNGAIFYLVETETAFRFWAPPTTTDPYVTAEQQIFV





LGPYLVRNVSISGSVVDLVGDNDNATTVEVFAGSSAKAVKWNGKEITVTKTDYGSLVGSIGGAD





SSSITIPSLTGWKVRDSLPEIQSSYDDSKWTVCNKTTTLSPVDPLSLPVLFASDYGYYTGIKIY





RGRFDGTNVTGANLTAQGGLAFGWNVWLNGDLVASLPGDADETSSNAAIDFSNHTLKQTDNLLT





VVIDYTGHDETSTGDGVENPRGLLGATLNGGSFTSWKIQGNAGGAAGAYELDPVRAPMNEGGLL





AERQGWHLPGYKAKSSDGWTDGSPLDGLNKSGVAFYLTTFTLDLPKNYDVPLGIQFTSPSTVDP





VRIQLFINGYQYGKYVPYLGPQTTFPIPPGIINNRDKNTIGLSLWAQTDAGAKLENIELISYGA





YESGFDAGNGTGFDLNGAKLGYQPEWTEARAKYT




Beta-galactosidase
MTRITKLCVLLLSSIGLLAAAQNQTETGWPLHDDGLTTDVQWDHYSFKVHGERIFVFSGEFHYW
679



[Aspergillus
RIPVPGLWRDILEKIKAAGFTTFAFYSSWAWHAPNNHTVDFSTGARDITPIFELAKELGMYIIV





niger]

RPGPYINAEASAGGFPLWLTTGDYGTLRNNDSRYTEAWKPYFEKMTEITSRYQITNGHNTFCYQ




A0V94179.1
IENEYGDQWLSDPSERVPNETAIAYMELLESSARENGILVPFTANDPNMNAMAWSRDWSNAGGN




GI:1078570524
VDVVGLDSYPSCWTCDVSQCTSTNGEYVAYQVVEYYDYFLDFSPTMPSFMPEFQGGSYNPWAGP





EGGCGDDTGVDFVNLFYRWNIAQRVTAMSLYMLYGGTNWGAIAAPVTATSYDYSSPISEDRSIS





SKYYETKLLSLFTRSARDLTMTDLIGNGTQYTNNTAVKAYELRNPTTNAGFYVTLHEDSTVGTN





EAFNLRVNTSAGNLIVPRRGGSIRLNGHQSKIIVTDFTFGSETLLYSTAEVLTYAVIDKKPTLV





LWVPTGESGEFAVKGAKSGSVVSKCQSCPAINFHQQGGNLIVGFTQFQGMSVVQIDNDIRVVLL





DRTAAYKFWAPALTEDPLVPEDEAVLIQGPYLVRSASLEKSTLAIKGDSINETAVEIFAPENVK





TITWNGKQLKTSKSSYGSLKATIAAPASIQLPAFTSWKVNDSLPERLPTYDASGPAWVDANHMT





TANPSKPATLPVLYADEYGFHNGVRLWRGYFNGTASGVFLNVQGGSAFGFSAYLNGHFLGSYLG





NASIEQANQTFLFPNNITHPTTQNTLLVIHDDTGHDETTGALNPRGILEARLLPSDTTNNSTSP





EFTHWRIAGTAGGESNLDPVRGAWNEDGLYAERVGWHLPGFDDSTWSSASSSLSFTGATVKFFR





TTIPLDIPRGLDVSISFVLGTPDNAPNAYRAQLFVNGYQYGRFNPYIGNQVVFPVPVGVLDYTG





ENTIGVAVWAQTEDGAGITVDWKVNYVADSSLDVSGLETGELRPGWSAERLKFA




Beta-galactosidase
MKTSFLLAIGLAVEACLGLVSAPNYVRQINATDSSLQDIVTWDEYSIRVRGERILLLLGEFHPF
680



[Aspergillus
RLPCPGLWLDVFQKVRALGFSAVSFYVDWALLEGERGSIRADGVFALEEFFQAATEAGLYLTAR





niger]

PGPYINAEVSGGGFPGWLKRVQGRLKTTDQGYLDAITPYMQAIGRIIAKAQITNGGPVILFQPE




A0V94180.1
NEYTACVQDEGYTQVSNYSMPDINSSCLQKEYMAYVEEQYRKAGIVVPFIVNDADPMGNFAPGT




GI:1078570526
GVGAVDIYSFDDYPLQWSTAPSNPSNWSSLISPLLSYNETVHEEQSPTTPFSISEFQGGVPDAW





GGVGIETSAAYIGPEFERIFYKINYGFRAAIQNLYMIFGGTNWGNLGHSGGYTSYDVGAAIAED





RQVIREKYSELKLQSNFLQASSAYLETHSDNGSYGIYTDATSLAVTRLAGNPTNFYVVRHGELT





SRESTSYKLRVNTSAGNLAIPQLSGSLSLHGRDSKIHLVDYNVGNVSLIYSTAELFTWKQAGSK





SVVVLYGGEDELHEFAVPANKGKPTSIEGDGLQVQQINSTTVIQWAVQPSRRVVHFSDTLEVHL





LWRNEAYNYWVLDLPVPGAIGRHVSRSHTNRSVIVKAGYLLRTAEIIGTSLYLTGDINTTTTIE





LISAPQPVTSILFNKNRIPTTITSPGRLTGTLTYHKPNISLPDLTTLDWYYLNTLPEVHDPTYD





DHLWTPCTHTTTANPRNLTTPTSLYASDYGYNGGTLLYRGTFTATGNETSLYLLTEGGYAYGHS





IWLNNTFLASWPGNPAFLLSNQTITFPSPLTPGTTYKLTILIDHLGNDENFPANGEFMKDPRGI





LDYTLHGRDDKSAISWKMTGNFGGESYADLSRGPLNEGALFAERKGYHLPGAPTEQWTKRSPFD





GLPEDERPGVGFFATKFDLQIPDGYDVPISVVFENSTMAGDGSGPARFRSELFVNGWQFGKYVN





HIGPQLSYPVPEGILNYNGSNYLALTIWAMDEKSFKLDGLRLQANAVVQSGYRKPSLVKGEVYK





ERVDSY




Lactase B
MTLQCKLESACSSTPHNAMVSVLQQDQWAGEPAEQQPHLSAVAAMGRDNECTMFEPGLSGHLLR
681



[Aspergillus
GGHEATQVRNMVIEILF





luchuensis]






GAT22890.1





GI:1002328951





Lactase B
MTRITKLCALLLSSTGLLAAAQNQTETGWPLYDDGLTTDIQWDHYSFKVHGERIFVFSGEFHYW
682



[Aspergillus
RIPVPGLWRDILEKIKAAGFTTFSIYSSWAWHAPNNHTVDFSTGARDITPIFELAKELGMYIIV





luchuensis]

RPGPYINAEASAGGFPLWLTTGDYGTLRNNDSRYTAAWKPYFEKMTEITSRYQVTNGHNTFCYQ




GAT26827.1
IENEYGDQWLSDPSERVPNETAIAYMELLESSARENGILVPFTANDPNMNAMAWSRDWSNAGGN




GI:1002325961
VDVVGLDSYPSCWTCDVSQCTSTNGEYVAYQVVEYYDYFLEFSPTMPSFMPEFQGGSYNPWAGP





EGGCGDDTGVDFVNLFYRWNIAQRVTAMSLYMLYGGTNWGAIAAPVTATSYDYSSPISEDRSIS





SKYYETKLLSLFTRSARDLTMTDLIGNGTQYTNNTAVKAYELRNPTTNAGFYVTLHEDSTVGTN





EAFSLRVNTSAGNLIVPRLGGSIRLNGHQSKIIVTDFTFGSETLLYSTAEVLTYAVLDKKPTLV





LWVPTGESGEFAVKGAKSGSVVSKCQDCSAINFHQQGGNLVVGFTQAQGMSIVQIDNDIRVILL





DRTAAYEFWAPALTEDPLVPEDEAVLIQGPYLVRSASLEKSTLAIKGDSINETAVEIFAPNDVK





TVTWNGKQLKTSKSSYGSLKATIAAPVSIQLPAFTSWKVNDSLPERLPTYDASGLAWVDANHMT





TANPSKPATLPVLCADEYGFHNGVRLWRGYFNGTASGVFLNVQGGSAFGFSAYLNGQFLGSYLG





NASIEQANQTFVFPTNITHPTTQNTLLIIHDDTGHDETTGALNPRGILEARLLPSTTTDNTASP





EFTHWRLAGTAGGESNLDPVRGAWNEDGLYAERVGWHLPGFDDSTWPSVSSSSLSFTGATVKFF





RTTIPLNIPRGLDVSISFVLGTPDNAPNTYRAQLFVNGYQYGRFNPYIGNQVVFPVPVGVLDYS





GENTIGVAVWAQTEDGAAITVDWKVNYVADSSLDVAGLETAGLRPGWSVERLKFA




Putative Lactase B
MPFFMPEFQGGSYNPWDGPEGGCTEDTGAEFANLFYRWNIAQRVTAMSLYMMYGGTNWGGLAAP
683



[Aspergillus
VTATSYDYSAPISEDRSIGSKYYETKLLALFTRCAKDLTMTDRIDNGTQYTTNAAISATELRNP





calidoustus]

ETNAAFYVTNHLDTTLGTDESFKLHVDTSEGALTIPKHGGAIRLNGHQSKIIVTDFRLGRETLL




CEN62581.1
YSTAEVLTYAVFDKKPTLVLWVPAGESGEFAIKGAKRGSATTCSDCSPVEFHRSKESLTVSFTQ




GI:972234022
ADGISIVQLDNGVRVLLLDRPSAYTFWAPALTDDPLVPETESVFVSGPYLVRSAKLSGSTLALR





GDSNGKTAIEVFAPKKVNKITWNGRRIKVTKTRYGSLKASLASAPSIELPALDGWKVSDSLPER





LPAYDDSGAAWVDADHMTTPNPHKPATLPVLYADEYGFHNGVRLWRGYFNSSASGVFLNIQGGA





AFGWSAYLNGHFLDSYLGDASTNQANGTLSFPDDTLNTDGTPNVLLVIHDDTGHDQTTGVLNPR





GILEARLLPLDTESDTEAPEFTHWRVAGTAGGESDLDPVRGVYNEDGLFAERVGWHLPGFDDDD





WPAANNSLSFTGATVKFFRTVIPPLDIPQGVDVSISFVFSASSGGNSSSSSSSTGGNTRAFRAQ





LFVNGYQYGRFNPYVGNQIVYPVPPGILDYNGENTIGVAVWAQTEAGASLELDWRVNYVVDSSL





DVANLDVGGLRPEWEEERLSFA




Beta-glucosidase,
MNVNMFKAGDDILQDVDQSCKDRLPAVEELPLPPTFTWGTATAAYQVEGGAFQDGKGKSIWDTF
684



lactase
THLDPSRTNGENGDIACDHYNRMAEDVVLMASYGVDVYRFSIAWARILPLGGRGDPINEKGIAF




phlorizinhydrolase
YNNLIDCLLEHNIEPVVTLYHWDVPQGLYDRYGAFLDTTEFRADFEHFARLCFSRFGDRVKRWI




[Aspergillus
TFNEPYIIAIFGHHSGVLAPGRSSATGGDSRTEPWRVGHTIILAHTAAVQAYATDFQPTQKGDI




oryzae 100-8]
SIVLNGHYYEPWDAGSEEHWLAAQRRLEFYIGWFGDPIFLGKDYPAPMRAQLGSRLPEFTSEEL




KDE76127.1
DLLRRSAPINSFYGMNHYTTKYARALPDPPAEDDCTGNVEEGPTNSEGKTMGPLSGMSWLRVTP




GI:635504017
AGFRKLLNWVWDRYRRPIVVTENGCPCPGESQMTKEQALDDQFRIRYFGLYLDAISRAIYDDGV





KVEGYYVWSLMDNFEWSAGYGPRYGITHVDFTTLVRTPKQSAKYLHHSFNKRRATSLR




Beta-glucosidase,
MGSTSTSTLPPDFLWGFATASYQIEGAVNEDGRGPSIWDTFCKIPGKIAGGANGDVACDSYHRT
685



lactase
HEDIALLKACGAKAYRFSLSWSRIIPLGGRNDPINEKGLQYYIKFVDDLHAAGITPLVTLFHWD




phlorizinhydrolase
LPDELDKRYGGLLNKEEFVADFAHYARIVFKAFGSKVKHWITFNEPWCSSVLGYNVGQFAPGRT




[Aspergillus
SDRSKSPVGDSSRECWIVGHSLLVAHGAAVKIYRDEFKASDGGEIGITLNGDWAEPWDPENPAD




oryzae 3.042]
VEACDRKIEFAISWFADPIYHGKYPDSMVKQLGDRLPKWTPEDIALVHGSNDFYGMNHYCANFI




EIT76661.1
KAKTGEADPNDTAGNLEILLQNRKGEWVGPETQSPWLRPSAIGFRKLLKWLSERYNYPKIYVTE




GI:391867415
NGTSLKGENDLPLEQLLQDDFRTQYFRDYIGAMADAYTLDGVNVRAYMAWSLMDNFEWAEGYET





RFGVTYVDYENNQKRIPKQSAKAIGEIFDQYIEKA




Beta-glucosidase,
MNVNMFKAGDDILQDVDQSCKDRLPAVEELPLPPTFTWGTATAAYQVEGGAFQDGKGKSIWDTF
686



lactase
THLDPSRTNGENGDIACDHYNRMAEDVVLMASYGVDVYRFSIAWARILPLGGRGDPINEKGIAF




phlorizinhydrolase
YNNLIDCLLEHNIEPVVTLYHWDVPQGLYDRYGAFLDTTEFRADFEHFARLCFSRFGDRVKRWI




[Aspergillus
TFNEPYIIAIFGHHSGVLAPGRSSATGGDSRTEPWRVGHTIILAHTAAVQAYATDFQPTQKGDI




oryzae 3.042]
SIVLNGHYYEPWDAGSEEHWLAAQRRLEFYIGWFGDPIFLGKDYPAPMRAQLGSRLPEFTSEEL




EIT82651.1
DLLRRSAPINSFYGMNHYTTKYARALPDPPAEDDCTGNVEEGPTNSEGKTMGPLSGMSWLRVTP




GI:391873626
AGFRKLLNWVWDRYRRPIVVTENGCPCPGESQMTKEQALDDQFRIRYFGLYLDAISRAIYDDGV





KVEGYYVWSLMDNFEWSAGYGPRYGITHVDFTTLVRTPKQSAKYLHHSFNKRRATSLR




Beta-glucosidase,
MNVNMFKAGDDILQDVDQSCKDRLPAVEELPLPPTFTWGTATAAYQVEGGAFQDGKGKSIWDTF
687



lactase
THLDPSRTNGENGDIACDHYNRMAEDVVLMASYGVDVYRFSIAWARILPLGGRGDPINEKGIAF




phlorizinhydrolase
YNNLIDCLLEHNIEPVVTLYHWDVPQGLYDRYGAFLDTTEFRADFEHFARLCFSRFGDRVKRWI




[Aspergillus
TFNEPYIIAIFGHHSGVLAPGRSSATGGDSRTEPWRVGHTIILAHTAAVQAYATDFQPTQKGDI




oryzae 3.042]
SIVLNGHYYEPWDAGSEEHWLAAQRRLEFYIGWFGDPIFLGKDYPAPMRAQLGSRLPEFTSEEL




EIT82651.1
DLLRRSAPINSFYGMNHYTTKYARALPDPPAEDDCTGNVEEGPTNSEGKTMGPLSGMSWLRVTP




GI:391873626
AGFRKLLNWVWDRYRRPIVVTENGCPCPGESQMTKEQALDDQFRIRYFGLYLDAISRAIYDDGV





KVEGYYVWSLMDNFEWSAGYGPRYGITHVDFTTLVRTPKQSAKYLHHSFNKRRATSLR




Lactase B
MTRITKLCALLLSSTGLLAAAQNQTETGWPLYDDGLTTDIQWDHYSFKVHVPGLWRDILEKIKA
688



[Aspergillus
AGFTTFSIYSSWAWHAPNNHTVDFSTGARDITPIFELAKELGMYIIVRPGPYINAEASAGGFPL




kawachii IFO 4308]
WLTTGDYGTLRNNDSRYTAAWKPYFEKMTEITSRYQVTNGHNTFCYQIENEYGDQWLSDP




GAA82087.1
PNETAIAYMELLESSARENGILVPFTANDPNMNAMAWSRDWSNAGGNVDVVGLDSYPSCWTCDV




GI:358365465
SQCTSTNGEYVAYQVVEYYDYFLEFSPTMPSFMPEFQGGSYNPWAGPEGGCGDDTGVDFVNLFY





RWNIAQRVTAMSLYMLYGGTNWGAIAAPVTATSYDYSSPISEDRSISSKYYETKLLSLFTRSAR





DLTMTDLIGNGTQYTNNTAVKAYELRNPTTNAGFYVTLHEDSTVGTNEAFSLRVNTSAGNLIVP





RLGGSIRLNGHQSKIIVTDFTFGSETLLYSTAEVLTYAVLDKKPTLVLWVPTGESGEFAVKGAK





SGSVVSKCQDCSAINFHQQGGNLVVGFTQAQGMSIVQIDNDIRVILLDRTAAYEFWAPALTEDP





LVPEDEAVLIQGPYLVRSASLEKSTLAIKGDSINETAVEIFAPNDVKTVTWNGKQLKTSKSSYG





SLKATIAAPVSIQLPAFTSWKVNDSLPERLPTYDASGLAWVDANHMTTANPSKPATLPVLYADE





YGFHNGVRLWRGYFNGTASGVFLNVQGGSAFGFSAYLNGQFLGSYLGNASIEQANQTFVFPTNI





THPTTQNTLLIIHDDTGHDETTGALNPRGILEARLLPSTTTDNTASPEFTHWRLAGTAGGESNL





DPVRGAWNEDGLYAERVGWHLPGFDDSTWPSVSSSSLSFTGATVKFFRTTIPLNIPRGLDVSIS





FVLGTPDNAPNTYRAQLFVNGYQYGRFNPYIGNQVVFPVPVGVLDYSGENTIGVAVWAQTEDGA





AITVDWKVNYVADSSLDVAGLETAGLRPGWSVERLKFA




Probable beta-
MRILSLLFLLLLGFLAGNRVVSATDHGKTTDVTWDRYSLSVKGERLFVFSGEFHYQRLPVPEMW
689



galactosidase C
LDVFQKLRANGFNAISVYFFWGYHSASEGEFDFETGAHNIQRLFDYAKEAGIYVIARAGPYCNA




(Lactase C)
ETTAGGYALWAANGQMGNERTSDDAYYAKWRPWILEVGKIIAANQITNGGPVILNQHENELQET




A1CE56.1
SYEADNTLVVYMKQIARVFQEAGIVVPSSHNEKGMRAVSWSTDHHDVGGAVNIYGLDSYPGGLS




GI:00680864
CTNPSSGFNLVRTYYQWFQNSSYTQPEYLPEFEGGWFQPWGGHDYDTCATELSPEFADVYYKNN





IGSRVTLQNIYMVFGGTNWGHSAAPVVYTSYDYSAPLRETREIRDKLKQTKLIGLFTRVSSDLL





KTHMEGNGTGYTSDSSIYTWALHNPDTNAGFYVLAHKTSSSRSVTEFSLNVTTSAGAISIPDIQ





LDGRQSKIIVTDYQFGKSSALLYSSAEVLTYANLDVDVLVLYLNVGQKGLFVFKDERSKLSFQT





YGNTNVTASVSSHGTQYIYTQAEGVTAVKFSNGVLAYLLDKESAWNFFAPPTTSNPQVAPDEHI





LVQGPYLVRGVTINHDTVEIIGDNANTTSLEVYAGNLRVKVVKWNGKAIKSRRTAYGSLVGRAP





GAEDARISPPSLDSWSAQDTLPDIQPDYDDSRWTVCNKTASVNAVPLLSLPVLYSGDYGYHAGT





KVYRGRFDGRNVTGANVTVQNGVASGWAAWLNGQFVGGVAGAIDLAVTSAVLSFNSSLLHDRDN





VLTVVTDYTGHDQNSVRPKGTQNPRGILGATLIGGGKFTSWRIQGNAGGEKNIDPVRGPINEGG





LYGERMGWHLPGYKAPRSAAKSSPLDGISGAEGRFYTTTFTLKLDRDLDVPIGLQLGAPAGTQA





VVQVFMNGYQFGHYLPHIGPQSLFPFPPGVINNRGENTLAISMWALTDAGAKLDQVELVAYGKY





RSGFDFNQDWGYLQPQWKDNRRQYA




Probable beta-
MAHIYRLLLLLLSNLWFSAAAQNQSETEWPLHDNGLSKVVQWDHYSFQVNGQRIFIFSGEFHYW
690



galactosidase B
RIPVPELWRDILEKVKATGFTAFAFYSSWAYHAPNNRTVDFSTGARDITPIFELAKELGMYMIV




(Lactase B)
RPGPYVNAEASAGGFPLWLTTGEYGSLRNDDPRYTAAWTPYFANMSQITSKYQVTDGHNTLVYQ




A1D199.1
IENEYGQQWIGDPKDRNPNKTAVAYMELLEASALENGITVPLTSNDPNMNSKSWGSDWSNAGGN




GI:00680896
VDVAGLDSYPSCWTCDVSQCTSTNGEYVPYKVIDYYDYFQEVQPTLPSFMPEFQGGSYNPWAGP





EGGCPQDTGAEFANLFYRWNIGQRVTAMSLYMLYGGTNWGAIAAPVTATSYDYSAPISEDRSIG





AKYSETKLLALFTRTAKDLTMTEAIGNGTQYTTNTAVRAFELRNPQTNAGFYVTFHNDTTVGGN





QAFKLHVNTSVGALTVPKNEGVIQLNGHQSKIIVTDFTLGKRTLLYSTAEVLTYAVFENRPTLV





LWVPTGESGEFAIKGTKSGKVENGDGCSGINFKREKDYLVVNFSQAKGLSVLRLDNGVRVVLLD





KAAAYRFWAPALTDDPIVQETETVLVHGPYLVRSASVSKSTLALRGDSVEKTTLEIFAPHSVRK





ITWNGKEVKTSQTPYGSLKATLAAPPTIKLPALTSWRSNDSLPERLPSYDDSGPAWIEANHMTT





SNPSPPATLPVLYADEYGFHNGVRLWRGYFNGSASGVFLNIQGGSAFGWSAWLNGHFLDSHLGT





ATTSQANKTLTFSSSILNPTENVLLIVHDDTGHDQTTGALNPRGIIEARLLSNDTSSPAPGFTQ





WRIAGTAGGESNLDPIRGVFNEDGLFAERMGWHLPGFDDSAWTPENSTTSASSALSFTGATVRF





FRTVVPLDIPAGLDVSISFVLSTPSAAPKGYRAQLFVNGYQYGRYNPHIGNQVVFPVPPGILDY





QGDNTIGLAVWAQTEEGAGIQVDWKVNYVADSSLSVAGFGKGLRPGWTEERLKFA




Probable beta-
MKLLSVCAVALLAAQAAGASIKHKLNGFTIMEHSDPAKRELLQKYVTWDEKSLFVNGERIMIFS
691



galactosidase A
GEVHPFRLPVPSLWLDVFQKIKALGFNCVSFYVDWALLEGKPGKYRAEGNFALEPFFDAAKQAG




(Lactase A)
IYLLARPGPYINAEASGGGFPGWLQRVNGTLRTSDPAYLKATDNYIAHVAATVAKGQITNGGPV




A1D1Z9.1
ILYQPENEYSGACCNATFPDGDYMQYVIDQARNAGIVVPLINNDAWTGGHNAPGTGKGEVDIYG




GI:300680858
HDSYPLGFDCGHPSVWPKGNLPTTFRTDHLRESPTTPYSLIEFQAGSFDPWGGPGFAACAALVN





HEFERVFYKNDLSFGAAILNLYMTFGGTNWGNLGHPGGYTSYDYGSPLTESRNVTREKYSELKL





IGNFVKASPSYLLATPGNLTTSGYADTADLTVTPLLGNGTGSYFVVRHTDYTSQASTPYKLSLP





TSAGRLTVPQLGGTLTLNGRDSKVHVVDYNVAGTNILYSTAEVFTWKKFGDSKVLVLYGGPGEH





HELAVSLKSDVQVVEGSNSEFTSKKVEDVVVVAWDVSASRRIVQIGDLKIFLLDRNSAYNYWVP





QLDKDDSSTGYSSEKTTASSIIVKAGYLVRTAYTKGSGLYLTADFNATTPVEVIGAPSNVRNLY





INGEKTQFKTDKNGIWSTGVKYSAPKIKLPSMKDLDWKYLDTLPEVQSTYDDSAWPAADLDTTP





NTLRPLTMPKSLHSSDYGFHTGYLIYRGHFVADGSETTFDVRTQGGSAFGSSVWLNEAFLGSWT





GLNANADYNSTYRLPQVEKGKNYVLTVVIDTMGLNENWVVGTDEMKNPRGILSYKLSGRDASAI





TWKLTGNLGGEDYQDKIRGPLNEGGLYAERQGFHQPEPPSKKWKSASPLDGLSKPGIGFYTAQF





DLDIPSGWDVPLYFNFGNSTKSAYRVQLYVNGYQYGKFVSNIGPQTSFPVPQGILNYQGTNWVA





LTLWALESDGAKLDDFELVNTTPVMTALSKIRPSKQPNYRQRKGAY




Probable beta-
MKFLLRRFIALAAASSVVAAPSVSHLSLQDAANRRELLQDLVTWDQHSLFVRGERLMIFSGEFH
692



galactosidase E
PFRLPVPGLWFDVFQKITSLGFNAVSFYTDWGLMEGNPGHVVTDGIWSLDEFFTAASEAGIYLI




(Lactase E)
ARPGPYINAETSAGGIPGWVLRLKGIIRSNSEDYLRATDTYMATLGKIIAKAQITNGGPVILVQ




A1DJ58.1
PENEYTTWPNVSESEFPTTMNKEVMAYAEKQLRDAGVVVPTVVNDNKNLGYFAPGTGLGETDLY




GI:300680873
GIDAYPMRYDCGNPYVWPTYRFPRDWQHTHRNHSPTTPFAIMEFQGGSGDGWGGVTEDGCAILV





NNEAVRVVYKNNYGFGVGVFNIYMTYGGTNWGNLGYHGGYTSYDYGAAITEDRQIWREKYSEEK





LQANFLKVSPAYLTATPGNGVNGSYTGNKDIAVTPLFGNGTTTNFYLVRHADFTSTGSVQYQLS





VSTSVGNVTIPQLGGSLSLNGRDSKFHVTDYDVGEFNLIYSSAEIFTWAKGDNKKRVLVLYGGA





GELHEFALPKHLPRPTVVDGSDVKMAKKGSAWVVQWEVTAQRRVLRAGKLEIHLLWRNDAYQHW





VLELPAKQPIANYSSPSKETVLVKGGYLLRSACITNNKLHLTGDVNATTPLEVISAPKRFDGIV





FNGQSLKSTRSKIGNLAATVRYQPPAISLPDLKRLDWKYLDSLPEISPDYSDEGNMSLTNTYTN





NTRKFTGPTCLYADDYGYHGGSLIYRGHFKANGDESWVFLNTSGGVGFANSVWLNQTFLGSWTG





SGNNMTYPRNISLPHELSPGKPYVFTVVIDHMGQDEEAPGTDAIKFPRGILDYALSGHEVSDLK





WKMTGNLGGEQYQDSTRGPLNEGAMYAERRGYHLPNPPTSSWKSSSPINDGLTGAGIGFYATSF





SLDLPEGYDIPLSFLFNNSASDARSGTSYRCQLFVNGYQFGKYVNDLGPQTNFPVPEGILNYNG





VNYVAVSLWALEPQGALVGGLELVASTPILSAYRKPVPAPQPGWKPRRGAY




Probable beta-
MRIFSFLFLLLLGILTGQGLVSGTDNGKTTDVTWDKYSLSVKGQRLFVFSGEFHYQRLPVPELW
693



galactosidase C
LDVFQKLRANGFNAISVYFFWSFHSASEGEFDFENGAHDIQRLFDYAKEAGLYVIARAGPYCNA




(Lactase C)
ETSAGGFALWAANGQMGNERTSDEAYYEKWRPWILEVGKIIAKNQITNGGPVILNQHENELTET




A1DM65.1
TYDPNHTLVVYMKQIAQVFEEAGIVVPSSHNEKGMRGVSWSTDYHNVGGAVNIYGLDSYPGGLS




GI:300680868
CTNPNSGFRLVRTYYQWFQNYSSTQPSYMPEFEGGWFQPWGGSFYDTCATELSPEFPDVYYKNN





IGSRVTLHSIYMTYGGTNWGHSAAPVVYTSYDYAAPLRETREIRDKLKQTKLIGLFTRVSTDLL





KTYMEGNGTGYTSDSSIYTWSLRNPDTNAGFYVLAHSTSSARDVTTFSLNATTSAGAISIPDIE





LNGRQSKIIVTDYNFGTNSTLLFSSAEVLTYANLDVNVLVFYLNVGQKGTFALKDEPKLAFQTY





GNSNVTTSESSYGTQYSYTQGEGVTAVKFSNGVLAYLLDKESAWNFFAPPTTSSPQVAPNEHIL





VQGPYLVRGASINHGTVEITGDNANTTSIEVYTGNSQVKKVKWNGKTIETRKTAYGSLIGTVPG





AEDVKIRLPSLDSWKAQDTLPEIQPDYDDSTWTVCNKTTSVNAIAPLSLPVLYSGDYGYHAGTK





VYRGRFDGRNVTGANVTVQNGAAAGWAAWVNGQYAGGSAGSPSLAATSAVLTFNGLSLKDRDNV





LTVVTDYTGHDQNSVRPKGTQNPRGILGATLTGGGNFTSWRIQGNAGGEKNIDPVRGPMNEGGL





YGERMGWHLPGYKVPKSASKSSPLDGVSGAEGRFYTTTFKLKLDKDLDVPIGLQLGAPEGTKAV





VQVFMNGYQFGHYLPHTGPQSLFPFPPGVINNRGENTLAISMWALTDAGAKLDKVELVAYGKYR





SGFDFNQDWGYLQPGWKDRSQYA




Probable beta-
MTRITKLCVLLLSSIGLLAAAQNQTETGWPLHDDGLTTDVQWDHYSFKVHGERIFVFSGEFHYW
694



galactosidase B
RIPVPGLWRDILEKIKAAGFTTFAFYSSWAWHAPNNHTVDFSTGARDITPIFELAKELGMYIIV




(Lactase B)
RPGPYINAEASAGGFPLWLTTGDYGTLRNNDSRYTEAWKPYFEKMTEITSRYQITNGHNTFCYQ




A2QA64.2 GI:
IENEYGDQWLSDPSERVPNETAIAYMELLESSARENGILVPFTANDPNMNAMAWSRDWSNAGGN




300681011
VDVVGLDSYPSCWTCDVSQCTSTNGEYVAYQVVEYYDYFLDFSPTMPSFMPEFQGGSYNPWAGP





EGGCGDDTGVDFVNLFYRWNIAQRVTAMSLYMLYGGTNWGAIAAPVTATSYDYSSPISEDRSIS





SKYYETKLLSLFTRSARDLTMTDLIGNGTQYTNNTAVKAYELRNPTTNAGFYVTLHEDSTVGTN





EAFSLRVNTSAGNLIVPRLGGSIRLNGHQSKIIVTDFTFGSETLLYSTAEVLTYAVIDKKPTLV





LWVPTDESGEFAVKGAKSGSVVSKCQSCPAINFHQQGGNLIVGFTQSQGMSVVQIDNDIRVVLL





DRTAAYKFWAPALTEDPLVPEDEAVVLIQGPYLVRSASLEKSTLAIKGDSINETAVEIFAPENV





KTITWNGKQLKTSKSSYGSLKATIAAPASIQLPAFTSWKVNDSLPERLPTYDASGPAWVDANHM





TTANPSKPATLPVLYADEYGFHNGVRLWRGYFNGTASGVFLNVQGGSAFGFSAYLNGHFLGSYL





GNASIEQANQTFLFPNNITHPTTQNTLLVIHDDTGHDETTGALNPRGILEARLLPSDTTNNSTS





PEFTHWRIAGTAGGESNLDPVRGAWNEDGLYAERVGWHLPGFDDSTWSSVSSSSSLSFTGATVK





FFRTTIPLDIPRGLDVSISFVLGTPDNAPNAYRAQLFVNGYQYGRFNPYIGNQVVFPVPVGVLD





YTGENTIGVAVWAQTEDGAGITVDWKVNYVADSSLDVSGLETGELRPGWSAERLKFA




Probable beta-
MKLSSACAIALLAAQAAGASIKHRINGFTLTEHSDPAKRELLQKYVTWDDKSLFINGERIMIFS
695



galactosidase A
GEFHPFRLPVKELQLDIFQKVKALGFNCVSFYVDWALVEGKPGEYRADGIFDLEPFFDAASEAG




(Lactase A)
IYLLARPGPYINAESSGGGFPGWLQRVNGTLRSSDKAYLDATDNYVSHVAATIAKYQITNGGPI




A2QAN3.1
ILYQPENEYTSGCCGVEFPDPVYMQYVEDQARNAGVVIPLINNDASASGNNAPGTGKGAVDIYG




GI:300680857
HDSYPLGFDCANPTVWPSGDLPTNFRTLHLEQSPTTPYAIVEFQGGSYDPWGGPGFAACSELLN





NEFERVFYKNDFSFQIAIMNLYMIFGGTNWGNLGYPNGYTSYDYGSAVTESRNITREKYSELKL





LGNFAKVSPGYLTASPGNLTTSGYADTTDLTVTPLLGNSTGSFFVVRHSDYSSEESTSYKLRLP





TSAGSVTIPQLGGTLTLNGRDSKIHVTDYNVSGTNIIYSTAEVFTWKKFADGKVLVLYGGAGEH





HELAISTKSNVTVIEGSESGISSKQTSSSVVVGWDVSTTRRIIQVGDLKILLLDRNSAYNYWVP





QLATDGTSPGFSTPEKVASSIIVKAGYLVRTAYLKGSGLYLTADFNATTSVEVIGVPSTAKNLF





INGDKTSHTVDKNGINSATVDYNAPDISLPSLKDLDWKYVDTLPEIQSSYDDSLWPAADLKQTK





NTLRSLTTPTSLYSSDYGFHTGYLLYRGHFTATGNESTFAIDTQGGSAFGSSVWLNGTYLGSWT





GLYANSDYNATYNLPQLQAGKTYVITVVIDNMGLEENWTVGEDLMKTPRGILNFLLAGRPSSAI





SWKLTGNLGGEDYEDKVRGPLNEGGLYAERQGFHQPEPPSQNWKSSSPLEGLSEAGIGFYSASF





DLDLPKGWDVPLFLNIGNSTTPSPYRVQVYVNGYQYAKYISNIGPQTSFPVPEGILNYRGTNWL





AVTLWALDSAGGKLESLELSYTTPVLTALGEVESVDQPKYKKRKGAY




Probable beta-
MKLQSILSCWAILVAQIWATTDGLTDLVAWDPYSLTVNGNRLFVYSGEFHYPRLPVPEMWLDVF
696



galactosidase C
QKMRAHGFNAVSLYFFWDYHSPINGTYDFETGAHNIQRLFDYAQEAGIYIIARAGPYCNAEFNG




(Lactase C)
GGLALYLSDGSGGELRTSDATYHQAWTPWIERIGKIIADNSITNGGPVILNQIENELQETTHSA




A2QL84.1
SNTLVEYMEQIEEAFRAAGVDVPFTSNEKGQRSRSWSTDYEDVGGAVNVYGLDSYPGGLSCTNP




GI:300680867
STGFSVLRNYYQWFQNTSYTQPEYLPEFEGGWFSAWGADSFYDQCTSELSPQFADVYYKNNIGQ





RVTLQNLYMLYGGTNWGHLAAPVVYTSYDYSAPLRETRQIRDKLSQTKLVGLFTRVSSGLLGVE





MEGNGTSYTSTTSAYTWVLRNPNTTAGFYVVQQDTTSSQTDITFSLNVNTSAGAFTLPNINLQG





RQSKVISTDYPLGHSTLLYVSTDIATYGTFGDTDVVVLYARSGQVVSFAFKNTTKLTFEEYGDS





VNLTSSSGNRTITSYTYTQGSGTSVVKFSNGAIFYLVETETAFRFWAPPTTTDPYVTAEQQIFV





LGPYLVRNVSISGSVVDLVGDNDNATTVEVFAGSPAKAVKWNGKEITVTKTDYGSLVGSIGGAD





SSSITIPSLTGWKVRDSLPEIQSSYDDSKWTVCNKTTTLSPVDPLSLPVLFASDYGYYTGIKIY





RGRFDGTNVTGANLTAQGGLAFGWNVWLNGDLVASLPGDADETSSNAAIDFSNHTLKQTDNLLT





VVIDYTGHDETSTGDGVENPRGLLGATLNGGSFTSWKIQGNAGGAAGAYELDPVRAPMNEGGLL





AERQGWHLPGYKAKSSDGWTDGSPLDGLNKSGVAFYLTTFTLDLPKKYDVPLGIQFTSPSTVDP





VRIQLFINGYQYGKYVPYLGPQTTFPIPPGIINNRDKNTIGLSLWAQTDAGAKLENIELISYGA





YESGFDAGNGTGFDLNGAKLGYQPEWTEARAKYT




Probable beta-
MKLLSVCAIALLAAQAAGASIKHMLNGFTLMEHSDPAKRELLQKYVTWDEKSLFVNGERIMIFS
697



galactosidase A
GEVHPFRLPVPSLWLDVFQKIKALGFNCVSFYVDWALLEGKPGEYRAEGNFALEPFFDVAKQAG




(Lactase A)
IYLLARPGPYINAEASGGGFPGWLQRVNGTLRTSDPAYLKATDNYIAHVAATIAKGQITNGGPV




B0XMP7.2
ILYQPENEYSGACCDATFPDGDYMQYVIDQARNAGIVVPLINNDAWTGGHNAPGTGKGEVDIYG




GI:300681017
HDSYPLGFDCGHPSVWPKGNLPTTFRTDHLKQSPTTPYSLIEFQAGSFDPWGGPGFAACAALVN





HEFERVFYKNDLSFGAAILNLYMTFGGTNWGNLGHPGGYTSYDYGSPLTESRNVTREKYSELKL





IGNFVKASPSYLLATPGNLTTSGYADTADLTVTPLLGNGTGSYFVVRHTDYTSQASTPYKLSLP





TSAGRLTVPQLGGTLTLNGRDSKIHVVDYNVAGTNIIYSTAEVFTWKNFGDSKVLILYGGPGEH





HELAVSFKSDVQVVEGSNSEFKSKKVGDVAVVAWDVSPSRRIVQIGDLKIFLLDRNSVYNYWVP





QLDKDDSSTGYSSEKTTASSIIVKAGYLVRTAYTKGSGLYLTADFNATTPVEVIGAPSNVRNLY





INGEKTQFKTDKNGIWSTEVKYSAPKIKLPSMKDLDWKYLDTLQEVQSTYDDSAWPAADLDTTP





NTLRPLTTPKSLYSSDYGFHTGYLIYRGHFVADGSETTFDVRTQGGSAFGSSVWLNESFLGSWT





GLNANADYNSTYKLPQVEQGKNYVLTILIDTMGLNENWVVGTDEMKNPRGILSYKLSGRDASAI





TWKLTGNLGGEDYQDKIRGPLNEGGLYAERQGFHQPQPPSQKWKSASPLDGLSKPGIGFYTAQF





DLDIPSGWDVPLYFNFGNSTKSAYRVQLYVNGYQYGKFVSNIGPQTSFPVPQGILNYQGTNWVA





LTLWALESDGAKLDDFELVNTTPVMTALSKIRPSKQPNYRQRKGAY




Probable beta-
MAHIYRLLLLLLSNLWFSTAAQNQSETEWPLHDNGLSKVVQWDHYSFQVNGQRIFIFSGEFHYW
698



galactosidase B
RIPVPELWRDILEKVKATGFTAFAFYSSWAYHAPNNSTVDFSTGARDITPIFELAKELGMYMIV




(Lactase B)
RPGPYVNAEASAGGFPLWLMTGEYGSLRNDDPRYTAAWTPYFANMSQITSKYQVTDGHNTLVYQ




B0XNY2.1
IENEYGQQWIGDPKNRNPNKTAVAYMELLEASARENGITVPLTSNDPNMNSKSWGSDWSNAGGN




GI:300680860
VDVAGLDSYPSCWTCDVSQCTSTNGEYVPYKVIDYYDYFQEVQPTLPSFMPEFQGGSYNPWAGP





EGGCPQDTSAEFANLFYRWNIGQRVTAMSLYMLYGGTNWGAIAAPVTATSYDYSAPISEDRSIG





AKYSETKLLALFTRTAKDLTMTEAIGNGTQYTTNTAVRAFELRNPQTNAGFYVTFHTDTTVGGN





QAFKLHVNTSVGALTVPKNEGLIQLNGHQSKIIVTDFTLGKRTLLYSTAEVLTYAVFENRPTLV





LWVPTGESGEFAIKGAKSGKVENGDGCSGIKFKREKDYLVVNFSQAKGLSVLRLDNGVRVVLLD





KAAAYRFWAPALTDDPNVQETETVLVHGPYLVRSASISKTTLALRGDSVEKTTLEIFAPHSVRK





ITWNGKEVQTSHTPYGSLKATLAAPPDIKLPALTSWRSNDSLPERLPSYDDSGPAWIEANHMTT





SNPSPPATFPVLYADEYGFHNGVRLWRGYFNGSASGVFLNIQGGSAFGWSAWLNGHFLDSHLGT





ATTSQANKTLTFPSSILNPTENVLLIVHDDTGHDQTTGALNPRGILEARLLSNDTSSPPPEFTH





WRLAGTAGGESNLDPIRGVFNEDGLFAERMGWHLPGFDDSAWTSENSATSASSALSFTGATVRF





FRSVVPLNIPAGLDVSISFVLSTPTAAPKGYRAQLFVNGYQYGRYNPHIGNQVVFPVPPGILDY





QGDNTIGLAVWAQTEEGAGIQVDWKVNYVADSSLSVAGFGKGLRPGWTEERLKFA




Probable beta-
MKSLLKRLIALAAAYSVAAAPSFSHHSSQDAANKRELLQDLVTWDQHSLFVRGERLMIFSGEFH
699



galactosidase E
PFRLPVPGLWFDVFQKIKSLGFNAVSFYTDWGLMEGNPGHVVTDGIWSLDEFFTAAREAGLYLI




(Lactase E)
ARPGPYINAETSAGGIPGWVLRRKGIIRSNSEDYLRATDTYMATLGKIIAKAQITNGGPVILVQ




B0XXE7.1
PENEYTTWPNVSESEFPTTMNQEVMAYAEKQLRDAGVVVPTVVNDNKNLGYFAPGTGLGETDLY




GI:300680872
GIDAYPMRYDCGNPYVWPTYRFPRDWQHEHRNHSPTTPFAIMEFQGGSGDGWGGVTEDGCAILV





NNEAVRVVYKNNYGFGVRVFNIYMTYGGTNWGNLGYYGGYTSYDYGAAITEDRQIWREKYSEEK





LQANFLKVSPAYLTSTPGNGVNGSYTGNKDITVTPLFGNGTTTNLYLVRHADFTSTGSAQYNLS





ISTSVGNVTIPQLGGSLSLNGRDSKFHITDYDVGGFNLIYSSAEVFTWAKGDNKKRVLVLYGGA





GELHEFALPKHLPRPTVVEGSYVKIAKQGSAWVVQWEVAAQRRVLRAGKLEIHLLWRNDAYQHW





VLELPAKQPIANYSSPSKETVIVKGGYLLRSAWITDNDLHLTGDVNVTTPLEVISAPKRFDGIV





FNGQSLKSTRSKIGNLAATVHYQPPAISLPDLKRLDWKYIDSLPEISTEYNDEGWTPLTNTYTN





NTREFTGPTCLYADDYGYHGGSLIYRGHFTANGDESWVFLNTSGGVGFANSVWLNQTFLGSWTG





SGRNMTYPRNISLPHELSPGEPYVFTVVIDHMGQDEEAPGTDAIKFPRGILDYALSGHELSDLR





WKMTGNLGGEQYQDLTRGPLNEGAMYAERQGYHLPSPPTSSWKSSNPIKEGLTGAGIGFYATSF





SLDLPEGYDIPLSFRFNNSASAARSGTSYRCQLFVNGYQFGKYVNDLGPQTKFPVPEGILNYNG





VNYVAVSLWALESQGALIGGLDLVASTPILSGYRKPAPAPQPGWKPRRGAY




Probable beta-
MRIFSFLFLLLLGILTGQGLVSGTDNGKTTDVTWDKYSLSVKGQRLFVFSGEFHYQRLPVPELW
700



galactosidase C
LDVFQKLRANGFNAISVYFFWSFHSASEGEFDFENGAHDIQRLFDYAKEAGLYVIARAGPYCNA




(Lactase C)
ETSAGGFALWAANGQMGNERTSDEAYYEKWRPWILEVGKIIAKNQITNGGPVILNQHENELVET




B0Y752.1
TYDPNHTLVVYMKQIAQVFEEAGIVVPSSHNEKGMRGVSWSTDYHNVGGAVNIYGLDSYPGGLS




GI:300680865
CTNPNSGFNLVRTYHQWFQNYSFTQPSYLPEFEGGWFQPWGGSFYDTCATELSPEFPDVYYKNN





IGSRVTLHSIYMTYGGTNWGHSAAPVVYTSYDYAAPLRETREIRDKLKQTKLIGLFTRVSKDLL





KTYMEGNGTGYTSDSSIYTWSLRNPDTNAGFYVLAHSTSSTRDVTTFTLNVTTSAGAISIPDIE





LNGRQSKIIVTDYNFGTNSTLLFSSAEVLTYANLDVNVLVFYLNVGQKGTFVFKDEPKLAFQTY





GNSNLTTSESSYGTQYSYTQGKGVTAVKFSNGVLAYFLDKESAWNFFAPPTTSSPQVAPNEHIL





VQGPYLVRGASVNHGTVEITGDNANTTSIEVYTGNSQVKKIKWNGKTIETRKTAYGSLIGTAPG





AEDVKIQLPSLDSWKAQDTLPEIQPDYDDSKWTVCNKTTSVNAIAPLSLPVLYSGDYGYHAGTK





VYRGRFDGRNVTGANVTVQNGAAAGWAAWVNGQYAGGSAGSPNLAATSAVLTFNSSSLKDQDNV





LTVVTDYTGHDQNSVRPKGTQNPRGILGATLIGGGNFTSWRIQGNAGGEKNIDPVRGPMNEGGL





YGERMGWHLPGYKVPKSASKSSPLDGVSGAEGRFYTTTFKLKLDKDLDVPIGLQLGAPEGTKAV





VQVFMNGYQFGHYLPHTGPQSLFPFPPGVINNRGENTLAISMWALTDAGAKLDKVELVAYGKYR





SGFDFNQDWGYLQPGWKDRSQYA




Probable beta-
MRLLSFIYLVWLALLTGTPQVSATDNGKTSDVAWDKYSLSVKGERLFVFSGEFHYQRLPVPELW
701



galactosidase C
LDVFQKLRANGFNTISVYFFWSYHSASEDVFDFTTGAHDIQRLFDYAKQAGLYVIARAGPYCNA




(Lactase C)
ETSAGGFALWAANGQMGSERTSDEAYYKKWKPWILEVGKIIAANQITNGGPVILNQHENELQET




B8N2I5.1
TYDSNDTKVIYMEQVAKAFEEAGVVVPSSHNEKGMRTVSWSTDYKNVGGAVNVYGLDSYPGSLS




GI:300680866
CANPNSGFNLLRTYYQWFQNYSYTQPEYLAEFEGGWFQPWGGSFYDSCASELSPEFADVYYKNN





IGSRVTLHNIYMTFGGTNWGHSAAPVVYTSYDYGSPLRETREIRDKLKQTKLLGLFTRVSKDLL





KTYMEGNGTSYTSDDSIYTWALRNPDSDAGFYVVAHNTSSSREVTTFSLNITTSAGALTIPDIE





LDGRQSKIIVTDYSIGSESSLLYSSAEVLTYATLDVDVLVFYLNAGQKGAFVFKDAPADLKYQT





YGNSNLSALETSQGTQYSYTQGEGVTAVKFSNGVLVYLLDKETAWNFFAPPTVSSPTVAPNEHI





LVFGPYLVRGASIKHDTVEIVGDNSNSTSIEIYTGDEHVKKVSWNGNLIDTRATAYGSLIGTVP





GAEDIEISLPSLSSWKAQDTLPEISPDYDDSRWTICNKTTSVNSVAPLSLPVLYSGDYGYHTGT





KIYRGRFDGQNATGANVTVQNGVAAGWAAWLNGAYVGGFSGDPDKVASWEVLKFNHSSLRSRDN





VLTIITDYTGHDQNSQKPIGTQNPRGIMGATLIGGGNFTLWRIQGNAGGEKNIDPVRGPMNEGG





LYGERMGWHLPGYQVPESALDSSPLEGVSGAEGRFYTTSFQLDLEEDLDVPIGLQLSAPAGTEA





VVQIFMNGYQFGHYLPHIGPQSLFPFPPGVIYNRGQNSLAISMWALTDAGARLEQVELKAYAKY





RSGFDFNRDWTYLQPGWKDRTEYA




Probable beta-
MKLLSVAAVALLAAQAAGASIKHRLNGFTILEHPDPAKRDLLQDIVTWDDKSLFINGERIMLFS
702



galactosidase A
GEVHPFRLPVPSLWLDIFHKIRALGFNCVSFYIDWALLEGKPGDYRAEGIFALEPFFDAAKEAG




(Lactase A)
IYLIARPGSYINAEVSGGGFPGWLQRVNGTLRSSDEPFLKATDNYIANAAAAVAKAQITNGGPV




B8N6V7.1
ILYQPENEYSGGCCGVKYPDADYMQYVMDQARKADIVVPFISNDASPSGHNAPGSGTGAVDIYG




GI:300680889
HDSYPLGFDCANPSVWPEGKLPDNFRTLHLEQSPSTPYSLLEFQAGAFDPWGGPGFEKCYALVN





HEFSRVFYRNDLSFGVSTFNLYMTFGGTNWGNLGHPGGYTSYDYGSPITETRNVTREKYSDIKL





LANFVKASPSYLTATPRNLTTGVYTDTSDLAVTPLIGDSPGSFFVVRHTDYSSQESTSYKLKLP





TSAGNLTIPQLEGTLSLNGRDSKIHVVDYNVSGTNIIYSTAEVFTWKKFDGNKVLVLYGGPKEH





HELAIASKSNVTIIEGSDSGIVSTRKGSSVIIGWDVSSTRRIVQVGDLRVFLLDRNSAYNYWVP





ELPTEGTSPGFSTSKTTASSIIVKAGYLLRGAHLDGADLHLTADFNATTPIEVIGAPTGAKNLF





VNGEKASHTVDKNGIWSSEVKYAAPEIKLPGLKDLDWKYLDTLPEIKSSYDDSAWVSADLPKTK





NTHRPLDTPTSLYSSDYGFHTGYLIYRGHFVANGKESEFFIRTQGGSAFGSSVWLNETYLGSWT





GADYAMDGNSTYKLSQLESGKNYVITVVIDNLGLDENWTVGEETMKNPRGILSYKLSGQDASAI





TWKLTGNLGGEDYQDKVRGPLNEGGLYAERQGFHQPQPPSESWESGSPLEGLSKPGIGFYTAQF





DLDLPKGWDVPLYFNFGNNTQAARAQLYVNGYQYGKFTGNVGPQTSFPVPEGILNYRGTNYVAL





SLWALESDGAKLGSFELSYTTPVLTGYGNVESPEQPKYEQRKGAY




Probable beta-
MLISKTVLSGLALGASFVGVSAQQNSTRWPLHDNGLTDTVEWDHYSFLINGQRHFVFSGEFHYW
703



galactosidase B
RIPVPELWRDLLEKIKAAGFTAFSIYNHWGYHSPKPGVLDFENGAHNFTSIMTLAKEIGLYMII




(Lactase B)
RPGPYVNAEANAGGLPLWTTTGAYGKLRDNDPRYLEALTPYWANISKIIAPHLITNGGNVILYQ




B8NKI4.2
IENEYAEQWLDEETHEPNTSGQEYMQYLEDVARENGIDAPLIHNLPNMNGHSWSKDLSNATGNV




GI:68115
DVIGVDSYPTCWTCNVSECASTNGEYIPYKTLIYYDYFKELSPTQPSFMPEFQGGSYNPWGGPQ





GGCPDDLGPDFANLFYRNLISQRVSAISLYMLYGGTNWGWHASTDVATSYDYSSPISENRKLIE





KYYETKVLTQFTKIAQDLSKVDRLGNSTKYSSNPAVSVAELRNPDTGAAFYVTQHEYTPSGTVE





KFTVKVNTSEGALTIPQYGSQITLNGHQSKIIVTDFKFGSKTLLYSTAEVLTYAVIDGKEVLAL





WVPTGESGEFTVKGVNSAKFADKGRTANIEIHPGANNVTVSFMQRSGMSLVELGDGTRIVLLDR





SAAHVFWSTPLNNDPAEAGNNTVLVHGPYLVRSAKLEGCDLKLTGDIQNSTEVSIFAPKSVCSV





NWNGKKTSVKSAKGGVITTTLGGDAKFELPTISGWKSADSLPEIAKDYSATSKAWVVATKTNSS





NPTPPAPNNPVLYVDENDIHVGNHIYRATFPSTDEPPTDVYLNITGGRAFSYSVWLNSDFIGSW





LGTATTEQNDQTFSFSNATLSTDEDNILVVVMDNSAHDLRDGALNPRGITNATLIGPGSYSFTE





WKLAGNAGFEDHLDPVRAPLNEGSLYAERVGIHLPGYEFDEAEEVSSNSTSLTVPGAGIRVFRT





VVPLSVPQGLDVSISFRLTAPSNVTFTSAEGYTNQLRALLFVNGYQYGRFNPYIGHQIDFPVPP





GVLDYNGDNTIAVTVWSQSVDGAEIKVDWNVDYVHETSFDMNFDGAYLRPGWIEERREYA




Probable beta-
MARFPQLLFLLLASIGLLSAAQNHSDSEWPLHDNGLSTVVQWDHYSFHVHGQRIFVFSGEFHYW
704



galactosidase B
RIPVPGLWRDILEKIKAAGFTAFAFYSSWGYHAPNNHTVDFSTGARDITPIYELAKELGMYIIV




(Lactase B)
RPGPYVNAEASAGGYPLWVTTGAYGSLRNDDARYTAAWKPYFAKMSEITSQYQVTDGHNTFCYQ




Q0CMF3.2
IENEYGQQWIGDPVDRNPNQTAVAYMELLEASARENGIVVPLTANDPNMNTKSWGSDWSHAGGN




GI:300681013
VDVVGLDSYPSCWTCDVTQCTSTNGEYVPYKVMQYYDYFQEVQPTMPGFMPEFQGGSYNPWAGP





EGGCPGDTGVDFANLFYRWNIAQRVTAMSLYMLYGGTNWGAIAAPVTATSYDYSSPISEDRSIG





SKYYETKLLALFTRSATDLTMTDRIGNGTHYTNNPAVAAYELRNPVTNGAFYVTIHADSTVGTD





ESFRLNVNTSAGALTVPSKGSIRLNGHQSKIIVTDFRFGPSHTLLYSTAEVLTHAVMDKKATLV





LWVPTGESGEFAVKGAKSGKVERCPQCSNATFTRKKDVLVVNFTQAGGMSVLQLNNGVRVVLLD





RAAAYKFWAPPLTDDPFAPETDLVLVQGPYLVRSASLSGSTLALRGDSANETALEVFASKKVHT





VTWNGKRIKTSRSSYGSLTASLAAPPAVSLPALSSAQWKSQDSLPERLPSYDDSGPAWVDANHM





TTQNPRTPDTLPVLYADEYGFHNGIRLWRGSFTDAASGVYLNVQGGAAFGWSAYLNGHFLGSHL





GTATTSQANKTLLFPAGTLRKNTTNTILVIHDDTGHDQTTGALNPRGILAARLLAPSDSSTAPN





FTQWRVAGTAGGESDLDPVRGVYNEDGLFAERMGWHLPGFDDADWPANNSTTTRGAQVSLSVTG





ATVRFFRAVVPLHLPRGVDASISFMLGTPAGASTAYRAQLFVNGYQYGRFYPHIGNQVVYPVPA





GVLDYDGENTIGVAVWAQSEAGAEMSLDWRVNYVADSSLDAVRVAAEGALRPGWSEERLQYA




Probable beta-
MLISKTVLSGLALGASFVGVSAQQNSTRWPLHDNGLTDTVEWDHYSFLINGQRHFVFSGEFHYW
705



galactosidase B
RIPVPELWRDLLEKIKAAGFTAFSIYNHWGYHSPKPGVLDFENGAHNFTSIMTLAKEIGLYMII




(Lactase B)
RPGPYVNAEANAGGLPLWTTTGAYGKLRDNDPRYLEALTPYWANISKIIAPHLITNDGNVILYQ




Q2U6P1.2
IENEYAEQWLDEETHEPNTSGQEYMQYLEDVARENGIDAPLIHNLPNMNGHSWSKDLSNATGNV




GI:300681012
DVIGVDSYPTCWTCNVSECASTNGEYIPYLKLTYPQISYFKELSPTQPSFMPEFQGGSYNPWGG





PQGGCPDDLGPDFANLFYRNLISQRVSAISLYMLYGGTNWGWHASTDVATSYDYSSPISENRKL





IEKYYETKVLTQFTKIAQDLSKVDRLGNSTKYSSNPAVSVAELRNPDTGAAFYVTQHEYTPSGT





VEKFTVKVNTSEGALTIPQYGSQITLNGHQSKIIVTDFKFGSKTLLYSTAEVLTYAVIDGKEVL





ALWVPTGESGEFTVKGVNSAKFADKGRTANIEIHPGTNNVTVSFMQRSGMSLVELGDGTRIVLL





DRSAAHVFWSTPLNNDPAEAGNNTVLVHGPYLVRSAKLEGCDLKLTGDIQNSTEVSIFAPKSVC





SVNWNGKKTSVKSAKGGVITTTLGGDAKFELPTISGWKSADSLPEIAKDYSATSKAWVVATKTN





SSNPTPPAPNNPVLYVDENDIHVGNHIYRATFPSTDEPPTDVYLNITGGRAFGYSVWLNSDFIG





SWLGTATTEQNDQTFSFSNATLSTDEDNILVVVMDNSAHDLRDGALNPRGITNATLIGPGSYSF





TEWKLAGNAGFEDHLDPVRAPLNEGSLYAERVGIHLPGYEFDEAEEVSSNSTSLTVPGAGIRVF





RTVVPLSVPQGLDVSISFRLTAPSNVTFTSAEGYTNQLRALLFVNGYQYGRFNPYIGHQIDFPV





PPGVLDYNGDNTIAVTVWSQSVDGAEIKVDWNVDYVHETSFDMNFDGAYLRPGWIEERREYA




Probable beta-
MKLLSVAAVALLAAQAAGASIKHRLNGFTILEHPDPAKRDLLQDIVTWDDKSLFINGERIMLFS
706



galactosidase A
GEVHPFRLPVPSLWLDIFHKIRALGFNCVSFYIDWALLEGKPGDYRAEGIFALEPFFDAAKEAG




(Lactase A)
IYLIARPGSYINAEVSGGGFPGWLQRVNGTLRSSDEPFLKATDNYIANAAAAVAKAQITNGGPV




Q2UCU3.1
ILYQPENEYSGGCCGVKYPDADYMQYVMDQARKADIVVPFISNDASPSGHNAPGSGTGAVDIYG




GI:121801672
HDSYPLGFDCANPSVWPEGKLPDNFRTLHLEQSPSTPYSLLEFQAGAFDPWGGPGFEKCYALVN





HEFSRVFYRNDLSFGVSTFNLYMTFGGTNWGNLGHPGGYTSYDYGSPITETRNVTREKYSDIKL





LANFVKASPSYLTATPRNLTTGVYTDTSDLAVTPLIGDSPGSFFVVRHTDYSSQESTSYKLKLP





TSAGNLTIPQLEGTLSLNGRDSKIHVVDYNVSGTNIIYSTAEVFTWKKFDGNKVLVLYGGPKEH





HELAIASKSNVTIIEGSDSGIVSTRKGSSVIIGWDVSSTRRIVQVGDLRVFLLDRNSAYNYWVP





ELPTEGTSPGFSTSKTTASSIIVKAGYLLRGAHLDGADLHLTADFNATTPIEVIGAPTGAKNLF





VNGEKASHTVDKNGIWSSEVKYAAPEIKLPGLKDLDWKYLDTLPEIKSSYDDSAWVSADLPKTK





NTHRPLDTPTSLYSSDYGFHTGYLIYRGHFVANGKESEFFIRTQGGSAFGSSVWLNETYLGSWT





GADYAMDGNSTYKLSQLESGKNYVITVVIDNLGLDENWTVGEETMKNPRGILSYKLSGQDASAI





TWKLTGNLGGEDYQDKVRGPLNEGGLYAERQGFHQPQPPSESWESGSPLEGLSKPGIGFYTAQF





DLDLPKGWDVPLYFNFGNNTQAARAQLYVNGYQYGKFTGNVGPQTSFPVPEGILNYRGTNYVAL





SLWALESDGAKLGSFELSYTTPVLTGYGNVESPEQPKYEQRKGAY




Probable beta-
MRLLSFIYLVWLALLTGTPQVSATDNGKTSDVAWDKYSLSVKGERLFVFSGEFHYQRLPVPELW
707



galactosidase C
LDVFQKLRANGFNTISVYFFWSYHSASEDVFDFTTGAHDIQRLFDYAKQAGLYVIARAGPYCNA




(Lactase C)
ETSAGGFALWAANGQMGSERTSDEAYYKKWKPWILEVGKIIAANQITNGGPVILNQHENELQET




Q2UMD5.1
TYDSNDTKVIYMEQVAKAFEEAGVVVPSSHNEKGMRTVSWSTDYKNVGGAVNVYGLDSYPGSLS




GI:121804415
CANPNSGFNLLRTYYQWFQNYSYTQPEYLAEFEGGWFQPWGGSFYDSCASELSPEFADVYYKNN





IGSRVTLHNIYMTFGGTNWGHSAAPVVYTSYDYGSPLRETREIRDKLKQTKLLGLFTRVSKDLL





KTYMEGNGTSYTSDDSIYTWALRNPDSDAGFYVVAHNTSSSREVTTFSLNITTSAGAMTIPDIE





LDGRQSKIIVTDYSIGSESSLLYSSAEVLTYATLDVDVLVFYLNAGQKGAFVFKDAPADLKYQT





YGNSNLSALETSQGTQYSYTQGEGVTAVKFSNGVLVYLLDKETAWNFFAPPTVSSPTVAPNEHI





LVFGPYLVRGASIKHDTVEIVGDNSNSTSIEIYTGDEHVKKVSWNGNLIDTRATAYGSLIGTVP





GAEDIEISLPSLSSWKAQDTLPEISPDYDDSRWTICNKTTSVNSVAPLSLPVLYSGDYGYHTGT





KIYRGRFDGQNATGANVTVQNGVAAGWAAWLNGAYVGGFSGDPDKVASWEVLKFNHSSLRSRDN





VLTIITDYTGHDQNSQKPIGTQNPRGIMGATLIGGGNFTLWRIQGNAGGEKNIDPVRGPMNEGG





LYGERMGWHLPGYQVPESALDSSPLEGVSGAEGRFYTTSFQLDLEEDLDVPIGLQLSAPAGTEA





VVQIFMNGYQFGHYLPHIGPQSLFPFPPGVIKNRGQNSLAISMWALTDAGARLEQVELKAYAKY





RSGFDFNRDWTYLQPGWKDRTEYA




Probable beta-
MKSLLKRLIALAAAYSVAAAPSFSHHSSQDAANKRELLQDLVTWDQHSLFVRGERLMIFSGEFH
708



galactosidase E
PFRLPVPGLWFDVFQKIKSLGFNAVSFYTDWGLMEGNPGHVVTDGIWSLDEFFTAAREAGLYLI




(Lactase E)
ARPGPYINAETSAGGIPGWVLRRKGIIRSNSEDYLRATDTYMATLGKIIAKAQITNGGPVILVQ




Q4WG05.1
PENEYTTWPNVSESEFPTTMNQEVMAYAEKQLRDAGVVVPTVVNDNKNLGYFAPGTGLGETDLY




GI:74668464
GIDAYPMRYDCGNPYVWPTYRFPRDWQHEHRNHSPTTPFAIMEFQGGSGDGWGGVTEDGCAILV





NNEAVRVVYKNNYGFGVRVFNIYMTYGGTNWGNLGYYGGYTSYDYGAAITEDRQIWREKYSEEK





LQANFLKVSPAYLTSTPGNGVNGSYTGNKDITVTPLFGNGTTTNLYLVRHADFTSTGSAQYNLS





ISTSVGNVTIPQLGGSLSLNGRDSKFHITDYDVGGFNLIYSSAEVFTWAKGDNKKRVLVLYGGA





GELHEFALPKHLPRPTVVEGSYVKIAKQGSAWVVQWEVAAQRRVLRAGKLEIHLLWRNDAYQHW





VLELPAKQPIANYSSPSKETVIVKGGYLLRSAWITDNDLHLTGDVNVTTPLEVISAPKRFDGIV





FNGQSLKSTRSKIGNLAATVHYQPPAISLPDLKRLDWKYIDSLPEISTEYNDEGWTPLTNTYTN





NTREFTGPTCLYADDYGYHGGSLIYRGHFTANGDESWVFLNTSGGVGFANSVWLNQTFLGSWTG





SGRNMTYPRNISLPHELSPGEPYVFTVVIDHMGQDEEAPGTDAIKFPRGILDYALSGHELSDLR





WKMTGNLGGEQYQDLTRGPLNEGAMYAERQGYHLPSPPTSSWKSSNPIKEGLTGAGIGFYATSF





SLDLPEGYDIPLSFRFNNSASAARSGTSYRCQLFVNGYQFGKYVNDLGPQTKFPVPEGILNYNG




Probable beta-
MRIFSFLFLLLLGILTGQGLVSGTDNGKTTDVTWDKYSLSVKGQRLFVFSGEFHYQRLPVPELW
709



galactosidase C
LDVFQKLRANGFNAISVYFFWSFHSASEGEFDFENGAHDIQRLFDYAKEAGLYVIARAGPYCNA




(Lactase C)
ETSAGGFALWAANGQMGNERTSDEAYYEKWRPWILEVGKIIAKNQITNGGPVILNQHENELVET




Q4WNE4.1
TYDPNHTLVVYMKQIAQVFEEAGIVVPSSHNEKGMRGVSWSTDYHNVGGAVNIYGLDSYPGGLS




GI:74671041
CTNPNSGFNLVRTYHQWFQNYSFTQPSYLPEFEGGWFQPWGGSFYDTCATELSPEFPDVYYKNN





IGSRVTLHSIYMTYGGTNWGHSAAPVVYTSYDYAAPLRETREIRDKLKQTKLIGLFTRVSKDLL





KTYMEGNGTGYTSDSSIYTWSLRNPDTNAGFYVLAHSTSSTRDVTTFTLNVTTSAGAISIPDIE





LNGRQSKIIVTDYNFGTNSTLLFSSAEVLTYANLDVNVLVFYLNVGQKGTFVFKDEPKLAFQTY





GNSNLTTSESSYGTQYSYTQGKGVTAVKFSNGVLAYFLDKESAWNFFAPPTTSSPQVAPNEHIL





VQGPYLVRGASVNHGTVEITGDNANTTSIEVYTGNSQVKKIKWNGKTIETRKTAYGSLIGTAPG





AEDVKIQLPSLDSWKAQDTLPEIQPDYDDSKWTVCNKTTSVNAIAPLSLPVLYSGDYGYHAGTK





VYRGRFDGRNVTGANVTVQNGAAAGWAAWVNGQYAGGSAGSPNLAATSAVLTFNSSSLKDQDNV





LTVVTDYTGHDQNSVRPKGTQNPRGILGATLIGGGNFTSWRIQGNAGGEKNIDPVRGPMNEGGL





YGERMGWHLPGYKVPKSASKSSPLDGVSGAEGRFYTTTFKLKLDKDLDVPIGLQLGAPEGTKAV





VQVFMNGYQFGHYLPHTGPQSLFPFPPGVINNRGENTLAISMWALTDAGAKLDKVELVAYGKYR





SGFDFNQDWGYLQPGWKDRSQYA




Probable beta-
MAHIYRLLLLLLSNLWFSTAAQNQSETEWPLHDNGLSKVVQWDHYSFQVNGQRIFIFSGEFHYW
710



galactosidase B
RIPVPELWRDILEKVKATGFTAFAFYSSWAYHAPNNSTVDFSTGARDITPIFELAKELGMYMIV




(Lactase B)
RPGPYVNAEASAGGFPLWLMTGEYGSLRNDDPRYTAAWTPYFANMSQITSKYQVTDGHNTLVYQ




Q4WRD3.1
IENEYGQQWIGDPKNRNPNKTAVAYMELLEASARENGITVPLTSNDPNMNSKSWGSDWSNAGGN




GI:74672078
VDVAGLDSYPSCWTCDVSQCTSTNGEYVPYKVIDYYDYFQEVQPTLPSFMPEFQGGSYNPWAGP





EGGCPQDTSAEFANLFYRWNIGQRVTAMSLYMLYGGTNWGAIAAPVTATSYDYSAPISEDRSIG





AKYSETKLLALFTRTAKDLTMTEAIGNGTQYTTNTAVRAFELRNPQTNAGFYVTFHTDTTVGGN





QAFKLHVNTSVGALTVPKNEGLIQLNGHQSKIIVTDFTLGKRTLLYSTAEVLTYAVFENRPTLV





LWVPTGESGEFAIKGAKSGKVENGDGCSGIKFKREKDYLVVNFSQAKGLSVLRLDNGVRVVLLD





KAAAYRFWAPALTDDPNVQETETVLVHGPYLVRSASISKTTLALRGDSVEKTTLEIFAPHSVRK





ITWNGKEVQTSHTPYGSLKATLAAPPDIKLPALTSWRSNDSLPERLPSYDDSGPAWIEANHMTT





SNPSPPATFPVLYADEYGFHNGVRLWRGYFNGSASGVFLNIQGGSAFGWSAWLNGHFLDSHLGT





ATTSQANKTLTFPSSILNPTENVLLIVHDDTGHDQTTGALNPRGILEARLLSNDTSSPPPEFTH





WRLAGTAGGESNLDPIRGVFNEDGLFAERMGWHLPGFDDSAWTSENSATSASSALSFTGATVRF





FRSVVPLNIPAGLDVSISFVLSTPTAAPKGYRAQLFVNGYQYGRYNPHIGNQVVFPVPPGILDY





QGDNTIGLAVWAQTEEGAGIQVDWKVNYVADSSLSVAGFGKGLRPGWTEERLKFA




Probable beta-
MKLLSVCAIALLAAQAAGASIKHMLNGFTLMEHSDPAKRELLQKYVTWDEKSLFVNGERIMIFS
711



galactosidase A
GEVHPFRLPVPSLWLDVFQKIKALGFNCVSFYVDWALLEGKPGEYRAEGNFALEPFFDVAKQAG




(Lactase A)
IYLLARPGPYINAEASGGGFPGWLQRVNGTLRTSDPAYLKATDNYIAHVAATIAKGQITNGGPV




Q4WS33.2
ILYQPENEYSGACCDATFPDGDYMQYVIDQARNAGIVVPLINNDAWTGGHNAPGTGKGEVDIYG




GI:300681010
HDSYPLGFDCGHPSVWPKGNLPTTFRTDHLKQSPTTPYSLIEFQAGSFDPWGGPGFAACAALVN





HEFERVFYKNDLSFGAAILNLYMTFGGTNWGNLGHPGGYTSYDYGSPLTESRNVTREKYSELKL





IGNFVKASPSYLLATPGNLTTSGYADTADLTVTPLLGNGTGSYFVVRHTDYTSQASTPYKLSLP





TSAGRLTVPQLGGTLTLNGRDSKIHVVDYNVAGTNIIYSTAEVFTWKNFGDSKVLILYGGPGEH





HELAVSLKSDVQVVEGSNSEFKSKKVGDVVVVAWDVSPSRRIVQIGDLKIFLLDRNSVYNYWVP





QLDKDDSSTGYSSEKTTASSIIVKAGYLVRTAYTKGSGLYLTADFNATTPVEVIGAPSNVRNLY





INGEKTQFKTDKNGIWSTEVKYSAPKIKLPSMKDLDWKYLDTLQEVQSTYDDSAWPAADLDTTP





NTLRPLTTPKSLYSSDYGFHTGYLIYRGHFVADGSETTFDVRTQGGSAFGSSVWLNESFLGSWT





GLNANADYNSTYKLPQVEQGKNYVLTILIDTMGLNENWVVGTDEMKNPRGILSYKLSGRDASAI





TWKLTGNLGGEDYQDKIRGPLNEGGLYAERQGFHQPQPPSQKWKSASPLDGLSKPGIGFYTAQF





DLDIPSGWDVPLYFNFGNSTKSAYRVQLYVNGYQYGKFVSNIGPQTSFPVPQGILNYQGTNWVA





LTLWALESDGAKLDDFELVNTTPVMTALSKIRPSKQPNYRQRKGAY




Probable beta-
MKLSSACAIALLAAQAAGASIKHRINGFTLTEHSDPAKRELLQKYVTWDDKSLFINGERIMIFS
712



galactosidase A
GEFHPFRLPVKELQLDIFQKVKALGFNCVSFYVDWALVEGEPGEYRADGIFDLEPFFDAASEAG




(Lactase A)
IYLLARPGPYINAESSGGGFPGWLQRVNGTLRSSDKAYLDATDNYVSHVAATIAKYQITNGGPI




Q4ZHV7.1
ILYQPENEYTSGCCGVEFPDPVYMQYVEDQARNAGVVIPLINNDASASGNNAPGTGKGAVDIYG




GI:74645200
HDSYPLGFDCANPTVWPSGDLPTNFRTLHLEQSPTTPYAIVEFQGGSYDPWGGPGFAACSELLN





NEFERVSYKNDFSFQIAIMNLYMIFGGTNWGNLGYPNGYTSYDYGSAVTESRNITREKYSELKL





LGNFAKVSPGYLTASPGNLTTSGYADTTDLTVTPLLGNSTGSFFVVRHSDYSSEESTSYKLRLP





TSASSVTIPQLGGTLTLNGRDSKIHVTDYNVSGTNIIYSTAEVFTWKKFADGKVLVLYGGAGEH





HELAISTKSNVTVIEGSESGISSKQTSSSVVVGWDVSTTRRIIQVGDLKILLLDRNSAYNYWVP





QLATDGTSPGFSTPEKVASSIIVKAGYLVRTAYLKGSGLYLTADFNATTSVEVIGVPSTAKNLF





INGDKTSHTVDKNGINSATVDYNAPDISLPSLKDLDWKYVDTLPEIQSSYDDSLWPAADLKQTK





NTLRSLTTPTSLYSSDYGFHTGYLLYRGHFTATGNESTFAIDTQGGSAFGSSVWLNGTYLGSWT





GLYANSDYNATYNLPQLQAGKTYVITVVINNMGLEENWTVGEDLMKTPRGILNFLLAGRPSSAI





SWKLTGNLGGEDYEDKVRGPLNEGGLYAERQGFHQPEPPSQDWKSSSPLEGLSEAGIGFYSASF





DLDLPKGWDVPLFLNIGNSTTPSPYRVQVYVNGYQYAKYISNIGPQTSFPVPEGILNYRGTNWL





AVTLWALDSAGGKLESLELSYTTPVLTALGEVESVDQPKYKKRKGAY




Probable beta-
MATAFWLLLFLLGSLHVLTAAQNSSQSEWPIHDNGLSKVVQWDHYSFYINGQRIFLFSGEFHYW
713



galactosidase B
RIPVPALWRDILEKIKAIGFTGFAFYSSWAYHAPNNQTVDFSTGARDITPIYDLAKELGMYIIV




(Lactase B)
RPGPYVNAEASAGGFPLWLTTGAYGSTRNDDPRYTAAWEPYFAEVSEITSKYQVTDGHYTLCYQ




Q5BEQ0.2
IENEYGQQWIGDPRDRNPNQTAIAYMELLQASARENGITVPLTGNDPNMNTKSWGSDWSDAGGN




GI:300681009
LDTVGLDSYPSCWSCDVSVCTGTNGEYVPYKVLDYYDYFQEVQPTMPFFMPEFQGGSYNPWDGP





EGGCTEDTGADFANLFYRWNIGQRVSAMSLYMMFGGTNWGGIAAPVTASSYDYSAPISEDRSIG





SKYYETKLLALFTRCAKDLTMTDRLGNGTQYTDNEAVIASELRNPDTNAAFYVTTHLDTTVGTD





ESFKLHVNTSKGALTIPRHGGTIRLNGHHSKIIVTDFNFGSETLLYSTAEVLTYAVFDRKPTLV





LWVPTGESGEFAIKGAKSGSVAKCSGCSNIKFHRDSGSLTVAFTQGEGISVLQLDNGVRVVLLD





RQKAYTFWAPALTDNPLVPEGESVLVSGPYLVRTARLARSTLTLRGDSKGETLEIFAPRKIKKV





TWNGKAVEATRTSYGSLKAILAKPPSVELPTLNGWKYSDSLPERFPTYDDSGAAWVEIDANHMT





TPNPNKPATLPVLYADEYGFHNGVRLWRGYFNSSASGVYLNIQGGAAFGWSAWLNGHFLGSHLG





SASIQQANGTLDFPANTLNTEGTPNVLLVVHDDTGHDQTTGVLNPRGILEARLLSEASDNNDDD





SPGFTHWRVAGTAGGESDLDPVRGVYNEDGLYAERVGWHLPGFDDSKWATVNGTSLSFTGATVR





FFRTVIPPLSIPENTDVSISFVFSTPNVNNTSAGNTSAFRAQLFVNGYQYGRYNPYVGNQVVYP





VPPGILDYNGENTIGVAVWAQTEAGARLNLDWRVNYVLGSSLDAGRLDLSFVAIAYVYIFECLQ





L




Probable beta-
MRLLPVWTAALLAAQAAGVALTHKLNGFTITEHPDAEKRELLQKYVTWDDKSLFINGERIMIFG
714



galactosidase A
AEIHPWRLPVPSLWRDILQKVKALGFNCVSFYVDWALLEGKPGEYRAEGSFAWEPFFDAASDLG




(Lactase A)
IYLIARPGPYINAEASGGGFPGWLQRLNGTIRSSDQSYLDATENYVSHIGGLIAKYQITNGGPV




Q5BFC4.2
ILYQPDNEYSGGCCGQEFPNPDYFQYVIDQARRAGIVVPTISNDAWPGGHNAPGTGKGEVDIYG




GI:300681016
HDNYPLGFDCANPDVWPEGNLPTDYRDLHLEISPSTPYALVEYQVGAFDPWGGPGFEQCAALTG





YEFERVFHKNTFSFGVGILSLYMTFGGTNWGNLGHPGGYTSYDYGSPIKETREITREKYSELKL





LGNFIKSSPGYLLATPGKLTNTTYTNTADLTVTPLLGNGTGSFFVLRHSDYSSQASTPYKLRLP





TSAGQLTIPQLGGSLVLNGRDSKVHLVDYDVAGTKILYSTAEVFTWKKFHDGKVLVLYGGPGEH





HELAVSSKAKVKVVEGLGSGISSKQIRGAVVVAWDVEPARRIVQIGDLKIFLLDRNSAYNYWVP





QLGTETSIPYATEKAVAASVIVKAGYLVRTAYVKGRDLHLTADFNATTPVEVIGAPKTAENLFI





NGKKAHHTVDKNGIWSTEVGYSPPKIVLPVLEDLKWKSIDTLPEIQPSYDDSPWPDANLPTKNT





IYPLRTPTSLYASDYGFHTGYLLFRGHFTANGRESNFSIQTQGGQAFGSSVWLSGTYLGSWTGD





NDYQDYNATYTLPSLKAGKEYVFTVVVDNMGLNENWIVGQDEMKKPRGILNYELSGHEASDITW





KLTGNFGGEDYVDKVRGPLNEGGLYAERHGYHQPYPPTKSKDWKSSTPLTGLSKPGISFYTASF





DLDIKSGWDVPIYFEFGNSTTPAPAYRVQLYVNGWQYGKYVNNIGPQTRFPVPEGILNYKGTNW





VAVTLWALEGSGAKLDSFKLVHGIPVRTALDVEGVELPRYQSRKGVY




Lactase, partial
SIKHRINGFTLTEHSDPAKRELLQKYVTWDDKSLFINGERIMIFSGEFHPFRLPVKELQLDIFQ
715



[Aspergillus
KVKALGFNCVSFYVDWALVEGKPGEYGADGIFDLEPFFDAASEAGIYLLARPGPYINAESSGGG





niger]

FPGWLQRVNGTLRTSDKAYLEATDNYVSHIAATIAKYQITNGGPIILYQPENEYTGGCCGVEFP




ABL07484.1
DPVYMQYVEDQARNAGVVIPLINNDASASGHNAPGTGEGAVDIYGHDSYPLGFDCANPTVWPSG




GI:118582212
DLPTNFRTLHLVQSPTTPYAIVEFQGGSYDPWGGPGFAACSELLNNEFERVFYKNDFSFQIAIM





NLYMIFGGTNWGNLGYPNGYTSYDYGSAVTESRNITREKYSELKLLGNFAKVSPGYLTASPGNL





TTSGYADTTDLTVTPLLGNSTGSFFVVRHSDYSSEDSTSYKLRLPTSAGTVTIPQLGGTLTLNG





RDSKIHVTDYNVSGTNIIYSTAEVFTWKKFADGKVLVLYGGAGEHHELAISTKSNVTVIEGSES





GISSKQTSSSVIVGWDVSTTRRIIQVGDLKVLLLDRNSAYNYWVPQLATDGTSPGFSTSETVAS





SIIVKAGYLVRTAYLKGSGLYLTADFNATTSVEVIGVPSTAKNLFINGDKTSHTVDKNGINSAT





VEYNAPDISLPSLKDLDWKYVDTLPEIQSSYDDSLWPAADLKQTKNTLRSLTTPTSLYSSDYGF





HTGYLLYRGHFTATGNESTFSIDTQGGSAFGSSVWLNGTYLGSWTGLYVNSDYNATYKLPQLQA





GKSYVITVVIDNMGLEENWTVGEDLMKTPRGILNFLLAGRPGSAISWKLTGNLGGEDYEDKVRG





PLNEGGLYAERQGFHQPEPPSGNWKSSSPLEGLSEAGIGFYSAKFDLDLPKGWDVPLFLNIGNS





TTPSPYRVQVYVNGYQYAKYISNNGPQTSFPVPEGILNYRGTNWLAVTLWALDSAGGKLESLEL





SYTTPVLTALGEVESVDQPKYKKRKGAY




Unnamed protein
MGSTSTSTLPPDFLWGFATASYQIEGAVNEDGRGPSIWDTFCKIPGKIAGGANGDVACDSYHRT
716



product
HEDIALLKACGAKAYRFSLSWSRIIPLGGRNDPINEKGLQYYIKFVDDLHAAGITPLVTLFHWD




[Aspergillus
LPDELDKRYGGLLNKEEFVADFAHYARIVFKAFGSKVKHWITFNEPWCSSVLGYNVGQFAPGRT




oryzae RIB40]
SDRSKSPVGDSSRECWIVGHSLLVAHGAAVKIYRDEFKASDGGEIGITLNGDWAEPWDPENPAD




BAE57671.1
VEACDRKIEFAISWFADPIYHGKYPDSMVKQLGDRLPKWTPEDIALVHGSNDFYGMNHYCANFI




GI:83767532
KAKTGEADPNDTAGNLEILLQNKKGEWVGPETQSPWLRPSAIGFRKLLKWLSERYNYPKIYVTE





NGTSLKGENDLPLEQLLQDDFRTQYFRDYIGAMADAYTLDGVNVRAYMAWSLME




Unnamed protein
MARVRLKLPADFIWGVSSSSWQIEGGLQLEGRGPSVLDTIGNVLSPEAADRSDANVANMHYFMY
717



product
EQDIARLAAAGIPYYSFSLSWPRIVPFGVAGSPVNTQGLDHYDDLINTCIKYGVTPIVTLNHVD




[Aspergillus
APTAVQADLDSLPEHFLYYAKIVMTRYADRVPYWVTFNEPNIGVGTLFQKYQDLTSALIAHADV




oryzae RIB40]
YDWYKNTLGGTGKITMKFANNLAMPLDTQDSSHIAAASRYQDILLGIMSNPLFLGKQYPDAAID




BAE62705.1
TVDMMQPLTDDQIKHIHGKIDFWSFDPYTAQYASPLPQGTEACASNSSDPFWPTCVILSNVQAN




GI:83772577
GWLMGQASNAYAYLAPQYVRQQLGYIWNTFRPSGILIAEYGFNPFLESNRTLDAQRYDLERTLY





YQDFLTETLKAIHEDNVNVIGALAWSIADNNEFGSYEEQYGLQTVNRTNGKFTRTYKRSLFDYV





DFFHRHVQSA




Unnamed protein
MNVNMFKAGDDILQDVDQSCKDRLPAVEELPLPPSFTWGTATAAYQVEGGAFQDGKGKSIWDTF
718



product
THLDPSRTNGENGDIACDHYNRMAEDVVLMASYGVDVYRFSIAWARILPLGGRGDPINEKGIAF




[Aspergillus
YNNLIDCLLEHNIEPVVTLYHWDVPQGLYDRYGAFLDTTEFRADFEHFARLCFSRFGDRVKRWI




oryzae RIB40]
TFNEPYIISIFGHHSGVLAPGRSSATGGDSRTEPWRVGHTIILAHTAAVQAYATDFQPTQKGDI




BAE63197.1
SIVLNGHYYEPWDAGSEEHRLAAQRRLEFYIGWFGDPIFLGKDYPAPMRAQLGSRLPEFTSEEL




GI:83773069
DLLRRSAPINSFYGMNHYTTKYARALPDPPAEDDCTGNVEEGPTNSEGKTMGPLSGMSWLRVTP





AGFRKLLNWVWDRYRRPIVVTENGCPCPGESQMTKEQALDDQFRIRYFGLYLDAISRAIYDDGV





KVEGYYVWSLMDNFEWSAGYGPRYGITHVDFTTLVRTPKQSAKYLHHSFNKRRATSLR




Beta-galactosidase
MTLSAVPDYENQHILQRNRLKPRAYFLPATSISLNGRWDFHYAASPVSAPEPTWSKGTKNATAE
719



[Aspergillus
PRRDSNQFSSDGADSKTAWAPITVPGHWQLQGYGRPHYTNVIYPFPVCPPFVPTENPTGTYRRT




fumigatus Af293]
FHVPAEWDASSQLRLRFDGVDSAYHVWVNGVPIGYSQGSRNPAEFDVSQVVDRDGANELFVRVY




XP_753202.1
QWSDGSYIEDQDQWWLSGIFRDVTLLAFPGQARIEDFFVRTALDKDYVDATLRLSVDLALATAA




GI:70996895
IVQVTLSNPSTGSTLQTEKYSLGEKQDKLEAELSVSNPNKWTAETPNLYNLCIALYVDGAKDPV





QTINHRVGFRQVEIKNGNITVNGVPVMFRGVNRHDHHPRFGRAVPLSFLREDLLIMKRHNVNAL





RCSHYPSHPRLYELCDELGLWVMDEADLECHGFYDAIARPLDIPESMDYEERKKLTFGQAAQFT





TNNPEWKEAYVDRMAQMVQRDKNHSCIVIWSLGNEAFYGSNHQAMYDYVKQVDPSRPVHYEGDM





EAKTVDMYSYMYPSLERLVGFATAEGDEFKKPIVLCEYAHAMGNAPGGLEEYMEAFRTHRRLQG





GWVWEWANHGLWDEKKGWYGYGGDFGDTPHDGNFVLDGLLFSDHTPTPGITELKKAYAPVRVWP





GEDGTLVVANDYNFVGLEGLQASYKIEVLGDSGRIIATGIIELPPIPAGQNGTIKLPSAPATAI





PGEVWLTISFLQKGETAWAGNNYEVAWYQQCLKSSSPRFSLAVPAEALTHSSTKTSHRISGASF





SLEFSRETGSLYAWTAGGLSLLDQSSSTGAISPGFWRPPTDNDMSHDLLEWRRFGLDTLTSQLR





KMHVVQHTPTSVEVTTETYISAPILGWGFFASTSYTISGNGALTVNVHLKPHGPMPADLPRLGL





DVLLADELDNTSWFGLGPGEAYPDKKRAQKVGIYNAATAELHTPYEVPQEGGNRMDTRWLRVHD





SRGWGLRVTRVKDESDKQPTELFQWLATRYSPEAIEAAKHAPELVPEKRIRLRLDVESCGVGTG





ACGPRTLDKYRVKCEERKFGFTLQPVLAELC




Beta-
MTLSAVPDYENQHILQRNRLKPRAYFLPATSISLNGRWDFHYAASPVSAPEPTWSKGTKNATAE
720



galactosidase,
PRRDSNQFSSDGADSKTAWAPITVPGHWQLQGYGRPHYTNVIYPFPVCPPFVPTENPTGTYRRT




putative
FHVPAEWDASSQLRLRFDGVDSAYHVWVNGVPIGYSQGSRNPAEFDVSQVVDRDGANELFVRVY




[Aspergillus
QWSDGSYIEDQDQWWLSGIFRDVTLLAFPGQARIEDFFVRTALDKDYVDATLRLSVDLALATAA




fumigatus Af293]
IVQVTLSNPSTGSTLQTEKYSLGEKQDKLEAELSVSNPNKWTAETPNLYNLCIALYVDGAKDPV




EAL91164.1
QTINHRVGFRQVEIKNGNITVNGVPVMFRGVNRHDHHPRFGRAVPLSFLREDLLIMKRHNVNAL




GI:66850838
RCSHYPSHPRLYELCDELGLWVMDEADLECHGFYDAIARPLDIPESMDYEERKKLTFGQAAQFT





TNNPEWKEAYVDRMAQMVQRDKNHSCIVIWSLGNEAFYGSNHQAMYDYVKQVDPSRPVHYEGDM





EAKTVDMYSYMYPSLERLVGFATAEGDEFKKPIVLCEYAHAMGNAPGGLEEYMEAFRTHRRLQG





GWVWEWANHGLWDEKKGWYGYGGDFGDTPHDGNFVLDGLLFSDHTPTPGITELKKAYAPVRVWP





GEDGTLVVANDYNFVGLEGLQASYKIEVLGDSGRIIATGIIELPPIPAGQNGTIKLPSAPATAI





PGEVWLTISFLQKGETAWAGNNYEVAWYQQCLKSSSPRFSLAVPAEALTHSSTKTSHRISGASF





SLEFSRETGSLYAWTAGGLSLLDQSSSTGAISPGFWRPPTDNDMSHDLLEWRRFGLDTLTSQLR





KMHVVQHTPTSVEVTTETYISAPILGWGFFASTSYTISGNGALTVNVHLKPHGPMPADLPRLGL





DVLLADELDNTSWFGLGPGEAYPDKKRAQKVGIYNAATAELHTPYEVPQEGGNRMDTRWLRVHD





SRGWGLRVTRVKDESDKQPTELFQWLATRYSPEAIEAAKHAPELVPEKRIRLRLDVESCGVGTG





ACGPRTLDKYRVKCEERKFGFTLQPVLAELC




Beta-galactosidase
MKLLSVAAVALLAAQAAGASIKHRLNGFTILEHPDPAKRDLLQDIVTWDDKSLFINGERIMLFS
721



[Aspergillus
GEVHPFRLPVPSLWLDIFHKIRALGFNCVSFYIDWALLEGKPGDYRAEGIFALEPFFDAAKEAG





candidus]

IYLIARPGSYINAEVSGGGFPGWLQRVNGTLRSSDEPFLKATDNYIANAAAAVAKAQITNGGPV




CAD24293.1
ILYQPENEYSGGCCGVKYPDADYMQYVMDQARKADIVVPFISNDASPSGHNAPGSGTGAVDIYG




GI:18958133
HDSYPLGFDCANPSVWPEGKLPDNFRTLHLEQSPSTPYSLLEFQAGAFDPWGGPGFEKCYALVN





HEFSRVFYRNDLSFGVSTFNLYMTFGGTNWGNLGHPGGYTSYDYGSPITETRNVTREKYSDIKL





LANFVKASPSYLTATPRNLTTGVYTDTSDLAVTPLMGDSPGSFFVVRHTDYSSQESTSYKLKLP





TSAGNLTIPQLEGTLSLNGRDSKIHVVDYNVSGTNIIYSTAEVFTWKKFDGNKVLVLYGGPKEH





HELAIASKSNVTIIEGSDSGIVSTRKGSSVIIGWDVSSTRRIVQVGDLRVFLLDRNSAYNYWVP





ELPTEGTSPGFSTSKTTASSIIVKAGYLLRGAHLDGADLHLTADFNATTPIEVIGAPTGAKNLF





VNGEKASHTVDKNGIWSSEVKYAAPEIKLPGLKDLDWKYLDTLPEIKSSYDDSAWVSADLPKTK





NTHRPLDTPTSLYSSDYGFHTGYLIYRGHFVANGKESEFFIRTQGGSAFGSSVWLNETYLGSWT





GADYAMDGNSTYKLSQLESGKNYVITVVIDNLGLDENWTVGEETMKNPRGILSYKLSGQDASAI





TWKLTGNLGGEDYQDKVRGPLNEGGLYAERQGFHQPQPPSESWESGSPLEGLSKPGIGFYTAQF





DLDLPKGWDVPLYFNFGNNTQAARAQLYVNGYQYGKFTGNVGPQTSFPVPEGILNYRGTNYVAL





SLWALESDGAKLGSFELSYTTPVLTGYGDVESPEQPKYEQRKGAY




Beta-galactosidase
MKLSSACAIALLAAQAAGASIKHRINGFTLTEHSDPAKRELLQKYVTWDDKSLFINGERIMIFS
722



(Lactase-N;
GEFHPFRLPVKELQLDIFQKVKALGFNCVSFYVDWALVEGKPGEYRADGIFDLEPFFDAASEAG




Lactase;
IYLLARPGPYINAESSGGGFPGWLQRVNGTLRSSDKAYLDATDNYVSHVAATIAKYQITNGGPI




Tilactase)
ILYQPENEYTSGCSGVEFPDPVYMQYVEDQARNAGVVIPLINNDASASGNNAPGTGKGAVDIYG




P29853.2
HDSYPLGFDCANPTVWPSGDLPTNFRTLHLEQSPTTPYAIVEFQGGSYDPWGGPGFAACSELLN




GI:461623
NEFERVFYKNDFSFQIAIMNLYMIFGGTNWGNLGYPNGYTSYDYGSAVTESRNITREKYSELKL





LGNFAKVSPGYLTASPGNLTTSGYADTTDLTVTPLLGNSTGSFFVVRHSDYSSEESTSYKLRLP





TSAGSVTIPQLGGTLTLNGRDSKIHVTDHNVSGTNIIYSTAEVFTWKKFADGKVLVLYGGAGEH





HELAISTKSNVTVIEGSESGISSKQTSSSVVVGWDVSTTRRIIQVGDLKILLLDRNSAYNYWVP





QLATDGTSPGFSTPEKVASSIIVKAGYLVRTAYLKGSGLYLTADFNATTSVEVIGVPSTAKNLF





INGDKTSHTVDKNGIWSATVDYNAPDISLPSLKDLDWKYVDTLPEIQSSYDDSLWPAADLKQTK





NTLRSLTTPTSLYSSDYGFHTGYLLYRGHFTATGNESTFAIDTQGGSAFGSSVWLNGTYLGSWT





GLYANSDYNATYNLPQLQAGKTYVITVVIDNMGLEENWTVGEDLMKSPRGISTSCLPDGQAAPI





SWKLTGNLGGEDYEDKVRGPLNEGGLYAERQGFHQPEPPSQNWKSSSPLEGLSEAGIGFYSASF





DLDLPKDGMSHCSSTSVTALRHPRTACRSTSTDIVCEIHKQHRTSDQLPCPRGNPELSRNELVG





GDPVALDSAGGKLESLELSYTTPVLTALGEVESVDQPKYKKRKGAY




Alpha-glucosidase
SLLAPSQPQFXIPASAAVGAQLIANIDDPQAADAQSVCPGYKASKVQHNSRGFTASLQLAGRPC
723



P1 subunit, ANP P1
NVYGTDVESLTLSVEYQDSDRLNIQILPTHVDSTXASWYFLSENLVPRPKASLXASVSQSDLFV




subunit
SWSNEPSFNFKVIRKATGDALFSTEGTVLVYENQFIEFVTALPEEYNLYGLGEHITQFRLQRNA





Aspergillus niger

XLTIYPSDDGTPIDQNLYGQHPFYLDTRYYKGDRQ




AAB2358.1





GI:257186





(transglucosidase)





Celluclast
MADIDVEAILKKLTLAEKVDLLAGIDFWHTKALPKHGVPSLRFTDGPNGVRGTKFFNGVPAACF
724



hypothetical
PCGTSLGSTFNQTLLEEAGKMMGKEAIAKSAHVILGPTINMQRSPLGGRGFESIGEDPFLAGLG




protein
AAALIRGIQSTGVQATIKHFLCNDQEDRRMMVQSIVTERALREIYALPFQIAVRDSQPGAFMTA




M419DRAFT_125268
YNGINGVSCSENPKYLDGMLRKEWGWDGLIMSDWYGTYSTTEAVVAGLDLEMPGPPRFRGETLK




[T. reesei RUT C-3]
FNVSNGKPFIHVIDQRAREVLQFVKKCAASGVTENGPETTVNNTPETAALLRKVGNEGIVLLKN




ETR97394.1
ENNVLPLSKKKKTLIVGPNAKQATYHGGGSAALRAYYAVTPFDGLSKQLETPPSYTVGAYTHRF




GI:57227381
LPILGEQCLTPDGAPGMRWRVFNEPPGTPNRQHIDELFFTKTDMHLVDYYHPKAADTWYADMEG





TYTADEDCTYELGLVVCGTAKAYVDDQLVVDNATKQVPGDAFFGSATREETGRINLVKGNTYKF





KIEFGSAPTYTLKGDTIVPGHGSLRVGGCKVIDDQAEIEKSVALAKEHDQVIICAGLNADWETE





GADRASMKLPGVLDQLIADVAAANPNTVVVMQTGTPEEMPWLDATPAVIQAWYGGNETGNSIAD





VVFGDYNPSGKLSLSFPKRLQDNPAFLNFRTEAGRTLYGEDVYVGYRYYEFADKDVNFPFGHGL





SYTTFAFSNLSVSHKDGKLSVSLSVKNTGSVPGAQVAQLYVKPLQAAKINRPVKELKGFAKVEL





QPGETKAVTIEEQEKYVAAYFDEERDQWCVEKGDYEVIVSDSSAAKDGVALRGKFTVGETYWWS





GV




Velvet complex
MPSLIPPIVSASSASNSAALDHLYHHQPPPRLPLGAVPQSPIQSQAPPPPHLHPPSHHFQLHPG
725



subunit 2
HGHHQQPHHERDHRLPPPVASYSAHSHHLQHDPLPQRLESSQPGHPGAAEHRDHPQHALDEPSR




GRS98.2
SHDPYPSMATGALVHSESQQPASASLLLPISNVEEATGRRYHLDVVQQPRRARMCGFGDKDRRP




GI:1881915
ITPPPCVRLIIIDVATGKEIDCNDIDHSMFVLNVDLWNEDGTREVNLVRSSTSSSPSVSSTVTY





PYGSISVGESSHTYGQSAHPPSREAPYSVSQTASYAPEYQTQPTYSQGSSAYPSNGTYGPPQQY





FPQHQAYRTETGPPGAMQTTVGGFRGYAQDQNALTKMAVVGGQPQGMFTRNLIGSLAASAFRLA





DTSEHLGIWFVLQDLSVRTEGPFRLRFSFVNVGPLAGQNGAKVNTGRAPILASCFSEVFNVYSA





KKFPGVCESTPLSKTFAAQGIKIPIRKDANLKGGDGEDDYGD




alpha-L-
MLSNARIIAAGCIAAGSLVAAGPCDIYSSGGTPCVAAHSTTRALFSAYTGPLYQVKRGSDGATT
726



arabinofuranosidase
AISPLSSGVANAAAQDAFCAGTTCLITIIYDQSGRGNHLTQAPPGGFSGPESNGYDNLASAIGA




[T. reesei]
PVTLNGQKAYGVFVSPGTGYRNNAASGTAKGDAAEGMYAVLDGTHYNGACCFDYGNAETNSRDT




CAA93243.1
GNGHMEAIYFGDSTVWGTGSGKGPWIMADLENGLFSGSSPGNNAGDPSISYRFVTAAIKGQPNQ




GI:158814
WAIRGGNAASGSLSTFYSGARPQVSGYNPMSKEGAIILGIGGDNSNGAQGTFYEGVMTSGYPSD





ATENSVQANIVAARYAVAPLTSGPALTVGSSISLRATTACCTTRYIAHSGSTVNTQVVSSSSAT





ALKQQASWTVRAGLANNACFSFESRDTSGSYIRHSNFGLVLNANDGSKLFAEDATFCTQAGING





QGSSIRSWSYPTRYFRHYNNTLYIASNGGVHVFDATAAFNDDVSFVVSGGFA




Beta-xylosidase
MVNNAALLAALSALLPTALAQNNQTYANYSAQGQPDLYPETLATLTLSFPDCEHGPLKNNLVCD
727



[Trichoderma
SSAGYVERAQALISLFTLEELILNTQNSGPGVPRLGLPNYQVWNEALHGLDRANFATKGGQFEW





reesei]

ATSFPMPILTTAALNRTLIHQIADIISTQARAFSNSGRYGLDVYAPNVNGFRSPLWGRGQETPG




CAA93248.1
EDAFFLSSAYTYEYITGIQGGVDPEHLKVAATVKHFAGYDLENWNNQSRLGFDAIITQQDLSEY




GI:2791278
YTPQFLAAARYAKSRSLMCAYNSVNGVPSCANSFFLQTLLRESWGFPEWGYVSSDCDAVYNVFN





PHDYASNQSSAAASSLRAGTDIDCGQTYPWHLNESFVAGEVSRGEIERSVTRLYANLVRLGYFD





KKNQYRSLGWKDVVKTDAWNISYEAAVEGIVLLKNDGTLPLSKKVRSIALIGPWANATTQMQGN





YYGPAPYLISPLEAAKKAGYHVNFELGTEIAGNSTTGFAKAIAAAKKSDAIIYLGGIDNTIEQE





GADRTDIAWPGNQLDLIKQLSEVGKPLVVLQMGGGQVDSSSLKSNKKVNSLVWGGYPGQSGGVA





LFDILSGKRAPAGRLVTTQYPAEYVHQFPQNDMNLRPDGKSNPGQTYIWYTGKPVYEFGSGLFY





TTFKETLASHPKSLKFNTSSILSAPHPGYTYSEQIPVFTFEANIKNSGKTESPYTAMLFVRTSN





AGPAPYPNKWLVGFDRLADIKPGHSSKLSIPIPVSALARVDSHGNRIVYPGKYELALNTDESVK





LEFELVGEEVTIENWPLEEQQIKDATPDA




Unnamed protein
MVNNAALLAALSALLPTALAQNNQTYANYSAQGQPDLYPETLATLTLSFPDCEHGPLKNNLVCD
728



product [T.
SSAGYVERAQALISLFTLEELILNTQNSGPGVPRLGLPNYQVWNEALHGLDRANFATKGGQFEW





reesei]

ATSFPMPILTTAALNRTLIHQIADIISTQARAFSNSGRYGLDVYAPNVNGFRSPLWGRGQETPG




CAW52645.1
EDAFFLSSAYTYEYITGIQGGVDPEHLKVAATVKHFAGYDLENWNNQSRLGFDAIITQQDLSEY




GI:219752323
YTPQFLAAARYAKSRSLMCAYNSVNGVPSCANSFFLQTLLRESWGFPEWGYVSSDCDAVYNVFN





PHDYASNQSSAAASSLRAGTDIDCGQTYPWHLNESFVAGEVSRGEIERSVTRLYANLVRLGYFD





KKNQYRSLGWKDVVKTDAWNISYEAAVEGIVLLKNDGTLPLSKKVRSIALIGPWANATTQMQGN





YYGPAPYLISPLEAAKKAGYHVNFELGTEIAGNSTTGFAKAIAAAKKSDAIIYLGGIDNTIEQE





GADRTDIAWPGNQLDLIKQLSEVGKPLVVLQMGGGQVDSSSLKSNKKVNSLVWGGYPGQSGGVA





LFDILSGKRAPAGRLVTTQYPAEYVHQFPQNDMNLRPDGKSNPGQTYIWYTGKPVYEFGSGLFY





TTFKETLASHPKSLKFNTSSILSAPHPGYTYSEQIPVFTFEANIKNSGKTESPYTAMLFVRTSN





AGPAPYPNKWLVGFDRLADIKPGHSSKLSIPIPVSALARVDSHGNRIVYPGKYELALNTDESVK





LEFELVGEEVTIENWPLEEQQIKDATPDA




Unnamed protein
MVNNAALLAALSALLPTALAQNNQTYANYSAQGQPDLYPETLATLTLSFPDCEHGPLKNNLVCD
729



product [T.
SSAGYVERAQALISLFTLEELILNTQNSGPGVPRLGLPNYQVWNEALHGLDRANFATKGGQFEW





reesei]

ATSFPMPILTTAALNRTLIHQIADIISTQARAFSNSGRYGLDVYAPNVNGFRSPLWGRGQETPG




CBC2392.1
EDAFFLSSAYTYEYITGIQGGVDPEHLKVAATVKHFAGYDLENWNNQSRLGFDAIITQQDLSEY




GI:257341433
YTPQFLAAARYAKSRSLMCAYNSVNGVPSCANSFFLQTLLRESWGFPEWGYVSSDCDAVYNVFN





PHDYASNQSSAAASSLRAGTDIDCGQTYPWHLNESFVAGEVSRGEIERSVTRLYANLVRLGYFD





KKNQYRSLGWKDVVKTDAWNISYEAAVEGIVLLKNDGTLPLSKKVRSIALIGPWANATTQMQGN





YYGPAPYLISPLEAAKKAGYHVNFELGTEIAGNSTTGFAKAIAAAKKSDAIIYLGGIDNTIEQE





GADRTDIAWPGNQLDLIKQLSEVGKPLVVLQMGGGQVDSSSLKSNKKVNSLVWGGYPGQSGGVA





LFDILSGKRAPAGRLVTTQYPAEYVHQFPQNDMNLRPDGKSNPGQTYIWYTGKPVYEFGSGLFY





TTFKETLASHPKSLKFNTSSILSAPHPGYTYSEQIPVFTFEANIKNSGKTESPYTAMLFVRTSN





AGPAPYPNKWLVGFDRLADIKPGHSSKLSIPIPVSALARVDSHGNRIVYPGKYELALNTDESVK





LEFELVGEEVTIENWPLEEQQIKDATPDA




Chain A, The
XNNQTYANYSAQGQPDLYPETLATLTLSFPDCEHGPLKNNLVCDSSAGYVERAQALISLFTLEE
730



Structure Of
LILNTQNSGPGVPRLGLPNYQVWNEALHGLDRANFATKGGQFEWATSFPMPILTTAALNRTLIH




Hypocrea Jecorina
QIADIISTQARAFSNSGRYGLDVYAPNVNGFRSPLWGRGQETPGEDAFFLSSAYTYEYITGIQG




Beta-xylosidase
GVDPEHLKVAATVKHFAGYDLENWNNQSRLGFDAIITQQDLSEYYTPQFLAAARYAKSRSLMCA




Xyl3a (bxl1)
YNSVNGVPSCANSFFLQTLLRESWGFPEWGYVSSDCDAVYNVFNPHDYASNQSSAAASSLRAGT




5A7M_A
DIDCGQTYPWHLNESFVAGEVSRGEIERSVTRLYANLVRLGYFDKKNQYRSLGWKDVVKTDAWN




GI:152244671
ISYEAAVEGIVLLKNDGTLPLSKKVRSIALIGPWANATTQMQGNYYGPAPYLISPLEAAKKAGY





HVNFELGTEIAGNSTTGFAKAIAAAKKSDAIIYLGGIDNTIEQEGADRTDIAWPGNQLDLIKQL





SEVGKPLVVLQMGGGQVDSSSLKSNKKVNSLVWGGYPGQSGGVALFDILSGKRAPAGRLVTTQY





PAEYVHQFPQNDMNLRPDGKSNPGQTYIWYTGKPVYEFGSGLFYTTFKETLASHPKSLKFNTSS





ILSAPHPGYTYSEQIPVFTFEANIKNSGKTESPYTAMLFVRTSNAGPAPYPNKWLVGFDRLADI





KPGHSSKLSIPIPVSALARVDSHGNRIVYPGKYELALNTDESVKLEFELVGEEVTIENWPLE




Chain B, The
XNNQTYANYSAQGQPDLYPETLATLTLSFPDCEHGPLKNNLVCDSSAGYVERAQALISLFTLEE
731



Structure Of
LILNTQNSGPGVPRLGLPNYQVWNEALHGLDRANFATKGGQFEWATSFPMPILTTAALNRTLIH




Hypocrea Jecorina
QIADIISTQARAFSNSGRYGLDVYAPNVNGFRSPLWGRGQETPGEDAFFLSSAYTYEYITGIQG




Beta-xylosidase
GVDPEHLKVAATVKHFAGYDLENWNNQSRLGFDAIITQQDLSEYYTPQFLAAARYAKSRSLMCA




Xyl3a (bxl1)
YNSVNGVPSCANSFFLQTLLRESWGFPEWGYVSSDCDAVYNVFNPHDYASNQSSAAASSLRAGT




5A7M_B
DIDCGQTYPWHLNESFVAGEVSRGEIERSVTRLYANLVRLGYFDKKNQYRSLGWKDVVKTDAWN




GI:152244672
ISYEAAVEGIVLLKNDGTLPLSKKVRSIALIGPWANATTQMQGNYYGPAPYLISPLEAAKKAGY





HVNFELGTEIAGNSTTGFAKAIAAAKKSDAIIYLGGIDNTIEQEGADRTDIAWPGNQLDLIKQL





SEVGKPLVVLQMGGGQVDSSSLKSNKKVNSLVWGGYPGQSGGVALFDILSGKRAPAGRLVTTQY





PAEYVHQFPQNDMNLRPDGKSNPGQTYIWYTGKPVYEFGSGLFYTTFKETLASHPKSLKFNTSS





ILSAPHPGYTYSEQIPVFTFEANIKNSGKTESPYTAMLFVRTSNAGPAPYPNKWLVGFDRLADI





KPGHSSKLSIPIPVSALARVDSHGNRIVYPGKYELALNTDESVKLEFELVGEEVTIENWPLE




Chain A, The
XNNQTYANYSAQGQPDLYPETLATLTLSFPDCEHGPLKNNLVCDSSAGYVERAQALISLFTLEE
732



Structure Of
LILNTQNSGPGVPRLGLPNYQVWNEALHGLDRANFATKGGQFEWATSFPMPILTTAALNRTLIH




Hypocrea Jecorina
QIADIISTQARAFSNSGRYGLDVYAPNVNGFRSPLWGRGQETPGEDAFFLSSAYTYEYITGIQG




Beta-xylosidase
GVDPEHLKVAATVKHFAGYDLENWNNQSRLGFDAIITQQDLSEYYTPQFLAAARYAKSRSLMCA




Xyl3a (Bxl1) In
YNSVNGVPSCANSFFLQTLLRESWGFPEWGYVSSDCDAVYNVFNPHDYASNQSSAAASSLRAGT




Complex With 4-
DIDCGQTYPWHLNESFVAGEVSRGEIERSVTRLYANLVRLGYFDKKNQYRSLGWKDVVKTDAWN




thioxylobiose
ISYEAAVEGIVLLKNDGTLPLSKKVRSIALIGPWANATTQMQGNYYGPAPYLISPLEAAKKAGY




5AE6_A
HVNFELGTEIAGNSTTGFAKAIAAAKKSDAIIYLGGIDNTIEQEGADRTDIAWPGNQLDLIKQL




GI:169428461
SEVGKPLVVLQMGGGQVDSSSLKSNKKVNSLVWGGYPGQSGGVALFDILSGKRAPAGRLVTTQY





PAEYVHQFPQNDMNLRPDGKSNPGQTYIWYTGKPVYEFGSGLFYTTFKETLASHPKSLKFNTSS





ILSAPHPGYTYSEQIPVFTFEANIKNSGKTESPYTAMLFVRTSNAGPAPYPNKWLVGFDRLADI





KPGHSSKLSIPIPVSALARVDSHGNRIVYPGKYELALNTDESVKLEFELVGEEVTIENWPLEE




Chain B, The
XNNQTYANYSAQGQPDLYPETLATLTLSFPDCEHGPLKNNLVCDSSAGYVERAQALISLFTLEE
733



Structure Of
LILNTQNSGPGVPRLGLPNYQVWNEAALNRTLIHQIADIISTQARAFSNSGRYGLDVYAPNVNG




Hypocrea Jecorina
FRSPLWGRGQETPGEDAFFLSSAYTYEYITGIQGGVDPEHLKVAATVKHFAGYDLENWNNQSRL




Beta-xylosidase
GFDAIITQQDLSEYYTPQFLAAARYAKSRSLMCAYNSVNGVPSCANSFFLQTLLRESWGFPEWG




Xyl3a (Bxl1) In
YVSSDCDAVYNVFNPHDYASNQSSAAASSLRAGTDIDCGQTYPWHLNESFVAGEVSRGEIERSV




Complex With 4-
TRLYANLVRLGYFDKKNQYRSLGWKDVVKTDAWNISYEAAVEGIVLLKNDGTLPLSKKVRSIAL




thioxylobiose
IGPWANATTQMQGNYYGPAPYLISPLEAAKKAGYHVNFELGTEIAGNSTTGFAKAIAAAKKSDA




5AE6_B
IIYLGGIDNTIEQEGADRTDIAWPGNQLDLIKQLSEVGKPLVVLQMGGGQVDSSSLKSNKKVNS




GI:169428462
LVWGGYPGQSGGVALFDILSGKRAPAGRLVTTQYPAEYVHQFPQNDMNLRPDGKSNPGQTYIWY





TGKPVYEFGSGLFYTTFKETLASHPKSLKFNTSSILSAPHPGYTYSEQIPVFTFEANIKNSGKT





ESPYTAMLFVRTSNAGPAPYPNKWLVGFDRLADIKPGHSSKLSIPIPVSALARVDSHGNRIVYP





GKYELALNTDESVKLEFELVGEEVTIENWPLEE




Glycoside
MPLIRNPILPGFNADPSIVRVGSDYYIATSTFEWYPGVQIHHSTDLANWELAVRPLSRRSQLDL
734



hydrolase family
RGEPDSCGVWAPCLTHDGDKFWLVYTDVKRKDGSFKDTHNYIVTAPRIEGPWSDPVYANSSGFD




43 [Trichoderma
PSLFHDDDGRKWLVNMVQDHRARPRTFAGIALQEFDPAQGKLVGTRKVVFHGSELGLVEGPHLY





reesei QM6a]

KRNGWYYLLTAEGGTGYTHAATLARSRSIWGPYELHPQQHILTSKDHPFAALQRAGHADIVETA




EGR49145.1
DGKTYLVHLAGRPIGQKRRCVLGRETALQEAYWGEDDWLYVKNGPVPSLDVEVPGVRDEEAYWK




GI:3451895
EKRYEFHDGLHKDFQWLRTPEPERLFAIEDGKLVLTGRESIGSWFEQSLVARRQTHFSFDAETV





IDFSPEDERQFAGLTLYYSRYNFFYLAVSAHSDGRREVQILRSEASWPNGKLEDVGANACYVRI





PQQGRVKLAATIRGERLQFYYALVAEGGEEQEELQRIGPVLDASIVSDECGGHQAHGSFTGSFV





GVACSDVNGTEKRAVFDYFVYRPAHDSTDRYSVSMEGIQRV




Glycoside
MVNNAALLAALSALLPTALAQNNQTYANYSAQGQPDLYPETLATLTLSFPDCEHGPLKNNLVCD
735



hydrolase family 3
SSAGYVERAQALISLFTLEELILNTQNSGPGVPRLGLPNYQVWNEALHGLDRANFATKGGQFEW




[T. reesei QM6a]
ATSFPMPILTTAALNRTLIHQIADIISTQARAFSNSGRYGLDVYAPNVNGFRSPLWGRGQETPG




EGR4972.1
EDAFFLSSAYTYEYITGIQGGVDPEHLKVAATVKHFAGYDLENWNNQSRLGFDAIITQQDLSEY




GI:34519464
YTPQFLAAARYAKSRSLMCAYNSVNGVPSCANSFFLQTLLRESWGFPEWGYVSSDCDAVYNVFN





PHDYASNQSSAAASSLRAGTDIDCGQTYPWHLNESFVAGEVSRGEIERSVTRLYANLVRLGYFD





KKNQYRSLGWKDVVKTDAWNISYEAAVEGIVLLKNDGTLPLSKKVRSIALIGPWANATTQMQGN





YYGPAPYLISPLEAAKKAGYHVNFELGTEIAGNSTTGFAKAIAAAKKSDAIIYLGGIDNTIEQE





GADRTDIAWPGNQLDLIKQLSEVGKPLVVLQMGGGQVDSSSLKSNKKVNSLVWGGYPGQSGGVA





LFDILSGKRAPAGRLVTTQYPAEYVHQFPQNDMNLRPDGKSNPGQTYIWYTGKPVYEFGSGLFY





TTFKETLASHPKSLKFNTSSILSAPHPGYTYSEQIPVFTFEANIKNSGKTESPYTAMLFVRTSN





AGPAPYPNKWLVGFDRLADIKPGHSSKLSIPIPVSALARVDSHGNRIVYPGKYELALNTDESVK





LEFELVGEEVTIENWPLEEQQIKDATPDA




Glycoside
MRVNVPLHALQIAARSVAAAICKSSSASGRSLRGGKIDQASRINIYSISNPSPNPPLTPSFPDC
736



hydrolase family 3
TRDPLCSNDVCDTTKSIAERAAAIVKPMTLNEKVANVGSSASGSARLGLPAYQWQNEALHGVAG




[T. reesei QM6a]
STGVQFQSPLGANFSAATSFPMPILLSAAFDDALVKSVATAISTEARAFANYGFAGLDFWTPNI




EGR586.1
NPFRDPRWGRGMETPGEDAFRIQGYVLALVDGLQGGIDPDFYRTLSTCKHFAAYDIENGRTANN




GI:34519849
LSPTQQDMADYYLPMFETCVRDAKVASIMCAYNAVDGVPACADSYLLQDVLRDTYGFTEDFNYV





VSDCDAVENVFDPHHYAANLTQAAAMSINAGTDLDCGSSYNVLNASVQAGLTTEATLDKSLIRL





YSALVKVGYFDQPAEYNSLGWGNVNTTQSQALAHDAATEGMTLLKNDGTLPLSRTLSNVAVIGP





WANVTTQMQGNYAGTAPLLVNPLSVFQQKWRNVKYAQGTAINSQDTSGFNAALSAASSSDVIVY





LGGIDISVENEGFDRSSITWPGNQLNLISQLANLGKPLVIVQFGGGQIDDSALLSNSKVNSILW





AGYPGQDGGNAIFDVLTGANPPAGRLPVTQYPANYVNNNNIQDMNLRPSNGIPGRTYAWYTGTP





VLPFGYGLHYTNFSLSFQSTKTAGSDIATLVNNAGSNKDLATFATIVVNVKNTGGKANLASDYV





GLLFLKSTNAGPAPHPNKQLAAYGRVRNVGVGATQQLTLTVNLGSLARADTNGDRWIYPGAYTL





ILDVNGPLTFNFTLTGTATKISTLPSRS




Glycoside
MRVNVPLHALQIAARSVAAAICKSSSASGRSLRGGKIDQASRINIYSISNPSPNPPLTPSFPDC
737



hydrolase family 3
TRDPLCSNDVCDTTKSIAERAAAIVKPMTLNEKVANVGSSASGSARLGLPAYQWQNEALHGVAG




[T. reesei QM6a]
STGVQFQSPLGANFSAATSFPMPILLSAAFDDALVKSVATAISTEARAFANYGFAGLDFWTPNI




XP_6963621.1
NPFRDPRWGRGMETPGEDAFRIQGYVLALVDGLQGGIDPDFYRTLSTCKHFAAYDIENGRTANN




GI:58913197
LSPTQQDMADYYLPMFETCVRDAKVASIMCAYNAVDGVPACADSYLLQDVLRDTYGFTEDFNYV





VSDCDAVENVFDPHHYAANLTQAAAMSINAGTDLDCGSSYNVLNASVQAGLTTEATLDKSLIRL





YSALVKVGYFDQPAEYNSLGWGNVNTTQSQALAHDAATEGMTLLKNDGTLPLSRTLSNVAVIGP





WANVTTQMQGNYAGTAPLLVNPLSVFQQKWRNVKYAQGTAINSQDTSGFNAALSAASSSDVIVY





LGGIDISVENEGFDRSSITWPGNQLNLISQLANLGKPLVIVQFGGGQIDDSALLSNSKVNSILW





AGYPGQDGGNAIFDVLTGANPPAGRLPVTQYPANYVNNNNIQDMNLRPSNGIPGRTYAWYTGTP





VLPFGYGLHYTNFSLSFQSTKTAGSDIATLVNNAGSNKDLATFATIVVNVKNTGGKANLASDYV





GLLFLKSTNAGPAPHPNKQLAAYGRVRNVGVGATQQLTLTVNLGSLARADTNGDRWIYPGAYTL





ILDVNGPLTFNFTLTGTATKISTLPSRS




glycoside
MVNNAALLAALSALLPTALAQNNQTYANYSAQGQPDLYPETLATLTLSFPDCEHGPLKNNLVCD
738



hydrolase family 3
SSAGYVERAQALISLFTLEELILNTQNSGPGVPRLGLPNYQVWNEALHGLDRANFATKGGQFEW




[T. reesei QM6a]
ATSFPMPILTTAALNRTLIHQIADIISTQARAFSNSGRYGLDVYAPNVNGFRSPLWGRGQETPG




XP_696475.1
EDAFFLSSAYTYEYITGIQGGVDPEHLKVAATVKHFAGYDLENWNNQSRLGFDAIITQQDLSEY




GI:5891415
YTPQFLAAARYAKSRSLMCAYNSVNGVPSCANSFFLQTLLRESWGFPEWGYVSSDCDAVYNVFN





PHDYASNQSSAAASSLRAGTDIDCGQTYPWHLNESFVAGEVSRGEIERSVTRLYANLVRLGYFD





KKNQYRSLGWKDVVKTDAWNISYEAAVEGIVLLKNDGTLPLSKKVRSIALIGPWANATTQMQGN





YYGPAPYLISPLEAAKKAGYHVNFELGTEIAGNSTTGFAKAIAAAKKSDAIIYLGGIDNTIEQE





GADRTDIAWPGNQLDLIKQLSEVGKPLVVLQMGGGQVDSSSLKSNKKVNSLVWGGYPGQSGGVA





LFDILSGKRAPAGRLVTTQYPAEYVHQFPQNDMNLRPDGKSNPGQTYIWYTGKPVYEFGSGLFY





TTFKETLASHPKSLKFNTSSILSAPHPGYTYSEQIPVFTFEANIKNSGKTESPYTAMLFVRTSN





AGPAPYPNKWLVGFDRLADIKPGHSSKLSIPIPVSALARVDSHGNRIVYPGKYELALNTDESVK





LEFELVGEEVTIENWPLEEQQIKDATPDA




Glycoside
MPLIRNPILPGFNADPSIVRVGSDYYIATSTFEWYPGVQIHHSTDLANWELAVRPLSRRSQLDL
739



hydrolase family
RGEPDSCGVWAPCLTHDGDKFWLVYTDVKRKDGSFKDTHNYIVTAPRIEGPWSDPVYANSSGFD




43 [T. reesei QM6a]
PSLFHDDDGRKWLVNMVQDHRARPRTFAGIALQEFDPAQGKLVGTRKVVFHGSELGLVEGPHLY




XP_6964816.1
KRNGWYYLLTAEGGTGYTHAATLARSRSIWGPYELHPQQHILTSKDHPFAALQRAGHADIVETA




GI:58915587
DGKTYLVHLAGRPIGQKRRCVLGRETALQEAYWGEDDWLYVKNGPVPSLDVEVPGVRDEEAYWK





EKRYEFHDGLHKDFQWLRTPEPERLFAIEDGKLVLTGRESIGSWFEQSLVARRQTHFSFDAETV





IDFSPEDERQFAGLTLYYSRYNFFYLAVSAHSDGRREVQILRSEASWPNGKLEDVGANACYVRI





PQQGRVKLAATIRGERLQFYYALVAEGGEEQEELQRIGPVLDASIVSDECGGHQAHGSFTGSFV





GVACSDVNGTEKRAVFDYFVYRPAHDSTDRYSVSMEGIQRV




Family 43
MPLIRNPILPGFNADPSIVRVGSDYYIATSTFEWYPGVQIHHSTDLANWELAVRPLSRRSQLDL
740



glycoside
RGEPDSCGVWAPCLTHDGDKFWLVYTDVKRKDGSFKDTHNYIVTAPRIEGPWSDPVYANSSGFD




hydrolase [T.
PSLFHDDDGRKWLVNMVQDHRARPRTFAGIALQEFDPAQGKLVGTRKVVFHGSELGLVEGPHLY





reesei RUT C-3]

KRNGWYYLLTAEGGTGYTHAATLARSRSIWGPYELHPQQHILTSKDHPFAALQRAGHADIVETA




ETS2497.1
DGKTYLVHLAGRPIGQKRRCVLGRETALQEAYWGEDDWLYVKNGPVPSLDVEVPGVRDEEAYWK




GI:572279375
EKRYEFHDGLHKDFQWLRTPEPERLFAIEDGKLVLTGRESIGSWFEQSLVARRQTHFSFDAETV





IDFSPEDERQFAGLTLYYSRYNFFYLAVSAHSDGRREVQILRSEASWPNGKLEDVGANACYVRI





PQQGRVKLAATIRGERLQFYYALVAEGGEEQEELQRIGPVLDASIVSDECGGHQAHGSFTGSFV





GVACSDVNGTEKRAVFDYFVYRPAHDSTDRYSVSMEGIQRV




Beta-xylosidase
MVNNAALLAALSALLPTALAQNNQTYANYSAQGQPDLYPETLATLTLSFPDCEHGPLKNNLVCD
741



[T. reesei RUT C-3]
SSAGYVERAQALISLFTLEELILNTQNSGPGVPRLGLPNYQVWNEALHGLDRANFATKGGQFEW




ET3193.1
ATSFPMPILTTAALNRTLIHQIADIISTQARAFSNSGRYGLDVYAPNVNGFRSPLWGRGQETPG




GI:5722896
EDAFFLSSAYTYEYITGIQGGVDPEHLKVAATVKHFAGYDLENWNNQSRLGFDAIITQQDLSEY





YTPQFLAAARYAKSRSLMCAYNSVNGVPSCANSFFLQTLLRESWGFPEWGYVSSDCDAVYNVFN





PHDYASNQSSAAASSLRAGTDIDCGQTYPWHLNESFVAGEVSRGEIERSVTRLYANLVRLGYFD





KKNQYRSLGWKDVVKTDAWNISYEAAVEGIVLLKNDGTLPLSKKVRSIALIGPWANATTQMQGN





YYGPAPYLISPLEAAKKAGYHVNFELGTEIAGNSTTGFAKAIAAAKKSDAIIYLGGIDNTIEQE





GADRTDIAWPGNQLDLIKQLSEVGKPLVVLQMGGGQVDSSSLKSNKKVNSLVWGGYPGQSGGVA





LFDILSGKRAPAGRLVTTQYPAEYVHQFPQNDMNLRPDGKSNPGQTYIWYTGKPVYEFGSGLFY





TTFKETLASHPKSLKFNTSSILSAPHPGYTYSEQIPVFTFEANIKNSGKTESPYTAMLFVRTSN





AGPAPYPNKWLVGFDRLADIKPGHSSKLSIPIPVSALARVDSHGNRIVYPGKYELALNTDESVK





LEFELVGEEVTIENWPLEEQQIKDATPDA




Glycoside
MALFHLAQARTCLPPYQAQTTYQGCYHDPNSPRDLAGPMLTVGNLNSPQYCANICGAAGYQYSG
742



hydrolase [T.
VEFTIQCFCGHRIESTSVKADESQCSSPCPADSSKVCGGGNMINIYSISNPSPNPPLTPSFPDC





reesei RUT C-3]

TRDPLCSNDVCDTTKSIAERAAAIVKPMTLNEKVANVGSSASGSARLGLPAYQWQNEALHGVAG




ETS3636.1
STGVQFQSPLGANFSAATSFPMPILLSAAFDDALVKSVATAISTEARAFANYGFAGLDFWTPNI




GI:57228576
NPFRDPRWGRGMETPGEDAFRIQGYVLALVDGLQGGIDPDFYRTLSTCKHFAAYDIENGRTANN





LSPTQQDMADYYLPMFETCVRDAKVASIMCAYNAVDGVPACADSYLLQDVLRDTYGFTEDFNYV





VSDCDAVENVFDPHHYAANLTQAAAMSINAGTDLDCGSSYNVLNASVQAGLTTEATLDKSLIRL





YSALVKVGYFDQPAEYNSLGWGNVNTTQSQALAHDAATEGMTLLKNDGTLPLSRTLSNVAVIGP





WANVTTQMQGNYAGTAPLLVNPLSVFQQKWRNVKYAQGTAINSQDTSGFNAALSAASSSDVIVY





LGGIDISVENEGFDRSSITWPGNQLNLISQLANLGKPLVIVQFGGGQIDDSALLSNSKVNSILW





AGYPGQDGGNAIFDVLTGANPPAGRLPVTQYPANYVNNNNIQDMNLRPSNGIPGRTYAWYTGTP





VLPFGYGLHYTNFSLSFQSTKTAGSDIATLVNNAGSNKDLATFATIVVNVKNTGGKANLASDYV





GLLFLKSTNAGPAPHPNKQLAAYGRVRNVGVGATQQLTLTVNLGSLARADTNGDRWIYPGAYTL





ILDVNGPLTFNFTLTGTATKISTLPSRS




Chain B, The
XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN
743



Three-Dimensional
ETCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS




Crystal Structure
FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW




Of The Catalytic
EPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC




Core Of
DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS




Cellobiohydrolase
YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN




I From T. Reesei
ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG




1CEL_B GI:89287





Chain A, The
XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN
744



Three-Dimensional
ETCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS




Crystal Structure
FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW




Of The Catalytic
EPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC




Core Of
DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS




Cellobiohydrolase
YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN




I From T. Reesei
ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG




1CEL_A GI:89286





Chain A, Three-
SGTATYSGNPFVGVTPWANAYYASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLME
745



Dimensional
QTLADIRTANKNGGNYAGQFVVYDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYS




Structure Of
DIRTLLVIEPDSLANLVTNLGTPKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPA




Cellobiohydrolase
NQDPAAQLFANVYKNASSPRALRGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLA




From T. Reesei
NHGWSNAFFITDQGRSGKQPTGQQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDG




3CBH_A
TSDSSAPRFDSHCALPDALQPAPQAGAWFQAYFVQLLTNANPSFL




GI:157836775





Chain A,
TQSHYGQCGGIGYSGPTVCASGTTCQVLNPYYSQCL
746



Determination Of





The Three-





Dimensional





Structure Of The





C-Terminal Domain





Of





Cellobiohydrolase





I From T. Reesei.





2CBH_A





GI:157834734





Chain A, Three-
TQSHYGQCGGIGYSGPTVCASGTTCQVLNPYASQCL
747



Dimensional





Structures Of





Three Engineered





Cellulose-Binding





Domains Of





Cellobiohydrolase





I From T. Reesei,





Nmr, 19 Structures





1AZK_A GI:15783159





Chain A, Three-
TQSHYGQCGGIGYSGPTVCASGTTCQVLNPAYSQCL
748



Dimensional





Structures Of





Three Engineered





Cellulose-Binding





Domains Of





Cellobiohydrolase





I From T. Reesei,





Nmr, 18 Structures





1AZJ_A GI:15783158





Chain A, Three-
TQSHAGQCGGIGYSGPTVCASGTTCQVLNPYYSQCL
749



Dimensional





Structures Of





Three Engineered





Cellulose-Binding





Domains Of





Cellobiohydrolase





I From T. Reesei,





Nmr, 14 Structures





1AZH_A GI:15783156





Chain A, Three-
TQSHAGQCGGIGYSGPTVCASGTTCQVLNPYYSQCL
750



Dimensional





Structures Of





Three Engineered





Cellulose-Binding





Domains Of





Cellobiohydrolase





I From T. Reesei,





Nmr, 23 Structures





1AZ6_A GI:15783153





Cellobiohydrolase
MIVGILTTLATLATLAASVPLEERQACSSVWGQCGGQNWSGPTCCASGSTCVYSNDYYSQCLPG
751



II [T. Reesei]
AASSSSSTRAASTTSRVSPTTSRSSSATPPPGSTTTRVPPVGSGTATYSGNPFVGVTPWANAYY




AAA3421.1
ASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLMEQTLADIRTANKNGGNYAGQFVV




GI:17541
YDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYSDIRTLLVIEPDSLANLVTNLGT





PKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPANQDPAAQLFANVYKNASSPRAL





RGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLANHGWSNAFFITDQGRSGKQPTG





QQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDGTSDSSAPRFDSHCALPDALQPA





PQAGAWFQAYFVQLLTNANPSFL




Cellobiohydrolase
GSASYXGNPFVGVSPWANAYYAXEVXXLAIPXLTGAMA
752



II core protein,





CBH II cp=3.2.1.91





reesei,





Peptide Partial,





38 aa





AAB3868.1 GI:5528





Chain A,
TQSHYGQCGGIGYSGPTVCASGTTCQVLNPYYSQCL
753



Determination Of





The Three-





Dimensional





Structure Of The





C-Terminal Domain





Of





Cellobiohydrolase





I From T. Reesei.





1CBH_A GI:15783535





cellobiohydrolase
MIVGILTTLATLATLAASVPLEERQACSSVWGQCGGQNWSGPTCCASGSTCVYSNDYYSQCLPG
754



II [T. Reesei]
AASSSSSTRAASTTSRVSPTTSRSSSATPPPGSTTTRVPPVGSGTATYSGNPFVGVTPWANAYY




AAG3998.1
ASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLMEQTLADIRTANKNGGNYAGQFVV




GI:11692747
YDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYSDIRTLLVIEPDSLANLVTNLGT





PKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPANQDPAAQLFANVYKNASSPRAL





RGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGRLLANHGWSNAFFITDQGRSGKQPTG





QQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDGTSDSSAPRFDSHCALPDALQPA





PQAGAWFQAYFVQLLTNANPSFL




Chain A,
XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN
755



Cellobiohydrolase
ETCAKNCCLDGAAYASTYGVTTSGNSLSIDFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS




Cel7a (E223s,
FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW




A224h, L225v,
EPSSNNANTGIGGHGSCCSEMDIWEANSISSHVAPHPCTTVGQEICEGDGCGGTYSDNRYGGTC




T226a, D262g)
DPDGCGWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS




Mutant
YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN




1EGN_A
ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG




GI:14277711





Chain B,
XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN
756



Cellobiohydrolase
ETCAKNCCLDGAAYASTYGVTTSGNSLSIDFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS




Cel7a With Loop
FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW




Deletion 245-252
EPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICEGDGCGGGTCDPDGCDWN




And Bound Non-
PYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGSYSGNELND




Hydrolysable
DYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGA




Cellotetraose
VRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG




1Q2E_B GI:39654596





Chain A,
XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN
757



Cellobiohydrolase
ETCAKNCCLDGAAYASTYGVTTSGNSLSIDFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS




Cel7a With Loop
FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW




Deletion 245-252
EPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICEGDGCGGGTCDPDGCDWN




And Bound Non-
PYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGSYSGNELND




Hydrolysable
DYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGA




Cellotetraose
VRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG




1Q2E_A GI:39654595





Chain A,
XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN
758



Cellobiohydrolase
ETCAKNCCLDGAAYASTYGVTTSGNSLSIDFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS




Cel7a With
FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW




Disulphide Bridge
EPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICEGCGCGGTYSCNRYGGTC




Added Across Exo-
DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS




Loop By Mutations
YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN




D241c And D249c
ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG




1Q2B_A GI:39654594





Glycoside
MRATSLLAAALAVAGDALAGKIKYLGVAIPGIDFGCDIDGSCPTDTSSVPLLSYKGGDGAGQMK
759



hydrolase family 5
HFAEDDGLNVFRISATWQFVLNNTVDGKLDELNWGSYNKVVNACLETGAYCMIDMHNFARYNGG




[T. reesei QM6a]
IIGQGGVSDDIFVDLWVQIAKYYEDNDKIIFGLMNEPHDLDIEIWAQTCQKVVTAIRKAGATSQ




XP_6969897.1
MILLPGTNFASVETYVSTGSAEALGKITNPDGSTDLLYFDVHKYLDINNSGSHAECTTDNVDAF




GI:589115749
NDFADWLRQNKRQAIISETGASMEPSCMTAFCAQNKAISENSDVYIGFVGWGAGSFDTSYILTL





TPLGKPGNYTDNKLMNECILDQFTLDEKYRPTPTSISTAAEETATATATSDGDAPSTTKPIFRE





ETASPTPNAVTKPSPDTSDSSDDDKDSAASMSAQGLTGTVLFTVAALGYMLVAF




Glycoside
MNKSVAPLLLAASILYGGAAAQQTVWGQCGGIGWSGPTNCAPGSACSTLNPYYAQCIPGATTIT
760



hydrolase family 5
TSTRPPSGPTTTTRATSTSSSTPPTSSGVRFAGVNIAGFDFGCTTDGTCVTSKVYPPLKNFTGS




[T. reesei QM6a]
NNYPDGIGQMQHFVNDDGMTIFRLPVGWQYLVNNNLGGNLDSTSISKYDQLVQGCLSLGAYCIV




EGR512.1
DIHNYARWNGGIIGQGGPTNAQFTSLWSQLASKYASQSRVWFGIMNEPHDVNINTWAATVQEVV




GI:3452785
TAIRNAGATSQFISLPGNDWQSAGAFISDGSAAALSQVTNPDGSTTNLIFDVHKYLDSDNSGTH





AECTTNNIDGAFSPLATWLRQNNRQAILTETGGGNVQSCIQDMCQQIQYLNQNSDVYLGYVGWG





AGSFDSTYVLTETPTGSGNSWTDTSLVSSCLARK




Glycoside
MIQKLSNLLVTALAVATGVVGHGHINDIVINGVWYQAYDPTTFPYESNPPIVVGWTAADLDNGF
761



hydrolase family
VSPDAYQNPDIICHKNATNAKGHASVKAGDTILFQWVPVPWPHPGPIVDYLANCNGDCETVDKT




61 [T. reesei
TLEFFKIDGVGLLSGGDPGTWASDVLISNNNTWVVKIPDNLAPGNYVLRHEIIALHSAGQANGA




QM6a]
QNYPQCFNIAVSGSGSLQPSGVLGTDLYHATDPGVLINIYTSPLNYIIPGPTVVSGLPTSVAQG




EGR52697.1
SSAATATASATVPGGGSGPTSRTTTTARTTQASSRPSSTPPATTSAPAGGPTQTLYGQCGGSGY




GI:34522464
SGPTRCAPPATCSTLNPYYAQCLN




Endo-1,4-beta-
MKATLVLGSLIVGAVSAYKATTTRYYDGQEGACGCGSSSGAFPWQLGIGNGVYTAAGSQALFDT
762



glucanase V [T.
AGASWCGAGCGKCYQLTSTGQAPCSSCGTGGAAGQSIIVMVTNLCPNNGNAQWCPVVGGTNQYG





reesei RUT C-3]

YSYHFDIMAQNEIFGDNVVVDFEPIACPGQAASDWGTCLCVGQQETDPTPVLGNDTGSTPPGSS




ETR998.1
PPATSSSPPSGGGQQTLYGQCGGAGWTGPTTCQAPGTCKVQNQWYSQCLP




GI:572276454





Chain A, The
MKSCAILAALGCLAGSVLGHGQVQNFTINGQYNQGFILDYYYQKQNTGHFPNVAGWYAEDLDLG
763



Structure Of A
FISPDQYTTPDIVCHKNAAPGAISATAAAGSNIVFQWGPGVWPHPYGPIVTYVVECSGSCTTVN




Glycoside
KNNLRWVKIQEAGINYNTQVWAQQDLINQGNKWTVKIPSSLRPGNYVFRHELLAAHGASSANGM




Hydrolase Family
QNYPQCVNIAVTGSGTKALPAGTPATQLYKPTDPGILFNPYTTITSYTIPGPALWQG




61 Member, Cel61b





From The Hypocrea





Jecorina.





2VTC_A GI:19844312





Chain B, The
MKSCAILAALGCLAGSVLGHGQVQNFTINGQYNQGFILDYYYQKQNTGHFPNVAGWYAEDLDLG
764



Structure Of A
FISPDQYTTPDIVCHKNAAPGAISATAAAGSNIVFQWGPGVWPHPYGPIVTYVVECSGSCTTVN




Glycoside
KNNLRWVKIQEAGINYNTQVWAQQDLINQGNKWTVKIPSSLRPGNYVFRHELLAAHGASSANGM




Hydrolase Family
QNYPQCVNIAVTGSGTKALPAGTPATQLYKPTDPGILFNPYTTITSYTIPGPALWQG




61 Member, Cel61b





From The Hypocrea





Jecorina.





2VTC_B





GI:198443121





Endoglucanase VIII
MRATSLLAAALAVAGDALAGKIKYLGVAIPGIDFGCDIDGSCPTDTSSVPLLSYKGGDGAGQMK
765



[T. reesei
HFAEDDGLNVFRISATWQFVLNNTVDGKLDELNWGSYNKVVNACLETGAYCMIDMHNFARYNGG




RUT C-3]
IIGQGGVSDDIFVDLWVQIAKYYEDNDKIIFGLMNEPHDLDIEIWAQTCQKVVTAIRKAGATSQ




ETR9685.1
MILLPGTNFASVETYVSTGSAEALGKITNPDGSTDLLYFDVHKYLDINNSGSHAECTTDNVDAF




GI:572273122
NDFADWLRQNKRQAIISETGASMEPSCMTAFCAQNKAISENSDVYIGFVGWGAGSFDTSYILTL





TPLGKPGNYTDNKLMNECILDQFTLDEKYRPTPTSISTAAEETATATATSDGDAPSTTKPIFRE





ETASPTPNAVTKPSPDTSDSSDDDKDSAASMSAQGLTGTVLFTVAALGYMLVAF




Putative
MKLWIGLLLLGLACRASAHTTFTTLFIDKKNQGDGTCVRMPYDDKTATNPVKPITSSDMACGRN
766



endoglucanase [T.
GGDPVPFICSAKKGSLLTFEFRLWPDAQQPGSIDPGHLGPCAVYLKKVDNMFSDSAAGGGWFKI





reesei RUT C-3]

WEDGYDSKTQKWCVDRLVKNNGLLSVRLPRGLPAGYYIVRPEILALHWAAHRDDPQFYLGCAQI




ETS3449.1
FVDSDVRGPLEIPRRQQATIPGYVNAKTPGLTFDIYQDKLPPYPMPGPKVYIPPAKGNKPNQDL




GI:57228352
NAGRLVQTDGLIPKDCLIKKANWCGRPVEPYSSARMCWRAVNDCYAQSKKCRESSPPIGLTNCD





RWSDHCGKMDALCEQEKYKGPPKFTEKEYVVPAPGKLPEMWNDIFERLEQNGTSTKFF




Endoglucanase VII
MKSCAILAALGCLAGSVLGHGQVQNFTINGQYNQGFILDYYYQKQNTGHFPNVAGWYAEDLDLG
767



[T. reesei
FISPDQYTTPDIVCHKNAAPGAISATAAAGSNIVFQWGPGVWPHPYGPIVTYVAECSGSCTTVN




RUT C-3]
KNNLRWVKIQEAGINYNTQVWAQQDLINQGNKWTVKIPSSLRPGNYVFRHELLAAHGASSANGM




ETS3833.1
QNYPQCVNIAVTGSGTKALPAGTPATQLYKPTDPGILFNPYTTITSYTIPGPALWQG




GI:57228773





Endoglucanase III
MNKSVAPLLLAASILYGGAAAQQTVWGQCGGIGWSGPTNCAPGSACSTLNPYYAQCIPGATTIT
768



[T. reesei RUT C-3]
TSTRPPSGPTTTTRATSTSSSTPPTSSGVRFAGVNIAGFDFGCTTDGTCVTSKVYPPLKNFTGS




ETS4885.1
NNYPDGIGQMQHFVNDDGMTIFRLPVGWQYLVNNNLGGNLDSTSISKYDQLVQGCLSLGAYCIV




GI:572281861
DIHNYARWNGGIIGQGGPTNAQFTSLWSQLASKYASQSRVWFGIMNEPHDVNINTWAATVQEVV





TAIRNAGATSQFISLPGNDWQSAGAFISDGSAAALSQVTNPDGSTTNLIFDVHKYLDSDNSGTH





AECTTNNIDGAFSPLATWLRQNNRQAILTETGGGNVQSCIQDMCQQIQYLNQNSDVYLGYVGWG





AGSFDSTYVLTETPTGSGNSWTDTSLVSSCLARK




Putative
MKLLSIASLLSLVATAQAHMEVSWPPVFRSKYNPRVPGNLINYDMTSPLNADGSNYPCKGYQVD
769



endoglucanase [T.
VGRPEGAPGVTWRAGGTYNLTVAGSATHSGGSCQASLSYDRGRTWVVVHSWIGGCPLTPTWDFT





reesei RUT C-3]

LPNDTPPGEALFAWTWFNRIGNREMYMNCGAVTIRPSGRAARSPADSIYNRPAQFVANVNNGCA




ETS538.1
TLEGADVLFPSPGPDTDFDSDRTAAPVGKCGASSRRTRPVRA




GI:572282294





Endoglucanase-5
MKATLVLGSLIVGAVSAYKATTTRYYDGQEGACGCGSSSGAFPWQLGIGNGVYTAAGSQALFDT
770



(Cellulase V;
AGASWCGAGCGKCYQLTSTGQAPCSSCGTGGAAGQSIIVMVTNLCPNNGNAQWCPVVGGTNQYG




Endo-1,4-beta-
YSYHFDIMAQNEIFGDNVVVDFEPIACPGQAASDWGTCLCVGQQETDPTPVLGNDTGSTPPGSS




glucanase V; EG V;
PPATSSSPPSGGGQQTLYGQCGGAGWTGPTTCQAPGTCKVQNQWYSQCLP




Endoglucanase V;





P43317.1 GI:117136





Chain A, Active-
XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN
771



Site Mutant E212q
ETCAKNCCLDGAAYASTYGVTTSGNSLSIDFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS




Determined At Ph
FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW




6. With No Ligand
EPSSNNANTGIGGHGSCCSQMDIWEANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC




Bound In The
DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS




Active Site
YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN




2CEL_A GI:194214
ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG




Chain B, Active-
XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN
772



Site Mutant E212q
ETCAKNCCLDGAAYASTYGVTTSGNSLSIDFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS




Determined At Ph
FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW




6. With No Ligand
EPSSNNANTGIGGHGSCCSQMDIWEANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC




Bound In The
DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS




Active Site
YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN




2CEL_B GI:194215
ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG




Chain A, Active-
XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN
773



Site Mutant E212q
ETCAKNCCLDGAAYASTYGVTTSGNSLSIDFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS




Determined At Ph
FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW




6. With Cellobiose
EPSSNNANTGIGGHGSCCSQMDIWEANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC




Bound In The
DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS




Active Site
YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN




3CEL_A
ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG




GI:157836779





Chain A, Active-
XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN
774



site Mutant D214n
ETCAKNCCLDGAAYASTYGVTTSGNSLSIDFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS




Determined At Ph
FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW




6. With No Ligand
EPSSNNANTGIGGHGSCCSEMNIWEANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC




Bound In The
DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS




Active Site
YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN




4CEL_A GI:1941941
ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG




Chain B, Active-
XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN
775



site Mutant D214n
ETCAKNCCLDGAAYASTYGVTTSGNSLSIDFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS




Determined At Ph
FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW




6. With No Ligand
EPSSNNANTGIGGHGSCCSEMNIWEANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC




Bound In The
DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS




Active Site
YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN




4CEL_B GI:1941942
ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG




Endoglucanase-4
MIQKLSNLLVTALAVATGVVGHGHINDIVINGVWYQAYDPTTFPYESNPPIVVGWTAADLDNGF
776



(Cellulase IV;
VSPDAYQNPDIICHKNATNAKGHASVKAGDTILFQWVPVPWPHPGPIVDYLANCNGDCETVDKT




Cellulase-61A;
TLEFFKIDGVGLLSGGDPGTWASDVLISNNNTWVVKIPDNLAPGNYVLRHEIIALHSAGQANGA




Cel61A; Endo-1,4-
QNYPQCFNIAVSGSGSLQPSGVLGTDLYHATDPGVLINIYTSPLNYIIPGPTVVSGLPTSVAQG




beta-glucanase IV;
SSAATATASATVPGGGSGPTSRTTTTARTTQASSRPSSTPPATTSAPAGGPTQTLYGQCGGSGY




EGIV;
SGPTRCAPPATCSTLNPYYAQCLN




Endoglucanase IV;





Endoglucanase-61A)





01445.1





GI:21263647





Endoglucanase I
MAPSVTLPLTTAILAIARLVAAQQPGTSTPEVHPKLTTYKCTKSGGCVAQDTSVVLDWNYRWMH
777



precursor [T.
DANYNSCTVNGGVNTTLCPDEATCGKNCFIEGVDYAASGVTTSGSSLTMNQYMPSSSGGYSSVS





reesei RUT C-3]

PRLYLLDSDGEYVMLKLNGQELSFDVDLSALPCGENGSLYLSQMDENGGANQYNTAGANYGSGY




ETS775.1
CDAQCPVQTWRNGTLNTSHQGFCCNEMDILEGNSRANALTPHSCTATACDSAGCGFNPYGSGYK




GI:57228411
SYYGPGDTVDTSKTFTIITQFNTDNGSPSGNLVSITRKYQQNGVDIPSAQPGGDTISSCPSASA





YGGLATMGKALSSGMVLVFSIWNDNSQYMNWLDSGNAGPCSSTEGNPSNILANNPNTHVVFSNI





RWGDIGSTTNSTAPPPPPASSTTFSTTRRSSTTSSSPSCTQTHWGQCGGIGYSGCKTCTSGTTC





QYSNDYYSQCL




Chain A, Cbh1
XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN
778



(E217p) In Complex
ETCAKNCCLDGAAYASTYGVTTSGNSLSIDFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS




With Cellohexaose
FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW




And Cellobiose
EPSSNNANTGIGGHGSCCSEMDIWQANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC




7CEL_A
DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS




GI:157837135
YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN





ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG




Chain A, Cbh1
XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN
779



(E212p)
ETCAKNCCLDGAAYASTYGVTTSGNSLSIDFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS




Cellotetraose
FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW




Complex
EPSSNNANTGIGGHGSCCSQMDIWEANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC




5CEL_A GI:1578372
DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS





YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN





ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG




Chain A, Cbh1
XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN
780



(E212p)
ETCAKNCCLDGAAYASTYGVTTSGNSLSIDFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS




Cellopentaose
FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW




Complex
EPSSNNANTGIGGHGSCCSQMDIWEANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC




6CEL_A GI:15783787
DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS





YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN





ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG




Endoglucanase EG-
MNKSVAPLLLAASILYGGAVAQQTVWGQCGGIGWSGPTNCAPGSACSTLNPYYAQCIPGATTIT
781



II (EGLII;
TSTRPPSGPTTTTRATSTSSSTPPTSSGVRFAGVNIAGFDFGCTTDGTCVTSKVYPPLKNFTGS




Cellulase; Endo-
NNYPDGIGQMQHFVNEDGMTIFRLPVGWQYLVNNNLGGNLDSTSISKYDQLVQGCLSLGAYCIV




1,4-beta-
DIHNYARWNGGIIGQGGPTNAQFTSLWSQLASKYASQSRVWFGIMNEPHDVNINTWAATVQEVV




glucanase)
TAIRNAGATSQFISLPGNDWQSAGAFISDGSAAALSQVTNPDGSTTNLIFDVHKYLDSDNSGTH




P7982.1 GI:121794
AECTTNNIDGAFSPLATWLRQNNRQAILTETGGGNVQSCIQDMCQQIQYLNQNSDVYLGYVGWG





AGSFDSTYVLTETPTSSGNSWTDTSLVSSCLARK




Endoglucanase EG-
MAPSVTLPLTTAILAIARLVAAQQPGTSTPEVHPKLTTYKCTKSGGCVAQDTSVVLDWNYRWMH
782



1; Cellulase;
DANYNSCTVNGGVNTTLCPDEATCGKNCFIEGVDYAASGVTTSGSSLTMNQYMPSSSGGYSSVS




Endo-1,4-beta-
PRLYLLDSDGEYVMLKLNGQELSFDVDLSALPCGENGSLYLSQMDENGGANQYNTAGANYGSGY




glucanase;
CDAQCPVQTWRNGTLNTSHQGFCCNEMDILEGNSRANALTPHSCTATACDSAGCGFNPYGSGYK




P7981.1 GI:121788
SYYGPGDTVDTSKTFTIITQFNTDNGSPSGNLVSITRKYQQNGVDIPSAQPGGDTISSCPSASA





YGGLATMGKALSSGMVLVFSIWNDNSQYMNWLDSGNAGPCSSTEGNPSNILANNPNTHVVFSNI





RWGDIGSTTNSTAPPPPPASSTTFSTTRRSSTTSSSPSCTQTHWGQCGGIGYSGCKTCTSGTTC





QYSNDYYSQCL




endo-1,4-beta-
MKATLVLGSLIVGAVSAYKATTTRYYDGQEGACGCGSSSGAFPWQLGIGNGVYTAAGSQALFDT
783



glucanase V (EGV)
AGASWCGAGCGKCYQLTSTGQAPCSSCGTGGAAGQSIIVMVTNLCPNNGNAQWCPVVGGTNQYG




[T. reesei]
YSYHFDIMAQNEIFGDNVVVDFEPIACPGQAASDWGTCLCVGQQETDPTPVLGNDTGSTPPGSS




CAA838461
PPATSSSPPSGGGQQTLYGQCGGAGWTGPTTCQAPGTCKVQNQWYSQCLP




GI:485864





beta-1,4-glucanase
MKFLQVLPALIPAALAQTSCDQWATFTGNGYTVSNNLWGASAGSGFGCVTAVSLSGGASWHADW
784



[T. reesei]
QWSGGQNNVKSYQNSQIAIPQKRTVNSISSMPTTASWSYSGSNIRANVAYDLFTAANPNHVTYS




ABV71388.1
GDYELMIWLGKYGDIGPIGSSQGTVNVGGQSWTLYYGYNGAMQVYSFVAQTNTTNYSGDVKNFF




GI:15777972
NYLRDNKGYNAAGQYVLSYQFGTEPFTGSGTLNVASWTASIN




Cellobiohydrolase
MIVGILTTLATLATLAASVPLEERQACSSVWGQCGGQNWSGPTCCASGSTCVYSNDYYSQCLPG
785



II [T. reesei]
AASSSSSTRAASTTSRVSPTTSRSSSATPPPGSTTTRVPPVGSGTATYSGNPFVGVTPWANAYY




ADC83999.1
ASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLMEQTLADIRTANKNGGNYAGQFVV




GI:289152138
YDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYSDIRTLLVIEPDSLANLVTNLGT





PKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPANQDPAAQLFANVYKNASSPRAL





RGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLANHGWSNAFFITDQGRSGKQPTG





QQQWGDWCNVTGTGFGIRPSANTGDSLLDSFVWVKPGGECDGTSDSSAPRFDSHCALPDALQPA





PQAGAWFQAYFVQLLTNANPSFL




Endo-beta-1,4-
MKFLQVLPALIPAALAQTSCDQWATFTGNGYTVSNNLWGASAGSGFGCVTAVSLSGGASWHADW
786



glucanase 
QWSGGQNNVKSYQNSQIAIPQKRTVNSISSMPTTASWSYSGSNIRANVAYDLFTAANPNHVTYS




[T. reesei]
GDYELMIWLGKYGDIGPIGSSQGTVNVGGQSWTLYYGYNGAMQVYSFVAQTNTTNYSGDVKNFF




BAA214.1
NYLRDNKGYNAAGQYVLSYQFGTEPFTGSGTLNVASWTASIN




GI:2116583





Chain A, Crystal
MGVRFAGVNIAGFDFGCTTDGTCVTSKVYPPLKNFTGSNNYPDGIGQMQHFVNEDGMTIFRLPV
787



Structure Of Cel5a
GWQYLVNNNLGGNLDSTSISKYDQLVQGCLSLGAYCIVDIHNYARWNGGIIGQGGPTNAQFTSL




(Eg2) From
WSQLASKYASQSRVWFGIMNEPHDVNINTWAATVQEVVTAIRNAGATSQFISLPGNDWQSAGAF




Hypocrea Jecorina
ISDGSAAALSQVTNPDGSTTNLIFDVHKYLDSDNSGTHAECTTNNIDGAFSPLATWLRQNNRQA




(T. Reesei)
ILTETGGGNVQSCIQDMCQQIQYLNQNSDVYLGYVGWGAGSFDSTYVLTETPTSSGNSWTDTSL




3QR3_A GI:39981273
VSSCLARKGGSGSGHHHHHH




Chain B, Crystal
MGVRFAGVNIAGFDFGCTTDGTCVTSKVYPPLKNFTGSNNYPDGIGQMQHFVNEDGMTIFRLPV
788



Structure Of Cel5a
GWQYLVNNNLGGNLDSTSISKYDQLVQGCLSLGAYCIVDIHNYARWNGGIIGQGGPTNAQFTSL




(Eg2) From
WSQLASKYASQSRVWFGIMNEPHDVNINTWAATVQEVVTAIRNAGATSQFISLPGNDWQSAGAF




Hypocrea Jecorina
ISDGSAAALSQVTNPDGSTTNLIFDVHKYLDSDNSGTHAECTTNNIDGAFSPLATWLRQNNRQA




(T. Reesei)
ILTETGGGNVQSCIQDMCQQIQYLNQNSDVYLGYVGWGAGSFDSTYVLTETPTSSGNSWTDTSL




3QR3_B GI:39981274
VSSCLARKGGSGSGHHHHHH




Cel74a [T. reesei]
MKVSRVLALVLGAVIPAHAAFSWKNVKLGGGGGFVPGIIFHPKTKGVAYARTDIGGLYRLNADD
789



AAP57752.1
SWTAVTDGIADNAGWHNWGIDAVALDPQDDQKVYAAVGMYTNSWDPSNGAIIRSSDRGATWSFT




GI:317471
NLPFKVGGNMPGRGAGERLAVDPANSNIIYFGARSGNGLWKSTDGGVTFSKVSSFTATGTYIPD





PSDSNGYNSDKQGLMWVTFDSTSSTTGGATSRIFVGTADNITASVYVSTNAGSTWSAVPGQPGK





YFPHKAKLQPAEKALYLTYSDGTGPYDGTLGSVWRYDIAGGTWKDITPVSGSDLYFGFGGLGLD





LQKPGTLVVASLNSWWPDAQLFRSTDSGTTWSPIWAWASYPTETYYYSISTPKAPWIKNNFIDV





TSESPSDGLIKRLGWMIESLEIDPTDSNHWLYGTGMTIFGGHDLTNWDTRHNVSIQSLADGIEE





FSVQDLASAPGGSELLAAVGDDNGFTFASRNDLGTSPQTVWATPTWATSTSVDYAGNSVKSVVR





VGNTAGTQQVAISSDGGATWSIDYAADTSMNGGTVAYSADGDTILWSTASSGVQRSQFQGSFAS





VSSLPAGAVIASDKKTNSVFYAGSGSTFYVSKDTGSSFTRGPKLGSAGTIRDIAAHPTTAGTLY





VSTDVGIFRSTDSGTTFGQVSTALTNTYQIALGVGSGSNWNLYAFGTGPSGARLYASGDSGASW





TDIQGSQGFGSIDSTKVAGSGSTAGQVYVGTNGRGVFYAQGTVGGGTGGTSSSTKQSSSSTSSA





SSSTTLRSSVVSTTRASTVTSSRTSSAAGPTGSGVAGHYAQCGGIGWTGPTQCVAPYVCQKQND





YYYQCV




Cel61b [T. reesei]
MKSCAILAALGCLAGSVLGHGQVQNFTINGQYNQGFILDYYYQKQNTGHFPNVAGWYAEDLDLG
790



AAP57753.1
FISPDQYTTPDIVCHKNAAPGAISATAAAGSNIVFQWGPGVWPHPYGPIVTYVVECSGSCTTVN




GI:31747162
KNNLRWVKIQEAGINYNTQVWAQQDLINQGNKWTVKIPSSLRPGNYVFRHELLAAHGASSANGM





QNYPQCVNIAVTGSGTKALPAGTPATQLYKPTDPGILFNPYTTITSYTIPGPALWQG




Cel5b [T. reesei]
MRATSLLAAALAVAGDALAGKIKYLGVAIPGIDFGCDIDGSCPTDTSSVPLLSYKGGDGAGQMK
791



AAP57754.1
HFAEDDGLNVFRISATWQFVLNNTVDGKLDELNWGSYNKVVNACLETGAYCMIDMHNFARYNGG




GI:31747164
IIGQGGVSDDIFVDLWVQIAKYYEDNDKIIFGLMNEPHDLDIEIWAQTCQKVVTAIRKAGATSQ





MILLPGTNFASVETYVSTGSAEALGKITNPDGSTDLLYFDVHKYLDINNSGSHAECTTDNVDAF





NDFADWLRQNKRQAIISETGASMEPSCMTAFCAQNKAISENSDVYIGFVGWGAGSFDTSYILTL





TPLGKPGNYTDNKLMNECILDQFTLDEKYRPTPTSISTAAEETATATATSDGDAPSTTKPIFRE





ETASPTPNAVTKPSPDTSDSSDDDKDSAASMSAQGLTGTVLFTVAALGYMLVAF




Endoglucanase I
MAPSVTLPLTTAILAIARLVAAQQPGTSTPEVHPKLTTYKCTKSGGCVAQDTSVVLDWNYRWMH
792



132152A
DANYNSCTVNGGVNTTLCPDEATCGKNCFIEGVDYAASGVTTSGSSLTMNQYMPSSSGGYSSVS




GI:22541
PRLYLLDSDGEYVMLKLNGQELSFDVDLSALPCGENGSLYLSQMDENGGANQYNTAGANYGSGY





CDAQCPVQTWRNGTLNTSHQGFCCNEMDILEGNSRANALTPHSCTATACDSAGCGFNPYGSGYK





SYYGPGDTVDTSKTFTIITQFNTDNGSPSGNLVSITRKYQQNGVDIPSAQPGGDTISSCPSASA





YGGLATMGKALSSGMVLVFSIWNDNSQYMNWLDSGNAGPCSSTEGNPSNILANNPNTHVVFSNI





RWGDIGSTTNSTAPPPPPASSTTFSTTRRSSTTSSSPSCTQTHWGQCGGIGYSGCKTCTSGTTC





QYSNDYYSQCL




Endoglucanase IV
MIQKLSNLLVTALAVATGVVGHGHINDIVINGVWYQAYDPTTFPYESNPPIVVGWTAADLDNGF
793



[Trichoderma
VSPDAYQNPDIICHKNATNAKGHASVKAGDTILFQWVPVPWPHPGPIVDYLANCNGDCETVDKT





reesei]

TLEFFKIDGVGLLSGGDPGTWASDVLISNNNTWVVKIPDNLAPGNYVLRHEIIALHSAGQANGA




CAA71999.1
QNYPQCFNIAVSGSGSLQPSGVLGTDLYHATDPGVLINIYTSPLNYIIPGPTVVSGLPTSVAQG




GI:2315274
SSAATATASATVPGGGSGPTSRTTTTARTTQASSRPSSTPPATTSAPAGGPTQTLYGQCGGSGY





SGPTRCAPPATCSTLNPYYAQCLN




Endoglucanase I
MAPSVTLPLTTAILAIARLVAAQQPGTSTPEVHPKLTTYKCTKSGGCVAQDTSVVLDWNYRWMH
794



[Trichoderma
DANYNSCTVNGGVNTTLCPDEATCGKNCFIEGVDYAASGVTTSGSSLTMNQYMPSSSGGYSSVS





reesei]

PRLYLLDSDGEYVMLKLNGQELSFDVDLSALPCGENGSLYLSQMDENGGANQYNTAGANYGSGY




ADM8177.1
CDAQCPVQTWRNGTLNTSHQGFCCNEMDILEGNSRANALTPHSCTATACDSAGCGFSPYGSGYK




GI:3329711
SYYGPGDTVDTSKTFTIITQFNTDNGSPSGNLVSITRKYQQNGVDVPSAQPGGDTISSCPSASA





YGGLATMGKALSSGMVLVFSIWNDNSQYMNWLDSGNAGPCSSTEGNPSNILANNPNTHVVFSNI





RWGDIGSTTNSTAPPPPPASSTTFSTTRRSSTTSSSPSCTQTHWGQCGGIGYSGCKTCTSGTTC





QYSNDYYSQCL




Endoglucanase II
MNKSVAPLLLAASILYGGAVAQQTVWGQCGGIGNSGPTNCAPGSACSTLNPYYAQCIPGATTIT
795



precursor
TSTRPPSGPTTTTRATSTSSSTPPTSSGVRFAGVNIAGFDFGCTTDGTCVTSKVYPPLKNFTGS




[Trichoderma
NNYPDGIGQMQHFVNEDGMTIFRLPVGWQYLVNNNLGGNLDSTSISKYDQLVQGCLSLGAYCIV





reesei]

DIHNYARWNGGIIGQGGPTNAQFTSLWSQLASKYASQSRVWFGIMNEPHDVNINTWAATVQEVV




AAA34213.1
TAIRNAGATSQFISLPGNDWQSAGAFISDGSAAALSQVTNPDGSTTNLIFDVHKYLDSDNSGTH




GI:17549
AECTTNNIDGAFSPLATWLRQNNRQAILTETGGGNVQSCIQDMCQQIQYLNQNSDVYLGYVGWG





AGSFDSTYVLTETPTSSGNSWTDTSLVSSCLARK




Endoglucanase II
MNKSVAPLLLAASILYGGAVAQQTVWGQCGGIGWSGPTNCAPGSACSTLNPYYAQCIPGATTIT
796



[Trichoderma
TSTRPPSGPTTTTRATSTSSSTPPTSSGVRFAGVNIAGFDFGCTTDGTCVTSKVYPPLKNFTGS





reesei]

NNYPDGIGQMQHFVNEDGMTIFRLPVGWQYLVNNNLGGNLDSTSISKYDQLVQGCLSLGAYCIV




ABA64553.1
DIHNYARWNGGIIGQGGPTNAQFTSLWSQLASKYASQSRVWFGIMNEPHDVNINTWAATVQEVV




GI:77176916
TAIRNAGATSQFISLPGNDWQSAGAFISDGSAAALSQVTNPDGSTTNLIFDVHKYLDSDNSGTH





AECTTNNIDGAFSPLATWLRQNNRQAILTETGGGNVQSCIQDMCQQIQYLNQNSDVYLGYVGWG





AGSFDSTYVLTETPTGSGNSWTDTSLVSSCLARK




Chain A, The X-Ray
XTSCDQWATFTGNGYTVSNNLWGASAGSGFGCVTAVSLSGGASWHADWQWSGGQNNVKSYQNSQ
797



Crystal Structure
IAIPQKRTVNSISSMPTTASWSYSGSNIRANVAYDLFTAANPNHVTYSGDYELMIWLGKYGDIG




Of T. Reesei
PIGSSQGTVNVGGQSWTLYYGYNGAMQVYSFVAQTNTTNYSGDVKNFFNYLRDNKGYNAAGQYV




Family 12
LSYQFGTEPFTGSGTLNVASWTASIN




Endoglucanase 3,





Cel12a, At 1.9 A





Resolution





1H8V_A GI:14278359





Chain B, The X-Ray
XTSCDQWATFTGNGYTVSNNLWGASAGSGFGCVTAVSLSGGASWHADWQWSGGQNNVKSYQNSQ
798



Crystal Structure
IAIPQKRTVNSISSMPTTASWSYSGSNIRANVAYDLFTAANPNHVTYSGDYELMIWLGKYGDIG




Of T. Reesei
PIGSSQGTVNVGGQSWTLYYGYNGAMQVYSFVAQTNTTNYSGDVKNFFNYLRDNKGYNAAGQYV




Family 12
LSYQFGTEPFTGSGTLNVASWTASIN




Endoglucanase 3,





Cel12a, At 1.9 A





Resolution





1H8V_B GI:142783





Chain C, The X-Ray
XTSCDQWATFTGNGYTVSNNLWGASAGSGFGCVTAVSLSGGASWHADWQWSGGQNNVKSYQNSQ
799



Crystal Structure
IAIPQKRTVNSISSMPTTASWSYSGSNIRANVAYDLFTAANPNHVTYSGDYELMIWLGKYGDIG




Of T. Reesei
PIGSSQGTVNVGGQSWTLYYGYNGAMQVYSFVAQTNTTNYSGDVKNFFNYLRDNKGYNAAGQYV




Family 12
LSYQFGTEPFTGSGTLNVASWTASIN




Endoglucanase 3,





Cel12a, At 1.9 A





Resolution





1H8V_C GI:14278361





Chain D, The X-Ray
XTSCDQWATFTGNGYTVSNNLWGASAGSGFGCVTAVSLSGGASWHADWQWSGGQNNVKSYQNSQ
800



Crystal Structure
IAIPQKRTVNSISSMPTTASWSYSGSNIRANVAYDLFTAANPNHVTYSGDYELMIWLGKYGDIG




Of T. Reesei
PIGSSQGTVNVGGQSWTLYYGYNGAMQVYSFVAQTNTTNYSGDVKNFFNYLRDNKGYNAAGQYV




Family 12
LSYQFGTEPFTGSGTLNVASWTASIN




Endoglucanase 3,





Cel12a, At 1.9 A





Resolution





1H8V_D GI:14278362





Chain E, The X-Ray
XTSCDQWATFTGNGYTVSNNLWGASAGSGFGCVTAVSLSGGASWHADWQWSGGQNNVKSYQNSQ
801



Crystal Structure
IAIPQKRTVNSISSMPTTASWSYSGSNIRANVAYDLFTAANPNHVTYSGDYELMIWLGKYGDIG




Of The T. Reesei
PIGSSQGTVNVGGQSWTLYYGYNGAMQVYSFVAQTNTTNYSGDVKNFFNYLRDNKGYNAAGQYV




Family 12
LSYQFGTEPFTGSGTLNVASWTASIN




Endoglucanase 3,





Cel12a, At 1.9 A





Resolution





1H8V_E GI:14278363





Chain F, The X-Ray
XTSCDQWATFTGNGYTVSNNLWGASAGSGFGCVTAVSLSGGASWHADWQWSGGQNNVKSYQNSQ
802



Crystal Structure
IAIPQKRTVNSISSMPTTASWSYSGSNIRANVAYDLFTAANPNHVTYSGDYELMIWLGKYGDIG




Of The T. Reesei
PIGSSQGTVNVGGQSWTLYYGYNGAMQVYSFVAQTNTTNYSGDVKNFFNYLRDNKGYNAAGQYV




Family 12
LSYQFGTEPFTGSGTLNVASWTASIN




Endoglucanase 3,





Cel12a, At 1.9 A





Resolution





1H8V_F GI:14278364





Endoglucanase I
MAPSVTLPLTTAILAIARLVAAQQPGTSTPEVHPKLTTYKCTKSGGCVAQDTSVVLDWNYRWMH
803



precursor 
DANYNSCTVNGGVNTTLCPDEATCGKNCFIEGVDYAASGVTTSGSSLTMNQYMPSSSGGYSSVS




[T. reesei]
PRLYLLDSDGEYVMLKLNGQELSFDVDLSALPCGENGSLYLSQMDENGGANQYNTAGANYGSGY




AAA34212.1
CDAQCPVQTWRNGTLNTSHQGFCCNEMDILEGNSRANALTPHSCTATACDSAGCGFNPYGSGYK




GI:17547
SYYGPGDTVDTSKTFTIITQFNTDNGSPSGNLVSITRKYQQNGVDIPSAQPGGDTISSCPSASA





YGGLATMGKALSSGMVLVFSIWNDNSQYMNWLDSGNAGPCSSTEGNPSNILANNPNTHVVFSNI





RWGDIGSTTNSTAPPPPPASSTTFSTTRRSSTTSSSPSCTQTHWGQCGGIGYSGCKTCTSGTTC





QYSNDYYSQCL




Chain A, Solution
SCTQTHWGQCGGIGYSGCKTCTSGTTCQYSNDYYSQCL
804



Structure Of The





Cellulose-binding





Domain Of





Endoglucanase I





From T. Reesei And





Its Interaction





With Cello-





oligosaccharides





4BMF_A GI:5743





Chain A,
XQPGTSTPEVHPKLTTYKCTKSGGCVAQDTSVVLDWNYRWMHDANYNSCTVNGGVNTTLCPDEA
805



Endoglucanase I
TCGKNCFIEGVDYAASGVTTSGSSLTMNQYMPSSSGGYSSVSPRLYLLDSDGEYVMLKLNGQEL




From Trichoderma
SFDVDLSALPCGENGSLYLSQMDENGGANQYNTAGANYGSGYCDAQCPVQTWRNGTLNTSHQGF





Reesei

CCNEMDILEGNSRANALTPHSCTATACDSAGCGFNPYGSGYKSYYGPGDTVDTSKTFTIITQFN




1EG1_A GI:239236
TDNGSPSGNLVSITRKYQQNGVDIPSAQPGGDTISSCPSASAYGGLATMGKALSSGMVLVFSIW





NDNSQYMNWLDSGNAGPCSSTEGNPSNILANNPNTHVVFSNIRWGDIGSTT




Chain C,
XQPGTSTPEVHPKLTTYKCTKSGGCVAQDTSVVLDWNYRWMHDANYNSCTVNGGVNTTLCPDEA
806



Endoglucanase I
TCGKNCFIEGVDYAASGVTTSGSSLTMNQYMPSSSGGYSSVSPRLYLLDSDGEYVMLKLNGQEL




From Trichoderma
SFDVDLSALPCGENGSLYLSQMDENGGANQYNTAGANYGSGYCDAQCPVQTWRNGTLNTSHQGF





Reesei

CCNEMDILEGNSRANALTPHSCTATACDSAGCGFNPYGSGYKSYYGPGDTVDTSKTFTIITQFN




1EG1_C GI:239237
TDNGSPSGNLVSITRKYQQNGVDIPSAQPGGDTISSCPSASAYGGLATMGKALSSGMVLVFSIW





NDNSQYMNWLDSGNAGPCSSTEGNPSNILANNPNTHVVFSNIRWGDIGSTT




Endoglucanase
MRSFSLLGSLSLLTSLSWALPTEGVISKLEGRQSGSSWFLPNIDHTTGAVRGYVPNLFNSAGQQ
807



[Trichoderma
NFTYPVYKTVASGDSAGFVNALYSDGPSGGQRDNCYLAGEPRVIYLPPGTYTVSSTIFFDTDTV





reesei RUT C-3]

IIGDAANPPTIKAAAGFNGDYLIVGGQGDGDSHPCGGSGGETHFSVMIKNVILDTTANAGSSGF




ETS6856.1
TALSWAVAQNCALVNVKINMPQGVHTGMLVSGGSTISISDVSFNFGNIGLHWNGHQQGQIKGMT




GI:572283882
FTDCTNGIFIDSGFTISIFAPTCNTVGRCIVLNSGNAWVAVIDGQSINSGDFFTSNVGFPNFML





ENISKDTTNSNMVVVGGNVKVGGSTSLGTYVYGNTRGANPVYQTNPTSQPVNRPAALAPGGRYP





VINAPQYADKTVANVVNLKDPNQNGGHTLQGDGFTDDTAALQGALNTAASQGKIAYLPFGIYIV





KSTITIPPGTELYGEAWSTISGSGSAFSSETNPTPVVQIGATPGQKGVAHVQDIRFTVNEALPG





AILLRINMAGNNPGDVAVFNSLNTIGGTRDTSISCSSESNCRAAYLGLHLAAGSSAYIDNFWSW





VADHATDQSGKGTRTAVKGGVLVEATAGTWLTGLGSEHNWLYQLSFHNAANVFISLFQSETNYN





QGNNGAPLPGTPFDATSIDPNFSWCSGGDTVCRMGLAQYYTGSNSNIFHYAAGSWNFIGLTKVN





QGLMNFIQSTISNAHLYGFTSGPNTGETMRLPNGVEFGNGGNDGYGGSWGTLIANIASQS




Endoglucanase,
MKATLVLGSLIVGAVSAYKATTTRYYDGQEGACGCGSSSGAFPWQLGIGNGVYTAAGSQALFDT
808



partial 
AGASWCGAGCGKCYQLTSTGQAPCSSCGTGGAAGQSIIVMVTNLCPNNGNAQWCPVVGGTNQYG




[T. reesei]
YSYHFDIMAQNEIFGDNVVVDFEPIACPGQAASDWGTCLCVGQQETDPTPVLGNDTGSTPPGSS




AHK2346.1
PPATSSSPPSGGGQQTLYGQCGGAGWTGPTTCQAPGTCKVQNQWYSQCLP




GI:58829453





Chain A, Hypocrea
XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN
809



Jecorina Cel7a
ETCAKNCCLDGAAYASTYGVTTSGNSLSIDFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS




E212q Mutant In
FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW




Complex With P-
EPSSNNANTGIGGHGSCCSQMDIWEANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC




nitrophenyl
DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS




Cellobioside
YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN




4UWT_A
ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG




GI:922664681





Chain A, o-
XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN
810



nitrophenyl
ETCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS




Cellobioside As An
FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW




Active Site Probe
EPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC




For Family 7
DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQDGVTFQQPNAELGS




Cellobiohydrolases
YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN




4VZ_A GI:931139719
ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG




Chain A, Hypocrea
XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN
811



Jecorina Cel7a
ETCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS




(wild Type) Soaked
FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW




With Xylopentaose.
EPSSNNANTGIGGHGSCCSEMDIWQANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC




4D5Q_A
DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS




GI:783282859
YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN





ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG




Chain A, Cbh1 In
XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN
812



Complex With S-
ETCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS




propranolol
FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW




1DY4_A GI:1284415
EPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC





DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS





YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN





ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG




glycoside
MIVGILTTLATLATLAASVPLEERQACSSVWGQCGGQNWSGPTCCASGSTCVYSNDYYSQCLPG
813



hydrolase family 6
AASSSSSTRAASTTSRVSPTTSRSSSATPPPGSTTTRVPPVGSGTATYSGNPFVGVTPWANAYY




[T. reesei QM6a]
ASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLMEQTLADIRTANKNGGNYAGQFVV





YDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYSDIRTLLVIEPDSLANLVTNLGT




XP_696258.1
PKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPANQDPAAQLFANVYKNASSPRAL




GI:58911115
RGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLANHGWSNAFFITDQGRSGKQPTG





QQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDGTSDSSAPRFDSHCALPDALQPA





PQAGAWFQAYFVQLLTNANPSFL




glycoside
MYRKLAVISAFLATARAQSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSS
814



hydrolase family 7
TNCYDGNTWSSTLCPDNETCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKNVGARLYLM




[T. reesei QM6a]
ASDTTYQEFTLLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQ




XP_6969224.1
CPRDLKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICE




GI:58911443
GDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRY





YVQNGVTFQQPNAELGSYSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLW





DDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGN





PSGGNPPGGNPPGTTTTRRPATTTGSSPGPTQSHYGQCGGIGYSGPTVCASGTTCQVLNPYYSQ





CL




glycoside
MYRKLAVISAFLATARAQSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSS
815



hydrolase family 7
TNCYDGNTWSSTLCPDNETCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKNVGARLYLM




[T. reesei QM6a]
ASDTTYQEFTLLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQ




EGR44817.1
CPRDLKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICE




GI:34514556
GDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRY





YVQNGVTFQQPNAELGSYSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLW





DDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGN





PSGGNPPGGNPPGTTTTRRPATTTGSSPGPTQSHYGQCGGIGYSGPTVCASGTTCQVLNPYYSQ





CL




glycoside
MIVGILTTLATLATLAASVPLEERQACSSVWGQCGGQNWSGPTCCASGSTCVYSNDYYSQCLPG
816



hydrolase family 6
AASSSSSTRAASTTSRVSPTTSRSSSATPPPGSTTTRVPPVGSGTATYSGNPFVGVTPWANAYY




[T. reesei QM6a]
ASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLMEQTLADIRTANKNGGNYAGQFVV




EGR5117.1
YDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYSDIRTLLVIEPDSLANLVTNLGT




GI:3452782
PKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPANQDPAAQLFANVYKNASSPRAL





RGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLANHGWSNAFFITDQGRSGKQPTG





QQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDGTSDSSAPRFDSHCALPDALQPA





PQAGAWFQAYFVQLLTNANPSFL




Chain A, Hypocrea
XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN
817



Jecorina
ETCAKNCCLDGAAYASTYGVTTSGNSLSIDFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS




Cellobiohydrolase
FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW




Cel7a E212q Soaked
EPSSNNANTGIGGHGSCCSQMDIWEANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC




With Xylotriose.
DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS




4D5I_A
YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN




GI:783282849
ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG




Chain A, Hypocrea
XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN
818



Jecorina
ETCAKNCCLDGAAYASTYGVTTSGNSLSIDFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS




Cellobiohydrolase
FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW




Cel7a E217q Soaked
EPSSNNANTGIGGHGSCCSEMDIWQANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC




With Xylopentaose.
DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS




4D5P_A
YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN




GI:783282856
ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG




Chain A, Hypocrea
XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN
819



Jecorina Cel7a In
ETCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS




Complex With (R)-
FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW




Dihydroxy-
EPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC




Phenanthrenolol
DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS




2V3I_A GI:19356563
YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN





ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG




Chain A, Hypocrea
XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN
820



Jecorina Cel7a In
ETCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS




Complex With (S)-
FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW




Dihydroxy-
EPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC




Phenanthrenolol
DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS




2V3R_A GI:19356564
YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN





ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG




Chain A, Michaelis
XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN
821



Complex Of
ETCAKNCCLDGAAYASTYGVTTSGNSLSIDFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS




Hypocrea Jecorina
FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW




Cel7a E217q Mutant
EPSSNNANTGIGGHGSCCSEMDIWQANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC




With Cellononaose
DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS




Spanning The
YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN




Active Site
ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG




4C4C_A GI:57215318





Chain A, Covalent
XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN
822



Glycosyl-enzyme
ETCAKNCCLDGAAYASTYGVTTSGNSLSIDFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS




Intermediate Of
FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW




Hypocrea Jecorina
EPSSNNANTGIGGHGSCCSEMDIWQANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC




Cel7a E217q Mutant
DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS




Trapped Using Dnp-
YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN




2-deoxy-2-fluoro-
ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG




cellotrioside





4C4D_A





GI:572153181





Chain A, Cel6a
SGTATYSGNPFVGVTPWANAYYASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLME
823



D175a Mutant
QTLADIRTANKNGGNYAGQFVVYDLPDRACAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYS




1HGW_A
DIRTLLVIEPDSLANLVTNLGTPKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPA




GI:1865599
NQDPAAQLFANVYKNASSPRALRGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLA





NHGWSNAFFITDQGRSGKQPTGQQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDG





TSDSSAPRFDSHCALPDALQPAPQAGAWFQAYFVQLLTNANPSFL




Chain B, Cel6a
SGTATYSGNPFVGVTPWANAYYASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLME
824



D175a Mutant
QTLADIRTANKNGGNYAGQFVVYDLPDRACAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYS




1HGW_B
DIRTLLVIEPDSLANLVTNLGTPKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPA




GI:1865591
NQDPAAQLFANVYKNASSPRALRGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLA





NHGWSNAFFITDQGRSGKQPTGQQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDG





TSDSSAPRFDSHCALPDALQPAPQAGAWFQAYFVQLLTNANPSFL




Chain A, Cel6a
SGTATYSGNPFVGVTPWANAYYASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLME
825



D221a Mutant
QTLADIRTANKNGGNYAGQFVVYDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYS




1HGY_A
DIRTLLVIEPASLANLVTNLGTPKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPA




GI:18655911
NQDPAAQLFANVYKNASSPRALRGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLA





NHGWSNAFFITDQGRSGKQPTGQQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDG





TSDSSAPRFDSHCALPDALQPAPQAGAWFQAYFVQLLTNANPSFL




Chain B, Cel6a
SGTATYSGNPFVGVTPWANAYYASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLME
826



D221a Mutant
QTLADIRTANKNGGNYAGQFVVYDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYS




1HGY_B
DIRTLLVIEPASLANLVTNLGTPKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPA




GI:18655912
NQDPAAQLFANVYKNASSPRALRGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLA





NHGWSNAFFITDQGRSGKQPTGQQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDG





TSDSSAPRFDSHCALPDALQPAPQAGAWFQAYFVQLLTNANPSFL




Cellobiohydrolase,
ESACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN
827



beta glucan
ETCAKNCCLDGAAYTSAYSSZPGGGGGVVIFFKNVGARLYLMASDTTYQEFTLLGNEFSFDVDV




13195A
SQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGWEPSSN




GI:223874
NANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTCDPDGC





DWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQDGVTFQQPNAELGSYSGNE





LNDDYCTAEEAEFGGSSFSDKGGLTQFXXATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSST





PGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSGGNPPGGNPPGTTTTTTTSS





SZPPPGAHRRYGQCGGIGYSGPTVCASGTTCQVLNPYYSQCL




Chain A, Cel6a
SGTATYSGNPFVGVTPWANAYYASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLME
828



(y169f) With A
QTLADIRTANKNGGNYAGQFVVFDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYS




Non-hydrolysable
DIRTLLVIEPDSLANLVTNLGTPKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPA




Cellotetraose
NQDPAAQLFANVYKNASSPRALRGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLA




1QJW_A GI:6137482
NHGWSNAFFITDQGRSGKQPTGQQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDG





TSDSSAPRFDSHCALPDALQPAPQAGAWFQAYFVQLLTNANPSFL




Chain B, Cel6a
SGTATYSGNPFVGVTPWANAYYASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLME
829



(y169f) With A
QTLADIRTANKNGGNYAGQFVVFDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYS




Non-hydrolysable
DIRTLLVIEPDSLANLVTNLGTPKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPA




Cellotetraose
NQDPAAQLFANVYKNASSPRALRGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLA




1QJW_B GI:6137483
NHGWSNAFFITDQGRSGKQPTGQQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDG





TSDSSAPRFDSHCALPDALQPAPQAGAWFQAYFVQLLTNANPSFL




Chain A, Cel6a In
TATYSGNPFVGVTPWANAYYASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLMEQT
830



Complex With M-
LADIRTANKNGGNYAGQFVVYDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYSDI




iodobenzyl Beta-d-
RTLLVIEPDSLANLVTNLGTPKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPANQ




glucopyranosyl-
DPAAQLFANVYKNASSPRALRGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLANH




Beta(1,4)-d-
GWSNAFFITDQGRSGKQPTGQQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDGTS




xylopyranoside
DSSAPRFDSHCALPDALQPAPQAGAWFQAYFVQLLTNANPSFL




1QK_A GI:6137484





Chain B, Cel6a In
TATYSGNPFVGVTPWANAYYASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLMEQT
831



Complex With M-
LADIRTANKNGGNYAGQFVVYDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYSDI




iodobenzyl Beta-d-
RTLLVIEPDSLANLVTNLGTPKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPANQ




glucopyranosyl-
DPAAQLFANVYKNASSPRALRGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLANH




Beta(1,4)-d-
GWSNAFFITDQGRSGKQPTGQQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDGTS




xylopyranoside
DSSAPRFDSHCALPDALQPAPQAGAWFQAYFVQLLTNANPSFL




1QK_B GI:6137485





Chain A, Wild Type
TATYSGNPFVGVTPWANAYYASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLMEQT
832



Cel6a With A Non-
LADIRTANKNGGNYAGQFVVYDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYSDI




hydrolysable
RTLLVIEPDSLANLVTNLGTPKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPANQ




Cellotetraose
DPAAQLFANVYKNASSPRALRGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLANH




1QK2_A GI:6137486
GWSNAFFITDQGRSGKQPTGQQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDGTS





DSSAPRFDSHCALPDALQPAPQAGAWFQAYFVQLLTNANPSFL




Chain B, Wild Type
TATYSGNPFVGVTPWANAYYASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLMEQT
833



Cel6a With A Non-
LADIRTANKNGGNYAGQFVVYDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYSDI




hydrolysable
RTLLVIEPDSLANLVTNLGTPKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPANQ




Cellotetraose
DPAAQLFANVYKNASSPRALRGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLANH




1QK2_B GI:6137487
GWSNAFFITDQGRSGKQPTGQQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDGTS





DSSAPRFDSHCALPDALQPAPQAGAWFQAYFVQLLTNANPSFL




Exoglucanase 2
MIVGILTTLATLATLAASVPLEERQACSSVWGQCGGQNWSGPTCCASGSTCVYSNDYYSQCLPG
834



(also known as
AASSSSSTRAASTTSRVSPTTSRSSSATPPPGSTTTRVPPVGSGTATYSGNPFVGVTPWANAYY




1,4-beta-
ASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLMEQTLADIRTANKNGGNYAGQFVV




cellobiohydrolase;
YDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYSDIRTLLVIEPDSLANLVTNLGT




exocellobiohydrola
PKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPANQDPAAQLFANVYKNASSPRAL




se II; CBHII;
RGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLANHGWSNAFFITDQGRSGKQPTG




Exoglucanase II
QQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDGTSDSSAPRFDSHCALPDALQPA




P07987.1 GI:121855
PQAGAWFQAYFVQLLTNANPSFL




Exoglucanase 1
MYRKLAVISAFLATARAQSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSS
835



(also known as
TNCYDGNTWSSTLCPDNETCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKNVGARLYLM




1,4-beta-
ASDTTYQEFTLLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQ




cellobiohydrolase;
CPRDLKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICE




Exocellobiohydrola
GDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRY




se I; CBHI;
YVQNGVTFQQPNAELGSYSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLW




AltName:
DDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGN




Full=Exoglucanase
PSGGNPPGGNRGTTTTRRPATTTGSSPGPTQSHYGQCGGIGYSGPTVCASGTTCQVLNPYYSQC




I
L




P62694.1 GI:542144





Unnamed protein
MIVGILTTLATLATLAASVPLEERQACSSVWGQCGGQNWSGPTCCASGSTCVYSNDYYSQCLPG
836



product 
AASSSSSTRAASTTSRVSPTTSRSSSATPPPGSTTTRVPPVGSGTATYSGNPFVGVTPWANAYY




[T. reesei]
ASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLMEQTLADIRTANKNGGNYAGQFVV




CAV28333.1
YDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYSDIRTLLVIEPDSLANLVTNLGT




GI:21829938
PKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPANQDPAAQLFANVYKNASSPRAL





RGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLANHGWSNAFFITDQGRSGKQPTG





QQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDGTSDSSAPRFDSHCALPDALQPA





PQAGAWFQAYFVQLLTNANPSFL




Cellobiohydrolase
MIVGILTTLATLATLAASVPLEERQACSSVWGQCGGQNWSGPTCCASGSTCVYSNDYYSQCLPG
837



II
AASSSSSTRAASTTSRVSPTTSRSSSATPPPGSTTTRVPPVGSGTATYSGNPFVGVTPWANAYY




134188A
ASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLMEQTLADIRTANKNGGNYAGQFVV




GI:225475
YDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYSDIRTLLVIEPDSLANLVTNLGT





PKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPANQDPAAQLFANVYKNASSPRAL





RGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLANHGWSNAFFITDQGRSGKQPTG





QQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDGTSDSSAPRFDSHCALPDALQPA





PQAGAWFQAYFVQLLTNANPSFL




Unnamed protein
MIVGILTTLATLATLAASVPLEERQACSSVWGQCGGQNWSGPTCCASGSTCVYSNDYYSQCLPG
838



[Trichoderma
AASSSSSTRAASTTSRVSPTTSRSSSATPPPGSTTTRVPPVGSGTATYSGNPFVGVTPWANAYY




reesei]
ASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLMEQTLADIRTANKNGGNYAGQFVV




AAA72922.1
YDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYSDIRTLLVIEPDSLANLVTNLGT




GI:17543
PKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPANQDPAAQLFANVYKNASSPRAL





RGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGRLLANHGWSNAFFITDQGRSGKQPTG





QQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDGTSDSSAPRFDSHCALPDALQPA





AQAGAWFQAYFVQLLTNANPSFL




Chain A, Hypocrea
XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN
839



Jecorina
ETCAKNCCLDGAAYASTYGVTTSGNSLSIDFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS




Cellobiohydrolase
FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW




Cel7a E217q Soaked
EPSSNNANTGIGGHGSCCSEMDIWQANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC




With Xylotriose.
DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS




4D5J_A
YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN




GI:783282851
ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG




Chain A, Hypocrea
XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN
840



Jecorina
ETCAKNCCLDGAAYASTYGVTTSGNSLSIDFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS




Cellobiohydrolase
FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW




Cel7a E212q Soaked
EPSSNNANTGIGGHGSCCSQMDIWEANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC




With Xylopentaose.
DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS




4D5O_A
YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN




GI:783282853
ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG




Chain A, Hypocrea
XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN
841



Jecorina
ETCAKNCCLDGAAYASTYGVTTSGNSLSIDFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS




Cellobiohydrolase
FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW




Cel7a E217q Soaked
EPSSNNANTGIGGHGSCCSEMDIWQANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC




With Xylotetraose.
DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS




4D5V_A
YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN




GI:783282861
ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG




Chain A,
SGTATYSGNPFVGVTPWANAYYASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLME
842



Cellobiohydrolase
QTLADIRTANKNGGNYAGQFVVFDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYS




Ii, Catalytic
DIRTLLVIEPDSLANLVTNLGTPKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPA




Domain, Mutant
NQDPAAQLFANVYKNASSPRALRGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLA




Y169f
NHGWSNAFFITDQGRSGKQPTGQQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDG




1CB2_A GI:182776
TSDSSAPRFDSHCALPDALQPAPQAGAWFQAYFVQLLTNANPSFL




Chain B,
SGTATYSGNPFVGVTPWANAYYASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLME
843



Cellobiohydrolase
QTLADIRTANKNGGNYAGQFVVFDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYS




Ii, Catalytic
DIRTLLVIEPDSLANLVTNLGTPKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPA




Domain, Mutant
NQDPAAQLFANVYKNASSPRALRGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLA




Y169f
NHGWSNAFFITDQGRSGKQPTGQQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDG




1CB2_B GI:182777
TSDSSAPRFDSHCALPDALQPAPQAGAWFQAYFVQLLTNANPSFL




SS2c-G10 fusion
MWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDEAIAVANNSASKFENARNHFR
846



protein
NNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKEAVKNTLERALSFYSAVQTSD





GNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYNHQNEDGGWGLHIEGSSTMF





GSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPP





EFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRSR





NTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAMKIAMEHIHYEDENSRYICLG





PVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGYNGSQLWDTAFSIQAIISTKL





IDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLM





LSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDYS





YVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKTDGSWYGCWGVCFTYAGWFGI





KGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVMMA





LIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYS





HRVLDMGSAAPGSGSGSGMQPSTATAAPKEKTSSEKKDNYIIKGVFWDPACVIA




SS2c-G10 fusion
ATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCA
848



protein, coding
TTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGA




sequence
TGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGT





AATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGA





TCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAA





AGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGAC





GGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTAT





ACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACAT





CTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTT





GGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATG





GTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTG





GGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCC





GAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAA





TGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGT





TTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCAGA





AACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCA





GTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGC





CATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGA





CCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCA





AGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGG





TTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAATTG





ATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAG





AGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAG





CACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATG





TTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAG





TCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTC





CTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACAGT





TATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGC





ATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAA





AACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATC





AAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACT





TCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAA





AGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGGCC





TTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAA





TCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAA





CTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACTCC





CATCGTGTCTTGGATATGGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAGGAATGCAACCAT





CTACCGCTACCGCCGCTCCAAAAGAAAAGACCAGCAGTGAAAAGAAGGACAACTATATTATCAA





AGGTGTCTTCTGGGACCCAGCATGTGTTATTGCTTAG




SS2c-G10 fusion
MWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDEAIAVANNSASKFENARNHFR
1024



protein
NNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKEAVKNTLERALSFYSAVQTSD





GNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYNHQNEDGGWGLHIEGSSTMF





GSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPP





EFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRSR





NTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAMKIAMEHIHYEDENSRYICLG





PVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGYNGSQLWDTAFSIQAIISTKL





IDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLM





LSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDYS





YVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKTDGSWYGCWGVCFTYAGWFGI





KGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVMMA





LIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYS





HRVLDMGSAAPGSGSGSGMQPSTATAAPKEKTSSEKKDNYIIKGVFWDPACVIA-




SS2e-A7b fusion
ATGTCTGCATCTACTACAAGTTTAGAGGAATATCAAAAAACTTTCCTTGAACTGGGATTAGAAT
849



protein, coding
GCAAAGCACTAAGATTTGGGTCATTCAAGCTGAATTCAGGCAGGCAGTCGCCATATTTTTTCAA




sequence
TCTTAGTTTGTTCAATTCTGGAAAGCTGTTGGCAAACCTTGCCACCGCGTATGCAACTGCTATC





ATTCAATCGGAGCTTAAATTCGATGTTATTTTCGGACCTGCTTACAAAGGGATCCCTTTGGCTG





CTATTGTATGCGTTAAACTAGCAGAAATCGGGGGCACTAAATTTCAAGGTATTCAATATGCTTT





TAATAGAAAGAAAGTTAAAGACCACGGCGAAGGTGGTATTATTGTTGGAGCATCGCTTGAAGAC





AAGAGGGTGTTGATTATCGACGATGTCATGACTGCAGGAACTGCAATCAATGAAGCATTTGAGA





TAATCAGTATTGCTCAAGGTAGGGTAGTGGGTTGTATTGTTGCTTTAGATAGGCAAGAAGTGAT





TCATGAATCTGATCCGGAAAGAACAAGTGCTACCCAATCTGTTTCAAAGAGATACAACGTTCCT





GTGCTAAGTATTGTATCACTGACTCAAGTGGTACAATTTATGGGAAATAGACTATCACCAGAGC





AAAAATCAGCGATTGAAAACTACCGTAAGGCCTATGGTATATGA




SS2e-A7b fusion
ATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCA
850



protein, coding
TTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGA




sequence
TGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGT





AATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGA





TCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAA





AGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGAC





GGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTAT





ACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACAT





CTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTT





GGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATG





GTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTG





GGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCC





GAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAA





TGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGT





TTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCAGA





AACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCA





GTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGC





CATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGA





CCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCA




SS2e-A7b fusion
AGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGG
851



protein
TTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAATTG





ATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAG





AGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAG





CACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATG





TTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAG





TCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTC





CTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACAGT





TATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGC





ATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAA





AACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATC





AAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACT





TCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAA





AGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGGCC





TTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAA





TCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAA





CTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACTCC





CATCGTGTCTTGGATATGGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAGGAATGTCTGCAT





CTACTACAAGTTTAGAGGAATATCAAAAAACTTTCCTTGAACTGGGATTAGAATGCAAAGCACT





AAGATTTGGGTCATTCAAGCTGAATTCAGGCAGGCAGTCGCCATATTTTTTCAATCTTAGTTTG





TTCAATTCTGGAAAGCTGTTGGCAAACCTTGCCACCGCGTATGCAACTGCTATCATTCAATCGG





AGCTTAAATTCGATGTTATTTTCGGACCTGCTTACAAAGGGATCCCTTTGGCTGCTATTGTATG





CGTTAAACTAGCAGAAATCGGGGGCACTAAATTTCAAGGTATTCAATATGCTTTTAATAGAAAG





AAAGTTAAAGACCACGGCGAAGGTGGTATTATTGTTGGAGCATCGCTTGAAGACAAGAGGGTGT





TGATTATCGACGATGTCATGACTGCAGGAACTGCAATCAATGAAGCATTTGAGATAATCAGTAT





TGCTCAAGGTAGGGTAGTGGGTTGTATTGTTGCTTTAGATAGGCAAGAAGTGATTCATGAATCT





GATCCGGAAAGAACAAGTGCTACCCAATCTGTTTCAAAGAGATACAACGTTCCTGTGCTAAGTA





TTGTATCACTGACTCAAGTGGTACAATTTATGGGAAATAGACTATCACCAGAGCAAAAATCAGC





GATTGAAAACTACCGTAAGGCCTATGGTATATGA





MWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDEAIAVANNSASKFENARNHFR





NNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKEAVKNTLERALSFYSAVQTSD





GNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYNHQNEDGGWGLHIEGSSTMF





GSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPP





EFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRSR





NTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAMKIAMEHIHYEDENSRYICLG





PVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGYNGSQLWDTAFSIQAIISTKL





IDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLM





LSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDYS





YVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKTDGSWYGCWGVCFTYAGWFGI





KGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVMMA





LIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYS





HRVLDMGSAAPGSGSGSGMSASTTSLEEYQKTFLELGLECKALRFGSFKLNSGRQSPYFFNLSL





FNSGKLLANLATAYATAIIQSELKFDVIFGPAYKGIPLAAIVCVKLAEIGGTKFQGIQYAFNRK





KVKDHGEGGIIVGASLEDKRVLIIDDVMTAGTAINEAFEIISIAQGRVVGCIVALDRQEVIHES





DPERTSATQSVSKRYNVPVLSIVSLTQVVQFMGNRLSPEQKSAIENYRKAYGI-




SS2d-G11 fusion
ATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCA
853



protein, coding
TTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGA




sequence
TGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGT





AATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGA





TCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAA





AGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGAC





GGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTAT





ACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACAT





CTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTT





GGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATG





GTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTG





GGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCC





GAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAA





TGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGT





TTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCAGA





AACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCA





GTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGC





CATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGA





CCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCA





AGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGG





TTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAATTG





ATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAG





AGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAG





CACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATG





TTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAG





TCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTC





CTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACAGT





TATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGC





ATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAA





AACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATC





AAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACT





TCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAA





AGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGGCC





TTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAA





TCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAA





CTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACTCC





CATCGTGTCTTGGATATGGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAGGAATGACCGAAT





TAGATTATCAAGGAACTGCTGAGGCGGCTTCTACCTCGTATAGTCGAAATCAAACGGACCTTAA





GCCGTTTCCTTCTGCAGGCAGTGCATCTTCATCAATTAAAACGACGGAACCTGTGAAAGATCAT





AGAAGAAGGCGTTCTTCCAGCATAATTTCACATGTGGAACCGGAGACTTTTGAAGATGAAAATG





ACCAGCAACTTCTACCAAATATGAATGCTACTTGGGTAGACCAACGCGGCGCTTGGATTATTCA





TGTGGTCATTATCATACTGCTGAAACTATTTTATAATTTATTTCCTGGTGTTACCACAGAATGG





TCGTGGACTCTGACTAATATGACATATGTTATTGGGTCCTATGTCATGTTCCATCTGATTAAGG





GTACCCCTTTCGATTTCAATGGTGGTGCTTATGACAACTTGACGATGTGGGAACAAATTGACGA





CGAGACTTTATATACTCCTTCAAGAAAATTTTTGATTAGTGTCCCGATCGCCCTATTCTTAGTT





AGTACTCATTATGCTCACTATGATTTGAAATTGTTTTCATGGAATTGTTTTTTGACAACCTTTG





GTGCTGTTGTCCCAAAGTTACCTGTTACTCATAGATTAAGGATTTCTATCCCAGGTATCACAGG





TCGCGCCCAAATTAGTTGA




SS2d-G11 fusion
MWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDEAIAVANNSASKFENARNHFR
854



protein
NNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKEAVKNTLERALSFYSAVQTSD





GNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYNHQNEDGGWGLHIEGSSTMF





GSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPP





EFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRSR





NTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAMKIAMEHIHYEDENSRYICLG





PVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGYNGSQLWDTAFSIQAIISTKL





IDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLM





LSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDYS





YVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKTDGSWYGCWGVCFTYAGWFGI





KGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVMMA





LIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYS





HRVLDMGSAAPGSGSGSGMTELDYQGTAEAASTSYSRNQTDLKPFPSAGSASSSIKTTEPVKDH





RRRRSSSIISHVEPETFEDENDQQLLPNMNATWVDQRGAWIIHVVIIILLKLFYNLFPGVTTEW





SWTLTNMTYVIGSYVMFHLIKGTPFDFNGGAYDNLTMWEQIDDETLYTPSRKFLISVPIALFLV





STHYAHYDLKLFSWNCFLTTFGAVVPKLPVTHRLRISIPGITGRAQIS




SS2e-A7a fusion
ATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCA
855



protein, coding
TTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGA




sequence
TGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGT





AATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGA





TCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAA





AGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGAC





GGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTAT





ACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACAT





CTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTT





GGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATG





GTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTG





GGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCC





GAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAA





TGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGT





TTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCAGA





AACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCA





GTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGC





CATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGA





CCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCA





AGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGG





TTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAATTG





ATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAG





AGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAG





CACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATG





TTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAG





TCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTC





CTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACAGT





TATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGC





ATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAA





AACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATC





AAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACT





TCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAA





AGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGGCC





TTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAA





TCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAA





CTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACTCC





CATCGTGTCTTGGATATGGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAGGAATGCCAAGAG





TAGCTATCATCATTTACACACTATATGGTCACGTTGCTGCCACCGCAGAGGCAGAAAAGAAGGG





AATTGAAGCCGCTGGAGGCTCTGCAGACATTTATCAAGTCGAGGAAACGTTGTCTCCAGAAGTT





GTTAAGGCGCTTGGCGGTGCTCCAAAGCCAGATTACCCAATTGCCACTCAAGATACGTTGACAG





AATATGATGCCTTTTTGTTTGGTATTCCAACTAGATTTGGTAACTTCCCTGCTCAATGGAAGGC





TTTCTGGGACCGTACCGGTGGGTTGTGGGCTAAGGGTGCTTTGCATGGTAAGGTCGCTGGTTGT





TTCGTCTCCACCGGAACTGGTGGTGGTAATGAAGCCACAATTATGAACTCTTTGTCTACTTTGG





CTCATCACGGTATCATTTTTGTCCCATTGGGTTACAAGAATGTTTTCGCTGAATTGACCAATAT





GGATGAAGTTCACGGTGGTTCACCATGGGGTGCGGGTACCATTGCAGGCAGTGACGGTTCAAGA





TCTCCTTCCGCCTTGGAATTACAAGTACACGAAATTCAAGGCAAGACTTTCTACGAAACCGTTG





CAAAGTTTTGA




SS2e-A7a fusion
MWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDEAIAVANNSASKFENARNHFR
856



protein
NNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKEAVKNTLERALSFYSAVQTSD





GNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYNHQNEDGGWGLHIEGSSTMF





GSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPP





EFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRSR





NTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAMKIAMEHIHYEDENSRYICLG





PVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGYNGSQLWDTAFSIQAIISTKL





IDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLM





LSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDYS





YVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKTDGSWYGCWGVCFTYAGWFGI





KGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVMMA





LIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYS





HRVLDMGSAAPGSGSGSGMPRVAIIIYTLYGHVAATAEAEKKGIEAAGGSADIYQVEETLSPEV





VKALGGAPKPDYPIATQDTLTEYDAFLFGIPTRFGNFPAQWKAFWDRTGGLWAKGALHGKVAGC





FVSTGTGGGNEATIMNSLSTLAHHGIIFVPLGYKNVFAELTNMDEVHGGSPWGAGTIAGSDGSR





SPSALELQVHEIQGKTFYETVAKF-




SS4d-G5 fusion
ATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCA
858



protein, coding
TTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGA




sequence
TGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGT





AATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGA





TCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAA





AGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGAC





GGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTAT





ACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACAT





CTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTT





GGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATG





GTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTG





GGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCC





GAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAA





TGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGT





TTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCAGA





AACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCA





GTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGC





CATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGA





CCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCA





AGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGG





TTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAATTG





ATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAG





AGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAG





CACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATG





TTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAG





TCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTC





CTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACAGT





TATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGC





ATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAA





AACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATC





AAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACT





TCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAA





AGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGGCC





TTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAA





TCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAA





CTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACTCC





CATCGTGTCTTGGATATGGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAGGAATGGGAAAGA





ACGTTTTGTTGCTAGGATCTGGTTTTGTTGCACAACCTGTTATCGACACATTGGCTGCTAATGA





TGACATCAATGTCACTGTCGCATGTAGAACATTAGCCAATGCGCAAGCATTGGCCAAGCCCTCT





GGATCCAAGGCTATTTCATTGGATGTTACCGATGACAGTGCCTTAGACAAAGTTCTGGCTGATA





ACGATGTTGTCATCTCTTTGATTCCATACACCTTCCATCCAAATGTGGTAAAGAGCGCCATCAG





AACAAAGACCGATGTCGTCACTTCCTCTTACATCTCACCTGCCTTAAGAGAATTGGAACCAGAA





ATCGTAAAGGCAGGTATTACAGTTATGAACGAAATTGGGTTGGATCCAGGTATCGACCACTTGT





ATGCGGTCAAGACTATTGATGAAGTTCACAGAGCTGGTGGTAAGCTAAAGTCATTCTTGTCATA





CTGTGGTGGTTTACCAGCTCCTGAAGACTCTGATAATCCATTAGGATACAAATTTTCATGGTCC





TCCAGAGGTGTGCTACTGGCTTTAAGAAACTCTGCTAAATACTGGAAAGACGGAAAGATTGAAA





CTGTTTCTTCCGAAGACTTAATGGCCACTGCTAAGCCTTACTTCATCTACCCAGGTTATGCATT





CGTTTGCTACCCAAATAGAGACTCTACCCTTTTCAAGGATCTTTATCATATTCCAGAAGCCGAA





ACGGTCATTAGAGGTACTTTGAGATATCAAGGTTTCCCAGAATTTGTTAAGGCTTTAGTTGACA





TGGGTATGTTGAAGGATGATGCTAACGAAATCTTCAGCAAGCCAATTGCCTGGAACGAAGCACT





AAAACAATATTTAGGTGCCAAGTCTACTTCTAAAGAAGATTTGATTGCTTCCATTGACTCAAAG





GCTACTTGGAAAGATGATGAAGATAGAGAAAGAATCCTTTCCGGGTTTGCTTGGTTAGGCTTGT





TCTCTGACGCAAAGATCACACCAAGAGGTAATGCTTTAGACACTCTATGTGCACGTTTAGAAGA





ACTAATGCAATATGAAGACAATGAAAGAGATATGGTTGTACTACAACACAAATTCGGTATTGAA





TGGGCTGATGGAACTACCGAAACAAGAACATCCACTTTAGTTGACTATGGTAAGGTTGGTGGTT





ACAGTTCTATGGCCGCTACTGTTGGTTATCCAGTTGCCATTGCAACGAAATTCGTCTTAGATGG





TACAATCAAGGGACCAGGCTTACTAGCGCCATACTCACCAGAGATTAATGATCCAATCATGAAA





GAACTAAAGGACAAGTACGGCATCTATCTAAAGGAAAAGACAGTGGCTTAA




SS4d-G5 fusion
MWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDEAIAVANNSASKFENARNHFR
859



protein
NNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKEAVKNTLERALSFYSAVQTSD





GNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYNHQNEDGGWGLHIEGSSTMF





GSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPP





EFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRSR





NTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAMKIAMEHIHYEDENSRYICLG





PVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGYNGSQLWDTAFSIQAIISTKL





IDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLM





LSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDYS





YVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKTDGSWYGCWGVCFTYAGWFGI





KGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVMMA





LIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYS





HRVLDMGSAAPGSGSGSGMGKNVLLLGSGFVAQPVIDTLAANDDINVTVACRTLANAQALAKPS





GSKAISLDVTDDSALDKVLADNDVVISLIPYTFHPNVVKSAIRTKTDVVTSSYISPALRELEPE





IVKAGITVMNEIGLDPGIDHLYAVKTIDEVHRAGGKLKSFLSYCGGLPAPEDSDNPLGYKFSWS





SRGVLLALRNSAKYWKDGKIETVSSEDLMATAKPYFIYPGYAFVCYPNRDSTLFKDLYHIPEAE





TVIRGTLRYQGFPEFVKALVDMGMLKDDANEIFSKPIAWNEALKQYLGAKSTSKEDLIASIDSK





ATWKDDEDRERILSGFAWLGLFSDAKITPRGNALDTLCARLEELMQYEDNERDMVVLQHKFGIE





WADGTTETRTSTLVDYGKVGGYSSMAATVGYPVAIATKFVLDGTIKGPGLLAPYSPEINDPIMK





ELKDKYGIYLKEKTVA-




SS4d-C7 fusion
ATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCA
861



protein, coding
TTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGA




sequence
TGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGT





AATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGA





TCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAA





AGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGAC





GGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTAT





ACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACAT





CTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTT





GGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATG





GTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTG





GGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCC





GAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAA





TGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGT





TTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCAGA





AACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCA





GTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGC





CATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGA





CCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCA





AGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGG





TTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAATTG





ATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAG





AGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAG





CACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATG





TTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAG





TCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTC





CTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACAGT





TATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGC





ATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAA





AACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATC





AAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACT





TCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAA





AGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGGCC





TTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAA





TCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAA





CTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACTCC





CATCGTGTCTTGGATATGGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAGGAATGTCACGTC





TTCCTCTAAAGCAGTTCTTAGCGGATAACCCCAAAAAAGTTCTTGTTCTTGACGGTGGTCAAGG





AACAGAACTGGAAAACAGAGGTATCAAAGTTGCAAATCCCGTGTGGTCTACTATTCCATTTATT





AGCGAATCATTTTGGTCTGATGAGTCATCTGCTAACAGAAAAATTGTCAAAGAAATGTTCAACG





ATTTCTTGAATGCTGGCGCAGAAATATTGATGACTACAACATACCAAACGAGTTATAAATCAGT





TTCTGAAAACACCCCAATCAGAACTTTATCCGAGTACAATAACCTTTTAAACAGGATTGTCGAT





TTTTCTCGTAATTGTATTGGCGAAGACAAATATTTGATTGGCTGTATTGGCCCATGGGGTGCTC





ATATTTGTCGTGAGTTTACAGGCGACTATGGTGCTGAGCCAGAAAATATTGATTTCTACCAATA





CTTCAAGCCTCAGTTGGAGAATTTCAATAAAAATGACAAATTGGATTTGATTGGGTTTGAAACC





ATTCCTAACATCCATGAACTGAAAGCTATCTTATCTTGGGATGAGAGTATCCTGTCTAGACCCT





TCTATATCGGGTTGTCTGTGCATGAGCACGGTGTCTTGAGAGACGGCACTACCATGGAAGAAAT





CGCACAAGTTATTAAGGACTTGGGCGACAAAATAAATCCTAACTTCTCGTTCTTAGGAATCAAC





TGCGTCAGCTTCAACCAATCACCCGACATTCTTGAGTCTCTACATCAAGCACTACCAAATATGG





CCTTGCTTGCTTATCCAAACAGTGGTGAAGTTTATGATACTGAAAAGAAGATATGGTTGCCAAA





TAGCGATAAGCTGAACAGTTGGGATACGGTTGTTAAACAGTACATTAGCAGCGGTGCCCGTATC





ATTGGTGGTTGTTGCAGAACAAGTCCAAAAGACATCCAAGAGATTTCTGCAGCCGTCAAGAAAT





ACACGTAA




SS4d-C7 fusion
MWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDEAIAVANNSASKFENARNHFR
862



protein
NNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKEAVKNTLERALSFYSAVQTSD





GNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYNHQNEDGGWGLHIEGSSTMF





GSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPP





EFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRSR





NTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAMKIAMEHIHYEDENSRYICLG





PVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGYNGSQLWDTAFSIQAIISTKL





IDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLM





LSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDYS





YVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKTDGSWYGCWGVCFTYAGWFGI





KGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVMMA





LIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYS





HRVLDMGSAAPGSGSGSGMSRLPLKQFLADNPKKVLVLDGGQGTELENRGIKVANPVWSTIPFI





SESFWSDESSANRKIVKEMFNDFLNAGAEILMTTTYQTSYKSVSENTPIRTLSEYNNLLNRIVD





FSRNCIGEDKYLIGCIGPWGAHICREFTGDYGAEPENIDFYQYFKPQLENFNKNDKLDLIGFET





IPNIHELKAILSWDESILSRPFYIGLSVHEHGVLRDGTTMEEIAQVIKDLGDKINPNFSFLGIN





CVSFNQSPDILESLHQALPNMALLAYPNSGEVYDTEKKIWLPNSDKLNSWDTVVKQYISSGARI





IGGCCRTSPKDIQEISAAVKKYT-




SS3b-D8 fusion
ATGTCGCAAGAGTTCGAGACACCGGCGGTTGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAG
864



protein, coding
GAATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAG




sequence
CATTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGAC





GATGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCC





GTAATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGA





GATCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAG





AAAGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTG





ACGGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCT





ATACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTAC





ATCTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGT





TTGGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCA





TGGTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCC





TGGGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCAC





CCGAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAG





AATGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATA





GTTTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCA





GAAACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGG





CAGTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAG





GCCATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTG





GACCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTT





CAAGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAG





GGTTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAAT





TGATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCA





AGAGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTT





AGCACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGA





TGTTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGC





AGTCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGG





TCCTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACA





GTTATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGG





GCATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAG





AAAACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCA





TCAAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAA





CTTCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAAC





AAAGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGG





CCTTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCT





AATCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAG





AACTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACT





CCCATCGTGTCTTGGATATGTGA




SS3b-D8 fusion
MSQEFETPAVGSAAPGSGSGSGMWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDD
865



protein
DEAIAVANNSASKFENARNHFRNNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVK





KEAVKNTLERALSFYSAVQTSDGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRY





IYNHQNEDGGWGLHIEGSSTMFGSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITS





WGKLWLSVLGVYEWSGNNPLPPEFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPI





VLSLRKELYTIPYHEIDWNRSRNTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREK





AMKIAMEHIHYEDENSRYICLGPVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQ





GYNGSQLWDTAFSIQAIISTKLIDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPF





STRDHGWLISDCTAEGLKASLMLSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTR





SYPWLELINPAETFGDIVIDYSYVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQ





KTDGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQN





KVYTNLEGNKPHLVNTAWVMMALIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNK





NCMITYAAYRNIFPIWALGEYSHRVLDM-




SS3b-D8 fusion
MSQEFETPAV
866



protein, fusion





domain





SS2c-A10a
ATGAATTCGAATGAAGACATCATACCTGAACTATAA
867



SS2c-A10a fusion
MWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDEAIAVANNSASKFENARNHFR
869



protein
NNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKEAVKNTLERALSFYSAVQTSD





GNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYNHQNEDGGWGLHIEGSSTMF





GSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPP





EFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRSR





NTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAMKIAMEHIHYEDENSRYICLG





PVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGYNGSQLWDTAFSIQAIISTKL





IDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLM





LSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDYS





YVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKTDGSWYGCWGVCFTYAGWFGI





KGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVMMA





LIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYS





HRVLDMGSAAPGSGSGSGMNSNEDIIPEL-




SS2c-A10a fusion
MNSNEDIIPEL
870



protein, fusion





domain





Pathway step 3
ATGTGGACAGTTGTGTTGGGACTTGCTACCTTGTTTGTTGCCTATTATATTCATTGGATCAACA
871



sequence A
AGTGGAGAGATTCCAAGTTCAATGGTGTTCTACCTCCTGGAACTATGGGGCTACCATTGATAGG




CYP87D18 DNA
AGAGACAATTCAGTTGTCAAGACCATCTGACAGTTTGGATGTGCATCCCTTTATCCAGAAGAAA




(codon optimized)
GTCGAACGTTATGGTCCGATATTTAAAACCTGTTTGGCAGGCAGACCAGTTGTTGTTTCAGCGG





ATGCAGAGTTCAATAATTACATTATGTTACAAGAAGGTAGAGCTGTAGAAATGTGGTATTTGGA





CACACTGTCTAAATTCTTCGGGTTGGATACAGAGTGGTTAAAAGCCTTAGGCTTAATCCACAAG





TACATAAGATCCATTACCCTAAACCATTTTGGTGCTGAAGCATTGAGAGAAAGATTCTTGCCAT





TTATAGAGGCATCGTCTATGGAAGCGTTACATTCTTGGTCCACTCAACCCAGTGTGGAGGTCAA





GAATGCAAGTGCTTTGATGGTATTCAGAACGTCTGTAAACAAAATGTTTGGAGAAGATGCTAAG





AAATTATCAGGAAATATTCCAGGTAAATTCACAAAGCTGCTGGGTGGCTTTCTATCTCTACCGT





TAAATTTTCCCGGCACTACTTATCACAAGTGCTTAAAAGACATGAAAGAAATCCAGAAGAAATT





ACGTGAAGTTGTAGATGATAGACTTGCCAATGTTGGGCCAGATGTTGAGGACTTTCTAGGGCAA





GCGTTGAAAGACAAAGAATCCGAGAAATTCATAAGCGAAGAATTTATCATCCAATTGCTATTTT





CAATAAGCTTTGCTTCGTTCGAATCGATCAGCACGACGTTGACATTGATTTTGAAGCTACTTGA





CGAACATCCTGAGGTTGTAAAGGAATTAGAAGCCGAACATGAAGCTATCAGAAAAGCTAGAGCT





GATCCAGATGGTCCAATTACCTGGGAAGAATACAAATCTATGACCTTCACACTTCAAGTCATAA





ACGAAACACTTAGGTTAGGCTCAGTGACTCCTGCCTTATTGAGGAAAACTGTTAAAGATCTGCA





AGTCAAGGGTTACATTATTCCTGAAGGATGGACTATAATGTTGGTAACTGCATCTAGGCATCGT





GATCCAAAGGTCTACAAAGATCCGCACATATTCAATCCTTGGAGATGGAAAGACCTGGACTCAA





TTACCATTCAAAAGAACTTTATGCCATTCGGTGGTGGTTTAAGGCATTGTGCAGGAGCTGAATA





CTCCAAAGTGTATCTGTGTACTTTTCTTCACATTCTTTGCACAAAATATAGGTGGACGAAGTTA





GGTGGCGGTAGAATTGCAAGAGCCCATATTTTAAGTTTTGAGGATGGTTTGCACGTCAAGTTTA





CTCCTAAAGAGTAA




Pathway step 3
MWTVVLGLATLFVAYYIHWINKWRDSKFNGVLPPGTMGLPLIGETIQLSRPSDSLDVHPFIQKK
872



sequence B
VERYGPIFKTCLAGRPVVVSADAEFNNYIMLQEGRAVEMWYLDTLSKFFGLDTEWLKALGLIHK




CYP87D18 Protein
YIRSITLNHFGAEALRERFLPFIEASSMEALHSWSTQPSVEVKNASALMVFRTSVNKMFGEDAK





KLSGNIPGKFTKLLGGFLSLPLNFPGTTYHKCLKDMKEIQKKLREVVDDRLANVGPDVEDFLGQ





ALKDKESEKFISEEFIIQLLFSISFASFESISTTLTLILKLLDEHPEVVKELEAEHEAIRKARA





DPDGPITWEEYKSMTFTLQVINETLRLGSVTPALLRKTVKDLQVKGYIIPEGWTIMLVTASRHR





DPKVYKDPHIFNPWRWKDLDSITIQKNFMPFGGGLRHCAGAEYSKVYLCTFLHILCTKYRWTKL





GGGRIARAHILSFEDGLHVKFTPKE




Pathway step 3
ATGAAGGTCAGTCCATTCGAATTCATGTCCGCTATTATCAAGGGTAGAATGGACCCATCTAACT
873



sequence C
CCTCATTTGAATCTACTGGTGAAGTTGCCTCCGTTATCTTTGAAAACAGAGAATTGGTTGCCAT




SgCPR DNA (codon
CTTGACCACTTCTATTGCTGTTATGATTGGTTGCTTCGTTGTCTTGATGTGGAGAAGAGCTGGT




optimized)
TCTAGAAAGGTTAAGAATGTCGAATTGCCAAAGCCATTGATTGTCCATGAACCAGAACCTGAAG





TTGAAGATGGTAAGAAGAAGGTTTCCATCTTCTTCGGTACTCAAACTGGTACTGCTGAAGGTTT





TGCTAAGGCTTTGGCTGATGAAGCTAAAGCTAGATACGAAAAGGCTACCTTCAGAGTTGTTGAT





TTGGATGATTATGCTGCCGATGATGACCAATACGAAGAAAAATTGAAGAACGAATCCTTCGCCG





TTTTCTTGTTGGCTACTTATGGTGATGGTGAACCTACTGATAATGCTGCTAGATTTTACAAGTG





GTTCGCCGAAGGTAAAGAAAGAGGTGAATGGTTGCAAAACTTGCACTATGCTGTTTTTGGTTTG





GGTAACAGACAATACGAACACTTCAACAAGATTGCTAAGGTTGCCGACGAATTATTGGAAGCTC





AAGGTGGTAATAGATTGGTTAAGGTTGGTTTAGGTGATGACGATCAATGCATCGAAGATGATTT





TTCTGCTTGGAGAGAATCTTTGTGGCCAGAATTGGATATGTTGTTGAGAGATGAAGATGATGCT





ACTACTGTTACTACTCCATATACTGCTGCTGTCTTGGAATACAGAGTTGTCTTTCATGATTCTG





CTGATGTTGCTGCTGAAGATAAGTCTTGGATTAACGCTAATGGTCATGCTGTTCATGATGCTCA





ACATCCATTCAGATCTAACGTTGTCGTCAGAAAAGAATTGCATACTTCTGCCTCTGATAGATCC





TGTTCTCATTTGGAATTCAACATTTCCGGTTCCGCTTTGAATTACGAAACTGGTGATCATGTTG





GTGTCTACTGTGAAAACTTGACTGAAACTGTTGATGAAGCCTTGAACTTGTTGGGTTTGTCTCC





AGAAACTTACTTCTCTATCTACACCGATAACGAAGATGGTACTCCATTGGGTGGTTCTTCATTG





CCACCACCATTTCCATCATGTACTTTGAGAACTGCTTTGACCAGATACGCTGATTTGTTGAACT





CTCCAAAAAAGTCTGCTTTGTTGGCTTTAGCTGCTCATGCTTCTAATCCAGTTGAAGCTGATAG





ATTGAGATACTTGGCTTCTCCAGCTGGTAAAGATGAATATGCCCAATCTGTTATCGGTTCCCAA





AAGTCTTTGTTGGAAGTTATGGCTGAATTCCCATCTGCTAAACCACCATTAGGTGTTTTTTTTG





CTGCTGTTGCTCCAAGATTGCAACCTAGATTCTACTCCATTTCATCCTCTCCAAGAATGGCTCC





ATCTAGAATCCATGTTACTTGTGCTTTGGTTTACGATAAGATGCCAACTGGTAGAATTCATAAG





GGTGTTTGTTCTACCTGGATGAAGAATTCTGTTCCAATGGAAAAGTCCCATGAATGTTCTTGGG





CTCCAATTTTCGTTAGACAATCCAATTTTAAGTTGCCAGCCGAATCCAAGGTTCCAATTATCAT





GGTTGGTCCAGGTACTGGTTTGGCTCCTTTTAGAGGTTTTTTACAAGAAAGATTGGCCTTGAAA





GAATCCGGTGTTGAATTGGGTCCATCCATTTTGTTTTTCGGTTGCAGAAACAGAAGAATGGATT





ACATCTACGAAGATGAATTGAACAACTTCGTTGAAACCGGTGCTTTGTCCGAATTGGTTATTGC





TTTTTCTAGAGAAGGTCCTACCAAAGAATACGTCCAACATAAGATGGCTGAAAAGGCTTCTGAT





ATCTGGAACTTGATTTCTGAAGGTGCTTACTTGTACGTTTGTGGTGATGCTAAAGGTATGGCTA





AGGATGTTCATAGAACCTTGCATACCATCATGCAAGAACAAGGTTCTTTGGATTCTTCCAAAGC





TGAATCCATGGTCAAGAACTTGCAAATGAATGGTAGATACTTAAGAGATGTTTGGTAA




Pathway step 3
MKVSPFEFMSAIIKGRMDPSNSSFESTGEVASVIFENRELVAILTTSIAVMIGCFVVLMWRRAG
874



sequence D
SRKVKNVELPKPLIVHEPEPEVEDGKKKVSIFFGTQTGTAEGFAKALADEAKARYEKATFRVVD




SgCPR Protein
LDDYAADDDQYEEKLKNESFAVFLLATYGDGEPTDNAARFYKWFAEGKERGEWLQNLHYAVFGL





GNRQYEHFNKIAKVADELLEAQGGNRLVKVGLGDDDQCIEDDFSAWRESLWPELDMLLRDEDDA





TTVTTPYTAAVLEYRVVFHDSADVAAEDKSWINANGHAVHDAQHPFRSNVVVRKELHTSASDRS





CSHLEFNISGSALNYETGDHVGVYCENLTETVDEALNLLGLSPETYFSIYTDNEDGTPLGGSSL





PPPFPSCTLRTALTRYADLLNSPKKSALLALAAHASNPVEADRLRYLASPAGKDEYAQSVIGSQ





KSLLEVMAEFPSAKPPLGVFFAAVAPRLQPRFYSISSSPRMAPSRIHVTCALVYDKMPTGRIHK





GVCSTWMKNSVPMEKSHECSWAPIFVRQSNFKLPAESKVPIIMVGPGTGLAPFRGFLQERLALK





ESGVELGPSILFFGCRNRRMDYIYEDELNNFVETGALSELVIAFSREGPTKEYVQHKMAEKASD





IWNLISEGAYLYVCGDAKGMAKDVHRTLHTIMQEQGSLDSSKAESMVKNLQMNGRYLRDVW




Pathway step 3
ATGGAACCTGAAAACAAGTTCTTCAATGTTGGGTTATTGATCGTAGTTACGTTGGTTTTGGCTA
875



sequence E
AACTAATTTCTGCGGTCATTAATTCCAGGTCTAAGAAGAGAGTACCTCCAACCGTCAAAGGTTT




CYP51G1 (codon
TCCACTTGTAGGTGGCTTGGTTAGATTTCTTAAAGGGCCAATTGTGATGTTGAGAGAAGAATAT




optimized)
CCCAAACATGGATCCGTATTCACTCTGAATTTACTACATAAGAAGATTACCTTTCTGATTGGAC





CAGAAGTTTCTGCACATTTCTTTAAGGCTTCAGAGAGTGATTTATCACAGCAAGAAGTCTACCA





ATTTAACGTGCCCACTTTTGGTCCGGGCGTTGTTTTCGATGTCGACTACTCGGTAAGGCAAGAA





CAATTCAGATTCTTTACCGAAGCATTGAGAGTTACAAAACTGAAGGGCTATGTTGACCAAATGG





TGAAAGAAGCAGAAGATTACTTTTCAAAATGGGGTGATTCAGGAGAGGTTGATCTAAAATGCGA





ACTTGAACACTTGATCATATTAACCGCATCTAGATGTTTGTTGGGAAGAGAAGTTCGTGACCAG





TTATTTGCTGATGTAAGTGCCCTATTTCATGACTTGGATAACGGTATGCTGCCAATATCCGTGA





TGTTCCCATACTTGCCTATACCCGCTCATAGGAGAAGAGATCAAGCGAGATCAAAATTGGCTGA





TATCTTTGTCAACATCATATCCTCTCGTAAATGTACTGGCACTTCTGAAAATGACATGTTACAA





TGCTTTATAAACTCTAAATACAAAGATGGCAGACCAACTACTGATTCTGAAATCACAGGGTTAT





TGATAGCCGCATTATTCGCTGGGCAACATACGAGCTCGATTACTAGCACATGGACAGGCGCATA





TTTGTTATGTCACAAAGAGTATATGAGTGCCGTTCTTGAAGAGCAGCAGAAACAAATGGAGAAG





CATGGTGACGAAATTGATCACGATATTCTATCCGAAATGGACAATTTGTACCGTTGCATCAAAG





AAGCCCTAAGACTACATCCACCCTTGATTATGCTTATGAGGTCGAGTCATACCGATTTTAGCGT





TACGACAAGAGAAGGAAAAGAGTATGATATTCCGAAGGGACATATTATAGCCACAAGTCCAGCT





TTCGCAAATCGTTTACCTCACGTGTATAAAGACCCTGACAGATTTGATCCAGATAGGTTTGCTC





CAGGTAGAGATGAGGATAAGGCTGCTGGACCTTTCTCCTACATATCATTTGGTGGTGGTAGACA





CGGTTGTTTAGGTGAACCTTTTGCGTATTTACAAATCAAGGCAATCTGGTCACACTTACTGAGA





AATTTTGAGTTAGAGTTGATTAGTCCTTTCCCGGAAATTGACTGGAATGCCATGGTTGTGGGTG





TCAAGGGTAAAGTGATGGTCAGGTATAAGAGAAGAAAGCTTAGCGTATCTTAG




Pathway step 3
MEPENKFFNVGLLIVVTLVLAKLISAVINSRSKKRVPPTVKGFPLVGGLVRFLKGPIVMLREEY
876



sequence F
PKHGSVFTLNLLHKKITFLIGPEVSAHFFKASESDLSQQEVYQFNVPTFGPGVVFDVDYSVRQE




CYP51G1 Protein
QFRFFTEALRVTKLKGYVDQMVKEAEDYFSKWGDSGEVDLKCELEHLIILTASRCLLGREVRDQ





LFADVSALFHDLDNGMLPISVMFPYLPIPAHRRRDQARSKLADIFVNIISSRKCTGTSENDMLQ





CFINSKYKDGRPTTDSEITGLLIAALFAGQHTSSITSTWTGAYLLCHKEYMSAVLEEQQKQMEK





HGDEIDHDILSEMDNLYRCIKEALRLHPPLIMLMRSSHTDFSVTTREGKEYDIPKGHIIATSPA





FANRLPHVYKDPDRFDPDRFAPGRDEDKAAGPFSYISFGGGRHGCLGEPFAYLQIKAIWSHLLR





NFELELISPFPEIDWNAMVVGVKGKVMVRYKRRKLSVS.




Pathway step 3
ATGTTATCGTTGGCCATTTGGGTTTCACTTTTGTTCTTGTTGTCATCATTGCTTCTTTTAAAGA
877



sequence G
CGAAGAAGAAAGTTGCTCCACAAAAGAAGAAGAAGCAATTTCCACCTGGACCTCCCAAACTACC




CYP71B97 (codon
ATTGTTAGGCCATCTGCACTTATTGGGTTCTTTGCCTCATTGCTCCTTATGTGAACTGTCTAGA




optimized)
AAATATGGTCCTGTCATGTTGTTAAAATTAGGCTCAGTACCTACCGTAGTCATATCTAGCGCTG





CAGCCGCTAGAGAGGTGTTGAAAGTACACGATCTAGCATGTTGCTCTCGTCCGAGATTGGCTGC





TTCCGGTAGATTCTCGTACAATTTTCTGGATCTGAACTTAAGCCCATATGGTGAGAGATGGAGA





GAACTGAGGAAAATTTGCGTATTGGTTTTGCTGAGTGCTAGACGTGTTCAGAGCTTCCAACAGA





TAAGAGAAGAAGAGGTGGGATTATTACTTAAATCCATTAGTCAAGTTTCCAGTAGTGCCACTCC





AGTTGATCTATCTGAGAAATCCTATTCTTTGACAGCTAACATTATCACTAGAATCGCGTTTGGG





AAGTCATTCAGAGGTGGCGAATTAGACAATGAAAACTTTCAACAAGTCATCCACAGAGCATCGA





TTGCCTTAGGTTCCTTTTCTGTGACAAACTTCTTTCCTTCAGTAGGGTGGATTATCGACAGATT





AACCGGTGTACATGGCAGATTGGAGAAGAGTTTTGCTGAATTAGACACCTTCTTTCAGCATATC





ATTGATGATCGTATCAATTTTGTCGCAACAAGCCAAACCGAAGAAAACATTATAGACGTACTAT





TGAAAATGGAAAGAGAACGTTCAAAATTTGATGTCCTACAACTGAATAGGGACTGCATAAAAGC





CTTGATAATGGATATATTTCTTGCCGGTGTAGATACTGGAGCAGGGACAATTGTGTGGGCATTG





ACTGAATTGGTGAGAAATCCCAGAGTGATGAAGAAGTTGCAAGACGAAATAAGGTCGTGTGTGA





AAGAGGATCAAGTCAAGGAACGTGATTTAGAGAAACTTCAGTACTTAAAGATGGTCGTTAAAGA





AGTTTTAAGATTGCATGCTCCAGTTCCTTTGTTATTGCCGAGAGAGACAATGTCTCATTTCAAA





CTAAATGGTTATGACATTGATCCGAAAACTCACTTGCATGTCAATGTTTGGGCGATTGGTAGGG





ACCCAGATTCTTGGTCTGATCCAGAAGAATTCTTCCCAGAAAGATTCGCAGGATCAAGTATTGA





TTACAAAGGACATAATTTTGAATTGCTGCCATTTGGTGGTGGCAGAAGGATCTGTCCCGGTATG





AACATGGGGACAGTTGCGGTTGAACTTGCACTAACGAACCTATTACTTTGTTTTGATTGGACTC





TACCTGATGGCATGAAAGAGGAAGATGTTGACATGGAAGAAGATGGTGGACTTGCTATTGCTAA





GAAATCTCCCCTAAAATTAGTTCCAGTTAGGTGTCTTAATTAG




Pathway step 3
MLSLAIWVSLLFLLSSLLLLKTKKKVAPQKKKKQFPPGPPKLPLLGHLHLLGSLPHCSLCELSR
878



sequence H
KYGPVMLLKLGSVPTVVISSAAAAREVLKVHDLACCSRPRLAASGRFSYNFLDLNLSPYGERWR




CYP71B97 Protein
ELRKICVLVLLSARRVQSFQQIREEEVGLLLKSISQVSSSATPVDLSEKSYSLTANIITRIAFG





KSFRGGELDNENFQQVIHRASIALGSFSVTNFFPSVGWIIDRLTGVHGRLEKSFAELDTFFQHI





IDDRINFVATSQTEENIIDVLLKMERERSKFDVLQLNRDCIKALIMDIFLAGVDTGAGTIVWAL





TELVRNPRVMKKLQDEIRSCVKEDQVKERDLEKLQYLKMVVKEVLRLHAPVPLLLPRETMSHFK





LNGYDIDPKTHLHVNVWAIGRDPDSWSDPEEFFPERFAGSSIDYKGHNFELLPFGGGRRICPGM





NMGTVAVELALTNLLLCFDWTLPDGMKEEDVDMEEDGGLAIAKKSPLKLVPVRCLN.




Pathway step 3
ATGGATTTGCTTTTGTTGGAAAAGACGTTGTTGGGTCTATTTATCGCTGTCGTATTGGCAATAG
879



sequence I
CCATTAGCAAATTAAGGGGTAAAAGGTTTAAACTGCCACCAGGTCCGTTACCTGTCCCTATCTT




CYP73A152 (codon
TGGCAACTGGTTACAGGTTGGTGATGATTTGAACCACAGAAATCTAACGGGTTTAGCCAAGAAA




optimized)
TTTGGGGATATTTTCTTGTTAAGAATGGGCCAAAGAAACTTAGTGGTAGTTTCATCTCCTGAAC





TTGCCAAAGAAGTGCTTCATACACAAGGAGTGGAGTTTGGATCTAGAACAAGAAATGTAGTGTT





CGACATATTTACCGGAAAAGGTCAAGATATGGTTTTCACAGTATATGGTGAACATTGGCGTAAA





ATGCGTAGAATAATGACTGTACCATTCTTCACCAACAAGGTTGTCCAACAATATAGGCATGGAT





GGGAAGCAGAAGCAGCTAGCGTTGTTGAAGATGTGAAGAAGAATCCGGAATCTGCTACTACTGG





TATTGTGTTACGTCGTAGACTTCAATTGATGATGTACAATAACATGTATCGTATAATGTTTGAC





AGAAGATTTGAGTCCGAGGATGATCCCCTATTTCACAAATTGAGAGCACTGAATGGTGAGAGAT





CTAGGTTGGCTCAATCGTTCGAGTACAACTATGGAGACTTCATCCCTATTTTAAGACCTTTCTT





GAGAGGCTATTTGAAAATTTGCAAGGAAGTCAAGGACACTAGGTTACAGTTGTTTAAAGACTAC





TTTGTTGAAGAAAGAAAGAAATTGGCGAACGTGAAAACTACCACAAATGAGGGCTTAAAATGTG





CGATCGATCACATTCTGGACGCACAACAGAAAGGTGAAATCAATGAAGATAACGTTTTATACAT





TGTTGAGAATATTAATGTAGCTGCCATTGAAACTACGTTGTGGTCGATAGAATGGGGAATTGCA





GAGCTTGTCAATCATCCTGAAATCCAAAGAAAGCTGAGAAATGAGATGGATACAGTCTTAGGCT





CAGGTGTTCCTATCACTGAACCAGATACACATAAGTTGCCCTATTTACAAGCTGTCATAAAAGA





AACTCTTAGACTTAGAATGGCTATACCCTTGCTAGTTCCACATATGAATCTACATGATGCCAAA





CTGGGTGGTTACGACATTCCAGCAGAATCCAAGATTCTAGTAAACGCTTGGTGGTTAGCCAATA





ATCCAGCTAATTGGAAGAATCCAGAAGAATTCAGACCAGAGAGATTCTTGGAAGAAGAATCCAA





AGTTGAAGCTAATGGGAACGACTTTAGATATTTACCGTTCGGTGTAGGAAGAAGGAGTTGTCCA





GGGATAATTTTAGCGCTACCTATCCTAGCTATCACCATAGGCAGACTGGTTCAGAACTTTGAAT





TGTTACCTCCACCAGGGCAAAGTAAGCTGGATACAAGTGAGAAGGGTGGTCAGTTTTCATTGCA





TATTCTTAAACACTCAACCATTGTCGTTAAACCCAGGGCATTTTAG




Pathway step 3
MDLLLLEKTLLGLFIAVVLAIAISKLRGKRFKLPPGPLPVPIFGNWLQVGDDLNHRNLTGLAKK
880



sequence J
FGDIFLLRMGQRNLVVVSSPELAKEVLHTQGVEFGSRTRNVVFDIFTGKGQDMVFTVYGEHWRK




CYP73A152 Protein
MRRIMTVPFFTNKVVQQYRHGWEAEAASVVEDVKKNPESATTGIVLRRRLQLMMYNNMYRIMFD





RRFESEDDPLFHKLRALNGERSRLAQSFEYNYGDFIPILRPFLRGYLKICKEVKDTRLQLFKDY





FVEERKKLANVKTTTNEGLKCAIDHILDAQQKGEINEDNVLYIVENINVAAIETTLWSIEWGIA





ELVNHPEIQRKLRNEMDTVLGSGVPITEPDTHKLPYLQAVIKETLRLRMAIPLLVPHMNLHDAK





LGGYDIPAESKILVNAWWLANNPANWKNPEEFRPERFLEEESKVEANGNDFRYLPFGVGRRSCP





GIILALPILAITIGRLVQNFELLPPPGQSKLDTSEKGGQFSLHILKHSTIVVKPRAF.




Pathway step 3
ATGTTAAAAGATCCCTTTTGCTTTCCCTTTCTACCTCTGTTGAGTTTGGCTGTTCTTCTGTTCT
881



sequence K
TACTATTGAGAAGGATCTGCTCTAAATCTAAGCCTAGACCTTTGCCTCCGGGTCCTACTCCATG




CYP80C13 (codon
GCCTGTGGTCGGAAATCTATTGCAAATAGGCACAAATCCCCATATTTCGATCACTCAATTTTCT




optimized)
CAAACTTACGGTCCGTTGATTTCCTTGCGTTTGGGAACTAGCTTATTGGTCGTTGCATCGTCAC





CAGCTGCTGCTACTGCCGTTCTTAGAACACATGATAGATTACTTAGTGCGAGATATATGTTCCA





GACGATTCCTGACAAACGTAAACATGCCCAATTGTCCTTATCTACATCGCCATTCTGCGATGAC





CATTGGAAGTCATTGAGAAGCATTTGTAGAGCAAACTTATTCACGTCCAAGGCTATAGAGTCAC





AAGGAGGTCTTAGAAGAAGAAAGATGAAAGAGATGGTGGAATTTCTACAATCCAAACAAGGTAC





GGTTGTAGGTGTTAGGGACTTAGTGTTTACCACCGTTTTCAACATCTTATCCAACTTGGTGTTC





TCAAGAGACTTAGTTGGCTATGTAGGTGAAGGTTTCAATGGGATTAAGTCATCTTTTCACCGTT





CTATGAAATTAGGGTTAACACCTAATCTGGCAGACTTTTATCCAATACTGGAAGGGTTCGATCT





TCAAGGACTACAGAAGAAGGCTGTACTATATAACAAAGGAGTTGATTCTACATGGGAAATCCTA





GTCAAAGAAAGGAGAGAATTACACAGGAACAACTTGGTAGTTTCACCGAATGACTTCTTGGATG





TTTTGATACAGAATCAATTCAGTGATGATCAGATCAACTACTTGATTACCGAGGTTCTAACAGC





TGGTATTGATACAACCACTTCTACCGTTGAATGGGCTATGGCGGAACTGTTAAAGAATAAGGAT





TTAACTGAGAAAGTCAGGGTCGAATTGGAAAGAGAGATGAAAATCAAGGAAAATGCGATTGATG





AGAGTCAGATTAGTCAATTTCAGTTTCTTCAACAGTGTGTCAAAGAAACTTTGAGACTTTATCC





ACCAGTGCCATTTCTGTTACCAAGACTAGCACCAGAACCTTGTGAAGTGATGGGTTACAGTATT





CCGAAAGATACCTCGATATTTGTTAACGCATGGGGCATTGGTAGAGATCCATCTATATGGGAGG





AACCCTCAGCATTCAAACCAGAAAGATTTGTCAATTCAGACTTAGACTTTAAAGCCTATGATTA





CAGATTCTTGCCTTTTGGTGGAGGCAGAAGATCTTGTCCAGGCCTTTTGATGACAACTGTACAA





GTACCATTGATAATTGCCACGTTAATCCACAATTTTGACTGGAGCCTACCTAATGGCGGTGATT





TGGCCCAATTGGATTTAAGCGGTCAAATGGGTGTATCCTTACAAAAGGAAAAGCCACTGTTGCT





TATTCCCAGGAAACGTACTTAG




Pathway step 3
MLKDPFCFPFLPLLSLAVLLFLLLRRICSKSKPRPLPPGPTPWPVVGNLLQIGTNPHISITQFS
882



sequence L
QTYGPLISLRLGTSLLVVASSPAAATAVLRTHDRLLSARYMFQTIPDKRKHAQLSLSTSPFCDD




CYP80C13 Protein
HWKSLRSICRANLFTSKAIESQGGLRRRKMKEMVEFLQSKQGTVVGVRDLVFTTVFNILSNLVF





SRDLVGYVGEGFNGIKSSFHRSMKLGLTPNLADFYPILEGFDLQGLQKKAVLYNKGVDSTWEIL





VKERRELHRNNLVVSPNDFLDVLIQNQFSDDQINYLITEVLTAGIDTTTSTVEWAMAELLKNKD





LTEKVRVELEREMKIKENAIDESQISQFQFLQQCVKETLRLYPPVPFLLPRLAPEPCEVMGYSI





PKDTSIFVNAWGIGRDPSIWEEPSAFKPERFVNSDLDFKAYDYRFLPFGGGRRSCPGLLMTTVQ





VPLIIATLIHNFDWSLPNGGDLAQLDLSGQMGVSLQKEKPLLLIPRKRT




Pathway step 3
ATGGAAGCTCCCTCGTGGGTGTCTTATGCCGCAGCTTGGGTTGCAACATTGGCTCTATTGTTAC
883



sequence M
TTAGTAGGCGTTTGAGAAGAAGAAAATTGAATTTGCCACCTGGACCTAAACCCTGGCCATTAAT




CYP92A127 (codon
TGGCAATTTAAACCTAATAGGTTCTTTACCGCATCAATCCATCCATCAATTGTCCCAAAAGTAT




optimized)
GGCCCAATAATGCACTTGAGATTTGGATCATTTCCTGTTGTAGTTGGCAGTTCTGTGGATATGG





CCAAGATCTTCTTGAAAACTCAGGATCTAACCTTCGTTTCACGTCCAAAGACAGCAGCTGGCAA





ATACACCACTTACAATTATAGCAATATAACGTGGTCACAATATGGTCCTTATTGGAGACAAGCG





AGGAAAATGTGTTTGATGGAATTGTTCTCTGCTAGAAGATTGGACAGTTATGAATACATTAGGA





AAGAAGAGATGAATGCCTTGCTTAAGGAAATTTGCAAAAGTTCGGGAAAAGTCATCAAACTAAA





GGACTACCTATCTACAGTTTCCTTGAACGTGATAAGCAGGATGGTCTTAGGGAAGAAATACACT





GACGAGTCAGAAGATGCAATCGTTAGTCCAGACGAATTTAAGAAAATGCTTGACGAATTGTTTC





TTCTATCTGGTGTATTGAACATCGGTGATTCGATACCGTGGATTGATTTCTTAGATCTACAGGG





TTACGTGAAACGTATGAAAGCTTTGTCCAAGAAATTCGACAGATTTCTGGAGCATGTTTTAGAC





GAGCATAATGAGAGAAGAAAAGGTGTCAAAGATTATGTAGCTAAAGACATGGTCGATGTACTGT





TACAACTGGCAGATGATCCGGATCTTGAGGTGAAATTGGAACGTCACGGTGTTAAGGCGTTCAC





ACAAGACTTAATAGCCGGTGGTACAGAATCTTCCGCTGTCACTGTAGAATGGGCAATGAGCGAA





CTTCTAAAGAAACCAGAGATGTTCGAAAAGGCCTCTGAAGAGTTAGATAGAGTGATTGGTAGGG





AAAGATGGGTTGAGGAAAAGGATATCGCGAATTTACCCTATATTGACGCAATTGCTAAAGAAAC





CATGAGGTTACATCCTGTGGCACCAATGTTGGTACCTAGATTATGCAGAGAAGATTGTCAGATT





GCTGGCTACGATATAGCAAAGGGCACTAGAGTTCTTGTCAACGTTTGGACAATTGGAAGAGATC





CAACTGTTTGGGAAAATCCGGATGAATTTAACCCAGAAAGATTTCTTGGGAAATCAATTGATGT





CAAAGGGCAAGACTTTGAGTTGTTACCCTTTGGAAGTGGTAGAAGAATGTGTCCTGGATATTCA





CTGGGTTTAAAAGTTATTCAGTCATCACTAGCCAACTTATTGCATGGGTTTTCCTGGAAGCTGG





CTGGTGATACCAAGAAAGAAGATTTGAATATGGAAGAAGTATTCGGTTTAAGCACGCCAAAGAA





GTTTCCTTTGGATGCTGTTGCCGAACCAAGACTGCCTCCACACCTGTATTCTATGTAG




Pathway step 3
MEAPSWVSYAAAWVATLALLLLSRRLRRRKLNLPPGPKPWPLIGNLNLIGSLPHQSIHQLSQKY
884



sequence N
GPIMHLRFGSFPVVVGSSVDMAKIFLKTQDLTFVSRPKTAAGKYTTYNYSNITWSQYGPYWRQA




CYP92A127 Protein
RKMCLMELFSARRLDSYEYIRKEEMNALLKEICKSSGKVIKLKDYLSTVSLNVISRMVLGKKYT





DESEDAIVSPDEFKKMLDELFLLSGVLNIGDSIPWIDFLDLQGYVKRMKALSKKFDRFLEHVLD





EHNERRKGVKDYVAKDMVDVLLQLADDPDLEVKLERHGVKAFTQDLIAGGTESSAVTVEWAMSE





LLKKPEMFEKASEELDRVIGRERWVEEKDIANLPYIDAIAKETMRLHPVAPMLVPRLCREDCQI





AGYDIAKGTRVLVNVWTIGRDPTVWENPDEFNPERFLGKSIDVKGQDFELLPFGSGRRMCPGYS





LGLKVIQSSLANLLHGFSWKLAGDTKKEDLNMEEVFGLSTPKKFPLDAVAEPRLPPHLYSM.




Pathway step 3
ATGGAGGCACCACCGTGGGTTTCATATGCAGCTGCGTGGGTAGCAACATTGGCTCTGTTACTTC
885



sequence O
TGTCTAGACATTTGCGTAGAAGAAAATTGAATTTACCACCTGGTCCAAAGCCTTGGCCTCTAAT




CYP92A129 (codon
TGGCAATCTGAACTTGATAGGATCGCTACCACATCAATCCATACATCAATTGAGTCAGAAATAT




optimized)
GGCCCAATTATGCAGTTAAGATTTGGTTCTTTTCCCGTTGTTGTTGGTTCAAGCGTAGATATGG





CCAAAATTTTCCTGAAAACACACGATCTTACGTTTGTGAGCAGACCGAAAACTGCTGCAGGCAA





ATACACCACGTATAACTGTTCCAATATAACTTGGTCGCAATATGGTCCGTATTGGAGACAAGCC





AGGAAAATGTGTTTGATGGAGCTGTTTAGCGCTAGACGTCTGGATTCATACGAATACATCAGAA





AAGAGGAAATGAATGCACTATTGAAGGAGATTTGCAAAAGTAGTGGGAAAGTAATCAAACTTAA





AGACTATTTGTCTACTGTCTCGCTTAATGTCATCAGTAGAATGGTGCTAGGAAAGAAGTACACC





GATGAGTCTGAAGATGCCATTGTTTCTCCCGATGAATTTAAGAAAATGTTGGATGAATTGTTTC





TACTGGGCGGTGTTTTGAACATCGGTGATTCCATACCTTGGATCGACTTCTTAGATCTTCAAGG





ATATGTCAAGAGAATGAAGGCTTTATCAAAGAAATTTGATCGTTTTCTAGAACACGTACTAGAT





GAACACAACGAGCGTAGAAAAGGTGTGAAGGATTATGTTGCTAAGGACATGGTCGATGTGTTAT





TGCAATTGGCTGACGATCCAGACTTGGAAGTCAGGTTAGAGAGGCATGGTGTTAAGGCGTTTAC





CCAAGACTTGATTGCAGGAGGAACAGAATCATCCGCAGTAACAGTAGAATGGGCCATGTCTGAA





TTGTTAAAGAAGCCCGAAATGTTCGAGAAAGCCTCAGAAGAGCTAGACAGAGTGATTGGTAGGG





AAAGATGGGTTGAAGAGAAAGACATAGCCAATTTACCGTATATAGACGCCATCGCTAAAGAAAC





CATGAGATTGCATCCAGTCGCACCTATGCTAGTTCCACGTTTATGCAGAGAAGATTGTCAGATT





GCTGGATACGATATTGCTAAGGGTACTAGAGTCTTGGTGAACGTTTGGACAATTGGTAGGGATC





CTACTGTATGGGAAAATCCTGATGAATTCAATCCCGAAAGATTCTTAGGGAAATCCATCGATGT





CAAAGGTCAAGACTTCGAATTATTGCCATTCGGATCAGGCAGAAGAATGTGTCCAGGGTACTCC





TTAGGCTTAAAGGTTATACAGAGTAGCTTAGCAAATCTTTTGCATGGTTTCTCTTGGAGACTTG





CTGGGGACGTTAAGAAAGAAGATTTAAACATGGAAGAAGTGTTTGGTCTTTCTACTCCCAAGAA





ATTTCCATTGGATGCGGTTGCTGAACCTAGGTTACCACCTCACTTGTACTCTATTTAG




Pathway step 3 sequence P, CYP92A129 Protein
886
MEAPPWVSYAAAWVATLALLLLSRHLRRRKLNLPPGPKPWPLIGNLNLIGSLPHQSIHQLSQKY
GPIMQLRFGSFPVVVGSSVDMAKIFLKTHDLTFVSRPKTAAGKYTTYNCSNITWSQYGPYWRQA
RKMCLMELFSARRLDSYEYIRKEEMNALLKEICKSSGKVIKLKDYLSTVSLNVISRMVLGKKYT





DESEDAIVSPDEFKKMLDELFLLGGVLNIGDSIPWIDFLDLQGYVKRMKALSKKFDRFLEHVLD





EHNERRKGVKDYVAKDMVDVLLQLADDPDLEVRLERHGVKAFTQDLIAGGTESSAVTVEWAMSE





LLKKPEMFEKASEELDRVIGRERWVEEKDIANLPYIDAIAKETMRLHPVAPMLVPRLCREDCQI





AGYDIAKGTRVLVNVWTIGRDPTVWENPDEFNPERFLGKSIDVKGQDFELLPFGSGRRMCPGYS





LGLKVIQSSLANLLHGFSWRLAGDVKKEDLNMEEVFGLSTPKKFPLDAVAEPRLPPHLYSI.




Pathway step 3 sequence Q, CYP92A458 (codon optimized)
887
ATGGAAATGTCATCATGTGTAGCCGCTACGATTAGCATCTGGATGGTGGTTGTTTGTATTGTGG
GTGTTGGATGGAGAGTGGTAAATTGGGTTTGGCTAAGACCCAAGAAATTGGAGAAAAGGTTAAG
GGAACAAGGCTTGGCAGGGAACTCTTACAGATTGTTATTTGGTGACCTTAAAGAACGTGCAGCA
ATGGCTGAACAAGCCAATTCAAAACCGATTAATTTTAGTCACGACATTGGTCCAAGAGTTTTCC





CAAGTATGTACAAAACCATTCAGAATTATGGGAAGAATTCCTACATGTGGTTAGGTCCCTATCC





AAGAGTGCATATAATGGATCCTCAACAGCTGAAAACCGTCTTTACATTGGTTTATGACATTCAA





AAGCCGAATCTGAATCCACTGGTCAAATTCTTGTTAGATGGGATTGTCACTCATGAAGGAGAAA





AGTGGGCAAAGCATAGAAAGATCATTAATCCAGCTTTTCACCTTGAAAAGTTGAAGGACATGAT





TCCTGCCTTCTTTCACTCTTGCAATGAGATAGTTAATGAGTGGGAAAGACTAATTTCGAAGGAG





GGTTCCTGTGAACTTGATGTTATGCCTTACTTGCAGAACTTAGCTGCTGATGCTATATCCAGAA





CAGCGTTTGGTTCTAGCTATGAAGAGGGTAAAATGATATTCCAATTACTTAAGGAATTGACTGA





TTTGGTCGTAAAAGTAGCGTTTGGTGTGTATATCCCTGGTTGGAGATTCTTACCAACCAAATCA





AACAACAAAATGAAAGAAATCAACAGGAAAATCAAATCTCTGCTATTAGGAATCATTAACAAAC





GTCAGAAAGCAATGGAAGAAGGCGAAGCTGGTCAATCTGATTTGTTAGGCATACTAATGGAATC





GAATTCCAACGAAATTCAAGGAGAAGGAAACAATAAGGAGGACGGTATGTCTATAGAAGATGTA





ATCGAGGAATGCAAGGTTTTCTATATAGGTGGACAAGAGACTACAGCCAGACTATTAATTTGGA





CAATGATACTTTTAAGTTCACATACGGAATGGCAAGAGAGAGCAAGGACTGAAGTCTTGAAAGT





CTTTGGCAATAAGAAGCCTGATTTTGATGGCTTGAACAGATTGAAAATCGTTAGTGAAATTCTA





TAG




Pathway step 3 sequence R, CYP92A458 Protein
888
MEMSSCVAATISIWMVVVCIVGVGWRVVNWVWLRPKKLEKRLREQGLAGNSYRLLFGDLKERAA
MAEQANSKPINFSHDIGPRVFPSMYKTIQNYGKNSYMWLGPYPRVHIMDPQQLKTVFTLVYDIQ
KPNLNPLVKFLLDGIVTHEGEKWAKHRKIINPAFHLEKLKDMIPAFFHSCNEIVNEWERLISKE





GSCELDVMPYLQNLAADAISRTAFGSSYEEGKMIFQLLKELTDLVVKVAFGVYIPGWRFLPTKS





NNKMKEINRKIKSLLLGIINKRQKAMEEGEAGQSDLLGILMESNSNEIQGEGNNKEDGMSIEDV





IEECKVFYIGGQETTARLLIWTMILLSSHTEWQERARTEVLKVFGNKKPDFDGLNRLKIVSEIL




Pathway step 3 sequence S, CYP88D6 Protein
889
MEVHWVCMCAATLLVCYIFGSKFVRNLNGWYYDVKLRRKEHPLPPGDMGWPLMGNLLSFIKDFS
SGHPDSFINNLVLKYGRSGIYKTHLFGNPSIIVCEPQMCRRVLTDDVNFKLGYPKSIKELARCR
PMIDVSNAEHRLFRRLITSPIVGHKALAMYLERLEEIVINSLEELSSMKHPVELLKEMKKVSFK





AIVHVFMGSSNQDIIKKIGSSFTDLYNGMFSIPINVPGFTFHKALEARKKLAKIVQPVVDERRL





MIENGQQEGDQRKDLIDILLEVKDENGRKLEDEDISDLLIGLLFAGHESTATSLMWSITYLTQH





PHILKKAKEEQEEIMRTRLSSQKQLSFKEIKQMVYLSQVIDETLRCANIAFATFREATADVNIN





GYIIPKGWRVLIWARAIHMDSEYYPNPEEFNPSRWDDYNAKAGTFLPFGAGSRLCPGADLAKLE





ISIFLHYFLLNYRLERVNPECHVTSLPVSKPTDNCLAKVMKVSCA.




Pathway step 3 sequence T, CYP88D6 (codon optimized)
890
ATGGAAGTACATTGGGTTTGCATGTGCGCTGCCACTTTGTTGGTATGCTACATTTTTGGAAGCA
AGTTTGTGAGGAATTTGAATGGGTGGTATTATGATGTAAAACTAAGAAGGAAAGAACACCCACT
ACCCCCAGGTGACATGGGATGGCCTCTTATGGGCAATCTATTGTCCTTCATCAAAGATTTCTCA
TCGGGTCACCCTGATTCATTCATCAACAACCTTGTTCTCAAATATGGACGAAGTGGTATCTACA





AGACTCACTTGTTTGGGAATCCAAGCATCATTGTTTGCGAGCCTCAGATGTGTAGGCGAGTTCT





CACTGATGATGTGAACTTTAAGCTTGGTTATCCAAAATCTATCAAAGAGTTGGCACGATGTAGA





CCCATGATTGATGTCTCTAATGCGGAACATAGGCTTTTTCGACGCCTCATTACTTCCCCAATCG





TGGGTCACAAGGCGCTAGCAATGTACCTAGAACGTCTTGAGGAAATTGTGATCAATTCGTTGGA





AGAATTGTCCAGCATGAAGCACCCCGTTGAGCTCTTGAAAGAGATGAAGAAGGTTTCCTTTAAA





GCCATTGTCCACGTTTTCATGGGCTCTTCCAATCAGGACATCATTAAAAAAATTGGAAGTTCGT





TTACTGATTTGTACAATGGCATGTTCTCTATCCCCATTAACGTACCTGGTTTTACATTCCACAA





AGCACTCGAGGCACGTAAGAAGCTAGCCAAAATAGTTCAACCCGTTGTGGATGAAAGGCGGTTG





ATGATAGAAAATGGTCAACAAGAAGGGGACCAAAGAAAAGATCTTATTGATATTCTTTTGGAAG





TCAAAGATGAGAATGGACGAAAATTGGAGGACGAGGATATTAGCGATTTATTAATAGGGCTTTT





GTTTGCTGGCCATGAAAGTACAGCAACCAGTTTAATGTGGTCAATTACATATCTTACACAGCAT





CCCCATATCTTGAAAAAGGCTAAGGAAGAGCAGGAAGAAATAATGAGGACAAGATTGTCCTCGC





AGAAACAATTAAGTTTTAAGGAAATTAAACAAATGGTTTATCTTTCTCAGGTAATTGATGAAAC





TTTACGATGTGCCAATATTGCCTTTGCAACTTTTCGAGAGGCAACTGCTGATGTGAACATCAAT





GGTTATATCATACCAAAGGGATGGAGAGTGCTAATTTGGGCAAGAGCCATTCATATGGATTCTG





AATATTACCCAAATCCAGAAGAATTTAATCCATCGAGATGGGATGATTACAATGCCAAAGCAGG





AACCTTCCTTCCTTTTGGAGCAGGAAGTAGACTTTGTCCTGGAGCCGACTTGGCGAAACTTGAA





ATTTCCATATTTCTTCATTATTTCCTCCTTAATTACAGGTTGGAGCGAGTAAATCCAGAATGTC





ATGTTACCAGCTTACCAGTATCTAAGCCCACAGACAATTGCCTCGCTAAGGTGATGAAGGTCTC





ATGTGCTTAG




Pathway step 3 sequence U, CYP1798 (codon optimized)
891
ATGGAAATGTCCTCTTCTGTTGCTGCCACCATTTCTATTTGGATGGTTGTTGTATGTATCGTTG
GTGTTGGTTGGAGAGTTGTTAATTGGGTTTGGTTAAGACCAAAGAAGTTGGAAAAGAGATTGAG
AGAACAAGGTTTGGCTGGTAACTCTTACAGATTGTTGTTCGGTGACTTGAAAGAAAGAGCTGCT
ATGGAAGAACAAGCTAACTCTAAGCCAATCAACTTCTCCCATGATATTGGTCCAAGAGTTTTCC





CATCTATGTACAAGACCATTCAAAACTACGGTAAGAACTCCTATATGTGGTTGGGTCCATACCC





AAGAGTTCATATTATGGATCCACAACAATTGAAAACCGTCTTTACCTTGGTTTACGACATCCAA





AAGCCAAACTTGAACCCATTGATCAAGTTCTTGTTGGATGGTATTGTCACCCATGAAGGTGAAA





AATGGGCTAAACATAGAAAGATTATCAACCCAGCCTTCCACTTGGAAAAGTTGAAAGATATGAT





TCCAGCCTTCTTCCACTCTTGCAACGAAATAGTTAATGAATGGGAAAGATTGATCTCCAAAGAA





GGTTCTTGCGAATTGGATGTTATGCCATACTTGCAAAATTTGGCTGCTGATGCTATTTCTAGAA





CTGCTTTTGGTTCCTCTTACGAAGAAGGTAAGATGATCTTCCAATTATTGAAAGAATTGACCGA





CTTGGTTGTTAAGGTTGCTTTCGGTGTTTACATTCCAGGTTGGAGATTTTTGCCAACTAAGTCC





AACAACAAGATGAAGGAAATCAACAGAAAGATCAAGTCTTTGTTGTTAGGTATCATCAACAAGA





GACAAAAGGCCATGGAAGAAGGTGAAGCTGGTCAATCTGATTTGTTGGGTATTTTGATGGAATC





CAACTCCAACGAAATTCAAGGTGAAGGTAACAACAAAGAAGATGGTATGTCCATCGAAGATGTT





ATCGAAGAATGCAAGGTTTTCTACATCGGTGGTCAAGAAACTACCGCCAGATTATTGATTTGGA





CCATGATCTTGTTGAGTTCCCATACTGAATGGCAAGAAAGAGCAAGAACTGAAGTCTTGAAGGT





TTTCGGTAACAAAAAGCCAGATTTCGACGGTTTGTCTAGATTGAAGGTTGTCACCATGATTTTG





AACGAAGTTTTGAGATTATACCCACCAGCTTCTATGTTGACCAGAATCATTCAAAAAGAAACCA





GAGTCGGTAAGTTGACTTTGCCAGCTGGTGTTATTTTGATCATGCCAATCATCTTGATCCACAG





AGATCATGATTTGTGGGGTGAAGATGCTAATGAATTCAAGCCAGAAAGATTCTCCAAGGGTGTT





TCTAAAGCTGCTAAAGTTCAACCAGCTTTCTTTCCATTTGGTTGGGGTCCAAGAATATGTATGG





GTCAAAATTTCGCTATGATCGAAGCTAAGATGGCCTTGTCTTTGATCTTGCAAAGATTTTCCTT





CGAATTGTCCTCCTCATATGTTCATGCTCCAACTGTTGTTTTCACCACTCAACCACAACATGGT





GCTCATATCGTTTTGAGAAAGTTGTAA




Pathway step 3 sequence V, CYP1798 Protein
892
MEMSSSVAATISIWMVVVCIVGVGWRVVNWVWLRPKKLEKRLREQGLAGNSYRLLFGDLKERAA
MEEQANSKPINFSHDIGPRVFPSMYKTIQNYGKNSYMWLGPYPRVHIMDPQQLKTVFTLVYDIQ
KPNLNPLIKFLLDGIVTHEGEKWAKHRKIINPAFHLEKLKDMIPAFFHSCNEIVNEWERLISKE





GSCELDVMPYLQNLAADAISRTAFGSSYEEGKMIFQLLKELTDLVVKVAFGVYIPGWRFLPTKS





NNKMKEINRKIKSLLLGIINKRQKAMEEGEAGQSDLLGILMESNSNEIQGEGNNKEDGMSIEDV





IEECKVFYIGGQETTARLLIWTMILLSSHTEWQERARTEVLKVFGNKKPDFDGLSRLKVVTMIL





NEVLRLYPPASMLTRIIQKETRVGKLTLPAGVILIMPIILIHRDHDLWGEDANEFKPERFSKGV





SKAAKVQPAFFPFGWGPRICMGQNFAMIEAKMALSLILQRFSFELSSSYVHAPTVVFTTQPQHG





AHIVLRKL.




Pathway step 3 sequence W, EPH2A (codon optimized)
893
ATGGATGAAATCGAACATATTACCATCAATACAAATGGAATCAAAATGCATATTGCGTCAGTCG
GCACAGGACCAGTTGTTCTCTTGCTACACGGCTTTCCAGAATTATGGTACTCTTGGAGACACCA
ACTACTTTACCTGTCCTCCGTTGGGTACAGAGCAATAGCTCCAGATTTGAGAGGCTATGGCGAT
ACTGACAGTCCAGCTAGTCCTACCTCTTATACTGCTCTTCATATTGTAGGTGACCTGGTCGGCG





CATTAGACGAATTGGGAATAGAAAAGGTCTTTTTAGTGGGTCATGACTGGGGTGCTATTATCGC





ATGGTACTTTTGTTTGTTTAGACCAGATAGAATTAAAGCACTTGTGAATTTGTCTGTCCAGTTT





ATCCCACGTAACCCAGCAATACCTTTTATAGAAGGTTTCAGAACAGCTTTTGGTGATGACTTCT





ACATTTGTAGATTTCAAGTACCTGGGGAAGCTGAAGAGGATTTCGCGTCTATCGATACTGCTCA





ATTGTTTAAAACTTCATTATGCAATAGAAGCTCAGCCCCTCCTTGTTTGCCTAAAGAGATTGGT





TTTAGGGCTATCCCACCACCAGAAAATCTGCCATCTTGGCTCACAGAGGAAGATATCAACTTCT





ACGCAGCCAAGTTTAAACAAACTGGTTTTACTGGTGCCCTTAACTATTATAGAGCATTCGACTT





GACATGGGAATTAACAGCCCCATGGACAGGAGCCCAGATCCAAGTTCCTGTAAAGTTCATAGTT





GGTGATTCAGATCTCACGTACCATTTCCCTGGTGCTAAGGAATACATCCACAACGGAGGGTTTA





AAAGAGATGTGCCACTATTAGAGGAAGTTGTTGTGGTAAAAGATGCCTGCCACTTCATTAACCA





AGAGCGACCACAAGAGATTAATGCTCATATTCATGACTTCATCAATAAGTTCTAA




Pathway step 3 sequence X, EPH2A Protein
894
MDEIEHITINTNGIKMHIASVGTGPVVLLLHGFPELWYSWRHQLLYLSSVGYRAIAPDLRGYGD
TDSPASPTSYTALHIVGDLVGALDELGIEKVFLVGHDWGAIIAWYFCLFRPDRIKALVNLSVQF
IPRNPAIPFIEGFRTAFGDDFYICRFQVPGEAEEDFASIDTAQLFKTSLCNRSSAPPCLPKEIG





FRAIPPPENLPSWLTEEDINFYAAKFKQTGFTGALNYYRAFDLTWELTAPWTGAQIQVPVKFIV





GDSDLTYHFPGAKEYIHNGGFKRDVPLLEEVVVVKDACHFINQERPQEINAHIHDFINKF




Pathway step 3 sequence Y, tDexT DNA (native DNA sequence)
895
ATGCCAGCAAATGCCCCAGATAAACAATCAGTGACTAATGCACCAGTAGTGCCGCCAAAGCATG
ATACGGACCAGCAGGACGATTCACTAGAAAAACAGCAAGTATTAGAACCGAGCGTAAATAGTAA
TATACCAAAAAAGCAGACAAATCAACAGTTAGCGGTTGTTACAGCACCAGCAAATTCAGCACCT
CAAACCAAAACAACAGCAGAAATTTCTGCTGGTACAGAGTTAGACACGATGCCTAATGTTAAGC





ATGTAGATGGCAAAGTTTATTTTTATGGAGATGATGGCCAACCAAAAAAGAATTTTACTACTAT





TATAGATGGTAAACCTTACTACTTTGATAAAGATACAGGGGCACTATCTAATAACGATAAGCAA





TATGTATCGGAATTATTCAGTATTGGCAATAAACATAACGCCGTCTATAACACATCATCAGATA





ATTTTACGCAATTAGAAGGACATCTGACGGCAAGTAGTTGGTATCGTCCAAAAGATATTTTGAA





AAATGGTAAACGTTGGGCACCTTCAACAGTGACTGATTTCAGACCATTATTGATGGCCTGGTGG





CCGGATAAGAGTACGCAAGTCACTTATCTGAATTACATGAAAGATCAGGGCCTCTTGTCTGGTA





CTCATCACTTTTCCGATAATGAAAATATGCGGACCTTAACGGCAGCTGCCATGCAGGCACAGGT





AAACATTGAGAAAAAAATTGGGCAACTTGGCAATACGGATTGGTTGAAAACGGCGATGACGCAA





TACATTGATGCCCAGCCCAATTGGAATATTGACAGTGAGGCGAAAGGAGATGATCATCTACAAG





GTGGTGCACTACTTTATACAAATAGTGATATGTCGCCAAAGGCCAATTCTGATTATCGTAAGCT





GAGCCGTACGCCTAAAAATCAAAAAGGTCAAATTGCTGATAAATATAAGCAAGGTGGGTTTGAA





TTATTACTAGCAAACGATGTCGATAATTCTAATCCAGTTGTGCAAGCAGAACAACTTAATTGGT





TACATTATATGATGAATATCGGTAGTATTTTACAAAATGATGACCAAGCTAATTTTGATGGTTA





CCGTGTTGATGCTGTCGATAATGTGGACGCTGACTTACTACAGATTGCTGGTGAATATGCTAAG





GCTGCCTATGGTGTTGACAAAAATGACGCGAGAGCGAATCAACATTTATCAATTTTGGAAGACT





GGGGAGATGAAGATCCAGACTATGTCAAAGCACATGGCAACCAGCAAATTACAATGGATTTCCC





CTTGCATTTAGCGATTAAATACGCGCTCAACATGCCTAATGATAAGCGGAGTGGCCTTGAGCCA





ACCCGTGAACACAGTTTAGTCAAACGAATTACAGATGATAAAGAAAATGTTGCACAACCAAATT





ATTCATTTATCCGAGCTCATGACAGTGAAGTACAAACGATTATTGCTGATATTATTAAAGATAA





AATCAACCCGGCGTCAACAGGGCTAGATTCAACAGTGACTTTGGATCAAATTAAGCAGGCTTTT





GACATCTATAATGCTGATGAATTGAAAGCAGATAAAGTTTACACACCTTACAATATTCCAGCAT





CATACGCTTTGTTATTGACTAATAAAGACACAATTCCACGTGTTTATTATGGGGATATGTTCAC





GGATGATGGCCAATACATGGCTAAACAATCACCTTACTATCAAGCGATTGATGCGTTGTTGAAA





GCTCGTATCAAGTATGCTGCTGGTGGTCAAACCATGAAAATGAACTATTTTCCAGATGAACAAT





CTGTTATGACATCAGTTCGTTATGGTAAGGGTGCAATGACGGCAAGTGACTCTGGTAACCAAGA





GACACGCTATCAAGGTATTGGACTTGTTGTCAACAATCGCCCAGATTTGAAACTATCTGACAAA





GATGAAGTCAAAATGGATATGGGTGCGGCACATAAAAACCAAGATTATCGCCCAGTTTTGTTGA





CGACAAAATCAGGATTAAAAGTCTACAGCACTGATGCAAATGCACCTGTCGTTCGAACTGACGC





CAATGGCCAATTAACTTTTAAGGCAGACATGGTATATGGTGTAAACGACCCACAAGTGTCAGGG





TACATTGCGGCTTGGGTACCAGTAGGGGCTTCAGAAAATCAAGATGCTCGAACGAAAAGTGAAA





CAACGCAGTCAACTGACGGGAGTGTTTATCATTCTAATGCAGCGTTAGATTCGCAAGTCATTTA





TGAAGGCTTTTCAAATTTTCAAGACTTTCCAACAACACCCGATGAGTTTACGAACATTAAAATT





GCTCAAAATGTTAACTTATTTAAGGATTGGGGTATTACTAGCTTTGAAATGGCGCCACAATATC





GCGCCAGCTCAGATAAAAGTTTCTTAGATGCTATCGTACAAAATGGTTATGCATTTACAGATCG





ATATGATATTGGTTACAACACACCAACAAAGTATGGGACAGCAGATAATTTGTTAGATGCTTTA





CGTGCATTGCATGGTCAGGGTATTCAAGCGATTAACGACTGGGTACCAGATCAAATTTATAATC





TACCCGATGAACAGTTAGTCACGGCTATTCGAACAGACGGTTCAGGTGATCATACTTATGGTTC





AGTTATTGACCATACTTTGTATGCATCAAAGACAGTTGGCGGGGGCATTTATCAGCAACAATAT





GGTGGGGCCTTCTTGGAACAATTAAAAACACAGTACCCGCAACTTTTCCAGCAAAAACAGATTT





CCACAGATCAGCCAATGAACCCAGATATTCAAATTAAGTCATGGGAAGCCAAGTATTTCAACGG





TTCGAACATTCAGGGGCGTGGGGCTTGGTATGTTTTGAAGGACTGGGGCACACAACAGTATTTT





AATGTGTCAGATGCGCAGACCTTCCTTCCAAAGCAATTATTGGGTGAAAAGGCCAAAACTGGTT





TTGTTACGCGTGGTAAGGAGACTTCATTCTATTCCACTAGTGGCTATCAAGCAAAATCTGCCTT





TATTTGTGATAACGGTAATTGGTACTACTTTGATGACAAAGGGAAAATGGTTGTTGGAAACCAA





GTTATCAATGGCATCAATTATTACTTTTTACCGAATGGTATCGAATTACAAGATGCCTATCTAG





TACATGATGGTATGTACTATTATTATAATAATATTGGCAAGCAACTGCACAACACATATTACCA





AGATAAACAAAAAAATTTCCATTACTTCTTTGAAGATGGGCACATGGCACAGGGTATTGTCACC





ATCATTCAAAGTGATGGCACCCCAGTCACACAGTACTTTGATGAGAATGGTAAGCAACAAAAAG





GCGTGGCGGTCAAAGGATCAGATGGTCATTTGCATTACTTTGACGGTGCGTCAGGGAATATGCT





CTTTAAATCATGGGGTAGACTAGCAGATGGCTCTTGGCTATATGTAGACGAGAAAGGTAATGCG





GTTACAGGCAAACAAACCATTAATAATCAAACGGTTTACTTTAATGATGATGGTCGTCAAATCA





AAAATAACTTTAAAGAATTAGCAGATGGTTCTTGGCTTTATCTTAACAATAAAGGTGTTGCAGT





AACAGGAGAGCAAATAATTAATGGGCAGACACTTTATTTTGGTAACGATGGTCGTCAATTTAAA





GGGACAACACATATAAATGCTACTGGTGAAAGCCGTTACTATGACCCAGACTCAGGTAATATGA





TAACTGATCGTTTTGAACGTGTTGGTGATAATCAATGGGCTTATTTTGGTTATGATGGTGTTGC





AGTAACAGGGGACCGAATCATTAAAGGGCAAAAACTCTATTTCAACCAAAATGGTATCCAAATG





AAAGGCCACTTACGTCTTGAAAATGGTATCATGCGTTATTACGATGCTGATACTGGCGAATTAG





TTCGTAATCGATTTGTATTGCTATCTGATGGTTCATGGGTTTACTTTGGCCAAGATGGCGTACC





CGTAACTGGCGTGCAAGTGATTAATGGCCAAACATTATATTTTGACGCAGATGGTAGGCAAGTC





AAAGGGCAGCAACGTGTAATCGGCAATCAACGCTATTGGATGGATAAAGACAATGGTGAAATGA





AAAAAATAACATACTAG




Pathway step 3 sequence Z, tDexT Protein
896
MPANAPDKQSVTNAPVVPPKHDTDQQDDSLEKQQVLEPSVNSNIPKKQTNQQLAVVTAPANSAP
QTKTTAEISAGTELDTMPNVKHVDGKVYFYGDDGQPKKNFTTIIDGKPYYFDKDTGALSNNDKQ
YVSELFSIGNKHNAVYNTSSDNFTQLEGHLTASSWYRPKDILKNGKRWAPSTVTDFRPLLMAWW





PDKSTQVTYLNYMKDQGLLSGTHHFSDNENMRTLTAAAMQAQVNIEKKIGQLGNTDWLKTAMTQ





YIDAQPNWNIDSEAKGDDHLQGGALLYTNSDMSPKANSDYRKLSRTPKNQKGQIADKYKQGGFE





LLLANDVDNSNPVVQAEQLNWLHYMMNIGSILQNDDQANFDGYRVDAVDNVDADLLQIAGEYAK





AAYGVDKNDARANQHLSILEDWGDEDPDYVKAHGNQQITMDFPLHLAIKYALNMPNDKRSGLEP





TREHSLVKRITDDKENVAQPNYSFIRAHDSEVQTIIADIIKDKINPASTGLDSTVTLDQIKQAF





DIYNADELKADKVYTPYNIPASYALLLTNKDTIPRVYYGDMFTDDGQYMAKQSPYYQAIDALLK





ARIKYAAGGQTMKMNYFPDEQSVMTSVRYGKGAMTASDSGNQETRYQGIGLVVNNRPDLKLSDK





DEVKMDMGAAHKNQDYRPVLLTTKSGLKVYSTDANAPVVRTDANGQLTFKADMVYGVNDPQVSG





YIAAWVPVGASENQDARTKSETTQSTDGSVYHSNAALDSQVIYEGFSNFQDFPTTPDEFTNIKI





AQNVNLFKDWGITSFEMAPQYRASSDKSFLDAIVQNGYAFTDRYDIGYNTPTKYGTADNLLDAL





RALHGQGIQAINDWVPDQIYNLPDEQLVTAIRTDGSGDHTYGSVIDHTLYASKTVGGGIYQQQY





GGAFLEQLKTQYPQLFQQKQISTDQPMNPDIQIKSWEAKYFNGSNIQGRGAWYVLKDWGTQQYF





NVSDAQTFLPKQLLGEKAKTGFVTRGKETSFYSTSGYQAKSAFICDNGNWYYFDDKGKMVVGNQ





VINGINYYFLPNGIELQDAYLVHDGMYYYYNNIGKQLHNTYYQDKQKNFHYFFEDGHMAQGIVT





IIQSDGTPVTQYFDENGKQQKGVAVKGSDGHLHYFDGASGNMLFKSWGRLADGSWLYVDEKGNA





VTGKQTINNQTVYFNDDGRQIKNNFKELADGSWLYLNNKGVAVTGEQIINGQTLYFGNDGRQFK





GTTHINATGESRYYDPDSGNMITDRFERVGDNQWAYFGYDGVAVTGDRIIKGQKLYFNQNGIQM





KGHLRLENGIMRYYDADTGELVRNRFVLLSDGSWVYFGQDGVPVTGVQVINGQTLYFDADGRQV





KGQQRVIGNQRYWMDKDNGEMKKITY




Pathway 1 sequence id A, tHMG-CoA DNA
897
ATGGCAGCTGACCAATTGGTGAAAACTGAAGTCACCAAGAAGTCTTTTACTGCTCCTGTACAAA
AGGCTTCTACACCAGTTTTAACCAATAAAACAGTCATTTCTGGATCGAAAGTCAAAAGTTTATC
ATCTGCGCAATCGAGCTCATCAGGACCTTCATCATCTAGTGAGGAAGATGATTCCCGCGATATT





GAAAGCTTGGATAAGAAAATACGTCCTTTAGAAGAATTAGAAGCATTATTAAGTAGTGGAAATA





CAAAACAATTGAAGAACAAAGAGGTCGCTGCCTTGGTTATTCACGGTAAGTTACCTTTGTACGC





TTTGGAGAAAAAATTAGGTGATACTACGAGAGCGGTTGCGGTACGTAGGAAGGCTCTTTCAATT





TTGGCAGAAGCTCCTGTATTAGCATCTGATCGTTTACCATATAAAAATTATGACTACGACCGCG





TATTTGGCGCTTGTTGTGAAAATGTTATAGGTTACATGCCTTTGCCCGTTGGTGTTATAGGCCC





CTTGGTTATCGATGGTACATCTTATCATATACCAATGGCAACTACAGAGGGTTGTTTGGTAGCT





TCTGCCATGCGTGGCTGTAAGGCAATCAATGCTGGCGGTGGTGCAACAACTGTTTTAACTAAGG





ATGGTATGACAAGAGGCCCAGTAGTCCGTTTCCCAACTTTGAAAAGATCTGGTGCCTGTAAGAT





ATGGTTAGACTCAGAAGAGGGACAAAACGCAATTAAAAAAGCTTTTAACTCTACATCAAGATTT





GCACGTCTGCAACATATTCAAACTTGTCTAGCAGGAGATTTACTCTTCATGAGATTTAGAACAA





CTACTGGTGACGCAATGGGTATGAATATGATTTCTAAAGGTGTCGAATACTCATTAAAGCAAAT





GGTAGAAGAGTATGGCTGGGAAGATATGGAGGTTGTCTCCGTTTCTGGTAACTACTGTACCGAC





AAAAAACCAGCTGCCATCAACTGGATCGAAGGTCGTGGTAAGAGTGTCGTCGCAGAAGCTACTA





TTCCTGGTGATGTTGTCAGAAAAGTGTTAAAAAGTGATGTTTCCGCATTGGTTGAGTTGAACAT





TGCTAAGAATTTGGTTGGATCTGCAATGGCTGGGTCTGTTGGTGGATTTAACGCACATGCAGCT





AATTTAGTGACAGCTGTTTTCTTGGCATTAGGACAAGATCCTGCACAAAATGTTGAAAGTTCCA





ACTGTATAACATTGATGAAAGAAGTGGACGGTGATTTGAGAATTTCCGTATCCATGCCATCCAT





CGAAGTAGGTACCATCGGTGGTGGTACTGTTCTAGAACCACAAGGTGCCATGTTGGACTTATTA





GGTGTAAGAGGCCCGCATGCTACCGCTCCTGGTACCAACGCACGTCAATTAGCAAGAATAGTTG





CCTGTGCCGTCTTGGCAGGTGAATTATCCTTATGTGCTGCCCTAGCAGCCGGCCATTTGGTTCA





AAGTCATATGACCCACAACAGGAAACCTGCTGAACCAACAAAACCTAACAATTTGGACGCCACT





GATATAAATCGTTTGAAAGATGGGTCCGTCACCTGCATTAAATCCTAA




Pathway 1 sequence id B, tHMG-CoA Protein
898
MAADQLVKTEVTKKSFTAPVQKASTPVLTNKTVISGSKVKSLSSAQSSSSGPSSSSEEDDSRDI
ESLDKKIRPLEELEALLSSGNTKQLKNKEVAALVIHGKLPLYALEKKLGDTTRAVAVRRKALSI
LAEAPVLASDRLPYKNYDYDRVFGACCENVIGYMPLPVGVIGPLVIDGTSYHIPMATTEGCLVA





SAMRGCKAINAGGGATTVLTKDGMTRGPVVRFPTLKRSGACKIWLDSEEGQNAIKKAFNSTSRF





ARLQHIQTCLAGDLLFMRFRTTTGDAMGMNMISKGVEYSLKQMVEEYGWEDMEVVSVSGNYCTD





KKPAAINWIEGRGKSVVAEATIPGDVVRKVLKSDVSALVELNIAKNLVGSAMAGSVGGFNAHAA





NLVTAVFLALGQDPAQNVESSNCITLMKEVDGDLRISVSMPSIEVGTIGGGTVLEPQGAMLDLL





GVRGPHATAPGTNARQLARIVACAVLAGELSLCAALAAGHLVQSHMTHNRKPAEPTKPNNLDAT





DINRLKDGSVTCIKS




Pathway 1 sequence id C, erg1 DNA
899
ATGTCTGCTGTTAACGTTGCACCTGAATTGATTAATGCCGACAACACAATTACCTACGATGCGA
TTGTCATCGGTGCTGGTGTTATCGGTCCATGTGTTGCTACTGGTCTAGCAAGAAAGGGTAAGAA
AGTTCTTATCGTAGAACGTGACTGGGCTATGCCTGATAGAATTGTTGGTGAATTGATGCAACCA





GGTGGTGTTAGAGCATTGAGAAGTCTGGGTATGATTCAATCTATCAACAACATCGAAGCATATC





CTGTTACCGGTTATACCGTCTTTTTCAACGGCGAACAAGTTGATATTCCATACCCTTACAAGGC





CGATATCCCTAAAGTTGAAAAATTGAAGGACTTGGTCAAAGATGGTAATGACAAGGTCTTGGAA





GACAGCACTATTCACATCAAGGATTACGAAGATGATGAAAGAGAAAGGGGTGTTGCTTTTGTTC





ATGGTAGATTCTTGAACAACTTGAGAAACATTACTGCTCAAGAGCCAAATGTTACTAGAGTGCA





AGGTAACTGTATTGAGATATTGAAGGATGAAAAGAATGAGGTTGTTGGTGCCAAGGTTGACATT





GATGGCCGTGGCAAGGTGGAATTCAAAGCCCACTTGACATTTATCTGTGACGGTATCTTTTCAC





GTTTCAGAAAGGAATTGCACCCAGACCATGTTCCAACTGTCGGTTCTTCGTTTGTCGGTATGTC





TTTGTTCAATGCTAAGAATCCTGCTCCTATGCACGGTCACGTTATTCTTGGTAGTGATCATATG





CCAATCTTGGTTTACCAAATCAGTCCAGAAGAAACAAGAATCCTTTGTGCTTACAACTCTCCAA





AGGTCCCAGCTGATATCAAGAGTTGGATGATTAAGGATGTCCAACCTTTCATTCCAAAGAGTCT





ACGTCCTTCATTTGATGAAGCCGTCAGCCAAGGTAAATTTAGAGCTATGCCAAACTCCTACTTG





CCAGCTAGACAAAACGACGTCACTGGTATGTGTGTTATCGGTGACGCTCTAAATATGAGACATC





CATTGACTGGTGGTGGTATGACTGTCGGTTTGCATGATGTTGTCTTGTTGATTAAGAAAATAGG





TGACCTAGACTTCAGCGACCGTGAAAAGGTTTTGGATGAATTACTAGACTACCATTTCGAAAGA





AAGAGTTACGATTCCGTTATTAACGTTTTGTCAGTGGCTTTGTATTCTTTGTTCGCTGCTGACA





GCGATAACTTGAAGGCATTACAAAAAGGTTGTTTCAAATATTTCCAAAGAGGTGGCGATTGTGT





CAACAAACCCGTTGAATTTCTGTCTGGTGTCTTGCCAAAGCCTTTGCAATTGACCAGGGTTTTC





TTCGCTGTCGCTTTTTACACCATTTACTTGAACATGGAAGAACGTGGTTTCTTGGGATTACCAA





TGGCTTTATTGGAAGGTATTATGATTTTGATCACAGCTATTAGAGTATTCACCCCATTTTTGTT





TGGTGAGTTGATTGGTTAA




Pathway 1 sequence id D, erg1 protein
900
MSAVNVAPELINADNTITYDAIVIGAGVIGPCVATGLARKGKKVLIVERDWAMPDRIVGELMQP
GGVRALRSLGMIQSINNIEAYPVTGYTVFFNGEQVDIPYPYKADIPKVEKLKDLVKDGNDKVLE
DSTIHIKDYEDDERERGVAFVHGRFLNNLRNITAQEPNVTRVQGNCIEILKDEKNEVVGAKVDI





DGRGKVEFKAHLTFICDGIFSRFRKELHPDHVPTVGSSFVGMSLFNAKNPAPMHGHVILGSDHM





PILVYQISPEETRILCAYNSPKVPADIKSWMIKDVQPFIPKSLRPSFDEAVSQGKFRAMPNSYL





PARQNDVTGMCVIGDALNMRHPLTGGGMTVGLHDVVLLIKKIGDLDFSDREKVLDELLDYHFER





KSYDSVINVLSVALYSLFAADSDNLKALQKGCFKYFQRGGDCVNKPVEFLSGVLPKPLQLTRVF





FAVAFYTIYLNMEERGFLGLPMALLEGIMILITAIRVFTPFLFGELIG




Pathway 2 sequence id E, Cmelo DNA
901
ATGTGGAGATTAAAAGTGGGAAAAGAGAGTGTTGGGGAAAAAGAAGAGAAATGGATTAAGAGTA
TAAGCAATCACTTGGGACGTCAAGTTTGGGAATTTTGCAGTGGTGAAAATGAAAATGATGATGA
TGAAGCCATTGCTGTTGCTAATAATTCTGCTTCAAAGTTCGAGAATGCCAGGAATCACTTTCGT





AATAATCGTTTCCATCGCAAGCAATCTTCCGACCTCTTTCTTGCCATTCAGTGTGAAAAGGAAA





TAATAAGAAACGGTGCAAAAAATGAAGGAACCACCAAAGTAAAAGAAGGGGAAGATGTGAAGAA





AGAAGCAGTGAAGAATACATTAGAAAGAGCATTAAGTTTCTATTCGGCTGTTCAAACAAGCGAT





GGGAATTGGGCTTCGGATCTTGGCGGGCCTATGTTTTTACTACCGGGTTTAGTGATTGCTCTAT





ATGTCACTGGAGTCTTGAATTCTGTTCTGTCCAAGCACCATCGCCAAGAAATGTGTAGATATAT





TTACAATCATCAGAATGAAGATGGGGGATGGGGTTTGCACATTGAAGGTTCGAGCACGATGTTT





GGTTCGGCACTGAATTATGTTGCACTGAGACTGCTTGGAGAGGCTGCCGATGGCGGAGAGCACG





GCGCAATGACAAAAGCTCGAAGTTGGATCTTGGAGCGTGGTGGAGCTACCGCAATCACTTCTTG





GGGAAAATTGTGGCTGTCAGTACTTGGAGTCTATGAATGGAGTGGCAACAATCCTCTCCCACCT





GAATTTTGGTTACTCCCATATAGCCTACCATTTCATCCTGGAAGAATGTGGTGCCATTGTCGAA





TGGTTTATCTACCAATGTCGTACTTATATGGAAAGAGATTTGTTGGGCCAATCACACCCATAGT





TTTATCTCTAAGAAAAGAGCTTTACACAATTCCATATCATGAAATTGATTGGAATAGATCTCGC





AATACATGTGCAAAGGAGGATTTGTACTATCCACATCCGAAGATGCAAGATATTTTATGGGGAT





CGATATACCACGTGTATGAGCCATTGTTTAGTGGTTGGCCAGGGAAAAGGTTGAGGGAAAAGGC





AATGAAAATTGCAATGGAACATATACATTATGAAGATGAAAATAGTCGATATATATGTCTTGGT





CCTGTCAATAAAGTACTTAATATGCTTTGTTGTTGGGTTGAAGATCCTTATTCAGATGCCTTCA





AATTTCATCTACAAAGAATCCCTGACTATCTTTGGCTTGCTGAAGATGGCATGAGAATGCAGGG





TTACAATGGGAGTCAATTGTGGGACACTGCTTTCTCTATTCAAGCAATTATATCCACCAAACTT





ATAGACACCTTTGGCCCAACCTTAAGAAAAGCACATCATTTTGTTAAACACTCTCAGATCCAGG





AGGACTGTCCTGGTGATCCTAACGTTTGGTTCCGTCACATTCATAAAGGTGCTTGGCCTTTTTC





AACTCGAGATCATGGTTGGCTCATCTCTGACTGTACGGCCGAGGGACTAAAGGCTTCTTTGATG





TTATCCAAACTTCCATCCAAAATAGTTGGGGAGCCATTAGAAAAGAATCGCCTTTGTGATGCTG





TTAATGTTCTCCTTTCTTTACAAAACGAAAATGGTGGATTTGCATCATACGAGTTGACAAGATC





ATACCCTTGGTTGGAGTTGATCAACCCTGCAGAAACATTTGGAGATATCGTCATCGATTATTCG





TATGTGGAGTGCACCTCAGCGACAATGGAAGCATTGGCATTGTTTAAGAAGTTACATCCAGGGC





ATAGGACCAAAGAGATTGATGCTGCTATTGCCAAGGCCGCCAACTTTCTTGAAAATATGCAAAA





GACTGATGGCTCTTGGTATGGATGTTGGGGGGTATGCTTCACATATGCAGGGTGGTTTGGGATA





AAGGGATTGGTTGCTGCAGGAAGAACATATAATAACTGTGTTGCAATTCGTAAGGCTTGTAATT





TTCTTTTATCTAAAGAGTTACCTGGTGGTGGATGGGGGGAGAGTTACCTTTCATGTCAGAATAA





GGTCTACACCAATCTTGAAGGAAACAAACCACACTTGGTTAATACTGCTTGGGTAATGATGGCT





CTCATTGAAGCTGGCCAGGGTGAGAGAGACCCAGCCCCATTGCATCGTGCAGCAAGATTATTAA





TCAATTCTCAATTGGAGAGTGGTGATTTTCCCCAACAGGAGATCATGGGAGTGTTTAATAAAAA





CTGTATGATTACATATGCTGCATACCGAAACATTTTTCCCATTTGGGCTCTTGGAGAGTATTCC





CATAGAGTTTTGGATATGTAA




Pathway 2 sequence id F, Cmelo Protein
902
MWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDEAIAVANNSASKFENARNHFR
NNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKEAVKNTLERALSFYSAVQTSD
GNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYNHQNEDGGWGLHIEGSSTMF





GSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPP





EFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRSR





NTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAMKIAMEHIHYEDENSRYICLG





PVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGYNGSQLWDTAFSIQAIISTKL





IDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLM





LSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDYS





YVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKTDGSWYGCWGVCFTYAGWFGI





KGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVMMA





LIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYS





HRVLDM




Pathway 2 sequence id G, PSXY118L DNA (codon optimized)
903
ATGTGGAAACTTAAAGTTGCTGAGGGTGGCACTCCATGGTTAAGAACCCTAAACAATCACGTGG
GTAGACAGGTTTGGGAGTTTGACCCACATTCTGGTTCTCCTCAAGACTTGGACGATATTGAGAC
AGCAAGAAGAAATTTCCATGACAATCGTTTCACTCATAAACACTCAGACGACTTACTTATGAGA
TTGCAGTTTGCCAAAGAAAACCCCATGAATGAAGTACTGCCTAAGGTTAAGGTTAAAGACGTTG





AAGATGTCACAGAAGAAGCAGTTGCTACCACTCTAAGAAGAGGCTTGAACTTCTACAGCACCAT





ACAATCCCACGATGGTCATTGGCCCGGTGATTTGGGTGGTCCTATGTTCTTGATGCCTGGTTTA





GTTATCACTTTGTCCGTTACTGGGGCTCTTAACGCTGTTTTAACCGATGAACATAGAAAAGAGA





TGAGAAGATACTTATACAATCACCAAAACAAAGATGGAGGCTGGGGCTTGCATATTGAAGGTCC





TAGTACGATGTTTGGTTCAGTGTTATGCTATGTTACCTTGAGACTATTGGGTGAAGGGCCAAAT





GATGGTGAGGGTGACATGGAGAGAGGAAGAGATTGGATCCTAGAACATGGTGGAGCAACATATA





TAACCTCTTGGGGCAAAATGTGGTTATCTGTATTGGGCGTGTTTGAATGGTCAGGGAACAATCC





AATGCCACCAGAAATTTGGTTGTTGCCTTATGCTCTTCCAGTTCATCCAGGAAGAATGTGGTGT





CATTGTAGGATGGTTTACTTACCGATGTCGTACTTATACGGAAAACGTTTTGTCGGTCCTATTA





CACCGACCGTGCTTAGTCTTAGGAAAGAGCTATTTACAGTACCGTATCATGATATAGACTGGAA





CCAAGCAAGAAATTTATGTGCCAAAGAAGATTTATATTACCCTCATCCACTAGTGCAGGATATA





TTATGGGCTACACTTCACAAGTTTGTCGAACCCGTCTTTATGAATTGGCCTGGTAAGAAGCTAA





GGGAAAAGGCGATCAAAACAGCAATTGAGCACATTCATTATGAGGATGAGAATACTAGGTATAT





CTGCATTGGGCCCGTCAACAAAGTGTTGAATATGCTGTGTTGTTGGGTGGAAGATCCTAATTCC





GAAGCTTTCAAACTGCATTTGCCGAGAATTTATGATTACCTATGGGTAGCTGAAGATGGCATGA





AAATGCAAGGTTATAACGGATCGCAATTGTGGGATACAGCATTTGCTGCACAAGCCATTATTAG





CACAAATCTAATTGACGAATTCGGACCCACGTTAAAGAAGGCGCACGCCTTCATTAAGAATAGT





CAAGTATCCGAAGATTGTCCTGGTGATCTGAGCAAATGGTACAGACACATCTCAAAAGGTGCTT





GGCCATTTTCTACTGCCGATCATGGCTGGCCAATTAGCGACTGTACTGCGGAAGGGCTTAAGGC





AGTATTGTTATTATCGAAGATAGCACCTGAGATTGTTGGAGAACCATTGGATTCCAAGCGTTTG





TATGATGCAGTTAATGTAATTCTGTCACTGCAGAACGAAAATGGAGGTTTGGCGACTTACGAAT





TGACTAGATCATATACGTGGCTGGAAATAATCAACCCTGCCGAAACGTTTGGTGACATAGTCAT





AGATTGTCCATATGTTGAATGCACAAGTGCTGCCATTCAGGCTCTAGCAACTTTTGGTAAATTG





TATCCAGGTCATCGTCGTGAAGAAATACAATGTTGCATAGAGAAAGCCGTTGCCTTCATCGAGA





AGATTCAAGCTTCTGATGGTTCTTGGTATGGATCATGGGGCGTCTGTTTTACCTACGGGACGTG





GTTTGGTATCAAGGGTTTGATTGCTGCAGGGAAGAATTTCTCCAATTGCTTAAGTATAAGGAAA





GCGTGTGAGTTCTTACTGTCTAAACAATTGCCAAGTGGTGGATGGGCCGAATCTTACTTGTCTT





GTCAGAACAAAGTGTACTCTAACTTAGAAGGAAATAGGTCGCACGTCGTTAATACAGGATGGGC





TATGCTTGCATTGATTGAAGCAGAGCAAGCTAAGAGAGATCCAACTCCACTACATAGAGCAGCC





GTATGCTTAATCAACTCACAACTTGAAAATGGCGACTTTCCGCAAGAAGAAATCATGGGCGTAT





TCAATAAGAACTGTATGATAACTTACGCTGCGTATAGGTGCATCTTTCCCATTTGGGCTTTGGG





TGAATATAGAAGAGTCTTACAAGCTTGCTAG




Pathway 2 sequence id H, PSXY118L Protein
904
MWKLKVAEGGTPWLRTLNNHVGRQVWEFDPHSGSPQDLDDIETARRNFHDNRFTHKHSDDLLMR
LQFAKENPMNEVLPKVKVKDVEDVTEEAVATTLRRGLNFYSTIQSHDGHWPGDLGGPMFLMPGL
VITLSVTGALNAVLTDEHRKEMRRYLYNHQNKDGGWGLHIEGPSTMFGSVLCYVTLRLLGEGPN





DGEGDMERGRDWILEHGGATYITSWGKMWLSVLGVFEWSGNNPMPPEIWLLPYALPVHPGRMWC





HCRMVYLPMSYLYGKRFVGPITPTVLSLRKELFTVPYHDIDWNQARNLCAKEDLYYPHPLVQDI





LWATLHKFVEPVFMNWPGKKLREKAIKTAIEHIHYEDENTRYICIGPVNKVLNMLCCWVEDPNS





EAFKLHLPRIYDYLWVAEDGMKMQGYNGSQLWDTAFAAQAIISTNLIDEFGPTLKKAHAFIKNS





QVSEDCPGDLSKWYRHISKGAWPFSTADHGWPISDCTAEGLKAVLLLSKIAPEIVGEPLDSKRL





YDAVNVILSLQNENGGLATYELTRSYTWLEIINPAETFGDIVIDCPYVECTSAAIQALATFGKL





YPGHRREEIQCCIEKAVAFIEKIQASDGSWYGSWGVCFTYGTWFGIKGLIAAGKNFSNCLSIRK





ACEFLLSKQLPSGGWAESYLSCQNKVYSNLEGNRSHVVNTGWAMLALIEAEQAKRDPTPLHRAA





VCLINSQLENGDFPQEEIMGVFNKNCMITYAAYRCIFPIWALGEYRRVLQAC




Pathway 2 sequence id I, DdCASY80L DNA (codon optimized)
905
ATGACCACGACAAACTGGTCCCTAAAGGTAGACAGAGGGCGTCAAACTTGGGAATACTCTCAAG
AAAAGAAGGAGGCCACTGATGTGGACATCCATTTGCTACGACTGAAGGAACCCGGCACACATTG
CCCTGAAGGTTGTGATCTGAATCGCGCTAAAACTCCCCAACAAGCGATTAAGAAAGCATTTCAG
TACTTCTCCAAAGTCCAAACAGAAGATGGTCATTGGGCTGGAGATTTGGGTGGGCCAATGTTCT





TGTTACCCGGTTTGGTGATAACATGCTACGTTACTGGCTATCAATTGCCAGAATCCACTCAAAG





GGAAATTATAAGGTATCTGTTCAATAGACAGAATCCGGTTGATGGTGGCTGGGGTTTGCATATA





GAGGCCCACTCTGATATATTTGGAACTACGTTACAATATGTATCATTGAGATTACTTGGAGTTC





CAGCCGACCATCCATCTGTTGTAAAGGCAAGAACCTTCTTATTACAGAATGGTGGAGCAACCGG





TATTCCTTCATGGGGTAAATTCTGGTTGGCCACGTTGAATGCATACGACTGGAACGGGTTGAAT





CCAATTCCTATTGAATTTTGGCTGTTACCCTACAACTTACCCATTGCTCCTGGTAGGTGGTGGT





GTCACTGTCGGATGGTCTATCTCCCAATGTCTTATATCTACGCTAAGAAAACAACTGGTCCACT





AACAGATTTGGTCAAGGATCTGAGGAGAGAAATCTATTGTCAAGAGTACGAAAAGATTAACTGG





TCTGAACAAAGAAACAATATTTCGAAATTAGACATGTACTACGAGCATACATCTCTTTTAAATG





TTATAAACGGATCATTGAATGCTTACGAGAAAGTTCATTCCAAATGGCTTAGGGATAAAGCCAT





TGACTATACCTTTGACCATATACGCTATGAAGATGAGCAGACGAAATACATTGACATAGGTCCA





GTCAATAAGACCGTCAATATGTTATGCGTTTGGGATAGAGAAGGCAAATCTCCTGCGTTTTACA





AACATGCCGATCGACTTAAAGATTATCTATGGTTATCTTTCGATGGGATGAAAATGCAAGGCTA





TAACGGTTCTCAATTGTGGGACACTGCTTTTACGATCCAAGCATTCATGGAATCTGGGATTGCC





AATCAATTCCAGGATTGTATGAAATTAGCTGGTCACTATTTGGACATCTCCCAGGTACCAGAAG





ATGCCAGAGATATGAAGCACTACCACAGACACTATTCGAAGGGTGCATGGCCTTTTAGTACCGT





TGACCATGGATGGCCAATTTCAGATTGCACAGCAGAAGGTATCAAGTCAGCGCTTGCTCTCAGA





TCTTTGCCTTTTATCGAACCAATATCCTTAGATAGAATTGCTGATGGCATTAATGTTCTATTAA





CCTTGCAAAATGGGGATGGTGGATGGGCATCGTACGAGAACACAAGAGGACCGAAATGGCTGGA





AAAGTTTAACCCTTCCGAAGTTTTCCAGAATATAATGATTGACTATAGCTATGTGGAATGTAGT





GCTGCTTGTATTCAAGCTATGAGTGCGTTTCGTAAACATGCACCTAATCATCCAAGAATTAAGG





AAATCAACAGATCTATTGCACGTGGAGTGAAATTTATCAAGAGCATTCAACGTCAGGATGGTTC





ATGGCTGGGCAGTTGGGGAATTTGTTTTACCTACGGTACTTGGTTTGGCATAGAGGGCTTAGTA





GCATCTGGTGAGCCTCTAACATCGCCATCGATCGTGAAGGCTTGCAAGTTTCTTGCGTCAAAAC





AACGTGCAGATGGTGGTTGGGGAGAAAGCTTTAAAAGCAATGTGACTAAAGAATATGTTCAACA





CGAAACTTCACAAGTAGTCAATACTGGTTGGGCTCTACTCAGTCTAATGAGTGCTAAATATCCG





GACAGAGAGTGCATAGAGAGAGGTATCAAATTCTTAATACAGAGGCAATATCCGAACGGTGATT





TTCCACAGGAATCCATTATTGGCGTTTTCAATTTTAACTGTATGATCTCATATTCAAACTATAA





GAACATATTCCCTCTTTGGGCCTTGAGTAGGTATAATCAATTGTACCTTAAAAGCAAAATCTGA




Pathway 2 sequence id J, DdCASY80L protein
906
MTTTNWSLKVDRGRQTWEYSQEKKEATDVDIHLLRLKEPGTHCPEGCDLNRAKTPQQAIKKAFQ
YFSKVQTEDGHWAGDLGGPMFLLPGLVITCYVTGYQLPESTQREIIRYLFNRQNPVDGGWGLHI
EAHSDIFGTTLQYVSLRLLGVPADHPSVVKARTFLLQNGGATGIPSWGKFWLATLNAYDWNGLN





PIPIEFWLLPYNLPIAPGRWWCHCRMVYLPMSYIYAKKTTGPLTDLVKDLRREIYCQEYEKINW





SEQRNNISKLDMYYEHTSLLNVINGSLNAYEKVHSKWLRDKAIDYTFDHIRYEDEQTKYIDIGP





VNKTVNMLCVWDREGKSPAFYKHADRLKDYLWLSFDGMKMQGYNGSQLWDTAFTIQAFMESGIA





NQFQDCMKLAGHYLDISQVPEDARDMKHYHRHYSKGAWPFSTVDHGWPISDCTAEGIKSALALR





SLPFIEPISLDRIADGINVLLTLQNGDGGWASYENTRGPKWLEKFNPSEVFQNIMIDYSYVECS





AACIQAMSAFRKHAPNHPRIKEINRSIARGVKFIKSIQRQDGSWLGSWGICFTYGTWFGIEGLV





ASGEPLTSPSIVKACKFLASKQRADGGWGESFKSNVTKEYVQHETSQVVNTGWALLSLMSAKYP





DRECIERGIKFLIQRQYPNGDFPQESIIGVFNFNCMISYSNYKNIFPLWALSRYNQLYLKSKI




Pathway 2 sequence id K, Cmelo DNA (codon optimized)
907
ATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCA
TTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGA
TGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGT
AATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGA





TCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAA





AGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGAC





GGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTAT





ACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACAT





CTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTT





GGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATG





GTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTG





GGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCC





GAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAA





TGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGT





TTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCAGA





AACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCA





GTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGC





CATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGA





CCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCA





AGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGG





TTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAATTG





ATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAG





AGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAG





CACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATG





TTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAG





TCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTC





CTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACAGT





TATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGC





ATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAA





AACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATC





AAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACT





TCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAA





AGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGGCC





TTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAA





TCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAA





CTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACTCC





CATCGTGTCTTGGATATGTGA




Pathway 1 sequence
ATGGTCGATCAATGTGCGTTAGGCTGGATATTAGCTAGTGCACTAGGATTGGTTATCGCTCTAT
908



id L
GCTTCTTCGTTGCACCAAGAAGAAACCACAGAGGTGTTGATAGCAAAGAAAGGGATGAGTGTGT




SQE1 DNA (codon
GCAGTCTGCAGCCACAACTAAGGGCGAATGCAGATTTAACGATAGAGATGTGGATGTTATTGTG




optimized)
GTTGGTGCAGGAGTAGCTGGTTCGGCATTAGCCCATACACTTGGTAAAGACGGTAGAAGAGTTC





ATGTCATTGAGAGGGATTTGACTGAACCAGACAGGATTGTTGGTGAACTTCTACAACCAGGAGG





CTATTTGAAGTTGATTGAGTTAGGCTTACAAGACTGTGTGGAAGAAATAGATGCACAAAGAGTT





TACGGGTATGCTTTGTTTAAAGATGGTAAGAACACCAGACTATCTTATCCACTTGAAAATTTTC





ACTCAGATGTCTCCGGTAGAAGCTTTCACAACGGTAGATTCATTCAGAGGATGAGAGAAAAGGC





TGCTTCGCTGCCAAATGTAAGATTGGAACAAGGGACGGTTACTAGTCTACTGGAAGAGAAAGGG





ACGATCAAAGGAGTTCAGTATAAGTCCAAGAATGGAGAGGAGAAAACCGCGTATGCGCCTTTAA





CGATAGTGTGTGATGGCTGTTTCTCTAACTTACGTAGATCATTATGTAATCCCATGGTTGACGT





TCCGAGCTACTTTGTTGGTCTTGTGTTAGAAAATTGCGAACTGCCATTTGCCAATCATGGACAT





GTTATCCTTGGTGATCCATCCCCAATCTTATTCTATCAGATCTCAAGAACCGAAATTAGGTGTT





TGGTCGATGTACCCGGTCAAAAGGTCCCTTCAATTGCCAATGGCGAAATGGAGAAATATTTAAA





GACTGTTGTAGCTCCACAAGTACCACCTCAGATTTACGACAGTTTTATAGCCGCCATTGACAAA





GGGAATATCAGAACTATGCCTAATAGGTCTATGCCTGCAGCTCCCCATCCAACTCCAGGTGCGT





TACTGATGGGCGATGCATTCAACATGAGACATCCTCTAACAGGAGGTGGCATGACAGTAGCACT





GTCTGACATTGTGGTCTTGAGAAACTTGTTAAAACCGTTAAAAGACTTGTCTGACGCCTCTACT





TTGTGCAAATACTTGGAATCCTTTTATACCCTTCGTAAACCAGTAGCTAGCACAATCAACACCT





TAGCTGGAGCCTTGTACAAAGTCTTTTGCGCATCACCGGATCAAGCGAGAAAGGAAATGAGACA





AGCTTGTTTTGATTACCTAAGTCTGGGAGGTATTTTCTCGAATGGTCCTGTCTCATTGTTGTCA





GGGTTGAATCCCAGACCTTTATCCTTGGTATTGCACTTCTTCGCTGTCGCAATTTATGGTGTTG





GTCGTTTGCTTCTACCTTTTCCAAGTGTTAAGGGTATATGGATTGGTGCAAGGTTGATCTACTC





TGCCTCTGGTATAATATTTCCCATAATTAGAGCTGAAGGCGTTCGTCAAATGTTCTTTCCTGCT





ACAGTGCCCGCTTACTATCGTTCCCCACCTGTATTTAAACCGATAGTGTAG




Pathway 1 sequence
MVDQCALGWILASALGLVIALCFFVAPRRNHRGVDSKERDECVQSAATTKGECRFNDRDVDVIV
909



id M
VGAGVAGSALAHTLGKDGRRVHVIERDLTEPDRIVGELLQPGGYLKLIELGLQDCVEEIDAQRV




SQE1 Protein
YGYALFKDGKNTRLSYPLENFHSDVSGRSFHNGRFIQRMREKAASLPNVRLEQGTVTSLLEEKG





TIKGVQYKSKNGEEKTAYAPLTIVCDGCFSNLRRSLCNPMVDVPSYFVGLVLENCELPFANHGH





VILGDPSPILFYQISRTEIRCLVDVPGQKVPSIANGEMEKYLKTVVAPQVPPQIYDSFIAAIDK





GNIRTMPNRSMPAAPHPTPGALLMGDAFNMRHPLTGGGMTVALSDIVVLRNLLKPLKDLSDAST





LCKYLESFYTLRKPVASTINTLAGALYKVFCASPDQARKEMRQACFDYLSLGGIFSNGPVSLLS





GLNPRPLSLVLHFFAVAIYGVGRLLLPFPSVKGIWIGARLIYSASGIIFPIIRAEGVRQMFFPA





TVPAYYRSPPVFKPIV




Pathway 1 sequence
ATGGTCGATCAATGCGCGTTAGGCTGGATATTAGCTTCCGTCCTAGGAGCTGCAGCGTTGTATT
910



id N
TCTTGTTTGGTAGAAAGAATGGTGGTGTGTCTAATGAAAGAAGGCATGAAAGTATTAAGAACAT




SQE2 DNA (codon
TGCAACTACCAATGGTGAGTATAAGTCAAGTAACTCCGATGGTGACATCATCATTGTTGGTGCT




optimized)
GGCGTTGCTGGATCTGCTTTGGCCTATACGCTAGGTAAAGATGGGAGAAGAGTGCATGTCATTG





AAAGGGATTTGACAGAACCAGACCGTATAGTAGGTGAATTGTTACAACCAGGAGGGTATCTAAA





ACTGACAGAGTTGGGTTTAGAAGATTGTGTGGATGATATAGATGCTCAACGTGTTTATGGGTAT





GCATTATTCAAAGACGGTAAAGATACCAGATTGTCCTATCCCTTGGAAAAGTTTCACTCTGACG





TCGCAGGCAGATCCTTTCATAATGGCAGATTCATTCAGCGTATGAGAGAGAAAGCTGCTTCATT





GCCTAAAGTGAGCCTAGAGCAAGGGACTGTAACGTCACTGTTGGAGGAAAACGGAATAATCAAA





GGGGTACAGTATAAAACTAAGACTGGTCAAGAGATGACTGCATATGCTCCTTTAACAATCGTCT





GTGACGGCTGCTTTTCGAACCTTCGTAGAAGCTTGTGCAACCCAAAAGTCGATGTTCCCTCATG





TTTTGTGGGATTAGTTCTAGAAAATTGCGATTTGCCTTACGCCAATCACGGACATGTGATCTTG





GCTGATCCGTCACCTATTCTGTTCTACAGAATATCTAGTACCGAAATCAGGTGTTTGGTTGATG





TTCCAGGTCAGAAAGTGCCTTCTATCAGTAATGGCGAAATGGCCAACTACTTGAAGAATGTTGT





TGCACCTCAGATTCCAAGCCAACTTTACGACTCTTTTGTTGCAGCCATTGACAAGGGAAACATA





AGAACAATGCCGAATAGATCTATGCCAGCAGATCCATATCCAACACCCGGTGCGCTGCTAATGG





GTGATGCCTTTAACATGAGACATCCTCTAACAGGTGGTGGTATGACAGTCGCTTTATCGGATGT





TGTCGTATTAAGAGACTTACTGAAACCACTTAGAGACTTGAATGATGCACCTACCTTGAGCAAG





TATTTAGAAGCCTTTTACACTCTGCGTAAGCCTGTTGCTTCTACCATAAACACGTTAGCAGGAG





CATTGTACAAGGTATTCTGTGCTTCTCCTGATCAAGCGAGAAAGGAAATGAGACAAGCCTGTTT





TGACTACCTTTCACTTGGTGGCATATTCAGTAATGGACCAGTATCCTTATTGTCAGGTCTTAAT





CCAAGGCCCATTTCCCTTGTTTTACACTTCTTTGCAGTGGCTATCTATGGTGTTGGAAGGCTAT





TAATACCGTTTCCATCACCGAAAAGGGTATGGATTGGTGCTAGAATTATTTCGGGCGCGAGTGC





AATTATTTTCCCCATTATCAAAGCTGAAGGCGTCAGACAAATGTTCTTTCCAGCCACTGTAGCT





GCCTACTACAGAGCCCCAAGAGTTGTTAAAGGTAGGTAG




Pathway 1 sequence
MVDQCALGWILASVLGAAALYFLFGRKNGGVSNERRHESIKNIATTNGEYKSSNSDGDIIIVGA
911



id O
GVAGSALAYTLGKDGRRVHVIERDLTEPDRIVGELLQPGGYLKLTELGLEDCVDDIDAQRVYGY




SQE2 Protein
ALFKDGKDTRLSYPLEKFHSDVAGRSFHNGRFIQRMREKAASLPKVSLEQGTVTSLLEENGIIK





GVQYKTKTGQEMTAYAPLTIVCDGCFSNLRRSLCNPKVDVPSCFVGLVLENCDLPYANHGHVIL





ADPSPILFYRISSTEIRCLVDVPGQKVPSISNGEMANYLKNVVAPQIPSQLYDSFVAAIDKGNI





RTMPNRSMPADPYPTPGALLMGDAFNMRHPLTGGGMTVALSDVVVLRDLLKPLRDLNDAPTLSK





YLEAFYTLRKPVASTINTLAGALYKVFCASPDQARKEMRQACFDYLSLGGIFSNGPVSLLSGLN





PRPISLVLHFFAVAIYGVGRLLIPFPSPKRVWIGARIISGASAIIFPIIKAEGVRQMFFPATVA





AYYRAPRVVKGR




Pathway 1 sequence
ATGGAATTCCAATCGGAACCCTTGTTTGGGGTTCTGTTGGCTAGTCTTTTAGCGCTGGTTTTCT
912



id P
TCTTTACTTTGAGAGATGGTACCAAGAACAAGAAAACCACAACTGGGTCATCTGTGGATCTGAA




SQE3 DNA (codon
ACGTACTGACGCTGTCCTACAAATGTCTCCCGAAAACGATGCTAGAAGGCAGGAAATCATAGGG




optimized)
GATTCAGACGTGATTGTAGTAGGTGCAGGAGTTGCAGGAGCTGCATTAGCCTATACGTTGGGCA





AAGATGGTAGAAAAGTTCACGTAATTGAAAGAGACTTGACAGAGCCAGATAGAATTGTAGGTGA





ACTATTACAACCTGGTGGCTACTTGAAGCTAGTGGAGTTGGGTCTTGAAGATAGTGTTAAAGGT





ATTGACGCTCAACAAGTCTTTGGATATGCGTTGTATAAGGACGGTAAACACACAAGACTTACGT





ATCCTTTGGAAAAGTTCGACTCAACTGTATCAGGCAGATCCTTCCATAATGGCAGATTCATCCA





AAGATTAAGGGAATCTGTGAGACTAGAACAAGGAACTGTTACCAGCATCTTAGAAGAGGATGGA





ACAGTTAAAGGTGTTCAGTATAAGACGAAAATTGGAGAGGAGTTTACAGCTTATGCACCATTGA





CAATCGTCTGTGATGGCGGGTTTAGTAACTTGAGAAGAAATTTATGCAAACCACAAATCGACAT





TCCCTCGTGTTTTGTGGGATTAGTTTTGGAAAACTGCAAACTTCCCTTCGAGAATCATGGCCAT





GTAGTACTGGCAGATCCGTCACCTATTCTGTTATACCCGATTAGTTCAACGGAAATTCGTTGTT





TGGTTGACATTCCAGGTCAGAAAGTGCCCTCAGTAGCCAATGGCGAAATGGCCAGATACTTAAA





GACTGTTGTCGCTCCGCAAGTTCCACCTGAACTACATGCTGCCTTTATAGCGGCTATAGAGAAA





GGTAATATCAAGAGCACAACTAACAGATCTATGCCAGCAGCACCTCACCCAACACCTGGCGCCC





TGTTGCTAGGTGATGCATTCAATATGAGACATCCCTTAACCGGTGGTGGTATGACTGTTGCCTT





AGCGGACATTGTTGTGCTTAGAGATTTGTTGCGTCCTCTTGCTAATCTAAAGGATGCTGATGCC





TTGTGTCACTATCTAGAGTCCTTTTACACCCTTCGTAAACCTGTCGCATCCACCATAAACACAT





TAGCTGGCGCATTATACAAGGTCTTTTGTGCCTCTCCAGATTCTGCTAGAAAGGAAATGAGGGA





AGCATGTTTTGATTACCTGAGTTTAGGTGGTGTCTTTTCGTCTGGACCTGTAGCTTTGTTATCC





GGTTTGAATCCAAGACCTTTGTCCTTATTTTGCCATTTCTTTGCAGTGGCCATATATGGAGTTT





CTAGGTTGCTTATACCATTCCCAAGCCCAATGAGGATTTGGATTGGTGTTAGATTAATCACTGT





TGCGGCCGGTATAATATTTCCGATTATCAAAGCTGAAGGGGTCAGACAGATGTTCTTTCCTGCT





ACTGTCCCAGCTTATTACAGGGCACCACCAATGTAG




Pathway 1 sequence
MEFQSEPLFGVLLASLLALVFFFTLRDGTKNKKTTTGSSVDLKRTDAVLQMSPENDARRQEIIG
913



id Q
DSDVIVVGAGVAGAALAYTLGKDGRKVHVIERDLTEPDRIVGELLQPGGYLKLVELGLEDSVKG




SQE3 Protein
IDAQQVFGYALYKDGKHTRLTYPLEKFDSTVSGRSFHNGRFIQRLRESVRLEQGTVTSILEEDG





TVKGVQYKTKIGEEFTAYAPLTIVCDGGFSNLRRNLCKPQIDIPSCFVGLVLENCKLPFENHGH





VVLADPSPILLYPISSTEIRCLVDIPGQKVPSVANGEMARYLKTVVAPQVPPELHAAFIAAIEK





GNIKSTTNRSMPAAPHPTPGALLLGDAFNMRHPLTGGGMTVALADIVVLRDLLRPLANLKDADA





LCHYLESFYTLRKPVASTINTLAGALYKVFCASPDSARKEMREACFDYLSLGGVFSSGPVALLS





GLNPRPLSLFCHFFAVAIYGVSRLLIPFPSPMRIWIGVRLITVAAGIIFPIIKAEGVRQMFFPA





TVPAYYRAPPM




SS3e-E7 fusion
ATGTCCGAAAATCACGTTCCTGCCGTTGTCAAAACGCGTGGAAGTGCAGCTCCTGGAAGTGGAA
915



protein, coding
GTGGTTCAGGAATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTG




sequence,
GATTAAAAGCATTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAA




cucurbitadienol
AACGACGACGATGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAA




synthase
ATCACTTCCGTAATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATG





CGAGAAAGAGATCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAA





GATGTTAAGAAAGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTAC





AGACCTCTGACGGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGT





TATTGCGCTATACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATG





TGTCGTTACATCTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTT





CTACTATGTTTGGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGG





CGGTGAGCATGGTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCA





ATAACTTCCTGGGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATC





CATTGCCACCCGAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTG





TCATTGTAGAATGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATC





ACTCCAATAGTTTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGA





ATAGATCCAGAAACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATAT





CCTATGGGGCAGTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTG





AGGGAAAAGGCCATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACA





TATGCCTTGGACCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTC





TGATGCTTTCAAGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATG





AGAATGCAGGGTTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCT





CAACGAAATTGATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAG





TCAGATTCAAGAGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCT





TGGCCTTTTAGCACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAG





CTTCACTGATGTTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTT





ATGTGATGCAGTCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAA





TTAACTAGGTCCTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAA





TTGACTACAGTTATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTT





GCACCCTGGGCATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAG





AATATGCAGAAAACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTT





GGTTTGGCATCAAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAA





GGCTTGTAACTTCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGT





TGCCAAAACAAAGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGG





TCATGATGGCCTTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGC





CAGATTGCTAATCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTG





TTTAATAAGAACTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAG





GTGAATACTCCCATCGTGTCTTGGATATGTGA




SS3e-E7 fusion
MSENHVPAVVKTRGSAAPGSGSGSGMWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENE
916



protein,
NDDDEAIAVANNSASKFENARNHFRNNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGE




cucurbitadienol
DVKKEAVKNTLERALSFYSAVQTSDGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEM




synthase
CRYIYNHQNEDGGWGLHIEGSSTMFGSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATA





ITSWGKLWLSVLGVYEWSGNNPLPPEFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPI





TPIVLSLRKELYTIPYHEIDWNRSRNTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRL





REKAMKIAMEHIHYEDENSRYICLGPVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGM





RMQGYNGSQLWDTAFSIQAIISTKLIDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGA





WPFSTRDHGWLISDCTAEGLKASLMLSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYE





LTRSYPWLELINPAETFGDIVIDYSYVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLE





NMQKTDGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLS





CQNKVYTNLEGNKPHLVNTAWVMMALIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGV





FNKNCMITYAAYRNIFPIWALGEYSHRVLDM-




SS3e-E7 fusion
MSENHVPAVVKTR
917



protein,





fusion domain





SS3d-G5 fusion
ATGACGACCCAGCAAGAGGAGCTCGATGTTGGAGACAGTGAGGGAAGTGCAGCTCCTGGAAGTG
919



protein, coding
GAAGTGGTTCAGGAATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAA




sequence,
GTGGATTAAAAGCATTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAAT




cucurbitadienol
GAAAACGACGACGATGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAA




synthase
GAAATCACTTCCGTAATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACA





ATGCGAGAAAGAGATCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGT





GAAGATGTTAAGAAAGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTG





TACAGACCTCTGACGGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCT





AGTTATTGCGCTATACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAA





ATGTGTCGTTACATCTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCT





CTTCTACTATGTTTGGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGA





TGGCGGTGAGCATGGTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACA





GCAATAACTTCCTGGGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACA





ATCCATTGCCACCCGAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTG





GTGTCATTGTAGAATGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCA





ATCACTCCAATAGTTTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATT





GGAATAGATCCAGAAACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGA





TATCCTATGGGGCAGTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGA





TTGAGGGAAAAGGCCATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGT





ACATATGCCTTGGACCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTA





CTCTGATGCTTTCAAGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGA





ATGAGAATGCAGGGTTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAA





TCTCAACGAAATTGATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCA





TAGTCAGATTCAAGAGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGA





GCTTGGCCTTTTAGCACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGA





AAGCTTCACTGATGTTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCG





TTTATGTGATGCAGTCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTAT





GAATTAACTAGGTCCTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCG





TAATTGACTACAGTTATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAA





GTTGCACCCTGGGCATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTG





GAGAATATGCAGAAAACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTG





GTTGGTTTGGCATCAAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAG





AAAGGCTTGTAACTTCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTA





AGTTGCCAAAACAAAGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCAT





GGGTCATGATGGCCTTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGC





TGCCAGATTGCTAATCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGT





GTGTTTAATAAGAACTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTC





TAGGTGAATACTCCCATCGTGTCTTGGATATGTGA




SS3d-G5 fusion
MTTQQEELDVGDSEGSAAPGSGSGSGMWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGEN
920



protein,
ENDDDEAIAVANNSASKFENARNHFRNNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEG




cucurbitadienol
EDVKKEAVKNTLERALSFYSAVQTSDGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQE




synthase
MCRYIYNHQNEDGGWGLHIEGSSTMFGSALNYVALRLLGEAADGGEHGAMTKARSWILERGGAT





AITSWGKLWLSVLGVYEWSGNNPLPPEFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGP





ITPIVLSLRKELYTIPYHEIDWNRSRNTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKR





LREKAMKIAMEHIHYEDENSRYICLGPVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDG





MRMQGYNGSQLWDTAFSIQAIISTKLIDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKG





AWPFSTRDHGWLISDCTAEGLKASLMLSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASY





ELTRSYPWLELINPAETFGDIVIDYSYVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFL





ENMQKTDGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYL





SCQNKVYTNLEGNKPHLVNTAWVMMALIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMG





VFNKNCMITYAAYRNIFPIWALGEYSHRVLDM-




SS3d-G5 fusion
MTTQQEELDVGDSE
921



protein, fusion





domain





SS3c-G8 fusion
ATGGAGGACGGTAAACAGGCCATCAGCGAGGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAG
923



protein, coding
GAATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAG




sequence
CATTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGAC





GATGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCC





GTAATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGA





GATCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAG





AAAGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTG





ACGGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCT





ATACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTAC





ATCTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGT





TTGGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCA





TGGTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCC





TGGGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCAC





CCGAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAG





AATGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATA





GTTTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCA





GAAACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGG





CAGTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAG





GCCATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTG





GACCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTT





CAAGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAG





GGTTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAAT





TGATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCA





AGAGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTT





AGCACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGA





TGTTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGC





AGTCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGG





TCCTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACA





GTTATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGG





GCATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAG





AAAACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCA





TCAAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAA





CTTCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAAC





AAAGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGG





CCTTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCT





AATCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAG





AACTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACT





CCCATCGTGTCTTGGATATGTGA




SS3c-G8 fusion
MEDGKQAISEGSAAPGSGSGSGMWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDD
924



protein,
DEAIAVANNSASKFENARNHFRNNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVK




cucurbitadienol
KEAVKNTLERALSFYSAVQTSDGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRY




synthase
IYNHQNEDGGWGLHIEGSSTMFGSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITS





WGKLWLSVLGVYEWSGNNPLPPEFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPI





VLSLRKELYTIPYHEIDWNRSRNTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREK





AMKIAMEHIHYEDENSRYICLGPVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQ





GYNGSQLWDTAFSIQAIISTKLIDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPF





STRDHGWLISDCTAEGLKASLMLSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTR





SYPWLELINPAETFGDIVIDYSYVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQ





KTDGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQN





KVYTNLEGNKPHLVNTAWVMMALIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNK





NCMITYAAYRNIFPIWALGEYSHRVLDM-




SS3c-G8 fusion
MEDGKQAISE
925



protein,





fusion domain





SS3c-E5 fusion
ATGACGATCGGTGATAAGCTGAAAAAGAAGCTTGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTT
927



protein, coding
CAGGAATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAA




sequence
AAGCATTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGAC





GACGATGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACT





TCCGTAATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAA





AGAGATCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTT





AAGAAAGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCT





CTGACGGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGC





GCTATACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGT





TACATCTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTA





TGTTTGGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGA





GCATGGTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACT





TCCTGGGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGC





CACCCGAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTG





TAGAATGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCA





ATAGTTTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGAT





CCAGAAACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATG





GGGCAGTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAA





AAGGCCATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCC





TTGGACCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGC





TTTCAAGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATG





CAGGGTTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGA





AATTGATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGAT





TCAAGAGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCT





TTTAGCACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCAC





TGATGTTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGA





TGCAGTCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACT





AGGTCCTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACT





ACAGTTATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCC





TGGGCATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATG





CAGAAAACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTG





GCATCAAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTG





TAACTTCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAA





AACAAAGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGA





TGGCCTTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATT





GCTAATCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAAT





AAGAACTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAAT





ACTCCCATCGTGTCTTGGATATGTGA




SS3c-E5 fusion
MTIGDKLKKKLGSAAPGSGSGSGMWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENEND
928



protein,
DDEAIAVANNSASKFENARNHFRNNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDV




cucurbitadienol
KKEAVKNTLERALSFYSAVQTSDGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCR




synthase
YIYNHQNEDGGWGLHIEGSSTMFGSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAIT





SWGKLWLSVLGVYEWSGNNPLPPEFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITP





IVLSLRKELYTIPYHEIDWNRSRNTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLRE





KAMKIAMEHIHYEDENSRYICLGPVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRM





QGYNGSQLWDTAFSIQAIISTKLIDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWP





FSTRDHGWLISDCTAEGLKASLMLSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELT





RSYPWLELINPAETFGDIVIDYSYVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENM





QKTDGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQ





NKVYTNLEGNKPHLVNTAWVMMALIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFN





KNCMITYAAYRNIFPIWALGEYSHRVLDM-




SS3c-E5 protein,
MTIGDKLKKKL
929



fusion domain





SS2c-E2 fusion
ATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCA
931



protein, coding
TTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGA




sequence
TGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGT





AATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGA





TCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAA





AGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGAC





GGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTAT





ACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACAT





CTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTT





GGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATG





GTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTG





GGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCC





GAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAA





TGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGT





TTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCAGA





AACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCA





GTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGC





CATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGA





CCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCA





AGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGG





TTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAATTG





ATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAG





AGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAG





CACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATG





TTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAG





TCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTC





CTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACAGT





TATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGC





ATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAA





AACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATC





AAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACT





TCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAA





AGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGGCC





TTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAA





TCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAA





CTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACTCC





CATCGTGTCTTGGATATGGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAGGAATGATGTTGG





AGCCGTCACCCTAA




SS2c-E2 fusion
MWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDEAIAVANNSASKFENARNHFR
932



protein,
NNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKEAVKNTLERALSFYSAVQTSD




cucurbitadienol
GNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYNHQNEDGGWGLHIEGSSTMF




synthase
GSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPP





EFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRSR





NTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAMKIAMEHIHYEDENSRYICLG





PVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGYNGSQLWDTAFSIQAIISTKL





IDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLM





LSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDYS





YVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKTDGSWYGCWGVCFTYAGWFGI





KGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVMMA





LIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYS





HRVLDMGSAAPGSGSGSGMMLEPSP-




SS2c-E2 protein,
MMLEPSP
933



fusion domain





SS2c-A10b fusion
ATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCA
935



protein, coding
TTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGA




sequence
TGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGT





AATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGA





TCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAA





AGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGAC





GGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTAT





ACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACAT





CTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTT





GGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATG





GTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTG





GGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCC





GAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAA





TGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGT





TTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCAGA





AACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCA





GTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGC





CATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGA





CCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCA





AGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGG





TTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAATTG





ATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAG





AGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAG





CACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATG





TTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAG





TCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTC





CTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACAGT





TATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGC





ATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAA





AACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATC





AAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACT





TCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAA





AGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGGCC





TTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAA





TCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAA





CTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACTCC





CATCGTGTCTTGGATATGGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAGGAATGAATTCGA





ATGAAGACATCATACCTGAACTATAA




SS2c-A10b fusion
MWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDEAIAVANNSASKFENARNHFR
936



protein,
NNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKEAVKNTLERALSFYSAVQTSD




cucurbitadienol
GNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYNHQNEDGGWGLHIEGSSTMF




synthase
GSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPP





EFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRSR





NTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAMKIAMEHIHYEDENSRYICLG





PVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGYNGSQLWDTAFSIQAIISTKL





IDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLM





LSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDYS





YVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKTDGSWYGCWGVCFTYAGWFGI





KGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVMMA





LIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYS





HRVLDMGSAAPGSGSGSGMNSNEDIIPEL-




SS2c-A10b protein,
MNSNEDIIPE
937



fusion domain





SS3b-C1 fusion
ATGTGGAACAAAACCAAAAAAACACAAGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAGGAA
939



protein, coding
TGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCAT




sequence
TAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGAT





GAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGTA





ATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGAT





CATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAAA





GAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGACG





GTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTATA





CGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACATC





TATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTTG





GGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATGG





TGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTGG





GGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCCG





AATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAAT





GGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGTT





TTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCAGAA





ACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCAG





TATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGCC





ATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGAC





CCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCAA





GTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGGT





TATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAATTGA





TTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAGA





GGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAGC





ACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATGT





TATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAGT





CAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTCC





TATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACAGTT





ATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGCA





TAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAAA





ACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATCA





AAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACTT





CCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAAA





GTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGGCCT





TGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAAT





CAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAAC





TGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACTCCC





ATCGTGTCTTGGATATGTGA




SS3b-C1 fusion
MWNKTKKTQGSAAPGSGSGSGMWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDD
940



protein,
EAIAVANNSASKFENARNHFRNNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKK




cucurbitadienol
EAVKNTLERALSFYSAVQTSDGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYI




synthase
YNHQNEDGGWGLHIEGSSTMFGSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSW





GKLWLSVLGVYEWSGNNPLPPEFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIV





LSLRKELYTIPYHEIDWNRSRNTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKA





MKIAMEHIHYEDENSRYICLGPVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQG





YNGSQLWDTAFSIQAIISTKLIDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFS





TRDHGWLISDCTAEGLKASLMLSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRS





YPWLELINPAETFGDIVIDYSYVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQK





TDGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNK





VYTNLEGNKPHLVNTAWVMMALIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKN





CMITYAAYRNIFPIWALGEYSHRVLDM-




SS3b-C1 protein, 
MWNKTKKTQ
941



fusion domain





SS3b-B10 fusion
ATGGCCAAAGAAGATACTGTAAAACTAAAAAGGGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTT
943



protein, coding
CAGGAATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAA




sequence
AAGCATTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGAC





GACGATGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACT





TCCGTAATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAA





AGAGATCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTT





AAGAAAGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCT





CTGACGGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGC





GCTATACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGT





TACATCTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTA





TGTTTGGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGA





GCATGGTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACT





TCCTGGGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGC





CACCCGAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTG





TAGAATGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCA





ATAGTTTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGAT





CCAGAAACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATG





GGGCAGTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAA





AAGGCCATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCC





TTGGACCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGC





TTTCAAGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATG





CAGGGTTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGA





AATTGATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGAT





TCAAGAGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCT





TTTAGCACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCAC





TGATGTTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGA





TGCAGTCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACT





AGGTCCTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACT





ACAGTTATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCC





TGGGCATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATG





CAGAAAACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTG





GCATCAAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTG





TAACTTCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAA





AACAAAGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGA





TGGCCTTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATT





GCTAATCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAAT





AAGAACTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAAT





ACTCCCATCGTGTCTTGGATATGTGA




SS3b-B10 fusion
MAKEDTVKLKRGSAAPGSGSGSGMWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENEND
944



protein,
DDEAIAVANNSASKFENARNHFRNNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDV




cucurbitadienol
KKEAVKNTLERALSFYSAVQTSDGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCR




synthase
YIYNHQNEDGGWGLHIEGSSTMFGSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAIT





SWGKLWLSVLGVYEWSGNNPLPPEFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITP





IVLSLRKELYTIPYHEIDWNRSRNTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLRE





KAMKIAMEHIHYEDENSRYICLGPVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRM





QGYNGSQLWDTAFSIQAIISTKLIDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWP





FSTRDHGWLISDCTAEGLKASLMLSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELT





RSYPWLELINPAETFGDIVIDYSYVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENM





QKTDGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQ





NKVYTNLEGNKPHLVNTAWVMMALIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFN





KNCMITYAAYRNIFPIWALGEYSHRVLDM-




SS3b-B10 protein,
MAKEDTVKLKR
945



fusion domain





SS3a-D8 fusion
ATGTCATTTCAAATTGAAACGGTTCGTACTGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAG
947



protein, coding
GAATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAG




sequence
CATTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGAC





GATGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCC





GTAATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGA





GATCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAG





AAAGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTG





ACGGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCT





ATACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTAC





ATCTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGT





TTGGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCA





TGGTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCC





TGGGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCAC





CCGAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAG





AATGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATA





GTTTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCA





GAAACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGG





CAGTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAG





GCCATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTG





GACCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTT





CAAGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAG





GGTTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAAT





TGATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCA





AGAGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTT





AGCACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGA





TGTTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGC





AGTCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGG





TCCTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACA





GTTATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGG





GCATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAG





AAAACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCA





TCAAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAA





CTTCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAAC





AAAGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGG





CCTTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCT





AATCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAG





AACTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACT





CCCATCGTGTCTTGGATATGTGA




SS3a-D8 fusion
MSFQIETVRTGSAAPGSGSGSGMWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDD
948



protein,
DEAIAVANNSASKFENARNHFRNNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVK




cucurbitadienol
KEAVKNTLERALSFYSAVQTSDGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRY




synthase
IYNHQNEDGGWGLHIEGSSTMFGSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITS





WGKLWLSVLGVYEWSGNNPLPPEFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPI





VLSLRKELYTIPYHEIDWNRSRNTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREK





AMKIAMEHIHYEDENSRYICLGPVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQ





GYNGSQLWDTAFSIQAIISTKLIDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPF





STRDHGWLISDCTAEGLKASLMLSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTR





SYPWLELINPAETFGDIVIDYSYVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQ





KTDGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQN





KVYTNLEGNKPHLVNTAWVMMALIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNK





NCMITYAAYRNIFPIWALGEYSHRVLDM




SS3a-D8 protein,
MSFQIETVRT
949



fusion domain





SS3a-A2 fusion
ATGACCGGCTTGAATGGAGATGCTGACAGCGATCTACTAGGAAGTGCAGCTCCTGGAAGTGGAA
951



protein (5'
GTGGTTCAGGAATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTG




fusion), coding
GATTAAAAGCATTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAA




sequence
AACGACGACGATGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAA





ATCACTTCCGTAATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATG





CGAGAAAGAGATCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAA





GATGTTAAGAAAGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTAC





AGACCTCTGACGGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGT





TATTGCGCTATACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATG





TGTCGTTACATCTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTT





CTACTATGTTTGGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGG





CGGTGAGCATGGTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCA





ATAACTTCCTGGGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATC





CATTGCCACCCGAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTG





TCATTGTAGAATGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATC





ACTCCAATAGTTTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGA





ATAGATCCAGAAACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATAT





CCTATGGGGCAGTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTG





AGGGAAAAGGCCATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACA





TATGCCTTGGACCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTC





TGATGCTTTCAAGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATG





AGAATGCAGGGTTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCT





CAACGAAATTGATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAG





TCAGATTCAAGAGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCT





TGGCCTTTTAGCACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAG





CTTCACTGATGTTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTT





ATGTGATGCAGTCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAA





TTAACTAGGTCCTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAA





TTGACTACAGTTATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTT





GCACCCTGGGCATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAG





AATATGCAGAAAACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTT





GGTTTGGCATCAAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAA





GGCTTGTAACTTCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGT





TGCCAAAACAAAGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGG





TCATGATGGCCTTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGC





CAGATTGCTAATCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTG





TTTAATAAGAACTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAG





GTGAATACTCCCATCGTGTCTTGGATATGTGA




SS3a-A2 fusion
MTGLNGDADSDLLGSAAPGSGSGSGMWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENE
952



protein (5'
NDDDEAIAVANNSASKFENARNHFRNNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGE




fusion),
DVKKEAVKNTLERALSFYSAVQTSDGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEM




cucurbitadienol
CRYIYNHQNEDGGWGLHIEGSSTMFGSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATA




synthase
ITSWGKLWLSVLGVYEWSGNNPLPPEFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPI





TPIVLSLRKELYTIPYHEIDWNRSRNTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRL





REKAMKIAMEHIHYEDENSRYICLGPVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGM





RMQGYNGSQLWDTAFSIQAIISTKLIDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGA





WPFSTRDHGWLISDCTAEGLKASLMLSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYE





LTRSYPWLELINPAETFGDIVIDYSYVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLE





NMQKTDGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLS





CQNKVYTNLEGNKPHLVNTAWVMMALIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGV





FNKNCMITYAAYRNIFPIWALGEYSHRVLDM-




SS3a-A2 protein
MTGLNGDADSDLL




(5' fusion),
953




fusion domain





SS3f-A8 fusion
ATGGCAAGTAACCAGCTCGAGCCCCTGCAAACTGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTT
955



protein (5'
CAGGAATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAA




fusion), coding
AAGCATTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGAC




sequence
GACGATGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACT





TCCGTAATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAA





AGAGATCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTT





AAGAAAGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCT





CTGACGGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGC





GCTATACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGT





TACATCTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTA





TGTTTGGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGA





GCATGGTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACT





TCCTGGGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGC





CACCCGAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTG





TAGAATGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCA





ATAGTTTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGAT





CCAGAAACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATG





GGGCAGTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAA





AAGGCCATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCC





TTGGACCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGC





TTTCAAGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATG





CAGGGTTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGA





AATTGATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGAT





TCAAGAGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCT





TTTAGCACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCAC





TGATGTTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGA





TGCAGTCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACT





AGGTCCTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACT





ACAGTTATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCC





TGGGCATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATG





CAGAAAACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTG





GCATCAAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTG





TAACTTCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAA





AACAAAGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGA





TGGCCTTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATT





GCTAATCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAAT





AAGAACTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAAT





ACTCCCATCGTGTCTTGGATATGTGA




SS3f-A8 fusion
MASNQLEPLQTGSAAPGSGSGSGMWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENEND
956



protein (5'
DDEAIAVANNSASKFENARNHFRNNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDV




fusion),
KKEAVKNTLERALSFYSAVQTSDGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCR




cucurbitadienol
YIYNHQNEDGGWGLHIEGSSTMFGSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAIT




synthase
SWGKLWLSVLGVYEWSGNNPLPPEFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITP





IVLSLRKELYTIPYHEIDWNRSRNTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLRE





KAMKIAMEHIHYEDENSRYICLGPVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRM





QGYNGSQLWDTAFSIQAIISTKLIDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWP





FSTRDHGWLISDCTAEGLKASLMLSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELT





RSYPWLELINPAETFGDIVIDYSYVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENM





QKTDGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQ





NKVYTNLEGNKPHLVNTAWVMMALIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFN





KNCMITYAAYRNIFPIWALGEYSHRVLDM-




SS3f-A8 protein
MASNQLEPLQT
957



(5'
fusion),




fusion domain





SS4b-B8b fusion
ATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCA
959



protein, coding
TTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGA




sequence
TGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGT





AATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGA





TCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAA





AGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGAC





GGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTAT





ACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACAT





CTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTT





GGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATG





GTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTG





GGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCC





GAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAA





TGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGT





TTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCAGA





AACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCA





GTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGC





CATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGA





CCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCA





AGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGG





TTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAATTG





ATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAG





AGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAG





CACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATG





TTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAG





TCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTC





CTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACAGT





TATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGC





ATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAA





AACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATC





AAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACT





TCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAA





AGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGGCC





TTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAA





TCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAA





CTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACTCC





CATCGTGTCTTGGATATGGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAGGAATGAATTTAG





ATTTAGATCAAGATTCAGACTAG




SS4b-B8b fusion
MWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDEAIAVANNSASKFENARNHFR
960



protein,
NNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKEAVKNTLERALSFYSAVQTSD




cucurbitadienol
GNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYNHQNEDGGWGLHIEGSSTMF




synthase
GSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPP





EFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRSR





NTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAMKIAMEHIHYEDENSRYICLG





PVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGYNGSQLWDTAFSIQAIISTKL





IDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLM





LSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDYS





YVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKTDGSWYGCWGVCFTYAGWFGI





KGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVMMA





LIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYS





HRVLDMGSAAPGSGSGSGMNLDLDQDSD-




SS4b-B8b fusion
MNLDLDQDSD
961



protein, fusion





domain





SS4b-B8a fusion
ATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCA
963



protein, coding
TTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGA




sequence
TGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGT





AATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGA





TCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAA





AGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGAC





GGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTAT





ACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACAT





CTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTT





GGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATG





GTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTG





GGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCC





GAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAA





TGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGT





TTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCAGA





AACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCA





GTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGC





CATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGA





CCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCA





AGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGG





TTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAATTG





ATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAG





AGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAG





CACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATG





TTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAG





TCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTC





CTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACAGT





TATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGC





ATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAA





AACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATC





AAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACT





TCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAA





AGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGGCC





TTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAA





TCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAA





CTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACTCC





CATCGTGTCTTGGATATGGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAGGAATGATAAAAC





ATATAGTTTCGCCATTCAGGACGAATTTTGTTGGCATCAGCAAGTCCGTGCTGTCAAGGATGAT





TCATCACAAGGTTACAATCATAGGTTCTGGCCCCGCTGCCCACACCGCTGCTATATACTTGGCA





AGAGCAGAGATGAAGCCCACATTATATGAGGGAATGATGGCCAACGGAATTGCTGCTGGTGGCC





AATTGACAACAACCACCGATATCGAAAATTTCCCAGGGTTTCCTGAATCGTTGAGTGGCAGTGA





ACTGATGGAGAGGATGAGGAAACAATCTGCCAAGTTTGGCACTAACATAATTATCGAGACTGTC





TCTAAAGTCGATTTATCTTCAAAACCATTCAGATTATGGACCGAATTTAATGAGGATGCAGAGC





CTGTGACCACTGATGCTATAATCTTGGCCACGGGTGCTTCCGCTAAGAGAATGCATTTACCAGG





GGAGGAAACCTACTGGCAGCAGGGAATATCTGCCTGTGCTGTATGTGATGGTGCAGTCCCTATC





TTTAGAAACAAGCCATTGGCCGTTATTGGTGGTGGTGACTCTGCGTGTGAGGAAGCGGAATTTC





TTACGAAGTATGCGTCGAAAGTATATATATTAGTAAGAAAGGATCATTTTCGTGCATCTGTAAT





AATGCAGAGACGAATTGAGAAAAATCCAAACATCATTGTTTTGTTCAACACAGTTGCATTAGAA





GCTAAGGGTGATGGTAAGTTATTGAATATGTTGAGAATTAAGAATACTAAAAGTAATGTGGAGA





ACGATTTAGAAGTAAATGGACTATTTTACGCAATAGGTCACAGCCCTGCCACAGATATAGTTAA





AGGACAAGTAGATGAAGAAGAGACGGGGTATATAAAAACTGTGCCTGGATCGTCTCTGACTTCT





GTGCCAGGTTTTTTTGCTGCAGGTGACGTTCAGGACTCTAGGTATAGACAAGCAGTTACTTCTG





CTGGTTCCGGATGCATTGCTGCTTTGGATGCAGAACGGTACCTAAGTGCCCAAGAGTAA




SS4b-B8a fusion
MWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDEAIAVANNSASKFENARNHFR
964



protein,
NNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKEAVKNTLERALSFYSAVQTSD




cucurbitadienol
GNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYNHQNEDGGWGLHIEGSSTMF




synthase
GSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPP





EFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRSR





NTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAMKIAMEHIHYEDENSRYICLG





PVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGYNGSQLWDTAFSIQAIISTKL





IDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLM





LSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDYS





YVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKTDGSWYGCWGVCFTYAGWFGI





KGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVMMA





LIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYS





HRVLDMGSAAPGSGSGSGMIKHIVSPFRTNFVGISKSVLSRMIHHKVTIIGSGPAAHTAAIYLA





RAEMKPTLYEGMMANGIAAGGQLTTTTDIENFPGFPESLSGSELMERMRKQSAKFGTNIIIETV





SKVDLSSKPFRLWTEFNEDAEPVTTDAIILATGASAKRMHLPGEETYWQQGISACAVCDGAVPI





FRNKPLAVIGGGDSACEEAEFLTKYASKVYILVRKDHFRASVIMQRRIEKNPNIIVLFNTVALE





AKGDGKLLNMLRIKNTKSNVENDLEVNGLFYAIGHSPATDIVKGQVDEEETGYIKTVPGSSLTS





VPGFFAAGDVQDSRYRQAVTSAGSGCIAALDAERYLSAQE-




SS4b-D4 fusion
ATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCA
966



protein, coding
TTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGA




sequence
TGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGT





AATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGA





TCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAA





AGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGAC





GGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTAT





ACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACAT





CTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTT





GGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATG





GTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTG





GGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCC





GAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAA





TGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGT





TTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCAGA





AACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCA





GTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGC





CATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGA





CCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCA





AGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGG





TTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAATTG





ATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAG





AGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAG





CACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATG





TTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAG





TCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTC





CTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACAGT





TATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGC





ATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAA





AACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATC





AAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACT





TCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAA





AGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGGCC





TTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAA





TCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAA





CTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACTCC





CATCGTGTCTTGGATATGGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAGGAATGGCCGTAC





AGAACCATATCTTGCCTCTAA




SS4b-D4 fusion
MWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDEAIAVANNSASKFENARNHFR
967



protein,
NNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKEAVKNTLERALSFYSAVQTSD




cucurbitadienol
GNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYNHQNEDGGWGLHIEGSSTMF




synthase
GSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPP





EFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRSR





NTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAMKIAMEHIHYEDENSRYICLG





PVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGYNGSQLWDTAFSIQAIISTKL





IDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLM





LSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDYS





YVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKTDGSWYGCWGVCFTYAGWFGI





KGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVMMA





LIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYS





HRVLDMGSAAPGSGSGSGMAVQNHILPLTRVM-




SS4b-D4 fusion
MAVQNHILPLTRVM




protein, fusion
968




domain





SS4c-C4a fusion
ATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCA
970



protein, coding
TTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGA




sequence
TGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGT





AATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGA





TCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAA





AGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGAC





GGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTAT





ACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACAT





CTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTT





GGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATG





GTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTG





GGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCC





GAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAA





TGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGT





TTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCAGA





AACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCA





GTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGC





CATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGA





CCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCA





AGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGG





TTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAATTG





ATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAG





AGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAG





CACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATG





TTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAG





TCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTC





CTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACAGT





TATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGC





ATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAA





AACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATC





AAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACT





TCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAA





AGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGGCC





TTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAA





TCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAA





CTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACTCC





CATCGTGTCTTGGATATGGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAGGAATGTCTGCTT





CAACTCATTCGCCTGAATAACCGTGTCTGA




SS4c-C4a fusion
MWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDEAIAVANNSASKFENARNHFR
971



protein,
NNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKEAVKNTLERALSFYSAVQTSD




cucurbitadienol
GNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYNHQNEDGGWGLHIEGSSTMF




synthase
GSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPP





EFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRSR





NTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAMKIAMEHIHYEDENSRYICLG





PVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGYNGSQLWDTAFSIQAIISTKL





IDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLM





LSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDYS





YVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKTDGSWYGCWGVCFTYAGWFGI





KGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVMMA





LIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYS





HRVLDMGSAAPGSGSGSGMSASTHSPE-PCL




SS4c-C4a fusion
MSASTHSPE
972



protein, fusion





domain





SS4c-C4b fusion
ATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCA
974



protein, coding
TTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGA




sequence
TGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGT





AATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGA





TCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAA





AGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGAC





GGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTAT





ACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACAT





CTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTT





GGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATG





GTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTG





GGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCC





GAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAA





TGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGT





TTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCAGA





AACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCA





GTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGC





CATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGA





CCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCA





AGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGG





TTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAATTG





ATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAG





AGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAG





CACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATG





TTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAG





TCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTC





CTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACAGT





TATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGC





ATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAA





AACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATC





AAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACT





TCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAA





AGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGGCC





TTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAA





TCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAA





CTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACTCC





CATCGTGTCTTGGATATGGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAGGAATGGGTACTA





GCATAGTAAATCTAAACCAAAAGATTGAACTGCCCCCAATCCAGGTCTTATTCGAGTCACTTAA





CCGAGAAAATGAAACAAAACCCCACTTCGAGGAACGCAGGTTATATCAACCTAATCCTTCATTT





GTTCCTAGAACAAATATAGCAGTTGGTAGCCCAGTTAACCCGGTTCCAGTATCATCCCCTGTTT





TTTTCATTGGTCCTTCTCCACAGAGAAGCATTCAGAATCACAACGCTATTATGACTCAAAACAT





ACGTCAGTATCCAGTTATATATAATAATAACCGAGAAGTTATATCTACGGGTGAGAGAAATTAC





ATAATAACTGTAGGGGGGCCTCCGGTAACTTCTTCGCAGCCCGAGTATGAGCATATCTCAACTC





CCAATTTCTATCAAGAGCAGAGACTGGCACAACCTCATCCGGTAAATGAGAGTATGATGATAGG





TGGTTATACAAATCCTCAGCCTATTAGCATTTCCCGAGGTAAAATGCTATCCGGCAACATAAGT





ACGAACTCGGTCCGCGGATCTAATAATGGATATTCCGCAAAAGAAAAAAAACATAAGGCACATG





GTAAGAGGTCCAATTTACCAAAGGCCACCGTTTCAATTCTAAACAAATGGTTACATGAGCACGT





AAACAACCCTTACCCAACCGTGCAGGAAAAAAGAGAACTGCTCGCGAAAACTGGTCTAACTAAA





CTTCAAATTTCCAATTGGTTCATTAATGCTAGGAGAAGAAAAATATTTTCTGGCCAGAATGACG





CAAATAATTTCAGAAGAAAATTCAGTTCTTCTACAAATTTAGCTAAGTTCTGA




SS4c-C4b fusion
MWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDEAIAVANNSASKFENARNHFR
975



protein,
NNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKEAVKNTLERALSFYSAVQTSD




cucurbitadienol
GNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYNHQNEDGGWGLHIEGSSTMF




synthase
GSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPP





EFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRSR





NTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAMKIAMEHIHYEDENSRYICLG





PVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGYNGSQLWDTAFSIQAIISTKL





IDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLM





LSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDYS





YVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKTDGSWYGCWGVCFTYAGWFGI





KGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVMMA





LIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYS





HRVLDMGSAAPGSGSGSGMGTSIVNLNQKIELPPIQVLFESLNRENETKPHFEERRLYQPNPSF





VPRTNIAVGSPVNPVPVSSPVFFIGPSPQRSIQNHNAIMTQNIRQYPVIYNNNREVISTGERNY





IITVGGPPVTSSQPEYEHISTPNFYQEQRLAQPHPVNESMMIGGYTNPQPISISRGKMLSGNIS





TNSVRGSNNGYSAKEKKHKAHGKRSNLPKATVSILNKWLHEHVNNPYPTVQEKRELLAKTGLTK





LQISNWFINARRRKIFSGQNDANNFRRKFSSSTNLAKF-




SS4e-B2 fusion
ATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCA
978



protein, coding
TTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGA




sequence
TGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGT





AATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGA





TCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAA





AGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGAC





GGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTAT





ACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACAT





CTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTT





GGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATG





GTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTG





GGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCC





GAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAA





TGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGT





TTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCAGA





AACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCA





GTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGC





CATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGA





CCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCA





AGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGG





TTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAATTG





ATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAG





AGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAG





CACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATG





TTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAG





TCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTC





CTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACAGT





TATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGC





ATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAA





AACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATC





AAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACT





TCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAA





AGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGGCC





TTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAA





TCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAA





CTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACTCC





CATCGTGTCTTGGATATGGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAGGAATGGACCAGC





CAAGGACCATTTAAGTAA




SS4e-B2 fusion
MWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDEAIAVANNSASKFENARNHFR
979



protein,
NNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKEAVKNTLERALSFYSAVQTSD




cucurbitadienol
GNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYNHQNEDGGWGLHIEGSSTMF




synthase
GSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPP





EFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRSR





NTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAMKIAMEHIHYEDENSRYICLG





PVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGYNGSQLWDTAFSIQAIISTKL





IDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLM





LSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDYS





YVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKTDGSWYGCWGVCFTYAGWFGI





KGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVMMA





LIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYS





HRVLDMGSAAPGSGSGSGMDQPRTI-V




SS4e-B2 fusion
MDQPRTI
980



protein, fusion





domain





SS5a-E8 fusion
ATGACTACGGATAACGCAGCAGCATATTCAGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAG
982



protein, coding
GAATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAG




sequence
CATTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGAC





GATGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCC





GTAATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGA





GATCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAG





AAAGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTG





ACGGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCT





ATACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTAC





ATCTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGT





TTGGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCA





TGGTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCC





TGGGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCAC





CCGAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAG





AATGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATA





GTTTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCA





GAAACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGG





CAGTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAG





GCCATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTG





GACCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTT





CAAGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAG





GGTTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAAT





TGATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCA





AGAGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTT





AGCACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGA





TGTTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGC





AGTCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGG





TCCTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACA





GTTATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGG





GCATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAG





AAAACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCA





TCAAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAA





CTTCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAAC





AAAGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGG





CCTTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCT





AATCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAG





AACTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACT





CCCATCGTGTCTTGGATATGTGA




SS5a-E8 fusion
MTTDNAAAYSGSAAPGSGSGSGMWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDD
983



protein,
DEAIAVANNSASKFENARNHFRNNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVK




cucurbitadienol
KEAVKNTLERALSFYSAVQTSDGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRY




synthase
IYNHQNEDGGWGLHIEGSSTMFGSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITS





WGKLWLSVLGVYEWSGNNPLPPEFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPI





VLSLRKELYTIPYHEIDWNRSRNTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREK





AMKIAMEHIHYEDENSRYICLGPVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQ





GYNGSQLWDTAFSIQAIISTKLIDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPF





STRDHGWLISDCTAEGLKASLMLSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTR





SYPWLELINPAETFGDIVIDYSYVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQ





KTDGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQN





KVYTNLEGNKPHLVNTAWVMMALIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNK





NCMITYAAYRNIFPIWALGEYSHRVLDM-




SS5a-E8 fusion
MTTDNAAAYS
984



protein, fusion





domain





SS5d-E7 fusion
ATGGCTCTCGGTAATGAGATAGCCAACTTGCAAGAGGGAAGTGCAGCTCCTGGAAGTGGAAGTG
986



protein, coding
GTTCAGGAATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGAT




sequence
TAAAAGCATTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAAC





GACGACGATGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATC





ACTTCCGTAATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGA





GAAAGAGATCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGAT





GTTAAGAAAGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGA





CCTCTGACGGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTAT





TGCGCTATACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGT





CGTTACATCTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTA





CTATGTTTGGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGG





TGAGCATGGTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATA





ACTTCCTGGGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCAT





TGCCACCCGAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCA





TTGTAGAATGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACT





CCAATAGTTTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATA





GATCCAGAAACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCT





ATGGGGCAGTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGG





GAAAAGGCCATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATAT





GCCTTGGACCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGA





TGCTTTCAAGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGA





ATGCAGGGTTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAA





CGAAATTGATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCA





GATTCAAGAGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGG





CCTTTTAGCACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTT





CACTGATGTTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATG





TGATGCAGTCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTA





ACTAGGTCCTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTG





ACTACAGTTATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCA





CCCTGGGCATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAAT





ATGCAGAAAACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGT





TTGGCATCAAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGC





TTGTAACTTCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGC





CAAAACAAAGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCA





TGATGGCCTTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAG





ATTGCTAATCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTT





AATAAGAACTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTG





AATACTCCCATCGTGTCTTGGATATGTGA




SS5d-E7 fusion
MALGNEIANLQEGSAAPGSGSGSGMWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENEN
987



protein,
DDDEAIAVANNSASKFENARNHFRNNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGED




cucurbitadienol
VKKEAVKNTLERALSFYSAVQTSDGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMC




synthase
RYIYNHQNEDGGWGLHIEGSSTMFGSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAI





TSWGKLWLSVLGVYEWSGNNPLPPEFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPIT





PIVLSLRKELYTIPYHEIDWNRSRNTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLR





EKAMKIAMEHIHYEDENSRYICLGPVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMR





MQGYNGSQLWDTAFSIQAIISTKLIDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAW





PFSTRDHGWLISDCTAEGLKASLMLSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYEL





TRSYPWLELINPAETFGDIVIDYSYVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLEN





MQKTDGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSC





QNKVYTNLEGNKPHLVNTAWVMMALIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVF





NKNCMITYAAYRNIFPIWALGEYSHRVLDM-




SS5d-E7 fusion
MALGNEIANLQE
988



protein, fusion





domain





SS5d-G5 fusion
ATGCCAGTCTCTGTAATAACCACGTCAACACAGCCACATGTGAAGGAGCCTGTGGAAGAAGAGA
990



protein, coding
GTGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAGGAATGTGGAGACTGAAAGTAGGCAAAGA




sequence
ATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCATTAGTAACCATTTGGGCAGACAAGTC





TGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGATGAAGCAATTGCTGTAGCTAACAATT





CAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGTAATAACAGATTCCATAGGAAGCAATC





TTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGATCATTAGGAATGGTGCTAAGAATGAA





GGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAAAGAAGCCGTAAAAAATACACTAGAAA





GAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGACGGTAATTGGGCATCAGACTTGGGAGG





ACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTATACGTTACAGGCGTTCTTAACAGTGTG





TTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACATCTATAATCACCAAAACGAAGATGGAG





GGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTTGGGTCTGCACTGAATTACGTTGCGTT





AAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATGGTGCGATGACCAAAGCAAGGTCCTGG





ATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTGGGGCAAACTTTGGCTTTCCGTATTAG





GAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCCGAATTTTGGTTACTTCCATATAGCCT





TCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAATGGTTTATCTGCCAATGTCGTATCTT





TATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGTTTTATCGTTGAGAAAAGAGTTATATA





CCATTCCGTACCACGAGATTGATTGGAATAGATCCAGAAACACGTGTGCTAAGGAGGACTTATA





TTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCAGTATCTACCATGTCTACGAACCGCTA





TTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGCCATGAAAATTGCAATGGAACATATCC





ATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGACCCGTGAACAAAGTTCTAAATATGCT





GTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCAAGTTTCACTTGCAGAGAATTCCTGAC





TATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGGTTATAATGGTTCACAGCTTTGGGACA





CTGCATTCAGCATACAAGCGATAATCTCAACGAAATTGATTGATACGTTTGGTCCGACATTACG





TAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAGAGGATTGTCCAGGTGATCCAAACGTA





TGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAGCACTAGAGATCATGGTTGGCTTATAT





CGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATGTTATCTAAGTTGCCTTCTAAAATTGT





GGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAGTCAATGTTTTGTTATCATTGCAAAAC





GAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTCCTATCCCTGGTTAGAGTTGATTAACC





CTGCCGAAACATTTGGTGATATCGTAATTGACTACAGTTATGTGGAATGCACTAGTGCGACGAT





GGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGCATAGAACCAAAGAAATTGATGCCGCT





ATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAAAACAGACGGGTCTTGGTATGGTTGTT





GGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATCAAAGGCTTAGTAGCTGCTGGTAGAAC





ATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACTTCCTGTTGTCAAAAGAATTGCCTGGC





GGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAAAGTTTATACCAATCTGGAGGGAAACA





AGCCACACTTAGTCAATACTGCATGGGTCATGATGGCCTTGATTGAAGCAGGACAAGGGGAAAG





AGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAATCAATAGTCAATTGGAGTCAGGTGAC





TTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAACTGTATGATAACATATGCCGCATACA





GAAACATATTCCCTATATGGGCTCTAGGTGAATACTCCCATCGTGTCTTGGATATGTGA




SS5d-G5 fusion
MPVSVITTSTQPHVKEPVEEESGSAAPGSGSGSGMWRLKVGKESVGEKEEKWIKSISNHLGRQV
991



protein,
WEFCSGENENDDDEAIAVANNSASKFENARNHFRNNRFHRKQSSDLFLAIQCEKEIIRNGAKNE




cucurbitadienol
GTTKVKEGEDVKKEAVKNTLERALSFYSAVQTSDGNWASDLGGPMFLLPGLVIALYVTGVLNSV




synthase
LSKHHRQEMCRYIYNHQNEDGGWGLHIEGSSTMFGSALNYVALRLLGEAADGGEHGAMTKARSW





ILERGGATAITSWGKLWLSVLGVYEWSGNNPLPPEFWLLPYSLPFHPGRMWCHCRMVYLPMSYL





YGKRFVGPITPIVLSLRKELYTIPYHEIDWNRSRNTCAKEDLYYPHPKMQDILWGSIYHVYEPL





FSGWPGKRLREKAMKIAMEHIHYEDENSRYICLGPVNKVLNMLCCWVEDPYSDAFKFHLQRIPD





YLWLAEDGMRMQGYNGSQLWDTAFSIQAIISTKLIDTFGPTLRKAHHFVKHSQIQEDCPGDPNV





WFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLMLSKLPSKIVGEPLEKNRLCDAVNVLLSLQN





ENGGFASYELTRSYPWLELINPAETFGDIVIDYSYVECTSATMEALALFKKLHPGHRTKEIDAA





IAKAANFLENMQKTDGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCVAIRKACNFLLSKELPG





GGWGESYLSCQNKVYTNLEGNKPHLVNTAWVMMALIEAGQGERDPAPLHRAARLLINSQLESGD





FPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYSHRVLDM-




SS5d-G5 fusion
MPVSVITTSTQPHVKEPVEEES
992



protein, fusion





domain





SS5d-G7 fusion
ATGTCATCGAGCAAGAAAATCACCAGTGTCAAACAAGGAAGTGCAGCTCCTGGAAGTGGAAGTG
994



protein, coding
GTTCAGGAATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGAT




sequence
TAAAAGCATTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAAC





GACGACGATGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATC





ACTTCCGTAATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGA





GAAAGAGATCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGAT





GTTAAGAAAGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGA





CCTCTGACGGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTAT





TGCGCTATACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGT





CGTTACATCTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTA





CTATGTTTGGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGG





TGAGCATGGTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATA





ACTTCCTGGGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCAT





TGCCACCCGAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCA





TTGTAGAATGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACT





CCAATAGTTTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATA





GATCCAGAAACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCT





ATGGGGCAGTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGG





GAAAAGGCCATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATAT





GCCTTGGACCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGA





TGCTTTCAAGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGA





ATGCAGGGTTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAA





CGAAATTGATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCA





GATTCAAGAGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGG





CCTTTTAGCACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTT





CACTGATGTTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATG





TGATGCAGTCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTA





ACTAGGTCCTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTG





ACTACAGTTATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCA





CCCTGGGCATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAAT





ATGCAGAAAACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGT





TTGGCATCAAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGC





TTGTAACTTCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGC





CAAAACAAAGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCA





TGATGGCCTTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAG





ATTGCTAATCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTT





AATAAGAACTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTG





AATACTCCCATCGTGTCTTGGATATGTGA




SS5d-G7 fusion
MSSSKKITSVKQGSAAPGSGSGSGMWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENEN
995



protein,
DDDEAIAVANNSASKFENARNHFRNNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGED




cucurbitadienol
VKKEAVKNTLERALSFYSAVQTSDGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMC




synthase
RYIYNHQNEDGGWGLHIEGSSTMFGSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAI





TSWGKLWLSVLGVYEWSGNNPLPPEFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPIT





PIVLSLRKELYTIPYHEIDWNRSRNTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLR





EKAMKIAMEHIHYEDENSRYICLGPVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMR





MQGYNGSQLWDTAFSIQAIISTKLIDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAW





PFSTRDHGWLISDCTAEGLKASLMLSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYEL





TRSYPWLELINPAETFGDIVIDYSYVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLEN





MQKTDGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSC





QNKVYTNLEGNKPHLVNTAWVMMALIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVF





NKNCMITYAAYRNIFPIWALGEYSHRVLDM-




SS5d-G7 fusion
MSSSKKITSVKQ
996



protein, fusion





domain





SS5e-C10 fusion
ATGCGAGGCTTGACACCTAAGGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAGGAATGTGGA
998



protein, coding
GACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCATTAGTAA




sequence
CCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGATGAAGCA





ATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGTAATAACA





GATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGATCATTAG





GAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAAAGAAGCC





GTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGACGGTAATT





GGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTATACGTTAC





AGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACATCTATAAT





CACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTTGGGTCTG





CACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATGGTGCGAT





GACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTGGGGCAAA





CTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCCGAATTTT





GGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAATGGTTTA





TCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGTTTTATCG





TTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCAGAAACACGT





GTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCAGTATCTA





CCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGCCATGAAA





ATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGACCCGTGA





ACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCAAGTTTCA





CTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGGTTATAAT





GGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAATTGATTGATA





CGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAGAGGATTG





TCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAGCACTAGA





GATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATGTTATCTA





AGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAGTCAATGT





TTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTCCTATCCC





TGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACAGTTATGTGG





AATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGCATAGAAC





CAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAAAACAGAC





GGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATCAAAGGCT





TAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACTTCCTGTT





GTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAAAGTTTAT





ACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGGCCTTGATTG





AAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAATCAATAG





TCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAACTGTATG





ATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACTCCCATCGTG





TCTTGGATATGTGA




SS5e-C10 fusion
MRGLTPKGSAAPGSGSGSGMWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDEA
999



protein,
IAVANNSASKFENARNHFRNNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKEA




cucurbitadienol
VKNTLERALSFYSAVQTSDGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYN




synthase
HQNEDGGWGLHIEGSSTMFGSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWGK





LWLSVLGVYEWSGNNPLPPEFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLS





LRKELYTIPYHEIDWNRSRNTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAMK





IAMEHIHYEDENSRYICLGPVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGYN





GSQLWDTAFSIQAIISTKLIDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTR





DHGWLISDCTAEGLKASLMLSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSYP





WLELINPAETFGDIVIDYSYVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKTD





GSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKVY





TNLEGNKPHLVNTAWVMMALIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNCM





ITYAAYRNIFPIWALGEYSHRVLDM-




SS5e-C10 fusion
MRGLTPK
1000



protein, fusion





domain





SS5e-G8 fusion
ATGGCAAACCAAATAGCAAATCAAGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAGGAATGT
1002



protein, coding
GGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCATTAG




sequence
TAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGATGAA





GCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGTAATA





ACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGATCAT





TAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAAAGAA





GCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGACGGTA





ATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTATACGT





TACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACATCTAT





AATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTTGGGT





CTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATGGTGC





GATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTGGGGC





AAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCCGAAT





TTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAATGGT





TTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGTTTTA





TCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCAGAAACA





CGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCAGTAT





CTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGCCATG





AAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGACCCG





TGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCAAGTT





TCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGGTTAT





AATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAATTGATTG





ATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAGAGGA





TTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAGCACT





AGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATGTTAT





CTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAGTCAA





TGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTCCTAT





CCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACAGTTATG





TGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGCATAG





AACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAAAACA





GACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATCAAAG





GCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACTTCCT





GTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAAAGTT





TATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGGCCTTGA





TTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAATCAA





TAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAACTGT





ATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACTCCCATC





GTGTCTTGGATATGTGA




SS5e-G8 fusion
MANQIANQGSAAPGSGSGSGMWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDE
1003



protein,
AIAVANNSASKFENARNHFRNNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKE




cucurbitadienol
AVKNTLERALSFYSAVQTSDGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIY




synthase
NHQNEDGGWGLHIEGSSTMFGSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWG





KLWLSVLGVYEWSGNNPLPPEFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVL





SLRKELYTIPYHEIDWNRSRNTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAM





KIAMEHIHYEDENSRYICLGPVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGY





NGSQLWDTAFSIQAIISTKLIDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFST





RDHGWLISDCTAEGLKASLMLSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSY





PWLELINPAETFGDIVIDYSYVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKT





DGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKV





YTNLEGNKPHLVNTAWVMMALIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNC





MITYAAYRNIFPIWALGEYSHRVLDM-




SS5e-G8 fusion
MANQIANQ
1004



protein, fusion





domain





SS5f-E11 fusion
ATGGTTGACGCTAGGGGTAGCAACGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAGGAATGT
1006



protein, coding
GGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCATTAG




sequence
TAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGATGAA





GCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGTAATA





ACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGATCAT





TAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAAAGAA





GCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGACGGTA





ATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTATACGT





TACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACATCTAT





AATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTTGGGT





CTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATGGTGC





GATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTGGGGC





AAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCCGAAT





TTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAATGGT





TTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGTTTTA





TCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCAGAAACA





CGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCAGTAT





CTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGCCATG





AAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGACCCG





TGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCAAGTT





TCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGGTTAT





AATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAATTGATTG





ATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAGAGGA





TTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAGCACT





AGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATGTTAT





CTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAGTCAA





TGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTCCTAT





CCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACAGTTATG





TGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGCATAG





AACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAAAACA





GACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATCAAAG





GCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACTTCCT





GTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAAAGTT





TATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGGCCTTGA





TTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAATCAA





TAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAACTGT





ATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACTCCCATC





GTGTCTTGGATATGTGA




SS5f-E11 fusion
MVDARGSNGSAAPGSGSGSGMWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDE
1007



protein,
AIAVANNSASKFENARNHFRNNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKE




cucurbitadienol
AVKNTLERALSFYSAVQTSDGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIY




synthase
NHQNEDGGWGLHIEGSSTMFGSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWG





KLWLSVLGVYEWSGNNPLPPEFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVL





SLRKELYTIPYHEIDWNRSRNTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAM





KIAMEHIHYEDENSRYICLGPVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGY





NGSQLWDTAFSIQAIISTKLIDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFST





RDHGWLISDCTAEGLKASLMLSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSY





PWLELINPAETFGDIVIDYSYVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKT





DGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKV





YTNLEGNKPHLVNTAWVMMALIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNC





MITYAAYRNIFPIWALGEYSHRVLDM-




SS5f-E11 fusion
MVDARGSN
1008



protein, fusion





domain





SS5f-F8 fusion
ATGCACGGCAAAGAGTTGGCTGGGCTAGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAGGAA
1010



protein, coding
TGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCAT




sequence
TAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGAT





GAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGTA





ATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGAT





CATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAAA





GAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGACG





GTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTATA





CGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACATC





TATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTTG





GGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATGG





TGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTGG





GGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCCG





AATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAAT





GGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGTT





TTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCAGAA





ACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCAG





TATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGCC





ATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGAC





CCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCAA





GTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGGT





TATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAATTGA





TTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAGA





GGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAGC





ACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATGT





TATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAGT





CAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTCC





TATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACAGTT





ATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGCA





TAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAAA





ACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATCA





AAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACTT





CCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAAA





GTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGGCCT





TGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAAT





CAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAAC





TGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACTCCC





ATCGTGTCTTGGATATGTGA




SS5f-F8 fusion
MHGKELAGLGSAAPGSGSGSGMWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDD
1011



protein,
EAIAVANNSASKFENARNHFRNNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKK




cucurbitadienol
EAVKNTLERALSFYSAVQTSDGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYI




synthase
YNHQNEDGGWGLHIEGSSTMFGSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSW





GKLWLSVLGVYEWSGNNPLPPEFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIV





LSLRKELYTIPYHEIDWNRSRNTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKA





MKIAMEHIHYEDENSRYICLGPVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQG





YNGSQLWDTAFSIQAIISTKLIDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFS





TRDHGWLISDCTAEGLKASLMLSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRS





YPWLELINPAETFGDIVIDYSYVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQK





TDGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNK





VYTNLEGNKPHLVNTAWVMMALIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKN





CMITYAAYRNIFPIWALGEYSHRVLDM-




SS5f-F8 fusion
MHGKELAGL
1012



protein, fusion





domain





EXG1 YLR300W
MLSLKTLLCTLLTVSSVLATPVPARDPSSIQFVHEENKKRYYDYDHGSLGEPIRGVNIGGWLLL
1013




Saccharomyces

EPYITPSLFEAFRTNDDNDEGIPVDEYHFCQYLGKDLAKSRLQSHWSTFYQEQDFANIASQGFN





cerevisiae

LVRIPIGYWAFQTLDDDPYVSGLQESYLDQAIGWARNNSLKVWVDLHGAAGSQNGFDNSGLRDS





YKFLEDSNLAVTTNVLNYILKKYSAEEYLDTVIGIELINEPLGPVLDMDKMKNDYLAPAYEYLR





NNIKSDQVIIIHDAFQPYNYWDDFMTENDGYWGVTIDHHHYQVFASDQLERSIDEHIKVACEWG





TGVLNESHWTVCGEFAAALTDCTKWLNSVGFGARYDGSWVNGDQTSSYIGSCANNDDIAYWSDE





RKENTRRYVEAQLDAFEMRGGWIIWCYKTESSLEWDAQRLMFNGLFPQPLTDRKYPNQCGTISN




EXG1
MKLTKLVALAGAALASPIQLVPREGSFLGFNYGSEKVHGVNLGGWFVLEPFITPSLFEAFGNND
1014




Yarrowia

ANVPVDEYHYTAWLGKEEAEKRLTDHWNTWITEYDIKAIAENYKLNLVRIPIGYWAFSLLPNDP





lipolytica

YVQGQEAYLDRALGWCRKYGVKAWVDVHGVPGSQNGFDNSGLRDHWDWPNADNVQHSINVINYI





AGKYGAPEYNDIVVGIELVNEPLGPAIGMEVIEKYFQEGFWTVRHAGSDTAVVIHDAFQEKNYF





NNFMTTEQGFWNVVLDHHQYQVFSPGELARNIDQHIAEVCNVGRQASTEYHWRIFGEWSAALTD





CTHWLNGVGKGPRLDGSFPGSYYQRSCQGRGDIQTWSEQDKQESRRYVEAQLDAWEHGGDGWIY





WTYKTENALEWDFRRLVDNGIFPFPYWDRQFPNQCGF




Glucan 1, 4, alpha
MPRLSYALCALSLGHAAIAAPQLSARATGSLDSWLGTETTVALNGILANIGADGAYAKSAKPGI
1015



glucosidase
IIASPSTSEPDYYYTWTRDAALVTKVLVDLFRNGNLGLQKVITEYVNSQAYLQTVSNPSGGLAS





GGLAEPKYNVDMTAFTGAWGRPQRDGPALRATALIDFGNWLIDNGYSSYAVNNIWPIVRNDLSY





VSQYWSQSGFDLWEEVNSMSFFTVAVQHRALVEGSTFAKRVGASCSWCDSQAPQILCYMQSFWT





GSYINANTGGGRSGKDANTVLASIHTFDPEAGCDDTTFQPCSPRALANHKVYTDSFRSVYAINS





GIPQGAAVSAGRYPEDVYYNGNPWFLTTLAAAEQLYDAIYQWKKIGSISITSTSLAFFKDIYSS





AAVGTYASSTSTFTDIINAVKTYADGYVSIVQAHAMNNGSLSEQFDKSSGLSLSARDLTWSYAA





FLTANMRRNGVVPAPWGAASANSVPSSCSMGSATGTYSTATATSWPSTLTSGSPGSTTTVGTTT





STTSGTAAETACATPTAVAVTFNEIATTTYGENVYIVGSISELGNWDTSKAVALSASKYTSSNN





LWYVSVTLPAGTTFEYKYIRKESDGSIVWESDPNRSYTVPAACGVSTATENDTWQ




polygalacturonase
MHLNTTLLVSLALGAASVLASPAPPAITAPPTAEEIAKRATTCTFSGSNGASSASKSKTSCSTI
1016



1 [Aspergillus
VLSNVAVPSGTTLDLTKLNDGTHVIFSGETTFGYKEWSGPLISVSGSDLTITGASGHSINGDGS





aculeatus]

RWWDGEGGNGGKTKPKFFAAHSLTNSVISGLKIVNSPVQVFSVAGSDYLTLKDITIDNSDGDDN




CAE46193.1
GGHNTDAFDIGTSTYVTISGATVYNQDDCVAVNSGENIYFSGGYCSGGHGLSIGSVGGRSDNTV




GI:34366090
KNVTFVDSTIINSDNGVRIKTNIDTTGSVSDVTYKDITLTSIAKYGIVVQQNYGDTSSTPTTGV




polygalacturonase
PITDFVLDNVHGSVVSSGTNILISCGSGSCSDWTWTDVSVSGGKTSSKCTNVPSGASC
1017



2 [Aspergillus
MHSFQLLGLAAVGSVVSAAPTASRVSDLVKKSSSTCTFTSASEASETSSSCSNVVLSNIEVPAG





aculeatus]

ETLDLSDAADGATITFEGTTSFGYEEWDGPLIRFGGKQLTITQSDGAVIDGDGSRWWDSEGTNG




CAE46194.1
GKTKPKFMYVHDVEDSTIKGLQIKNTPVQAISVQATNVYLTDITIDNSDGDDNGGHNTDGFDIS




GI:34366092
ESTGVYISGATVKNQDDCIAINSGENILFTGGTCSGGHGLSIGSVGGRDDNTVKNVTISDSTVT





DSANGVRIKTIYGDTGDVSEITYSNIQLSGITDYGIVIEQDYENGSPTGTPSTGVPITDVTVDG





VTGSIEDDAVQVYILCGDGSCSDWTWSGVDITGGETSSDCENVPSGASC




polygalacturonase
MVRQLALACGLLAAVAVQAAPAEPAHPMVTEAPDASLLHKRATTCTFSGSEGASKVSKSKTACS
1018



3 [Aspergillus
TIYLSALAVPSGTTLDLKDLNDGTHVIFEGETTFGYEEWEGPLVSVSGTDITVEGASGAVLNGD





aculeatus]

GSRWWDGEGGNGGKTKPKFFAAHDLTSSTIKSIYIENSPVQVFSIDGATDLTLTDITIDNTDGD




CAE46195.1
TDDLAANTDGFDIGESTDITITGAKVYNQDDCVAINSGENIYFSASVCSGGHGLSIGSVGGRDD




GI:34366094
NTVKNVTFYDVNVLKSQQAIRIKAIYGDTGSISDITYHEIAFSDATDYGIVIEQNYDDTSKTPT





TGVPITDFTLENVIGTCADDDCTEVYIACGSGACSDWSWSSVSVTGGKVSSKCLNVPSGISCDL




Chain A, Crystal
ATTCTFSGSNGASSASKSKTSCSTIVLSNVAVPSGTTLDLTKLNDGTHVIFSGETTFGYKEWSG
1019



Structure Of
PLISVSGSDLTITGASGHSINGDGSRWWDGEGGNGGKTKPKFFAAHSLTNSVISGLKIVNSPVQ




Polygalacturonase
VFSVAGSDYLTLKDITIDNSDGDDNGGHNTDAFDIGTSTYVTISGATVYNQDDCVAVNSGENIY




From Aspergillus
FSGGYCSGGHGLSIGSVGGRSDNTVKNVTFVDSTIINSDNGVRIKTNIDTTGSVSDVTYKDITL





Aculeatus At Ph4.5

TSIAKYGIVVQQNYGDTSSTPTTGVPITDFVLDNVHGSVVSSGTNILISCGSGSCSDWTWTDVS




1IB4_A GI:15988280
VSGGKTSSKCTNVPSGASC




Chain B, Crystal
ATTCTFSGSNGASSASKSKTSCSTIVLSNVAVPSGTTLDLTKLNDGTHVIFSGETTFGYKEWSG
1020



Structure Of
PLISVSGSDLTITGASGHSINGDGSRWWDGEGGNGGKTKPKFFAAHSLTNSVISGLKIVNSPVQ




Polygalacturonase
VFSVAGSDYLTLKDITIDNSDGDDNGGHNTDAFDIGTSTYVTISGATVYNQDDCVAVNSGENIY




From Aspergillus
FSGGYCSGGHGLSIGSVGGRSDNTVKNVTFVDSTIINSDNGVRIKTNIDTTGSVSDVTYKDITL





aculeatus At Ph4.5

TSIAKYGIVVQQNYGDTSSTPTTGVPITDFVLDNVHGSVVSSGTNILISCGSGSCSDWTWTDVS




1IB4_B GI:15988281
VSGGKTSSKCTNVPSGASC




Chain A,
ATTCTFSGSNGASSASKSKTSCSTIVLSNVAVPSGTTLDLTKLNDGTHVIFSGETTFGYKEWSG
1021



Polygalacturonase
PLISVSGSDLTITGASGHSINGDGSRWWDGEGGNGGKTKPKFFAAHSLTNSVISGLKIVNSPVQ




From Aspergillus
VFSVAGSDYLTLKDITIDNSDGDDNGGHNTDAFDIGTSTYVTISGATVYNQDDCVAVNSGENIY





Aculeatus

FSGGYCSGGHGLSIGSVGGRSDNTVKNVTFVDSTIINSDNGVRIKTNIDTTGSVSDVTYKDITL




1IA5_A GI:15988279
TSIAKYGIVVQQNYGDTSSTPTTGVPITDFVLDNVHGSVVSSGTNILISCGSGSCSDWTWTDVS





VSGGKTSSKCTNVPSGASC




polygalacturonase
MHLNTTLLVSLALGAASVLASPAPPAITAPPTAEEIAKRATTCTFSGSNGASSASKSKTSCSTI
1022



precursor
VLSNVAVPSGTTLDLTKLNDGTHVIFSGETTFGYKEWSGPLISVSGSDLTITGASGHSINGDGS




[Aspergillus






aculeatus]

RWWDGEGGNGGKTKPKFFAAHSLTNSVISGLKIVNSPVQVFSVAGSDYLTLKDITIDNSDGDDN




378 aa protein
GGHNTDAFDIGTSTYVTISGATVYNQDDCVAVNSGENIYFSGGYCSGGHGLSIGSVGGRSDNTV




AAC23565.1
KNVTFVDSTIINSDNGVRIKTNIDTTGSVSDVTYKDITLTSIAKYGIVVQQNYGDTSSTPTTGV




GI:3220207
PITDFVLDNVHGSVVSSGTNILISCGSGSCSDWTWTDVSVSGGKTSSKCTNVPSGASC




EXG2 YDR261C
MPLKSFFFSAFLVLCLSKFTQGVGTTEKEESLSPLELNILQNKFASYYANDTITVKGITIGGWL
1023




Saccharomyces

VTEPYITPSLYRNATSLAKQQNSSSNISIVDEFTLCKTLGYNTSLTLLDNHFKTWITEDDFEQI





cerevisiae

KTNGFNLVRIPIGYWAWKQNTDKNLYIDNITFNDPYVSDGLQLKYLNNALEWAQKYELNVWLDL





HGAPGSQNGFDNSGERILYGDLGWLRLNNTKELTLAIWRDMFQTFLNKGDKSPVVGIQIVNEPL





GGKIDVSDITEMYYEAFDLLKKNQNSSDNTTFVIHDGFQGIGHWNLELNPTYQNVSHHYFNLTG





ANYSSQDILVDHHHYEVFTDAQLAETQFARIENIINYGDSIHKELSFHPAVVGEWSGAITDCAT





WLNGVGVGARYDGSYYNTTLFTTNDKPVGTCISQNSLADWTQDYRDRVRQFIEAQLATYSSKTT





GWIFWNWKTEDAVEWDYLKLKEANLFPSPFDNYTYFKADGSIEEKFSSSLSAQAFPRTTSSVLS





STTTSRKSKNAAISNKLTTSQLLPIKNMSLTWKASVCALAITIAALCASL









While the invention has been described with reference to the specific embodiments thereof, it should be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the true spirit and scope of the invention. This includes embodiments which do not provide all of the benefits and features set forth herein. In addition, many modifications may be made to adapt a particular situation, material, composition of matter, process, process step or steps, to the objective, spirit and scope of the present invention. All such modifications are intended to be within the scope of the claims appended hereto. Accordingly, the scope of the invention is defined only by reference to the appended claims.

Claims
  • 1. A method of producing Compound 1 having the structure of:
  • 2. The method of claim 1, wherein the mogroside IIIE is contacted with a recombinant host cell that comprises the gene comprising the nucleic acid sequence having at least 90% sequence identity to any one of SEQ ID NOs: 104 and 105 encoding the enzyme having dextransucrase activity.
  • 3. The method of claim 2, wherein the mogroside IIIE is present in and/or produced by the recombinant host cell.
RELATED APPLICATIONS

The present application claims priority under 35 U.S.C. § 119(e) to U.S. Provisional Application Nos. 62/501,018, filed on May 3, 2017 and 62/551,750, filed on Aug. 29, 2017. The content of each of these related applications is incorporated herein by reference in its entirety.

US Referenced Citations (36)
Number Name Date Kind
6468576 Sher et al. Oct 2002 B1
7659097 Renninger et al. Feb 2010 B2
8236512 Zhao et al. Aug 2012 B1
8357527 Ubersax Jan 2013 B2
8367395 Bailey et al. Feb 2013 B2
8415136 Gardner et al. Apr 2013 B1
8470568 Walker et al. Jun 2013 B2
8481286 Julien et al. Jul 2013 B2
8519204 Ohler et al. Aug 2013 B2
8586814 Fisher et al. Nov 2013 B2
8603800 Gardner et al. Dec 2013 B2
8609371 Julien et al. Dec 2013 B2
8753842 Julien et al. Jun 2014 B2
8859261 Gardner et al. Oct 2014 B2
9200296 Renninger et al. Dec 2015 B2
9410214 Hawkins et al. Aug 2016 B2
9540662 Walker et al. Jan 2017 B2
20050084506 Tachdjian Apr 2005 A1
20060228454 Ackill et al. Oct 2006 A1
20060263411 Tachdjian et al. Nov 2006 A1
20070116690 Yang et al. May 2007 A1
20090111834 Tachdjian et al. Apr 2009 A1
20090137014 Tsuruta et al. May 2009 A1
20090220662 Tachdjian et al. Sep 2009 A1
20100151519 Julien et al. Jun 2010 A1
20100151555 Julien et al. Jun 2010 A1
20120201763 Tachdjian et al. Aug 2012 A1
20120226047 Shigemura et al. Sep 2012 A1
20150064743 Liu et al. Mar 2015 A1
20150093339 Tachdjian et al. Apr 2015 A1
20150225754 Tange et al. Aug 2015 A1
20170029458 Siems et al. Feb 2017 A1
20170119032 Patron et al. May 2017 A1
20170145429 Walker et al. May 2017 A1
20170283844 Itkin et al. Oct 2017 A1
20180020709 Markosyan Jan 2018 A1
Foreign Referenced Citations (38)
Number Date Country
105039274 Nov 2015 CN
2 783 009 Apr 2016 EP
WO 05015158 Feb 2005 WO
WO 05041684 May 2005 WO
WO 06084186 Aug 2006 WO
WO 06138512 Dec 2006 WO
WO 07124152 Nov 2007 WO
WO 08154221 Dec 2008 WO
WO 09023975 Feb 2009 WO
WO 09100333 Aug 2009 WO
WO 09111447 Sep 2009 WO
WO 10014666 Feb 2010 WO
WO 10014813 Feb 2010 WO
WO 11112892 Sep 2011 WO
WO 11123693 Oct 2011 WO
WO 12021837 Feb 2012 WO
WO 12061698 May 2012 WO
WO 13025560 Feb 2013 WO
WO 13096420 Jun 2013 WO
WO 14025706 Feb 2014 WO
WO 14027118 Feb 2014 WO
WO 14086842 Jun 2014 WO
WO 14130513 Aug 2014 WO
WO 14140634 Sep 2014 WO
WO 14130582 Oct 2014 WO
WO 14086842 Jun 2015 WO
WO 15082012 Jun 2015 WO
WO 15168779 Jun 2015 WO
WO 16038617 Mar 2016 WO
WO 16050890 Apr 2016 WO
WO2016060276 Apr 2016 WO
WO 16073251 May 2016 WO
WO 16130609 Aug 2016 WO
WO 17044659 Mar 2017 WO
WO 17172766 Oct 2017 WO
WO 17176873 Oct 2017 WO
WO 18016483 Jan 2018 WO
WO 18229283 Dec 2018 WO
Non-Patent Literature Citations (88)
Entry
US 8,486,659 B2, 07/2013, Julien et al. (withdrawn)
Devos et al., Proteins: Structure, Function and Genetics, 2000, vol. 41: 98-107.
Whisstock et al., Quarterly Reviews of Biophysics 2003, vol. 36 (3): 307-340.
Witkowski et al., Biochemistry 38:11643-11650, 1999.
Kisselev L., Structure, 2002, vol. 10: 8-9.
Str.PDF-15969616, 2020, pp. 1-136.
Akihisa et al., 2007, Cucurbitane glycosides from the fruits of siraitia grosvenorii and their inhibitory effects on Epstein-Barr virus activation, J. Nat. Prod., 70:783-788.
Chaturvedula et al., 2011, Enzymatic and acid hydrolysis of steviol and cucurbitane glycosides, Int. J. Pharm. Biomed. Res., 2(2):135-139.
Chen et al., 2018, Kumada arylation of secondary amides enabled by chromium catalysis for unsymmetric ketone synthesis under mild conditions, ACS Catalysis, 8:5864-5868.
Chen et al., Jan. 2005, Cucurbitacins and cucurbitane glycosides: structures and biological activities, Natural Product Reports, 22(3), 14 pp.
Jia et al., 2009, A minor, sweet cucurbitane glycoside from siraitia grosvenorii, Natural Product Communications, 4(6):769-772.
Li et al., 2006, Cucurbitane glycosides from unripe fruits of Lo Han Kuo (Siraitia grosvenori), Chem. Pharm. Bull, 54(10):1425-1428.
Li et al., 2007, Cucurbitane glycosides from unripe fruits of siraitia grosvenori, Chem. Pharm. Bull. 55(7):1082-1086.
Li et al., 2014, Chemistry and pharmacology of siraitia grosvenorii: a review, Chinese Journal of Natural Medicines, 12(2):89-102.
Li et al., 2017, Cucurbitane glycosides from the fruit of siraitia grosvenori and their effects on glucose uptake in human HepG2 cells in vitro, Food Chemistry, 228:567-573.
Matsumoto et al., 1990, Minor cucurbitane-glycosides from fruits of siraitia grosvenori (cucurbitaceae), Chem. Pharm. Bull., 38(7):2030-2032.
Prakash et al., 2014, Additional new minor cucurbitane glycosides from Siraitia grosvenorii, Molecules, 19:3669-3680.
Prakash et al., Jan. 2011, Comparative phytochemical studies of the commercial extracts of Siraitia grosvenorii, Journal of Pharmacy Research, 4(9):3166-3167.
Shen et al., 2014, Rapid identification and quantification of five major mogrosides in siraitia grosvenorii (Luo-Han-Guo) by high performance liquid chromatography-triple quadrupole linear trap tandem mass spectrometry combined with microwave-assisted extraction, Microchemical Journal, 116:142-150.
Takemoto et al., 1983, Studies on the constituents of fructus momordicae. III. Structure of mogrosides, Pharmaceutical Journal, 103(11):1167-1173.
Wang et al., 2015, Hyperproduction of β-Glucanase Exg1 promotes the bioconversion of mogrosides in Saccharomyces cerevisiae mutants defective in mannoprotein deposition, Journal of Agricultural and Food Chemistry, 63:10271-10279.
Wang et al., 2019, Dekkera bruxellensis, a beer yeast that specifically bioconverts mogroside extracts into the intense natural sweetener siamenoside I, Food Chemistry, 276:43-49.
Xu et al., 2015, Exploring in vitro, in vivo metabolism of mogroside V and distribution of its metabolites in rats by HPLC-ESI-IT-TOF-MS, Journal of Pharmaceutical and Biomedical Analysis, 115:418-430.
Yang et al., 2016, Metabolites of siamenoside I and their distributions in rats, Molecules, 21:1-20.
Zhou et al., 2016, Comprehensive analysis of 61 characteristic constituents from siraitiae fructus using ultrahigh-pressure liquid chromatography with time-of-flight mass spectrometry, Journal of Pharmaceutical and Biomedical Analysis, 125:1-14.
Zhou et al., 2017, Biotransformation of total saponins in siraitia fructus by human intestinal microbiota of normal and type 2 diabetic patients: comprehensive metabolite identification and metabolic profile elucidation using LC-Q-TOF/MS, Journal of Agricultural and Food Chemistry, 65:1518-1524.
Ager et al., 1998, Commercial, synthetic nonnutritive sweeteners, Angew. Chem. Int. Ed., 37:1802-1817.
Altschul et al., 1996, Local Alignment Statistics, Methods in Enzymology, 266:460-480.
Altschul et al., 1997, Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Res., 25(17):3389-3402.
Andrade-Eiroa et al., Jun. 2016, Solid-phase extraction of organic compounds: A critical review (Part I), TrAC Trends in Analytical Chemistry, 80:641-654.
Cardenas et al., 2016, Engineering cofactor and transport mechanisms in Saccharomyces cerevisiae for enhanced acetyl-CoA and polyketide biosynthesis. Metab Eng, 36:80-89.
Chabrol, 2012, The hideous price of beauty: an investigation into the market of deep-sea shark liver oil. Edited by: The Bloom Association.
Chang et al., 2007, Engineering Escherichia coli for production of functionalized terpenoids using plant P450s. Nat Chem Biol, 3:274-277.
Chiu et al., 2013, Biotransformation of mogrosides from Siraitia grosvenorii swingle by Saccharomyces cerevisiae, J. Agric. Food Chem., 61:7127-7134.
Dai et al., 2015, Functional characterization of cucurbitadienol synthase and triterpene glycosyltransferase involved in biosynthesis of mogrosides from Siraitia grosvenorii, Plant Cell Physiol., 56(6):1172-1182.
de Felipe et al., 2004, Targeting of Proteins Derived from Self-Processing Polyproteins Containing Multiple Signal Sequences, Traffic, 5:616-626.
de Felipe, 2004, Skipping the co-expression problem: the new 2A “CHYSEL” technology, Genetic Vaccines and Ther. 2:13.
Donald et al., Sep. 1997, Effects of Overproduction of the Catalytic Domain of 3-Hydroxy-3-Methylglutaryl Coenzyme A Reductase on Squalene Synthesis in Saccharomyces cerevisiae, Appl Environ Microbiol., 63(9):3341-3344.
GenBank: AEM42982.1, cucurbitadienol synthase [Siraitia grosvenorii], Dec. 1, 2012, 2 pp.
Ghimire et al., 2009, Improved squalene production via modulation of the methylerythritol 4-phosphate pathway and heterologous expression of genes from Streptomyces peucetius ATCC 27952 in Escherichia coli. Appl Environ Microbiol, 75:7291-7293.
Ghimire et al., 2016, Advances in Biochemistry and Microbial Production of Squalene and Its Derivatives. J Microbiol Biotechnol, 26:441-451.
Gruchattka et al., Dec. 23, 2015, In Vivo Validation of in Silico Predicted Metabolic Engineering Strategies in Yeast: Disruption of α-Ketoglutarate Dehydrogenase and Expression of ATP-Citrate Lyase for Terpenoid Production. PLOS ONE, 10(12):e0144981.
Itkin et al., 2016, The biosynthetic pathway of the nonsugar, high-intensity sweetener mogroside V from Siraitia grosvenorii, PNAS, 113(47):E7619-E7628 and supplemental material.
Joska et al., May 2014, A universal cloning method based on yeast homologous recombination that is simple, efficient, and versatile, J. Microbiol. Methods, 100:46-51.
Kasai et al., 1988, Glycosides from Chinese medicinal plant, Hemsleya panacis-scandens, and structure-taste relationship of cucurbitane glycosides, Chemical and Pharmaceutical Bulletin, 36(1):234-243.
Katabami et al., 2015, Production of squalene by squalene synthases and their truncated mutants in Escherichia coli. J Biosci Bioeng, 119:165-171.
Kinghorn et al., 1998, Noncariogenic intense natural sweeteners, Med. Res. Rev. 18(5):347-360.
Kirby et al. Engineering triterpene production in Saccharomyces cerevisiae-β-amyrin synthase from Artemisia annua, FEBS J. Apr. 2008; 275(8):1852-9.
Kozak et al., 2014, Engineering acetyl coenzyme A supply: functional expression of a bacterial pyruvate dehydrogenase complex in the cytosol of Saccharomyces cerevisiae. MBio, 5:e01696-01614.
Lemaigre and Rousseau, Transcriptional control of genes that regulate glycolysis and gluconeogenesis in adult liver, Biochem. J. 303:1-14 (1994).
LeVan et al., 2008, Section 16: Adsorbents and Ion Exchange, In Perry's Chemical Engineers' Handbook, 8th edition. Green ed., McGraw-Hill, New York, pp. 16-1-16-10.
Lewin, Genes V (Oxford University Press, Oxford), pp. 847-873.
Loeken et al., 1993, Effects of mutation of the CREB binding site of the somatostatin promoter on cyclic AMP responsiveness in CV-1 cells, Gene Expr. 3:253-264.
Luo et al., 2016, Liquid chromatography with tandem mass spectrometry method for the simultaneous determination of multiple sweet mogrosides in the fruits of Siraitia grosvenorii and its marketed sweeteners, J. Sep. Sci, 39:4124-4135.
McGehee et al., 1993, Differentiation-specific element: a cis-acting developmental switch required for the sustained transcriptional expression of the angiotensinogen gene during hormonal-induced differentiation of 3T3-L1 fibroblasts to adipocytes, Mol. Endocrinol. 7:551-560.
Mehrotra et al., 2014, Steviol glycosides and their use in food processing: a review, Innovare Journal of Food Science, 2(1):7-13.
Narendranath et al., May 2005, Relationship between pH and Medium Dissolved Solids in Terms of Growth and Metabolism of Lactobacilli and Saccharomyces cerevisiae during Ethanol Production, Appl Environ Microbiol., 71(5): 2239-2243.
Newman et al., 2006, High-level production of amorpha-4,11-diene in a two-phase partitioning bioreactor of metabolically engineered Escherichia coli, Biotechnol Bioeng, 95:684-691.
Noguchi et al., May 2008, Sequential glucosylation of a furofuran lignan, (+)-sesaminol, by Sesamum indicum UGT71A9 and UGT94D1 glucosyltransferases, Plant J., 54(3):415-427.
O'Reilly et al., 1992, Identification of an Activating Transcription Factor (ATF) Binding Site in the Human Transforming Growth Factor-β2 Promoter, J. Biol. Chem. 267:19938-19943.
Pandey et al., 2014, Enzymatic Biosynthesis of Novel Resveratrol Glucoside and Glycoside Derivatives, Applied and Environmental Microbiology, 80(23):7235-7243.
Peng et al., 2015, Controlling heterologous gene expression in yeast cell factories on different carbon substrates and across the diauxic shift: a comparison of yeast promoter activities, Microb Cell Fact, 14:91.
Plotka-Wasylka J et al., New Polymeric Materials for Solid Phase Extraction, Crit Rev Anal Chem., published online on Apr 11, 2017, pp. 373-383.
Prakash et al., Jul. 2008, Development of rebiana, a natural, non-caloric sweetener, Food and Chemical Toxicology, 9 pp.
Qing et al., 2017, Systematic identification of flavonols, flavonol glycosides, triterpene and siraitic acid glycosides from Siraitia grosvenorii using high-performance liquid chromatography/quadrupole-time-of-flight mass spectrometry combined with a screening strategy, Journal of Pharmaceutical and Biomedical Analysis, 138:240-248.
Rodriguez et al., 2016, ATP citrate lyase mediated cytosolic acetyl-CoA biosynthesis increases mevalonate production in Saccharomyces cerevisiae, Microb Cell Fact, 15:48.
Sajid et al., May 2017, Porous membrane protected micro-solid-phase extraction: A review of features, advancements and applications, Anal Chim Acta., 965:36-53.
Salmon et al., Jul. 2016, A conserved amino acid residue critical for product and substrate specificity in plant triterpene synthases, Proc Natl Acad Sci USA. 26; 113(30): E4407-E4414.
Sawai et al. Triterpenoid Biosynthesis and Engineering in Plants, Front Plant Sci. Jun. 30, 2011; 2:25.
Shiba et al., 2007, Engineering of the pyruvate dehydrogenase bypass in Saccharomyces cerevisiae for high-level production of isoprenoids. Metab Eng, 9:160-168.
Shibuya et al., 2004, Cucurbitadienol synthase, the first committed enzyme for cucurbitacin biosynthesis, is a distinct enzyme from cycloartenol synthase for phytosterol biosynthesis, Tetrahedron 60:6995-7003.
Su et al. Jul. 2017, Molecular and biochemical characterization of squalene synthase from Siraitia grosvenorii, Biotechnol Lett. vol. 39, Issue 7, pp. 1009-1018.
Tai et al., 2013, Engineering the push and pull of lipid biosynthesis in oleaginous yeast Yarrowia lipolytica for biofuel production. Metab Eng, 15:1-9.
Takase et al. 2015, Control of the 1,2-rearrangement process by oxidosqualene cyclases during triterpene biosynthesis, Org Biomol Chem. 13(26):7331-6.
Tang et al., 2011, An efficient approach to finding Siraitia grosvenorii triterpene biosynthetic genes by RNA-seq and digital gene expression analysis, BMC Genomics, 12:343.
Thompson et al., 2014, Squalene production using Saccharomyces cerevisiae, i-ACES, 1(1), 7 pp.
Treisman et al., 1990, The SRE: a growth factor responsive transcriptional regulator, Seminars in Cancer Biol. 1:47-58.
U.S. FDA list of Everything Added to Food in the U.S. (EAFUS), available at http://www.accessdata.fda.gov/scripts/fcn/fcnNavigation.cfm?rpt=eafusListing, last accessed Nov. 16, 2015, 186 pp.
Wang et al., Aug. 20, 2014, Cucurbitane glycosides derived from mogroside IIE: structure-taste relationships, antioxidant activity, and acute toxicity, Molecules, 19(8):12676-12689.
Westfall et al., 2012, Production of amorphadiene in yeast, and its conversion to dihydroartemisinic acid, precursor to the antimalarial agent artemisinin. Proc Natl Acad Sci USA, 109:E111-118.
Wiet et al., 1993, Fat concentration affects sweetness and sensory profiles of sucrose, sucralose, and aspartame, J. Food Sci., 58(3):599-602.
Yang et al., Sep. 2005, Grosmomoside I, a new cucurbitane triterpenoid glycoside from fruits of momordica grosvenori, Chinese Traditional and Herbal Drugs, 36(9):1285-1290.
Ye et al., 1994, Characterization of a Silencer Regulatory Element in the Human Interferon-γ Promoter, J. Biol. Chem. 269:25728-25734.
Zhang et al., 2012, Identification of flavonol and triterpene glycosides in Luo-Han-Guo extract using ultra-high performance liquid chromatography/quadrupole time-of-flight mass spectrometry, Journal of Food Composition and Analysis, 25:142-148.
Zhang et al., 2015, Functional pyruvate formate lyase pathway expressed with two different electron donors in Saccharomyces cerevisiae at aerobic growth. FEMS Yeast Res, 15:fov024.
Zhang et al., 2016, Oxidation of Cucurbitadienol Catalyzed by CYP87D18 in the Biosynthesis of Mogrosides from Siraitia grosvenorii. Plant Cell Physiol 57:1000-1007.
Zhou et al., 2012, Enhanced alpha-ketoglutarate production in Yarrowia lipolytica WSH-Z06 by alteration of the acetyl-CoA metabolism. J Biotechnol, 161:257-264.
International Search Report dated Nov. 14, 2018 in PCT/US2018/030627 filed on May 2, 2018.
Related Publications (1)
Number Date Country
20190071705 A1 Mar 2019 US
Provisional Applications (2)
Number Date Country
62551750 Aug 2017 US
62501018 May 2017 US