PROTEIN COMPOSITIONS AND METHODS OF PRODUCTION

Information

  • Patent Application
  • 20240002824
  • Publication Number
    20240002824
  • Date Filed
    June 29, 2023
  • Date Published
    January 04, 2024
Abstract
Provided are systems and methods for producing recombinant proteins in microorganisms engineered to use alternate carbon sources.
Description
SEQUENCE LISTING

The instant application contains a Sequence Listing which has been submitted in XML format via Patent Center and is hereby incorporated by reference in its entirety. Said XML copy, created on Jun. 28, 2023, is named 56025US_CRF_sequencelisting.xml and is 425,213 bytes in size.


BACKGROUND

A significant expense in commercial recombinant protein production is due to the cost of the carbon (e.g., sugar) fed to the recombinant organism during fermentation. This expense may be reduced by feeding the recombinant organisms less expensive carbon sources. Unfortunately, many recombinant organisms are unable to metabolize these less expensive carbon sources. Thus, there is a need to create recombinant organisms which are able to metabolize these less expensive carbon sources when used for commercial recombinant protein production.


SUMMARY

An aspect of the present disclosure is a fusion protein comprising a catalytic domain of a heterologous glycosyl hydrolase. In some embodiments, the fusion protein further comprises an anchoring domain of a glycosylphosphatidylinositol (GPI)-anchored protein, wherein the anchoring domain comprises at least about 200 amino acids and/or at least about 30% of the residues in the anchoring domain are serines or threonines.


Another aspect of the present disclosure is an engineered host cell comprising: an integrated coding sequence of a fusion protein comprising a catalytic domain of a heterologous glycosyl hydrolase; and an integrated coding sequence of a heterologous protein of interest (POI); wherein the engineered host cell does not endogenously express the glycosyl hydrolase and the POI; and wherein the glycosyl hydrolase is anchored on the surface of the engineered host cell.


In some embodiments, the glycosyl hydrolase is an invertase from an organism selected from: S. cerevisiae, Kluyveromyces lactis, Cyberlindnera jadinii, Oryza sativa japonica (rice), Arabidopsis thaliana, Rattus norvegicus (rat), Oryctolagus cuniculus (rabbit), and Homo sapiens.


In some embodiments, the invertase is encoded by the SUC2 gene.


In some embodiments, the invertase is encoded by the MAL1 gene.


In some embodiments, the invertase is encoded by a gene selected from: the invertase (INV1), cytosolic invertase 1 (CINV1), CIN2, INVA, INVE, and sucrase-isomaltase (SI) genes.


In some embodiments, the fusion protein is surface-displayed on the engineered host cell; wherein the surface-displayed fusion protein comprises a catalytic domain of the glycosyl hydrolase and an anchoring domain of a glycosylphosphatidylinositol (GPI)-anchored protein, wherein the anchoring domain comprises at least about 200 amino acids and/or at least about 30% of the residues in the anchoring domain are serines or threonines.


In some embodiments, the anchoring domain comprises at least about 225 amino acids, at least about 250 amino acids, at least about 275 amino acids, at least about 300 amino acids, at least about 325 amino acids, at least about 350 amino acids, at least about 375 amino acids, or at least about 400 amino acids.


In some embodiments, at least about 35% of the residues in the anchoring domain are serines or threonines, at least about 40% of the residues in the anchoring domain are serines or threonines, at least about 45% of the residues in the anchoring domain are serines or threonines, or at least about 50% of the residues in the anchoring domain are serines or threonines.


In some embodiments, the serines or threonines in the anchoring domain are capable of being O-mannosylated.


In some embodiments, a fusion protein having an anchoring domain comprising at least about 325 amino acids provides greater glycosyl hydrolase activity relative to a fusion protein having an anchoring domain comprising less than about 300 amino acids.


In some embodiments, a fusion protein having an anchoring domain comprising at least about 300 amino acids provides greater glycosyl hydrolase activity relative to a fusion protein having an anchoring domain comprising less than about 250 amino acids.


In some embodiments, the fusion protein comprises the anchoring domain of the GPI anchored protein.


In some embodiments, the fusion protein comprises the GPI anchored protein without its native signal peptide or native secretory signal.


In some embodiments, the GPI anchored protein is not native to the engineered host cell.


In some embodiments, the GPI anchored protein is naturally expressed by a S. cerevisiae cell and the engineered host cell is not a S. cerevisiae cell.


In some embodiments, the GPI anchored protein is selected from Tir4, Dan1, or Sed1.


In some embodiments, an anchoring domain of the GPI anchored protein comprises an amino acid sequence that is at least 70% identical, at least 75% identical, at least 80% identical, at least 85% identical, at least 90% identical, or at least 95% identical, to one of SEQ ID NO: 1 to SEQ ID NO: 14.


In some embodiments, the anchoring domain of the GPI anchored protein comprises an amino acid sequence of one of SEQ ID NO: 1 to SEQ ID NO: 14.


In some embodiments, the engineered host cell is a yeast cell.


In some embodiments, the engineered host cell is a Pichia species.


In some embodiments, the Pichia species is Pichia pastoris.


In some embodiments, the engineered host cell comprises a genomic modification that expresses the fusion protein.


In some embodiments, the fusion protein comprises a portion of the glycosyl hydrolase in addition to its catalytic domain.


In some embodiments, the fusion protein comprises substantially the entire amino acid sequence of the glycosyl hydrolase.


In some embodiments, in the fusion protein, the catalytic domain is N-terminal to the anchoring domain.


In some embodiments, in the fusion protein, the catalytic domain is C-terminal to the anchoring domain.


In some embodiments, the fusion protein comprises a linker between the catalytic domain and the anchoring domain.


In some embodiments, upon translation, the fusion protein comprises a signal peptide and/or a secretory signal.


In some embodiments, a growth rate of the engineered host cell in a medium containing sucrose as a primary carbon source is higher than a growth rate of a control host cell, wherein the control host cell is identical to the engineered host cell, except that the control host cell does not express the glycosyl hydrolase.


In some embodiments, the engineered eukaryotic cell comprises a genomic modification that overexpresses a secreted recombinant protein and/or comprises an extrachromosomal modification that overexpresses a secreted recombinant protein.


In some embodiments, the secreted recombinant protein is an animal protein.


In some embodiments, the animal protein is an egg protein.


In some embodiments, the egg protein is selected from the group consisting of ovalbumin, ovomucoid, lysozyme, ovoglobulin G2, ovoglobulin G3, α-ovomucin, β-ovomucin, ovotransferrin, ovoinhibitor, ovoglycoprotein, flavoprotein, ovomacroglobulin, ovostatin, cystatin, avidin, ovalbumin related protein X, and ovalbumin related protein Y.


In some embodiments, the genomic modification and/or the extrachromosomal modification that overexpresses the secreted recombinant protein comprises an inducible promoter.


In some embodiments, the inducible promoter is an AOX1, DAK2, PEX11, FLD1, FGH1, DAS1, DAS2, CAT1, MDH3, HAC1, BiP, RAD30, RVS161-2, MPP10, THP3, TLR, GBP2, PMP20, SHB17, PEX8, PEX4, or TKL3 promoter.


In some embodiments, the genomic modification and/or the extrachromosomal modification that overexpresses a secreted recombinant protein comprises an AOX1, TDH3, MOX, RPS25A, or RPL2A terminator.


In some embodiments, the genomic modification and/or the extrachromosomal modification that overexpresses a secreted recombinant protein encodes a signal peptide and/or a secretory signal.


In some embodiments, the genomic modification and/or the extrachromosomal modification that overexpresses a secreted recombinant protein comprises codons that are optimized for the species of the engineered eukaryotic cell.


In some embodiments, the secreted recombinant protein is designed to be secreted from the cell and/or is capable of being secreted from the cell.


In some embodiments, the fusion protein comprises an amino acid sequence that is at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identical to an amino acid sequence selected from SEQ ID NOs: 315, 332-335, and 342.


In some embodiments, the fusion protein is encoded by a nucleotide sequence that is at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identical to the nucleotide sequence of SEQ ID NO: 314.


Another aspect of the present disclosure is a method of growing/culturing the engineered host cell, wherein the method comprises culturing the engineered host cell with a carbon source that is not naturally utilized by the host cell in the absence of the glycosyl hydrolase.


Another aspect of the present disclosure is a method for growing/culturing a host cell with a carbon source that is not naturally utilized by the host cell, the method comprising: (a) recombinantly producing in the host cell, a fusion protein comprising a catalytic domain of a glycosyl hydrolase capable of digesting sucrose; optionally, wherein the glycosyl hydrolase capable of digesting sucrose is an invertase; (b) recombinantly producing in the host cell a heterologous protein of interest (POI); wherein the host cell does not express the glycosyl hydrolase endogenously; wherein the engineered host cell prior to step (a) does not utilize sucrose as a carbon source as efficiently as glucose, and wherein the glycosyl hydrolase is expressed on the surface of the engineered host cell.


Another aspect of the present disclosure is a method for manufacturing a host cell capable of utilizing a carbon source that is not naturally utilized by the host cell, the method comprising: (a) obtaining a host cell that recombinantly expresses a fusion protein comprising a catalytic domain of a glycosyl hydrolase capable of digesting sucrose; optionally, wherein the glycosyl hydrolase capable of digesting sucrose is an invertase; and (b) genetically modifying the host cell to express a heterologous protein of interest (POI); wherein the host cell does not utilize sucrose as a carbon source as efficiently as glucose in the absence of the glycosyl hydrolase.


Another aspect of the present disclosure is a method for manufacturing a host cell capable of utilizing a carbon source that is not naturally utilized by the host cell, the method comprising: (a) obtaining a host cell that recombinantly expresses a heterologous protein of interest (POI); and (b) genetically modifying the host cell to express a fusion protein comprising a catalytic domain of a glycosyl hydrolase capable of digesting sucrose; optionally, wherein the glycosyl hydrolase capable of digesting sucrose is an invertase; wherein the host cell prior to step (b) does not utilize sucrose as a carbon source as efficiently as glucose.


Any aspect or embodiment described herein can be combined with any other aspect or embodiment as disclosed herein.





BRIEF DESCRIPTION OF THE DRAWINGS

The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings of which:



FIG. 1 illustrates the growth of P. pastoris on minimal nutrient plates containing glucose, fructose and sucrose.



FIG. 2 illustrates an exemplary schematic of a construct to express a surface displayed protein comprising SUC2 and an anchored protein Tir4.



FIG. 3 illustrates the growth of P. pastoris strains using mannose as a sole carbon source.



FIG. 4 illustrates the growth of P. pastoris strains using glucose or sucrose as a sole carbon source. The strains labelled “_D” in FIG. 4 denote that dextrose (glucose) was used as the carbon source in the experimental condition. The strains labelled “_S” in FIG. 4 denote that sucrose was used as the carbon source in the experimental condition.



FIG. 5 is an SDS-PAGE gel comparing protein of interest production in P. pastoris strains using glucose or sucrose as a sole carbon source.





DETAILED DESCRIPTION

High-yielding recombinant protein expression is a cornerstone of industries such as therapeutics, food, and cosmetics. The ability to grow host cells in readily available media to produce such recombinant proteins is therefore one of the most important factors, from both an economic and an environmental perspective. Recombinant protein expression that uses commonly available carbon sources while maintaining high titers of the recombinant proteins is needed. The present invention addresses this need. The systems and methods provide high-titer expression of recombinant proteins in large-scale production using host cells that are genetically modified to utilize carbon sources not usually utilized by the host cell, and are particularly useful for expressing pure heterologous animal-derived proteins in a microbial host.


Host Cell

As used herein, a “host cell” refers to a cell which is capable of protein expression and optionally protein secretion. Such a host cell is used in the methods of the present invention. For the host cell to express a polypeptide, a nucleotide sequence encoding the polypeptide is present in or introduced into the cell. Host cells provided by the present invention can be prokaryotes or eukaryotes. As will be appreciated by one of skill in the art, a prokaryotic cell lacks a membrane-bound nucleus, while a eukaryotic cell has a membrane-bound nucleus. Examples of eukaryotic cells include, but are not limited to, vertebrate cells, mammalian cells, human cells, animal cells, invertebrate cells, plant cells, nematodal cells, insect cells, stem cells, fungal cells or yeast cells.


Examples of yeast cells include but are not limited to the Saccharomyces genus (e.g. Saccharomyces cerevisiae, Saccharomyces kluyveri, Saccharomyces uvarum), the Komagataella genus (Komagataella pastoris, Komagataella pseudopastoris or Komagataella phaffii), the Kluyveromyces genus (e.g. Kluyveromyces lactis, Kluyveromyces marxianus), the Candida genus (e.g. Candida utilis, Candida cacaoi), the Geotrichum genus (e.g. Geotrichum fermentans), as well as Hansenula polymorpha and Yarrowia lipolytica. A host cell may also be a member of one of the following genera or species: Arxula spp., Arxula adeninivorans, Kluyveromyces spp., Kluyveromyces lactis, Komagataella phaffii, Pichia spp., Pichia angusta, Pichia pastoris, Saccharomyces spp., Saccharomyces cerevisiae, Schizosaccharomyces spp., Schizosaccharomyces pombe, Yarrowia spp., Yarrowia lipolytica, Agaricus spp., Agaricus bisporus, Aspergillus spp., Aspergillus awamori, Aspergillus fumigatus, Aspergillus nidulans, Aspergillus niger, Aspergillus oryzae, Bacillus subtilis, Colletotrichum spp., Colletotrichum gloeosporioides, Endothia spp., Endothia parasitica, Escherichia coli, Fusarium spp., Fusarium graminearum, Fusarium solani, Mucor spp., Mucor miehei, Mucor pusillus, Myceliophthora spp., Myceliophthora thermophila, Neurospora spp., Neurospora crassa, Penicillium spp., Penicillium camemberti, Penicillium canescens, Penicillium chrysogenum, Penicillium (Talaromyces) emersonii, Penicillium funiculosum, Penicillium purpurogenum, Penicillium roqueforti, Pleurotus spp., Pleurotus ostreatus, Rhizomucor spp., Rhizomucor miehei, Rhizomucor pusillus, Rhizopus spp., Rhizopus arrhizus, Rhizopus oligosporus, Rhizopus oryzae, Trichoderma spp., Trichoderma atroviride, Trichoderma reesei, or Trichoderma virens.


The genus Pichia is of particular interest. Pichia comprises a number of species, including the species Pichia pastoris, Pichia methanolica, Pichia kluyveri, and Pichia angusta. Most preferred is the species Pichia pastoris.


The former species Pichia pastoris has been divided and renamed as Komagataella pastoris and Komagataella phaffii. Therefore, as used herein, Pichia pastoris is synonymous with both Komagataella pastoris and Komagataella phaffii.


In some embodiments, the host cell is selected from Pichia pastoris, Hansenula polymorpha, Trichoderma reesei, Saccharomyces cerevisiae, Kluyveromyces lactis, Yarrowia lipolytica, Pichia methanolica, Candida boidinii, a Komagataella species, and Schizosaccharomyces pombe.


Protein of Interest

The term “protein of interest” (POI) as used herein refers to a protein that is produced by means of recombinant technology in a host cell. More specifically, the protein may either be a polypeptide not naturally occurring in the host cell, i.e. a heterologous protein, or else may be native to the host cell, i.e. a homologous protein to the host cell, but is produced, for example, by transformation with a self-replicating vector containing the nucleic acid sequence encoding the POI, or upon integration by recombinant techniques of one or more copies of the nucleic acid sequence encoding the POI into the genome of the host cell, or by recombinant modification of one or more regulatory sequences controlling the expression of the gene encoding the POI, e.g. of the promoter sequence. In general, the proteins of interest referred to herein may be produced by methods of recombinant expression well known to a person skilled in the art. Exemplary proteins of interest are provided in Table 6. A recombinant POI expressed in a host cell may comprise a sequence with at least 75%, at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 97% or at least 99% sequence identity to any of the sequences in Table 6.


There is no limitation with respect to the protein of interest (POI). The POI may comprise a eukaryotic or prokaryotic polypeptide, variant or derivative thereof. The POI can be any eukaryotic or prokaryotic protein. The protein can be a naturally secreted protein or an intracellular protein, i.e. a protein which is not naturally secreted. The present invention also includes biologically active fragments of proteins. In another embodiment, a POI may be an amino acid chain or present in a complex, such as a dimer, trimer, hetero-dimer, multimer or oligomer.


The protein of interest may be a protein used as a nutritional, dietary, or digestive supplement, such as in food products, feed products, or cosmetic products. The food products may be, for example, bouillon, desserts, cereal bars, confectionery, sports drinks, dietary products or other nutrition products. Preferably, the protein of interest is a food additive.


Glycosyl Hydrolases

In some cases, a heterologous glycosyl hydrolase is produced in a host cell that has been engineered to express or overexpress one or more heterologous recombinant proteins such as the proteins of interest. A glycosyl hydrolase may be a surface-displayed enzyme that hydrolyzes a disaccharide, which allows a host cell to utilize a carbon source that it previously was unable to utilize or to utilize efficiently. In some embodiments, a carbon source that a host cell was previously unable to utilize or to utilize efficiently may comprise sucrose, maltose, fructose, high fructose corn syrup, molasses, or some combination thereof. In some embodiments, such a carbon source may be present in a mixture with glucose. In some examples, a glycosyl hydrolase may be an enzyme that hydrolyzes a carbon source, e.g., a disaccharide, to its monomers, e.g., glucose, fructose, and galactose, which can be utilized by the host cell. For example, the glycosyl hydrolase may be an invertase, such as a protein encoded by the SUC2 or MAL1 gene, which cleaves the disaccharide sucrose to release glucose and fructose that can be utilized by a yeast such as P. pastoris. In some embodiments, the glycosyl hydrolase may be an invertase, such as a protein encoded by the INV1, CINV1, CIN2, INVE, INVA, or SI gene, which cleaves the disaccharide sucrose to release glucose and fructose that can be utilized by a yeast. Additional non-limiting examples of glycosyl hydrolases include: invertase, invertase 1, cytosolic invertase 1, Beta-fructofuranosidase, insoluble isoenzyme 2, Alkaline/neutral invertase, Alkaline/neutral invertase A, Alkaline/neutral invertase E, and Sucrase-isomaltase. Exemplary sequences for glycosyl hydrolases are provided in Table 2.
A recombinant glycosyl hydrolase expressed in a host cell may comprise a sequence with at least 75%, at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 97% or at least 99% sequence identity to any of the sequences in Table 2.
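
As a non-limiting illustration of the sucrose cleavage described above, the following minimal sketch estimates the mass of glucose and fructose released when sucrose is fully hydrolyzed (sucrose + H2O -> glucose + fructose). The function name and the assumption of complete conversion are illustrative only and are not taken from the disclosure; the molar masses are standard values.

```python
# Standard molar masses in g/mol.
SUCROSE_G_PER_MOL = 342.30  # C12H22O11
HEXOSE_G_PER_MOL = 180.16   # C6H12O6 (glucose and fructose are isomers)


def hexoses_from_sucrose(sucrose_g: float) -> dict:
    """Return grams of glucose and fructose from complete sucrose hydrolysis.

    Assumption (illustrative): 100% conversion, one glucose and one
    fructose per sucrose molecule.
    """
    if sucrose_g < 0:
        raise ValueError("sucrose mass must be non-negative")
    moles = sucrose_g / SUCROSE_G_PER_MOL
    hexose_g = moles * HEXOSE_G_PER_MOL
    return {"glucose_g": hexose_g, "fructose_g": hexose_g}


if __name__ == "__main__":
    # Hypothetical example: 20 g of sucrose in the culture medium.
    print(hexoses_from_sucrose(20.0))
```

The combined hexose mass slightly exceeds the sucrose mass because one water molecule is consumed per glycosidic bond cleaved.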


In certain embodiments, the glycosyl hydrolase is of the family GH5. In certain embodiments, the glycosyl hydrolase is of the family GH7. In certain embodiments, the glycosyl hydrolase is of the family GH9. Such glycosyl hydrolases are found in PCT Application Publication No.: WO2009090381, which is hereby incorporated by reference in its entirety.


An engineered host cell expressing a heterologous glycosyl hydrolase may be cultured with a carbon source that is not naturally utilized by the host cell or not utilized as efficiently as glucose in the absence of the glycosyl hydrolase.


An engineered host cell expressing a heterologous glycosyl hydrolase may have a growth rate in a medium containing sucrose as a primary carbon source that is higher than the growth rate of a control host cell, wherein the control host cell is identical to the engineered host cell, except that the control host cell does not express the glycosyl hydrolase.


In some embodiments, an engineered host cell expressing a heterologous glycosyl hydrolase may have a growth rate in a medium containing fructose as a primary carbon source that is higher than the growth rate of a control host cell, wherein the control host cell is identical to the engineered host cell, except that the control host cell does not express the glycosyl hydrolase.


In some embodiments, an engineered host cell expressing a heterologous glycosyl hydrolase may have a growth rate in a medium containing maltose as a primary carbon source that is higher than the growth rate of a control host cell, wherein the control host cell is identical to the engineered host cell, except that the control host cell does not express the glycosyl hydrolase.


In some embodiments, an engineered host cell expressing a heterologous glycosyl hydrolase may have a growth rate in a medium containing high fructose corn syrup as a primary carbon source that is higher than the growth rate of a control host cell, wherein the control host cell is identical to the engineered host cell, except that the control host cell does not express the glycosyl hydrolase.


In some embodiments, an engineered host cell expressing a heterologous glycosyl hydrolase may have a growth rate in a medium containing molasses as a primary carbon source that is higher than the growth rate of a control host cell, wherein the control host cell is identical to the engineered host cell, except that the control host cell does not express the glycosyl hydrolase.


In some embodiments, an engineered host cell expressing a heterologous glycosyl hydrolase may have a growth rate in a medium containing a disaccharide as a primary carbon source that is higher than the growth rate of a control host cell, wherein the control host cell is identical to the engineered host cell, except that the control host cell does not express the glycosyl hydrolase.


In some embodiments, an engineered host cell expressing a heterologous glycosyl hydrolase may have a growth rate in a medium containing a mixture of glucose and a disaccharide as a primary carbon source that is higher than the growth rate of a control host cell, wherein the control host cell is identical to the engineered host cell, except that the control host cell does not express the glycosyl hydrolase.


In some embodiments, an engineered host cell expressing a heterologous glycosyl hydrolase may have a growth rate in a medium containing a carbon source that is not glucose as a primary carbon source that is higher than the growth rate of a control host cell, wherein the control host cell is identical to the engineered host cell, except that the control host cell does not express the glycosyl hydrolase.
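
As a non-limiting illustration of how the growth rates of an engineered host cell and a control host cell might be compared, the following minimal sketch computes a specific growth rate from two optical density readings, assuming exponential growth between the readings. The disclosure does not specify how growth rate is measured; the use of optical density, the function names, and the example readings are all hypothetical.

```python
import math


def specific_growth_rate(od_start: float, od_end: float, hours: float) -> float:
    """Return mu = ln(od_end / od_start) / hours, in units of 1/hour.

    Assumption (illustrative): growth is exponential over the interval.
    """
    if od_start <= 0 or od_end <= 0 or hours <= 0:
        raise ValueError("readings and elapsed time must be positive")
    return math.log(od_end / od_start) / hours


def grows_faster(engineered: tuple, control: tuple) -> bool:
    """Compare two (od_start, od_end, hours) tuples measured in the same medium."""
    return specific_growth_rate(*engineered) > specific_growth_rate(*control)


if __name__ == "__main__":
    # Hypothetical readings in a medium with sucrose as the primary carbon source.
    engineered = (0.1, 0.8, 12.0)
    control = (0.1, 0.15, 12.0)
    print(specific_growth_rate(*engineered))
    print(grows_faster(engineered, control))
```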


Surface Display of Glycosyl Hydrolases

Surface displaying a catalytic domain of an enzyme provides an effective and efficient means to project the catalytic domain into the extracellular space, thereby increasing the likelihood that the catalytic domain will encounter and catalyze an enzymatic reaction with its substrate, e.g., a protein, lipid, carbohydrate, or another compound. In the present disclosure, a fusion protein is localized to the extracellular surface of a host cell, i.e., is surface displayed. This way, the catalytic domain is unlikely to contact an intracellular, membrane-associated, or cell wall protein, thereby lowering the opportunity for the enzyme to modify, degrade, or otherwise alter a substrate needed by the cell. In some embodiments, the fusion protein catalyzes a reaction that cleaves a disaccharide, which allows the cell to utilize an alternate carbon source that it previously could not utilize, or could not utilize efficiently. By cleaving the disaccharide into monosaccharides, the cell is able to use the monosaccharides even though the culturing medium did not include them. In further embodiments, the fusion protein comprises an enzyme, e.g., a sucrase, that digests an impurity secreted by the cell.


An aspect of the present disclosure is an engineered host cell that expresses a surface-displayed fusion protein. In some embodiments, host cells that can be engineered to express a surface-displayed fusion protein provided by the present invention can be prokaryotes or eukaryotes. As will be appreciated by one of skill in the art, a prokaryotic cell lacks a membrane-bound nucleus, while a eukaryotic cell has a membrane-bound nucleus. Examples of eukaryotic cells include, but are not limited to, vertebrate cells, mammalian cells, human cells, animal cells, invertebrate cells, plant cells, nematodal cells, insect cells, stem cells, fungal cells or yeast cells.


Examples of yeast cells that may be transformed to include one or more expression cassettes include but are not limited to the Saccharomyces genus (e.g. Saccharomyces cerevisiae, Saccharomyces kluyveri, Saccharomyces uvarum), the Komagataella genus (Komagataella pastoris, Komagataella pseudopastoris or Komagataella phaffii), the Kluyveromyces genus (e.g. Kluyveromyces lactis, Kluyveromyces marxianus), the Candida genus (e.g. Candida utilis, Candida cacaoi), the Geotrichum genus (e.g. Geotrichum fermentans), as well as Hansenula polymorpha and Yarrowia lipolytica. A host cell may also be a member of one of the following genera or species: Arxula spp., Arxula adeninivorans, Kluyveromyces spp., Kluyveromyces lactis, Komagataella phaffii, Pichia spp., Pichia angusta, Pichia pastoris, Saccharomyces spp., Saccharomyces cerevisiae, Schizosaccharomyces spp., Schizosaccharomyces pombe, Yarrowia spp., Yarrowia lipolytica, Agaricus spp., Agaricus bisporus, Aspergillus spp., Aspergillus awamori, Aspergillus fumigatus, Aspergillus nidulans, Aspergillus niger, Aspergillus oryzae, Bacillus subtilis, Colletotrichum spp., Colletotrichum gloeosporioides, Endothia spp., Endothia parasitica, Escherichia coli, Fusarium spp., Fusarium graminearum, Fusarium solani, Mucor spp., Mucor miehei, Mucor pusillus, Myceliophthora spp., Myceliophthora thermophila, Neurospora spp., Neurospora crassa, Penicillium spp., Penicillium camemberti, Penicillium canescens, Penicillium chrysogenum, Penicillium (Talaromyces) emersonii, Penicillium funiculosum, Penicillium purpurogenum, Penicillium roqueforti, Pleurotus spp., Pleurotus ostreatus, Rhizomucor spp., Rhizomucor miehei, Rhizomucor pusillus, Rhizopus spp., Rhizopus arrhizus, Rhizopus oligosporus, Rhizopus oryzae, Trichoderma spp., Trichoderma atroviride, Trichoderma reesei, or Trichoderma virens.


The genus Pichia is of particular interest. Pichia comprises a number of species, including the species Pichia pastoris, Pichia methanolica, Pichia kluyveri, and Pichia angusta. Most preferred is the species Pichia pastoris.


The former species Pichia pastoris has been divided and renamed as Komagataella pastoris and Komagataella phaffii. Therefore, as used herein, Pichia pastoris is synonymous with both Komagataella pastoris and Komagataella phaffii.


In some embodiments, the host cell is selected from Pichia pastoris, Hansenula polymorpha, Trichoderma reesei, Saccharomyces cerevisiae, Kluyveromyces lactis, Yarrowia lipolytica, Pichia methanolica, Candida boidinii, a Komagataella species, and Schizosaccharomyces pombe.


In some embodiments, the engineered host cell expresses a surface-displayed fusion protein. The fusion protein comprises a catalytic domain of an enzyme and an anchoring domain of a glycosylphosphatidylinositol (GPI)-anchored protein, wherein the anchoring domain comprises at least about 200 amino acids and/or at least about 30% of the residues in the anchoring domain are serines or threonines.


A fusion protein is a protein consisting of at least two domains that are normally encoded by separate genes but have been joined so that they are transcribed and translated as a single unit, thereby producing a single (fused) polypeptide.


In the present disclosure, a fusion protein comprises at least a catalytic domain of an enzyme, such as a glycosyl hydrolase, and an anchoring domain of a GPI-anchored protein. Typically, a GPI-anchored protein is a cell surface protein, e.g., one that is located on the extracellular surface of the cell.


A fusion protein may further comprise one or more linkers that separate the two domains. Linkers can be flexible or rigid; they can also be semi-flexible or semi-rigid. Separating the two domains may promote activity of the catalytic domain by reducing the steric hindrance upon the catalytic site that may be present if the catalytic site is positioned too closely to the anchoring domain. Additionally, a linker may further project the catalytic domain into the extracellular space, thereby increasing the likelihood that the catalytic domain will encounter and catalyze an enzymatic reaction with its substrate, e.g., a protein, lipid, carbohydrate, or other compound.
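
As a non-limiting illustration of the domain arrangements described herein (an optional signal peptide, a catalytic domain, an optional linker, and an anchoring domain, with the catalytic domain either N-terminal or C-terminal to the anchoring domain), the following minimal sketch concatenates the parts in the chosen order. The class name, field names, and the short strings used in the example are hypothetical placeholders and are not sequences from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FusionDesign:
    """Hypothetical container for the parts of a surface-displayed fusion."""

    catalytic_domain: str
    anchoring_domain: str
    linker: str = ""
    signal_peptide: str = ""
    catalytic_first: bool = True  # True: catalytic domain is N-terminal to the anchor

    def sequence(self) -> str:
        """Return the concatenated amino acid sequence, N-terminus first."""
        parts = [self.catalytic_domain, self.linker, self.anchoring_domain]
        if not self.catalytic_first:
            parts.reverse()
        return self.signal_peptide + "".join(parts)


if __name__ == "__main__":
    # Toy placeholder strings only; real domains are hundreds of residues long.
    design = FusionDesign(
        catalytic_domain="AAAA",
        anchoring_domain="STST",
        linker="GGGGS",
        signal_peptide="MKL",
    )
    print(design.sequence())
```

Setting `catalytic_first=False` places the anchoring domain N-terminal to the catalytic domain while keeping the linker between the two domains.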


In embodiments, the anchoring domain comprises at least about 225 amino acids, at least about 250 amino acids, at least about 275 amino acids, at least about 300 amino acids, at least about 325 amino acids, at least about 350 amino acids, at least about 375 amino acids, or at least about 400 amino acids.


In some embodiments, at least about 35% of the residues in the anchoring domain are serines or threonines, at least about 40% of the residues in the anchoring domain are serines or threonines, at least about 45% of the residues in the anchoring domain are serines or threonines, or at least about 50% of the residues in the anchoring domain are serines or threonines.
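
As a non-limiting illustration of the length and serine/threonine thresholds recited above, the following minimal sketch checks a candidate anchoring domain sequence against both criteria. The function names are hypothetical, and treating “at least about” as an exact cutoff is a simplifying assumption of this sketch.

```python
def ser_thr_fraction(seq: str) -> float:
    """Return the fraction of residues that are serine (S) or threonine (T)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("S") + seq.count("T")) / len(seq)


def meets_anchor_criteria(
    seq: str,
    min_length: int = 200,
    min_st_fraction: float = 0.30,
    require_both: bool = False,
) -> bool:
    """Check the length and/or Ser/Thr-content criteria.

    The defaults mirror the "at least about 200 amino acids and/or at least
    about 30%" language; the cutoffs are applied exactly (an assumption).
    """
    long_enough = len(seq) >= min_length
    st_rich = ser_thr_fraction(seq) >= min_st_fraction
    if require_both:
        return long_enough and st_rich
    return long_enough or st_rich


if __name__ == "__main__":
    # Toy sequence, not a sequence from the disclosure.
    candidate = "ST" * 60 + "A" * 100
    print(len(candidate), ser_thr_fraction(candidate))
    print(meets_anchor_criteria(candidate, min_length=225, min_st_fraction=0.35))
```

The higher thresholds recited in other embodiments (e.g., 325 amino acids or 50% serines or threonines) can be tested by passing different `min_length` and `min_st_fraction` values.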


In various embodiments, the serines or threonines in the anchoring domain are capable of being O-mannosylated.


In embodiments, a fusion protein having an anchoring domain comprising at least about 325 amino acids provides greater enzymatic activity relative to a fusion protein having an anchoring domain comprising less than about 300 amino acids.


In some embodiments, a fusion protein having an anchoring domain comprising at least about 300 amino acids provides greater enzymatic activity relative to a fusion protein having an anchoring domain comprising less than about 250 amino acids.


In some embodiments, the fusion protein comprises the GPI anchored protein without its native signal peptide. In some embodiments, the fusion protein comprises the GPI anchored protein without a C terminus region having amino acid sequence of GAAKAVIGMGAGALAAVAAML (SEQ ID NO: 336). In some embodiments, the fusion protein comprises the GPI anchored protein with a C terminus region having amino acid sequence of GAAKAVIGMGAGALAAVAAML (SEQ ID NO: 336).


In some embodiments, the GPI anchored protein is not native to the engineered eukaryotic cell.


In various embodiments, the GPI anchored protein is naturally expressed by a S. cerevisiae cell and the engineered eukaryotic cell is not a S. cerevisiae cell.


In embodiments, the GPI anchored protein is selected from Tir4, Dan1, Dan4, Sag1, Fig2, or Sed1.


In some embodiments, the anchoring domain of the GPI anchored protein comprises an amino acid sequence that is at least 70% identical, at least 75% identical, at least 80% identical, at least 85% identical, at least 90% identical, or at least 95% identical, to one of SEQ ID NO: 1 to SEQ ID NO: 14.


In various embodiments, the anchoring domain of the GPI anchored protein comprises an amino acid sequence of one of SEQ ID NO: 1 to SEQ ID NO: 14.


Sed1p is a major component of the Saccharomyces cerevisiae cell wall. It is required to stabilize the cell wall and for stress resistance in stationary-phase cells. See, e.g., the world wide web (at) uniprot.org/uniprot/Q01589. It is believed that Asn318 (with respect to SEQ ID NO: 13) is the most likely candidate for the GPI attachment site in Sed1p. In some embodiments, a fusion protein comprising a Sed1p anchoring domain has a sequence having at least 95% or more sequence identity with SEQ ID NO:13 or SEQ ID NO: 14. In some cases, the sequence identity may be greater than or about 90%, 95%, 96%, 97%, 98%, 99%, or 100%. In various embodiments, the Sed1p anchoring domain of a fusion protein of the present disclosure comprises a GPI attachment site; thus, the anchoring domain may only require a short fragment of SEQ ID NO: 13 or SEQ ID NO: 14, i.e., a fragment that is 5, 10, 50, 100, 200, or 300 or more amino acids in length, as long as it is capable of projecting the catalytic domain of the fusion protein into the extracellular space. In some embodiments, the anchoring domain comprises, at least, Sed1p's GPI attachment site.


When a linker is present, a fusion protein may have a general structure of: N terminus -(a)-(b)-(c)-C terminus, wherein (a) comprises a first domain, (b) is one or more linkers, and (c) is a second domain. The first domain may comprise a catalytic domain of an enzyme and the second domain may comprise an anchoring domain of a GPI anchored protein. In some embodiments, in the fusion protein, the catalytic domain is N-terminal to the anchoring domain. The fusion protein may comprise a linker N-terminal to the anchoring domain.


Linkers useful in fusion proteins may comprise one or more sequences of Table 3. In one example, a tandem repeat (of two, three, four, five, six, or more copies) of a linker, e.g., of SEQ ID NO: 33 or SEQ ID NO: 34 is included in a fusion protein.
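The general structure N terminus -(a)-(b)-(c)-C terminus, including a tandem repeat of a linker, can be sketched as a simple string concatenation. This is an illustrative sketch only: the function name `assemble_fusion` and all sequences shown are hypothetical placeholders and do not correspond to any SEQ ID NO of the present disclosure.

```python
def assemble_fusion(catalytic_domain, linker, anchoring_domain, linker_copies=1):
    """Concatenate domains from N terminus to C terminus as (a)-(b)-(c).

    (a) catalytic domain, (b) linker repeated in tandem, (c) anchoring domain.
    """
    if linker_copies < 0:
        raise ValueError("linker_copies must be zero or greater")
    # Python string repetition yields the tandem repeat of the linker.
    return catalytic_domain + linker * linker_copies + anchoring_domain


# Hypothetical placeholder sequences (one-letter amino acid codes).
fusion = assemble_fusion("MKT", "GGGGS", "STSTS", linker_copies=3)
print(fusion)
```

Setting `linker_copies=0` gives a direct fusion of the two domains without a linker.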


In embodiments, a fusion protein comprises a Glu-Ala-Glu-Ala (EAEA; SEQ ID NO: 19) spacer dipeptide repeat. The EAEA (SEQ ID NO: 19) is a signal that promotes yields of an expressed protein in certain cell types.


Other linkers are well-known in the art and can be substituted for the linkers of Table 3. For example, in embodiments, the linker may be derived from naturally-occurring multi-domain proteins or may be an empirical linker as described, for example, in Chichili et al., (2013), Protein Sci. 22(2):153-167, and Chen et al., (2013), Adv Drug Deliv Rev. 65(10):1357-1369, the entire contents of which are hereby incorporated by reference. In embodiments, the linker may be designed using linker designing databases and computer programs such as those described in Chen et al., (2013), Adv Drug Deliv Rev. 65(10):1357-1369 and Crasto et al., (2000), Protein Eng. 13(5):309-312, the entire contents of which are hereby incorporated by reference.


In embodiments, the linker comprises a polypeptide. In embodiments, the polypeptide is less than about 500 amino acids long, about 450 amino acids long, about 400 amino acids long, about 350 amino acids long, about 300 amino acids long, about 250 amino acids long, about 200 amino acids long, about 150 amino acids long, or about 100 amino acids long. For example, the linker may be less than about 100, about 95, about 90, about 85, about 80, about 75, about 70, about 65, about 60, about 55, about 50, about 45, about 40, about 35, about 30, about 25, about 20, about 19, about 18, about 17, about 16, about 15, about 14, about 13, about 12, about 11, about 10, about 9, about 8, about 7, about 6, about 5, about 4, about 3, or about 2 amino acids long. In some cases, the linker is about 59 amino acids long.


The length of a linker may be important to the effectiveness of a surface displayed enzyme's catalytic domain. For example, if a linker is too short, then the catalytic domain of the enzyme may not project far enough away from the cell surface such that it is incapable of interacting with its substrate, e.g., protein, lipid, carbohydrate, or another compound. In this case, the catalytic domain may be buried in the cell wall and/or among other cell surface proteins or sugars. On the other hand, the linker may be too long and/or too rigid to allow adequate contact between a substrate and the catalytic domain of the enzyme.


The secondary structure of a linker may also be important to the effectiveness of a surface displayed enzyme's catalytic domain. More specifically, a linker designed to have a plurality of distinct regions may provide additional flexibility to the fusion protein. As an example, a linker having one or more alpha helices may be superior to a linker having no alpha helices.


The longer linker comprises three subsections: an N-terminal flexible GS linker with higher S content, a rigid linker that forms four turns of an alpha helix, and a flexible GS linker with much higher G content on its C-terminus. Linkers containing only G's and S's in repetitive sequences are commonly used in fusion proteins as flexible spacers that do not introduce secondary structure. In some cases, the ratio of G to S determines the flexibility of the linker. Linkers with higher G content may be more flexible than linkers with higher S content. The structure of the linker of SEQ ID NO: 31 is designed to mimic multi-domain proteins in nature, which often use alpha helices (sometimes multiple) to separate as well as spatially orient their domains. In the present disclosure, a fusion protein having a complex linker, such as that of SEQ ID NO: 32, can be viewed as a multi-domain protein, with the catalytic domain of an enzyme and the anchoring domain of a GPI anchored protein being separate functional domains.


In various embodiments, the fusion protein comprises a linker having an amino acid sequence that is at least 95%, 96%, 97%, 98%, 99%, or 100% identical to SEQ ID NO: 32.


In embodiments, the linker is substantially comprised of glycine and serine residues (e.g. about 30%, or about 40%, or about 50%, or about 60%, or about 70%, or about 80%, or about 90%, or about 95%, or about 96%, or about 97%, or about 98%, or about 99%, or about 100% glycines and serines).


In various embodiments, the engineered eukaryotic cell comprises a genomic modification that expresses the fusion protein and/or comprises an extrachromosomal modification that expresses the fusion protein.


In embodiments, the fusion protein comprises a portion of the enzyme in addition to its catalytic domain.


In some embodiments, the fusion protein comprises substantially the entire amino acid sequence of the enzyme.


In some embodiments, upon translation, the fusion protein comprises a signal peptide and/or a secretory signal. In certain embodiments, the fusion protein comprises a signal peptide and a secretory signal.


In some embodiments, the fusion protein comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% amino acid sequence identity to an amino acid sequence selected from SEQ ID NOs: 315 and 332-335. In some embodiments, the fusion protein comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% amino acid sequence identity to the amino acid sequence of SEQ ID NO: 315. In some embodiments, the fusion protein comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% amino acid sequence identity to the amino acid sequence of SEQ ID NO: 332. In some embodiments, the fusion protein comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% amino acid sequence identity to the amino acid sequence of SEQ ID NO: 333. In some embodiments, the fusion protein comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% amino acid sequence identity to the amino acid sequence of SEQ ID NO: 334. In some embodiments, the fusion protein comprises an amino acid sequence having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% amino acid sequence identity to the amino acid sequence of SEQ ID NO: 335.


In various embodiments, the engineered eukaryotic cell comprises two or more fusion proteins, three or more fusion proteins, or four fusion proteins.


In some cases, the two or more fusion proteins comprise different enzyme types or the two or more fusion proteins comprise the same enzyme type.


In various cases, the two of the three or more fusion proteins or two of the four or more fusion proteins comprise different enzyme types or two of the three or more fusion proteins or two of the four or more fusion proteins comprise the same enzyme type.


In additional cases, the three of the three or more fusion proteins or three of the four or more fusion proteins comprise different enzyme types or three of the three or more fusion proteins or three of the four or more fusion proteins comprise the same enzyme type.


In various cases, each of the two or more, three or more, or four fusion proteins comprise different enzyme types or each of the two or more, three or more, or four fusion proteins comprise the same enzyme type.


In embodiments, the enzyme types are selected from an enzyme that catalyzes a post-translational modification of a protein secreted by the engineered eukaryotic cell and an enzyme that catalyzes a reaction which allows the engineered eukaryotic cell to rely on alternate carbon sources.


In some embodiments, an engineered host cell expressing a fusion protein may have a growth rate in a medium containing fructose as a primary carbon source that is higher than a growth rate of a control host cell, wherein the control host cell is identical to the engineered host cell, except the control cell does not express the fusion protein.


In some embodiments, an engineered host cell expressing a fusion protein may have a growth rate in a medium containing maltose as a primary carbon source that is higher than a growth rate of a control host cell, wherein the control host cell is identical to the engineered host cell, except the control cell does not express the fusion protein.


In some embodiments, an engineered host cell expressing a fusion protein may have a growth rate in a medium containing high fructose corn syrup as a primary carbon source that is higher than a growth rate of a control host cell, wherein the control host cell is identical to the engineered host cell, except the control cell does not express the fusion protein.


In some embodiments, an engineered host cell expressing a fusion protein may have a growth rate in a medium containing molasses as a primary carbon source that is higher than a growth rate of a control host cell, wherein the control host cell is identical to the engineered host cell, except the control cell does not express the fusion protein.


In some embodiments, an engineered host cell expressing a fusion protein may have a growth rate in a medium containing a disaccharide as a primary carbon source that is higher than a growth rate of a control host cell, wherein the control host cell is identical to the engineered host cell, except the control cell does not express the fusion protein.


In some embodiments, an engineered host cell expressing a fusion protein may have a growth rate in a medium containing a mixture of glucose and a disaccharide as a primary carbon source that is higher than a growth rate of a control host cell, wherein the control host cell is identical to the engineered host cell, except the control cell does not express the fusion protein.


In some embodiments, an engineered host cell expressing a fusion protein may have a growth rate in a medium containing a carbon source that is not glucose as a primary carbon source that is higher than a growth rate of a control host cell, wherein the control host cell is identical to the engineered host cell, except the control cell does not express the fusion protein.
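The growth-rate comparisons described above between an engineered host cell and a control host cell can be illustrated with a short calculation. The sketch below assumes, as one common convention and not as a method stated in the present disclosure, that growth rate is expressed as the specific growth rate μ = ln(OD_end / OD_start) / Δt during exponential growth, measured by optical density. The function name and all numeric values are hypothetical and are not experimental data.

```python
import math


def specific_growth_rate(od_start, od_end, hours):
    """Specific growth rate (per hour), assuming exponential growth."""
    if od_start <= 0 or od_end <= 0 or hours <= 0:
        raise ValueError("optical densities and elapsed time must be positive")
    return math.log(od_end / od_start) / hours


# Made-up optical densities for cultures grown on the same alternate carbon source.
mu_engineered = specific_growth_rate(od_start=0.1, od_end=0.8, hours=6.0)
mu_control = specific_growth_rate(od_start=0.1, od_end=0.2, hours=6.0)

print(f"engineered: {mu_engineered:.3f} per hour")
print(f"control:    {mu_control:.3f} per hour")
print("engineered grows faster:", mu_engineered > mu_control)
```

Both cultures are assumed to be grown in the same medium and under the same conditions, so that expression of the fusion protein is the only difference between them.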


Transporter Proteins

In some cases, a heterologous transporter protein is produced in a host cell that has been engineered to express or overexpress one or more heterologous recombinant proteins such as the proteins of interest. A transporter protein may be a protein that allows the host cell to transport a carbon source into the host cell. The host cell then may be able to catalyze a reaction which allows the host cell to utilize a carbon source which it previously was unable to utilize or utilize efficiently. In some embodiments, the transporter protein may be a sucrose permease (such as encoded by the MAL11 or AGT1 genes) or a maltose permease (such as encoded by the MAL2 gene). Exemplary sequences for glycosyl hydrolases are provided in Table 10. A recombinant glycosyl hydrolase expressed in a host cell may comprise a sequence with at least 75%, at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 97% or at least 99% sequence identity to any of the sequences in Table 10. In certain embodiments, the sucrose permease is a CscB sucrose permease. Exemplary sequences of sucrose permeases can be found in PCT Application Publication No.: WO2022129470, which is hereby incorporated by reference in its entirety.


An engineered host cell expressing a heterologous transporter protein may be cultured with a carbon source that is not naturally utilized by the host cell or not utilized as efficiently as glucose in the absence of the transporter protein.


An engineered host cell expressing a heterologous transporter protein may have a growth rate in a medium containing sucrose as a primary carbon source that is higher than a growth rate of a control host cell, wherein the control host cell is identical to the engineered host cell, except the control cell does not express the transporter protein.


In some embodiments, an engineered host cell expressing a heterologous transporter protein may have a growth rate in a medium containing fructose as a primary carbon source that is higher than a growth rate of a control host cell, wherein the control host cell is identical to the engineered host cell, except the control cell does not express the transporter protein.


In some embodiments, an engineered host cell expressing a heterologous transporter protein may have a growth rate in a medium containing maltose as a primary carbon source that is higher than a growth rate of a control host cell, wherein the control host cell is identical to the engineered host cell, except the control cell does not express the transporter protein.


In some embodiments, an engineered host cell expressing a heterologous transporter protein may have a growth rate in a medium containing high fructose corn syrup as a primary carbon source that is higher than a growth rate of a control host cell, wherein the control host cell is identical to the engineered host cell, except the control cell does not express the transporter protein.


In some embodiments, an engineered host cell expressing a heterologous transporter protein may have a growth rate in a medium containing molasses as a primary carbon source that is higher than a growth rate of a control host cell, wherein the control host cell is identical to the engineered host cell, except the control cell does not express the transporter protein.


In some embodiments, an engineered host cell expressing a heterologous transporter protein may have a growth rate in a medium containing a disaccharide as a primary carbon source that is higher than a growth rate of a control host cell, wherein the control host cell is identical to the engineered host cell, except the control cell does not express the transporter protein.


In some embodiments, an engineered host cell expressing a heterologous transporter protein may have a growth rate in a medium containing a mixture of glucose and a disaccharide as a primary carbon source that is higher than a growth rate of a control host cell, wherein the control host cell is identical to the engineered host cell, except the control cell does not express the transporter protein.


In some embodiments, an engineered host cell expressing a heterologous transporter protein may have a growth rate in a medium containing a carbon source that is not glucose as a primary carbon source that is higher than a growth rate of a control host cell, wherein the control host cell is identical to the engineered host cell, except the control cell does not express the transporter protein.


In some cases, the engineered host cell may endogenously express a glycosyl hydrolase which can utilize the alternate carbon source, but it is unable to do so efficiently. In such cases, a transporter protein may increase the uptake of the alternate carbon source and therefore increase the metabolization of the alternate carbon source.


In some cases, the engineered host cell may not express a glycosyl hydrolase which is able to hydrolyze an alternate carbon source. In such examples, the host cell may be engineered to express a heterologous glycosyl hydrolase which is able to hydrolyze the alternate carbon source.


Expression of Recombinant Proteins

Expression of a recombinant protein can be provided by an expression vector, a plasmid, a nucleic acid integrated into the host genome or other means. For example, a vector for expression can include: (a) a promoter element, (b) a signal peptide, (c) a heterologous protein sequence, and (d) a terminator element.


Expression vectors that can be used for expression of a recombinant protein include those containing an expression cassette with elements (a), (b), (c) and (d). In some embodiments, the signal peptide (b) need not be included in the vector. In general, the expression cassette is designed to mediate the transcription of the transgene when integrated into the genome of a cognate host microorganism.


To aid in the amplification of the vector prior to transformation into the host microorganism, a replication origin (e) may be contained in the vector (such as pUC ORIC and pUC (DNA2.0)). To aid in the selection of microorganisms stably transformed with the expression vector, the vector may also include a selection marker (f) such as the URA3 gene or the Zeocin resistance gene (ZeoR). The expression vector may also contain a restriction enzyme site (g) that allows for linearization of the expression vector prior to transformation into the host microorganism to facilitate the expression vector's stable integration into the host genome. In some embodiments, the expression vector may contain any subset of the elements (b), (e), (f), and (g), including none of elements (b), (e), (f), and (g). Other expression elements and vector elements known to one of skill in the art can be used in combination with or substituted for the elements described herein.


Exemplary promoter elements (a) may include, but are not limited to, a constitutive promoter, inducible promoter, and hybrid promoter. Promoters include, but are not limited to, acu-5, adh1+, alcohol dehydrogenase (ADH1, ADH2, ADH4), AHSB4m, AINV, alcA, α-amylase, alternative oxidase (AOD), alcohol oxidase 1 (AOX1), alcohol oxidase 2 (AOX2), AXDH, B2, CaMV, cellobiohydrolase I (cbh1), ccg-1, cDNA1, cellular filament polypeptide (cfp), cpc-2, ctr4+, CUP1, dihydroxyacetone synthase (DAS), enolase (ENO, ENO1), formaldehyde dehydrogenase (FLD1), FMD, formate dehydrogenase (FMDH), G1, G6, GAA, GAL1, GAL2, GAL3, GAL4, GAL5, GAL6, GAL7, GAL8, GAL9, GAL10, GCW14, gdhA, gla-1, α-glucoamylase (glaA), glyceraldehyde-3-phosphate dehydrogenase (gpdA, GAP, GAPDH), phosphoglycerate mutase (GPM1), glycerol kinase (GUT1), HSP82, inv1+, isocitrate lyase (ICL1), acetohydroxy acid isomeroreductase (ILV5), KAR2, KEX2, β-galactosidase (lac4), LEU2, melO, MET3, methanol oxidase (MOX), nmt1, NSP, pcbC, PET9, peroxin 8 (PEX8), phosphoglycerate kinase (PGK, PGK1), pho1, PHO5, PHO89, phosphatidylinositol synthase (PIS1), PYK1, pyruvate kinase (pki1), RPS7, sorbitol dehydrogenase (SDH), 3-phosphoserine aminotransferase (SER1), SSA4, SV40, TEF, translation elongation factor 1 alpha (TEF1), THI11, homoserine kinase (THR1), tpi, TPS1, triose phosphate isomerase (TPI1), XRP2, YPT1, and any combination thereof. Illustrative inducible promoters include methanol-induced promoters, e.g., DAS1 and PEX11. Exemplary promoter sequences are provided in Table 4.


A signal peptide (b), also known as a signal sequence, targeting signal, localization signal, localization sequence, transit peptide, leader sequence, or leader peptide, may support secretion of a protein or polynucleotide. Extracellular secretion of a recombinant or heterologously expressed protein from a host cell may facilitate protein purification. A signal peptide may be derived from a precursor (e.g., prepropeptide, preprotein) of a protein. Signal peptides can also be derived from a precursor of a protein other than the recombinant protein, rather than from the native signal peptide of the recombinant protein.


Any nucleic acid sequence that encodes a recombinant protein can be used as (c). Preferably such sequence is codon optimized for the species/genus/kingdom of the host cell.


Exemplary transcriptional terminator elements include, but are not limited to, acu-5, adh1+, alcohol dehydrogenase (ADH1, ADH2, ADH4), AHSB4m, AINV, alcA, α-amylase, alternative oxidase (AOD), alcohol oxidase 1 (AOX1), alcohol oxidase 2 (AOX2), AXDH, B2, CaMV, cellobiohydrolase I (cbh1), ccg-1, cDNA1, cellular filament polypeptide (cfp), cpc-2, ctr4+, CUP1, dihydroxyacetone synthase (DAS), enolase (ENO, ENO1), formaldehyde dehydrogenase (FLD1), FMD, formate dehydrogenase (FMDH), G1, G6, GAA, GAL1, GAL2, GAL3, GAL4, GAL5, GAL6, GAL7, GAL8, GAL9, GAL10, GCW14, gdhA, gla-1, α-glucoamylase (glaA), glyceraldehyde-3-phosphate dehydrogenase (gpdA, GAP, GAPDH), phosphoglycerate mutase (GPM1), glycerol kinase (GUT1), HSP82, inv1+, isocitrate lyase (ICL1), acetohydroxy acid isomeroreductase (ILV5), KAR2, KEX2, β-galactosidase (lac4), LEU2, melO, MET3, methanol oxidase (MOX), nmt1, NSP, pcbC, PET9, peroxin 8 (PEX8), phosphoglycerate kinase (PGK, PGK1), pho1, PHO5, PHO89, phosphatidylinositol synthase (PIS1), PYK1, pyruvate kinase (pki1), RPS7, sorbitol dehydrogenase (SDH), 3-phosphoserine aminotransferase (SER1), SSA4, SV40, TEF, translation elongation factor 1 alpha (TEF1), THI11, homoserine kinase (THR1), tpi, TPS1, triose phosphate isomerase (TPI1), XRP2, YPT1, and any combination thereof. Exemplary promoter sequences are provided in Table 5.


Exemplary selectable markers (f) may include but are not limited to: an antibiotic resistance gene (e.g. zeocin, ampicillin, blasticidin, kanamycin, nourseothricin, chloramphenicol, tetracycline, triclosan, ganciclovir, and any combination thereof) and an auxotrophic marker (e.g. ade1, arg4, his4, ura3, met2, and any combination thereof). Exemplary terminator sequences are provided in Table 8.


In one example, a vector for expression in Pichia sp. can include an AOX1 promoter operably linked to a signal peptide (alpha mating factor) that is fused in frame with a nucleic acid sequence encoding a recombinant protein, and a terminator element (AOX1 terminator) immediately downstream of the nucleic acid sequence encoding a recombinant protein.
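The Pichia vector example above, together with the vector elements (a) through (g), can be represented as a small data structure. The sketch below is illustrative only: the class name `ExpressionVector`, its field names, and the method `cassette_order` are hypothetical, and the string values are descriptive labels rather than nucleic acid sequences.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExpressionVector:
    promoter: str                              # element (a)
    coding_sequence: str                       # element (c)
    terminator: str                            # element (d)
    signal_peptide: Optional[str] = None       # element (b), optional
    replication_origin: Optional[str] = None   # element (e), optional
    selection_marker: Optional[str] = None     # element (f), optional
    linearization_site: Optional[str] = None   # element (g), optional

    def cassette_order(self) -> List[str]:
        """Return the cassette elements in 5' to 3' order, skipping absent ones."""
        parts = [self.promoter, self.signal_peptide,
                 self.coding_sequence, self.terminator]
        return [part for part in parts if part is not None]


# Labels follow the AOX1 / alpha mating factor example in the text.
vector = ExpressionVector(
    promoter="AOX1 promoter",
    signal_peptide="alpha mating factor signal peptide",
    coding_sequence="recombinant protein coding sequence",
    terminator="AOX1 terminator",
    selection_marker="ZeoR",
)
print(" -> ".join(vector.cassette_order()))
```

Elements (b), (e), (f), and (g) default to `None` because the text describes any subset of them, including none, as optional.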


In another example, a vector comprises a DAS1 promoter operably linked to a signal peptide (alpha mating factor) that is fused in frame with a nucleic acid sequence encoding a recombinant protein, and a terminator element (AOX1 terminator) immediately downstream of the nucleic acid sequence encoding the recombinant protein.


A recombinant protein described herein may be secreted from the one or more host cells. In some embodiments, a recombinant POI is secreted from the host cell. The secreted recombinant POI may be isolated and purified by methods such as centrifugation, fractionation, filtration, affinity purification and other methods for separating protein from cells, liquid and solid media components and other cellular products and byproducts. In some embodiments, a recombinant POI is produced in a Pichia sp. and secreted from the host cells into the culture media. The secreted recombinant protein, such as the POI, is then separated from other media components for further use.


In some cases, multiple vectors comprising the gene sequence of a protein may be transfected into one or more host cells. A host cell may comprise more than one copy of the gene encoding the recombinant protein. A single host cell may comprise 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 copies of the recombinant POI or the fusion protein. A single host cell may comprise one or more vectors for the expression of the POI and/or the fusion protein. A single host cell may comprise 2, 3, 4, 5, 6, 7, 8, 9, or 10 vectors for the POI expression and/or the fusion protein expression. Each vector in the host cell may drive the expression of POI and/or the fusion protein using the same promoter. Alternatively, different promoters may be used in different vectors for POI and/or the fusion protein expression.


A recombinant protein such as the POI or the fusion protein may be recombinantly expressed in one or more host cells. As used herein, a “host” or “host cell” denotes any protein production host selected or genetically modified to produce a desired product. Exemplary hosts include fungi, such as filamentous fungi, as well as bacteria, yeast, plant, insect, and mammalian cells. A host cell can be an organism that is generally recognized as safe (GRAS) by the U.S. Food and Drug Administration.


In some embodiments, a host cell may be transformed to include one or more expression cassettes. As examples, a host cell may be transformed to express one expression cassette, two expression cassettes, three expression cassettes or more expression cassettes. In one example, a host cell is transformed to express a first expression cassette that encodes a first POI and a second expression cassette that encodes a second POI.


As used herein, a “host cell” refers to a cell which is capable of protein expression and optionally protein secretion. Such host cell is applied in the methods of the present invention. For that purpose, for the host cell to express a polypeptide, a nucleotide sequence encoding the polypeptide is present or introduced in the cell. Host cells provided by the present invention can be prokaryotes or eukaryotes. As will be appreciated by one of skill in the art, a prokaryotic cell lacks a membrane-bound nucleus, while a eukaryotic cell has a membrane-bound nucleus. Examples of eukaryotic cells include, but are not limited to, vertebrate cells, mammalian cells, human cells, animal cells, invertebrate cells, plant cells, nematodal cells, insect cells, stem cells, fungal cells or yeast cells.


Examples of yeast cells that may be transformed to include one or more expression cassettes include but are not limited to the Saccharomyces genus (e.g. Saccharomyces cerevisiae, Saccharomyces kluyveri, Saccharomyces uvarum), the Komagataella genus (e.g. Komagataella pastoris, Komagataella pseudopastoris or Komagataella phaffii), the Kluyveromyces genus (e.g. Kluyveromyces lactis, Kluyveromyces marxianus), the Candida genus (e.g. Candida utilis, Candida cacaoi), the Geotrichum genus (e.g. Geotrichum fermentans), as well as Hansenula polymorpha and Yarrowia lipolytica. A host cell may also be a member of the following genera or species: Arxula spp., Arxula adeninivorans, Kluyveromyces spp., Kluyveromyces lactis, Komagataella phaffii, Pichia spp., Pichia angusta, Pichia pastoris, Saccharomyces spp., Saccharomyces cerevisiae, Schizosaccharomyces spp., Schizosaccharomyces pombe, Yarrowia spp., Yarrowia lipolytica, Agaricus spp., Agaricus bisporus, Aspergillus spp., Aspergillus awamori, Aspergillus fumigatus, Aspergillus nidulans, Aspergillus niger, Aspergillus oryzae, Bacillus subtilis, Colletotrichum spp., Colletotrichum gloeosporioides, Endothia spp., Endothia parasitica, Escherichia coli, Fusarium spp., Fusarium graminearum, Fusarium solani, Mucor spp., Mucor miehei, Mucor pusillus, Myceliophthora spp., Myceliophthora thermophila, Neurospora spp., Neurospora crassa, Penicillium spp., Penicillium camemberti, Penicillium canescens, Penicillium chrysogenum, Penicillium (Talaromyces) emersonii, Penicillium funiculosum, Penicillium purpurogenum, Penicillium roqueforti, Pleurotus spp., Pleurotus ostreatus, Rhizomucor spp., Rhizomucor miehei, Rhizomucor pusillus, Rhizopus spp., Rhizopus arrhizus, Rhizopus oligosporus, Rhizopus oryzae, Trichoderma spp., Trichoderma atroviride, Trichoderma reesei, or Trichoderma virens.


The genus Pichia is of particular interest. Pichia comprises a number of species, including the species Pichia pastoris, Pichia methanolica, Pichia kluyveri, and Pichia angusta. Most preferred is the species Pichia pastoris.


The former species Pichia pastoris has been divided and renamed as Komagataella pastoris and Komagataella phaffii. Therefore, Pichia pastoris is synonymous with both Komagataella pastoris and Komagataella phaffii.


In some embodiments, the host cell is selected from Pichia pastoris, Hansenula polymorpha, Trichoderma reesei, Saccharomyces cerevisiae, Kluyveromyces lactis, Yarrowia lipolytica, Pichia methanolica, Candida boidinii, Komagataella, and Schizosaccharomyces pombe.


The term “sequence identity” as used herein in the context of amino acid sequences is defined as the percentage of amino acid residues in a candidate sequence that are identical with the amino acid residues in a selected sequence, after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent sequence identity, and not considering any conservative substitutions as part of the sequence identity. Alignment for purposes of determining percent amino acid sequence identity can be achieved in various ways that are within the skill in the art, for instance, using publicly available computer software such as BLAST, BLAST-2, ALIGN, ALIGN-2 or Megalign (DNASTAR) software. Those skilled in the art can determine appropriate parameters for measuring alignment, including any algorithms needed to achieve maximal alignment over the full-length of the sequences being compared.
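The definition of sequence identity above can be illustrated with a short calculation on a pair of sequences that have already been aligned. This is a minimal sketch, assuming the alignment (with gaps written as "-") has been produced beforehand by an alignment tool such as those named above. The function name `percent_identity` and the example sequences are hypothetical, and the denominator is taken to be the number of residues in the candidate sequence, following the wording of the definition. Conservative substitutions are counted as mismatches.

```python
def percent_identity(aligned_candidate, aligned_selected, gap="-"):
    """Percent of candidate residues identical to the aligned selected sequence."""
    if len(aligned_candidate) != len(aligned_selected):
        raise ValueError("aligned sequences must have equal length")
    # Residues of the candidate sequence, i.e. non-gap positions.
    candidate_residues = sum(1 for c in aligned_candidate if c != gap)
    if candidate_residues == 0:
        raise ValueError("candidate sequence contains no residues")
    # A column counts as identical only if both positions hold the same residue.
    identical = sum(
        1
        for c, s in zip(aligned_candidate, aligned_selected)
        if c != gap and c == s
    )
    return 100.0 * identical / candidate_residues


# Hypothetical pre-aligned sequences: one gap and one mismatch (L versus I).
identity = percent_identity("MKT-AYL", "MKTSAYI")
print(f"{identity:.2f}% identity")
```

Other conventions divide by the alignment length or by the length of the shorter sequence, so the denominator should be stated whenever a percent identity is reported.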


In some embodiments, an engineered host cell expressing a recombinant protein, such as the POI or the fusion protein, may have a growth rate, in a medium containing fructose as a primary carbon source, that is higher than the growth rate of a control host cell, wherein the control host cell is identical to the engineered host cell except that the control host cell does not express the recombinant protein.


In some embodiments, an engineered host cell expressing a recombinant protein, such as the POI or the fusion protein, may have a growth rate, in a medium containing maltose as a primary carbon source, that is higher than the growth rate of a control host cell, wherein the control host cell is identical to the engineered host cell except that the control host cell does not express the recombinant protein.


In some embodiments, an engineered host cell expressing a recombinant protein, such as the POI or the fusion protein, may have a growth rate, in a medium containing high fructose corn syrup as a primary carbon source, that is higher than the growth rate of a control host cell, wherein the control host cell is identical to the engineered host cell except that the control host cell does not express the recombinant protein.


In some embodiments, an engineered host cell expressing a recombinant protein, such as the POI or the fusion protein, may have a growth rate, in a medium containing molasses as a primary carbon source, that is higher than the growth rate of a control host cell, wherein the control host cell is identical to the engineered host cell except that the control host cell does not express the recombinant protein.


In some embodiments, an engineered host cell expressing a recombinant protein, such as the POI or the fusion protein, may have a growth rate, in a medium containing a disaccharide as a primary carbon source, that is higher than the growth rate of a control host cell, wherein the control host cell is identical to the engineered host cell except that the control host cell does not express the recombinant protein.


In some embodiments, an engineered host cell expressing a recombinant protein, such as the POI or the fusion protein, may have a growth rate, in a medium containing a mixture of glucose and a disaccharide as a primary carbon source, that is higher than the growth rate of a control host cell, wherein the control host cell is identical to the engineered host cell except that the control host cell does not express the recombinant protein.


In some embodiments, an engineered host cell expressing a recombinant protein, such as the POI or the fusion protein, may have a growth rate, in a medium containing a carbon source other than glucose as a primary carbon source, that is higher than the growth rate of a control host cell, wherein the control host cell is identical to the engineered host cell except that the control host cell does not express the recombinant protein.
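The growth-rate comparison recited in the preceding embodiments can be illustrated with a simple calculation. The sketch below assumes exponential growth and estimates a specific growth rate as ln(OD_end/OD_start) divided by the elapsed time. The disclosure does not prescribe this particular measurement, and the function name and all optical-density readings are hypothetical placeholders, not experimental data.

```python
import math


def specific_growth_rate(od_start: float, od_end: float, hours: float) -> float:
    """Specific growth rate (per hour), assuming exponential growth between two readings."""
    if od_start <= 0 or od_end <= 0 or hours <= 0:
        raise ValueError("optical densities and elapsed time must be positive")
    return math.log(od_end / od_start) / hours


# Hypothetical OD600 readings (placeholders, not data from the disclosure).
# Both cultures are assumed to grow on the same non-glucose primary carbon source.
engineered_mu = specific_growth_rate(od_start=0.1, od_end=1.6, hours=8.0)
control_mu = specific_growth_rate(od_start=0.1, od_end=0.2, hours=8.0)

print(f"engineered: {engineered_mu:.3f} 1/h, control: {control_mu:.3f} 1/h")
print("engineered grows faster:", engineered_mu > control_mu)
```

Any consistent biomass measure (for example, dry cell weight) could be substituted for optical density, provided the engineered and control cultures are measured the same way.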









TABLE 1

Anchoring proteins

Sequence Info | SEQ ID NO: | Amino acid sequence





Tir4 from
SEQ ID NO:
QINELNVVLDDVKTNIADYITLSYTPNSGFSLDQMPAGIMDIAAQLVANPSDDSYTTLYSE



Saccharomyces

1
VDFSAVEHMLTMVPWYSSRLLPELEAMDASLTTSSSAATSSSEVASSSIASSTSSSVAPSSS



cerevisiae


EVVSSSVASSSSEVASSSVASTSEATSSSAVTSSSAVSSSTESVSSSSVSSSSAVSSSEAVSSS




PVSSVVSSSAGPASSSVAPYNSTIASSSSTAQTSISTIAPYNSTTTTTPASSASSVIISTRNGTT




VTETDNTLVTKETTVCDYSSTSAVPASTTGYNNSTKVSTATICSTCKEGTSTATDFSTLKT




TVTVCDSACQAKKSATVVSVQSKTTGIVEQTENGAAKAVIGMGAGALAAVAAMLL





Tir4 from
SEQ ID NO:
QINELNVVLDDVKTNIADYITLSYTPNSGFSLDQMPAGIMDIAAQLVANPSDDSYTTLYSE



Saccharomyces

320
VDFSAVEHMLTMVPWYSSRLLPELEAMDASLTTSSSAATSSSEVASSSIASSTSSSVAPSSS



cerevisiae


EVVSSSVASSSSEVASSSVASTSEATSSSAVTSSSAVSSSTESVSSSSVSSSSAVSSSEAVSSS




PVSSVVSSSAGPASSSVAPYNSTIASSSSTAQTSISTIAPYNSTTTTTPASSASSVIISTRNGTT




VTETDNTLVTKETTVCDYSSTSAVPASTTGYNNSTKVSTATICSTCKEGTSTATDFSTLKT




TVTVCDSACQAKKSATVVSVQSKTTGIVEQTEN





Tir4 from
SEQ ID NO:

MAYSKITLLAALAAIAYAQTQAQINELNVVLDDVKTNIADYITLSYTPNSGFSLDQMPAG




Saccharomyces

2
IMDIAAQLVANPSDDSYTTLYSEVDFSAVEHMLTMVPWYSSRLLPELEAMDASLTTSSSA



cerevisiae


ATSSSEVASSSIASSTSSSVAPSSSEVVSSSVASSSSEVASSSVASTSEATSSSAVTSSSAVSS


(underlined is

STESVSSSSVSSSSAVSSSEAVSSSPVSSVVSSSAGPASSSVAPYNSTIASSSSTAQTSISTIAP


signal peptide, may

YNSTTTTTPASSASSVIISTRNGTTVTETDNTLVTKETTVCDYSSTSAVPASTTGYNNSTK


or may not be

VSTATICSTCKEGTSTATDFSTLKTTVTVCDSACQAKKSATVVSVQSKTTGIVEQTENGA


utilized in design)

AKAVIGMGAGALAAVAAMLL





Tir4 from
SEQ ID NO:
QINELNVVLDDVKTNIADYITLSYTPNSGFSLDQMPAGIMDIAAQLVANPSDDSYTTLYSE



Saccharomyces

320
VDFSAVEHMLTMVPWYSSRLLPELEAMDASLTTSSSAATSSSEVASSSIASSTSSSVAPSSS



cerevisiae


EVVSSSVASSSSEVASSSVASTSEATSSSAVTSSSAVSSSTESVSSSSVSSSSAVSSSEAVSSS




PVSSVVSSSAGPASSSVAPYNSTIASSSSTAQTSISTIAPYNSTTTTTPASSASSVIISTRNGTT




VTETDNTLVTKETTVCDYSSTSAVPASTTGYNNSTKVSTATICSTCKEGTSTATDFSTLKT




TVTVCDSACQAKKSATVVSVQSKTTGIVEQTEN





Tir4
SEQ ID NO:
QINELNVVLDDVKTNIADYITLSYTPNSGFSLDQMPAGIMDIAAQLVANPSDDSYTTLYSE


(NP_014652.1)
3
VDFSAVEHMLTMVPWYSSRLLPELEAMDASLTTSSSAATSSSEVASSSIASSTSSSVAPSSS


from

EVVSSSVAPSSSEVVSSSVAPSSSEVVSSSVASSSSEVASSSVAPSSSEVVSSSVASSSSEVA



Saccharomyces


SSSVAPSSSEVVSSSVAPSSSEVVSSSVASSSSEVASSSVAPSSSEVVSSSVASSTSEATSSSA



cerevisiae


VTSSSAVSSSTESVSSSSVSSSSAVSSSEAVSSSPVSSVVSSSAGPASSSVAPYNSTIASSSST




AQTSISTIAPYNSTTTTTPASSASSVIISTRNGTTVTETDNTLVTKETTVCDYSSTSAVPAST




TGYNNSTKVSTATICSTCKEGTSTATDFSTLKTTVTVCDSACQAKKSATVVSVQSKTTGI




VEQTENGAAKAVIGMGAGALAAVAAMLL





Tir4
SEQ ID NO:

MAYSKITLLAALAALAYAQTQAQINELNVVLDDVKTNIADYITLSYTPNSGFSLDQMPAG



(NP_014652.1)
4
IMDIAAQLVANPSDDSYTTLYSEVDFSAVEHMLTMVPWYSSRLLPELEAMDASLTTSSSA


from

ATSSSEVASSSIASSTSSSVAPSSSEVVSSSVAPSSSEVVSSSVAPSSSEVVSSSVASSSSEVA



Saccharomyces


SSSVAPSSSEVVSSSVASSSSEVASSSVAPSSSEVVSSSVAPSSSEVVSSSVASSSSEVASSSV



cerevisiae


APSSSEVVSSSVASSTSEATSSSAVTSSSAVSSSTESVSSSSVSSSSAVSSSEAVSSSPVSSVV


(underlined is

SSSAGPASSSVAPYNSTIASSSSTAQTSISTIAPYNSTTTTTPASSASSVIISTRNGTTVTETD


signal peptide, may

NTLVTKETTVCDYSSTSAVPASTTGYNNSTKVSTATICSTCKEGTSTATDFSTLKTTVTVC


or may not be

DSACQAKKSATVVSVQSKTTGIVEQTENGAAKAVIGMGAGALAAVAAMLL


utilized in design)







Tir4
SEQ ID NO:
QINELNVVLDDVKTNIADYITLSYTPNSGFSLDQMPAGIMDIAAQLVANPSDDSYTTLYSE


(NP_014652.1)
321
VDFSAVEHMLTMVPWYSSRLLPELEAMDASLTTSSSAATSSSEVASSSIASSTSSSVAPSSS


from

EVVSSSVAPSSSEVVSSSVAPSSSEVVSSSVASSSSEVASSSVAPSSSEVVSSSVASSSSEVA



Saccharomyces


SSSVAPSSSEVVSSSVAPSSSEVVSSSVASSSSEVASSSVAPSSSEVVSSSVASSTSEATSSSA



cerevisiae


VTSSSAVSSSTESVSSSSVSSSSAVSSSEAVSSSPVSSVVSSSAGPASSSVAPYNSTIASSSST


(without C-

AQTSISTIAPYNSTTTTTPASSASSVIISTRNGTTVTETDNTLVTKETTVCDYSSTSAVPAST


terminus of Tir4

TGYNNSTKVSTATICSTCKEGTSTATDFSTLKTTVTVCDSACQAKKSATVVSVQSKTTGI


GPI anchor or

VEQTEN


signal peptide)







Dan1 from
SEQ ID NO:
ASVTTTLSPYDERVNLIELAVYVSDIGAHLSEYYAFQALHKTETYPPEIAKAVFAGGDFTT



Saccharomyces

5
MLTGISGDEVTRMITGVPWYSTRLMGAISEALANEGIATAVPASTTEASSTSTSEASSAAT



cerevisiae


ESSSSSESSAETSSNAASTQATVSSESSSAASTIASSAESSVASSVASSVASSASFANTTAPV




SSTSSISVTPVVQNGTDSTVTKTQASTVETTITSCSNNVCSTVTKPVSSKAQSTATSVTSSA




SRVIDVTTNGANKFNNGVFGAAAIAGAAALLL





Dan1 from
SEQ ID NO:

MSRISILAVAAALVASATAASVTTTLSPYDERVNLIELAVYVSDIGAHLSEYYAFQALHK




Saccharomyces

6
TETYPPEIAKAVFAGGDFTTMLTGISGDEVTRMITGVPWYSTRLMGAISEALANEGIATA



cerevisiae


VPASTTEASSTSTSEASSAATESSSSSESSAETSSNAASTQATVSSESSSAASTIASSAESSV


(underlined is

ASSVASSVASSASFANTTAPVSSTSSISVTPVVQNGTDSTVTKTQASTVETTITSCSNNVCS


signal peptide, may

TVTKPVSSKAQSTATSVTSSASRVIDVTTNGANKFNNGVFGAAAIAGAAALLL


or may not be




utilized in design)







Dan4 from
SEQ ID NO:
ITATTTLSPYDERVNLIELAVYVSDIRAHIFQYYSFRNHHKTETYPSEIAAAVFDYGDFTTR



Saccharomyces

7
LTGISGDEVTRMITGVPWYSTRLKPAISSALSKDGIYTAIPTSTSTTTTKSSTSTTPTTTITST



cerevisiae


TSTTSTTPTTSTTSTTPTTSTTSTTPTTSTTSTTPTTSTTSTTPTTSTTSTTPTTSTTSTTPTTST




TSTTPTTSTTSTTPTTSTTPTTSTTSTTSQTSTKSTTPTTSSTSTTPTTSTTPTTSTTSTAPTTS




TTSTTSTTSTISTAPTTSTTSSTFSTSSASASSVISTTATTSTTFASLTTPATSTASTDHTTSSV




STTNAFTTSATTTTTSDTYISSSSPSQVTSSAEPTTVSEVTSSVEPTRSSQVTSSAEPTTVSEF




TSSVEPTRSSQVTSSAEPTTVSEFTSSVEPTRSSQVTSSAEPTTVSEFTSSVEPTRSSQVTSSA




EPTTVSEFTSSVEPTRSSQVTSSAEPTTVSEFTSSVEPIRSSQVTSSAEPTTVSEVTSSVEPIRS




SQVTTTEPVSSFGSTFSEITSSAEPLSFSKATTSAESISSNQITISSELIVSSVITSSSEIPSSIEVL




TSSGISSSVEPTSLVGPSSDESISSTESLSATSTFTSAVVSSSKAADFFTRSTVSAKSDVSGNS




STQSTTFFATPSTPLAVSSTVVTSSTDSVSPNIPFSEISSSPESSTAITSTSTSFIAERTSSLYLS




SSNMSSFTLSTFTVSQSIVSSFSMEPTSSVASFASSSPLLVSSRSNCSDARSSNTISSGLFSTIE




NVRNATSTFTNLSTDEIVITSCKSSCTNEDSVLTKTQVSTVETTITSCSGGICTTLMSPVTTI




NAKANTLTTTETSTVETTITTCPGGVCSTLTVPVTTITSEATTTATISCEDNEEDITSTETEL




LTLETTITSCSGGICTTLMSPVTTINAKANTLTTTETSTVETTITTCSGGVCSTLTVPVTTITS




EATTTATISCEDNEEDVASTKTELLTMETTITSCSGGICTTLMSPVSSFNSKATTSNNAESTI




PKAIKVSCSAGACTTLTTVDAGISMFTRTGLSITQTTVTNCSGGTCTMLTAPIATATSKVIS




PIPKASSATSIAHSSASYTVSINTNGAYNFDKDNIFGTAIVAVVALLLL





Dan4 from
SEQ ID NO:

MVNISIVAGIVALATSAAAITATTTLSPYDERVNLIELAVYVSDIRAHIFQYYSFRNHHKTE




Saccharomyces

8
TYPSEIAAAVFDYGDFTTRLTGISGDEVTRMITGVPWYSTRLKPAISSALSKDGIYTAIPTS



cerevisiae


TSTTTTKSSTSTTPTTTITSTTSTTSTTPTTSTTSTTPTTSTTSTTPTTSTTSTTPTTSTTSTTPT


(underlined is

TSTTSTTPTTSTTSTTPTTSTTSTTPTTSTTSTTPTTSTTPTTSTTSTTSQTSTKSTTPTTSSTST


signal peptide, may

TPTTSTTPTTSTTSTAPTTSTTSTTSTTSTISTAPTTSTTSSTFSTSSASASSVISTTATTSTTFA


or may not be

SLTTPATSTASTDHTTSSVSTTNAFTTSATTTTTSDTYISSSSPSQVTSSAEPTTVSEVTSSV


utilized in design)

EPTRSSQVTSSAEPTTVSEFTSSVEPTRSSQVTSSAEPTTVSEFTSSVEPTRSSQVTSSAEPTT




VSEFTSSVEPTRSSQVTSSAEPTTVSEFTSSVEPTRSSQVTSSAEPTTVSEFTSSVEPIRSSQV




TSSAEPTTVSEVTSSVEPIRSSQVTTTEPVSSFGSTFSEITSSAEPLSFSKATTSAESISSNQITI




SSELIVSSVITSSSEIPSSIEVLTSSGISSSVEPTSLVGPSSDESISSTESLSATSTFTSAVVSSSK




AADFFTRSTVSAKSDVSGNSSTQSTTFFATPSTPLAVSSTVVTSSTDSVSPNIPFSEISSSPES




STAITSTSTSFIAERTSSLYLSSSNMSSFTLSTFTVSQSIVSSFSMEPTSSVASFASSSPLLVSS




RSNCSDARSSNTISSGLFSTIENVRNATSTFTNLSTDEIVITSCKSSCTNEDSVLTKTQVSTV




ETTITSCSGGICTTLMSPVTTINAKANTLTTTETSTVETTITTCPGGVCSTLTVPVTTITSEA




TTTATISCEDNEEDITSTETELLTLETTITSCSGGICTTLMSPVTTINAKANTLTTTETSTVET




TITTCSGGVCSTLTVPVTTITSEATTTATISCEDNEEDVASTKTELLTMETTITSCSGGICTT




LMSPVSSFNSKATTSNNAESTIPKAIKVSCSAGACTTLTTVDAGISMFTRTGLSITQTTVTN




CSGGTCTMLTAPIATATSKVISPIPKASSATSIAHSSASYTVSINTNGAYNFDKDNIFGTAIV




AVVALLLL





Sag1 from
SEQ ID NO:
ININDITFSNLEITPLTANKQPDQGWTATFDFSIADASSIREGDEFTLSMPHVYRIKLLNSSQ



Saccharomyces

9
TATISLADGTEAFKCYVSQQAAYLYENTTFTCTAQNDLSSYNTIDGSITFSLNFSDGGSSY



cerevisiae


EYELENAKFFKSGPMLVKLGNQMSDVVNFDPAAFTENVFHSGRSTGYGSFESYHLGMY




CPNGYFLGGTEKIDYDSSNNNVDLDCSSVQVYSSNDFNDWWFPQSYNDTNADVTCFGS




NLWITLDEKLYDGEMLWVNALQSLPANVNTIDHALEFQYTCLDTIANTTYATQFSTTREF




IVYQGRNLGTASAKSSFISTTTTDLTSINTSAYSTGSISTVETGNRTTSEVISHVVTTSTKLS




PTATTSLTIAQTSIYSTDSNITVGTDIHTTSEVISDVETISRETASTVVAAPTSTTGWTGAMN




TYISQFTSSSFATINSTPIISSSAVFETSDASIVNVHTENITNTAAVPSEEPTFVNATRNSLNS




FCSSKQPSSPSSYTSSPLVSSLSVSKTLLSTSFTPSVPTSNTYIKTKNTGYFEHTALTTSSVG




LNSFSETAVSSQGTKIDTFLVSSLIAYPSSASGSQLSGIQQNFTSTSLMISTYEGKASIFFSAE




LGSIIFLLLSYLLF





Sag1 from
SEQ ID NO:

MFTFLKIILWLFSLALASAININDITFSNLEITPLTANKQPDQGWTATFDFSIADASSIREGD




Saccharomyces

10
EFTLSMPHVYRIKLLNSSQTATISLADGTEAFKCYVSQQAAYLYENTTFTCTAQNDLSSY



cerevisiae


NTIDGSITFSLNFSDGGSSYEYELENAKFFKSGPMLVKLGNQMSDVVNFDPAAFTENVFH


(underlined is

SGRSTGYGSFESYHLGMYCPNGYFLGGTEKIDYDSSNNNVDLDCSSVQVYSSNDFNDW


signal peptide, may

WFPQSYNDTNADVTCFGSNLWITLDEKLYDGEMLWVNALQSLPANVNTIDHALEFQYT


or may not be

CLDTIANTTYATQFSTTREFIVYQGRNLGTASAKSSFISTTTTDLTSINTSAYSTGSISTVET


utilized in design)

GNRTTSEVISHVVTTSTKLSPTATTSLTIAQTSIYSTDSNITVGTDIHTTSEVISDVETISRET




ASTVVAAPTSTTGWTGAMNTYISQFTSSSFATINSTPIISSSAVFETSDASIVNVHTENITNT




AAVPSEEPTFVNATRNSLNSFCSSKQPSSPSSYTSSPLVSSLSVSKTLLSTSFTPSVPTSNTYI




KTKNTGYFEHTALTTSSVGLNSFSETAVSSQGTKIDTFLVSSLIAYPSSASGSQLSGIQQNF




TSTSLMISTYEGKASIFFSAELGSIIFLLLSYLLF





Fig2 from
SEQ ID NO:
QIVFYQNSSTSLPVPTLVSTSIADFHESSSTGEVQYSSSYSYVQPSIDSFTSSSFLTSFEAPTE



Saccharomyces

11
TSSSYAVSSSLITSDTFSSYSDIFDEETSSLISTSAASSEKASSTLSSTAQPHRTSHSSSSFELP



cerevisiae


VTAPSSSSLPSSTSLTFTSVNPSQSWTSFNSEKSSALSSTIDFTSSEISGSTSPKSLESFDTTGT




ITSSYSPSPSSKNSNQTSLLSPLEPLSSSSGDLILSSTIQATTNDQTSKTIPTLVDATSSLPPTL




RSSSMAPTSGSDSISHNFTSPPSKTSGNYDVLTSNSIDPSLFTTTSEYSSTQLSSLNRASKSE




TVNFTASIASTPFGTDSATSLIDPISSVGSTASSFVGISTANFSTQGNSNYVPESTASGSSQY




QDWSSSSLPLSQTTWVVINTTNTQGSVTSTTSPAYVSTATKTVDGVITEYVTWCPLTQTK




SQAIGVSSSISSVPQASSFSGSSILSSNSSTLAASNNVPESTASGSSQYQDWSSSSLPLSQTT




WVVINTTNTQGSVTSTTSPAYVSTATKTVDGVITEYVTWCPLTQTKSQAIGISSSTISATQ




TSKPSSILTLGISTLQLSDATFKGTETINTHLMTESTSITEPTYFSGTSDSFYLCTSEVNLASS




LSSYPNFSSSEGSTATITNSTVTFGSTSKYPSTSVSNPTEASQHVSSSVNSLTDFTSNSTETI




AVISNIHKTSSNKDYSLTTTQLKTSGMQTLVLSTVTTTVNGAATEYTTWCPASSIAYTTSI




SYKTLVLTTEVCSHSECTPTVITSVTATSSTIPLLSTSSSTVLSSTVSEGAKNPAASEVTINT




QVSATSEATSTSTQVSATSATATASESSTTSQVSTASETISTLGTQNFTTTGSLLFPALSTE




MINTTVVSRKTLIISTEVCSHSKCVPTVITEVVTSKGTPSNGHSSQTLQTEAVEVTLSSHQT




VTMSTEVCSNSICTPTVITSVQMRSTPFPYLTSSTSSSSLASTKKSSLEASSEMSTFSVSTQS




LPLAFTSSEKRSTTSVSQWSNTVLTNTIMSSSSNVISTNEKPSSTTSPYNFSSGYSLPSSSTPS




QYSLSTATTTINGIKTVYTTWCPLAEKSTVAASSQSSRSVDRFVSSSKPSSSLSQTSIQYTL




STATTTISGLKTVYTTWCPLTSKSTLGATTQTSSTAKVRITSASSATSTSISLSTSTESESSSG




YLSKGVCSGTECTQDVPTQSSSPASTLAYSPSVSTSSSSSFSTTTASTLTSTHTSVPLLPSSS




SISASSPSSTSLLSTSLPSPAFTSSTLPTATAVSSSTFIASSLPLSSKSSLSLSPVSSSILMSQFSS




SSSSSSSLASLPSLSISPTVDTVSVLQPTTSIATLTCTDSQCQQEVSTICNGSNCDDVTSTAT




TPPSTVTDTMTCTGSECQKTTSSSCDGYSCKVSETYKSSATISACSGEGCQASATSELNSQ




YVTMTSVITPSAITTTSVEVHSTESTISITTVKPVTYTSSDTNGELITITSSSQTVIPSVTTIITR




TKVAITSAPKPTTTTYVEQRLSSSGIATSFVAAASSTWITTPIVSTYAGSASKFLCSKFFMI




MVMVINFI





Fig2 from
SEQ ID NO:

MNSFASLGLIYSVVNLLTRVEAQIVFYQNSSTSLPVPTLVSTSIADFHESSSTGEVQYSSSY




Saccharomyces

12
SYVQPSIDSFTSSSFLTSFEAPTETSSSYAVSSSLITSDTFSSYSDIFDEETSSLISTSAASSEKA



cerevisiae


SSTLSSTAQPHRTSHSSSSFELPVTAPSSSSLPSSTSLTFTSVNPSQSWTSFNSEKSSALSSTI


(underlined is

DFTSSEISGSTSPKSLESFDTTGTITSSYSPSPSSKNSNQTSLLSPLEPLSSSSGDLILSSTIQAT


signal peptide, may

TNDQTSKTIPTLVDATSSLPPTLRSSSMAPTSGSDSISHNFTSPPSKTSGNYDVLTSNSIDPS


or may not be

LFTTTSEYSSTQLSSLNRASKSETVNFTASIASTPFGTDSATSLIDPISSVGSTASSFVGISTA


utilized in design)

NFSTQGNSNYVPESTASGSSQYQDWSSSSLPLSQTTWVVINTTNTQGSVTSTTSPAYVST




ATKTVDGVITEYVTWCPLTQTKSQAIGVSSSISSVPQASSFSGSSILSSNSSTLAASNNVPES




TASGSSQYQDWSSSSLPLSQTTWVVINTTNTQGSVTSTTSPAYVSTATKTVDGVITEYVT




WCPLTQTKSQAIGISSSTISATQTSKPSSILTLGISTLQLSDATFKGTETINTHLMTESTSITEP




TYFSGTSDSFYLCTSEVNLASSLSSYPNFSSSEGSTATITNSTVTFGSTSKYPSTSVSNPTEA




SQHVSSSVNSLTDFTSNSTETIAVISNIHKTSSNKDYSLTTTQLKTSGMQTLVLSTVTTTVN




GAATEYTTWCPASSIAYTTSISYKTLVLTTEVCSHSECTPTVITSVTATSSTIPLLSTSSSTV




LSSTVSEGAKNPAASEVTINTQVSATSEATSTSTQVSATSATATASESSTTSQVSTASETIS




TLGTQNFTTTGSLLFPALSTEMINTTVVSRKTLIISTEVCSHSKCVPTVITEVVTSKGTPSNG




HSSQTLQTEAVEVTLSSHQTVTMSTEVCSNSICTPTVITSVQMRSTPFPYLTSSTSSSSLAST




KKSSLEASSEMSTFSVSTQSLPLAFTSSEKRSTTSVSQWSNTVLTNTIMSSSSNVISTNEKPS




STTSPYNFSSGYSLPSSSTPSQYSLSTATTTINGIKTVYTTWCPLAEKSTVAASSQSSRSVD




RFVSSSKPSSSLSQTSIQYTLSTATTTISGLKTVYTTWCPLTSKSTLGATTQTSSTAKVRITS




ASSATSTSISLSTSTESESSSGYLSKGVCSGTECTQDVPTQSSSPASTLAYSPSVSTSSSSSFS




TTTASTLTSTHTSVPLLPSSSSISASSPSSTSLLSTSLPSPAFTSSTLPTATAVSSSTFIASSLPL




SSKSSLSLSPVSSSILMSQFSSSSSSSSSLASLPSLSISPTVDTVSVLQPTTSIATLTCTDSQCQ




QEVSTICNGSNCDDVTSTATTPPSTVTDTMTCTGSECQKTTSSSCDGYSCKVSETYKSSAT




ISACSGEGCQASATSELNSQYVTMTSVITPSAITTTSVEVHSTESTISITTVKPVTYTSSDTN




GELITITSSSQTVIPSVTTIITRTKVAITSAPKPTTTTYVEQRLSSSGIATSFVAAASSTWITTP




IVSTYAGSASKFLCSKFFMIMVMVINFI





Sed1 from
SEQ ID NO:
QFSNSTSASSTDVTSSSSISTSSGSVTITSSEAPESDNGTSTAAPTETSTEAPTTAIPTNGTST



Saccharomyces

13
EAPTTAIPTNGTSTEAPTDTTTEAPTTALPTNGTSTEAPTDTTTEAPTTGLPTNGTTSAFPPT



cerevisiae


TSLPPSNTTTTPPYNPSTDYTTDYTVVTEYTTYCPEPTTFTTNGKTYTVTEPTTLTITDCPC




TIEKPTTTSTTEYTVVTEYTTYCPEPTTFTTNGKTYTVTEPTTLTITDCPCTIEKSEAPESSV




PVTESKGTTTKETGVTTKQTTANPSLTVSTVVPVSSSASSHSVVINSNGANVVVPGALGL




AGVAMLFL





Sed1 from
SEQ ID NO:

MKLSTVLLSAGLASTTLAQFSNSTSASSTDVTSSSSISTSSGSVTITSSEAPESDNGTSTAAP




Saccharomyces

14
TETSTEAPTTAIPTNGTSTEAPTTAIPTNGTSTEAPTDTTTEAPTTALPTNGTSTEAPTDTTT



cerevisiae


EAPTTGLPTNGTTSAFPPTTSLPPSNTTTTPPYNPSTDYTTDYTVVTEYTTYCPEPTTFTTN


(underlined is

GKTYTVTEPTTLTITDCPCTIEKPTTTSTTEYTVVTEYTTYCPEPTTFTTNGKTYTVTEPTT


signal peptide, may

LTITDCPCTIEKSEAPESSVPVTESKGTTTKETGVTTKQTTANPSLTVSTVVPVSSSASSHS


or may not be

VVINSNGANVVVPGALGLAGVAMLFL


utilized in design)
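The Summary characterizes the anchoring domain by its length (at least about 200 amino acids) and/or its serine and threonine content (at least about 30% of residues). A minimal sketch of how a Table 1 sequence might be screened against those two thresholds is given below. The function names are hypothetical, and treating the full mature Sed1 sequence (SEQ ID NO: 13, copied from Table 1) as the anchoring domain is an assumption made purely for illustration.

```python
def ser_thr_fraction(seq: str) -> float:
    """Fraction of residues in `seq` that are serine (S) or threonine (T)."""
    if not seq:
        raise ValueError("empty sequence")
    return sum(seq.count(aa) for aa in "ST") / len(seq)


def meets_anchor_criteria(seq: str, min_length: int = 200, min_st_fraction: float = 0.30):
    """Return (length_ok, st_ok) for the two thresholds named in the Summary."""
    return len(seq) >= min_length, ser_thr_fraction(seq) >= min_st_fraction


# Sed1 from Saccharomyces cerevisiae without signal peptide (Table 1, SEQ ID NO: 13).
SED1 = (
    "QFSNSTSASSTDVTSSSSISTSSGSVTITSSEAPESDNGTSTAAPTETSTEAPTTAIPTNGTST"
    "EAPTTAIPTNGTSTEAPTDTTTEAPTTALPTNGTSTEAPTDTTTEAPTTGLPTNGTTSAFPPT"
    "TSLPPSNTTTTPPYNPSTDYTTDYTVVTEYTTYCPEPTTFTTNGKTYTVTEPTTLTITDCPC"
    "TIEKPTTTSTTEYTVVTEYTTYCPEPTTFTTNGKTYTVTEPTTLTITDCPCTIEKSEAPESSV"
    "PVTESKGTTTKETGVTTKQTTANPSLTVSTVVPVSSSASSHSVVINSNGANVVVPGALGL"
    "AGVAMLFL"
)

length_ok, st_ok = meets_anchor_criteria(SED1)
print(f"length={len(SED1)}, S/T fraction={ser_thr_fraction(SED1):.2f}")
print(f"length criterion met: {length_ok}; S/T criterion met: {st_ok}")
```

Because the Summary joins the two criteria with "and/or", the function reports each result separately and leaves it to the caller to decide whether one or both must hold.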
















TABLE 2

Carbon utilization proteins

Sequence Info | SEQ ID NO: | Amino acid sequences






Saccharomyces

SEQ ID NO: 15
SMTNETSDRPLVHFTPNKGWMNDPNGLWYDEKDAKWHLYFQYNPNDTVWGTPLFWGHATSD



cerevisiae


DLTNWEDQPIAIAPKRNDSGAFSGSMVVDYNNTSGFFNDTIDPRQRCVAIWTYNTPESEEQYISY


SUC2

SLDGGYTFTEYQKNPVLAANSTQFRDPKVFWYEPSQKWIMTAAKSQDYKIEIYSSDDLKSWKLE


(without

SAFANEGFLGYQYECPGLIEVPTEQDPSKSYWVMFISINPGAPAGGSFNQYFVGSFNGTHFEAFD


peptides

NQSRVVDFGKDYYALQTFFNTDPTYGSALGIAWASNWEYSAFVPTNPWRSSMSLVRKFSLNTE


that are

YQANPETELINLKAEPILNISNAGPWSRFATNTTLTKANSYNVDLSNSTGTLEFELVYAVNTTQTI


cleaved

SKSVFADLSLWFKGLEDPEEYLRMGFEVSASSFFLDRGNSKVKFVKENPYFTNRMSVNNQPFKS


off post-

ENDLSYYKVYGLLDQNILELYFNDGDVVSTNTYFMTTGNALGSVNMTTGVDNLFYIDKFQVRE


translationally)

VK






Saccharomyces

SEQ ID NO: 16
MLLQAFLFLLAGFAAKISASMTNETSDRPLVHFTPNKGWMNDPNGLWYDEKDAKWHLYFQYN



cerevisiae


PNDTVWGTPLFWGHATSDDLTNWEDQPIAIAPKRNDSGAFSGSMVVDYNNTSGFFNDTIDPRQR


SUC2

CVAIWTYNTPESEEQYISYSLDGGYTFTEYQKNPVLAANSTQFRDPKVFWYEPSQKWIMTAAKS


(including

QDYKIEIYSSDDLKSWKLESAFANEGFLGYQYECPGLIEVPTEQDPSKSYWVMFISINPGAPAGGS


peptides

FNQYFVGSFNGTHFEAFDNQSRVVDFGKDYYALQTFFNTDPTYGSALGIAWASNWEYSAFVPT


that are

NPWRSSMSLVRKFSLNTEYQANPETELINLKAEPILNISNAGPWSRFATNTTLTKANSYNVDLSN


cleaved

STGTLEFELVYAVNTTQTISKSVFADLSLWFKGLEDPEEYLRMGFEVSASSFFLDRGNSKVKFVK


off post-

ENPYFTNRMSVNNQPFKSENDLSYYKVYGLLDQNILELYFNDGDVVSTNTYFMTTGNALGSVN


translationally)

MTTGVDNLFYIDKFQVREVK


UniProtKB - P00724 (INV2_YEAST)








Pichia

SEQ ID NO: 17
MTIESQEPWWKSAVVYQVWPASFKDSNGDGIGDLNGITSELDHIKSLGTDVIWLSPHYASPLDD



angusta


MGYDISDYNAINPQFGTMEDMDRLLAEIKKRDMRLILDLVINHTSSEHAWFKESRSSRDNPKRD


MAL1

WYIWKDNANNWLSFFSGSAWSYDEKTKQYYLRLFAETQPDLNWENPKTREAIYKSALEFWYE


(including

KGVSGFRIDTAGLYSKVQTFEDAPVTFPGEKYQPAGPLINSGPRIHEFHKEMYEKVTSRYDAMTV


peptides

GEVGHCSKADALKYVSAKEKEMNMMFLFDTVDVGSDKSDRFRYKGFTLTDFKDAIINQSNFIFD


that are

DETGELNDAWSTVFIENHDQPRCVTRFGNTSNKLFWSRSAKMLALLQTTLTGTLFVYQGQEIGM


cleaved

TNVSPKWDISEYLDINTINYWNAFNETEHSDEEKAELLKIINLLARDNARTPVQWDSSENGGFGG


off post-

KPWMRINDNYKDINVASQKEDPDSVLNFYRNAIKTRKHYSETLIFGRFEVQDYDNQEIFYYTKTS


translationally)

NKGQKKMAVVLNFTDREVEYPIPQGKLLLSNIANNITGKLQPYEGRLIEVN


UniProtKB - Q9P8G8 (Q9P8G8_PICAN)








Saccharomyces

SEQ ID NO: 322
MLLQAFLFLLAGFAAKISASMTNETSDRPLVHFTPNKGWMNDPNGLWYDAKEGKWHLYFQYN



cerevisiae


PNDTVWGLPLFWGHATSDDLTHWQDEPVAIAPKRKDSGAYSGSMVIDYNNTSGFFNDTIDPRQ


SUC1

RCVAIWTYNTPESEEQYISYSLDGGYTFTEYQKNPVLAANSTQFRDPKVFWYEPSKKWIMTAAK


(invertase 1)

SQDYKIEIYSSDDLKSWKLESAFANEGFLGYQYECPGLIEVPSEQDPSKSHWVMFISINPGAPAGG


UniProt

SFNQYFVGSFNGHHFEAFDNQSRVVDFGKDYYALQTFFNTDPTYGSALGIAWASNWEYSAFVPS


Accession:

NPWRSSMSLVRPFSLNTEYQANPETELINLKAEPILNISSAGPWSRFATNTTLTKANSYNVDLSNS


P10594

TGTLEFELVYAVNTTQTISKSVFADLSLWFKGLEDPEEYLRMGFEVSASSFFLDRGNSKVKFVKE




NPYFTNRMSVNNQPFKSENDLSYYKVYGLLDQNILELYFNDGDVVSTNTYFMTTGNALGSVNM




TTGVDNLFYIDKFQVREVK






Kluyveromyces

SEQ ID NO: 323
MLKLLSLMVPLASAAVIHRRDANISAIASEWNSTSNSSSSLSLNRPAVHYSPEEGWMNDPNGLW



lactis


YDAKEEDWHIYYQYYPDAPHWGLPLTWGHAVSKDLTVWDEQGVAFGPEFETAGAFSGSMVID


INV1

YNNTSGFFNSSTDPRQRVVAIWTLDYSGSETQQLSYSHDGGYTFTEYSDNPVLDIDSDAFRDPKV


(invertase)

FWYQGEDSESEGNWVMTVAEADRFSVLIYSSPDLKNWTLESNFSREGYLGYNYECPGLVKVPY


UniProt

VKNTTYASAPGSNITSSGPLHPNSTVSFSNSSSIAWNASSVPLNITLSNSTLVDETSQLEEVGYAW


Accession:

VMIVSFNPGSILGGSGTEYFIGDFNGTHFEPLDKQTRFLDLGKDYYALQTFFNTPNEVDVLGIAW


Q9Y746

ASNWQYANQVPTDPWRSSMSLVRNFTITEYNINSNTTALVLNSQPVLDFTSLRKNGTSYTLENLT




LNSSSHEVLEFEDPTGVFEFSLEYSVNFTGIHNWVFTDLSLYFQGDKDSDEYLRLGYEANSKQFF




LDRGHSNIPFVQENPFFTQRLSVSNPPSSNSSTFDVYGIVDRNIIELYFNNGTVTSTNTFFFSTGNNI




GSIIVKSGVDDVYEIESLKVNQFYVD






Cyberlindnera

SEQ ID NO: 324
MSLTKDASEDQEDIKSLTMNTSLVDSSIYRPLVHLTPPVGWMNDPNGLFYDSSESTYHVYYQYN



jadinii


PNDTIWGLPLYWGHATSDDLLTWDHHAPAIGPENDDEGIYSGSIVIDYDNTSGFFDDSTRPEQRI


INV1

VAIYTNNLPDVETQDIAYSTDGGYTFEKYENNPVIDVNSTQFRDPKVIWYEETEQWVMTVAKSQ


(invertase)

EYKIQIYTSDNLKDWSLASNFSTKGYVGYQYECPGLFEATIENPKSGDPEKKWVMVLAINPGSPL


UniProt

GGSINEYFVGDFNGTEFIPDDDATRFMDTGKDFYAFQAFFNAPENRSIGVAWSSNWQYSNQVPD


Accession:

PDGYRSSMSSIREYTLRYVSTNPESEQLILCQKPFFVNETDLKVVEEYKVSNSSLTVDHTFGSSFA


O94224

NSNTTGLLDFNMTFTVNGTTDVTQKDSVTFELRIKSNQSDEAIALGYDYNNEQFYINRATESYFQ




RTNQFFQERWSTYVQPLTITESGDKQYQLYGLVDNNILELYFNDGAFTSTNTFFLEKGKPSNVDI




VASSSKEAYHRGPAD






Oryza

SEQ ID NO: 325
MELAVGAGGMRRSASHTSLSESDDFDLSRLLNKPRINVERQRSFDDRSLSDVSYSGGGHGGTRG



sativa


GFDGMYSPGGGLRSLVGTPASSALHSFEPHPIVGDAWEALRRSLVFFRGQPLGTIAAFDHASEEV



japonica


LNYDQVFVRDFVPSALAFLMNGEPEIVRHFLLKTLLLQGWEKKVDRFKLGEGAMPASFKVLHD


(rice)

SKKGVDTLHADFGESAIGRVAPVDSGFWWIILLRAYTKSTGDLTLAETPECQKGMRLILSLCLSE


CINV1

GFDTFPTLLCADGCCMIDRRMGVYGYPIEIQALFFMALRCALQLLKHDNEGKEFVERIATRLHAL


(invertase)

SYHMRSYYWLDFQQLNDIYRYKTEEYSHTAVNKFNVIPDSIPDWLFDFMPCQGGFFIGNVSPAR


UniProt

MDFRWFALGNMIAILSSLATPEQSTAIMDLIEERWEELIGEMPLKICYPAIENHEWRIVTGCDPKN


Accession:

TRWSYHNGGSWPVLLWLLTAACIKTGRPQIARRAIDLAERRLLKDGWPEYYDGKLGRYVGKQA


Q69T31

RKFQTWSIAGYLVAKMMLEDPSHLGMISLEEDKAMKPVLKRSASWTN






Arabidopsis

SEQ ID NO: 326
MEGVGLRAVGSHCSLSEMDDLDLTRALDKPRLKIERKRSFDERSMSELSTGYSRHDGIHDSPRG



thaliana


RSVLDTPLSSARNSFEPHPMMAEAWEALRRSMVFFRGQPVGTLAAVDNTTDEVLNYDQVFVRD


Alkaline/

FVPSALAFLMNGEPDIVKHFLLKTLQLQGWEKRVDRFKLGEGVMPASFKVLHDPIRETDNIVAD


neutral

FGESAIGRVAPVDSGFWWIILLRAYTKSTGDLTLSETPECQKGMKLILSLCLAEGFDTFPTLLCAD


invertase

GCSMIDRRMGVYGYPIEIQALFFMALRSALSMLKPDGDGREVIERIVKRLHALSFHMRNYFWLD


CINV1

HQNLNDIYRFKTEEYSHTAVNKFNVMPDSIPEWVFDFMPLRGGYFVGNVGPAHMDFRWFALGN


INVA

CVSILSSLATPDQSMAIMDLLEHRWAELVGEMPLKICYPCLEGHEWRIVTGCDPKNTRWSYHNG


UniProt

GSWPVLLWQLTAACIKTGRPQIARRAVDLIESRLHRDCWPEYYDGKLGRYVGKQARKYQTWSI


Accession

AGYLVAKMLLEDPSHIGMISLEEDKLMKPVIKRSASWPQL


No.:




Q9LQF2








Arabidopsis

SEQ ID NO: 327
MSAIYLLRKISTKTPSRFHRSLFFSTFSKDSPPDLSRTTSIRHLSSSQRFVSSSIYCFPQSKILPNRFSE



thaliana


KTTGISVRQFSTSVETNLSDKSFERIHVQSDAILERIHKNEEEVETVSIGSEKVVREESEAEKEAWR


Alkaline/

ILENAVVRYCGSPVGTVAANDPGDKMPLNYDQVFIRDFVPSALAFLLKGEGDIVRNFLLHTLQL


neutral

QSWEKTVDCYSPGQGLMPASFKVRTVALDENTTEEVLDPDFGESAIGRVAPVDSGLWWIILLRA


invertase

YGKITGDFSLQERIDVQTGIKLIMNLCLADGFDMFPTLLVTDGSCMIDRRMGIHGHPLEIQSLFYS


A,

ALRCSREMLSVNDSSKDLVRAINNRLSALSFHIREYYWVDIKKINEIYRYKTEEYSTDATNKFNIY


mitochondrial

PEQIPPWLMDWIPEQGGYLLGNLQPAHMDFRFFTLGNFWSIVSSLATPKQNEAILNLIEAKWDDII


INVE

GNMPLKICYPALEYDDWRIITGSDPKNTPWSYHNSGSWPTLLWQFTLACMKMGRPELAEKALA


UniProt

VAEKRLLADRWPEYYDTRSGKFIGKQSRLYQTWTVAGFLTSKLLLANPEMASLLFWEEDYELL


Accession

DICACGLRKSDRKKCSRVAAKTQILVR


No.:




Q9FXA8








Arabidopsis

SEQ ID NO: 328
MAASETVLRVPLGSVSQSCYLASFFVNSTPNLSFKPVSRNRKTVRCTNSHEVSSVPKHSFHSSNS



thaliana


VLKGKKFVSTICKCQKHDVEESIRSTLLPSDGLSSELKSDLDEMPLPVNGSVSSNGNAQSVGTKSI


Alkaline/

EDEAWDLLRQSVVFYCGSPIGTIAANDPNSTSVLNYDQVFIRDFIPSGIAFLLKGEYDIVRNFILYT


neutral

LQLQSWEKTMDCHSPGQGLMPCSFKVKTVPLDGDDSMTEEVLDPDFGEAAIGRVAPVDSGLW


invertase

WIILLRAYGKCTGDLSVQERVDVQTGIKMILKLCLADGFDMFPTLLVTDGSCMIDRRMGIHGHP


E,

LEIQALFYSALVCAREMLTPEDGSADLIRALNNRLVALNFHIREYYWLDLKKINEIYRYQTEEYS


chloroplastic

YDAVNKFNIYPDQIPSWLVDFMPNRGGYLIGNLQPAHMDFRFFTLGNLWSIVSSLASNDQSHAIL


INVE

DFIEAKWAELVADMPLKICYPAMEGEEWRIITGSDPKNTPWSYHNGGAWPTLLWQLTVASIKM


UniProt

GRPELAEKAVELAERRISLDKWPEYYDTKRARFIGKQARLYQTWSIAGYLVAKLLLANPAAAKF


Accession

LTSEEDSDLRNAFSCMLSANPRRTRGPKKAQQPFIV


No.:




Q9FK88








Oryza

SEQ ID NO: 329
MGVLGSRVAWAWLVQLLLLQQLAGASHVVYDDLELQAAATTADGVPPSIVDSELRTGYHFQPP



sativa


KNWINDPNAPMYYKGWYHLFYQYNPKGAVWGNIVWAHSVSRDLINWVALKPAIEPSIRADKY


japonica

GCWSGSATMMADGTPVIMYTGVNRPDVNYQVQNVALPRNGSDPLLREWVKPGHNPVIVPEGGI


(rice)

NATQFRDPTTAWRGADGHWRLLVGSLAGQSRGVAYVYRSRDFRRWTRAAQPLHSAPTGMWE


Beta-

CPDFYPVTADGRREGVDTSSAVVDAAASARVKYVLKNSLDLRRYDYYTVGTYDRKAERYVPD


fructo-

DPAGDEHHIRYDYGNFYASKTFYDPAKRRRILWGWANESDTAADDVAKGWAGIQAIPRKVWL


furanosidase,

DPSGKQLLQWPIEEVERLRGKWPVILKDRVVKPGEHVEVTGLQTAQADVEVSFEVGSLEAAERL


insoluble

DPAMAYDAQRLCSARGADARGGVGPFGLWVLASAGLEEKTAVFFRVFRPAARGGGAGKPVVL


isoenzyme 2

MCTDPTKSSRNPNMYQPTFAGFVDTDITNGKISLRSLIDRSVVESFGAGGKACILSRVYPSLAIGK


CIN2

NARLYVFNNGKAEIKVSQLTAWEMKKPVMMNGA


UniProt




Accession




No.:




Q0JDC5








Rattus

SEQ ID NO: 330
MAKKKFSALEISLIVLFIIVTAIAIALVTVLATKVPAVEEIKSPTPTSNSTPTSTPTSTSTPTSTSTPSP



norvegicus


GKCPPEQGEPINERINCIPEQHPTKAICEERGCCWRPWNNTVIPWCFFADNHGYNAESITNENAGL


(rat)

KATLNRIPSPTLFGEDIKSVILTTQTQTGNRFRFKITDPNNKRYEVPHQFVKEETGIPAADTLYDVQ


Sucrase-

VSENPFSIKVIRKSNNKVLCDTSVGPLLYSNQYLQISTRLPSEYIYGFGGHIHKRFRHDLYWKTWP


isomaltase,

IFTRDEIPGDNNHNLYGHQTFFMGIGDTSGKSYGVFLMNSNAMEVFIQPTPIITYRVTGGILDFYIF


intestinal

LGDTPEQVVQQYQEVHWRPAMPAYWNLGFQLSRWNYGSLDTVSEVVRRNREAGIPYDAQVTD


Si Gene

IDYMEDHKEFTYDRVKFNGLPEFAQDLHNHGKYIIILDPAISINKRANGAEYQTYVRGNEKNVW


UniProt

VNESDGTTPLIGEVWPGLTVYPDFTNPQTIEWWANECNLFHQQVEYDGLWIDMNEVSSFIQGSL


Accession

NLKGVLLIVLNYPPFTPGILDKVMYSKTLCMDAVQHWGKQYDVHSLYGYSMAIATEQAVERVF


No.:

PNKRSFILTRSTFGGSGRHANHWLGDNTASWEQMEWSITGMLEFGIFGMPLVGATSCGFLADTT


P23739

EELCRRWMQLGAFYPFSRNHNAEGYMEQDPAYFGQDSSRHYLTIRYTLLPFLYTLFYRAHMFGE




TVARPFLYEFYDDTNSWIEDTQFLWGPALLITPVLRPGVENVSAYIPNATWYDYETGIKRPWRKE




RINMYLPGDKIGLHLRGGYIIPTQEPDVTTTASRKNPLGLIVALDDNQAAKGELFWDDGESKDSI




EKKMYILYTFSVSNNELVLNCTHSSYAEGTSLAFKTIKVLGLREDVRSITVGENDQQMATHTNFT




FDSANKILSITALNFNLAGSFIVRWCRTFSDNEKFTCYPDVGTATEGTCTQRGCLWQPVSGLSNV




PPYYFPPENNPYTLTSIQPLPTGITAELQLNPPNARIKLPSNPISTLRVGVKYHPNDMLQFKIYDAQ




HKRYEVPVPLNIPDTPTSSNERLYDVEIKENPFGIQVRRRSSGKLIWDSRLPGFGFNDQFIQISTRLP




SNYLYGFGEVEHTAFKRDLNWHTWGMFTRDQPPGYKLNSYGFHPYYMALENEGNAHGVLLLN




SNGMDVTFQPTPALTYRTIGGILDFYMFLGPTPEIATRQYHEVIGFPVMPPYWALGFQLCRYGYR




NTSEIEQLYNDMVAANIPYDVQYTDINYMERQLDFTIGERFKTLPEFVDRIRKDGMKYIVILAPAI




SGNETQPYPAFERGIQKDVFVKWPNTNDICWPKVWPDLPNVTIDETITEDEAVNASRAHVAFPDF




FRNSTLEWWAREIYDFYNEKMKFDGLWIDMNEPSSFGIQMGGKVLNECRRMMTLNYPPVFSPE




LRVKEGEGASISEAMCMETEHILIDGSSVLQYDVHNLYGWSQVKPTLDALQNTTGLRGIVISRST




YPTTGRWGGHWLGDNYTTWDNLEKSLIGMLELNLFGIPYIGADICGVFHDSGYPSLYFVGIQVG




AFYPYPRESPTINFTRSQDPVSWMKLLLQMSKKVLEIRYTLLPYFYTQMHEAHAHGGTVIRPLM




HEFFDDKETWEIYKQFLWGPAFMVTPVVEPFRTSVTGYVPKARWFDYHTGADIKLKGILHTFSA




PFDTINLHVRGGYILPCQEPARNTHLSRQNYMKLIVAADDNQMAQGTLFGDDGESIDTYERGQY




TSIQFNLNQTTLTSTVLANGYKNKQEMRLGSIHIWGKGTLRISNANLVYGGRKHQPPFTQEEAKE




TLIFDLKNMNVTLDEPIQITWS






Oryctolagus

SEQ ID NO: 331
MAKRKFSGLEITLIVLFVIVFILAIALIAVLATKTPAVEEVNPSSSTPTTTSTTTSTSGSVSCPSELNE



cuniculus


VVNERINCIPEQSPTQAICAQRNCCWRPWNNSDIPWCFFVDNHGYNVEGMTTTSTGLEARLNRK


(Rabbit)

STPTLFGNDINNVLLTTESQTANRLRFKLTDPNNKRYEVPHQFVTEFAGPAATETLYDVQVTENP


Sucrase-

FSIKVIRKSNNRILFDSSIGPLVYSDQYLQISTRLPSEYMYGFGEHVHKRFRHDLYWKTWPIFTRD


isomaltase,

QHTDDNNNNLYGHQTFFMCIEDTTGKSFGVFLMNSNAMEIFIQPTPIVTYRVIGGILDFYIFLGDT


intestinal

PEQVVQQYQELIGRPAMPAYWSLGFQLSRWNYNSLDVVKEVVRRNREALIPFDTQVSDIDYME


Si Gene

DKKDFTYDRVAYNGLPDFVQDLHDHGQKYVIILDPAISINRRASGEAYESYDRGNAQNVWVNE


UniProt

SDGTTPIVGEVWPGDTVYPDFTSPNCIEWWANECNIFHQEVNYDGLWIDMNEVSSFVQGSNKGC


Accession

NDNTLNYPPYIPDIVDKLMYSKTLCMDSVQYWGKQYDVHSLYGYSMAIATERAVERVFPNKRS


No.:

FILTRSTFAGSGRHAAHWLGDNTATWEQMEWSITGMLEFGLFGMPLVGADICGFLAETTEELCR


P07768

RWMQLGAFYPFSRNHNADGFEHQDPAFFGQDSLLVKSSRHYLNIRYTLLPFLYTLFYKAHAFGE




TVARPVLHEFYEDTNSWVEDREFLWGPALLITPVLTQGAETVSAYIPDAVWYDYETGAKRPWR




KQRVEMSLPADKIGLHLRGGYIIPIQQPAVTTTASRMNPLGLIIALNDDNTAVGDFFWDDGETKD




TVQNDNYILYTFAVSNNNLNITCTHELYSEGTTLAFQTIKILGVTETVTQVTVAENNQSMSTHSN




FTYDPSNQVLLIENLNFNLGRNFRVQWDQTFLESEKITCYPDADIATQEKCTQRGCIWDTNTVNP




RAPECYFPKTDNPYSVSSTQYSPTGITADLQLNPTRTRITLPSEPITNLRVEVKYHKNDMVQFKIF




DPQNKRYEVPVPLDIPATPTSTQENRLYDVEIKENPFGIQIRRRSTGKVIWDSCLPGFAFNDQFIQI




STRLPSEYIYGFGEAEHTAFKRDLNWHTWGMFTRDQPPGYKLNSYGFHPYYMALEDEGNAHGV




LLLNSNAMDVTFMPTPALTYRVIGGILDFYMFLGPTPEVATQQYHEVIGHPVMPPYWSLGFQLC




RYGYRNTSEIIELYEGMVAADIPYDVQYTDIDYMERQLDFTIDENFRELPQFVDRIRGEGMRYIIIL




DPAISGNETRPYPAFDRGEAKDVFVKWPNTSDICWAKVWPDLPNITIDESLTEDEAVNASRAHA




AFPDFFRNSTAEWWTREILDFYNNYMKFDGLWIDMNEPSSFVNGTTTNVCRNTELNYPPYFPEL




TKRTDGLHFRTMCMETEHILSDGSSVLHYDVHNLYGWSQAKPTYDALQKTTGKRGIVISRSTYP




TAGRWAGHWLGDNYARWDNMDKSIIGMMEFSLFGISYTGADICGFFNDSEYHLCTRWTQLGAF




YPFARNHNIQFTRRQDPVSWNQTFVEMTRNVLNIRYTLLPYFYTQLHEIHAHGGTVIRPLMHEFF




DDRTTWDIFLQFLWGPAFMVTPVLEPYTTVVRGYVPNARWFDYHTGEDIGIRGQVQDLTLLMN




AINLHVRGGHILPCQEPARTTFLSRQKYMKLIVAADDNHMAQGSLFWDDGDTIDTYERDLYLSV




QFNLNKTTLTSTLLKTGYINKTEIRLGYVHVWGIGNTLINEVNLMYNEINYPLIFNQTQAQEILNI




DLTAHEVTLDDPIEISWS






Homo

SEQ ID NO: 343
MARKKFSGLEISLIVLFVIVTIIALALIVVLATKTPAVDEISDSTSTPATTRVTTNPSDSGKCPNVLN



sapiens


DPVNVRINCIPEQFPTEGICAQRGCCWRPWNDSLIPWCFFVDNHGYNVQDMTTTSIGVEAKLNRI


Sucrase-

PSPTLFGNDINSVLFTTQNQTPNRFRFKITDPNNRRYEVPHQYVKEFTGPTVSDTLYDVKVAQNP


isomaltase,

FSIQVIRKSNGKTLFDTSIGPLVYSDQYLQISTRLPSDYIYGIGEQVHKRFRHDLSWKTWPIFTRDQ


intestinal

LPGDNNNNLYGHQTFFMCIEDTSGKSFGVFLMNSNAMEIFIQPTPIVTYRVTGGILDFYILLGDTP


Si Gene

EQVVQQYQQLVGLPAMPAYWNLGFQLSRWNYKSLDVVKEVVRRNREAGIPFDTQVTDIDYME


UniProt

DKKDFTYDQVAFNGLPQFVQDLHDHGQKYVIILDPAISIGRRANGTTYATYERGNTQHVWINES


Accession

DGSTPIIGEVWPGLTVYPDFTNPNCIDWWANECSIFHQEVQYDGLWIDMNEVSSFIQGSTKGCNV


No.:

NKLNYPPFTPDILDKLMYSKTICMDAVQNWGKQYDVHSLYGYSMAIATEQAVQKVFPNKRSFIL


P14410

TRSTFAGSGRHAAHWLGDNTASWEQMEWSITGMLEFSLFGIPLVGADICGFVAETTEELCRRW




MQLGAFYPFSRNHNSDGYEHQDPAFFGQNSLLVKSSRQYLTIRYTLLPFLYTLFYKAHVFGETVA




RPVLHEFYEDTNSWIEDTEFLWGPALLITPVLKQGADTVSAYIPDAIWYDYESGAKRPWRKQRV




DMYLPADKIGLHLRGGYIIPIQEPDVTTTASRKNPLGLIVALGENNTAKGDFFWDDGETKDTIQN




GNYILYTFSVSNNTLDIVCTHSSYQEGTTLAFQTVKILGLTDSVTEVRVAENNQPMNAHSNFTYD




ASNQVLLIADLKLNLGRNFSVQWNQIFSENERFNCYPDADLATEQKCTQRGCVWRTGSSLSKAP




ECYFPRQDNSYSVNSARYSSMGITADLQLNTANARIKLPSDPISTLRVEVKYHKNDMLQFKIYDP




QKKRYEVPVPLNIPTTPISTYEDRLYDVEIKENPFGIQIRRRSSGRVIWDSWLPGFAFNDQFIQISTR




LPSEYIYGFGEVEHTAFKRDLNWNTWGMFTRDQPPGYKLNSYGFHPYYMALEEEGNAHGVFLL




NSNAMDVTFQPTPALTYRTVGGILDFYMFLGPTPEVATKQYHEVIGHPVMPAYWALGFQLCRY




GYANTSEVRELYDAMVAANIPYDVQYTDIDYMERQLDFTIGEAFQDLPQFVDKIRGEGMRYIIIL




DPAISGNETKTYPAFERGQQNDVFVKWPNTNDICWAKVWPDLPNITIDKTLTEDEAVNASRAHV




AFPDFFRTSTAEWWAREIVDFYNEKMKFDGLWIDMNEPSSFVNGTTTNQCRNDELNYPPYFPEL




TKRTDGLHFRTICMEAEQILSDGTSVLHYDVHNLYGWSQMKPTHDALQKTTGKRGIVISRSTYP




TSGRWGGHWLGDNYARWDNMDKSIIGMMEFSLFGMSYTGADICGFFNNSEYHLCTRWMQLG




AFYPYSRNHNIANTRRQDPASWNETFAEMSRNILNIRYTLLPYFYTQMHEIHANGGTVIRPLLHE




FFDEKPTWDIFKQFLWGPAFMVTPVLEPYVQTVNAYVPNARWFDYHTGKDIGVRGQFQTFNAS




YDTINLHVRGGHILPCQEPAQNTFYSRQKHMKLIVAADDNQMAQGSLFWDDGESIDTYERDLYL




SVQFNLNQTTLTSTILKRGYINKSETRLGSLHVWGKGTTPVNAVTLTYNGNKNSLPFNEDTTNMI




LRIDLTTHNVTLEEPIEINWS
















TABLE 3

Linkers

Sequence Info | SEQ ID NO: | Amino acid sequence
N-terminal addition EAEA | SEQ ID NO: 19 | EAEA
GGGS linker | SEQ ID NO: 20 | GGGGS
GSS linker | | GSS
A rigid linker that forms 4 turns of an alpha helix | SEQ ID NO: 22 | EAAAREAAAREAAAREAAAR
Full linker | SEQ ID NO: 23 | GSSGSSGSSGSSGSSGSSGSSGSSEAAAREAAAREAAAREAAARGGGGSGGGGSGGGGS
A flexible GS linker with higher S content | SEQ ID NO: 24 | GSSGSSGSSGSSGSSGSSGSSGSS
A flexible GS linker with much higher G content | SEQ ID NO: 25 | GGGGSGGGGSGGGGS
(flex linkers) | SEQ ID NO: 339 | Nucleotide sequence: GGTTCATCAGGGTCCTCAGGATCATCCGGTAGTAGTGGTTCATCCGGTTCATCCGGATCAAGTGGCTCCTCTGAAGCTGCAGCAAGGGAGGCTGCAGCCCGTGAGGCAGCCGCTAGAGAAGCCGCCGCTAGGGGTGGTGGCGGCTCTGGCGGAGGCGGTTCCGGTGGCGGAGGCTCT
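The relationship between the linker entries of Table 3 can be checked with a short script. The sketch below assumes the standard genetic code and tests two things: whether the full linker (SEQ ID NO: 23) is the concatenation of the higher-S flexible linker (SEQ ID NO: 24), the rigid linker (SEQ ID NO: 22), and the higher-G flexible linker (SEQ ID NO: 25), and whether the nucleotide sequence of SEQ ID NO: 339 translates to that full linker. The table itself labels SEQ ID NO: 339 only as "(flex linkers)", so the encoding relationship is an inference to be verified, and the helper names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA coding sequence into one-letter amino acids."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Amino acid sequences as listed in Table 3.
FLEX_S = "GSSGSSGSSGSSGSSGSSGSSGSS"   # SEQ ID NO: 24
RIGID = "EAAAREAAAREAAAREAAAR"         # SEQ ID NO: 22
FLEX_G = "GGGGSGGGGSGGGGS"             # SEQ ID NO: 25
FULL_LINKER = (                        # SEQ ID NO: 23
    "GSSGSSGSSGSSGSSGSSGSSGSSEAAAREA"
    "AAREAAAREAAARGGGGSGGGGSGGGGS"
)

# Nucleotide sequence as listed in Table 3 (SEQ ID NO: 339).
SEQ_339 = (
    "GGTTCATCAGGGTCCTCAGGATCATCCGGTA"
    "GTAGTGGTTCATCCGGTTCATCCGGATCAAG"
    "TGGCTCCTCTGAAGCTGCAGCAAGGGAGGCT"
    "GCAGCCCGTGAGGCAGCCGCTAGAGAAGCCG"
    "CCGCTAGGGGTGGTGGCGGCTCTGGCGGAGG"
    "CGGTTCCGGTGGCGGAGGCTCT"
)

print("full linker is SEQ 24 + SEQ 22 + SEQ 25:", FLEX_S + RIGID + FLEX_G == FULL_LINKER)
print("SEQ 339 translates to the full linker:", translate(SEQ_339) == FULL_LINKER)
```

The same `translate` helper could be applied to other coding sequences in the Sequence Listing, although sequences that use a non-standard code or contain introns would need additional handling.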
















TABLE 4

Promoters

Sequence Info | SEQ ID NO: | Nucleotide sequence





AOX1
SEQ ID NO: 26
GATCTAACATCCAAAGACGAAAGGTTGAATGAAACCTTTTTGCCATCCGACATCCACA


promoter

GGTCCATTCTCACACATAAGTGCCAAACGCAACAGGAGGGGATACACTAGCAGCAGA




CCGTTGCAAACGCAGGACCTCCACTCCTCTTCTCCTCAACACCCACTTTTGCCATCGAA




AAACCAGCCCAGTTATTGGGCTTGATTGGAGCTCGCTCATTCCAATTCCTTCTATTAGG




CTACTAACACCATGACTTTATTAGCCTGTCTATCCTGGCCCCCCTGGCGAGGTTCATGT




TTGTTTATTTCCGAATGCAACAAGCTCCGCATTACACCCGAACATCACTCCAGATGAG




GGCTTTCTGAGTGTGGGGTCAAATAGTTTCATGTTCCCCAAATGGCCCAAAACTGACA




GTTTAAACGCTGTCTTGGAACCTAATATGACAAAAGCGTGATCTCATCCAAGATGAAC




TAAGTTTGGTTCGTTGAAATGCTAACGGCCAGTTGGTCAAAAAGAAACTTCCAAAAGT




CGGCATACCGTTTGTCTTGTTTGGTATTGATTGACGAATGCTCAAAAATAATCTCATTA




ATGCTTAGCGCAGTCTCTCTATCGCTTCTGAACCCCGGTGCACCTGTGCCGAAACGCA




AATGGGGAAACACCCGCTTTTTGGATGATTATGCATTGTCTCCACATTGTATGCTTCCA




AGATTCTGGTGGGAATACTGCTGATAGCCTAACGTTCATGATCAAAATTTAACTGTTC




TAACCCCTACTTGACAGCAATATATAAACAGAAGGAAGCTGCCCTGTCTTAAACCTTT




TTTTTTATCATCATTATTAGCTTACTTTCATAATTGCGACTGGTTCCAATTGACAAGCT




TTTGATTTTAACGACTTTTAACGACAACTTGAGAAGATCAAAAAACAACTAATTATTG




GATCCCGA





DAK2
SEQ ID NO: 27
AAATAAGCATGTTTGTTTCAGATCAAAGATTAGCGTTTCAAAGTTGTGGAAAAGTGAC


promoter

CATGCAACAATATGCAACACATTCGGATTATCTGATAAGTTTCAAAGCTACTAAGTAA




GCCCGTTTCAAGTCTCCAGACCGACATCTGCCATCCAGTGATTTTCTTAGTCCTGAAA




AATACGATGTGTAAACATAAACCACAAAGATCGGCCTCCGAGGTTGAACCCTTACGA




AAGAGACATCTGGTAGCGCCAATGCCAAAAAAAAATCACACCAGAAGGACAATTCCC




TTCCCCCCCAGCCCATTAAAGCTTACCATTTCCTATTCCAATACGTTCCATAGAGGGCA




TCGCTCGGCTCATTTTCGCGTGGGTCATACTAGAGCGGCTAGCTAGTCGGCTGTTTGA




GCTCTCTAATCGAGGGGTAAGGATGTCTAATATGTCATAATGGCTCACTATATAAAGA




ACCCGCTTGCTCAACCTTCGACTCCTTTCCCGATCCTTTGCTTGTTGCTTCTTCTTTTAT




AACAGGAAACAAAGGAATTTATACACTTTAAGAATT





PEX11
SEQ ID NO: 28
CTTCCCCATTTCACTGACAGTTTGTAGAAATAGGGCAACAATTGATGCAAATCGATTT


promoter

TCAACGCATTGGTTTTGATAGCATTGATGATCTTGGAGCTGTAAAAGTCCGGCTGGAT




AAGCTCAATGAAATAGGTTGGTTGATCTGGATCTTCTTTTGGGTCATTTTGTTCGCTCT




GTATTTCACAAATTGCCAGAATCTCTGCCAACCACAGTGGTAGGTCCAACTTGGTGTT




CTGAATCACAGGCTTCCCCGGGTTGTTCTCTAAATAACCGAGGCCCGGCACAGAAATC




GTAAACCGACACGGTATCTTTTGTCCGTCCGCCAGTATCTCATCAAGGTCGTAGTAGC




CCATGATGAGTATCAAAGGGGATTTGGTTATGCGATGCAACGAGAGATTGTTTATCCC




AGATGCTGATGTAAAAACCTTAACCAGCGTGACAGTAGAAATAAGACACGTTAAAAT




TACCCGCGCTTCCCTAACAATTGGCTCTGCCTTTCGGCAAGTTTCTAACTGCCCTCCCC




TCTCACATGCACCACGAACTTACCGTTCGCTCCTAGCAGAACCACCCCAAAGTTTAAT




CAGGACCGCATTTTAGCCTATTGCTGTAGAACCCCACAACATAACCTGGTCCAGAGCC




AGCCCTTTATATATGGTAAATCCCGTTTGAACTTCGAAGTGGAATCGGAATTTTTACA




TCAAAGAAACTGATACTGAAACTTTTGGCTTCGACTTGGACTTTCTCTTAATC





FLD1
SEQ ID NO: 29
AAATCAGCCATTAATCTCACCTCAGTTTTTGAATCAGTAGAATTTTCAATGAAACAAA


promoter

CGGTTGGTATATTATTTGATAGGGTAGCCAAATTTCCAAAAATGAACTTTTCATCAGG




TAATATCTTGAATACCGTAATGTAGTGACTATTGGAAGAAACTGCTATCAAATTATAT




TTCGGATAGAAATCCAAACCCCAGACTGATCTCTTGAGTCTCAACTCTAAGTCAGCCG




CGACTCTAATTATCTGTGGATTAGGAGTTAGTGTGGACAAAGCATCAGTATAGTATAA




CTTTACGGTTCCATTATCAGACGCTATTGCAAGAACTTCCTTTCCATTGATCTCTCCAA




TTCGACAGTAATTGATATCATAAGGTAGGTCTGGAAACACACTGGCGCTTGTATCCCA




TTCTGCAGGAATTTCTGGAACGGTGGTAATGGTAGTTATCCAACGGAGTTGGGGTAGT




TGGTATATCTGGATATGCCGCCTATAGGATAAAAACAGGAGAGAGTGAACCTTGCTT




ACGGCTACTAGATTGTTCTTGTACTCGGAATTGTCGTTATCGGAAACTAGACTAATCT




CATCTGTGTGTTGCAGTACTATTGAGTCGTTGTAGTATCTACCAGGAGGGCATTCCAT




GAACTAGTGAGACAAATGAGTTGGATTTTCTCAATAGACATATGCAAGAATGCTACA




CAACGGATGTCGCACTCTTTTTCTTAGTTGATAATATCATCCAATCAGAAGACACGGG




CTAGAAGGACTTGCTCCCGAAGGATAATCCACTGCTACTATCTCCCTTCCTCACATAT




AGTCTTGCAGGGCTCATGCCCCTTTCTCCTTCGAACTGCCCGATGAGGAAGTCTTTAG




CCTATCAAGGAATTCGGGACCATCATCAATTTTTAGAGCCTTACCTGATCGCAATCAG




GATTTCACTACTCATATAAATACATCACTCAAACTCCAACTTTGCTTGTTCATACAATT




CTTGATATTCACAGGATC





FGH1
SEQ ID NO: 30
GTGAATTTGTCACGGAATTGACCAAGAGGTCAGACGATCCTGTATCCCATTGAGCCGT


promoter

TATGCTTTGTGGGGGAAACCCTATTTCTATCGTACTAAGAAAACCAATGGTGAACTCA




TATTCGGTATCAATGGCGACGATTCCAGCATAGCCTGTAGACAGTAACAACACTAGG




GCAACAGCAACTAACATATCTTCATTGATGAAACGTTGTGATCGGTGTGACTTTTATA




GTAAAAGCTACAACTGTTTGAAATACCAAGATATCATTGTGAATGGCTCAAAAGGGT




AATACATCTGAAAAACCTGAAGTGTGGAAAATTCCGATGGAGCCAACTCATGATAAC




GCAGAAGTCCCATTTTGCCATCTTCTCTTGGTATGAAACGGTAGAAAATGATCCGAGT




ATGCCAATTGATACTCTTGATTCATGCCCTATAGTTTGCGTAGGGTTTAATTGATCTCC




TGGTCTATCGATCTGGGACGCAATGTAGACCCCATTAGTGGAAACACTGAAAGGGAT




CCAACACTCTAGGCGGACCCGCTCACAGTCATTTCAGGACAATCACCACAGGAATCA




ACTACTTCTCCCAGTCTTCCTTGCGTGAAGCTTCAAGCCTACAACATAACACTTCTTAC




TTAATCTTTGATTCTCGAATTGTTTACCCAATCTTGACAACTTAGCCTAAGCAATACTC




TGGGGTTATATATAGCAATTGCTCTTCCTCGCTGTAGCGTTCATTCCATCTTTCTAGAA




TTCGT





DAS2
SEQ ID NO: 31
CCTGTTGATAAGACGCATTCTAGAGTTGTTTCATGAAAGGGTTACGGGTGTTGATTGG


promoter

TTTGAGATATGCCAGAGGACAGATCAATCTGTGGTTTGCTAAACTGGAAGTCTGGTAA




GGACTCTAGCAAGTCCGTTACTCAAAAAGTCATACCAAGTAAGATTACGTAACACCTG




GGCATGACTTTCTAAGTTAGCAAGTCACCAAGAGGGTCCTATTTAACGTTTGGCGGTA




TCTGAAACACAAGACTTGCCTATCCCATAGTACATCATATTACCTGTCAAGCTATGCT




ACCCCACAGAAATACCCCAAAAGTTGAAGTGAAAAAATGAAAATTACTGGTAACTTC




ACCCCATAACAAACTTAATAATTTCTGTAGCCAATGAAAGTAAACCCCATTCAATGTT




CCGAGATTTAGTATACTTGCCCCTATAAGAAACGAAGGATTTCAGCTTCCTTACCCCA




TGAACAGAAATCTTCCATTTACCCCCCACTGGAGAGATCCGCCCAAACGAACAGATA




ATAGAAAAAAGAAATTCGGACAAATAGAACACTTTCTCAGCCAATTAAAGTCATTCC




ATGCACTCCCTTTAGCTGCCGTTCCATCCCTTTGTTGAGCAACACCATCGTTAGCCAGT




ACGAAAGAGGAAACTTAACCGATACCTTGGAGAAATCTAAGGCGCGAATGAGTTTAG




CCTAGATATCCTTAGTGAAGGGTTGTTCCGATACTTCTCCACATTCAGTCATAGATGG




GCAGCTTTGTTATCATGAAGAGACGGAAACGGGCATTAAGGGTTAACCGCCAAATTA




TATAAAGACAACATGTCCCCAGTTTAAAGTTTTTCTTTCCTATTCTTGTATCCTGAGTG




ACCGTTGTGTTTAATATAACAAGTTCGTTTTAACTTAAGACCAAAACCAGTTACAACA




AATTATAACCCCTCTAAACACTAAAGTTCACTCTTATCAAACTATCAAACATCAAAAG




AATTCGCG





CAT1
SEQ ID NO: 32
TAATCGAACTCCGAATGCGGTTCTCCTGTAACCTTAATTGTAGCATAGATCACTTAAA


promoter

TAAACTCATGGCCTGACATCTGTACACGTTCTTATTGGTCTTTTAGCAATCTTGAAGTC




TTTCTATTGTTCCGGTCGGCATTACCTAATAAATTCGAATCGAGATTGCTAGTACCTGA




TATCATATGAAGTAATCATCACATGCAAGTTCCATGATACCCTCTACTAATGGAATTG




AACAAAGTTTAAGCTTCTCGCACGAGACCGAATCCATACTATGCACCCCTCAAAGTTG




GGATTAGTCAGGAAAGCTGAGCAATTAACTTCCCTCGATTGGCCTGGACTTTTCGCTT




AGCCTGCCGCAATCGGTAAGTTTCATTATCCCAGCGGGGTGATAGCCTCTGTTGCTCA




TCAGGCCAAAATCATATATAAGCTGTAGACCCAGCACTTCAATTACTTGAAATTCACC




ATAACACTTGCTCTAGTCAAGACTTACAATTAAA





MDH3
SEQ ID NO: 33
TAGCTTGGGTAGGACTTGACAAGTACGGCTTCCGTGGTCATACCAAACGCCTTTGTTA


promoter

CCGTTGGCTATACCTAATGACCAAGGCATTTGTGGATTATAACGGTATCGTAGTTGAA




AAATATGACGTAACCACTGGTACTAGCCCCCACAAGGTTGATGCTGAATACGGGAAT




CAAGGTGCCGATTTTAAAGGAGTAGCCACTGAAGGGTTTGGCTGGGTCAATGCCTCTT




TTATTTTGGGATTAACCTACTTAGATGTCCAAGGCATCCGTGCGATAGGCGCCGTTAC




GTCCCCTGATGTATTTTTCAGGAAGCTCAAACCTTGGGAACGCGCAAGTTATGGCCTA




AGGCCATGTAACGAGATAGTCAAGTCAAACTAGAAGTATACGGTTTCCCCGCAGAAA




TAGCAGAAATAGGCGACAAATACATACAACATTTTCATTGTGATAGGGGGCGGCGGT




TCCTAGGAGGGACAACCCCCAGAAACCTTGTAGACTACGTTTTCACGACGATGGGTTA




TTACTGTAAAGGAAGAATATACTACCCACCAGTTGAATGTTTGAACGGATCAAAGGTC




GAAGGGAGTACACGGCCCAACCAACGTAGCTACCGGAGAAAGCAAGACTTTCCCAAA




CCAAATAGCTCCGGGTTTCTTCTCCGGCAACCCGTCAGTTTTTGTGTGGCCGGACAAA




AATTCGCACCCTCAGTCTAATTGAAAGGTCGGGCTCCGAGCTCTAGGCGTTTGCGCAT




GTAATATTGCATCCCCTCCCATAGATAATACTGCGCGAACACAGGGTGCAAATTATGA




TGACCACACATGCCAGTGACCAAAACAGTTTTTTAGTCTTTAAAAACCCTCGGAACTT




CTGAGTATATAAAGGCTTCTCATTTCCTACAAGCAAACAAAGAAGAAACTTCCACTTT




CTAACTTTTTATCTATAGACTTTAGAGTTACAACCAACGAACAATAACAAA





HAC1
SEQ ID NO: 34
TGAAGCTTATCTGCTGAGCAAGTTGTTTGACCAAACTTGAGTCAACAGTGGTTAACTA


promoter

TATCCTCTATTATTTTAGATGGGAGCACATCAAGTGTACGGGAACAATGCAATCGACA




ACCTGTAGCCTGACATACATAGCCATCTTGAATTGACAAAACTTAGAATGTCTTGAAT




GTGATAGATATGAGTTCCCAAAAATCTCTTTTACGATTTCCCAGTTGCGGTGTACTATT




ACACAGAGGATATCATAGCAGACTTACAATCCTCAGGCATAAAACGAGCTTTCTTATC




AAAGTGTATTCAAATGGACCATTTGATTGCACCAAGGCATTAGCCCCAAACCATACCA




CACAGTAACTTGATATTCTCAGCATGCATGGAAATTCCACTCATAACGCGCTATTCAC




CGCGAATACTTATCTATGAAACTGGGTTCTTTAGTATTCTTTGCCAAATTTCACCGATT




AGAAATTATTAGGTAATATAATTTCTTTGGGGAACCCCTTCCCGTTACGCCCGCTGCG




GCTTTGTGGTTCTTTTCCAGTCTTGAGCAAATTACATCTGGTCTAGACAGTTCTTCCGT




GCCCCAGTATGCGAGCGCAAACTTTCAATCAAACCTCGTAGCAAATTGGTACTTGAAC




TTCGTATTTAACCGCTATTAAATGTACTGACTCTTACATTATGAAAAATTTTGATAAAG




ATTTTATATTTCATCTCAGTTAATCTCCTAATAATAATAGTCTGCATAACTCAAACGGT




ACTTCCTTTTCGGAACGCGAAGAGTAGTCTCTATGTCATTCTCACACTATCCGCAGCG




CAATAGAGAACGAGCATGTTACCCGACTCATCCCTTGTCGATTCGGAAACGATTTATA




AATACAATTAGATCGCCACCGATCTTCTTTTGTCAATATTATAAAAATAGTACAGATT




TTCCTTAGTCGAATCAGATCGCAGAAA





BiP
SEQ ID NO: 35
AGATCTGAGGGTGTATACGATGTATCGTGCCGAACACATGCACTTGACGGCACAGCA


promoter

AATGGTATTCAAGAAGACCACTTTAGAATGGGAGTTAATAGGGATGGTTTCATGGAG




GTTAAAACACTTCAAGGAGGCATCTGAAGCATTCAAGTATGCACTAGGTCTGAGGTTT




TCGGTCAAGGCATGCAAGAAATTAATTGTATTCTATCTGAACGAACGCTCCAGAATGA




ACCAGCCAGAAACCTCAATTGCCCTCAACAACTTAAATCAATCCACATTATCCATCCA




AGAGATTCTCAAGTATCGTTCGTTCCTCGATATCAACCTAATTTCAAACTTGGTCAAA




CTAGGAGTTTGGAATCACCGCTGGTATGCTGAGTTTTCTCCAAAACTCATAGAAAGCC




TTGCGGTTGTTGTGGAGAACGGAGGGCTTATCAAGGTAGAAAACGAGGTTAAGGCTA




CCTATTTCGATTCACAAGATGGAGTTTACGACTTGATGAACGAGGTATTCAAGTTCAT




GAAGCATTACGATTATCCTGGGACTGACAACTAAGAGCTCCTAGTGAAGACTTGAGA




TGGACATGATAAACAATTATAGTGAAAATAGAAACCATAATACAATATTCTAATAGA




GGAACCGTTTACCTGTGGTTCCTATTGTGGCCTACTGTTACTAGCTAGTGTAATACACC




CTTGCCTCAGCTTTGCAAGTTGACAACTCAGCCAAATGATCTTTGAATGCGCGAAACC




TCAAGGTCCATCGAATTTTCTCGAATTTTCAGTGTTTTCATACAGCGTGTCATCTTCTT




TCGCGTACTTATTAAAATCGTACCCAGATCCCTTCTTCTTCCTTAATTTCAATTCCAAC




ACTCAAGA





RAD30
SEQ ID NO: 36
AGATCTTGCAAAATACCTTTCCAGCTTTCCAGCTTCCTAGCACTCATCTTGAAGATATC


promoter

AAATATTCTCCATTCAAACCAACATCAAAAAATAGAATAATTATAATCAGTTTGAAGA




GCAAGAGTAATTTTAAAGGAAACACATTCATGGTCAGCTAGAAGGTTGACTGAAGAG




TCGCAAGATATCTGAGAATAAAAAAGAGCATAGCTAACAAGATGAGTAAACACGGCA




AACAGATTTAGGAACAGGTGAAGGGTTTCTGGCTCTTCAATGTATATCCTGCTAGCCA




CCCATTCAGAAATAACACAAAGTAGGACCCTACTGAAAAATAAATTTAATACATCTTC




ATCCTCTCATTAAACCACCGACCACTCAAACCATACCAGCCTTGTCCAATTCCATGCA




TCGTGCTATCCGTCAGAATTTTCAGTGTTAATCGAATCGGTCATTATAGCTCCGTCTGG




GGCGACAACTTGTCATCACAGAATAGCACAATTATGCGTTGGAATCGTCAAAAAATC




ACCTCCAGGTCTGTATACATACAGAACTGGTTGTAACGACAACCTTGTTTGATTGAGG




TGACTGGAAGGTGGAAAGAAAGGGAGGAAATAAATATTGCAAGGAAAGAAAAAAAA




ATTGTTCACAGTCACCTCTTCACCTTCGCGATTTCATGTTTCTTTCATGTGCTAACTGAT




CCCAGGGCTTCTCCAGCGCCCTTATCTGTTAG





RVS161-2
SEQ ID NO: 37
CTGCCCATCTATGACTGAATGTGGAGAAGTATCGGAACAACCCTTCACTAAGGATATC


promoter

TAGGCTAAACTCATTCGCGCCTTAGATTTCTCCAAGGTATCGGTTAAGTTTCCTCTTTC




GTACTGGCTAACGATGGTGTTGCTCAACAAAGGGATGGAACGGCAGCTAAAGGGAGT




GCATGGAATGACTTTAATTGGCTGAGAAAGTGTTCTATTTGTCCGAATTTCTTTTTTCT




ATTATCTGTTCGTTTGGGCGGATCTCTCCAGTGGGGGGTAAATGGAAGATTTCTGTTC




ATGGGGTAAGGAAGCTGAAATCCTTCGTTTCTTATAGGGGCAAGTATACTAAATCTCG




GAACATTGAATGGGGTTTACTTTCATTGGCTACAGAAATTATTAAGTTTGTTATGGGG




TGAAGTTACCAGTAATTTTCATTTTTTCACTTCAACTTTTGGGGTATTTCTGTGGGGTA




GCATAGCTTGACAGGTAATATGATGTACTATGGGATAGGCAAGTCTTGTGTTTCAGAT




ACCGCCAAACGTTAAATAGGACCCTCTTGGTGACTTGCTAACTTAGAAAGTCATGCCC




AGGTGTTACGTAATCTTACTTGGTATGACTTTTTGAGTAACGGACTTGCTAGAGTCCTT




ACCAGACTTCCAGTTTAGCAAACCACAGATTGATCTGTCCTCTGGCATATCTCAAACC




AATCAACACCCGTAACCCTTTCATGAAACAACTCTAGAATGCGTCTTATCAACAGGAT




TGCCCAAAACAGTAATTGGGGCGGTGGAATCTACATGGGAGTTCCATCGTTGTCTCGG




TTTTTCTCCCTATAAGCTACTCTGGAGACGAAGTAACTAACACCCTCAAATATCATT





MPP10
SEQ ID NO: 38
TCTGAATCCGACCTCCTCTAATCTACCACTGAAGAGAAGCAGTGTATTGTTCGTCTAC


promoter

GTAAATTTGAATGTGTAAATGGCAAACATGGCTTCGGGGATGATTTGGCATATATATT




ATTGTAGCATCGTCTGTGGCTCTATGAGTTGTGTGGCGGATGATGAAAAGTTTCGTGC




TGATCCCACAATGCGGCATTTACCAAATGGGGAAAGACCAGATTTCTTCGCTGCGCCA




GCTAGGGACAGCATAATGTTCCAAGAAGAAGCGATTACAGGTGGATTACAAAGCGTT




CGTCTGCAGTTGATGTTCTACGTGATGGGTATGAGTTGTAGTGCTACGCTCCATGAAT




ACTTCTAATTTGTCGTTGACAATCCATGAATAATTTAAGTTTGCTTCCCAAGAGTCTAT




TGCGAAGGGTGAGCCGAATCTCTTGGCGTATGCACCCGACTCGTCGGCTTTTGTGCGT




TCCTTGCAAAGCTCGGTAGCAATCCGTTGGTGGGAGAAATTTGTCTCACGAATTTCAG




TTGGGAGTAGCTGTTCCTGGTAGCAAGTTCGAGGGGATCTGTGCTCATAAAACGTGCT




CACGCCAAAAATATTCTTACAAAATCTTCGCGGGGTGTTTGTCTTACATAATCGATTG




GATATTTTCTTCAAATTTTTTTTTCTTACTGAAGTCCCCTATAGAG





THP3
SEQ ID NO: 39
TCTTGCCAGTTGTCTCCTAAGATGTCATCGGAGTAGGCTCGGCTAAAGAGTAGTAATG


promoter

CATCAAGACCAACCAAAACACCTTCCACGAGTTCAGATGAACCTTTTAATAACTTCAG




GTCACTTTGATGCCGGCACAACTGGGCGAGTTTCGTATAGTTAACTCTGATCTTGCAC




TCCAGAACGGGAATAGGATTGACTTTTTGCTTCCGAGAAACGATTTGCTCTCTCTTCGT




CTGGCTTTTCACTTTATATCGCACGGAATCAATGGATGGAACTCCTAAAGCTCCTAAC




TTCGATGATTTGCTAGCCATGACTCTGTGGGACATTTTCTTGCATCTCGTTTGTAACCT




GTCTGTTCCTACACTAAGTTTATGAGAGGCTACTTTGGATTCTAGCCTCGGTGGTAAA




GTGGGAGATAACAACGGCATAAGGCAAGAACCAGAAGTACCATAACGGTCTGGTAA




AGTTGGTGATAACTTAATTGGAAGAGTGTAAGTAAGACGTGGCTTGTAATAAGGCTTT




CCATCAAAAAGGTTCTCCGGGTTGGAGTTTGTGAGGCTCACATCTTTGATCAGTCTTTC




AATATAAATTGGTAACGTTGATGACAATGCCGGAGGTAATTTCTGTAGTTGTTGATAT




ACGCAGATAACAGATTCAAATCTCCATTGGTTTTCATCATTGTGGCTTAAATTAGATC




AGAACATGGTAGTATTTAAAAATGGATCTCTTTGCAGATTTACTCAATATAGCGAAAA




AAGGAGACATTCGTTACAAAATATGAAGATAATTCGCCTCATAACTCGATTAATCAAA




ACAGACGGTCCAGTTCTTCTTTTGGTAGT





GBP2
SEQ ID NO: 40
ATCTGTACTGGTACTGACAAAGGTTATCCAGAATCCGAGACATTTCAACAACAGAGAT


promoter

TCCAGGCTTCAAAACATCCATTTTATCACCAATATCTAGTAATGCTTGCAACAATTCTG




GATACTTCTTCTGTGTAACCAAATCTCTTATAAACTGAACAGCTTTCTGTACGTTGTCG




TCAGTAGTTGGATCAACCTCAGTGGTGACCTGGCCTATCGGTTTTCCAAAAGACTTGT




TTATCACGTCCGAAAGCTCCCATTTTTGCAGATGCGCAACTTTAAAAGGCCTGGCTTG




AACATTTGCATCTCTTGTTGTGTGTTCTTTGAGAAAATATTCATCGATCTGGGTGCTTC




CAACGACAGAAGATACTCTTCTGAGACCAGAAAGTCCCCAGCCATGCTTCCTAATTAC




AAAATATTTGTAGGAAGATCCCTGATTAGGACAAAGTTGTCTTCTCATGAGTTCAACT




GAAACTGGGGCTCAAACGGATTATGAAAGGGGTGATTAAAGGTTTTCCTAGCCTTACT




TTCCAAATGTCGACCGAGACGAACATTTAAAATCCTAACATCAGAAATTTCTATCCTT




AATCTCATTGATGGTTAGTACACTTCGCAGAGTCTCCACATTTGCAGACCCTCCTGGA




TAACCAAAGCTTATCTAACAGCGGCATTGGACCTTTGAAAAGACCCTC





DAS1
SEQ ID NO: 41
AAATCTGAACACGATGAAACCTCCCCGTAGATTCCACCGCCCCGTTACTTTTTTGGGC


promoter

AATCCCGTTGATAAGATCCATTTTAGAGTTGTTTCTGAAAGGATTACAGGCGTTGAAG




GGTCAGAGAGATGCCAGAGAACAGACCAATTGGTAGTTTGCTAAAGTGGACGTCTGG




CAGGTGCTCTATCGTGTTCTTTATTTAGGGCGTTACACTTAGTAGGATTACGTAACAAT




TTGGCTTAACCTTCTAAGTTAGAAAGAAACCAAGAGGGGTCCTCTTTAACGTTCAGCA




GTATCTAAAACACAAAACCTGCCCTCATAATACATCATTCTATCTGTCAAGCTGTGCT




ACCCCACAGAAATACCCCCAAGAGTTAAAGTGAAAAGAAAAGCTAAATCTGTTAGAC




TTCACCCCATAACAAACTTGATAGTTCCTGTAGCCAATGAAAGTTAACCCCATTCAAT




GTTCCGAGATCTAGTATGCTTGCTCCTATAAGGAACGAAGGGTTCCAGCTTCCTTACC




CCATCAATGGAAATCTCCTATTTACCCCCCACTGGAAAGATCCGTCCGAACGAACGGA




TAATAGAAAAAAGAAATTCGGACAAAATAGAACACTTATTTAGCCAATGAAATCCAT




TTCCAGCATCTCCTTCAACTGCCGTTCCATCCCCTTTGTTGAGCTACACCATCGTCAGC




CAGTACCGAATAGGAAACTTAACCGATATCTTGGAGAATTCTAATGCGCGAATGAGTT




TAGCCTAGATATCCTTAGTGAAGGGTTGTTCCGATACTTCTCCACATTCAGTCATTTCA




GATGGGCAGCATTGTTATCATGAAGAAACGGAAACGGGCAGTAAGGGTTAACCGCCA




AATTATATAAAGACAACATGTCCCCAGTTTAAAGTTTTTCTTTCCTATTCTTGTATCCT




GAGTGACCGTTGTGTTTAAAATAACAAGTTCGTTTTAACTTAAGACCAAAACCAGTTA




CAACAAATTATTCCCCAACTAAACACTAAAGTTCACTCTTATCAAACTATCAAACATC




AAAG





Methanol
SEQ ID NO: 42
CTTCCCCATTTCACTGACAGTTTGTAGAAATAGGGCAACAATTGATGCAAATCGATTT


inducible

TCAACGCATTGGTTTTGATAGCATTGATGATCTTGGAGCTGTAAAAGTCCGGCTGGAT


promoter

AAGCTCAATGAAATAGGTTGGTTGATCTGGATCTTCTTTTGGGTCATTTTGTTCGCTCT




GTATTTCACAAATTGCCAGAATCTCTGCCAACCACAGTGGTAGGTCCAACTTGGTGTT




CTGAATCACAGGCTTCCCCGGGTTGTTCTCTAAATAACCGAGGCCCGGCACAGAAATC




GTAAACCGACACGGTATCTTTTGTCCGTCCGCCAGTATCTCATCAAGGTCGTAGTAGC




CCATGATGAGTATCAAAGGGGATTTGGTTATGCGATGCAACGAGAGATTGTTTATCCC




AGATGCTGATGTAAAAACCTTAACCAGCGTGACAGTAGAAATAAGACACGTTAAAAT




TACCCGCGCTTCCCTAACAATTGGCTCTGCCTTTCGGCAAGTTTCTAACTGCCCTCCCC




TCTCACATGCACCACGAACTTACCGTTCGCTCCTAGCAGAACCACCCCAAAGTTTAAT




CAGGACCGCATTTTAGCCTATTGCTGTAGAACCCCACAACATAACCTGGTCCAGAGCC




AGCCCTTTATATATGGTAAATCCCGTTTGAACTTCGAAGTGGAATCGGAATTTTTACA




TCAAAGAAACTGATACTGAAACTTTTGGCTTCGACTTGGACTTTCTCTTAATCGAATTC




GT





GCW14
SEQ ID NO: 43
CAGGTGAACCCACCTAACTATTTTTAACTGGCATCCAGTGAGCTCGCTGGGTGAAAGC


promoter

CAACCATCTTTTGTTTCGGGGAACCGTGCTCGCCCCGTAAAGTTAATTTTTTTTTCCCG




CGCAGCTTTAATCTTTCGGCAGAGAAGGCGTTTTCATCGTAGCGTGGGAACAGAATAA




TCAGTTCATGTGCTATACAGGCACATGGCAGCAGTCACTATTTTGCTTTTTAACCTTAA




AGTCGTTCATCAATCATTAACTGACCAATCAGATTTTTTGCATTTGCCACTTATCTAAA




AATACTTTTGTATCTCGCAGATACGTTCAGTGGTTTCCAGGACAACACCCAAAAAAAG




GTATCAATGCCACTAGGCAGTCGGTTTTATTTTTGGTCACCCACGCAAAGAAGCACCC




ACCTCTTTTAGGTTTTAAGTTGTGGGAACAGTAACACCGCCTAGAGCTTCAGGAAAAA




CCAGTACCTGTGACCGCAATTCACCATGATGCAGAATGTTAATTTAAACGAGTGCCAA




ATCAAGATTTCAACAGACAAATCAATCGATCCATAGTTACCCATTCCAGCCTTTTCGT




CGTCGAGCCTGCTTCATTCCTGCCTCAGGTGCATAACTTTGCATGAAAAGTCCAGATT




AGGGCAGATTTTGAGTTTAAAATAGGAAATATAAACAAATATACCGCGAAAAAGGTT




TGTTTATAGCTTTTCGCCTGGTGCCGTACGGTATAAATACATACTCTCCTCCCCCCCCT




GGTTCTCTTTTTCTTTTGTTACTTACATTTTACCGTTCCGT





FDH1
SEQ ID NO: 44
AAATAAATGGCAGAAGGATCAGCCTGGACGAAGCAACCAGTTCCAACTGCTAAGTAA


promoter

AGAAGATGCTAGACGAAGGAGACTTCAGAGGTGAAAAGTTTGCAAGAAGAGAGCTG




CGGGAAATAAATTTTCAATTTAAGGACTTGAGTGCGTCCATATTCGTGTACGTGTCCA




ACTGTTTTCCATTACCTAAGAAAAACATAAAGATTAAAAAGATAAACCCAATCGGGA




AACTTTAGCGTGCCGTTTCGGATTCCGAAAAACTTTTGGAGCGCCAGATGACTATGGA




AAGAGGAGTGTACCAAAATGGCAAGTCGGGGGCTACTCACCGGATAGCCAATACATT




CTCTAGGAACCAGGGATGAATCCAGGTTTTTGTTGTCACGGTAGGTCAAGCATTCACT




TCTTAGGAATATCTCGTTGAAAGCTACTTGAAATCCCATTGGGTGCGGAACCAGCTTC




TAATTAAATAGTTCGATGATGTTCTCTAAGTGGGACTCTACGGCTCAAACTTCTACAC




AGCATCATCTTAGTAGTCCCTTCCCAAAACACCATTCTAGGTTTCGGAACGTAACGAA




ACAATGTTCCTCTCTTCACATTGGGCCGTTACTCTAGCCTTCCGAAGAACCAATAAAA




GGGACCGGCTGAAACGGGTGTGGAAACTCCTGTCCAGTTTATGGCAAAGGCTACAGA




AATCCCAATCTTGTCGGGATGTTGCTCCTCCCAAACGCCATATTGTACTGCAGTTGGT




GCGCATTTTAGGGAAAATTTACCCCAGATGTCCTGATTTTCGAGGGCTACCCCCAACT




CCCTGTGCTTATACTTAGTCTAATTCTATTCAGTGTGCTGACCTACACGTAATGATGTC




GTAACCCAGTTAAATGGCCGAAAAACTATTTAAGTAAGTTTATTTCTCCTCCAGATGA




GACTCTCCTTCTTTTCTCCGCTAGTTATCAAACTATAAACCTATTTTACCTCAAATACC




TCCAACATCACCCACTTAAACAGAATT





FBA1
SEQ ID NO: 45
TGCTTAAGTAATTGAAAACAGTGTTGTGATTATATAAGCATGGTATTTGAATAGAACT


promoter

ACTGGGGTTAACTTATCTAGTAGGATGGAAGTTGAGGGAGATCAAGATGCTTAAAGA




AAAGGATTGGCCAATATGAAAGCCATAATTAGCAATACTTATTTAATCAGATAATTGT




GGGGCATTGTGACTTGACTTTTACCAGGACTTCAAACCTCAACCATTTAAACAGTTAT




AGAAGACGTACCGTCACTTTTGCTTTTAATGTGATCTAAATGTGATCACATGAACTCA




AACTAAAATGATATCTTTTACTGGACAAAAATGTTATCCTGCAAACAGAAAGCTTTCT




TCTATTCTAAGAAGAACATTTACATTGGTGGGAAACCTGAAAACAGAAAATAAATAC




TCCCCAGTGACCCTATGAGCAGGATTTTTGCATCCCTATTGTAGGCCTTTCAAACTCAC




ACCTAATATTTCCCGCCACTCACACTATCAATGATCACTTCCCAGTTCTCTTCTTCCCC




TATTCGTACCATGCAACCCTTACACGCCTTTTCCATTTCGGTTCGGATGCGACTTCCAG




TCTGTGGGGTACGTAGCCTATTCTCTTAGCCGGTATTTAAACATACAAATTCACCCAA




ATTCTACCTTGATAAGGTAATTGATTAATTTCATAAATGAATTCGCG





GAP
SEQ ID NO: 46
TTTTTGTAGAAATGTCTTGGTGTCCTCGTCCAATCAGGTAGCCATCTCTGAAATATCTG


promoter

GCTCCGTTGCAACTCCGAACGACCTGCTGGCAACGTAAAATTCTCCGGGGTAAAACTT




AAATGTGGAGTAATGGAACCAGAAACGTCTCTTCCCTTCTCTCTCCTTCCACCGCCCG




TTACCGTCCCTAGGAAATTTTACTCTGCTGGAGAGCTTCTTCTACGGCCCCCTTGCAGC




AATGCTCTTCCCAGCATTACGTTGCGGGTAAAACGGAGGTCGTGTACCCGACCTAGCA




GCCCAGGGATGGAAAAGTCCCGGCCGTCGCTGGCAATAATAGCGGGCGGACGCATGT




CATGAGATTATTGGAAACCACCAGAATCGAATATAAAAGGCGAACACCTTTCCCAAT




TTTGGTTTCTCCTGACCCAAAGACTTTAAATTTAATTTATTTGTCCCTATTTCAATCAAT




TGAACAACTAT





PGK
SEQ ID NO: 47
AAATAGCAGTTTGCGGTTTCTTGATTTCATGGGGGGAACAAACAATAGTGTTGCCTTA


promoter

ATTCTAATTGGCATTGTTGCTTGGAATCGAAATTGGGGGATAACGTCATATCTGAAAA




GTAAACAACTTCGGGAAATCAGGCTGTTTGAATGGCTTGGAAGCGAGATAGAAAGGG




GATAGCGAGATAGAGGGGGCGGAGTAGACGAAGGGTGTTAAACTGCTGAAATCTCTC




AATCTGGAAGAAACGGAATAAATTAACTCCTTGCGATAATAAAATCCGAGTCCGTTAT




GACCCCACACCGTGTTGACCACGGCATACCCCATGGAATCTGGTACAAAGCGTCAGTC




TTGAAGACACCATCACGTGTAGGAGACTGATTGTCTGACCGTCCAGCAAAAAGGGCA




TTATAAATCTTGCTGTTAAAGGGGTGAGGGGAGATGCAGGTTGTTCTTTTATTCGCCTT




GAACTTTTTAATTTTCCCGGGGTTGCGGAGCGTGAACAGTTAGCCCGATCTGATAGCT




TGCAAGATTCAACAGTTTATCCACTACAGGTCAGAGAGATCGCCGCAGAAGAAATGC




TCGTCTCGTGTTCCAGCACACATACTGGTGAAGTCGTTATTTTGCCGAAGGGGGGGTA




ATAAGGTTATGCACCCCCTCTCCACACCCCAGAATCATTTTTTAGCTGGGTTCAAGGC




ATTAGACTTTGCACATTTTTCCCTTAAACACCCTTGAAACGCGGATAAACAGTTGCAT




GTGCATCCTAAAACTAGGTGAGATGCGTACTCCGTGCTCCGATAATAACAGTGGTGTT




GGGGTTGCTGCTAGCTCACGCACTCCGTTCTTTTTTTTCAACCAGCAAAATTCGATGGG




GAGAAACTTGGGGTACTTTGCCGACTCCTCCACCATGCTGGTATATAAATAATACTCG




CCCACTTTTCGTTTGCTGCTTTTATATTTCATAGACTGAAAAAGACTCTTCTTCTACTTT




TTCATAATATATCTCAGATATCACTACTATAG





TEFg_
SEQ ID NO: 48
GCGATTTAAATTCGCGAAAGAACAGCCTAATAAACTCCGAAGCATGATGGCCTCTATC


promoter

CGGAAAACGTTAAGAGATGTGGCAACAGGAGGGCACATAGAATTTTTAAAGACGCTG




AAGAATGCTATCATAGTCCGTAAAAATGTGATAGTACTTTGTTTAGTGCGTACGCCAC




TTATTCGGGGCCAATAGCTAAACCCAGGTTTGCTGGCAGCAAATTCAACTGTAGATTG




AATCTCTCTAACAATAATGGTGTTCAATCCCCTGGCTGGTCACGGGGAGGACTATCTT




GCGTGATCCGCTTGGAAAATGTTGTGTATCCCTTTCTCAATTGCGGAAAGCATCTGCT




ACTTCCCATAGGCACCAGTTACCCAATTGATATTTCCAAAAAAGATTACCATATGTTC




ATCTAGAAGTATAAATACAAGTGGACATTCAATGAATATTTCATTCAATTAGTCATTG




ACACTTTCATCAACTTACTACGTCTTATTCAACAATGAATTCGCG





PMP20
SEQ ID NO: 49
ACACAGTTATTATTCATTTAAATGTCAAAACAGTAGTGATAAAAGGCTATGAAGGAG


promoter

GTTGTCTAGGGGCTCGCGGAGGAAAGTGATTCAAACAGACCTGCCAAAAAGAGAAAA




AAGAGGGAATCCCTGTTCTTTCCAATGGAAATGACGTAACTTTAACTTGAAAAATACC




CCAACCAGAAGGGTTCAAACTCAACAAGGATTGCGTAATTCCTACAAGTAGCTTAGA




GCTGGGGGAGAGACAACTGAAGGCAGCTTAACGATAACGCGGGGGGATTGGTGCAC




GACTCGAAAGGAGGTATCTTAGTCTTGTAACCTCTTTTTTCCAGAGGCTATTCAAGATT




CATAGGCGATATCGATGTGGAGAAGGGTGAACAATATAAAAGGCTGGAGAGATGTCA




ATGAAGCAGCTGGATAGATTTCAAATTTTCTAGATTTCAGAGTAATCGCACAAAACGA




AGGAATCCCACCAAGACAAAAAAAAAAATTCTAAGGAATTCCGAAACG





SHB17
SEQ ID NO: 50
AAATTCTTTTTACGTGGTGCGCATACTGGACAGAGGCAGAGTCTCAATTTCTTCTTTTG


promoter

AGACAGGCTACTACAGCCTGTGATTCCTCTTGGTACTTGGATTTGCTTTTATCTGGCTC




CGTTGGGAACTGTGCCTGGGTTTTGAAGTATCTTGTGGATGTGTTTCTAACACTTTTTC




AATCTTCTTGGAGTGAGAATGCAGGACTTTGAACATCGTCTAGCTCGTTGGTAGGTGA




ACCGTTTTACCTTGCATGTGGTTAGGAGTTTTCTGGAGTAACCAAGACCGTCTTATCAT




CGCCGTAAAATCGCTCTTACTGTCGCTAATAATCCCGCTGGAAGAGAAGTTCGAACAG




AAGTAGCACGCAAAGCTCTTGTCAAATGAGAATTGTTAATCGTTTGACAGGTCACACT




CGTGGGCTATGTACGATCAACTTGCCGGCTGTTGCTGGAGAGATGACACCAGTTGTGG




CATGGCCAATTGGTATTCAGCCGTACCACTGTATGGAAAATGAGATTATCTTGTTCTT




GATCTAGTTTCTTGCCATTTTAGAGTTGCCACATTCGTAGGTTTCAGTACCAATAATGG




TAACTTCCAAACTTCCAACGCAGATACCAGAGATCTGCCGATCCTTCCCCAACAATAG




GAGCTTACTACGCCATACATATAGCCTATCTATTTTCACTTTCGCGTGGGTGCTTCTAT




ATAAACGGTTCCCCATCTTCCGTTTCATACTACTTGAATTTTAAGCACTAAAGAATT





PEX8
SEQ ID NO: 51
AAATTAACCAGTGTTTTCTTATCTATTTGTCTTTTTACACTAAAGTGAAGTACGAATCC


promoter

ATGCGATTGATTCCTCCTCAGATATCAGCTGAATTCTTGCTTATGTAATACTTGCGCGA




ACTACATGTGAACTTAGGATTCGATAAGGCTGGGGGGTCAACCAACCCCACTTCAAA




GAGCCGACCCGTATAAATAGCCTCTGCGTCCTCAGATCAACAAGACGAAGCAATTTTT




TTTTACCTATCTTCAGGTGCCTGTTAG





PEX4
SEQ ID NO: 52
AGGGAGGCAATTAGTTGTCCTTGTGGAATCAAAAGAGCACAAGAAACCTGTGATTGA


promoter

AAGTCTGGGCTGTCTGGGGTTGGCAAGAAAATCATAAAGTTTATATAGTACATTTGTT




AGTTGCTTCTTTGAATGACACCTTGATCTACATGTTGTTCTTCCCAGTTCCCACCGCGA




AGTTTCTCTAACTCTCAATCTCTCTTTCCCCACTTGATAATCCAAAGAA





TKL3
SEQ ID NO: 53
gtcgaggaaagggtcgtttcggggagttaaatatttttggctatgtagcagacatgtttcgacgctggcgtcgc


Promoter

gtcgatcggaaaatattaccccaggaacaagcacttgcttgggttagccaccaccctgcgcaagcctttttgcc




ggctctacacagggccaatgaaatctgggcggaatctgaaaccgatgaaacggacgacactggcaacaagctca




ctgcactattttttttttctagtgaaatagcctatcctcgtctcgctcccctcatacctgtaaaggggtgcaat




ttagcctcgttccagccattcacgggccactcaacaacacgtcggctaccatggggtgcttgggcaccaaaagg




cctataaataggcccccatccgtctgctacacagtcatctctgtcttttcttccc
















TABLE 5







Signal Peptides









Sequence
SEQ ID



Info
NO:
Amino Acid sequence





Signal Peptide
SEQ ID NO: 56
MFTPVRRRVRTAALALSAAAALVLGSTAASGASATPSPAPAP





Signal Peptide
SEQ ID NO: 57
MKLSTVLLSAGLASTTLA





Signal Peptide
SEQ ID NO: 58
MRFPSIFTAVLFAASSALA





Signal Peptide
SEQ ID NO: 59
MVSLRSIFTSSILAAGLTRAHG





Signal Peptide
SEQ ID NO: 60
MKFPVPLLFLLQLFFIIATQG





Signal Peptide
SEQ ID NO: 61
MQVKSIVNLLLACSLAVA





Signal Peptide
SEQ ID NO: 62
MQFNWNIKTVASILSALTLAQA





Signal Peptide
SEQ ID NO: 63
MYRNLIIATALTCGAYSAYVPSEPWSTLTPDASLESALKDYSQTFGIAIKSLDADKIKR





Signal Peptide
SEQ ID NO: 64
MNLYLITLLFASLCSAITLPKR





Signal Peptide
SEQ ID NO: 65
MFEKSKFVVSFLLLLQLFCVLGVHG





Signal Peptide
SEQ ID NO: 66
MQFNSVVISQLLLTLASVSMG





Signal Peptide
SEQ ID NO: 67
MKSQLIFMALASLVASAPLEHQQQHHKHEKR





Signal Peptide
SEQ ID NO: 68
MKFAISTLLIILQAAAVFA





Signal Peptide
SEQ ID NO: 69
MKLLNFLLSFVTLFGLLSGSVFA





Signal Peptide
SEQ ID NO: 70
MIFNLKTLAAVAISISQVSA





Signal Peptide
SEQ ID NO: 71
MKISALTACAVTLAGLAIAAPAPKPEDCTTTVQKRHQHKR





Signal Peptide
SEQ ID NO: 72
MSYLKISALLSVLSVALA





Signal Peptide
SEQ ID NO: 73
MLSTILNIFILLLFIQASLQ





Signal Peptide
SEQ ID NO: 74
MKLSTNLILAIAAASAVVSAAPVAPAEEAANHLHKR





Signal Peptide
SEQ ID NO: 75
MFKSLCMLIGSCLLSSVLA





Signal Peptide
SEQ ID NO: 76
MKLAALSTIALTILPVALA





Signal Peptide
SEQ ID NO: 77
MSFSSNVPQLFLLLVLLTNIVSG





Signal Peptide
SEQ ID NO: 78
MQLQYLAVLCALLLNVQSKNVVDFSRFGDAKISPDDTDLESRERKR





Signal Peptide
SEQ ID NO: 79
MKIHSLLLWNLFFIPSILG





Signal Peptide
SEQ ID NO: 80
MSTLTLLAVLLSLQNSALA





Signal Peptide
SEQ ID NO: 81
MINLNSFLILTVTLLSPALALPKNVLEEQQAKDDLAKR





Signal Peptide
SEQ ID NO: 82
MFSLAVGALLLTQAFG





Signal Peptide
SEQ ID NO: 83
MKILSALLLLFTLAFA





Signal Peptide
SEQ ID NO: 84
MKVSTTKFLAVFLLVRLVCA





Signal Peptide
SEQ ID NO: 85
MQFGKVLFAISALAVTALG





Signal Peptide
SEQ ID NO: 86
MWSLFISGLLIFYPLVLG





Signal Peptide
SEQ ID NO: 87
MRNHLNDLVVLFLLLTVAAQA





Signal Peptide
SEQ ID NO: 88
MFLKSLLSFASILTLCKA





Signal Peptide
SEQ ID NO: 89
MFVFEPVLLAVLVASTCVTA





Signal Peptide
SEQ ID NO: 90
MFSPILSLEIILALATLQSVFA





Signal Peptide
SEQ ID NO: 91
MIINHLVLTALSIALA





Signal Peptide
SEQ ID NO: 92
MLALVRISTLLLLALTASA





Signal Peptide
SEQ ID NO: 93
MRPVLSLLLLLASSVLA





Signal Peptide
SEQ ID NO: 94
MVLIQNFLPLFAYTLFFNQRAALA





Signal Peptide
SEQ ID NO: 95
MVSLTRLLITGIATALQVNA





Signal Peptide
SEQ ID NO: 96
MIFDGTTMSIAIGLLSTLGIGAEA





Signal Peptide
SEQ ID NO: 97
MVLVGLLTRLVPLVLLAGTVLLLVFVVLSGG





Signal Peptide
SEQ ID NO: 98
MLSILSALTLLGLSCA





Signal Peptide
SEQ ID NO: 99
MRLLHISLLSIISVLTKANA





Signal Peptide
SEQ ID NO: 100
MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPAEAVIGYLDLEGDFDVAVLPFSNSTNN




GLLFINTTIASIAAKEEGVSLDKREAEA





Signal Peptide
SEQ ID NO: 344
MRFPSIFTAVLFAASSALAAPVQTTTEDELEGDFDVAVLPFSASIAAKEEGVSLEKREAEA





Signal Peptide
SEQ ID NO: 101
MFKSVVYSILAASLANA





Signal Peptide
SEQ ID NO: 102
MLLQAFLFLLAGFAAKISA





Signal Peptide
SEQ ID NO: 103
MASSNLLSLALFLVLLTHANS





Signal Peptide
SEQ ID NO: 104
MNIFYIFLFLLSFVQGLEHTHRRGSLVKR





Signal Peptide
SEQ ID NO: 105
MLIIVLLFLATLANSLDCSGDVFFGYTRGDKTDVHKSQALTAVKNIKR





Signal Peptide
SEQ ID NO: 106
MESVSSLFNIFSTIMVNYKSLVLALLSVSNLKYARGMPTSERQQGLEER





Signal Peptide
SEQ ID NO: 107
MFAFYFLTACISLKGVFG





Signal Peptide
SEQ ID NO: 108
MRFSTTLATAATALFFTASQVSA





Signal Peptide
SEQ ID NO: 109
MKFAYSLLLPLAGVSASVINYKR





Signal Peptide
SEQ ID NO: 110
MKFFAIAALFAAAAVAQPLEDR





Signal Peptide
SEQ ID NO: 111
MQFFAVALFATSALA





Signal Peptide
SEQ ID NO: 112
MKWVTFISLLFLFSSAYSRGVFRR





Signal Peptide
SEQ ID NO: 113
MRSLLILVLCFLPLAALG





Signal Peptide
SEQ ID NO: 114
MKVLILACLVALALA





Signal Peptide
SEQ ID NO: 115
MFNLKTILISTLASIAVA





Signal Peptide
SEQ ID NO: 116
MYRKLAVISAFLATARAQSA





WT
SEQ ID NO: 117
MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPAEAVIGYLDLEGDFDVAVLPFSNSTNN




GLLFINTTIASIAAKEEGVQLDKR





App3
SEQ ID NO: 118
MRFPPIFTAALFAASSALAAPANTTTEDETAQIPAEAVIGYLDSEGDSDVAVLPFSNSTNN




GLSFINTTIASIAAKEEGVQLDKR





App8
SEQ ID NO: 119
MRFPSIFTAVLFAASSALAAPANTTTEDETAQIPAEAVISYSDLEGDFDAAALPLSNSTNN




GLSSTNTTIASIAAKEEGVQLDKR





App9
SEQ ID NO: 120
MRPPSIFTAVLFAASSALAAPANTTTEDETTQIPAEAVATYLDLEGDVDVAVLPFSSSTN




NGLSFINTTIASIAAKEEGVQLDKR





App10
SEQ ID NO: 121
MRFPSIFTAALFAASSALAAPANTTTEGETAQTPAEAVIGYRDLEGDFDVAVLPFPNSTN




NGLLFTNTTTASIAAKEEGVQLDKR





appS1
SEQ ID NO: 122
MRFPSIFTAVLLAAPSALAAPANATTEDEAAQIPAEAVIGYLDLEGDFDAAVLPFSNSTN




NGLLSINTTIASIAAKEEGVQLDKR





appS4
SEQ ID NO: 123
MRFPSIFTAVVFAASSALAAPANTTAEDETAQIPAEAVIGYLGLEGDSDVAALPLSDSTN




NGSLSTNTTIASIAAKEEGVQLDKR





appS6
SEQ ID NO: 124
MRLPSIFTAAVFAASSALAAPANTTTEDETAQIPAEAAIGYLDLEGDSDVAVLPLSNSTN




NGLLFINTTIASIAAKEEGVQLDKR





appS8
SEQ ID NO: 125
MRFPSIFTAVLFAASSALAAPANTTTEDETAQIPAEAVIGYLDLEGDFDVAVLPFSNSTND




GLSFINTTTASIAAKEEGVQLDKR





a-Factor
SEQ ID NO: 126
MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPA





PpScw11p
SEQ ID NO: 127
MLSTILNIFILLLFIQASLQAPIPVVTKYVTEGIAVV





PpDse4p
SEQ ID NO: 128
MSFSSNVPQLFLLLVLLTNIVSGAVISVWSTSKVTK





PpExglp
SEQ ID NO: 129
MNLYLITLLFASLCSAITLPKRDIIWDYSSEKIMG





a-EGFP
SEQ ID NO: 130
MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPA





S-EGFP
SEQ ID NO: 131
MLSTILNIFILLLFIQASLQEFDYKDDDDKMVSKG





D-EGFP
SEQ ID NO: 132
MSFSSNVPQLFLLLVLLTNIVSGEFDYKDDDDKMV





E-EGFP
SEQ ID NO: 133
MNLYLITLLFASLCSAEFDYKDDDDKMVSKGEELF





a-CALB
SEQ ID NO: 134
MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPA





S-CALB
SEQ ID NO: 135
MLSTILNIFILLLFIQASLQEFLPSGSDPAFSQPK





D-CALB
SEQ ID NO: 136
MSFSSNVPQLFLLLVLLTNIVSGEFLPSGSDPAFS





E-CALB
SEQ ID NO: 137
MNLYLITLLFASLCSAEFLPSGSDPAFSQPKSVLD





Amylase (AA)
SEQ ID NO: 138
MVAWWSLFLYGLQVAAPALAAEVDCSRFPNATDKEGKDVLVCNKDLRPICGTDGVTY




TNDCLLCAYSIEFGTNISKEHDGECKETVPMNCSSYANTTSEDGKVMVLCNRAFNPVCG




TDGVTYDNECLLCAHKVEQGASVDKRHDGGCRKELAAVSVDCSEYPKPDCTAEDRPLC




GSDNKTYGNKCNFCNAVVESNGTLTLSHFGKC





Alpha K (AK)
SEQ ID NO: 139
MRFPSIFTAVLFAASSALAAPVNTTTEDELEGDFDVAVLPFSASIAAKEEGVSLEKRAEV




DCSRFPNATDKEGKDVLVCNKDLRPICGTDGVTYTNDCLLCAYSIEFGTNISKEHDGECK




ETVPMNCSSYANTTSEDGKVMVLCNRAFNPVCGTDGVTYDNECLLCAHKVEQGASVD




KRHDGGCRKELAAVSVDCSEYPKPDCTAEDRPLCGSDNKTYGNKCNFCNAVVESNGTL




TLSHFGKC





Alpha T (AT)
SEQ ID NO: 140
MRFPSIFTAVLFAASSALAAEVDCSRFPNATDKEGKDVLVCNKDLRPICGTDGVTYTND




CLLCAYSIEFGTNISKEHDGECKETVPMNCSSYANTTSEDGKVMVLCNRAFNPVCGTDG




VTYDNECLLCAHKVEQGASVDKRHDGGCRKELAAVSVDCSEYPKPDCTAEDRPLCGSD




NKTYGNKCNFCNAVVESNGTLTLSHFGKC





Glucoamyl
SEQ ID NO: 141
MSFRSLLALSGLVCSGLAAEVDCSRFPNATDKEGKDVLVCNKDLRPICGTDGVTYTNDC


(GA)

LLCAYSIEFGTNISKEHDGECKETVPMNCSSYANTTSEDGKVMVLCNRAFNPVCGTDGV




TYDNECLLCAHKVEQGASVDKRHDGGCRKELAAVSVDCSEYPKPDCTAEDRPLCGSDN




KTYGNKCNFCNAVVESNGTLTLSHFGKC





Ovomucoid
SEQ ID NO: 144
MAMAGVFVLFSFVLCGFLPDAAFG


signal peptide







Lysozyme
SEQ ID NO: 145
MRSLLILVLCFLPLAALG


signal peptide







Ovalbumin
SEQ ID NO: 146
MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPAEAVIGYSDLEGDFDVAVLPFSNSTNN


Signal Peptide

GLLFINTTIASIAAKEEGVSLDKREAEA





Ovotransferrin
SEQ ID NO: 147
MKLILCTVLSLGIAAVCFA


Signal Peptide







Bovine
SEQ ID NO: 148
MKLFVPALLSLGALGLCLA


Lactoferrin




Signal Peptide







Porcine
SEQ ID NO: 149
MKLFIPALLFLGTLGLCLA


Lactoferrin




Signal Peptide







Kid Lipase
SEQ ID NO: 150
MESKALLLLALSVWLQSLTVSHG


Signal Peptide







Porcine
SEQ ID NO: 151
MLLIWTLSLLLGAVLG


Lipase Signal




Peptide
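
The rows of Table 5 labeled WT (SEQ ID NO: 117) and App3 (SEQ ID NO: 118) have the same length, so App3 can be read as a set of point substitutions relative to WT. The following Python sketch lists those substitutions; it is an illustration under stated assumptions, not a description of how the variants were made. Positions are numbered from 1 starting at the initial methionine, which is a convention chosen here, and `substitutions`, `WT`, and `APP3` are hypothetical names.

```python
# Illustrative comparison only. Sequences are copied from Table 5; the
# function and variable names are hypothetical.
WT = (    # SEQ ID NO: 117
    "MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPAEAVIGYLDLEGDFDVAVLPFSNSTNN"
    "GLLFINTTIASIAAKEEGVQLDKR"
)
APP3 = (  # SEQ ID NO: 118
    "MRFPPIFTAALFAASSALAAPANTTTEDETAQIPAEAVIGYLDSEGDSDVAVLPFSNSTNN"
    "GLSFINTTIASIAAKEEGVQLDKR"
)


def substitutions(reference: str, variant: str) -> list[str]:
    """Return point substitutions as strings such as 'S5P' (1-based)."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be the same length to compare")
    return [
        f"{ref}{pos}{var}"
        for pos, (ref, var) in enumerate(zip(reference, variant), start=1)
        if ref != var
    ]


print(substitutions(WT, APP3))
```

The same function can be applied to the other equal-length leader variants in the table; entries of a different length would need an alignment step first, which this sketch does not attempt.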
















TABLE 6







Proteins of Interest









Sequence
SEQ ID



Info
NO.
Sequence





Ovomucoid
SEQ ID NO:
AEVDCSRFPNATDKEGKDVLVCNKDLRPICGTDGVTYTNDCLLCAYSIEFGTNISKEHDGECK


(canonical)
152
ETVPMNCSSYANTTSEDGKVMVLCNRAFNPVCGTDGVTYDNECLLCAHKVEQGASVDKRH




DGGCRKELAAVSVDCSEYPKPDCTAEDRPLCGSDNKTYGNKCNFCNAVVESNGTLTLSHFGK




C*





Ovomucoid
SEQ ID NO:
AEVDCSRFPNATDMEGKDVLVCNKDLRPICGTDGVTYTNDCLLCAYSVEFGTNISKEHDGEC



153
KETVPMNCSSYANTTSEDGKVMVLCNRAFNPVCGTDGVTYDNECLLCAHKVEQGASVDKR




HDGGCRKELAAVSVDCSEYPKPDCTAEDRPLCGSDNKTYGNKCNFCNAVVESNGTLTLSHFG




KC*





Ovomucoid
SEQ ID NO:
AEVDCSRFPNATDMEGKDVLVCNKDLRPICGTDGVTYTNDCLLCAYSVEFGTNISKEHDGEC


G162M
154
KETVPMNCSSYANTTSEDGKVMVLCNRAFNPVCGTDGVTYDNECLLCAHKVEQGASVDKR


F167A

HDGGCRKELAAVSVDCSEYPKPDCTAEDRPLCGSDNKTYMNKCNACNAVVESNGTLTLSHF




GKC*





Ovomucoid
SEQ ID NO:
MAMAGVFVLFSFVLCGFLPDAAFGAEVDCSRFPNATDKEGKDVLVCNKDLRPICGTDGVTYT


isoform 1
155
NDCLLCAYSIEFGTNISKEHDGECKETVPMNCSSYANTTSEDGKVMVLCNRAFNPVCGTDGV


precursor full

TYDNECLLCAHKVEQGASVDKRHDGGCRKELAAVSVDCSEYPKPDCTAEDRPLCGSDNKTY


length

GNKCNFCNAVVESNGTLTLSHFGKC





Ovomucoid
SEQ ID NO:
MAMAGVFVLFSFVLCGFLPDAVFGAEVDCSRFPNATDMEGKDVLVCNKDLRPICGTDGVTY


[Gallus gallus]
156
TNDCLLCAYSVEFGTNISKEHDGECKETVPMNCSSYANTTSEDGKVMVLCNRAFNPVCGTDG




VTYDNECLLCAHKVEQGASVDKRHDGGCRKELAAVSVDCSEYPKPDCTAEDRPLCGSDNKT




YGNKCNFCNAVVESNGTLTLSHFGKC





Ovomucoid
SEQ ID NO:
MAMAGVFVLFSFVLCGFLPDAAFGAEVDCSRFPNATDKEGKDVLVCNKDLRPICGTDGVTYT


isoform 2
157
NDCLLCAYSIEFGTNISKEHDGECKETVPMNCSSYANTTSEDGKVMVLCNRAFNPVCGTDGV


precursor

TYDNECLLCAHKVEQGASVDKRHDGGCRKELAAVDCSEYPKPDCTAEDRPLCGSDNKTYGN


[Gallus gallus]

KCNFCNAVVESNGTLTLSHFGKC





Ovomucoid
SEQ ID NO:
AEVDCSRFPNATDKEGKDVLVCNKDLRPICGTDGVTYNNECLLCAYSIEFGTNISKEHDGECK


[Gallus gallus]
158
ETVPMNCSSYANTTSEDGKVMVLCNRAFNPVCGTDGVTYDNECLLCAHKVEQGASVDKRH




DGECRKELAAVSVDCSEYPKPDCTAEDRPLCGSDNKTYGNKCNFCNAVVESNGTLTLSHFGK




C





Ovomucoid
SEQ ID NO:
MAMAGVFVLFSFALCGFLPDAAFGVEVDCSRFPNATNEEGKDVLVCTEDLRPICGTDGVTYS


[Numida
159
NDCLLCAYNIEYGTNISKEHDGECREAVPVDCSRYPNMTSEEGKVLILCNKAFNPVCGTDGVT



meleagris]


YDNECLLCAHNVEQGTSVGKKHDGECRKELAAVDCSEYPKPACTMEYRPLCGSDNKTYDNK




CNFCNAVVESNGTLTLSHFGKC





PREDICTED:
SEQ ID NO:
MQTITWRQPQGDHLRSRAPAATCRAGQYLTMAMAGIFVLFSFALCGFLPDAAFGVEVDCSRF


Ovomucoid
160
PNTTNEEGKDVLVCTEDLRPICGTDGVTHSECLLCAYNIEYGTNISKEHDGECREAVPMDCSR


isoform X1

YPNTTNEEGKVMILCNKALNPVCGTDGVTYDNECVLCAHNLEQGTSVGKKHDGGCRKELAA


[Meleagris

VSVDCSEYPKPACTLEYRPLCGSDNKTYGNKCNFCNAVVESNGTLTLSHFGKC



gallopavo]








Ovomucoid
SEQ ID NO:
VEVDCSRFPNTTNEEGKDVLVCTEDLRPICGTDGVTHSECLLCAYNIEYGTNISKEHDGECRE


[Meleagris
161
AVPMDCSRYPNTTSEEGKVMILCNKALNPVCGTDGVTYDNECVLCAHNLEQGTSVGKKHDG



gallopavo]


ECRKELAAVSVDCSEYPKPACTLEYRPLCGSDNKTYGNKCNFCNAVVESNGTLTLSHFGKC





PREDICTED:
SEQ ID NO:
MQTITWRQPQGDHLRSRAPAATCRAGQYLTMAMAGIFVLFSFALCGFLPDAAFGVEVDCSRF


Ovomucoid
162
PNTTNEEGKDVLVCTEDLRPICGTDGVTHSECLLCAYNIEYGTNISKEHDGECREAVPMDCSR


isoform X2

YPNTTNEEGKVMILCNKALNPVCGTDGVTYDNECVLCAHNLEQGTSVGKKHDGGCRKELAA


[Meleagris

VDCSEYPKPACTLEYRPLCGSDNKTYGNKCNFCNAVVESNGTLTLSHFGKC


gallopavo]







Ovomucoid
SEQ ID NO:
EYGTNISIKHNGECKETVPMDCSRYANMTNEEGKVMMPCDRTYNPVCGTDGVTYDNECQLC


[Bambusicola
163
AHNVEQGTSVDKKHDGVCGKELAAVSVDCSEYPKPECTAEERPICGSDNKTYGNKCNFCNA



thoracicus]


VVYVQP





Ovomucoid
SEQ ID NO:
VDCSRFPNTTNEEGKDVLACTKELHPICGTDGVTYSNECLLCYYNIEYGTNISKEHDGECTEA


[Callipepla
164
VPVDCSRYPNTTSEEGKVLIPCNRDFNPVCGSDGVTYENECLLCAHNVEQGTSVGKKHDGGC



squamata]


RKEFAAVSVDCSEYPKPDCTLEYRPLCGSDNKTYASKCNFCNAVVIWEQEKNTRHHASHSVF




FISARLVC





Ovomucoid
SEQ ID NO:
MLPLGLREYGTNTSKEHDGECTEAVPVDCSRYPNTTSEEGKVRILCKKDINPVCGTDGVTYD


[Colinus
165
NECLLCSHSVGQGASIDKKHDGGCRKEFAAVSVDCSEYPKPACMSEYRPLCGSDNKTYVNKC



virginianus]


NFCNAVVYVQPWLHSRCRLPPTGTSFLGSEGRETSLLTSRATDLQVAGCTAISAMEATRAAAL




LGLVLLSSFCELSHLCFSQASCDVYRLSGSRNLACPRIFQPVCGTDNVTYPNECSLCRQMLRSR




AVYKKHDGRCVKVDCTGYMRATGGLGTACSQQYSPLYATNGVIYSNKCTFCSAVANGEDID




LLAVKYPEEESWISVSPTPWRMLSAGA





Ovomucoid-
SEQ ID NO:
MSWWGIKPALERPSQEQSTSGQPVDSGSTSTTTMAGIFVLLSLVLCCFPDAAFGVEVDCSRFP


like isoform
166
NTTNEEGKEVLLCTKDLSPICGTDGVTYSNECLLCAYNIEYGTNISKDHDGECKEAVPVDCST


X2

YPNMTNEEGKVMLVCNKMFSPVCGTDGVTYDNECMLCAHNVEQGTSVGKKYDGKCKKEV


[Anser cygnoides

ATVDCSDYPKPACTVEYMPLCGSDNKTYDNKCNFCNAVVDSNGTLTLSHFGKC



domesticus]








Ovomucoid-
SEQ ID NO:
MSSQNQLHRRRRPLPGGQDLNKYYWPHCTSDRFSWLLHVTAEQFRHCVCIYLQPALERPSQE


like isoform
167
QSTSGQPVDSGSTSTTTMAGIFVLLSLVLCCFPDAAFGVEVDCSRFPNTTNEEGKEVLLCTKDL


X1

SPICGTDGVTYSNECLLCAYNIEYGTNISKDHDGECKEAVPVDCSTYPNMTNEEGKVMLVCN


[Anser cygnoides

KMFSPVCGTDGVTYDNECMLCAHNVEQGTSVGKKYDGKCKKEVATVDCSDYPKPACTVEY



domesticus]


MPLCGSDNKTYDNKCNFCNAVVDSNGTLTLSHFGKC





Ovomucoid
SEQ ID NO:
VEVDCSRFPNTTNEEGKDEVVCPDELRLICGTDGVTYNHECMLCFYNKEYGTNISKEQDGEC


[Coturnix
168
GETVPMDCSRYPNTTSEDGKVTILCTKDFSFVCGTDGVTYDNECMLCAHNVVQGTSVGKKH



japonica]


DGECRKELAAVSVDCSEYPKPACPKDYRPVCGSDNKTYSNKCNFCNAVVESNGTLTLNHFGK




C





Ovomucoid
SEQ ID NO:
MAMAGVFLLFSFALCGFLPDAAFGVEVDCSRFPNTTNEEGKDEVVCPDELRLICGTDGVTYN


[Coturnix
169
HECMLCFYNKEYGTNISKEQDGECGETVPMDCSRYPNTTSEDGKVTILCTKDFSFVCGTDGVT



japonica]


YDNECMLCAHNIVQGTSVGKKHDGECRKELAAVSVDCSEYPKPACPKDYRPVCGSDNKTYS




NKCNFCNAVVESNGTLTLNHFGKC





Ovomucoid
SEQ ID NO:
MAGVFVLLSLVLCCFPDAAFGVEVDCSRFPNTTNEEGKDVLLCTKELSPVCGTDGVTYSNEC


[Anas
170
LLCAYNIEYGTNISKDHDGECKEAVPADCSMYPNMTNEEGKMTLLCNKMFSPVCGTDGVTY



platyrhynchos]


DNECMLCAHNVEQGTSVGKKYDGKCKKEVATVDCSGYPKPACTMEYMPLCGSDNKTYGNK




CNFCNAVVDSNGTLTLSHFGEC





Ovomucoid,
SEQ ID NO:
QVDCSRFPNTTNEEGKEVLLCTKELSPVCGTDGVTYSNECLLCAYNIEYGTNISKDHDGECKE


partial [Anas
171
AVPADCSMYPNMTNEEGKMTLLCNKMFSPVCGTDGVTYDNECMLCAHNVEQGTSVGKKYD



platyrhynchos]


GKCKKEVATVSVDCSGYPKPACTMEYMPLCGSDNKTYGNKCNFCNAVV





Ovomucoid-
SEQ ID NO:
MTMPGAFVVLSFVLCCFPDATFGVEVDCSTYPNTTNEEGKEVLVCSKILSPICGTDGVTYSNE


like [Tyto
172
CLLCANNIEYGTNISKYHDGECKEFVPVNCSRYPNTTNEEGKVMLICNKDLSPVCGTDGVTYD



alba]


NECLLCAHNLEPGTSVGKKYDGECKKEIATVDCSDYPKPVCSLESMPLCGSDNKTYSNKCNF




CNAVVDSNETLTLSHFGKC





Ovomucoid
SEQ ID NO:
MTMAGVFVLLSFALCCFPDAAFGVEVDCSTYPNTTNEEGKEVLVCTKILSPICGTDGVTYSNE


[Balearica
173
CLLCAYNIEYGTNVSKDHDGECKEVVPVDCSRYPNSTNEEGKVVMLCSKDLNPVCGTDGVT



regulorum


YDNECVLCAHNVESGTSVGKKYDGECKKETATVDCSDYPKPACTLEYMPFCGSDSKTYSNK



gibbericeps]


CNFCNAVVDSNGTLTLSHFGKC





Turkey
SEQ ID NO:
MTTAGVFVLLSFALCSFPDAAFGVEVDCSTYPNTTNEEGKEVLVCTKILSPICGTDGVTYSNEC


vulture
174
LLCAYNIEYGTNVSKDHDGECKEFVPVDCSRYPNTTNEDGKVVLLCNKDLSPICGTDGVTYD


[Cathartes

NECLLCARNLEPGTSVGKKYDGECKKEIATVDCSDYPKPVCSLEYMPLCGSDSKTYSNKCNF



aura] OVD


CNAVVDSNGTLTLSHFGKC


(native




sequence)







Ovomucoid-
SEQ ID NO:
MTTAGVFVLLSFTLCSFPDAAFGVEVDCSPYPNTTNEEGKEVLVCNKILSPICGTDGVTYSNEC


like [Cuculus
175
LLCAYNLEYGTNISKDYDGECKEVAPVDCSRHPNTTNEEGKVELLCNKDLNPICGTNGVTYD



canorus]


NECLLCARNLESGTSIGKKYDGECKKEIATVDCSDYPKPVCTLEEMPLCGSDNKTYGNKCNFC




NAVVDSNGTLTLSHFGKC





Ovomucoid
SEQ ID NO:
MTTAVVFVLLSFALCCFPDAAFGVEVDCSTYPNSTNEEGKDVLVCPKILGPICGTDGVTYSNE


[Antrostomus
176
CLLCAYNIQYGTNVSKDHDGECKEIVPVDCSRYPNTTNEEGKVVFLCNKNFDPVCGTDGDTY



carolinensis]


DNECMLCARSLEPGTTVGKKHDGECKREIATVDCSDYPKPTCSAEDMPLCGSDSKTYSNKCN




FCNAVVDSNGTLTLSRFGKC





Ovomucoid
SEQ ID NO:
MTMTGVFVLLSFAICCFPDAAFGVEVDCSTYPNTTNEEGKEVLVCTKILSPICGTDGVTYSNEC


[Cariama
177
LLCAYNIEYGTNVSKDHDGECKEVVPVDCSKYPNTTNEEGKVVLLCSKDLSPVCGTDGVTYD



cristata]


NECLLCARNLEPGSSVGKKYDGECKKEIATIDCSDYPKPVCSLEYMPLCGSDSKTYDNKCNFC




NAVVDSNGTLTLSHFGKC





Ovomucoid-
SEQ ID NO:
MTTAGVFVLLSFVLCCFPDAVFGVEVDCSTYPNTTNEEGKEVLVCTKILSPICGTDGVTYSNE


like isoform
178
CLLCAYNIEYGTNVSKDHDGECKEVVPVNCSRYPNTTNEEGKVVLRCSKDLSPVCGTDGVTY


X2

DNECLMCARNLEPGAVVGKNYDGECKKEIATVDCSDYPKPVCSLEYMPLCGSDSKTYSNKC


[Pygoscelis

NFCNAVVDSNGTLTLSHFGKC



adeliae]








Ovomucoid-
SEQ ID NO:
MTTAGVFVLLSIALCCFPDAAFGVEVDCSAYSNTTSEEGKEVLSCTKILSPICGTDGVTYSNEC


like [Nipponia
179
LLCAYNIEYGTNISKDHDGECKEVVSVDCSRYPNTTNEEGKAVLLCNKDLSPVCGTDGVTYD



nippon]


NECLLCAHNLEPGTSVGKKYDGACKKEIATVDCSDYPKPVCTLEYLPLCGSDSKTYSNKCDF




CNAVVDSNGTLTLSHFGKC





Ovomucoid-
SEQ ID NO:
MTTAGVFVLLSFALCCFPDAAFGVEVDCSTYPNTTNEEGKEVLVCTKILSPICGTDGTTYSNEC


like [Phaethon
180
LLCAYNIEYGTNVSKDHDGECKVVPVDCSKYPNTTNEDGKVVLLCNKALSPICGTDRVTYDN



lepturus]


ECLMCAHNLEPGTSVGKKHDGECQKEVATVDCSDYPKPVCSLEYMPLCGSDGKTYSNKCNF




CNAVVNSNGTLTLSHFEKC





Ovomucoid-
SEQ ID NO:
MTTAGVFVLLSFVLCCFFPDAAFGVEVDCSTYPNTTNEEGKEVLVCAKILSPVCGTDGVTYSN


like isoform
181
ECLLCAHNIENGTNVGKDHDGKCKEAVPVDCSRYPNTTDEEGKVVLLCNKDVSPVCGTDGV


X1

TYDNECLLCAHNLEAGTSVDKKNDSECKTEDTTLAAVSVDCSDYPKPVCTLEYLPLCGSDNK


[Melopsittacus

TYSNKCRFCNAVVDSNGTLTLSRFGKC



undulatus]








Ovomucoid
SEQ ID NO:
MTTAGVFVLLSFALCCSPDAAFGVEVDCSTYPNTTNEEGKEVLACTKILSPICGTDGVTYSNE


[Podiceps
182
CLLCAYNMEYGTNVSKDHDGKCKEVVPVDCSRYPNTTNEEGKVVLLCNKDLSPVCGTDGVT



cristatus]


YDNECLLCARNLEPGASVGKKYDGECKKEIATVDCSDYPKPVCSLEHMPLCGSDSKTYSNKC




TFCNAVVDSNGTLTLSHFGKC





Ovomucoid-
SEQ ID NO:
MTTAGVFVLLSFALCCFPDAAFGVEVDCSTYPNTTNEEGREVLVCTKILSPICGTDGVTYSNEC


like [Fulmarus
183
LLCAYNIEYGTNVSKDHDGECKEVAPVGCSRYPNTTNEEGKVVLLCNKDLSPVCGTDGVTYD



glacialis]


NECLLCARHLEPGTSVGKKYDGECKKEIATVDCSDYPKPVCSLEYMPLCGSDSKTYSNKCNF




CNAVLDSNGTLTLSHFGKC





Ovomucoid
SEQ ID NO:
MTTAGVFVLLSFALCCFPDAVFGVEVDCSTYPNTTNEEGKEVLVCTKILSPICGTDGVTYSNE


[Aptenodytes
184
CLLCAYNIEYGTNVSKDHDGECKEVVPVDCSRYPNTTNEEGKVVLRCNKDLSPVCGTDGVTY



forsteri]


DNECLMCARNLEPGAIVGKKYDGECKKEIATVDCSDYPKPVCSLEYMPLCGSDSKTYSNKCN




FCNAVVDSNGTLILSHFGKC





Ovomucoid-
SEQ ID NO:
MTTAGVFVLLSFVLCCFPDAVFGVEVDCSTYPNTTNEEGKEVLVCTKILSPICGTDGVTYSNE


like isoform
185
CLLCAYNIEYGTNVSKDHDGECKEVVPVDCSRYPNTTNEEGKVVLRCSKDLSPVCGTDGVTY


X1

DNECLMCARNLEPGAVVGKNYDGECKKEIATVDCSDYPKPVCSLEYMPLCGSDSKTYSNKC


[Pygoscelis

NFCNAVVDSNGTLTLSHFGKC



adeliae]








Ovomucoid
SEQ ID NO:
MSSQNQLPSRCRPLPGSQDLNKYYQPHCTGDRFCWLFYVTVEQFRHCICIYLQLALERPSHEQ


isoform X1
186
SGQPADSRNTSTMTTAGVFVLLSFALCCFPDAVFGVEVDCSTYPNTTNEEGKEVLVCTKILSPI


[Aptenodytes

CGTDGVTYSNECLLCAYNIEYGTNVSKDHDGECKEVVPVDCSRYPNTTNEEGKVVLRCNKD



forsteri]


LSPVCGTDGVTYDNECLMCARNLEPGAIVGKKYDGECKKEIATVDCSDYPKPVCSLEYMPLC




GSDSKTYSNKCNFCNAVVDSNGTLILSHFGKC





Ovomucoid,
SEQ ID NO:
MTTAVVFVLLSFALCCFPDAAFGVEVDCSTYPNSTNEEGKDVLVCPKILGPICGTDGVTYSNE


partial
187
CLLCAYNIQYGTNVSKDHDGECKEIVPVDCSRYPNTTNEEGKVVFLCNKNFDPVCGTDGDTY


[Antrostomus

DNECMLCARSLEPGTTVGKKHDGECKREIATVDCSDYPKPTCSAEDMPLCGSDSKTYSNKCN



carolinensis]


FCNAVV





rOVD as
SEQ ID NO:
EAEAAEVDCSRFPNATDKEGKDVLVCNKDLRPICGTDGVTYTNDCLLCAYSIEFGTNISKEHD


expressed in
188
GECKETVPMNCSSYANTTSEDGKVMVLCNRAFNPVCGTDGVTYDNECLLCAHKVEQGASVD


Pichia

KRHDGGCRKELAAVSVDCSEYPKPDCTAEDRPLCGSDNKTYGNKCNFCNAVVESNGTLTLS


secreted form

HFGKC


1







rOVD as
SEQ ID NO:
EEGVSLEKREAEAAEVDCSRFPNATDKEGKDVLVCNKDLRPICGTDGVTYTNDCLLCAYSIEF


expressed in
189
GTNISKEHDGECKETVPMNCSSYANTTSEDGKVMVLCNRAFNPVCGTDGVTYDNECLLCAH


Pichia

KVEQGASVDKRHDGGCRKELAAVSVDCSEYPKPDCTAEDRPLCGSDNKTYGNKCNFCNAVV


secreted form

ESNGTLTLSHFGKC


2







rOVD [gallus]
SEQ ID NO:
MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPAEAVIGYSDLEGDFDVAVLPFSNSTNNGLL


coding
190
FINTTIASIAAKEEGVSLEKREAEAAEVDCSRFPNATDKEGKDVLVCNKDLRPICGTDGVTYTN


sequence

DCLLCAYSIEFGTNISKEHDGECKETVPMNCSSYANTTSEDGKVMVLCNRAFNPVCGTDGVT


containing an

YDNECLLCAHKVEQGASVDKRHDGGCRKELAAVSVDCSEYPKPDCTAEDRPLCGSDNKTYG


alpha mating

NKCNFCNAVVESNGTLTLSHFGKC


factor signal




sequence




(bolded) as




expressed in




Pichia







Turkey
SEQ ID NO:
MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPAEAVIGYSDLEGDFDVAVLPFSNSTNNGLL


vulture OVD
191
FINTTIASIAAKEEGVSLEKREAEAVEVDCSTYPNTTNEEGKEVLVCTKILSPICGTDGVTYSNE


coding

CLLCAYNIEYGTNVSKDHDGECKEFVPVDCSRYPNTTNEDGKVVLLCNKDLSPICGTDGVTY


sequence

DNECLLCARNLEPGTSVGKKYDGECKKEIATVDCSDYPKPVCSLEYMPLCGSDSKTYSNKCN


containing

FCNAVVDSNGTLTLSHFGKC


secretion




signals as




expressed in




Pichia; bolded




is an alpha




mating factor




signal




sequence







Turkey
SEQ ID NO:
EAEAVEVDCSTYPNTTNEEGKEVLVCTKILSPICGTDGVTYSNECLLCAYNIEYGTNVSKDHD


vulture OVD
192
GECKEFVPVDCSRYPNTTNEDGKVVLLCNKDLSPICGTDGVTYDNECLLCARNLEPGTSVGK


in secreted

KYDGECKKEIATVDCSDYPKPVCSLEYMPLCGSDSKTYSNKCNFCNAVVDSNGTLTLSHFGK


form

C


expressed in




Pichia







Hummingbird
SEQ ID NO:
MTMAGVFVLLSFILCCFPDTAFGVEVDCSIYPNTTSEEGKEVLVCTETLSPICGSDGVTYNNEC


OVD (native
193
QLCAYNVEYGTNVSKDHDGECKEIVPVDCSRYPNTTEEGRVVMLCNKALSPVCGTDGVTYD


sequence)

NECLLCARNLESGTSVGKKFDGECKKEIATVDCTDYPKPVCSLDYMPLCGSDSKTYSNKCNF




CNAVMDSNGTLTLNHFGKC





Hummingbird
SEQ ID NO:
MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPAEAVIGYSDLEGDFDVAVLPFSNSTNNGLL


OVD coding
194
FINTTIASIAAKEEGVSLDKREAEAVEVDCSIYPNTTSEEGKEVLVCTETLSPICGSDGVTYNNE


sequence as

CQLCAYNVEYGTNVSKDHDGECKEIVPVDCSRYPNTTEEGRVVMLCNKALSPVCGTDGVTY


expressed in

DNECLLCARNLESGTSVGKKFDGECKKEIATVDCTDYPKPVCSLDYMPLCGSDSKTYSNKCN


Pichia

FCNAVMDSNGTLTLNHFGKC





Hummingbird
SEQ ID NO:
EAEAVEVDCSIYPNTTSEEGKEVLVCTETLSPICGSDGVTYNNECQLCAYNVEYGTNVSKDHD


OVD in
195
GECKEIVPVDCSRYPNTTEEGRVVMLCNKALSPVCGTDGVTYDNECLLCARNLESGTSVGKK


secreted form

FDGECKKEIATVDCTDYPKPVCSLDYMPLCGSDSKTYSNKCNFCNAVMDSNGTLTLNHFGKC


from Pichia







Ovalbumin
SEQ ID NO:
MFFYNTDFRMGSISAANAEFCFDVFNELKVQHTNENILYSPLSIIVALAMVYMGARGNTEYQ


related protein
196
MEKALHFDSIAGLGGSTQTKVQKPKCGKSVNIHLLFKELLSDITASKANYSLRIANRLYAEKSR


X

PILPIYLKCVKKLYRAGLETVNFKTASDQARQLINSWVEKQTEGQIKDLLVSSSTDLDTTLVLV




NAIYFKGMWKTAFNAEDTREMPFHVTKEESKPVQMMCMNNSFNVATLPAEKMKILELPFAS




GDLSMLVLLPDEVSGLERIEKTINFEKLTEWTNPNTMEKRRVKVYLPQMKIEEKYNLTSVLM




ALGMTDLFIPSANLTGISSAESLKISQAVHGAFMELSEDGIEMAGSTGVIEDIKHSPELEQFRAD




HPFLFLIKHNPTNTIVYFGRYWSP*





Ovalbumin
SEQ ID NO:
MDSISVTNAKFCFDVFNEMKVHHVNENILYCPLSILTALAMVYLGARGNTESQMKKVLHFDS


related protein
197
ITGAGSTTDSQCGSSEYVHNLFKELLSEITRPNATYSLEIADKLYVDKTFSVLPEYLSCARKFYT


Y

GGVEEVNFKTAAEEARQLINSWVEKETNGQIKDLLVSSSIDFGTTMVFINTIYFKGIWKIAFNT




EDTREMPFSMTKEESKPVQMMCMNNSFNVATLPAEKMKILELPYASGDLSMLVLLPDEVSGL




ERIEKTINFDKLREWTSTNAMAKKSMKVYLPRMKIEEKYNLTSILMALGMTDLFSRSANLTGI




SSVDNLMISDAVHGVFMEVNEEGTEATGSTGAIGNIKHSLELEEFRADHPFLFFIRYNPTNAILF




FGRYWSP*





Ovalbumin
SEQ ID NO:
MGSIGAASMEFCFDVFKELKVHHANENIFYCPIAIMSALAMVYLGAKDSTRTQINKVVRFDKL



198
PGFGDSIEAQCGTSVNVHSSLRDILNQITKPNDVYSFSLASRLYAEERYPILPEYLQCVKELYRG




GLEPINFQTAADQARELINSWVESQINGIIRNVLQPSSVDSQTAMVLVNAIVFKGLWEKAFKD




EDTQAMPFRVTEQESKPVQMMYQIGLFRVASMASEKMKILELPFASGTMSMLVLLPDEVSGL




EQLESIINFEKLTEWTSSNVMEERKIKVYLPRMKMEEKYNLTSVLMAMGITDVFSSSANLSGIS




SAESLKISQAVHAAHAEINEAGREVVGSAEAGVDAASVSEEFRADHPFLFCIKHIATNAVLFFG




RCVSP*





Chicken
SEQ ID NO:
MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPAEAVIGYSDLEGDFDVAVLPFSNSTNNGLL


Ovalbumin
199
FINTTIASIAAKEEGVSLDKREAEAGSIGAASMEFCFDVFKELKVHHANENIFYCPIAIMSALAM


with bolded

VYLGAKDSTRTQINKVVRFDKLPGFGDSIEAQCGTSVNVHSSLRDILNQITKPNDVYSFSLASR


signal

LYAEERYPILPEYLQCVKELYRGGLEPINFQTAADQARELINSWVESQINGIIRNVLQPSSVDS


sequence

QTAMVLVNAIVFKGLWEKAFKDEDTQAMPFRVTEQESKPVQMMYQIGLFRVASMASEKMKI




LELPFASGTMSMLVLLPDEVSGLEQLESIINFEKLTEWTSSNVMEERKIKVYLPRMKMEEKYN




LTSVLMAMGITDVFSSSANLSGISSAESLKISQAVHAAHAEINEAGREVVGSAEAGVDAASVS




EEFRADHPFLFCIKHIATNAVLFFGRCVSP





Chicken OVA
SEQ ID NO:
EAEAGSIGAASMEFCFDVFKELKVHHANENIFYCPIAIMSALAMVYLGAKDSTRTQINKVVRF


sequence as
200
DKLPGFGDSIEAQCGTSVNVHSSLRDILNQITKPNDVYSFSLASRLYAEERYPILPEYLQCVKEL


secreted from

YRGGLEPINFQTAADQARELINSWVESQTNGIIRNVLQPSSVDSQTAMVLVNAIVFKGLWEKA


Pichia

FKDEDTQAMPFRVTEQESKPVQMMYQIGLFRVASMASEKMKILELPFASGTMSMLVLLPDEV




SGLEQLESIINFEKLTEWTSSNVMEERKIKVYLPRMKMEEKYNLTSVLMAMGITDVFSSSANL




SGISSAESLKISQAVHAAHAEINEAGREVVGSAEAGVDAASVSEEFRADHPFLFCIKHIATNAV




LFFGRCVSP





Predicted
SEQ ID NO:
MRVPAQLLGLLLLWLPGARCGSIGAASMEFCFDVFKELKVHHANENIFYCPIAIMSALAMVY


Ovalbumin
201
LGAKDSTRTQINKVVRFDKLPGFGDSIEAQCGTSVNVHSSLRDILNQITKPNDVYSFSLASRLY


[Achromobacter

AEERYPILPEYLQCVKELYRGGLEPINFQTAADQARELINSWVESQTNGIIRNVLQPSSVDSQT



denitrificans]


AMVLVNAIVFKGLWEKAFKDEDTQAMPFRVTEQESKPVQMMYQIGLFRVASMASEKMKILE




LPFASGTMSMLVLLPDEVSGLEQLESIINFEKLTEWTSSNVMEERKIKVYLPRMKMEEKYNLT




SVLMAMGITDVFSSSANLSGISSAESLKISQAVHAAHAEINEAGREVVGSAEAGVDAASVSEEF




RADHPFLFCIKHIATNAVLFFGRCVSPLEIKRAAAHHHHHH





OLLAS
SEQ ID NO:
MTSGFANELGPRLMGKLTMGSIGAASMEFCFDVFKELKVHHANENIFYCPIAIMSALAMVYL


epitope-
202
GAKDSTRTQINKVVRFDKLPGFGDSIEAQCGTSVNVHSSLRDILNQITKPNDVYSFSLASRLYA


tagged

EERYPILPEYLQCVKELYRGGLEPINFQTAADQARELINSWVESQTNGIIRNVLQPSSVDSQTA


ovalbumin

MVLVNAIVFKGLWEKTFKDEDTQAMPFRVTEQESKPVQMMYQIGLFRVASMASEKMKILEL




PFASGTMSMLVLLPDEVSGLEQLESIINFEKLTEWTSSNVMEERKIKVYLPRMKMEEKYNLTS




VLMAMGITDVFSSSANLSGISSAESLKISQAVHAAHAEINEAGREVVGSAEAGVDAASVSEEF




RADHPFLFCIKHIATNAVLFFGRCVSPSR





Serpin family
SEQ ID NO:
MGGRRVRWEVYISRAGYVNRQIAWRRHHRSLTMRVPAQLLGLLLLWLPGARCGSIGAASME


protein
203
FCFDVFKELKVHHANENIFYCPIAIMSALAMVYLGAKDSTRTQINKVVRFDKLPGFGDSIEAQ


[Achromobacter

CGTSVNVHSSLRDILNQITKPNDVYSFSLASRLYAEERYPILPEYLQCVKELYRGGLEPINFQTA



denitrificans]


ADQARELINSWVESQTNGIIRNVLQPSSVDSQTAMVLVNAIVFKGLWEKAFKDEDTQAMPFR




VTEQESKPVQMMYQIGLFRVASMASEKMKILELPFASGTMSMLVLLPDEVSGLEQLESIINFE




KLTEWTSSNVMEERKIKVYLPRMKMEEKYNLTSVLMAMGITDVFSSSANLSGISSAESLKISQ




AVHAAHAEINEAGREVVGSAEAGVDAASVSEEFRADHPFLFCIKHIATNAVLFFGRCVSPLEIK




RAAAHHHHHH





PREDICTED:
SEQ ID NO:
MGSIGAVSMEFCFDVFKELKVHHANENIFYSPFTIISALAMVYLGAKDSTRTQINKVVRFDKLP


ovalbumin
204
GFGDSVEAQCGTSVNVHSSLRDILNQITKPNDVYSFSLASRLYAEETYPILPEYLQCVKELYRG


isoform X1

GLESINFQTAADQARGLINSWVESQTNGMIKNVLQPSSVDSQTAMVLVNAIVFKGLWEKAFK


[Meleagris

DEDTQAIPFRVTEQESKPVQMMYQIGLFKVASMASEKMKILELPFASGTMSMWVLLPDEVSG



gallopavo]


LEQLETTISFEKMTEWISSNIMEERRIKVYLPRMKMEEKYNLTSVLMAMGITDLFSSSANLSGI




SSAGSLKISQAVHAAYAEIYEAGREVIGSAEAGADATSVSEEFRVDHPFLYCIKHNLTNSILFFG




RCISP





Ovalbumin
SEQ ID NO:
MGSIGAVSMEFCFDVFKELKVHHANENIFYSPFTIISALAMVYLGAKDSTRTQINKVVRFDKLP


precursor
205
GFGDSVEAQCGTSVNVHSSLRDILNQITKPNDVYSFSLASRLYAEETYPILPEYLQCVKELYRG


[Meleagris

GLESINFQTAADQARGLINSWVESQTNGMIKNVLQPSSVDSQTAMVLVNAIVFKGLWEKAFK



gallopavo]


DEDTQAIPFRVTEQESKPVQMMYQIGLFKVASMASEKMKILELPFASGTMSMWVLLPDEVSG




LEQLETTISFEKMTEWISSNIMEERRIKVYLPRMKMEEKYNLTSVLMAMGITDLFSSSANLSGI




SSAGSLKISQAAHAAYAEIYEAGREVIGSAEAGADATSVSEEFRVDHPFLYCIKHNLTNSILFFG




RCISP





Hypothetical
SEQ ID NO:
YYRVPCMVLCTAFHPYIFIVLLFALDNSEFTMGSIGAVSMEFCFDVFKELRVHHPNENIFFCPF


protein
206
AIMSAMAMVYLGAKDSTRTQINKVIRFDKLPGFGDSTEAQCGKSANVHSSLKDILNQITKPND


[Bambusicola

VYSFSLASRLYADETYSIQSEYLQCVNELYRGGLESINFQTAADQARELINSWVESQINGIIRN



thoracicus]


VLQPSSVDSQTAMVLVNAIVFRGLWEKAFKDEDTQTMPFRVTEQESKPVQMMYQIGSFKVAS




MASEKMKILELPLASGTMSMLVLLPDEVSGLEQLETTISFEKLTEWTSSNVMEERKIKVYLPR




MKMEEKYNLTSVLMAMGITDLFRSSANLSGISLAGNLKISQAVHAAHAEINEAGRKAVSSAE




AGVDATSVSEEFRADRPFLFCIKHIATKVVFFFGRYTSP





Egg albumin
SEQ ID NO:
MGSIGAASMEFCFDVFKELKVHHANDNMLYSPFAILSTLAMVFLGAKDSTRTQINKVVHFDK



207
LPGFGDSIEAQCGTSVNVHSSLRDILNQITKQNDAYSFSLASRLYAQETYTVVPEYLQCVKELY




RGGLESVNFQTAADQARGLINAWVESQINGIIRNILQPSSVDSQTAMVLVNAIAFKGLWEKAF




KAEDTQTIPFRVTEQESKPVQMMYQIGSFKVASMASEKMKILELPFASGTMSMLVLLPDDVSG




LEQLESIISFEKLTEWTSSSIMEERKVKVYLPRMKMEEKYNLTSLLMAMGITDLFSSSANLSGIS




SVGSLKISQAVHAAHAEINEAGRDVVGSAEAGVDATEEFRADHPFLFCVKHIETNAILLFGRC




VSP





Ovalbumin
SEQ ID NO:
MASIGAVSTEFCVDVYKELRVHHANENIFYSPFTIISTLAMVYLGAKDSTRTQINKVVRFDKLP


isoform X2
208
GFGDSIEAQCGTSVNVHSSLRDILNQITKPNDVYSFSLASRLYAEETYPILPEYLQCVKELYRG


[Numida

GLESINFQTAADQARELINSWVESQTSGIIKNVLQPSSVNSQTAMVLVNAIYFKGLWERAFKD



meleagris]


EDTQAIPFRVTEQESKPVQMMSQIGSFKVASVASEKVKILELPFVSGTMSMLVLLPDEVSGLEQ




LESTISTEKLTEWTSSSIMEERKIKVFLPRMRMEEKYNLTSVLMAMGMTDLFSSSANLSGISSA




ESLKISQAVHAAYAEIYEAGREVVSSAEAGVDATSVSEEFRVDHPFLLCIKHNPTNSILFFGRCI




SP





Ovalbumin
SEQ ID NO:
MALCKAFHPYIFIVLLFDVDNSAFTMASIGAVSTEFCVDVYKELRVHHANENIFYSPFTIISTLA


isoform X1
209
MVYLGAKDSTRTQINKVVRFDKLPGFGDSIEAQCGTSVNVHSSLRDILNQITKPNDVYSFSLAS


[Numida

RLYAEETYPILPEYLQCVKELYRGGLESINFQTAADQARELINSWVESQTSGIIKNVLQPSSVNS



meleagris]


QTAMVLVNAIYFKGLWERAFKDEDTQAIPFRVTEQESKPVQMMSQIGSFKVASVASEKVKILE




LPFVSGTMSMLVLLPDEVSGLEQLESTISTEKLTEWTSSSIMEERKIKVFLPRMRMEEKYNLTS




VLMAMGMTDLFSSSANLSGISSAESLKISQAVHAAYAEIYEAGREVVSSAEAGVDATSVSEEF




RVDHPFLLCIKHNPTNSILFFGRCISP





PREDICTED:
SEQ ID NO:
MGSIGAASMEFCFDVFKELKVHHANDNMLYSPFAILSTLAMVFLGAKDSTRTQINKVVHFDK


Ovalbumin
210
LPGFGDSIEAQCGTSANVHSSLRDILNQITKQNDAYSFSLASRLYAQETYTVVPEYLQCVKELY


isoform X2

RGGLESVNFQTAADQARGLINAWVESQINGIIRNILQPSSVDSQTAMVLVNAIAFKGLWEKAF


[Coturnix

KAEDTQTIPFRVTEQESKPVQMMHQIGSFKVASMASEKMKILELPFASGTMSMLVLLPDDVSG



japonica]


LEQLESTISFEKLTEWTSSSIMEERKVKVYLPRMKMEEKYNLTSLLMAMGITDLFSSSANLSGI




SSVGSLKISQAVHAAYAEINEAGRDVVGSAEAGVDATEEFRADHPFLFCVKHIETNAILLFGRC




VSP





PREDICTED:
SEQ ID NO:
MGLCTAFHPYIFIVLLFALDNSEFTMGSIGAASMEFCFDVFKELKVHHANDNMLYSPFAILSTL


ovalbumin
211
AMVFLGAKDSTRTQINKVVHFDKLPGFGDSIEAQCGTSANVHSSLRDILNQITKQNDAYSFSL


isoform X1

ASRLYAQETYTVVPEYLQCVKELYRGGLESVNFQTAADQARGLINAWVESQINGIIRNILQPS


[Coturnix

SVDSQTAMVLVNAIAFKGLWEKAFKAEDTQTIPFRVTEQESKPVQMMHQIGSFKVASMASEK



japonica]


MKILELPFASGTMSMLVLLPDDVSGLEQLESTISFEKLTEWTSSSIMEERKVKVYLPRMKMEE




KYNLTSLLMAMGITDLFSSSANLSGISSVGSLKISQAVHAAYAEINEAGRDVVGSAEAGVDAT




EEFRADHPFLFCVKHIETNAILLFGRCVSP





Egg albumin
SEQ ID NO:
MGSIGAASMEFCFDVFKELKVHHANDNMLYSPFAILSTLAMVFLGAKDSTRTQINKVVHFDK



212
LPGFGDSIEAQCGTSANVHSSLRDILNQITKQNDAYSFSLASRLYAQETYTVVPEYLQCVKELY




RGGLESVNFQTAADQARGLINAWVESQINGIIRNILQPSSVDSQTAMVLVNAIAFKGLWEKAF




KAEDTQTIPFRVTEQESKPVQMMHQIGSFKVASMASEKMKILELPFASGTMSMLVLLPDDVSG




LEQLESTISFEKLTEWTSSSIMEERKVKVYLPRMKMEEKYNLTSLLMAMGITDLFSSSANLSGI




SSVGSLKIPQAVHAAYAEINEAGRDVVGSAEAGVDATEEFRADHPFLFCVKHIETNAILLFGRC




VSP





ovalbumin
SEQ ID NO:
MGSIGAASTEFCFDVFRELRVQHVNENIFYSPFSIISALAMVYLGARDNTRTQIDKVVHFDKLP


[Anas
213
GFGESMEAQCGTSVSVHSSLRDILTQITKPSDNFSLSFASRLYAEETYAILPEYLQCVKELYKG



platyrhynchos]


GLESISFQTAADQARELINSWVESQINGIIKNILQPSSVDSQTTMVLVNAIYFKGMWEKAFKDE




DTQAMPFRMTEQESKPVQMMYQVGSFKVAMVTSEKMKILELPFASGMMSMFVLLPDEVSGL




EQLESTISFEKLTEWTSSTMMEERRMKVYLPRMKMEEKYNLTSVFMALGMTDLFSSSANMSG




ISSTVSLKMSEAVHAACVEIFEAGRDVVGSAEAGMDVTSVSEEFRADHPFLFFIKHNPTNSILFF




GRWMSP





PREDICTED:
SEQ ID NO:
MGSIGAASTEFCFDVFRELKVQHVNENIFYSPLSIISALAMVYLGARDNTRTQIDQVVHFDKIP


ovalbumin-
214
GFGESMEAQCGTSVSVHSSLRDILTEITKPSDNFSLSFASRLYAEETYTILPEYLQCVKELYKGG


like

LESISFQTAADQARELINSWVESQINGIIKNILQPSSVDSQTTMVLVNAIYFKGMWEKAFKDED


[Anser cygnoides

TQTMPFRMTEQESKPVQMMYQVGSFKLATVTSEKVKILELPFASGMMSMCVLLPDEVSGLEQ



domesticus]


LETTISFEKLTEWTSSTMMEERRMKVYLPRMKMEEKYNLTSVFMALGMTDLFSSSANMSGIS




STVSLKMSEAVHAACVEIFEAGRDVVGSAEAGMDVTSVSEEFRADHPFLFFIKHNPSNSILFFG




RWISP





PREDICTED:
SEQ ID NO:
MGSIGAASTEFCFDVFKELKVQHVNENIFYSPLTIISALSMVYLGARENTRAQIDKVLHFDKMP


Ovalbumin-
215
GFGDTIESQCGTSVSIHTSLKDMFTQITKPSDNYSLSFASRLYAEETYPILPEYLQCVKELYKGG


like [Aquila

LETISFQTAAEQARELINSWVESQTNGMIKNILQPSSVDPQTKMVLVNAIYFKGVWEKAFKDE



chrysaetos


DTQEVPFRVTEQESKPVQMMYQIGSFKVAVMASEKMKILELPYASGQLSMLVLLPDDVSGLE



canadensis]


QLESAITFEKLMAWTSSTTMEERKMKVYLPRMKIEEKYNLTSVLMALGVTDLFSSSANLSGIS




SAESLKISKAVHEAFVEIYEAGSEVVGSTEAGMEVTSVSEEFRADHPFLFLIKHNPTNSILFFGR




CFSP





PREDICTED:
SEQ ID NO:
MGSIGAASTEFCFDVFKELKVQHVNENIFYSPLTIISALSMVYLGARENTRTQIDKVLHFDKMT


Ovalbumin-
216
GFGDTVESQCGTSVSIHTSLKDIFTQITKPSDNYSLSLASRLYAEETYPILPEYLQCVKELYKGG


like

LETVSFQTAAEQARELINSWVESQTNGMIKNILQPSSVDPQTKMVLVNAIYFKGVWEKAFKD


[Haliaeetus

EDTQEVPFRVTEQESKPVQMMYQIGSFKVAVMASEKMKILELPYASGQLSMLVLLPDDVSGL



albicilla]


EQLESAITSEKLMEWTSSTTMEERKMKVYLPRMKIEEKYNLTSVLMALGVTDLFSSSADLSGI




SSAESLKISKAVHEAFVEIYEAGSEVVGSTEGGMEVTSVSEEFRADHPFLFLIKHKPTNSILFFG




RCFSP





PREDICTED:
SEQ ID NO:
MGSIGAASTEFCFDVFKELKVQHVNENIFYSPLTIISALSMVYLGARENTRTQIDKVLHFDKMT


Ovalbumin-
217
GFGDTVESQCGTSVSIHTSLKDIFTQITKPSDNYSLSLASRLYAEETYPILPEYLQCVKELYKGG


like

LETVSFQTAAEQARELINSWVESQTNGMIKNILQPSSVDPQTKMVLVNAIYFKGVWEKAFKD


[Haliaeetus

EDTQEVPFRVTEQESKPVQMMYQIGSFKVAVMASEKMKILELPYASGQLSMLVLLPDDVSGL



leucocephalus]


EQLESAITSEKLMEWTSSTTMEERKMKVYLPRMKIEEKYNLTSVLMALGVTDLFSSSADLSGI




SSAESLKISKAVHEAFVEIYEAGSEVVGSTEGGMEVTSFSEEFRADHPFLFLIKHKPTNSILFFGR




CFSP





PREDICTED:
SEQ ID NO:
MGSIGAASTEFCFDVFKELKVQHVNENIFYSPLSIISALSMVYLGARENTRAQIDKVVHFDKIT


Ovalbumin
218
GFGETIESQCGTSVSVHTSLKDMFTQITKPSDNYSLSFASRLYAEETYPILPEYLQCVKELYKG


[Fulmarus

GLETTSFQTAADQARELINSWVESQTNGMIKNILQPGSVDPQTEMVLVNAIYFKGMWEKAFK



glacialis]


DEDTQAVPFRMTEQESKTVQMMYQIGSFKVAVMASEKMKILELPYASGELSMLVMLPDDVS




GLEQLETAITFEKLMEWTSSNMMEERKMKVYLPRMKMEEKYNLTSVLMALGVTDLFSSSAN




LSGISSAESLKMSEAVHEAFVEIYEAGSEVVGSTGAGMEVTSVSEEFRADHPFLFLIKHNPTNSI




LFFGRCFSP





PREDICTED:
SEQ ID NO:
MGSIGAASTEFCFDVFKELRVQHVNENVCYSPLIIISALSLVYLGARENTRAQIDKVVHFDKIT


Ovalbumin-
219
GFGESIESQCGTSVSVHTSLKDMFNQITKPSDNYSLSVASRLYAEERYPILPEYLQCVKELYKG


like

GLESISFQTAADQAREAINSWVESQTNGMIKNILQPSSVDPQTEMVLVNAIYFKGMWQKAFK


[Chlamydotis

DEDTQAVPFRISEQESKPVQMMYQIGSFKVAVMAAEKMKILELPYASGELSMLVLLPDEVSG



macqueenii]


LEQLENAITVEKLMEWTSSSPMEERIMKVYLPRMKIEEKYNLTSVLMALGITDLFSSSANLSGI




SAEESLKMSEAVHQAFAEISEAGSEVVGSSEAGIDATSVSEEFRADHPFLFLIKHNATNSILFFG




RCFSP





PREDICTED:
SEQ ID NO:
MGSISAASTEFCFDVFKELKVQHVNENIFYSPLSIISALSMVYLGARENTRAQIEKVVHFDKITG


Ovalbumin-
220
FGESIESQCSTSVSVHTSLKDMFTQITKPSDNYSLSFASRFYAEETYPILPEYLQCVKELYKGGL


like [Nipponia

ETINFRTAADQARELINSWVESQTNGMIKNILQPGSVDPQTDMVLVNAIYFKGMWEKAFKDE



nippon]


DTQALPFRVTEQESKPVQMMYQIGSFKVAVLASEKVKILELPYASGQLSMLVLLPDDVSGLEQ




LETAITVEKLMEWTSSNNMEERKIKVYLPRIKIEEKYNLTSVLMALGITDLFSSSANLSGISSAE




SLKVSEAIHEAFVEIYEAGSEVAGSTEAGIEVTSVSEEFRADHPFLFLIKHNATNSILFFGRCFSP





PREDICTED:
SEQ ID NO:
MVSIGAASTEFCFDVFKELKVQHVNENIFYSPLSIISALSMVYLGARENTRAQIDKVVHFDKIT


Ovalbumin-
221
GFEETIESQCSTSVSVHTSLKDMFTQITKPSDNYSLSFASRLYAEETYPILPEYLQCVKELYKGG


like isoform

LETISFQTAADQARELINSWVESQTDGMIKNILQPGSVDPQTEMVLVNAIYFKGMWEKAFKDE


X2 [Gavia

DTQAVPFRMTEQESKPVQMMYQIGSFKVAVMASEKMKILELPYASGGMSMLVMLPDDVSGL



stellata]


EQLETAITFEKLMEWTSSNMMEERKMKVYLPRMKMEEKYNLTSVLMALGMTDLFSSSANLS




GISSAESLKMSEAVHEAFVEIYEAGSEAVGSTGAGMEVTSVSEEFRADHPFLFLIKHNPTNSILF




FGRCFSP





PREDICTED:
SEQ ID NO:
MGSIGAASTEFCFDVFKELKVQHVNENIFYSPLSIISALSMVYLGARENTRAQIDKVVHFDKIT


Ovalbumin
222
GFGEPIESQCGISVSVHTSLKDMITQITKPSDNYSLSFASRLYAEETYPILPEYLQCVKELYKGG


[Pelecanus

LETISFQTAADQARELINSWVENQTNGMIKNILQPGSVDPQTEMVLVNAVYFKGMWEKAFKD



crispus]


EDTQAVPFRMTEQESKPVQMMYQIGSFKVAVMASEKIKILELPYASGELSMLVLLPDDVSGLE




QLETAITLDKLTEWTSSNAMEERKMKVYLPRMKIEKKYNLTSVLIALGMTDLFSSSANLSGISS




AESLKMSEAIHEAFLEIYEAGSEVVGSTEAGMEVTSVSEEFRADHPFLFLIKHNPTNSILFFGRC




LSP





PREDICTED:
SEQ ID NO:
MGSIGAASTEFCFDVFKELKVQHVNENIFYSPLTIISALSMVYLGARENTRAQIDKVVHFDKIP


Ovalbumin-
223
GFGDTTESQCGTSVSVHTSLKDMFTQITKPSDNYSVSFASRLYAEETYPILPEFLECVKELYKG


like

GLESISFQTAADQARELINSWVESQTNGMIKNILQPGSVDSQTEMVLVNAIYFKGMWEKAFK


[Charadrius

DEDTQTVPFRMTEQETKPVQMMYQIGTFKVAVMPSEKMKILELPYASGELCMLVMLPDDVS



vociferus]


GLEELESSITVEKLMEWTSSNMMEERKMKVFLPRMKIEEKYNLTSVLMALGMTDLFSSSANL




SGISSAEPLKMSEAVHEAFIEIYEAGSEVVGSTGAGMEITSVSEEFRADHPFLFLIKHNPTNSILF




FGRCVSP





PREDICTED:
SEQ ID NO:
MGSIGAVSTEFCFDVFKELKVQHVNENIFYSPLSIISALSMVYLGARENTRAQIDKVVHFDKIT


Ovalbumin-
224
GSGETIEAQCGTSVSVHTSLKDMFTQITKPSENYSVGFASRLYADETYPIIPEYLQCVKELYKG


like

GLEMISFQTAADQARELINSWVESQTNGMIKNILQPGSVDPQTEMILVNAIYFKGVWEKAFKD


[Eurypyga

EDTQAVPFRMTEQESKPVQMMYQFGSFKVAAMAAEKMKILELPYASGALSMLVLLPDDVSG



helias]


LEQLESAITFEKLMEWTSSNMMEEKKIKVYLPRMKMEEKYNFTSVLMALGMTDLFSSSANLS




GISSADSLKMSEVVHEAFVEIYEAGSEVVGSTGSGMEAASVSEEFRADHPFLFLIKHNPTNSILF




FGRCFSP





PREDICTED:
SEQ ID NO:
MVSIGAASTEFCFDVFKELKVQHVNENIFYSPLSIISALSMVYLGARENTRAQIDKVVHFDKIT


Ovalbumin-
225
GFEETIESQVQKKQCSTSVSVHTSLKDMFTQITKPSDNYSLSFASRLYAEETYPILPEYLQCVKE


like isoform

LYKGGLETISFQTAADQARELINSWVESQTDGMIKNILQPGSVDPQTEMVLVNAIYFKGMWE


X1 [Gavia

KAFKDEDTQAVPFRMTEQESKPVQMMYQIGSFKVAVMASEKMKILELPYASGGMSMLVMLP



stellata]


DDVSGLEQLETAITFEKLMEWTSSNMMEERKMKVYLPRMKMEEKYNLTSVLMALGMTDLF




SSSANLSGISSAESLKMSEAVHEAFVEIYEAGSEAVGSTGAGMEVTSVSEEFRADHPFLFLIKH




NPTNSILFFGRCFSP





PREDICTED:
SEQ ID NO:
MGSIGAASGEFCFDVFKELKVQHVNENIFYSPLSIISALSMVYLGARENTRAQIDKVVHFDKIIG


Ovalbumin-
226
FGESIESQCGTSVSVHTSLKDMFAQITKPSDNYSLSFASRLYAEETFPILPEYLQCVKELYKGGL


like [Egretta

ETLSFQTAADQARELINSWVESQTNGMIKDILQPGSVDPQTEMVLVNAIYFKGVWEKAFKDE



garzetta]


DTQTVPFRMTEQESKPVQMMYQIGSFKVAVVAAEKIKILELPYASGALSMLVLLPDDVSSLEQ




LETAITFEKLTEWTSSNIMEERKIKVYLPRMKIEEKYNLTSVLMDLGITDLFSSSANLSGISSAES




LKVSEAIHEAIVDIYEAGSEVVGSSGAGLEGTSVSEEFRADHPFLFLIKHNPTSSILFFGRCFSP





PREDICTED:
SEQ ID NO:
MGSIGAASTEFCFDVFKELKVQHVNENIFYSPLSIISALSMVYLGARENTRAQIDKVVHFDKIT


Ovalbumin-
227
GSGEAIESQCGTSVSVHISLKDMFTQITKPSDNYSLSFASRLYAEETYPILPEYLQCVKELYKEG


like [Balearica

LATISFQTAADQAREFINSWVESQTNGMIKNILQPGSVDPQTQMVLVNAIYFKGVWEKAFKDE



regulorum


DTQAVPFRMTKQESKPVQMMYQIGSFKVAVMASEKMKILELPYASGQLSMLVMLPDDVSGL



gibbericeps]


EQIENAITFEKLMEWTNPNMMEERKMKVYLPRMKMEEKYNLTSVLMALGMTDLFSSSANLS




GISSAESLKMSEAVHEAFVEIYEAGSEVVGSTGAGIEVTSVSEEFRADHPFLFLIKHNPTNSILFF




GRCFSP





PREDICTED:
SEQ ID NO:
MGSIGEASTEFCIDVFRELKVQHVNENIFYSPLSIISALSMVYLGARENTRAQIDQVVHFDKITG


Ovalbumin-
228
FGDTVESQCGSSLSVHSSLKDIFAQITQPKDNYSLNFASRLYAEETYPILPEYLQCVKELYKGG


like [Nestor

LETISFQTAADQARELINSWVESQTNGMIKNILQPSSVDPQTEMVLVNAIYFKGVWEKAFKDE


notabilis]

ETQAVPFRITEQENRPVQIMYQFGSFKVAVVASEKIKILELPYASGQLSMLVLLPDEVSGLEQL




ENAITFEKLTEWTSSDIMEEKKIKVFLPRMKIEEKYNLTSVLVALGIADLFSSSANLSGISSAESL




KMSEAVHEAFVEIYEAGSEVVGSSGAGIEAASDSEEFRADHPFLFLIKHKPTNSILFFGRCFSP





PREDICTED:
SEQ ID NO:
MGSIGAASTEFCFDIFNELKVQHVNENIFYSPLSIISALSMVYLGARENTKAQIDKVVHFDKITG


Ovalbumin-
229
FGESIESQCSTSASVHTSFKDMFTQITKPSDNYSLSFASRLYAEETYPILPEYSQCVKELYKGGL


like

ESISFQTAADQARELINSWVESQTNGMIKNILQPGSVDPQTELVLVNAIYFKGTWEKAFKDKD


[Pygoscelis

TQAVPFRVTEQESKPVQMMYQIGSYKVAVIASEKMKILELPYASGELSMLVLLPDDVSGLEQL



adeliae]


ETAITFEKLMEWTSSNMMEERKVKVYLPRMKIEEKYNLTSVLMALGMTDLFSPSANLSGISSA




ESLKMSEAIHEAFVEIYEAGSEVVGSTEAGMEVTSVSEEFRADHPFLFLIKCNLTNSILFFGRCF




SP





Ovalbumin-
SEQ ID NO:
MGSISTASTEFCFDVFKELKVQHVNENIFYSPLSIISALSMVYLGARENTRAQIEKVVHFDKITG


like [Athene
230
FGESIESQCGTSVSVHTSLKDMLIQISKPSDNYSLSFASKLYAEETYPILPEYLQCVKELYKGGL



cunicularia]


ESINFQTAADQARQLINSWVESQTNGMIKDILQPSSVDPQTEMVLVNAIYFKGIWEKAFKDED




TQEVPFRITEQESKPVQMMYQIGSFKVAVIASEKIKILELPYASGELSMLIVLPDDVSGLEQLET




AITFEKLIEWTSPSIMEERKTKVYLPRMKIEEKYNLTSVLMALGMTDLFSPSANLSGISSAESLK




MSEAIHEAFVEIYEAGSEVVGSAEAGMEATSVSEFRVDHPFLFLIKHNPANIILFFGRCVSP





PREDICTED:
SEQ ID NO:
MGSIGAASTEFCFDVFKELKVQHVNENIFYSPLTIISALSLVYLGARENTRAQIDKVFHFDKISG


Ovalbumin-
231
FGETTESQCGTSVSVHTSLKEMFTQITKPSDNYSVSFASRLYAEDTYPILPEYLQCVKELYKGG


like [Calidris

LETISFQTAADQAREVINSWVESQTNGMIKNILQPGSVDSQTEMVLVNAIYFKGMWEKAFKD



pugnax]


EDTQTMPFRITEQERKPVQMMYQAGSFKVAVMASEKMKILELPYASGEFCMLIMLPDDVSGL




EQLENSFSFEKLMEWTTSNMMEERKMKVYIPRMKMEEKYNLTSVLMALGMTDLFSSSANLS




GISSAETLKMSEAVHEAFMEIYEAGSEVVGSTGSGAEVTGVYEEFRADHPFLFLVKHKPTNSIL




FFGRCVSP





PREDICTED:
SEQ ID NO:
MGSIGAASTEFCFDIFNELKVQHVNENIFYSPLSIISALSMVYLGARENTKAQIDKVVHFDKITG


Ovalbumin
232
FGETIESQCSTSVSVHTSLKDTFTQITKPSDNYSLSFASRLYAEETYPILPEYSQCVKELYKGGL


[Aptenodytes

ETISFQTAADQARELINSWVESQTNGMIKNILQPGSVDPQTELVLVNAIYFKGTWEKAFKDKD



forsteri]


TQAVPFRVTEQESKPVQMMYQIGSYKVAVIASEKMKILELPYASRELSMLVLLPDDVSGLEQL




ETAITFEKLMEWTSSNMMEERKVKVYLPRMKIEEKYNLTSVLMALGMTDLFSPSANLSGISSA




ESLKMSEAVHEAFVEIYEAGSEVVGSTGAGMEVTSVSEEFRADHPFLFLIKCNPTNSILFFGRC




FSP





PREDICTED:
SEQ ID NO:
MGSISAASAEFCLDVFKELKVQHVNENIFYSPLSIISALSMVYLGARENTRAQIDKVVHFDKIT


Ovalbumin-
233
GSGETIEFQCGTSANIHPSLKDMFTQITRLSDNYSLSFASRLYAEERYPILPEYLQCVKELYKGG


like [Pterocles

LETISFQTAADQARELINSWVESQTNGMIKNILQPGSVNPQTEMVLVNAIYFKGLWEKAFKDE



gutturalis]


DTQTVPFRMTEQESKPVQMMYQVGSFKVAVMASDKIKILELPYASGELSMLVLLPDDVTGLE




QLETSITFEKLMEWTSSNVMEERTMKVYLPHMRMEEKYNLTSVLMALGVTDLFSSSANLSGI




SSAESLKMSEAVHEAFVEIYESGSQVVGSTGAGTEVTSVSEEFRVDHPFLFLIKHNPTNSILFFG




RCFSP





Ovalbumin-
SEQ ID NO:
MGSIGAASVEFCFDVFKELKVQHVNENIFYSPLSIISALSMVYLGARENTKAQIDKVVHFDKIA


like [Falco
234
GFGEAIESQCVTSASIHSLKDMFTQITKPSDNYSLSFASRLYAEEAYSILPEYLQCVKELYKGGL



peregrinus]


ETISFQTAADQARDLINSWVESQTNGMIKNILQPGAVDLETEMVLVNAIYFKGMWEKAFKDE




DTQTVPFRMTEQESKPVQMMYQVGSFKVAVMASDKIKILELPYASGQLSMVVVLPDDVSGL




EQLEASITSEKLMEWTSSSIMEEKKIKVYFPHMKIEEKYNLTSVLMALGMTDLFSSSANLSGIS




SAEKLKVSEAVHEAFVEISEAGSEVVGSTEAGTEVTSVSEEFKADHPFLFLIKHNPTNSILFFGR




CFSP





PREDICTED:
SEQ ID NO:
MGSIGAASSEFCFDIFKELKVQHVNENIFYSPLSIISALSMVYLGARENTRAQIDKVVPFDKITA


Ovalbumin-
235
SGESIESQCSTSVSVHTSLKDIFTQITKSSDNHSLSFASRLYAEETYPILPEYLQCVKELYEGGLE


like isoform

TISFQTAADQARELINSWIESQTNGRIKNILQPGSVDPQTEMVLVNAIYFKGMWEKAFKDEDT


X2

QAVPFRMTEQESKPVQVMHQIGSFKVAVLASEKIKILELPYASGELSMLVLLPDDVSGLEQLE


[Phalacrocorax

TAITFEKLMEWTSPNIMEERKIKVFLPRMKIEEKYNLTSVLMALGITDLFSPLANLSGISSAESL



carbo]


KMSEAIHEAFVEISEAGSEVIGSTEAEVEVINDPEEFRADHPFLFLIKHNPTNSILFFGRCFSP





PREDICTED:
SEQ ID NO:
MGSIGAASTEFCFDVFKELKAQYVNENIFYSPMTIITALSMVYLGSKENTRAQIAKVAHFDKIT


Ovalbumin-
236
GFGESIESQCGASASIQFSLKDLFTQITKPSGNHSLSVASRIYAEETYPILPEYLECMKELYKGGL


like [Merops

ETINFQTAANQARELINSWVERQTSGMIKNILQPSSVDSQTEMVLVNAIYFRGLWEKAFKVED



nubicus]


TQATPFRITEQESKPVQMMHQIGSFKVAVVASEKIKILELPYASGRLTMLVVLPDDVSGLKQL




ETTITFEKLMEWTTSNIMEERKIKVYLPRMKIEEKYNLTSVLMALGLTDLFSSSANLSGISSAES




LKMSEAVHEAFVEIYEAGSEVVASAEAGMDATSVSEEFRADHPFLFLIKDNTSNSILFFGRCFS




P





PREDICTED:
SEQ ID NO:
MGSIGAASTEFCFDVFKELKGQHVNENIFFCPLSIVSALSMVYLGARENTRAQIVKVAHFDKIA


Ovalbumin-
237
GFAESIESQCGTSVSIHTSLKDMFTQITKPSDNYSLNFASRLYAEETYPIIPEYLQCVKELYKGG


like [Tauraco

LETISFQTAADQAREIINSWVESQTNGMIKNILRPSSVHPQTELVLVNAVYFKGTWEKAFKDE



erythrolophus]


DTQAVPFRITEQESKPVQMMYQIGSFKVAAVTSEKMKILEVPYASGELSMLVLLPDDVSGLEQ




LETAITAEKLIEWTSSTVMEERKLKVYLPRMKIEEKYNLTTVLTALGVTDLFSSSANLSGISSA




QGLKMSNAVHEAFVEIYEAGSEVVGSKGEGTEVSSVSDEFKADHPFLFLIKHNPTNSIVFFGRC




FSP





PREDICTED:
SEQ ID NO:
MGSIGAASTEFCFDVFKELKVHHVNENILYSPLAIISALSMVYLGAKENTRDQIDKVVHFDKIT


Ovalbumin-
238
GIGESIESQCSTAVSVHTSLKDVFDQITRPSDNYSLAFASRLYAEKTYPILPEYLQCVKELYKGG


like [Cuculus

LETIDFQTAADQARQLINSWVEDETNGMIKNILRPSSVNPQTKIILVNAIYFKGMWEKAFKDED



canorus]


TQEVPFRITEQETKSVQMMYQIGSFKVAEVVSDKMKILELPYASGKLSMLVLLPDDVYGLEQL




ETVITVEKLKEWTSSIVMEERITKVYLPRMKIMEKYNLTSVLTAFGITDLFSPSANLSGISSTESL




KVSEAVHEAFVEIHEAGSEVVGSAGAGIEATSVSEEFKADHPFLFLIKHNPTNSILFFGRCFSP





Ovalbumin
SEQ ID NO:
MGSIGAASTEFCLDVFKELKVQHVNENIFYSPLSIISALSMVYLGARENTRAQIDKVVHFDKIT


[Antrostomus
239
GFEDSIESQCGTSVSVHTSLKDMFTQITKPSDNYSVGFASRLYAAETYQILPEYSQCVKELYKG



carolinensis]


GLETINFQKAADQATELINSWVESQTNGMIKNILQPSSVDPQTQIFLVNAIYFKGMWQRAFKE




EDTQAVPFRISEKESKPVQMMYQIGSFKVAVIPSEKIKILELPYASGLLSMLVILPDDVSGLEQL




ENAITLEKLMQWTSSNMMEERKIKVYLPRMRMEEKYNLTSVFMALGITDLFSSSANLSGISSA




ESLKMSDAVHEASVEIHEAGSEVVGSTGSGTEASSVSEEFRADHPYLFLIKHNPTDSIVFFGRCF




SP





PREDICTED:
SEQ ID NO:
MGSIGAASTEFCFDVFKELKFQHVDENIFYSPLTIISALSMVYLGARENTRAQIDKVVHFDKIA


Ovalbumin-
240
GFEETVESQCGTSVSVHTSLKDMFAQITKPSDNYSLSFASRLYAEETYPILPEYLQCVKELYKG


like

GLETISFQTAADQARDLINSWVESQTNGMIKNILQPSSVGPQTELILVNAIYFKGMWQKAFKD


[Opisthocomus

EDTQEVPFRMTEQQSKPVQMMYQTGSFKVAVVASEKMKILALPYASGQLSLLVMLPDDVSG



hoazin]


LKQLESAITSEKLIEWTSPSMMEERKIKVYLPRMKIEEKYNLTSVLMALGITDLFSPSANLSGIS




SAESLKMSQAVHEAFVEIYEAGSEVVGSTGAGMEDSSDSEEFRVDHPFLFFIKHNPTNSILFFG




RCFSP





PREDICTED:
SEQ ID NO:
MGSIGPLSVEFCCDVFKELRIQHPRENIFYSPVTIISALSMVYLGARDNTKAQIEKAVHFDKIPG


Ovalbumin-
241
FGESIESQCGTSLSIHTSLKDIFTQITKPSDNYTVGIASRLYAEEKYPILPEYLQCIKELYKGGLEP


like

INFQTAAEQARELINSWVESQTNGMIKNILQPSSVNPETDMVLVNAIYFKGLWEKAFKDEDIQ


[Lepidothrix

TVPFRITEQESKPVQMMFQIGSFRVAEITSEKIRILELPYASGQLSLWVLLPDDISGLEQLETAIT



coronata]


FENLKEWTSSTKMEERKIKVYLPRMKIEEKYNLTSVLTSLGITDLFSSSANLSGISSAESLKVSS




AFHEASVEIYEAGSKVVGSTGAEVEDTSVSEEFRADHPFLFLIKHNPSNSIFFFGRCFSP





PREDICTED:
SEQ ID NO:
MGSIGTASAEFCFDVFKELKVHHVNENIFYSPLSIISALSMVYLGARENTKTQMEKVIHFDKIT


Ovalbumin
242
GLGESMESQCGTGVSIHTALKDMLSEITKPSDNYSLSLASRLYAEQTYAILPEYLQCIKELYKE


[Struthio

SLETVSFQTAADQARELINSWIESQTNGVIKNFLQPGSVDSQTELVLVNAIYFKGMWEKAFKD



camelus


EDTQEVPFRITEQESRPVQMMYQAGSFKVATVAAEKIKILELPYASGELSMLVLLPDDISGLEQ



australis]


LETTISFEKLTEWTSSNMMEDRNMKVYLPRMKIEEKYNLTSVLIALGMTDLFSPAANLSGISA




AESLKMSEAIHAAYVEIYEADSEIVSSAGVQVEVTSDSEEFRVDHPFLFLIKHNPTNSVLFFGRC




ISP





PREDICTED:
SEQ ID NO:
MGSIGAVSTEFSCDVFKELRIHHVQENIFYSPVTIISALSMIYLGARDSTKAQIEKAVHFDKIPGF


Ovalbumin-
243
GESIESQCGTSLSIHTSIKDMFTKITKASDNYSIGIASRLYAEEKYPILPEYLQCVKELYKGGLESI


like

SFQTAAEQAREIINSWVESQTNGMIKNILQPSSVDPQTDIVLVNAIYFKGLWEKAFRDEDTQTV


[Acanthisitta

PFKITEQESKPVQMMYQIGSFKVAEITSEKIKILEVPYASGQLSLWVLLPDDISGLEKLETAITFE



chloris]


NLKEWTSSTKMEERKIKVYLPRMKIEEKYNLTSVLTALGITDLFSSSANLSGISSAESLKVSEAF




HEAIVEISEAGSKVVGSVGAGVDDTSVSEEFRADHPFLFLIKHNPTSSIFFFGRCFSP





PREDICTED:
SEQ ID NO:
MGSIGAASTEFCFDVFKELKVQHVNENIFYSPLSIISALSMVYLGARENTRAQIDKVVHFDKIA


Ovalbumin-
244
GFGESTESQCGTSVSAHTSLKDMSNQITKLSDNYSLSFASRLYAEETYPILPEYSQCVKELYKG


like [Tyto

GLESISFQTAAYQARELINAWVESQTNGMIKDILQPGSVDSQTKMVLVNAIYFKGIWEKAFKD



alba]


EDTQEVPFRMTEQETKPVQMMYQIGSFKVAVIAAEKIKILELPYASGQLSMLVILPDDVSGLE




QLETAITFEKLTEWTSASVMEERKIKVYLPRMSIEEKYNLTSVLIALGVTDLFSSSANLSGISSA




ESLRMSEAIHEAFVETYEAGSTESGTEVTSASEEFRVDHPFLFLIKHKPTNSILFFGRCFSP





PREDICTED:
SEQ ID NO:
MGSIGAASSEFCFDIFKELKVQHVNENIFYSPLSIISALSMVYLGARENTRAQIDKVVPFDKITA


Ovalbumin-
245
SGESIESQVQKIQCSTSVSVHTSLKDIFTQITKSSDNHSLSFASRLYAEETYPILPEYLQCVKELY


like isoform

EGGLETISFQTAADQARELINSWIESQTNGRIKNILQPGSVDPQTEMVLVNAIYFKGMWEKAF


X1

KDEDTQAVPFRMTEQESKPVQVMHQIGSFKVAVLASEKIKILELPYASGELSMLVLLPDDVSG


[Phalacrocorax

LEQLETAITFEKLMEWTSPNIMEERKIKVFLPRMKIEEKYNLTSVLMALGITDLFSPLANLSGIS



carbo]


SAESLKMSEAIHEAFVEISEAGSEVIGSTEAEVEVTNDPEEFRADHPFLFLIKHNPTNSILFFGRC




FSP





Ovalbumin-
SEQ ID NO:
MGSIGPLSVEFCCDVFKELRIQHARENIFYSPVTIISALSMVYLGARDNTKAQIEKAVHFDKIPG


like [Pipra
246
FGESIESQCGTSLSIHTSLKDIFTQITKPSDNYTVGIASRLYAEEKYPILPEYLQCIKELYKGGLEP



filicauda]


ISFQTAAEQARELINSWVESQINGIIKNILQPSSVNPETDMVLVNAIYFKGLWEKAFKDEGTQT




VPFRITEQESKPVQMMFQIGSFRVAEIASEKIRILELPYASGQLSLWVLLPDDISGLEQLETAITF




ENLKEWTSSTKMEERKIKVYLPRMKIEEKYNLTSVLTSLGITDLFSSSANLSGISSAERLKVSSA




FHEASMEINEAGSKVVGAGVDDTSVSEEFRVDRPFLFLIKHNPSNSIFFFGRCFSP





Ovalbumin
SEQ ID NO:
MGSIGAASTEFCFDMFKELKVHHVNENIIYSPLSIISILSMVFLGARENTKTQMEKVIHFDKITG


[Dromaius
247
FGESLESQCGTSVSVHASLKDILSEITKPSDNYSLSLASKLYAEETYPVLPEYLQCIKELYKGSL



novaehollandiae]


ETVSFQTAADQARELINSWVETQTNGVIKNFLQPGSVDPQTEMVLVDAIYFKGTWEKAFKDE




DTQEVPFRITEQESKPVQMMYQAGSFKVATVAAEKMKILELPYASGELSMFVLLPDDISGLEQ




LETTISIEKLSEWTSSNMMEDRKMKVYLPHMKIEEKYNLTSVLVALGMTDLFSPSANLSGIST




AQTLKMSEAIHGAYVEIYEAGSEMATSTGVLVEAASVSEEFRVDHPFLFLIKHNPSNSILFFGR




CIFP





Chain A,
SEQ ID NO:
MGSIGAASTEFCFDMFKELKVHHVNENIIYSPLSIISILSMVFLGARENTKTQMEKVIHFDKITG


Ovalbumin
248
FGESLESQCGTSVSVHASLKDILSEITKPSDNYSLSLASKLYAEETYPVLPEYLQCIKELYKGSL




ETVSFQTAADQARELINSWVETQTNGVIKNFLQPGSVDPQTEMVLVDAIYFKGTWEKAFKDE




DTQEVPFRITEQESKPVQMMYQAGSFKVATVAAEKMKILELPYASGELSMFVLLPDDISGLEQ




LETTISIEKLSEWTSSNMMEDRKMKVYLPHMKIEEKYNLTSVLVALGMTDLFSPSANLSGIST




AQTLKMSEAIHGAYVEIYEAGSEMATSTGVLVEAASVSEEFRVDHPFLFLIKHNPSNSILFFGR




CIFPHHHHHH





Ovalbumin-
SEQ ID NO:
MGSIGPLSVEFCCDVFKELRIQHARENIFYSPVTIISALSMVYLGARDNTKAQIEKAVHFDKIPG


like [Corapipo
249
FGESIESQCGTSLSIHTSLKDIFTQITKPSDNYTVGIASRLYAEEKYPILPEYLQCIKELYKGGLEP



altera]


ISFQTAAEQARELINSWVESQTNGMIKNILQPSAVNPETDMVLVNAIYFKGLWEKAFKDEGTQ




TVPFRITEQESKPVQMMFQIGSFRVAEITSEKIRILELPYASGQLSLWVLLPDDISGLEQLETAIT




FENLKEWTSSTKMEERKIKVYLPRMKIEEKYNLTSVLTSLGITDLFSSSANLSGISSAERLKVSS




AFHEASMEIYEAGSKVVGSTGAGVDDTSVSEEFRVDRPFLFLIKHNPSNSIFFFGRCFSP





Ovalbumin-
SEQ ID NO:
MEDQRGNTGFTMGSIGAASTEFCIDVFRELRVQHVNENIFYSPLTIISALSMVYLGARENTRAQ


like protein
250
IDQVVHFDKIAGFGDTVESQCGSSPSVHNSLKTVXAQITQPRDNYSLNLASRLYAEESYPILPE


[Amazona

YLQCVKELYNGGLETVSFQTAADQARELINSWVESQINGIIKNILQPSSVDPQTEMVLVNAIYF



aestiva]


KGLWEKAFKDEETQAVPFRITEQENRPVQMMYQFGSFKVAXVASEKIKILELPYASGQLSML




VLLPDEVSGLEQNAITFEKLTEWTSSDLMEERKIKVFFPRVKIEEKYNLTAVLVSLGITDLFSSS




ANLSGISSAENLKMSEAVHEAXVEIYEAGSEVAGSSGAGIEVASDSEEFRVDHPFLFLIXHNPT




NSILFFGRCFSP





PREDICTED:
SEQ ID NO:
MGSIGAASTEFCIDVFRELRVQHVNENIFYSPLSIISALSMVYLGARENTRAQIDEVFHFDKIAG


Ovalbumin-
251
FGDTVDPQCGASLSVHKSLQNVFAQITQPKDNYSLNLASRLYAEESYPILPEYLQCVKELYNE


like

GLETVSFQTGADQARELINSWVENQTNGVIKNILQPSSVDPQTEMVLVNAIYFKGLWQKAFK


[Melopsittacus

DEETQAVPFRITEQENRPVQMMYQFGSFKVAVVASEKVKILELPYASGQLSMWVLLPDEVSG



undulatus]


LEQLENAITFEKLTEWTSSDLTEERKIKVFLPRVKIEEKYNLTAVLMALGVTDLFSSSANFSGIS




AAENLKMSEAVHEAFVEIYEAGSEVVGSSGAGIEAPSDSEEFRADHPFLFLIKHNPTNSILFFGR




CFSP





Ovalbumin-
SEQ ID NO:
MGSIGPLSVEFCCDVFKELRIQHARDNIFYSPVTIISALSMVYLGARDNTKAQIEKAVHFDKIPG


like
252
FGESIESQCGTSLSVHTSLKDIFTQITKPRENYTVGIASRLYAEEKYPILPEYLQCIKELYKGGLE


[Neopelma

PISFQTAAEQARELINSWVESQTNGMIKNILQPSSVNPETDMVLVNAIYFKGLWKKAFKDEGT



chrysocephalum]


QTVPFRITEQESKPVQMMFQIGSFRVAEITSEKIRILELPYASGQLSLWVLLPDDISGLEQLESAI




TFENLKEWTSSTKMEERKIKVYLPRMKIEEKYNLTSVLTSLGITDLFSSSANLSGISSAEKLKVS




SAFHEASMEIYEAGNKVVGSTGAGVDDTSVSEEFRVDRPFLFLIKHNPSNSIFFFGRCFSP





PREDICTED:
SEQ ID NO:
MGSIGAASAEFCVDVFKELKDQHVNNIVFSPLMIISALSMVNIGAREDTRAQIDKVVHFDKITG


Ovalbumin-
253
YGESIESQCGTSIGIYFSLKDAFTQITKPSDNYSLSFASKLYAEETYPILPEYLKCVKELYKGGLE


like [Buceros

TISFQTAADQARELINSWVESQTNGMIKNILQPSSVDPQTEMVLVNAIYFKGLWEKAFKDEDT



rhinoceros


QAVPFRITEQESKPVQMMYQIGSFKVAVIASEKIKILELPYASGQLSLLVLLPDDVSGLEQLESA



silvestris]


ITSEKLLEWTNPNIMEERKTKVYLPRMKIEEKYNLTSVLVALGITDLFSSSANLSGISSAEGLKL




SDAVHEAFVEIYEAGREVVGSSEAGVEDSSVSEEFKADRPFIFLIKHNPTNGILYFGRYISP





PREDICTED:
SEQ ID NO:
MGSIGAANTDFCFDVFKELKVHHANENIFYSPLSIVSALAMVYLGARENTRAQIDKALHFDKI


Ovalbumin-
254
LGFGETVESQCDTSVSVHTSLKDMLIQITKPSDNYSFSFASKIYTEETYPILPEYLQCVKELYKG


like [Cariama

GVETISFQTAADQAREVINSWVESHTNGMIKNILQPGSVDPQTKMVLVNAVYFKGIWEKAFK



cristata]


EEDTQEMPFRINEQESKPVQMMYQIGSFKLTVAASENLKILEFPYASGQLSMMVILPDEVSGL




KQLETSITSEKLIKWTSSNTMEERKIRVYLPRMKIEEKYNLKSVLMALGITDLFSSSANLSGISS




AESLKMSEAVHEAFVEIYEAGSEVTSSTGTEMEAENVSEEFKADHPFLFLIKHNPTDSIVFFGR




CMSP





Ovalbumin
SEQ ID NO:
MGSIGPLSVEFCCDVFKELRIQHARENIFYSPVTIISALSMVYLGARDNTKAQIEKAVHFDKIPG


[Manacus
255
FGESIESQCGTSLSIHTSLKDIFTQITKPSDNYTVGIASRLYAEEKYPILPEYLQCIKELYKGGLEP



vitellinus]


ISFQTAAEQARELINSWVESQTNGMIKNILQPSSVNPETDMVLVNAIYFKGLWEKAFKDESTQ




TVPFRITEQESKPVQMMFQIGSFRVAEIASEKIRILELPYASGQLSLWVLLPDDISGLEQLETAIT




FENLKEWTSSTKMEERKIKVYLPRMKIEEKYNLTSVLTSLGITDLFSSSANLSGISSAERLKVSS




AFHEASMEIYEAGSRVVEAGVDDTSVSEEFRVDRPFLFLIKHNPSNSIFFFGRCFSP





Ovalbumin-
SEQ ID NO:
MGSIGPVSTEFCCDIFKELRIQHARENIIYSPVTIISALSMVYLGARDNTKAQIEKAVHFDKIPGF


like
256
GESIESQCGTSLSIHTSLKDILTQITKPSDNYTVGIASRLYAEEKYPILSEYLQCIKELYKGGLEPI


[Empidonax

SFQTAAEQARELINSWVESQTNGMIKNILQPSSVNPETDMVLVNAIYFKGLWEKAFKDEGTQT



traillii]


VPFRITEQESKPVQMMFQIGSFKVAEITSEKIRILELPYASGKLSLWVLLPDDISGLEQLETAITF




ENLKEWTSSTRMEERKIKVYLPRMKIEEKYNLTSVLTSLGITDLFSSSANLSGISSAERLKVSSA




FHEVFVEIYEAGSKVEGSTGAGVDDTSVSEEFRADHPFLFLVKHNPSNSIIFFGRCYLP





PREDICTED:
SEQ ID NO:
MGSTGAASMEFCFALFRELKVQHVNENIFFSPVTIISALSMVYLGARENTRAQLDKVAPFDKIT


Ovalbumin-
257
GFGETIGSQCSTSASSHTSLKDVFTQITKASDNYSLSFASRLYAEETYPILPEYLQCVKELYKGG


like

LESISFQTAADQARELINSWVESQTNGMIKDILRPSSVDPQTKIILITAIYFKGMWEKAFKEEDT


[Leptosomus

QAVPFRMTEQESKPVQMMYQIGSFKVAVIPSEKLKILELPYASGQLSMLVILPDDVSGLEQLET



discolor]


AITTEKLKEWTSPSMMKERKMKVYFPRMRIEEKYNLTSVLMALGITDLFSPSANLSGISSAESL




KVSEAVHEASVDIDEAGSEVIGSTGVGTEVTSVSEEIRADHPFLFLIKHKPTNSILFFGRCFSP





Hypothetical
SEQ ID NO:
MEHAQLTQLVNSNMTSNTCHEADEFENIDFRMDSISVTNTKFCFDVFNEMKVHHVNENILYS


protein
258
PLSILTALAMVYLGARGNTESQMKKALHFDSITGAGSTTDSQCGSSEYIHNLFKEFLTEITRTN


H355_008077

ATYSLEIADKLYVDKTFTVLPEYINCARKFYTGGVEEVNFKTAAEEARQLINSWVEKETNGQI


[Colinus

KDLLVPSSVDFGTMMVFINTIYFKGIWKTAFNTEDTREMPFSMTKQESKPVQMMCLNDTFNM



virginianus]


ATLPAEKMRILELPYASGELSMLVLLPDEVSGLEQIEKAINFEKLREWTSTNAMEKKSMKVYL




PRMKIEEKYNLTSTLMALGMTDLFSRSANLTGISSVENLMISDAVHGAFMEVNEEGTEAAGST




GAIGNIKHSVEFEEFRADHPFLFLIRYNPTNVILFFDNSEFTMGSIGAVSTEFCFDVFKELRVHH




ANENIFYSPFTVISALAMVYLGAKDSTRTQINKVVRFDKLPGFGDSIEAQCGTSANVHSSLRDI




LNQITKPNDIYSFSLASRLYADETYTILPEYLQCVKELYRGGLESINFQTAADQARELINSWVES




QTSGIIRNVLQPSSVDSQTAMVLVNAIYFKGLWEKGFKDEDTQAMPFRVTEQENKSVQMMY




QIGTFKVASVASEKMKILELPFASGTMSMWVLLPDEVSGLEQLETTISIEKLTEWTSSSVMEER




KIKVFLPRMKMEEKYNLTSVLMAMGMTDLFSSSANLSGISSTLQKKGFRSQELGDKYAKPML




ESPALTPQVTAWDNSWIVAHPAAIEPDLCYQIMEQKWKPFDWPDFRLPMRVSCRFRTMEALN




KANTSFALDFFKHECQEDDDENILFSPFSISSALATVYLGAKGNTADQMAKTEIGKSGNIHAGF




KALDLEINQPTKNYLLNSVNQLYGEKSLPFSKEYLQLAKKYYSAEPQSVDFLGKANEIRREINS




RVEHQTEGKIKNLLPPGSIDSLTRLVLVNALYFKGNWATKFEAEDTRHRPFRINMHTTKQVPM




MYLRDKFNWTYVESVQTDVLELPYVNNDLSMFILLPRDITGLQKLINELTFEKLSAWTSPELM




EKMKMEVYLPRFTVEKKYDMKSTLSKMGIEDAFTKVDSCGVTNVDEITTHIVSSKCLELKHIQ




INKKLKCNKAVAMEQVSASIGNFTIDLFNKLNETSRDKNIFFSPWSVSSALALTSLAAKGNTAR




EMAEDPENEQAENIHSGFKELMTALNKPRNTYSLKSANRIYVEKNYPLLPTYIQLSKKYYKAE




PYKVNFKTAPEQSRKEINNWVEKQTERKIKNFLSSDDVKNSTKSILVNAIYFKAEWEEKFQAG




NTDMQPFRMSKNKSKLVKMMYMRHTFPVLIMEKLNFKMIELPYVKRELSMFILLPDDIKDST




TGLEQLERELTYEKLSEWADSKKMSVTLVDLHLPKFSMEDRYDLKDALKSMGMASAFNSNA




DFSGMTGFQAVPMESLSASTNSFTLDLYKKLDETSKGQNIFFASWSIATALAMVHLGAKGDT




ATQVAKGPEYEETENIHSGFKELLSAINKPRNTYLMKSANRLFGDKTYPLLPKFLELVARYYQ




AKPQAVNFKTDAEQARAQINSWVENETESKIQNLLPAGSIDSHTVLVLVNAIYFKGNWEKRFL




EKDTSKMPFRLSKTETKPVQMMFLKDTFLIHHERTMKFKIIELPYVGNELSAFVLLPDDISDNT




TGLELVERELTYEKLAEWSNSASMMKAKVELYLPKLKMEENYDLKSVLSDMGIRSAFDPAQ




ADFTRMSEKKDLFISKVIHKAFVEVNEEDRIVQLASGRLTGRCRTLANKELSEKNRTKNLFFSP




FSISSALSMILLGSKGNTEAQIAKVLSLSKAEDAHNGYQSLLSEINNPDTKYILRTANRLYGEKT




FEFLSSFIDSSQKFYHAGLEQTDFKNASEDSRKQINGWVEEKTEGKIQKLLSEGIINSMTKLVLV




NAIYFKGNWQEKFDKETTKEMPFKINKNETKPVQMMFRKGKYNMTYIGDLETTVLEIPYVDN




ELSMIILLPDSIQDESTGLEKLERELTYEKLMDWINPNMMDSTEVRVSLPRFKLEENYELKPTL




STMGMPDAFDLRTADFSGISSGNELVLSEVVHKSFVEVNEEGTEAAAATAGIMLLRCAMIVA




NFTADHPFLFFIRHNKTNSILFCGRFCSP





PREDICTED:
SEQ ID NO:
MGSIGTASTEFCFDMFKEMKVQHANQNIIFSPLTIISALSMVYLGARDNTKAQMEKVIHFDKIT


Ovalbumin
259
GFGESVESQCGTSVSIHTSLKDMLSEITKPSDNYSLSLASRLYAEETYPILPEYLQCMKELYKG


isoform X2

GLETVSFQTAADQARELINSWVESQTNGVIKNFLQPGSVDPQTEMVLVNAIYFKGMWEKAFK


[Apteryx

DEDTQEVPFRITEQESKPVQMMYQVGSFKVATVAAEKMKILEIPYTHRELSMFVLLPDDISGL



australis


EQLETTISFEKLTEWTSSNMMEERKVKVYLPHMKIEEKYNLTSVLMALGMTDLFSPSANLSGI



mantelli]


STAQTLMMSEAIHGAYVEIYEAGREMASSTGVQVEVTSVLEEVRADKPFLFFIRHNPTNSMVV




FGRYMSP





Hypothetical
SEQ ID NO:
MTSNTCHEADEFENIDFRMDSISVTNTKFCFDVFNEMKVHHVNENILYSPLSILTALAMVYLG


protein
260
ARGNTESQMKKALHFDSITGGGSTTDSQCGSSEYIHNLFKEFLTEITRTNATYSLEIADKLYVD


ASZ78_006007

KTFTVLPEYINCARKFYTGGVEEVNFKTAAEEARQLMNSWVEKETNGQIKDLLVPSSVDFGT


[Callipepla

MMVFINTIYFKGIWKTAFNTEDTREMPFSMTKQESKPVQMMCLNDTFNMVTLPAEKMRILEL



squamata]


PYASGELSMLVLLPDEVSGLERIEKAINFEKLREWTSTNAMEKKSMKVYLPRMKIEEKYNLTS




TLMALGMTDLFSRSANLTGISSVDNLMISDAVHGAFMEVNEEGTEAAGSTGAIGNIKHSVEFE




EFRADHPFLFLIRYNPTNVILFFDNSEFTMGSIGAVSTEFCFDVFKELRVHHANENIFYSPFTIISA




LAMVYLGAKDSTRTQINKVVRFDKLPGFGDSIEAQCGTSANVHSSLRDILNQITKPNDIYSFSL




ASRLYADETYTILPEYLQCVKELYRGGLESINFQTAADQARELINSWVESQTSGIIRNVLQPSSV




DSQTAMVLVNAIYFKGLWEKGFKDEDTQAIPFRVTEQENKSVQMMYQIGTFKVASVASEKM




KILELPFASGTMSMWVLLPDEVSGLEQLETTISIEKLTEWTSSSVMEERKIKVFLPRMKMEEKY




NLTSVLMAMGMTDLFSSSANLSGISSTLQKKGFRSQELGDKYAKPMLESPALTPQATAWDNS




WIVAHPPAIEPDLYYQIMEQKWKPFDWPDFRLPMRVSCRFRTMEALNKANTSFALDFFKHEC




QEDDSENILFSPFSISSALATVYLGAKGNTADQMAKVLHFNEAEGARNVTTTIRMQVYSRTDQ




QRLNRRACFQKTEIGKSGNIHAGFKGLNLEINQPTKNYLLNSVNQLYGEKSLPFSKEYLQLAK




KYYSAEPQSVDFVGTANEIRREINSRVEHQTEGKIKNLLPPGSIDSLTRLVLVNALYFKGNWAT




KFEAEDTRHRPFRINTHTTKQVPMMYLSDKFNWTYVESVQTDVLELPYVNNDLSMFILLPRDI




TGLQKLINELTFEKLSAWTSPELMEKMKMEVYLPRFTVEKKYDMKSTLSKMGIEDAFTKVDN




CGVTNVDEITIHVVPSKCLELKHIQINKELKCNKAVAMEQVSASIGNFTIDLFNKLNETSRDKN




IFFSPWSVSSALALTSLAAKGNTAREMAEDPENEQAENIHSGFNELLTALNKPRNTYSLKSAN




RIYVEKNYPLLPTYIQLSKKYYKAEPHKVNFKTAPEQSRKEINNWVEKQTERKIKNFLSSDDV




KNSTKLILVNAIYFKAEWEEKFQAGNTDMQPFRMSKNKSKLVKMMYMRHTFPVLIMEKLNF




KMIELPYVKRELSMFILLPDDIKDSTTGLEQLERELTYEKLSEWADSKKMSVTLVDLHLPKFS




MEDRYDLKDALRSMGMASAFNSNADFSGMTGERDLVISKVCHQSFVAVDEKGTEAAAATA




VIAEAVPMESLSASTNSFTLDLYKKLDETSKGQNIFFASWSIATALTMVHLGAKGDTATQVAK




GPEYEETENIHSGFKELLSALNKPRNTYSMKSANRLFGDKTYPLLPTKTKPVQMMFLKDTFLI




HHERTMKFKIIELPYMGNELSAFVLLPDDISDNTTGLELVERELTYEKLAEWSNSASMMKVKV




ELYLPKLKMEENYDLKSALSDMGIRSAFDPAQADFTRMSEKKDLFISKVIHKAFVEVNEEDRI




VQLASGRLTGNTEAQIAKVLSLSKAEDAHNGYQSLLSEINNPDTKYILRTANRLYGEKTFEFLS




SFIDSSQKFYHAGLEQTDFKNASEDSRKQINGWVEEKTEGKIQKLLSEGIINSMTKLVLVNAIY




FKGNWQEKFDKETTKEMPFKINKNETKPVQMMFRKGKYNMTYIGDLETTVLEIPYVDNELS




MIILLPDSIQDESTGLEKLERELTYEKLMDWINPNMMDSTEVRVSLPRFKLEENYELKPTLSTM




GMPDAFDLRTADFSGISSGNELVLSEVVHKSFVEVNEEGTEAAAATAGIMLLRCAMIVANFTA




DHPFLFFIRHNKTNSILFCGRFCSP





PREDICTED:
SEQ ID NO:
MASIGAASTEFCFDVFKELKTQHVKENIFYSPMAIISALSMVYIGARENTRAEIDKVVHFDKIT


Ovalbumin-
261
GFGNAVESQCGPSVSVHSSLKDLITQISKRSDNYSLSYASRIYAEETYPILPEYLQCVKEVYKG


like

GLESISFQTAADQARENINAWVESQTNGMIKNILQPSSVNPQTEMVLVNAIYLKGMWEKAFK


[Mesitornis

DEDTQTMPFRVTQQESKPVQMMYQIGSFKVAVIASEKMKILELPYTSGQLSMLVLLPDDVSG



unicolor]


LEQVESAITAEKLMEWTSPSIMEERTMKVYLPRMKMVEKYNLTSVLMALGMTDLFTSVANL




SGISSAQGLKMSQAIHEAFVEIYEAGSEAVGSTGVGMEITSVSEEFKADLSFLFLIRHNPTNSIIF




FGRCISP





Ovalbumin,
SEQ ID NO:
MGSIGAASTEFCFDVFRELRVQHVNENIFYSPFSIISALAMVYLGARDNTRTQIDKISQFQALSD


partial [Anas
262
EHLVLCIQQLGEFFVCTNRERREVTRYSEQTEDKTQDQNTGQIHKIVDTCMLRQDILTQITKPS



platyrhynchos]


DNFSLSFASRLYAEETYAILPEYLQCVKELYKGGLESISFQTAADQARELINSWVESQINGIIKN




ILQPSSVDSQTTMVLVNAIYFKGMWEKAFKDEDTQAMPFRMTEQESKPVQMMYQVGSFKVA




MVTSEKMKILELPFASGMMSMFVLLPDEVSGLEQLESTISFEKLTEWTSSTMMEERRMKVYLP




RMKMEEKYNLTSVFMALGMTDLFSSSANMSGISSTVSLKMSEAVHAACVEIFEAGRDVVGSA




EAGMDVTSVSEEFRADHPFLFFIKHNPTNSILFFGRWMSP





PREDICTED:
SEQ ID NO:
MGSIGAASAEFCLDIFKELKVQHVNENIIFSPMTIISALSLVYLGAKEDTRAQIEKVVPFDKIPGF


Ovalbumin-
263
GEIVESQCPKSASVHSSIQDIFNQIIKRSDNYSLSLASRLYAEESYPIRPEYLQCVKELDKEGLETI


like [Chaetura

SFQTAADQARQLINSWVESQTNGMIKNILQPSSVNSQTEMVLVNAIYFRGLWQKAFKDEDTQ



pelagica]


AVPFRITEQESKPVQMMQQIGSFKVAEIASEKMKILELPYASGQLSMLVLLPDDVSGLEKLESS




ITVEKLIEWTSSNLTEERNVKVYLPRLKIEEKYNLTSVLAALGITDLFSSSANLSGISTAESLKLS




RAVHESFVEIQEAGHEVEGPKEAGIEVTSALDEFRVDRPFLFVTKHNPTNSILFLGRCLSP





PREDICTED:
SEQ ID NO:
MGSISAASGEFCLDIFKELKVQHVNENIFYSPMVIVSALSLVYLGARENTRAQIDKVIPFDKITG


Ovalbumin-
264
SSEAVESQCGTPVGAHISLKDVFAQIAKRSDNYSLSFVNRLYAEETYPILPEYLQCVKELYKGG


like

LETISFQTAADQAREIINSWVESQTDGKIKNILQPSSVDPQTKMVLVSAIYFKGLWEKSFKDED


[Apaloderma

TQAVPFRVTEQESKPVQMMYQIGSFKVAALAAEKIKILELPYASEQLSMLVLLPDDVSGLEQLE



vittatum]


KKISYEKLTEWTSSSVMEEKKIKVYLPRMKIEEKYNLTSILMSLGITDLFSSSANLSGISSTKSLK




MSEAVHEASVEIYEAGSEASGITGDGMEATSVFGEFKVDHPFLFMIKHKPTNSILFFGRCISP





Ovalbumin-
SEQ ID NO:
MGSIGPVSTEVCCDIFRELRSQSVQENVCYSPLLIISTLSMVYIGAKDNTKAQIEKAIHFDKIPGF


like [Corvus
265
GESTESQCGTSVSIHTSLKDIFTQITKPSDNYSISIARRLYAEEKYPILPEYIQCVKELYKGGLESI



cornix cornix]


SFQTAAEKSRELINSWVESQTNGTIKNILQPSSVSSQTDMVLVSAIYFKGLWEKAFKEEDTQTI




PFRITEQESKPVQMMSQIGTFKVAEIPSEKCRILELPYASGRLSLWVLLPDDISGLEQLETAITFE




NLKEWTSSSKMEERKIRVYLPRMKIEEKYNLTSVLKSLGITDLFSSSANLSGISSAESLKVSAAF




HEASVEIYEAGSKGVGSSEAGVDGTSVSEEIRADHPFLFLIKHNPSDSILFFGRCFSP





PREDICTED:
SEQ ID NO:
MGSIGAASTEFCFDVFKELKVQHVNENIIISPLSIISALSMVYLGAREDTRAQIDKVVHFDKITG


Ovalbumin-
266
FGEAIESQCPTSESVHASLKETFSQLTKPSDNYSLAFASRLYAEETYPILPEYLQCVKELYKGGL


like [Calypte

ETINFQTAAEQARQVINSWVESQTDGMIKSLLQPSSVDPQTEMILVNAIYFRGLWERAFKDED



anna]


TQELPFRITEQESKPVQMMSQIGSFKVAVVASEKVKILELPYASGQLSMLVLLPDDVSGLEQLE




SSITVEKLIEWISSNTKEERNIKVYLPRMKIEEKYNLTSVLVALGITDLFSSSANLSGISSAESLKI




SEAVHEAFVEIQEAGSEVVGSPGPEVEVTSVSEEWKADRPFLFLIKHNPTNSILFFGRYISP





PREDICTED:
SEQ ID NO:
MGSIGPVSTEVCCDIFRELRSQSVQENVCYSPLLIISTLSMVYIGAKDNTKAQIEKAIHFDKIPGF


Ovalbumin
267
GESTESQCGTSVSIHTSLKDIFTQITKPSDNYSISIARRLYAEEKYPILQEYIQCVKELYKGGLESI


[Corvus

SFQTAAEKSRELINSWVESQTNGTIKNILQPSSVSSQTDMVLVSAIYFKGLWEKAFKEEDTQTI



brachyrhynchos]


PFRITEQESKPVQMMSQIGTFKVAEIPSEKCRILELPYASGRLSLWVLLPDDISGLEQLETSITFE




NLKEWTSSSKMEERKIRVYLPRMKIEEKYNLTSVLKSLGITDLFSSSANLSGISSAESLKVSAVF




HEASVEIYEAGSKGVGSSEAGVDGTSVSEEIRADHPFLFLIKHNPSDSILFFGRCFSP





Hypothetical
SEQ ID NO:
MLNLMHPKQFCCTMGSIGPVSTEVCCDIFRELRSQSVQENVCYSPLLIISTLSMVYIGAKDNTK


protein
268
AQIEKAIHFDKIPGFGESTESQCGTSVSIHTSLKDIFTQITKPSDNYSISIASRLYAEEKYPILPEYI


DUI87_08270

QCVKELYKGGLESISFQTAAEKSRELINSWVESQTNGTIKNILQPSSVSSQTDMVLVSAIYFKG


[Hirundo

LWEKAFKEEDTQTVPFRITEQESKPVQMMSQIGTFKVAEIPSEKCRILELPYASGRLSLWVLLP



rustica rustica]


DDISGLEQLETAITSENLKEWTSSSKMEERKIKVYLPRMKIEEKYNLTSVLKSLGITDLFSSSAN




LSGISSAESLKVSGAFHEAFVEIYEAGSKAVGSSGAGVEDTSVSEEIRADHPFLFFIKHNPSDSIL




FFGRCFSP





Ostrich OVA
SEQ ID NO:
EAEAGSIGTASAEFCFDVFKELKVHHVNENIFYSPLSIISALSMVYLGARENTKTQMEKVIHFD


sequence as
269
KITGLGESMESQCGTGVSIHTALKDMLSEITKPSDNYSLSLASRLYAEQTYAILPEYLQCIKELY


secreted from

KESLETVSFQTAADQARELINSWIESQTNGVIKNFLQPGSVDSQTELVLVNAIYFKGMWEKAF


Pichia

KDEDTQEVPFRITEQESRPVQMMYQAGSFKVATVAAEKIKILELPYASGELSMLVLLPDDISGL




EQLETTISFEKLTEWTSSNMMEDRNMKVYLPRMKIEEKYNLTSVLIALGMTDLFSPAANLSGI




SAAESLKMSEAIHAAYVEIYEADSEIVSSAGVQVEVTSDSEEFRVDHPFLFLIKHNPTNSVLFFG




RCISP





Ostrich
SEQ ID NO:
MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPAEAVIGYSDLEGDFDVAVLPFSNSTNNGLL


construct
270
FINTTIASIAAKEEGVSLEKREAEAGSIGTASAEFCFDVFKELKVHHVNENIFYSPLSIISALSMV


(secretion

YLGARENTKTQMEKVIHFDKITGLGESMESQCGTGVSIHTALKDMLSEITKPSDNYSLSLASRL


signal +

YAEQTYAILPEYLQCIKELYKESLETVSFQTAADQARELINSWIESQTNGVIKNFLQPGSVDSQ


mature

TELVLVNAIYFKGMWEKAFKDEDTQEVPFRITEQESRPVQMMYQAGSFKVATVAAEKIKILE


protein)

LPYASGELSMLVLLPDDISGLEQLETTISFEKLTEWTSSNMMEDRNMKVYLPRMKIEEKYNLT




SVLIALGMTDLFSPAANLSGISAAESLKMSEAIHAAYVEIYEADSEIVSSAGVQVEVTSDSEEFR




VDHPFLFLIKHNPTNSVLFFGRCISP





Duck OVA
SEQ ID NO:
EAEAGSIGAASTEFCFDVFRELRVQHVNENIFYSPFSIISALAMVYLGARDNTRTQIDKVVHFD


sequence as
271
KLPGFGESMEAQCGTSVSVHSSLRDILTQITKPSDNFSLSFASRLYAEETYAILPEYLQCVKELY


secreted from

KGGLESISFQTAADQARELINSWVESQINGIIKNILQPSSVDSQTTMVLVNAIYFKGMWEKAF


Pichia

KDEDTQAMPFRMTEQESKPVQMMYQVGSFKVAMVTSEKMKILELPFASGMMSMFVLLPDE




VSGLEQLESTISFEKLTEWTSSTMMEERRMKVYLPRMKMEEKYNLTSVFMALGMTDLFSSSA




NMSGISSTVSLKMSEAVHAACVEIFEAGRDVVGSAEAGMDVTSVSEEFRADHPFLFFIKHNPT




NSILFFGRWMSP





Duck
SEQ ID NO:
MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPAEAVIGYSDLEGDFDVAVLPFSNSTNNGLL


construct
272
FINTTIASIAAKEEGVSLEKREAEAGSIGAASTEFCFDVFRELRVQHVNENIFYSPFSIISALAMV


(secretion

YLGARDNTRTQIDKVVHFDKLPGFGESMEAQCGTSVSVHSSLRDILTQITKPSDNFSLSFASRL


signal +

YAEETYAILPEYLQCVKELYKGGLESISFQTAADQARELINSWVESQINGIIKNILQPSSVDSQT


mature

TMVLVNAIYFKGMWEKAFKDEDTQAMPFRMTEQESKPVQMMYQVGSFKVAMVTSEKMKIL


protein)

ELPFASGMMSMFVLLPDEVSGLEQLESTISFEKLTEWTSSTMMEERRMKVYLPRMKMEEKYN




LTSVFMALGMTDLFSSSANMSGISSTVSLKMSEAVHAACVEIFEAGRDVVGSAEAGMDVTSV




SEEFRADHPFLFFIKHNPTNSILFFGRWMSP





Ovoglobulin
SEQ ID NO:
TRAPDCGGILTPLGLSYLAEVSKPHAEVVLRQDLMAQRASDLFLGSMEPSRNRITSVKVADL


G2
273
WLSVIPEAGLRLGIEVELRIAPLHAVPMPVRISIRADLHVDMGPDGNLQLLTSACRPTVQAQST




REAESKSSRSILDKVVDVDKLCLDVSKLLLFPNEQLMSLTALFPVTPNCQLQYLPLAAPVFSKQ




GIALSLQTTFQVAGAVVPVPVSPVPFSMPELASTSTSHLILALSEHFYTSLYFTLERAGAFNMTI




PSMLTTATLAQKITQVGSLYHEDLPITLSAALRSSPRVVLEEGRAALKLFLTVHIGAGSPDFQSF




LSVSADVTAGLQLSVSDTRMMISTAVIEDAELSLAASNVGLVRAALLEELFLAPVCQQVPAW




MDDVLREGVHLPHLSHFTYTDVNVVVHKDYVLVPCKLKLRSTMA*





Ovoglobulin
SEQ ID NO:
MDSISVTNAKFCFDVFNEMKVHHVNENILYCPLSILTALAMVYLGARGNTESQMKKVLHFDS


G3
274
ITGAGSTTDSQCGSSEYVHNLFKELLSEITRPNATYSLEIADKLYVDKTFSVLPEYLSCARKFYT




GGVEEVNFKTAAEEARQLINSWVEKETNGQIKDLLVSSSIDFGTTMVFINTIYFKGIWKIAFNT




EDTREMPFSMTKEESKPVQMMCMNNSFNVATLPAEKMKILELPYASGDLSMLVLLPDEVSGL




ERIEKTINFDKLREWTSTNAMAKKSMKVYLPRMKIEEKYNLTSILMALGMTDLFSRSANLTGI




SSVDNLMISDAVHGVFMEVNEEGTEATGSTGAIGNIKHSLELEEFRADHPFLFFIRYNPTNAILF




FGRYWSP*





β-ovomucin
SEQ ID NO:
CSTWGGGHFSTFDKYQYDFTGTCNYIFATVCDESSPDFNIQFRRGLDKKIARIIIELGPSVIIVEK



275
DSISVRSVGVIKLPYASNGIQIAPYGRSVRLVAKLMEMELVVMWNNEDYLMVLTEKKYMGK




TCGMCGNYDGYELNDFVSEGKLLDTYKFAALQKMDDPSEICLSEEISIPAIPHKKYAVICSQLL




NLVSPTCSVPKDGFVTRCQLDMQDCSEPGQKNCTCSTLSEYSRQCAMSHQVVFNWRTENFCS




VGKCSANQIYEECGSPCIKTCSNPEYSCSSHCTYGCFCPEGTVLDDISKNRTCVHLEQCPCTLN




GETYAPGDTMKAACRTCKCTMGQWNCKELPCPGRCSLEGGSFVTTFDSRSYRFHGVCTYILM




KSSSLPHNGTLMAIYEKSGYSHSETSLSAIIYLSTKDKIVISQNELLTDDDELKRLPYKSGDITIF




KQSSMFIQMHTEFGLELVVQTSPVFQAYVKVSAQFQGRTLGLCGNYNGDTTDDFMTSMDITE




GTASLFVDSWRAGNCLPAMERETDPCALSQLNKISAETHCSILTKKGTVFETCHAVVNPTPFY




KRCVYQACNYEETFPYICSALGSYARTCSSMGLILENWRNSMDNCTITCTGNQTFSYNTQACE




RTCLSLSNPTLECHPTDIPIEGCNCPKGMYLNHKNECVRKSHCPCYLEDRKYILPDQSTMTGGI




TCYCVNGRLSCTGKLQNPAESCKAPKKYISCSDSLENKYGATCAPTCQMLATGIECIPTKCES




GCVCADGLYENLDGRCVPPEECPCEYGGLSYGKGEQIQTECEICTCRKGKWKCVQKSRCSST




CNLYGEGHITTFDGQRFVFDGNCEYILAMDGCNVNRPLSSFKIVTENVICGKSGVTCSRSISIYL




GNLTIILRDETYSISGKNLQVKYNVKKNALHLMFDIIIPGKYNMTLIWNKHMNFFIKISRETQET




ICGLCGNYNGNMKDDFETRSKYVASNELEFVNSWKENPLCGDVYFVVDPCSKNPYRKAWAE




KTCSIINSQVFSACHNKVNRMPYYEACVRDSCGCDIGGDCECMCDAIAVYAMACLDKGICID




WRTPEFCPVYCEYYNSHRKTGSGGAYSYGSSVNCTWHYRPCNCPNQYYKYVNIEGCYNCSH




DEYFDYEKEKCMPCAMQPTSVTLPTATQPTSPSTSSASTVLTETTNPPV*





Lysozyme
SEQ ID NO:
KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGSTDYGILQINSR



276
WWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMNAWVAWRNRCKGTDVQ




AWIRGCRL*





Lysozyme
SEQ ID NO:
KVFGRCELAAAMKRHGLDNYRGYSLGNWVCVAKFESNFNTQATNRNTDGSTDYGILQINSR



277
WWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMSAWVAWRNRCKGTDVQA




WIRGCRL*





Lysozyme C
SEQ ID NO:
KVFERCELARTLKRLGMDGYRGISLANWMCLAKWESGYNTRATNYNAGDRSTDYGIFQINS


(Human)
278
RYWCNDGKTPGAVNACHLSCSALLQDNIADAVACAKRVVRDPQGIRAWVAWRNRCQNRD




VRQYVQGCGV*





Lysozyme C
SEQ ID NO:
KVFERCELARTLKKLGLDGYKGVSLANWLCLTKWESSYNTKATNYNPSSESTDYGIFQINSK


(Bos taurus)
279
WWCNDGKTPNAVDGCHVSCRELMENDIAKAVACAKHIVSEQGITAWVAWKSHCRDHDVSS




YVEGCTL*





Ovoinhibitor
SEQ ID NO:
IEVNCSLYASGIGKDGTSWVACPRNLKPVCGTDGSTYSNECGICLYNREHGANVEKEYDGEC



280
RPKHVMIDCSPYLQVVRDGNTMVACPRILKPVCGSDSFTYDNECGICAYNAEHHTNISKLHD




GECKLEIGSVDCSKYPSTVSKDGRTLVACPRILSPVCGTDGFTYDNECGICAHNAEQRTHVSK




KHDGKCRQEIPEIDCDQYPTRKTTGGKLLVRCPRILLPVCGTDGFTYDNECGICAHNAQHGTE




VKKSHDGRCKERSTPLDCTQYLSNTQNGEAITACPFILQEVCGTDGVTYSNDCSLCAHNIELG




TSVAKKHDGRCREEVPELDCSKYKTSTLKDGRQVVACTMIYDPVCATNGVTYASECTLCAH




NLEQRTNLGKRKNGRCEEDITKEHCREFQKVSPICTMEYVPHCGSDGVTYSNRCFFCNAYVQ




SNRTLNLVSMAAC*





Cystatin
SEQ ID NO:
MAGARGCVVLLAAALMLVGAVLGSEDRSRLLGAPVPVDENDEGLQRALQFAMAEYNRASN



281
DKYSSRVVRVISAKRQLVSGIKYILQVEIGRTTCPKSSGDLQSCEFHDEPEMAKYTTCTFVVYS




IPWLNQIKLLESKCQ*





Porcine
SEQ ID NO:
SEVCFPRLGCFSDDAPWAGIVQRPLKILPWSPKDVDTRFLLYTNQNQNNYQELVADPSTITNS


Lipase
282
NFRMDRKTRFIIHGFIDKGEEDWLSNICKNLFKVESVNCICVDWKGGSRTGYTQASQNIRIVG




AEVAYFVEVLKSSLGYSPSNVHVIGHSLGSHAAGEAGRRTNGTIERITGLDPAEPCFQGTPELV




RLDPSDAKFVDVIHTDAAPIIPNLGFGMSQTVGHLDFFPNGGKQMPGCQKNILSQIVDIDGIWE




GTRDFVACNHLRSYKYYADSILNPDGFAGFPCDSYNVFTANKCFPCPSEGCPQMGHYADRFP




GKTNGVSQVFYLNTGDASNFARWRYKVSVTLSGKKVTGHILVSLFGNEGNSRQYEIYKGTLQ




PDNTHSDEFDSDVEVGDLQKVKFIWYNNNVINPTLPRVGASKITVERNDGKVYDFCSQETVR




EEVLLTLNPC*





Kid Lipase
SEQ ID NO:
GLVAADRITGGKDFRDIESKFALRTPEDTAEDTCHLIPGVTESVANCHFNHSSKTFVVIHGWTV



283
TGMYESWVPKLVAALYKREPDSNVIVVDWLSRAQQHYPVSAGYTKLVGQDVAKFMNWMA




DEFNYPLGNVHLLGYSLGAHAAGIAGSLTSKKVNRITGLDPAGPNFEYAEAPSRLSPDDADFV




DVLHTFTRGSPGRSIGIQKPVGHVDIYPNGGTFQPGCNIGEALRVIAERGLGDVDQLVKCSHER




SVHLFIDSLLNEENPSKAYRCNSKEAFEKGLCLSCRKNRCNNMGYEINKVRAKRSSKMYLKT




RSQMPYKVFHYQVKIHFSGTESNTYTNQAFEISLYGTVAESENIPFTLPEVSTNKTYSFLLYTEV




DIGELLMLKLKWISDSYFSWSNWWSSPGFDIGKIRVKAGETQKKVIFCSREKMSYLQKGKSPV




IFVKCHDKSLNRKSG*





Porcine
SEQ ID NO:
APKKGVRWCVISTAEYSKCRQWQSKIRRTNPMFCIRRASPTDCIRAIAAKRADAVTLDGGLVF


Lactoferrin
284
EADQYKLRPVAAEIYGTEENPQTYYYAVAVVKKGFNFQLNQLQGRKSCHTGLGRSAGWNIPI




GLLRRFLDWAGPPEPLQKAVAKFFSQSCVPCADGNAYPNLCQLCIGKGKDKCACSSQEPYFG




YSGAFNCLHKGIGDVAFVKESTVFENLPQKADRDKYELLCPDNTRKPVEAFRECHLARVPSH




AVVARSVNGKENSIWELLYQSQKKFGKSNPQEFQLFGSPGQQKDLLFRDATIGFLKIPSKIDSK




LYLGLPYLTAIQGLRETAAEVEARQAKVVWCAVGPEELRKCRQWSSQSSQNLNCSLASTTED




CIVQVLKGEADAMSLDGGFIYTAGKCGLVPVLAENQKSRQSSSSDCVHRPTQGYFAVAVVRK




ANGGITWNSVRGTKSCHTAVDRTAGWNIPMGLLVNQTGSCKFDEFFSQSCAPGSQPGSNLCA




LCVGNDQGVDKCVPNSNERYYGYTGAFRCLAENAGDVAFVKDVTVLDNINGQNTEEWARE




LRSDDFELLCLDGTRKPVTEAQNCHLAVAPSHAVVSRKEKAAQVEQVLLTEQAQFGRYGKD




CPDKFCLFRSETKNLLFNDNTEVLAQLQGKTTYEKYLGSEYVTAIANLKQCSVSPLLEACAFM




MR*





Bovine
SEQ ID NO:
APRKNVRWCTISQPEWFKCRRWQWRMKKLGAPSITCVRRAFALECIRAIAEKKADAVTLDG


Lactoferrin
285
GMVFEAGRDPYKLRPVAAEIYGTKESPQTHYYAVAVVKKGSNFQLDQLQGRKSCHTGLGRS




AGWIIPMGILRPYLSWTESLEPLQGAVAKFFSASCVPCIDRQAYPNLCQLCKGEGENQCACSSR




EPYFGYSGAFKCLQDGAGDVAFVKETTVFENLPEKADRDQYELLCLNNSRAPVDAFKECHLA




QVPSHAVVARSVDGKEDLIWKLLSKAQEKFGKNKSRSFQLFGSPPGQRDLLFKDSALGFLRIP




SKVDSALYLGSRYLTTLKNLRETAEEVKARYTRVVWCAVGPEEQKKCQQWSQQSGQNVTC




ATASTTDDCIVLVLKGEADALNLDGGYIYTAGKCGLVPVLAENRKSSKHSSLDCVLRPTEGYL




AVAVVKKANEGLTWNSLKDKKSCHTAVDRTAGWNIPMGLIVNQTGSCAFDEFFSQSCAPGA




DPKSRLCALCAGDDQGLDKCVPNSKEKYYGYTGAFRCLAEDVGDVAFVKNDTVWENTNGE




STADWAKNLNREDFRLLCLDGTRKPVTEAQSCHLAVAPNHAVVSRSDRAAHVKQVLLHQQA




LFGKNGKNCPDKFCLFKSETKNLLFNDNTECLAKLGGRPTYEEYLGTEYVTALANLKKCSTSP




LLEACAFLTR*





















TABLE 7







Miscellaneous










Sequence Info
SEQ ID NO:
Amino acid sequence





CCW12 homolog
SEQ ID NO:
MFEKSKFVVSFLLLLQLFCVLGVHGQESGNGTTSDTAYACDIGATPFDGFNATIYQYQAS


GQ68_01574
286
DDNSIQDPVFMSTGYLQRNQLHSTTGVTNPGFNIFTAGVATTTLYGIPNVNYQNMLLELK


(chr1)

GYFRADASGNYGLSLRNIDDSAILFFGRETAFECCNENLIPLDEAPTDYSLFTIKEGEASTN




PDSYTYTQYLEAGRYYPVRTFFANIRTRAVFNFTMTLPDGSELTDFQNYIFQFGALNQQQ




CQAEIVTRENYTTTTEPWTGTFEATTTVIPSGTEPGTVIVQTPYSTIDSTSTWTGTFTTFTTD




ADGSTIAVVPSSTIDDHFASTETVLTDTAISTTVITVTSCGTSKCTKTTALTGVTQRTLTIDD




RTTVVTTYCPLPTDVATIKTASVSGSEVVQTIYTAKHSQAVSYVHPSTVTITREVCDAQTC




TQATIVTGEILQTTVVDSGSTTVVPKYVPVETHEPTFELSTL





CCW14 homolog
SEQ ID NO:
MQFTFASTSVVVSLIAALAKPAVATPPACLLACAAEVVKESSDCDALNNIQCICENEGSAI


GQ68_01658
287
HACLESTCPDGLSSTALQSFEDVCESVGTEANLDESSSSQSSSSSSSSESSSSSVSSSSSSASS


(PAS_chr1-4_0510)

SSETSSSVTSSSVTSSSTAVSSSTESSSSVEPSTSHSSSHSSSEVSSTVAPTTSVAPTTSSITT

SSTSLTSATTSSVTISIEPTSDAADKVIIPGLAGLVGALAVGLI





CCW22 homolog
SEQ ID NO:
MQYRSLFLGSALLAAANAAVYNTTVTDVVSELETTVLTITSCAEDKCITSKSTGLITTSTL


GQ68_02511
288
TKHGVVTVVTTVCDLPSTTKSYVPPAKTTTIPPPEKTTTTVPPPAKTTTTVPPPAKTTSTVP


(chr1)

PPAKTSSHHESTITVTVPSSTSTKKIETESTTYHFVTQTTTARNITPPAITTQSHGAAGMNA




ANFVGLGAAAVAAAALVL





CCW22 homolog GQ68_03003 (chr3)
SEQ ID NO: 289
MSLLLFLVLGAFLLSSVKAADIGAFRLRVYTPGRFTNGALNFNNWGYQYLDASSSNGQL
FAGYATVTSVTTFLAPDDEGFVWGSSLGGYPGFLGIGAGATAFHLTGIPGDALSWYIEDN

ILKTSSPTYVCSRNDGDVVVGIEANTRWLAMHDTSQLPPNYYCFQADYEIVALWYIPDTT




STWTGTETSTTTDDDGSVIELVPTPLPDTTSTWTGTFTTFTTDDDGSVIELVPTPLPDSTST




WTGTYTTFTTDEDGSTIAVVPSSTIDSTSTWTGTYTTFTTDEDGSTIAVVPSSTIDSTSTWT




GTYTTFTTDEDGSTIAVYHHLLSTPHPPGLVLTPRSLPMRMEVLLLWYHHLLSTLHPPGL




VLTPRSLPMRMEVLLLWYHRLLSTPHPGLVLTPRSLPMRMEVLLLYHHLLSTPHPPGLVL




TPRSLPMRMEVLLLWY





FLO5 homolog GQ68_04296 (chr4)
SEQ ID NO: 290
MKLQLQSFVFFLLSAVNVLADDSYGCSIATSPRSTGFVANLYEFPNMAISNAELKTYVRY
RYKEGRLYDTISNIISPYFYYQGQGANSAYGTLYGRPNVYLYNFSMELKGYFRPPITGQY

TIDFNGANVDDAAMVFFGKAGAFDCCNSDYILPEQSAEYSLYSVYPHTATDQILSATIYL




EAGKYYPLRVTYTNIGNIGSLDLRVVLPSGASITSLGAFVYQFPNNLSPGTCTPDVEYFTT




TTQAWTGTYETTYTVPPSGTQPGTVIIETPESYVTTTQPWTGTYETTYTVPPTGTEPGTVII




ETPESYVTTTQPWTGTYETTYTVPPSGTEPGTVIIETPESYVTTTQPWTGTYETTYTVPPSG




TEPGTVIIETPESYVTTTQPWTGTYETTYTVPPSGTEPGIVIIETPESYVTTTQPWTGTYETT




YTVPPSGTEPGTVVIETPEITDCEAVCCGAVPTSDPLRRRDVCDCETFCCPGDTNCETYVT




TTQPWTGTYETTYTVPPSGTEPGTVIIETPESYVTTTQPWTGTYETTYTVPPSGTEPGIVIIE




TPESYVTTTQPWTGTYETTYTVPPTGTEPGTVIIETPESYVTTTQPWTGTYETTYTVPPSGT




EPGIVIIETPESYVTTTQPWTGTYETTYTVPPSGTEPGTVIIETPESYVTTTQPWTGTYETTY




TVPPTGTEPGTVIIETPESYVTTTQPWTGTYETTYTVPPSGTEPGIVIIETPESYVTTTQPWT




GTYETTYTVPPTGTEPGTVIIETPESYVTTTQPWTGTYETTYTVPPTGTEPGTVIIETPESYV




TTTQPWTGTYETTYTVPPSGTEPGTVIIETPESYVTTTQPWTGTYETTYTVPPSGTEPGTV




VIETPEITDCEAVCCGAVPTSDPLRRRDVCDCETFCCPGDTNCETYVTTTQPWTGTYETT




YTVPPSGTEPGTVIIETPESYVTTTQPWTGTYETTYTVPPTGTEPGTVIIETPESYVTTTQPW




TGTYETTYTVPPSGTQPGTVIIETPESYVTTTQPWTGTYETTYTVPPTGTEPGTVIIETPESY




VTTTQPWTGTYETTYTVPPSGTEPGTVIIETPESYVTTTQPWTGTYETTYTVPPSGTQPGT




VIIETPESYVTTTQPWTGTYETTYTVPPTGTEPGTVIIETPESYVTTTQPWTGTYETTYTVP




PSGTEPGIVIIETPESYVTTTQPWTGTYETTYTVPPTGTEPGTVIIETPESYVTTTQPWTGTY




ETTYTVPPTGTEPGTVIIETPESYVTTTQPWTGTYETTYTVPPSGTEPGTVIIETPESYVTTT




QPWTGTYETTYTVPPSGTEPGTVVIETPEITDCEAVCCGAVPTSDPLRRRDVCDCETFCCP




GDTNCETYVTTTQPWTGTYETTYTVPPSGTEPGTVIIETPESYVTTTQPWTGTYETTYTVP




PTGTEPGTVIIETPESYVTTTQPWTGTYETTYTVPPSGTQPGTVIIETPESYVTTTQPWTGT




YETTYTVPPTGTEPGTVIIETPESYVTTTQPWTGTYETTYTVPPSGTEPGTVIIETPESYVTT




TQPWTGTYETTYTVPPSGTQPGTVIIETPESYVTTTQPWTGTYETTYTVPPTGTEPGTVIIE




TPESYVTTTQPWTGTYETTYTVPPSGTEPGIVIIETPESYVTTTQPWTGTYETTYTVPPTGT




EPGTVIIETPESYVTTTQPWTGTYETTYTVPPSGTEPGTVIIETPESYVTTTQPWTGTYETT




YTVPPSGTQPGTVIIETPESYVTTTQPWTGTYETTYTVPPSGTEPGTVIVETPDVPGSYVTT




TQPWTGTYETTHTVPPTGTEPGTVVVETPDVPGSYVTTTQPWTGTYETTHTVPPTGTEPG




TVVVETPDVPGSYVTTTQPWTGTYETTYTVPPSGTEPGTVIVETPDVPGSYVTTTQPWTG




TYETTHTVPPTGTEPGTVVVETPDVPGSYVTTTQPWTGVYKTTYTVPPSGTIPGTVIIETPF




GYFNTSSISTKTDKRTITSVVPCSQCSESKTQYITPTGPGDVTVIISQPPSKITLSSPEDKTKT




DFITSTGSIGGGSPPSHPNDKPGIITTPTQPIGGGNPSDIPSAISSVSSGGNSRASVPSFSTSS




AISVQVSSLYDENSGSTFEVSLLFSVVSGFFLTLMV





FLO5 homolog GQ68_03011 (PAS_chr3_1145)
SEQ ID NO: 291
MKFPVPLLFLLQLFFIIATQGDESGNGDESDTAYGCDITSNAFDGFDATIYEYNANDLKLI
RDPVFMSTGYLGRNVLNKISGVTVPGFNIWNPRSRTATVYGVQNVNYYNMVLELKGYF

KAAVSGDYKLTLSNIDDSSMLFFGKNTAFQCCDTGSIPVDQAPTDYSLFTIKPSNQVNSEV




ISSTQYLEAGKYYPVRIVFVNALERALFNFKLTIPSGTVLDDFQDYIYQFGALDENSCYET




TVSKITEWTTYTTPWTGTFETTRTITPTGTEGTVVIETPESYVTTTQPWTGTYETTYTVPPT




GTEPGTVIIETPEIIDCEAVCCGPFLTAFSFRKREECQCENICCPGDTNCETYVTTTQPWTG




TYETTYTVPPTGTEPGTVIIETPESYVTTTQPWTGTYETTYTVPPTGTEPGTVIIETPESYVT




TTQPWTGTYETTYTVPPSGTEPGTVVIETPEIVDCEAYCCASVAIKKRELCQCENFCCSW




DQSCQTYVTTTQPWTGTYETTYTVPPTGTEPGTVIIETPESYVTTTQPWTGTYETTYTVPP




TGTEPGTVIIETPESYVTTTQPWTGTYETTYTVPPTGTEPGTVIIETPEIIDCEAVCCGPFLT




AFSFRKREECQCENICCPGDTNCETYVTTTQPWTGTYETTYTVPPTGTEPGTVIIETPESYV




TTTQPWTGTYETTYTVPPTGTEPGTVIIETPESYVTTTQPWTGTYETTYTVPPTGTEPGTVI




IETPEIINCEAVCCGPFLTAFSFRKREECQCENICCPGDTNCETYVTTTQPWTGTYETTYTV




PPTGTEPGTVIIETPESYVTTTQPWTGTYETTYTVPSTGTEPGTVIIETPESYVTTTQPWTGT




YETTFTVPPTGTEPGTVVIETPESYVTTTQPWTGTYETTYSVPPSGTEPGTVVIETPEASTA




RTKFTTVTSSWTGVFTTTKTLPASGTEPATIVIQTPTGYFNTSSLVSTRTKTNVDTVTRVIP




CPICTAPKTITVVPEEPNESVSVIISQPQSSSTDTTLSKPDSVRVISQPETASQMDTSLSKTDS




AVISTETAGNNIIPLAGSHSYNTIVTTVTDSPQVAQSTTATSSSNVHLTISTQTTTPSLVYSS




SLSTVHQVSPSNGGFRSSITVHPLLSVIGAIFGALFM





FLO5 homolog GQ68_03079 (chr3)
SEQ ID NO: 292
MTKFTILLLVLLKFYSILAIEVDGSANGQPLAHPIVVEVHEATKWITHTSPWTGTPEAIRT
VTGETPYEQKIARYDEFNPRLANREIIDCVAFCCGDATSSPSITEPESTATELPESYVTINRP

WSLSWIPDVPPGSPYWSTSTIPPSGTEPGTVIIYFYLYDDARKRREINFGSTQPYHGRPKLL




GSIEKRELCQCDAVCCLGDLSCEVYVTTTQPWTGTYETTYTITPTGSEPGTVIIETPELYVT




TTQPWTGTYETTYTITPTGSEPGTVIIETPESYVTTTQPWTGTYETTYTITPTGSEPGTVIIET




PESYVTTTQPWTGTYETTYTITPTGSEPGTVIIETPESYVTTTQPWTGTYETTYTVPPSGTE




PGAVIIETPELYVTTTQPWTGTYETTYTITPTGSEPGTVIIETPESYVTTTQPWTGTYETTYT




VPPSGTEPGTVIIETPELYVTTTQPWTGTYETTYTITPTGSEPGTVIVEIPVSYVNSTQISTST




YDTTDTVLSSGVEPGTIAIETPIVYLNTSVSAFSRPWTKIDTVTQFSSCAVCSKPETITVTPE




NPIDTVTIIISQPQSTSQSNTPTSFKANSTSAFSRFDEDSIPVFGSYSYEITVNIDVNTEDDTT




TNLNADTTIIIGSLSAIRTVAGSSSNYHASNISPTINSQKTASSVVVHSDSSATVYQFSPSNG




APWLSVQISTLLSVVGTLLAAVLL





FLO5 homolog GQ68_04277 (chr4)
SEQ ID NO: 293
MNFRYLLILPIYASIVLGQVGDFQLLLNAKEPIRNSPSLLSSNYGNLTLPAMANGALESHF
DYGNAYVGDDQITVVYHLPDEHGQINAYRQDTDEYIGYLGLVTDDYGEYTYLSVIMPG

VQYDQTTSVNWYIENEELKSTSINVQPLLGCYYKNPPQYSWYWASIDEPGNIASSNFVCE




PCKVYVDFVPTPSADITSMWTGSETSWTTDADGTVIELVPTPSADATSVWTGDHTTWTT




DDDGNVIEQIPTPSADITSMWTGSETSWTTDADGTVIELVPTPSADITSMWTGSETSWTTD




ADGTVIELVPTPSADATSVWTGDHTTWTTDDDGNVIEQIPTPSADITSMWTGSETSWTTD




ADGTVIELVPTPSADATSVWTGDHTTWTTDDDGNVIEQIPTPSADITSMWTGSETSWTTD




ADGTVIELVPTPSADITSMWTGSETSWTTDADGTVIELVPTPSADATSVWTGDHTTWTTD




DDGNVIEQIPTPSADITSMWTGSETSWTTDADGTVIELVPTPSADITSMWTGSETSWTTDA




DGTVIELVPTPSADITSMWTGSETSWTTDADGTVIELVPTPSADATSVWTGDHTTWTTDD




DGNVIEQIPTPSADITSMWTGSETSWTTDADGTVIELVPTPSADITSMWTGSETSWTTDAD




GTVIELVPTPSADITSMWTGSETSWTTDADGTVIELVPTPSADITSMWTGSETSWTTDAD




GTVIELVPTPSADITSMWTGSETSWTTDADGTVIELVPTPSADITSMWTGSETSWTTDAD




GTVIELVPTPSADATSVWTGDHTTWTTDDDGNVIEQIPTPSADITSMWTGSETSWTTDAD




GTVIELVPTPSADITSMWTGSETSWTTDADGTVIELVPTPSADATSVWTGDHTTWTTDDD




GNVIEQIPTPSADITSMWTGSETSWTTDADGTVIELVPTPSADATSVWTGDHTTWTTDDD




GNVIEQIPTPSADITSMWTGSETSWTTDADGTVIELVPTPSADITSMWTGSETSWTTDADG




TVIELVPTPSADITSMWTGSETSWTTDADGTVIELVPTPSADATSVWTGDHTTWTTDDDG




NVIEQIPTPSADITSMWTGSETSWTTDADGTVIELVPTPSADITSMWTGSETSWTTDADGT




VIELVPTPSADATSVWTGDHTTWTTDDDGNVIEQIPTPSADITSMWTGSETSWTTDADGT




VIELVPTPSADITSMWTGSETSWTTDADGTVIELVPTPSADITSMWTGSETSWTTDADGT




VIELVPTPSADATSVWTGDHTTWTTDDDGNVIEQIPTPSADITSMWTGSETSWTTDADGT




VIELVPTPSADITSMWTGSETSWTTDADGTVIELVPTPSADITSMWTGSETSWTTDADGT




VIELVPTPSADATSVWTGDHTTWTTDDDGNVIEQIPTPSADITSMWTGSETSWTTDADGT




VIELVPTPSADITSMWTGSETSWTTDADGTVIELVPTPSADATSVWTGDHTTWTTDDDG




NVIEQIPTPSADITSMWTGSETSWTTDADGTVIELVPTPSADITSMWTGSETSWTTDADGT




VIELVPTPSADITSMWTGSETSWTTDADGTVIELVPTPSADATSVWTGDHTTWTTDDDG




NVIEQIPTPSADITSMWTGSETSWTTDADGTVIELVPTPSADITSMWTGSETSWTTDADGT




VIELVPTPSADITSMWTGSETSWTTDADGTVIELVPTPSADATSVWTGDHTTWTTDDDG




NVIEQIPTPSADITSMWTGSETSWTTDADGTVIELVPTPSADITSMWTGSETSWTTDADGT




VIELVPTPSADITSMWTGSETSWTTDADGTVIELVPTPSADTTSVWTGSYTTWTTDEDGT




VIEQVPTPSADTPSADTTSVWTGSYTTWTTDEDGTVIEQVPTPSADTTSVWTGSYTTWTT




DEDGTVIEQVPTPSADTPSADTTSVWTGSYTTWTTEVGDGGSSTVVELVPTESSTSTNVM




QTPVPSSGVSDGVSVFNGFNVEVFHYPADNYELANEISFLSYGYENLGLVTTVTGVSDIN




FDTDSNWPYYIDRDALGNTGSYVNATIEYEGFFRAPVDGEYVFSFSSTDYNSILFVGSPAA




ADQALQKREVQFLKPETSPDYVLLFNNTRDLGKTVSTTQYLLADQYYPLRVVIAAISQH




ALLDFQIKLPNGASLTQYQGYVYNFALEGSESTTVIGDKTSTWTGSYTTWTTDSDGSTIV




VVPPATITADKTSTWTGSYTTWTTDSDGSTVVICPSITSDHNDKPSESTLTDSSISTTVVTV




TSCDIEKCTKTTALTGVRETTLTTGGTTTVVTTYCPLPTDIVTVKTTSIDGSEVLQTIYTAK




PNHVVPDVQTSTVTITREVCDAFTCTHATIVTGEILKTTTLADTHYTTVVPVYVPLETYQP




AVELSTLETVLKSSDLASGPVVTAGSVQPSYQSGGVAESSLTVSEFEAHSTSDTVSQPSTIS




LQTGEANALKWSSFFGAALVPLVNVFFV





FLO5 homolog GQ68_01371 (chr1)
SEQ ID NO: 294
MQNTNDKLIIRTFYSISTIHGLLSINIFSDTRVYKFAIYSTDAVSLEPRTKNNMSLVTVLACF
IIFAAHAFGQDTFYMLKVRTLTPNGYPLADSLSNPMQYWDLYYVPGGPRRLESSFVNWQ

PTTAAPINQFYCRLGTDGHMTGYNRVTGSVIGKLSFGTNAATALAFGSYDGDPSYPPQAF




SISSSVSGTMTYLNVHYVNARSITWYSTTTATGETNVYINVASTGYTGDRTTYQAELWV




EPFVPNIPVDTTTSIWTGSQTSYTTEVGENGGSTVIELIPTPPADATSTWTGTYTTRTTDAD




GSVIEQIPTPSADTTSVWTGTYTTWTTDADGSVIEQIPTPSADTTSVWTGTYTTWTTDAD




GSVIEQIPTPSADTTATWTGTETSYTTDVGEDGSSTVIELVPTPSADTTATWTGTETSYTT




DVGEDGSSTVVELVPTPSADTTATWTGTETSYTTDVGEDGSSTVIELVPTPSADTTATWT




GTETSYTTDVGEDGSSTVIELVPTPSADTTATWTGTETSYTTDVGEDGSSTVVELVPTPSA




DTTATWTGTETSYTTDVGEDGSSTVIELVPTPSADTTATWTGTETSYTTDVGEDGSSTVIE




LVPTPSADTTATWTGTETSYTTDVGEDGSSTVIELVPTPSADTTATWTGTETSYTTDVGE




DGSSTVIELVPTPSADTTATWTGTETSYTTDVGEDGSSTVIELVPTPSADTTATWTGTETS




YTTDVGEDGSSTVIELVPTPSADTTATWTGTETSYTTDVGEDGSSTVIELVPTPSADTTAT




WTGTETSYTTDVGEDGSSTVIELVPTPSADTTATWTGTETSYTTDVGEDGSSTVIELVPTP




SADTTATWTGTETSYTTDVGEDGSSTVIELVPTPSADTTATWTGTETSYTTDVGEDGSST




VIELVPTPSADTTATWTGTETSYTTDVGEDGSSTVIELVPTPTPSADTTATWTGTETSYTT




DVGEDGSSTVIELVPTPSADTTATWTGTETSYTTDVGEDGSSTVIELVPTPSADTTATWTG




TETSYTTDVGEDGSSTVIELVPTPSADTTATWTGTETSYTTDVGEDGSSTVIELVPTPTPSA




DTTATWTGTETSYTTDVGEDGSSTVIELVPTPSADTTATWTGTETSYTTDVGEDGSSTVIE




LVPTPSADTTATWTGTETSYTTDVGEDGSSTVVELVPTPTPSADTTATWTGTETSYTTDV




GEDGSSTVVELVPTPSADTTATWTGTETSYTTDVGEDGSSTVIELVPTPSADTTATWTGT




ETSYTTDVGEDGSSTVVELVPTPSADTTATWTGTETSYTTDVGEDGSSTVVELVPTPTAD




TTATWTGTETSYTTDVGEDGSSTVIELVPTPSADTTATWTGTETSYTTDVGEDGSSTVVE




LVPTPSADTTATWTGTETSYTTDVGEDGSSTVVELVPTPSADTTATWTGTETSYTTDVGE




DGSSTVIELVPTPSADTTATWTGTETSYTTDVGEDGSSTVVELVPTPSADTTATWTGTETS




YTTDVGEDGSSTVVELVPTPTADTTATWTGTETSYTTDVGEDGSSTVIELVPTPSADTTA




TWTGTETSYTTDVGEDGSSTVVELVPTPSADTTATWTGTETSYTTDVGEDGSSTVIELVP




TPSADTTATWTGTETSYTTDVGEDGSSTVIELVPTPSADTTATWTGTETSYTTDVGEDGS




STVIELVPTPTPSADTTATWTGTETSYTTDVGEDGSSTVIELVPTPSADTTATWTGTETSY




TTDVGEDGSSTVIELVPTPTPSADTTATWTGTETSYTTDVGEDGSSTVVELVPTPSADTTA




TWTGTETSYTTDVGEDGSSTVIELVPTPSADTTATWTGTETSYTTDVGEDGSSTVVELVP




TPSADTTATWTGTETSYTTDVGEDGSSTVIELVPTPSADTTATWTGTETSYTTDVGEDGS




STVIELVPTPSADTTATWTGTETSYTTDVGEDGSSTVIELVPTPSADTTATWTGTETSYTT




DVGEDGSSTVIELVPTPSADTTATWTGTETSYTTDVGEDGSSTVIELVPTPSADTTATWTG




TETSYTTDVGEDGSSTVIELVPTPSADTTATWTGTETSYTTDVGEDGSSTVIELVPTPSAD




TTATWTGTETSYTTDVGEDGSSTVIELVPTPSADTTATWTGTETSYTTDVGEDGSSTVIEL




VPTPSADTTATWTGTETSYTTDVGEDGSSTVIELVPTPSADTTATWTGTETSYTTDVGED




GSSTVVELVPTPTPSADTTATWTGTETSYTTDVGEDGSSTVIELVPSDTETATNIVETPVPS




SGVSDGVSVFDGFNVEVFHYPADNYELANEIGFLSYGYENLGLVTNATGVSDINFDTDSN




WPYYIDRDALGNTGSYVNATIEYEGFFRAPVDGEYVFSFSNTDYNSILFVGSPAAAGQAL




QKRRVQFLKPETSPDHVLLFNNTRDLGQTISTTQYLLADQYYPLRVVIAAISQHALLDFQI




KLPNGALLTQYQGYVYNFALEGSESTTVIGDKTSTWTGSYTTWTTDSDGSTVVVVPSATI




TADKTSTWTGSYTTWTTDSDGSTIVICPSITSDHNDKPSESTLTDGSISTTVVTVTSCDIEK




CTKTTALTGVTETTLTTGGTTTVVTTYCPLPTDIVTVKTTSISGSEVLQTIYTAKPSHVVPN




VHTLTVTITREVCDAFTCTQATIVTGEILKTTTLADTHSTTVVPVYVPLESYQSAVELSTL




ETVLKSSDFASGSAVTAGSAQPSYQSGGVAESSLTGSELEAHSTSDTVSQPSTISPQTGEA




NALRWSSFFGAALVPLVNVFFV





FLO5 homolog GQ68_04678 (PAS_chr4_0363)
SEQ ID NO: 295
MTKLTILLSVLLQLFSVLAEVPKKTEWSSHTTYWTSTLEALRTVTPTGTERAVIGEAPYE
YKLIGNDQFDPGLNAKREIIDCEAVCCGAVPTSDPLKRRDVCECENVCCPGDDCETYVTT

TQPWTGTYETTYTVPPSGTEPGTVVIETPEITDCEAVCCGAVPTSDPLRRRDVCECENVCC




PGDDCETYVTTTQPWTGTYETTYTVPPSGTEPGTVVIETPEITDCEAVCCGAVPTSDPLRR




RDVCECENVCCPGDDCETYVTTTQPWTGTYETTYTVPPTGTEPGTVVIETPVTYVTTTQP




WTGTYETTYTVPPTGTEPGTVVIETPEITDCEAVCCGAVPTSDPLRRRDVCECENVCCPG




DDCETYVTTTQPWTGTYETTYTVPPTGTEPGTVVIETPVTYVTTTQPWTGTYETTYTVPP




TGTEPGTVVIETPVTYVTTTQPWTGTYETTYTVPPTGTEPGTVVIETPVTYVTTTQPWTGT




YETTYTIPPTGTEPGTVVIETPEITDCEAVCCGAVPTSDPLRRRDVCECENVCCPGDDCET




YVTTTQPWTGTYETTYTVPPTGTEPGTVVIETPVTYVTTTQPWTGTYETTYTVPPTGTEP




GTVVIETPVTYVTTTQPWTGTYETTYTVPPTGTEPGTVVIETPVTYVTTTKPWTGTYETT




HTVPASGTEPGTVIIETPIKYLNTSISASTSTWTKINTVTQFISCPVCTIPKTITVTPKISNE




TVTIIISQPHGTSSRTTTVVKTDGASVSSHSYKTALTTDVKPEEKTSTKLGTVTTVSGSHSAID




TVTGSLSDYHASSIPHTVKSEEKASSTVTHTISSSTVYQVSPSNGASWLSVRLNTALSIIGT




LFAAVFI





FLO5 homolog GQ68_04282 (chr4)
SEQ ID NO: 296
MSKTKNGGSEFVHIAYVFHIEASTPSDYINMIQIVLFPHQAQITKRMNLVTLLVCNLLCVS
LTLGQGVYRLKFPALVVTGRESVGTTVVNYDFLVGNTGQYGDLGEFFYDGEPYYCWNS

TDSQPLSCSSSSSLLISTQNVTISHPDEDGTVYAYAERDGGLLGRFTVGSVSADWPQWAVI




VYSTSSSAHPSSWYVDDNKLKLTSGLGPNNSTTLQACYFTQSSGRDRYAISLEGSPAYTG




QVSCQATEFDLEFIPPSADTTSIWDGSYTTWTTDSNGIVVEQIPTPSADTTSIWTGSETSWT




TDSDGTVIELVPTPSADATSIWTGDHTTWTTDSEGNVIEQIPTPSADTTSIWTGSETSWTT




DSDGTVIELVPTPSADATSIWTGDHTTWTTDSEGNVIEQIPTPSADATSIWTGDHTTWTTD




REGNVIEQIPTPSADTTSIWTGSETSWTTDSDGTVIELVPTPSADATSIWTGDHTTWTTDSE




GNVIEQIPTPSADATSIWTGSETSWTTDSDGTVIELVPTPSADATSIWTGDHTTWTTDSEG




NVIEQIPTPSADATSIWTGDHTTWTTDSEGNVIEQIPTPSADTTSIWTGSETSWTTDSDGTV




IELVPTPSADATSIWTGDHTTWTTDSEGNVIEQIPTPSADATSIWTGDHTTWTTDSEGNVI




EQIPTPSADTTSIWTGSETSWTTDSDGTVIELVPTPSADATSIWTGDHTTWTTDSEGNVIE




QIPTPSADATSIWTGDHTTWTTDSEGNVIEQIPTPSADTTSIWTGSETSWTTDSDGTVIELV




PTPSADATSIWTGDHTTWTTDSEGNVIEQIPTPSADATSIWTGSETSWTTDSDGTVIELVPT




PSADATSIWTGDHTTWTTDSEGNVIEQIPTPSADATSIWTGDHTTWTTDSEGNVIEQIPTPS




ADTTSIWTGDHTTWTTEVGGDGSSIVVELVPSETGTATNVVQTPVPSSGISDGVSALDGF




NVEVFHYPADNYELANEISFLSYGYENLGLVTTATGVSDINFDTDSNWPSYIDRNALGNT




GSYVNATIKYEGFFRAPVDGDYEFSFSNIDYNSILFVGSAAADQALRKREAQFLKPETSPN




HILFFNNSRDVGQTISTTQYLSADSYYPLRVVIAAVSQHALLDFQIKLPNGVSLTQFQGYV




YNFALEGAESTTVIGDKTSTWTGTYTTWTTDSEGSTIVLCPSIISDHNGKPADTTLTDGSIS




TTVVTVTSCDIKKCTKTTALTGVTQKTLTVKGTTTVVTAYCPLPTDVATVKTISVGGSEV




LQTVYTAKPSHIVPDVQTLTVTITREVCDALTCIPATIVTGEILKTTTLADTHSTTVIPVYV




PLETHQPALDLITLETVLKSSDFANGPAITSVSVESLSHQSGVVVSEFDSDSTSGAVSQPSS




AVSLQTGKASALKWSPFLGAAVISLFNVFFV





FLO5 homolog GQ68_03013 (PAS_chr3_0015)
SEQ ID NO: 297
MNLFTILAWGFLYVPLVLGEGYYSLNFDARVPIALGILGSSYQKYTIMADRSLLGGSNIDL
DVTFSGIIELLTNRVHIVVSLPDADGRVSVYDMYSGTSLGYLSFVCSLTTCEVHAVSSSSG

ATTWTLDGNQLIPTSPSTVYACYRSLVGLLAQYTLNDRTSITAQCEQTNLYVELAIPAFPE




TTAVWTGTYTTWTTDESGSVIEQMPTPSADTTTTWTGTYTTWTTDADGSVIEQIPTPPAD




TTSVWTGTYTTRTTDADGSVIEQIPTPSADTTSIWTGTYTTWTTDADGSVIEQIPTPSADTT




SVWTGTYTTWTTDADGSVIEQIPTPSADTTSVWTGTYTTWTTDADGSVIEQIPTPSADTTS




VWTGTYTTWTTDADGSVIEQIPTPSTDTTLAPSADTTSIWTGTYTTWTTDADGSVIEQIPT




PSADTTSIWTGTYTTWTTDADGSVIEQIPTPSADTTSVWTGTYTTWTTDADGSVIEQIPTP




STDTTLAPSADTTSIWTGTYTTWTTDADGSVIEQIPTPSADTTSVWTGTYTTWTTDADGS




VIEQIPTPSADTTSVWTGTYTTWTTDADGSVIEQIPTPSADTTSVWTGTYTTWTTDADGS




VIEQIPTPSADTTSVWTGTYTTWTTDADGSVIEQIPTPSADTTSVWTGTYTTWTTDADGS




VIEQIPTPSADTTSIWTGTYTTWTTDADGSVIEQIPTPSADTTSVWTGTYTTWTTDADGSV




IEQIPTPSADTTLAPSADTTSIWTGTYTTWTTDADGSVIEQIPTPSADTTSIWTGTYTTWTT




DADGSVIEQIPTPSADTTSVWTGTYTTWTTDADGSVIEQIPTPSADTTSVWTGTYTTWTT




DADGSVIEQIPTPSADTTSVWTGTYTTWTTDADGSVIEQIPTPSADTTSVWTGTYTTWTT




DADGSVIEQIPTPSADTTSVWTGTYTTWTTDADGSVIEQIPTPSADTTSVWTGTYTTWTT




DADGSVIEQIPTPSTDTTLAPSADTTSIWTGTYTTWTTDADGSVIEQIPTPSADTTSVWTGT




YTTWTTDADGSVIEQIPTPSADTTSVWTGTYTTWTTDADGSVIEQIPTPSADTTSVWTGT




YTTWTTDADGSVIEQIPTPSTDTTLAPSADTTSIWTGTYTTWTTDADGSVIEQIPTPSADTT




SIWTGTYTTWTTDADGSVIEQIPTPSADTTSVWTGTYTTWTTDADGSVIEQIPTPSADTTS




VWTGTYTTWTTDADGSVIEQIPTPSADTTSVWTGTYTTWTTDADGSVIEQIPTPSADTTS




VWTGTYTTWTTDADGSVIEQSPTPSAYTTSVWTGTYTTWTTDADGSVIEQIPTPSADTTS




VWTGTYTTWTTDADGSVIEQIPTPSADTTSVWTGTYTTWTTDADGSVIEQIPTPSADTTS




VWTGTYTTWTTDADGSVIEQIPTPSADTTSVWTGTYTTWTTDADGSVIEQIPTPSADTTS




VWTGTYTTWTTDADGSVIEQIPTPSADTTSVWTGTYTTWTTDADGSVIEQSPTPSAYTTS




VWTGTYTTWTTDADGSVIEQIPTPSADTTLAPSADTTSIWTGTYTTWTTDADGSVIEQIPT




PSADTTSIWTGTYTTWTTDADGSVIEQIPTPSADTTSVWTGTYTTWTTDADGSVIEQIPTP




SADTTSVWTGTYTTWTTDADGSVIEQIPTPSADTTSVWTGTYTTWTTDADGSVIEQIPTPS




TDTTLAPSADTTSIWTGTYTTWTTDADGSVIEQIPTPSADTTSVWTGTYTTWTTDADGSV




IEQIPTPSTDTTLAPSADTTSIWTGTYTTWTTDADGSVIEQIPTPSADTTSVWTGTYTTWTT




DADGSVIEQIPTPSADTTSVWTGTYTTWTTDADGSVIEQIPTPSADTTSVWTGTYTTWTT




DADGSVIEQIPTPSADTTLAPSADTTSIWTGTYTTWTTDADGSVIEQIPTPSADTTSVWTG




TYTTWTTDAAGTVIEVIPSGTSISSDVIPTPLPTSGVDIDTIPYDAFNVAVYHYPADNYELA




NNLGFLTSGYEGLGQVTTATSVGNINFDTSSGWPYYIESNALGNTGSYVNATIEYVGFFQ




APANGNYELSFSNIDYNAILFLGSPATDSSLAKREVQFLKPETSSEYVLFFDHGKDAGQTV




STTQYLSAGLYYPLRIVLAAVSERAQLDFQITLPDGRVLDQYQGYVYNFAHEGIESATSS




AHETSWSRFTNSTIYSHSSTIGIITSSTDAPHSVINPTAIETTSTDTSISTVAVTTSICDTKD




CVKTTVITPNSPLPTQTVSLTTTTIDRSEVVQTAHSAVPSQFAPDAHPSAVTITREQCDAYSCS




QATIVSGKVLQTTTVSDSTTVVPLDTPQLSVEASTLETRLKSTQSSRAPTVTVQTSQSSRH




SEDVTESSVHVSEFDAQSTSATSASALQAPSSISLQTGGANTLRLSAFLGTALLPMLNVLFI





SED1 homolog (GQ68_01572)
SEQ ID NO: 298
MQFSIVATLALAGSALAAYSNVTYTYETTITDVVTELTTYCPEPTTFVHKNKTITVTAPTT
LTITDCPCTISKTTKITTDVPPTTHSTPHTTTTHVPSTSTPAPTHSVSTISHGGAAKAGVAGL




AGVAAAAAYFL





Erp1
SEQ ID NO: 299
MLLTSLLQVFACCLVLPAQVTAFYYYTSGAERKCFHKELSKGTLFQATYKAQIYDDQLQ
NYRDAGAQDFGVLIDIEETFDDNHLVVHQKGSASGDLTFLASDSGEHKICIQPEAGGWLI




KAKTKIDVEFQVGSDEKLDSKGKATIDILHAKVNVLNSKIGEIRREQKLMRDREATFRDA




SEAVNSRAMWWIVIQLIVLAVTCGWQMKHLGKFFVKQKIL





Erp2
SEQ ID NO: 300
MIKSTIALPSFFIVLILALVNSVAASSSYAPVAISLPAFSKECLYYDMVTEDDSLAVGYQVL
TGGNFEIDFDITAPDGSVITSEKQKKYSDFLLKSFGVGKYTFCFSNNYGTALKKVEITLEK




EKTLTDEHEADVNNDDIIANNAVEEIDRNLNKITKTLNYLRAREWRNMSTVNSTESRLT




WLSILIIIIIAVISIAQVLLIQFLFTGRQKNYV





Emp24
SEQ ID NO: 301
MASFATKFVIACFLFFSASAHNVLLPAYGRRCFFEDLSKGDELSISFQFGDRNPQSSSQLT
GDFIIYGPERHEVLKTVRDTSHGEITLSAPYKGHFQYCFLNENTGIETKDVTFNIHGVVYV




DLDDPNTNTLDSAVRKLSKLTREVKDEQSYIVIRERTHRNTAESTNDRVKWWSIFQLGV




VIANSLFQIYYLRRFFEVTSLV





Erv25
SEQ ID NO: 302
MQVLQLWLTTLISLVVAVQGLHFDIAASTDPEQVCIRDFVTEGQLVVADIHSDGSVGDG
QKLNLFVRDSVGNEYRRKRDFAGDVRVAFTAPSSTAFDVCFENQAQYRGRSLSRAIELDI




ESGAEARDWNKISANEKLKPIEVELRRVEEITDEIVDELTYLKNREERLRDTNESTNRRVR




NFSILVIIVLSSLGVWQVNYLKNYFKTKHII





Erp3
SEQ ID NO: 303
MSNLCVLFFQFFFLAQFFAEASPLTFELNKGRKECLYTLTPEIDCTISYYFAVQQGESNDF
DVNYEIFAPDDKNKPIIERSGERQGEWSFIGQHKGEYAICFYGGKAHDKIVDLDFKYNCE




RQDDIRNERRKARKAQRNLRDSKTDPLQDSVENSIDTIERQLHVLERNIQYYKSRNTRNH




HTVCSTEHRIVMFSIYGILLIIGMSCAQIAILEFIFRESRKHNV*





Erp5
SEQ ID NO: 304
MKYNIVHGICLLFAITQAVGAVHFYAKSGETKCFYEHLSRGNLLIGDLDLYVEKDGLFEE
DPESSLTITVDETFDNDHRVLNQKNSHTGDVTFTALDTGEHRFCFTPFYSKKSATLRVFIE




LEIGNVEALDSKKKEDMNSLKGRVGQLTQRLSSIRKEQDAIREKEAEFRNQSESANSKIM




TWSVFQLLILLGTCAFQLRYLKNFFVKQKVV





Flo5-2 from Komagataella phaffii
SEQ ID NO: 305
DESGNGDESDTAYGCDITSNAFDGFDATIYEYNANDLKLIRDPVFMSTGYLGRNVLNKIS
GVTVPGFNIWNPRSRTATVYGVQNVNYYNMVLELKGYFKAAVSGDYKLTLSNIDDSSM


LFFGKNTAFQCCDTGSIPVDQAPTDYSLFTIKPSNQVNSEVISSTQYLEAGKYYPVRIVFV




NALERALFNFKLTIPSGTVLDDFQDYIYQFGALDENSCYETTVSKITEWTTYTTPWTGTFE




TTRTITPTGTEGTVVIETPESYVTTTQPWTGTYETTYTVPPTGTEPGTVIIETPEIIDCEAVC




CGPFLTAFSFRKREECQCENICCPGDTNCETYVTTTQPWTGTYETTYTVPPTGTEPGTVIIE




TPESYVTTTQPWTGTYETTYTVPPTGTEPGTVIIETPESYVTTTQPWTGTYETTYTVPPSGT




EPGTVVIETPEIVDCEAYCCASVAIKKRELCQCENFCCSWDQSCQTYVTTTQPWTGTYET




TYTVPPTGTEPGTVIIETPESYVTTTQPWTGTYETTYTVPPTGTEPGTVIIETPESYVTTTQP




WTGTYETTYTVPPTGTEPGTVIIETPEIIDCEAVCCGPFLTAFSFRKREECQCENICCPGDT




NCETYVTTTQPWTGTYETTYTVPPTGTEPGTVIIETPESYVTTTQPWTGTYETTYTVPPTG




TEPGTVIIETPESYVTTTQPWTGTYETTYTVPPTGTEPGTVIIETPEIINCEAVCCGPFLTAFS




FRKREECQCENICCPGDTNCETYVTTTQPWTGTYETTYTVPPTGTEPGTVIIETPESYVTTT




QPWTGTYETTYTVPSTGTEPGTVIIETPESYVTTTQPWTGTYETTFTVPPTGTEPGTVVIET




PESYVTTTQPWTGTYETTYSVPPSGTEPGTVVIETPESYVTTTQPWTGTYETTYSVPPSGT




EPGTVVIETPEASTARTKFTTVTSSWTGVFTTTKTLPASGTEPATIVIQTPTGYFNTSSLVST




RTKTNVDTVTRVIPCPICTAPKTITVVPEEPNESVSVIISQPQSSSTDTTLSKPDSVRVISQPE




TASQMDTSLSKTDSAVISTETAGNNIIPLAGSHSYNTIVTTVTDSPQVAQSTTATSSSNVHL




TISTQTTTPSLVYSSSLSTVHQVSPSNGGFRSSITVHPLLSVIGAIFGALFM





Flo5-2 from Komagataella phaffii (underlined is signal peptide, used in some versions and not others)
SEQ ID NO: 306
MKFPVPLLFLLQLFFIIATQGDESGNGDESDTAYGCDITSNAFDGFDATIYEYNANDLKLI
RDPVFMSTGYLGRNVLNKISGVTVPGFNIWNPRSRTATVYGVQNVNYYNMVLELKGYF
KAAVSGDYKLTLSNIDDSSMLFFGKNTAFQCCDTGSIPVDQAPTDYSLFTIKPSNQVNSEV
ISSTQYLEAGKYYPVRIVFVNALERALFNFKLTIPSGTVLDDFQDYIYQFGALDENSCYET
TVSKITEWTTYTTPWTGTFETTRTITPTGTEGTVVIETPESYVTTTQPWTGTYETTYTVPPT
GTEPGTVIIETPEIIDCEAVCCGPFLTAFSFRKREECQCENICCPGDTNCETYVTTTQPWTG
TYETTYTVPPTGTEPGTVIIETPESYVTTTQPWTGTYETTYTVPPTGTEPGTVIIETPESYVT

TTQPWTGTYETTYTVPPSGTEPGTVVIETPEIVDCEAYCCASVAIKKRELCQCENFCCSW




DQSCQTYVTTTQPWTGTYETTYTVPPTGTEPGTVIIETPESYVTTTQPWTGTYETTYTVPP




TGTEPGTVIIETPESYVTTTQPWTGTYETTYTVPPTGTEPGTVIIETPEIIDCEAVCCGPFLT




AFSFRKREECQCENICCPGDTNCETYVTTTQPWTGTYETTYTVPPTGTEPGTVIIETPESYV




TTTQPWTGTYETTYTVPPTGTEPGTVIIETPESYVTTTQPWTGTYETTYTVPPTGTEPGTVI




IETPEIINCEAVCCGPFLTAFSFRKREECQCENICCPGDTNCETYVTTTQPWTGTYETTYTV




PPTGTEPGTVIIETPESYVTTTQPWTGTYETTYTVPSTGTEPGTVIIETPESYVTTTQPWTGT




YETTFTVPPTGTEPGTVVIETPESYVTTTQPWTGTYETTYSVPPSGTEPGTVVIETPESYVT




TTQPWTGTYETTYSVPPSGTEPGTVVIETPEASTARTKFTTVTSSWTGVFTTTKTLPASGT




EPATIVIQTPTGYFNTSSLVSTRTKTNVDTVTRVIPCPICTAPKTITVVPEEPNESVSVIISQP




QSSSTDTTLSKPDSVRVISQPETASQMDTSLSKTDSAVISTETAGNNIIPLAGSHSYNTIVTT




VTDSPQVAQSTTATSSSNVHLTISTQTTTPSLVYSSSLSTVHQVSPSNGGFRSSITVHPLLS




VIGAIFGALFM





Flo11 from Komagataella phaffii
SEQ ID NO: 307
SSGKTCPTSEVSPACYANQWETTFPPSDIKITGATWVQDNIYDVTLSYEAESLELENLTEL
KIIGLNSPTGGTKLVWSLNSKVYDIDNPAKWTTTLRVYTKSSADDCYVEMYPFQIQVDW


CEAGASTDGCSAWKWPKSYDYDIGCDNMQDGVSRKHHPVYKWPKKCSSNCGVEPTTS




DEPEEPTTSEEPEEPTTSEEPEEPTSSDEEPTTSEEPEEPTTSDEPEEPTTSEEPEEPTTSEEP




EEPTTSEEPTTSEEPEEPTSSDEEPTTSDEPEEPTTSDEPEEPTTSEEPTTSEEPEEPTTSSEE




PTPSEEPEGPTCPTSEVSPACYADQWETTFPPSDIKITGATWVEDNIYDVTLSYEAESLELENL




TELKIIGLNSPTGGTKVVWSLNSGIYDIDNPAKWTTTLRVYTKSSADDCYVEMYPFQIQVD




WCEAGASTDGCSAWKWPKSYDYDIGCDNMQDGVSRKHHPVYKWPKKCSSDCGVEPTT




SDEPEEPTTSEEPVEPTSSDEEPTTSEEPTTSEEPEEPTTSDEPEEPTTSEEPEEPTTSEEPEE




PTTSEEPTTSEEPEEPTSSDEEPTTSDEPEEPTTSEEPEEPTTSEEPEEPTTSEEPEEPTTSDE




PEEPTTSEEPEEPTTSEEPEEPTSSDEEPTTSEEPEEPTTSEEPEEPTTSEEPEEPTTSEEPEE




PTSSDEEPTTSEEPEEPTTSDEPEEPTTSEEPEEPTTSEEPEEPTSSDEEPTTSEEPEEPTTSD




EPEEPTTSEEPEEPTTSEEPEEPTTSEEPEEPTTSEEPEEPTSSDEEPTTSEEPEEPTTSDEPE




EPTTSEEPEEPTTSEEPEEPTTSEEPEEPTTSDEEPGTTEEPLVPTTKTETDVSTTLLTVTDCG




TKTCTKSLVITGVTKETVTTHGKTTVITTYCPLPTETVTPTPVTVTSTIYADESVTKTTVYTTG




AVEKTVTVGGSSTVVVVHTPLTTAVVQSQSTDEIKTVVTARPSTTTIVRDVCYNSVCSVATIV




TGVTEKTITFSTGSITVVPTYVPLVESEEHQRTASTSETRATSVVVPTVVGQSSSASATSSIF




PSVTIHEGVANTVKNSMISGAVALLFNALFL





Flo11 from Komagataella phaffii
SEQ ID NO: 308
MVSLRSIFTSSILAAGLTRAHGSSGKTCPTSEVSPACYANQWETTFPPSDIKITGATWVQD
NIYDVTLSYEAESLELENLTELKIIGLNSPTGGTKLVWSLNSKVYDIDNPAKWTTTLRVYT


KSSADDCYVEMYPFQIQVDWCEAGASTDGCSAWKWPKSYDYDIGCDNMQDGVSRKHH




PVYKWPKKCSSNCGVEPTTSDEPEEPTTSEEPEEPTTSEEPEEPTSSDEEPTTSEEPEEPTTS




DEPEEPTTSEEPEEPTTSEEPEEPTTSEEPTTSEEPEEPTSSDEEPTTSDEPEEPTTSDEPEEP




TTSEEPTTSEEPEEPTTSSEEPTPSEEPEGPTCPTSEVSPACYADQWETTFPPSDIKITGATWV




EDNIYDVTLSYEAESLELENLTELKIIGLNSPTGGTKVVWSLNSGIYDIDNPAKWTTTLRV




YTKSSADDCYVEMYPFQIQVDWCEAGASTDGCSAWKWPKSYDYDIGCDNMQDGVSRK




HHPVYKWPKKCSSDCGVEPTTSDEPEEPTTSEEPVEPTSSDEEPTTSEEPTTSEEPEEPTTS




DEPEEPTTSEEPEEPTTSEEPEEPTTSEEPTTSEEPEEPTSSDEEPTTSDEPEEPTTSEEPEEP




TTSEEPEEPTTSEEPEEPTTSDEPEEPTTSEEPEEPTTSEEPEEPTSSDEEPTTSEEPEEPTTS




EEPEEPTTSEEPEEPTTSEEPEEPTSSDEEPTTSEEPEEPTTSDEPEEPTTSEEPEEPTTSEEP




EEPTSSDEEPTTSEEPEEPTTSDEPEEPTTSEEPEEPTTSEEPEEPTTSEEPEEPTTSEEPEEP




TSSDEEPTTSEEPEEPTTSDEPEEPTTSEEPEEPTTSEEPEEPTTSEEPEEPTTSDEEPGTTEE




PLVPTTKTETDVSTTLLTVTDCGTKTCTKSLVITGVTKETVTTHGKTTVITTYCPLPTETVTPT




PVTVTSTIYADESVTKTTVYTTGAVEKTVTVGGSSTVVVVHTPLTTAVVQSQSTDEIKTVVTAR




PSTTTIVRDVCYNSVCSVATIVTGVTEKTITFSTGSITVVPTYVPLVESEEHQRTASTSETR




ATSVVVPTVVGQSSSASATSSIFPSVTIHEGVANTVKNSMISGAVALLFNALFL





Adhesin domain only of Flo5-2 from Komagataella phaffii (without signal peptide or extension + anchor domains)
SEQ ID NO: 309
DESGNGDESDTAYGCDITSNAFDGFDATIYEYNANDLKLIRDPVFMSTGYLGRNVLNKIS
GVTVPGFNIWNPRSRTATVYGVQNVNYYNMVLELKGYFKAAVSGDYKLTLSNIDDSSM
LFFGKNTAFQCCDTGSIPVDQAPTDYSLFTIKPSNQVNSEVISSTQYLEAGKYYPVRIVFV
NALERALFNFKLTIPSGTVLDDFQDYIYQFGALDENSC







(secretion signal (mini-alpha, alpha factor secretion signal with deletion and mutations to eliminate EPS production))
SEQ ID NO: 337
Nucleotide sequence
ATGAGATTCCCATCTATTTTCACCGCTGTCTTGTTCGCTGCCTCCTCTGCATTGGCTGC
CCCTGTTCAGACTACCACTGAAGACGAGCTTGAGGGTGATTTCGACGTCGCTGTTTTG
CCTTTCTCTGCTTCCATTGCTGCTAAGGAAGAGGGTGTCTCTCTCGAGAAAAGAGAG
GCCGAAGCT












(SUC2 sequence)
SEQ ID NO: 338
Nucleotide sequence
TCTATGACGAATGAGACGTCAGACAGGCCACTGGTACATTTCACACCAAATAAAGGA




TGGATGAATGATCCCAACGGTCTGTGGTACGACGAGAAAGACGCTAAGTGGCATCTG




TACTTTCAATATAACCCCAATGACACTGTTTGGGGTACGCCCCTTTTCTGGGGCCATG




CAACCTCAGATGATCTGACTAACTGGGAAGACCAACCTATTGCCATCGCCCCCAAGC




GTAATGATTCAGGCGCCTTCTCTGGAAGTATGGTAGTCGACTACAACAACACGAGTG




GTTTTTTCAACGACACAATTGACCCAAGACAGAGATGTGTTGCCATTTGGACATATA




ATACACCTGAGTCAGAAGAACAATATATATCCTACTCTCTGGATGGAGGTTATACTTT




TACGGAGTATCAGAAAAACCCTGTTCTGGCAGCTAATTCCACCCAATTTCGTGACCC




AAAGGTGTTTTGGTATGAGCCATCACAGAAATGGATCATGACCGCCGCAAAGTCACA




GGACTATAAAATTGAGATATATTCATCAGACGACTTGAAATCCTGGAAGCTGGAGAG




TGCTTTCGCAAATGAAGGATTTTTGGGATACCAATACGAATGCCCTGGCCTGATCGA




AGTGCCCACTGAACAAGACCCATCAAAGTCTTACTGGGTGATGTTTATCTCTATAAAC




CCCGGCGCTCCCGCTGGAGGCTCCTTCAACCAATATTTCGTAGGTTCTTTTAACGGAA




CCCACTTCGAGGCATTCGATAACCAATCTAGGGTTGTCGATTTCGGAAAAGATTATTA




TGCACTACAAACCTTTTTTAATACGGACCCTACTTATGGATCAGCTTTAGGCATAGCC




TGGGCTTCTAACTGGGAGTACAGTGCCTTTGTTCCTACAAACCCATGGCGTTCCTCAA




TGAGTCTTGTCAGAAAATTCTCTCTGAATACGGAATATCAGGCTAACCCCGAGACAG




AACTAATAAACTTGAAAGCAGAGCCTATCTTGAATATAAGTAACGCTGGACCTTGGA




GTCGTTTTGCCACCAACACCACATTAACAAAAGCCAATTCCTATAACGTGGACCTTTC




CAACTCTACGGGAACACTGGAATTTGAACTGGTGTACGCCGTAAATACTACGCAAAC




AATTTCAAAGTCAGTCTTTGCTGACCTTAGTCTATGGTTCAAGGGTTTAGAGGACCCC




GAGGAGTACTTACGTATGGGTTTTGAGGTATCAGCATCTTCCTTTTTCCTGGATCGTG




GAAACTCCAAGGTGAAGTTTGTCAAGGAAAATCCTTATTTCACTAACAGGATGTCCG




TGAACAACCAGCCTTTCAAATCTGAGAACGATCTTTCCTATTACAAGGTCTACGGACT




TCTAGACCAAAACATATTGGAACTATACTTCAACGATGGAGATGTAGTTTCCACCAA




CACCTATTTTATGACCACAGGCAACGCCCTTGGATCAGTAAACATGACAACAGGTGT




TGATAACCTTTTTTACATAGACAAATTCCAAGTTAGAGAGGTAAAG





(flex linkers)
SEQ ID NO: 339
Nucleotide sequence
GGTTCATCAGGGTCCTCAGGATCATCCGGTAGTAGTGGTTCATCCGGTTCATCCGGAT




CAAGTGGCTCCTCTGAAGCTGCAGCAAGGGAGGCTGCAGCCCGTGAGGCAGCCGCTA




GAGAAGCCGCCGCTAGGGGTGGTGGCGGCTCTGGCGGAGGCGGTTCCGGTGGCGGA




GGCTCT





(Tir4 anchors)
SEQ ID NO: 340
Nucleotide sequence
CAAATCAACGAATTGAACGTTGTTTTAGATGATGTTAAGACCAACATTGCCGACTAC




ATCACCCTATCCTACACTCCAAATTCAGGTTTTTCCTTGGACCAAATGCCAGCTGGTA




TTATGGATATTGCTGCGCAATTGGTTGCAAATCCAAGTGATGACTCCTACACCACTTT




GTACTCTGAAGTGGACTTTTCTGCTGTTGAGCATATGTTGACTATGGTCCCATGGTAC




TCTTCTAGACTGCTTCCAGAATTAGAAGCAATGGATGCTTCTCTAACTACCTCAAGTT




CTGCTGCCACATCTTCAAGTGAAGTTGCTAGCTCTTCTATTGCTTCATCCACTAGCTCT




TCTGTTGCACCATCCTCAAGTGAAGTTGTCAGCTCTTCCGTTGCTTCATCCTCAAGTG




AAGTTGCCAGCTCCTCTGTTGCGTCTACAAGCGAAGCTACTAGTTCTTCTGCTGTCAC




ATCTTCCTCCGCTGTTTCCTCTTCGACCGAGTCTGTTAGCTCTTCCTCTGTCAGTTCTT




CCTCAGCCGTTTCCTCTTCTGAAGCTGTCAGTTCCTCTCCAGTTTCCTCAGTTGTTTCA




TCTTCGGCCGGACCTGCTAGCTCAAGCGTTGCTCCTTACAACTCAACCATTGCTAGCT




CTTCTTCCACTGCCCAGACTTCTATCTCGACCATTGCTCCTTACAACTCCACAACCAC




CACCACCCCAGCTAGTTCTGCTTCCAGCGTTATTATCTCAACCAGAAACGGTACCACT




GTTACTGAAACTGACAACACTCTTGTCACCAAAGAAACCACTGTCTGTGACTACTCTT




CAACATCTGCCGTTCCAGCTTCCACCACCGGTTACAACAATTCTACTAAGGTTTCAAC




CGCTACTATCTGCAGTACATGCAAAGAAGGTACCTCTACTGCAACTGACTTCTCTACA




CTAAAGACTACAGTTACCGTATGTGACTCCGCCTGTCAAGCTAAGAAGTCTGCTACC




GTTGTTAGCGTTCAATCTAAAACTACCGGTATCGTTGAACAAACCGAAAACGGTGCT




GCCAAGGCTGTTATCGGTATGGGTGCCGGTGCTTTAGCTGCTGTTGCCGCCATGCTAC




TATGA
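
The nucleotide entries at the end of Table 7 can be read more easily once translated. The following is a minimal sketch, not part of the original listing, that translates the "flex linkers" entry (SEQ ID NO: 339) as printed above. It assumes the standard genetic code and a reading frame that starts at the first base; the names `CODON_TABLE`, `FLEX_LINKERS`, and `translate` are hypothetical helpers introduced only for illustration.

```python
# Standard genetic code, built from the conventional TCAG ordering
# (an assumption: the listing does not state which code applies).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# SEQ ID NO: 339 ("flex linkers"), joined from the four lines of Table 7.
FLEX_LINKERS = (
    "GGTTCATCAGGGTCCTCAGGATCATCCGGTAGTAGTGGTTCATCCGGTTCATCCGGAT"
    "CAAGTGGCTCCTCTGAAGCTGCAGCAAGGGAGGCTGCAGCCCGTGAGGCAGCCGCTA"
    "GAGAAGCCGCCGCTAGGGGTGGTGGCGGCTCTGGCGGAGGCGGTTCCGGTGGCGGA"
    "GGCTCT"
)


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, starting at the first base."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    print(translate(FLEX_LINKERS))
```

Under these assumptions the 177-nucleotide entry reads as 59 residues: eight GSS repeats, then four EAAAR repeats, then three GGGGS repeats. The same helper can be applied to the other nucleotide entries of Table 7, provided the chosen frame is appropriate for them.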
















TABLE 8







Terminator Sequences









Sequence Info
Sequence Info
Sequence Info





AOX1 terminator
SEQ ID NO: 310
TCAAGAGGATGTCAGAATGCCATTTGCCTGAGAGATGCAGGCTTCATTTTTGATACTTTTTTATT
TGTAACCTATATAGTATAGGATTTTTTTTGTCATTTTGTTTCTTCTCGTACGAGCTTGCTCCTGAT




CAGCCTATCTCGCAGCAGATGAATATCTTGTGGTAGGGGTTTGGGAAAATCATTCGAGTTTGAT




GTTTTTCTTGGTATTTCCCACTCCTCTTCAGAGTACAGAAGATTAAGTGAAACCTTCGTTTGTGC




G





TDH3 terminator
SEQ ID NO: 311
TCGATTTGTATGTGAAATAGCTGAAATTCGAAAATTTCATTATGGCTGTATCTACTTTAGCGTAT
TAGGCATTTGAGCATTGGCTTGAACAATGCGGGCTGTAGTGTGTCACCAAAGAAACCATTCGGG




TTCGGATCTGGAAGTCCTCATCACGTGATGCCGATCTCGTGTATTTTATTTTCAGATAACACCTG




AAGACTTT





RPS25A terminator
SEQ ID NO: 312
ATTAGTGTACATCTGATAATATAGTACTACCACGTATGATAATGTAGAGAATAGTCTTCCTTGTC
GAGTGTGTTTGCAGTTTTCTTGAGTTTCAAGGTTTAAATGCTGGTATATTAGTTCATCGAAGGTT




TCAGCCAATAGCACCTTAAATCAATCAAACTAATTCGACTCTTACGAAAGAGCCTACTGTGTTTA




GTATCGAAGTCGTTTACCTTTCATGTTGAATAGCTTCCTCTCTGACCCTAACATTTCAAGATCCTC




CTAAAGTTACCCGGATTGTGAAATTCTAATGATCCACCTGCCCAATGCATTTTTTCTTTATTCAGT




TTACCTTTTTTACCTAATATACGAGCTTGTTAAAGTAAGTGGCACTGCAATACTAGGCTTATTGT




TGATATTATGATGAATCGTTTTCACAAACTTGATTTCCTGTGAACTCACCATGTACTAAGGAAAA




AAACATGCATCACCATCTGAATATTTGAC





RPL2A terminator
SEQ ID NO: 313
ACTATGTAACTAACGAAACAGCATGTACTAATAGAACCGTATCGAGAATATTTATTTAGGTGAG
TAGTAGGAGTGAACCAGACAGTCAATTTAGTGAGCTGTCCCAGCTTTTGTGCATTCCAGAATTG




CCGGTCAAATTGGTTATGGGTTATGGGGCTTTTCCGATTGAGGTTCAGTTTCTGCGGTTATCTCTT




TCTTGACCTGGTCTTTTACAGGCTGTTCTTTCTCCCCATGATTATTCTTTAGCTGAAGATACCGCT




TAGCCTGATAATGTCGTCGTTTTGTAATCAAAATCTTTAGTTGGGCATCGTCTGAGGTTTCCTTTG




GCTTCTGGGGTTGTTAGTAGGAACGTAGGAACCATAGTAACTTTTACACATACATTCTTATGATT




GCGAAGTAAGCTGAGTCTGCTGCTTGGCTCCCGAAGTACTTTCTCTTTCTCTACCGGTTGATTCT




CCTTCTGGTGCTCCTAAACGATTGTGTTAGAAGGGATTGAC





(TDH3 transcriptional terminator)
SEQ ID NO: 341
GCGGCCGCTCGATTTGTATGTGAAATAGCTGAAATTCGAAAATTTCATTATGGCTGTATCTACTT
TAGCGTATTAGGCATTTGAGCATTGGCTTGAACAATGCGGGCTGTAGTGTGTCACCAAAGAAAC
CATTCGGGTTCGGATCTGGAAGTCCTCATCACGTGATGCCGATCTCGTGTATTTTATTTTCAGAT

AACACCTGAAGACTTT
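
Table 8 lists two TDH3 terminator entries, SEQ ID NO: 311 and SEQ ID NO: 341. Comparing them as printed, the second appears to be the first with eight extra bases, GCGGCCGC, at its 5' end. GCGGCCGC is the recognition sequence of the NotI restriction enzyme; whether it was included as a cloning site is an assumption here, since the table does not say. The sketch below, not part of the original listing, checks the relationship between the two entries. The function name `leading_extension` is a hypothetical helper.

```python
from typing import Optional

# SEQ ID NO: 311 (TDH3 terminator), joined from the lines of Table 8.
SEQ_311 = (
    "TCGATTTGTATGTGAAATAGCTGAAATTCGAAAATTTCATTATGGCTGTATCTACTTTAGCGTAT"
    "TAGGCATTTGAGCATTGGCTTGAACAATGCGGGCTGTAGTGTGTCACCAAAGAAACCATTCGGG"
    "TTCGGATCTGGAAGTCCTCATCACGTGATGCCGATCTCGTGTATTTTATTTTCAGATAACACCTG"
    "AAGACTTT"
)

# SEQ ID NO: 341 (TDH3 transcriptional terminator), joined the same way.
SEQ_341 = (
    "GCGGCCGCTCGATTTGTATGTGAAATAGCTGAAATTCGAAAATTTCATTATGGCTGTATCTACTT"
    "TAGCGTATTAGGCATTTGAGCATTGGCTTGAACAATGCGGGCTGTAGTGTGTCACCAAAGAAAC"
    "CATTCGGGTTCGGATCTGGAAGTCCTCATCACGTGATGCCGATCTCGTGTATTTTATTTTCAGAT"
    "AACACCTGAAGACTTT"
)


def leading_extension(longer: str, shorter: str) -> Optional[str]:
    """Return the bases of `longer` that precede `shorter`.

    Returns None when `longer` does not end with `shorter`.
    """
    if not longer.endswith(shorter):
        return None
    return longer[: len(longer) - len(shorter)]


if __name__ == "__main__":
    print(leading_extension(SEQ_341, SEQ_311))
```

The same comparison can be used to confirm that an annotated segment inside a longer construct matches its standalone entry.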
















TABLE 9







Exemplary SUC surface display molecules (surface display proteins)









Sequence Info
Sequence Info
Sequence Info





Exemplary SUC2 surface display nucleotide sequence
SEQ ID NO: 314
CAGGTGAACCCACCTAACTATTTTTAACTGGCATCCAGTGAGCTCGCTGGGTGAAAGCCAACCA
TCTTTTGTTTCGGGGAACCGTGCTCGCCCCGTAAAGTTAATTTTTTTTTCCCGCGCAGCTTTAATC
TTTCGGCAGAGAAGGCGTTTTCATCGTAGCGTGGGAACAGAATAATCAGTTCATGTGCTATACA
GGCACATGGCAGCAGTCACTATTTTGCTTTTTAACCTTAAAGTCGTTCATCAATCATTAACTGAC
CAATCAGATTTTTTGCATTTGCCACTTATCTAAAAATACTTTTGTATCTCGCAGATACGTTCAGTG

GTTTCCAGGACAACACCCAAAAAAAGGTATCAATGCCACTAGGCAGTCGGTTTTATTTTTGGTC




ACCCACGCAAAGAAGCACCCACCTCTTTTAGGTTTTAAGTTGTGGGAACAGTAACACCGCCTAG




AGCTTCAGGAAAAACCAGTACCTGTGACCGCAATTCACCATGATGCAGAATGTTAATTTAAACG




AGTGCCAAATCAAGATTTCAACAGACAAATCAATCGATCCATAGTTACCCATTCCAGCCTTTTCG




TCGTCGAGCCTGCTTCATTCCTGCCTCAGGTGCATAACTTTGCATGAAAAGTCCAGATTAGGGCA




GATTTTGAGTTTAAAATAGGAAATATAAACAAATATACCGCGAAAAAGGTTTGTTTATAGCTTT




TCGCCTGGTGCCGTACGGTATAAATACATACTCTCCTCCCCCCCCTGGTTCTCTTTTTCTTTTGTT




ACTTACATTTTACCGTTCCGTCACTCGCTTCACTCAACAACAAAAGAATTCCGAAACG (GCW14 promoter)




ATGAGATTCCCATCTATTTTCACCGCTGTCTTGTTCGCTGCCTCCTCTGCATTGGCTGCCCCTGTT




CAGACTACCACTGAAGACGAGCTTGAGGGTGATTTCGACGTCGCTGTTTTGCCTTTCTCTGCTTC




CATTGCTGCTAAGGAAGAGGGTGTCTCTCTCGAGAAAAGAGAGGCCGAAGCT (secretion signal (mini-alpha, alpha factor secretion signal with deletion and mutations to eliminate EPS production))




TCTATGACGAATGAGACGTCAGACAGGCCACTGGTACATTTCACACCAAATAAAGGATGGATGA




ATGATCCCAACGGTCTGTGGTACGACGAGAAAGACGCTAAGTGGCATCTGTACTTTCAATATAA




CCCCAATGACACTGTTTGGGGTACGCCCCTTTTCTGGGGCCATGCAACCTCAGATGATCTGACTA




ACTGGGAAGACCAACCTATTGCCATCGCCCCCAAGCGTAATGATTCAGGCGCCTTCTCTGGAAG




TATGGTAGTCGACTACAACAACACGAGTGGTTTTTTCAACGACACAATTGACCCAAGACAGAGA




TGTGTTGCCATTTGGACATATAATACACCTGAGTCAGAAGAACAATATATATCCTACTCTCTGGA




TGGAGGTTATACTTTTACGGAGTATCAGAAAAACCCTGTTCTGGCAGCTAATTCCACCCAATTTC




GTGACCCAAAGGTGTTTTGGTATGAGCCATCACAGAAATGGATCATGACCGCCGCAAAGTCACA




GGACTATAAAATTGAGATATATTCATCAGACGACTTGAAATCCTGGAAGCTGGAGAGTGCTTTC




GCAAATGAAGGATTTTTGGGATACCAATACGAATGCCCTGGCCTGATCGAAGTGCCCACTGAAC




AAGACCCATCAAAGTCTTACTGGGTGATGTTTATCTCTATAAACCCCGGCGCTCCCGCTGGAGG




CTCCTTCAACCAATATTTCGTAGGTTCTTTTAACGGAACCCACTTCGAGGCATTCGATAACCAAT




CTAGGGTTGTCGATTTCGGAAAAGATTATTATGCACTACAAACCTTTTTTAATACGGACCCTACT




TATGGATCAGCTTTAGGCATAGCCTGGGCTTCTAACTGGGAGTACAGTGCCTTTGTTCCTACAAA




CCCATGGCGTTCCTCAATGAGTCTTGTCAGAAAATTCTCTCTGAATACGGAATATCAGGCTAACC




CCGAGACAGAACTAATAAACTTGAAAGCAGAGCCTATCTTGAATATAAGTAACGCTGGACCTTG




GAGTCGTTTTGCCACCAACACCACATTAACAAAAGCCAATTCCTATAACGTGGACCTTTCCAACT




CTACGGGAACACTGGAATTTGAACTGGTGTACGCCGTAAATACTACGCAAACAATTTCAAAGTC




AGTCTTTGCTGACCTTAGTCTATGGTTCAAGGGTTTAGAGGACCCCGAGGAGTACTTACGTATGG




GTTTTGAGGTATCAGCATCTTCCTTTTTCCTGGATCGTGGAAACTCCAAGGTGAAGTTTGTCAAG




GAAAATCCTTATTTCACTAACAGGATGTCCGTGAACAACCAGCCTTTCAAATCTGAGAACGATC




TTTCCTATTACAAGGTCTACGGACTTCTAGACCAAAACATATTGGAACTATACTTCAACGATGGA




GATGTAGTTTCCACCAACACCTATTTTATGACCACAGGCAACGCCCTTGGATCAGTAAACATGA




CAACAGGTGTTGATAACCTTTTTTACATAGACAAATTCCAAGTTAGAGAGGTAAAG (SUC2 sequence)




GGTTCATCAGGGTCCTCAGGATCATCCGGTAGTAGTGGTTCATCCGGTTCATCCGGATCAAGTG




GCTCCTCTGAAGCTGCAGCAAGGGAGGCTGCAGCCCGTGAGGCAGCCGCTAGAGAAGCCGCCG




CTAGGGGTGGTGGCGGCTCTGGCGGAGGCGGTTCCGGTGGCGGAGGCTCT (flex linkers)




CAAATCAACGAATTGAACGTTGTTTTAGATGATGTTAAGACCAACATTGCCGACTACATCACCC




TATCCTACACTCCAAATTCAGGTTTTTCCTTGGACCAAATGCCAGCTGGTATTATGGATATTGCT




GCGCAATTGGTTGCAAATCCAAGTGATGACTCCTACACCACTTTGTACTCTGAAGTGGACTTTTC




TGCTGTTGAGCATATGTTGACTATGGTCCCATGGTACTCTTCTAGACTGCTTCCAGAATTAGAAG




CAATGGATGCTTCTCTAACTACCTCAAGTTCTGCTGCCACATCTTCAAGTGAAGTTGCTAGCTCT




TCTATTGCTTCATCCACTAGCTCTTCTGTTGCACCATCCTCAAGTGAAGTTGTCAGCTCTTCCGTT




GCTTCATCCTCAAGTGAAGTTGCCAGCTCCTCTGTTGCGTCTACAAGCGAAGCTACTAGTTCTTC




TGCTGTCACATCTTCCTCCGCTGTTTCCTCTTCGACCGAGTCTGTTAGCTCTTCCTCTGTCAGTTC




TTCCTCAGCCGTTTCCTCTTCTGAAGCTGTCAGTTCCTCTCCAGTTTCCTCAGTTGTTTCATCTTCG




GCCGGACCTGCTAGCTCAAGCGTTGCTCCTTACAACTCAACCATTGCTAGCTCTTCTTCCACTGC




CCAGACTTCTATCTCGACCATTGCTCCTTACAACTCCACAACCACCACCACCCCAGCTAGTTCTG




CTTCCAGCGTTATTATCTCAACCAGAAACGGTACCACTGTTACTGAAACTGACAACACTCTTGTC




ACCAAAGAAACCACTGTCTGTGACTACTCTTCAACATCTGCCGTTCCAGCTTCCACCACCGGTTA




CAACAATTCTACTAAGGTTTCAACCGCTACTATCTGCAGTACATGCAAAGAAGGTACCTCTACT




GCAACTGACTTCTCTACACTAAAGACTACAGTTACCGTATGTGACTCCGCCTGTCAAGCTAAGA




AGTCTGCTACCGTTGTTAGCGTTCAATCTAAAACTACCGGTATCGTTGAACAAACCGAAAACGG




TGCTGCCAAGGCTGTTATCGGTATGGGTGCCGGTGCTTTAGCTGCTGTTGCCGCCATGCTACTAT




GA (Tir4 anchors)




GCGGCCGCTCGATTTGTATGTGAAATAGCTGAAATTCGAAAATTTCATTATGGCTGTATCTACTT




TAGCGTATTAGGCATTTGAGCATTGGCTTGAACAATGCGGGCTGTAGTGTGTCACCAAAGAAAC




CATTCGGGTTCGGATCTGGAAGTCCTCATCACGTGATGCCGATCTCGTGTATTTTATTTTCAGAT




AACACCTGAAGACTTT (TDH3 transcriptional terminator)





Exemplary SUC2 surface display protein sequence
SEQ ID NO: 315
SMTNETSDRPLVHFTPNKGWMNDPNGLWYDEKDAKWHLYFQYNPNDTVWGTPLFWGHATSDDL
TNWEDQPIAIAPKRNDSGAFSGSMVVDYNNTSGFFNDTIDPRQRCVAIWTYNTPESEEQYISYSLDG
GYTFTEYQKNPVLAANSTQFRDPKVFWYEPSQKWIMTAAKSQDYKIEIYSSDDLKSWKLESAFANE
GFLGYQYECPGLIEVPTEQDPSKSYWVMFISINPGAPAGGSFNQYFVGSFNGTHFEAFDNQSRVVDF
GKDYYALQTFFNTDPTYGSALGIAWASNWEYSAFVPTNPWRSSMSLVRKFSLNTEYQANPETELINL
KAEPILNISNAGPWSRFATNTTLTKANSYNVDLSNSTGTLEFELVYAVNTTQTISKSVFADLSLWFKG
LEDPEEYLRMGFEVSASSFFLDRGNSKVKFVKENPYFTNRMSVNNQPFKSENDLSYYKVYGLLDQNI
LELYFNDGDVVSTNTYFMTTGNALGSVNMTTGVDNLFYIDKFQVREVK (SUC2 sequence)
GSSGSSGSSGSSGSSGSSGSSGSSEAAAREAAAREAAAREAAARGGGGSGGGGSGGGGS (linker sequence)
QINELNVVLDDVKTNIADYITLSYTPNSGFSLDQMPAGIMDIAAQLVANPSDDSYTTLYSEVDFSAVE
HMLTMVPWYSSRLLPELEAMDASLTTSSSAATSSSEVASSSIASSTSSSVAPSSSEVVSSSVASSSSEV
ASSSVASTSEATSSSAVTSSSAVSSSTESVSSSSVSSSSAVSSSEAVSSSPVSSVVSSSAGPASSSVAPYN
STIASSSSTAQTSISTIAPYNSTTTTTPASSASSVIISTRNGTTVTETDNTLVTKETTVCDYSSTSAVPAST
TGYNNSTKVSTATICSTCKEGTSTATDFSTLKTTVTVCDSACQAKKSATVVSVQSKTTGIVEQTENG
AAKAVIGMGAGALAAVAAMLL (Tir4 anchor)





Exemplary SUC2 surface display protein sequence (without C-terminus of Tir4 GPI anchor or signal peptide)
SEQ ID NO: 332
SMTNETSDRPLVHFTPNKGWMNDPNGLWYDEKDAKWHLYFQYNPNDTVWGTPLFWGHATSDDL
TNWEDQPIAIAPKRNDSGAFSGSMVVDYNNTSGFFNDTIDPRQRCVAIWTYNTPESEEQYISYSLDG
GYTFTEYQKNPVLAANSTQFRDPKVFWYEPSQKWIMTAAKSQDYKIEIYSSDDLKSWKLESAFANE
GFLGYQYECPGLIEVPTEQDPSKSYWVMFISINPGAPAGGSFNQYFVGSFNGTHFEAFDNQSRVVDF
GKDYYALQTFFNTDPTYGSALGIAWASNWEYSAFVPTNPWRSSMSLVRKFSLNTEYQANPETELINL
KAEPILNISNAGPWSRFATNTTLTKANSYNVDLSNSTGTLEFELVYAVNTTQTISKSVFADLSLWFKG
LEDPEEYLRMGFEVSASSFFLDRGNSKVKFVKENPYFTNRMSVNNQPFKSENDLSYYKVYGLLDQNI
LELYFNDGDVVSTNTYFMTTGNALGSVNMTTGVDNLFYIDKFQVREVK (SUC2 sequence)
GSSGSSGSSGSSGSSGSSGSSGSSEAAAREAAAREAAAREAAARGGGGSGGGGSGGGGS (linker sequence)
QINELNVVLDDVKTNIADYITLSYTPNSGFSLDQMPAGIMDIAAQLVANPSDDSYTTLYSEVDFSAVE
HMLTMVPWYSSRLLPELEAMDASLTTSSSAATSSSEVASSSIASSTSSSVAPSSSEVVSSSVASSSSEV
ASSSVASTSEATSSSAVTSSSAVSSSTESVSSSSVSSSSAVSSSEAVSSSPVSSVVSSSAGPASSSVAPYN
STIASSSSTAQTSISTIAPYNSTTTTTPASSASSVIISTRNGTTVTETDNTLVTKETTVCDYSSTSAVPAST
TGYNNSTKVSTATICSTCKEGTSTATDFSTLKTTVTVCDSACQAKKSATVVSVQSKTTGIVEQTEN





Exemplary SUC2 surface display protein sequence
SEQ ID NO: 333
MRFPSIFTAVLFAASSALAAPVQTTTEDELEGDFDVAVLPFSASIAAKEEGVSLEKREAEASMTNETS
DRPLVHFTPNKGWMNDPNGLWYDEKDAKWHLYFQYNPNDTVWGTPLFWGHATSDDLTNWEDQP
LAIAPKRNDSGAFSGSMVVDYNNTSGFFNDTIDPRQRCVAIWTYNTPESEEQYISYSLDGGYTFTEYQ
KNPVLAANSTQFRDPKVFWYEPSQKWIMTAAKSQDYKIEIYSSDDLKSWKLESAFANEGFLGYQYE
CPGLIEVPTEQDPSKSYWVMFISINPGAPAGGSFNQYFVGSFNGTHFEAFDNQSRVVDFGKDYYALQ
TFFNTDPTYGSALGIAWASNWEYSAFVPTNPWRSSMSLVRKFSLNTEYQANPETELINLKAEPILNIS
NAGPWSRFATNTTLTKANSYNVDLSNSTGTLEFELVYAVNTTQTISKSVFADLSLWFKGLEDPEEYL
RMGFEVSASSFFLDRGNSKVKFVKENPYFTNRMSVNNQPFKSENDLSYYKVYGLLDQNILELYFND
GDVVSTNTYFMTTGNALGSVNMTTGVDNLFYIDKFQVREVKGSSGSSGSSGSSGSSGSSGSSGSSEA
AAREAAAREAAAREAAARGGGGSGGGGSGGGGSQINELNVVLDDVKTNIADYITLSYTPNSGFSLD
QMPAGIMDIAAQLVANPSDDSYTTLYSEVDFSAVEHMLTMVPWYSSRLLPELEAMDASLTTSSSAA
TSSSEVASSSIASSTSSSVAPSSSEVVSSSVASSSSEVASSSVASTSEATSSSAVTSSSAVSSSTESVSSSS
VSSSSAVSSSEAVSSSPVSSVVSSSAGPASSSVAPYNSTIASSSSTAQTSISTIAPYNSTTTTTPASSASSV
IISTRNGTTVTETDNTLVTKETTVCDYSSTSAVPASTTGYNNSTKVSTATICSTCKEGTSTATDFSTLK
TTVTVCDSACQAKKSATVVSVQSKTTGIVEQTENGAAKAVIGMGAGALAAVAAMLL*





Exemplary SUC2 surface display protein sequence (without extreme C-terminus of the Tir4 GPI anchor or signal peptide)
SEQ ID NO: 334
SMTNETSDRPLVHFTPNKGWMNDPNGLWYDEKDAKWHLYFQYNPNDTVWGTPLFWGHATSDDL
TNWEDQPIAIAPKRNDSGAFSGSMVVDYNNTSGFFNDTIDPRQRCVAIWTYNTPESEEQYISYSLDG
GYTFTEYQKNPVLAANSTQFRDPKVFWYEPSQKWIMTAAKSQDYKIEIYSSDDLKSWKLESAFANE
GFLGYQYECPGLIEVPTEQDPSKSYWVMFISINPGAPAGGSFNQYFVGSFNGTHFEAFDNQSRVVDF
GKDYYALQTFFNTDPTYGSALGIAWASNWEYSAFVPTNPWRSSMSLVRKFSLNTEYQANPETELINL
KAEPILNISNAGPWSRFATNTTLTKANSYNVDLSNSTGTLEFELVYAVNTTQTISKSVFADLSLWFKG
LEDPEEYLRMGFEVSASSFFLDRGNSKVKFVKENPYFTNRMSVNNQPFKSENDLSYYKVYGLLDQNI
LELYFNDGDVVSTNTYFMTTGNALGSVNMTTGVDNLFYIDKFQVREVKGSSGSSGSSGSSGSSGSSG
SSGSSEAAAREAAAREAAAREAAARGGGGSGGGGSGGGGSQINELNVVLDDVKTNIADYITLSYTP
NSGFSLDQMPAGIMDIAAQLVANPSDDSYTTLYSEVDFSAVEHMLTMVPWYSSRLLPELEAMDASL
TTSSSAATSSSEVASSSIASSTSSSVAPSSSEVVSSSVASSSSEVASSSVASTSEATSSSAVTSSSAVSSST
ESVSSSSVSSSSAVSSSEAVSSSPVSSVVSSSAGPASSSVAPYNSTIASSSSTAQTSISTIAPYNSTTTTTP
ASSASSVIISTRNGTTVTETDNTLVTKETTVCDYSSTSAVPASTTGYNNSTKVSTATICSTCKEGTSTA
TDFSTLKTTVTVCDSACQAKKSATVVSVQSKTTGIVEQTEN







Exemplary SUC2 surface display protein sequence (without extreme C-terminus of the Tir4 GPI anchor)
SEQ ID NO: 335
MRFPSIFTAVLFAASSALAAPVQTTTEDELEGDFDVAVLPFSASIAAKEEGVSLEKREAEASMTNETS
DRPLVHFTPNKGWMNDPNGLWYDEKDAKWHLYFQYNPNDTVWGTPLFWGHATSDDLTNWEDQP
LAIAPKRNDSGAFSGSMVVDYNNTSGFFNDTIDPRQRCVAIWTYNTPESEEQYISYSLDGGYTFTEYQ
KNPVLAANSTQFRDPKVFWYEPSQKWIMTAAKSQDYKIEIYSSDDLKSWKLESAFANEGFLGYQYE
CPGLIEVPTEQDPSKSYWVMFISINPGAPAGGSFNQYFVGSFNGTHFEAFDNQSRVVDFGKDYYALQ
TFFNTDPTYGSALGIAWASNWEYSAFVPTNPWRSSMSLVRKFSLNTEYQANPETELINLKAEPILNIS
NAGPWSRFATNTTLTKANSYNVDLSNSTGTLEFELVYAVNTTQTISKSVFADLSLWFKGLEDPEEYL
RMGFEVSASSFFLDRGNSKVKFVKENPYFTNRMSVNNQPFKSENDLSYYKVYGLLDQNILELYFND
GDVVSTNTYFMTTGNALGSVNMTTGVDNLFYIDKFQVREVKGSSGSSGSSGSSGSSGSSGSSGSSEA
AAREAAAREAAAREAAARGGGGSGGGGSGGGGSQINELNVVLDDVKTNIADYITLSYTPNSGFSLD
QMPAGIMDIAAQLVANPSDDSYTTLYSEVDFSAVEHMLTMVPWYSSRLLPELEAMDASLTTSSSAA
TSSSEVASSSIASSTSSSVAPSSSEVVSSSVASSSSEVASSSVASTSEATSSSAVTSSSAVSSSTESVSSSS
VSSSSAVSSSEAVSSSPVSSVVSSSAGPASSSVAPYNSTIASSSSTAQTSISTIAPYNSTTTTTPASSASSV
IISTRNGTTVTETDNTLVTKETTVCDYSSTSAVPASTTGYNNSTKVSTATICSTCKEGTSTATDFSTLK
TTVTVCDSACQAKKSATVVSVQSKTTGIVEQTEN





Exemplary SUC2 surface display protein sequence (Post-processing mature sequence (with secretion signal cleaved off the N-term and propeptide of Tir4 cleaved off the C-term))
SEQ ID NO: 342
EAEASMTNETSDRPLVHFTPNKGWMNDPNGLWYDEKDAKWHLYFQYNPNDTVWGTPLFWGHAT
SDDLTNWEDQPIAIAPKRNDSGAFSGSMVVDYNNTSGFFNDTIDPRQRCVAIWTYNTPESEEQYISYS
LDGGYTFTEYQKNPVLAANSTQFRDPKVFWYEPSQKWIMTAAKSQDYKIEIYSSDDLKSWKLESAF
ANEGFLGYQYECPGLIEVPTEQDPSKSYWVMFISINPGAPAGGSFNQYFVGSFNGTHFEAFDNQSRV
VDFGKDYYALQTFFNTDPTYGSALGIAWASNWEYSAFVPTNPWRSSMSLVRKFSLNTEYQANPETE
LINLKAEPILNISNAGPWSRFATNTTLTKANSYNVDLSNSTGTLEFELVYAVNTTQTISKSVFADLSL
WFKGLEDPEEYLRMGFEVSASSFFLDRGNSKVKFVKENPYFTNRMSVNNQPFKSENDLSYYKVYGL
LDQNILELYFNDGDVVSTNTYFMTTGNALGSVNMTTGVDNLFYIDKFQVREVKGSSGSSGSSGSSGS
SGSSGSSGSSEAAAREAAAREAAAREAAARGGGGSGGGGSGGGGSQINELNVVLDDVKTNIADYIT
LSYTPNSGFSLDQMPAGIMDIAAQLVANPSDDSYTTLYSEVDFSAVEHMLTMVPWYSSRLLPELEA
MDASLTTSSSAATSSSEVASSSIASSTSSSVAPSSSEVVSSSVASSSSEVASSSVASTSEATSSSAVTSSS
AVSSSTESVSSSSVSSSSAVSSSEAVSSSPVSSVVSSSAGPASSSVAPYNSTIASSSSTAQTSISTIAPYNS
TTTTTPASSASSVIISTRNGTTVTETDNTLVTKETTVCDYSSTSAVPASTTGYNNSTKVSTATICSTCKE
GTSTATDFSTLKTTVTVCDSACQAKKSATVVSVQSKTTGIVEQTEN
















TABLE 10

Exemplary transporter proteins

Sequence Info    Sequence Info    Sequence






Saccharomyces cerevisiae MAL11 or AGT1-sucrose permease UniProtKB-P53048 (MAL11_YEAST)
SEQ ID NO: 316
MKNIISLVSKKKAASKNEDKNISESSRDIVNQQEVENTEDFEEGKKDSAFELDHLEFTTNSAQLGDS
DEDNENVINEMNATDDANEANSEEKSMTLKQALLKYPKAALWSILVSTTLVMEGYDTALLSALY
ALPVFQRKFGTLNGEGSYEITSQWQIGLNMCVLCGEMIGLQITTYMVEFMGNRYTMITALGLLTAY
IFILYYCKSLAMIAVGQILSAIPWGCFQSLAVTYASEVCPLALRYYMTSYSNICWLFGQIFASGIMKN
SQENLGNSDLGYKLPFALQWIWPAPLMIGIFFAPESPWWLVRKDRVAEARKSLSRILSGKGAEKDI
QVDLTLKQIELTIEKERLLASKSGSFFNCFKGVNGRRTRLACLTWVAQNSSGAVLLGYSTYFFERA
GMATDKAFTFSLIQYCLGLAGTLCSWVISGRVGRWTILTYGLAFQMVCLFIIGGMGFGSGSSASNG
AGGLLLALSFFYNAGIGAVVYCIVAEIPSAELRTKTIVLARICYNLMAVINAILTPYMLNVSDWNWG
AKTGLYWGGFTAVTLAWVIIDLPETTGRTFSEINELFNQGVPARKFASTVVDPFGKGKTQHDSLAD
ESISQSSSIKQRELNAADKC






Pichia angusta MAL2-maltose transporter UniProtKB-Q32SL4 (Q32SL4_PICAN)
SEQ ID NO: 317
MPEFVENIEKPEEAEVIPDITKKINTLSDSDDGSGAFNDYIARFVEISTNAQNNEHQEKHMSLKEGLK
TFPKAACWSIVLSTAIIMEGYDTTLLNSLYSMQSFAKKYGKYYPEIDQYQVPAKWQTSLSMSTYVG
EIVGLYIAGLVAEKWGYRRTLISFMAAVVGLIFILFFAVDVQMLLAGELLCGIVWGAFQTLTVSYA
SEVCPVVLRIYLTTYVNACWVIGQLIAACLLRGTMTLTSEWSYKIPFAVQWIWPVPIMIGIYLAPESP
WWLVKKNRDAEAKKSITRLLSPNTEVPDVAPLAEAMLNKMQLTIKEESARTSNVSYFDCFKHGNF
RRTRIAAMIWLIQNITGSVLMGYSTYFYIQAGLDSSMSFTFSIIQYALGLLGTLASWLLSQKLGRFDI
YFLGLSINTCILIIVGGLGFSSSTSASWAIGSLLLVFTFVYDSSIGPITYCTVAEIPSSTVRAKTVALAR
NWYNLSQIPLSIVTPYMLNPTAWNWKAKAALLWAGLSICSLIYIWFEFPETKGRTYAELDILFKNGT
SARKFRSTQVETFNPQEMLKKMNNEDIIQVVDGDLDAGAATAKV










Expression or Secretion of a Protein of Interest in Host Cells with an Alternative Carbon Source


In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide produces about the same amount of a protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide. In these embodiments, “about the same amount” includes protein of interest production that is from about 1% to about 10% more or less.
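As an illustration only, and not as part of the disclosed methods, the "about the same amount" comparison can be sketched as a relative difference between two measured amounts. The function names, the example values, and the choice to use 10% as the tolerance are assumptions made for this sketch; both amounts are assumed to be measured in the same units.

```python
def percent_difference(engineered: float, control: float) -> float:
    """Signed percent difference of the engineered cell relative to the control cell."""
    if control <= 0:
        raise ValueError("control amount must be positive")
    return (engineered - control) / control * 100.0


def is_about_same_amount(engineered: float, control: float,
                         tolerance_pct: float = 10.0) -> bool:
    """True when the engineered amount is within +/- tolerance_pct of the control.

    The 10% default is an assumption taken from the upper end of the
    "about 1% to about 10% more or less" wording.
    """
    return abs(percent_difference(engineered, control)) <= tolerance_pct


if __name__ == "__main__":
    # Hypothetical amounts of protein of interest in identical, arbitrary units.
    print(is_about_same_amount(1.05, 1.00))  # 5% more than the control
    print(is_about_same_amount(0.80, 1.00))  # 20% less than the control
```

Under these assumptions, an engineered cell producing 5% more than the control would count as "about the same amount", while one producing 20% less would not.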


In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide produces more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide produces about 1% to about 200% more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide produces about 1% to about 2%, about 1% to about 5%, about 1% to about 10%, about 1% to about 15%, about 1% to about 20%, about 1% to about 30%, about 1% to about 40%, about 1% to about 50%, about 1% to about 75%, about 1% to about 100%, about 1% to about 150%, about 1% to about 200%, about 2% to about 5%, about 2% to about 10%, about 2% to about 15%, about 2% to about 20%, about 2% to about 30%, about 2% to about 40%, about 2% to about 50%, about 2% to about 75%, about 2% to about 100%, about 2% to about 150%, about 2% to about 200%, about 5% to about 10%, about 5% to about 15%, about 5% to about 20%, about 5% to about 30%, about 5% to about 40%, about 5% to about 50%, about 5% to about 75%, about 5% to about 100%, about 5% to about 150%, about 5% to about 200%, about 10% to about 15%, about 10% to about 20%, about 10% to about 30%, about 10% to about 40%, about 10% to about 50%, about 10% to about 75%, about 10% to about 100%, about 10% to about 200%, about 15% to about 20%, about 15% to about 30%, about 15% to about 40%, about 15% to about 50%, about 15% to about 75%, about 15% to about 100%, about 15% to about 150%, about 15% to about 200%, about 20% to about 30%, about 20% to about 40%, about 20% to about 50%, about 20% to about 75%, about 20% to about 100%, about 20% to about 200%, about 30% to about 40%, about 30% to about 50%, about 30% to about 75%, about 30% to about 100%, about 30% to about 150%, about 30% to about 200%, about 40% to about 50%, about 40% to about 75%, about 40% to about 100%, about 40% to about 150%, about 40% to about 200%, about 50% to about 75%, about 50% to about 100%, about 50% to about 200%, about 75% to about 100%, about 75% to about 200%, or about 100% to about 200% more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide produces about 1%, about 2%, about 5%, about 10%, about 15%, about 20%, about 30%, about 40%, about 50%, about 75%, about 100%, about 150%, or about 200% more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide produces at least about 1%, about 2%, about 5%, about 10%, about 15%, about 20%, about 30%, about 40%, about 50%, about 75%, about 100%, or about 150% more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide produces at most about 2%, about 5%, about 10%, about 15%, about 20%, about 30%, about 40%, about 50%, about 75%, about 100%, about 150%, or about 200% more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide.


In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide secretes about the same amount of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide. In these embodiments, “about the same amount” includes protein of interest secretion that is from about 1% to about 10% more or less.


In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide secretes more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide secretes about 1% to about 200% more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide secretes about 1% to about 2%, about 1% to about 5%, about 1% to about 10%, about 1% to about 15%, about 1% to about 20%, about 1% to about 30%, about 1% to about 40%, about 1% to about 50%, about 1% to about 75%, about 1% to about 100%, about 1% to about 150%, about 1% to about 200%, about 2% to about 5%, about 2% to about 10%, about 2% to about 15%, about 2% to about 20%, about 2% to about 30%, about 2% to about 40%, about 2% to about 50%, about 2% to about 75%, about 2% to about 100%, about 2% to about 150%, about 2% to about 200%, about 5% to about 10%, about 5% to about 15%, about 5% to about 20%, about 5% to about 30%, about 5% to about 40%, about 5% to about 50%, about 5% to about 75%, about 5% to about 100%, about 5% to about 150%, about 5% to about 200%, about 10% to about 15%, about 10% to about 20%, about 10% to about 30%, about 10% to about 40%, about 10% to about 50%, about 10% to about 75%, about 10% to about 100%, about 10% to about 200%, about 15% to about 20%, about 15% to about 30%, about 15% to about 40%, about 15% to about 50%, about 15% to about 75%, about 15% to about 100%, about 15% to about 150%, about 15% to about 200%, about 20% to about 30%, about 20% to about 40%, about 20% to about 50%, about 20% to about 75%, about 20% to about 100%, about 20% to about 200%, about 30% to about 40%, about 30% to about 50%, about 30% to about 75%, about 30% to about 100%, about 30% to about 150%, about 30% to about 200%, about 40% to about 50%, about 40% to about 75%, about 40% to about 100%, about 40% to about 150%, about 40% to about 200%, about 50% to about 75%, about 50% to about 100%, about 50% to about 200%, about 75% to about 100%, about 75% to about 200%, or about 100% to about 200% more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide secretes about 1%, about 2%, about 5%, about 10%, about 15%, about 20%, about 30%, about 40%, about 50%, about 75%, about 100%, about 150%, or about 200% more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide secretes at least about 1%, about 2%, about 5%, about 10%, about 15%, about 20%, about 30%, about 40%, about 50%, about 75%, about 100%, or about 150% more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide secretes at most about 2%, about 5%, about 10%, about 15%, about 20%, about 30%, about 40%, about 50%, about 75%, about 100%, about 150%, or about 200% more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide.


In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide produces about the same amount of a protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when each is fed a growth medium comprising glucose as its carbon source. In these embodiments, “about the same amount” includes protein of interest production that is from about 1% to about 10% more or less.


In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide produces more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when each is fed a growth medium comprising glucose as its carbon source. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide produces about 1% to about 200% more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when each is fed a growth medium comprising glucose as its carbon source. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide produces about 1% to about 2%, about 1% to about 5%, about 1% to about 10%, about 1% to about 15%, about 1% to about 20%, about 1% to about 30%, about 1% to about 40%, about 1% to about 50%, about 1% to about 75%, about 1% to about 100%, about 1% to about 150%, about 1% to about 200%, about 2% to about 5%, about 2% to about 10%, about 2% to about 15%, about 2% to about 20%, about 2% to about 30%, about 2% to about 40%, about 2% to about 50%, about 2% to about 75%, about 2% to about 100%, about 2% to about 150%, about 2% to about 200%, about 5% to about 10%, about 5% to about 15%, about 5% to about 20%, about 5% to about 30%, about 5% to about 40%, about 5% to about 50%, about 5% to about 75%, about 5% to about 100%, about 5% to about 150%, about 5% to about 200%, about 10% to about 15%, about 10% to about 20%, about 10% to about 30%, about 10% to about 40%, about 10% to about 50%, about 10% to about 75%, about 10% to about 100%, about 10% to about 200%, about 15% to about 20%, about 15% to about 30%, about 15% to about 40%, about 15% to about 50%, about 15% to about 75%, about 15% to about 100%, about 15% to about 150%, about 15% to about 200%, about 20% to about 30%, about 20% to about 40%, about 20% to about 50%, about 20% to about 75%, about 20% to about 100%, about 20% to about 200%, about 30% to about 40%, about 30% to about 50%, about 30% to about 75%, about 30% to about 100%, about 30% to about 150%, about 30% to about 200%, about 40% to about 50%, about 40% to about 75%, about 40% to about 100%, about 40% to about 150%, about 40% to about 200%, about 50% to about 75%, about 50% to about 100%, about 50% to about 200%, about 75% to about 100%, about 75% to about 200%, or about 100% to about 200% more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when each is fed a growth medium comprising glucose as its carbon source. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide produces about 1%, about 2%, about 5%, about 10%, about 15%, about 20%, about 30%, about 40%, about 50%, about 75%, about 100%, about 150%, or about 200% more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when each is fed a growth medium comprising glucose as its carbon source. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide produces at least about 1%, about 2%, about 5%, about 10%, about 15%, about 20%, about 30%, about 40%, about 50%, about 75%, about 100%, or about 150% more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when each is fed a growth medium comprising glucose as its carbon source. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide produces at most about 2%, about 5%, about 10%, about 15%, about 20%, about 30%, about 40%, about 50%, about 75%, about 100%, about 150%, or about 200% more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when each is fed a growth medium comprising glucose as its carbon source.


In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide secretes about the same amount of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when each is fed a growth medium comprising glucose as its carbon source. In these embodiments, “about the same amount” includes protein of interest secretion that is from about 1% to about 10% more or less.


In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide secretes more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when each is fed a growth medium comprising glucose as its carbon source. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide secretes about 1% to about 200% more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when each is fed a growth medium comprising glucose as its carbon source. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide secretes about 1% to about 2%, about 1% to about 5%, about 1% to about 10%, about 1% to about 15%, about 1% to about 20%, about 1% to about 30%, about 1% to about 40%, about 1% to about 50%, about 1% to about 75%, about 1% to about 100%, about 1% to about 150%, about 1% to about 200%, about 2% to about 5%, about 2% to about 10%, about 2% to about 15%, about 2% to about 20%, about 2% to about 30%, about 2% to about 40%, about 2% to about 50%, about 2% to about 75%, about 2% to about 100%, about 2% to about 150%, about 2% to about 200%, about 5% to about 10%, about 5% to about 15%, about 5% to about 20%, about 5% to about 30%, about 5% to about 40%, about 5% to about 50%, about 5% to about 75%, about 5% to about 100%, about 5% to about 150%, about 5% to about 200%, about 10% to about 15%, about 10% to about 20%, about 10% to about 30%, about 10% to about 40%, about 10% to about 50%, about 10% to about 75%, about 10% to about 100%, about 10% to about 200%, about 15% to about 20%, about 15% to about 30%, about 15% to about 40%, about 15% to about 50%, about 15% to about 75%, about 15% to about 100%, about 15% to about 150%, about 15% to about 200%, about 20% to about 30%, about 20% to about 40%, about 20% to about 50%, about 20% to about 75%, about 20% to about 100%, about 20% to about 200%, about 30% to about 40%, about 30% to about 50%, about 30% to about 75%, about 30% to about 100%, about 30% to about 150%, about 30% to about 200%, about 40% to about 50%, about 40% to about 75%, about 40% to about 100%, about 40% to about 150%, about 40% to about 200%, about 50% to about 75%, about 50% to about 100%, about 50% to about 200%, about 75% to about 100%, about 75% to about 200%, or about 100% to about 200% more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when each is fed a growth medium comprising glucose as its carbon source. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide secretes about 1%, about 2%, about 5%, about 10%, about 15%, about 20%, about 30%, about 40%, about 50%, about 75%, about 100%, about 150%, or about 200% more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when each is fed a growth medium comprising glucose as its carbon source. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide secretes at least about 1%, about 2%, about 5%, about 10%, about 15%, about 20%, about 30%, about 40%, about 50%, about 75%, about 100%, or about 150% more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when each is fed a growth medium comprising glucose as its carbon source. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide secretes at most about 2%, about 5%, about 10%, about 15%, about 20%, about 30%, about 40%, about 50%, about 75%, about 100%, about 150%, or about 200% more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when each is fed a growth medium comprising glucose as its carbon source.


In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide produces more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when each is fed a growth medium comprising a disaccharide, e.g., sucrose, as its carbon source. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide produces about 10% to about 2000% more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when each is fed a growth medium comprising a disaccharide, e.g., sucrose, as its carbon source. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide produces about 10% to about 20%, about 10% to about 50%, about 10% to about 100%, about 10% to about 150%, about 10% to about 200%, about 10% to about 300%, about 10% to about 400%, about 10% to about 500%, about 10% to about 750%, about 10% to about 1000%, about 10% to about 1500%, about 10% to about 2000%, about 20% to about 50%, about 20% to about 100%, about 20% to about 150%, about 20% to about 200%, about 20% to about 300%, about 20% to about 400%, about 20% to about 500%, about 20% to about 750%, about 20% to about 1000%, about 20% to about 1500%, about 20% to about 2000%, about 50% to about 100%, about 50% to about 150%, about 50% to about 200%, about 50% to about 300%, about 50% to about 400%, about 50% to about 500%, about 50% to about 750%, about 50% to about 1000%, about 50% to about 1500%, about 50% to about 2000%, about 100% to about 150%, about 100% to about 200%, about 100% to about 300%, about 100% to about 400%, about 100% to about 500%, about 100% to about 750%, about 100% to about 1000%, about 100% to about 2000%, about 150% to about 200%, about 150% to about 300%, about 150% to about 400%, about 150% to about 500%, about 150% to about 750%, about 150% to about 1000%, about 150% to about 1500%, about 150% to about 2000%, about 200% to about 300%, about 200% to about 400%, about 200% to about 500%, about 200% to about 750%, about 200% to about 1000%, about 200% to about 2000%, about 300% to about 400%, about 300% to about 500%, about 300% to about 750%, about 300% to about 1000%, about 300% to about 1500%, about 300% to about 2000%, about 400% to about 500%, about 400% to about 750%, about 400% to about 1000%, about 400% to about 1500%, about 400% to about 2000%, about 500% to about 750%, about 500% to about 1000%, about 500% to about 2000%, about 750% to about 1000%, about 750% to about 2000%, or about 1000% to about 2000% more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when each is fed a growth medium comprising a disaccharide, e.g., sucrose, as its carbon source. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide produces at least about 10%, at least about 20%, at least about 50%, at least about 100%, at least about 150%, at least about 200%, at least about 300%, at least about 400%, at least about 500%, at least about 750%, at least about 1000%, at least about 1500%, at least about 2000%, at least about 3000%, at least about 4000%, at least about 5000%, at least about 7500%, or at least about 10000% more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when each is fed a growth medium comprising a disaccharide, e.g., sucrose, as its carbon source. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide produces about 10%, about 20%, about 50%, about 100%, about 150%, about 200%, about 300%, about 400%, about 500%, about 750%, about 1000%, about 1500%, about 2000%, about 3000%, about 4000%, about 5000%, about 7500%, or about 10000% more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when each is fed a growth medium comprising a disaccharide, e.g., sucrose, as its carbon source. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide produces at most about 20%, about 50%, about 100%, about 150%, about 200%, about 300%, about 400%, about 500%, about 750%, about 1000%, about 1500%, about 2000%, about 3000%, about 4000%, about 5000%, about 7500%, or about 10000% more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when each is fed a growth medium comprising a disaccharide, e.g., sucrose, as its carbon source.
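For orientation, a percent increase can be restated as a fold change: "about 2000% more" corresponds to roughly 21 times the control amount. The following is a minimal sketch of that arithmetic, not a method from this disclosure. The function names and example amounts are hypothetical, and both amounts are assumed to be measured in the same units from cultures fed the same disaccharide medium.

```python
# "At least about X% more" values recited for the disaccharide-fed comparison.
THRESHOLDS_PCT = (10, 20, 50, 100, 150, 200, 300, 400, 500, 750,
                  1000, 1500, 2000, 3000, 4000, 5000, 7500, 10000)


def percent_more(engineered: float, control: float) -> float:
    """Percent by which the engineered amount exceeds the control amount."""
    if control <= 0:
        raise ValueError("control amount must be positive")
    return (engineered - control) / control * 100.0


def fold_from_percent_more(pct_more: float) -> float:
    """Convert 'X% more' into a fold change (100% more is 2-fold)."""
    return 1.0 + pct_more / 100.0


def thresholds_met(engineered: float, control: float) -> list:
    """Recited 'at least about X%' values satisfied by the measured increase."""
    pct = percent_more(engineered, control)
    return [t for t in THRESHOLDS_PCT if pct >= t]


if __name__ == "__main__":
    # Hypothetical amounts: engineered cell 6.0, control cell 1.0 (same units).
    print(percent_more(6.0, 1.0))
    print(fold_from_percent_more(2000))
    print(thresholds_met(6.0, 1.0))
```

Reporting the increase relative to the control cell keeps the comparison independent of the measurement units, provided the control amount is greater than zero.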


In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide secretes more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when each are fed a growth medium comprising a disaccharide, e.g., sucrose, as its carbon source. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide secretes about 10% to about 2000% more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when each are fed a growth medium comprising a disaccharide, e.g., sucrose, as its carbon source. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide secretes about 10% to about 20%, about 10% to about 50%, about 10% to about 100%, about 10% to about 150%, about 10% to about 200%, about 10% to about 300%, about 10% to about 400%, about 10% to about 500%, about 10% to about 750%, about 10% to about 1000%, about 10% to about 1500%, about 10% to about 2000%, about 20% to about 50%, about 20% to about 100%, about 20% to about 150%, about 20% to about 200%, about 20% to about 300%, about 20% to about 400%, about 20% to about 500%, about 20% to about 750%, about 20% to about 1000%, about 20% to about 1500%, about 20% to about 2000%, about 50% to about 100%, about 50% to about 150%, about 50% to about 200%, about 50% to about 300%, about 50% to about 400%, about 50% to about 500%, about 50% to about 750%, about 50% to about 1000%, about 50% to about 1500%, about 50% to about 2000%, about 100% to about 150%, about 100% to about 200%, about 100% to about 300%, about 100% to about 400%, about 100% to about 500%, about 100% to about 750%, about 100% to about 1000%, about 100% to about 2000%, about 150% to about 200%, about 150% to about 300%, about 150% to about 
400%, about 150% to about 500%, about 150% to about 750%, about 150% to about 1000%, about 150% to about 1500%, about 150% to about 2000%, about 200% to about 300%, about 200% to about 400%, about 200% to about 500%, about 200% to about 750%, about 200% to about 1000%, about 200% to about 2000%, about 300% to about 400%, about 300% to about 500%, about 300% to about 750%, about 300% to about 1000%, about 300% to about 1500%, about 300% to about 2000%, about 400% to about 500%, about 400% to about 750%, about 400% to about 1000%, about 400% to about 1500%, about 400% to about 2000%, about 500% to about 750%, about 500% to about 1000%, about 500% to about 2000%, about 750% to about 1000%, about 750% to about 2000%, or about 1000% to about 2000% more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when each are fed a growth medium comprising a disaccharide, e.g., sucrose, as its carbon source. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide secretes about 10%, about 20%, about 50%, about 100%, about 150%, about 200%, about 300%, about 400%, about 500%, about 750%, about 1000%, about 1500%, or about 2000%, about 3000%, about 4000%, about 5000%, about 7500%, about 10000% more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when each are fed a growth medium comprising a disaccharide, e.g., sucrose, as its carbon source. 
In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide secretes at least about 10%, at least about 20%, at least about 50%, at least about 100%, at least about 150%, at least about 200%, at least about 300%, at least about 400%, at least about 500%, at least about 750%, at least about 1000%, at least about 1500%, at least about 2000%, at least about 3000%, at least about 4000%, at least about 5000%, at least about 7500%, or at least about 10000% more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when each is fed a growth medium comprising a disaccharide, e.g., sucrose, as its carbon source. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide secretes at most about 20%, about 50%, about 100%, about 150%, about 200%, about 300%, about 400%, about 500%, about 750%, about 1000%, about 1500%, about 2000%, about 3000%, about 4000%, about 5000%, about 7500%, or about 10000% more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when each is fed a growth medium comprising a disaccharide, e.g., sucrose, as its carbon source.


In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide produces more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when the engineered host cell is fed a growth medium comprising a disaccharide, e.g., sucrose, as its carbon source and the similar cell is fed a growth medium comprising glucose. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide produces about 10% to about 2000% more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when the engineered host cell is fed a growth medium comprising a disaccharide, e.g., sucrose, as its carbon source and the similar cell is fed a growth medium comprising glucose. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide produces about 10% to about 20%, about 10% to about 50%, about 10% to about 100%, about 10% to about 150%, about 10% to about 200%, about 10% to about 300%, about 10% to about 400%, about 10% to about 500%, about 10% to about 750%, about 10% to about 1000%, about 10% to about 1500%, about 10% to about 2000%, about 20% to about 50%, about 20% to about 100%, about 20% to about 150%, about 20% to about 200%, about 20% to about 300%, about 20% to about 400%, about 20% to about 500%, about 20% to about 750%, about 20% to about 1000%, about 20% to about 1500%, about 20% to about 2000%, about 50% to about 100%, about 50% to about 150%, about 50% to about 200%, about 50% to about 300%, about 50% to about 400%, about 50% to about 500%, about 50% to about 750%, about 50% to about 1000%, about 50% to about 1500%, about 50% to about 2000%, about 100% to about 150%, about 100% to about 200%, about 100% to about 300%, about 100% to about 400%, about 100% to 
about 500%, about 100% to about 750%, about 100% to about 1000%, about 100% to about 2000%, about 150% to about 200%, about 150% to about 300%, about 150% to about 400%, about 150% to about 500%, about 150% to about 750%, about 150% to about 1000%, about 150% to about 1500%, about 150% to about 2000%, about 200% to about 300%, about 200% to about 400%, about 200% to about 500%, about 200% to about 750%, about 200% to about 1000%, about 200% to about 2000%, about 300% to about 400%, about 300% to about 500%, about 300% to about 750%, about 300% to about 1000%, about 300% to about 1500%, about 300% to about 2000%, about 400% to about 500%, about 400% to about 750%, about 400% to about 1000%, about 400% to about 1500%, about 400% to about 2000%, about 500% to about 750%, about 500% to about 1000%, about 500% to about 2000%, about 750% to about 1000%, about 750% to about 2000%, or about 1000% to about 2000% more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when the engineered host cell is fed a growth medium comprising a disaccharide, e.g., sucrose, as its carbon source and the similar cell is fed a growth medium comprising glucose. 
In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide produces at least about 10%, at least about 20%, at least about 50%, at least about 100%, at least about 150%, at least about 200%, at least about 300%, at least about 400%, at least about 500%, at least about 750%, at least about 1000%, at least about 1500%, at least about 2000%, at least about 3000%, at least about 4000%, at least about 5000%, at least about 7500%, or at least about 10000% more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when the engineered host cell is fed a growth medium comprising a disaccharide, e.g., sucrose, as its carbon source and the similar cell is fed a growth medium comprising glucose. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide produces about 10%, about 20%, about 50%, about 100%, about 150%, about 200%, about 300%, about 400%, about 500%, about 750%, about 1000%, about 1500%, about 2000%, about 3000%, about 4000%, about 5000%, about 7500%, or about 10000% more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when the engineered host cell is fed a growth medium comprising a disaccharide, e.g., sucrose, as its carbon source and the similar cell is fed a growth medium comprising glucose.
In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide produces at most about 20%, about 50%, about 100%, about 150%, about 200%, about 300%, about 400%, about 500%, about 750%, about 1000%, about 1500%, about 2000%, about 3000%, about 4000%, about 5000%, about 7500%, or about 10000% more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when the engineered host cell is fed a growth medium comprising a disaccharide, e.g., sucrose, as its carbon source and the similar cell is fed a growth medium comprising glucose.


In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide secretes more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when the engineered host cell is fed a growth medium comprising a disaccharide, e.g., sucrose, as its carbon source and the similar cell is fed a growth medium comprising glucose. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide secretes about 10% to about 2000% more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when the engineered host cell is fed a growth medium comprising a disaccharide, e.g., sucrose, as its carbon source and the similar cell is fed a growth medium comprising glucose. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide secretes about 10% to about 20%, about 10% to about 50%, about 10% to about 100%, about 10% to about 150%, about 10% to about 200%, about 10% to about 300%, about 10% to about 400%, about 10% to about 500%, about 10% to about 750%, about 10% to about 1000%, about 10% to about 1500%, about 10% to about 2000%, about 20% to about 50%, about 20% to about 100%, about 20% to about 150%, about 20% to about 200%, about 20% to about 300%, about 20% to about 400%, about 20% to about 500%, about 20% to about 750%, about 20% to about 1000%, about 20% to about 1500%, about 20% to about 2000%, about 50% to about 100%, about 50% to about 150%, about 50% to about 200%, about 50% to about 300%, about 50% to about 400%, about 50% to about 500%, about 50% to about 750%, about 50% to about 1000%, about 50% to about 1500%, about 50% to about 2000%, about 100% to about 150%, about 100% to about 200%, about 100% to about 300%, about 100% to about 400%, about 100% to 
about 500%, about 100% to about 750%, about 100% to about 1000%, about 100% to about 2000%, about 150% to about 200%, about 150% to about 300%, about 150% to about 400%, about 150% to about 500%, about 150% to about 750%, about 150% to about 1000%, about 150% to about 1500%, about 150% to about 2000%, about 200% to about 300%, about 200% to about 400%, about 200% to about 500%, about 200% to about 750%, about 200% to about 1000%, about 200% to about 2000%, about 300% to about 400%, about 300% to about 500%, about 300% to about 750%, about 300% to about 1000%, about 300% to about 1500%, about 300% to about 2000%, about 400% to about 500%, about 400% to about 750%, about 400% to about 1000%, about 400% to about 1500%, about 400% to about 2000%, about 500% to about 750%, about 500% to about 1000%, about 500% to about 2000%, about 750% to about 1000%, about 750% to about 2000%, or about 1000% to about 2000% more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when the engineered host cell is fed a growth medium comprising a disaccharide, e.g., sucrose, as its carbon source and the similar cell is fed a growth medium comprising glucose. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide secretes about 10%, about 20%, about 50%, about 100%, about 150%, about 200%, about 300%, about 400%, about 500%, about 750%, about 1000%, about 1500%, or about 2000%, about 3000%, about 4000%, about 5000%, about 7500%, about 10000% more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when the engineered host cell is fed a growth medium comprising a disaccharide, e.g., sucrose, as its carbon source and the similar cell is fed a growth medium comprising glucose. 
In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide secretes at least about 10%, at least about 20%, at least about 50%, at least about 100%, at least about 150%, at least about 200%, at least about 300%, at least about 400%, at least about 500%, at least about 750%, at least about 1000%, at least about 1500%, at least about 2000%, at least about 3000%, at least about 4000%, at least about 5000%, at least about 7500%, or at least about 10000% more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when the engineered host cell is fed a growth medium comprising a disaccharide, e.g., sucrose, as its carbon source and the similar cell is fed a growth medium comprising glucose. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide secretes at most about 20%, about 50%, about 100%, about 150%, about 200%, about 300%, about 400%, about 500%, about 750%, about 1000%, about 1500%, about 2000%, about 3000%, about 4000%, about 5000%, about 7500%, or about 10000% more of the protein of interest compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when the engineered host cell is fed a growth medium comprising a disaccharide, e.g., sucrose, as its carbon source and the similar cell is fed a growth medium comprising glucose.


Cell Growth in Host Cells with an Alternative Carbon Source


In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide provides about the same amount of cellular proliferation and/or cellular growth compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide. In these embodiments, “about the same amount” includes from about 1% to about 10%, more or less, cellular proliferation and/or cellular growth.
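By way of non-limiting illustration only, the percentage comparisons recited in this section may be computed as a relative difference between a measurement for the engineered host cell and the same measurement for the similar cell. The following is a minimal sketch under stated assumptions: the function names `percent_more` and `is_about_same`, the default 10% tolerance, and the use of a simple relative difference are hypothetical choices made for the example and are not recited elsewhere in the disclosure.

```python
def percent_more(engineered: float, control: float) -> float:
    """Return the percent by which the engineered-cell measurement
    exceeds the control-cell measurement (negative if it is lower).

    Assumption: both values are the same kind of measurement
    (e.g., optical density or protein titer) in the same units.
    """
    if control <= 0:
        raise ValueError("control measurement must be positive")
    return (engineered - control) / control * 100.0


def is_about_same(engineered: float, control: float,
                  tolerance_pct: float = 10.0) -> bool:
    """Return True when the two measurements differ by no more than
    tolerance_pct percent, more or less (hypothetical reading of
    "about the same amount")."""
    return abs(percent_more(engineered, control)) <= tolerance_pct


if __name__ == "__main__":
    # Hypothetical readings, chosen only to exercise the functions.
    print(percent_more(3.0, 1.0))      # engineered is three times control
    print(is_about_same(1.05, 1.0))    # within the 10% tolerance
    print(is_about_same(1.5, 1.0))     # outside the 10% tolerance
```

Under this reading, an engineered cell measurement that is three times the control measurement corresponds to 200% more, and measurements within 10% of one another, in either direction, would fall within “about the same amount.”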


In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide provides more cellular proliferation and/or cellular growth compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide provides about 1% to about 200% more cellular proliferation and/or cellular growth compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide provides about 1% to about 2%, about 1% to about 5%, about 1% to about 10%, about 1% to about 15%, about 1% to about 20%, about 1% to about 30%, about 1% to about 40%, about 1% to about 50%, about 1% to about 75%, about 1% to about 100%, about 1% to about 150%, about 1% to about 200%, about 2% to about 5%, about 2% to about 10%, about 2% to about 15%, about 2% to about 20%, about 2% to about 30%, about 2% to about 40%, about 2% to about 50%, about 2% to about 75%, about 2% to about 100%, about 2% to about 150%, about 2% to about 200%, about 5% to about 10%, about 5% to about 15%, about 5% to about 20%, about 5% to about 30%, about 5% to about 40%, about 5% to about 50%, about 5% to about 75%, about 5% to about 100%, about 5% to about 150%, about 5% to about 200%, about 10% to about 15%, about 10% to about 20%, about 10% to about 30%, about 10% to about 40%, about 10% to about 50%, about 10% to about 75%, about 10% to about 100%, about 10% to about 200%, about 15% to about 20%, about 15% to about 30%, about 15% to about 40%, about 15% to about 50%, about 15% to about 75%, about 15% to about 100%, about 15% to about 150%, about 15% to about 200%, about 20% to about 30%, about 20% to about 40%, about 20% to about 50%, about 20% to about 75%, about 20% to about 
100%, about 20% to about 200%, about 30% to about 40%, about 30% to about 50%, about 30% to about 75%, about 30% to about 100%, about 30% to about 150%, about 30% to about 200%, about 40% to about 50%, about 40% to about 75%, about 40% to about 100%, about 40% to about 150%, about 40% to about 200%, about 50% to about 75%, about 50% to about 100%, about 50% to about 200%, about 75% to about 100%, about 75% to about 200%, or about 100% to about 200% more cellular proliferation and/or cellular growth compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide provides about 1%, about 2%, about 5%, about 10%, about 15%, about 20%, about 30%, about 40%, about 50%, about 75%, about 100%, about 150%, or about 200% more cellular proliferation and/or cellular growth compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide provides at least about 1%, about 2%, about 5%, about 10%, about 15%, about 20%, about 30%, about 40%, about 50%, about 75%, about 100%, or about 150% more cellular proliferation and/or cellular growth compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide provides at most about 20%, about 50%, about 100%, about 150%, about 200%, about 300%, about 400%, about 500%, about 750%, about 1000%, about 1500%, about 2000%, about 3000%, about 4000%, about 5000%, about 7500%, or about 10000% more cellular proliferation and/or cellular growth compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide.


In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide provides about the same amount of cellular proliferation and/or cellular growth compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when each is fed a growth medium comprising glucose as its carbon source. In these embodiments, “about the same amount” includes from about 1% to about 10%, more or less, cellular proliferation and/or cellular growth.


In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide provides more cellular proliferation and/or cellular growth compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when each are fed a growth medium comprising glucose as its carbon source. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide provides about 1% to about 200% more cellular proliferation and/or cellular growth compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when each are fed a growth medium comprising glucose as its carbon source. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide provides about 1% to about 2%, about 1% to about 5%, about 1% to about 10%, about 1% to about 15%, about 1% to about 20%, about 1% to about 30%, about 1% to about 40%, about 1% to about 50%, about 1% to about 75%, about 1% to about 100%, about 1% to about 150%, about 1% to about 200%, about 2% to about 5%, about 2% to about 10%, about 2% to about 15%, about 2% to about 20%, about 2% to about 30%, about 2% to about 40%, about 2% to about 50%, about 2% to about 75%, about 2% to about 100%, about 2% to about 150%, about 2% to about 200%, about 5% to about 10%, about 5% to about 15%, about 5% to about 20%, about 5% to about 30%, about 5% to about 40%, about 5% to about 50%, about 5% to about 75%, about 5% to about 100%, about 5% to about 150%, about 5% to about 200%, about 10% to about 15%, about 10% to about 20%, about 10% to about 30%, about 10% to about 40%, about 10% to about 50%, about 10% to about 75%, about 10% to about 100%, about 10% to about 200%, about 15% to about 20%, about 15% to about 30%, about 15% to about 40%, about 15% to about 50%, about 15% to about 75%, about 15% to about 100%, about 15% to about 
150%, about 15% to about 200%, about 20% to about 30%, about 20% to about 40%, about 20% to about 50%, about 20% to about 75%, about 20% to about 100%, about 20% to about 200%, about 30% to about 40%, about 30% to about 50%, about 30% to about 75%, about 30% to about 100%, about 30% to about 150%, about 30% to about 200%, about 40% to about 50%, about 40% to about 75%, about 40% to about 100%, about 40% to about 150%, about 40% to about 200%, about 50% to about 75%, about 50% to about 100%, about 50% to about 200%, about 75% to about 100%, about 75% to about 200%, or about 100% to about 200% more cellular proliferation and/or cellular growth compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when each is fed a growth medium comprising glucose as its carbon source. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide provides about 1%, about 2%, about 5%, about 10%, about 15%, about 20%, about 30%, about 40%, about 50%, about 75%, about 100%, about 150%, or about 200% more cellular proliferation and/or cellular growth compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when each is fed a growth medium comprising glucose as its carbon source. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide provides at least about 1%, about 2%, about 5%, about 10%, about 15%, about 20%, about 30%, about 40%, about 50%, about 75%, about 100%, or about 150% more cellular proliferation and/or cellular growth compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when each is fed a growth medium comprising glucose as its carbon source.
In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide provides at most about 20%, about 50%, about 100%, about 150%, about 200%, about 300%, about 400%, about 500%, about 750%, about 1000%, about 1500%, about 2000%, about 3000%, about 4000%, about 5000%, about 7500%, or about 10000% more cellular proliferation and/or cellular growth compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when each is fed a growth medium comprising glucose as its carbon source.


In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide provides more cellular proliferation and/or cellular growth compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when each are fed a growth medium comprising a disaccharide, e.g., sucrose, as its carbon source. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide provides about 10% to about 2000% more cellular proliferation and/or cellular growth compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when each are fed a growth medium comprising a disaccharide, e.g., sucrose, as its carbon source. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide provides about 10% to about 20%, about 10% to about 50%, about 10% to about 100%, about 10% to about 150%, about 10% to about 200%, about 10% to about 300%, about 10% to about 400%, about 10% to about 500%, about 10% to about 750%, about 10% to about 1000%, about 10% to about 1500%, about 10% to about 2000%, about 20% to about 50%, about 20% to about 100%, about 20% to about 150%, about 20% to about 200%, about 20% to about 300%, about 20% to about 400%, about 20% to about 500%, about 20% to about 750%, about 20% to about 1000%, about 20% to about 1500%, about 20% to about 2000%, about 50% to about 100%, about 50% to about 150%, about 50% to about 200%, about 50% to about 300%, about 50% to about 400%, about 50% to about 500%, about 50% to about 750%, about 50% to about 1000%, about 50% to about 1500%, about 50% to about 2000%, about 100% to about 150%, about 100% to about 200%, about 100% to about 300%, about 100% to about 400%, about 100% to about 500%, about 100% to about 750%, about 100% to about 1000%, about 100% to about 2000%, about 150% to about 200%, about 
150% to about 300%, about 150% to about 400%, about 150% to about 500%, about 150% to about 750%, about 150% to about 1000%, about 150% to about 1500%, about 150% to about 2000%, about 200% to about 300%, about 200% to about 400%, about 200% to about 500%, about 200% to about 750%, about 200% to about 1000%, about 200% to about 2000%, about 300% to about 400%, about 300% to about 500%, about 300% to about 750%, about 300% to about 1000%, about 300% to about 1500%, about 300% to about 2000%, about 400% to about 500%, about 400% to about 750%, about 400% to about 1000%, about 400% to about 1500%, about 400% to about 2000%, about 500% to about 750%, about 500% to about 1000%, about 500% to about 2000%, about 750% to about 1000%, about 750% to about 2000%, or about 1000% to about 2000% more cellular proliferation and/or cellular growth compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when each are fed a growth medium comprising a disaccharide, e.g., sucrose, as its carbon source. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide provides at least about 10%, at least about 20%, at least about 50%, at least about 100%, at least about 150%, at least about 200%, at least about 300%, at least about 400%, at least about 500%, at least about 750%, at least about 1000%, at least about 1500%, or at least about 2000%, at least about 3000%, at least about 4000%, at least about 5000%, at least about 7500%, at least about 10000% more cellular proliferation and/or cellular growth compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when each are fed a growth medium comprising a disaccharide, e.g., sucrose, as its carbon source. 
In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide provides about 10%, about 20%, about 50%, about 100%, about 150%, about 200%, about 300%, about 400%, about 500%, about 750%, about 1000%, about 1500%, about 2000%, about 3000%, about 4000%, about 5000%, about 7500%, or about 10000% more cellular proliferation and/or cellular growth compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when each is fed a growth medium comprising a disaccharide, e.g., sucrose, as its carbon source. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide provides at most about 20%, about 50%, about 100%, about 150%, about 200%, about 300%, about 400%, about 500%, about 750%, about 1000%, about 1500%, about 2000%, about 3000%, about 4000%, about 5000%, about 7500%, or about 10000% more cellular proliferation and/or cellular growth compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when each is fed a growth medium comprising a disaccharide, e.g., sucrose, as its carbon source.


In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide provides more cellular proliferation and/or cellular growth compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when the engineered host cell is fed a growth medium comprising a disaccharide, e.g., sucrose, as its carbon source and the similar cell is fed a growth medium comprising glucose. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide provides about 10% to about 2000% more cellular proliferation and/or cellular growth compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when the engineered host cell is fed a growth medium comprising a disaccharide, e.g., sucrose, as its carbon source and the similar cell is fed a growth medium comprising glucose. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide provides about 10% to about 20%, about 10% to about 50%, about 10% to about 100%, about 10% to about 150%, about 10% to about 200%, about 10% to about 300%, about 10% to about 400%, about 10% to about 500%, about 10% to about 750%, about 10% to about 1000%, about 10% to about 1500%, about 10% to about 2000%, about 20% to about 50%, about 20% to about 100%, about 20% to about 150%, about 20% to about 200%, about 20% to about 300%, about 20% to about 400%, about 20% to about 500%, about 20% to about 750%, about 20% to about 1000%, about 20% to about 1500%, about 20% to about 2000%, about 50% to about 100%, about 50% to about 150%, about 50% to about 200%, about 50% to about 300%, about 50% to about 400%, about 50% to about 500%, about 50% to about 750%, about 50% to about 1000%, about 50% to about 1500%, about 50% to about 2000%, about 100% to about 150%, about 100% to about 200%, about 100% to about 300%, 
about 100% to about 400%, about 100% to about 500%, about 100% to about 750%, about 100% to about 1000%, about 100% to about 2000%, about 150% to about 200%, about 150% to about 300%, about 150% to about 400%, about 150% to about 500%, about 150% to about 750%, about 150% to about 1000%, about 150% to about 1500%, about 150% to about 2000%, about 200% to about 300%, about 200% to about 400%, about 200% to about 500%, about 200% to about 750%, about 200% to about 1000%, about 200% to about 2000%, about 300% to about 400%, about 300% to about 500%, about 300% to about 750%, about 300% to about 1000%, about 300% to about 1500%, about 300% to about 2000%, about 400% to about 500%, about 400% to about 750%, about 400% to about 1000%, about 400% to about 1500%, about 400% to about 2000%, about 500% to about 750%, about 500% to about 1000%, about 500% to about 2000%, about 750% to about 1000%, about 750% to about 2000%, or about 1000% to about 2000% more cellular proliferation and/or cellular growth compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when the engineered host cell is fed a growth medium comprising a disaccharide, e.g., sucrose, as its carbon source and the similar cell is fed a growth medium comprising glucose. 
In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide provides at least about 10%, at least about 20%, at least about 50%, at least about 100%, at least about 150%, at least about 200%, at least about 300%, at least about 400%, at least about 500%, at least about 750%, at least about 1000%, at least about 1500%, at least about 2000%, at least about 3000%, at least about 4000%, at least about 5000%, at least about 7500%, or at least about 10000% more cellular proliferation and/or cellular growth compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when the engineered host cell is fed a growth medium comprising a disaccharide, e.g., sucrose, as its carbon source and the similar cell is fed a growth medium comprising glucose. In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide provides about 10%, about 20%, about 50%, about 100%, about 150%, about 200%, about 300%, about 400%, about 500%, about 750%, about 1000%, about 1500%, about 2000%, about 3000%, about 4000%, about 5000%, about 7500%, or about 10000% more cellular proliferation and/or cellular growth compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when the engineered host cell is fed a growth medium comprising a disaccharide, e.g., sucrose, as its carbon source and the similar cell is fed a growth medium comprising glucose.
In some embodiments, the engineered host cell which expresses a surface-displayed enzyme that hydrolyses a disaccharide provides at most about 20%, about 50%, about 100%, about 150%, about 200%, about 300%, about 400%, about 500%, about 750%, about 1000%, about 1500%, about 2000%, about 3000%, about 4000%, about 5000%, about 7500%, or about 10000% more cellular proliferation and/or cellular growth compared to a similar cell that does not express a surface-displayed enzyme that hydrolyses a disaccharide when the engineered host cell is fed a growth medium comprising a disaccharide, e.g., sucrose, as its carbon source and the similar cell is fed a growth medium comprising glucose.
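As a minimal illustration of how a "percent more cellular growth" figure of the kind recited above could be computed from optical density readings, consider the sketch below. The helper name `percent_more_growth` is hypothetical, and using OD as the growth measure is an assumption made for illustration only. The two OD values are those reported in Example 2 for candidate strain 1 fed sucrose (16.76) and for parent strain 2 fed glucose (11.76).

```python
def percent_more_growth(od_engineered: float, od_reference: float) -> float:
    """Percent by which the engineered culture's OD exceeds the reference OD."""
    if od_reference <= 0:
        raise ValueError("reference OD must be positive")
    return (od_engineered - od_reference) / od_reference * 100.0


# OD values taken from Example 2: candidate strain 1 on sucrose vs strain 2 on glucose.
pct = percent_more_growth(16.76, 11.76)
print(f"{pct:.1f}% more growth")
```

With these inputs the result is roughly 42.5%, which would fall within, for example, the "about 20% to about 50%" range.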


Any aspect or embodiment described herein can be combined with any other aspect or embodiment as disclosed herein.


Definitions

Unless defined otherwise, all terms of art, notations and other technical and scientific terms or terminology used herein are intended to have the same meaning as is commonly understood by one of ordinary skill in the art to which the claimed subject matter pertains. In some cases, terms with commonly understood meanings are defined herein for clarity and/or for ready reference, and the inclusion of such definitions herein should not necessarily be construed to represent a substantial difference over what is generally understood in the art.


As used in the specification and claims, the singular forms “a”, “an” and “the” include plural references unless the context clearly dictates otherwise.


As used herein, the phrases “at least one”, “one or more”, and “and/or” are open-ended expressions that are both conjunctive and disjunctive in operation. For example, each of the expressions “at least one of A, B and C”, “at least one of A, B, or C”, “one or more of A, B, and C”, “one or more of A, B, or C” and “A, B, and/or C” mean A alone, B alone, C alone, A and B together, A and C together, B and C together, or A, B and C together.


As used herein, “or” may refer to “and”, “or,” or “and/or” and may be used both exclusively and inclusively. For example, the term “A or B” may refer to “A or B”, “A but not B”, “B but not A”, and “A and B”. In some cases, context may dictate a particular meaning.


The term “about” or “approximately” means within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is measured or determined, e.g., the limitations of the measurement system. For example, “about” can mean within 1 or more than 1 standard deviation, per the practice in the art. Alternatively, “about” can mean a range of up to 20%, up to 15%, up to 10%, up to 5%, or up to 1% of a given value. Alternatively, particularly with respect to biological systems or processes, the term can mean within an order of magnitude, preferably within 5-fold, and more preferably within 2-fold, of a value. Where particular values are described in the application and claims, unless otherwise stated the term “about” meaning within an acceptable error range for the particular value should be assumed.
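One of the readings above, in which "about" means within a stated percentage of a given value, can be sketched as a simple tolerance check. The function name `is_about` and the default 10% tolerance are illustrative assumptions, not terms used in this disclosure, and the standard-deviation and fold-based readings are not covered by this sketch.

```python
def is_about(measured: float, stated: float, tolerance: float = 0.10) -> bool:
    """True if `measured` lies within +/- `tolerance` (a fraction) of `stated`."""
    return abs(measured - stated) <= tolerance * abs(stated)


# 215 amino acids compared against a stated value of "about 200".
print(is_about(215, 200))        # checked against a 10% window
print(is_about(215, 200, 0.05))  # checked against a 5% window
```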


The term “substantially” is meant to be a significant extent, for the most part; or essentially. In other words, the term substantially may mean nearly exact to the desired attribute or slightly different from the exact attribute. Substantially may be indistinguishable from the desired attribute. Substantially may be distinguishable from the desired attribute but the difference is unimportant or negligible.


The terms “comprise”, “comprising”, “contain,” “containing,” “including”, “includes”, “having”, “has”, “with”, or variants thereof as used in either the present disclosure and/or in the claims, are intended to be inclusive in a manner similar to the term “comprising.”


Throughout this application, various embodiments may be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the disclosure. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.


The terms “increased”, “increasing”, or “increase” are used herein to generally mean an increase by a statistically significant amount relative to a reference level. In some aspects, the terms “increased,” or “increase,” mean an increase of at least 10% as compared to a reference level, for example an increase of at least about 10%, at least about 20%, or at least about 30%, or at least about 40%, or at least about 50%, or at least about 60%, or at least about 70%, or at least about 80%, or at least about 90% or up to and including a 100% increase or any increase between 10-100% as compared to a reference level. Other examples of “increase” include an increase of at least 2-fold, at least 5-fold, at least 10-fold, at least 20-fold, at least 50-fold, at least 100-fold, at least 1000-fold or more as compared to a reference level.


The terms “decreased”, “decreasing”, or “decrease” are used herein generally to mean a decrease in a value relative to a reference level. In some aspects, “decreased” or “decrease” means a reduction by at least 10% as compared to a reference level, for example a decrease by at least about 20%, or at least about 30%, or at least about 40%, or at least about 50%, or at least about 60%, or at least about 70%, or at least about 80%, or at least about 90% or up to and including a 100% decrease (e.g., absent level or non-detectable level as compared to a reference level), or any decrease between 10-100% as compared to a reference level.


As used herein, “engineered” host cells are host cells which have been manipulated using genetic engineering, i.e., by human intervention. When a host cell is “engineered to underexpress” a given protein, the host cell is manipulated such that the host cell no longer has the capability to express the protein described or a functional homologue thereof such as a non-engineered host cell.


“Prior to engineering” when used in the context of host cells of the present invention means that such host cells are not engineered such that a polynucleotide encoding a recombinant protein or functional homologue thereof is not expressed.


A nucleic acid is “operably linked” when it is placed into a functional relationship with another nucleic acid sequence on the same nucleic acid molecule. For example, a promoter is operably linked with a coding sequence of a recombinant gene when it is capable of effecting the expression of that coding sequence.


For the purpose of the present invention the term “protein” is also meant to encompass functional homologues of the proteins described.


Sequence identity, such as for the purpose of assessing percent complementarity, may be measured by any suitable alignment algorithm, including but not limited to the Needleman-Wunsch algorithm (see e.g., the EMBOSS Needle aligner available at the World Wide Web at ebi.ac.uk/Tools/psa/emboss_needle/nucleotide.html, optionally with default settings), the BLAST algorithm (see e.g., the BLAST alignment tool available at blast.ncbi.nlm.nih.gov/Blast.cgi, optionally with default settings), and the Smith-Waterman algorithm (see e.g., the EMBOSS Water aligner available at the World Wide Web at ebi.ac.uk/Tools/psa/emboss_water/nucleotide.html, optionally with default settings). Optimal alignment may be assessed using any suitable parameters of a chosen algorithm, including default parameters.
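To make the notion of alignment-based sequence identity concrete, the following is a minimal sketch of a Needleman-Wunsch global alignment followed by a percent-identity calculation. It is not the EMBOSS Needle implementation: the simple scoring scheme (match +1, mismatch -1, linear gap -1) and the choice of alignment length as the identity denominator are assumptions made for illustration, whereas the tools named above use substitution matrices and affine gap penalties by default.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Globally align a and b; return the two gapped strings."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + step,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + step:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions as a percentage of the alignment length."""
    if not a and not b:
        raise ValueError("at least one sequence must be non-empty")
    aligned_a, aligned_b = needleman_wunsch(a, b)
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


print(needleman_wunsch("ACGT", "AGT"))
print(percent_identity("ACGT", "AGT"))
```

Different tools may report identity over a different denominator (for example, the shorter sequence length), so a threshold such as "at least 90% identical" should be read together with the algorithm and settings actually used.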


The term “bird” includes both domesticated birds and non-domesticated birds such as wildlife and the like. Birds include, but are not limited to, poultry, fowl, waterfowl, game bird, ratite (e.g., flightless bird), chicken (Gallus gallus, Gallus domesticus, or Gallus gallus domesticus), quail, turkey, duck, ostrich (Struthio camelus), Somali ostrich (Struthio molybdophanes), goose, gull, guineafowl, pheasant, emu (Dromaius novaehollandiae), American rhea (Rhea americana), Darwin's rhea (Rhea pennata), and kiwi. Tissues, cells, and their progeny of a biological entity obtained in vivo or cultured in vitro are also encompassed. A bird may lay eggs.


ADDITIONAL EMBODIMENTS

Embodiment 1: An engineered host cell comprising: an integrated coding sequence of a fusion protein comprising a catalytic domain of a heterologous glycosyl hydrolase; and an integrated coding sequence of a heterologous protein of interest (POI). In this embodiment, the engineered host cell does not endogenously express the glycosyl hydrolase and the POI; and the glycosyl hydrolase is anchored on the surface of the engineered host cell.


Embodiment 2: A method of growing/culturing the engineered host cell of Embodiment 1, wherein the method comprises culturing the engineered host cell with a carbon source that is not naturally utilized by the host cell in the absence of the glycosyl hydrolase.


Embodiment 3: A method for growing/culturing a host cell with a carbon source that is not naturally utilized by the host cell, the method comprising: (a) recombinantly producing in the host cell a fusion protein comprising a catalytic domain of a glycosyl hydrolase capable of digesting sucrose; optionally, wherein the glycosyl hydrolase capable of digesting sucrose is an invertase; and (b) recombinantly producing in the host cell a heterologous protein of interest (POI). In this embodiment, the host cell does not express the glycosyl hydrolase endogenously and the engineered host cell prior to step (a) does not utilize sucrose as a carbon source as efficiently as glucose, and wherein the glycosyl hydrolase is expressed on the surface of the engineered host cell.


Embodiment 4: A method for manufacturing a host cell capable of utilizing a carbon source that is not naturally utilized by the host cell, the method comprising: (a) obtaining a host cell that recombinantly expresses a fusion protein comprising a catalytic domain of a glycosyl hydrolase capable of digesting sucrose; optionally, wherein the glycosyl hydrolase capable of digesting sucrose is an invertase; and (b) genetically modifying the host cell to express a heterologous protein of interest (POI). In this embodiment, the host cell does not utilize sucrose as a carbon source as efficiently as glucose in the absence of the glycosyl hydrolase.


Embodiment 5: A method for manufacturing a host cell capable of utilizing a carbon source that is not naturally utilized by the host cell, the method comprising: (a) obtaining a host cell that recombinantly expresses a heterologous protein of interest (POI); and (b) genetically modifying the host cell to express a fusion protein comprising a catalytic domain of a glycosyl hydrolase capable of digesting sucrose, optionally, the glycosyl hydrolase capable of digesting sucrose is an invertase. In this embodiment, the host cell prior to step (b) does not utilize sucrose as a carbon source as efficiently as glucose.


Embodiment 6: The engineered host cell of Embodiment 1 or the method of Embodiment 2, wherein the glycosyl hydrolase is an invertase from S. cerevisiae.


Embodiment 7: The engineered host cell or the method of Embodiment 3, wherein the invertase is encoded by the SUC2 gene.


Embodiment 8: The engineered host cell or the method of Embodiment 3, wherein the invertase is encoded by the MAL1 gene.


Embodiment 9: The engineered host cell or the method of any one of the previous claims, wherein the fusion protein is surface-displayed on the engineered host cell; wherein the surface-displayed fusion protein comprises a catalytic domain of the glycosyl hydrolase and an anchoring domain of a glycosylphosphatidylinositol (GPI)-anchored protein, wherein the anchoring domain comprises at least about 200 amino acids and/or at least about 30% of the residues in the anchoring domain are serines or threonines.


Embodiment 10: The engineered host cell or the method of Embodiment 9, wherein the anchoring domain comprises at least about 225 amino acids, at least about 250 amino acids, at least about 275 amino acids, at least about 300 amino acids, at least about 325 amino acids, at least about 350 amino acids, at least about 375 amino acids, or at least about 400 amino acids.


Embodiment 11: The engineered host cell or the method of Embodiment 9 or Embodiment 10, wherein at least about 35% of the residues in the anchoring domain are serines or threonines, at least about 40% of the residues in the anchoring domain are serines or threonines, at least about 45% of the residues in the anchoring domain are serines or threonines, or at least about 50% of the residues in the anchoring domain are serines or threonines.


Embodiment 12: The engineered host cell or the method of Embodiment 11, wherein the serines or threonines in the anchoring domain are capable of being O-mannosylated.


Embodiment 13: The engineered host cell or the method of any one of the preceding claims, wherein a fusion protein having an anchoring domain comprising at least about 325 amino acids provides greater glycosyl hydrolase activity relative to a fusion protein having an anchoring domain comprising less than about 300 amino acids.


Embodiment 14: The engineered host cell or the method of any one of the preceding claims, wherein a fusion protein having an anchoring domain comprising at least about 300 amino acids provides greater glycosyl hydrolase activity relative to a fusion protein having an anchoring domain comprising less than about 250 amino acids.


Embodiment 15: The engineered host cell or the method of any one of the preceding claims, wherein the fusion protein comprises the anchoring domain of the GPI anchored protein.


Embodiment 16: The engineered host cell or the method of any one of the preceding claims, wherein the fusion protein comprises the GPI anchored protein without its native signal peptide or native secretory signal.


Embodiment 17: The engineered host cell or the method of any one of the preceding claims, wherein the GPI anchored protein is not native to the engineered host cell.


Embodiment 18: The engineered host cell or the method of any one of the preceding claims, wherein the GPI anchored protein is naturally expressed by a S. cerevisiae cell and the engineered host cell is not a S. cerevisiae cell.


Embodiment 19: The engineered host cell or the method of any one of the preceding claims, wherein the GPI anchored protein is selected from Tir4, Dan1, or Sed1.


Embodiment 20: The engineered host cell or the method of Embodiment 19, wherein an anchoring domain of the GPI anchored protein comprises an amino acid sequence that is at least 70% identical, at least 75% identical, at least 80% identical, at least 85% identical, at least 90% identical, or at least 95% identical, to one of SEQ ID NO: 1 to SEQ ID NO: 14.


Embodiment 21: The engineered host cell or the method of Embodiment 19 or Embodiment 20, wherein the anchoring domain of the GPI anchored protein comprises an amino acid sequence of one of SEQ ID NO: 1 to SEQ ID NO: 14.


Embodiment 22: The engineered host cell or the method of any one of the preceding claims, wherein the engineered host cell is a yeast cell.


Embodiment 23: The engineered host cell or the method of any one of the preceding claims, wherein the engineered host cell is a Pichia species.


Embodiment 24: The engineered host cell or the method of Embodiment 23, wherein the Pichia species is Pichia pastoris.


Embodiment 25: The engineered host cell or the method of any one of the preceding claims, wherein the engineered host cell comprises a genomic modification that expresses the fusion protein.


Embodiment 26: The engineered host cell or the method of any one of the preceding claims, wherein the fusion protein comprises a portion of the glycosyl hydrolase in addition to its catalytic domain.


Embodiment 27: The engineered host cell or the method of any one of the preceding claims, wherein the fusion protein comprises substantially the entire amino acid sequence of the glycosyl hydrolase.


Embodiment 28: The engineered host cell or the method of any one of Embodiments 20-27, wherein in the fusion protein, the catalytic domain is N-terminal to the anchoring domain.


Embodiment 29: The engineered host cell or the method of any one of Embodiments 20-27, wherein in the fusion protein, the catalytic domain is C-terminal to the anchoring domain.


Embodiment 30: The engineered host cell or the method of any one of the preceding claims, wherein the fusion protein comprises a linker between the catalytic domain and the anchoring domain.


Embodiment 31: The engineered host cell or the method of any one of the preceding claims, wherein, upon translation, the fusion protein comprises a signal peptide and/or a secretory signal.


Embodiment 32: The engineered host cell or the method of any one of the preceding claims, wherein a growth rate of the engineered host cell in a media containing sucrose as a primary carbon source is higher than a growth rate of a control host cell, wherein the control host cell is identical to the engineered host cell, except the control cell does not express the glycosyl hydrolase.


Embodiment 33: The engineered eukaryotic cell of any one of the preceding claims, wherein the engineered eukaryotic cell comprises a genomic modification that overexpresses a secreted recombinant protein and/or comprises an extrachromosomal modification that overexpresses a secreted recombinant protein.


Embodiment 34: The engineered eukaryotic cell of Embodiment 33, wherein the secreted recombinant protein is an animal protein.


Embodiment 35: The engineered eukaryotic cell of Embodiment 34, wherein the animal protein is an egg protein.


Embodiment 36: The engineered eukaryotic cell of Embodiment 35, wherein the egg protein is selected from the group consisting of ovalbumin, ovomucoid, lysozyme, ovoglobulin G2, ovoglobulin G3, α-ovomucin, β-ovomucin, ovotransferrin, ovoinhibitor, ovoglycoprotein, flavoprotein, ovomacroglobulin, ovostatin, cystatin, avidin, ovalbumin related protein X, and ovalbumin related protein Y.


Embodiment 37: The engineered eukaryotic cell of any one of Embodiments 33 to 36, wherein the genomic modification and/or the extrachromosomal modification that overexpresses the secreted recombinant protein comprises an inducible promoter.


Embodiment 38: The engineered eukaryotic cell of Embodiment 37, wherein the inducible promoter is an AOX1, DAK2, PEX11, FLD1, FGH1, DAS1, DAS2, CAT1, MDH3, HAC1, BiP, RAD30, RVS161-2, MPP10, THP3, TLR, GBP2, PMP20, SHB17, PEX8, PEX4, or TKL3 promoter.


Embodiment 39: The engineered eukaryotic cell of any one of Embodiments 33 to 38, wherein the genomic modification and/or the extrachromosomal modification that overexpresses a secreted recombinant protein comprises an AOX1, TDH3, MOX, RPS25A, or RPL2A terminator.


Embodiment 40: The engineered eukaryotic cell of any one of Embodiments 33 to 39, wherein the genomic modification and/or the extrachromosomal modification that overexpresses a secreted recombinant protein encodes a signal peptide and/or a secretory signal.


Embodiment 41: The engineered eukaryotic cell of any one of Embodiments 33 to 40, wherein the genomic modification and/or the extrachromosomal modification that overexpresses a secreted recombinant protein comprises codons that are optimized for the species of the engineered eukaryotic cell.


Embodiment 42: The engineered eukaryotic cell of any one of Embodiments 33 to 41, wherein the secreted recombinant protein is designed to be secreted from the cell and/or is capable of being secreted from the cell.
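The anchoring-domain criteria recited in Embodiments 9 to 11 (a minimum length and/or a minimum serine/threonine content) can be checked computationally for a candidate sequence. The sketch below is illustrative only: the function names are hypothetical, the thresholds default to the "at least about 200 amino acids" and "at least about 30%" values of Embodiment 9 without applying any "about" tolerance, and the toy sequence is a placeholder rather than any sequence of the Sequence Listing.

```python
def anchoring_domain_stats(seq: str) -> tuple:
    """Return (length, Ser/Thr fraction) for a one-letter amino acid sequence."""
    seq = seq.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    st_count = sum(1 for aa in seq if aa in "ST")
    return len(seq), st_count / len(seq)


def meets_anchor_criteria(seq: str, min_length: int = 200,
                          min_st_fraction: float = 0.30) -> bool:
    """Inclusive 'and/or' test: either criterion (or both) is sufficient."""
    length, st_fraction = anchoring_domain_stats(seq)
    return length >= min_length or st_fraction >= min_st_fraction


# Placeholder sequence for demonstration; not a real anchoring domain.
toy_domain = "STAS" * 60
length, st_fraction = anchoring_domain_stats(toy_domain)
print(length, round(st_fraction, 2), meets_anchor_criteria(toy_domain))
```

The higher thresholds of Embodiments 10 and 11 can be tested by passing different `min_length` and `min_st_fraction` values.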


INCORPORATION BY REFERENCE

All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.


Additional aspects and advantages of the present disclosure will become readily apparent to those skilled in this art from the following detailed description, wherein only illustrative embodiments of the present disclosure are shown and described. As will be realized, the present disclosure is capable of other and different embodiments, and its several details are capable of modifications in various obvious respects, all without departing from the disclosure. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.


EXAMPLES

The following examples are given for the purpose of illustrating various embodiments of the invention and are not meant to limit the present invention in any fashion. The present examples, along with the methods described herein are presently representative of preferred embodiments, are exemplary, and are not intended as limitations on the scope of the invention. Changes therein and other uses which are encompassed within the spirit of the invention as defined by the scope of the claims will occur to those skilled in the art.


Example 1: Growth of P. pastoris on Carbon Sources Prior to Engineering

A background strain (strain 1) was used as a test strain. The genetic modifications present in strain 1 are deletion of AOX1 and AOX2. No target protein cassettes were present in this strain. Strain 1 was plated on minimal nutrient plates containing glucose, fructose, or sucrose.


As shown in FIG. 1, the strain grew on glucose and fructose at similar rates and produced colonies of similar size. On sucrose, the strain grew only to pinprick-sized colonies and then stopped. Without wishing to be bound by theory, it appears that the sucrose source may naturally contain a small amount of hydrolyzed material, i.e., free glucose and fructose molecules, which could account for this limited growth.


Example 2: Expression Constructs, Transformation, and Processing

A surface displayed invertase (suc2) from Saccharomyces cerevisiae was transformed into a high-performing strain (strain 2; parent strain) previously transformed to express recombinant ovalbumin (rOVA). Strain 3 and strain 4 are each considered a “high-performing strain”. Expression of the fusion protein was driven by PGCW14, a highly expressed constitutive promoter. The DNA sequence for the expression cassette and the amino acid sequence for the fusion protein are disclosed herein as SEQ ID NO: 314 and SEQ ID NO: 315, respectively. The DNA sequence encoded a secretion signal between the promoter and the SUC2 sequence, thereby permitting the invertase to become displayed on the outer surface of the cell.


In high throughput screening, those transformants which successfully expressed rOVA protein when fed sucrose, i.e., those transformants that expressed rOVA and the surface displayed invertase, were able to achieve a 50% or more increase in productivity when compared to the same strains when fed glucose alone. Candidate strains were picked into sucrose-containing media and grown for 24 hours. The starter cultures were divided equally and inoculated into either sucrose-containing media or glucose-containing media for high throughput screening. Data from eight high-performing candidate strains, showing growth and productivity comparisons when fed different carbon sources, are shown below in Table 11. The parent strain (strain 2) is unable to grow and express recombinant protein when fed sucrose; therefore, all strain 2 comparisons below are made relative to its performance in glucose.

















TABLE 11

| Strain | OD* in sucrose | OD in glucose | Supernatant protein concentration in sucrose vs glucose | Productivity in sucrose vs glucose | Supernatant protein concentration in sucrose vs strain 2 in glucose | Supernatant protein concentration in glucose vs strain 2 in glucose | Productivity in sucrose vs strain 2 in glucose | Productivity in glucose vs strain 2 in glucose |
|---|---|---|---|---|---|---|---|---|
| 1 | 16.76 | 14.02 | 0.81 | 0.68 | 1.09 | 1.34 | 0.77 | 1.13 |
| 2 | 17.16 | 14.2 | 0.92 | 0.76 | 1.04 | 1.13 | 0.71 | 0.93 |
| 3 | 15.8 | 13.37 | 0.79 | 0.67 | 0.99 | 1.25 | 0.74 | 1.10 |
| 4 | 16.41 | 14.29 | 1.15 | 1.00 | 0.98 | 0.85 | 0.71 | 0.70 |
| 5 | 19.29 | 17.66 | 1.15 | 1.05 | 0.87 | 0.76 | 0.53 | 0.50 |
| 6 | 16.66 | 14.59 | 0.76 | 0.66 | 0.87 | 1.14 | 0.61 | 0.92 |
| 7 | 17.04 | 13.67 | 0.67 | 0.54 | 0.75 | 1.12 | 0.52 | 0.96 |
| 8 | 16.14 | 14.45 | 0.61 | 0.55 | 0.68 | 1.11 | 0.49 | 0.90 |









In Table 11, above, optical density (OD) is an indirect measure of cell density in culture, thus reflecting cell growth. For reference, strain 2 achieved ODs of 1.14 in sucrose (practically no growth) and 11.76 in glucose. The columns of Table 11 reciting “vs strain 2” show a relative comparison of protein production of a candidate strain using sucrose or glucose as a food source compared to strain 2 using glucose as a food source. The six columns that follow the two OD columns show relative ratios of protein production. The ratios shown in Table 11 are described below:


The column entitled “Supernatant protein concentration in sucrose vs glucose” in Table 11 shows ratios of the concentration of recombinantly-expressed protein measured in the culture supernatant when comparing sucrose-fed cultures to glucose-fed cultures.


The column entitled “Productivity in sucrose vs glucose” in Table 11 shows ratios comparing sucrose-fed cultures to glucose-fed cultures. Productivity was measured as the protein concentration in the supernatant divided by the culture's OD, which gives a “per-cell” protein productivity.


The column entitled “Supernatant protein concentration in sucrose vs strain 2 in glucose” in Table 11 shows ratios of protein concentration measured in the culture supernatant when comparing sucrose-fed cultures of each candidate strain to glucose-fed cultures of the parent strain (strain 2).


The column entitled “Supernatant protein concentration in glucose vs strain 2 in glucose” in Table 11 shows ratios of protein concentration measured in the culture supernatant when comparing glucose-fed cultures of each candidate strain to glucose-fed cultures of the parent strain (strain 2).


The column entitled “Productivity in sucrose vs strain 2 in glucose” in Table 11 shows ratios of per-cell productivity comparing sucrose-fed cultures of each candidate strain to glucose-fed cultures of the parent strain (strain 2).


The column entitled “Productivity in glucose vs strain 2 in glucose” in Table 11 shows ratios of per-cell productivity comparing glucose-fed cultures of each candidate strain to glucose-fed cultures of the parent strain (strain 2).
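The relationship between the concentration ratios, the OD values, and the productivity ratios described above can be sketched as follows. Because productivity is protein concentration divided by OD, a productivity ratio equals the concentration ratio divided by the OD ratio. The function name `productivity_ratio` is hypothetical; the input values are those reported for candidate strain 1 in Table 11 and the OD of 11.76 reported for strain 2 in glucose.

```python
def productivity_ratio(conc_ratio: float, od_test: float, od_ref: float) -> float:
    """Per-cell productivity ratio of a test culture relative to a reference culture.

    (conc_test / od_test) / (conc_ref / od_ref) == conc_ratio / (od_test / od_ref)
    """
    if od_test <= 0 or od_ref <= 0:
        raise ValueError("OD values must be positive")
    return conc_ratio / (od_test / od_ref)


STRAIN2_OD_GLUCOSE = 11.76  # parent strain 2 fed glucose, as stated in the text

# Candidate strain 1 (Table 11): OD 16.76 in sucrose, 14.02 in glucose.
suc_vs_glc = productivity_ratio(0.81, 16.76, 14.02)
suc_vs_parent = productivity_ratio(1.09, 16.76, STRAIN2_OD_GLUCOSE)
print(round(suc_vs_glc, 2), round(suc_vs_parent, 2))
```

These inputs give approximately 0.68 and 0.76, against tabulated values of 0.68 and 0.77. The small difference in the second value is consistent with the tabulated concentration ratios having been rounded before this recalculation.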


All candidate strains grew to a higher cell density when fed sucrose than when fed glucose. When the protein concentration and productivity of the sucrose-fed candidate strains are compared to those of the glucose-fed parent strain (strain 2), candidate strains 1 to 4 each performed well, with supernatant protein concentrations similar to that of the parent (ratios of 0.98 to 1.09) and about 71% to 77% of the parent's per-cell productivity. The data herein show that candidate strains that were fed sucrose were as efficient at making protein as the strain 2 parent strain fed with glucose.



FIG. 4 illustrates the comparison of growth on glucose (G) (shown as “_D” in FIG. 4) vs sucrose (S) (shown as “_S” in FIG. 4) of various background strains and the candidate strains which were engineered to display invertase. Strain 2, strain 1, and strain 11 are background strains which express rOVA, strain 12 is a “wild-type” P. pastoris strain, and strain 3 and strain 4 were engineered to express the Suc2 construct (strain 2+Suc2-Tir4, i.e., the surface displayed invertase fusion protein). Although each strain achieved OD600 values of 10 or higher when grown in glucose-containing media, only the strains which were engineered to express the surface displayed invertase fusion protein could achieve such levels when sucrose was the main carbon source in the media. All other media components were the same; the final concentration of sugar (either sucrose or glucose) in the media was 0.5%. OD600 measures the turbidity of a culture, which is related to the number of cells present in the culture and is an indicator of cell proliferation/cell growth.


Example 3: Growth of Engineered P. pastoris Using Sucrose as a Carbon Source

A surface displayed invertase (suc2) from Saccharomyces cerevisiae was transformed into a P2 strain (strain 5) which was previously transformed to express recombinant ovalbumin (rOVA). Performance of the suc2-expressing strain, referred to herein as strain 6, was evaluated in a 250 mL bioreactor. Strain 6 produced rOVA at a similar titer and quality as strain 5 when fed either glucose or sucrose, as measured qualitatively by SDS-PAGE (FIG. 5) and quantitatively by HPLC (Table 12). Strain 6 and the control strain 5 (which expressed rOVA but did not express suc2) were run in bioreactors in parallel and underwent similar fermentation processes. The carbon source included in the culture medium, either glucose or sucrose, was the only variable. Strain 6 was further evaluated in a 50:50 glucose:fructose feed (not shown). The strain performed similarly in the 50:50 feed compared to the sucrose feed, suggesting that its metabolism when fed sucrose is not rate limited by the sucrose hydrolysis step carried out by SUC2.


In FIG. 5 and Table 12: 194 and 195 are data for the parent strain (strain 5) grown on glucose; 196 and 197 are data for the surface displayed suc2-expressing strain 6 grown on glucose; and 198 and 199 are data for the suc2-expressing strain 6 grown on sucrose. P2.1-P2.3 are a protein standard of known concentration (not generated by strain 5) loaded as a reference. The standard sample was generated using an in-house strain expressing P2, and the protein was column purified to be used as an internal protein standard.


The performance measured by HPLC (Table 12) represents the broth titer of fermentation normalized to the average of the control (strain 5 that lacks suc2, fed glucose as the carbon source, run on Bay 194 and Bay 195).














TABLE 12

Sample     Strain              Carbon source   Performance* normalized to average of control
Bay 194    strain 5 control    Glucose         1.03
Bay 195    strain 5 control    Glucose         0.97
Bay 196    strain 6            Glucose         1.02
Bay 197    strain 6            Glucose         1.01
Bay 198    strain 6            Sucrose         0.99
Bay 199    strain 6            Sucrose         0.99

*Broth titer of fermentation
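
The normalization used for Table 12 can be checked with a short calculation. The following Python sketch is illustrative only and is not part of the disclosed method: it copies the normalized values from Table 12, averages the duplicate bays for each condition, and shows how a normalization to the control average could be computed. The raw broth titers are not reported here, so `normalize_to_control` is written for hypothetical raw inputs; the names `table_12`, `mean_by_condition`, and `normalize_to_control` are assumptions made for this example.

```python
from statistics import mean

# Normalized broth titers copied from Table 12 (sample, strain, carbon source, value).
table_12 = [
    ("Bay 194", "strain 5 control", "Glucose", 1.03),
    ("Bay 195", "strain 5 control", "Glucose", 0.97),
    ("Bay 196", "strain 6", "Glucose", 1.02),
    ("Bay 197", "strain 6", "Glucose", 1.01),
    ("Bay 198", "strain 6", "Sucrose", 0.99),
    ("Bay 199", "strain 6", "Sucrose", 0.99),
]


def mean_by_condition(rows):
    """Average the duplicate bays for each (strain, carbon source) pair."""
    groups = {}
    for _sample, strain, carbon, value in rows:
        groups.setdefault((strain, carbon), []).append(value)
    return {key: mean(values) for key, values in groups.items()}


def normalize_to_control(raw_titers, control_samples):
    """Divide each raw titer by the mean raw titer of the control samples.

    raw_titers is a hypothetical mapping of sample name to raw titer;
    the source does not report raw values.
    """
    control_mean = mean(raw_titers[name] for name in control_samples)
    return {name: titer / control_mean for name, titer in raw_titers.items()}


if __name__ == "__main__":
    for (strain, carbon), value in mean_by_condition(table_12).items():
        print(f"{strain:18s} {carbon:8s} {value:.3f}")
```

With the Table 12 values, the two control bays average to about 1.00 (as expected for data normalized to the control average), strain 6 on glucose averages about 1.015, and strain 6 on sucrose averages 0.99.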






To determine if hydrolysis of sucrose into glucose and fructose by the surface displayed invertase fusion protein affects cell growth and/or recombinant protein expression amounts, strain 6 was fed a medium comprising equal parts of glucose and fructose and compared to strain 6 fed a medium comprising an equivalent amount of sucrose. Strain 6 performed similarly when the two conditions were compared as shown in Table 12, suggesting that the extra step of hydrolyzing sucrose is not rate limiting to the cell growth and protein expression processes.
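
To illustrate why an equal-parts glucose and fructose feed is a reasonable stand-in for hydrolyzed sucrose, the following Python sketch computes the hexose masses released by complete hydrolysis of sucrose (sucrose + water yields one glucose and one fructose). It is a sketch under stated assumptions: the basis on which the feeds were made "equivalent" (mass, moles, or carbon) is not specified here, and the 100 g input and the function name `hexoses_from_sucrose` are hypothetical. The molar masses are standard values.

```python
# Molar masses in g/mol (standard values, rounded to two decimals).
M_SUCROSE = 342.30  # C12H22O11
M_HEXOSE = 180.16   # C6H12O6; glucose and fructose share this formula
M_WATER = 18.02     # H2O


def hexoses_from_sucrose(sucrose_g):
    """Return (glucose_g, fructose_g, water_consumed_g) for complete hydrolysis.

    One mole of sucrose plus one mole of water gives one mole each of
    glucose and fructose.
    """
    moles = sucrose_g / M_SUCROSE
    return moles * M_HEXOSE, moles * M_HEXOSE, moles * M_WATER


if __name__ == "__main__":
    # 100 g is a hypothetical amount chosen only for illustration.
    glucose_g, fructose_g, water_g = hexoses_from_sucrose(100.0)
    print(f"glucose: {glucose_g:.1f} g, fructose: {fructose_g:.1f} g, "
          f"water consumed: {water_g:.1f} g")
```

For 100 g of sucrose this gives roughly 52.6 g each of glucose and fructose, that is, a 50:50 mixture by mass, with the extra mass coming from the water consumed in hydrolysis.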

Claims
  • 1. An engineered host cell comprising: an integrated coding sequence of a fusion protein comprising a catalytic domain of a heterologous glycosyl hydrolase; and an integrated coding sequence of a heterologous protein of interest (POI); wherein the engineered host cell does not endogenously express the glycosyl hydrolase and the POI; and wherein the glycosyl hydrolase is anchored on the surface of the engineered host cell.
  • 2. The engineered host cell of claim 1, wherein the glycosyl hydrolase is an invertase selected from: S. cerevisiae, Kluyveromyces lactis, Cyberlindnera jadinii, Oryza sativa japonica (rice), Oryza sativa japonica (rice), Arabidopsis thaliana, Arabidopsis thaliana, Arabidopsis thaliana, Rattus norvegicus (rat), Oryctolagus cuniculus (Rabbit), and Homo sapiens.
  • 3-4. (canceled)
  • 5. The engineered host cell of claim 1, wherein the invertase is encoded by a gene selected from: SUC2, MAL1, invertase (INV1), cytosolic invertase 1 (CINV1), CIN2, CINV1, INVA, INVE, and sucrase-isomaltase (SI) gene.
  • 6. The engineered host cell of claim 1, wherein the fusion protein is surface-displayed on the engineered host cell; wherein the surface-displayed fusion protein comprises a catalytic domain of the glycosyl hydrolase and an anchoring domain of a glycosylphosphatidylinositol (GPI)-anchored protein, wherein the anchoring domain comprises at least about 200 amino acids and/or at least about 30% of the residues in the anchoring domain are serines or threonines.
  • 7-8. (canceled)
  • 9. The engineered host cell of claim 8, wherein the serines or threonines in the anchoring domain are capable of being O-mannosylated.
  • 10. The engineered host cell of claim 6, wherein a fusion protein having an anchoring domain comprising at least about 325 amino acids provides greater glycosyl hydrolase activity relative to a fusion protein having an anchoring domain comprising less than about 300 amino acids or less than about 250 amino acids.
  • 11-12. (canceled)
  • 13. The engineered host cell of claim 1, wherein the fusion protein comprises the GPI anchored protein without its native signal peptide or native secretory signal to the engineered host cell.
  • 14. (canceled)
  • 15. The engineered host cell of claim 1, wherein the GPI anchored protein is naturally expressed by a S. cerevisiae cell and the engineered host cell is not a S. cerevisiae cell.
  • 16. The engineered host cell of claim 13, wherein the GPI anchored protein is selected from Tir4, Dan1, or Sed1.
  • 17. The engineered host cell of claim 1, wherein an anchoring domain of the GPI anchored protein comprises an amino acid sequence that is at least 70% identical to one of SEQ ID NO: 1 to SEQ ID NO: 14.
  • 18. (canceled)
  • 19. The engineered host cell of claim 1, wherein the engineered host cell is a yeast cell or a Pichia species.
  • 20. (canceled)
  • 21. The engineered host cell of claim 19, wherein the Pichia species is Pichia pastoris.
  • 22. The engineered host cell of claim 1, wherein the engineered host cell comprises a genomic modification that expresses the fusion or a portion of the glycosyl hydrolase in addition to its catalytic domain.
  • 23-24. (canceled)
  • 25. The engineered host cell of claim 1, wherein in the fusion protein, the catalytic domain is N-terminal to the anchoring domain, or wherein in the fusion protein, the catalytic domain is C-terminal to the anchoring domain.
  • 26. (canceled)
  • 27. The engineered host cell of claim 1, wherein the fusion protein comprises a linker between the catalytic domain and the anchoring domain.
  • 28. (canceled)
  • 29. The engineered host cell of claim 1, wherein a growth rate of the engineered host cell in a media containing sucrose as a primary carbon source is higher than a growth rate of a control host cell, wherein the control host cell is identical to the engineered host cell, except the control cell does not express the glycosyl hydrolase.
  • 30. The engineered eukaryotic cell of claim 1, wherein the engineered eukaryotic cell comprises a genomic modification that overexpresses a secreted recombinant protein and/or comprises an extrachromosomal modification that overexpresses a secreted recombinant protein.
  • 31. The engineered eukaryotic cell of claim 30, wherein the secreted recombinant protein is an egg protein.
  • 32. (canceled)
  • 33. The engineered eukaryotic cell of claim 31, wherein the egg protein is selected from the group consisting of ovalbumin, ovomucoid, lysozyme, ovoglobulin G2, ovoglobulin G3, α-ovomucin, β-ovomucin, ovotransferrin, ovoinhibitor, ovoglycoprotein, flavoprotein, ovomacroglobulin, ovostatin, cystatin, avidin, ovalbumin related protein X, and ovalbumin related protein Y.
  • 34. The engineered eukaryotic cell of claim 30, wherein the genomic modification and/or the extrachromosomal modification that overexpresses the secreted recombinant protein comprises an inducible promoter selected from an AOX1, DAK2, PEX11, FLD1, FGH1, DAS1, DAS2, CAT1, MDH3, HAC1, BiP, RAD30, RVS161-2, MPP10, THP3, TLR, GBP2, PMP20, SHB17, PEX8, PEX4, or TKL3 promoter, and/or a terminator selected from an AOX1, TDH3, MOX, RPS25A, or RPL2A terminator.
  • 35-36. (canceled)
  • 37. The engineered eukaryotic cell of claim 30, wherein the genomic modification and/or the extrachromosomal modification that overexpresses a secreted recombinant protein encodes a signal peptide, a secretory signal, and/or codons that are optimized for the species of the engineered eukaryotic cell.
  • 38. (canceled)
  • 39. The engineered eukaryotic cell of claim 30, wherein the secreted recombinant protein is designed to be secreted from the cell and/or is capable of being secreted from the cell.
  • 40. The engineered eukaryotic cell of claim 1, wherein the fusion protein comprises an amino acid sequence that is at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identical to the amino acid sequence selected from SEQ ID NOs: 315, 332-335, and 342.
  • 41. The engineered eukaryotic cell of claim 1, wherein the fusion protein comprises a nucleotide sequence that is at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identical to the nucleotide sequence of SEQ ID NO: 314.
  • 42. A method of growing/culturing the engineered host cell of claim 1, wherein the method comprises culturing the engineered host cell with a carbon source that is not naturally utilized by the host cell in the absence of the glycosyl hydrolase.
  • 43. A method for growing/culturing a host cell with a carbon source that is not naturally utilized by the host cell, the method comprising: (a) recombinantly producing in the host cell, a fusion protein comprising a catalytic domain of a glycosyl hydrolase capable of digesting sucrose; optionally, wherein the glycosyl hydrolase capable of digesting sucrose is an invertase; (b) recombinantly producing in the host cell a heterologous protein of interest (POI); wherein the host cell does not express the glycosyl hydrolase endogenously; wherein the engineered host cell prior to step (a) does not utilize sucrose as a carbon source as efficiently as glucose, and wherein the glycosyl hydrolase is expressed on the surface of the engineered host cell.
  • 44. A method for manufacturing a host cell capable of utilizing a carbon source that is not naturally utilized by the host cell, the method comprising: (a) obtaining a host cell that recombinantly expresses a fusion protein comprising a catalytic domain of a glycosyl hydrolase capable of digesting sucrose, wherein the glycosyl hydrolase capable of digesting sucrose is an invertase; and (b) genetically modifying the host cell to express a heterologous protein of interest (POI); wherein the host cell does not utilize sucrose as a carbon source as efficiently as glucose in the absence of the glycosyl hydrolase.
  • 45. A method for manufacturing a host cell capable of utilizing a carbon source that is not naturally utilized by the host cell, the method comprising: (a) obtaining a host cell that recombinantly expresses a heterologous protein of interest (POI); and (b) genetically modifying the host cell to express a fusion protein comprising a catalytic domain of a glycosyl hydrolase capable of digesting sucrose; wherein the glycosyl hydrolase capable of digesting sucrose is an invertase; wherein the host cell prior to step (b) does not utilize sucrose as a carbon source as efficiently as glucose.
CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to and benefit of U.S. Provisional Application No. 63/356,972, filed Jun. 29, 2022; which is herein incorporated by reference in its entirety.
