HETEROLOGOUS EXPRESSION OF CARBOHYDRATE BINDING MODULES AND USES THEREOF FOR CADAVERINE PRODUCTION

BACKGROUND OF THE INVENTION

The class of proteins known as acid decarboxylases is a group of enzymes that catalyze the decarboxylation of basic amino acids (e.g., lysine, arginine, ornithine) in order to generate polyamines as part of the acid stress response in many microorganisms. Escherichia coli has several pyridoxal phosphate-dependent (PLP-dependent) acid decarboxylases: CadA, LdcC, AdiA, SpeA, SpeC, SpeF, GadA, and GadB. All of these enzymes function within a narrow pH range, and the enzyme activity decreases significantly outside of that pH range (Kanjee et al., Biochemistry 50, 9388-9398, 2011). It has been previously observed that these PLP-dependent decarboxylases dimerize in order to form a complete active site. In some cases, such as CadA, the dimers form decamers that aggregate into higher molecular weight protein complexes required for optimal function. The inhibition of higher molecular weight protein complex formation (e.g., in conditions outside of the optimal pH) leads to a significant decrease in function (Kanjee et al., The EMBO Journal 30, 931-944, 2011).

The PLP-dependent decarboxylases that catalyze the conversion of lysine to cadaverine are lysine decarboxylases (e.g., CadA, LdcC, and their homologs). Lysine decarboxylases are of particular interest, because cadaverine can be a platform chemical for the production of various products, such as 1,5-pentamethylene diisocyanate or polyamide 56. However, the production of cadaverine is harmful to the host cell, since cadaverine has been shown to be toxic to cells when present above a certain concentration (Qian et al., Biotechnol. Bioeng. 108, 93-103, 2011). Therefore, the temporal control of cadaverine production in order to separate cell growth from the production of the toxic substance is important for achieving high product yields and low production costs.

One method to achieve the temporal control of cadaverine production is by controlling the timing of the expression of the lysine decarboxylase gene using an inducible promoter. An ideal inducible promoter system exhibits no gene expression (and therefore no enzyme production and activity) in the absence of the inducer, and gene expression is turned ON only after the addition of the inducer. This inducible control provides a process that separates the growth phase of the cell from the production of an enzyme whose activity is harmful to the cell. The production of the harmful enzyme after the cell completes its growth phase reduces the toxic effect on the cell, since most toxic effects mainly inhibit cell growth and have less effect on cell function. In some scenarios, an inducible promoter has leaky expression, which means that there is a small amount of expression even in the absence of the inducer. Leaky expression is a problem if the cell is extremely sensitive to the toxic effect of the enzyme.

Alternatively, the most common method to achieve temporal control is to physically separate the lysine production step from the cadaverine production step, so that lysine production is not inhibited by the presence of cadaverine. The cadaverine production step can be achieved in many ways by adding a solution containing lysine to: 1) free cells that heterologously express a lysine decarboxylase gene inside the cell (PCT/CN2014/080873); 2) free cells producing the lysine decarboxylase protein on the cell surface (JP2004298033); 3) immobilized cells producing the lysine decarboxylase protein inside the cell (Bhatia et al., Bioprocess Biosyst. Eng. 38, 2315-2322, 2015; Kim et al., J. Industrial and Eng. Chemistry 46, 44-48, 2017); 4) free purified or partially purified lysine decarboxylase (Park et al., J. Microbiol. Biotechnol. 27, 289-296, 2017); or 5) immobilized lysine decarboxylase proteins (Seo et al., Process Biochemistry 51, 1413-1419, 2016).

Processes using cell immobilization have also been reported. Generally, cell immobilization involves either binding the cells to a support or entrapping the cells inside of a matrix. Commonly used materials for the matrix include polyacrylamide, polyurethane, alginate, collagen, and κ-carrageenan. The binding of a cell to a support is achieved by displaying a polypeptide on the surface of the cell's membrane that enables the cell to bind more strongly to a designated surface. For example, E. coli was modified to display catecholamine on its surface, and the engineered cell could bind more strongly to silica and glass microparticles, gold, titanium, silicon, polyethylene terephthalate, polyurethane, and polydimethylsiloxane (Park et al., Appl. Environ. Microbiol. 80, 43-53, 2014).

Immobilized cells have also been proposed for the production of the amino acid aspartate using chitosan gel (Szymanska et al., Polish J. Microbiol. 60, 105-112, 2011) and of the amino acid glutamine using carrageenan gel (Amin et al., World Appl. Sciences J. 2, 62-67, 2007). The major challenges in the development and commercialization of an immobilized cell process are productivity and the added cost of the support.

BRIEF SUMMARY OF ASPECTS OF THE DISCLOSURE

In one aspect, the present disclosure presents methods and compositions to immobilize cells overproducing a lysine decarboxylase protein onto a carbohydrate matrix. Carbohydrates (e.g., cellulose, hemicellulose, lignocellulose, xylan, chitin) are relatively cheap and renewable materials. Furthermore, carbohydrate substrates can require little to no modification for immobilization. The embodiments described herein thus provide the ability to maximize the number of times the immobilized cells are reused and to reduce the cost of cadaverine production. Furthermore, unlike immobilization of the enzyme, cell lysis or an enzyme recovery step is not required.

In one aspect, the invention is further based, in part, on the surprising discovery that the co-expression of a carbohydrate binding module gene with a lysine decarboxylase gene increases the cell's ability to convert lysine to cadaverine.

In one aspect, provided herein is a host cell genetically modified to express a polynucleotide that encodes a carbohydrate binding module (CBM) fusion polypeptide on the cell surface and to overexpress a lysine decarboxylase, wherein the CBM fusion polypeptide comprises a CBM joined to a surface display polypeptide. In some embodiments, the lysine decarboxylase is exogenous, i.e., it is expressed by a polynucleotide encoding the lysine decarboxylase that is introduced into the host cell. In some embodiments, the host cell is a bacterial host cell. In some embodiments, the CBM is a cellulose binding domain (CBD), which in some embodiments, can be from an exoglucanase, an endoglucanase, a cellulose binding protein, a cellobiohydrolase I or cellobiohydrolase II, a xylanase, or a CipA or CipB inclusion protein. In some embodiments, the CBD is from a xylanase, a cellobiohydrolase I, a cellobiohydrolase II, an exoglucanase, or a cellulose binding protein. In some embodiments, the CBD comprises the amino acid sequence of any one of SEQ ID NOS:13-19. In some embodiments, the surface display polypeptide comprises a region of an outer membrane protein, e.g., OmpA. In some embodiments, the surface display polypeptide comprises amino acids 46-159 of SEQ ID NO:24. In some embodiments, the polynucleotide that encodes the CBM fusion polypeptide comprises a region that encodes a leader peptide that is heterologous to the surface display polypeptide. In some embodiments, the leader peptide is an Escherichia coli lipoprotein (Lpp) leader peptide. In some embodiments, the CBM is at the C-terminal end of the fusion polypeptide or within 15 amino acids of the C-terminal end of the fusion polypeptide. In some embodiments, the lysine decarboxylase is CadA or LdcC.

In some embodiments, the host cell is from the genus Escherichia, Hafnia, or Corynebacterium. In some embodiments, the host cell is an Escherichia coli, Hafnia alvei, or Corynebacterium glutamicum cell.

In a further aspect, provided herein is a cell culture comprising a host cell as described herein, e.g., in the preceding paragraph. In some embodiments, the cell culture is incubated with a carbohydrate substrate. In some embodiments, the carbohydrate substrate is a cellulose substrate.

In a further aspect, provided herein is a method of producing cadaverine, the method comprising incubating a cell culture comprising a population of host cells with a carbohydrate substrate, wherein the host cell population is genetically modified to express a polynucleotide that encodes a carbohydrate binding module (CBM) fusion polypeptide on the cell surface and to overexpress lysine decarboxylase, under conditions in which the CBM fusion polypeptide and the lysine decarboxylase are expressed, wherein the CBM fusion polypeptide comprises a CBM joined to a surface display polypeptide. In some embodiments, the method further comprises isolating the cadaverine. In some embodiments, the host cells are bacterial host cells. In some embodiments, the lysine decarboxylase is an exogenous lysine decarboxylase expressed by a polynucleotide that is introduced into the host cell. In some embodiments, the CBM is a cellulose binding domain (CBD), which in some embodiments, can be from an exoglucanase, an endoglucanase, a cellulose binding protein, a cellobiohydrolase I or cellobiohydrolase II, a xylanase, or a CipA or CipB inclusion protein.

In some embodiments, the CBD is from a xylanase, a cellobiohydrolase I, a cellobiohydrolase II, an exoglucanase, or a cellulose binding protein. In some embodiments, the CBD comprises the amino acid sequence of any one of SEQ ID NOS:13-19. In some embodiments, the surface display polypeptide comprises a region of an outer membrane protein, e.g., OmpA. In some embodiments, the surface display polypeptide comprises amino acids 46-159 of SEQ ID NO:24. In some embodiments, the polynucleotide that encodes the CBM fusion polypeptide comprises a region that encodes a leader peptide that is heterologous to the surface display polypeptide. In some embodiments, the leader peptide is an Escherichia coli lipoprotein (Lpp) leader peptide. In some embodiments, the CBM is at the C-terminal end of the fusion polypeptide or within 15 amino acids of the C-terminal end of the fusion polypeptide. In some embodiments, the lysine decarboxylase is CadA or LdcC. In some embodiments, the host cell population is from the genus Escherichia, Hafnia, or Corynebacterium. In some embodiments, the host cells are Escherichia coli, Hafnia alvei, or Corynebacterium glutamicum cells.

In another aspect, provided herein is a method of obtaining a genetically modified host cell for the production of cadaverine, the method comprising: expressing a polynucleotide that encodes a carbohydrate binding module (CBM) fusion polypeptide on the cell surface in a host cell genetically modified to overexpress a lysine decarboxylase, wherein the CBM fusion polypeptide comprises a CBM joined to a surface display polypeptide; and selecting a host cell that produces an increased amount of cadaverine compared to a counterpart host cell that does not overexpress the lysine decarboxylase. In some embodiments, the lysine decarboxylase is an exogenous lysine decarboxylase expressed by a polynucleotide introduced into the host cell. In some embodiments, the host cell is a bacterial host cell. In some embodiments, the CBM is a cellulose binding domain (CBD), which in some embodiments, can be from an exoglucanase, an endoglucanase, a cellulose binding protein, a cellobiohydrolase I or cellobiohydrolase II, a xylanase, or a CipA or CipB inclusion protein. In some embodiments, the CBD is from a xylanase, a cellobiohydrolase I, a cellobiohydrolase II, an exoglucanase, or a cellulose binding protein. In some embodiments, the CBD comprises the amino acid sequence of any one of SEQ ID NOS:13-19. In some embodiments, the surface display polypeptide comprises a region of an outer membrane protein, e.g., OmpA. In some embodiments, the surface display polypeptide comprises amino acids 46-159 of SEQ ID NO:24. In some embodiments, the polynucleotide that encodes the CBM fusion polypeptide comprises a region that encodes a leader peptide that is heterologous to the surface display polypeptide. In some embodiments, the leader peptide is an Escherichia coli lipoprotein (Lpp) leader peptide. In some embodiments, the CBM is at the C-terminal end of the fusion polypeptide or within 15 amino acids of the C-terminal end of the fusion polypeptide. In some embodiments, the lysine decarboxylase is CadA or LdcC.

DESCRIPTION OF THE FIGURES

FIG. 1 shows an SDS-PAGE of the soluble and insoluble fractions of an H. alvei cell culture overexpressing CadA either without a CBD (111) or with a CBD (193).

DETAILED DESCRIPTION OF ASPECTS OF THE DISCLOSURE

Before the present invention is described, it is to be understood that this invention is not limited to particular embodiments described, as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, the preferred methods and materials are now described. All publications and accession numbers mentioned herein are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited.

Terminology

As used herein, “carbohydrate binding module” or “CBM” refers to a region of a polypeptide that binds to carbohydrates. Reference to a “CBM” includes a variant, such as a CBM that contains one or more conservative substitutions relative to a naturally occurring CBM sequence, that retains at least 30%, typically at least 50%, at least 80%, or 100%, or greater, of the carbohydrate binding activity of the naturally occurring CBM.

The term “cellulose binding domain” or “CBD” as used herein refers to a CBM that binds to cellulose. Reference to a “CBD” includes a variant, such as a CBD that contains one or more conservative substitutions relative to a naturally occurring CBD sequence, that retains at least 30%, typically at least 50%, at least 80%, or 100%, or greater, of the carbohydrate binding activity of the naturally occurring CBD.

As used herein, a “CBM fusion polypeptide” refers to a recombinant polypeptide that comprises a CBM joined to a polypeptide that is expressed on the surface of a cell. The CBM may be directly fused to the cell surface polypeptide or the fusion protein may comprise a peptide linker that joins the CBM to the cell surface polypeptide.
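By way of illustration only, the layout of a CBM fusion polypeptide (leader peptide, surface display region, optional peptide linker, CBM) can be sketched in Python. The sequences used here are made-up placeholders and are not any SEQ ID NO of the disclosure; the function names are hypothetical. The sketch assumes that "within 15 amino acids of the C-terminal end" refers to the number of residues between the last residue of the CBM and the C-terminus of the fusion.

```python
def build_fusion(leader: str, display: str, cbm: str,
                 linker: str = "", c_tail: str = "") -> str:
    """Concatenate the parts N-terminus to C-terminus.

    linker and c_tail are optional; an empty linker models a direct fusion.
    """
    return leader + display + linker + cbm + c_tail


def cbm_c_terminal_offset(fusion: str, cbm: str) -> int:
    """Residues lying between the end of the CBM and the fusion's C-terminus."""
    start = fusion.rfind(cbm)
    if start < 0:
        raise ValueError("CBM sequence not found in fusion polypeptide")
    return len(fusion) - (start + len(cbm))


def cbm_near_c_terminus(fusion: str, cbm: str, max_offset: int = 15) -> bool:
    """True if the CBM ends at, or within max_offset residues of, the C-terminus."""
    return cbm_c_terminal_offset(fusion, cbm) <= max_offset


if __name__ == "__main__":
    # Placeholder sequences for illustration only.
    fusion = build_fusion("MKAT", "GGSAAA", "WYWNCT", linker="GSGS", c_tail="HHHHHH")
    print(fusion, cbm_c_terminal_offset(fusion, "WYWNCT"))
```

An offset of zero corresponds to a CBM located exactly at the C-terminal end.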

A “CBD fusion polypeptide” as used here refers to a CBM fusion polypeptide in which the carbohydrate binding module is a cellulose binding domain.

A “surface display polypeptide” as used herein refers to a region of a polypeptide that is expressed on the extracellular surface of a microorganism.

The terms “polynucleotide” and “nucleic acid” are used interchangeably and refer to a single or double-stranded polymer of deoxyribonucleotide or ribonucleotide bases read from the 5′ to the 3′ end. A nucleic acid as used in the present invention will generally contain phosphodiester bonds, although in some cases, nucleic acid analogs may be used that may have alternate backbones, comprising, e.g., phosphoramidate, phosphorothioate, phosphorodithioate, or O-methylphosphoroamidite linkages (see Eckstein, Oligonucleotides and Analogues: A Practical Approach, Oxford University Press); positive backbones; non-ionic backbones; and non-ribose backbones. Nucleic acids or polynucleotides may also include modified nucleotides that permit correct read-through by a polymerase. “Polynucleotide sequence” or “nucleic acid sequence” includes both the sense and antisense strands of a nucleic acid as either individual single strands or in a duplex. As will be appreciated by those in the art, the depiction of a single strand also defines the sequence of the complementary strand; thus, the sequences described herein also provide the complement of the sequence. Unless otherwise indicated, a particular nucleic acid sequence also implicitly encompasses variants thereof (e.g., degenerate codon substitutions) and complementary sequences, as well as the sequence explicitly indicated. The nucleic acid may be DNA, both genomic and cDNA, RNA or a hybrid, where the nucleic acid may contain combinations of deoxyribo- and ribo-nucleotides, and combinations of bases, including uracil, adenine, thymine, cytosine, guanine, inosine, xanthine, hypoxanthine, isocytosine, isoguanine, etc.

The term “substantially identical” as used herein with reference to a polypeptide sequence refers to a sequence that has at least 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity compared to a reference sequence using the programs described herein; preferably BLAST using standard parameters, as described below.

Two nucleic acid sequences or polypeptide sequences are said to be “identical” if the sequence of nucleotides or amino acid residues, respectively, in the two sequences is the same when aligned for maximum correspondence as described below. The terms “identical” or percent “identity,” in the context of two or more nucleic acids or polypeptide sequences, refer to two or more sequences or subsequences that are the same or have a specified percentage of amino acid residues or nucleotides that are the same, when compared and aligned for maximum correspondence over a comparison window, as measured using one of the following sequence comparison algorithms or by manual alignment and visual inspection.

For sequence comparison, typically one sequence acts as a reference sequence, to which test sequences are compared. When using a sequence comparison algorithm, test and reference sequences are entered into a computer, subsequence coordinates are designated, if necessary, and sequence algorithm program parameters are designated. Default program parameters can be used, or alternative parameters can be designated. The sequence comparison algorithm then calculates the percent sequence identities for the test sequences relative to the reference sequence, based on the program parameters.

A “comparison window,” as used herein, includes reference to a segment of any one of the number of contiguous positions selected from the group consisting of from 20 to 600, usually about 50 to about 200, more usually about 100 to about 150 in which a sequence may be compared to a reference sequence of the same number of contiguous positions after the two sequences are optimally aligned. Methods of alignment of sequences for comparison are well-known in the art. Optimal alignment of sequences for comparison can be conducted, e.g., by the local homology algorithm of Smith & Waterman, Adv. Appl. Math. 2:482 (1981), by the homology alignment algorithm of Needleman & Wunsch, J. Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson & Lipman, Proc. Nat'l. Acad. Sci. USA 85:2444 (1988), by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, Wis.), or by manual alignment and visual inspection. Optimal alignments are typically conducted using BLASTP with default parameters.
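As a non-limiting illustration of how a percent identity can be obtained from an optimal global alignment, the following Python sketch implements a minimal Needleman-Wunsch alignment. The scoring values (match +1, mismatch -1, linear gap -1) and the choice of alignment length as the denominator are assumptions made for this example; they are not the BLASTP default parameters, and alignment programs may report identity over a different denominator.

```python
def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -1) -> tuple[str, str]:
    """Return one optimal global alignment of a and b as two gapped strings."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover the alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1]); out_b.append(b[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical, non-gap columns as a percentage of the alignment length."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and equal in length")
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * same / len(aligned_a)


if __name__ == "__main__":
    ref, test = needleman_wunsch("ACDEFG", "ACDFG")
    print(ref, test, round(percent_identity(ref, test), 1))
```

The resulting value can then be compared against a chosen identity threshold, for example one of the percentages listed in the definition of "substantially identical".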

Nucleic acid or protein sequences that are substantially identical to a reference sequence include “conservatively modified variants.” With respect to particular nucleic acid sequences, conservatively modified variants refer to those nucleic acids which encode identical or essentially identical amino acid sequences, or where the nucleic acid does not encode an amino acid sequence, to essentially identical sequences. Because of the degeneracy of the genetic code, a large number of functionally identical nucleic acids encode any given protein. For instance, the codons GCA, GCC, GCG and GCU all encode the amino acid alanine. Thus, at every position where an alanine is specified by a codon, the codon can be altered to any of the corresponding codons described without altering the encoded polypeptide. Such nucleic acid variations are “silent variations,” which are one species of conservatively modified variations. Every nucleic acid sequence herein which encodes a polypeptide also describes every possible silent variation of the nucleic acid. One of skill will recognize that each codon in a nucleic acid (except AUG, which is ordinarily the only codon for methionine) can be modified to yield a functionally identical molecule.

Accordingly, each silent variation of a nucleic acid which encodes a polypeptide is implicit in each described sequence.
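For illustration, the degeneracy described above can be made concrete with a short Python sketch that uses the standard genetic code to list the synonymous codons for an amino acid and to count or enumerate the coding sequences that encode a given peptide. The function names and the example peptides are hypothetical; only the standard codon assignments are relied upon.

```python
from itertools import product
from math import prod

# Standard genetic code, codons ordered with bases U, C, A, G at each position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_codons(amino_acid: str) -> list[str]:
    """All codons that encode the given one-letter amino acid ('*' = stop)."""
    codons = sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)
    if not codons:
        raise ValueError(f"unknown amino acid: {amino_acid!r}")
    return codons


def count_coding_sequences(peptide: str) -> int:
    """Number of distinct RNA sequences encoding the peptide (silent variants)."""
    return prod(len(synonymous_codons(aa)) for aa in peptide)


def all_coding_sequences(peptide: str):
    """Yield every RNA sequence that encodes the peptide."""
    for combo in product(*(synonymous_codons(aa) for aa in peptide)):
        yield "".join(combo)


if __name__ == "__main__":
    print(synonymous_codons("A"))          # the four alanine codons
    print(count_coding_sequences("MKL"))   # 1 (Met) x 2 (Lys) x 6 (Leu)
```

Because the count is a product over residues, it grows rapidly with peptide length, which is why silent variations are described implicitly rather than listed.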

The term “polypeptide” as used herein includes reference to polypeptides containing naturally occurring amino acids and amino acid backbones as well as non-naturally occurring amino acids and amino acid analogs.

As to amino acid sequences, one of skill will recognize that individual substitutions, in a nucleic acid, peptide, polypeptide, or protein sequence which alters a single amino acid or a small percentage of amino acids in the encoded sequence is a “conservatively modified variant” where the alteration results in the substitution of an amino acid with a chemically similar amino acid. Conservative substitution tables providing functionally similar amino acids are well known in the art. Examples of amino acid groups defined in this manner can include: a “charged/polar group” including Glu (Glutamic acid or E), Asp (Aspartic acid or D), Asn (Asparagine or N), Gln (Glutamine or Q), Lys (Lysine or K), Arg (Arginine or R) and His (Histidine or H); an “aromatic or cyclic group” including Pro (Proline or P), Phe (Phenylalanine or F), Tyr (Tyrosine or Y) and Trp (Tryptophan or W); and an “aliphatic group” including Gly (Glycine or G), Ala (Alanine or A), Val (Valine or V), Leu (Leucine or L), Ile (Isoleucine or I), Met (Methionine or M), Ser (Serine or S), Thr (Threonine or T) and Cys (Cysteine or C). Within each group, subgroups can also be identified. For example, the group of charged/polar amino acids can be sub-divided into sub-groups including: the “positively-charged sub-group” comprising Lys, Arg and His; the “negatively-charged sub-group” comprising Glu and Asp; and the “polar sub-group” comprising Asn and Gln. In another example, the aromatic or cyclic group can be sub-divided into sub-groups including: the “nitrogen ring sub-group” comprising Pro, His and Trp; and the “phenyl sub-group” comprising Phe and Tyr. In another further example, the aliphatic group can be sub-divided into sub-groups including: the “large aliphatic non-polar sub-group” comprising Val, Leu and Ile; the “aliphatic slightly-polar sub-group” comprising Met, Ser, Thr and Cys; and the “small-residue sub-group” comprising Gly and Ala. 

Examples of conservative mutations include amino acid substitutions of amino acids within the sub-groups above, such as, but not limited to: Lys for Arg or vice versa, such that a positive charge can be maintained; Glu for Asp or vice versa, such that a negative charge can be maintained; Ser for Thr or vice versa, such that a free -OH can be maintained; and Gln for Asn or vice versa, such that a free -NH2 can be maintained. The following six groups each contain amino acids that further provide illustrative conservative substitutions for one another: 1) Ala, Ser, Thr; 2) Asp, Glu; 3) Asn, Gln; 4) Arg, Lys; 5) Ile, Leu, Met, Val; and 6) Phe, Tyr, and Trp (see, e.g., Creighton, Proteins (1984)).
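The six illustrative groups can be expressed as a simple lookup, sketched below in Python. This is one possible encoding for illustration only; it reads the sixth group as Phe, Tyr, and Trp, treats any substitution outside the six groups as non-conservative, and uses hypothetical function names and made-up example sequences.

```python
# The six illustrative groups, in one-letter code.
CONSERVATIVE_GROUPS = [
    set("AST"),   # 1) Ala, Ser, Thr
    set("DE"),    # 2) Asp, Glu
    set("NQ"),    # 3) Asn, Gln
    set("RK"),    # 4) Arg, Lys
    set("ILMV"),  # 5) Ile, Leu, Met, Val
    set("FYW"),   # 6) Phe, Tyr, Trp
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if the two different residues share one of the six groups."""
    if original == replacement:
        return False  # identical residues are not a substitution
    return any(original in g and replacement in g for g in CONSERVATIVE_GROUPS)


def classify_substitutions(reference: str, variant: str):
    """List (position, ref residue, variant residue, label) for each difference.

    Positions are 1-based; the sequences must already be the same length.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length")
    return [
        (pos, r, v, "conservative" if is_conservative(r, v) else "non-conservative")
        for pos, (r, v) in enumerate(zip(reference, variant), start=1)
        if r != v
    ]


if __name__ == "__main__":
    # Made-up sequences for illustration only.
    for row in classify_substitutions("MKDLS", "MRDLW"):
        print(row)
```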

A polynucleotide is “heterologous” to an organism or a second polynucleotide sequence if it originates from a foreign species, or, if from the same species, is modified from its original form. For example, when a polynucleotide encoding a polypeptide sequence is said to be operably linked to a heterologous promoter, it means that the polynucleotide coding sequence encoding the polypeptide is derived from one species whereas the promoter sequence is derived from another, different species; or, if both are derived from the same species, the coding sequence is not naturally associated with the promoter (e.g., is a genetically engineered coding sequence, e.g., from a different gene in the same species, or an allele from a different species). Similarly, a polypeptide is “heterologous” to a host cell if the native wildtype host cell does not produce the polypeptide.

The term “exogenous” refers generally to a polynucleotide sequence or polypeptide that is introduced into a host cell by molecular biological techniques, i.e., engineering to produce a recombinant cell. Examples of “exogenous” polynucleotides include vectors, plasmids, and/or man-made nucleic acid constructs encoding a desired protein. An “exogenous” polypeptide expressed in the host cell may occur naturally in the cell or may be heterologous to the host cell. The term also encompasses progeny of the original host cell that has been engineered to express the exogenous polynucleotide or polypeptide sequence, i.e., a host cell that expresses an “exogenous” polynucleotide may be the original genetically modified host cell or a progeny cell that comprises the genetic modification.

The term “endogenous” refers to naturally-occurring polynucleotide sequences or polypeptides that may be found in a given wild-type cell or organism. In this regard, it is also noted that even though an organism may comprise an endogenous copy of a given polynucleotide sequence or gene, the introduction of an expression construct or vector encoding that sequence, such as to over-express or otherwise regulate the expression of the encoded protein, represents an “exogenous” copy of that gene or polynucleotide sequence.

Any of the pathways, genes, or enzymes described herein may utilize or rely on an “endogenous” sequence, which may be provided as one or more “exogenous” polynucleotide sequences, or both.

A host cell that “overexpresses” a lysine decarboxylase refers to a cell that has a genetic modification to overexpress a lysine decarboxylase, e.g., comprises an expression construct that encodes an exogenous lysine decarboxylase. The lysine decarboxylase may be a lysine decarboxylase that naturally occurs in the wildtype cell or may be a lysine decarboxylase from another organism. In some embodiments, a cell that “overexpresses” a lysine decarboxylase expresses an amount of protein at least 10%, at least 25%, at least 50%, or at least 100% or greater, than that produced by the wildtype counterpart cell of the same strain that does not have the genetic modification.
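As a simple worked example of the thresholds recited above, the following Python sketch computes the percent increase in protein amount relative to the wildtype counterpart and compares it to a chosen threshold. The measurement values are hypothetical and the units are arbitrary, provided the same units are used for both cells; the function names are illustrative only.

```python
def percent_increase(modified_amount: float, wildtype_amount: float) -> float:
    """Percent by which the modified cell's protein amount exceeds wildtype."""
    if wildtype_amount <= 0:
        raise ValueError("wildtype amount must be positive")
    return 100.0 * (modified_amount - wildtype_amount) / wildtype_amount


def meets_overexpression_threshold(modified_amount: float,
                                   wildtype_amount: float,
                                   threshold_percent: float = 10.0) -> bool:
    """True if the increase is at least threshold_percent (e.g., 10, 25, 50, 100)."""
    return percent_increase(modified_amount, wildtype_amount) >= threshold_percent


if __name__ == "__main__":
    # Hypothetical densitometry readings, arbitrary units.
    print(percent_increase(150.0, 100.0))
    print(meets_overexpression_threshold(150.0, 100.0, threshold_percent=50.0))
```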

“Recombinant nucleic acid” or “recombinant polynucleotide” as used herein refers to a polymer of nucleic acids wherein at least one of the following is true: (a) the sequence of nucleic acids is foreign to (i.e., not naturally found in) a given host cell; (b) the sequence may be naturally found in a given host cell, but in an unnatural (e.g., greater than expected) amount; or (c) the sequence of nucleic acids comprises two or more subsequences that are not found in the same relationship to each other in nature. For example, regarding instance (c), a recombinant nucleic acid sequence can have two or more sequences from unrelated genes arranged to make a new functional nucleic acid.

The term “expression cassette” or “DNA construct” or “expression construct” refers to a nucleic acid construct that, when introduced into a host cell, results in transcription and/or translation of an RNA or polypeptide, respectively. In the case of expression of transgenes, one of skill will recognize that the inserted polynucleotide sequence need not be identical, but may be only substantially identical to a sequence of the gene from which it was derived. As explained herein, these substantially identical variants are specifically covered by reference to a specific nucleic acid sequence. One example of an expression cassette is a polynucleotide construct that comprises a polynucleotide sequence encoding a polypeptide of the invention operably linked to a promoter, e.g., its native promoter, where the expression cassette is introduced into a heterologous microorganism. In some embodiments, an expression cassette comprises a polynucleotide sequence encoding a polypeptide of the invention where the polynucleotide is introduced into a host cell and is targeted to a position in the genome of the host cell such that expression of the polynucleotide sequence is driven by a promoter that is present in the host cell.

The term “host cell” as used in the context of this invention refers to a microorganism and includes an individual cell or cell culture that can be or has been a recipient of any recombinant vector(s) or isolated polynucleotide(s) as described herein. Host cells include progeny of a single host cell, and the progeny may not necessarily be completely identical (in morphology or in total DNA complement) to the original parent cell due to natural, accidental, or deliberate mutation and/or change. A host cell includes cells into which a recombinant vector or a polynucleotide of the invention has been introduced, including by transformation, transfection, and the like.

The term “isolated” refers to a material that is substantially or essentially free from components that normally accompany it in its native state. For example, an “isolated polynucleotide,” as used herein, may refer to a polynucleotide that has been isolated from the sequences that flank it in its naturally-occurring or genomic state, e.g., a DNA fragment that has been removed from the sequences that are normally adjacent to the fragment, such as by cloning into a vector. A polynucleotide is considered to be isolated if, for example, it is cloned into a vector that is not a part of the natural environment, or if it is artificially introduced in the genome of a cell in a manner that differs from its naturally-occurring state. Alternatively, an “isolated peptide” or an “isolated polypeptide” and the like, as used herein, may refer to a polypeptide molecule that is free of other components of the cell, i.e., it is not associated with in vivo cellular substances.

The invention employs various routine recombinant nucleic acid techniques. Generally, the nomenclature and the laboratory procedures in recombinant DNA technology described below are commonly employed in the art. Many manuals that provide direction for performing recombinant DNA manipulations are available, e.g., Sambrook & Russell, Molecular Cloning, A Laboratory Manual (3rd Ed, 2001); and Current Protocols in Molecular Biology (Ausubel, et al., John Wiley and Sons, New York, 2009-2016).

Carbohydrate Binding Modules

At least 64 families of CBMs have been identified and are described in the CAZy database (Cantarel et al., Nuc. Ac. Res. 37, D233-D238, 2009). Cellulose binding domain polypeptides are well studied as part of the cellulase system (cellulosome) used by some microorganisms to degrade cellulose into disaccharides and monosaccharides.

There are at least 180 different CBDs that are categorized into 13 families (Carrard et al., PNAS 97, 10342-10347, 2000). Most CBDs fall into one of three families. Family I CBDs are found only in fungi and range in size from 32-36 amino acids. Family II CBDs are found more diversely, and can range in size from 90-100 amino acids. Family III CBDs are also found more diversely, and can range in size from 130-172 amino acids. A review of CBMs and their applications is provided in Shoseyov et al., Microbiol. Mol. Biol. Reviews 70, 283-295, 2006.
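The size ranges given above for the three most common CBD families can be captured in a small lookup, sketched here in Python for illustration. Length alone does not assign a domain to a family; in practice, family assignment rests on sequence similarity, so the function below only reports which of the three families a given length is consistent with. The names used are hypothetical.

```python
# Size ranges (amino acids, inclusive) as stated for Families I, II, and III.
FAMILY_SIZE_RANGES = {
    "I": (32, 36),
    "II": (90, 100),
    "III": (130, 172),
}


def candidate_families(domain_length: int) -> list[str]:
    """Families whose stated size range includes the given domain length."""
    return [
        family
        for family, (low, high) in FAMILY_SIZE_RANGES.items()
        if low <= domain_length <= high
    ]


if __name__ == "__main__":
    for length in (36, 95, 150, 60):
        print(length, candidate_families(length))
```

An empty result indicates only that the length falls outside the three stated ranges, not that the domain is not a CBD.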

In nature, a CBM is often located at either the N terminus or C terminus of the polypeptide, but may also be found in a middle region rather than at the end of a polypeptide. For example, the CBM of the cellobiohydrolase I protein from Trichoderma reesei (crystal structure available under protein database accession number PDB ID: 10EL) is located at the C-terminus, while the CBM of the cellobiohydrolase II protein from T. reesei (structure available under accession number PDB ID: 3CBH) is located at the N-terminus. The CBD of CipA (see the following paragraph) is contained in the middle of the protein. Other microorganisms known to express enzymes that contain CBMs include Trichoderma viride, Cellulomonas fimi, Cellvibrio mixtus, Cellvibrio japonicus, Clostridium thermocellum, Piromyces equi, Penicillium funiculosum, Arthrobacter globiformis, Aspergillus kawachii, Pseudomonas cellulosa, Phanerochaete chrysosporium, Paenibacillus sp. Meloidogyne, Agaricus bisporus, Thermotoga maritima, Humicola grisea, and Humicola insolens.

CBMs can bind either reversibly or irreversibly to their substrate. For example, the CBMs of cellobiohydrolase I and cellobiohydrolase II of T. reesei both bind irreversibly to cellulose (Carrard & Linder, Eur. J. Biochem. 262, 637-643, 1999). However, the CBM of the CipA protein of Cellulomonas fimi binds reversibly to cellulose (Yaron et al., in Genetics, Biochemistry and Ecology of Cellulose Degradation: Mie Bioforum 98, eds. Ohmiya et al., pp 45-46, 1998).

The crystal structures of various CBDs have been solved. For example, the T. reesei cellobiohydrolase I CBD belongs to Family I, and is composed of a wedge-shaped irregular beta-sheet, where one side of the molecule consists of three conserved tyrosine residues and forms a hydrophobic planar surface that is able to bind cellulose. The C. fimi beta-1,4-glycanase CBD belongs to Family II. It is composed of an elongated, nine-stranded beta-barrel, where one side of the barrel has three solvent-exposed tryptophans in addition to other hydrophilic residues that bind the carbohydrate substrate. The Clostridium thermocellum Cip CBD belongs to Family III, and consists of a nine-stranded beta-sandwich jelly roll with a surface-exposed planar linear strip of aromatic and polar residues that interact with the cellulose surface (Tormo et al., The EMBO J. 15, 5739-5751, 1996). Throughout Families I, II, and III, the CBDs in general exhibit a flat protein surface moiety that includes two or more conserved solvent-exposed aromatic residues that are able to interact with the carbohydrate substrate.

In some embodiments, a CBD fusion polypeptide of the present invention comprises a CBD, or functional variant thereof, from C. fimi exoglucanase, Accession No. AEA30147.1 (SEQ ID NO:1); C. fimi cellulose binding protein, Accession No. WP_013770490.1 (SEQ ID NO:3); T. reesei cellobiohydrolase I, Accession No. P62694.1 (SEQ ID NO:7); T. reesei cellobiohydrolase II, Accession No. P07987.1 (SEQ ID NO:9); or Cellvibrio japonicus XynA, Accession No. P14768.2 (SEQ ID NO:11). Of these five enzymes, the CBDs of C. fimi exoglucanase, C. fimi cellulose binding protein, and T. reesei cellobiohydrolase I are located at the C terminus. The CBD of C. fimi exoglucanase is annotated as amino acid residues 383-482, the CBD of C. fimi cellulose binding protein is annotated as amino acid residues 266-352, and the CBD of T. reesei cellobiohydrolase I is annotated as amino acids 481-513. In contrast, the CBDs of T. reesei cellobiohydrolase II and C. japonicus XynA are located at the N terminus. The CBD of T. reesei cellobiohydrolase II is annotated as amino acids 30-62, and the CBD of C. japonicus XynA is annotated as amino acids 29-125.
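By way of illustration only, the annotated residue ranges recited above can be handled with a short script such as the following. This is a minimal sketch that assumes the ranges are 1-based and inclusive; the function names cbd_length and extract_region are hypothetical and are not part of the disclosure, and no actual protein sequences are reproduced here.

```python
# Annotated CBD coordinates quoted in the paragraph above.
# Assumption: coordinates are 1-based and inclusive.
CBD_RANGES = {
    "C. fimi exoglucanase": (383, 482),
    "C. fimi cellulose binding protein": (266, 352),
    "T. reesei cellobiohydrolase I": (481, 513),
    "T. reesei cellobiohydrolase II": (30, 62),
    "C. japonicus XynA": (29, 125),
}


def cbd_length(start: int, end: int) -> int:
    """Return the number of residues in a 1-based, inclusive range."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return end - start + 1


def extract_region(protein: str, start: int, end: int) -> str:
    """Slice a protein string using 1-based, inclusive coordinates."""
    cbd_length(start, end)  # validates the range
    if end > len(protein):
        raise ValueError("range extends past the end of the sequence")
    return protein[start - 1:end]


if __name__ == "__main__":
    for name, (start, end) in CBD_RANGES.items():
        print(f"{name}: residues {start}-{end} ({cbd_length(start, end)} aa)")
```

The same extract_region helper may be applied to a full-length sequence retrieved under any of the accession numbers above to obtain the annotated CBD region.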

In some embodiments, a CBD that is contained in a CBD fusion polypeptide of the present invention belongs to Family I or Family II. For example, T. reesei cellobiohydrolase I and T. reesei cellobiohydrolase II both contain CBDs that belong to Family I, while the C. fimi exoglucanase, C. fimi cellulose binding protein, and C. japonicus XynA contain CBDs that belong to Family II.

In some embodiments, a CBD fusion polypeptide of the present invention comprises a CBD from a cellobiohydrolase, e.g., cellobiohydrolase I or II from T. reesei, cellobiohydrolase II from T. harzianum; a CBD from an endoglucanase, e.g., endo-1,4 glucanase from B. subtilis; or a CBD from an exoglucanase, e.g., TEX1 from T. viride.

In some embodiments, a CBD or CBM fusion polypeptide comprises a biologically active variant of a naturally occurring CBD or CBM sequence. Biologically active variants include alleles, mutants, fragments, and interspecies homologs of specific CBD or CBM polypeptides described herein that bind carbohydrate. In some embodiments, a CBM or CBD has at least 65% amino acid sequence identity, and in some embodiments at least 70%, 75%, 80%, 85%, 90% identity; often at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% or greater amino acid sequence identity, to a naturally occurring CBM or CBD sequence. In some embodiments, a CBD has at least 65% amino acid sequence identity, or in some embodiments, at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% or greater, amino acid sequence identity to any one of SEQ ID NOS:13 to 19. In some embodiments, a biologically active CBM or CBD variant polypeptide may comprise one or more deletions relative to a naturally occurring CBD amino acid sequence, so long as the CBD retains carbohydrate binding activity. In some embodiments, 1, 2, 3, 4, or 5 amino acids may be deleted, e.g., from the N-terminal and/or C-terminal end of a family II CBD, relative to the naturally occurring family II CBD amino acid sequence. In some embodiments, 1 or 2, or 1, 2, or 3, amino acids may be deleted, e.g., from the N-terminal and/or C-terminal end of a family I CBD, relative to the naturally occurring family I CBD amino acid sequence. In some embodiments, the CBD regions of a fusion polypeptide of the invention may comprise additional amino acids at the N-terminal and/or C-terminal end, e.g., from the native protein in which the CBD occurs. In some embodiments, a fusion polypeptide of the invention may comprise more than one CBD region, which may be the same or different.
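As a non-limiting illustration of how a percent identity threshold such as those recited above may be evaluated, the following sketch computes identity over two sequences that have already been aligned. It assumes, as one of several conventions in use, that identity is the number of matching non-gap columns divided by the total number of alignment columns; the alignment itself is presumed to come from a separate alignment program. The function names and the example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over a pairwise alignment.

    Assumption: both inputs are already aligned (equal length, '-' marks a gap)
    and identity = matching non-gap columns / total alignment columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 65.0) -> bool:
    """True if the aligned pair reaches the given percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Hypothetical aligned fragments, for illustration only.
    native = "ACDEFGHIKL"
    variant = "ACDEFGHIKV"
    print(percent_identity(native, variant), meets_threshold(native, variant))
```

Other denominators, e.g., the length of the shorter sequence, give slightly different values, so the convention used should be stated whenever a percentage is reported.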

Carbohydrate binding activity of a CBD, e.g., cellulose binding activity, can be tested directly or can be tested indirectly, e.g., by measuring a downstream endpoint such as cadaverine production as illustrated in the examples. Carbohydrate binding activity can also be directly measured by taking advantage of the solvent-exposed aromatic residues that characterize most CBDs. For example, a solution containing carbohydrate binding activity can be exposed to cellulose, and the change in the amount of fluorescence in solution can be measured in order to determine the amount of binding activity present. Fluorescence measurements can be based on the change in the amount of exposed tyrosine residues (excitation 274 nm/emission 303 nm), phenylalanine residues (excitation 257 nm/emission 282 nm), or tryptophan residues (excitation 280 nm/emission 348 nm) left in solution before and after exposure to cellulose. A biologically active CBD variant typically has at least 30%, at least 50%, at least 75%, or greater, of the binding activity of the naturally occurring CBD.
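The fluorescence depletion measurement described above can be reduced to a simple calculation, sketched below. The sketch assumes that the background-subtracted fluorescence signal is proportional to the amount of CBD remaining in solution, which is a simplification; the function names and the numerical readings are hypothetical and do not represent measured data.

```python
def fraction_bound(f_before: float, f_after: float) -> float:
    """Fraction of fluorescent protein removed from solution by cellulose.

    Assumption: fluorescence is background-subtracted and proportional to
    the amount of CBD left in solution.
    """
    if f_before <= 0:
        raise ValueError("f_before must be positive")
    if not 0 <= f_after <= f_before:
        raise ValueError("expected 0 <= f_after <= f_before")
    return (f_before - f_after) / f_before


def relative_binding_activity(variant_bound: float, native_bound: float) -> float:
    """Binding activity of a variant as a percentage of the native CBD."""
    if native_bound <= 0:
        raise ValueError("native_bound must be positive")
    return 100.0 * variant_bound / native_bound


if __name__ == "__main__":
    # Hypothetical readings (arbitrary units) before and after cellulose exposure.
    native = fraction_bound(1000.0, 400.0)
    variant = fraction_bound(1000.0, 550.0)
    activity = relative_binding_activity(variant, native)
    print(f"variant retains {activity:.0f}% of native binding activity")
```

A variant whose relative activity computed in this manner is at least 30%, 50%, or 75% would fall within the corresponding ranges recited above.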

Cell Surface Display Proteins

A fusion polypeptide of the present invention further comprises a region that targets the CBM or CBD for expression on the cell surface. Such a region is also referred to herein as a display polypeptide. Cell surface display systems found in various microorganisms have been well described and can be categorized into seven types: outer membrane proteins (e.g., OmpA, OmpC, LamB, and FhuA), cell surface appendages (e.g., flagella, pili, fimbriae), lipoprotein-derived (Lpp), virulence factor-based (e.g., EaeA intimin), Tat-dependent, autotransporter-dependent (Neisseria IgA, E. coli AIDA-I), and ice nucleating protein-based (P. syringae Inp or InaV) (Chen & Georgiou, Biotechnol. and Bioeng. 79, 496-503, 2002). These different systems and their applications are reviewed in Bloois et al., Trends in Biotechnol. 29, 79-86, 2011. In addition, a combination of two or more of the above systems can also be used to display a protein on the surface of a cell. For example, a CBM or CBD fusion polypeptide of the present invention may contain an E. coli lipoprotein Lpp with a fragment of the E. coli outer membrane protein (Georgiou et al., Protein Eng. 9, 239-247, 1996).

In some embodiments, a CBM or CBD fusion polypeptide of the present invention comprises a leader polypeptide that is heterologous to the display polypeptide. Thus, for example, in some embodiments, the display polypeptide sequence may be from an Omp polypeptide such as an OmpA polypeptide, or an alternative Omp polypeptide, while the leader peptide is from a heterologous protein.

The fusion polypeptide may be configured such that the CBM or CBD is at or near, e.g., within 10 or 15 amino acids of, the end, e.g., the C-terminal end, of the cell surface display polypeptide so long as the CBM or CBD is displayed on the cell surface. In some embodiments, a CBM or CBD fusion polypeptide may contain the carbohydrate binding domain in the middle of the protein or a region that is greater than 15 amino acids from the end of the cell surface protein, so long as the CBM or CBD is displayed on the cell surface. In some embodiments, the fusion polypeptide may comprise a peptide linker, e.g., a flexible linker, which typically comprises small, nonpolar or polar amino acids such as Gly, Asn, Ser, Thr, Ala, and the like, joining the carbohydrate binding domain sequence to the display protein sequence.

Nucleic Acids Encoding CBM or CBD Fusion Polypeptides

Isolation or generation of CBD polynucleotide sequences for incorporation into a polynucleotide encoding a fusion polypeptide that comprises a CBD joined to a cell surface display polypeptide can be accomplished by a number of techniques. In some embodiments, oligonucleotide probes based on the sequences disclosed here can be used to identify the desired polynucleotide in a cDNA or genomic DNA library from a desired bacterial species. Alternatively, the nucleic acids of interest are amplified from nucleic acid samples using routine amplification techniques. For instance, PCR may be used to amplify the sequences of the genes directly from mRNA, from cDNA, from genomic libraries or cDNA libraries. Appropriate primers and probes for identifying a CBD-containing polynucleotide in bacteria can be generated based on known parameters. Illustrative primer sequences are shown in the Table of Primers in the Examples section. Although obtaining a polynucleotide that encodes a desired polypeptide is illustrated in this paragraph with reference to CBD domains, it is understood that a polynucleotide encoding any polypeptide of interest, e.g., a surface display polypeptide, a leader peptide, or a lysine decarboxylase polypeptide, can be obtained using the same techniques.

Nucleic acid sequences encoding a CBM or CBD fusion polypeptide and/or a lysine decarboxylase polypeptide may additionally be codon-optimized for expression in a desired host cell. Methods and databases that can be employed are known in the art. For example, preferred codons may be determined in relation to codon usage in a single gene, a set of genes of common function or origin, highly expressed genes, the codon frequency in the aggregate protein coding regions of the whole organism, codon frequency in the aggregate protein coding regions of related organisms, or combinations thereof. See, e.g., Henaut and Danchin in “Escherichia coli and Salmonella,” Neidhardt, et al. Eds., ASM Press, Washington D.C. (1996), pp. 2047-2066; Nucleic Acids Res. 20:2111-2118; Nakamura et al., 2000, Nucl. Acids Res. 28:292.
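One simple form of codon optimization, in which each amino acid is encoded by a single preferred codon, can be sketched as follows. The PREFERRED_CODON table below is an illustrative assumption and is not taken from the references cited above or from the method used in the Examples; in practice the table would be derived from codon usage data for the chosen host. The peptide shown is hypothetical.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Illustrative one-codon-per-residue table (an assumption, not host data).
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}


def back_translate(peptide: str) -> str:
    """Encode a peptide using one preferred codon per amino acid."""
    try:
        return "".join(PREFERRED_CODON[aa] for aa in peptide.upper())
    except KeyError as exc:
        raise ValueError(f"unsupported residue: {exc.args[0]}") from None


def translate(dna: str) -> str:
    """Translate a coding sequence with the standard genetic code."""
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    peptide = "MKWTLG"  # hypothetical fragment
    dna = back_translate(peptide)
    # The optimized DNA must still encode the same peptide.
    print(dna, translate(dna) == peptide)
```

Translating the optimized sequence back to protein, as done in the final line, is a useful check that optimization has changed only the codons and not the encoded polypeptide.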

Preparation of Recombinant Vectors

Recombinant vectors for expression of a CBM or CBD fusion polypeptide can be prepared using methods well known in the art. For example, a DNA sequence encoding the fusion polypeptide can be combined with transcriptional and other regulatory sequences which will direct the transcription of the sequence from the gene in the intended cells, e.g., bacterial cells such as H. alvei, E. coli, or C. glutamicum. In some embodiments, an expression vector that comprises an expression cassette that comprises the gene encoding the CBM or CBD fusion polypeptide further comprises a promoter operably linked to the nucleic acid sequence encoding the polypeptide. In other embodiments, a promoter and/or other regulatory elements that direct transcription of the polynucleotide encoding a CBM or CBD fusion polypeptide are endogenous to the host cell and an expression cassette comprising a gene that encodes the fusion polypeptide is introduced, e.g., by homologous recombination, such that the exogenous gene is operably linked to an endogenous promoter and expression is driven by the endogenous promoter.

As noted above, expression of the polynucleotide encoding a CBM or CBD fusion polypeptide can be controlled by a number of regulatory sequences including promoters, which may be either constitutive or inducible; and, optionally, repressor sequences, if desired. In some embodiments, the promoter is a constitutive promoter. Examples of suitable promoters, especially in a bacterial host cell, are the promoters obtained from the E. coli lac operon and other promoters derived from genes involved in the metabolism of other sugars, e.g., galactose and maltose. Additional examples include promoters such as the trp promoter, bla promoter, bacteriophage lambda PL, and T5. In addition, synthetic promoters, such as the tac promoter (U.S. Pat. No. 4,551,433), can be used. Further examples of promoters include those of the Streptomyces coelicolor agarase gene (dagA), Bacillus subtilis levansucrase gene (sacB), Bacillus licheniformis alpha-amylase gene (amyL), Bacillus stearothermophilus maltogenic amylase gene (amyM), Bacillus amyloliquefaciens alpha-amylase gene (amyQ), Bacillus licheniformis penicillinase gene (penP), and Bacillus subtilis xylA and xylB genes. Suitable promoters are also described in Ausubel and Sambrook & Russell, both supra. Additional promoters include promoters described by Jensen & Hammer, Appl. Environ. Microbiol. 64:82, 1998; Shimada, et al., J. Bacteriol. 186:7112, 2004; and Miksch et al., Appl. Microbiol. Biotechnol. 69:312, 2005.

In some embodiments, a promoter that influences expression of a gene encoding a fusion polypeptide of the invention may be modified to increase expression. For example, an endogenous promoter may be replaced by a promoter that provides for increased expression compared to the native promoter.

An expression vector may also comprise additional sequences that influence expression of a polynucleotide encoding the fusion polypeptide. Such sequences include enhancer sequences, a ribosome binding site, or other sequences such as transcription termination sequences, and the like.

A vector expressing a polynucleotide encoding a fusion polypeptide of the invention may be an autonomously replicating vector, i.e., a vector which exists as an extrachromosomal entity, the replication of which is independent of chromosomal replication, e.g., a plasmid, an extrachromosomal element, a minichromosome, or an artificial chromosome. The vector may contain any means for assuring self-replication. Alternatively, the vector may be one which, when introduced into the host, is integrated into the genome and replicated together with the chromosome(s) into which it has been integrated. Thus, an expression vector may additionally contain an element(s) that permits integration of the vector into the host's genome.

An expression vector of the invention preferably contains one or more selectable markers which permit easy selection of transformed hosts. For example, an expression vector may comprise a gene that confers antibiotic resistance (e.g., ampicillin, kanamycin, chloramphenicol or tetracycline resistance) to the recombinant host organism, e.g., a bacterial cell such as E. coli, H. alvei, or C. glutamicum.

Although any suitable expression vector may be used to incorporate the desired sequences, readily available bacterial expression vectors include, without limitation: plasmids such as pSC101, pBR322, pBBR1MCS-3, pUR, pET, pEX, pMR100, pCR4, pBAD24, p15a, pACYC, pUC, e.g., pUC18 or pUC19, or plasmids derived from these plasmids; and bacteriophages, such as M13 phage and λ phage. One of ordinary skill in the art, however, can readily determine through routine experimentation whether any particular expression vector is suited for any given host cell. For example, the expression vector can be introduced into the host cell, which is then monitored for viability and expression of the sequences contained in the vector.

Expression vectors of the invention may be introduced into the host cell using any number of well-known methods, including calcium chloride-based methods, electroporation, or any other method known in the art.

Host Cells

The present invention provides for a genetically modified host cell that is engineered to express a CBM or CBD fusion polypeptide of the invention in which the host cell is also engineered to overexpress a lysine decarboxylase polypeptide, e.g., an exogenous lysine decarboxylase.

In some embodiments, a genetically modified host cell that expresses a CBM or CBD fusion polypeptide is modified to express a heterologous lysine decarboxylase. As used herein, a lysine decarboxylase refers to an enzyme that converts L-lysine into cadaverine. The term includes variants of native, i.e., naturally occurring, lysine decarboxylase enzymes that have lysine decarboxylase enzymatic activity. Lysine decarboxylases are classified as E.C. 4.1.1.18. Lysine decarboxylase polypeptides are well characterized enzymes, the structures of which are well known in the art (see, e.g., Kanjee, et al., EMBO J. 30: 931-944, 2011; and a review by Lemonnier & Lane, Microbiology 144: 751-760, 1998; and references described therein). Illustrative lysine decarboxylase sequences are CadA homologs from Klebsiella sp., WP_012968785.1; Enterobacter aerogenes, YP_004592843.1; Salmonella enterica, WP_020936842.1; Serratia sp., WP_033635725.1; and Raoultella ornithinolytica, YP_007874766.1; and LdcC homologs from Shigella sp., WP_001020968.1; Citrobacter sp., WP_016151770.1; and Salmonella enterica, WP_001021062.1. In some embodiments, the lysine decarboxylase polypeptide expressed is the wild-type Ldc2 from Pseudomonas aeruginosa, or a mutant of such a polypeptide as described in PCT/CN2014/080873. In some embodiments, the lysine decarboxylase polypeptide expressed is the wild-type Ldc from Klebsiella oxytoca, or a mutant of such a polypeptide as described in PCT/CN2015/071978.

In some embodiments, a genetically modified host cell that expresses a CBM or CBD fusion polypeptide is genetically modified to express a heterologous CadA or heterologous LdcC.

In some embodiments, a genetically modified host cell that overexpresses a lysine decarboxylase comprises a modification in a regulatory sequence, e.g., a promoter, in the gene encoding an endogenous lysine decarboxylase that results in higher expression levels of the endogenous lysine decarboxylase. For example, the endogenous promoter may be replaced by an alternative promoter that provides for higher expression levels compared to the native promoter sequence, and/or regulatory sequences can be modified, e.g., repressor sequences or inducible sequences can be inactivated, to increase expression compared to the counterpart cell of the same strain that does not have the modification.

In some aspects, genetic modification of a host cell to express a CadA variant polypeptide is performed in conjunction with modifying the host cell to overexpress one or more lysine biosynthesis polypeptides.

In some embodiments, a host cell may be genetically modified to express one or more polypeptides that affect lysine biosynthesis. Examples of lysine biosynthesis polypeptides include the E. coli genes SucA, Ppc, AspC, LysC, Asd, DapA, DapB, DapD, ArgD, DapE, DapF, LysA, Ddh, PntAB, CyoABE, GadAB, YbjE, GdhA, GltA, SucC, GadC, AcnB, PflB, ThrA, AceA, AceB, GltB, AceE, SdhA, MurE, SpeE, SpeG, PuuA, PuuP, and YgjG, or the corresponding genes from other organisms. Such genes are known in the art (see, e.g., Shah et al., J. Med. Sci. 2:152-157, 2002; Anastassiadis, S. Recent Patents on Biotechnol. 1: 11-24, 2007). See, also, Kind, et al., Appl. Microbiol. Biotechnol. 91: 1287-1296, 2011 for a review of genes involved in cadaverine production. Illustrative genes encoding lysine biosynthesis polypeptides are provided below.

Protein | Gene | EC Number | GenBank Accession No.
α-ketoglutarate dehydrogenase (SucA) | sucA | 1.2.4.2 | YP_489005.1
phosphoenolpyruvate carboxylase (PPC) | ppc | 4.1.1.31 | AAC76938.1
aspartate transaminase (AspC) | aspC | 2.6.1.1 | AAC74014.1
aspartate kinase (LysC) | lysC | 2.7.2.4 | NP_418448.1
aspartate semialdehyde dehydrogenase (Asd) | asd | 1.2.1.11 | AAC76458.1
dihydrodipicolinate synthase (DapA) | dapA | 4.3.3.7 | NP_416973.1
dihydrodipicolinate reductase (DapB) | dapB | 1.17.1.8 | AAC73142.1
tetrahydrodipicolinate succinylase (DapD) | dapD | 2.3.1.117 | AAC73277.1
N-succinyldiaminopimelate aminotransferase (ArgD) | argD | 2.6.1.11 | AAC76384.1
N-succinyl-L-diaminopimelate deacylase (DapE) | dapE | 3.5.1.18 | AAC75525.1
diaminopimelate epimerase (DapF) | dapF | 5.1.1.7 | AAC76812.2
diaminopimelate decarboxylase (LysA) | lysA | 4.1.1.20 | AAC75877.1
meso-diaminopimelate dehydrogenase (Ddh) | ddh | NA | P04964.1
pyridine nucleotide transhydrogenase (PntAB) | pntAB | NA | AAC74675.1, AAC74674.1
cytochrome O oxidase (CyoABE) | cyoABE | 1.10.3.10 | AAC73535.1, AAC73534.1, AAC73531.1
glutamate decarboxylase (GadAB) | gadAB | 4.1.1.15 | AAC76542.1, AAC74566.1
L-amino acid efflux transporter (YbjE) | ybjE | NA | AAC73961.2
glutamate dehydrogenase (GdhA) | gdhA | 1.4.1.4 | AAC74831.1
citrate synthase (GltA) | gltA | 2.3.3.1/2.3.3.16 | AAC73814.1
succinyl-coA synthase (SucC) | sucC | 6.2.1.5 | AAC73822.1
glutamate-GABA antiporter (GadC) | gadC | NA | AAC74565.1
aconitase B (AcnB) | acnB | 4.2.1.99 | AAC73229.1
pyruvate-formate lyase (PflB) | pflB | NA | AAC73989.1
aspartate kinase/homoserine dehydrogenase (ThrA) | thrA | 2.7.2.4 | AAC73113.1
isocitrate lyase (AceA) | aceA | 4.1.3.1 | AAC76985.1
malate synthase (AceB) | aceB | 2.3.3.9 | AAC76984.1
glutamate synthase (GltB) | gltB | 1.4.1.13 | AAC76244.2
pyruvate dehydrogenase (AceE) | aceE | 1.2.4.1 | AAC73225.1
succinate dehydrogenase (SdhA) | sdhA | 1.3.5.1 | AAC73817.1
UDP-N-acetylmuramoyl-L-alanyl-D-glutamate: meso-diaminopimelate ligase (MurE) | murE | 6.3.2.13 | AAC73196.1
putrescine/cadaverine aminopropyltransferase (SpeE) | speE | 2.5.1.16 | AAC73232.1
spermidine acetyltransferase (SpeG) | speG | NA | AAC74656.1
glutamate-putrescine/glutamate-cadaverine ligase (PuuA) | puuA | NA | AAC74379.2
putrescine importer (PuuP) | puuP | NA | AAC74378.2
putrescine/cadaverine aminotransferase (YgjG) | ygjG | 2.6.1.82 | AAC76108.3

In some embodiments, a host cell may be genetically modified to attenuate or reduce the expression of one or more polypeptides that affect lysine biosynthesis. Examples of such polypeptides include the E. coli genes Pck, Pgi, DeaD, CitE, MenE, PoxB, AceA, AceB, AceE, RpoC, and ThrA, or the corresponding genes from other organisms. Such genes are known in the art (see, e.g., Shah et al., J. Med. Sci. 2:152-157, 2002; Anastassiadis, S. Recent Patents on Biotechnol. 1: 11-24, 2007). See, also, Kind, et al., Appl. Microbiol. Biotechnol. 91: 1287-1296, 2011 for a review of genes attenuated to increase cadaverine production. Illustrative genes encoding polypeptides whose attenuation increases lysine biosynthesis are provided below.

Protein | Gene | EC Number | GenBank Accession No.
PEP carboxykinase (Pck) | pck | 4.1.1.49 | NP_417862
Glucose-6-phosphate isomerase (Pgi) | pgi | 5.3.1.9 | NP_418449
DEAD-box RNA helicase (DeaD) | deaD | | NP_417631
citrate lyase (CitE) | citE | 4.1.3.6/4.1.3.34 | NP_415149
o-succinylbenzoate-CoA ligase (MenE) | menE | 6.2.1.26 | NP_416763
pyruvate oxidase (PoxB) | poxB | 1.2.2.2 | NP_415392
isocitrate lyase (AceA) | aceA | 4.1.3.1 | NP_418439
malate synthase A (AceB) | aceB | 2.3.3.9 | NP_418438
pyruvate dehydrogenase (AceE) | aceE | 1.2.4.1 | NP_414656
RNA polymerase β′ subunit (RpoC) | rpoC | 2.7.7.6 | NP_418415
aspartokinase I (ThrA) | thrA | 2.7.2.4/1.1.1.3 | NP_414543

A host cell engineered to express a CBM or CBD fusion polypeptide that is expressed on the surface of the host cell is typically a bacterial host cell. In typical embodiments, the bacterial host cell is a Gram-negative bacterial host cell. In some embodiments of the invention, the bacterium is an enteric bacterium. In some embodiments of the invention, the bacterium is a species of the genus Corynebacterium, Escherichia, Pseudomonas, Zymomonas, Shewanella, Salmonella, Shigella, Enterobacter, Citrobacter, Cronobacter, Erwinia, Serratia, Proteus, Hafnia, Yersinia, Morganella, Edwardsiella, or Klebsiella. In some embodiments, the host cells are members of the genus Escherichia, Hafnia, or Corynebacterium. In some embodiments, the host cell is an Escherichia coli, Hafnia alvei, or Corynebacterium glutamicum host cell. In some embodiments, the host cell is Escherichia coli. In some embodiments, the host cell is Hafnia alvei. In some embodiments, the host cell is Corynebacterium glutamicum.

In some embodiments, the host cell is a Gram-positive bacterial host cell, such as a Bacillus sp., e.g., Bacillus subtilis or Bacillus licheniformis; or another Bacillus sp. such as B. alcalophilus, B. aminovorans, B. amyloliquefaciens, B. caldolyticus, B. circulans, B. stearothermophilus, B. thermoglucosidasius, B. thuringiensis or B. vulgatis.

In some embodiments, a host cell modified in accordance with the invention is a yeast genetically modified to express a CBD or CBM fusion polypeptide comprising a carbohydrate binding domain fused to a cell surface display polypeptide and to overexpress a lysine decarboxylase. Illustrative yeasts include species selected from Saccharomyces spp., such as Saccharomyces cerevisiae; Schizosaccharomyces spp., such as Schizosaccharomyces pombe; Kluyveromyces spp., such as Kluyveromyces lactis or Kluyveromyces marxianus; Pichia spp., such as Pichia pastoris; Yarrowia spp., Candida spp., Hansenula spp. and the like. In some embodiments, the host cell is a cyanobacterium or a filamentous fungus, such as an Aspergillus spp. As understood in the art, the cell surface display polypeptide is selected based on the host cell that is modified. For example, for yeast, a display polypeptide that is displayed on the surface of a yeast cell is selected for generating the fusion polypeptide.

Host cells modified in accordance with the invention can be screened for increased production of cadaverine, as described herein.

Methods of Producing Cadaverine.

A host cell genetically modified to express a CBD or CBM fusion polypeptide comprising the carbohydrate binding domain fused to a cell surface display polypeptide, e.g., at the carboxyl end or N-terminal end, that further overexpresses a lysine decarboxylase, e.g., an exogenous lysine decarboxylase, such as CadA, provides a higher yield of cadaverine relative to a counterpart host cell, which does not express the fusion polypeptide, but is otherwise of the same genetic background. In some embodiments, cadaverine production is improved by at least 5%, typically at least 10%, 15%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, or greater compared to the counterpart host cell. In some embodiments, conversion of lysine to cadaverine can be measured using NMR by sampling the amount of lysine converted in the presence of PLP into cadaverine at regular intervals, e.g., about every 1.5 minutes for a total of 20 minutes, and taking the slope of the linear portion of the yield curve.
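The slope calculation and the percent improvement described above can be illustrated with the following sketch. It assumes that the data points passed to linear_slope have already been restricted to the linear portion of the yield curve and fits them by ordinary least squares; the function names and the time-course values are hypothetical and are not experimental results.

```python
def linear_slope(times: list, values: list) -> float:
    """Least-squares slope of values versus times.

    Assumption: the caller passes only points from the linear portion
    of the yield curve.
    """
    n = len(times)
    if n != len(values) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


def percent_improvement(test_value: float, control_value: float) -> float:
    """Percent increase of a test strain over its counterpart control."""
    if control_value <= 0:
        raise ValueError("control_value must be positive")
    return 100.0 * (test_value - control_value) / control_value


if __name__ == "__main__":
    # Hypothetical samples taken every 1.5 minutes (arbitrary yield units).
    minutes = [0.0, 1.5, 3.0, 4.5, 6.0]
    control = [0.0, 3.0, 6.0, 9.0, 12.0]
    test = [0.0, 3.3, 6.6, 9.9, 13.2]
    rate_control = linear_slope(minutes, control)
    rate_test = linear_slope(minutes, test)
    print(f"improvement: {percent_improvement(rate_test, rate_control):.1f}%")
```

Comparing slopes rather than single end-point values reduces the influence of any one sample on the reported improvement.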

Host cells modified to co-express a CBM or CBD fusion polypeptide on the cell surface and a lysine decarboxylase gene may be immobilized to a carbohydrate substrate.

Thus, a host cell that expresses a CBD fusion polypeptide may be immobilized on a cellulose substrate. For example, cell cultures may be incubated with a cellulose substrate such as beads or filter paper, or other inert substrates that contain cellulose. Additional examples of carbohydrate immobilization materials include Avicel, acid-swollen cellulose, bacterial microcrystalline cellulose, cellulose filter paper (e.g., Whatman No. 1 filter paper), cellulose fabric (e.g., Bemliese 606), chitin beads, or chitin flakes.

Cadaverine can then be isolated and/or converted to a desired product, using well known methods.

The present invention will be described in greater detail by way of specific examples. The following examples are offered for illustrative purposes, and are not intended to limit the invention in any manner. Those of skill in the art will readily recognize a variety of noncritical parameters, which can be changed or modified to yield essentially the same results.

EXAMPLES

Example 1
Construction of Plasmid Vectors that Encode CadA.

Wild-type E. coli cadA (SEQ ID NO:21), which encodes the lysine decarboxylase CadA (SEQ ID NO:20), was amplified from E. coli MG1655 K12 genomic DNA using the PCR primers cadA-F and cadA-R, digested using the restriction enzymes SacI and XbaI, and ligated into pUC18 to generate the plasmid pCIB60. The 5′ sequence upstream of the cadA gene was optimized using the PCR primers cadA-F2 and cadA-R2 to create pCIB71. The kanamycin resistance gene npt was amplified using the primers npt-F and npt-R, and cloned behind cadA in pCIB71 to create pCIB111.

Example 2
Construction of a Plasmid Vector Expressing a Surface Display Sequence lpp-ompA.

A polypeptide that can target a protein fused to its C terminal end to the outer membrane of a Gram-negative bacterium can consist of the leader peptide of the Lpp lipoprotein and a portion of the transmembrane domain of OmpA (Georgiou et al., Protein Eng. 9, 239-247, 1996). The nucleotide sequence encoding the first 29 amino acid residues of E. coli Lpp was amplified using the primers lpp-F1 and lpp-R1, and fused to the nucleotide sequence encoding the amino acid residues 46-159 of E. coli OmpA amplified using the primers lpp-ompA-F and ompA-R. The fused fragment was digested using the restriction enzymes SacI and XbaI, and ligated into pUC18 to generate the plasmid pCIB129. The 5′ sequence upstream of the lpp gene fragment was optimized using the PCR primers lpp-F2 and lpp-R2 to create pCIB143.
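The layout of the display sequence described in this example, i.e., Lpp residues 1-29 followed by OmpA residues 46-159 with the displayed domain at the C-terminal end, can be sketched at the protein level as follows. The sequences used below are placeholders made of repeated letters and are not the E. coli Lpp or OmpA sequences; the function names and the optional linker argument are hypothetical.

```python
def region(seq: str, start: int, end: int) -> str:
    """Return residues start..end of seq (1-based, inclusive)."""
    if start < 1 or end < start or end > len(seq):
        raise ValueError("invalid residue range")
    return seq[start - 1:end]


def build_display_fusion(lpp: str, ompa: str, passenger: str, linker: str = "") -> str:
    """Assemble Lpp(1-29) + OmpA(46-159) + optional linker + passenger.

    The passenger (e.g., a CBD) is placed at the C-terminal end of the
    display sequence, as in the construct described above.
    """
    return region(lpp, 1, 29) + region(ompa, 46, 159) + linker + passenger


if __name__ == "__main__":
    # Placeholder sequences only; real sequences would be substituted here.
    lpp_placeholder = "L" * 60
    ompa_placeholder = "A" * 200
    cbd_placeholder = "C" * 33
    fusion = build_display_fusion(lpp_placeholder, ompa_placeholder, cbd_placeholder)
    print(len(fusion))  # 29 + 114 + 33 residues
```

With these coordinates the display portion alone contributes 29 + 114 = 143 residues ahead of the displayed domain.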

Example 3
Construction of a Plasmid Vector Expressing the CBD from Cellvibrio japonicus XynA.

The amino acid sequence of XynA from C. japonicus was obtained from NCBI (GenBank ID P14768.2) (SEQ ID NO:11). The portion of the protein sequence that includes the XynA CBD was codon optimized (SEQ ID NO:12) for heterologous expression in E. coli. Codon optimization and DNA assembly were performed according to Hoover DM & Lubkowski J, Nucleic Acids Research 30:10, 2002. The synthesized DNA product was amplified with the PCR primers XynA-F and XynA-R, digested using the restriction enzymes XbaI and SphI, and ligated into pCIB143 to create plasmid pCIB147.
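Because the synthesized fragment is cloned using XbaI and SphI, it can be useful to confirm that the codon-optimized coding region does not itself contain these recognition sites. The following is a minimal sketch of such a check, and is not a step recited in this example. The recognition sequences used are GAGCTC (SacI), TCTAGA (XbaI), and GCATGC (SphI), all of which are palindromic, so searching one strand is sufficient; the function names and the test sequence are hypothetical.

```python
from __future__ import annotations

# Recognition sequences of the restriction enzymes used in the Examples.
RECOGNITION_SITES = {
    "SacI": "GAGCTC",
    "XbaI": "TCTAGA",
    "SphI": "GCATGC",
}


def find_sites(dna: str, site: str) -> list[int]:
    """Return 0-based start positions of every occurrence of site in dna."""
    dna = dna.upper()
    positions = []
    index = dna.find(site)
    while index != -1:
        positions.append(index)
        index = dna.find(site, index + 1)  # step by 1 to catch overlaps
    return positions


def internal_sites(insert: str, enzymes: list[str]) -> dict[str, list[int]]:
    """Map each enzyme name to the positions of its site within the insert."""
    return {name: find_sites(insert, RECOGNITION_SITES[name]) for name in enzymes}


if __name__ == "__main__":
    # Hypothetical coding fragment, for illustration only.
    fragment = "ATGGCATGCAAA"
    print(internal_sites(fragment, ["XbaI", "SphI"]))
```

An internal site found in this manner could be removed by substituting a synonymous codon before the fragment is synthesized.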

Example 4
Construction of a Plasmid Vector Expressing the CBD from Trichoderma reesei Cellobiohydrolase I.

The amino acid sequence of cellobiohydrolase I (CBH1) from T. reesei was obtained from NCBI (GenBank ID P62694.1) (SEQ ID NO:7). The portion of the protein sequence that includes the CBH1 CBD was codon optimized (SEQ ID NO:8) for heterologous expression in E. coli. Codon optimization and DNA assembly were performed according to Hoover DM & Lubkowski J, Nucleic Acids Research 30:10, 2002. The synthesized DNA product was amplified with the PCR primers CBH1-F and CBH1-R, digested using the restriction enzymes XbaI and SphI, and ligated into pCIB143 to create plasmid pCIB150.

Example 5
Construction of a Plasmid Vector Expressing the CBD from Trichoderma reesei Cellobiohydrolase II.

The amino acid sequence of cellobiohydrolase II (CBH2) from T. reesei was obtained from NCBI (GenBank ID P07987.1) (SEQ ID NO:9). The portion of the protein sequence that includes the CBH2 CBD was codon optimized (SEQ ID NO:10) for heterologous expression in E. coli. Codon optimization and DNA assembly were performed according to Hoover DM & Lubkowski J, Nucleic Acids Research 30:10, 2002. The synthesized DNA product was amplified with the PCR primers CBH2-F and CBH2-R, digested using the restriction enzymes XbaI and SphI, and ligated into pCIB143 to create plasmid pCIB151.

Example 6
Construction of a Plasmid Vector Expressing the CBD from Cellulomonas fimi Exoglucanase.

The amino acid sequence of exoglucanase from C. fimi was obtained from NCBI (GenBank ID AEA30147.1) (SEQ ID NO:1). The portion of the protein sequence that includes the exoglucanase CBD was codon optimized (SEQ ID NO:2) for heterologous expression in E. coli. Codon optimization and DNA assembly were performed according to Hoover DM & Lubkowski J, Nucleic Acids Research 30:10, 2002. The synthesized DNA product was amplified with the PCR primers CEX-F and CEX-R, digested using the restriction enzymes XbaI and SphI, and ligated into pCIB143 to create plasmid pCIB158.

Example 7
Construction of a Plasmid Vector Expressing the CBD from Cellulomonas fimi Cellulose Binding Protein.

The amino acid sequence of cellulose binding protein from C. fimi was obtained from NCBI (GenBank ID WP_013770490.1) (SEQ ID NO:3). The portion of the protein sequence that includes the cellulose binding protein CBD was codon optimized (SEQ ID NO:4) for heterologous expression in E. coli. Codon optimization and DNA assembly were performed according to Hoover DM & Lubkowski J, Nucleic Acids Research 30:10, 2002. The synthesized DNA product was amplified with the PCR primers CCBP-F and CCBP-R, digested using the restriction enzymes XbaI and SphI, and ligated into pCIB143 to create plasmid pCIB161.

Example 8
Construction of Plasmid Vectors Co-Expressing Genes that Encode a CBD and a Lysine Decarboxylase.

The cadA and npt genes were amplified from pCIB111 using the primers cadA-F3 and npt-R, while the SphI restriction site in the npt gene was removed using SOEing (Splicing by Overlap Extension) PCR with the primers npt-F2 and npt-R2. The PCR fragment was digested using the restriction enzymes SphI and NdeI, and ligated into pCIB147, pCIB150, pCIB151, pCIB158, and pCIB161 in order to create the plasmids pCIB171, pCIB193, pCIB190, pCIB196, and pCIB176, respectively.
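In overlap-extension PCR, the two internal primers are complementary to each other and carry the desired base change. The following sketch (illustrative only; the function names are assumptions) checks that npt-F2 and npt-R2 from the primer table are exact reverse complements, that neither contains an intact SphI site (GCATGC), and locates the 6-nt window in npt-F2 that differs from the SphI site by a single base.

```python
SPHI_SITE = "GCATGC"

# Internal SOEing primers as listed in the table of primer sequences.
NPT_F2 = "CAAGGCGCGTATGCCCGACG"
NPT_R2 = "CGTCGGGCATACGCGCCTTG"


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def one_mismatch_windows(seq, site):
    """Return start indices of windows that differ from `site` at exactly one base."""
    hits = []
    for start in range(len(seq) - len(site) + 1):
        window = seq[start:start + len(site)]
        mismatches = sum(a != b for a, b in zip(window, site))
        if mismatches == 1:
            hits.append(start)
    return hits


print("primers are reverse complements:", reverse_complement(NPT_F2) == NPT_R2)
print("SphI site absent from npt-F2:", SPHI_SITE not in NPT_F2)
print("SphI site absent from npt-R2:", SPHI_SITE not in NPT_R2)
for start in one_mismatch_windows(NPT_F2, SPHI_SITE):
    print("near-site at index", start, ":", NPT_F2[start:start + 6])
```

Because GCATGC is its own reverse complement, confirming its absence in one strand also covers the other strand.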

Example 9
Production of Cadaverine from Free Cells of E. coli Co-Overexpressing Genes that Encode a CBD and Lysine Decarboxylase.

E. coli was transformed with pCIB111, pCIB171, pCIB176, pCIB190, pCIB193, or pCIB196. Three single colonies from each transformation were grown overnight at 37° C. in 4 mL of LB medium with kanamycin (50 μg/mL). The following day, 0.6 mL of each overnight culture was added to 0.4 mL of lysine-HCl and PLP to a final concentration of 160 g/L and 0.1 mM, respectively. Each mixture was incubated at 37° C. for 4 hours. Cadaverine production from each sample was quantified using NMR, and yield was calculated by dividing the molar amount of cadaverine produced by the molar amount of lysine added. The yield from each sample after 4 hours is presented in Table 1.

TABLE 1

Production of cadaverine by E. coli strains overproducing a CBD and a lysine decarboxylase.

Strain     Plasmid    Cadaverine Yield (%)
E. coli    pCIB111    55.8 ± 3.5
E. coli    pCIB171    72.5 ± 2.7
E. coli    pCIB176    75.0 ± 1.3
E. coli    pCIB190    73.8 ± 2.9
E. coli    pCIB193    76.2 ± 1.6
E. coli    pCIB196    77.0 ± 2.2

Table 1 shows that an E. coli cell's ability to convert lysine to cadaverine increased when a CBD was co-expressed with a lysine decarboxylase (pCIB171, pCIB176, pCIB190, pCIB193, pCIB196) compared to the negative control (pCIB111), where only a lysine decarboxylase was expressed.
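The yield in this example is a molar ratio of cadaverine produced to lysine added. A minimal sketch of that arithmetic follows. It assumes that the 160 g/L figure refers to lysine-HCl (molar mass 182.65 g/mol) in the 1.0 mL reaction mixture; the cadaverine amount passed in at the end is a hypothetical placeholder, not a measured value from the example.

```python
LYSINE_HCL_MW = 182.65  # g/mol, L-lysine monohydrochloride


def lysine_added_mmol(conc_g_per_l, volume_ml, mw=LYSINE_HCL_MW):
    """Millimoles of lysine in the reaction: (g/L * mL) gives mg, mg / (g/mol) gives mmol."""
    return conc_g_per_l * volume_ml / mw


def molar_yield_percent(cadaverine_mmol, lysine_mmol):
    """Yield as defined in the example: cadaverine produced / lysine added, in percent."""
    return 100.0 * cadaverine_mmol / lysine_mmol


# 0.6 mL culture + 0.4 mL substrate solution = 1.0 mL at 160 g/L lysine-HCl.
lysine = lysine_added_mmol(conc_g_per_l=160.0, volume_ml=1.0)
print(f"lysine added: {lysine:.3f} mmol")

# Hypothetical NMR result, for illustration only.
hypothetical_cadaverine_mmol = 0.5
print(f"yield: {molar_yield_percent(hypothetical_cadaverine_mmol, lysine):.1f} %")
```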

Example 10
Production of Cadaverine from Free Cells of H. alvei Co-Overexpressing Genes that Encode a CBD and Lysine Decarboxylase.

H. alvei was transformed with pCIB111, pCIB171, pCIB176, pCIB190, pCIB193, or pCIB196. Three single colonies from each transformation were grown overnight at 37° C. in 4 mL of LB medium with kanamycin (50 μg/mL). The following day, 0.6 mL of each overnight culture was added to 0.4 mL of lysine-HCl and PLP to a final concentration of 160 g/L and 0.1 mM, respectively. Each mixture was incubated at 37° C. for 2 hours. Cadaverine production from each sample was quantified using NMR, and yield was calculated by dividing the molar amount of cadaverine produced by the molar amount of lysine added. The yield from each sample after 2 hours is presented in Table 2.

TABLE 2

Production of cadaverine by H. alvei strains overproducing a CBD and a lysine decarboxylase.

Strain      Plasmid    Cadaverine Yield (%)
H. alvei    pCIB111    53.1 ± 5.2
H. alvei    pCIB171    72.5 ± 1.9
H. alvei    pCIB176    77.6 ± 3.3
H. alvei    pCIB190    74.3 ± 4.7
H. alvei    pCIB193    78.6 ± 1.5
H. alvei    pCIB196    79.2 ± 2.8

Table 2 shows that an H. alvei cell's ability to convert lysine to cadaverine increased when a CBD was co-expressed with a lysine decarboxylase (pCIB171, pCIB176, pCIB190, pCIB193, pCIB196) compared to the negative control (pCIB111), where only a lysine decarboxylase was expressed.

Example 11
Production of Cadaverine from Free Cells of H. alvei Co-Overexpressing Genes that Encode a CBD and a Mutant Lysine Decarboxylase.

The lysine decarboxylase gene in the plasmid pCIB193 was mutated to change the amino acid residue at position 320 (SEQ ID NO:23) using the primers K320L-F and K320L-R. The amino acid at residue position 320 was mutated from a lysine to a leucine (SEQ ID NO:22) to create plasmid pCIB357.
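The effect of a mutagenic primer can be checked by translating it alongside the matching wild-type stretch. The sketch below (illustrative, with assumed helper names) translates the K320L-F primer from the primer table and the corresponding 39-nt region of the wild-type cadA sequence (SEQ ID NO:21), then reports the codons that differ. The residue numbering assumes the fragment starts at Thr315, which follows from counting residues in SEQ ID NO:20.

```python
# Standard genetic code, built with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate an uppercase DNA string in frame 1, ignoring a partial last codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Wild-type cadA region (from SEQ ID NO:21) and the K320L-F primer.
WILD_TYPE = "ACCGACTTCATCAAGAAAACACTGGATGTGAAATCCATC"
K320L_F = "ACCGACTTCATCAAGCTGACACTGGATGTGAAATCCATC"

FIRST_RESIDUE = 315  # assumed position of the first codon (Thr) in CadA

wt_protein = translate(WILD_TYPE)
mut_protein = translate(K320L_F)

# Collect (position, wild-type residue, mutant residue) for every difference.
changes = [
    (FIRST_RESIDUE + i, wt, mut)
    for i, (wt, mut) in enumerate(zip(wt_protein, mut_protein))
    if wt != mut
]
print("wild type:", wt_protein)
print("mutant:   ", mut_protein)
print("changes:  ", changes)
```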

H. alvei was transformed with pCIB111, pCIB193, or pCIB357. Three single colonies from each transformation were grown overnight at 37° C. in 4 mL of LB medium with kanamycin (50 μg/mL). The following day, 0.6 mL of each overnight culture was added to 0.4 mL of lysine-HCl and PLP to a final concentration of 160 g/L and 0.1 mM, respectively. Each mixture was incubated at 37° C. for 2 hours. Cadaverine production from each sample was quantified using NMR, and yield was calculated by dividing the molar amount of cadaverine produced by the molar amount of lysine added. The yield from each sample after 2 hours is presented in Table 3.

TABLE 3

Production of cadaverine by H. alvei strains overproducing a CBD and a lysine decarboxylase.

Strain      Plasmid    Cadaverine Yield (%)
H. alvei    pCIB111    48.4 ± 2.8
H. alvei    pCIB193    77.2 ± 1.2
H. alvei    pCIB357    91.7 ± 3.7

Table 3 shows that an H. alvei cell's ability to convert lysine to cadaverine was further increased when the lysine decarboxylase was mutated at amino acid position 320 from a lysine to a leucine (pCIB357) compared to either control that expressed the wild-type lysine decarboxylase protein (pCIB111, pCIB193).

Example 12
Determining the Amount of Lysine Decarboxylase Protein Produced when a CBD is Co-Expressed with the Lysine Decarboxylase.

To determine whether co-expression of a CBD with the lysine decarboxylase changed the amount of protein produced in a way that would account for the observed change in activity, overnight cultures of H. alvei harboring either pCIB111 or pCIB193 were lysed and analyzed using SDS-PAGE in order to determine how much of the total protein consisted of the lysine decarboxylase polypeptide. Cell lysis was performed using a combination of freeze-thaw and lysozyme. Lysed samples were treated with DNase in order to remove most of the DNA. The result of the SDS-PAGE is shown in FIG. 1.

The results demonstrated that the amount of lysine decarboxylase in the soluble phase actually decreased when a CBD was co-overexpressed (FIG. 1, lanes 193-1 and 193-2) compared to when a CBD was not co-expressed (FIG. 1, lane 111). There was, however, no significant difference in the amount of protein in the insoluble phase and the amount of background protein was the same in both the soluble and insoluble fractions. This observation demonstrates that the increase in lysine decarboxylase activity of a cell co-expressing a CBD was not a result of the CBD increasing the amount of soluble lysine decarboxylase inside the cell. Instead, there is some other mechanism by which co-expression of the CBD increases the cell's ability to convert lysine to cadaverine. Not to be bound by theory, one hypothesis is that insertion of the CBD into the membrane alters the membrane permeability of the cell, which affects the transport of lysine or cadaverine into or out of the cell.

Example 13
Production of Cadaverine from Immobilized H. alvei Co-Overexpressing Genes that Encode a CBD and Lysine Decarboxylase.

To determine the cell's ability to bind to a cellulose substrate with and without expressing a CBD, H. alvei transformed with either pCIB111 or pCIB193 was grown overnight at 37° C. in 10 mL of LB medium with kanamycin (50 μg/mL). The following day, 8 mL of each overnight culture was poured into a separate Petri dish containing Whatman No. 1 filter paper, so that the filter paper was submerged. The submerged filter paper was incubated overnight at room temperature with shaking at 100 rpm. The following day, the filter papers with the adhered cells were transferred to clean Petri dishes and washed with 5 mL of PBST buffer (pH 7.4) at room temperature for 20 minutes with shaking at 100 rpm.

After the unbound cells were washed off, the filter papers were transferred to new Petri dishes, and 5 mL of lysine-HCl and PLP were added to a final concentration of 100 g/L and 0.1 mM, respectively. The Petri dishes were incubated at 37° C. for 6 hours with shaking at 100 rpm. Cadaverine production from each Petri dish was quantified using NMR, and yield was calculated by dividing the molar amount of cadaverine produced by the molar amount of lysine added. The yield from each sample after 6 hours is presented in Table 4.

TABLE 4

Production of cadaverine by immobilized H. alvei strains overproducing a CBD and a lysine decarboxylase.

Strain      Plasmid    Cadaverine Yield (%)
H. alvei    pCIB111    8.5 ± 2.2
H. alvei    pCIB193    36.2 ± 1.9

Table 4 shows that H. alvei cells expressing a CBD (pCIB193) can bind better to a cellulose substrate compared to cells that do not express a CBD (pCIB111). In addition, the bound cells are still able to catalyze the conversion of lysine to cadaverine. The activity observed in the control that does not express a CBD is most likely from residual cells that were not removed from the filter paper during the wash step.
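The size of the effect in Table 4 can be expressed as a fold change over the control. The sketch below computes that ratio and a rough uncertainty. It assumes that the ± values are standard deviations of independent measurements and combines relative errors in quadrature; the source does not state how the ± values were derived, so the uncertainty is indicative only.

```python
import math


def fold_change(sample, control):
    """Return (ratio, uncertainty) for two (mean, spread) pairs.

    Relative errors are combined in quadrature, which assumes the two
    measurements are independent.
    """
    s_mean, s_sd = sample
    c_mean, c_sd = control
    ratio = s_mean / c_mean
    relative_error = math.sqrt((s_sd / s_mean) ** 2 + (c_sd / c_mean) ** 2)
    return ratio, ratio * relative_error


# Values taken from Table 4 (cadaverine yield in percent).
immobilized_control = (8.5, 2.2)   # pCIB111
immobilized_cbd = (36.2, 1.9)      # pCIB193

ratio, uncertainty = fold_change(immobilized_cbd, immobilized_control)
print(f"fold change over control: {ratio:.2f} +/- {uncertainty:.2f}")
```

Most of the uncertainty comes from the control, whose spread is large relative to its low mean yield.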

Table of plasmids used in Examples

Host        Protein(s) Overexpressed     Plasmid
-           CadA                         pCIB71
-           CadA, Npt                    pCIB111
-           Lpp-OmpA                     pCIB129
-           Lpp-OmpA                     pCIB143
-           Lpp-OmpA-XynA                pCIB147
-           Lpp-OmpA-CBH1                pCIB150
-           Lpp-OmpA-CBH2                pCIB151
-           Lpp-OmpA-CEX                 pCIB158
-           Lpp-OmpA-CCBP                pCIB161
-           Lpp-OmpA-XynA, CadA, Npt     pCIB171
-           Lpp-OmpA-CBH1, CadA, Npt     pCIB193
-           Lpp-OmpA-CBH2, CadA, Npt     pCIB190
-           Lpp-OmpA-CEX, CadA, Npt      pCIB196
-           Lpp-OmpA-CCBP, CadA, Npt     pCIB176
E. coli     Lpp-OmpA-XynA, CadA, Npt     pCIB171
E. coli     Lpp-OmpA-CBH1, CadA, Npt     pCIB193
E. coli     Lpp-OmpA-CBH2, CadA, Npt     pCIB190
E. coli     Lpp-OmpA-CEX, CadA, Npt      pCIB196
E. coli     Lpp-OmpA-CCBP, CadA, Npt     pCIB176
H. alvei    Lpp-OmpA-XynA, CadA, Npt     pCIB171
H. alvei    Lpp-OmpA-CBH1, CadA, Npt     pCIB193
H. alvei    Lpp-OmpA-CBH2, CadA, Npt     pCIB190
H. alvei    Lpp-OmpA-CEX, CadA, Npt      pCIB196
H. alvei    Lpp-OmpA-CCBP, CadA, Npt     pCIB176

Table of primer sequences used in Examples.

Name          Sequence (5'-3')
cadA-F1       GGCGAGCTCACACAGGAAACAGACCATGAACGTTATTGCAATATTGAATCAC
cadA-R1       GGCTCTAGACCACTTCCCTTGTACGAGC
npt-F1        GGCAAGCTTAAGAGACAGGATGAGGATCG
npt-R1        GGCCATATGTCAGAAGAACTCGTCAAGAAG
cadA-F2       ATTTCACACAGGAAACAGCTATGAACGTTATTGCAATATTGAATCAC
cadA-R2       AGCTGTTTCCTGTGTGAAAT
lpp-F1        GGCGAGCTCATGAAAGCTACTAAACTGGTACTG
lpp-R1        CTGATCGATTTTAGCGTTGC
lpp-ompA-F    GCAACGCTAAAATCGATCAGAACAACAATGGCCCGACC
ompA-R        GGCTCTAGAACGGGTAGCGATTTCAGGAG
lpp-F2        ATTTCACACAGGAAACAGCTATGAAAGCTACTAAACTGGTACTG
lpp-R2        AGCTGTTTCCTGTGTGAAAT
XynA-F        GGCTCTAGAGTCGACGCGACCTGCTCTTACAACAT
XynA-R        GGCGCATGCCTGCAGTTACTGCTGATTACCAGAGCTGC
CBH1-F        GGCTCTAGAGTCGACTCTGGTGGTAACCCGCCAGG
CBH1-R        GGCGCATGCCTGCAGTTACAGGCACTGAGAGTAGTACG
CBH2-F        GGCTCTAGAGTCGACGTTTGGGGTCAGTGCGGTGG
CBH2-R        GGCGCATGCCTGCAGTTAACCAACCGGCGGAACACGAG
CEX-F         GGCTCTAGAGTCGACGGTGCGTCTCCTACCCCAAC
CEX-R         GGCGCATGCCTGCAGTTAACCAACGGTGCACGGGGTAC
CCBP-F        GGCTCTAGAGTCGACACCACTCCAACCCCAACG
CCBP-R        GGCGCATGCCTGCAGTTACGTTACGGCAGACGCGGTCG
cadA-F3       GGCTCTAGAATTTCACACAGGAAACAGCT
npt-F2        CAAGGCGCGTATGCCCGACG
npt-R2        CGTCGGGCATACGCGCCTTG
K320L-F       ACCGACTTCATCAAGCTGACACTGGATGTGAAATCCATC
K320L-R       GATTTCACATCCAGTGTCAGCTTGATGAAGTCGGTGTTG

Illustrative nucleic acid and polypeptide sequences:

Cellulomonas fimi exoglucanase polypeptide sequence-CBD sequence is underlined

SEQ ID NO: 1

MPRTTPAPGHPARGARTALRTTLAAAAATLVVGATVVLPAQAATTLKEAADGAGRDFGFALDPNRLSEAQYKAIADS

EFNLVVAENAMKWDATEPSQNSFSFGAGDRVASYAADTGKELYGHTLVWHSQLPDWAKNLNGSAFESAMVNHVTKVA

DHFEGKVASWDVVNEAFADGGGRRQDSAFQQKLGNGYIETAFRAARAADPTAKLCINDYNVEGINAKSNSLYDLVKD

FKARGVPLDCVGFQSHLIVGQVPGDFRQNLQRFADLGVDVRITELDIRMRTPSDATKLATQAADYKKVVQACMQVTR

CQGVTVWGITDKYSWVPDVFPGEGAALVWDASYAKKPAYAAVMEAFGASPTPTPTTPTPTPTTPTPTPTSGPAGCQV

LWGVNQWNTGFTANVTVKNTSSAPVDGWTLTFSFPSGQQVTQAWSSTVTQSGSAVTVRNAPWNGSIPAGGTAQFGFN

GSHTGTNAAPTAFSLNGTPCTVG

Codon optimized nucleotide sequence including the Cellulomonas fimi exoglucanase CBD

SEQ ID NO: 2

GGTGCGTCTCCTACCCCAACTCCAACCACCCCGACCCCGACCCCGACCACTCCGACGCCAACGCCGACCTCTGGTCC

AGCGGGTTGCCAGGTTCTGTGGGGTGTTAACCAGTGGAACACCGGCTTCACCGCGAACGTTACCGTGAAAAACACCT

CTTCTGCCCCGGTTGACGGTTGGACCCTGACCTTCTCTTTCCCGTCTGGTCAGCAGGTTACCCAGGCGTGGTCTTCT

ACCGTTACGCAGTCTGGTTCTGCGGTGACCGTTCGTAACGCGCCGTGGAATGGTTCCATCCCGGCTGGCGGTACGGC

CCAGTTCGGCTTTAACGGTTCCCACACCGGTACCAATGCGGCACCGACCGCGTTCTCTCTGAACGGTACCCCGTGCA

CCGTTGGT

Cellulomonas fimi cellulose binding protein polypeptide sequence-CBD sequence is underlined

SEQ ID NO: 3

MTTTLSRRLAGTLAALLLALAGALALAGPTQAADPVRIMPLGDSITGNPGCWRALLWQKLQQGGHTDVDMVGTLPAQ

GCGVAHDGDNEGHGGYLVTDVAAQGQLVGWLAATDPDVVVMHFGTNDVWSARTTQQILDAYTTLVQQMRASNPQMRV

LVAQIIPVAPPTCAQCPARTAALNAAIPAWAAGITTAQSPVVVVDQATGWVPATDTSDGVHPDEDGIVKLADRWYPA

LAAVLDGTTPTPTPTPTPTVSPTPTPTPTPSVTPTPTPTPGGATCTASYAVSSQWQGGFVASVRVTATSPVSSWTVA

VTLPGGAVQHAWSATATTSGSTATFANAAWNGTLAAGQQADLGFQGTGSPTASAVTCTATR

Codon optimized nucleotide sequence including the Cellulomonas fimi cellulose binding protein CBD

SEQ ID NO: 4

ACCACTCCAACCCCAACGCCGACTCCGACGCCTACCGTTAGCCCTACTCCTACCCCGACCCCTACGCCGTCTGTTAC

GCCAACTCCAACCCCAACGCCTGGTGGTGCGACCTGCACCGCATCTTACGCGGTTAGCTCTCAGTGGCAGGGCGGTT

TTGTTGCGAGCGTTCGTGTTACCGCCACCTCCCCGGTTTCTAGCTGGACCGTTGCGGTTACCCTGCCGGGTGGCGCG

GTTCAACATGCGTGGTCTGCGACGGCAACCACCTCTGGTTCTACCGCGACCTTTGCGAACGCGGCGTGGAACGGTAC

GCTGGCTGCTGGTCAGCAGGCGGACCTGGGCTTCCAGGGTACCGGCTCTCCGACCGCGTCTGCCGTAACG

Clostridium thermocellum CipA polypeptide sequence-CBD sequence is underlined

SEQ ID NO: 5

MRKVISMLLVVAMLTTIFAAMIPQTVSAATMTVEIGKVTAAVGSKVEIPITLKGVPSKGMANCDFVLGYDPNVLEVT

EVKPGSIIKDPDPSKSFDSAIYPDRKMIVFLFAEDSGRGTYAITQDGVFATIVATVKSAAAAPITLLEVGAFADNDL

VEISTTFVAGGVNLGSSVPTTQPNVPSDGVVVEIGKVTGSVGTTVEIPVYFRGVPSKGIANCDFVFRYDPNVLEIIG

IDPGDIIVDPNPTKSFDTAIYPDRKIIVFLFAEDSGTGAYAITKDGVFAKIRATVKSSAPGYITFDEVGGFADNDLV

EQKVSFIDGGVNVGNATPTKGATPTNTATPTKSATATPTRPSVPTNTPTNTPANTPVSGNLKVEFYNSNPSDTTNSI

NPQFKVTNTGSSAIDLSKLTLRYYYTVDGQKDQTFWCDHAAIIGSNGSYNGITSNVKGTFVKMSSSTNNADTYLEIS

FTGGTLEPGAHVQIQGRFAKNDWSNYTQSNDYSFKSASQFVEWDQVTAYLNGVLVWGKEPGGSVVPSTQPVTTPPAT

TKPPATTKPPATTIPPSDDPNAIKIKVDTVNAKPGDTVNIPVRFSGIPSKGIANCDFVYSYDPNVLEIIEIKPGELI

VDPNPDKSFDTAVYPDRKIIVFLFAEDSGTGAYAITKDGVFATIVAKVKSGAPNGLSVIKFVEVGGFANNDLVEQRT

QFFDGGVNVGDTTVPTTPTTPVTTPTDDSNAVRIKVDTVNAKPGDTVRIPVRFSGIPSKGIANCDFVYSYDPNVLEI

IEIEPGDIIVDPNPDKSFDTAVYPDRKIIVFLFAEDSGTGAYAITKDGVFATIVAKVKSGAPNGLSVIKFVEVGGFA

NNDLVEQKTQFFDGGVNVGDTTEPATPTTPVTTPTTTDDLDAVRIKVDTVNAKPGDTVRIPVRFSGIPSKGIANCDF

VYSYDPNVLEIIEIEPGDIIVDPNPDKSFDTAVYPDRKIIVFLFAEDSGTGAYAITKDGVFATIVAKVKSGAPNGLS

VIKFVEVGGFANNDLVEQKTQFFDGGVNVGDTTEPATPTTPVTTPTTTDDLDAVRIKVDTVNAKPGDTVRIPVRFSG

IPSKGIANCDFVYSYDPNVLEIIEIEPGDIIVDPNPDKSFDTAVYPDRKIIVFLFAEDSGTGAYAITKDGVFATIVA

KVKEGAPNGLSVIKFVEVGGFANNDLVEQKTQFFDGGVNVGDTTEPATPTTPVTTPTTTDDLDAVRIKVDTVNAKPG

DTVRIPVRFSGIPSKGIANCDFVYSYDPNVLEIIEIEPGELIVDPNPTKSFDTAVYPDRKMIVFLFAEDSGTGAYAI

TEDGVFATIVAKVKSGAPNGLSVIKFVEVGGFANNDLVEQKTQFFDGGVNVGDTTEPATPTTPVTTPTTTDDLDAVR

IKVDTVNAKPGDTVRIPVRFSGIPSKGIANCDFVYSYDPNVLEIIEIEPGDIIVDPNPDKSFDTAVYPDRKIIVFLF

AEDSGTGAYAITKDGVFATIVAKVKEGAPNGLSVIKFVEVGGFANNDLVEQKTQFFDGGVNVGDTTVPTTSPTTTPP

EPTITPNKLTLKIGRAEGRPGDTVEIPVNLYGVPQKGIASGDFVVSYDPNVLEIIEIEPGELIVDPNPTKSFDTAVY

PDRKMIVFLFAEDSGTGAYAITEDGVFATIVAKVKEGAPEGFSAIEISEFGAFADNDLVEVETDLINGGVLVTNKPV

IEGYKVSGYILPDFSFDATVAPLVKAGFKVEIVGTELYAVTDANGYFEITGVPANASGYTLKISRATYLDRVIANVV

VTGDTSVSTSQAPIMMWVGDIVKDNSINLLDVAEVIRCFNATKGSANYVEELDINRNGAINMQDIMIVHKHFGATSS

DYDAQ

Clostridium thermocellum partial CipB polypeptide sequence-CBD sequence is underlined

SEQ ID NO: 6

DPSKSFDSAIYPDRKMIVFLFAEDSGRGTYAITQDGVFATIVATVKSAAAAPITLLEVGAFRDNDLVEISTTFVAGG

VNLGSSVPTTQPNVPSDGVVVEIGKVTGSVGTTVEIPVYFRGVPSKGIANCDFVFRYDPNVLEIIGIDPRSIIVDPN

PTKSFDTAIYADRKIIVFLFCGRQRNRSVSITKDGVFAKIRATVKSSAPAYITFDEVGGFADNDLVEQKVSFIDGGV

NVGNATPTKGATPTNTATPTKSATATPPGHSVPTNTPTNTPANTPVSGNLKVEFYNSNPSDTTNSINPQFKVTNTGS

SAIDLSKLTLRYYYTVDGQKDQTFWCDHAAIIGSNGSYNGITSNVKGTFVKMSSSTNNADTYLEISFTGGTLEPGAH

VQIQGRFAKNDWSNYTQSNDYSFKSRSQFVEWDQVTAYLNGVLVWGKEPGGSVVPSTQPVTTPPATTKPPATTIPPS

DDPNAIKIKVDTVNAKPGDTVNIPVRFSGIPSKGIANCDFVYSYDPNVLEIIEIKPGELIVDPNPDKSFDTAVYPDR

KMIVFLFAEDSGTGAYAITEDGVFATIVAKVKEGAPEGFSAIEISEFGAFADNDLVEVETDLINGGVLVTNKPVIEG

YKVSGYILPDFSFDATVAPLVKAGFKVEIVGTELYAVTDANGYFEITGVPANASGYTLKISRATYLDRVIANVVVTG

DTSVSTSQAPIMMWVGDIVKDNSINLLDVAEVIRCFNATKGSANYVEELDINRNGAINMQDIMIVHKHFGATSSDYD

AQ

Trichoderma reesei cellobiohydrolase I polypeptide sequence-CBD sequence is underlined

SEQ ID NO: 7

MYRKLAVISAFLATARAQSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTL

CPDNETCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFSFDVDVSQLP

CGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMD

IWEANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVV

TQFETSGAINRYYVQNGVTFQQPNAELGSYSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWD

DYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSGGNPPGGNRGTT

TTRRPATTTGSSPGPTQSHYGQCGGIGYSGPTVCASGTTCQVLNPYYSQCL

Codon optimized nucleotide sequence including the Trichoderma reesei cellobiohydrolase I CBD

SEQ ID NO: 8

TCTGGTGGTAACCCGCCAGGTGGCAACCGTGGCACCACCACTACGCGTCGTCCGGCGACTACGACCGGTTCTTCTCC

GGGTCCGACGCAGTCTCACTACGGTCAGTGCGGTGGTATCGGTTACTCTGGCCCGACCGTTTGCGCGTCTGGTACGA

CCTGCCAGGTTCTGAACCCGTACTACTCTCAGTGCCTG

Trichoderma reesei cellobiohydrolase II polypeptide sequence-CBD sequence is underlined

SEQ ID NO: 9

MIVGILTTLATLATLAASVPLEERQACSSVWGQCGGQNWSGPTCCASGSTCVYSNDYYSQCLPGAASSSSSTRAAST

TSRVSPTTSRSSSATPPPGSTTTRVPPVGSGTATYSGNPFVGVTPWANAYYASEVSSLAIPSLTGAMATAAAAVAKV

PSFMWLDTLDKTPLMEQTLADIRTANKNGGNYAGQFVVYDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVV

EYSDIRTLLVIEPDSLANLVTNLGTPKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPANQDPAAQLFA

NVYKNASSPRALRGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLANHGWSNAFFITDQGRSGKQPTGQ

QQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDGTSDSSAPRFDSHCALPDALQPAPQAGAWFQAYFVQL

LTNANPSFL

Codon optimized nucleotide sequence including the Trichoderma reesei cellobiohydrolase II CBD

SEQ ID NO: 10

GTTTGGGGTCAGTGCGGTGGTCAGAACTGGTCTGGTCCGACCTGCTGCGCGTCTGGTTCTACCTGCGTTTACTCTAA

CGACTACTACTCTCAGTGCCTGCCGGGTGCTGCGTCTTCTAGCTCTTCTACCCGTGCGGCGTCTACCACCTCTCGTG

TGTCTCCGACTACCTCCCGTTCTTCTTCTGCGACCCCACCGCCAGGCTCTACTACCACTCGTGTTCCGCCGGTTGGT

Cellvibrio japonicus XynA polypeptide sequence-CBD sequence is underlined

SEQ ID NO: 11

MRTAMAKSLGAAAFLGAALFAHTLAAQTATCSYNITNEWNTGYTGDITITNRGSSAINGWSVNWQYATNRLSSSWNA

NVSGSNPYSASNLSWNGNIQPGQSVSFGFQVNKNGGSAERPSVGGSICSGSVASSSAPASSVPSSIASSSPSSVASS

VISSMASSSPVSSSSVASSTPGSSSGNQQCNWYGTLYPLCVTTTNGWGWEDQRSCIARSTCAAQPAPFGIVGSGSST

PVSSSSSSLSSSSVVSSIRSSSSSSSSSVATGNGLASLADFPIGVAVAASGGNADIFTSSARQNIVRAEFNQITAEN

IMKMSYMYSGSNFSFTNSDRLVSWAAQNGQTVHGHALVWHPSYQLPNWASDSNANFRQDFARHIDTVAAHFAGQVKS

WDVVNEALFDSADDPDGRGSANGYRQSVFYRQFGGPEYIDEAFRRARAADPTAELYYNDFNTEENGAKTTALVNLVQ

RLLNNGVPIDGVGFQMHVMNDYPSIANIRQAMQKIVALSPTLKIKITELDVRLNNPYDGNSSNDYTNRNDCAVSCAG

LDRQKARYKEIVQAYLEVVPPGRRGGITVWGIADPDSWLYTHQNLPDWPLLFNDNLQPKPAYQGVVEALSGR

Codon optimized nucleotide sequence including the Cellvibrio japonicus XynA CBD

SEQ ID NO: 12

GCGACCTGCTCTTACAACATCACCAACGAATGGAACACCGGTTACACCGGCGACATTACCATCACTAATCGTGGTTC

TTCTGCGATCAACGGTTGGTCTGTTAACTGGCAATATGCTACGAACCGCCTGTCTTCTAGCTGGAACGCGAACGTTT

CTGGTTCTAACCCGTACTCTGCGTCTAACCTCTCTTGGAACGGTAACATCCAGCCGGGTCAGTCTGTTTCCTTTGGT

TTCCAGGTTAACAAAAACGGCGGCTCTGCTGAGCGTCCGTCTGTTGGTGGTAGCATCTGCTCTGGCTCTGTTGCGTC

CTCTTCCGCGCCAGCTTCTTCCGTCCCATCTTCTATCGCGTCTTCTTCTCCGTCTAGCGTTGCCTCCAGCGTTATCT

CTTCCATGGCGTCCAGCTCTCCGGTTAGCTCCAGCAGCGTAGCGAGCAGCACCCCGGGTAGCAGCTCTGGTAATCAG

CAG

Cellulomonas fimi exoglucanase CBD amino acid sequence

SEQ ID NO: 13

SGPAGCQVLWGVNQWNTGFTANVTVKNTSSAPVDGWTLTFSFPSGQQVTQAWSSTVTQSGSAVTVRNAPWNGSIPAG

GTAQFGFNGSHTGTNAAPTAFSLNGTPC

Cellulomonas fimi cellulose binding protein CBD amino acid sequence

SEQ ID NO: 14

TCTASYAVSSQWQGGFVASVRVTATSPVSSWTVAVTLPGGAVQHAWSATATTSGSTATFANAAWNGTLAAGQQADLG

FQGTGSPTASAVT

Clostridium thermocellum CipA CBD amino acid sequence

SEQ ID NO: 15

VEFYNSNPSDTTNSINPQFKVTNTGSSAIDLSKLTLRYYYTVDGQKDQTFWCDHAAIIGSNGSYNGITSNVKGTFVK

MSSSTNNADTYLEIS

Clostridium thermocellum CipB CBD amino acid sequence

SEQ ID NO: 16

VEFYNSNPSDTTNSINPQFKVTNTGSSAIDLSKLTLRYYYTVDGQKDQTFWCDHAAIIGSNGSYNGITSNVKGTFVK

MSSSTNNADTYLEIS

Trichoderma reesei cellobiohydrolase I CBD amino acid sequence

SEQ ID NO: 17

HYGQCGGIGYSGPTVCASGTTCQVLNPYYSQ

Trichoderma reesei cellobiohydrolase II CBD amino acid sequence

SEQ ID NO: 18

VWGQCGGQNWSGPTCCASGSTCVYSNDYYS

Cellvibrio japonicus XynA CBD amino acid sequence

SEQ ID NO: 19

ATCSYNITNEWNTGYTGDITITNRGSSAINGWSVNWQYATNR

CadA polypeptide sequence

SEQ ID NO: 20

MNVIAILNHMGVYFKEEPIRELHRALERLNFQIVYPNDRDDLLKLIENNARLCGVIFDWDKYNLELCEEISKMNENL

PLYAFANTYSTLDVSLNDLRLQISFFEYALGAAEDIANKIKQTTDEYINTILPPLTKALFKYVREGKYTFCTPGHMG

GTAFQKSPVGSLFYDFFGPNTMKSDISISVSELGSLLDHSGPHKEAEQYIARVFNADRSYMVTNGTSTANKIVGMYS

APAGSTILIDRNCHKSLTHLMMMSDVTPIYFRPTRNAYGILGGIPQSEFQHATIAKRVKETPNATWPVHAVITNSTY

DGLLYNTDFIKKTLDVKSIHFDSAWVPYTNFSPIYEGKCGMSGGRVEGKVIYETQSTHKLLAAFSQASMIHVKGDVN

EETFNEAYMMEITTTSPHYGIVASTETAAAMMKGNAGKRLINGSIERAIKFRKEIKRLRTESDGWFFDVWQPDHIDT

TECWPLRSDSTWHGFKNIDNEHMYLDPIKVTLLTPGMEKDGTMSDFGIPASIVAKYLDEHGIVVEKTGPYNLLFLFS

IGIDKTKALSLLRALTDFKRAFDLNLRVKNMLPSLYREDPEFYENIVIRIQELAQNIHKLIVEIHNLPDLMYRAFEV

LPTMVMTPYAAFQKELHGMTEEVYLDEMVGRINANMILPYPPGVPLVMPGEMITEESRPVLEFLQMLCEIGAHYPGF

ETDIRGAYRQADGRYTVKVLKEESKK

Escherichia coli cadA nucleic acid sequence

SEQ ID NO: 21

ATGAACGTTATTGCAATATTGAATCACATGGGGGTTTATTTTAAAGAAGAACCCATCCGTGAACTTCATCGCGCGCT

TGAACGTCTGAACTTCCAGATTGTTTACCCGAACGACCGTGACGACTTATTAAAACTGATCGAAAACAATGCGCGTC

TGTGCGGCGTTATTTTTGACTGGGATAAATATAATCTCGAGCTGTGCGAAGAAATTAGCAAAATGAACGAGAACCTG

CCGTTGTACGCGTTCGCTAATACGTATTCCACTCTCGATGTAAGCCTGAATGACCTGCGTTTACAGATTAGCTTCTT

TGAATATGCGCTGGGTGCTGCTGAAGATATTGCTAATAAGATCAAGCAGACCACTGACGAATATATCAACACTATTC

TGCCTCCGCTGACTAAAGCACTGTTTAAATATGTTCGTGAAGGTAAATATACTTTCTGTACTCCTGGTCACATGGGC

GGTACTGCATTCCAGAAAAGCCCGGTAGGTAGCCTGTTCTATGATTTCTTTGGTCCGAATACCATGAAATCTGATAT

TTCCATTTCAGTATCTGAACTGGGTTCTCTGCTGGATCACAGTGGTCCACACAAAGAAGCAGAACAGTATATCGCTC

GCGTCTTTAACGCAGACCGCAGCTACATGGTGACCAACGGTACTTCCACTGCGAACAAAATTGTTGGTATGTACTCT

GCTCCAGCAGGCAGCACCATTCTGATTGACCGTAACTGCCACAAATCGCTGACCCACCTGATGATGATGAGCGATGT

TACGCCAATCTATTTCCGCCCGACCCGTAACGCTTACGGTATTCTTGGTGGTATCCCACAGAGTGAATTCCAGCACG

CTACCATTGCTAAGCGCGTGAAAGAAACACCAAACGCAACCTGGCCGGTACATGCTGTAATTACCAACTCTACCTAT

GATGGTCTGCTGTACAACACCGACTTCATCAAGAAAACACTGGATGTGAAATCCATCCACTTTGACTCCGCGTGGGT

GCCTTACACCAACTTCTCACCGATTTACGAAGGTAAATGCGGTATGAGCGGTGGCCGTGTAGAAGGGAAAGTGATTT

ACGAAACCCAGTCCACTCACAAACTGCTGGCGGCGTTCTCTCAGGCTTCCATGATCCACGTTAAAGGTGACGTAAAC

GAAGAAACCTTTAACGAAGCCTACATGATGCACACCACCACTTCTCCGCACTACGGTATCGTGGCGTCCACTGAAAC

CGCTGCGGCGATGATGAAAGGCAATGCAGGTAAGCGTCTGATCAACGGTTCTATTGAACGTGCGATCAAATTCCGTA

AAGAGATCAAACGTCTGAGAACGGAATCTGATGGCTGGTTCTTTGATGTATGGCAGCCGGATCATATCGATACGACT

GAATGCTGGCCGCTGCGTTCTGACAGCACCTGGCACGGCTTCAAAAACATCGATAACGAGCACATGTATCTTGACCC

GATCAAAGTCACCCTGCTGACTCCGGGGATGGAAAAAGACGGCACCATGAGCGACTTTGGTATTCCGGCCAGCATCG

TGGCGAAATACCTCGACGAACATGGCATCGTTGTTGAGAAAACCGGTCCGTATAACCTGCTGTTCCTGTTCAGCATC

GGTATCGATAAGACCAAAGCACTGAGCCTGCTGCGTGCTCTGACTGACTTTAAACGTGCGTTCGACCTGAACCTGCG

TGTGAAAAACATGCTGCCGTCTCTGTATCGTGAAGATCCTGAATTCTATGAAAACATGCGTATTCAGGAACTGGCTC

AGAATATCCACAAACTGATTGTTCACCACAATCTGCCGGATCTGATGTATCGCGCATTTGAAGTGCTGCCGACGATG

GTAATGACTCCGTATGCTGCATTCCAGAAAGAGCTGCACGGTATGACCGAAGAAGTTTACCTCGACGAAATGGTAGG

TCGTATTAACGCCAATATGATCCTTCCGTACCCGCCGGGAGTTCCTCTGGTAATGCCGGGTGAAATGATCACCGAAG

AAAGCCGTCCGGTTCTGGAGTTCCTGCAGATGCTGTGTGAAATCGGCGCTCACTATCCGGGCTTTGAAACCGATATT

CACGGTGCATACCGTCAGGCTGATGGCCGCTATACCGTTAAGGTATTGAAAGAAGAAAGCAAAAAATAA

CadA mutant K320L polypeptide sequence-mutation underlined

SEQ ID NO: 22

MNVIAILNHMGVYFKEEPIRELHRALERLNFQIVYPNDRDDLLKLIENNARLCGVIEDWDKYNLELCEEISKMNENL

PLYAFANTYSTLDVSLNDLRLQISFFEYALGAAEDIANKIKQTTDEYINTILPPLTKALFKYVREGKYTFCTPGHMG

GTAFQKSPVGSLFYDFFGPNTMKSDISISVSELGSLLDHSGPHKEAEQYIARVFNADRSYMVTNGTSTANKIVGMYS

APAGSTILIDRNCHKSLTHLMMMSDVTPIYFRPTRNAYGILGGIPQSEFQHATIAKRVKETPNATWPVHAVITNSTY

DGLLYNTDFIKLTLDVKSIHEDSAWVPYTNESPIYEGKCGMSGGRVEGKVIYETQSTHKLLAAFSQASMIHVKGDVN

EETFNEAYMMHTTTSPHYGIVASTETAAAMMKGNAGKRLINGSIERAIKFRKEIKRLRTESDGWFFDVWQPDHIDTT

ECWPLRSDSTWHGFKNIDNEHMYLDPIKVTLLTPGMEKDGTMSDFGIPASIVAKYLDEHGIVVEKTGPYNLLFLFSI

GIDKTKALSLLRALTDFKRAFDLNLRVKNMLPSLYREDPEFYENMRIQELAQNIHKLIVHHNLPDLMYRAFEVLPTM

VMTPYAAFQKELHGMTEEVYLDEMVGRINANMILPYPPGVPLVMPGEMITEESRPVLEFLQMLCEIGAHYPGFETDI

HGAYRQADGRYTVKVLKEESKK

Escherichia coli cadA mutant K320L nucleic acid sequence-mutation underlined

SEQ ID NO: 23

ATGAACGTTATTGCAATATTGAATCACATGGGGGTTTATTTTAAAGAAGAACCCATCCGTGAACTTCATCGCGCGCT

TGAACGTCTGAACTTCCAGATTGTTTACCCGAACGACCGTGACGACTTATTAAAACTGATCGAAAACAATGCGCGTC

TGTGCGGCGTTATTTTTGACTGGGATAAATATAATCTCGAGCTGTGCGAAGAAATTAGCAAAATGAACGAGAACCTG

CCGTTGTACGCGTTCGCTAATACGTATTCCACTCTCGATGTAAGCCTGAATGACCTGCGTTTACAGATTAGCTTCTT

TGAATATGCGCTGGGTGCTGCTGAAGATATTGCTAATAAGATCAAGCAGACCACTGACGAATATATCAACACTATTC

TGCCTCCGCTGACTAAAGCACTGTTTAAATATGTTCGTGAAGGTAAATATACTTTCTGTACTCCTGGTCACATGGGC

GGTACTGCATTCCAGAAAAGCCCGGTAGGTAGCCTGTTCTATGATTTCTTTGGTCCGAATACCATGAAATCTGATAT

TTCCATTTCAGTATCTGAACTGGGTTCTCTGCTGGATCACAGTGGTCCACACAAAGAAGCAGAACAGTATATCGCTC

GCGTCTTTAACGCAGACCGCAGCTACATGGTGACCAACGGTACTTCCACTGCGAACAAAATTGTTGGTATGTACTCT

GCTCCAGCAGGCAGCACCATTCTGATTGACCGTAACTGCCACAAATCGCTGACCCACCTGATGATGATGAGCGATGT

TACGCCAATCTATTTCCGCCCGACCCGTAACGCTTACGGTATTCTTGGTGGTATCCCACAGAGTGAATTCCAGCACG

CTACCATTGCTAAGCGCGTGAAAGAAACACCAAACGCAACCTGGCCGGTACATGCTGTAATTACCAACTCTACCTAT

GATGGTCTGCTGTACAACACCGACTTCATCAAGCAGACACTGGATGTGAAATCCATCCACTTTGACTCCGCGTGGGT

GCCTTACACCAACTTCTCACCGATTTACGAAGGTAAATGCGGTATGAGCGGTGGCCGTGTAGAAGGGAAAGTGATTT

ACGAAACCCAGTCCACTCACAAACTGCTGGCGGCGTTCTCTCAGGCTTCCATGATCCACGTTAAAGGTGACGTAAAC

GAAGAAACCTTTAACGAAGCCTACATGATGCACACCACCACTTCTCCGCACTACGGTATCGTGGCGTCCACTGAAAC

CGCTGCGGCGATGATGAAAGGCAATGCAGGTAAGCGTCTGATCAACGGTTCTATTGAACGTGCGATCAAATTCCGTA

AAGAGATCAAACGTCTGAGAACGGAATCTGATGGCTGGTTCTTTGATGTATGGCAGCCGGATCATATCGATACGACT

GAATGCTGGCCGCTGCGTTCTGACAGCACCTGGCACGGCTTCAAAAACATCGATAACGAGCACATGTATCTTGACCC

GATCAAAGTCACCCTGCTGACTCCGGGGATGGAAAAAGACGGCACCATGAGCGACTTTGGTATTCCGGCCAGCATCG

TGGCGAAATACCTCGACGAACATGGCATCGTTGTTGAGAAAACCGGTCCGTATAACCTGCTGTTCCTGTTCAGCATC

GGTATCGATAAGACCAAAGCACTGAGCCTGCTGCGTGCTCTGACTGACTTTAAACGTGCGTTCGACCTGAACCTGCG

TGTGAAAAACATGCTGCCGTCTCTGTATCGTGAAGATCCTGAATTCTATGAAAACATGCGTATTCAGGAACTGGCTC

AGAATATCCACAAACTGATTGTTCACCACAATCTGCCGGATCTGATGTATCGCGCATTTGAAGTGCTGCCGACGATG

GTAATGACTCCGTATGCTGCATTCCAGAAAGAGCTGCACGGTATGACCGAAGAAGTTTACCTCGACGAAATGGTAGG

TCGTATTAACGCCAATATGATCCTTCCGTACCCGCCGGGAGTTCCTCTGGTAATGCCGGGTGAAATGATCACCGAAG

AAAGCCGTCCGGTTCTGGAGTTCCTGCAGATGCTGTGTGAAATCGGCGCTCACTATCCGGGCTTTGAAACCGATATT

CACGGTGCATACCGTCAGGCTGATGGCCGCTATACCGTTAAGGTATTGAAAGAAGAAAGCAAAAAATAA

Escherichia coli OmpA polypeptide sequence-the region used for surface display is underlined.

SEQ ID NO: 24

MKKTAIAIAVALAGFATVAQAAPKDNTWYTGAKLGWSQYHDTGFINNNGPTHENQLGAGAFGGYQVNPYVGFEMGYD

WLGRMPYKGSVENGAYKAQGVQLTAKLGYPITDDLDIYTRLGGMVWRADTKSNVYGKNHDTGVSPVFAGGVEYAITP

EIATRLEYQWTNNIGDAHTIGTRPDNGMLSLGVSYRFGQGEAAPVVAPAPAPAPEVQTKHFTLKSDVLFNFNKATLK

PEGQAALDQLYSQLSNLDPKDGSVVVLGYTDRIGSDAYNQGLSERRAQSVVDYLISKGIPADKISARGMGESNPVTG

NTCDNVKQRAALIDCLAPDRRVEIEVKGIKDVVTQPQA

Fusion polypeptide amino acid sequence containing the CBH1 CBD fused to the surface display region of OmpA, encoded by a polynucleotide encoding a fusion protein comprising: E. coli lipoprotein leader sequence (italic) joined to an OmpA display region (underlined) joined to a CBH1 CBD (bold).

SEQ ID NO: 25

MKATKLVLGAVILGSTLLAGCSSNAKIDQ
NNNGPTHENQLGAGAFGGYQVNPYVGFEMGYDWLGRMPYKGSVENGAY

KAQGVQLTAKLGYPITDDLDIYTRLGGMVWRADTKSNVYGKNHDTGVSPVFAGGVEYAITPEIATRSRVDSGGNPPG

GNRGTTTTRRPATTTGSSPGPTQSHYGQCGGIGYSGPTVCASGTTCQVLNPYYSQCL
