PROTEIN FOR CONSTRUCTING PROTEIN COMPLEX FROM CLOSTRIDIUM THERMOCELLUM, AND USE THEREOF

Abstract
An object is to provide a protein having a dockerin that is suited to production in yeasts and other eukaryotic microorganisms in which sugar chain modification is expected, and that exhibits excellent cohesin-dockerin binding ability, as well as a use thereof. As a protein for constructing a protein complex with a scaffolding protein having a type I cohesin from Clostridium thermocellum, the present invention uses a protein having a dockerin that contains at least one dockerin-specific sequence, namely a sequence associated with cohesin binding in type I dockerins from C. thermocellum, wherein the dockerin either has no intrinsic predicted N-type sugar chain modification site or has aspartic acid substituted for the asparagine of an intrinsic predicted N-type sugar chain modification site.
Description
CROSS-REFERENCE TO RELATED APPLICATION(S)

This application claims priority to Japanese Patent Application No. 2010-088952 filed on Apr. 7, 2010, the contents of which are hereby incorporated by reference into the present application.


TECHNICAL FIELD

The present application relates to a protein for constructing a protein complex from Clostridium thermocellum, and to a use thereof.


DESCRIPTION OF RELATED ART

In recent years there has been growing interest in biomass resources derived from plant photosynthesis as a substitute for limited petroleum supplies, and various attempts have been made to use biomass as a source of energy and of various materials. For biomass to be used effectively as an energy source or other raw material, it must be saccharified into a carbon source that animals and microorganisms can readily use.


Using typical forms of biomass such as cellulose and hemicellulose requires efficient cellulases for saccharifying (decomposing) these materials. Attention has focused on cellulosomes, which are produced by certain bacteria, as a source of such cellulases. Cellulosomes are protein complexes formed on bacterial cell surfaces and comprise cellulases and the scaffolding proteins (scaffoldins) to which the cellulases bind. Scaffolding proteins have sites called cohesins, and cellulases are known to bind to these cohesins via their own dockerins. Cellulosomes can thus present a variety of cellulases in large quantities and at high density on bacterial cell surfaces.


Artificial construction of cellulosomes by genetic engineering has been studied in recent years, and binding between cohesins and dockerins, which is the basis of cellulosome construction, has been examined in various ways. For example, dockerins of Clostridium thermocellum have been subjected to deletion of several amino acid residues and to alanine scanning in order to evaluate cohesin binding and identify the residues required for binding ability (Non-patent Document 1). According to this document, a dockerin produced in E. coli retains about 70% of its cohesin binding when an asparagine in its amino acid sequence is replaced with alanine, but its interactions with calcium ions, which contribute to structural stability, are weakened. It has also been reported that when binding ability is eliminated by substituting AA (alanine-alanine) for ST (serine-threonine) in one of the two repeated amino acid sequences that make up the two helices of a C. thermocellum dockerin, the other helix still binds a cohesin (Non-patent Document 2). As for cohesins, when several amino acid residues of a C. thermocellum cohesin were replaced and binding with dockerins was evaluated, replacing certain threonines with leucine eliminated binding with C. thermocellum dockerins; instead, the cohesin bound to a dockerin from Clostridium cellulolyticum, with which it does not ordinarily interact (Non-patent Document 3).

  • [Non-patent Document 1] A. Karpol et al., Biochem. J. 410, 331-338 (2008)
  • [Non-patent Document 2] A. L. Carvalho et al., Proc. Natl. Acad. Sci. USA 104(9), 3089-3094 (2007)
  • [Non-patent Document 3] A. Mechaly et al., J. Biol. Chem. 276(13), 9883-9888 (2001)


BRIEF SUMMARY OF INVENTION

When constructing an artificial cellulosome, it is considered desirable to have a yeast or similar organism produce and secrete large quantities of cellulase. If a cellulosome can be constructed on the cell surface of a yeast or other eukaryotic microorganism, the glucose released by the cellulosome can be used immediately by the yeast as a carbon source for efficient production of various useful substances. However, when a foreign protein from a bacterium or other prokaryote is produced in a yeast or other eukaryote, interactions between proteins can be affected by the attachment of large sugar chains.


According to the reports above, amino acid substitutions in dockerins or cohesins appear to affect cohesin-dockerin binding, either by reducing binding ability (Non-patent Documents 1 and 2) or by altering binding specificity (Non-patent Document 3). However, there have been no reports of improving cohesin-dockerin binding ability. Moreover, these reports pertain only to cohesins and dockerins produced in E. coli, in which sugar chain modification of proteins does not occur. At present, therefore, there are no reports at all on how amino acid substitutions in dockerin domains affect cohesin-dockerin binding in yeasts and other eukaryotic microorganisms, in which sugar chain modification does occur.


It is an object of the disclosures of this Description to provide a protein having a dockerin that is useful for producing a protein complex derived from Clostridium thermocellum in a yeast or other eukaryotic microorganism in which sugar chain modification is expected, and that provides excellent cohesin-dockerin binding ability, as well as a use thereof.


In a search for dockerins of C. thermocellum using DDBJ (http://www.ddbj.nig.ac.jp/index-j.html), the present inventors found 72 annotated dockerins in the C. thermocellum genome. Using UniProt (http://www.uniprot.org/) and other databases, they identified 142 specific sequences in these dockerins that are thought to be associated with cohesin-dockerin binding, and analyzed them by multiple alignment and other methods. The similarity among these 142 specific sequences exceeded 90%. It is therefore thought that all of these specific sequences are capable of binding cohesins.


The inventors also found that 113 of these specific sequences (about 80%) have predicted sugar chain modification sites, whereas the remaining 29 lack them. The inventors then targeted two predicted sugar chain modification sites located near the scaffolding protein binding region of a dockerin from C. thermocellum and replaced the asparagines at these sites with alanine or aspartic acid. Replacing asparagine with alanine eliminated sugar chain modification of the dockerin but did not improve cohesin binding ability. One possible explanation is that the alanine-substituted dockerin does not adopt a stable structure when produced in yeast and therefore cannot bind cohesin. In contrast, when a dockerin in which the asparagines at the target sites were replaced with aspartic acid was produced in yeast, cohesin-dockerin binding increased, and the saccharification ability of the yeast improved.
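The split of relevant sequences into those with and without predicted sugar chain modification sites can be sketched as follows. This is a minimal illustration under an assumption not stated in this Description: a predicted N-type site is taken to be the common N-linked glycosylation sequon N-X-S/T, where X is any residue except proline. The prediction method actually used by the inventors is not specified, and dedicated predictors may apply further criteria. The three example sequences are gap-free entries copied from Table 2 (SEQ ID NOS. 91, 94 and 107); the function name `predicted_sites` is hypothetical.

```python
from __future__ import annotations

import re

# Assumption: a predicted N-type sugar chain modification site is the sequon
# N-X-S/T, where X is any residue except proline (P). The lookahead makes
# overlapping sequons reportable.
SEQUON = re.compile(r"(?=N[^P][ST])")


def predicted_sites(seq: str) -> list[int]:
    """Return 1-based positions of asparagines that begin an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]


# Relevant sequences copied from Table 2 (entries without illegible residues).
relevant = {
    "SEQ ID NO. 91 (Cthe_0246, 2nd)": "DLNGDSKVDSTDLTALKRYLLGVI",
    "SEQ ID NO. 94 (Cthe_0269, 1st)": "DVNGDGNVNSTDLTMLKRYLLKSV",
    "SEQ ID NO. 107 (Cthe_0433, 2nd)": "DLNGDDKVNSTDYSVLKRYLLRSI",
}

sites = {name: predicted_sites(seq) for name, seq in relevant.items()}
with_sites = [name for name, pos in sites.items() if pos]
without_sites = [name for name, pos in sites.items() if not pos]

for name, pos in sites.items():
    print(f"{name}: {pos if pos else 'no predicted site'}")
print(f"{len(with_sites)} with predicted sites, {len(without_sites)} without")
```

In this small sample, SEQ ID NOS. 94 and 107 each carry an N-S-T sequon at position 9, while SEQ ID NO. 91 has aspartic acid at that position and no sequon. Counts obtained this way over all 142 sequences need not reproduce the 113/29 split reported above, since the inventors' prediction criteria are not given here.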


These results show that, in eukaryotes in which sugar chain modification may occur, cohesin-dockerin binding ability can be increased and saccharification ability improved if the dockerin either inherently lacks predicted sugar chain modification sites or, when such a site is present, has the asparagine at that site replaced with aspartic acid to eliminate sugar chain modification.


The disclosures of this Description provide a protein for constructing a protein complex using a scaffolding protein that includes a type I cohesin from C. thermocellum, wherein the protein has a dockerin containing at least one dockerin-specific sequence associated with cohesin binding in type I dockerins from C. thermocellum, and the dockerin satisfies either of the following conditions (a) and (b):


(a) having no intrinsic predicted N-type sugar chain modification site;


(b) having aspartic acid substituted for an asparagine of an intrinsic predicted N-type sugar chain modification site.
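Condition (b) can be illustrated at the sequence level with the sketch below, which replaces the asparagine (N) of each predicted site with aspartic acid (D) and then checks that no predicted site remains. It assumes, as an illustration only, that predicted N-type sites follow the N-X-S/T sequon rule (X not proline); the function names are hypothetical. The Examples substitute particular asparagines (for instance, those at positions 18 and 50 of the Cel48S dockerin), so the optional `indices` argument allows only chosen sites to be targeted; its 0-based numbering refers to the string passed in, not to the numbering used in the figures. The check says nothing about protein structure, which is what distinguishes aspartic acid from alanine substitution in the results above.

```python
import re

# Assumption: predicted N-type sites follow the N-X-S/T sequon rule (X != P).
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_indices(seq):
    """0-based indices of asparagines that begin an N-X-S/T sequon."""
    return [m.start() for m in SEQUON.finditer(seq)]


def substitute_asp(seq, indices=None):
    """Condition (b): put aspartic acid (D) in place of sequon asparagines (N).

    With indices=None every predicted site is substituted; otherwise only the
    given 0-based indices, each of which must be a predicted site.
    """
    valid = set(sequon_indices(seq))
    targets = sorted(valid) if indices is None else list(indices)
    residues = list(seq)
    for i in targets:
        if i not in valid:
            raise ValueError(f"index {i} is not a predicted N-type site")
        residues[i] = "D"
    return "".join(residues)


def lacks_predicted_site(seq):
    """True if no N-X-S/T sequon remains, the state sought by (a) or (b)."""
    return not sequon_indices(seq)


native = "DVNGDGNVNSTDLTMLKRYLLKSV"  # SEQ ID NO. 94 (Cthe_0269, 1st)
variant = substitute_asp(native)
print(native, lacks_predicted_site(native))
print(variant, lacks_predicted_site(variant))
```

Replacing N with D cannot create a new sequon, because D is neither the asparagine that starts a sequon nor the serine or threonine that ends one.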


In a dockerin-specific sequence satisfying condition (a) above, the position corresponding to a predicted N-type sugar chain modification site may intrinsically be an aspartic acid.


The protein disclosed in this Description may also have cellulolysis promotion activity, and this cellulolysis promotion activity may be cellulase activity. The cellulolysis promotion activity may also be conferred by an amino acid sequence from Clostridium thermocellum.


The disclosures of this Description also provide a eukaryotic microorganism having, on its cell surface, a protein complex built on a scaffolding protein from Clostridium thermocellum, wherein the eukaryotic microorganism comprises the scaffolding protein from Clostridium thermocellum and the protein disclosed in this Description, which binds to this scaffolding protein.


The disclosures of this Description also provide a method for producing a useful substance, comprising a step of fermenting a cellulose-containing material as a carbon source with the eukaryotic microorganism disclosed in this Description, in which the aforementioned dockerin-containing protein has cellulolysis promotion activity, so that the cellulose-containing material is saccharified.





BRIEF DESCRIPTION OF DRAWINGS


FIG. 1 shows a pAI-AGA1 vector prepared in Example 1.



FIG. 2 shows a pDL-CtCBDCohAGA2 vector having a Leu2 marker and ADH3 homologous region prepared in Example 2.



FIG. 3 shows amino acid sequences in which alanine or aspartic acid is substituted for the asparagines at positions 18 and 50 of the amino acid sequence encoded by a Cel48S dockerin gene, together with the corresponding nucleotide sequences.



FIG. 4 shows a pXU-Cel48Sdoc vector, pXU-Cel48S-N-A-doc vector and pXU-Cel48S-N-D-doc vector prepared in Example 3.



FIG. 5 shows the displayed amount of dockerin in a protein complex surface-displaying yeast containing an amino acid-substituted Cel48S dockerin.



FIG. 6 shows an amino acid sequence in which aspartic acid is substituted for the asparagines at positions 18 and 54 of the amino acid sequence encoded by a Xyn10C dockerin gene, together with the corresponding nucleotide sequence.



FIG. 7 shows a pXU-Xyn10Cdoc vector and pXU-Xyn10C-N-D-doc vector prepared in Example 5.



FIG. 8 shows the displayed amount of dockerin in a protein complex surface-displaying yeast containing an amino acid-substituted Xyn10C dockerin.



FIG. 9 shows the pXU-Cel8A-Cel48Sdoc and pXU-Cel8A-Cel48S-N-Ddoc vectors prepared in Example 7.



FIG. 10 shows the displayed amount of cellulase in a protein complex surface-displaying yeast containing amino acid-substituted dockerin-type cellulase.



FIG. 11 shows CMC decomposition activity of a protein complex surface-displaying yeast containing amino acid-substituted dockerin-type cellulase.





DETAILED DESCRIPTION OF INVENTION

The disclosures of this Description relate to a protein for constructing a protein complex using a scaffolding protein having type I cohesin from C. thermocellum, to a eukaryotic microorganism provided with a protein complex comprising this protein, and to a method for producing a useful substance using this eukaryotic microorganism.


The protein disclosed in this Description has at least one dockerin-specific sequence, that is, a sequence associated with cohesin binding ability in a type I dockerin from C. thermocellum, which either has no intrinsic predicted N-type sugar chain modification site or has aspartic acid substituted for the asparagine at an intrinsic predicted N-type sugar chain modification site. Sugar chain modification at such sites is therefore avoided even when the protein disclosed in this Description is produced in a yeast or other eukaryotic microorganism in which sugar chain modification is expected to occur. As a result, the protein disclosed in this Description has excellent binding ability with type I cohesins of scaffolding proteins from C. thermocellum and can be used to construct a protein complex in which this protein is bound densely and/or in large amounts.


The eukaryotic microorganism disclosed in this Description may carry on its cell surface a protein complex in which the protein disclosed in this Description is accumulated densely and/or in large amounts. This yields a eukaryotic microorganism in which the function of the protein of the invention is enhanced. Because the protein of the invention has excellent cohesin binding ability even when produced in a eukaryotic microorganism, both this protein and the aforementioned scaffolding protein may be produced by the eukaryotic microorganism disclosed in this Description. This eukaryotic microorganism may be a yeast.


The method of producing a useful substance disclosed in this Description comprises a step of fermenting a cellulose-containing material as a carbon source using the eukaryotic microorganism disclosed in this Description, in which the aforementioned protein is a protein having cellulolysis promotion activity. Because the eukaryotic microorganism disclosed in this Description has enhanced cellulolysis promotion activity, it can efficiently ferment a cellulose-containing material as a carbon source.


(Protein for Constructing Protein Complex Using Scaffolding Protein Having Type I Cohesin from C. thermocellum)


The protein disclosed in this Description is a protein especially suited to constructing a protein complex using a scaffolding protein having a type I cohesin from C. thermocellum. This protein may have a dockerin comprising at least one dockerin-specific sequence that is associated with cohesin binding in a type I dockerin from C. thermocellum and that fulfills either condition (a), having no intrinsic predicted N-type sugar chain modification site, or condition (b), having aspartic acid substituted for an asparagine of an intrinsic predicted N-type sugar chain modification site.



C. thermocellum is known as a cellulosome-producing microorganism. It also has cellulase activity and produces proteins containing type I dockerins. Based on a search of the C. thermocellum genome in DDBJ (http://www.ddbj.nig.ac.jp/index-j.html), the 72 amino acid sequences shown by SEQ ID NOS. 1 to 72 in Table 1 below can be given as examples of type I dockerin amino acid sequences from C. thermocellum. The locus (sequence) names shown in Table 1 are the names of the respective dockerins. Accordingly, the amino acid sequences identified by dockerin names in the locus column of Tables 2 to 21 derive from the amino acid sequences of the dockerins having the same names in Table 1.











TABLE 1

SEQ ID NO.  Locus      Amino Acid Sequence
 1          Cthe_0015  DVNADGKDSTDLTLLKRYLLRSATLEEKLNADTDGNGTVNSTDLNYLKKYLRVI
 2          Cthe_0032  DLNNDGNNSTDYMLKKYLKVLER[?]NVPEKAADLNGDGS[?]NSTDLTLKRF[?]MKAI
 3          Cthe_0043  DLNGDGNNSTDFTMLKRA[?]LGNPAPGTNLAAGDLNRDGNTNSTDLMLRRYLLKLI
 4          Cthe_0044  DNLDGK[?]NSTDLSALKRH[?]LR[?]TTLSGKQLENADVNNDGSVNSTDAS[?]LKKYAKAI
 5          Cthe_0109  DFNSDSSVNSTDLM[?]LNRAVLGLG
 6          Cthe_0190  ELNGDGKNSSDLNMMKRYLLRL[?]DGLNDTACADLNGDGKNSSDYSLKRYLLRMI
 7          Cthe_0191  DLNGDAK[?]NSTDLNMMKRYLLQM[?]DRFGVDDESCADLNGDGK[?]SSDYNLLKRY[?]LHLI
 8          Cthe_0211  DVNGDGHVNSSDYSLFKRYLLRV[?]DRFPVGDQSVADVNRDGRDSTDLTMLKRYLRAI
 9          Cthe_0239  GDYNGDGAVNSTDLLACKRYLLYALKPEQNVIAGDLDGNGKNSTDYAYLKRYLLKQI
10          Cthe_0246  DLNADGKNSTDYNLGKRLLRT[?]ELP[?]SNGSVAFDLNGDSKVDSTDLTALKRYLLGVI
11          Cthe_0258  DVNGDSKNADVLLMKKYLKVNDLPSDGVKAADVNADGQNSDFTWLKKYMLKAV
12          Cthe_0269  DVNGDGNVNSTDLTMLKRYLLKSVTN[?]NREAADVNRDGANSSDMTLKRYL[?]KSI
13          Cthe_0270  DLNGDGKVNSSDLALKRYMLRA[?]DFPPEGRKLADLNRDGNVNSTDYS[?]LKRY[?]LKAI
14          Cthe_0274  CDVGDLNVDGS[?]NSVD[?]YMKRYLLRS[?]VLPYQENER[?]RPAADTNGDGA[?]NSSDMVLLKRYVLRSI
15          Cthe_0405  DVNGDGNVNSTDVVWLRRFLLKLVEDFPVPSGKQAADMNDDGN[?]NSTDM[?]ALKRKVLK[?]P
16          Cthe_0412  DCNGDGKVNSTDAVALKRY[?]LRSG[?]NTDNADVNADGRVNSTDLALKRYLKEI
17          Cthe_0413  DCNDDGKVNSTDVAVMKRYLKKENVNNLDNADVNADGKVNSTDFSLKRYYMKNI
18          Cthe_0433  DLNGDGRVNSSDLALMKRYVVKQEKLNVPVKAADLNGDDKVNSTDYSVLKRYLLRSI
19          Cthe_0435  DVNADGVVN[?]SDYVLMKRYLRIADFPADDDMWVGDVNGDNV[?]NDDCNYLKRYLLHMI
20          Cthe_0438  DLNGDNNNSSDYTLLKRYLLHTI
21          Cthe_0536  DVNGDGRVNSSDVALLKRYLLGLVENNKEAADVNVSGTVNSTDLAMKRYVLRSI
22          Cthe_0543  DVNFDGRNSTDYSRLKRYVKSLEFTDPEEHQKFIAAADVDGNGRNSTDLYVLNRY[?]LKLI
23          Cthe_0578  DNLDGK[?]NSSDVTLLKRY[?]KSDVFPTADPERSL[?]ASDVNGDGRVNSTDYSYLKRYVLKII
24          Cthe_0640  DLNGDNNVNSTDLTLLKRYLTRVNDFPHPDGSVNADVNGDGK[?]NSTDYSAMRY[?]LRII
25          Cthe_0661  DVNGDLKVNSTDFSMLRRYLLKTDNFPTENGKQAADLNGDGRNSSDLTMLKRYLLMEV
26          Cthe_0624  DLNNDSKVNAVDMMLKRY[?]LGID[?]NLTADIYFDGVVNSSDYNMKRYLLKAI
27          Cthe_0625  DLNGDGVVNSTDSV[?]LKRHIKFSE[?]DPVKLKAADLNGDGN[?]NSSDVSLMKRYLLRII
28          Cthe_0660  DLNGDGKNSTDISLMKRYLLKQIVDLPVEDDKAADNKDGKVNSTDMS[?]LKRV[?]LRNY
29          Cthe_0729  DSNSDCKVNSTDLTLMKRYLLQQS[?]SYNL[?]NADLNGDGK[?]NSSDYTLLKRYLLGYI
30          Cthe_0745  DNNDKTVNSTDVTYLKRFLLKQ[?]NSLPNQKAADVNLDGN[?]NSTDLVLKRYVLRGI
31          Cthe_0797  DVNGDGKNSTDCTMLKRYLRGEEFPSPSGI[?]AADVNADLKNSTDLVLMKKYLLRSI
32          Cthe_0798  DVNLDGQVNSTDFSLLKRY[?]LKVVDNS[?]NVTNADMNNDGNNSTDIS[?]LKRLLRN
33          Cthe_0821  DNRDGK[?]STDLGMLNRHLKLV[?]LDDNLKLAAAD[?]DGNGNNSTDYSWLKKYLKVI
34          Cthe_0825  DVNDDGKVNSTDLTLLKRYVLKAVSTLPSSKAEKNADVNRDGRVNSSDVT[?]LSRYL[?]RVI
35          Cthe_0912  DVNGDGT[?]NSTDLTMLKRSVLRAI[?]LTDDAKARADVDKNGSNSTDVLLLSRYLLRVI
36          Cthe_0918  DLNRNG[?]NDEDY[?]LLKNYLLRGNKLV[?]DLNVADVNKDGKVNSTDCLFLKKYLGLI
37          Cthe_1271  DTNSDGKNSTDVTALKRHLLRVTQLTGDNLANADVNGDGNVNSTDLLLLKRY[?]LGEI
38          Cthe_1398  DLNGDNRNSTDLTLMKRYLKS[?]EDLPVEDDLWAAD[?]NGDGKNSTDYTYLKKYLLQAI
39          Cthe_1400  DLNGDGRVNSTDYTLLKRYLLGAQTFPYERG[?]KAADLNLDGRNSTDYTVLKRYLLNAI
40          Cthe_1472  DLNFDNVAVNSTDLLMLKRYLKSLELGTSEQEEKFKKAADLNRDNKVDSTDLTLKRYLLKAI
41          Cthe_1806  EVIDTKVDSTDD[?]KYEYQFDKK[?]LCADKETE[?]LYFTVVADEEE[?]TSDNTRTLVLSVNNDSTDKTTVSGY[?]V
42          Cthe_1838  DVNGDGRVNSSDLTLMKRYLLKS[?]DFPTPEGK[?]AADLNEDGKVNSTDLLALKKLVLREL
43          Cthe_1890  DLNADGSNSTDLM[?]KRVLLKQRTLDD[?]PADLNGDGKVTSTDYSLMKRYLLKEI
44          Cthe_1963  DLNGDGNNSSDLQALKRHLLG[?]PLTGEALLRADVNRSGKVDSTDYSVLKRYLRII
45          Cthe_2038  D[?]LDGNNSLDMMKLKKYLRETQFNYDELLRADVNSDGEVNSTDYAYLKRY[?]LRII
46          Cthe_2089  DVNDDGKVNSTDAVALKRYVLRSG[?]NTDNADLNEDGRVNSTDLG[?]LKRYLKEI
47          Cthe_2137  DVDGNGTVNSTDVNYMKRYLLRQEEFPYEKALMAGDVDGNGN[?]NSTDLSYLKKY[?]LKLI
48          Cthe_2139  DVNAGV[?]NSSDMVLKRFLLRT[?]LTEEMLLNADTNGDGAVNSSDFTLLKRYLRSI
49          Cthe_2147  DVNGDFAVNSNDLTL[?]KRYVLKNDEFPSSHGLKAADVDGDEK[?]SSDAALVKRYVLRAI
50          Cthe_2179  DLNGDGNVNSTDSILMKRYLMKSVDLNEEQLKAADVNLDGRVNSTDRSILNRYLLKII
51          Cthe_2193  DNDDGNNSTDLQMLKRHLLRSIRLTEKQLLNADTNRDGRVDSTDLALLKRYLRVI
52          Cthe_2194  DLNGDGNNSTDLQLKKHLLR[?]LLTGKELSNADVTKDGKVDSTDLTLLKRYLRFV
53          Cthe_2195  DLNDDGKVNSTDFQLKKHLLR[?]LLTGKNLSNADLNKDGKVDSSDLSLMKRYLLQII
54          Cthe_2196  DLNNDGKVNSTDFQLLKMHVLRQELPAGTDLSNADVNRDGKVDSSDCTLLKRYLRVI
55          Cthe_2197  DLNGDGKVNSTDLQLMKMHVLRQRQLTGTSLLNADVNRDGKVDSTDVALLKRYLRQI
56          Cthe_2271  DVNLDGSVDSDLALLYNTTYYAVPLPNRLQYIAADVNYDSSCTMLDFYMLEDYLLGR[?]SFPAGQTYTVYYGDLNGDQLVTTD
57          Cthe_2360  DLNGDGRVNSTDLLLMKKRIREDKFNVPDENADLNLDGK[?]NSSDYTLKRYVLKSI
58          Cthe_2549  DVNKDGRNSTD[?]MYLKGYLLRNSAFNLDEYGLMAADVDGNGSVSSLDLTYLKRY[?]LRRI
59          Cthe_2590  DLNQDGQVSSTDLVAMKRYLLKNFELSGVGLEAADLNSDGKVNSTDLVALKRFLLKEI
60          Cthe_2760  DLNYDGKVNSTDYLVLKRYLLGT[?]DKESDPNFLKAADLNRDGRVNSTDMSLMKRYLLGII
61          Cthe_2761  DVNGDGKVNSTDCS[?]KRYLLKNEDFPYEYGKEAGDVNGDGKVNSTDYSLLKRFVLRNI
62          Cthe_2811  DLNGDGKVNSTDLT[?]MKRY[?]LKNFDKLAVPEEAADLNGDGR[?]NSTDLS[?]LHRYLLRII
63          Cthe_2812  DLNGDQKVTSTDYTMLKRYLMKS[?]DRFNTSEQAADLNRDGKNSTDLT[?]LKR
64          Cthe_2872  DNSDGNVNSTDLGLKR[?]VKNPPASANMDAADVNADGKVNSTDYTVLKRYLLRSI
65          Cthe_2879  DNSDGS[?]STDVTLLKRHLLRENLTGTAYSNADTDGDGK[?]SIDLSYLKRYVLRLI
66          Cthe_2949  DLNGDGLVNSSDYSLLKRYLKQDLTEEKLKAADLNRNGSVDSVDYS[?]LKRFLLKTI
67          Cthe_2950  DLNNDGRTNSTDYSLMKRYLLGS[?]FTNEQLKAADVNLDGKVNSSDYTVLRRFLLGSI
68          Cthe_2972  VLGDLNGDKQVNSTDYTALKRHLLN[?]RLSGTALANADLNGDGKVDSTDLMLHRYLLGII
69          Cthe_3012  DLNGDGNVNSTDSTLMSRYLLG[?]LPAGEKAADLNGDGKVNSTDYN[?]LKRYLLKYI
70          Cthe_3132  DLNGDGRVNSTDLAVMKRYLLKQVQISDRPADLNGDGKANSTDYQLLKRY[?]LKTI
71          Cthe_3136  DDGNGE[?]SDYALKSHLNSNLTFKQLAAADVDGNGYVNSDLALQMYLLGKGGTSDI

[?] indicates data missing or illegible when filed.







Each of these 72 type I dockerins contains similar amino acid sequences, namely dockerin-specific sequences associated with cohesin binding. A dockerin-specific sequence may consist of a naturally occurring consensus sequence (relevant sequence) of 24 amino acids. The 142 amino acid sequences shown by SEQ ID NOS. 73 to 214 in Table 2 below are examples of relevant sequences intrinsic to the 72 type I dockerins. These amino acid sequences can be obtained from databases such as UniProt (http://www.uniprot.org/), InterPro (http://www.ebi.ac.uk/interpro/) and Pfam (http://pfam.sanger.ac.uk). In Table 2, N-terminal relevant sequences are labeled "1st" and C-terminal relevant sequences are labeled "2nd".
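To illustrate what a consensus over aligned 24-residue relevant sequences looks like, the sketch below computes a simple majority-rule consensus from three gap-free entries of Table 2 (SEQ ID NOS. 94, 107 and 137). This is an illustrative assumption only: the databases cited above derive their dockerin sequence models with their own methods, which this sketch does not reproduce, and the function name `majority_consensus` is hypothetical. Positions without a strict majority are written as X, the usual code for an unspecified residue.

```python
from collections import Counter


def majority_consensus(aligned):
    """Majority-rule consensus of equal-length, gap-free aligned sequences.

    A residue is kept only if it occurs in more than half of the sequences
    at that position; otherwise the position is reported as 'X'.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for column in zip(*aligned):
        residue, count = Counter(column).most_common(1)[0]
        consensus.append(residue if count > len(aligned) / 2 else "X")
    return "".join(consensus)


examples = [
    "DVNGDGNVNSTDLTMLKRYLLKSV",  # SEQ ID NO. 94 (Cthe_0269, 1st)
    "DLNGDDKVNSTDYSVLKRYLLRSI",  # SEQ ID NO. 107 (Cthe_0433, 2nd)
    "DVNDDGKVNSTDLTLLKRYVLKAV",  # SEQ ID NO. 137 (Cthe_0825, 1st)
]
print(majority_consensus(examples))
```

For these three sequences the NSTD segment (positions 9 to 12) and the KRY segment (positions 17 to 19) are identical in all members, while position 15 (M, V, L) has no majority.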












TABLE 2

([?] indicates data missing or illegible when filed.)

SEQ ID NO.  Locus      Position  Relevant Sequence
 73         Cthe_0015  1st       DVNADGK[?]DSTDLTLLKRYLLRSA
 74                    2nd       DTDGNGTVNSTDLNYLKKY[?]LRVI
 75         Cthe_0032  1st       DLNNDGN[?]NSTDYM[?]LKKYLKVL
 76                    2nd       DLNGDGSNSTDLT[?]LKRF[?]MKAI
 77         Cthe_0043  1st       DLNGDGN[?]NSTDFTMLKRA[?]LGNP
 78                    2nd       DLNRDGNTNSTDLM[?]LRRYLLKLI
 79         Cthe_0044  1st       D[?]NLDGKNSTDLSALKRHLR[?]
 80                    2nd       DVNNDGSVNSTDAS[?]LKKY[?]AKAI
 81         Cthe_0109  1st       DFNSDSSVNSTDLM[?]LNRAVLGLG
                       2nd       -
 82         Cthe_0190  1st       ELNGDGK[?]NSSDLNMMKRYLLRLI
 83                    2nd       DLNGDGK[?]NSSDYS[?]LKRYLLRMI
 84         Cthe_0191  1st       DLNGDAKNSTDLNMMKRYLLQMI
 85                    2nd       DLNGDGK[?]SSDYNLLKRY[?]LHLI
 86         Cthe_0211  1st       DVNGDGHVNSSDYSLFKRYLLRVI
 87                    2nd       DVNRDGRDSTDLTMLKRYL[?]RAI
 88         Cthe_0239  1st       DYNGDGAVNSTDLLACKRYLLYAL
 89                    2nd       DLDGNGK[?]NSTDYAYLKRYLLKQI
 90         Cthe_0246  1st       DLNADGK[?]NSTDYNLGKRL[?]LRTI
 91                    2nd       DLNGDSKVDSTDLTALKRYLLGVI
 92         Cthe_0258  1st       DVNGDSKNA[?]DVLLMKKY[?]LKVI
 93                    2nd       DVNADGQ[?]NS[?]DFTWLKKYMLKAV
 94         Cthe_0269  1st       DVNGDGNVNSTDLTMLKRYLLKSV
 95                    2nd       DVNRDGANSSDMTLKRYLKSI
 96         Cthe_0270  1st       DLNGDKVNSSDLA[?]LKRYMLRAI
 97                    2nd       DLNRDGNVNSTDYS[?]LKRY[?]LKAI
 98         Cthe_0274  1st       DLNVDGS[?]NSVD[?]YMKRYLLRSI
 99                    2nd       DTNGDGANSSDMVLLKRYVLRSI
100         Cthe_0405  1st       DVNGDGNVNSTDVVWLRRFLLKLV
101                    2nd       DMNDDGN[?]NSTDM[?]ALKRKVLKP
102         Cthe_0412  1st       DCNGDGKVNSTDAVALKRY[?]LRSG
103                    2nd       DVNADGRVNSTDLA[?]LKRY[?]LKEI
104         Cthe_0413  1st       DCNDDGKVNSTDVAVMKRYLKKEN
105                    2nd       DVNADGKVNSTDFS[?]LKRYVMKNI
106         Cthe_0433  1st       DLNGDGRVNSSDLALMKRYVVKQI
107                    2nd       DLNGDDKVNSTDYSVLKRYLLRSI
108         Cthe_0435  1st       DVNADGVVN[?]DVVLMKRY[?]LRII
109                    2nd       DVNGDNGND[?]DCNYLKRYLLHMI
110         Cthe_0438  1st       DLNGDNN[?]NSSDYTLLKRYLLHTI
                       2nd       -
111         Cthe_0536  1st       DVNGDGRVNSSDVALLKRYLLGLV
112                    2nd       DVNVSGTVNSTDLA[?]KRYVLRSI
113         Cthe_0543  1st       DVNFDGR[?]NSTDYSRLKRYV[?]KSL
114                    2nd       DVDGNGR[?]NSTDLYVLNRY[?]LKLI
115         Cthe_0578  1st       D[?]NLDGK[?]NSSDVTLLKRY[?]KSI
116                    2nd       DVNGDGRVNSTDYSYLKRYVLKII
117         Cthe_0640  1st       DLNGDNNVNSTDLTLLKRYLTRVI
118                    2nd       DVNGDGK[?]NSTDYSAMRY[?]LRII
119         Cthe_0661  1st       DVNGDLKVNSTDFSMLRRYLLKTI
120                    2nd       DLNGDGR[?]NSSDLTMLKRYLLMEV
121         Cthe_0624  1st       DLNNDSKVNAVD[?]MMLKRY[?]LGII
122                    2nd       DIYFDGVVNSSDYN[?]MKRYLLKAI
123         Cthe_0625  1st       DLNGDGVVNSTDSV[?]LKRHIKFS
124                    2nd       DLNGDGN[?]NSSDVSLMKRYLLRII
125         Cthe_0660  1st       DLNGDGK[?]NSTD[?]SLMKRYLLKQI
126                    2nd       D[?]NKDGKVNSTDMS[?]LKRV[?]LRNY
127         Cthe_0729  1st       DSNSDCKVNSTDLTLMKRYLLQQS
128                    2nd       DLNGDGKNSSDYTLLKRYLLGYI
129         Cthe_0745  1st       D[?]NNDKTVNSTDVTYLKRFLLKQI
130                    2nd       DVNLDGN[?]NSTDLV[?]LKRYVLRGI
131         Cthe_0797  1st       DVNGDGK[?]NSTDCTMLKRY[?]LRGI
132                    2nd       DVNADLK[?]NSTDLVLMKKYLLRSI
133         Cthe_0798  1st       DVNLDGQVNSTDFSLLKRYLKVV
134                    2nd       DMNNDGN[?]NSTD[?]LKRLLRN
135         Cthe_0821  1st       D[?]NRDGK[?]NSTDLGMLNRH[?]LKLV
136                    2nd       D[?]DGNGN[?]NSTDYSWLKKY[?]LKVI
137         Cthe_0825  1st       DVNDDGKVNSTDLTLLKRYVLKAV
138                    2nd       DVNRDGRVNSSDVT[?]LSRYL[?]RVI
139         Cthe_0912  1st       DVNGDGT[?]NSTDLTMLKRSVLRAI
140                    2nd       DVDKNGS[?]NSTDVLLLSRYLLRVI
141         Cthe_0918  1st       DLNRNG[?]NDEDY[?]LLKNYLLRGN
142                    2nd       DVNKDGKVNSTDCLFLKKY[?]LGLI
143         Cthe_1271  1st       DTNSDGK[?]NSTDVTALKRHLLRVT
144                    2nd       DVNGDGNVNSTDLLLLKRYLGEI
145         Cthe_1398  1st       DLNGDNR[?]NSTDLTLMKRY[?]LKSI
146                    2nd       D[?]NGDGK[?]NSTDYTYLKKYLLQAI
147         Cthe_1400  1st       DLNGDGRVNSTDYTLLKRYLLGAI
148                    2nd       DLNLDGR[?]NSTDYTVLKRYLLNAI
149         Cthe_1472  1st       DLNFDNAVNSTDLLMLKRYLKSL
150                    2nd       DLNRDNKVDSTDLT[?]LKRYLLKAI
151         Cthe_1806  1st       EVDTKVDSTDD[?]KYEYQFDKK
152                    2nd       TLVLSVNNDSTDKTTVSGY[?]VDF
153         Cthe_1838  1st       DVNGDGRVNSSDLTLMKRYLLKSI
154                    2nd       DLNEDGKVNSTDLLALKKLVLREL
155         Cthe_1890  1st       DLNADGS[?]NSTDLMMKRVLLKQR
156                    2nd       DLNGDGKVTSTDYSLMKRYLLKEI
157         Cthe_1963  1st       DLNGDGN[?]NSSDLQALKRHLLG[?]
158                    2nd       DVNRSGKVDSTDYSVLKRY[?]LRII
159         Cthe_2038  1st       D[?]LDGN[?]NSLDMMKLKKYL[?]RET
160                    2nd       DVNSDGEVNSTDYAYLKRY[?]LRII
161         Cthe_2089  1st       DVNDDGKVNSTDAVALKRYVLRSG
162                    2nd       DLNEDGRVNSTDLGLKRY[?]LKEI
163         Cthe_2137  1st       DVDGNGTVNSTDVNYMKRYLLRQI
164                    2nd       DVDGNGNNSTDLSYLKKYLKLI
165         Cthe_2139  1st       DVNADGV[?]NSSD[?]MVLKRFLLRTI
166                    2nd       DTNGDGAVNSSDFTLLKRV[?]LRSI
167         Cthe_2147  1st       DVNGDFAVNSNDLTLKRYVLKNI
168                    2nd       DVDGDEK[?]SSDAALVKRYVLRAI
169         Cthe_2179  1st       DLNGDGNVNSTDS[?]LMKRYLMKSV
170                    2nd       DVNLDGRVNSTDRS[?]LNRYLLKII
171         Cthe_2193  1st       D[?]NDDGN[?]NSTDLQMLKRHLLRSI
172                    2nd       DTNRDGRVDSTDLALLKRY[?]LRVI
173         Cthe_2194  1st       DLNGDGN[?]NSTDLQ[?]LKKHLLRIT
174                    2nd       DVTKDGKVDSTDLTLLKRY[?]LRFV





175
C the 2195
1st
DLNDDGKVNSTDFQtext missing or illegible when filed LKKHLLRItext missing or illegible when filed


176

2nd
DLNKDGKVDSSDLSLMKRYLLQII





177
C the 2196
1st
DLNNDGKVNSTDFQLLKMHVLRQE


178

2nd
DVNRDGKVDSSDCTLLKRYLRVI





179
C the 2197
1st
DLNGDGKVNSTDLQLMKMHVLRQR


180

2nd
DVNRDGKVDSTDVALLKRYtext missing or illegible when filed LRQI





181
C the 2271
1st
DVNLDGSVDStext missing or illegible when filed DLALLYNTTYYAV


182

2nd
DVNGDGTVDGtext missing or illegible when filed DLAtext missing or illegible when filed AYtext missing or illegible when filed NGQI





183
C the 2360
1st
DLNGDGRVNSTDLLLMKKRIREI


184

2nd
DLNLDGKtext missing or illegible when filed NSSDYTtext missing or illegible when filed LKRVVLKSI





185
C the 2549
1st
DVNKDGRtext missing or illegible when filed NSTDtext missing or illegible when filed MYLKGVLLRNS


186

2nd
DVDGNGSVSSLDLTYLKRVLRRI





187
C the 2590
1st
DLNQDGQVSSTDLVAMKRYLLKNF


188

2nd
DLNSDGKVNSTDLVALKRFLLKEI





189
C the 2760
1st
DLNYDGKVNSTDYLVLKRYLLGTI


190

2nd
DLNRDGRVNSTDMSLMKRYLLGII





191
C the 2761
1st
DVNGDGKVNSTDCStext missing or illegible when filed KRYLLKNI


192

2nd
DVNGDGKVNSTDYSLLKRFVLRNI





193
C the 2811
1st
DLNGDGKVNSTDLTtext missing or illegible when filed MKRYtext missing or illegible when filed LKNF


194

2nd
DLNGDGRtext missing or illegible when filed NSTDLStext missing or illegible when filed LHRYLLRII





195
C the 2812
1st
DLNGDQKVTSTDYTMLKRYLMKSI


196

2nd
DLNRDGKtext missing or illegible when filed NSTDLTtext missing or illegible when filed LKRYLLYSI





197
C the 2872
1st
DNSDGNVNSTDLGtext missing or illegible when filed LKRtext missing or illegible when filed KNP


198

2nd
DVNADGKVNSTDYTVLKRYLLRSI





199
C the 2879
1st
Dtext missing or illegible when filed NSDGStext missing or illegible when filed NSTDVTLLKRHLLREN


200

2nd
DTDGDGKtext missing or illegible when filed Stext missing or illegible when filed DLSVLKRVVLRLI





201
C the 2949
1st
DLNGDGLVNSSDYSLLKRYLKQI


202

2nd
DLNRNGSVDSVtext missing or illegible when filed DYSLKRFLLKTI





203
C the 2950
1st
DLNNDGRTNSTDYSLMKRYLLGSI


204

2nd
DVNLDGKVNSSDYTVLRRFLLGSI





205
C the 2972
1st
DLNGDKQVNSTDYTALKRHLLNII


206

2nd
DLNGDGKVDSTDLMtext missing or illegible when filed LHRYLLGII





207
C the 3012
1st
DLNGDGNVNSTDSTLMSRYLLGII


208

2nd
DLNGDGKVNSTDYNtext missing or illegible when filed LKRYLLKVI





209
C the 3132
1st
DLNGDGRVNSTDLAVMKRYLLKQV


210

2nd
DLNGDGKANSTDYQLLKRYtext missing or illegible when filed LKTI





211
C the 3136
1st
Dtext missing or illegible when filed DGNGEtext missing or illegible when filed Stext missing or illegible when filed DYAtext missing or illegible when filed LKSHLtext missing or illegible when filed NSN






text missing or illegible when filed indicates data missing or illegible when filed







A homology search of the dockerins shown in Table 1 showed that the "homology" among their amino acid sequences does not exceed 90%, whereas the relevant sequences shown in Table 2 share 90% or more "similarity". This suggests that the dockerins shown in Table 1 all have similar functions, and it is therefore presumed that the relevant sequences shown in Table 2 are responsible for these functions.
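The difference between counting identical residues and counting similar residues can be illustrated with a position-by-position comparison. The following is a minimal sketch under stated assumptions: the residue groups in `GROUPS` are a hypothetical set of conservative substitutions chosen for illustration, not the scoring scheme used to compile the tables in this description, and the function (a hypothetical helper) handles only ungapped, equal-length sequences, whereas database searches use gapped alignment.

```python
from typing import Tuple

# Hypothetical conservative-substitution groups (an assumption for this
# sketch; not the scoring scheme behind the tables in this description).
GROUPS = ["ILVM", "KRH", "DE", "NQ", "ST", "FYW", "AG", "C", "P"]
GROUP_OF = {aa: i for i, group in enumerate(GROUPS) for aa in group}


def identity_and_similarity(a: str, b: str) -> Tuple[float, float]:
    """Return (% identical, % similar) for two ungapped, equal-length sequences.

    A position counts as similar if its two residues are identical or
    belong to the same group in GROUPS.
    """
    if len(a) != len(b):
        raise ValueError("this sketch compares equal-length sequences only")
    identical = sum(x == y for x, y in zip(a, b))
    similar = sum(
        x == y or (x in GROUP_OF and GROUP_OF[x] == GROUP_OF.get(y))
        for x, y in zip(a, b)
    )
    return 100 * identical / len(a), 100 * similar / len(a)


# First relevant sequences of Cel8A and Cel9D, taken from Table 7.
ident, sim = identity_and_similarity(
    "DVNGDGNVNSTDLTMLKRYLLKSV", "DVNDDGKVNSTDLTLLKRYVLKAV"
)
print(f"identity {ident:.1f}%, similarity {sim:.1f}%")
```

Under these assumed groups the two sequences score 79.2% identity and 87.5% similarity, because the M/L and L/V mismatches count as similar. The exact percentages depend on the substitution scheme and alignment method actually used.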


In the dockerins shown in Table 1, that is, in the relevant sequences shown in Table 2, the predicted N-type sugar chain modification sites are the N positions in N-X-T or N-X-S (in which N is asparagine, X is any amino acid other than proline, T is threonine and S is serine). These are the consensus sequences that undergo N-type sugar chain modification in yeasts and other eukaryotic microorganisms (A. Herscovics et al., The FASEB Journal (6): 540-550 (1993)). An N-X-T/S motif in a dockerin or its relevant sequence can be found by appropriate use of one of the databases described above or the like. A dockerin or relevant sequence that does not contain one of these consensus sequences may still have a site corresponding to a predicted N-type sugar chain modification site, that is, a position corresponding to N. Such a site can be identified by multiple alignment of the amino acid sequence in question with the amino acid sequence of a known dockerin or its relevant sequence. When a relevant sequence consists of about 24 or fewer amino acids, the predicted N-type sugar chain modification site of the dockerin, or the site corresponding to it, is typically the 9th amino acid from the N-terminus.
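The consensus rule can be checked mechanically. The following minimal sketch is a plain regular-expression scan, not a substitute for the database tools mentioned above, and `predicted_n_glyc_sites` is a hypothetical helper name. It returns the 1-based position of N in every N-X-S/T motif; it does not perform the alignment-based search for corresponding sites in sequences that lack the motif.

```python
import re
from typing import List

# N-X-S/T consensus: asparagine, any residue except proline, then serine or
# threonine. The zero-width lookahead also reports overlapping motifs.
SEQUON = re.compile(r"(?=N[^P][ST])")


def predicted_n_glyc_sites(seq: str) -> List[int]:
    """Return the 1-based positions of N in every N-X-S/T motif of seq."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]


print(predicted_n_glyc_sites("DVNGDGNVNSTDLTMLKRYLLKSV"))  # Cel8A, 1st
print(predicted_n_glyc_sites("DLNQDGQVSSTDLVAMKRYLLKNF"))  # C the 2590, 1st
```

For the 24-residue first relevant sequence of Cel8A the only hit is position 9, in line with the typical 9th-position site described above, while the first relevant sequence of C the 2590 returns no hit.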


The protein of the invention preferably has a dockerin comprising at least one dockerin-specific sequence that has no predicted N-type sugar chain modification site. It is also preferable for the dockerin to comprise at least one dockerin-specific sequence in which the amino acid at the site corresponding to a predicted N-type sugar chain modification site is aspartic acid (D). N-type sugar chain modification by yeasts and other eukaryotic microorganisms is thought not to occur when there is no N-type sugar chain modification site, or when the site corresponding to a predicted sugar chain modification site is occupied by aspartic acid. In a dockerin-specific sequence of the latter kind, the aspartic acid may be intrinsic to the original dockerin, or it may have been introduced by substituting aspartic acid (D) for asparagine (N) at an N-type sugar chain modification site.
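The N→D substitution can be sketched at the level of the amino acid sequence string. This is a minimal illustration only: designing the corresponding DNA mutation is outside its scope, and `sites` and `substitute_n_to_d` are hypothetical helper names. By default every predicted N-X-S/T site is substituted, and the result is re-scanned to confirm that no motif remains.

```python
import re
from typing import Iterable, List, Optional

SEQUON = re.compile(r"(?=N[^P][ST])")  # N-X-S/T, where X is not proline


def sites(seq: str) -> List[int]:
    """1-based positions of N in each N-X-S/T motif."""
    return [m.start() + 1 for m in SEQUON.finditer(seq)]


def substitute_n_to_d(seq: str, positions: Optional[Iterable[int]] = None) -> str:
    """Replace N with D at the given 1-based positions.

    By default every predicted N-X-S/T site of seq is substituted.
    """
    if positions is None:
        positions = sites(seq)
    residues = list(seq)
    for pos in positions:
        if not 1 <= pos <= len(residues) or residues[pos - 1] != "N":
            raise ValueError(f"position {pos} does not hold N")
        residues[pos - 1] = "D"
    return "".join(residues)


mutant = substitute_n_to_d("DVNGDGNVNSTDLTMLKRYLLKSV")  # Cel8A, 1st
print(mutant, sites(mutant))
```

Re-scanning matters when motifs overlap: substituting only position 2 of a hypothetical fragment NNST gives NDST, in which position 1 still forms an N-X-S motif, whereas substituting every initially predicted site leaves no motif.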


One embodiment of this dockerin-specific sequence is a sequence obtained by substituting aspartic acid for asparagine at an intrinsic predicted N-type sugar chain modification site in one of the dockerins disclosed in Table 1, or in one of their relevant sequences disclosed in Table 2. It is sufficient for the protein of the invention to have a dockerin containing at least one such dockerin-specific sequence. The 113 relevant sequences listed below (Table 3) contain candidate sites for N→D substitution. Preferred dockerin-specific sequences are therefore those in which D is substituted for the N of an N-X-T/S motif in one of these relevant sequences.
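Table 3 can be viewed as the output of screening relevant sequences for N-X-S/T motifs. A minimal sketch of such a screen follows. The three input entries are taken from the tables in this section, with `?` standing in (as an assumption) for each residue that is missing or illegible; the function and variable names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

SEQUON = re.compile(r"(?=N[^P][ST])")  # N-X-S/T, where X is not proline
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")

Key = Tuple[str, str]  # (locus, "1st" or "2nd"); labels are illustrative


def screen(relevant: Dict[Key, str]) -> Tuple[Dict[Key, List[int]], List[Key]]:
    """Split relevant sequences into N->D candidates and unreadable entries.

    Sequences containing non-standard characters are set aside for manual
    review, because an unknown residue could create or hide an N-X-S/T motif.
    """
    candidates: Dict[Key, List[int]] = {}
    unreadable: List[Key] = []
    for key, seq in relevant.items():
        if not set(seq) <= STANDARD:
            unreadable.append(key)
            continue
        found = [m.start() + 1 for m in SEQUON.finditer(seq)]
        if found:
            candidates[key] = found
    return candidates, unreadable


relevant = {
    ("C the 0211", "1st"): "DVNGDGHVNSSDYSLFKRYLLRVI",
    ("C the 2590", "1st"): "DLNQDGQVSSTDLVAMKRYLLKNF",
    ("C the 0825", "2nd"): "DVNRDGRVNSSDVT?LSRYL?RVI",
}
print(screen(relevant))
```

Here C the 0211 (1st) is reported with a site at position 9, C the 2590 (1st) is dropped because it has no motif, and C the 0825 (2nd) is flagged for manual review.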













TABLE 3







locus

Amino Acid Sequence









C the 0015
2nd
DTDGNGTVNSTDLNYLKKYtext missing or illegible when filed LRVI







C the 0032
1st
DLNNDGNtext missing or illegible when filed NSTDYMtext missing or illegible when filed LKKYLKVL







C the 0032
2nd
DLNGDGStext missing or illegible when filed NSTDLTLKRFtext missing or illegible when filed MKAI







C the 0043
1st
DLNGDGNtext missing or illegible when filed NSTDFTMLKRAtext missing or illegible when filed LGNP







C the 0043
2nd
DLNRDGNTNSTDLMtext missing or illegible when filed LRRYLLKLI







C the 0044
1st
Dtext missing or illegible when filed NLDGKNSTDLSALKRHtext missing or illegible when filed LRIT







C the 0044
2nd
DVNNDGSVNSTDAStext missing or illegible when filed LKKYIAKAI







C the 0109
1st
DFNSDSSVNSTDLMtext missing or illegible when filed LNRAVLGLG







C the 0190
1st
ELNGDGKtext missing or illegible when filed NSSDLNMMKRYLLRLI







C the 0190
2nd
DLNGDGKtext missing or illegible when filed NSSDYStext missing or illegible when filed LKRYLLRMI







C the 0191
1st
DLNGDAKtext missing or illegible when filed NSTDLNMMKRYLLQMI







C the 0211
1st
DVNGDGHVNSSDYSLFKRYLLRVI







C the 0239
1st
DYNGDGAVNSTDLLACKRYLLYAL







C the 0239
2nd 
DLDGNGKINSTDYAYLKRYLLKQI







C the 0246
1st
DLNADGKINSTDYNLGKRLILRTI







C the 0269
1st
DVNGDGNVNSTDLTMLKRYLLKSV







C the 0269
2nd 
DVNRDGAINSSDMTILKRYLIKSI







C the 0270
1st
DLNGDGKVNSSDLAILKRYMLRAI







C the 0270
2nd
DLNRDGNVNSTDYSILKRYILKAI







C the 0274
2nd
DTNGDGAINSSDMVLLKRYVLRSI







C the 0405
1st
DVNGDGNVNSTDVVWLRRFLLKLV







C the 0405
2nd
DMNDDGNINSTDMIALKRKVLKP







C the 0412
1st
DCNGDGKVNSTDAVALKRYtext missing or illegible when filed LRSG







C the 0412
2nd
DVNADGRVNSTDLAILKRYILKEI







C the 0413
1st
DCNDDGKVNSTDVAVMKRYLKKEN







C the 0413
2nd
DVNADGKVNSTDFSILKRYVMKNI







C the 0433
1st
DLNGDGRVNSSDLALMKRYVVKQI







C the 0433
2nd
DLNGDDKVNSTDYSVLKRYLLRSI







C the 0435
1st
DVNADGVVNISDYVLMKRYILRII







C the 0438
1st
DLNGDNNINSSDYTLLKRYLLHTI







C the 0536
1st
DVNGDGRVNSSDVALLKRYLLGLV







C the 0536
2nd
DVNVSGTVNSTDLAtext missing or illegible when filed MKRYVLRSI







C the 0543
1st
DVNFDGRINSTDYSRLKRYVtext missing or illegible when filed KSL







C the 0543
2nd
DVDGNGRNSTDLYVLNRYtext missing or illegible when filed LKLI







C the 0578
1st
DINLDGKNSSDVTLLKRYIVKSI







C the 0578
2nd
DVNGDGRVNSTDYSYLKRYVLKII







C the 0624
1st
DLNNDSKVNAVDtext missing or illegible when filed MMLKRYtext missing or illegible when filed LGII







C the 0624
2nd
DIYFDGVVNSSDYNtext missing or illegible when filed MKRYLLKAI







C the 0625
1st
DLNGDGVVNSTDSVtext missing or illegible when filed LKRHtext missing or illegible when filed KFS







C the 0625
2nd
DLNGDGNtext missing or illegible when filed NSSDVSLMKRYLLRII







C the 0640
1st
DLNGDNNVNSTDLTLLKRYLTRVI







C the 0640
2nd
DVNGDGKINSTDYSAMIRYILRII







C the 0660
1st
DLNGDGKINSTDISLMKRYLLKQI







C the 0660
2nd
DINKDGKVNSTDMStext missing or illegible when filed LKRVILRNY







C the 0661
1st
DVNGDLKVNSTDFSMLRRYLLKTI







C the 0661
2nd
DLNGDGRINSSDLTMLKRYLLMEV







C the 0729
1st
DSNSDCKVNSTDLTLMKRYLLQQS







C the 0729
2nd
DLNGDGKtext missing or illegible when filed NSSDYTLLKRYLLGYI







C the 0745
1st
Dtext missing or illegible when filed NNDKTVNSTDVTYLKRFLLKQI







C the 0745
2nd
DVNLDGNtext missing or illegible when filed NSTDLVILKRYVLRGI







C the 0797
1st
DVNGDGKINSTDCTMLKRYILRGI







C the 0797
2nd
DVNADLKNSTDLVLMKKYLLRSI







C the 0798
1st
DVNLDGQVNSTDFSLLKRYILKVV







C the 0798
2nd
DMNNDGNtext missing or illegible when filed NSTDISILKRtext missing or illegible when filed LLRN







C the 0821
1st
Dtext missing or illegible when filed NRDGKINSTDLGMLNRHILKLV







C the 0821
2nd
DIDGNGNtext missing or illegible when filed NSTDYSWLKKYILKVI







C the 0825
1st
DVNDDGKVNSTDLTLLKRYVLKAV







C the 0825
2nd
DVNRDGRVNSSDVTtext missing or illegible when filed LSRYLtext missing or illegible when filed RVI







C the 0912
1st
DVNGDGTNSTDLTMLKRSVLRAI







C the 0912
2nd
DVDKNGStext missing or illegible when filed NSTDVLLLSRYLLRVI







C the 0918
2nd
DVNKDGKVNSTDCLFLKKYLGLI







C the 1271
1st
DTNSDGKtext missing or illegible when filed NSTDVTALKRHLLRVT







C the 1271
2nd
DVNGDGNVNSTDLLLLKRYtext missing or illegible when filed LGEI







C the 1398
1st
DLNGDNRINSTDLTLMKRYILKSI







C the 1398
2nd
DINGDGKINSTDYTYLKKYLLQAI







C the 1400
1st
DLNGDGRVNSTDYTLLKRYLLGAI







C the 1400
2nd
DLNLDGRtext missing or illegible when filed NSTDYTVLKRYLLNAI







C the 1472
1st
DLNFDNAVNSTDLLMLKRYtext missing or illegible when filed LKSL







C the 1806
2nd
TLVLSVNNDSTDKTTVSGYISVDF







C the 1838
1st
DVNGDGRVNSSDLTLMKRYLLKSI







C the 1838
2nd
DLNEDGKVNSTDLLALKKLVLREL







C the 1890
1st
DLNADGSINSTDLMtext missing or illegible when filed MKRVLLKQR







C the 1963
1st
DLNGDGNINSSDLQALKRHLLGIS







C the 1963
2nd
DVNRSGKVDSTDYSVLKRYtext missing or illegible when filed LRII







C the 2038
2nd
DVNSDGEVNSTDYAYLKRYtext missing or illegible when filed LRII







C the 2089
1st
DVNDDGKVNSTDAVALKRYVLRSG







C the 2089
2nd
DLNEDGRVNSTDLGILKRYtext missing or illegible when filed LKEI







C the 2137
1st
DVDGNGTVNSTDVNYMKRYLLRQI







C the 2137
2nd
DVDGNGNINSTDLSYLKKYtext missing or illegible when filed LKLI







C the 2139
1st
DVNADGVtext missing or illegible when filed NSSDtext missing or illegible when filed MVLKRFLLRTI







C the 2139
2nd
DTNGDGAVNSSDFTLLKRYtext missing or illegible when filed LRSI







C the 2179
1st
DLNGDGNVNSTDSLMKRYLMKSV







C the 2179
2nd
DVNLDGRVNSTDRSLNRYLLKII







C the 2193
1st
DINDDGNtext missing or illegible when filed NSTDLQMLKRHLLRSI







C the 2194
1st
DLNGDGNINSTDLQtext missing or illegible when filed LKKHLLRIT







C the 2195
1st
DLNDDGKVNSTDFQLKKHLLRIT







C the 2196
1st
DLNNDGKVNSTDFQLLKMHVLRQE







C the 2197
1st
DLNGDGKVNSTDLQLMKMHVLRQR







C the 2360
1st
DLNGDGRVNSTDLLLMKKRIREI







C the 2360
2nd
DLNLDGKINSSDYTILKRYVLKSI







C the 2549
1st
DVNKDGRtext missing or illegible when filed NSTDtext missing or illegible when filed MYLKGYLLRNS







C the 2590
2nd
DLNSDGKVNSTDLVALKRFLLKEI







C the 2760
1st
DLNYDGKVNSTDYLVLKRYLLGTI







C the 2760
2nd
DLNRDGRVNSTDMSLMKRYLLGII







C the 2761
1st
DVNGDGKVNSTDCStext missing or illegible when filed KRYLLKNI







C the 2761
2nd
DVNGDGKVNSTDYSLLKRFVLRNI







C the 2811
1st
DLNGDGKVNSTDLTtext missing or illegible when filed MKRYILKNF







C the 2811
2nd 
DLNGDGRINSTDLSILHRYLLRII







C the 2812
2nd
DLNRDGKNSTDLTtext missing or illegible when filed LKRYLLYSI







C the 2872
1st
DINSDGNVNSTDLGILKRIIVKNP







C the 2872
2nd
DVNADGKVNSTDYTVLKRYLLRSI







C the 2879
1st
Dtext missing or illegible when filed NSDGStext missing or illegible when filed NSTDVTLLKRHLLREN







C the 2949
1st
DLNGDGLVNSSDYSLLKRYtext missing or illegible when filed LKQI







C the 2949
2nd
DLNRNGSVDSVDYSILKRFLLKTI







C the 2950
1st
DLNNDGRTNSTDYSLMKRYLLGSI







C the 2950
2nd
DVNLDGKVNSSDYTVLRRFLLGSI







C the 2972
1st
DLNGDKQVNSTDYTALKRHLLNIT







C the 3012
1st
DLNGDGNVNSTDSTLMSRYLLGII







C the 3012
2nd
DLNGDGKVNSTDYNILKRYLLKYI







C the 3132
1st
DLNGDGRVNSTDLAVMKRYLLKQV







C the 3132
2nd
DLNGDGKANSTDYQLLKRYLKTI







C the 3141
1st
DVNGDNSESTDCVWVKRYLLKQI







C the 3141
2nd
DVNGNGTtext missing or illegible when filed DSTDYQLLKRFILKVI








text missing or illegible when filed indicates data missing or illegible when filed







The protein of the invention may have a dockerin comprising one or two such dockerin-specific sequences, each typically obtained by substituting aspartic acid for asparagine at a predicted sugar chain modification site in a relevant sequence. The dockerins shown in the tables below are examples of such dockerins; in these tables, each dockerin is specified by its relevant sequences. A preferred dockerin can thus have a dockerin-specific sequence in which D is substituted for the N of an N-X-T/S motif in one or both relevant sequences of any of the dockerins in these tables.










TABLE 4





locus
Amino Acid Sequence



















C the 0032
1st
DLNNDGNINSTDYMILKKYLKVL
2nd
DLNGDGStext missing or illegible when filed NSTDLTLKRFMKAI





C the 0043
1st
DLNGDGNINSTDFTMLKRAILGNP
2nd
DLNRDGNTNSTDLMtext missing or illegible when filed LRRYLLKLI





C the 0044
1st
DINLDGKNSTDLSALKRHLRtext missing or illegible when filed
2nd
DVNNDGSVNSTDAStext missing or illegible when filed LKKYtext missing or illegible when filed AKAI





C the 0190
1st
ELNGDGKINSSDLNMMKRYLLRLI
2nd
DLNGDGKNSSDYSLKRYLLRMI





C the 0239
1st
DYNGDGAVNSTDLLACKRYLLYAL
2nd
DLDGNGKNSTDYAYLKRYLLKQI





C the 0269
1st
DVNGDGNVNSTDLTMLKRYLLKSV
2nd
DVNRDGAtext missing or illegible when filed NSSDMTtext missing or illegible when filed LKRYLKSI





C the 0270
1st
DLNGDGKVNSSDLAILKRYMLRAI
2nd
DLNRDGNVNSTDYStext missing or illegible when filed LKRYLKAI





C the 0405
1st
DVNGDGNVNSTDVVWLRRFLLKLV
2nd
DMNDDGNtext missing or illegible when filed NSTDMtext missing or illegible when filed ALKRKVLKtext missing or illegible when filed P





C the 0412
1st
DCNGDGKVNSTDAVALKRYtext missing or illegible when filed RSG
2nd
DVNADGRVNSTDLAtext missing or illegible when filed LKRYLKEI





C the 0413
1st
DCNDDGKVNSTDVAVMKRYLKKEN
2nd
DVNADGKVNSTDFStext missing or illegible when filed LKRYVMKNI 





C the 0433
1st
DLNGDGRVNSSDLALMKRYVVKQI
2nd
DLNGDDKVNSTDYSVLKRYLLRSI





C the 0536
1st
DVNGDGRVNSSDVALLKRYLLGLV
2nd
DVNVSGTVNSTDLAtext missing or illegible when filed MKRYVLRSI





C the 0543
1st
DVNFDGRINSTDYSRLKRYVIKSL
2nd
DVDGNGRNSTDLYVLNRYtext missing or illegible when filed LKLI





C the 0578
1st
DINLDGKNSSDVTLLKRYIVNKSI
2nd
DVNGDGRVNSTDYSYLKRYVLKII





C the 0624
1st
DLNNDSKVNAVDtext missing or illegible when filed MLKRYILGII
2nd
DIYFDGVVNSSDYNMKRYLLKAI





C the 0625
1st
DLNGDGVVNSTDSVLKRHtext missing or illegible when filed KFS
2nd
DLNGDGNtext missing or illegible when filed NSSDVSLMKRYLLRII





C the 0640
1st
DLNGDNNVNSTDLTLLKRYLTRVI
2nd
DVNGDGKtext missing or illegible when filed NSTDYSAMtext missing or illegible when filed RLRII





C the 0660
1st
DLNGDGKINSTDtext missing or illegible when filed LMKRYLLKQI
2nd
Dtext missing or illegible when filed NKDGKVNSTDMSLKRVLRNY





C the 0661
1st
DVNGDLKVNSTDFSMLRRYLLKTI
2nd
DLNGDGRtext missing or illegible when filed NSSDLTMLKRYLLMEV





C the 0729
1st
DSNSDCKVNSTDLTLMKRYLLQQS
2nd
DLNGDGKNSSDYTLLKRYLLGYI





C the 0745
1st
DINNDKTVNSTDVTYLKRFLLKQI
2nd
DVNLDGNNSTDLVtext missing or illegible when filed LKRYVLRGI





C the 0797
1st
DVNGDGKNSTDCTMLKRYtext missing or illegible when filed LRGI
2nd
DVNADLKINSTDLVLMKKYLLRSI





C the 0798
1st
DVNLDGQVNSTDFSLLKRYLKVV
2nd
DMNNDGNNSTDtext missing or illegible when filed Stext missing or illegible when filed LKRtext missing or illegible when filed LLRN





C the 0821
1st
Dtext missing or illegible when filed NRDGKtext missing or illegible when filed NSTDLGMLNRHtext missing or illegible when filed LKLV
2nd
Dtext missing or illegible when filed DGNGNNSTDYSWLKKYLKVI





C the 0825
1st
DVNDDGKVNSTDLTLLKRYVLKAV
2nd
DVNRDGRVNSSDVTLSRYLtext missing or illegible when filed RVI





C the 0912
1st
DVNGDGTNSTDLTMLKRSVLRAI
2nd
DVDKNGStext missing or illegible when filed NSTDVLLLSRYLLRVI





C the 1271
1st
DTNSDGKNSTDVTALKRHLLRVT
2nd
DVNGDGNVNSTDLLLLKRYLGEI





C the 1398
1st
DLNGDNRNSTDLTLMKRYtext missing or illegible when filed LKSI
2nd
Dtext missing or illegible when filed NGDGKtext missing or illegible when filed NSTDYTYLKKYLLQAI





C the 1400
1st
DLNGDGRVNSTDYTLLKRYLLGAI
2nd
DLNLDGRtext missing or illegible when filed NSTDYTVLKRYLLNAI





C the 1838
1st
DVNGDGRVNSSDLTLMKRYLLKSI
2nd
DLNEDGKVNSTDLLALKKLVLREL





C the 2089
1st
DVNDDGKVNSTDAVALKRYVLRSG
2nd
DLNEDGRVNSTDLGtext missing or illegible when filed LKRYtext missing or illegible when filed LKEI





C the 2137
1st
DVDGNGTVNSTDVNYMKRYLLRQI
2nd
DVDGNGNNSTDLSYLKKYLKLI





C the 2139
1st
DVNADGVNSSDMVLKRFLLRTI
2nd
DTNGDGAVNSSDFTLLKRYtext missing or illegible when filed LRSI





C the 2179
1st
DLNGDGNVNSTDSLMKRYLMKSV
2nd
DVNLDGRVNSTDRStext missing or illegible when filed LNRYLLKII





C the 2360
1st
DLNGDGRVNSTDLLLMKKRIREI
2nd
DLNLDGKNSSDYTtext missing or illegible when filed LKRYVLKSI





C the 2549
1st
DVNKDGRtext missing or illegible when filed NSTDtext missing or illegible when filed MYLKGYLLRNS
2nd
DVDGNGSVSSLDLTYLKRYLRRI





C the 2760
1st
DLNYDGKVNSTDYLVLKRYLLGTI
2nd
DLNRDGRVNSTDMSLMKRYLLGII





C the 2761
1st
DVNGDGKVNSTDCStext missing or illegible when filed KRYLLKNI
2nd
DVNGDGKVNSTDYSLLKRFVLRNI





C the 2811
1st
DLNGDGKVNSTDLTtext missing or illegible when filed KRYtext missing or illegible when filed LKNF
2nd
DLNGDGRtext missing or illegible when filed NSTDLStext missing or illegible when filed LHRYLLRII





C the 2872
1st
Dtext missing or illegible when filed NSDGNVNSTDLGtext missing or illegible when filed LKRtext missing or illegible when filed VKNP
2nd
DVNADGKVNSTDYTVLKRYLLRSI





C the 2949
1st
DLNGDGLVNSSDYSLLKRYLKQI
2nd
DLNRNGSVDSVDYStext missing or illegible when filed LKRFLLKTI





C the 2950
1st
DLNNDGRTNSTDYSLMKRYLLGSI
2nd
DVNLDGKVNSSDYTVLRRFLLGSI





C the 3012
1st
DLNGDGNVNSTDSTLMSRYLLGII
2nd
DLNGDGKVNSTDYNtext missing or illegible when filed LKRYLLKYI





C the 3132
1st
DLNGDGRVNSTDLAVMKRYLLKQV
2nd
DLNGDGKANSTDYQLLKRYtext missing or illegible when filed LKTI






text missing or illegible when filed indicates data missing or illegible when filed







Each dockerin shown in Table 4 has two relevant sequences, and each of these relevant sequences has a predicted N-type sugar chain modification site. A preferred dockerin can be obtained from any of these dockerins by creating a dockerin-specific sequence in which aspartic acid is substituted for asparagine at the predicted N-type sugar chain modification site of one or both of the two relevant sequences.
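For the two-site dockerins of Table 4 there are three substitution patterns: the 1st relevant sequence only, the 2nd only, or both. The minimal sketch below enumerates them for one dockerin given as its pair of relevant sequences, using C the 2760 from Table 4 as input. It works on sequence strings only, and `n_to_d_all` and `variants` are hypothetical helper names.

```python
import re
from typing import Dict, Tuple

SEQUON = re.compile(r"(?=N[^P][ST])")  # N-X-S/T, where X is not proline


def n_to_d_all(seq: str) -> str:
    """Substitute D for N at every N-X-S/T site of one relevant sequence."""
    residues = list(seq)
    for m in SEQUON.finditer(seq):  # scan the unmodified sequence
        residues[m.start()] = "D"
    return "".join(residues)


def variants(first: str, second: str) -> Dict[str, Tuple[str, str]]:
    """The three ways to substitute a two-site dockerin: 1st, 2nd or both."""
    return {
        "1st": (n_to_d_all(first), second),
        "2nd": (first, n_to_d_all(second)),
        "1st+2nd": (n_to_d_all(first), n_to_d_all(second)),
    }


# C the 2760 from Table 4
for label, (s1, s2) in variants(
    "DLNYDGKVNSTDYLVLKRYLLGTI", "DLNRDGRVNSTDMSLMKRYLLGII"
).items():
    print(label, s1, s2)
```

Keeping the unsubstituted relevant sequence unchanged in the single-site variants makes it straightforward to compare cohesin binding of the 1st-only, 2nd-only and double variants side by side.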










TABLE 5





locus
Amino Acid Sequence



















C the 0109
1st
DFNSDSSVNSTDLMtext missing or illegible when filed LNRAVLGLG
2nd






C the 0191
1st
DLNGDAKNSTDLNMMKRYLLQMI
2nd
DLNGDGKtext missing or illegible when filed SSDYNLLKRYtext missing or illegible when filed LHLI





C the 0211
1st
DVNGDGHVNSSDYSLFKRYLLRVI
2nd
DVNRDGRtext missing or illegible when filed DSTDLTMLKRYLtext missing or illegible when filed RAI





C the 0246
1st
DLNADGKNSTDYNLGKRLtext missing or illegible when filed LRTI
2nd
DLNGDSKVDSTDLTALKRYLLGVI





C the 0435
1st
DVNADGVVNtext missing or illegible when filed SDYVLMKRYtext missing or illegible when filed LRII
2nd
DVNGDNVtext missing or illegible when filed NDDCNYLKRYLLHMI





C the 0438
1st
DLNGDNNtext missing or illegible when filed NSSDYTLLKRYLLHTI
2nd






C the 1472
1st
DLNFDNAVNSTDLLMLKRYtext missing or illegible when filed LKSL
2nd
DLNRDNKVDSTDLTtext missing or illegible when filed LKRYLLKAI





C the 1890
1st
DLNADGStext missing or illegible when filed NSTDLMtext missing or illegible when filed MKRVLLKQR
2nd
DLNGDGKVTSTDYSLMKRYLLKEI





C the 1963
1st
DLNGDGNtext missing or illegible when filed NSSDLQALKRHLLGtext missing or illegible when filed S
2nd
DVNRSGKVDSTDYSVLKRYtext missing or illegible when filed LRII





C the 2193
1st
DINDDGNtext missing or illegible when filed NDSTDLQMLKRHLLRSI
2nd
DTNRDGRVDSTDLALLKRYtext missing or illegible when filed LRVI





C the 2194
1st
DLNGDGNtext missing or illegible when filed NSTDLQtext missing or illegible when filed LKKHLLRtext missing or illegible when filed
2nd
DVTKDGKVDSTDLTLLKRYtext missing or illegible when filed LRFV





C the 2195
1st
DLNDDGKVNSTDFQtext missing or illegible when filed LKKHLLRtext missing or illegible when filed
2nd
DLNKDGKVDSSDLSLMKRYLLQII





C the 2196
1st
DLNNDGKVNSTDFQLLKMHVLRQE
2nd
DVNRDGKVDSSDCTLLKRYILRVI





C the 2197
1st
DLNGDGKVNSTDLQLMKMHVLRQR
2nd
DVNRDGKVDSTDVALLKRYtext missing or illegible when filed LRQI





C the 2879
1st
Dtext missing or illegible when filed NSDGStext missing or illegible when filed NSTDVTLLKRHLLREN
2nd
DTDGDGKtext missing or illegible when filed TStext missing or illegible when filed DLSYLKRYVLRLI





C the 2972
1st
DLNGDKQVNSTDYTALKRHLLNtext missing or illegible when filed T
2nd
DLNGDGKVDSTDLMtext missing or illegible when filed LHRYLLGII






text missing or illegible when filed indicates data missing or illegible when filed







Each dockerin shown in Table 5 has one or two relevant sequences and has a predicted N-type sugar chain modification site in the N-terminal relevant sequence. A preferred dockerin can be obtained from these dockerins by creating a dockerin-specific sequence in which aspartic acid is substituted for asparagine at the predicted N-type sugar chain modification site of this relevant sequence.










TABLE 6





locus
Amino Acid Sequence



















C the 0015
1st
DVNADGKtext missing or illegible when filed DSTDLTLLKRYLLRSA
2nd
DTDGNGTVNSTDLNYLKKYILRVI





C the 0274
1st
DLNVDGStext missing or illegible when filed NSVDtext missing or illegible when filed TYMKRYLLRSI
2nd
DINGDGAtext missing or illegible when filed NSSDMVLLKRYVLRSI





C the 0918
1st
DLNRNGtext missing or illegible when filed VNDEDYLLKNYLLRGN
2nd
DVNKDGKVNSTDCLFLKKYtext missing or illegible when filed LGLI





C the 1806
1st
EVtext missing or illegible when filed DTKVtext missing or illegible when filed DSTDDIVKYEYQFDKK
2nd
TLVLSVNNDSTDKTTVSGYtext missing or illegible when filed SVDF





C the 2038
1st
DIVLDGNtext missing or illegible when filed NSLDMMKLKKYLtext missing or illegible when filed RET
2nd
DVNSDGEVNSTDYAYLKRYtext missing or illegible when filed LRII





C the 2590
1st
DLNQDGQVSSTDLVAMKRYLLKNF
2nd
DLNSDGKVNSTDLVALKRFLLKEI





C the 2812
1st
DLNGDQKVTSTDYTMLKRYLMKSI
2nd
DLNRDGKtext missing or illegible when filed NSTDLTtext missing or illegible when filed LKRYLLYSI






text missing or illegible when filed indicates data missing or illegible when filed







Each dockerin shown in Table 6 has two relevant sequences and has a predicted N-type sugar chain modification site in the C-terminal relevant sequence. A preferred dockerin can be obtained from these dockerins by creating a dockerin-specific sequence in which aspartic acid is substituted for asparagine at the predicted N-type sugar chain modification site of this relevant sequence.
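Tables 4, 5 and 6 differ in which relevant sequence carries the predicted site. A minimal sketch of that distinction, based only on the N-X-S/T motif, is given below; it assumes a dockerin is passed as its 1st and (optional) 2nd relevant sequences, and `has_site` and `site_pattern` are hypothetical names. It does not reproduce any alignment-based assignment that may have been used for the tables.

```python
import re
from typing import Optional

SEQUON = re.compile(r"(?=N[^P][ST])")  # N-X-S/T, where X is not proline


def has_site(seq: Optional[str]) -> bool:
    """True if seq is present and contains at least one N-X-S/T motif."""
    return seq is not None and SEQUON.search(seq) is not None


def site_pattern(first: str, second: Optional[str] = None) -> str:
    """Say which relevant sequence(s) of a dockerin carry a predicted site."""
    in_first, in_second = has_site(first), has_site(second)
    if in_first and in_second:
        return "both"
    if in_first:
        return "1st only"
    if in_second:
        return "2nd only"
    return "none"


# C the 2760 (Table 4), C the 2196 (Table 5), C the 2590 (Table 6)
print(site_pattern("DLNYDGKVNSTDYLVLKRYLLGTI", "DLNRDGRVNSTDMSLMKRYLLGII"))
print(site_pattern("DLNNDGKVNSTDFQLLKMHVLRQE", "DVNRDGKVDSSDCTLLKRYILRVI"))
print(site_pattern("DLNQDGQVSSTDLVAMKRYLLKNF", "DLNSDGKVNSTDLVALKRFLLKEI"))
```

For these three examples the sketch returns "both", "1st only" and "2nd only", matching the groupings of Tables 4, 5 and 6 respectively.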


The C. thermocellum type I dockerins shown in the following table (Table 7) have cohesin-binding ability confirmed in the existing literature and elsewhere, and are taken into account when selecting dockerin-specific sequences, including those of preferred dockerins, for the protein of the invention. In this table, each dockerin is specified by its two relevant sequences. If a relevant sequence has 90% or more amino acid sequence similarity to any of the relevant sequences of these dockerins, a preferred dockerin can be obtained by creating a dockerin-specific sequence in which aspartic acid is substituted for asparagine at a predicted N-type sugar chain modification site in that relevant sequence.
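The 90% selection rule can be sketched as a threshold filter against the reference relevant sequences. This is a minimal illustration under loudly stated assumptions: the residue groups are hypothetical and are not necessarily those behind the similarity values in Tables 8 to 10, only three of the Table 7 relevant sequences are used as references, and references of a different length are skipped rather than aligned with gaps. `similarity`, `qualifying_refs` and `REFERENCES` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Hypothetical conservative-substitution groups (an assumption; these are
# not necessarily the groups behind the similarity values in Tables 8 to 10).
GROUPS = ["ILVM", "KRH", "DE", "NQ", "ST", "FYW", "AG", "C", "P"]
GROUP_OF = {aa: i for i, g in enumerate(GROUPS) for aa in g}

# A subset of the Table 7 relevant sequences, used as references.
REFERENCES: Dict[str, str] = {
    "Cel8A 1st": "DVNGDGNVNSTDLTMLKRYLLKSV",
    "Cel9D 1st": "DVNDDGKVNSTDLTLLKRYVLKAV",
    "Xyn10C 1st": "DVNGDGRVNSSDLTLMKRYLLKSI",
}


def similarity(a: str, b: str) -> float:
    """Percent of positions that are identical or in the same group (ungapped)."""
    same = sum(
        x == y or (x in GROUP_OF and GROUP_OF[x] == GROUP_OF.get(y))
        for x, y in zip(a, b)
    )
    return 100 * same / len(a)


def qualifying_refs(query: str, threshold: float = 90.0) -> List[Tuple[str, float]]:
    """References of the same length whose similarity to query >= threshold."""
    hits = []
    for name, ref in REFERENCES.items():
        if len(ref) != len(query):
            continue  # would need a gapped alignment; skipped in this sketch
        score = similarity(query, ref)
        if score >= threshold:
            hits.append((name, score))
    return hits


# First relevant sequence of C the 0433 (Table 4)
print(qualifying_refs("DLNGDGRVNSSDLALMKRYVVKQI"))
```

Under these assumed groups, the first relevant sequence of C the 0433 reaches about 91.7% against Xyn10C 1st and 87.5% against the other two references, so only Xyn10C 1st qualifies. A different substitution scheme or a gapped alignment would change these figures.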











TABLE 7





locus
protein
Related Sequence




















C the 0269
Cel8A
1st
DVNGDGNVNSTDLTMLKRYLLKSV
2nd
DVNRDGAtext missing or illegible when filed NSSDMTtext missing or illegible when filed LKRYLtext missing or illegible when filed KSI





C the 0412
Cel9K
1st
DCNGDGKVNSTDAVALKRYtext missing or illegible when filed LRSG
2nd
DVNADGRVNSTDLAtext missing or illegible when filed LKRYtext missing or illegible when filed LKEI





C the 0413
Cbh9A
1st
DCNDDGKVNSTDVAVMKRYLKKEN
2nd
DVNADGKVNSTDFStext missing or illegible when filed LKRYVMNKNI





C the 0578
Cel9R
1st
Dtext missing or illegible when filed NLDGKtext missing or illegible when filed NSSDVTLLKRYtext missing or illegible when filed KSI
2nd
DVNGDGRVNSTDYSYLKRYVLKII





C the 0825
Cel9D
1st
DVNDDGKVNSTDLTLLKRYVLKAV
2nd
DVNRDGRVNSSDVTtext missing or illegible when filed LSRYLtext missing or illegible when filed RVI





C the 1838
Xyn10C
1st
DVNGDGRVNSSDLTLMKRYLLKSI
2nd
DLNEDGKVNSTDLLALKKLVLREL





C the 2089
Cel48S
1st
DVNDDGKVNSTDAVALKRYVLRSG
2nd
DLNEDGRVNSTDLGtext missing or illegible when filed LKRYtext missing or illegible when filed LKEI





C the 2147
Cel50
1st
DVNGDFAVNSNDLTLtext missing or illegible when filed KRYVLKNI
2nd
DVDGDEKtext missing or illegible when filed TSSDAALVKRYVLRAI






text missing or illegible when filed indicates data missing or illegible when filed

















TABLE 8

Similarity of Amino Acid Sequence (%)
For each locus, each related sequence (1st, 2nd) is followed by its similarity to the 1st/2nd related sequences of the Table 7 dockerins.

C the 0109
1st DFNSDSSVNSTDLMtext missing or illegible when filed LNRAVLGLG
Cel8A 85/85, Cel9K 84/85, Cbh9A 93/85, Cel9R 85/76, Cel9D 85/85, Xyn10C 85/90, Cel48S 80/90, Cel50 80/80
2nd (no entry)

C the 0438
1st DLNGDNNNSSDYTLLKRYLLHTI
Cel8A 95/91, Cel9K 90/95, Cbh9A 94/100, Cel9R 91/95, Cel9D 95/91, Xyn10C 95/83, Cel48S 91/95, Cel50 91/95
2nd (no entry)

text missing or illegible when filed indicates data missing or illegible when filed







Each dockerin shown in Table 8 has one relevant sequence on the N-terminal side, and this relevant sequence has 90% or greater amino acid sequence similarity to one of the relevant sequences of the eight dockerins shown in Table 7, whose cohesin-binding ability has been confirmed. A preferred dockerin can be obtained by creating a dockerin-specific sequence in which aspartic acid is substituted for asparagine at a predicted N-type sugar chain modification site in this relevant sequence.












TABLE 9

Similarity of Amino Acid Sequence (%)
For each locus, each related sequence (1st, 2nd) is followed by its similarity to the 1st/2nd related sequences of the Table 7 dockerins.

C the 0239
1st DYNGDGAVNSTDLLACKRYLLYAL
Cel8A 80/76, Cel9K 85/76, Cbh9A 85/76, Cel9R 71/71, Cel9D 76/71, Xyn10C 76/79, Cel48S 80/76, Cel50 76/71
2nd DLDGNGKNSTDYAYLKRVLLKQI
Cel8A 87/79, Cel9K 90/91, Cbh9A 85/95, Cel9R 83/95, Cel9D 91/83, Xyn10C 87/83, Cel48S 90/91, Cel50 83/91

C the 0435
1st DVNADGVVNISDYVLMKRYtext missing or illegible when filed LRII
Cel8A 83/83, Cel9K 77/83, Cbh9A 80/83, Cel9R 79/83, Cel9D 83/83, Xyn10C 83/77, Cel48S 81/79, Cel50 83/83
2nd DVNGDNVtext missing or illegible when filed NDDCNYLKRYtext missing or illegible when filed LHMI
Cel8A 83/79, Cel9K 77/83, Cbh9A 80/83, Cel9R 75/95, Cel9D 83/79, Xyn10C 79/77, Cel48S 81/83, Cel50 79/79

C the 2038
1st DIVLDGNNSLDMMKLKKYLtext missing or illegible when filed RET
Cel8A 81/77, Cel9K 83/78, Cbh9A 78/78, Cel9R 81/72, Cel9D 77/77, Xyn10C 77/78, Cel48S 77/78, Cel50 73/72
2nd DVNSDGEVNSTDYAYLKRYtext missing or illegible when filed LRII
Cel8A 87/87, Cel9K 86/85, Cbh9A 85/91, Cel9R 83/95, Cel9D 87/87, Xyn10C 83/81, Cel48S 90/83, Cel50 83/87

C the 2549
1st DVNKDGRtext missing or illegible when filed NSTDtext missing or illegible when filed MYLKGYLLRNS
Cel8A 82/82, Cel9K 79/79, Cbh9A 79/82, Cel9R 82/81, Cel9D 86/86, Xyn10C 82/86, Cel48S 87/86, Cel50 73/77
2nd DVDGNGSVSSLDLTYLKRYLRRI
Cel8A 91/87, Cel9K 81/85, Cbh9A 85/91, Cel9R 87/87, Cel9D 87/83, Xyn10C 91/79, Cel48S 86/87, Cel50 87/83

text missing or illegible when filed indicates data missing or illegible when filed







Each dockerin shown in Table 9 has two relevant sequences, and the relevant sequence on the C-terminal side has 90% or greater amino acid sequence similarity to one of the relevant sequences of the eight dockerins shown in Table 7, whose cohesin-binding ability has been confirmed. A preferred dockerin can be obtained by creating a dockerin-specific sequence in which aspartic acid is substituted for asparagine at a predicted N-type sugar chain modification site in this relevant sequence.












TABLE 10

Similarity of amino acid sequence (%)

| Locus | Relevant sequence | Amino acid sequence | Cel8A 1st | Cel8A 2nd | Cel9K 1st | Cel9K 2nd | Cbh9A 1st | Cbh9A 2nd | Cel9R 1st | Cel9R 2nd | Cel9D 1st | Cel9D 2nd | Xyn10C 1st | Xyn10C 2nd | Cel48S 1st | Cel48S 2nd | Cel50 1st | Cel50 2nd |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Cthe0015 | 1st | DVNADGKIDSTDLTLLKRVLLRSA | 100 | 91 | 87 | 100 | 95 | 100 | 95 | 90 | 100 | 95 | 100 | 86 | 91 | 100 | 91 | 95 |
| Cthe0015 | 2nd | DTDGNGTVNSTDLNYLKKYILRVI | 91 | 87 | 81 | 87 | 90 | 91 | 87 | 91 | 95 | 87 | 87 | 81 | 86 | 83 | 87 | 91 |
| Cthe0032 | 1st | DLNNDGN[?]NSTDYM[?]LKKY[?]LKVL | 87 | 87 | 90 | 90 | 85 | 95 | 83 | 91 | 91 | 91 | 87 | 83 | 90 | 90 | 86 | 91 |
| Cthe0032 | 2nd | DLNGDGS[?]NSTDLTLKRF[?]MKAI | 100 | 95 | 90 | 100 | 100 | 100 | 95 | 87 | 100 | 95 | 100 | 91 | 91 | 100 | 95 | 96 |
| Cthe0043 | 1st | DLNGDGN[?]NSTDFTMLKRA[?]LGNP | 95 | 90 | 84 | 95 | 93 | 95 | 85 | 90 | 95 | 85 | 95 | 85 | 85 | 95 | 90 | 90 |
| Cthe0043 | 2nd | DLNRDGNTNSTDLM[?]LRRYLLKLI | 87 | 91 | 88 | 87 | 85 | 87 | 87 | 83 | 87 | 95 | 87 | 86 | 86 | 87 | 83 | 83 |
| Cthe0044 | 1st | D[?]NLDGKNSTDLSALKRH[?]LRII | 90 | 86 | 88 | 90 | 100 | 90 | 95 | 86 | 90 | 91 | 90 | 86 | 86 | 90 | 81 | 86 |
| Cthe0044 | 2nd | DVNNDGSVNSTDAS[?]LKKY[?]AKAI | 91 | 91 | 82 | 91 | 90 | 91 | 95 | 83 | 91 | 95 | 91 | 79 | 86 | 91 | 87 | 95 |
| Cthe0190 | 1st | ELNGDGK[?]NSSDLNMMKRYLLRLI | 95 | 87 | 85 | 95 | 100 | 95 | 91 | 91 | 95 | 95 | 95 | 86 | 86 | 95 | 87 | 91 |
| Cthe0190 | 2nd | DLNGDGK[?]NSSDYS[?]LKRYLLRMI | 91 | 83 | 85 | 91 | 94 | 95 | 87 | 95 | 91 | 91 | 91 | 81 | 86 | 91 | 83 | 91 |
| Cthe0191 | 1st | DLNGDAK[?]NSTDLNMMKRYLLQMI | 95 | 87 | 85 | 95 | 100 | 95 | 91 | 91 | 95 | 95 | 95 | 86 | 86 | 95 | 87 | 91 |
| Cthe0191 | 2nd | DLNGDGKITSSDYNLLKRY[?]LHLI | 91 | 83 | 85 | 91 | 94 | 95 | 87 | 95 | 91 | 91 | 91 | 81 | 86 | 91 | 83 | 91 |
| Cthe0211 | 1st | DVNGDGHVNSSDYSLFKRYLLRVI | 91 | 83 | 81 | 91 | 90 | 95 | 87 | 95 | 95 | 91 | 91 | 81 | 86 | 91 | 83 | 91 |
| Cthe0211 | 2nd | DVNRDGR[?]DSTDLTMLKRYL[?]RAI | 95 | 95 | 82 | 95 | 86 | 95 | 95 | 83 | 95 | 100 | 95 | 83 | 86 | 95 | 87 | 91 |
| Cthe0246 | 1st | DLNADGK[?]NSTDYNLGKRL[?]LRTI | 87 | 79 | 76 | 87 | 83 | 91 | 83 | 87 | 87 | 83 | 87 | 83 | 78 | 87 | 79 | 87 |
| Cthe0246 | 2nd | DLNGDSKVDSTDLTALKRYLLGVI | 87 | 79 | 94 | 87 | 100 | 87 | 83 | 87 | 91 | 87 | 87 | 90 | 95 | 87 | 79 | 87 |
| Cthe0269 | 1st | DVNGDGNVNSTDLTMLKRYLLKSV | 100 | 95 | 86 | 100 | 90 | 100 | 95 | 87 | 100 | 91 | 100 | 87 | 91 | 100 | 95 | 95 |
| Cthe0269 | 2nd | DVNRDGA[?]NDDSMT[?]LKRYL[?]KSI | 95 | 100 | 78 | 91 | 81 | 91 | 91 | 79 | 91 | 91 | 91 | 79 | 82 | 91 | 91 | 87 |
| Cthe0270 | 1st | DLNGDGKVNSSDLA[?]LKRYMLRAI | 100 | 91 | 90 | 100 | 100 | 100 | 95 | 87 | 100 | 95 | 100 | 87 | 91 | 100 | 91 | 95 |
| Cthe0270 | 2nd | DLNRDGNVNSTDYS[?]LKRY[?]LKAI | 91 | 95 | 84 | 91 | 85 | 95 | 91 | 87 | 91 | 95 | 91 | 79 | 82 | 91 | 87 | 91 |
| Cthe0274 | 1st | DLNVDGS[?]NSVDITYMKRYLLRSI | 91 | 87 | 85 | 95 | 93 | 95 | 91 | 87 | 91 | 83 | 87 | 83 | 86 | 91 | 83 | 83 |
| Cthe0274 | 2nd | DTNGDGA[?]NSSDMVLLKRYVLRSI | 100 | 95 | 82 | 95 | 90 | 91 | 91 | 79 | 95 | 87 | 95 | 83 | 86 | 87 | 95 | 91 |
| Cthe0405 | 1st | DVNGDGNVNSTDVVWLRRFLLKLV | 91 | 87 | 90 | 91 | 86 | 83 | 87 | 91 | 91 | 91 | 91 | 95 | 95 | 87 | 87 | 91 |
| Cthe0405 | 2nd | DMNDDGN[?]NSTDMIALKRKVLKP | 90 | 86 | 90 | 86 | 93 | 86 | 86 | 82 | 90 | 86 | 90 | 95 | 90 | 86 | 86 | 81 |
| Cthe0412 | 1st | DCNGDGKVNSTDAVALKRY[?]LRSG | 86 | 78 | 100 | 86 | 100 | 82 | 94 | 81 | 86 | 86 | 86 | 90 | 95 | 85 | 78 | 91 |
| Cthe0412 | 2nd | DVNADGRVNSTDLA[?]LKRYILKEI | 100 | 91 | 86 | 100 | 91 | 100 | 95 | 87 | 100 | 91 | 100 | 87 | 90 | 100 | 91 | 95 |
| Cthe0413 | 1st | DCNDDGKVNSTDVAVMKRYLKKEN | 90 | 81 | 100 | 91 | 100 | 91 | 94 | 81 | 90 | 86 | 90 | 85 | 95 | 95 | 82 | 95 |
| Cthe0413 | 2nd | DVNADGKVNSTDFS[?]LKRYVMKNI | 100 | 91 | 82 | 100 | 91 | 100 | 91 | 91 | 100 | 87 | 100 | 87 | 86 | 100 | 91 | 95 |
| Cthe0433 | 1st | DLNGDGRVNSSDLALMKRYVVKQI | 95 | 87 | 90 | 100 | 95 | 100 | 91 | 87 | 100 | 91 | 95 | 87 | 90 | 100 | 91 | 95 |
| Cthe0433 | 2nd | DLNGDDKVNSTDYSVLKRYLLRSI | 95 | 87 | 90 | 95 | 94 | 100 | 91 | 91 | 95 | 87 | 95 | 87 | 91 | 95 | 87 | 95 |
| Cthe0536 | 1st | DVNGDGRVNSSDVALLKRYLLGLV | 100 | 90 | 90 | 100 | 95 | 95 | 95 | 87 | 100 | 91 | 100 | 85 | 95 | 100 | 90 | 100 |
| Cthe0536 | 2nd | DVNVSGTVNSTDLA[?]MKRYVLRSI | 95 | 95 | 82 | 95 | 90 | 100 | 100 | 79 | 95 | 87 | 91 | 83 | 86 | 91 | 91 | 91 |
| Cthe0543 | 1st | DVNFDGR[?]NSTDYSRLKRYY[?]KSL | 91 | 83 | 84 | 87 | 83 | 91 | 91 | 87 | 87 | 83 | 87 | 79 | 82 | 87 | 79 | 87 |
| Cthe0543 | 2nd | DVDGNGR[?]NSTDLYVLNRY[?]LKLI | 91 | 83 | 86 | 91 | 86 | 91 | 87 | 87 | 91 | 91 | 91 | 90 | 90 | 91 | 83 | 87 |
| Cthe0578 | 1st | D[?]NLDGKNSSDVTLLKRY[?]KSI | 95 | 91 | 94 | 95 | 94 | 91 | 100 | 83 | 95 | 91 | 95 | 83 | 91 | 95 | 87 | 95 |
| Cthe0578 | 2nd | DVNGDGRVNSTDYSYLKRYVLKII | 87 | 79 | 81 | 87 | 81 | 91 | 83 | 100 | 87 | 87 | 87 | 81 | 86 | 87 | 79 | 87 |
| Cthe0640 | 1st | DLNGDNNVNSTDLTLLKRYLTRVI | 91 | 91 | 85 | 91 | 100 | 91 | 91 | 87 | 95 | 95 | 91 | 81 | 86 | 91 | 87 | 91 |
| Cthe0640 | 2nd | DVNGDGK[?]NSTDYSAM[?]RY[?]LRII | 83 | 75 | 81 | 83 | 85 | 87 | 79 | 91 | 83 | 83 | 83 | 81 | 86 | 83 | 75 | 83 |
| Cthe0661 | 1st | DVNGDLKVNSTDFSMLRRYLLKTI | 95 | 87 | 78 | 95 | 81 | 95 | 87 | 91 | 95 | 87 | 95 | 83 | 82 | 95 | 95 | 91 |
| Cthe0661 | 2nd | DLNGDGR[?]NSSDLTMLKRYLLMEV | 100 | 91 | 89 | 100 | 95 | 100 | 95 | 87 | 100 | 91 | 100 | 87 | 90 | 100 | 91 | 95 |
| Cthe0624 | 1st | DLNNDSKVNAVD[?]MMLKRY[?]LGII | 87 | 79 | 89 | 87 | 94 | 87 | 79 | 83 | 87 | 87 | 83 | 90 | 90 | 87 | 75 | 79 |
| Cthe0624 | 2nd | DIYFDGVVNSSDYN[?]MKRYLLKAI | 83 | 87 | 78 | 83 | 83 | 87 | 87 | 79 | 90 | 83 | 83 | 75 | 78 | 90 | 94 | 90 |
| Cthe0625 | 1st | DLNGDGVVNSTDSV[?]KRHIKFS | 90 | 90 | 90 | 90 | 85 | 86 | 86 | 82 | 90 | 86 | 90 | 81 | 90 | 86 | 90 | 95 |
| Cthe0625 | 2nd | DLNGDGN[?]NSSDVSLMKRYLLRII | 95 | 91 | 90 | 95 | 100 | 91 | 91 | 91 | 95 | 95 | 95 | 86 | 90 | 95 | 91 | 95 |
| Cthe0660 | 1st | DLNGDGK[?]NSTD[?]LMKRYLLKQI | 95 | 87 | 85 | 100 | 95 | 100 | 91 | 87 | 100 | 91 | 95 | 87 | 86 | 100 | 91 | 95 |
| Cthe0660 | 2nd | D[?]NKDGKVNSTDMS[?]LKRV[?]LRNY | 91 | 91 | 73 | 91 | 86 | 91 | 91 | 81 | 95 | 95 | 91 | 91 | 82 | 95 | 82 | 86 |
| Cthe0729 | 1st | DSNSDCKVNSTDLTLMKRYLLQQS | 90 | 86 | 86 | 91 | 91 | 91 | 86 | 81 | 90 | 90 | 90 | 78 | 81 | 91 | 86 | 86 |
| Cthe0729 | 2nd | DLNGDGK[?]NSSDYTLLKRYLLGYI | 95 | 85 | 89 | 95 | 94 | 91 | 90 | 87 | 95 | 90 | 95 | 80 | 90 | 95 | 79 | 95 |
| Cthe0745 | 1st | D[?]NNDKTVNSTDVTYLKRFLLKQI | 87 | 87 | 86 | 87 | 82 | 87 | 83 | 83 | 91 | 83 | 83 | 87 | 90 | 87 | 91 | 95 |
| Cthe0745 | 2nd | DVNLDGN[?]NSTDLV[?]LKRYVLRGI | 95 | 95 | 89 | 95 | 100 | 91 | 100 | 79 | 95 | 91 | 95 | 87 | 86 | 91 | 91 | 91 |
| Cthe0797 | 1st | DVNGDGK[?]NSTDCTMLKRY[?]LRGI | 95 | 87 | 86 | 95 | 90 | 95 | 91 | 91 | 95 | 87 | 95 | 83 | 91 | 95 | 87 | 96 |
| Cthe0797 | 2nd | DVNADLK[?]NSTDLVLMKKYLLRSI | 95 | 87 | 82 | 95 | 90 | 91 | 91 | 79 | 95 | 87 | 95 | 87 | 86 | 91 | 95 | 91 |
| Cthe0798 | 1st | DVNLDGQVNSTDFSLLKRY[?]LKVV | 91 | 91 | 83 | 91 | 88 | 91 | 91 | 91 | 95 | 91 | 91 | 81 | 81 | 91 | 87 | 91 |
| Cthe0798 | 2nd | DMNNDGN[?]NSTD[?]LKR[?]LLRN | 95 | 95 | 80 | 95 | 90 | 95 | 91 | 86 | 95 | 95 | 95 | 91 | 82 | 96 | 91 | 90 |
| Cthe0821 | 1st | D[?]NRDGK[?]NSTDLGMLNRH[?]LKLV | 91 | 91 | 77 | 91 | 86 | 91 | 91 | 87 | 91 | 100 | 91 | 81 | 81 | 91 | 83 | 87 |
| Cthe0821 | 2nd | D[?]DGNGN[?]NSTDYSWLKKY[?]LKVI | 87 | 83 | 81 | 87 | 81 | 91 | 83 | 100 | 91 | 87 | 87 | 81 | 86 | 87 | 83 | 91 |
| Cthe0825 | 1st | DVNDDGKVNSTDLTLLKRYVLKAV | 100 | 91 | 86 | 100 | 90 | 100 | 95 | 87 | 100 | 95 | 100 | 87 | 91 | 100 | 91 | 95 |
| Cthe0825 | 2nd | DVNRDGRVNSSDVT[?]LSRYL[?]RVI | 91 | 91 | 86 | 91 | 86 | 87 | 91 | 87 | 95 | 100 | 91 | 81 | 90 | 91 | 83 | 95 |
| Cthe0912 | 1st | DVNGDGT[?]NSTDLTMLKRSVLRAI | 95 | 91 | 82 | 91 | 94 | 95 | 91 | 79 | 95 | 87 | 91 | 87 | 86 | 91 | 91 | 91 |
| Cthe0912 | 2nd | DVDKNGS[?]NSTDVLLLSRYLLRVI | 87 | 91 | 86 | 87 | 90 | 83 | 87 | 83 | 95 | 95 | 87 | 90 | 95 | 91 | 83 | 91 |
| Cthe0918 | 1st | DLNRNG[?]DEDY[?]LLKNYLLRGN | 86 | 91 | 84 | 81 | 77 | 86 | 86 | 81 | 86 | 90 | 86 | 77 | 82 | 81 | 81 | 82 |
| Cthe0918 | 2nd | DVNKDGKVNSTDCLFLKKY[?]LGLI | 85 | 85 | 80 | 85 | 80 | 85 | 85 | 87 | 90 | 87 | 85 | 85 | 90 | 90 | 76 | 85 |
| Cthe1271 | 1st | DTNSDGK[?]NSTDVTALKRHLLRVT | 95 | 90 | 95 | 95 | 95 | 90 | 90 | 91 | 95 | 95 | 95 | 86 | 100 | 90 | 86 | 95 |
| Cthe1271 | 2nd | DVNGDGNVNSTDLLLLKRY[?]LGEI | 91 | 87 | 85 | 91 | 90 | 91 | 87 | 79 | 91 | 90 | 91 | 87 | 90 | 91 | 87 | 87 |
| Cthe1398 | 1st | DLNGDNR[?]NSTDLTLMKRY[?]LKSI | 100 | 91 | 90 | 100 | 95 | 100 | 95 | 87 | 100 | 91 | 100 | 87 | 91 | 100 | 91 | 95 |
| Cthe1398 | 2nd | D[?]NGDGK[?]NSTDYTYLKKYLLQAI | 91 | 83 | 86 | 91 | 85 | 95 | 87 | 95 | 91 | 87 | 91 | 83 | 91 | 91 | 83 | 91 |
| Cthe1400 | 1st | DLNGDGRVNSTDYTLLKRVLLGAI | 91 | 83 | 89 | 91 | 94 | 95 | 87 | 87 | 91 | 87 | 91 | 80 | 90 | 91 | 83 | 91 |
| Cthe1400 | 2nd | DLNLDGR[?]NSTDYTVLKRYLLNAI | 91 | 87 | 94 | 91 | 93 | 95 | 95 | 87 | 91 | 91 | 91 | 83 | 91 | 91 | 83 | 91 |
| Cthe1472 | 1st | DLNFDNAVNSTDLLMLKRY[?]LKSL | 91 | 91 | 84 | 87 | 83 | 87 | 91 | 75 | 87 | 83 | 87 | 83 | 82 | 87 | 87 | 83 |
| Cthe1472 | 2nd | DLNRDNKVDSTDLT[?]LKRYLLKAI | 95 | 95 | 89 | 95 | 90 | 95 | 95 | 83 | 95 | 100 | 95 | 83 | 86 | 95 | 87 | 91 |
| [?] | 1st | EVDTKV[?]DSTDDIVKVEYQFDKK | 75 |  | 75 | 84 | 84 | 76 | 85 | 70 | 75 | 100 |  | 75 | 68 | 84 |  | 100 |
| Cthe2137 | 1st | DVDGNGTVNSTDVNYMKRYLLRQI | 91 | 87 | 86 | 91 | 86 | 91 | 87 | 87 | 95 | 83 | 87 | 87 | 90 | 91 | 91 | 95 |
| Cthe2137 | 2nd | DVNGNGN[?]NSTDLSYLKKY[?]LKLI | 91 | 87 | 81 | 91 | 86 | 91 | 87 | 95 | 91 | 91 | 91 | 86 | 86 | 91 | 87 | 87 |
| Cthe2139 | 1st | DVNADGV[?]NSSD[?]MVLKRFLLRTI | 91 | 91 | 86 | 91 | 85 | 91 | 87 | 83 | 91 | 87 | 91 | 95 | 91 | 91 | 91 | 87 |
| Cthe2139 | 2nd | DTNGDGAVNSSDFTLLKRY[?]LRSI | 100 | 95 | 82 | 95 | 85 | 95 | 87 | 87 | 95 | 83 | 95 | 79 | 86 | 91 | 95 | 91 |
| Cthe2179 | 1st | DLNGDGNVNSTDS[?]LMKRYLMKSV | 95 | 91 | 95 | 91 | 90 | 91 | 91 | 83 | 95 | 87 | 95 | 87 | 95 | 91 | 91 | 95 |
| Cthe2179 | 2nd | DVNLDGRVNSTDRS[?]LNRYLLKII | 87 | 87 | 83 | 87 | 88 | 87 | 91 | 87 | 87 | 91 | 87 | 77 | 81 | 87 | 79 | 87 |
| Cthe2193 | 1st | D[?]NDDGN[?]NSTDLQMLKRHLLRSI | 95 | 91 | 82 | 100 | 95 | 95 | 91 | 83 | 95 | 87 | 95 | 87 | 86 | 95 | 91 | 95 |
| Cthe2193 | 2nd | DTNRDGRVDSTDLALLKRY[?]LRVI | 91 | 91 | 81 | 91 | 90 | 91 | 91 | 87 | 95 | 100 | 91 | 77 | 86 | 87 | 94 | 91 |
| Cthe2194 | 1st | DLNGDGN[?]NSTDLQ[?]LKKHLLRIT | 95 | 90 | 85 | 100 | 100 | 95 | 90 | 86 | 96 | 91 | 95 | 86 | 86 | 95 | 90 | 95 |
| Cthe2194 | 2nd | DVTKDGKVDSTDLTLLKRYLRFV | 91 | 91 | 88 | 95 | 95 | 95 | 91 | 87 | 100 | 95 | 91 | 86 | 90 | 100 | 86 | 90 |
| Cthe2195 | 1st | DLNDDGKVNSTDFQ[?]LKKHLLRIT | 95 | 86 | 85 | 100 | 94 | 95 | 86 | 91 | 95 | 86 | 95 | 86 | 86 | 95 | 86 | 95 |
| Cthe2195 | 2nd | DLNKDGKVDSSDLSLMKRYLLQII | 91 | 91 | 83 | 91 | 100 | 91 | 91 | 87 | 95 | 100 | 91 | 86 | 86 | 95 | 83 | 87 |
| Cthe2196 | 1st | DLNNDGKVNSTDFQLLKMHVLRQE | 95 | 90 | 85 | 100 | 90 | 95 | 86 | 90 | 95 | 90 | 95 | 86 | 86 | 95 | 86 | 95 |
| Cthe2196 | 2nd | DVNRDGKVDSSDCTLLKRY[?]LRVI | 87 | 87 | 81 | 87 | 85 | 87 | 87 | 91 | 91 | 95 | 87 | 77 | 86 | 87 | 79 | 91 |
| Cthe2197 | 1st | DLNGDGKVNSTDLQLMKMHVLRQR | 95 | 86 | 85 | 100 | 95 | 95 | 90 | 86 | 95 | 90 | 95 | 86 | 86 | 95 | 86 | 95 |
| Cthe2197 | 2nd | DVNRDGKVDSTDVALLKRY[?]LRQI | 91 | 91 | 86 | 95 | 86 | 91 | 91 | 83 | 95 | 95 | 91 | 83 | 90 | 95 | 87 | 95 |
| Cthe2360 | 1st | DLNGDGRVNSTDLLLMKKRIREI | 91 | 83 | 85 | 91 | 85 | 91 | 87 | 79 | 91 | 83 | 91 | 91 | 86 | 91 | 83 | 87 |
| Cthe2360 | 2nd | DLNLDGK[?]NSSDYT[?]LKRYVLKSI | 91 | 87 | 89 | 91 | 88 | 95 | 95 | 87 | 91 | 87 | 91 | 79 | 86 | 91 | 83 | 91 |
| Cthe2590 | 1st | DLNQDGQVSSTDLVAMKRYLLKNF | 91 | 95 | 90 | 95 | 95 | 91 | 91 | 81 | 95 | 95 | 91 | 95 | 95 | 91 | 87 | 87 |
| Cthe2590 | 2nd | DLNSDGKVNSTDLVALKRFLLKEI | 95 | 91 | 95 | 95 | 95 | 91 | 91 | 83 | 95 | 91 | 95 | 100 | 95 | 91 | 87 | 91 |
| Cthe2760 | 1st | DLNVDGKVNSTDYLVLKRYLLGTI | 83 | 79 | 94 | 83 | 87 | 87 | 83 | 83 | 83 | 83 | 83 | 85 | 90 | 83 | 75 | 83 |
| Cthe2760 | 2nd | DLNRDGRVNSTDMSLMKRYLLGII | 87 | 87 | 82 | 87 | 94 | 87 | 87 | 83 | 87 | 95 | 87 | 80 | 80 | 87 | 79 | 83 |
| Cthe2761 | 1st | DVNGDGKVNSTDCS[?]KRYLLKNI | 95 | 87 | 82 | 95 | 86 | 95 | 91 | 91 | 95 | 87 | 95 | 83 | 86 | 95 | 87 | 95 |
| Cthe2761 | 2nd | DVNGDGKVNSTDYSLLKRFVLRNI | 95 | 87 | 82 | 95 | 86 | 100 | 91 | 91 | 95 | 87 | 95 | 87 | 86 | 95 | 87 | 95 |
| Cthe2811 | 1st | DLNGDGKVNSTDLT[?]MKRY[?]LKNF | 100 | 91 | 90 | 100 | 95 | 100 | 95 | 90 | 100 | 95 | 100 | 87 | 91 | 100 | 91 | 95 |
| Cthe2811 | 2nd | DLNGDGR[?]NSTDLS[?]LHRYLLRII | 95 | 87 | 85 | 95 | 100 | 95 | 91 | 91 | 95 | 91 | 95 | 86 | 86 | 95 | 87 | 91 |
| Cthe2812 | 1st | DLNGDQKVTSTDYTMLKRYLMKSI | 91 | 83 | 85 | 91 | 90 | 95 | 87 | 87 | 91 | 83 | 91 | 79 | 86 | 91 | 87 | 95 |
| Cthe2812 | 2nd | DLNRDGK[?]NSTDLTLKRYLLYSI | 91 | 91 | 88 | 91 | 94 | 91 | 91 | 85 | 91 | 100 | 91 | 80 | 85 | 91 | 83 | 87 |
| Cthe2872 | 1st | D[?]SDGNVNSTDLG[?]LKR[?]KNP | 95 | 95 | 78 | 95 | 86 | 95 | 91 | 86 | 95 | 95 | 95 | 91 | 82 | 95 | 91 | 90 |
| Cthe2872 | 2nd | DVNADGKVNSTDYTVLKRYLLRSI | 95 | 87 | 91 | 95 | 90 | 100 | 91 | 91 | 95 | 87 | 95 | 87 | 95 | 95 | 87 | 95 |
| Cthe2879 | 1st | D[?]NSDGS[?]NSTDVTLLKRHLLREN | 100 | 100 | 90 | 100 | 91 | 95 | 95 | 90 | 100 | 100 | 100 | 86 | 95 | 100 | 95 | 100 |
| Cthe2879 | 2nd | DTDGDGKITSDLSYLKRYVLRLI | 91 | 79 | 81 | 91 | 90 | 91 | 83 | 95 | 91 | 87 | 87 | 81 | 86 | 87 | 79 | 83 |
| Cthe2949 | 1st | DLNGDLVNSSDYSLLKRYLKQI | 87 | 83 | 80 | 91 | 85 | 95 | 83 | 87 | 91 | 83 | 87 | 79 | 81 | 91 | 87 | 91 |
| Cthe2949 | 2nd | DLNRNGSVDSVDYS[?]LKRFLLKTI | 91 | 91 | 84 | 91 | 85 | 95 | 87 | 91 | 91 | 91 | 87 | 83 | 82 | 91 | 83 | 87 |
| Cthe2950 | 1st | DLNNDGRTNSTDYSLMKRYLLGSI | 91 | 87 | 84 | 91 | 94 | 95 | 87 | 87 | 91 | 87 | 91 | 80 | 85 | 91 | 83 | 91 |
| Cthe2950 | 2nd | DVNLDGKVNSSDYTVLRRFLLGSI | 87 | 83 | 94 | 87 | 93 | 91 | 91 | 83 | 87 | 83 | 87 | 85 | 90 | 87 | 79 | 87 |
| Cthe2972 | 1st | DLNGDKQVNSTDYTALKRHLLNII | 86 | 81 | 89 | 86 | 88 | 90 | 81 | 91 | 86 | 82 | 86 | 80 | 90 | 86 | 86 | 90 |
| Cthe2972 | 2nd | DLNGDGKVDSTDLM[?]LHRVLLGII | 87 | 79 | 89 | 87 | 94 | 87 | 83 | 83 | 87 | 83 | 87 | 90 | 90 | 87 | 79 | 83 |
| Cthe3012 | 1st | DLNGDGNVNSTDSTLMSRYLLGII | 87 | 83 | 94 | 87 | 94 | 87 | 83 | 87 | 87 | 87 | 87 | 80 | 95 | 87 | 83 | 91 |
| Cthe3012 | 2nd | DLNGDGKVNSTDYN[?]LKRYLLKYI | 91 | 83 | 85 | 91 | 90 | 95 | 87 | 91 | 91 | 87 | 91 | 81 | 86 | 91 | 83 | 91 |
| Cthe3132 | 1st | DLNGDGRVNSTDLAVMKRYLLKQV | 95 | 87 | 95 | 100 | 95 | 100 | 91 | 87 | 100 | 91 | 95 | 91 | 95 | 100 | 91 | 95 |
| Cthe3132 | 2nd | DLNGDGKANSTDYQLLKRY[?]LKTI | 91 | 79 | 85 | 95 | 90 | 95 | 83 | 91 | 91 | 87 | 91 | 83 | 86 | 91 | 83 | 91 |

[?] indicates data missing or illegible when filed.







Each dockerin shown in Table 10 has two relevant sequences, and each of them has 90% or greater amino acid similarity to at least one of the relevant sequences of the 8 dockerins shown in Table 7, whose cohesin binding ability has been confirmed. A preferred dockerin can be obtained by converting one or both of these relevant sequences into dockerin-specific sequences, that is, by substituting aspartic acid for asparagine at a predicted N-type sugar chain modification site.
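Because the substitution may be made in one or both relevant sequences, the candidate variants for a single Table 10 dockerin can be enumerated. The sketch below uses Cthe0433, whose two relevant sequences are fully legible in Table 10 and each reach 100% similarity to at least one Table 7 relevant sequence. It again assumes that a predicted site is an N-X-S/T sequon (X not proline); `asp_variant` and `variants` are illustrative names, not part of the source.

```python
import re
from itertools import combinations

# Assumed site rule: N-X-S/T with X not proline (lookahead finds overlaps).
SEQUON = re.compile(r"(?=N[^P][ST])")


def asp_variant(seq):
    """Replace Asn with Asp at every N-X-S/T sequon found in the original seq."""
    sites = {m.start() for m in SEQUON.finditer(seq)}
    return "".join("D" if i in sites else aa for i, aa in enumerate(seq))


# Cthe0433 relevant sequences as listed in Table 10.
RELEVANT = {
    "1st": "DLNGDGRVNSSDLALMKRYVVKQI",
    "2nd": "DLNGDDKVNSTDYSVLKRYLLRSI",
}


def variants(relevant):
    """Yield (modified labels, {label: sequence}) for every non-empty choice
    of relevant sequences to modify: 1st only, 2nd only, or both."""
    labels = list(relevant)
    for k in range(1, len(labels) + 1):
        for chosen in combinations(labels, k):
            yield chosen, {lab: asp_variant(s) if lab in chosen else s
                           for lab, s in relevant.items()}


for chosen, seqs in variants(RELEVANT):
    print("+".join(chosen), seqs["1st"], seqs["2nd"])
```

Each relevant sequence of Cthe0433 contains one motif under this rule (N-S-S in the first, N-S-T in the second, both at index 8), so the sketch yields three variants.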


Moreover, when the amino acid sequence of a dockerin has 90% or greater similarity to that of any of the known cohesin-binding dockerins shown in Table 7, a preferred dockerin can likewise be obtained by substituting aspartic acid for asparagine at a predicted N-type sugar chain modification site in that dockerin, thereby creating a dockerin-specific sequence.











TABLE 11

Similarity of amino acid sequence (%)

| Locus | Amino acid sequence | Cel8A | Cel9K | Cbh9A | Cel9R | Cel9D | Xyn10C | Cel48S | Cel50 |
|---|---|---|---|---|---|---|---|---|---|
| Cthe0043 | DLNGDGNNSTDFTHLKRA[?]LGNPAPGTNLAAGDLNRDGNTNSTDLM[?]LRRYLLKLI | 91 | 83 | 83 | 76 | 86 | 83 | 84 | 80 |
| Cthe0044 | DNLDGK[?]STDLSALKRH[?]LRITTLSGKQLENADVNNDGSVNSTDASLKKY[?]KAI | 86 | 81 | 88 | 77 | 90 | 81 | 81 | 88 |
| Cthe0109 | DFNSDSSVNSTDLM[?]LNRAVLGLG | 85 | 85 | 85 | 76 | 85 | 90 | 90 | 80 |
| Cthe0211 | DVNGDHVNSSDYSLFKRYLLRV[?]DRFPVGDQSVADVNRDGR[?]DSTDLTMLKRYL[?]RAI | 86 | 83 | 88 | 91 | 77 | 84 | 79 | 83 |
| Cthe0269 | DVNGDGNVNSTDLTMLKRYLLKSVTN[?]NREAADVNRDGA[?]NSSDMTLKRYL[?]KSI | 100 | 87 | 89 | 77 | 88 | 83 | 89 | 85 |
| Cthe0270 | DLNGDGKVNSSDLA[?]LKRYMLRAISDFPPEGRKLADLNRDGNVNSTDYSLKRYLKAI | 88 | 82 | 82 | 87 | 93 | 88 | 83 | 88 |
| Cthe0405 | DVNGDGNVNSTDVVWLRRFLLKLVEDFPVPSGKQAADMNDDGNNSTDM[?]LKRKVLKP | 82 | 79 | 77 | 75 | 89 | 93 | 81 | 82 |
| Cthe0412 | DCNGDGKVNSTDAVALKRY[?]LRSGIS[?]NTDNADVNADGRVNSTDLA[?]LKRY[?]LKEI | 87 | 100 | 96 | 74 | 80 | 81 | 98 | 78 |
| Cthe0413 | DCNDDGKVNSTDVAVMKRYLKKENVNNLDNADVNADGKVNSTDFSLKRYVMKNI | 89 | 96 | 100 | 79 | 78 | 78 | 94 | 81 |
| Cthe0433 | DLNGDGRVNSSDLALMKRYVVKQ[?]EKLNVPVKAADLNGDDKVNSTDYSVLKRYLLRSI | 87 | 87 | 92 | 80 | 88 | 85 | 87 | 88 |
| Cthe0438 | DLNGDNNNSSDYTLLKRYLLHTI | 95 | 95 | 100 | 95 | 95 | 95 | 95 | 95 |
| Cthe0536 | DVNGDGRVNSSDVALLKRYLLGLVEN[?]NKEAADV[?]VSGTVNSTDLA[?]MKRYVLRSI | 94 | 87 | 87 | 74 | 85 | 81 | 87 | 81 |
| Cthe0578 | D[?]NLDGK[?]NSSDVTLLKRY[?]KSDVFPTADPERSL[?]SDVNGDGRVNSTDVSYLKRYVLKII | 77 | 74 | 79 | 100 | 88 | 83 | 74 | 82 |
| Cthe0625 | DLNGDGVVNSTDSV[?]LKRHIKFSEITDPVKLKAADLNGDGNNSSDVSLMKRYLLRII | 86 | 82 | 84 | 80 | 93 | 88 | 84 | 87 |
| Cthe0660 | DLNGDGK[?]NSTDISLMKRYLLKQ[?]DLPVEDD[?]KAAD[?]NKDGKVNSTDMSLKRV[?]LRNY | 88 | 85 | 80 | 82 | 95 | 90 | 86 | 95 |
| Cthe0661 | DVNGDLKVNSTDFSMLRRYLLKT[?]DNFPTENGKQAADLNGDGR[?]NSSDLTMLKRYLLMEV | 86 | 87 | 84 | 79 | 86 | 82 | 88 | 90 |
| Cthe0745 | D[?]NNDKTVNSTDVTYLKRPLLKQ[?]NSLPNQKAADVNLDGNNSTDLV[?]LKRYVLRGI | 91 | 84 | 82 | 74 | 88 | 81 | 84 | 86 |
| Cthe0797 | DVNGDGKNSTDCTMLKRY[?]LRG[?]EEFPSPSG[?]AADVNADLKNSTDLVLMKKYLLRSI | 83 | 80 | 76 | 80 | 90 | 91 | 80 | 88 |
| Cthe0798 | DVNLDGQVNSTDFSLLKRY[?]LKVVD[?]NS[?]NVT[?]ADMNNDGN[?]NSTDIS[?]LKRLLRN | 91 | 86 | 88 | 80 | 89 | 84 | 85 | 84 |
| Cthe0825 | DVN[?]DDGKVNSTDLTLLKRYVLKAVSTLPSSKAEKNADVNRDGRVNSSDVTLSRYLRVI | 88 | 80 | 78 | 88 | 100 | 91 | 81 | 93 |
| Cthe0912 | DVNGDGT[?]NSTDLTMLKRSVLRA[?]LTDDAKARADVDKNGS[?]NSTDVLLLSRYLLRVI | 87 | 82 | 79 | 74 | 91 | 82 | 86 | 85 |
| Cthe1398 | DLNGDNR[?]NSTDLTLMKRYLKS[?]EDLPVEDDLWAAD[?]NGDGK[?]NSTDYTYLKKYLLQAI | 85 | 86 | 84 | 88 | 90 | 88 | 86 | 90 |
| Cthe1838 | DVNGDGRVNSSDLTLMKRYLLKSISDFPTPEGK[?]ADLNEDGKVNSTDLLALKKLVLREL | 83 | 81 | 78 | 83 | 91 | 100 | 83 | 86 |
| Cthe2089 | DVNDDGKVNSTDAVALKRYVLRSGISNTDNADLNEDGRVNSTDLG[?]LKRY[?]LKEI | 89 | 98 | 94 | 74 | 81 | 83 | 100 | 80 |
| Cthe2137 | DVDGNGTVNSTDVNYMKRYLLRQ[?]EEFPYEKALMAGDVDGNGN[?]NSTDLSYLKKY[?]LKLI | 81 | 83 | 80 | 85 | 91 | 84 | 85 | 90 |
| Cthe2179 | DLNGDGNVNSTDS[?]LMKRYLMKSVDLNEEQLKAADVNLDGRVNSTDRSLNRYLLKII | 86 | 85 | 85 | 76 | 90 | 81 | 86 | 85 |
| Cthe2193 | D[?]NDDGN[?]NSTDLQMLKRHLLRS[?]LTEKQLLNADTNRDGRVDSTDLALLKRYLRVI | 81 | 79 | 87 | 91 | 79 | 81 | 86 | 86 |
| Cthe2195 | DLNDDGKVNSTDFQ[?]LKKHLLR[?]LLTGKNLSNADLNKDGKVDSSDLSLMKRYLLQII | 86 | 81 | 86 | 90 | 80 | 83 | 85 | 83 |
| Cthe2196 | DLNNDGKVNSTDFQLLKMHVLRQELPAGTDLSNADVNRDGKVDSSDCTLLKRY[?]LRVI | 82 | 79 | 84 | 91 | 79 | 82 | 85 | 86 |
| Cthe2761 | DVNGDGKVNSTDCS[?]KRYLLKNEDFPYEYGKEAGDVNGDGKVNSTDYSLLKRFVLRNI | 86 | 80 | 83 | 80 | 90 | 86 | 81 | 90 |
| Cthe2811 | DLNGDGKVNSTDLTMKRY[?]LKNFDKLAVPEEAADLNGDGRNSTDLSLHRYLLRII | 86 | 87 | 91 | 82 | 90 | 89 | 87 | 88 |
| Cthe2812 | DLNGDQKVTSTDYTMLKRYLMKS[?]DRFNTSEQAADLNRDGKNSTDLT[?]LKR | 88 | 84 | 86 | 75 | 90 | 81 | 84 | 83 |

[?] indicates data missing or illegible when filed.







The amino acid sequence of each dockerin shown in Table 11 has 90% or greater similarity to the amino acid sequence of at least one of the 8 dockerins shown in Table 7. A preferred dockerin can be obtained by substituting aspartic acid for asparagine at one or more predicted N-type sugar chain modification sites in this amino acid sequence.
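For the Table 11 case the substitution applies to the whole dockerin sequence and to at least one site, so every non-empty subset of the candidate sites is a possible variant. A minimal sketch under the same assumed N-X-S/T (X not proline) sequon rule, using the single-segment Cthe0438 sequence listed in Table 11 (best similarity 100%, to Cbh9A); the helper names are hypothetical:

```python
import re
from itertools import combinations

# Assumed site rule: N-X-S/T with X not proline (lookahead finds overlaps).
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(seq):
    """0-based indices of Asn residues that start an N-X-S/T motif."""
    return [m.start() for m in SEQUON.finditer(seq)]


def substituted_variants(seq):
    """All variants carrying Asp at a non-empty subset of the sequon Asn sites."""
    sites = sequon_positions(seq)
    out = []
    for k in range(1, len(sites) + 1):
        for subset in combinations(sites, k):
            residues = list(seq)
            for i in subset:
                residues[i] = "D"
            out.append((subset, "".join(residues)))
    return out


# Cthe0438 as listed in Table 11.
CTHE0438 = "DLNGDNNNSSDYTLLKRYLLHTI"
for subset, variant in substituted_variants(CTHE0438):
    print(subset, variant)
```

Cthe0438 has two overlapping motifs under this rule (N-N-S at index 6 and N-S-S at index 7, 0-based), giving three variants. Changing only index 7 leaves index 6 reading N-D-S, which the rule still counts as a sequon; the sketch does not decide whether such partial variants are adequate.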


When a relevant sequence of a dockerin has 90% or greater homology with a relevant sequence of any of the known cohesin-binding dockerins shown in Table 7, a preferred dockerin can be obtained by substituting aspartic acid for asparagine at a predicted N-type sugar chain modification site in that relevant sequence. Likewise, a predicted N-type sugar chain modification site in a dockerin whose amino acid sequence has 90% or greater homology with that of such a known dockerin is a preferred candidate for substitution. The dockerins shown in Table 7 themselves also fall within this category, so a preferred dockerin can also be obtained by substituting aspartic acid for asparagine at a predicted N-type sugar chain modification site in a relevant sequence of one of these dockerins.


Another embodiment of a dockerin-specific sequence is a relevant sequence that has no intrinsic predicted N-type sugar chain modification site, taken from one of the dockerins disclosed in Table 1 or from their relevant sequences disclosed in Table 2. It is sufficient for the protein of the invention to have a dockerin containing at least one such dockerin-specific sequence. The following 29 relevant sequences are examples of such dockerin-specific sequences.











TABLE 12

| Locus | Relevant sequence | Amino acid sequence |
|---|---|---|
| Cthe0015 | 1st | DVNADGKIDSTDLTLLKRYLLRSA |
| Cthe0191 | 2nd | DLNGDGKITSDYNLLKRYILHLI |
| Cthe0211 | 2nd | DVNRDGRIDSTDLTMLKRYLIRAI |
| Cthe0246 | 2nd | DLNGDSKVDSTDLTALKRYLLGVI |
| Cthe0258 | 1st | DVNGDSKINAIDVLLMKKYILKVI |
| Cthe0258 | 2nd | DVNADGQ[?]NSIDFTWLKKYMLKAV |
| Cthe0274 | 1st | DLNVDGSINSVDITYMKRYLLRSI |
| Cthe0435 | 2nd | DVNGDNV[?]NDIDCNYLKRYLLHMI |
| Cthe0918 | 1st | DLNRNGIVNDEDY[?]LLKNYLLRGN |
| Cthe1472 | 2nd | DLNRDNKVDSTDLT[?]LKRYLLKAI |
| Cthe1806 | 1st | EV[?]DTKV[?]DSTDDIVKYEYQFDKK |
| Cthe1890 | 2nd | DLNGDGKVTSTDYSLMKRYLLKEI |
| Cthe2038 | 1st | DIVLDGNINSLDMMKLKKYLIRET |
| Cthe2147 | 1st | DVNGDFAVNSNDLTLIKRYVLKNI |
| Cthe2147 | 2nd | DVDGDEKITSSDAALVKRYVLRAI |
| Cthe2193 | 2nd | DTNRDGRVDSTDLALLKRY[?]LRVI |
| Cthe2194 | 2nd | DVTKDGKVDSTDLTLLKRY[?]LRFV |
| Cthe2195 | 2nd | DLNKDGKVDSSDLSLMKRYLLQII |
| Cthe2196 | 2nd | DVNRDGKVDSSDCTLLKRYILRVI |
| Cthe2197 | 2nd | DVNRDGKVDSTDVALLKRYILRQI |
| Cthe2271 | 1st | DVNLDGSVDS[?]DLALLYNTTYYAV |
| Cthe2271 | 2nd | DVNGDGTVDG[?]DLAIITAY[?]NGQI |
| Cthe2549 | 2nd | DVDGNGSVSSLDLTYLKRYILRRI |
| Cthe2590 | 1st | DLNQDGQVSSTDLVAMKRYLLKNF |
| Cthe2812 | 1st | DLNGDQKVTSTDYTMLKRYLMKSI |
| Cthe2879 | 2nd | DTDGDGKITS[?]DLSYLKRYVLRLI |
| Cthe2972 | 2nd | DLNGDGKVDSTDLMILHRYLLGII |
| Cthe3136 | 1st | DIDGNGEISSIDYA[?]LKSHL[?]NSN |
| Cthe3136 | 2nd | DVDGNGYVNSIDLA[?]LQMYLLGKG |

[?] indicates data missing or illegible when filed.







The protein of the invention may be provided with a dockerin comprising one or two of the dockerin-specific sequences shown in Table 12, and typically a dockerin that inherently has such a dockerin-specific sequence is preferred. Examples of such dockerins are those shown in the following tables. In these tables, the dockerins are specified by their relevant sequences. In these dockerin-specific sequences, an amino acid at a site corresponding to a predicted N-type sugar chain modification site is preferably aspartic acid. A dockerin having one or two such dockerin-specific sequences is preferred.










TABLE 13

| Locus | Relevant sequence | Amino acid sequence |
|---|---|---|
| Cthe0258 | 1st | DVNGDSK[?]NAIDVLLMKKYILKVI |
| Cthe0258 | 2nd | DVNADGQINSIDFTWLKKYMLKAV |
| Cthe2147 | 1st | DVNGDFAVNSNDLTL[?]KRYVLKNI |
| Cthe2147 | 2nd | DVDGDEKITSSDAALVKRYVLRAI |
| Cthe2271 | 1st | DVNLDGSVDSIDLALLYNTTYYAV |
| Cthe2271 | 2nd | DVNGDGTVDGIDLAIITAY[?]NGQI |
| Cthe3136 | 1st | DIDGNGEISSIDYA[?]LKSHLNSN |
| Cthe3136 | 2nd | DVDGNGYVNSIDLAILQMYLLGKG |
| Cthe3141 | 1st | DVNGNGS[?]ESTDCVWVKRYLLKQI |
| Cthe3141 | 2nd | DVNGNGT[?]DSTDYQLLKRF[?]LKVI |

[?] indicates data missing or illegible when filed.







Each dockerin shown in Table 13 has two relevant sequences, and neither relevant sequence contains a predicted N-type sugar chain modification site.
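Whether a relevant sequence lacks intrinsic predicted sites can be screened with a motif scan, but a plain sequon search is only an approximation of the prediction used in the source. The sketch below assumes the N-X-S/T (X not proline) rule and checks a few fully legible sequences taken from Tables 12 and 13 (the first relevant sequence of Cthe2271 is taken from Table 13, where it is legible). It flags N-T-T in that Cthe2271 sequence and N-G-S in the second relevant sequence of Cthe2549, although Table 12 lists both as having no predicted site, so the source's predictor evidently applies further criteria. Treat a flag as a prompt to check against the actual prediction method, not as a contradiction of the tables.

```python
import re

# Assumed site rule: N-X-S/T with X not proline (lookahead finds overlaps).
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequons(seq):
    """Return (0-based index, three-residue motif) for every N-X-S/T match."""
    return [(m.start(), seq[m.start():m.start() + 3]) for m in SEQUON.finditer(seq)]


# Fully legible relevant sequences that Tables 12/13 list as having no
# intrinsic predicted N-type sugar chain modification site.
LISTED_WITHOUT_SITES = {
    "Cthe0258 1st": "DVNGDSKINAIDVLLMKKYILKVI",
    "Cthe0258 2nd": "DVNADGQINSIDFTWLKKYMLKAV",
    "Cthe2147 1st": "DVNGDFAVNSNDLTLIKRYVLKNI",
    "Cthe2147 2nd": "DVDGDEKITSSDAALVKRYVLRAI",
    "Cthe2271 1st": "DVNLDGSVDSIDLALLYNTTYYAV",
    "Cthe2549 2nd": "DVDGNGSVSSLDLTYLKRYILRRI",
}

for name, seq in LISTED_WITHOUT_SITES.items():
    hits = sequons(seq)
    print(f"{name}: {'no sequon' if not hits else hits}")
```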


When a preferred dockerin whose dockerin-specific sequence has no intrinsic predicted N-type sugar chain modification site is selected for the protein of the invention, the C. thermocellum type I dockerins shown in Table 7, whose cohesin binding ability has been confirmed in the existing literature or elsewhere, serve as the reference. A dockerin having a relevant sequence with 90% or greater similarity to any relevant sequence of these dockerins can be used as a preferred dockerin if that relevant sequence is a natural dockerin-specific sequence, that is, one in which aspartic acid naturally occupies the site corresponding to a predicted N-type sugar chain modification site.
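The "natural dockerin-specific sequence" criterion can be illustrated positionally: at each position where a reference relevant sequence carries the asparagine of a sequon, the candidate should carry aspartic acid. Table 7 is not reproduced in this section, so the sketch uses the first relevant sequence of Cthe0269 (Table 10) purely as a stand-in reference and the first relevant sequence of Cthe0015 (Table 12) as the candidate. It assumes an ungapped alignment of equal-length sequences and the N-X-S/T (X not proline) sequon rule; the function names are hypothetical.

```python
import re

# Assumed site rule: N-X-S/T with X not proline (lookahead finds overlaps).
SEQUON = re.compile(r"(?=N[^P][ST])")


def natural_asp_sites(candidate, reference):
    """Map each sequon Asn index in `reference` to the candidate's residue at
    the same index. Assumes the sequences are aligned without gaps."""
    if len(candidate) != len(reference):
        raise ValueError("this sketch handles ungapped, equal-length sequences only")
    return {m.start(): candidate[m.start()] for m in SEQUON.finditer(reference)}


def is_natural_dockerin_specific(candidate, reference):
    """True if every reference sequon position holds Asp in the candidate
    and the candidate itself contains no sequon under the assumed rule."""
    sites = natural_asp_sites(candidate, reference)
    return (bool(sites)
            and all(aa == "D" for aa in sites.values())
            and not SEQUON.search(candidate))


reference = "DVNGDGNVNSTDLTMLKRYLLKSV"  # Cthe0269 1st (Table 10), stand-in only
candidate = "DVNADGKIDSTDLTLLKRYLLRSA"  # Cthe0015 1st (Table 12)
print(natural_asp_sites(candidate, reference))
print(is_natural_dockerin_specific(candidate, reference))
```

Here the reference's N-S-T motif at index 8 lines up with D-S-T in the candidate. A real comparison would first align the sequences (they may differ in length) and would use the same site predictor as the source.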












TABLE 14

Similarity of amino acid sequence (%)

| Locus | Relevant sequence | Amino acid sequence of repeated region | Cel8A 1st | Cel8A 2nd | Cel9K 1st | Cel9K 2nd | Cbh9A 1st | Cbh9A 2nd | Cel9R 1st | Cel9R 2nd | Cel9D 1st | Cel9D 2nd | Xyn10C 1st | Xyn10C 2nd | Cel48S 1st | Cel48S 2nd | Cel50 1st | Cel50 2nd |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Cthe0258 | 1st | DVNGDSKNA[?]DVLLMKKYLKVI | 91 | 79 | 90 | 91 | 86 | 87 | 83 | 87 | 95 | 87 | 87 | 90 | 95 | 91 | 79 | 91 |
| Cthe0258 | 2nd | DVNADGQNS[?]DFTWLKKYMLKAY | 95 | 87 | 86 | 95 | 81 | 95 | 83 | 95 | 95 | 83 | 91 | 87 | 91 | 95 | 87 | 87 |
| Cthe2147 | 1st | DVNGDFAVNSNDLTLKRYVLKNI | 95 | 91 | 78 | 91 | 82 | 91 | 87 | 79 | 91 | 83 | 91 | 79 | 82 | 91 | 100 | 87 |
| Cthe2147 | 2nd | DVDGDEKIISSDAALVKRYVLRAI | 95 | 87 | 91 | 95 | 96 | 95 | 95 | 87 | 95 | 95 | 95 | 83 | 95 | 95 | 87 | 100 |
| Cthe2271 | 1st | DVNLDGSVDSDLALLYNTTYYAV | 93 | 87 | 83 | 93 | 100 | 93 | 93 | 81 | 93 | 87 | 87 | 81 | 81 | 93 | 81 | 81 |
| Cthe2271 | 2nd | DVNGDGTVDG[?]DLAIITAY[?]NGQI | 95 | 85 | 80 | 83 | 90 | 87 | 85 | 80 | 95 | 80 | 85 | 80 | 85 | 83 | 79 | 85 |
| Cthe3136 | 1st | D[?]DGNGEISS[?]DYA[?]LKSHL[?]NSN | 95 | 86 | 86 | 90 | 87 | 100 | 86 | 90 | 95 | 80 | 86 | 80 | 91 | 90 | 86 | 91 |
| Cthe3136 | 2nd | DVDGNGYVNS[?]DLA[?]LQMYLLGKG | 95 | 85 | 79 | 95 | 90 | 95 | 85 | 85 | 95 | 80 | 90 | 80 | 83 | 95 | 85 | 85 |
| Cthe3141 | 1st | DVNGNGS[?]ESTDCVWVKRYLLKQI | 87 | 83 | 86 | 91 | 82 | 87 | 83 | 91 | 91 | 83 | 87 | 87 | 90 | 87 | 87 | 91 |
| Cthe3141 | 2nd | DVNGNGT[?]DSTDYQLLKRF[?]LKVI | 87 | 83 | 81 | 87 | 86 | 91 | 83 | 87 | 91 | 83 | 83 | 86 | 86 | 83 | 83 | 95 |

[?] indicates data missing or illegible when filed.







Each dockerin shown in Table 14 has two relevant sequences, and each relevant sequence has 90% or greater similarity to a relevant sequence of at least one of the 8 dockerins shown in Table 7, whose cohesin-binding ability has been confirmed in the existing literature or elsewhere. Moreover, one or both of these relevant sequences are natural dockerin-specific sequences. Where a relevant sequence is not a natural dockerin-specific sequence, aspartic acid may be substituted for the asparagine at its predicted N-type sugar chain modification site.
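The 90% criterion can be checked mechanically against Table 14. The sketch below transcribes the Table 14 similarity values, assuming the column order Cel8A, Cel9K, Cbh9A, Cel9R, Cel9D, Xyn10C, Cel48S, Cel50, each split into its 1st and 2nd repeated region; this order is a reading of the flattened table. The names `TABLE14` and `best_reference` are illustrative only.

```python
# Reference columns in the assumed Table 14 order: 1st then 2nd repeated
# region of each known dockerin from Table 7.
REFERENCES = [f"{d} {r}"
              for d in ("Cel8A", "Cel9K", "Cbh9A", "Cel9R",
                        "Cel9D", "Xyn10C", "Cel48S", "Cel50")
              for r in ("1st", "2nd")]

# Similarity values (%) transcribed from Table 14.
TABLE14 = {
    "Cthe0258 1st": [91, 79, 90, 91, 86, 87, 83, 87, 95, 87, 87, 90, 95, 91, 79, 91],
    "Cthe0258 2nd": [95, 87, 86, 95, 81, 95, 83, 95, 95, 83, 91, 87, 91, 95, 87, 87],
    "Cthe2147 1st": [95, 91, 78, 91, 82, 91, 87, 79, 91, 83, 91, 79, 82, 91, 100, 87],
    "Cthe2147 2nd": [95, 87, 91, 95, 96, 95, 95, 87, 95, 95, 95, 83, 95, 95, 87, 100],
    "Cthe2271 1st": [93, 87, 83, 93, 100, 93, 93, 81, 93, 87, 87, 81, 81, 93, 81, 81],
    "Cthe2271 2nd": [95, 85, 80, 83, 90, 87, 85, 80, 95, 80, 85, 80, 85, 83, 79, 85],
    "Cthe3136 1st": [95, 86, 86, 90, 87, 100, 86, 90, 95, 80, 86, 80, 91, 90, 86, 91],
    "Cthe3136 2nd": [95, 85, 79, 95, 90, 95, 85, 85, 95, 80, 90, 80, 83, 95, 85, 85],
    "Cthe3141 1st": [87, 83, 86, 91, 82, 87, 83, 91, 91, 83, 87, 87, 90, 87, 87, 91],
    "Cthe3141 2nd": [87, 83, 81, 87, 86, 91, 83, 87, 91, 83, 83, 86, 86, 83, 83, 95],
}

THRESHOLD = 90  # percent similarity criterion stated in the text


def best_reference(values):
    """Return (reference name, similarity) of the most similar known repeat.

    Ties are resolved in favour of the first column in table order.
    """
    idx = max(range(len(values)), key=values.__getitem__)
    return REFERENCES[idx], values[idx]


if __name__ == "__main__":
    for name, values in TABLE14.items():
        ref, sim = best_reference(values)
        verdict = "meets" if sim >= THRESHOLD else "does not meet"
        print(f"{name}: closest to {ref} ({sim}%), {verdict} the {THRESHOLD}% criterion")
```

Run over the transcribed values, every relevant sequence meets the criterion; for example, the 1st relevant sequence of Cthe2271 is closest to the 1st repeated region of Cbh9A (100%).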


Moreover, a dockerin whose amino acid sequence has 90% or greater similarity to the amino acid sequence of any of the aforementioned known dockerins can be used as a preferred dockerin if at least one of its intrinsic relevant sequences is a natural dockerin-specific sequence in which the site corresponding to a predicted N-type sugar chain modification site is occupied by aspartic acid.











TABLE 15

Locus     Amino acid sequence
Cthe1806  EV[?]DTKVDSTDD[?]KYEYQFDKK[?]LCADKETE[?]LYFTVVADEEE[?]TSDNTRTLVLSVNNDSTDKTTVSGY
Cthe2147  DVNGDFAVNSNDLTL[?]KRYVLKNDEFPSSHGLKAADVDGDEKITSSDAALVKRYVLRAI
Cthe3136  D[?]DGNGEISS[?]DYA[?]LKSHL[?]NSNLTFKQLAAADVDGNGYVNS[?]DLA[?]LQMYLLGKGGTSDI
Cthe3141  DVNGNGS[?]ESTDCVWVKRYLLKQ[?]DSFPNENGARAADVNGNGT[?]DSTDYQLLKRF[?]LKVI

Similarity of amino acid sequence (%)

Locus     Cel8A  Cel9K  Cbh9A  Cel9R  Cel9D  Xyn10C  Cel48S  Cel50
Cthe1806  67     75     91     83     75     70      70      100
Cthe2147  85     78     81     82     93     86      80      100
Cthe3136  87     88     90     73     80     73      90      77
Cthe3141  75     84     80     90     80     75      76      91

[?] indicates data missing or illegible when filed.







Each dockerin shown in Table 15 has an amino acid sequence with 90% or greater similarity to the amino acid sequence of at least one of the 8 dockerins shown in Table 7. These dockerins can be used as preferred dockerins because they have natural dockerin-specific sequences in which at least one site corresponding to an N-type sugar chain modification site in the amino acid sequence is occupied by aspartic acid. Where another relevant sequence has asparagine at a predicted N-type sugar chain modification site, aspartic acid can be substituted for that asparagine.
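The asparagine-to-aspartic acid substitution described above can be sketched at the sequence level. This sketch again assumes the Asn-X-Ser/Thr consensus (X not proline) as the definition of a predicted site, which this Description does not specify; the function `asn_to_asp` and the fragment in the usage note are hypothetical and are not taken from the tables.

```python
import re

# Assumed consensus for a predicted N-type sugar chain modification site:
# Asn-X-Ser/Thr with X != Pro. The lookahead also reports overlapping sites.
SEQUON = re.compile(r"(?=N[^P][ST])")


def asn_to_asp(seq: str) -> tuple[str, list[int]]:
    """Replace the Asn of every N-X-S/T sequon with Asp.

    Sites are located on the input sequence before any substitution is made.
    Returns the substituted sequence and the 1-based positions that changed.
    """
    seq = seq.upper()
    positions = [m.start() for m in SEQUON.finditer(seq)]
    residues = list(seq)
    for i in positions:
        residues[i] = "D"
    return "".join(residues), [i + 1 for i in positions]


if __name__ == "__main__":
    # Hypothetical fragment: only the Asn at position 9 starts a sequon (N-S-T).
    print(asn_to_asp("DVNGDGKVNSTDL"))
```

For this fragment the call returns `("DVNGDGKVDSTDL", [9])`; the Asn at position 3 is followed by G-D and is left unchanged.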


The protein of the invention can be provided with an active site in addition to the dockerin. The type of active site can be selected appropriately according to the use. The protein of the invention can also be an artificial protein in which a dockerin is suitably combined with an active site. A cellulase that is a constituent protein of a cellulosome and already has an intrinsic dockerin can also be used either as is or after modifications.


The protein of the invention can have cellulolysis promoting activity for example when it is used to saccharify a cellulose-containing material from biomass. That is, it can be provided with a cellulolysis-promoting active site. Examples of cellulolysis-promoting activity include cellulase activity, cellulose-binding activity, cellulose loosening activity and the like.


An active site in a known cellulase can be used appropriately as a cellulase active site. Examples of cellulases include endoglucanase (EC 3.2.1.74), cellobiohydrolase (EC 3.2.1.91) and β-glucosidase (EC 23.2.4.1, EC 3.2.1.21). Cellulases are classified into 13 families (5, 6, 7, 8, 9, 10, 12, 44, 45, 48, 51, 61, 74) of the GHF (glycoside hydrolase family) (http://www.cazy.org/fam/acc.gh.html) based on similarity of amino acid sequence. It is also possible to combine cellulases of the same or different kinds classified into different families.


A cellulase is not particularly limited but is preferably one that itself has strong cellulase activity. Examples of such cellulases include those derived from filamentous fungi such as Phanerochaete, Trichoderma reesei and other Trichoderma species, Fusarium, Trametes, Penicillium, Humicola, Acremonium and Aspergillus; from bacteria such as Clostridium, Pseudomonas, Cellulomonas, Ruminococcus and Bacillus; from archaea such as Sulfolobus; and from actinomycetes such as Streptomyces and Thermoactinomyces. These cellulases or their active sites may also be artificially modified.


Because the protein of the invention is derived from C. thermocellum, its cellulolysis-promoting activity is preferably conferred by an amino acid sequence derived from Clostridium thermocellum.


From the standpoint of effective use of biomass, the protein of the invention may be provided with a hemicellulase active site, or with the active site of a lignin-decomposing enzyme such as lignin peroxidase, manganese peroxidase or laccase. Other examples include the cellulose-loosening proteins expansin and swollenin, cellulose-binding domains (proteins) that are constituents of cellulosomes and cellulases, and xylanase and other biomass-decomposing enzymes. All these proteins can improve the accessibility of the cellulase to cellulose.


This protein is preferably provided with the function of extracellular secretability in eukaryotic microorganisms. That is, it is preferably a protein that is produced as a secretory protein in eukaryotic microorganisms. Cellulase and other enzymes often have intrinsic signals for extracellular secretion. A known secretion signal can be used to confer extracellular secretability on a dockerin protein. The secretion signal is selected appropriately according to the type of eukaryotic microorganism. Secretion signals and the like will be explained below.


A person skilled in the art will be able to produce the protein of the invention by genetic recombination or the like in a suitable host microorganism, or obtain it by chemical synthesis.


As explained above, because the protein of the invention has a specific dockerin it has improved binding ability with type I cohesins from C. thermocellum, and may have improved accumulation and accumulated density on scaffolding proteins with such cohesins.


(Scaffolding Protein Having Type I Cohesin from C. Thermocellum)


The protein of the present invention is suitable as a protein for constructing a complex with a scaffolding protein having a type I cohesin from C. thermocellum. Such a scaffolding protein can be provided with one or more type I cohesins from C. thermocellum. Cohesins are known as domains on type I and other scaffolding proteins that bind non-covalently to cellulases and other enzymatically active proteins in the cellulosomes formed by cellulosome-producing microorganisms (Sakka et al., Protein, Nucleic Acid and Enzyme, Vol. 44, No. 10 (1999), pp. 41-50; Demain, A. L. et al., Microbiol. Mol. Biol. Rev., 69(1), 124-54 (2005); Doi, R. H. et al., J. Bacteriol., 185(20), 5907-5914 (2003), etc.). A scaffolding protein from C. thermocellum for binding with the protein of the invention has at least a type I cohesin domain on a type I scaffolding protein. It may also be provided with a type II cohesin domain on a type II scaffolding protein and a type III cohesin domain on a type III scaffolding protein. The sequences of a number of such cohesin domains of different types have been determined in various cellulosome-producing microorganisms. The amino acid sequences and DNA sequences of these various types of cohesins can be easily obtained from various protein databases and DNA sequence databases accessible via the NCBI HP (http://www.ncbi.nlm.nih.gov/).


A scaffolding protein having a cohesin from C. thermocellum need not itself be a scaffolding protein from C. thermocellum as long as it has a type I cohesin from C. thermocellum, and may be an artificial protein. The scaffolding protein may have a natural type I cohesin from C. thermocellum, or may have a modified cohesin with one or two or more mutations (additions, insertions, deletions or substitutions) introduced in the amino acid sequence of such a cohesin as long as binding ability is retained. Multiple such cohesins or the like may also be provided at suitable intervals in the cohesin protein. The amino acid sequence of a type I scaffolding protein and such a sequence with suitable mutations introduced therein can be used for the total amino acid sequence of the cohesin protein, and for the amino acid sequences between cohesins if such are present.


The scaffolding protein may also have a cellulose binding domain (CBD) of a scaffolding protein selected from types I to III. CBDs are known as domains in scaffolding proteins that bind to cellulose substrates (see Sakka et al. above). There may be one or more cellulose binding domains. Many amino acid sequences and DNA sequences of CBDs in the cellulosomes of various cellulosome-producing microorganisms have already been determined. These CBD amino acid sequences and DNA sequences can be easily obtained from various protein databases and DNA sequence databases accessible through the NCBI HP (http://www.ncbi.nlm.nih.gov/) and the like.


The scaffolding protein preferably has extracellular secretability or cell surface display properties in eukaryotic microorganisms. That is, it is preferably a protein that is produced as a secretory protein in eukaryotic microorganisms, or a protein that is displayed on the cell surfaces of eukaryotic microorganisms. A known secretory signal or surface display system can be used to give a cohesin protein extracellular secretability or cell surface display properties.


A person skilled in the art can produce a scaffolding protein having these various domains as necessary by genetic recombination or the like in a suitable host microorganism. Cohesin proteins having cohesin domains of these various scaffolding proteins can also be obtained by chemical synthesis.


As explained above, because the protein of the invention has a specific dockerin, it has improved binding with type I cohesins from C. thermocellum, and may have enhanced accumulation and accumulated density on scaffolding proteins with such cohesins.


(Protein Complex)


The disclosures of this Description also provide a protein complex comprising a scaffolding protein having a type I cohesin from C. thermocellum and the protein of the invention bound to this scaffolding protein. This protein complex has enhanced activity of the protein of the invention because the accumulated amount and/or accumulated density of the protein of the invention is greater.


(Eukaryotic Microorganism Provided with Protein Complex on Cell Surface)


The eukaryotic microorganism disclosed in this Description has the protein complex disclosed in this Description on its cell surface. In this eukaryotic microorganism, the scaffolding protein and the protein of the invention making up the protein complex may be supplied from outside the cell and self-assembled on the cell surface to construct the protein complex, but preferably the microorganism produces these proteins itself. This is because the protein of the invention escapes, or is less subject to, modification by the host's sugar chain modification system even when it is produced within a eukaryotic microorganism, so that its cohesin binding is improved.


When the protein of the invention has cellulase activity or other cellulolysis promotion activity, a protein complex comprising accumulated proteins having cellulase or other cellulolysis promotion activity can be constructed on the cell surface of the eukaryotic microorganism. Such a eukaryotic microorganism can use glucose obtained by decomposition and saccharification of a cellulose-containing material on its cell surface as a carbon source.


There are no particular limits on how the DNA coding for such a protein is retained within the host microorganism as long as it is able to express the protein. For example, it can be linked under the control of a promoter capable of operating in the eukaryotic microorganism, and with a suitable terminator located downstream therefrom. The promoter may be a constitutive promoter or an inducible promoter. In this state, the DNA may be incorporated into a host chromosome, or may be in the form of a 2μ plasmid held within the host nucleus or a plasmid held outside the nucleus. In general, a selection marker gene that is usable in the host is retained at the same time when introducing such exogenous DNA.


The dockerin proteins and cohesin proteins produced in the eukaryotic microorganism are preferably given extracellular secretability or cell surface display properties. Preferably, the protein of the invention is given extracellular secretability and the scaffolding protein is given cell surface display properties, so that the protein of the invention is secreted outside the cell and the scaffolding protein is displayed on the cell surface. To give a protein extracellular secretability, it is provided with a secretory signal. Examples of secretory signals include the secretory signals of the Rhizopus oryzae and C. albicans glucoamylase genes, the yeast invertase leader, the α-factor leader and the like. By using an agglutinin or a part thereof, a protein can be secreted in such a way that it is displayed on the surface of the eukaryotic microorganism. One example is a peptide consisting of 320 amino acid residues of the 5′ region of the SAG1 gene, which codes for the agglutinin α-agglutinin. Polypeptides and methods for displaying desired proteins on cell surfaces are disclosed in WO 01/79483, Japanese Patent Application Laid-open No. 2003-235579, WO 2002/042483 pamphlet, WO 2003/016525 pamphlet, Japanese Patent Application Laid-open No. 2006-136223, and the publications of Fujita et al. (Fujita et al., 2004, Appl. Environ. Microbiol. 70:1207-1212 and Fujita et al., 2002, Appl. Environ. Microbiol. 68:5136-5141), and Murai et al., 1998, Appl. Environ. Microbiol. 64:4857-4861.


The eukaryotic microorganism is not particularly limited, and for example various known yeasts can be used. For purposes of ethanol fermentation and the like as discussed below, examples include Saccharomyces cerevisiae and other Saccharomyces yeasts, Schizosaccharomyces pombe and other Schizosaccharomyces yeasts, Candida shehatae and other Candida yeasts, Pichia stipitis and other Pichia yeasts, Hansenula yeasts, Trichosporon yeasts, Brettanomyces yeasts, Pachysolen yeasts, Yamadazyma yeasts, and Kluyveromyces marxianus, Kluyveromyces lactis and other Kluyveromyces yeasts. Of these, a Saccharomyces yeast is desirable from the standpoint of industrial utility and the like, and Saccharomyces cerevisiae is especially desirable.


A eukaryotic microorganism expressing an exogenous protein can be prepared according to the methods described in Molecular Cloning, 3rd Ed., Current Protocols in Molecular Biology and the like. Vectors, and methods for constructing vectors, for expressing the protein of the invention and the scaffolding protein in a eukaryotic microorganism are likewise well known to those skilled in the art. The vector can take various forms according to the mode of use, for example a DNA fragment, a 2 micron plasmid or another suitable yeast vector. The eukaryotic microorganism disclosed in this Description can be obtained by transforming a eukaryotic microorganism with such a vector. Various conventionally known methods can be adopted for transformation, such as transfection, conjugation, protoplast methods, electroporation, lipofection and lithium acetate methods.


(Method for Producing Useful Substance)


The method for producing a useful substance disclosed in this Description may comprise a step of saccharifying and fermenting a cellulose-containing material, in which the eukaryotic microorganism disclosed in this Description, carrying a protein of the invention that has cellulolysis-promoting activity, ferments the cellulose-containing material as a carbon source. With this method, the eukaryotic microorganism can directly decompose and saccharify the cellulose-containing material and use the resulting glucose or the like. The useful substance produced in this fermentation step depends on the useful substance production ability of the eukaryotic microorganism used.


The useful substance is a product obtained by fermentation of glucose or the like by the eukaryotic microorganism, and differs according to both the type of eukaryotic microorganism and the fermentation conditions. The useful substance is not particularly limited and can be any substance produced by yeasts and other eukaryotic microorganisms from glucose. It may also be a compound that is not an intrinsic metabolite but that the yeast or other eukaryotic microorganism has been made capable of synthesizing through genetically engineered substitution, addition or the like of one or more enzymes in its glucose metabolism system. Examples of useful substances include ethanol as well as C3-5 lower alcohols, lactic acid and other organic acids, fine chemicals obtained by adding isoprenoid synthesis pathways (coenzyme Q10, vitamins, other raw materials and the like), glycerin, plastics, synthetic raw materials and the like obtained by modifying the glycolytic system, and other materials used in biorefinery technology. The useful substance production step may be followed by a step of collecting a fraction containing the useful substance from the culture liquid, and a further step of purifying or concentrating this fraction. The collection, purification and other steps can be selected appropriately according to the type of useful substance and the like.


Proteins of the invention retained as protein complexes on the surface of the eukaryotic microorganism preferably have two or more cellulolysis-promoting activities. For example, it is desirable to use two or more cellulases having endoglucanase and cellobiohydrolase or other activity, respectively.


A cellulose-containing material is a material containing cellulose, a β-glucan consisting of D-glucose units condensed through β-1,4 glycosidic bonds. The cellulose-containing material may be any containing cellulose, regardless of derivation or form. Consequently, the cellulose-containing material may include lignocellulose material, crystalline cellulose material, soluble cellulose material (amorphous cellulose material), insoluble cellulose material and various other cellulose materials and the like for example. Examples of lignocellulose materials include lignocellulose materials comprising complexes of lignin and the like in the wood and leaves of woody plants and the leaves, stalks, roots and the like of herbaceous plants. These lignocellulose materials may be rice straw, wheat straw, corn stalks, bagasse and other agricultural waste, collected wood, brush, dried leaves and the like and chips obtained by grinding these, sawdust, chips and other sawmill waste, forest thinnings, damaged wood and other forest waste, and construction waste and other waste products. Examples of crystalline cellulose materials and insoluble cellulose materials include crystalline or insoluble cellulose materials containing crystalline cellulose and insoluble cellulose after separation of lignin and the like from lignocellulose materials. Cellulose materials may also be derived from used paper containers, used paper, used clothes and other used fiber materials and pulp wastewater.


Prior to being brought into contact with a cellulase, cellulose-containing material may also be subjected to suitable pretreatment or the like in order to facilitate decomposition by the cellulase. For example, the cellulose can be partially hydrolyzed to render it amorphous or reduce its molecular weight under acidic conditions using an inorganic acid such as sulfuric acid, hydrochloric acid, phosphoric acid, nitric acid or the like. It can also be treated with supercritical water, alkali, pressurized hot water or the like to render it amorphous or reduce its molecular weight.


Cellulose-containing materials include polymers and derivatives of polymers of glucose units condensed through β-1,4-glycosidic bonds. The degree of glucose polymerization is not particularly limited. Derivatives include carboxymethylated, aldehyded, esterified and other derivatives. The cellulose may be either crystalline cellulose or amorphous cellulose.


As understood in the technical field, identity or similarity in this Description signifies a relationship between two or more proteins or polynucleotides, determined by comparing their sequences. In this field, "identity" signifies the degree of sequence invariance between proteins or polynucleotides as determined by an alignment between them or, in some cases, by an alignment among a series of such sequences. "Similarity" signifies the degree of correlation between protein or polynucleotide sequences as determined by such an alignment. More specifically, it is determined from the identity together with the conservation (substitutions that maintain the physical characteristics of a sequence or of specific amino acids in it) of the sequences. Similarity values referred to herein correspond to the similarity reported in BLAST sequence homology search results. The method of determining identity or similarity is preferably designed to give the longest possible alignment between the sequences. Methods for determining identity or similarity are provided by publicly available programs. For example, they can be determined using the BLAST (Basic Local Alignment Search Tool) program of Altschul et al. (for example, Altschul S F, Gish W, Miller W, Myers E W, Lipman D J, J. Mol. Biol. 215:403-410 (1990); Altschul S F, Madden T L, Schaffer A A, Zhang J, Miller W, Lipman D J, Nucleic Acids Res. 25:3389-3402 (1997)). The conditions for using software such as BLAST are not particularly limited, but the default values are preferably used.
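To make the difference between identity and similarity concrete, the sketch below scores a pair of sequences that have already been aligned. The residue groups treated as conservative substitutions are an illustrative assumption and not BLAST's scoring: BLAST derives similarity from a substitution matrix (BLOSUM62 by default for protein searches), so values from this hypothetical helper will not reproduce the tables above.

```python
# Illustrative groups of residues treated as conservative substitutions.
# This grouping is an assumption for demonstration, not BLOSUM62.
GROUPS = ["ILMV", "FWY", "KR", "DE", "ST"]
GROUP_OF = {aa: g for g in GROUPS for aa in g}


def identity_and_similarity(a: str, b: str) -> tuple[float, float]:
    """Percent identity and percent similarity of two pre-aligned sequences.

    '-' marks a gap. Columns where both sequences have a gap are skipped;
    a gap opposite a residue counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    identical = similar = columns = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue
        columns += 1
        if x == y:
            identical += 1
            similar += 1
        elif x in GROUP_OF and GROUP_OF.get(y) == GROUP_OF[x]:
            similar += 1
    if columns == 0:
        raise ValueError("alignment has no informative columns")
    return 100 * identical / columns, 100 * similar / columns


if __name__ == "__main__":
    print(identity_and_similarity("DVNGDGKVNS", "DVDGNGEINS"))
```

For this aligned pair, six of the ten columns are identical and one more (V opposite I) is a conservative substitution, so the result is 60% identity and 70% similarity.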


EXAMPLES

The present invention is explained in detail below using examples, but the present invention is not limited by these examples. The gene recombination operations below were performed in accordance with Molecular Cloning: A Laboratory Manual (T. Maniatis et al., Cold Spring Harbor Laboratory).


Example 1

An aga1 gene was amplified and cloned by ordinary PCR methods, and a pAI-AGA1 vector (FIG. 1) was prepared having an AAP1 homologous region and a HOR7 promoter upstream of the aga1 gene, and a Tdh3 terminator, His3 marker and AAP1 homologous region downstream of it. The yeast S. cerevisiae BY4741 was transformed with this vector to obtain a BY-AGA1 yeast displaying large quantities of aga1 on the cell surface.


Example 2

CBD-cohesin was amplified and cloned by ordinary PCR methods from the C. thermocellum genome (SEQ ID NO. 215). A pDL-CtCBDCohAGA2 vector was then prepared having an ADH3 homologous region and HOR7 promoter upstream and a V5-tag, aga2, Tdh3 terminator, Leu2 marker and ADH3 homologous region downstream from the resulting gene (FIG. 2). The resulting vector was introduced into the BY-AGA1 yeast prepared in Example 1, to obtain a CtCBDcoh yeast displaying cohesin from C. thermocellum on the cell surface.


Example 3

The Cel48S dockerin gene was amplified and cloned by ordinary PCR methods from the C. thermocellum genome (SEQ ID NO. 216). Using the resulting Cel48S dockerin gene as a template, a gene having alanine substituted for asparagines No. 18 and No. 50 was obtained with the two primers 48Sdock-N18A-Fw and 48Sdock-N50A-Rv (SEQ ID NOS. 217, 218). In the same way, a gene having aspartic acid substituted for asparagines No. 18 and No. 50 was obtained with the two primers 48Sdock-N18D-Fw and 48Sdock-N50D-Rv (SEQ ID NOS. 219, 220), again with the Cel48S dockerin gene as the template (FIG. 3). Vectors pXU-Cel48Sdoc, pXU-Cel48S-N-A-doc and pXU-Cel48S-N-D-doc were prepared, each having an HXT3 homologous region, HOR7 promoter and His-tag upstream, and a Tdh3 terminator, Ura3 marker and HXT3 homologous region downstream, of the respective gene (FIG. 4). The resulting vectors were introduced into the CtCBDcoh yeast obtained in Example 2 to obtain CtCBDcoh48Sdoc, CtCBDcoh48SdocN-A and CtCBDcoh48SdocN-D, each displaying cohesin from C. thermocellum on the cell surface and simultaneously producing a dockerin or an amino acid-substituted dockerin.
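The primer sequences (SEQ ID NOS. 217 to 220) are not reproduced here, but the codon changes they must introduce follow from the standard genetic code: asparagine is encoded by AAT or AAC, so a single A-to-G change at the first codon position gives aspartic acid (GAT or GAC), whereas alanine (GCN) requires two changes. The following minimal sketch computes the target coding sequence, assuming that residues are numbered from the first codon of an in-frame dockerin sequence; `substitute_residue` and the demonstration sequence are hypothetical.

```python
# Standard genetic code: Asn = AAT/AAC, Asp = GAT/GAC, Ala = GCN.
ASN_CODONS = {"AAT", "AAC"}
ASP_FOR_ASN = {"AAT": "GAT", "AAC": "GAC"}  # one change: A->G at codon position 1
ALA_FOR_ASN = {"AAT": "GCT", "AAC": "GCC"}  # two changes: AA->GC at positions 1-2


def substitute_residue(cds: str, residue_no: int, table: dict[str, str]) -> str:
    """Return an in-frame coding sequence with the Asn codon at residue_no replaced.

    residue_no is 1-based, counted from the first codon of cds; table maps
    each Asn codon to its replacement codon.
    """
    if residue_no < 1:
        raise ValueError("residue numbers start at 1")
    cds = cds.upper()
    start = 3 * (residue_no - 1)
    codon = cds[start:start + 3]
    if codon not in ASN_CODONS:
        raise ValueError(f"residue {residue_no} is encoded by {codon!r}, not Asn")
    return cds[:start] + table[codon] + cds[start + 3:]


if __name__ == "__main__":
    # Hypothetical demonstration sequence encoding Met-Asn-Ala.
    demo = "ATGAATGCT"
    print(substitute_residue(demo, 2, ASP_FOR_ASN))  # ATGGATGCT (Met-Asp-Ala)
    print(substitute_residue(demo, 2, ALA_FOR_ASN))  # ATGGCTGCT (Met-Ala-Ala)
```

Because each replacement swaps one codon for another, the reading frame is preserved and the function can be applied once for residue 18 and again for residue 50.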


Example 4

The three yeasts CtCBDcoh48Sdoc, CtCBDcoh48SdocN-A and CtCBDcoh48SdocN-D obtained in Example 3 were each cultured for 24 hours at 30° C. in YP + 2% glucose medium. Cells equivalent to 62.5 μl of culture at OD600 = 0.5 were collected, washed with PBS, mixed with a solution of anti-His-FITC in PBS containing 1 mg/ml BSA, reacted for 30 minutes at 4° C., and washed twice with PBS, after which the amount of dockerin displayed on the yeast cell surface was evaluated by flow cytometry. Substituting alanine for asparagine reduced the amount of Cel48S dockerin displayed by about half, whereas substituting aspartic acid for asparagine increased it 3.3-fold (FIG. 5).
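The cell amount collected in Examples 4, 6 and 8 is read here as a fixed quantity, OD600 × volume = 0.5 × 62.5 = 31.25 OD·μl, harvested from a culture of whatever density. Assuming OD600 stays proportional to cell density over the measured range, the culture volume to collect can be computed as below; the function name is hypothetical.

```python
def harvest_volume_ul(measured_od600: float,
                      target_od600: float = 0.5,
                      target_volume_ul: float = 62.5) -> float:
    """Culture volume (ul) holding as many cells as target_volume_ul at target_od600.

    Assumes OD600 is proportional to cell density (linear range of the
    spectrophotometer), so OD600 x volume is held constant.
    """
    if measured_od600 <= 0:
        raise ValueError("measured OD600 must be positive")
    return target_od600 * target_volume_ul / measured_od600


if __name__ == "__main__":
    # A culture measured at OD600 = 5.0 needs 6.25 ul.
    print(harvest_volume_ul(5.0))
    # Example 9 conditions (OD600 = 1, 1 ml) for a culture at OD600 = 4.0: 250 ul.
    print(harvest_volume_ul(4.0, target_od600=1.0, target_volume_ul=1000.0))
```

Keeping the OD600 × volume product constant is what makes the flow cytometry signals of different strains comparable on a per-cell-amount basis.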


Example 5

The Xyn10C dockerin gene was amplified and cloned by ordinary PCR methods from the C. thermocellum genome (SEQ ID NO. 221). Using this Xyn10C dockerin gene as the template, a gene having aspartic acid substituted for asparagines No. 18 and No. 54 was obtained with the two primers 10Cdock-N18D-Fw and 10Cdock-N50D-Rv (SEQ ID NOS. 222, 223) (FIG. 6). A pXU-Xyn10Cdoc vector and a pXU-Xyn10C-N-D-doc vector were prepared, each having an HXT3 homologous region, HOR7 promoter and His-tag upstream, and a Tdh3 terminator, Ura3 marker and HXT3 homologous region downstream, of the respective gene (FIG. 7). The resulting vectors were each introduced into the CtCBDcoh yeast obtained in Example 2 to obtain CtCBDcoh10Cdoc and CtCBDcoh10CdocN-D, each displaying cohesin from C. thermocellum on the cell surface and simultaneously producing a dockerin or an amino acid-substituted dockerin.


Example 6

The two yeasts CtCBDcoh10Cdoc and CtCBDcoh10CdocN-D obtained in Example 5 were each cultured for 24 hours at 30° C. in YP + 2% glucose medium. Cells equivalent to 62.5 μl of culture at OD600 = 0.5 were collected, washed with PBS, mixed with a solution of anti-His-FITC in PBS containing 1 mg/ml BSA, reacted for 30 minutes at 4° C., and washed twice with PBS, after which the amount of dockerin displayed on the yeast cell surface was evaluated by flow cytometry. Substituting aspartic acid for asparagine increased the amount of Xyn10C dockerin displayed 1.8-fold (FIG. 8). The two asparagines targeted in this dockerin are conserved in about 82% of the 142 dockerins encoded in the C. thermocellum genome. Because the amino acid substitution had a similar effect in two different dockerins, Cel48S and Xyn10C, it appears to be applicable to most enzyme groups of C. thermocellum.
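A conservation figure such as "about 82% of the 142 dockerins" could be obtained from a multiple alignment of the dockerin sequences by counting the sequences that carry asparagine at both aligned positions. This Description does not state how the alignment was made or which columns were counted, so the function below is a hypothetical sketch, demonstrated only on a toy alignment.

```python
def conservation_fraction(aligned: list[str], columns: list[int],
                          residue: str = "N") -> float:
    """Fraction of aligned sequences carrying `residue` at every listed column.

    aligned: equal-length rows of a multiple alignment ('-' marks gaps).
    columns: 0-based alignment column indices to test.
    """
    if not aligned:
        raise ValueError("no sequences given")
    width = len(aligned[0])
    if any(len(row) != width for row in aligned):
        raise ValueError("alignment rows must have equal length")
    hits = sum(all(row[c] == residue for c in columns) for row in aligned)
    return hits / len(aligned)


if __name__ == "__main__":
    # Hypothetical toy alignment with candidate Asn columns at indices 0 and 2.
    toy = ["NAN", "NAD", "DAN", "NGN"]
    print(conservation_fraction(toy, [0, 2]))  # 0.5
```

Requiring asparagine at all listed columns matches the text's statement that both targeted asparagines are conserved, rather than either one alone.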


Example 7

The Cel8A cellulase gene was amplified from the C. thermocellum genome and cloned by ordinary PCR methods (SEQ ID NO. 224). The resulting gene was spliced to the Cel48S dockerin gene obtained in Example 3 and, separately, to a gene having aspartic acid substituted for the No. 18 and No. 50 asparagines of the Cel48S dockerin, and pXU-Cel8A-Cel48Sdoc and pXU-Cel8A-Cel48S-N-D-doc vectors were prepared, each having an HXT3 homologous region, HOR7 promoter and His-tag upstream, and a Tdh3 terminator, Ura3 marker and HXT3 homologous region downstream of the respective gene (FIG. 9). The resulting vectors were each introduced into the CtCBDcoh yeast obtained in Example 2 to obtain CtCBDcohCel8A48Sdoc and CtCBDcohCel8A48SdocN-D, each displaying a cohesin from C. thermocellum on the cell surface and simultaneously producing a dockerin-type cellulase or an amino acid-substituted dockerin-type cellulase.
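At the DNA level, each asparagine-to-aspartic acid substitution is a single codon change: in the standard genetic code Asn is encoded by AAT or AAC and Asp by GAT or GAC, so one A-to-G change at the first codon position suffices. The document introduces the changes by PCR with mutagenic primers and does not state which codons were used; the sketch below only illustrates the sequence-level edit, assumes residue numbers count from the first codon of the supplied coding sequence, and uses a hypothetical helper name and toy sequence.

```python
# Standard genetic code: Asn is encoded by AAT/AAC and Asp by GAT/GAC, so a
# single A->G change at the first codon position converts Asn to Asp.
ASN_TO_ASP = {"AAT": "GAT", "AAC": "GAC"}


def substitute_asn_with_asp(cds, residue_numbers):
    """Return `cds` with the Asn codons at the given 1-based residues changed to Asp."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    for n in residue_numbers:
        if not 1 <= n <= len(codons):
            raise IndexError(f"residue {n} is outside the coding sequence")
        codon = codons[n - 1]
        if codon not in ASN_TO_ASP:
            raise ValueError(f"residue {n} is encoded by {codon}, not Asn")
        codons[n - 1] = ASN_TO_ASP[codon]
    return "".join(codons)


# Hypothetical toy sequence (Met-Asn-Asn-Ser), not the Cel48S dockerin gene.
print(substitute_asn_with_asp("ATGAATAACAGC", [2, 3]))  # ATGGATGACAGC
```

Keeping the third codon position unchanged preserves the original wobble base, so the edit is the minimal change that alters the encoded residue.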


Example 8

The two yeasts CtCBDcohCel8A48Sdoc and CtCBDcohCel8A48SdocN-D obtained in Example 7 were each cultured for 24 hours at 30° C. in YP+2% glucose medium. Cells equivalent to 62.5 μl of culture at OD600=0.5 were collected, washed once with PBS solution, mixed with PBS+1 mg/ml BSA+anti-His-FITC solution and reacted for 30 minutes at 4° C. After two washes with PBS solution, the amount of CelA (Cel8A) displayed on the yeast cell surface was evaluated by flow cytometry. The amino acid substitution increased the displayed amount of CelA (FIG. 10).


Example 9

The two yeasts CtCBDcohCel8A48Sdoc and CtCBDcohCel8A48SdocN-D obtained in Example 7 were each cultured for 24 hours at 30° C. in YP+2% glucose medium. Cells equivalent to 1 ml of culture at OD600=1 were collected, washed with 50 mM acetic acid buffer (pH 6.0), mixed with 1% CMC in 20 mM acetic acid buffer (pH 6.0), and reacted for 2 hours at 40° C. to decompose the CMC. The amino acid substitution increased CMC decomposition activity (FIG. 11), indicating improved saccharification ability of the yeast.
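The cell amounts in Examples 4, 6, 8 and 9 are given as OD600 equivalents (62.5 μl at OD600=0.5 for flow cytometry, 1 ml at OD600=1 for the CMC assay). A minimal sketch of converting such an equivalent into the volume to take from a culture of any measured OD600 follows. It assumes OD600 is proportional to cell density over the range used; the helper name and the culture OD values in the usage lines are illustrative, not from the text.

```python
def harvest_volume_ul(measured_od600, ref_od600, ref_volume_ul):
    """Culture volume (ul) holding the same cell amount as ref_volume_ul at ref_od600.

    Assumes OD600 is proportional to cell density, i.e. readings taken in the
    photometer's linear range (dense cultures diluted before measuring).
    """
    if measured_od600 <= 0:
        raise ValueError("measured OD600 must be positive")
    return ref_od600 * ref_volume_ul / measured_od600


# Flow cytometry samples: 62.5 ul at OD600 = 0.5; CMC assay: 1 ml at OD600 = 1.
print(harvest_volume_ul(2.5, 0.5, 62.5))    # 12.5
print(harvest_volume_ul(4.0, 1.0, 1000.0))  # 250.0
```

Expressed this way, a flow cytometry sample corresponds to 0.5 × 62.5 = 31.25 OD600·μl of cells and a CMC assay sample to 1000 OD600·μl, so the activity assay uses 32 times more cells.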


[Sequence Table Free Text]

SEQ ID NOS. 217, 218, 219, 220, 222, 223: Primers


[Sequence Tables]

Claims
  • 1. A protein for constructing a protein complex using a framework having a type I cohesin from C. thermocellum, the protein having a dockerin containing at least one dockerin-specific sequence which is associated with cohesin binding in a type I dockerin from C. thermocellum and which fulfills one of the following conditions (a) and (b): (a) having no intrinsic predicted N-type sugar chain modification site; (b) having aspartic acid substituted for an asparagine of an intrinsic predicted N-type sugar chain modification site.
  • 2. The protein according to claim 1, wherein the dockerin-specific sequence fulfilling the condition (b) is an amino acid sequence having aspartic acid substituted for asparagine at the N-type sugar chain modification site in an amino acid sequence selected from the following table.
  • 3. The protein according to claim 1, wherein the dockerin is characterized by a dockerin-specific sequence of one or two of the amino acid sequences described in the following tables, and includes at least one dockerin-specific sequence fulfilling the condition (b).
  • 4. The protein according to claim 1, wherein the dockerin is characterized by an amino acid sequence selected from the following table, and includes at least one dockerin-specific sequence fulfilling the condition (b).
  • 5. The protein according to claim 1, wherein the amino acid occupying a site corresponding to a predicted N-type sugar chain modification site is aspartic acid in the dockerin-specific sequence fulfilling the condition (a).
  • 6. The protein according to claim 1, wherein the dockerin-specific sequence fulfilling the condition (a) is an amino acid sequence selected from the following table.
  • 7. The protein according to claim 5, wherein the dockerin is characterized by a dockerin-specific sequence of two of the amino acid sequences described in the following table, and includes at least one dockerin-specific sequence fulfilling the condition (a).
  • 8. The protein according to claim 5, wherein the dockerin is characterized by an amino acid sequence selected from the following table, and includes at least one dockerin-specific sequence fulfilling the condition (a).
  • 9. The protein according to claim 1, having cellulolysis-promoting activity.
  • 10. The protein according to claim 9, wherein the cellulolysis-promoting activity is cellulase activity.
  • 11. The protein according to claim 9, wherein the cellulolysis-promoting activity is conferred by an amino acid sequence from Clostridium thermocellum.
  • 12. A eukaryotic microorganism having, on a cell surface, a protein complex using a scaffolding protein having a type I cohesin from Clostridium thermocellum, the eukaryotic microorganism being provided with a scaffolding protein having a type I cohesin from Clostridium thermocellum and a protein for constructing a protein complex using a framework having a type I cohesin from C. thermocellum and bound to the scaffolding protein, the protein having a dockerin containing at least one dockerin-specific sequence which is associated with cohesin binding in a type I dockerin from C. thermocellum and which fulfills one of the following conditions (a) and (b): (a) having no intrinsic predicted N-type sugar chain modification site; (b) having aspartic acid substituted for an asparagine of an intrinsic predicted N-type sugar chain modification site.
  • 13. The eukaryotic microorganism according to claim 12, wherein the scaffolding protein and the protein are produced by the eukaryotic microorganism itself.
  • 14. The eukaryotic microorganism according to claim 12, wherein the eukaryotic microorganism is a yeast.
  • 15. A method of producing a useful substance, the method comprising a step of saccharifying and fermenting a cellulose-containing material as a carbon source using a eukaryotic microorganism, the eukaryotic microorganism having, on a cell surface, a protein complex using a scaffolding protein having a type I cohesin from Clostridium thermocellum, the eukaryotic microorganism being provided with a scaffolding protein having a type I cohesin from Clostridium thermocellum and a protein for constructing a protein complex using a framework having a type I cohesin from C. thermocellum and bound to the scaffolding protein, the protein having cellulolysis-promoting activity and having a dockerin containing at least one dockerin-specific sequence which is associated with cohesin binding in a type I dockerin from C. thermocellum and which fulfills one of the following conditions (a) and (b): (a) having no intrinsic predicted N-type sugar chain modification site; (b) having aspartic acid substituted for an asparagine of an intrinsic predicted N-type sugar chain modification site.
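Claims 1, 12 and 15 distinguish dockerin-specific sequences by whether they contain a predicted N-type sugar chain modification site. The document does not state which prediction rule was used. The sketch below assumes the common consensus sequon Asn-X-Ser/Thr, where X is any residue except Pro; this flags candidate sites only and does not guarantee that a site is glycosylated in yeast. The helper names and the peptide are hypothetical.

```python
import re

# Consensus N-linked glycosylation sequon: Asn-X-Ser/Thr, X any residue but Pro.
# The zero-width lookahead lets overlapping sequons (e.g. "NNST") both match.
SEQUON = re.compile(r"(?=N[^P][ST])")


def predicted_n_glycosylation_sites(protein):
    """1-based positions of Asn residues that start a predicted sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


def satisfies_condition_a(protein):
    """Condition (a): no predicted N-type sugar chain modification site."""
    return not predicted_n_glycosylation_sites(protein)


def apply_condition_b(protein):
    """Condition (b): substitute Asp for the Asn of every predicted site."""
    residues = list(protein.upper())
    for pos in predicted_n_glycosylation_sites(protein):
        residues[pos - 1] = "D"
    return "".join(residues)


# Hypothetical peptide, not one of the dockerin sequences of this document.
peptide = "GANGTLNPSV"
print(predicted_n_glycosylation_sites(peptide))           # [3]
print(apply_condition_b(peptide))                         # GADGTLNPSV
print(satisfies_condition_a(apply_condition_b(peptide)))  # True
```

The Asn at position 7 is left alone because it is followed by Pro. Replacing Asn with Asp rather than Ala mirrors the substitution that, in Examples 4 and 6, increased dockerin display, whereas the Ala substitution reduced it.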
Priority Claims (1)
Number Date Country Kind
2010-088952 Apr 2010 JP national