This application contains sequence data provided on a computer readable diskette and as a paper version. The paper version of the sequence data is identical to the data provided on the diskette.
1. Technical Field of the Invention
The present invention relates to protein detection and purification. More specifically the invention relates to protein recognition sites. Even more specifically the invention relates to the modified peptide sequence from the Semliki Forest Virus (SFV) encoded non-structural protein suitable for use as recognition site for recombinant non-structural protease of SFV (hereafter “SFV protease site”), the nucleotide sequence and its variants that encode the recognition site. The invention also extends to the SFV protease recognition site fused into a polypeptide, inserted into a polypeptide sequence or placed between any peptide or protein tag and polypeptide sequence. The invention also extends to the methods for using the SFV protease site and corresponding enzyme.
2. Background of the Invention
Site-specific proteolytic processing of expressed proteins is a widely used technical approach. This approach is used to remove unwanted sequences from expressed and purified recombinant proteins. Such unwanted sequences are often expression and/or purification tags; they can be peptide tags or protein tags. Peptide tags usually contain 4 to 20 amino acids, while protein tags usually have a molecular weight of several kDa. This approach is also used to process multi-domain proteins into individual proteins both in vitro and in vivo (in living cells). Although the approach is already common, the development of functional proteomics and of methods for analyzing protein-protein interactions and protein functions directly in cells creates an increasing need for more precise, highly specific and effective instruments.
Tagging (epitope tags, affinity tags, tags which stabilize the expressed protein or facilitate its correct folding in cells) is a widely used technology. Most of the proteins currently used in the biotechnology industry and for research purposes are at some stage expressed as tagged fusion proteins, since this allows the use of common and well-established technologies for their detection, purification and concentration. However, tags are usually immunogenic, can affect the protein structure and its ability to crystallize, and can mask the functional domains of the recombinant protein and/or block specific and significant interactions with other proteins or cofactors. Removal of the tags is therefore an essential step before functional characterization of these recombinant proteins is possible. Tag removal is usually achieved by site-specific processing with different proteases (thrombin, enterokinase, factor Xa, TEV (tobacco etch virus) protease and several others). Tag removal with proteases requires that the sequence encoding the protease recognition site be included in the expression vector. This sets certain limitations on the protease recognition sites that can be used in this kind of vector design:
Since all these properties cannot always be combined in one vector, one usually has to choose between different sets of vectors depending on the purpose. Vectors with a maximally efficient cleavage site provide rapid and highly efficient cleavage; with this kind of cleavage site in the vector, the amount of substrate processed by one unit of enzyme is as high as possible. Vectors with a maximally precise cleavage site allow cleavage to take place as close as possible to the N- or C-terminus of the recombinant protein. Thus, depending on the experimental and/or technological setup, different cleavage sites can provide different results with one and the same enzyme.
For cleavage under in vivo conditions, additional requirements apply. The protease used for these experiments must be highly specific and must not injure the cells, and the cleavage should be highly efficient. One application of in vivo cleavage is to affect the stability of the expressed protein, either by removing degradation signals from the protein or by cleaving the protein in such a way that the N-terminal amino acid residue of the cleaved protein is recognized by the protein degradation machinery and the cleaved protein is degraded according to the N-end rule. For this kind of approach, either inducible cell lines conditionally expressing the protease or high-efficiency cell co-transfection systems would be beneficial.
The high importance of these problems has led to the commercialization of a set of enzymes with site-specific protease activity and of corresponding vector plasmids. The enzymes have different (cellular, viral) origins and include thrombin, enterokinase, factor Xa, TEV (tobacco etch virus) protease and several others. The list of enzymes used for these purposes is growing, and information on their enzymatic and structural properties is expanding. The ideal combination of a protease and its recognition sequence should fulfill the following criteria:
In spite of the efforts to develop an ideal combination, so far none of the available protease/recognition site-combinations meets all the conditions of an ideal system as listed above. The present invention discloses a system that meets all these conditions and thereby introduces a novel, highly useful, precise and specific tool for site-specific proteolytic processing of proteins.
Semliki Forest virus (SFV) belongs to genus Alphavirus (family Togaviridae) together with 27 other known viruses. Alpha viruses infect their vertebrate hosts (mammals, birds and fish) and invertebrate transmission vectors (mosquitoes). In infected organisms the alpha viruses replicate in different cells to a high titer.
The alphavirus genome encodes two protease activities: one is associated with the virus coat protein, which is an autoproteinase, and the other with the non-structural protein nsP2, which cleaves three cleavage sites in the alpha virus non-structural polyprotein P1234 (Merits et al., 2001, J. Gen. Virol. 82: 765–773). These cleavage sites have different consensus sequences and they differ from each other in the mode of proteolytic cleavage (in cis or in trans), the enzymatic activity required for the cleavage (intact nsP2 or the protease domain of nsP2) and the cleavage efficiency (Vasiljeva et al., 2001, J. Biol. Chem. 276(33): 30786–30793).
NsP2 consists of two enzymatically active domains: an N-terminal NTPase/helicase/RNA triphosphatase domain and a C-terminal cysteine protease domain. Both domains are needed for virus replication and for processing of the second cleavage site in the SFV polyprotein, while only the C-terminal protease domain is needed for processing the third cleavage site (between nsP3 and nsP4). Cysteine 481 and histidine 558 have been identified as essential residues for the protease activity of nsP2. It has been shown that the nsP2 protease domain (hereafter named Pro39) can be expressed as a recombinant protein in E. coli, purified by Ni-NTA chromatography and used for in vitro processing of recombinant substrates containing a 37 aa region of the protease recognition site (19 aa residues upstream and 18 aa residues downstream of the cleavage point; hereafter the 19/18 recognition site) (Vasiljeva et al., 2001). The cleavage is highly specific and active; Pro39 is capable of processing 50% of a 400-fold molar excess of substrate in 5 minutes (Vasiljeva et al., 2001).
One of the biological functions of cleavage of the protease site between the nsP3 and nsP4 proteins is to release nsP4 from the P1234 precursor protein and from the alpha virus early replicase complex. SFV, in contrast to the majority of alpha viruses analyzed to date, produces atypically large amounts of P1234 polyprotein; in most other alpha viruses P1234 production is down-regulated about 20-fold by the presence of a leaky termination codon at the end of the nsP3 region. This leads us to believe that, compared to most alpha virus proteases, the SFV nsP2 protease should have a higher cleavage activity for the last processing site, since it has to digest significantly higher amounts of substrate. It may also be that proteases from other alpha viruses have similarly high activities.
The present invention relates to a linear protease recognition site from the SFV encoded polyprotein which, in truncated and modified forms, can be used as a highly efficient and precise target sequence for the SFV non-structural protease nsP2 and for its C-terminal protease domain Pro39. The target sequence has earlier been identified as a 37 aa long sequence, which is far too large for use in any practical expression system. In contrast, the present disclosure provides a target sequence that can easily be used in various expression systems.
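As a rough reading of the activity figure quoted above, the short Python sketch below converts "50% of a 400-fold molar excess in 5 minutes" into an average number of substrate molecules cleaved per enzyme molecule per minute. The function name is illustrative only, and the result is a simple time average under the assumption of a constant rate, not a measured kinetic constant.

```python
def substrates_per_enzyme_per_minute(molar_excess: float,
                                     fraction_cleaved: float,
                                     minutes: float) -> float:
    """Average number of substrate molecules cleaved per enzyme molecule per minute.

    Assumes a constant rate over the whole incubation (a simplification).
    """
    if minutes <= 0:
        raise ValueError("incubation time must be positive")
    return molar_excess * fraction_cleaved / minutes


# Figures quoted in the text for the full 19/18 site:
# 400-fold molar excess of substrate, 50% processed, 5 minutes.
full_site_rate = substrates_per_enzyme_per_minute(400, 0.5, 5)
print(f"about {full_site_rate:.0f} substrate molecules per Pro39 molecule per minute")
```

Because cleavage normally slows as substrate is consumed, an average of this kind is best treated as an order-of-magnitude figure for comparing sites.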
The present disclosure provides details of the protease recognition site requirements. This disclosure shows that the target sequence of Pro39 can surprisingly be truncated into shorter but still very efficiently cleavable variants. The cleavage efficiency for the artificial protease substrates containing these sequences is somewhat lower than that of the full-size 19/18 recognition site, but unexpectedly it is high enough to enable the protease to process over a 10-fold molar excess of substrate within one hour. According to the present disclosure the cleavage specificity was also maintained for these truncated sites. The preferred sequences according to the present disclosure for active recognition site variants are:
Moreover, the present disclosure shows that the +1 amino acid residue of the protease recognition site can be substituted from the native Y (tyrosine) to virtually any type of amino acid with no change in protease cleavage specificity. If the native Y residue is substituted with an S (serine) or R (arginine) residue, the cleavage site recognition and/or processing efficiency is significantly enhanced. G (glycine) was found to be the best residue to substitute for the native Y residue, as the substrate containing G was processed 3-fold more effectively than the substrate containing the native Y. Substrates with R and S residues are processed 2-fold more efficiently than substrates with the native Y in the same position.
Even further, the present disclosure shows that the downstream region of the protease recognition site can be substituted with a His-tag repeat. Such a substitution does not affect the cleavage as such but only the efficiency of the cleavage.
In one aspect of the invention these inventive steps are successfully combined. It is demonstrated that artificial truncated substrates with altered +1 amino acid residues are efficiently recognized and actively processed by Pro39, and that the processing is more rapid and complete than for substrates in which wild-type truncated sites are used. Preferred embodiments of the present invention therefore include the following variants of the highly effective protease recognition sites:
Importantly, cleavage specificity is preserved for all of these modified recognition sites.
In another aspect of this invention the ability of Pro39 to process differently positioned modified recognition sites in a recombinant target protein was examined. Surprisingly, positioning the recognition sequence between highly structured thioredoxin domains does not decrease protease cleavage efficiency; on the contrary, the cleavage efficiency is significantly enhanced as compared to cleavage of recognition sites placed between structured and non-structured protein sequences.
According to the present disclosure Pro39 cleaves substrates possessing the truncated and modified recognition site over a wide temperature range of 4–39° C. (including low temperatures), in the neutral pH region (i.e. pH 7–8) and in the presence of high concentrations of NaCl (up to 4 M) or urea (up to 1.5 M). Furthermore, according to the present disclosure the cleavage of the substrates possessing the truncated and modified recognition sites can be reversibly inhibited by addition of Zn ions and re-activated by addition of EDTA. Furthermore, Pro39 cleaves substrates having the recognition site according to this disclosure both in liquid phase and in a resin-bound state.
One object of the present invention is to use modified recognition sites for Pro39 for removal of large and structured protein tags such as thioredoxin, GST (glutathione S-transferase), MBP (maltose binding protein) or CBP (calmodulin binding protein) from recombinant proteins.
Another object of the present invention is to insert the modified recognition site(s) for Pro39 into a recombinant protein sequence so that the recombinant protein can subsequently be processed into subdomains of desired sizes.
Still another object of the present invention is to provide a cleavage site and a protease that can be used over a wide temperature range, at neutral pH, and under high salt concentration. Moreover, an object of the present invention is to provide a cleavage site and protease that can be used in liquid phase as well as in a resin-bound state.
An even further object of the present invention is to provide expression vectors comprising sequences encoding recombinant proteins and modified recognition sites for Pro39, so that the protein can subsequently be processed.
B. Activity of Pro39 in the presence of urea. Purified substrate was incubated with Pro39 for 60 minutes at 30° C. in a molar ratio of 20:1 in the presence of the urea concentrations indicated at the top of each lane. Reaction products were analyzed by SDS-PAGE and visualized by Coomassie Blue staining. Lane 1 contains the same substrate with no Pro39 added. Positions of Pro39, substrates and cleavage products are indicated on the right-hand side of the gel.
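The operating window stated above lends itself to a simple pre-flight check. The sketch below is a hypothetical helper (the class and function names are not from the source) that flags planned reaction conditions falling outside the ranges quoted in the text. It checks each parameter independently and makes no claim that every combination of conditions inside the ranges was tested together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReactionConditions:
    temperature_c: float
    ph: float
    nacl_molar: float = 0.0
    urea_molar: float = 0.0


# Ranges as quoted in the text: 4-39 deg C, pH 7-8, NaCl up to 4 M, urea up to 1.5 M.
TESTED_RANGES = {
    "temperature_c": (4.0, 39.0),
    "ph": (7.0, 8.0),
    "nacl_molar": (0.0, 4.0),
    "urea_molar": (0.0, 1.5),
}


def outside_tested_ranges(conditions: ReactionConditions) -> list[str]:
    """Return one message per parameter that lies outside the quoted range."""
    problems = []
    for name, (low, high) in TESTED_RANGES.items():
        value = getattr(conditions, name)
        if not low <= value <= high:
            problems.append(f"{name}={value} is outside {low}-{high}")
    return problems


# Example: a planned reaction at 30 deg C, pH 7.5, 0.15 M NaCl, no urea.
print(outside_tested_ranges(ReactionConditions(30.0, 7.5, nacl_molar=0.15)))
```

An empty list means only that each value lies within the quoted range; it does not predict cleavage efficiency.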
The present invention relates to a linear protease recognition site from the SFV encoded polyprotein P1234 which, in truncated and modified forms, can be used as a highly efficient and precise target sequence for the SFV non-structural protease nsP2 and for its C-terminal protease domain Pro39. In contrast to the previously identified target sequence, which is 37 aa long and as such far too large to be used in practical expression systems, this disclosure provides shortened and still active variants as well as minimal, but still recognizable and cleavable, forms of the recognition site. Additionally, this disclosure provides highly efficient modifications of the recognition site.
We produced a specific set of expression vectors (for prokaryotic expression and for in vitro translation) for analysis purposes. Extensive analysis was performed using both in vitro translated substrates and substrates expressed in E. coli and purified as recombinant proteins by Ni-NTA chromatography. Using crude deletion analysis we demonstrated that the recognition site for Pro39 can be considerably shortened. This shortening of the recognition sequence eventually led to a gradual decrease in processing efficiency and was used for preliminary mapping of essential sequences. The precise mapping of the essential sequences was made by construction of protease recognition sequence variants from synthetic oligonucleotides. The cleavage efficiency of the artificial protease substrates selected as a result of this procedure was somewhat lower than for substrates containing the full-size 19/18 recognition site, but remarkably it was high enough to enable the protease to process over a 10-fold molar excess of substrate within one hour. MALDI-TOF mass spectrometry showed that the cleavage specificity, surprisingly, was also maintained for these truncated sites (
The intermediate variants, containing 9, 8 or 7 aa from the nsP3 region (the upstream region with respect to the cleavage point), were cleaved specifically but with significantly lower efficiency (
Multiple biotechnological approaches require production of proteins with a native N-terminus which, quite often, starts with an amino acid other than methionine. The same applies to processing proteins in in vitro conditions with the aim of producing stabilized or destabilized proteins (use of the N-end rule in the cell). This can serve as a powerful approach to create conditional protein knockout constructs (the protein will be destabilized and rapidly degraded after removal of stabilizing amino acids from its N-terminus) or knock-in constructs (the recombinant protein will be stabilized after the protease removes destabilizing elements such as PEST sequences or a ubiquitin fusion part). To obtain recombinant proteins or their subdomains with native N-terminal residues, there is a need for a protease able to specifically cleave substrates with any amino acid residue at the N-terminus of the product (in other words, all elements required for protease site recognition and protease activity should be located upstream of the cleavage point). At the same time site specificity should be maintained. The invention according to the present disclosure is applicable to these approaches.
We performed a two-step functional analysis and found that the recognition sequence of Pro39 and the corresponding enzyme meet the criteria set forth above. First, deletion mutagenesis of the downstream region of the protease recognition consensus was carried out. This experiment showed that the downstream region is not needed for cleavage as such to take place; only the cleavage efficiency was affected (
This disclosure shows that artificial truncated substrates with altered +1 amino acid residues were efficiently recognized and actively processed by Pro39. When activating (G, S or R) amino acids were used as +1 amino acid residues, the processing was more rapid and complete than for substrates in which wild-type truncated sites were used. At the same time the rule that the most efficient cleavage required 10 or 6 native upstream amino acid residues remained unchanged. This led us to conclude that the following variants of the highly effective protease recognition sites are among the preferred embodiments of the invention:
Importantly, cleavage specificity was preserved for all of these modified recognition sites (demonstrated by mass spectrometry). All this leads us to conclude that one preferable embodiment of the present invention includes insertion of 10/0 or 6/0 recognition sites into a recombinant protein for protease cleavage with Pro39 so that:
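The requirement stated above, that all recognition elements lie upstream of the cleavage point, can be pictured with a toy string model. In the sketch below the tag, site and target sequences are arbitrary placeholders invented for illustration; they are not SFV sequences or sequences from this disclosure, and the function name is hypothetical. The model assumes an idealized protease that cuts immediately after the upstream site regardless of what follows.

```python
from typing import Optional, Tuple


def cleave_after_site(substrate: str, upstream_site: str) -> Optional[Tuple[str, str]]:
    """Cut the substrate immediately after the first occurrence of the upstream site.

    Returns (N-terminal fragment, C-terminal fragment), or None if the site is
    absent. Idealized: the residue after the cut plays no role in recognition.
    """
    start = substrate.find(upstream_site)
    if start == -1:
        return None
    cut = start + len(upstream_site)
    return substrate[:cut], substrate[cut:]


# Placeholder sequences (hypothetical, for illustration only).
TAG = "MHHHHHH"
SITE = "ACDEFGHIKL"      # stands in for an upstream-only recognition site
TARGET = "SVTGK"         # protein of interest; its first residue is not Met

fragments = cleave_after_site(TAG + SITE + TARGET, SITE)
if fragments is not None:
    tag_part, released = fragments
    print("released product starts with:", released[0])
```

In this picture the tag and the whole recognition site stay on the N-terminal fragment, so the released product begins with its own first residue.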
Two sets of recombinant proteins with inserted protease sites were used for studying the ability of Pro39 to process the modified recognition sites positioned differently in recombinant target protein:
Positioning of the recognition sequence between highly structured thioredoxin domains did not decrease protease cleavage efficiency; surprisingly, on the contrary, the cleavage efficiency was significantly enhanced as compared to cleavage of recognition sites placed between structured and non-structured protein sequences (
In still another aspect, the present invention provides the conditions for use of Pro39 for cleavage of recombinant proteins containing the modified recognition sequences. This is an essential part of the invention, since there has been an unmet need for a protease operable in a wide range of conditions. A wide variety of conditions was tested to estimate the stability of Pro39 and its preferences for cleavage of recombinant substrates. According to this disclosure Pro39 cleaves these substrates over a wide temperature range, including low temperatures, with a temperature optimum around 30° C. (
Pro39 cleaves the substrates in the neutral pH region (
Pro39 cleaves these substrates in the presence of high concentrations of NaCl (up to 4 M) or urea (up to 1.5 M) (
Cleavage of these substrates by Pro39 can be reversibly inhibited by addition of Zn ions and re-activated by addition of EDTA. A low concentration of Zn ions causes a rapid and complete block of the processing; removal of the Zn ions re-activates the processing (
Pro39 cleaves these substrates both in liquid phase and in a resin-bound state on the immuno-absorption column (
The invention can be better understood by way of the following examples which are representative of the preferred embodiments thereof, but which are not to be construed as limiting the scope of the invention.
Construction of Vectors for Expression of Recombinant Proteins in Mammalian Cells
Vector 1 was designed based on the vector pQM-CMV-E2-N-A-int (Quattromed Ltd. P1-114-020). Full sequence of the vector is shown in
Vector 2 is designed for cloning of recombinant protein expressing gene downstream of optimized 10/R5 cleavage site for maximally efficient cleavage of recombinant protein. The vector is based on vector pQM-CMV-E2-N-A-int (Quattromed Ltd.). Full sequence of the vector is shown in
Design of Insertion Elements which can be Used in pQM-CMV-E2-N-A-int Vector (Quattromed Ltd. Catalog Number P1-114-020) Based Constructs and/or Directly Inserted into Chosen Position of Recombinant Protein Using Site-specific Mutagenesis
Cassette designs are based on the use of optimized 10/R5 or 6/R5 sites (SEQ ID NO: 3 and SEQ ID NO: 4, respectively). These sites can be inserted into expression vector or introduced directly into recombinant protein encoding sequence by site directed mutagenesis:
Construct was made by using oligonucleotides:
Oligos were annealed and cloned into the vector pQM-CMV-E2-N-A-int, digested with restrictases BamHI and HindIII. The underlined GCG codon codes for arginine, which activates the cleavage of recombinant protein.
Construct was made by using oligonucleotides:
Oligos were annealed and cloned into the vector pQM-CMV-E2-N-A-int, digested with restrictases BamHI and HindIII. The underlined GCG codon codes for arginine, which activates the cleavage of recombinant protein.
The insertion strategy is also usable for non-optimized 10/0 and 6/0 type cassettes, which can be used for production of cleavage products with exact N-terminal amino acid residues.
Cassette designs are based on the use of non-optimized 10/0 or 6/0 sites, which can be inserted into an expression vector or introduced directly into the recombinant protein encoding sequence by site-directed mutagenesis.
Construct was made by using oligonucleotides:
Oligonucleotides were annealed and cloned into the vector pQM-CMV-E2-N-A-int, digested with restrictases BamHI and HindIII.
Construct was made using oligonucleotides:
Oligonucleotides were annealed and cloned into the vector pQM-CMV-E2-N-A-int, digested with restrictases BamHI and HindIII.
Identification of the Minimal Cleavage Consensus of Pro39 Using Deletion Mutagenesis
The analysis of the cleavage consensus requirements was made as a two-step experiment. First, a set of constructs expressing recombinant proteins with truncated protease recognition sites was made. For this purpose the green fluorescent protein was fused with the following truncated cleavage consensus elements constructed by PCR:
The eight substrates (green fluorescent protein fused with the eight recognition site variants given above) are also schematically illustrated in
All eight substrates were expressed in E. coli, purified as recombinant proteins using Ni-NTA chromatography and subjected to processing with Pro39. Purified substrates were incubated with Pro39 for 60 minutes at 30° C. in a molar ratio of 20:1. Reaction products were analyzed by SDS-PAGE and visualized by Coomassie Blue staining. The results clearly demonstrate that the recognition site for Pro39 can be considerably shortened. Referring to
Based on these results we subjected substrates having 15 to 5 aa residues from the upstream region and 0–5 aa residues from the downstream region of the protease recognition site to more detailed analysis.
Identification of the Precise Minimal Cleavage Consensus of Pro39 Using Oligonucleotide Insertion Mutagenesis
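The "upstream/downstream" naming used for the site variants (19/18 for the full site, and shorter forms such as 10/5 or 6/0) can be made concrete with a small helper. The sketch below uses placeholder strings of the right lengths, invented purely for illustration and not taken from the SFV polyprotein; only the counting convention comes from the text, and the function name is hypothetical.

```python
from typing import Tuple

# Placeholder sequences (hypothetical): 19 residues upstream and 18 residues
# downstream of the cleavage point, mimicking the 19/18 naming only.
UPSTREAM_19 = "ACDEFGHIKLMNPQRSTVW"
DOWNSTREAM_18 = "YACDEFGHIKLMNPQRST"   # +1 position written as Y, as in the text


def truncate_site(upstream: str, downstream: str,
                  n_up: int, n_down: int) -> Tuple[str, str]:
    """Return the n_up residues nearest the cut on the upstream side and the
    first n_down residues on the downstream side (an "n_up/n_down" variant)."""
    if not 0 <= n_up <= len(upstream):
        raise ValueError("n_up exceeds the available upstream region")
    if not 0 <= n_down <= len(downstream):
        raise ValueError("n_down exceeds the available downstream region")
    # Slice from an explicit index so that n_up == 0 yields an empty string.
    return upstream[len(upstream) - n_up:], downstream[:n_down]


# Variants in the size range analyzed in the text: 15 to 5 upstream, 0 to 5 downstream.
for n_up, n_down in [(15, 5), (10, 5), (6, 5), (10, 0), (6, 0)]:
    up, down = truncate_site(UPSTREAM_19, DOWNSTREAM_18, n_up, n_down)
    print(f"{n_up}/{n_down}: {up}|{down}")
```

The vertical bar in the printed output marks the cleavage point, so a "/0" variant shows nothing to the right of it.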
The precise mapping of the essential sequences was made by construction of the protease recognition sequence variants from synthetic oligonucleotides.
10/5 site (construct 8) encoding the recognition peptide with SEQ ID NO: 1 as follows:
9/5 site (construct 9) encoding the recognition peptide with SEQ ID NO: 18 as follows:
8/5 site (construct 10) encoding the recognition peptide with SEQ ID NO: 19 as follows:
7/5 site (construct 11) encoding the recognition peptide with SEQ ID NO: 20 as follows:
6/5 site (construct 12) encoding the recognition peptide with SEQ ID NO: 2 as follows:
5/5 site (construct 13) encoding the recognition peptide with SEQ ID NO: 21 as follows:
Corresponding recombinant proteins as illustrated in
Identification of the Role of +1 Amino Acid Residue for Cleavage Activity and Specificity. Construction of the Optimized Recognition Sites.
As indicated in the examples above, Pro39 is capable of processing a substrate containing construct 8 (10/5 site according to SEQ ID NO: 1) with no virus-specific sequence located downstream of the cleavage point. To determine whether there is any requirement for the +1 amino acid residue in the substrate for Pro39 recognition and cleavage specificity, protease recognition sequence variants were constructed from synthetic oligonucleotides.
15/Y site (construct 14) encoding the recognition peptide with SEQ ID NO: 22 as follows:
15/A site (construct 15) encoding the recognition peptide with SEQ ID NO: 23 as follows:
15/G site (construct 16) encoding the recognition peptide with SEQ ID NO: 24 as follows:
15/R site (construct 17) encoding the recognition peptide with SEQ ID NO: 25 as follows:
15/S site (construct 18) encoding the recognition peptide with SEQ ID NO: 26 as follows:
15/N site (construct 19) encoding the recognition peptide with SEQ ID NO: 27 as follows:
15/E site (construct 20) encoding the recognition peptide with SEQ ID NO: 28 as follows:
15/D site (construct 21) encoding the recognition peptide with SEQ ID NO: 29 as follows:
Corresponding recombinant proteins were expressed in E. coli, purified by Ni-NTA chromatography and subjected to treatment with Pro39. Purified substrates were incubated with Pro39 for 60 minutes at 30° C. in a molar ratio of 20:1. Reaction products were analyzed by SDS-PAGE and visualized by Coomassie Blue staining. It was demonstrated that the N-terminal amino acid residue of the protease recognition site can be substituted from Y (tyrosine, construct 14 according to SEQ ID NO: 22) to virtually any type of amino acid (S, G, R, N, D, E, C, M, L and A) except P with no change in protease cleavage specificity. At the same time, anomalous electrophoretic mobility was detected for cleavage products with acidic amino acid residues (constructs 20 and 21 according to SEQ ID NO: 28 and 29, respectively) at the N-terminal position; MALDI-TOF analysis of these products clearly indicated that this is not due to unspecific cleavage of the corresponding cleavage sites but due to a change of mobility during SDS-PAGE. Most importantly, these experiments clearly indicated that if the native +1 amino acid residue Y (tyrosine) was substituted with glycine (G), serine (S) or arginine (R) residues (constructs 16, 17 and 18 according to SEQ ID NO: 24, 25 and 26, respectively), the cleavage site recognition and/or processing efficiency was significantly enhanced.
Oligonucleotide duplexes encoding the following cleavage site variants were inserted into a specially designed vector for expression of the recombinant substrates:
10/S site (construct 22) encoding the recognition peptide with SEQ ID NO: 30 as follows:
6/S site (construct 23) encoding the recognition peptide with SEQ ID NO: 31 as follows:
Corresponding recombinant proteins were expressed in E. coli, purified by Ni-NTA chromatography and subjected to treatment with Pro39 as described above. It was demonstrated that recombinant proteins containing these protease recognition sites were processed specifically and with higher efficiency than recombinant proteins containing the corresponding unmodified sites (result not shown). This finding indicates that modified protease recognition sites can be used in expression vectors instead of unmodified sites.
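To illustrate why mass spectrometry can distinguish specific from unspecific cleavage, the sketch below computes the expected average masses of the two products for a given cut position. The residue masses are commonly tabulated average values and are not taken from this disclosure; they should be checked against a reference table before any real use. The example sequence and function names are hypothetical placeholders.

```python
# Average residue masses in Da (commonly tabulated values; an assumption here).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water is added per free peptide chain (N- and C-termini)


def average_mass(peptide: str) -> float:
    """Average mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def expected_product_masses(substrate: str, cut: int) -> tuple[float, float]:
    """Masses of the N- and C-terminal products when cutting after residue `cut`."""
    if not 0 < cut < len(substrate):
        raise ValueError("cut must fall inside the substrate")
    return average_mass(substrate[:cut]), average_mass(substrate[cut:])


# Placeholder substrate (hypothetical): 10 upstream residues, then the product.
substrate = "ACDEFGHIKL" + "GSVTK"
n_mass, c_mass = expected_product_masses(substrate, 10)
print(f"N-terminal product {n_mass:.2f} Da, C-terminal product {c_mass:.2f} Da")
```

Hydrolysis adds one water, so the two product masses sum to the substrate mass plus about 18 Da; a cut shifted by even one residue moves both product masses by that residue's mass, which is the kind of difference a MALDI-TOF measurement is used to resolve.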
Demonstration of Pro39 Cleavage of Substrate Purified on Anti-E2Tag Antibody Conjugated Sepharose Resin
Recombinant protein TAP-DBD (
Cleavage of Column-bound Substrate by Pro39.
Recombinant protein TAP-DBD (
A control experiment was performed in order to show that column-bound substrate is not cleaved without Pro39. Recombinant protein TAP-DBD (
This application claims priority of the U.S. Provisional patent application number 60/581,579 filed on Jun. 21, 2004.
| Number | Date | Country |
|---|---|---|
| WO 9210578 | Jun 1992 | WO |

| Number | Date | Country |
|---|---|---|
| 20050282254 A1 | Dec 2005 | US |

| Number | Date | Country |
|---|---|---|
| 60581579 | Jun 2004 | US |