Compositions and methods comprising permuted protein tags for facilitating overexpression, solubility, and purification of target proteins

BACKGROUND

There is an ongoing and unmet need for compositions and methods that improve expression, solubility and/or purification of proteins. The present disclosure pertains to these needs.

SUMMARY

The present disclosure provides improved compositions and methods for expressing proteins. In embodiments the disclosure provides expression vectors that are suitable for expressing target proteins that are present in fusion proteins between separate segments of Ribose Binding Protein (RBP) or Maltose Binding Protein (MBP). Kits comprising the expression vectors and cells comprising the expression vectors are included. Methods of making fusion proteins, methods of separating fusion proteins and/or the target proteins they include are also included, as are the fusion proteins themselves.

In particular embodiments the disclosure provides an expression vector encoding a polypeptide, the polypeptide comprising sequentially in an N to C terminal direction:

a) optionally, at the N-terminus of the polypeptide a first Histidine sequence that can function as a component of a functional Histidine tag with a second Histidine sequence located at the C-terminus of the polypeptide;

b) a first segment of a Ribose Binding Protein (RBP);

c) a first linker sequence;

d) at least one restriction endonuclease digestion site;

e) a second linker sequence;

f) a second segment of the RBP.

In one embodiment the second segment is located N-terminal to the first segment relative to an intact wild type amino acid sequence of an RBP comprising the sequence of SEQ ID NO:1, and the sequence is permuted as further described herein, but the disclosure includes non-permuted configurations as well, and thus includes permuted and linear versions of the fusion proteins. The expression vector optionally further encodes at the C-terminus of the polypeptide a second Histidine sequence that can function with the first Histidine sequence in the functional Histidine tag, wherein the functional His tag may have improved metal binding relative to either of the first or second His tags alone.

Proteins described herein can comprise a non-covalently bound ribose, which can be present in cells that make the proteins, and which may persist during separation of the protein from the cells if such separation is performed.

In a configuration of the first and second segments, the amino acid sequence of the first segment and the amino acid sequence of the second segment together comprise an amino acid sequence that has at least 90% identity with a segment of SEQ ID NO:1 that is at least 251 amino acids in length, and the amino acid sequences of the first and second segments do not overlap with each other. This degree of identity includes amino acid sequences that have, for example, insertions and/or deletions (gaps), or amino acid substitutions/mutations. In one implementation the first and second segments can together have at least 90% identity with a segment of SEQ ID NO:1 that comprises amino acids number 4 and 254 of SEQ ID NO:1.

In an embodiment the configuration is such that the first linker and the second linker, and the first protease cleavage site if present, and the second protease cleavage site if present, together comprise at least thirty amino acids. If present, the cleavage sites can be the same or different from each other. In embodiments, the fusion proteins can comprise sequences such that they are susceptible to non-enzymatic cleavage, which can be used in conjunction with or as an alternative to protease recognition sites.

In certain embodiments the segment of SEQ ID NO:1 is amino acids 4-254 of SEQ ID NO:1. In embodiments, the second segment comprises a contiguous amino acid sequence that has at least 90% identity with a segment of SEQ ID NO:1 that begins with amino acid number 1, 2, 3, or 4 of SEQ ID NO:1 and ends with amino acid number 33, 59, 69, 84, 96, 124, 135, 185, or 209 of SEQ ID NO:1. In embodiments, the first segment comprises a contiguous amino acid sequence that has at least 90% identity with a segment of SEQ ID NO:1 that is amino acids 34-254 of SEQ ID NO:1, 60-254 of SEQ ID NO:1, 70-254 of SEQ ID NO:1, 85-254 of SEQ ID NO:1, 97-254 of SEQ ID NO:1, 125-254 of SEQ ID NO:1, 136-254 of SEQ ID NO:1, 186-254 of SEQ ID NO:1, or 210-254 of SEQ ID NO:1, thereby having the first amino acid of the first segment as amino acid 34, 60, 70, 85, 97, 125, 136, 186 or 210 of SEQ ID NO:1, and wherein the first segment is optionally extended by any number of amino acids up to amino acid number 277 of SEQ ID NO:1. In one embodiment the first segment ends at amino acid 277 of SEQ ID NO:1.

In certain embodiments the second segment comprises a contiguous amino acid sequence of SEQ ID NO:1 that begins with amino acid number 1, 2, 3, or 4 of SEQ ID NO:1 and ends with amino acid number 33, 59, 69, 84, 96, 124, 135, 185, or 209 of SEQ ID NO:1. In one embodiment, the second segment ends at amino acid 96 or amino acid 124 of SEQ ID NO:1.

In certain embodiments, i) the at least one restriction endonuclease digestion site is present in a multiple cloning site; and/or ii) the expression vector further encodes at least one protease cleavage site located between the at least one restriction endonuclease digestion site and the first or the second linker sequence; and/or iii) the first and/or the second linker is at least 15 amino acids in length. In one approach at least one restriction endonuclease digestion site is present in the multiple cloning site and the first and/or the second linker is at least 15 amino acids in length.

In another aspect the disclosure comprises methods. In one approach the method comprises allowing expression of any expression vector described herein such that a fusion protein is expressed, with the proviso that a polynucleotide sequence encoding a target protein is inserted into the multiple cloning site, and wherein the expressed fusion protein optionally comprises the first and second Histidine sequences, the first and second segments of the Ribose Binding Protein, the first and second linker sequences, and the at least one protease cleavage site if the protease cleavage site is encoded by the expression vector. In embodiments, the protein comprises one or both of the Histidine sequences, and the method further comprises exposing the fusion protein to a metal such that the first and second Histidine sequences form a functional Histidine tag that forms a non-covalent association with the metal. The method can further comprise separating the fusion protein from the metal. In certain approaches the fusion protein comprises at least one protease cleavage site and optionally comprises a second protease cleavage site such that the first and second protease cleavage sites flank the target protein. In an embodiment the method further comprises cleaving the fusion protein at the first or the first and the second protease cleavage sites, and optionally purifying a protein cleavage product that comprises the target protein.

In certain aspects the disclosure includes expressing the fusion proteins in prokaryotic or eukaryotic cells, and includes such cells and cell cultures, and cell culture media. In embodiments, kits comprising the expression vectors are provided.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 provides a schematic comparison of embodiments of this disclosure (FIG. 1, panel B) with linear tags, such as are described in PCT/US16/56832 (FIG. 1, panel A). The endonuclease site (ERS) is intact before the sequence encoding the target protein is inserted into the expression vector. The ERS can be present in a multi-cloning site in an expression vector that contains a plurality of endonuclease recognition sites (i.e., restriction sites).

FIG. 2 provides a schematic of representative fusion proteins. Panel A, linear tag, similar to FIG. 1, Panel A. FIG. 2 Panel B, split His tag with linear permutated Ribose Binding Protein (RBP). Panel C, linear RBP tag with a target protein inserted in between positions 96 and 97 of RBP flanked by linker sequences (circular RBP tag). FIG. 2C accordingly shows that in contrast to the split, circularly-permuted tag of this disclosure (using RBP as a representative solubility tag), the target protein gene can be inserted internally into the RBP gene, and thus can be used to produce a fusion protein encoded by the gene with the same protein orientation.

FIG. 3, panels A, B and C provide schematic comparisons of different first and second segments of RBP in illustrative split, circularly permutated fusion proteins of this disclosure.

FIG. 4 provides a comparison of the purification chromatogram of s125-clover with one N-terminal His tag (His₃) with that of s125-clover with both N-terminal (His₃) and C-terminal (His₃) His tags. His₃-s125-clover does not bind to cobalt-NTA (nitrilotriacetic acid) resin in 20 mM imidazole, while His₃-s125-clover-His₃ binds to the resin and is released at a higher concentration of imidazole.

FIG. 5 provides a comparison of the purification chromatogram of s97-clover with one N-terminal His tag (His₆) with that of s97-clover with both N-terminal (His₆) and C-terminal (His₅) His tags. Both proteins bind to nickel-NTA resin in 10 mM imidazole, but His₆-s97-clover is released at a lower concentration of imidazole than the His₆-s97-clover-His₅ protein.

FIG. 6 provides a photographic representation of an SDS-polyacrylamide (SDS-PAGE) gel stained with Coomassie dye demonstrating that His₆-LECT2 (14 kDa) does not express to detectable levels. Similarly, His₆-wtRBP-LECT2 expresses poorly; the major species present is the truncated species His₆-wtRBP. By contrast, His₆-s97-LECT2 is expressed at high levels, with the full-length protein being the major species present.

FIG. 7 provides a photographic representation of an SDS-PAGE gel stained with Coomassie dye demonstrating that His₆-wtRBP-MDM2 does not express to detectable levels (left panel). By contrast, His₆-s97-MDM2 is expressed at high levels (right panel).

FIG. 8 provides a photographic representation of an SDS-PAGE gel stained with Coomassie dye demonstrating that His₃-s97-P53-His₃ is expressed at high levels (left panel). The fusion protein is digested with HRV3C protease on Nickel-NTA resin and eluted with one other contaminant (right panel).

FIG. 9A provides the amino acid and DNA sequences of a split tteRBP protein (s97) with a multiple cloning site for inserting a target gene (with N-terminal Met) and two linker sequences. tteRBP is Thermoanaerobacter tengcongensis (tte) RBP.

FIG. 9B provides the amino acid sequence of a split, circularly permuted RBP-MDM2 fusion protein, the sequence of a split tteRBP protein (s125) with a multiple cloning site for inserting a target gene, and a split, circularly permuted RBP-clover fusion. FIG. 9B also provides the amino acid sequence of a split, circularly permuted RBP-MDM2 fusion protein, the sequence of a split tteRBP protein with a multiple cloning site for inserting a target gene, and a split, circularly permuted RBP-clover fusion.

FIG. 10 provides the amino acid sequence of tteRBP with permutation sites signified by asterisks.

FIG. 11 provides the amino acid sequence of Pyrococcus furiosus (pfu) MBP, and the amino acid and DNA sequences of a split pfu MBP protein with a multiple cloning site for inserting a target gene. Pfu is an extremophilic species of Archaea.

FIG. 12 provides a photographic representation of an SDS-PAGE gel stained with Coomassie dye demonstrating that human hRAS is purified using just Nickel-NTA resin. The soluble fraction containing the His₃-s97-hRAS-His₃ protein was loaded on the column and hRAS was eluted after on-column protease digestion.

FIG. 13 provides a photographic representation of an SDS-PAGE gel stained with Coomassie dye demonstrating high expression of yeast actin using the split s125 tteRBP system.

FIG. 14 provides a photographic representation of an SDS-PAGE gel stained with Coomassie dye demonstrating high expression of human GAP344 using the split s97 tteRBP system and subsequent purification on Nickel-NTA resin. GAP is GTPase-activating protein.

DETAILED DESCRIPTION

Unless defined otherwise herein, all technical and scientific terms used in this disclosure have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains.

Every numerical range given throughout this specification includes its upper and lower values, as well as every narrower numerical range that falls within it, as if such narrower numerical ranges were all expressly written herein. Every DNA sequence disclosed herein includes its complementary DNA sequence, and also includes the RNA equivalents thereof. Every DNA and RNA sequence encoding the polypeptides disclosed herein is encompassed by this disclosure, including but not limited to all fusion proteins, and all of the Ribose Binding Protein (RBP) segments of fusion proteins and all of the Maltose Binding Protein (MBP) segments of fusion proteins, including but not limited to those comprising N-terminal and/or C-terminal truncations of the RBP segment or the MBP segment.

The disclosure includes permuted and non-permuted configurations of proteins, which can be present in fusion proteins. “Permuted” and “permutation” and “permuting” and “permute” and “permutants” as used herein mean that, relative to a wild type amino acid reference sequence, proteins of this disclosure have first and second segments of an RBP or MBP, wherein the second segment is located N-terminal to the first segment when compared to an intact wild type amino acid sequence of the RBP or the MBP. To illustrate in a non-limiting fashion, a hypothetical contiguous reference protein has a series of amino acids NH₂-AA₁, AA₂, AA₃, AA₄, AA₅, AA₆, AA₇, AA₈, AA₉, AA₁₀, AA₁₁, AA₁₂, AA₁₃, AA₁₄, AA₁₅, AA₁₆, AA₁₇, AA₁₈, AA₁₉, AA₂₀-COOH. The reference protein therefore has amino acids 1-20 in the N to C orientation. A non-limiting example of a permutation of this protein is: NH₂- . . . AA₉, AA₁₀, AA₁₁, AA₁₂, AA₁₃, AA₁₄ . . . AA₂, AA₃, AA₄, AA₅, AA₆, AA₇, AA₈ . . . -COOH, wherein the ellipses represent other amino acids that may or may not be part of a fusion protein that contains such permuted segments. Such fusion proteins are described further below.
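The hypothetical 20-residue illustration above can be sketched in a few lines of Python. This is only an illustrative sketch: the labels AA1 to AA20, the helper name `segment`, and the use of "..." to stand for intervening amino acids are assumptions made for the example and are not part of any sequence of this disclosure.

```python
# Hypothetical 20-residue reference protein, labelled AA1..AA20 as in the text.
reference = [f"AA{i}" for i in range(1, 21)]


def segment(first, last):
    """Return residues first..last (1-based, inclusive) of the reference."""
    return reference[first - 1:last]


# In the reference, residues 2-8 lie N-terminal to residues 9-14.
# In the permuted arrangement the 9-14 segment is placed first, so the
# "second segment" (2-8) now sits C-terminal to the "first segment" (9-14).
first_segment = segment(9, 14)
second_segment = segment(2, 8)

# "..." stands for intervening amino acids (for example linkers or a target).
permuted = first_segment + ["..."] + second_segment

print("-".join(permuted))
```

The printed order begins with AA9 and ends with AA8, which mirrors the permuted example given in the text.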

Reference to N-terminal and C-terminal when referring to amino acids within a polypeptide is used herein as a convenience to describe orientation, but does not necessarily mandate that the particular amino acid be at the N- or C-terminal amino end of the polypeptide itself.

In embodiments the disclosure comprises segments of an RBP protein, wherein the segments comprise an amino acid sequence that is 90, 91, 92, 93, 94, 95, 96, 97, 98 or 99% or completely identical to the sequence (the tteRBP, described further below):

(SEQ ID NO: 1)

KEGKTIGLVISTLNNPFFVTLKNGAEEKAKELGYKIIVEDSQNDSSK

ELSNVEDLIQQKVDVLLINPVDSDAVVTAIKEANSKNIPVITIDRSA

NGGDVVSHIASDNVKGGEMAAEFIAKALKGKGNVVELEGIPGASAAR

DRGKGFDEAIAKYPDIKIVAKQAADFDRSKGLSVMENILQAQPKIDA

VFAQNDEMALGAIKAIEAANRQGIIVVGFDGTEDALKAIKEGKMAAT

IAQQPALMGSLGVEMADKYLKGEKIPNFIPAELKLITKENVQ.

The tteRBP sequence that includes and counts the terminal Met in amino acid numbering is SEQ ID NO:2.

Variants of the RBP and MBP or target protein bearing one or several amino acid substitutions or deletions are also included in this disclosure. In embodiments, the variants comprise mutations, including but not necessarily limited to conservative amino acid substitutions, and mutations that enhance one or more properties of the RBP or the MBP. In embodiments, a sequence having at least 90% similarity to any sequence described herein can be shorter or longer than the described sequence. The skilled artisan can easily assess whether such variants are appropriate for a method of this disclosure.
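As a worked illustration of the percent identity thresholds referred to in this disclosure, the following sketch computes identity over two sequences that are assumed to be already aligned and of equal length. The function name `percent_identity` and the ten-residue variant are hypothetical; in practice identity would ordinarily be determined with an established alignment tool, and gap handling conventions can differ from the simple column count used here.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns with identical residues.

    Both inputs are assumed to be pre-aligned strings of equal length in
    which "-" marks a gap; a column containing a gap counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


reference = "KEGKTIGLVI"  # first ten residues of SEQ ID NO:1
variant = "KEGKTIGLVA"    # hypothetical variant with one substitution

print(percent_identity(reference, variant))
```

With one substitution in ten aligned positions the hypothetical variant is 90% identical to the reference stretch. By the same arithmetic, a 251 amino acid segment could carry up to 25 mismatched positions and remain at or above 90% identity.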

The location of the N-terminal and C-terminal amino acid(s) where an RBP or MBP according to this disclosure can be separated into segments is referred to as a permutation site, and may be referred to as a circular permutation site. All individual permutation sites, all combinations of permutation sites, and all protein segments delineated by each single and every combination of permutation sites are encompassed by this disclosure.

In embodiments, the disclosure includes a fusion protein comprising first and second segments of the RBP wherein the second segment is located N-terminal to the first segment relative to an intact wild type amino acid sequence of an RBP. In embodiments, the amino acid sequence of the first segment and the amino acid sequence of the second segment of the fusion protein together comprise an amino acid sequence that has at least 90% identity with a segment of SEQ ID NO:1 that is at least 251 amino acids in length, and the amino acid sequences of the first and second segments do not overlap with each other. In embodiments, the first segment has as its first amino acid position 34, 60, 70, 85, 97, 125, 136, 186 or 210 of SEQ ID NO:1. The first segment can optionally be extended by any number of amino acids up to amino acid number 277 of SEQ ID NO:1. Thus, in one embodiment, the first segment ends at amino acid 277 of SEQ ID NO:1. In embodiments, the first segment has at least 90% identity with a segment of SEQ ID NO:1 that is amino acids 34-254 of SEQ ID NO:1, 60-254 of SEQ ID NO:1, 70-254 of SEQ ID NO:1, 85-254 of SEQ ID NO:1, 97-254 of SEQ ID NO:1, 125-254 of SEQ ID NO:1, 136-254 of SEQ ID NO:1, 186-254 of SEQ ID NO:1, or 210-254 of SEQ ID NO:1. In embodiments, the first segment has at least 90% identity with a segment of SEQ ID NO:1 that begins with one of amino acids 34 of SEQ ID NO:1, 60 of SEQ ID NO:1, 70 of SEQ ID NO:1, 85 of SEQ ID NO:1, 97 of SEQ ID NO:1, 125 of SEQ ID NO:1, 136 of SEQ ID NO:1, 186 of SEQ ID NO:1, or 210 of SEQ ID NO:1 and ends with an amino acid from 254 of SEQ ID NO:1 to 277 of SEQ ID NO:1.

In embodiments, the second segment comprises a contiguous amino acid sequence of SEQ ID NO:1 that begins with amino acid number 1, 2, 3, or 4 of SEQ ID NO:1 and ends with amino acid number 33, 59, 69, 84, 96, 124, 135, 185, or 209 of SEQ ID NO:1. In embodiments, the second segment ends at amino acid 96 or amino acid 124 of SEQ ID NO:1. In embodiments, the second segment has at least 90% identity with a segment of SEQ ID NO:1 that begins with amino acid number 1, 2, 3, or 4 of SEQ ID NO:1 and ends with amino acid number 33, 59, 69, 84, 96, 124, 135, 185, or 209 of SEQ ID NO:1.
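The segment boundaries recited above can be tabulated with a short sketch that works on amino acid positions only. The names `split_at` and `covered_length` are hypothetical helpers introduced for illustration; the permutation sites, the allowed start positions 1 to 4, and the allowed end positions 254 to 277 are taken from the text, and all positions use SEQ ID NO:1 numbering (1-based, inclusive).

```python
SEQ_LENGTH = 277  # length of SEQ ID NO:1 as listed above

# First amino acid of the first segment for each permutation site in the text.
FIRST_SEGMENT_STARTS = (34, 60, 70, 85, 97, 125, 136, 186, 210)


def split_at(site, second_start=1, first_end=SEQ_LENGTH):
    """Return ((first_start, first_end), (second_start, second_end)).

    The first segment runs from the permutation site to first_end; the
    second segment runs from second_start to the residue before the site.
    """
    if site not in FIRST_SEGMENT_STARTS:
        raise ValueError(f"{site} is not one of the listed permutation sites")
    if not 1 <= second_start <= 4:
        raise ValueError("second segment must begin at position 1, 2, 3 or 4")
    if not 254 <= first_end <= SEQ_LENGTH:
        raise ValueError("first segment must end between 254 and 277")
    return (site, first_end), (second_start, site - 1)


def covered_length(first, second):
    """Total number of residues covered by two non-overlapping segments."""
    return (first[1] - first[0] + 1) + (second[1] - second[0] + 1)


for site in FIRST_SEGMENT_STARTS:
    first, second = split_at(site, second_start=4, first_end=254)
    print(site, first, second, covered_length(first, second))
```

For every listed site, the minimal configuration (second segment beginning at position 4, first segment ending at position 254) covers 251 positions, which matches the "at least 251 amino acids" language used for the two segments taken together.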

The terms “fusion” and “fuse” as used herein mean a protein that contains amino acid segments from distinct sources, wherein the proteins are made using recombinant molecular biology approaches that adapt standard approaches that are known in the art. It is not intended to mean chemical formation of polypeptides, such as by non-protein translation approaches, such as solid or solution phase based peptide synthesis approaches. Likewise, the fusion proteins are not made by chemical conjugation of pre-existing peptides in the absence of translation.

As described above, in certain approaches a segment from a C-terminal region of the wild type protein is moved to a position that is N-terminal to said wild type C-terminal region, and vice versa. But in alternative embodiments the wild type orientation can be maintained, provided first and second segments are included and are separated from one another, as described further herein. In embodiments, a fusion protein described herein does not contain only one RBP or MBP segment, i.e., the fusion protein comprises more than one RBP or MBP segment, which are separated from each other by intervening amino acids that are not RBP or MBP amino acids, such as linkers and target proteins described further below. Without intending to be bound by any particular feature, it is considered that this disclosure comprises improvements in increasing protein expression and/or solubility that are distinct from certain other approaches, such as those described in PCT/US16/56832, published as WO 2017/066441.

In more detail, in one non-limiting and representative approach, the present disclosure presents a novel approach to incorporating an expression tag into a fusion protein. In one approach the disclosure provides a circular permutant. In embodiments circular permutants are used in a novel purification system which utilizes split Histidine (His) tags fused to each of the N- and C-termini of the fusion protein, which is referred to in some embodiments as a split His tag. The present system is applicable to any expression tag, including but not limited to RBP and MBP.

In certain embodiments, fusion proteins produced according to the methods of this disclosure have improved and/or different characteristics that relate at least in part to the discontinuous inclusion of two segments of the RBP or the MBP in the fusion protein. In non-limiting examples, such improvements can be detected by comparison to a suitable reference (i.e., a control or control value). In embodiments, the reference is a value based on one or more properties of a fusion protein that comprises only one segment of the RBP or the MBP and a target protein. In embodiments, the reference can include a standardized value or curve(s), and/or experimentally designed controls such as a known or determined expression value for a protein, such as a target protein, wherein the expression is measured for the target protein without it being in a fusion protein that contains two discontinuous segments of the RBP or the MBP. A reference value may also be depicted as an area on a graph, or a value obtained from an elution profile, or a solubility value, or a protein degradation value, and/or the total amount of protein that is expressed and/or recovered from an expression system, such values being determined based on any suitable parameter, such as the mass, moles, etc. of the protein that is produced and/or separated from the expression system.

In non-limiting embodiments fusion proteins can be evaluated using any suitable approaches, which include but are not limited to Western blotting, spectroscopy (such as circular dichroism, fluorescence, absorbance, NMR), mass spectrometry, gel electrophoresis under denaturing conditions, gel electrophoresis under non-denaturing conditions, 2D gel electrophoresis, chromatography, including but not limited to cation-exchange chromatography, high-performance liquid chromatography (HPLC), and liquid chromatography-mass spectrometry (LC/MS), immunological methods, and analysis of resistance to degradation using a variety of approaches known to those skilled in the art. In embodiments, the fusion proteins can be evaluated based on actual or predicted ability to bind to sugar, i.e., ribose for RBP and maltose for MBP.

In embodiments, the disclosure relates to fusion proteins that comprise RBP segments or derivatives thereof such that they retain sufficient homology to WT RBP (such as RBP expressed by Thermoanaerobacter tengcongensis (tteRBP, described further below)) that when they fold together (i.e., a tertiary structure is formed) a functional ribose binding pocket is preserved. The structure of RBP, and its residues that contribute to ribose binding, are known in the art. (See, for example, “The backbone structure of the thermophilic Thermoanaerobacter tengcongensis ribose binding protein is essentially identical to its mesophilic E. coli homolog.” BMC Structural Biology (2008) 8:20; and “Analysis of ligand binding to a ribose biosensor using site-directed mutagenesis and fluorescence spectroscopy.” Protein Science (2007) 16, 362-368, the descriptions of each of which are incorporated herein by reference). In embodiments, a fusion protein of this disclosure binds more ribose relative to a fusion protein that comprises only one RBP segment. In embodiments, a fusion protein of this disclosure that is, for example, bound to a metal due to the inclusion of one or more His tags is in a non-covalent association with one or more ribose molecules. In embodiments, a fusion protein that is separated from a binding partner such as a suitable metal is in a non-covalent association with one or more ribose molecules. In embodiments, a fusion protein of this disclosure that is in a non-covalent complex with ribose molecules is more stable and/or is more soluble than a fusion protein that contains only one RBP segment. In an embodiment ribose is added to, for example, a cell lysate prior to or during a fusion protein separation/isolation process. In embodiments, ribose is added to a cell culture medium in which a fusion protein of this disclosure is being expressed. In certain embodiments, a fusion protein comprising two RBP segments includes one or more of E. coli RBP amino acids S9, N13, F15, F16, N64, D89, S103, I132, F164, N190, F214, D215 and Q235, or the T. tengcongensis RBP amino acids that are the equivalents thereof. In embodiments, an RBP segment of this disclosure binds specifically to D-ribose. In embodiments, the amino acid sequence of an RBP protein of this disclosure comprises or consists of SEQ ID NO:1, which is RBP produced by T. tengcongensis. The amino acid residues of this amino acid sequence can be compared by those skilled in the art to the sequence of RBP produced by E. coli, which is known in the art and can be found, for example, under GenBank accession number SMH27141.1, the amino acid sequence from which is incorporated herein by reference as of the filing date of this application or patent. In this regard, the ribose binding residues of RBP produced by E. coli that are involved in the ribose binding pocket comprise S9, N13, F15, F16, N64, D89, S103, I132, F164, N190, F214, D215 and Q235, and the homologous amino acids in tteRBP can be readily recognized by comparison to the E. coli sequence.

With respect to ribose binding, in a specific and non-limiting example, amino acids 17 and 18 and 217 and 218 of SEQ ID NO:1 and a number of amino acids between 18 and 217 are considered to be necessary for RBP to bind to ribose. Because the two segments in the present invention together include the amino acids of SEQ ID NO:1 numbered 4 to 254, which includes the aforementioned amino acid residues (in contrast to the linear RBP described in WO 2017/066441) and because the two segments are capable of interacting with each other and adopting a suitable tertiary structure for ribose binding (unlike the segments described in the published PCT application WO 2013/101915), absent certain modifications to the sequence of the two segments, the RBP formed when the two segments fold together will be capable of binding ribose. However, it should be understood that ribose binding is not required for the two segments of RBP to fold together properly. In this regard, it has been demonstrated by the inventors that residues 17 and 18, and residues 217 and 218, can be modified in RBP, creating mutants which cannot bind ribose but are still folded and remain stable. It is expected that this approach will apply to the folding of two segments wherein one or more of the residues 17, 18, 217 and 218 have been similarly mutated (i.e., they will still be able to fold together properly and be stable). Thus, the disclosure includes such mutated proteins. In one embodiment, the disclosure includes a mutation that is a Cys to Ser alteration relative to a native RBP sequence, such as a Cys to Ser alteration at position 101 in SEQ ID NO:1 (position 102 in SEQ ID NO:2).

Gene and Protein Structures

FIG. 1 compares one embodiment of this disclosure (FIG. 1, panel B) with a linear tag, such as that described in PCT/US16/56832, published as WO 2017/066441 (FIG. 1, panel A). In the linear tag, the target protein gene is appended to either the 3′-end of the RBP gene (shown as top linear RBP tag in FIG. 1A) or the 5′-end of the RBP gene (shown as bottom linear RBP tag in FIG. 1A). The RBP gene encodes for full-length RBP (amino acids [AA] 1-277, in normal, sequential order, or at a minimum, encodes the RBP amino acids 34 [Gly] to 210 [Gln]). RBP and the target protein are separated by a peptide linker (labeled ‘link’ in FIG. 1) and a protease cleavage site (labeled ‘cleave’ in FIG. 1), to enable recovery of the untagged target protein. A nucleotide sequence encoding for a full histidine tag (typically 6-8 His residues) is placed at the beginning or end of the gene to facilitate purification.

In contrast, in the split, circularly-permuted tag of this disclosure (using RBP as a representative solubility tag), the target protein gene is inserted internally into the RBP gene (FIG. 1B), and thus can be used to produce a fusion protein encoded by the gene with the same protein orientation. In addition, the RBP gene can be rearranged such that it encodes for an RBP protein that is permuted.

The disclosure includes variations of amino acid positions that are expressly described herein, so long as the fusion protein includes properties that are improved relative to a reference.

In one embodiment the RBP gene is rearranged such that it encodes for an RBP protein that is circularly permuted at amino acid position 97. The amino acid sequence of permuted RBP in certain embodiments begins with amino acid 97 (numbered according to the wild-type RBP sequence), continues through amino acid 277, through a linker sequence of variable composition (the target protein can be inserted here), to amino acid number 1, and ends with amino acid 96. The target protein gene is flanked by two nucleotide sequences that each encode for peptide linkers and a protease cleavage site, to facilitate recovery of the untagged target protein. A sequence encoding a split-His tag can be placed at each end (i.e., C- and N-termini) of the final gene. The design and rationale for an embodiment of the split-His tag is described below. Representative fusion proteins expressed by the genes in FIG. 1 are shown schematically in FIG. 2. In an embodiment, the 97-277 segment can be C-terminal to the RBP 1-96 segment. An optional linker can be inserted between the split-His tag and the adjacent segment of RBP.
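The permuted arrangement described above can be sketched by assembling the parts in N to C order. The following is a minimal sketch under stated assumptions, not a sequence of this disclosure: the three-residue His sequences at each end, the hypothetical 15-residue (GGS)₅ linker, the choice of the HRV 3C protease recognition sequence LEVLFQGP as the cleavage site, and the placeholder string "TARGET" are all assumptions made for illustration, as are the helper names `rbp` and `permuted_fusion`. Only SEQ ID NO:1 and the segment boundaries are taken from the text.

```python
# SEQ ID NO:1 (tteRBP, 277 residues; numbering excludes the initiator Met).
SEQ_ID_NO_1 = (
    "KEGKTIGLVISTLNNPFFVTLKNGAEEKAKELGYKIIVEDSQNDSSK"
    "ELSNVEDLIQQKVDVLLINPVDSDAVVTAIKEANSKNIPVITIDRSA"
    "NGGDVVSHIASDNVKGGEMAAEFIAKALKGKGNVVELEGIPGASAAR"
    "DRGKGFDEAIAKYPDIKIVAKQAADFDRSKGLSVMENILQAQPKIDA"
    "VFAQNDEMALGAIKAIEAANRQGIIVVGFDGTEDALKAIKEGKMAAT"
    "IAQQPALMGSLGVEMADKYLKGEKIPNFIPAELKLITKENVQ"
)

HALF_HIS_TAG = "HHH"        # assumption: three His residues at each end
LINKER = "GGS" * 5          # assumption: hypothetical 15-residue linker
CLEAVAGE_SITE = "LEVLFQGP"  # assumption: HRV 3C protease recognition sequence


def rbp(first, last):
    """Residues first..last of SEQ ID NO:1 (1-based, inclusive)."""
    return SEQ_ID_NO_1[first - 1:last]


def permuted_fusion(target, site=97):
    """Join Met, His, RBP(site-277), linker, cleavage site, target,
    cleavage site, linker, RBP(1 to site-1) and His, in N to C order."""
    parts = [
        "M", HALF_HIS_TAG, rbp(site, 277), LINKER, CLEAVAGE_SITE, target,
        CLEAVAGE_SITE, LINKER, rbp(1, site - 1), HALF_HIS_TAG,
    ]
    return "".join(parts)


# "TARGET" is a placeholder string, not a real protein sequence.
fusion = permuted_fusion("TARGET")
print(fusion)
```

In this sketch the two linkers and two cleavage sites together contribute 46 residues, which is consistent with the embodiment in which these elements together comprise at least thirty amino acids. Passing `site=125` gives the corresponding s125 arrangement.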

For E. coli protein expression systems, a methionine will generally be added to the N-terminus of the fusion protein. In the split, circularly permuted tags described here, the methionine precedes the first split His tag or, if a split His tag is not in use, the first segment of RBP or MBP. In some embodiments, the N-terminal methionine is not counted in the amino acid numbering of the protein (SEQ ID NO:1 does not include the first Met). Some of the DNA sequences given herein for expression of proteins include a stop codon at the 3′ end. A plasmid containing the DNA sequence does not need to include that stop codon.

Internal Fusion Vs. End-to-End Fusion

In the disclosure, the target protein is inserted internally into RBP (FIGS. 1B and 2B), whereas the linear system of the '832 PCT application, published as WO 2017/066441, places the two proteins end-to-end (FIGS. 1A and 2A). In the current system, and without intending to be constrained by any particular theory, it is considered that the RBP folds by docking of the 1-96 and 97-277 fragments, which extend from both termini of the target protein. This effectively yields a closed, topologically circular protein in which the target protein is protected from exoprotease attack at both its N- and C-termini by the presence of the extremely stable RBP protein (FIG. 3A). Likewise, if the RBP tag is circularly permuted at AA 125, the RBP folds by docking of the 1-124 and 125-277 fragments, which extend from both termini of the target protein.

Circular Permutation Vs. Normal Amino Acid Sequence of RBP

The same benefits of internal fusion could be achieved by inserting the target protein into position 97 of the normal RBP sequence, i.e. (RBP 1-96)-(target protein)-(RBP 97-277) (FIG. 2C), instead of the permuted RBP sequence, i.e. (RBP 97-277)-(target protein)-(RBP 1-96) (or into position 125 of the normal RBP sequence). In the former, the target protein is still protected from exoprotease degradation by the RBP protein at both its N- and C-termini. The circular permutation is employed in order to make a unique, high-affinity binding site (split-His tag) that will allow us to specifically purify the full-length protein, and reject those that are truncated either by protease cleavage or by incomplete translation. Although embodiments excluding amino acids 1-33 and/or amino acids 211-277 will dock properly, they will be unable to bind ribose, and as a result will likely be less stable than embodiments containing both these segments.

In various embodiments, one segment of a fusion protein comprises amino acids 34-96 of the RBP, while another segment comprises amino acids 97-211. In various embodiments, one segment of a fusion protein of this disclosure comprises amino acids 34-124 of the RBP, while another segment comprises amino acids 125-211. These variations can be made in the context of the split permuted RBP: wherein the “RBP (97-277)” segment is engineered to comprise or consist of RBP amino acids 97-211, and/or wherein the “RBP (1-96)” segment is engineered to comprise or consist of RBP amino acids 34-96, or wherein the “RBP (125-277)” segment is engineered to comprise or consist of RBP amino acids 125-211, and/or wherein the “RBP (1-124)” segment is engineered to comprise or consist of RBP amino acids 34-124. One of the two segments may be engineered to begin with any amino acid between 1 and 33 inclusive and end at the permutation site, and the other of the two segments may be engineered to start after the permutation site and end at any AA between 210 and 277 inclusive.

Other expression tags, such as MBP (e.g., pfu MBP) and GST, can be circularly permuted for the purposes of protein expression, including for enabling the effective use of split His tags. In one embodiment the MBP gene is rearranged such that it encodes for an MBP protein that is circularly permuted at amino acid position 126 (FIG. 3C). The amino acid sequence of permuted MBP in certain embodiments begins with amino acid 126 (numbered according to the wild-type MBP sequence), continues through amino acid 379, through a linker sequence of variable composition (the target protein can be inserted here), to amino acid number 1, and ends with amino acid 125. The target protein gene is flanked by two nucleotide sequences that each encode for peptide linkers and a protease cleavage site, to facilitate recovery of the untagged target protein. A split-His tag is placed at each end (i.e., C- and N-termini) of the final gene. In an embodiment, the N-terminus of the 1-125 segment of the MBP is truncated by one or more amino acids, such as, for example by 1, 2, 3, 4, etc. up to about 34 AAs, while the C-terminus of the 126-379 segment of the MBP is truncated by one or more amino acids, such as, for example by 1, 2, 3, 4, etc. up to about 60 AAs. These variations can be made in the context of the split permuted MBP.
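The order of elements in such a construct can be sketched in Python as follows. This is an illustrative assembly only: the linker string `GGSGGS`, the default tag lengths, and the function name `assemble_permuted_fusion` are assumptions made for the example, and the cleavage site shown is the Tobacco etch virus recognition sequence ENLYFQG used as one possible choice.

```python
def assemble_permuted_fusion(
    tag_seq: str,
    site: int,
    target: str,
    his_n: int = 3,
    his_c: int = 3,
    linker: str = "GGSGGS",           # hypothetical flexible linker
    cleavage_site: str = "ENLYFQG",   # TEV site, one possible choice
) -> str:
    """Return Met-(His)n-(tag site..end)-linker-site-target-site-linker-(tag 1..site-1)-(His)c."""
    if not 2 <= site <= len(tag_seq):
        raise ValueError("site must leave two non-empty tag segments")
    leading = tag_seq[site - 1:]    # e.g. residues 126-379 of the tag
    trailing = tag_seq[: site - 1]  # e.g. residues 1-125 of the tag
    return (
        "M"
        + "H" * his_n
        + leading
        + linker + cleavage_site
        + target
        + cleavage_site + linker
        + trailing
        + "H" * his_c
    )


# Toy example with a 6-residue placeholder tag permuted at position 4.
print(assemble_permuted_fusion("ABCDEF", 4, "t"))
```

Passing a 379-residue tag sequence with `site=126` would reproduce the 126-379 / 1-125 ordering described for the permuted MBP.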

Circular permutes of many proteins are less stable than the non-permuted protein. Circular permutes of some proteins from thermophilic organisms, however, such as tte-RBP (from Thermoanaerobacter tengcongensis) and PfuMBP (from Pyrococcus furiosus), are very stable.

Split His-Tag Vs. His-Tag

In embodiments of this disclosure, a “Split-His tag” means a His-tag that is divided between the N- and C-termini of the fusion protein. In certain embodiments, each of the two portions of the split His-tag comprises an adequate length of Histidines such that when the two portions are adjacent to one another recovery of the fusion protein is greater than a suitable control. In an embodiment, each split-His tag comprises a length of Histidines that is too short for stable binding to nickel ions that have been attached to beads.

In more detail, a His-tag is a linear sequence of n histidine residues where n is typically 6-8. His-tags achieve purification by binding specifically to nickel or cobalt ions that have been attached to beads. In all His-tag purification systems described to date, the His-tag is placed at the N-terminus of the protein, at the C-terminus of the protein, or occasionally in the middle. The current system employs two split-His tags that are arrayed in close proximity and in approximately parallel orientation, by virtue of the structure of circularly permuted RBP. The distinction between split-His tag and conventional His-tag is that the two split-His tags must be very close to each other, such that the two tags can cooperatively bind the same, or nearby (on a molecular scale), nickel ion(s). Because the two split-His tags cooperatively bind, they act almost as a single His tag with a number of His equal to the sum of the number of His in the two split-His tags. For example, if each split His tag contains three His residues, their cooperative binding strength is close to or equal to that of a single His tag with six residues. To our knowledge, no expression system exists that has this feature. Further, engineering a split His tag into any recombinant protein, regardless of whether or not the other elements of the fusion proteins of this disclosure are included in the protein, is encompassed by this invention. Note that if one His-tag is placed at each terminus of the linear RBP-target protein fusion (FIG. 2), the two His-tags will not be physically close to each other and will therefore not bind cooperatively.

When a protein is circularly permuted, two sequential amino acids of the parent protein (typically in a surface loop) become the new amino and carboxy termini of the permuted protein. Therefore, the termini of a permuted protein are always close in space. If one attaches a split-His tag to each terminus, these tags are expected to project outward in roughly parallel orientation and be very close to each other (FIG. 2B). This tandem arrangement of 2× split-His binds nickel more tightly than a single His tag of the same number of His residues as each split His tag. What this means is that the full-length protein binds more tightly to nickel beads (or other appropriate stationary phase) than truncated proteins (which will contain only one or neither of the split-His tags). The full-length protein can then be selectively purified from fragments by a gradient of eluent (e.g. imidazole), represented in FIGS. 4 and 5.
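The difference between full-length and truncated species can be illustrated with a simple count of the His residues available for cooperative binding. The Python sketch below treats that count as a rough proxy for relative affinity, which is a simplifying assumption for illustration only; the function names and the three-residue tag lengths are hypothetical.

```python
def his_available(n_tag: int, c_tag: int, has_n_end: bool, has_c_end: bool) -> int:
    """Count His residues contributed by whichever termini a species retains."""
    return (n_tag if has_n_end else 0) + (c_tag if has_c_end else 0)


def rank_by_his(species: dict, n_tag: int = 3, c_tag: int = 3) -> list:
    """Order species from fewest to most available His residues.

    `species` maps a label to a (has_n_end, has_c_end) pair.
    """
    counts = {
        name: his_available(n_tag, c_tag, n_end, c_end)
        for name, (n_end, c_end) in species.items()
    }
    return sorted(counts.items(), key=lambda item: item[1])


example = {
    "full-length": (True, True),
    "C-terminally truncated": (True, False),
    "N-terminally truncated": (False, True),
    "internal fragment": (False, False),
}
for name, count in rank_by_his(example):
    print(name, count)
```

In this toy ranking only the full-length species carries both split-His tags, consistent with its tighter binding relative to fragments carrying one tag or none.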

Split His tags of equal length tend to work optimally for purification because both split His tags have an equal affinity to the nickel substrate of the column and as a result will release from the substrate under the same conditions and at the same time. In one embodiment, each split-His tag contains two, three, four, five, six, seven or eight His residues. In one embodiment, each split-His tag contains 2 to 20 His residues. The present disclosure, however, does not preclude split His tags of unequal length, such as one split-His tag containing two residues and the other containing three residues, or one split-His tag containing three His residues and the other split-His tag containing five, etc., as performed for FIG. 5. An optional linker can be inserted between the split-His tag and the adjacent segment of RBP.

Native proteins that are expressed by the cell being used for protein production may be rich in His and may therefore bind naturally to the nickel (or other suitable metal, such as cobalt) substrate. In order to make it easier to separate these proteins from the fusion protein, the disclosure includes use of split His tags that contain a total of more than six His residues. For example, the disclosure includes use of two six-residue or two eight-residue split His tags, one on each end of the fusion protein. In such embodiments, a higher concentration of, for example, imidazole can be used to elute the native, His-rich proteins that have bound opportunistically to the nickel column than can be used when split His tags with fewer His residues are used.

One advantage of the split-His tag circularly permuted RBP-target fusion protein production system is that single column purification is possible. (See, for example, representative and non-limiting embodiments described below under “One column purification of the target protein” and data represented by FIG. 12). Truncated fusion proteins will only have a single split His tag and will therefore only bind loosely to the nickel column's beads. These truncated fusion proteins and any His-rich native proteins can be eluted away using an eluent gradient. After eluting away the undesired fusion protein fragments, an appropriate protease can be run through the column to cleave the target protein from the RBP tags and release it from the column and into the mobile phase (e.g., buffer). If the protease contains a His tag of an appropriate length (e.g., six), it will bind to the column, either on the first pass or on a subsequent pass of the mobile phase through the column, thereby separating it from the released target protein. In some cases, the target protein may be nicked by proteases in the cell used for expressing the fusion protein. In that case there may be full-length target protein mixed with fragments of the target protein. Additional steps, using any of the techniques known to those skilled in chromatography and/or filtration, may be necessary to separate the fragments from the full-length target protein. For instance, heparin resin can be used to purify p53 further. Ion exchange, size exclusion, or affinity chromatography can also be used for purification, depending on the protein one wants to purify. The foregoing is applicable to a split-His tag circularly permuted RBP-target fusion protein production system. In certain embodiments of the disclosure as illustrated by the Figures, we used affinity chromatography with heparin resin to purify p53 and size exclusion chromatography to purify MDM2 and LECT2 protein.

Circular Permutation at Other Positions (FIG. 10)

A circular permutant can be created by permuting the amino acid sequence at any position, although it is preferable to permute at surface loops to avoid perturbing protein structure. RBP has many surface loops (15) from which to choose when making a circular permutant. An aspect of this disclosure is our discovery that positions 97 and 124 are preferable sites to permute RBP due to their ability to yield a combination of stability, solubility, and foldability in the permuted proteins. Using a screening method that we developed [Ha et al. (2015), Chemistry & Biology 22, 1384] we discovered that position 97 is a favorable position to break the RBP sequence. It is possible to create constructs similar to that shown in FIG. 1 by permuting RBP at the other sites described in the above study, and this invention covers those designs as well. Other permutation sites on RBP include 34 (i.e., between AAs 33 and 34), 60, 125, 137, 186, and 210. The permutation can be within several AAs to either side of the aforementioned sites, as long as it is within the surface loops that comprise those sites (e.g., it can be at AA 121, 122, 123, 124, 126, 127, 128, 129, or 130 instead of at 125 in RBP). In embodiments where there is no permutation of RBP, the target protein and flanking nucleotides are inserted into the RBP at the aforementioned sites, or at another suitable site within the RBP. The same applies to MBP, which can be permuted at any of its loops, such as at 55, 82, 126, and 204.
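As a non-limiting illustration, a candidate permutation position can be compared against the sites listed above with a short Python helper. The five-residue window is an assumption chosen to cover the 121-130 example given for site 125; the actual extent of each surface loop would have to be taken from the protein structure, and the function name `nearest_listed_site` is hypothetical.

```python
# Permutation sites named in the text for RBP and MBP.
RBP_SITES = (34, 60, 97, 125, 137, 186, 210)
MBP_SITES = (55, 82, 126, 204)


def nearest_listed_site(position: int, sites=RBP_SITES, window: int = 5):
    """Return the listed site within `window` residues of `position`, else None.

    `window` is an illustrative stand-in for real surface-loop boundaries.
    """
    best = min(sites, key=lambda s: abs(s - position))
    return best if abs(best - position) <= window else None


print(nearest_listed_site(121))             # near site 125
print(nearest_listed_site(110))             # not near any listed site
print(nearest_listed_site(126, MBP_SITES))  # a listed MBP site
```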

A consideration when split His tags are to be employed with a circularly permuted tag is the proximity and relative orientation of the N-terminus of the leading segment (e.g., the 125-277 segment of RBP) and the C-terminus of the trailing segment (e.g., the 1-124 segment of RBP) after the two segments fold together. If the N- and C-termini are oriented roughly parallel and are adjacent to each other, the split His tags will be roughly parallel and adjacent to each other and will generally work well. If the N- and C-termini are neither, the split His tags may not function optimally or at all. Thus, the disclosure provides modifying properties of the proteins using suitable length linkers, as described elsewhere herein.

Monomer Vs. Domain-Swapped Oligomer

In certain examples in which we inserted “lever” proteins into surface loops of “assembler” proteins (including RBP) we developed a technology by which such insertions would cause RBP to form domain-swapped dimers and oligomers (described in U.S. patent application Ser. No. 14/369,408; “the 408 application” published as WO/2013/101915). A purpose of this technology was to create self-assembling, domain-swapped biomaterials. This employed very short linkers (0-3 amino acids in length) to fuse lever proteins (with unusually long amino-to-carboxy terminal distances) into internal positions of assembler proteins. The lever protein then tears the assembler protein in two and holds the pieces so far apart that they cannot refold with each other within the same molecule. They are forced to refold with other molecules (domain swap).

In contrast to the '408 application, in the present invention, longer linkers are used (i.e., 15-30 amino acids, or longer, which can include the protease cleavage site if present) to join the target protein to the RBP segments. This allows RBP to accommodate target proteins with even the longest amino-to-carboxy distances without domain swapping (FIG. 2). In this regard, and without intending to be bound by any particular theory, the disclosure includes fusion proteins that are designed to be monomeric to achieve the benefits of split-His tag technology. In particular, the linkers and cleavage site are flexible enough and long enough to allow the two separated sections of RBP to become adjacent to each other and fold to form a permuted or non-permuted RBP. In an embodiment, the linkers that flank the RBP, including both peptide linkers and protease cleavage sites (or the nucleotides coding therefor), are greater than the longest amino-to-carboxy dimension within the target protein. In an embodiment, the linkers that flank the RBP, including both peptide linkers and protease cleavage sites (or the nucleotides coding therefor), can be 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50 or more amino acids in length.

In more detail, with respect to the linkers that can be used with embodiments of this disclosure, it is preferable for the linkers to be flexible. In certain approaches, appropriate linkers are rich in small or polar amino acids such as Gly and Ser to provide suitable flexibility. Thus, the amino acid composition of the linkers is not particularly limited. Certain exemplary linker sequences are provided in the amino acid sequences and figures of this disclosure, which are not intended to limit the amino acid composition of the linkers. Further, approaches to design of flexible linkers are known in the art. (See, for example, Klein, et al. “Design and Characterization of Structured Protein Linkers with Differing Flexibilities.” Protein Engineering, Design and Selection 27.10 (2014): 325-330, the disclosure of which is incorporated herein by reference).

In embodiments, the two linkers collectively include at least 30 amino acids, and thus are substantially longer than previous linkers that have been described with fusion proteins that include segments of RBP. In certain embodiments, the disclosure relates to configurations of a fusion protein wherein the two discontinuous RBP segments are able to fold together (i.e., adopt a suitable tertiary structure such that the ribose binding pocket remains functional). In embodiments, linker sequences can comprise protease recognition sites as described elsewhere herein. Without intending to be constrained to any particular theory, it is considered that the total number of amino acids in the linkers (and any protease sites if included) is approximately 30 amino acids in total when the end of the segment of RBP corresponding to the C terminus segment (such as of SEQ ID NO:1) is spatially proximal to amino acid 277 of SEQ ID NO:1. The 30 amino acids can be distributed between the linkers in any way, with the first and second linkers (plus protease sites if included) being between 0 and 30 amino acids long, again provided that the total is at least about 30 amino acids. In certain approaches it is preferable that there be at least a short (about 3, 4, 5, 10) linker at each terminus of the target protein. In embodiments, the linkers and any protease sites, if present, together are about 20 amino acids longer than 30 if the segment of RBP corresponding to the C terminus segment of SEQ ID NO:1 ends at or near amino acid 254. This is because, and again without intending to be constrained by theory, in a folded protein of SEQ ID NO:1 beginning at amino acid number 4 and ending at amino acid number 254, the N and C termini are farther away from each other than in a folded protein of SEQ ID NO:1 that begins at amino acid number 4 and ends at amino acid number 277.

In particular, the length of the linkers can be adapted to account for the length of the RBP or MBP protein segments that are included in the fusion protein/encoded by the expression vectors. As a non-limiting example, if in certain embodiments the first and second RBP segments collectively include 277 RBP amino acids, such as those depicted in SEQ ID NO:1, it is considered that a total of 30 amino acids of linker length is adequate. If, for example, some RBP amino acids are not included, such as by omitting the last 23 amino acids (i.e., 255 to 277), then the linker can be extended to substitute for those amino acids. In a non-limiting approach, if the final RBP amino acid in a segment of the fusion protein is 254, a linker can be extended by an additional 23 amino acids. Similar linker length modifications can be made if other RBP amino acids are omitted. The same approach applies to MBP as well.
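The linker length adjustment described above is simple arithmetic and can be sketched in Python as follows. The sketch assumes the one-for-one substitution stated in the text (one additional linker amino acid per omitted C-terminal RBP amino acid) on top of the 30 amino acid baseline; the function name `minimum_linker_total` is hypothetical.

```python
FULL_RBP_LENGTH = 277    # RBP amino acids when the full sequence is used
BASE_LINKER_TOTAL = 30   # combined linker length considered adequate for full-length RBP


def minimum_linker_total(last_rbp_residue: int) -> int:
    """Combined linker length (both linkers plus any protease sites).

    Adds one linker amino acid for each omitted C-terminal RBP amino acid.
    """
    if not 1 <= last_rbp_residue <= FULL_RBP_LENGTH:
        raise ValueError("last_rbp_residue must be between 1 and 277")
    omitted = FULL_RBP_LENGTH - last_rbp_residue
    return BASE_LINKER_TOTAL + omitted


print(minimum_linker_total(277))  # full-length RBP
print(minimum_linker_total(254))  # residues 255-277 omitted
```

With residue 254 as the final RBP amino acid, 23 amino acids are omitted and the combined linker length becomes 53 amino acids under this rule.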

In a further embodiment, the disclosure includes a recombinant DNA molecule, such as an expression vector, encoding a fusion protein, comprising operatively-linked at least one nucleotide sequence coding for a target polypeptide and at least one nucleotide sequence coding for the RBP segments as described herein.

Polynucleotide sequences are operatively-linked when they are placed into a functional relationship with another polynucleotide sequence. For instance, a promoter is operatively-linked to a coding sequence if the promoter affects transcription or expression of the coding sequence. Generally, operatively-linked means that the linked sequences are contiguous and, where necessary to join two protein coding regions, both contiguous and in reading frame. However, it is well known that certain genetic elements, such as enhancers, may be operatively-linked even at a distance, i.e., even if not contiguous. Promoters of the present disclosure may be endogenous or heterologous to the host, and may be constitutive or inducible. The appropriate promoter and other necessary vector sequences are selected so as to be functional in the host. Examples of workable combinations of cell lines and expression vectors include but are not limited to those described by Sambrook, J., et al., in “Molecular Cloning: A Laboratory Manual” (1989, 4th edition: 2012), Eds. J. Sambrook, E. F. Fritsch and T. Maniatis, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, or Ausubel, F., et al., in “Current Protocols in Molecular Biology” (1987 and periodic updates), Eds. F. Ausubel, R. Brent and K. R. E., Wiley & Sons Verlag, New York; and Metzger, D., et al., Nature 334 (1988) 31-6. Many useful vectors for expression in bacteria, yeast, mammalian, insect, plant or other cells are known in the art and may be obtained from vendors including, but not limited to, Stratagene, New England Biolabs, Promega Biotech, and others. In addition, the construct may be joined to an amplifiable gene so that multiple copies of the gene may be obtained.
Thus, and without intending to be constrained by any particular theory, it is considered that one of the advantages of using highly soluble proteins such as RBP and MBP as overexpression tags as described herein is that they are so soluble that a stronger expression promoter, such as the T7 promoter (T7P in FIG. 1), can be used to drive protein production to very high levels. We have discovered that the presently described approaches facilitate expression of so much of the RBP fusion protein (FIG. 1B) that any native E. coli proteins that are present are at very small concentrations, making purification easier. However, any other suitable promoters (ranging from strong to weak promoters) can be used in the expression vectors of this disclosure, some examples of which include but are not limited to promoters that are provided with commercially available expression vectors, such as the Pm/xylS promoter from Vectron Biosolutions. Other specific examples are known and are described in, for example, “A comparative analysis of the properties of regulated promoter systems commonly used for recombinant gene expression in Escherichia coli” Microbial Cell Factories, 2013, 12:26, from which the disclosure of promoters is incorporated herein by reference. The promoters used in embodiments of the disclosure may be constitutive promoters, or they may be inducible.

Furthermore, the expression systems are not limited to prokaryotic systems, and thus may be configured for expression in eukaryotic systems, such as yeast, animal systems such as baculovirus-insect cell systems, mammalian cell expression systems, and cell free expression systems that are known in the art and that, when given the benefit of the present disclosure, can also be used. Suitable expression vectors are known in the art and can be adapted for use in methods of this disclosure.

Expression of the proteins can also be scaled to produce any desired amount of the proteins, such as by batch scaling, and can be used to produce milligram, gram, or kilogram quantities of the proteins. Such quantities can refer to production of the fusion protein, or of the target protein if the target protein is separated from the fusion protein.

In more detail with respect to expression systems, DNA constructs prepared for introduction into a host typically comprise a replication system recognized by the host, including the intended DNA fragment encoding the desired target fusion peptide, and can also include transcription and translational initiation regulatory sequences operatively-linked to the polypeptide encoding segment. Expression systems (expression vectors) may include, for example, an origin of replication or autonomously replicating sequence (ARS) and expression control sequences, a promoter, an enhancer and necessary processing information sites, such as ribosome-binding sites, RNA splice sites, polyadenylation sites, transcriptional terminator sequences, and mRNA stabilizing sequences.

Expression and cloning vectors can contain a selectable marker, a gene encoding a protein necessary for the survival or growth of a host cell transformed with the vector, although such a marker gene may be carried on another polynucleotide sequence co-introduced into the host cell. Only those host cells expressing the marker gene will survive and/or grow under selective conditions. Typical selection genes include but are not limited to those encoding proteins that (a) confer resistance to antibiotics or other toxic substances, e.g., ampicillin, tetracycline, etc.; (b) complement auxotrophic deficiencies; or (c) supply critical nutrients not available from complex media. The choice of the proper selectable marker will depend on the host cell, and appropriate markers for different hosts are known in the art.

The expression vectors containing the polynucleotides of interest can be introduced into the host cell by any method known in the art. These methods vary depending upon the type of cellular host, including but not limited to transfection employing calcium chloride, rubidium chloride, calcium phosphate, DEAE-dextran, other substances, and infection by viruses. Large quantities of the polynucleotides and polypeptides may be prepared by expressing the polynucleotides in compatible host cells. The most commonly used prokaryotic hosts are strains of Escherichia coli, although other prokaryotes, such as Bacillus subtilis may also be used.

Construction of a vector according to the present disclosure employs conventional ligation techniques. Isolated plasmids or DNA fragments are cleaved, tailored, and religated in the form desired to generate the plasmids required. If desired, analysis to confirm correct sequences in the constructed plasmids is performed in a known fashion. Suitable methods for constructing expression vectors, preparing in vitro transcripts, introducing DNA into host cells, and performing analyses for assessing expression and function are known to those skilled in the art.

The DNA constructs encode linker peptides as illustrated herein. As described above, the linkers and cleavage site are flexible enough and long enough to allow the two separated and permuted sections of RBP to become adjacent to each other and fold to form a permuted RBP.

In cases where it is desired to release one or all of the solubility and expression tags out of a fusion protein, the linker peptide(s) can be constructed to comprise a proteolytic cleavage site. Thus, a recombinant DNA molecule, such as an expression vector, encoding a fusion protein comprising at least one polynucleotide sequence coding for a target polypeptide, a polynucleotide sequence coding for the RBP-solubility tags as described herein, and additionally comprising a nucleic acid sequence coding for a peptidic linker comprising a proteolytic cleavage site, represents a non-limiting embodiment of this invention. In certain embodiments, the expression vector comprises codons optimized for expression in the host cell.

As discussed above, fusion proteins of this disclosure may or may not contain protease recognition sites. If a protease recognition site is included it can be comprised within a linker sequence. Suitable protease recognition sites are known in the art and can be adapted with embodiments of this disclosure. In embodiments, the protease recognition site can comprise a site that is recognized by any of plant, viral, bacterial and/or animal proteases. Animal proteases include acid proteases secreted into the stomach (such as pepsin) and serine proteases present in duodenum (trypsin and chymotrypsin). Proteases present in blood serum (thrombin, plasmin, Hageman factor, etc.) recognize sites that can also be used. Other proteases are present in leukocytes (elastase, cathepsin G), and some venoms also contain proteases, such as pit viper haemotoxin; sites that are recognized by such proteases are included in the disclosure. It will also be recognized that amino acid sequences recognized by proteases expressed by bacteria in the gut of various animals, including humans, or by the animals themselves in various parts of their anatomy, such as their gut, digestive system or circulatory system, can also be incorporated into the fusion proteins. In such cases, the separation of the target protein from the fusion protein can occur in that part of the animal's anatomy. Thus, in embodiments, a target protein of this disclosure can comprise a protein-based pro-drug (or other biologically active protein) which is activated via liberation from the fusion protein only once it is administered to an animal that expresses the cognate protease. Protease recognition sites that can be used in this invention are known in the art. For example, see the table of protease recognition sites available at www.proteinsandproteomics.org/content/free/tables_1/table11.pdf, the disclosure of which is incorporated herein by reference.

In specific but non-limiting embodiments, protease cleavage sites that can be used in embodiments of this disclosure include a Tobacco etch virus site (ENLYFQ/G; SEQ ID NO:16), an Enterokinase site (DDDDK/; SEQ ID NO:13), a Factor Xa site (IEGR/; SEQ ID NO:14) and a Thrombin site (LVPR/GS; SEQ ID NO:15).
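In the notation above, the slash marks the position at which the peptide bond is cut. As a non-limiting illustration, the Python sketch below records each listed site as the residues before and after the cut and splits a sequence at every exact occurrence. Real proteases tolerate some sequence variation that exact matching does not capture, and the names `CLEAVAGE_SITES` and `cleave` as well as the toy sequences are hypothetical.

```python
# Each entry: (residues before the cut, residues after the cut).
CLEAVAGE_SITES = {
    "TEV": ("ENLYFQ", "G"),
    "Enterokinase": ("DDDDK", ""),
    "Factor Xa": ("IEGR", ""),
    "Thrombin": ("LVPR", "GS"),
}


def cleave(sequence: str, protease: str) -> list:
    """Split `sequence` at each exact occurrence of the protease's site."""
    before, after = CLEAVAGE_SITES[protease]
    motif = before + after
    fragments = []
    start = 0
    pos = sequence.find(motif)
    while pos != -1:
        cut = pos + len(before)          # cut falls right after `before`
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(motif, cut)
    fragments.append(sequence[start:])
    return [f for f in fragments if f]   # drop an empty trailing fragment


print(cleave("AAENLYFQGBB", "TEV"))
print(cleave("XXLVPRGSYY", "Thrombin"))
```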

Protease sequences (or other cleavage sites) that are not included in the target protein can be designed and included by analysis of the amino acid sequence of the target protein, thus avoiding or minimizing cleavage of the target protein. In embodiments, publicly available tools for protease sites can be used to determine protease cleavage sites, such as PeptideCutter (available at web.expasy.org/peptide_cutter/).
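A much simplified version of such an analysis can be sketched in Python: the target sequence is scanned for exact copies of candidate recognition sequences, and only proteases whose site is absent from the target are retained. This sketch uses exact string matching and does not reproduce the cleavage rules of PeptideCutter or any other tool; the names `RECOGNITION_MOTIFS` and `proteases_safe_for` and the toy target sequences are hypothetical.

```python
# Recognition sequences written without the cut marker.
RECOGNITION_MOTIFS = {
    "TEV": "ENLYFQG",
    "Enterokinase": "DDDDK",
    "Factor Xa": "IEGR",
    "Thrombin": "LVPRGS",
}


def proteases_safe_for(target: str) -> list:
    """Return proteases whose exact recognition sequence is absent from `target`."""
    seq = target.upper()
    return sorted(
        name for name, motif in RECOGNITION_MOTIFS.items() if motif not in seq
    )


# Toy target containing an internal Factor Xa site (IEGR).
print(proteases_safe_for("MKIEGRAA"))
```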

In other embodiments, the fusion proteins can comprise amino acid sequences that can be cleaved to, for example, liberate the target protein, but not necessarily by a protease, and thus may be cleaved non-enzymatically. Such sequences can be included in the linkers. In certain embodiments, such sequences can be, for example, particularly susceptible to acid hydrolysis, to exposure to other chemicals, or to heat treatment. In certain approaches the fusion proteins comprise sequences that are designed to be exclusively or preferentially cleaved by cyanogen bromide, which cleaves peptide bonds after a methionine. Likewise, the fusion proteins may be designed to be exclusively or preferentially cleaved at tryptophanyl, aspartyl, cysteinyl, and/or asparaginyl peptide bonds. Acids such as trifluoroacetic acid and formic acid may also be used for such non-enzymatic proteolysis. Approaches such as these can be adapted to alter conditions in which a fusion protein of this disclosure is treated, such as by modifying pH, temperature, salt concentrations and the like so that preferential cleavage of the target protein can be achieved. Combinations of these cleavage mechanisms or approaches can be used in a single fusion protein, including incorporating different cleavage mechanisms between the target protein and each of the RBP segments or incorporating more than one cleavage mechanism between the target protein and one or both of the RBP segments.
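Because cyanogen bromide cleaves after methionine, the fragments expected from a given sequence can be predicted by splitting after every Met. The Python sketch below illustrates this under the idealized assumption of complete cleavage at every methionine; the function name `cnbr_fragments` and the toy sequence are hypothetical.

```python
def cnbr_fragments(sequence: str) -> list:
    """Split a protein sequence after each Met (M), as an idealized CNBr digest."""
    fragments = []
    current = ""
    for residue in sequence.upper():
        current += residue
        if residue == "M":
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)
    return fragments


# Toy sequence with two methionines.
print(cnbr_fragments("AMGGMK"))
```

A target protein with no internal methionine would remain as a single fragment in this model, which is one way to check whether this cleavage chemistry suits a particular target.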

The invention is demonstrated using several target proteins as described in the Examples. These include Leukocyte Cell Derived Chemotaxin 2 (LECT2), Human mouse double minute 2 homolog (MDM2) also known as E3 ubiquitin-protein ligase Mdm2, p53, hRAS, actin and GTPase-activating protein (GAP). Thus, the disclosure demonstrates embodiments with proteins of vastly different amino acid compositions, sizes and functions. Accordingly, it is expected that the target protein that is included in the fusion proteins of this disclosure may be any polypeptide of interest. In embodiments, a target polypeptide according to the present disclosure may be any polypeptide required or desired in larger amounts and therefore may be difficult to isolate or purify from other sources. Non-limiting examples of target proteins that can be produced by the present methods include mammalian gene products, such as enzymes, cytokines, growth factors, hormones, vaccines, antibodies and the like. In embodiments, overexpressed gene products of the present disclosure include gene products such as p53, erythropoietin, insulin, somatotropin, growth hormone releasing factor, platelet derived growth factor, epidermal growth factor, transforming growth factor α, transforming growth factor β, fibroblast growth factor, nerve growth factor, insulin-like growth factor I, insulin-like growth factor II, clotting Factor VIII, superoxide dismutase, α-interferon, γ-interferon, interleukin-1, interleukin-2, interleukin-3, interleukin-4, interleukin-5, interleukin-6, granulocyte colony stimulating factor, multi-lineage colony stimulating activity, granulocyte-macrophage stimulating factor, macrophage colony stimulating factor, T cell growth factor, lymphotoxin and the like. In embodiments overexpressed gene products are human gene products. The present methods can readily be adapted to enhance secretion of any overexpressed gene product which can be used as a vaccine.
Overexpressed gene products which can be used as vaccines include any structural, membrane-associated, membrane-bound or secreted gene product of a mammalian pathogen. Mammalian pathogens include viruses, bacteria, single-celled or multi-celled parasites which can infect or attack a mammal. For example, viral vaccines can include vaccines against viruses such as human immunodeficiency virus (HIV), vaccinia, poliovirus, adenovirus, influenza, hepatitis A, hepatitis B, dengue virus, Japanese B encephalitis, Varicella zoster, cytomegalovirus, rotavirus, as well as vaccines against viral diseases like measles, yellow fever, mumps, rabies, herpes, influenza, parainfluenza and the like. Bacterial vaccines can include vaccines against bacteria such as Vibrio cholerae, Salmonella typhi, Bordetella pertussis, Streptococcus pneumoniae, Haemophilus influenzae, Clostridium tetani, Corynebacterium diphtheriae, Mycobacterium leprae, R. rickettsii, Shigella, Neisseria gonorrhoeae, Neisseria meningitidis, Coccidioides immitis, Borrelia burgdorferi, and the like. A target polypeptide may also comprise sequences, e.g., diagnostically relevant epitopes, from several different proteins constructed to be expressed as a single recombinant polypeptide.

In embodiments, the target protein can comprise a protein that can be suitable for use as a nutraceutical, a dietary or other food supplement, a food additive, a filler, a binder, or for any purpose related to human and non-human animal nutrition. In an embodiment, the target protein is intended for human consumption, or for veterinary purposes, including but not limited to the purposes of providing a feed, feedstock, a dietary supplement, or other food component to, for example, animals that are used in an agricultural industry, or for companion animals. In embodiments, the non-human animals are bovine animals, poultry, porcine animals, felines, canines, equine animals, or fish. In embodiments, the protein can be comprised within intact cells, such as in a cell culture, or it can be provided as a cell lysate. In embodiments, cells that produce the protein can be used as a probiotic agent, which could be, for instance, fed to a recipient, and/or could be used as an inoculant so that the cells could colonize, for example, some or all of the gastrointestinal tract of the animal (or elsewhere) and provide an ongoing supply of the target protein, whether or not it remains as a component of the fusion protein. In embodiments, the protein comprises a high proportion of essential amino acids, i.e., an abundance of any one or combination of phenylalanine, valine, threonine, tryptophan, methionine, leucine, isoleucine, lysine, and histidine. In embodiments, the protein comprises an enzyme that is beneficial to a person who may produce an inadequate amount of the enzyme.

The recombinant proteins of the invention can be recovered by conventional methods. Thus, where the host cell is bacterial, such as E. coli, it may be lysed physically, chemically or enzymatically and the protein product isolated from the resulting lysate. The protein is then purified using conventional protein isolation techniques, including but not necessarily limited to selective precipitation, adsorption chromatography, and affinity chromatography, including but not limited to a monoclonal antibody affinity column.

Proteins of the present invention that are expressed with a histidine tail (His tag) as described above can easily be purified by affinity chromatography using an immobilized metal affinity chromatography (IMAC) column. In embodiments where the permuted or non-permuted RBP formed when the two segments of the fusion protein fold together is capable of binding ribose, the fusion protein will bind to any ribose in the cells expressing it. In some cases, the amount of fusion protein will exceed the supply of ribose in the cell. To increase that supply and promote binding of each fusion protein to a ribose, ribose can be added to the media in which the cells expressing the fusion protein are growing. Alternatively, the ribose can be added after the cells are lysed. In either case, the additional ribose helps ensure that the fusion protein is bound to ribose.

When used as part of an expression construct designed for expression of the encoded protein in an appropriate host, the disclosure produces a novel fusion protein from which the protein of interest can be readily purified, in certain embodiments at substantially higher levels than can be achieved using the sequence for the protein of interest alone, or using a linear configuration as described above.

Fusion polypeptides can be purified to high levels (greater than 80%, or greater than 90% pure, as visualized by SDS-PAGE) by undergoing further purification steps. These additional steps may be performed either before or after the IMAC column to yield highly purified protein. The purified polypeptides present a single major band when analyzed by SDS-PAGE under reducing conditions, and western blot analysis shows less than 5% host cell protein contamination.
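As a minimal sketch of how the purity thresholds mentioned above (greater than 80%, or greater than 90%) could be checked from gel densitometry, the following Python snippet computes the fraction of total lane intensity found in the target band. The band names and intensity values are hypothetical placeholders, not data from this disclosure, and the disclosure does not specify a particular densitometry procedure.

```python
def purity_percent(band_intensities, target_band):
    """Return the target band's share of total lane intensity, in percent.

    band_intensities maps a band label to its integrated intensity
    (arbitrary units from a densitometry scan of one SDS-PAGE lane).
    """
    total = sum(band_intensities.values())
    if total <= 0:
        raise ValueError("total lane intensity must be positive")
    return 100.0 * band_intensities[target_band] / total


def purity_class(percent):
    """Classify a purity value against the 80% and 90% thresholds."""
    if percent > 90.0:
        return ">90%"
    if percent > 80.0:
        return ">80%"
    return "<=80%"


if __name__ == "__main__":
    # Hypothetical lane: one major fusion-protein band and two minor bands.
    lane = {"fusion": 920.0, "contaminant_1": 50.0, "contaminant_2": 30.0}
    p = purity_percent(lane, "fusion")
    print(f"purity = {p:.1f}% ({purity_class(p)})")
```

Densitometry of a stained gel is only semi-quantitative, so a value computed this way is best treated as an estimate.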

In one aspect, the present disclosure relates to a method of producing a fusion protein. The method comprises the steps of culturing a host cell transformed with an expression vector as described above, expressing the fusion protein in the host cell, and separating the fusion protein from the cell culture. The expression system is demonstrated to function with several distinct proteins as described herein, but it is expected that it will function with a wide variety of distinct polypeptides with different structural and functional properties.

Compositions comprising fusion proteins, or proteins liberated from the fusion proteins of this disclosure, are also provided. Such compositions include but are not necessarily limited to compositions that comprise a pharmaceutically acceptable excipient and thus are suitable for human and veterinary prophylactic and/or therapeutic approaches. In another embodiment, kits for producing fusion proteins according to this disclosure are provided. The kits can provide one or more expression vectors described herein, as well as printed instructions for using the vectors, and/or for recovering the overexpressed protein.

Although we have expressly designed the fusion protein to remain monomeric, it is plausible that it may domain swap in the cell, particularly if it is expressed at extremely high concentrations. If domain swapping occurs, the present invention will protect against proteolytic attack by blocking both termini of the target protein and allow for purification and recovery of the target protein.

The following Examples are intended to illustrate, but not limit the invention.

Example: GFP

To demonstrate the effectiveness of the split-His tag design, we inserted a target protein (clover, a green fluorescent protein variant) into position 97 of RBP (FIG. 3A), and in a second construct, into position 125 of RBP to generate the construct (split-His)-(RBP 125-277)-(linker/cleavage site)-(clover)-(cleavage site/linker)-(RBP 1-124)-(split-His) (FIG. 3B). We designate these constructs as s97-clover and s125-clover, respectively (s97 means a target protein bracketed by linkers and cleavage sites was inserted into position 97 of the RBP). Permuting/splitting RBP at positions other than 97 is covered in this invention and is discussed in the next section. We chose to permute/split at position 125 because inspection of the X-ray structure of WT RBP suggested that the split-His tags would be positioned in a more parallel orientation than they would be when RBP is permuted at position 97. Thus, we reasoned that the metal binding affinity of the split-His tags in s125 constructs would be higher than in s97 constructs. We added His₃ tags to both termini of s125-clover (His₃-s125-clover-His₃). To mimic the products of proteolytic cleavage or incomplete transcription/translation, we created a second construct in which only a single His₃ tag was added to the N-terminus (His₃-s125-clover). We expressed the proteins in E. coli and loaded them on a Co²⁺-agarose purification column. As expected, the single His₃ tag was too short to facilitate binding of His₃-s125-clover to the column; nearly all of the protein flowed through in the wash and only a tiny peak eluted in the 0.15 M imidazole elution step (FIG. 4). In marked contrast, most of the His₃-s125-clover-His₃ protein bound to the column, with a large, sharp peak coming off with the 0.15 M imidazole elution. FIG. 4 demonstrates that the full-length fusion protein, which contains both halves of the split-His₃ tag, can be efficiently separated from degraded and/or incompletely transcribed/translated species that only contain a single His₃ tag.
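The domain order of the s125 construct described above can be sketched in Python as a simple string assembly. This is an illustrative sketch only: the carrier sequence is a placeholder of 277 residues (the length implied by "RBP 125-277"), the "GGS" linkers stand in for the actual linker and cleavage-site sequences, and the names `SplitFusion` and `build_split_fusion` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitFusion:
    """Parts of a split, circularly permuted fusion, listed N to C terminus."""
    his_n: str           # N-terminal half of the split-His tag
    carrier_c_part: str  # carrier residues split..end, placed first
    linker_n: str        # linker/cleavage site preceding the target
    target: str          # protein of interest
    linker_c: str        # cleavage site/linker following the target
    carrier_n_part: str  # carrier residues 1..split-1, placed last
    his_c: str           # C-terminal half of the split-His tag

    def sequence(self) -> str:
        """Concatenate all parts in N-to-C order."""
        return "".join([
            self.his_n, self.carrier_c_part, self.linker_n, self.target,
            self.linker_c, self.carrier_n_part, self.his_c,
        ])


def build_split_fusion(carrier, target, split,
                       his_n="HHH", his_c="HHH",
                       linker_n="GGS", linker_c="GGS"):
    """Permute the carrier at a 1-based split position and insert the target.

    For split=125 and a 277-residue carrier, this gives
    (His)-(125-277)-(linker)-(target)-(linker)-(1-124)-(His).
    The default linkers are placeholders, not the disclosed sequences.
    """
    if not 2 <= split <= len(carrier):
        raise ValueError("split must lie within the carrier sequence")
    return SplitFusion(
        his_n=his_n,
        carrier_c_part=carrier[split - 1:],
        linker_n=linker_n,
        target=target,
        linker_c=linker_c,
        carrier_n_part=carrier[:split - 1],
        his_c=his_c,
    )


if __name__ == "__main__":
    # Placeholder carrier: 'A' marks residues 1-124, 'B' marks residues 125-277.
    carrier = "A" * 124 + "B" * 153
    fusion = build_split_fusion(carrier, target="T" * 10, split=125)
    print(len(fusion.carrier_c_part), len(fusion.carrier_n_part))
```

Omitting `his_c` (passing an empty string) models the single-tag control, in which only the N-terminal His₃ sequence is present.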

As an additional demonstration of the ability of the split-His tag to selectively purify full-length proteins, we performed a similar experiment with His₆-s97-clover-His₆ (and His₆-s97-clover as the single-His tag control). To more closely replicate real-world purification, we pre-mixed the two proteins before loading them onto a Ni²⁺ column. FIG. 5 shows that His₆-s97-clover elutes earlier in the imidazole step gradient than His₆-s97-clover-His₆, again demonstrating that the split-His tag can resolve full-length from truncated products.

Example: Lect2 Protein Overexpression Via Split-RBP Vs. Linear RBP Vs. Tagless

Here we demonstrate that the split, circularly permuted RBP expression tag is superior to both the linear RBP and tagless systems for overexpressing the protein LECT2. LECT2 is the protein that causes the fourth most common form of systemic amyloidosis in the United States. LECT2 has three disulfide bonds. When LECT2 was previously expressed in E. coli, it did not fold properly because the disulfide bonds became scrambled. LECT2 expressed as an s97 fusion protein was not only soluble but also had all three disulfide bonds correctly formed according to mass spectrometry. We made three LECT2 constructs. For the first, we inserted LECT2 into position 97 of split-RBP (as shown in FIG. 3A) to create His₆-s97-LECT2. For the second, we fused wild-type, non-permuted RBP to the N-terminus of LECT2 to create His₆-wtRBP-LECT2. For the third, we simply added a His₆ tag to the N-terminus of LECT2 to generate His₆-LECT2. We then expressed the proteins under identical conditions in E. coli, lysed the cells, and ran the insoluble and soluble fractions on an SDS-polyacrylamide gel. FIG. 6 shows that His₆-LECT2 (14 kDa) does not express to detectable levels. Similarly, His₆-wtRBP-LECT2 expresses poorly; the major species present is the truncated species His₆-wtRBP. By contrast, His₆-s97-LECT2 is expressed at high levels, with the full-length protein being the major species present. Importantly, truncation products (e.g. the s97 fragments of 11 kDa and 20 kDa) are not detected. These results indicate that: (1) the s97 tag greatly enhances expression of LECT2, and (2) the closed, circular topology created by the s97-LECT2 fusion seems to protect the full-length protein from degradation, compared to the linear wtRBP-LECT2 fusion.
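The quoted s97 fragment sizes of about 11 kDa and 20 kDa can be rationalized with a rough back-of-the-envelope estimate. The sketch below assumes a 277-residue RBP split into residues 1-96 and 97-277 (by analogy with the s125 layout, RBP 1-124 and RBP 125-277) and uses the common rule of thumb of roughly 110 Da per amino acid residue. These are assumptions for illustration. Exact masses depend on the actual sequences, tags and linkers.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average, not an exact value


def segment_lengths(carrier_length, split):
    """Return (N-part, C-part) residue counts for a 1-based split position.

    The N-part covers residues 1..split-1 and the C-part covers
    residues split..carrier_length.
    """
    if not 2 <= split <= carrier_length:
        raise ValueError("split must lie within the carrier")
    return split - 1, carrier_length - split + 1


def approx_mass_kda(n_residues):
    """Estimate polypeptide mass in kDa from its residue count."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


if __name__ == "__main__":
    n_len, c_len = segment_lengths(carrier_length=277, split=97)
    print(f"residues 1-96:   {n_len} aa, ~{approx_mass_kda(n_len):.1f} kDa")
    print(f"residues 97-277: {c_len} aa, ~{approx_mass_kda(c_len):.1f} kDa")
```

Under these assumptions the two segments come out near 10.6 kDa and 19.9 kDa, in line with the 11 kDa and 20 kDa fragments named in the text. Attached His tags and linker residues would add slightly to each value.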

Example: Expression of Human Double-Minute Protein 2 (MDM2)

MDM2 is a high-value target for protein expression because it is the major negative regulator of the p53 tumor suppressor. Disrupting the MDM2-p53 interaction is a major target for developing anti-cancer drugs, but these efforts have been hindered by the inability to express full-length MDM2 in sufficient quantity. We directly compared MDM2 expression using the split, circularly permuted RBP system of the present invention versus a linear RBP embodiment. The MDM2 gene was appended to or inserted in the RBP gene as shown in FIG. 3A, with appropriate linkers (see FIG. 9b for gene and protein sequences). E. coli cultures were grown, induced, harvested, and lysed under identical conditions. FIG. 7 shows the eluates from the nickel column. Five times as many cells were lysed for the linear RBP data.

The linear RBP system prep (left) shows a faint, barely detectable band of full-length RBP-MDM2 eluting from the nickel column. By contrast, the free RBP band is very intense. The prep of the present disclosure (right) shows an intense band of full-length RBP-MDM2 (again, generated from ⅕ as many cells as the linear control) with less contamination by RBP fragments.

We digested the solutions with protease to release free MDM2. In the linear RBP system prep, we then passed the solution over nickel beads to remove the free RBP contaminant. No MDM2 band was observed in the gel. In contrast, for the permuted circular tag, we digested with protease but did not pass the solution over nickel beads. We observed nearly complete cleavage of full-length RBP-MDM2 to yield an intense band of free MDM2. Thus, the present disclosure provides a significant improvement in protein production/purification relative to a control.

Example: Expression of p53

We inserted human p53 into position 97 of split, circularly permuted RBP (as shown in FIG. 3A) to create His₃-s97-p53-His₃. FIG. 8 (left gel) indicates that full-length His₃-s97-p53-His₃ is overexpressed to a high level in E. coli lysates, with the majority of the protein found in the soluble fraction. p53 alone, without being fused to s97 or wtRBP, does not express to detectable levels (not shown). FIG. 8 (right gel) demonstrates that the split-His tag enables purification of the full-length protein to homogeneity, and that subsequent cleavage with PreScission protease yields the correct, native p53 protein. This protein was determined to be fully functional by DNA binding assays (not shown).

The following is a non-limiting protocol by which an embodiment of this disclosure can be performed.

One Column Purification of the Target Protein

- 1. Transform BL21(DE3) cells (or one of its derivatives) with an appropriate plasmid (e.g. pET41 sRBP-mdm2) as described herein.
- 2. Inoculate one colony in 1 liter of LB and grow at 30° C. until OD₆₀₀ ~ 0.6.
- 3. Induce protein expression with isopropyl β-D-1-thiogalactopyranoside (IPTG, 0.1 to 0.4 mM) at 18° C. and incubate the culture for a further 16 to 19 hours with vigorous shaking at 18° C.
- 4. Spin down cells and freeze on dry ice.
- 5. Lyse cells in 10 mM Tris, pH 8.0, 0.3 to 0.5 M NaCl, 10 mM imidazole (Buffer A).
- 6. Remove insoluble material by centrifugation and load the soluble fraction onto ~15 ml of Ni-NTA (or Co-NTA) resin which is pre-equilibrated in Buffer A.
- 7. Wash the resin with Buffer A until the absorbance at 260 and 280 nm reaches the buffer level.
- 8. Add HisTag-RBP-HRV3C Protease (1 to 2% (w/w) of the target protein) to the resin and gently mix at 4° C. overnight.
- 9. Collect the flow-through and further wash the resin with Buffer A until OD₂₈₀ ~ 0.
- If the target protein has free thiols, β-mercaptoethanol or a reagent with a similar function should be present.
- In some cases, it is preferable to elute the target protein fused to split RBP from the resin and then add HRV3C protease (or any other protease) to the fusion protein.
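The quantities in the protocol above involve a few routine calculations, sketched here in Python: grams of each Buffer A component per volume, the volume of inducer stock needed for a given final concentration, and the protease mass corresponding to 1 to 2% (w/w) of the target protein. The 1 M inducer stock concentration and the 50 mg target-protein yield are hypothetical example inputs, and the function names are illustrative. The molecular weights are standard values for Tris base, NaCl and imidazole.

```python
# Molecular weights in g/mol (Tris base, sodium chloride, imidazole).
MW = {"Tris base": 121.14, "NaCl": 58.44, "imidazole": 68.08}


def grams_needed(molar_conc, volume_l, mw):
    """Mass of solute (g) for a given molarity (mol/L) and volume (L)."""
    return molar_conc * volume_l * mw


def buffer_a_recipe(volume_l=1.0, nacl_molar=0.3):
    """Grams per component for Buffer A (10 mM Tris, 0.3-0.5 M NaCl,
    10 mM imidazole). The pH still has to be adjusted to 8.0 separately."""
    if not 0.3 <= nacl_molar <= 0.5:
        raise ValueError("protocol specifies 0.3 to 0.5 M NaCl")
    return {
        "Tris base": grams_needed(0.010, volume_l, MW["Tris base"]),
        "NaCl": grams_needed(nacl_molar, volume_l, MW["NaCl"]),
        "imidazole": grams_needed(0.010, volume_l, MW["imidazole"]),
    }


def inducer_stock_volume_ml(culture_l, final_mm, stock_molar=1.0):
    """Volume of inducer stock (mL) from C1*V1 = C2*V2.

    final_mm * culture_l gives mmol; a stock of X mol/L holds X mmol/mL.
    The 1 M default stock is an assumed example, not part of the protocol.
    """
    return final_mm * culture_l / stock_molar


def protease_mass_range_mg(target_protein_mg):
    """Protease mass (mg) at 1% and 2% (w/w) of the target protein."""
    return 0.01 * target_protein_mg, 0.02 * target_protein_mg


if __name__ == "__main__":
    for name, grams in buffer_a_recipe(volume_l=1.0, nacl_molar=0.3).items():
        print(f"{name}: {grams:.2f} g per liter")
    print(f"inducer stock: {inducer_stock_volume_ml(1.0, 0.4):.2f} mL")
    low, high = protease_mass_range_mg(50.0)  # hypothetical 50 mg of target
    print(f"protease: {low:.2f} to {high:.2f} mg")
```

Because the amount of target protein bound to the resin is usually only estimated at this stage, the protease range is a starting point that may need adjusting.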

While the disclosure has been particularly shown and described with reference to specific embodiments, it should be understood by those having skill in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the present disclosure as disclosed herein.

Provisional Applications
Number	Date	Country
62411295	Oct 2016	US
62518207	Jun 2017	US
