1. Field of the Invention
The present invention relates to the field of biotechnology and more specifically to artificial sequences used as affinity tags in biotechnological applications.
2. Background of the Prior Art
Peptide affinity ligands are useful for several processes. One important use is for diagnostic or research assays. Antibodies or ligands that recognize a specific assay target can be labeled with a peptide affinity tag and identified by having a second ligand that recognizes the affinity tag and also is labeled with an enzyme or other means to produce a detectable signal.
A number of natural and synthetic peptide affinity tags are known to the industry and have been used for affinity purification and diagnostic assays. None of the known peptide affinity tags has the property of the present invention wherein amino acids expressed from their nucleotide sequences have the same amino acid sequence when the nucleotide sequence is out-of-frame in either the +1 or −1 direction. It would be desirable to be able to detect amino acid sequences produced by out-of-frame gene expression using a single affinity ligand that recognizes the out-of-frame sequence and differs from the ligand that recognizes the in-frame sequence.
Described are polynucleotide sequences that encode peptide affinity tags with the surprising property that all of the possible out-of-frame peptide sequences are identical to each other, yet are distinct from the in-frame encoded peptide tag. The present invention includes peptide affinity tags incorporated in expressed proteins. Also described are ligands and antibodies that recognize the in-frame peptides and separate ligands or antibodies that recognize the out-of-frame peptides.
The present invention includes nucleotide sequences that encode one amino acid sequence when expressed in-frame and different amino acid sequences when expressed out-of-frame, wherein, said out-of-frame amino acid sequences are identical if the out-of-frame mutation is either a +1 or a −1 nucleotide mutation. The present invention also includes peptide sequences produced by the expression of nucleotide sequences that encode one amino acid sequence when expressed in-frame and different amino acid sequences when expressed out-of-frame, wherein, said out-of-frame amino acid sequences are identical if the out-of-frame mutation is either a +1 or a −1 nucleotide mutation.
A −1 frameshift refers to a deletion mutation that results in a shift in reading frame by loss of one nucleotide and a +1 frameshift refers to an insertion mutation that results in a shift in reading frame by +1 nucleotide. Deletion mutations that result in a shift in reading frame of two nucleotides are in the same reading frame as +1 frameshifts. Insertion mutations that result in a shift in reading frame of two nucleotides are in the same reading frame as −1 frameshifts. Either of these mutations occurs in a sequence prior to the affinity tag sequence.
One form of the invention includes a nucleotide sequence that encodes one amino acid sequence when expressed in-frame and a different amino acid sequence when expressed out-of-frame as a result of a single nucleotide change, wherein said out-of-frame amino acid sequence is the same whether the single nucleotide change is a nucleotide addition or deletion. This can comprise at least two repeats of the nucleotide sequence ctttcc (example SEQ ID No. 1, 9, 17, 25, 34, 42), at least two repeats of the nucleotide sequence tccctt (example SEQ ID No. 2, 10, 18, 26, 35, 43, 116), at least two repeats of the nucleotide sequence agggaa (example SEQ ID No. 3, 11, 19, 27, 36, 44), or at least two repeats of the nucleotide sequence gaaagg (example SEQ ID No. 4, 12, 20, 28, 29, 37, 45). A preferred embodiment would have 2-10 repeats of a nucleotide sequence selected from the list: gaaagg, ctttcc, tccctt, or agggaa.
Another aspect of the invention is ligands, to include antibodies and antibody fragments, that bind to amino acid sequences of the present invention. In this form of the invention, a ligand specifically binds to an amino acid sequence that comprises two or more repeats of amino acid pairs selected from the lists of in-frame amino acid pairs leucine-serine (example SEQ ID No. 30, 58, 90); serine-leucine (example SEQ ID No. 31, 59, 75, 91); arginine-glutamic acid (example SEQ ID No. 32, 60, 76, 92); glutamic acid-arginine (example SEQ ID No. 33, 61, 77, 93) or of out-of-frame amino acid pairs proline-phenylalanine (example SEQ ID No. 55, 71, 87, 103); phenylalanine-proline (example SEQ ID No. 54, 70, 86, 102); lysine-glycine (example SEQ ID No. 57, 73, 89, 105); glycine-lysine (example SEQ ID No. 56, 72, 88, 104, 131). These amino acid sequences can have an additional amino acid in the form of in-frame amino acid pairs (leucine-serine)n+leucine (example SEQ ID No. 66, 74, 82, 98); (serine-leucine)n+serine (example SEQ ID No. 67, 83, 99); (arginine-glutamic acid)n+arginine (example SEQ ID No. 68, 84, 100); (glutamic acid-arginine)n+glutamic acid (example SEQ ID No. 69, 101, 127) or out-of-frame amino acid pairs (proline-phenylalanine)n+proline (example SEQ ID No. 51, 63, 79, 95); (phenylalanine-proline)n+phenylalanine (example SEQ ID No. 50, 62, 78, 94); (lysine-glycine)n+lysine (example SEQ ID No. 53, 65, 81, 85, 97); or (glycine-lysine)n+glycine (example SEQ ID No. 52, 64, 80, 96) wherein n represents n number of amino acid pairs and wherein n is 2 or greater.
In one aspect of the invention, amino acid sequences of the present invention are amino acid sequences that are comprised of repeats of the amino acid pairs proline-phenylalanine; phenylalanine-proline; lysine-glycine; or glycine-lysine.
In another aspect of the invention, amino acid sequences of the present invention comprise two or more repeats of amino acid pairs selected from the list comprising in-frame amino acid pairs leucine-serine (example SEQ ID No. 30, 58, 90); serine-leucine (example SEQ ID No. 31, 59, 75, 91); arginine-glutamic acid (example SEQ ID No. 32, 60, 76, 92); glutamic acid-arginine (example SEQ ID No. 33, 61, 77, 93) or out-of-frame amino acid pairs proline-phenylalanine (example SEQ ID No. 55, 71, 87, 103); phenylalanine-proline (example SEQ ID No. 54, 70, 86, 102); lysine-glycine (example SEQ ID No. 57, 73, 89, 105); glycine-lysine (example SEQ ID No. 56, 72, 88, 104, 131). These sequences can have an additional amino acid in the form of in-frame amino acid pairs (leucine-serine)n+leucine (example SEQ ID No. 66, 74, 82, 98); (serine-leucine)n+serine (example SEQ ID No. 67, 83, 99); (arginine-glutamic acid)n+arginine (example SEQ ID No. 68, 84, 100); (glutamic acid-arginine)n+glutamic acid (example SEQ ID No. 69, 101, 127) or out-of-frame amino acid pairs (proline-phenylalanine)n+proline (example SEQ ID No. 51, 63, 79, 95); (phenylalanine-proline)n+phenylalanine (example SEQ ID No. 50, 62, 78, 94); (lysine-glycine)n+lysine (example SEQ ID No. 53, 65, 81, 85, 97); or (glycine-lysine)n+glycine (example SEQ ID No. 52, 64, 80, 96) wherein n represents n number of amino acid pairs and wherein n is 2 or greater and the total number of amino acids in the sequence is an odd number.
Peptide affinity tags are amino acid sequences inserted into a polypeptide or protein to assist detection or purification of that polypeptide or protein. Recombinant proteins produced in a fermenter are produced among a mixture of other proteins, and it is important to be able to economically recover the desired proteins with high purity from the complicated mix in which they reside. Affinity purification is one method. In order to eliminate the need to produce antibodies or ligands that specifically bind to every different protein produced, affinity tags can be added to an amino acid sequence. These tags are encoded in the DNA of the genetic sequences used to express the proteins, and they produce amino acid sequences that can be recognized by specific ligands or antibodies. Affinity tags are typically inserted into polypeptides or proteins by manipulating the encoding polynucleotides for those polypeptides or proteins. Affinity tags are used for identifying proteins in assays and for purification of proteins, and they have other uses.
Artificial affinity tags described here have the surprising property of being expressed as one amino acid sequence when expressed in-frame from their encoding polynucleotide sequences and as a distinct amino acid sequence when expressed out-of-frame, with the additional property that the peptide expressed from an out-of-frame nucleotide sequence is the same whether the frameshift consists of a gain (+1) or a loss (−1) of a nucleotide. This occurs despite the fact that the polynucleotide sequences read in the +1 and −1 frameshifts may differ. In fact, this is another surprising aspect of one embodiment of the present invention.
Nucleotide sequences that produce identical amino acid sequences in each out-of-frame direction do not produce identical out-of-frame nucleotide codons. They have alternate codon usage in each direction.
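The alternate codon usage described above can be checked with a short script. The following is a minimal sketch in Python and is not part of the original disclosure; it assumes the standard genetic code, and the helper names `codons` and `translate` are illustrative. It reads each of the four codon-pair repeats starting at tag offsets 0, 1 and 2 and prints the one-letter amino acids and the codons for each reading.

```python
from itertools import product

# Standard genetic code, listed in t, c, a, g order; "*" marks a stop codon.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(seq, offset=0):
    """Split seq into complete codons, starting at the given offset."""
    seq = seq.lower()
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


def translate(seq, offset=0):
    """Translate the complete codons of seq into one-letter amino acids."""
    return "".join(CODON_TABLE[c] for c in codons(seq, offset))


# Offset 0 is the in-frame reading; offsets 1 and 2 are the two shifted frames.
for pair in ("ctttcc", "tccctt", "agggaa", "gaaagg"):
    tag = pair * 3
    for offset in range(3):
        print(pair, offset, translate(tag, offset), " ".join(codons(tag, offset)))
```

For (ctttcc)×3, offset 0 reads Leu-Ser repeats, while offsets 1 and 2 both read Phe-Pro-Phe-Pro-Phe, the first from ttt and ccc codons and the second from ttc and cct codons.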
One aspect of another embodiment of the invention comprises ligands and antibodies that recognize the in-frame peptide and separate ligands or antibodies that recognize the out-of-frame peptides. As utilized herein, antibodies, antibody fragments, single domain antibodies, single chain antibodies and other forms or fragments of antibodies are a subset of the term ligand.
Candidate codon combinations and corresponding amino acids with the desired properties are shown in
When mentioned, a −1 frameshift refers to a deletion mutation that results in a shift in reading frame by −1 nucleotide and a +1 frameshift refers to an insertion mutation that results in a shift in reading frame by +1 nucleotide. Deletion mutations that result in a shift in reading frame of two nucleotides are in the same reading frame as +1 frameshifts. Insertion mutations that result in a shift in reading frame of two nucleotides are in the same reading frame as −1 frameshifts.
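The reading-frame equivalences described above can be illustrated by mutating a sequence upstream of a tag and translating the result. This is a sketch under stated assumptions, not material from the source: the upstream sequence `atggctgct`, the mutation position, and the helper names are hypothetical, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, listed in t, c, a, g order; "*" marks a stop codon.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate seq from its first nucleotide, ignoring a trailing partial codon."""
    seq = seq.lower()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


UPSTREAM = "atggctgct"   # hypothetical: a start codon plus two more codons
TAG = "ctttcc" * 3       # (ctttcc)x3, in-frame Leu-Ser repeats

# Upstream variants: the mutation position (index 4) is arbitrary.
variants = {
    "in-frame": UPSTREAM,
    "+1 insertion": UPSTREAM[:4] + "a" + UPSTREAM[4:],
    "-1 deletion": UPSTREAM[:4] + UPSTREAM[5:],
    "2 nt deletion": UPSTREAM[:4] + UPSTREAM[6:],
    "2 nt insertion": UPSTREAM[:4] + "aa" + UPSTREAM[4:],
}

# Peptide expressed from each variant, and the offset within the tag at which
# the first complete tag codon begins.
peptides = {name: translate(up + TAG) for name, up in variants.items()}
tag_offset = {name: -len(up) % 3 for name, up in variants.items()}

for name in variants:
    print(f"{name:15s} tag offset {tag_offset[name]}  peptide {peptides[name]}")
```

In this sketch the one-nucleotide insertion and the two-nucleotide deletion place the tag in the same frame, as do the one-nucleotide deletion and the two-nucleotide insertion, and every shifted variant contains the same Phe-Pro repeat.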
In one aspect of the invention, multiple codon (trinucleotide) pairs are used in genetic sequences to express alternating amino acids. Codon pairs can be selected from the list comprising ctttcc, tccctt, agggaa or gaaagg. The codon pairs are used as multiples in the form of (ctttcc)×n, (tccctt)×n, (agggaa)×n, or (gaaagg)×n where n is any integer greater than 1. For example, (ctttcc)×2 is ctttccctttcc (SEQ ID No. 1), (tccctt)×2 is tccctttccctt (SEQ ID No. 2), (agggaa)×2 is agggaaagggaa (SEQ ID No. 3) and (gaaagg)×2 is gaaagggaaagg (SEQ ID No. 4). They can also be used in codon pair multiples with a preceding or following single codon to make an odd number of codons. This can take the form of ((ctttcc)×n)+ctt, ((tccctt)×n)+tcc, ((agggaa)×n)+agg, or ((gaaagg)×n)+gaa, where “n” is any integer greater than 1. For example, ((ctttcc)×2)+ctt is ctttccctttccctt (SEQ ID No. 5), ((tccctt)×2)+tcc is tccctttccctttcc (SEQ ID No. 6), ((agggaa)×2)+agg is agggaaagggaaagg (SEQ ID No. 7) and ((gaaagg)×2)+gaa is gaaagggaaagggaa (SEQ ID No. 8). They also can take the form of tcc+((ctttcc)×n), ctt+((tccctt)×n), gaa+((agggaa)×n) or agg+((gaaagg)×n) where n is any integer greater than 1. For example, tcc+((ctttcc)×2) is tccctttccctttcc (SEQ ID No. 6), ctt+((tccctt)×2) is ctttccctttccctt (SEQ ID No. 5), gaa+((agggaa)×2) is gaaagggaaagggaa (SEQ ID No. 8) and agg+((gaaagg)×2) is agggaaagggaaagg (SEQ ID No. 7). In a preferred mode n is from 2 to 12.
Using different formula nomenclature, the codon pairs are used as multiples in the form of (ctttcc)n, (tccctt)n, (agggaa)n or (gaaagg)n where n is any number greater than 1 and ( )n indicates multiplication. They also can be used in codon pair multiples with a preceding or following single codon to make an odd number of codons. This can take the form of [(ctttcc)n+ctt], [(tccctt)n+tcc], [(agggaa)n+agg], [(gaaagg)n+gaa], [tcc+(ctttcc)n], [ctt+(tccctt)n], [gaa+(agggaa)n] or [agg+(gaaagg)n].
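The formulas above amount to simple string construction. The following Python sketch builds tag sequences from a codon pair, a repeat count n, and an optional extra codon; the function name `build_tag` and its `extra` argument are illustrative names introduced here, not terms from the specification.

```python
CODON_PAIRS = ("ctttcc", "tccctt", "agggaa", "gaaagg")


def build_tag(pair, n, extra=None):
    """Return (pair)n, optionally with one extra codon to make an odd codon count.

    extra="following" appends the first codon of the pair, e.g. (ctttcc)n+ctt.
    extra="preceding" prepends the second codon of the pair, e.g. tcc+(ctttcc)n.
    """
    if pair not in CODON_PAIRS:
        raise ValueError(f"unknown codon pair: {pair}")
    if n < 2:
        raise ValueError("n must be an integer greater than 1")
    body = pair * n
    if extra == "following":
        return body + pair[:3]
    if extra == "preceding":
        return pair[3:] + body
    if extra is None:
        return body
    raise ValueError("extra must be None, 'following' or 'preceding'")


for pair in CODON_PAIRS:
    print(build_tag(pair, 2),
          build_tag(pair, 2, "following"),
          build_tag(pair, 2, "preceding"))
```

Because the four pairs are two-codon rotations of one another, a preceding-codon form of one pair is the same string as a following-codon form of its partner; for example, `build_tag("ctttcc", 2, "preceding")` and `build_tag("tccctt", 2, "following")` both give tccctttccctttcc.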
Other examples of codon combinations corresponding to the formulas in the preceding two paragraphs are illustrated in sequences shown in (SEQ ID No. 9) through (SEQ ID No. 24). This is not an exhaustive sequence list.
Genetic manipulations such as the polymerase chain reaction can introduce mutations in a nucleotide sequence. Especially disruptive to expressed protein sequences are frameshift mutations that change the expressed amino acid sequence beyond the frameshift. This can lead to loss of product and/or contamination of product. The ability to detect and/or remove contaminating proteins from protein mixtures can be useful. In most cases, frameshift mutations lead to amino acid sequences that are different in all three frames. This includes −1 and +1 frameshifts which are not the same. As an example, the gene sequence (SEQ ID No. 110) in
Nucleotide sequences of the invention encode one amino acid sequence when in-frame and another amino acid sequence when out-of-frame. But surprisingly, in contrast to the previous example, each out-of-frame nucleotide sequence encodes the same amino acid sequence if a nucleotide is gained (+1) or if one is lost (−1).
One aspect of the invention would be polynucleotide sequences of the invention encoding an even number of in-frame amino acids, resulting in an odd number of out-of-frame amino acids (if there were a frameshift). These would take the form of [(ctttcc)×n]; [(tccctt)×n]; [(agggaa)×n]; or [(gaaagg)×n], or equivalently, [(ctttcc)n]; [(tccctt)n]; [(agggaa)n]; or [(gaaagg)n], where “n” is any integer greater than 1. Typical ranges for “n” are 2 to 8. In another aspect of the invention, ranges for “n” are 2 to 12.
In one aspect of the invention,
It is recognized in the examples that each of the out-of-frame sequences is one codon shorter than the in-frame sequence of the invention and that the loss of a codon occurs at different ends of the sequence for +1 and −1 frameshift mutations. For the +1 frameshift, the loss is at the 5′ end of the selected sequence and, for the −1 frameshift, the loss is at the 3′ end of the selected sequence. In spite of that, both encoded amino acid sequences are the same.
In the aspect of the invention shown in
The embodiment described in
Within each of the four above examples of the invention, the two out-of-frame nucleotide sequences are different, yet they produce an identical amino acid sequence. This is possible because of the redundancy of the genetic code. It also makes it difficult to discover in-frame nucleotide sequences whose two out-of-frame readings produce the same amino acid sequence. No nucleotide sequences could be found that produced identical out-of-frame nucleotide sequences in the −1 and +1 directions.
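To give a sense of what such a discovery search involves, the sketch below enumerates all 4096 two-codon (six-nucleotide) units and keeps those whose repeats translate to the same peptide in both shifted frames and to a different peptide in-frame. This is not the search procedure used for the invention, which the text does not describe; the selection criteria, the repeat count of 4, and the name `is_candidate` are assumptions made for illustration, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, listed in t, c, a, g order; "*" marks a stop codon.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, offset=0):
    """Translate the complete codons of seq, starting at the given offset."""
    seq = seq.lower()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(offset, len(seq) - 2, 3))


def is_candidate(unit, repeats=4):
    """Assumed criteria: no stop codons in any frame, both shifted frames
    give the same peptide, and that peptide does not occur in-frame."""
    seq = unit * repeats
    in_frame, shifted_1, shifted_2 = (translate(seq, k) for k in range(3))
    if "*" in in_frame + shifted_1 + shifted_2:
        return False
    return shifted_1 == shifted_2 and shifted_1 not in in_frame


candidates = ["".join(u) for u in product(BASES, repeat=6) if is_candidate("".join(u))]
print(len(candidates), "candidate units under these assumed criteria")
for unit in ("ctttcc", "tccctt", "agggaa", "gaaagg"):
    print(unit, unit in candidates)
```

The four codon pairs named in the text pass these criteria. Because the criteria are assumptions and may be looser than whatever was actually applied, the resulting list may contain additional units.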
It also is apparent from the examples that, after a one-nucleotide frameshift (+1 or −1) has occurred, the number of complete codons in a sequence of the invention is usually one less than in the in-frame sequence. With an even number of codons in the in-frame sequence, this results in an odd number of codons in the out-of-frame sequences. In order to produce an out-of-frame sequence with an even number of codons, an odd number of codons must be used in the in-frame sequence. This may be achieved by adding half of a pair of codons (i.e. one codon) to one end of the in-frame sequence. Examples to illustrate this are shown in
Another aspect of the invention would be polynucleotide sequences of the invention encoding an odd number of in-frame amino acids, resulting in an even number of out-of-frame amino acids if there were a frameshift. To reduce possible steric hindrance, multiple sets of amino acids may be used. Two to three repeats of 5-15 amino acids could be used. These would take the form of [(ctttcc)×n]+ctt; [(tccctt)×n]+tcc; [(agggaa)×n]+agg; or [(gaaagg)×n]+gaa, or equivalently, [(ctttcc)n+ctt]; [(tccctt)n+tcc]; [(agggaa)n+agg]; or [(gaaagg)n+gaa], where “n” is any integer greater than 1. Typical ranges for “n” are 2 to 8. In another aspect of the invention, ranges for “n” are 2 to 12.
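The relationship between in-frame and out-of-frame codon counts can be verified by counting complete codons in each reading. The Python sketch below is illustrative only; it assumes the standard genetic code, and the helper name `translate` and the choice of n = 2 are not from the source.

```python
from itertools import product

# Standard genetic code, listed in t, c, a, g order; "*" marks a stop codon.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, offset=0):
    """Translate the complete codons of seq, starting at the given offset."""
    seq = seq.lower()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(offset, len(seq) - 2, 3))


even_tag = "ctttcc" * 2        # (ctttcc)x2: four in-frame codons
odd_tag = even_tag + "ctt"     # ((ctttcc)x2)+ctt: five in-frame codons

for label, tag in (("even", even_tag), ("odd", odd_tag)):
    readings = [translate(tag, k) for k in range(3)]
    counts = [len(r) for r in readings]
    print(label, tag, readings, counts)
```

With four in-frame codons both shifted readings contain three complete codons (Phe-Pro-Phe), and with five in-frame codons both contain four (Phe-Pro-Phe-Pro).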
In
In
It is useful to be able to detect amino acid sequences that result from in-frame expression and out-of-frame expression of nucleotide sequences of the present invention. Ligands can be made that specifically bind to selected amino acid sequences. Ligands can consist of antibodies, single chain antibodies, single domain antibodies, artificial ligands and any other protein sequence that binds to a selected amino acid sequence. Ligands also can be made artificially from molecularly imprinted polymers or other polymers. Ligands can also be composed of other organic polymers such as RNA, DNA or peptide nucleic acids.
To increase avidity of ligand binding, multiple sets of affinity tags may be used, preferably connected by a linker to decrease possible steric hindrance. The linker preferably would consist of multiple repeats of ggg, aaa, ttt or ccc. Two, three, four or more repeats of 4-15 amino acids could be used. Linkers would not necessarily produce the same linker amino acid sequence from +1 and −1 frameshift mutations.
For example, the nucleotide sequence
gaaagggaaagggaaagggaaaggGGGGGGGGGgaaagggaaagggaaagggaaagg
contains two sets of nucleotide sequences of the invention (in bold) connected by nine guanine nucleotides that encode three glycine amino acids when in-frame. The in-frame amino acid sequence is:
GluArgGluArgGluArgGluArgGlyGlyGlyGluArgGluArgGluArgGluArg.
The insertion (+1) frameshift amino acid sequence is:
LysGlyLys
The deletion (−1) frameshift amino acid sequence is
LysGlyLys.
The +1 and −1 frameshift amino acid sequences are the same. The linker is one that expresses three glycine amino acids in-frame and four glycine amino acids in each out-of-frame expression. Thus, the nucleotide sequence GGGGGGGGG works well as a linker.
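The behavior of this linker construct can be reproduced by translating it in all three frames. The sketch below is illustrative and not from the source; it assumes the standard genetic code and reports one-letter amino acid codes (E = Glu, R = Arg, K = Lys, G = Gly), and the helper name `translate` is hypothetical.

```python
from itertools import product

# Standard genetic code, listed in t, c, a, g order; "*" marks a stop codon.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, offset=0):
    """Translate the complete codons of seq, starting at the given offset."""
    seq = seq.lower()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(offset, len(seq) - 2, 3))


# Two (gaaagg)x4 tags joined by a nine-guanine linker, as in the text.
construct = "gaaagg" * 4 + "GGGGGGGGG" + "gaaagg" * 4

# Offset 0 is the in-frame reading; offsets 1 and 2 are the two shifted frames.
frames = [translate(construct, k) for k in range(3)]
for offset, peptide in enumerate(frames):
    print(offset, peptide, "Gly run:", max(len(run) for run in peptide.replace("K", " ").replace("E", " ").replace("R", " ").split()))
```

Under these assumptions the in-frame reading has a run of three glycines between the Glu-Arg repeats, and both shifted readings give the same Lys-Gly peptide with a run of four glycines in the middle.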
Other linkers produce results that differ among the expression frames.
ctttccctttccctttcccttGGGGGGGGGctttccctttccctttccctt
expresses in-frame as
LeuSerLeuSerLeuSerLeuGlyGlyGlyLeuSerLeuSerLeuSerLeu
The insertion (+1) frameshift amino acid sequence is:
Pro.
The deletion (−1) frameshift amino acid sequence is:
Another aspect of the invention is ligands, to include antibodies or antibody fragments, that specifically bind to an amino acid sequence encoded by a nucleotide sequence comprised of two or more repeats of nucleotide sequences selected from the list comprising ctttcc, tccctt, agggaa, gaaagg. The ligands would specifically bind to an amino acid sequence comprising two or more repeats of amino acid pairs selected from the list comprising proline-phenylalanine; phenylalanine-proline; lysine-glycine; glycine-lysine.
Another aspect of one embodiment of the invention is amino acid sequences selected from the list comprising (proline-phenylalanine)n+proline; (phenylalanine-proline)n+phenylalanine; (lysine-glycine)n+lysine; or (glycine-lysine)n+glycine where n represents n number of amino acid pairs where n is 2 or greater. This includes simple repeats of amino acid sequences containing two or more repeats of amino acid pairs selected from the list comprising proline-phenylalanine; phenylalanine-proline; lysine-glycine; glycine-lysine.
Another aspect of the invention is a nucleotide sequence and the amino acid sequence encoded by said nucleotide sequence wherein, said out-of-frame amino acid sequences are identical in all of the possible one or two out-of-frame reading frames.
One aspect of the invention is ligands that bind to amino acid sequences of the present invention, recognizing amino acid sequences that are comprised of repeats of the amino acid pairs proline-phenylalanine; phenylalanine-proline; lysine-glycine; or glycine-lysine.
Another aspect of the invention is ligands, to include antibodies and antibody fragments, that bind to amino acid sequences of the present invention. The ligands recognize amino acid sequences that are comprised of repeats of amino acids in the form of (proline-phenylalanine)n+proline; (phenylalanine-proline)n+phenylalanine; (lysine-glycine)n+lysine; (glycine-lysine)n+glycine; phenylalanine+(proline-phenylalanine)n; proline+(phenylalanine-proline)n; glycine+(lysine-glycine)n; or lysine+(glycine-lysine)n; (proline-phenylalanine)n; (phenylalanine-proline)n; (lysine-glycine)n; (glycine-lysine)n; where “n” is any integer greater than 1.
A person of ordinary skill in the art would recognize that antibodies and antibody fragments can be obtained through the use of standard methods. For example, antibodies can be made by vaccinating animals with peptides or proteins containing peptide sequences described herein. Hybridomas can be made using B cells collected from animals using immortalization techniques known to the art. For example, peptide specific, antibody secreting hybridomas can be made using electrofusion, polyethylene glycol fusion or other fusion methods to fuse antigen specific antibody secreting B cells with myeloma cells. Alternatively, antibodies can be made using molecular techniques. Libraries of antibody encoding genes can be isolated from vaccinated animals, unvaccinated animals or made synthetically. Antigen specific antibodies can be isolated from the antibody encoding DNA library using display techniques that maintain association of an expressed antibody and its encoding polynucleotide. A number of methods have been devised to identify protein-protein interactions that also allow for the recovery of genetic material that encodes the identified proteins. Some of these technologies work by reconstituting cellular functions in vivo while others utilize in vitro binding assays to identify physical interactions. Examples are the two-hybrid system, phage display, cellular display, ribosome display, and mRNA display.
It is recognized that traditional antibodies are only one form of ligand that can bind to a peptide sequence. Alternate protein scaffolds are known to the art. An example of one alternate scaffold is VHH fragments isolated from single domain camelid antibodies. Other scaffolds may be used to make ligands that bind the affinity TAGs described herein.
In one aspect of the invention, amino acid sequences of the present invention are amino acid sequences that are comprised of repeats of the amino acid pairs proline-phenylalanine; phenylalanine-proline; lysine-glycine; or glycine-lysine.
In another aspect of the invention, amino acid sequences of the present invention are amino acid sequences that are comprised of repeats of amino acids in the form of (proline-phenylalanine)n+proline; (phenylalanine-proline)n+phenylalanine; (lysine-glycine)n+lysine; (glycine-lysine)n+glycine; phenylalanine+(proline-phenylalanine)n; proline+(phenylalanine-proline)n; glycine+(lysine-glycine)n or lysine+(glycine-lysine)n, where “n” is any integer greater than 1.
One form of the invention includes a nucleotide sequence that encodes one amino acid sequence when expressed in-frame and a different amino acid sequence when expressed out-of-frame as a result of a single nucleotide change, wherein said out-of-frame amino acid sequence is the same whether the single nucleotide change is a nucleotide addition or deletion. This can comprise at least two repeats of the nucleotide sequence ctttcc (example SEQ ID No. 1, 9, 17, 25, 34, 42), at least two repeats of the nucleotide sequence tccctt (example SEQ ID No. 2, 10, 18, 26, 35, 43, 116), at least two repeats of the nucleotide sequence agggaa (example SEQ ID No. 3, 11, 19, 27, 36, 44), or at least two repeats of the nucleotide sequence gaaagg (example SEQ ID No. 4, 12, 20, 28, 29, 37, 45). A preferred embodiment would have 2-10 repeats of a nucleotide sequence selected from the list: gaaagg, ctttcc, tccctt, or agggaa.
One form of the invention includes at least two repeats of the nucleotide sequence ctttcc followed by an additional ctt (example SEQ ID No. 5, 13, 21, 38, 46, 106). Another form includes at least two repeats of the nucleotide sequence tccctt followed by an additional tcc (example SEQ ID No. 6, 14, 22, 39, 47, 107). Another form includes at least two repeats of the nucleotide sequence agggaa followed by an additional agg (example SEQ ID No. 7, 15, 23, 40, 48, 108). Still another form includes at least two repeats of the nucleotide sequence gaaagg followed by an additional gaa (example SEQ ID No. 8, 16, 24, 41, 49, 109).
One form of the invention includes at least two repeats of the nucleotide sequence ctttcc preceded by an additional tcc (example SEQ ID No. 6, 14, 22, 39, 47, 107). Another form includes at least two repeats of the nucleotide sequence tccctt preceded by an additional ctt (example SEQ ID No. 5, 13, 21, 38, 46, 106). Another form includes at least two repeats of the nucleotide sequence agggaa preceded by an additional gaa (example SEQ ID No. 8, 16, 24, 41, 49, 109). Still another form includes at least two repeats of the nucleotide sequence gaaagg preceded by an additional agg (example SEQ ID No. 7, 15, 23, 40, 48, 108).
Another form of the invention includes a ligand that specifically binds to an amino acid sequence comprising two or more repeats of amino acid pairs selected from the list comprising in-frame amino acid pairs leucine-serine (example SEQ ID No. 30, 58, 90); serine-leucine (example SEQ ID No. 31, 59, 75, 91); arginine-glutamic acid (example SEQ ID No. 32, 60, 76, 92); glutamic acid-arginine (example SEQ ID No. 33, 61, 77, 93) or out-of-frame amino acid pairs proline-phenylalanine (example SEQ ID No. 55, 71, 87, 103); phenylalanine-proline (example SEQ ID No. 54, 70, 86, 102); lysine-glycine (example SEQ ID No. 57, 73, 89, 105); glycine-lysine (example SEQ ID No. 56, 72, 88, 104, 131). These amino acid sequences can have an additional amino acid in the form of in-frame amino acid pairs (leucine-serine)n+leucine (example SEQ ID No. 66, 74, 82, 98); (serine-leucine)n+serine (example SEQ ID No. 67, 83, 99); (arginine-glutamic acid)n+arginine (example SEQ ID No. 68, 84, 100); (glutamic acid-arginine)n+glutamic acid (example SEQ ID No. 69, 101, 127) or out-of-frame amino acid pairs (proline-phenylalanine)n+proline (example SEQ ID No. 51, 63, 79, 95); (phenylalanine-proline)n+phenylalanine (example SEQ ID No. 50, 62, 78, 94); (lysine-glycine)n+lysine (example SEQ ID No. 53, 65, 81, 85, 97); or (glycine-lysine)n+glycine (example SEQ ID No. 52, 64, 80, 96) wherein n represents n number of amino acid pairs and wherein n is 2 or greater and the total number of amino acids in the targeted sequence is an odd number.
Another form of the invention includes an amino acid sequence comprising two or more repeats of amino acid pairs selected from the list comprising in-frame amino acid pairs leucine-serine (example SEQ ID No. 30, 58, 90); serine-leucine (example SEQ ID No. 31, 59, 75, 91); arginine-glutamic acid (example SEQ ID No. 32, 60, 76, 92); glutamic acid-arginine (example SEQ ID No. 33, 61, 77, 93) or out-of-frame amino acid pairs proline-phenylalanine (example SEQ ID No. 55, 71, 87, 103); phenylalanine-proline (example SEQ ID No. 54, 70, 86, 102); lysine-glycine (example SEQ ID No. 57, 73, 89, 105); glycine-lysine (example SEQ ID No. 56, 72, 88, 104, 131). These sequences can have an additional amino acid in the form of in-frame amino acid pairs (leucine-serine)n+leucine (example SEQ ID No. 66, 74, 82, 98); (serine-leucine)n+serine (example SEQ ID No. 67, 83, 99); (arginine-glutamic acid)n+arginine (example SEQ ID No. 68, 84, 100); (glutamic acid-arginine)n+glutamic acid (example SEQ ID No. 69, 101, 127) or out-of-frame amino acid pairs (proline-phenylalanine)n+proline (example SEQ ID No. 51, 63, 79, 95); (phenylalanine-proline)n+phenylalanine (example SEQ ID No. 50, 62, 78, 94); (lysine-glycine)n+lysine (example SEQ ID No. 53, 65, 81, 85, 97); or (glycine-lysine)n+glycine (example SEQ ID No. 52, 64, 80, 96) wherein n represents n number of amino acid pairs and wherein n is 2 or greater and the total number of amino acids in the sequence is an odd number.
Affinity TAGs described herein may be used to purify proteins produced during in vivo or in vitro protein production. An antibody recognizing an affinity TAG can be attached to beads in a column using standard bioconjugation techniques, and a biological solution containing a desired protein can be run through the column to separate, purify and/or concentrate the desired protein. Proteins are eluted from the column using high salt or high or low pH solutions.
The affinity TAGs described herein are especially useful to help monitor the quality of gene expression. An affinity TAG DNA sequence applied to a sequence immediately after the Kozak sequence or after a start codon will help monitor initial protein expression by identifying out-of-frame expression at the beginning of the sequence. The DNA encoding the affinity TAG can be placed anywhere downstream of the start codon, including the 3′ end of the gene, to detect frameshifts in other areas of the sequence. It is recognized that insertions closer to the 3′ end increase the risk that a frameshift mutation may create a stop codon prior to the TAG sequence; thus insertions closer to the 5′ end are more likely to detect a frameshift using the TAG sequence. An affinity TAG described herein can be placed at each end of a protein to monitor frameshifts between TAGs. In this instance, an antibody recognizing an in-frame sequence is used to capture the desired protein while an antibody recognizing the out-of-frame sequence is used to detect frameshift mutations between the two TAG sites.
Frameshift detection using sequences described herein may be especially useful for monitoring bioprocesses. Ligands or antibodies detecting in-frame and out-of-frame gene expression would be used in assays. Assays showing the relative amount of in-frame TAG sequence and out-of-frame TAG sequence would provide a quantitative measure of process stability or deterioration.
TAG sequences also may be used as a marker to identify specific sets of proteins.
The invention has been described with reference to a preferred embodiment. While specific values, relationships, materials and steps have been set forth for purposes of describing concepts of the invention, it will be appreciated by persons skilled in the art that numerous variations and/or modifications may be made to the invention as shown in the specific embodiments without departing from the spirit or scope of the basic concepts and operating principles of the invention as broadly described. It should be recognized that, in the light of the above teachings, those skilled in the art can modify those specifics without departing from the invention taught herein. Having now fully set forth the preferred embodiments and certain modifications of the concept underlying the present invention, various other embodiments as well as certain variations and modifications of the embodiments herein shown and described will obviously occur to those skilled in the art upon becoming familiar with such underlying concept. It is intended to include all such modifications, alternatives and other embodiments insofar as they come within the scope of the appended claims or equivalents thereof. It should be understood, therefore, that the invention may be practiced otherwise than as specifically set forth herein. Consequently, the present embodiments are to be considered in all respects as illustrative and not restrictive.
The following references cited in the specification are hereby incorporated by reference in their entirety.
This application claims the priority benefit under 35 U.S.C. §119(e) of U.S. Provisional Application Ser. No. 61/722,536, entitled “AFFINITY TAGS ABLE TO DETECT FRAMESHIFTS” filed Nov. 5, 2012, which is incorporated herein by reference in its entirety.
Number | Date | Country | |
---|---|---|---|
61722536 | Nov 2012 | US |