Identifying peptides that bind to the surface of a protein with high affinity and high specificity is a time-consuming process. In general, peptides are first selected from libraries of vast repertoires using any of a number of in vitro or in vivo selection technologies (e.g., DNA display, phage display, mRNA display, or ribosome display). This process usually requires many cycles of selection and amplification, sometimes followed by additional rounds of directed evolution to optimize a given sequence for improved binding. The output of these selections is then cloned and sequenced, although in some cases sequencing is done by mass spectrometry. Representative sequences are constructed by solid-phase synthesis, purified by HPLC, and assayed for affinity and specificity. Relative and specific solution binding affinities (Kd values) are typically measured on an individual basis using competitive binding assays or surface plasmon resonance (SPR). In total, this process is a costly endeavor that can easily take 2-3 months to complete per target.
Developing protein affinity reagents on a proteome-wide scale demands advances in peptide and protein selection technologies that reduce the time and cost required to generate and characterize high quality affinity reagents.
In a first aspect, the present invention provides recombinant double stranded DNA constructs and nucleic acid libraries comprising a plurality of the recombinant double stranded DNA constructs, wherein each double stranded DNA construct comprises
(a) a promoter;
(b) one or more translation enhancement elements downstream of the promoter and upstream of the start codon;
(c) a start codon downstream of the one or more translation enhancement elements;
(d) a random region of at least about 18 to about 60 nucleotides immediately downstream from the start codon;
(e) a protease cleavage site downstream of the random region;
(f) a unique restriction enzyme recognition site downstream of the protease cleavage site; and
(g) a heterologous cross-linking region downstream of the unique restriction enzyme recognition site.
In the libraries of this aspect of the invention, at least 10¹¹ different random sequences are represented in the plurality of double stranded nucleic acid constructs.
In one embodiment, expressed RNA from the cross-linking region can serve as a site for ligation to a linker containing a 3′-puromycin residue. In another embodiment, RNA expressed from the cross linking region is complementary to a DNA linker sequence to be used.
In a second aspect, the present invention provides recombinant double stranded DNA constructs and nucleic acid libraries comprising a plurality of the recombinant double stranded DNA constructs, wherein each double stranded DNA construct comprises
(a) a first restriction enzyme recognition site;
(b) one or more translation enhancement elements downstream of the first restriction enzyme recognition site;
(c) a start codon downstream of the one or more translation enhancement elements;
(d) a random region of at least about 18 to about 60 nucleotides immediately downstream from the start codon, wherein the peptide encoded by the random region of each linear recombinant double stranded DNA construct is capable of binding to the same target;
(e) a protease cleavage site downstream of the random region; and
(f) a second restriction enzyme recognition site downstream of the protease cleavage site.
In the libraries of this second aspect of the invention, at least 10 different random sequences are represented in the plurality of double stranded nucleic acid constructs.
In one embodiment, the double stranded DNA constructs comprise plasmids. In another embodiment, the recombinant double stranded DNA constructs further comprise:
(g) a promoter upstream of the first restriction enzyme recognition site; and
(h) a region encoding a peptide purification tag downstream of the second restriction enzyme recognition site.
In a third aspect, the invention provides methods for identifying polypeptide ligands for a target of interest, comprising
(a) contacting the recombinant nucleic acid constructs or nucleic acid library of any embodiment or combination of embodiments of the second aspect of the invention with reagents for RNA transcription under conditions to promote transcription of RNA from the double stranded nucleic acid constructs, resulting in an RNA expression product;
(b) contacting the RNA expression product with reagents for protein expression under conditions to promote translation of detectable polypeptide;
(c) incubating the detectable polypeptide with a target of interest under suitable conditions to promote binding of the detectable polypeptide to the target, to produce binding complexes; and
(d) analyzing the detectable polypeptides bound to the target.
In a fourth aspect, the invention provides methods for identifying peptide ligands for a target of interest, comprising
(a) contacting the recombinant nucleic acid constructs or the nucleic acid library of any embodiment or combination of embodiments of the first aspect of the invention with reagents for RNA transcription under conditions to promote transcription of RNA from the double stranded nucleic acid constructs, resulting in an RNA expression product;
(b) contacting the RNA expression product with reagents for ligating a linker containing a puromycin residue to the 3′ end of the RNA expression product, resulting in a labeled RNA expression product;
(c) contacting the labeled RNA expression product with reagents for protein expression under conditions to promote protein translation from the labeled RNA expression product, resulting in a RNA-polypeptide fusion product;
(d) reverse transcribing the RNA-polypeptide fusion products to produce an RNA-polypeptide fusion product-cDNA heteroduplex;
(e) incubating the RNA-polypeptide fusion product-cDNA heteroduplexes with a target of interest;
(f) removing RNA-polypeptide fusion product-cDNA heteroduplexes that are not bound to the target of interest, resulting in binding complexes; and
(g) amplifying ligand-bound RNA-polypeptide fusion product-cDNA heteroduplexes in the binding complexes, to produce double stranded DNA constructs that can be used to identify the peptide ligands bound to the target of interest.
In a fifth aspect, the present invention provides kits comprising
(a) the nucleic acid library of any embodiment or combination of embodiments of the first aspect of the invention; and
(b) an expression vector, wherein, the expression vector comprises:
In a sixth aspect, the invention provides separation devices, comprising:
(a) a multiwell plate;
(b) a regenerated cellulose layer below the multiwell plate, wherein the regenerated cellulose layer has a pore size suitable to retain peptides bound to a target, but not to retain unbound peptides; and
(c) a nylon membrane layer below the regenerated cellulose layer, wherein the nylon membrane layer has a pore size suitable to retain unbound peptides.
In a seventh aspect, the invention provides an RNA pool resulting from transcription of the library of the first aspect or the second aspect of the invention.
In a first aspect, the present invention provides recombinant double stranded DNA constructs, or nucleic acid libraries comprising a plurality of the recombinant double stranded DNA constructs, wherein each double stranded DNA construct comprises
(a) a promoter;
(b) one or more translation enhancement elements downstream of the promoter and upstream of the start codon;
(c) a start codon downstream of the one or more translation enhancing elements;
(d) a random region of at least about 18 to about 60 nucleotides immediately downstream from the start codon;
(e) a coding region for a protease cleavage site downstream of the random region;
(f) a unique restriction enzyme recognition site downstream of the protease cleavage site; and
(g) a heterologous cross-linking region downstream of the unique restriction enzyme recognition site;
In the libraries of this aspect of the invention, at least 10¹¹ different random sequences are represented in the plurality of double stranded nucleic acid constructs.
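By way of non-limiting illustration, the element order (a) through (g) can be sketched in Python. The T7 promoter sequence, the TEV-style cleavage site coding sequence, the BamHI site, and the cross-linking placeholder are assumptions chosen only for this sketch and are not recited in the specification; the TEE shown is SEQ ID NO:5 as recited herein. The helper names `make_construct` and `make_library` are hypothetical.

```python
import random

# Illustrative elements. Only TEE_EXAMPLE (SEQ ID NO:5) comes from the text;
# every other sequence below is an assumption made for this sketch.
T7_PROMOTER = "TAATACGACTCACTATAGGG"        # commonly used T7 promoter sequence
TEE_EXAMPLE = "AAATCAATAAATG"               # SEQ ID NO:5
START_CODON = "ATG"
CLEAVAGE_SITE = "GAAAACCTGTATTTTCAGGGC"     # hypothetical coding sequence for ENLYFQG
RESTRICTION_SITE = "GGATCC"                 # BamHI, chosen only as an example
CROSSLINK_REGION = "GCATCCGCTATTTAAAAAAA"   # hypothetical placeholder


def make_construct(random_len: int, rng: random.Random) -> dict:
    """Assemble one linear construct in the order (a) through (g)."""
    if not 18 <= random_len <= 60:
        raise ValueError("random region is described as about 18 to about 60 nt")
    # A fully random region may by chance contain a stop codon or a copy of
    # the restriction site; this toy sketch does not filter for either.
    random_region = "".join(rng.choice("ACGT") for _ in range(random_len))
    parts = [
        ("promoter", T7_PROMOTER),
        ("tee", TEE_EXAMPLE),
        ("start", START_CODON),
        ("random", random_region),
        ("cleavage", CLEAVAGE_SITE),
        ("restriction", RESTRICTION_SITE),
        ("crosslink", CROSSLINK_REGION),
    ]
    return {"parts": parts, "sequence": "".join(seq for _, seq in parts)}


def make_library(size: int, random_len: int = 30, seed: int = 0) -> list:
    """Build a small in silico library of constructs with independent random regions."""
    rng = random.Random(seed)
    return [make_construct(random_len, rng) for _ in range(size)]
```

A call such as `make_library(5)` returns five constructs that share every fixed element and differ only in the random region.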
The nucleic acid libraries according to all aspects of the present invention can be used, for example, in the methods of the invention for identifying peptide ligands for a target of interest. The libraries comprise a series of linear constructs, which, when used in in vitro selection methods as described herein, permit use of a library diversity of at least 10¹¹ different polynucleotide sequences. As used herein, a “library” is a collection of linear double stranded nucleic acid constructs.
As used herein, “heterologous” means that none of the promoter, translation enhancement element (TEE), random region, and cross-linking region are normally associated with each other (i.e.: they are not part of the same gene in vivo), but are recombinantly combined in the construct.
As used herein, a “promoter” is any DNA sequence that can be used to help drive RNA expression of a DNA sequence downstream of the promoter. Suitable promoters include, but are not limited to, the T7 promoter, SP6 promoter, CMV promoter, and vaccinia virus synthetic-late promoter. As will be understood by those of skill in the art, a given double stranded DNA construct may contain more than one promoter, as appropriate for a given proposed use.
As used herein, a translation enhancement element (TEE) can be any polynucleotide domain that mediates cap-independent protein translation. Any suitable TEE can be used, including but not limited to SEQ ID NOs: 7-645, listed in Table 1. In a preferred embodiment, the isolated polynucleotides consist of the recited sequence. In a further embodiment, the isolated polynucleotides comprise the sequence of SEQ ID NO:4 (A/-)(A/G)ATC(A/G)(A/G)TAAA(T/C)G, wherein the isolated polynucleotides are between 13-200 nucleotides in length. SEQ ID NO:4 is a consensus sequence found within a number of the TEEs (clones 985 (SEQ ID NO:448), 1092 (SEQ ID NO:495), 1347 (SEQ ID NO:623), 906 (SEQ ID NO:408), 12 (SEQ ID NO:12), 1200 (SEQ ID NO:553), 958 (SEQ ID NO:434), 1011 (SEQ ID NO:458), and 459 (SEQ ID NO:214) in Table 1). In a preferred embodiment, the isolated polynucleotides comprise the sequence of SEQ ID NO:5 5′-AAATCAATAAATG-3′, which is a conserved sequence found in the top-performing TEEs. In various preferred embodiments, the isolated polynucleotides are between 13-180, 13-170, 13-160, 13-150, 13-140, 13-130, 13-120, 13-110, 13-100, 13-90, 13-80, 13-70, 13-60, 13-50, 13-40, 13-30, or 13-20 nucleotides in length.
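As a non-limiting sketch, the SEQ ID NO:4 consensus can be written as a regular expression and used to scan a candidate polynucleotide. The reading of "(A/-)" as an optional A, and the function name `find_consensus`, are assumptions made for this illustration.

```python
import re

# SEQ ID NO:4 as written in the text: (A/-)(A/G)ATC(A/G)(A/G)TAAA(T/C)G
# Assumption: "(A/-)" means the leading A may be present or absent.
SEQ_ID_4_PATTERN = re.compile(r"A?[AG]ATC[AG][AG]TAAA[TC]G")

# SEQ ID NO:5 as recited in the text.
SEQ_ID_5 = "AAATCAATAAATG"


def find_consensus(seq: str) -> list:
    """Return (position, matched sequence) for each non-overlapping consensus match."""
    return [(m.start(), m.group()) for m in SEQ_ID_4_PATTERN.finditer(seq.upper())]
```

Under this reading, SEQ ID NO:5 is itself one instance of the SEQ ID NO:4 consensus, so `find_consensus(SEQ_ID_5)` reports a single match at position 0.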
In one embodiment, the TEE is selected from the group consisting of SEQ ID NO:583 (clone 1267), SEQ ID NO:397 (clone 877), SEQ ID NO:54 (clone 100), SEQ ID NO:401 (clone 884), SEQ ID NO:471 (clone 1033), SEQ ID NO:327 (clone 733), SEQ ID NO:398 (clone 878), SEQ ID NO:301 (clone 675), and SEQ ID NO:310 (clone 694). In a further embodiment, the TEE comprises a nucleic acid sequence according to SEQ ID NO:1. This sequence represents a consensus sequence of a subset of 733 (SEQ ID NO:327), 877 (SEQ ID NO:397), 1033 (SEQ ID NO:471), and 1267 (SEQ ID NO:583), and thus is strongly correlated with TEE activity. In further embodiments, the TEE comprises a nucleic acid sequence according to SEQ ID NO:2 or SEQ ID NO:3, which are longer portions of the consensus sequence between 733 (SEQ ID NO:327), 877 (SEQ ID NO:397), 1033 (SEQ ID NO:471), and 1267 (SEQ ID NO:583).
5′AT(C/G)GAAT(C/G)(G/A)AA(G/T)(A/G/C)
5′-(A/--)(A/--)(G/A/--)(C/T/--)(G/--)
(G/--)(A/--)(A/--)(T/--)(T/C/--)(--/A/G)
(-/A)AT(C/G)GAAT(C/G)(G/A)AA(G/T)(A/G/C)
5′-(A/--)(A/--)(A/--)(G/C/--)(A/--)
(G/--)(A/--)(A/--)(T/--)(C/--)(A/--)
The “random region” is any DNA sequence of at least 18 nucleotides in length. In one embodiment, the random region is between 18-60 nucleotides in length. The random sequence may be non-naturally occurring, or derived from a naturally occurring source, and may be of any primary sequence.
As used herein, a “cross linking region” is any nucleic acid sequence that can be expressed as RNA, where the expressed RNA can serve as a site for ligation/binding to a linker to form a stable complex between mRNA-ribosome-protein. In a preferred embodiment, expressed RNA from the cross-linking region can serve as a site for ligation to a linker containing a 3′-puromycin residue. In a non-limiting embodiment, the expressed RNA from the cross-linking region can serve as a site for photo-ligation of a psoralen-DNA-puromycin linker (5′-psoralen-(oligonucleotide complementary to linker)-(PEG9)2-A15-ACC-puromycin). In a preferred embodiment, the linker is a DNA linker, and the mRNA expressed from the cross linking region is complementary to the DNA linker sequence to be used.
The “protease cleavage site” can be the cleavage site for any suitable protease to be used in the methods of the invention.
The “unique restriction enzyme recognition site” can be any suitable restriction enzyme recognition site, so long as it is unique to the double stranded construct.
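The uniqueness requirement can be checked in silico. The following is a minimal sketch, assuming a linear construct given as its top-strand sequence; the function names and the example sites are hypothetical and are not part of the specification.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def count_sites(construct: str, site: str) -> int:
    """Count occurrences of a recognition site on either strand of a linear construct."""
    construct, site = construct.upper(), site.upper()

    def count(pattern: str) -> int:
        # Count overlapping occurrences of the pattern in the top strand.
        n, i = 0, construct.find(pattern)
        while i != -1:
            n += 1
            i = construct.find(pattern, i + 1)
        return n

    total = count(site)
    rc = reverse_complement(site)
    if rc != site:
        # A non-palindromic site can also occur on the bottom strand, which
        # appears in the top strand as the site's reverse complement.
        total += count(rc)
    return total


def is_unique_site(construct: str, site: str) -> bool:
    """True when the recognition site occurs exactly once in the construct."""
    return count_sites(construct, site) == 1
```

For example, `is_unique_site("AAAGGATCCTTT", "GGATCC")` is true, whereas a construct carrying two copies of the site fails the check.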
As used herein, “at least 10¹¹ different polynucleotide sequences are represented in the plurality of double stranded nucleic acid constructs” means that the library, in its entirety, contains at least 10¹¹ different polynucleotide sequences that can be tested for peptide binding activity to a target of interest, while each different double stranded nucleic acid construct contains only a single polynucleotide sequence. In various embodiments, at least 10¹², 10¹³, 10¹⁴, or 10¹⁵ different polynucleotide sequences are represented in the plurality of double stranded nucleic acid constructs.
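The relationship between random-region length and the number of distinct sequences that a library can hold is simple arithmetic. The sketch below assumes that all four nucleotides are permitted at every position of the random region, which the specification does not state; the function names are hypothetical.

```python
def sequence_space(n_nt: int) -> int:
    """Number of possible sequences for a fully random region of n_nt nucleotides."""
    return 4 ** n_nt


def min_random_length(diversity: int) -> int:
    """Smallest random-region length whose sequence space is at least `diversity`."""
    n = 0
    while sequence_space(n) < diversity:
        n += 1
    return n


# Sequence space at the two ends of the recited 18-60 nucleotide range.
space_18 = sequence_space(18)   # 4**18 = 68,719,476,736 (about 6.9e10)
space_60 = sequence_space(60)   # 4**60 = 2**120 (about 1.3e36)
```

Under this assumption the theoretical sequence space grows by a factor of four per added nucleotide, so the number of distinct sequences physically present in a library is limited by the amount of DNA synthesized rather than by the sequence space once the random region is more than a few nucleotides longer than the minimum returned by `min_random_length`.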
It will be understood by those of skill in the art that the constructs of the invention may comprise further nucleotide elements as appropriate for a given intended use. In one preferred embodiment, the double stranded nucleic acid constructs further comprise one or more unique restriction sites upstream of the polynucleotide sequence and downstream of the promoter, and one or more unique restriction sites downstream of the polynucleotide sequence.
In a second aspect, the present invention provides recombinant double stranded DNA constructs, and nucleic acid libraries that comprise a plurality of the recombinant double stranded DNA constructs, wherein each double stranded DNA construct comprises
(a) a first restriction enzyme recognition site;
(b) one or more translation enhancement elements downstream of the first restriction enzyme recognition site;
(c) a start codon downstream of the one or more translation enhancement elements;
(d) a random region of at least about 18 to about 60 nucleotides immediately downstream from the start codon, wherein the peptide encoded by the random region of each linear recombinant double stranded DNA construct is capable of binding to the same target;
(e) a coding region for a protease cleavage site downstream of the random region; and
(f) a second restriction enzyme recognition site downstream of the protease cleavage site;
wherein at least 10 different random sequences are represented in the plurality of double stranded nucleic acid constructs.
These constructs and libraries can be generated using any techniques, and can be used for identifying peptide ligands for a target of interest, such as disclosed in the methods of the invention. In another embodiment, the library of the first aspect of the invention is incubated with a desired target, washed to remove unbound peptides, and constructs encoding binding peptides to a specific target are amplified by PCR to isolate bound molecules. The linear DNA is restriction digested and cloned into a vector to create the nucleic acid libraries of this second aspect of the invention.
All terms used in this second aspect have the same meaning as used elsewhere herein; similarly, all embodiments of the nucleic acid libraries and components thereof that are disclosed above, and combinations thereof, can be used in the methods of the invention.
In one embodiment, the double stranded DNA constructs comprise plasmids. In another embodiment, the recombinant double stranded DNA constructs further comprise:
(g) a promoter upstream of the first restriction enzyme recognition site; and
(h) a region encoding a peptide purification tag downstream of the second restriction enzyme recognition site.
Any suitable region encoding a peptide purification tag can be used, as will be understood by those of skill in the art, based on the teachings herein. In one non-limiting and exemplary embodiment, the encoded purification tag may comprise streptavidin binding peptide.
The libraries of the second aspect of the invention comprise at least 10 different random sequences represented in the plurality of double stranded nucleic acid constructs. In various preferred embodiments, at least 15, 20, 25, 30, 35, 40, 45, 50, 75, 100, 250, 500, 1000, 2500, 5000, 10,000, 50,000, 100,000, or more different random sequences are represented in the plurality of double stranded nucleic acid constructs.
It will be understood by those of skill in the art that the constructs of the invention may comprise further nucleotide elements as appropriate for a given intended use. In one preferred embodiment, the double stranded nucleic acid constructs further comprise one or more unique restriction sites upstream of the polynucleotide sequence and downstream of the promoter, and one or more unique restriction sites downstream of the polynucleotide sequence.
In a third aspect, the present invention provides methods for identifying polypeptide ligands for a target of interest, comprising
(a) contacting the nucleic acid library of any embodiment or combination of embodiments of the second aspect of the invention with reagents for RNA transcription under conditions to promote transcription of RNA from the double stranded nucleic acid constructs, resulting in an RNA expression product;
(b) contacting the RNA expression product with reagents for protein expression under conditions to promote translation of detectable polypeptide;
(c) incubating the detectable polypeptide with a target of interest under suitable conditions to promote binding of the detectable polypeptide to the target, to produce binding complexes; and
(d) analyzing the detectable polypeptides bound to the target.
The methods of the invention can be used, for example, to rapidly identify a plurality of peptides that bind to any target of interest. All terms used in this third aspect have the same meaning as used elsewhere herein; similarly, all embodiments of the nucleic acid libraries and components thereof that are disclosed above, and combinations thereof, can be used in the methods of the invention.
“Analyzing” the detectable polypeptides bound to the target means to make any qualitative or quantitative assessment of the bound polypeptide, including but not limited to determining a fraction of bound polypeptide, determining a binding constant of the bound polypeptide for the target, determining an amino acid sequence of the bound polypeptide, etc. The analyzing may further comprise purifying (partially or completely) bound polypeptide from the target.
The target may be any target of interest, including but not limited to proteins, nucleic acids, lipids, polysaccharides, organic molecules, inorganic molecules, metals, polymers, solids, etc.
General conditions for in vitro transcription and translation are well known to those of skill in the art. Similarly, any suitable technique for detectably labeling the expressed polypeptides can be used, including but not limited to radioactive or fluorescent labeling, expressing the polypeptide as a fusion protein with a detectable label, etc.
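Because the random region lies immediately downstream of the start codon, the peptide encoded by a given construct can be read off in silico. The following sketch assumes the standard genetic code and natural amino acids only; the function name `translate_from_start` is hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_from_start(dna: str, start_index: int) -> str:
    """Translate from the ATG at start_index until a stop codon or the end of the sequence."""
    dna = dna.upper()
    if dna[start_index:start_index + 3] != "ATG":
        raise ValueError("no start codon at the given position")
    peptide = []
    for i in range(start_index, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":   # stop codon ends translation
            break
        peptide.append(residue)
    return "".join(peptide)
```

For instance, `translate_from_start("ATGGCTAAATGGTAA", 0)` reads the codons ATG, GCT, AAA, and TGG and stops at TAA.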
In a further embodiment, the target is immobilized on a solid support during the incubating step. Any suitable solid support can be used, including but not limited to magnetic beads, microarrays, columns, optical fibers, wipes, nitrocellulose, nylon, glass, quartz, diazotized membranes (paper or nylon), silicones, polyformaldehyde, cellulose, cellulose acetate, paper, ceramics, metals, metalloids, semiconductive materials, coated beads, magnetic particles; plastics such as polyethylene, polypropylene, and polystyrene; nanostructured surfaces; nanotubes (such as carbon nanotubes), and nanoparticles (such as gold nanoparticles or quantum dots).
In one embodiment, the target is incubated with an excess of the detectable polypeptide (i.e.: more than 1:1; preferably 1.5:1, 2:1, 3:1, 4:1, 5:1, 6:1, 7:1, 8:1, 9:1, 10:1, or more).
In embodiments where the constructs encode a peptide purification tag, the methods may further comprise passing the translation products through an affinity column with affinity for the peptide purification tag. Any suitable affinity column techniques can be used that permit binding to the peptide purification tag being used in a given method. It is well within the level of skill in the art, based on the teachings herein, to identify an appropriate affinity column technique to be used for a given purpose. In this embodiment, the methods may further comprise releasing isolated peptides from their purification tags, to help isolate the expressed peptide. It is well within the level of skill in the art, based on the teachings herein, to identify an appropriate release technique to be used for a given purpose.
In a further embodiment, the methods may further comprise incubating the in vitro translated peptides with the target of interest to form a second binding complex, and removing unbound in vitro translated peptides. This embodiment helps to further purify the peptide binders of interest. Any suitable technique for removing unbound peptides can be used. In one non-limiting embodiment, removing unbound in vitro translated peptides comprises contacting the binding complexes with a size-limiting membrane, wherein detectable polypeptides bound to the target are retained on the membrane, and unbound polypeptides pass through pores of the membrane. Such membranes may be of any type that possesses suitable pore size, including but not limited to regenerated cellulose.
For example, the separation devices of the invention (see below) can be used for removal of unbound polypeptides. Combining various embodiments, radiolabeled peptides are brought to equilibrium with their cognate target, and the bound fraction is separated from the unbound fraction by passing the mixture through a size-limiting membrane. Peptides that are bound to a given target are retained on the top layer of regenerated cellulose, while unbound peptides are retained on the bottom layer of, for example, nylon. In a further embodiment, following separation, bound peptides can be quantitated using any suitable technique, including but not limited to phosphorimaging.
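As a non-limiting sketch of how the two-layer readout could be turned into numbers, the fraction bound can be computed from the signal on the regenerated cellulose layer and the signal on the nylon layer. The single-point dissociation constant estimate assumes a single-site binding model in which the free target concentration is approximately equal to the total target concentration; that model, the background subtraction, and the function names are assumptions made for illustration and are not taken from the specification.

```python
def fraction_bound(bound_counts: float, unbound_counts: float,
                   background: float = 0.0) -> float:
    """Fraction of labeled peptide retained with the target.

    bound_counts: signal from the top (regenerated cellulose) layer.
    unbound_counts: signal from the bottom (nylon) layer.
    background: signal subtracted from each layer (hypothetical correction).
    """
    bound = max(bound_counts - background, 0.0)
    unbound = max(unbound_counts - background, 0.0)
    total = bound + unbound
    if total == 0:
        raise ValueError("no signal above background")
    return bound / total


def kd_from_single_point(frac: float, target_conc: float) -> float:
    """Estimate Kd from one measurement, assuming frac = [T] / (Kd + [T])."""
    if not 0 < frac < 1:
        raise ValueError("fraction bound must lie strictly between 0 and 1")
    return target_conc * (1 - frac) / frac
```

The returned Kd carries the same concentration units as `target_conc`; a titration over several target concentrations would generally give a more reliable estimate than a single point.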
The methods of the invention provide a means, for example, to rapidly screen peptides identified in the output of an in vitro selection experiment. Traditionally, this was a costly and time-consuming process that required generating each peptide by solid phase synthesis and measuring the properties of the peptide by a standard binding technique such as SPR.
In a fourth aspect, the present invention provides methods for identifying peptide ligands for a target of interest, comprising
(a) contacting the nucleic acid library of any one embodiment or combination of embodiments of the first aspect of the invention with reagents for RNA transcription under conditions to promote transcription of RNA from the double stranded nucleic acid constructs, resulting in an RNA expression product;
(b) contacting the RNA expression product with reagents for ligating a linker containing a puromycin residue to the 3′ end of the RNA expression product, resulting in a labeled RNA expression product;
(c) contacting the labeled RNA expression product with reagents for protein expression under conditions to promote protein translation from the labeled RNA expression product, resulting in a RNA-polypeptide fusion product;
(d) reverse transcribing the RNA-polypeptide fusion products to produce an RNA-polypeptide fusion product-cDNA heteroduplex;
(e) incubating the RNA-polypeptide fusion product-cDNA heteroduplexes with a target of interest;
(f) removing RNA-polypeptide fusion product-cDNA heteroduplexes that are not bound to the target of interest, resulting in binding complexes; and
(g) amplifying ligand-bound RNA-polypeptide fusion product-cDNA heteroduplexes in the binding complexes, to produce double stranded DNA constructs that can be used to identify the peptide ligands bound to the target of interest.
The methods can be used, for example, to rapidly identify a plurality of peptides that bind to any target of interest. All terms used in this fourth aspect have the same meaning as used elsewhere herein; similarly, all embodiments of the nucleic acid libraries and components thereof that are disclosed above, and combinations thereof, can be used in the methods of the invention.
The target may be any target of interest, including but not limited to proteins, nucleic acids, lipids, polysaccharides, organic molecules, inorganic molecules, metals, polymers, solids, etc.
In one embodiment of this fourth aspect, the double stranded DNA constructs comprise:
(a) a first restriction enzyme recognition site;
(b) one or more translation enhancement elements downstream of the first restriction enzyme recognition site;
(c) a start codon downstream of the one or more translation enhancement elements;
(d) a random region of at least about 18 to about 60 nucleotides immediately downstream from the start codon, wherein the peptide encoded by the random region of each linear recombinant double stranded DNA construct is capable of binding to the same target;
(e) a protease cleavage site downstream of the random region; and
(f) a second restriction enzyme recognition site downstream of the protease cleavage site.
Any suitable embodiments or combinations thereof of the constructs as described above can be used in the methods of the invention.
General conditions for in vitro transcription and translation, PCR, reverse transcription, and mRNA display techniques are well known to those of skill in the art. Contacting the RNA expression product with reagents for ligating a linker containing a puromycin residue to the 3′ end of the RNA expression product, resulting in a labeled RNA expression product, can be carried out via any suitable method, including photo-crosslinking or Moore-Sharp splint-directed ligation. Any suitable linker may be used. In a preferred embodiment the linker comprises a DNA linker complementary to the transcribed single stranded RNA. The DNA linker may comprise any suitable modifications, including but not limited to non-natural residues and pegylation, as can be used in mRNA display.
Similarly, general conditions for incubating the RNA-polypeptide fusion product-cDNA heteroduplexes with a target of interest; removing RNA-polypeptide fusion product-cDNA heteroduplexes that are not bound to the target of interest, resulting in binding complexes; and amplifying ligand-bound RNA-polypeptide fusion product-cDNA heteroduplexes in the binding complexes, to produce double stranded DNA constructs that can be used to identify the peptide ligands bound to the target of interest, are well known to those of skill in the art.
In a further embodiment, the target is immobilized on a solid support during the incubating step. Any suitable solid support can be used, including but not limited to magnetic beads, microarrays, columns, optical fibers, wipes, nitrocellulose, nylon, glass, quartz, diazotized membranes (paper or nylon), silicones, polyformaldehyde, cellulose, cellulose acetate, paper, ceramics, metals, metalloids, semiconductive materials, coated beads, magnetic particles; plastics such as polyethylene, polypropylene, and polystyrene; nanostructured surfaces; nanotubes (such as carbon nanotubes), and nanoparticles (such as gold nanoparticles or quantum dots).
In one embodiment, the target is incubated with an excess of the RNA-polypeptide fusion product-cDNA heteroduplexes (i.e.: more than 1:1; preferably 1.5:1, 2:1, 3:1, 4:1, 5:1, 6:1, 7:1, 8:1, 9:1, 10:1, or more).
In another embodiment, removing RNA-polypeptide fusion product-cDNA heteroduplexes that are not bound to the target of interest comprises incubating in the presence of a denaturant, including but not limited to guanidine hydrochloride, urea, and heat.
Traditionally, iterative rounds of in vitro selection and amplification are used to identify peptides with low nanomolar affinities to the surface of a given protein target. By combining the high library complexity of mRNA display with stringent washing conditions, we have discovered that high affinity peptides can be discovered without resorting to iterative rounds of selection and amplification. This advance greatly reduces the time required to generate and optimize high quality peptides.
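The effect of wash stringency on the number of rounds needed can be illustrated with a simple expected-value model. This is a sketch under stated assumptions only: the two-population model, the recovery values, and the function names are hypothetical and do not represent measured data from the specification.

```python
from typing import Optional


def binder_fraction_after_round(initial_fraction: float,
                                binder_recovery: float,
                                background_recovery: float) -> float:
    """Expected fraction of binders in the recovered pool after one selection round.

    binder_recovery: fraction of true binders retained on the target.
    background_recovery: fraction of non-binders carried through the wash.
    """
    binders = initial_fraction * binder_recovery
    nonbinders = (1 - initial_fraction) * background_recovery
    return binders / (binders + nonbinders)


def rounds_to_reach(initial_fraction: float, binder_recovery: float,
                    background_recovery: float, goal: float = 0.5,
                    max_rounds: int = 50) -> Optional[int]:
    """Number of rounds until binders make up at least `goal` of the pool."""
    fraction = initial_fraction
    for round_number in range(1, max_rounds + 1):
        fraction = binder_fraction_after_round(
            fraction, binder_recovery, background_recovery)
        if fraction >= goal:
            return round_number
    return None
```

With the purely illustrative inputs of one binder per 10⁹ library members and 30% binder recovery, a background carry-through of 10⁻³ requires four rounds for binders to dominate the pool, whereas a background of 10⁻¹¹ reaches that point in a single round.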
In another embodiment, the methods further comprise cloning the double stranded DNA constructs that encode binders into an expression vector, wherein, after cloning, the vector comprises:
(g) a promoter upstream of the first restriction enzyme recognition site; and
(h) a region encoding a peptide purification tag downstream of the second restriction enzyme recognition site.
These added steps can be used, for example, to rapidly isolate the double stranded DNA constructs that encode peptide binders to a target of interest, and to use the isolated constructs to express the peptides of interest for isolation and identification.
Thus, in a further embodiment, the methods comprise in vitro translation of peptides encoded by the cloned double stranded DNA construct, wherein the peptides are expressed as N-terminal fusions with the peptide purification tag. Any suitable in vitro translation technique can be used. In one embodiment, the in vitro translation comprises use of a detectable amino acid monomer.
In a further embodiment, the methods comprise passing the in vitro translation products through an affinity column with affinity for the peptide purification tag. Any suitable affinity column techniques can be used that permit binding to the peptide purification tag being used in a given method. It is well within the level of skill in the art, based on the teachings herein, to identify an appropriate affinity column technique to be used for a given purpose. In this embodiment, the methods may further comprise releasing isolated peptides from their purification tags, to help isolate the expressed peptide. It is well within the level of skill in the art, based on the teachings herein, to identify an appropriate release technique to be used for a given purpose.
In a further embodiment, the methods may further comprise incubating the in vitro translated peptides with the target of interest to form a second binding complex, and removing unbound in vitro translated peptides. This embodiment helps to further purify the peptide binders of interest. Any suitable technique for removing unbound peptides can be used. In one non-limiting embodiment, removing unbound in vitro translated peptides comprises passing the second binding complex through a size-limiting membrane. Any suitable size-limiting membrane can be used, including but not limited to regenerated cellulose.
Combining various embodiments, radiolabeled peptides are brought to equilibrium with their cognate target, and the bound fraction is separated from the unbound fraction by passing the mixture through a size-limiting membrane. Peptides that are bound to a given target are retained on the top layer of regenerated cellulose, while unbound peptides are retained on the bottom layer of, for example, nylon.
In a further embodiment, following separation, bound peptides can be quantitated using any suitable technique, including but not limited to phosphorimaging.
In a fifth aspect, the present invention provides kits comprising
(a) the nucleic acid library of any embodiment or combination of embodiments of the first aspect of the invention; and
(b) an expression vector, wherein, the expression vector comprises:
Exemplary expression vectors include any embodiment or combination of embodiments of the vectors disclosed in the third aspect of the invention, and in the examples that follow. The library and vectors of the kits may independently be present on a solid surface or free in solution. The library and vectors of the kits may independently be frozen, lyophilized, or in solution.
In a sixth aspect, the present invention provides a separation device, comprising:
(a) a multiwell plate;
(b) a regenerated cellulose layer below the multiwell plate, wherein the regenerated cellulose layer has a pore size suitable to retain peptides bound to a target, but not to retain unbound peptides; and
(c) a nylon membrane layer below the regenerated cellulose layer, wherein the nylon membrane layer has a pore size suitable to retain unbound peptides.
The multiwell plate may comprise any number of wells as deemed appropriate by a user. The multiwell plate is one in which the wells are separated by barriers that allow peptides to pass through but retain proteins. In this way, peptides bound to a target may be retained on the regenerated cellulose layer, and peptides not bound to a target bind to the nylon membrane when passed through the wells of the multi-well plate.
In a seventh aspect, the present invention provides an mRNA pool resulting from transcription of the nucleic acid library of the first aspect or the second aspect of the invention. Such mRNA pools can be used, for example, in the methods of the invention described below. Any suitable technique for RNA transcription can be used. In one non-limiting embodiment, the double stranded DNA constructs each comprise a T7 RNA polymerase promoter, and the library is transcribed in vitro using T7 RNA polymerase, using standard techniques. It will be clear to those of skill in the art how to optimize transcription conditions in terms of buffers, nucleotides, salt conditions, etc., based on the general knowledge of in vitro transcription techniques in the art. The resulting mRNA pools will comprise single stranded RNA from all, or almost all, of the double stranded DNA constructs in the library. In a further embodiment of mRNA pools resulting from transcription of the library of the first aspect of the invention, the transcripts in the pooled mRNA comprise a DNA linker, containing a 3′ puromycin residue, ligated at the 3′ end of the transcript. In a further aspect, the invention provides pooled mRNA-peptide fusion molecules resulting from in vitro translation of the pooled mRNA. Methods for in vitro translation of RNA transcripts are well known to those of skill in the art. In one non-limiting embodiment, the methods comprise incubating the pooled mRNA with rabbit reticulocyte lysate and 35S-methionine for a suitable time. The method may further comprise incubating the mixture overnight in the presence of suitable amounts of KCl and MgCl2 to promote fusion formation. When the pool of RNA is translated in vitro, the product is an mRNA-peptide fusion molecule. The bond-forming step of mRNA display is due to the natural peptidyl transferase activity of the ribosome, which catalyzes the formation of a non-hydrolyzable amide bond between puromycin and the polypeptide chain.
In this embodiment, individual RNA polynucleotides in the pool are covalently linked to a random peptide encoded by their random region. In a further embodiment, the RNA polynucleotides in the pool comprise RNA-cDNA heteroduplexes created via reverse transcription, as described in the methods that follow.
We have developed methods, reagents, and device improvements that make it possible to select, sequence, and characterize high affinity peptides in days. This technology is automatable and could be performed in 96- or 384-well format. One specific embodiment of our technology is a custom library design and vector characterization strategy. A second embodiment is the use of a novel bar-coding strategy that is compatible with next-generation deep sequencing. A third embodiment is a stringent selection strategy that reduces the number of selection cycles from many to one. A fourth embodiment is a cell-free characterization process that allows for rapid screening and characterization of individual members without the need for solid-phase synthesis.
These advances make it possible to generate peptides with antibody-like affinity in 3-5 days. The process is amenable to automation and can be performed against tens to hundreds of proteins simultaneously. By combining in vitro selection with next-generation deep sequencing, it should be possible to map the ligand binding space for human and all other relevant proteomes.
We have designed an mRNA display library and cell-free peptide expression vector that when used together make it possible to characterize selected peptides in 2-3 days.
This combined library-vector design strategy greatly reduces the time required to screen individual peptides present in the output of a protein selection. Traditionally, this is done by sequencing the selection output, synthesizing representative peptides by solid-phase synthesis, and purifying the polypeptides by HPLC. This is a time consuming process that can easily take 4-6 weeks. Even when the peptides are ordered from a commercial vendor, they can still take 3-4 weeks to receive and generally cost $200-300 per peptide depending on the level of purity requested.
In this specific embodiment, the library design strategy was made compatible with all of the sequence information needed to synthesize large peptide libraries by mRNA display; however, this strategy is general and could be applied to other selection technologies. The library was constructed at the DNA level and contains a T7 promoter for in vitro transcription, followed by a translation enhancing element, followed by an ATG start codon, followed by a random region, followed by a protease cleavage site, followed by a restriction digest site and finally a photo-crosslinking site. Using standard mRNA display technology, the DNA library is transcribed into RNA, and the RNA is photo-ligated to a short DNA fragment containing a 3′-puromycin residue. The library is translated in vitro to produce a library of peptides, each of which is covalently linked to its encoding RNA sequence. Prior to selection, the RNA portion of the mRNA-peptide fusion is reverse transcribed to create an RNA-cDNA heteroduplex.
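As a non-limiting illustration, the element order described above can be sketched in silico. Only the order of the elements is taken from the description. The T7 promoter consensus and the BamHI recognition site are standard sequences; the translation enhancing element, the TEV-type protease site, and the photo-crosslinking site shown here are assumed placeholders and are not the sequences of the library itself. The function name `build_member` is likewise hypothetical.

```python
import random

# Element ORDER follows the description; the sequences are illustrative.
# Entries marked "hypothetical" or "assumed" are placeholders only.
ELEMENTS = [
    ("t7_promoter", "TAATACGACTCACTATAGGG"),              # standard T7 consensus
    ("translation_enhancer", "ACAATTACTATTTACAATTACA"),   # hypothetical
    ("start_codon", "ATG"),
    ("random_region", None),                              # filled in per member
    ("protease_site", "GAAAACCTGTATTTTCAGGGC"),           # assumed TEV site (ENLYFQG)
    ("restriction_site", "GGATCC"),                       # assumed BamHI site
    ("crosslink_site", "GCATCCGCTATTTAAAAAAA"),           # hypothetical
]


def build_member(random_nt, rng):
    """Return (sequence, layout) for one library member.

    random_nt is the length of the random region in nucleotides; layout maps
    each element name to its (start, end) coordinates in the sequence.
    """
    if random_nt % 3 != 0:
        raise ValueError("random region must be a whole number of codons")
    parts = []
    layout = {}
    pos = 0
    for name, seq in ELEMENTS:
        if seq is None:
            # Fully random nucleotides; a real design might restrict codons
            # to avoid in-frame stop codons.
            seq = "".join(rng.choice("ACGT") for _ in range(random_nt))
        layout[name] = (pos, pos + len(seq))
        parts.append(seq)
        pos += len(seq)
    return "".join(parts), layout
```

Calling `build_member(30, random.Random(0))` would give one member with a 30-nucleotide random region immediately downstream of the start codon, together with the coordinates of every element.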
The library is then incubated with a desired protein target, washed to remove unbound peptides, and amplified by PCR to recover bound molecules. The linear DNA is restriction digested and cloned into our custom peptide expression vector. The custom peptide expression vector contains a T7 promoter for in vitro transcription, followed by restriction sites that are compatible with the mRNA display library, followed by a peptide purification tag, followed by a PolyA region and finally a T7 terminator site. Individual clones are isolated by transforming the vector into Escherichia coli and picking individual colonies. Colonies are grown up in LB or other suitable media and mini-prepped to isolate the vector. Each vector then serves as a template for both in vitro peptide expression and DNA sequencing (see
To minimize the possibility of cross-contamination when multiple selections are conducted in parallel, multiple variants of the mRNA display library have been constructed and tested. These libraries are distinguished on the basis of their unique translation enhancing elements. In this way the libraries function almost identically under the same conditions, but can be discriminated by DNA sequencing. Development of these libraries also opens the opportunity for next-generation deep sequencing of multiple selection outputs at the same time. Such experiments make it possible to map the entire ligand binding space for a set of target proteins with very little investment of time or money.
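As a non-limiting illustration of how pooled sequencing reads might be assigned to their library of origin, the following sketch looks for a library-specific translation enhancing element located directly upstream of the start codon. The enhancer sequences, library names, and function names are hypothetical placeholders. Exact matching is assumed for simplicity; reads with sequencing errors in the enhancer would need a tolerant match.

```python
from collections import Counter

# Hypothetical enhancer variants; the actual library identifiers are not
# given in this description.
ENHANCER_BY_LIBRARY = {
    "library_A": "ACAATTACTATTTACAATTACA",
    "library_B": "ACAACAACAACAAACAACAACA",
    "library_C": "ATTCTACATACACCTAACAATC",
}
START_CODON = "ATG"


def assign_library(read):
    """Return the library whose enhancer sits directly upstream of a start
    codon in the read, or None if zero or several libraries match."""
    read = read.upper()
    hits = [
        name
        for name, enhancer in ENHANCER_BY_LIBRARY.items()
        if enhancer + START_CODON in read
    ]
    return hits[0] if len(hits) == 1 else None


def demultiplex(reads):
    """Count reads per library; unmatched or ambiguous reads are 'unassigned'."""
    counts = Counter()
    for read in reads:
        counts[assign_library(read) or "unassigned"] += 1
    return counts
```

Counting ambiguous reads as unassigned rather than guessing is a deliberate choice here, since the purpose of the distinct enhancers is to detect cross-contamination between parallel selections.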
We have developed methods and conditions that make it possible to identify peptides with antibody-like affinities (nM affinities) from a single mRNA display screen.
Traditionally, iterative rounds of in vitro selection and amplification are used to identify peptides with low nanomolar affinities to the surface of a given protein target. In general, in vitro selection technologies like mRNA display and ribosome display yield higher affinity binders, because the starting libraries used for these technologies are much larger than what is commonly achieved with technologies that require transforming DNA into cells (i.e., cell-surface display or phage display).
By combining the high library complexity of mRNA display with stringent washing conditions, we have found that high affinity peptides can be identified without resorting to iterative rounds of selection and amplification. This advance greatly reduces the time required to generate and optimize high quality peptides.
The first step of the selection is to immobilize the protein target to a solid support, such as a magnetic bead. The protein is then incubated with an excess of the peptide library, constructed as mRNA-peptide fusion molecules using standard mRNA display technology. Once equilibrium is achieved, the beads are washed in selection buffer to remove all of the unbound peptide fusions. Next, the beads are incubated with selection buffer that includes denaturants such as guanidine hydrochloride. Subsequent rounds of washing will remove peptides that are weakly bound to the target protein, but retain all high affinity binders. Finally, the cDNA from the bound peptides is amplified using the polymerase chain reaction (PCR) and cloned into our cell-free expression vector.
We have developed methods and devices that allow peptides present in the output of a selection to be rapidly screened and characterized in 1-2 days.
As described previously, peptides present in the output of a selection are typically synthesized by solid-phase synthesis and purified by HPLC. This is a time consuming process that is not easily amenable to high throughput automation and generally requires 3-4 weeks per peptide.
To eliminate this bottleneck, we have developed a custom peptide expression vector that allows peptides present in the output of a selection to be expressed in vitro as N-terminal fusions to a protein affinity tag. Sufficient peptide can be synthesized from less than 10 μL of cell-free expression lysate. Peptide expression is done in the presence of radiolabeled methionine, which allows the peptides to be detected by scintillation counting or phosphorimaging. Once expressed, peptides are purified by passing the crude lysate mixture through an affinity column with affinity for the affinity tag. After washing the column, proteolytic cleavage then releases the peptide of interest from the purification tag. Alternatively, peptides can be recovered by incubating the affinity matrix in a suitable eluent, such as warm water or a solution of a competitive binder. Purified peptides are then used directly to evaluate different binders or obtain solution binding affinity (Kd) values for their cognate targets or off-target proteins (see
To determine the binding characteristics of each peptide, we developed a high throughput method and device that allows in vitro generated peptides to be rapidly and quantitatively screened for high affinity binding. With this method, radiolabeled peptides are brought to equilibrium with their cognate protein target, and the bound fraction is separated from the unbound fraction by passing the mixture through a size-limiting membrane. Peptides that are bound to a given target protein are retained on the top layer, while unbound peptides are retained on the bottom layer. Following separation, the amount of peptide on each membrane is quantified by phosphorimaging. The membranes used for our method include nylon and regenerated cellulose. Regenerated cellulose has not previously been used in this way and therefore constitutes a new device (see
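As a non-limiting illustration, the fraction of peptide bound in a given well can be computed from the phosphorimager signal on the two layers. The sketch below assumes that signal on the top (regenerated cellulose) layer represents target-bound peptide and signal on the bottom (nylon) layer represents free peptide. The function name `fraction_bound` and the optional per-membrane background subtraction are assumptions added for illustration.

```python
def fraction_bound(top_signal, bottom_signal, background=0.0):
    """Fraction of radiolabeled peptide retained on the top layer.

    top_signal:    signal on the top layer (assumed to be bound peptide)
    bottom_signal: signal on the bottom layer (assumed to be free peptide)
    background:    assumed blank signal, subtracted from each layer
    """
    top = max(top_signal - background, 0.0)
    bottom = max(bottom_signal - background, 0.0)
    total = top + bottom
    if total == 0:
        raise ValueError("no signal above background on either layer")
    return top / total
```

Where more than one nylon membrane is stacked below the top layer, the free-peptide signal would be the sum over those membranes before calling the function.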
We have validated the methods of the invention using two different peptides that are well characterized in the literature. The T10-39 peptide is a peptide selected to bind thrombin, while SBP is a peptide selected to bind streptavidin.
Peptides were expressed as fusions with a C-terminal affinity binding tag, the streptavidin binding peptide (SBP), using a coupled in vitro transcription/translation (TnT) rabbit reticulocyte lysate (Promega). One microgram of PCR-generated dsDNA was used as template in a 100 μL reaction that was spiked with 35S-Methionine and left to incubate at 30° C. for 90 minutes. Expressed peptides were purified with 100 μL of streptavidin agarose loaded onto a column. The column was equilibrated with phosphate buffered saline (PBS) and the entire TnT lysate was loaded onto the column along with an equal volume of 2×PBS. The peptides were left on the column with shaking for 30 minutes at 4° C. to allow the peptides to bind. The column was then washed with PBS and peptides eluted in one of two ways. Peptides fused to the SBP tag were eluted as the full length construct with deionized water, or constructs containing a protease cleavage site between the peptide of interest and the affinity tag were incubated with the corresponding protease in order to elute the peptide of interest without the affinity tag. Elutions were monitored by liquid scintillation counting to identify the presence of peptides due to the incorporation of 35S-Methionine during translation.
In order to determine the binding affinity of expressed peptides, multiple solutions were prepared that contain a constant amount of peptide and varying concentrations of target protein. These solutions were brought to equilibrium by incubating at 4° C. for one hour. Each solution of peptide and target protein was loaded into one well of a dot blot apparatus. The 96-well dot blot apparatus was prepared by building a stack of membranes that contains one piece of filter paper on the bottom, followed by two pieces of nylon membrane and topped with one piece of dialysis membrane. Once samples were loaded, vacuum was applied to the apparatus, pulling the solutions through the stack of membranes. Each membrane was imaged by phosphorimaging to detect signal from 35S-Methionine, indicating which membranes have bound peptide. Free peptide passes through the dialysis membrane and binds to the nylon, while peptides bound to their target remain on the dialysis membrane once the solution is pulled through. The fraction of bound peptide for each concentration of target protein was used to plot a binding isotherm and determine the equilibrium dissociation constant (Kd).
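As a non-limiting illustration of how a dissociation constant might be extracted from such a titration, the sketch below fits a single-site binding isotherm, fraction bound = Fmax·[T]/(Kd + [T]), to the measured fractions. It assumes that the target protein is in large excess over the radiolabeled peptide, so that the free target concentration is approximately the total target concentration. The fitting procedure (a log-spaced scan over Kd with a closed-form least-squares Fmax at each step) and the function names are assumptions chosen to keep the example dependency-free; it is not stated to be the procedure used to obtain the reported values.

```python
import math


def binding_model(target_conc, kd, fmax=1.0):
    """Single-site isotherm: fraction bound at a given target concentration."""
    return fmax * target_conc / (kd + target_conc)


def fit_kd(target_concs, fractions, points=4001):
    """Least-squares estimate of (Kd, Fmax) from a titration.

    Kd is scanned on a log-spaced grid spanning two decades beyond the
    lowest and highest nonzero target concentrations. For each candidate
    Kd, the best Fmax has a closed form because the model is linear in Fmax.
    """
    positive = [c for c in target_concs if c > 0]
    if len(positive) < 2:
        raise ValueError("need at least two nonzero target concentrations")
    lo = math.log10(min(positive)) - 2
    hi = math.log10(max(positive)) + 2
    best = None
    for i in range(points):
        kd = 10 ** (lo + (hi - lo) * i / (points - 1))
        x = [c / (kd + c) for c in target_concs]
        sxx = sum(v * v for v in x)
        fmax = sum(v * f for v, f in zip(x, fractions)) / sxx
        sse = sum((f - fmax * v) ** 2 for v, f in zip(x, fractions))
        if best is None or sse < best[0]:
            best = (sse, kd, fmax)
    return best[1], best[2]
```

The returned Kd is in the same units as the target concentrations supplied. A general nonlinear least-squares routine could replace the grid scan where one is available; the scan is used here only because it is simple and does not depend on a starting guess.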
Data from these studies (not shown) demonstrated that the dissociation constants are consistent with literature values for these peptides (Raffler et al. Chemistry & Biology (2003) 10, 69-79; Wilson et al. Proceedings of the National Academy of Sciences (2001) 98, 3750-3755).
AAATCAATAAATG
TAATTCAGCATATAAACAGAACCAAAGACAAAAACCACATGATTATCTCAATAG
This application claims priority to U.S. Provisional Patent Application Ser. No. 61/657,694 filed Jun. 8, 2012, incorporated by reference herein in its entirety.
This invention was made with government support under DK093449 awarded by the National Institutes of Health. The government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2013/044731 | 6/7/2013 | WO | 00 |
Number | Date | Country |
---|---|---|
61657694 | Jun 2012 | US |