Ribosomal initiation constitutes a critical step in the protein translation process, allowing the ribosome to locate the correct AUG start site in the RNA message and initiate the transfer of genetic information from RNA into proteins via the genetic code. In eukaryotes, recruitment of the 40S ribosomal subunit to the RNA message occurs by recognition of a 7-methylguanosine cap located at the 5′ end of the mRNA strand. Ribosomal recruitment can also occur by a less common cap-independent mechanism, an example of which is the internal ribosomal entry site (IRES). In many cases, the recruitment site is located some distance upstream of the initiation codon, which poses the question of how the ribosome is able to bypass the intervening sequence. While linear scanning is the dominant model used to explain this process, emerging evidence suggests that transient mRNA-rRNA base pairing may play an important role in the initiation of certain mRNAs. This possibility, and the fact that the genome is routinely and pervasively transcribed into RNA, raise many interesting questions about the role of RNA inside cells and the potential for many unknown protein coding regions.
Discovering translation initiation elements (TIEs), also known as translation enhancing elements (TEEs), in human and other higher order genomes is a challenging problem, as computational methods are unable to locate these sequences at the DNA level. This limitation has created a pressing need for new functional tools that can be used to identify and map these sequences in known genomes.
In a first aspect, the present invention provides nucleic acid libraries comprising a plurality of linear recombinant double stranded DNA constructs, wherein each double stranded DNA construct comprises
(a) a promoter;
(b) a heterologous coding region downstream from the promoter, wherein the coding region encodes a detectable polypeptide;
(c) a heterologous cross-linking region downstream of the coding region;
(d) a heterologous polynucleotide sequence of between 20-500 base pairs in length located downstream of the promoter and upstream of the coding region; and
(e) a first PCR primer binding site and a second PCR primer binding site, wherein the first PCR primer binding site is upstream of the polynucleotide sequence and the second PCR primer site is downstream of the polynucleotide sequence;
wherein at least 10^13 different polynucleotide sequences are represented in the plurality of double stranded nucleic acid constructs, and wherein the first PCR primer and the second PCR primer are the same for each construct in the plurality of double stranded nucleic acid constructs.
In a second aspect, the present invention provides mRNA pools, comprising mRNA transcripts resulting from transcription of the nucleic acid libraries of the first aspect of the invention.
In a third aspect, the present invention provides methods for identifying translational enhancing elements (TEEs), comprising
(a) contacting the nucleic acid library of the first aspect of the invention with reagents for RNA transcription under conditions to promote transcription of RNA from the double stranded nucleic acid constructs, resulting in an RNA expression product;
(b) contacting the RNA expression product with reagents for ligating a linker containing a puromycin residue to the 3′ end of the RNA expression product, resulting in a labeled RNA expression product;
(c) contacting the labeled RNA expression product with reagents for protein expression under conditions to promote protein translation from the labeled RNA expression product, resulting in an RNA-polypeptide fusion product;
(d) isolating RNA-polypeptide fusion products;
(e) converting the isolated RNA-polypeptide fusion products to cDNA by reverse transcription-PCR using a primer to the 3′ end of the isolated RNA-polypeptide fusion products;
(f) amplifying the cDNA by PCR using primers to the 5′ and 3′ end of the cDNA; and
(g) repeating steps (a)-(f) a desired number of times, wherein the amplified polynucleotide sequence fragments comprise TEEs.
In a fourth aspect, the present invention provides isolated polynucleotides, comprising a nucleic acid sequence according to any one of SEQ ID NOS: 1-5 and 7-645. These polynucleotides have been identified as TEEs using the methods of the present invention.
In a fifth aspect, the present invention provides expression vectors comprising
(a) a promoter;
(b) a heterologous TEE downstream of the promoter, where the TEE comprises a polynucleotide according to the fourth aspect of the invention; and
(c) a cloning site suitable for cloning of a protein-encoding nucleic acid of interest located upstream of the TEE, and downstream of the promoter.
In a sixth aspect, the present invention provides recombinant host cells comprising the expression vector of the fifth aspect of the invention.
In a seventh aspect, the present invention provides methods for protein expression, comprising contacting an expression vector of the fifth aspect of the invention with reagents and under conditions suitable for promoting expression of a polypeptide cloned into the cloning site.
In a first aspect, the present invention provides nucleic acid libraries comprising a plurality of linear recombinant double stranded DNA constructs, wherein each double stranded DNA construct comprises
(a) a promoter;
(b) a heterologous coding region downstream from the promoter, wherein the coding region encodes a detectable polypeptide;
(c) a heterologous cross-linking region downstream of the coding region;
(d) a heterologous polynucleotide sequence of between 20-500 base pairs in length located downstream of the promoter and upstream of the coding region; and
(e) a first PCR primer binding site and a second PCR primer binding site, wherein the first PCR primer binding site is upstream of the polynucleotide sequence and the second PCR primer site is downstream of the polynucleotide sequence;
wherein at least 10^13 different polynucleotide sequences are represented in the plurality of double stranded nucleic acid constructs, and wherein the first PCR primer and the second PCR primer are the same for each construct in the plurality of double stranded nucleic acid constructs.
The nucleic acid libraries according to the present invention can be used, for example, in the methods of the invention for performing in vitro selection for the isolation of RNA elements (TEEs, including internal ribosome entry sites (IRESs)) that can mediate cap-independent protein translation. The libraries comprise a series of linear constructs, which, when used in in vitro selection methods as described herein, permit use of a library diversity of at least 10^13 different polynucleotide sequences. As described in detail below, the inventors have used the libraries of the present invention to identify a large number of novel TEEs, including a number of IRESs. As used herein, a “library” is a collection of linear double stranded nucleic acid constructs.
As used herein, “heterologous” means that none of the promoter, coding region, genomic fragment, and cross-linking region are normally associated with each other (ie: they are not part of the same gene in vivo), but are recombinantly combined in the construct.
As used herein, a “promoter” is any DNA sequence that can be used to help drive RNA expression of a DNA sequence downstream of the promoter. Suitable promoters include, but are not limited to, the T7 promoter, SP6 promoter, CMV promoter, and vaccinia virus synthetic-late promoter. As will be understood by those of skill in the art, a given double stranded DNA construct may contain more than one promoter, as appropriate for a given proposed use.
As used herein, a “coding region” is any DNA sequence encoding a polypeptide product. As used herein, a “detectable polypeptide” is any polypeptide whose expression can be detected, including but not limited to a fluorescent polypeptide (GFP, BFP, etc.), a member of a binding pair, an affinity tag, etc. The ability to detect the polypeptide greatly facilitates the methods of the invention. Non-limiting examples of such detectable polypeptides include affinity tags, protein DX (Smith et al. (2007) PLoS ONE 2, e467), maltose-binding protein (MBP), streptavidin, glutathione S-transferase (GST), flagellar protein FlaG (FLAG affinity tag), and myelocytomatosis and viral oncogene homologs (Myc affinity tag).
As used herein, a “cross linking region” is any nucleic acid sequence that can be expressed as RNA, where the expressed RNA can serve as a site for ligation/binding to a linker to form a stable complex between mRNA-ribosome-protein. In a preferred embodiment, expressed RNA from the cross-linking region can serve as a site for ligation to a linker containing a 3′-puromycin residue. In a non-limiting embodiment, the expressed RNA from the cross-linking region can serve as a site for photo-ligation of a psoralen-DNA-puromycin linker (5′-psoralen-(oligonucleotide complementary to linker)-(PEG9)2-A15-ACC-puromycin). In a preferred embodiment, the linker is a DNA linker, and the mRNA expressed from the cross linking region is complementary to the DNA linker sequence to be used.
The polynucleotide sequence can be any suitable length, such as between 20-1000 base pairs. In a preferred embodiment, the polynucleotide sequence is between 20-500 base pairs, and may comprise genomic fragments, such as a representation of an entire or partial genome from an organism of interest, or may comprise synthetic sequences. In embodiments where genomic fragments are used, the genomic fragments may be generated by any appropriate means, including restriction enzyme digestion, shearing, polynucleotide synthesis, etc. Genomic fragments from any suitable organism of interest may be used, including but not limited to human, mammal, fish, reptile, plant, yeast, insect, prokaryotic, bacterial (E. coli, etc.), viral, fungal, and pathogenic organism genomic fragments. In another preferred embodiment, such genomic fragments are obtained from a plurality of individual organisms of a single species; in a further embodiment, the plurality of individual organisms of a single species differ in ancestry, age, gender, and/or other characteristics.
The primer binding sites provide regions of known sequence around the polynucleotide sequence of unknown sequence to be tested for TEE activity. Additionally, the primer binding sites provide a way to amplify only the polynucleotide sequence back out of the construct as desired. As will be understood by those of skill in the art, any suitable sequence can be used as a primer binding site so long as it can be used to bind a primer of interest. The primer binding site may be immediately adjacent to the polynucleotide sequence, or there may be additional nucleotides present between the primer binding site and the polynucleotide sequence as deemed appropriate for a given purpose.
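The role of the flanking primer binding sites can be illustrated with a minimal Python sketch. It assembles a toy construct in the element order recited above and then recovers only the insert lying between the two primer binding sites, which is the in silico analogue of amplifying the polynucleotide sequence back out of the construct. All sequences and names below (`PROMOTER`, `PRIMER_SITE_5`, `build_construct`, `recover_insert`, and so on) are hypothetical placeholders chosen for illustration and do not come from the present disclosure.

```python
from typing import Optional

# Toy placeholder sequences (assumptions for illustration only).
PROMOTER = "TTGACA"            # stand-in for a promoter
PRIMER_SITE_5 = "GGATCCAAGC"   # first (upstream) primer binding site
PRIMER_SITE_3 = "CTCGAGTTCA"   # second (downstream) primer binding site
CODING_REGION = "ATGCACCACCAC" # stand-in for the detectable polypeptide
CROSSLINK_REGION = "GCATCCGCT" # stand-in for the cross-linking region


def build_construct(insert: str) -> str:
    """Join the elements in 5'-to-3' order:
    promoter, primer site, insert, primer site, coding region, cross-linking region."""
    return (PROMOTER + PRIMER_SITE_5 + insert + PRIMER_SITE_3
            + CODING_REGION + CROSSLINK_REGION)


def recover_insert(construct: str) -> Optional[str]:
    """Return the sequence between the two primer binding sites, or None if
    either site is missing or the sites are out of order."""
    start = construct.find(PRIMER_SITE_5)
    if start == -1:
        return None
    start += len(PRIMER_SITE_5)
    end = construct.rfind(PRIMER_SITE_3)
    if end < start:  # also covers end == -1 (site not found)
        return None
    return construct[start:end]


if __name__ == "__main__":
    insert = "ACGTACGTACGTACGTACGTACGT"  # 24-nt toy insert
    print(recover_insert(build_construct(insert)))
```

Because every construct shares the same two primer binding sites, the same pair of search strings recovers the insert from any library member, whatever the insert sequence is. The sketch uses exact string matching and ignores the case where an insert itself contains a primer binding site.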
As used herein, “at least 10^13 different polynucleotide sequences are represented in the plurality of double stranded nucleic acid constructs” means that the library, in its entirety, contains at least 10^13 different polynucleotide sequences that can be tested for TEE activity, while each different double stranded nucleic acid construct contains only a single polynucleotide sequence. In various embodiments, at least 10^14 different polynucleotide sequences or at least 10^15 different polynucleotide sequences are represented in the plurality of double stranded nucleic acid constructs.
It will be understood by those of skill in the art that the constructs of the invention may comprise further nucleotide elements as appropriate for a given intended use. In one preferred embodiment, the double stranded nucleic acid constructs further comprise one or more unique restriction sites upstream of the polynucleotide sequence and downstream of the promoter, and one or more unique restriction sites downstream of the polynucleotide sequence. This embodiment provides a further means by which to isolate polynucleotide sequences of interest from the constructs. In a further embodiment, the constructs do not include sequences encoding a 3′ poly(A) tail, or sequences that promote formation of a 5′ cap on the resulting transcript.
In another preferred embodiment, the second (3′) primer binding site is immediately upstream of the coding region in the double stranded nucleic acid construct. In this embodiment, the 3′ primer binding site abuts the coding region, with the polynucleotide sequence located upstream of the coding region.
In a second aspect, the present invention provides an mRNA pool resulting from transcription of the library of any embodiment of the first aspect of the invention. Such mRNA pools can be used, for example, in the methods of the invention below. Any suitable technique for RNA transcription can be used. In one non-limiting embodiment, the double stranded DNA constructs each comprise a T7 RNA polymerase promoter, and the library is transcribed in vitro using T7 RNA polymerase, using standard techniques. It will be clear to those of skill in the art how to optimize transcription conditions in terms of buffers, nucleotides, salt conditions, etc., based on the general knowledge of in vitro transcription techniques in the art. The resulting mRNA pools will comprise single stranded RNA from all/almost all the double stranded DNA constructs in the library. In a further embodiment, the transcripts in the pooled mRNA comprise a DNA linker, containing a 3′ puromycin residue, ligated at the 3′ end of the transcript. In a further aspect, the invention provides pooled mRNA-peptide fusion molecules resulting from in vitro translation of the pooled mRNA. Methods for in vitro translation of RNA transcripts are well known to those of skill in the art. In one non-limiting embodiment, the methods comprise incubating the pooled mRNA with rabbit reticulocyte lysate and 35S-methionine for a suitable time. The method may further comprise incubating the mixture overnight in the presence of suitable amounts of KCl and MgCl2 to promote fusion formation. When the pool of RNA is translated in vitro, transcripts that contain a TEE (such as an IRES) in their 5′ UTR would initiate translation and produce an mRNA-peptide fusion molecule; thus, modifying TEE-containing RNAs with a selectable tag. 
The chemical bond forming step of mRNA display is due to the natural peptidyl transferase activity of the ribosome, which catalyzes the formation of a non-hydrolyzable amide bond between puromycin and the polypeptide chain.
In a third aspect, the present invention provides in vitro methods for identifying translational enhancing elements (TEEs), comprising
(a) contacting the nucleic acid library of any embodiment or combination of embodiments of the first aspect of the invention with reagents for RNA transcription under conditions to promote transcription of RNA from the double stranded nucleic acid constructs, resulting in an RNA expression product;
(b) contacting the RNA expression product with reagents for ligating a linker containing a puromycin residue to the 3′ end of the RNA expression product, resulting in a labeled RNA expression product;
(c) contacting the labeled RNA expression product with reagents for protein expression under conditions to promote protein translation from the labeled RNA expression product, resulting in an RNA-polypeptide fusion product;
(d) isolating RNA-polypeptide fusion products;
(e) converting the isolated RNA-polypeptide fusion products to cDNA by reverse transcription-PCR using a primer to the 3′ end of the isolated RNA-polypeptide fusion products;
(f) amplifying the cDNA by PCR using primers to the 5′ and 3′ end of the cDNA; and
(g) repeating steps (a)-(f) a desired number of times, wherein the amplified polynucleotide sequence fragments comprise TEEs.
The methods of this aspect of the present invention serve to isolate RNA elements that could mediate cap-independent translation (ie: TEEs, including but not limited to IRESs). The mechanism-based approach of mRNA display provides an efficient method to systematically and comprehensively survey nucleic acid sequences for all of the possible RNA elements that could initiate translation of uncapped mRNA transcripts. Since IRESs function by a cap-independent mechanism, this selection serves to identify IRESs as well as TEEs that promote cap-independent translation but do not initiate internally. All terms used in this third aspect have the same meaning as used elsewhere herein; similarly, all embodiments of the nucleic acid libraries and components thereof that are disclosed above, and combinations thereof, can be used in the methods of the invention. Thus, for example, each double stranded DNA construct comprises
(a) a promoter;
(b) a heterologous coding region downstream from the promoter, wherein the coding region encodes a detectable polypeptide;
(c) a heterologous cross-linking region downstream of the coding region;
(d) a heterologous polynucleotide sequence of between 20-1000 base pairs in length located downstream of the promoter and upstream of the coding region; and
(e) a first PCR primer binding site and a second PCR primer binding site, wherein the first PCR primer binding site is upstream of the polynucleotide sequence and the second PCR primer site is downstream of the polynucleotide sequence.
In one non-limiting embodiment, the heterologous polynucleotide sequences are randomly digested fragments (in various non-limiting embodiments, ranging between 20-1000 nts, 20-750 nts, 20-500 nts; or about 150 nts) of total human DNA. Since the heterologous polynucleotide sequence is located downstream of the promoter and upstream of the coding region, it lies in the 5′ untranslated region of the resulting transcript.
In the method, step (f), amplifying the cDNA by PCR using primers to the 5′ and 3′ end of the cDNA, serves to add sequence information that was lost in steps (a) and (e). In one embodiment, primers that add a promoter (such as a T7 promoter) to the 5′ end and the cross-linking region (such as a photo-crosslinking site) to the 3′ end of the DNA library are used after each round of selection. The sequence of these PCR primers may vary depending on how each library is constructed. The result of this PCR is the fully constructed double stranded nucleic acid construct, which can be used to repeat steps (a)-(f) as desired.
Contacting the RNA expression product with reagents for ligating a linker containing a puromycin residue to the 3′ end of the RNA expression product, resulting in a labeled RNA expression product, can be carried out via any suitable method, including photo-crosslinking or Moore-Sharp splint-directed ligation.
Any suitable linker may be used. In a preferred embodiment the linker comprises a DNA linker complementary to the transcribed single stranded RNA. The DNA linker may comprise any suitable modifications, including but not limited to non-natural residues and pegylation, as can be used in mRNA display.
In one preferred embodiment, the polynucleotide sequences in the library comprise genomic fragments; in a further preferred embodiment the starting pool of constructs used in the methods contains at least a 5×-1000× coverage of the genome of interest.
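As a rough guide to what a given fold coverage implies for pool size, the following Python sketch computes the minimum number of fragments needed. It assumes the common base-level definition of coverage (number of fragments times mean fragment length, divided by genome size); the present disclosure does not state how coverage is calculated, so this definition, the function name `fragments_for_coverage`, and the approximate human genome size used below are all assumptions for illustration and need not reproduce the figures reported herein.

```python
import math


def fragments_for_coverage(genome_bp: int, mean_fragment_nt: int,
                           fold_coverage: float) -> int:
    """Minimum fragment count N such that N * mean_fragment_nt / genome_bp
    is at least fold_coverage (base-level coverage; an assumed definition)."""
    if genome_bp <= 0 or mean_fragment_nt <= 0 or fold_coverage < 0:
        raise ValueError("genome size and fragment length must be positive")
    return math.ceil(fold_coverage * genome_bp / mean_fragment_nt)


if __name__ == "__main__":
    HUMAN_GENOME_BP = 3_100_000_000  # approximate haploid size (assumption)
    MEAN_FRAGMENT_NT = 150           # "about 150 nts" embodiment
    for fold in (5, 1000):
        n = fragments_for_coverage(HUMAN_GENOME_BP, MEAN_FRAGMENT_NT, fold)
        print(f"{fold}x coverage needs at least {n:.3e} fragments")
```

Under this definition the required pool size scales linearly with the desired coverage and inversely with fragment length, so shorter fragments require proportionally more library members for the same coverage.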
General conditions for in vitro transcription and translation, PCR, reverse transcription, and mRNA display techniques (including contacting an RNA expression product with reagents for ligating a linker containing a puromycin residue to the 3′ end of the RNA expression product), are well known to those of skill in the art. Exemplary such conditions are described above and in the examples that follow. To favor the selection of RNA elements that enhance ribosomal recruitment via a cap-independent mechanism, the pool of RNA transcripts is preferably devoid of a 5′ cap and 3′ poly(A) tail. As will be apparent to those of skill in the art, this can be accomplished, for example, by not including polyT sequences in the DNA template (to avoid poly(A) tail production) and by not providing capping enzymes required for 5′ cap production.
When the pool of RNA is translated in vitro, transcripts that contain a TEE in their 5′ UTR initiate translation and produce an mRNA-peptide fusion molecule; thus, modifying TEE-containing RNAs with a selectable tag. The chemical bond forming step of mRNA display is due to the natural peptidyl transferase activity of the ribosome, which catalyzes the formation of a non-hydrolyzable amide bond between puromycin and the polypeptide chain.
In one non-limiting embodiment, for each round of selection, the dsDNA library is transcribed with an RNA polymerase suitable for the promoter being used, photo-ligated to a psoralen-DNA-puromycin linker (5′-psoralen-(oligonucleotide complementary to linker)-(PEG9)2-A15-ACC-puromycin), and translated in vitro by incubating the library with rabbit reticulocyte lysate and 35S-methionine under suitable conditions. mRNA-peptide fusion molecules are reverse transcribed, and can be purified by any suitable means, including but not limited to a two-step procedure on oligo (dT)-cellulose beads (NEB) and Ni-NTA agarose affinity resin (Qiagen). Functional TEEs are recovered by any suitable technique, including but not limited to eluting the column with imidazole, dialyzing the sample into water, and amplifying the cDNA by PCR. The selection progress can be monitored using any suitable technique, including but not limited to determining the fraction of 35S-labeled mRNA-peptide fusions that remain on the oligo (dT)/Ni-NTA affinity columns. After a desired number of rounds of selection and amplification, the TEEs can be identified by any suitable means, including but not limited to cloning and sequencing of the amplified DNA constructs.
The selection process (steps (a)-(f)) can be carried out any suitable number of times deemed appropriate to identify TEEs, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more times. In one preferred embodiment, at least three selection cycles are carried out, such that step (g) comprises repeating steps (a)-(f) at least two more times, and even more preferably at least 3, 4, 5, 6, 7, 8, 9, or more times.
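Why several cycles are typically needed can be seen from a simple enrichment model, sketched below in Python. The model assumes that each round recovers active sequences a fixed factor more efficiently than inactive ones, so the odds of a pool member being active are multiplied by that factor every round. The function name `simulate_enrichment`, the starting fraction, and the enrichment factor are hypothetical values chosen for illustration; none of them are measurements from the present disclosure.

```python
from typing import List


def simulate_enrichment(start_fraction: float, enrichment_factor: float,
                        rounds: int) -> List[float]:
    """Return the active fraction of the pool before selection (index 0)
    and after each round, assuming a constant per-round enrichment factor."""
    if not 0.0 < start_fraction < 1.0:
        raise ValueError("start_fraction must be strictly between 0 and 1")
    history = [start_fraction]
    fraction = start_fraction
    for _ in range(rounds):
        retained_active = fraction * enrichment_factor
        retained_inactive = 1.0 - fraction
        fraction = retained_active / (retained_active + retained_inactive)
        history.append(fraction)
    return history


if __name__ == "__main__":
    # Hypothetical numbers: one active sequence per million, 100-fold
    # enrichment per round, six rounds of selection.
    for round_number, fraction in enumerate(simulate_enrichment(1e-6, 100.0, 6)):
        print(f"round {round_number}: active fraction = {fraction:.6f}")
```

In this toy model the active fraction stays small for the first rounds, rises sharply once the odds approach one, and then levels off near unity, which is the general shape expected when a rare functional subpopulation comes to dominate a pool.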
In one embodiment, the method further comprises testing polynucleotide sequences identified as TEEs for TEE activity in vivo using, for example, the vaccinia system described herein. Any suitable system may be used. In one non-limiting embodiment, a plasmid-based reporter assay that allows coupled transcription and translation to occur in the cytoplasm of human cells was developed.
In a further embodiment, TEE candidate sequences are tested for the ability to initiate internal translation initiation. Any suitable assay for testing internal translation initiation can be used, including but not limited to those disclosed herein. In one non-limiting embodiment, TEE candidate sequences are inserted into a firefly reporter plasmid (F-luc-hp) containing a stable stem-loop structure (ΔG=−58 kcal/mol) to prevent ribosomal scanning.
In a fourth aspect, the present invention provides isolated polynucleotides, comprising a nucleic acid sequence according to any one of SEQ ID NOS: 1-5 and 7-645. In another embodiment, the isolated polynucleotides comprise or consist of a sequence according to one or more of SEQ ID NO: 7-645, listed in Table 1. The isolated polynucleotides listed in the recited tables were all identified as TEEs by the methods of the invention; all are human genomic sequences, and thus can be used, for example, in designing expression vectors for improved translational efficiency of one or more proteins encoded by the vector. In various preferred embodiments, the isolated polynucleotides are between 13-180, 13-170, 13-160, 13-150, 13-140, 13-130, 13-120, 13-110, 13-100, 13-90, 13-80, 13-70, 13-60, 13-50, 13-40, 13-30, or 13-20 nucleotides in length. In a preferred embodiment, the isolated polynucleotides consist of the recited sequence. In a further embodiment, the isolated polynucleotides comprise the sequence of SEQ ID NO:4 (A/-)(A/G)ATC(A/G)(A/G)TAAA(T/C)G, wherein the isolated polynucleotides are between 13-200 nucleotides in length. SEQ ID NO:4 is a consensus sequence found within a number of the TEEs (Clones 985 (SEQ ID NO:448), 1092 (SEQ ID NO:495), 1347 (SEQ ID NO:623), 906 (SEQ ID NO:408), 12 (SEQ ID NO:12), 1200 (SEQ ID NO:553), 958 (SEQ ID NO:434), 1011 (SEQ ID NO:458), 459 (SEQ ID NO:214) in Table 1) identified using the methods of the invention. In a preferred embodiment, the isolated polynucleotides comprise the sequence of SEQ ID NO:5 (5′-AAATCAATAAATG-3′), which is a conserved sequence found in the top-performing TEEs as described in the examples that follow. In various preferred embodiments, the isolated polynucleotides are between 13-180, 13-170, 13-160, 13-150, 13-140, 13-130, 13-120, 13-110, 13-100, 13-90, 13-80, 13-70, 13-60, 13-50, 13-40, 13-30, or 13-20 nucleotides in length.
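The degenerate notation used for SEQ ID NO:4 can be read mechanically, as the following Python sketch illustrates. It converts each parenthesized group such as (A/G) into a character class, treats a dash alternative such as (A/-) as an optional position, and then checks that the SEQ ID NO:5 sequence recited above is one instance of the SEQ ID NO:4 consensus. The function name `consensus_to_regex` and the reading of "-" as "base may be absent" are assumptions made for illustration.

```python
import re


def consensus_to_regex(consensus: str) -> str:
    """Translate '(A/G)'-style degenerate notation into a regular expression.
    A dash alternative, e.g. '(A/-)', is read as an optional position."""
    def convert(match: "re.Match[str]") -> str:
        options = match.group(1).split("/")
        optional = any(option.strip("-") == "" for option in options)
        bases = "".join(option for option in options if option.strip("-"))
        pattern = bases if len(bases) == 1 else "[" + bases + "]"
        return pattern + ("?" if optional else "")
    return re.sub(r"\(([^)]*)\)", convert, consensus)


SEQ_ID_NO_4 = "(A/-)(A/G)ATC(A/G)(A/G)TAAA(T/C)G"  # consensus as recited above
SEQ_ID_NO_5 = "AAATCAATAAATG"                      # conserved 13-mer as recited above

PATTERN = re.compile(consensus_to_regex(SEQ_ID_NO_4))

if __name__ == "__main__":
    print(PATTERN.pattern)
    print(bool(PATTERN.fullmatch(SEQ_ID_NO_5)))
```

A pattern built this way can also be applied with `PATTERN.search` to locate the motif inside a longer candidate sequence. The sketch handles only single-base alternatives and ignores reverse complements.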
In one embodiment, the polynucleotide is selected from the group consisting of SEQ ID NO:583 (clone 1267), SEQ ID NO:397 (clone 877), SEQ ID NO:54 (clone 100), SEQ ID NO:401 (clone 884), SEQ ID NO:471 (clone 1033), SEQ ID NO:327 (clone 733), SEQ ID NO:398 (clone 878), SEQ ID NO:301 (clone 675), and SEQ ID NO:310 (clone 694). These sequences have been identified as IRESs using the methods disclosed herein. In a further embodiment, the present invention provides isolated polynucleotides comprising a nucleic acid sequence according to SEQ ID NO:1. This sequence represents a consensus sequence of a subset of 733 (SEQ ID NO:327), 877 (SEQ ID NO:397), 1033 (SEQ ID NO:471), and 1267 (SEQ ID NO:583), and thus is strongly correlated with activity. In further embodiments, the isolated polynucleotides comprise a nucleic acid sequence according to SEQ ID NO:2 or SEQ ID NO:3, which are longer portions of the consensus sequence between 733 (SEQ ID NO:327), 877 (SEQ ID NO:397), 1033 (SEQ ID NO:471), and 1267 (SEQ ID NO:583).
[Degenerate consensus sequence notation for the sequences recited above is not recoverable from the source text.]
In a fifth aspect, the present invention provides expression constructs comprising:
(a) a promoter;
(b) a heterologous translational enhancing element (TEE) downstream of the promoter, where the TEE comprises or consists of a sequence according to any one of SEQ ID NO:1-5 and 7-645; and
(c) a polylinker suitable for cloning of an open reading frame of interest located upstream or downstream of the TEE, and downstream of the promoter.
In this aspect, the invention provides constructs comprising the TEEs of the invention that are positioned relative to the polylinker (ie: one or more unique restriction sites to facilitate cloning) to increase translational efficiency of any polynucleotide coding region cloned into the polylinker. In a preferred embodiment, the TEE is between 13-500 nucleotides in length; in a more preferred embodiment, between 13 and 200 nucleotides in length. In a preferred embodiment, the polylinker is located downstream of the TEE. Any suitable coding region for which an increase in translational efficiency is desired can be cloned into the vector. Thus, in a further embodiment, the construct comprises a polynucleotide coding region cloned into the polylinker. In a further preferred embodiment, the TEE comprises or consists of the sequence of any one or more of SEQ ID NOS:1-5, 448, 495, 623, 408, 12, 553, 434, 458, 214, 327, 397, 471, and 583. In a further preferred embodiment, the TEE comprises or consists of the sequence of any one or more of SEQ ID NO:583 (clone 1267), SEQ ID NO:397 (clone 877), SEQ ID NO:54 (clone 100), SEQ ID NO:401 (clone 884), SEQ ID NO:471 (clone 1033), SEQ ID NO:327 (clone 733), SEQ ID NO:398 (clone 878), SEQ ID NO:301 (clone 675), and SEQ ID NO:310 (clone 694). These sequences have been identified as IRESs using the methods disclosed herein. Suitable promoters include, but are not limited to, the T7 promoter, SP6 promoter, CMV promoter, and vaccinia virus synthetic-late promoter. The constructs in this aspect of the invention may be linear constructs, or may be part of an expression vector, such as a plasmid or viral-based expression vector as are known in the art. As will be apparent to those of skill in the art, the constructs may contain any other components as desired by a user, such as origins of replication, selection markers, etc.
In a sixth aspect, the present invention provides recombinant host cells comprising an expression vector of any embodiment or combination of embodiments of the fifth aspect of the invention. Such host cells can be used, for example, to prepare large amounts of the expression vector and to provide for expression of the encoded proteins in the host cells. Any suitable host cell may be used, including but not limited to bacterial and eukaryotic host cells, including but not limited to mammalian and human cells.
In a seventh aspect, the present invention provides methods for protein expression, comprising contacting an expression construct according to any embodiment or combination of embodiments of the fifth aspect of the invention, wherein the construct comprises a polynucleotide coding region cloned into the polylinker, with reagents and under conditions suitable for promoting expression of the polypeptide encoded by the polynucleotide coding region. It is within the level of skill in the art to choose appropriate reagents and conditions for RNA expression from the expression construct, followed by translation of the encoded polypeptide. Exemplary reagents and conditions are described in the examples that follow. The methods of this aspect of the invention may be carried out in vitro or in vivo.
Unless clearly dictated otherwise by the context, all embodiments of any aspect of the invention may be combined with other embodiments of the same and different aspects.
Internal ribosomal entry sites (IRESs) are RNA elements located in the untranslated region of mRNA transcripts that initiate protein synthesis independent of the canonical 5′ cap. To date, only a handful of IRESs have been identified in higher order genomes. Here, we have applied a mechanism-based approach to search the entire human genome for RNA sequences with IRES activity. Starting from a library of >10^13 human RNA fragments, we performed iterative cycles of mRNA display to capture leader sequences that mediate cap-independent translation. The selected sequences are distributed throughout the genome, and often occur in repetitive regions with high conservation to mammals. We observed strong cis-regulatory activity for more than 200 sequences tested in a monocistronic translation-enhancing assay. The most active sequences function as potent IRESs in vitro and in human cells. These results demonstrate the power of mRNA display as a genome-wide tool for identifying functional IRESs.
Initiation is a critical step in protein translation, allowing the ribosome to locate the translation start site in the RNA message and initiate the transfer of genetic information from RNA into protein via the genetic code. In eukaryotes, the 43S ribosomal pre-initiation complex (PIC) is recruited to the RNA message by recognition of the eIF4F cap-binding complex bound to a 7-methylguanosine cap located at the 5′ end of the mRNA strand (1, 2). A subset of leader sequences known as internal ribosomal entry sites (IRESs) can bypass the 5′ cap structure by recruiting the ribosome to internal positions in the 5′ untranslated region (5′ UTR) (3-7). IRESs play an important role in gene regulation by allowing essential proteins to be synthesized when normal cap-dependent translation is compromised (8). This can occur during regular cellular processes like mitosis and apoptosis (9, 10), as well as during hypoxia (11), viral infection (12), or during states of cellular dysregulation (13).
Ribosomal profiling, a technique that combines polysome fractioning with DNA microarrays, has been employed to profile cellular translation under conditions that impede normal cap-dependent translation (14). Data from these studies suggest that the human genome likely contains many more IRESs than previously thought; however, only a few human IRESs have been characterized in detail. These studies further suggest that cellular systems may possess mechanisms to support the coordinated regulation of specific IRES subtypes, as different physiological conditions gave rise to different IRES subsets. Despite a wealth of useful information gained by ribosomal profiling, this approach suffers from limited resolution and sequence accuracy, as well as an inability to distinguish stalled ribosomes from actively translating ribosomes. While continued technological advancement could circumvent some of these problems, thorough investigation of the human genome would require exhaustive sampling of countless conditions and cell types. This limitation has created a need for new molecular tools that can be used to identify human IRESs on a genome-wide scale (15).
To identify IRESs encoded in the human genome, we devised an in vitro selection strategy for the isolation of RNA elements that could mediate cap-independent translation. We reasoned that the mechanism-based approach of mRNA display provided an efficient method to systematically and comprehensively survey the entire human genome for all of the possible RNA elements that could initiate translation of uncapped mRNA transcripts (16). Since IRESs function by a cap-independent mechanism, it was hypothesized that this selection would lead to the discovery of human IRESs as well as human translation enhancing elements that promote cap-independent translation but do not initiate internally. In this scheme (
We started the selection with an RNA-DNA-puromycin library that contained >10^13 sequences, which provided 100-1000-fold coverage of the human genome. We translated the library for 1 hour at 30° C. in nuclease-treated reticulocyte lysate, and fusion formation was promoted by incubating the mixture overnight at −20° C. in the presence of 600 mM KCl and 75 mM MgCl2. mRNA-peptide fusions were isolated from the crude lysate by affinity purification on an oligo-(dT) resin, and the elution fractions were applied to Ni-NTA agarose beads. The Ni-NTA beads were thoroughly washed to remove RNA sequences that did not form mRNA-peptide fusions or did not initiate in the correct reading frame. mRNA-peptide fusions that remained bound to the column were selectively eluted with imidazole, exchanged into buffer, reverse-transcribed, and amplified by PCR to reinitiate the selection cycle described above. We monitored the selection progress by following the proportion of 35S-labeled mRNA-peptide fusions that remained in the pool after purification. The abundance of mRNA-peptide fusions increased up to round 5 and plateaued in round 6, indicating that the library became dominated by RNA elements that could enhance cap-independent translation (
We cloned and sequenced 712 members from round 6. Of these, 639 were non-redundant, indicating that the library contained significant sequence diversity even after six rounds of mRNA display (Table S1). Each non-redundant sequence was aligned to the human reference genome (hg18) using the UCSC BLAT web-tool (18). A subset of 229 sequences showed 100% identity to 1814 genomic locations. These sites are distributed across all 24 human chromosomes with ~34% occurring in the intronic regions of known genes (
We examined their evolutionary conservation using the 44-species UCSC alignments (
Because many of the perfectly matched sequences mapped to multiple genomic locations, we compared the distribution of repetitive elements found in the starting library to that of all round 6 sequences. The distribution of repetitive elements in the starting library is similar to the distribution obtained by random computational sampling (
We chose the set of 229 perfectly matched sequences and a set of 15 high homology sequences for functional characterization in human cells. Testing large numbers of sequences in cells presents a challenging problem as traditional assays are often complicated by splicing events that can occur during nuclear transcription and export (24). To test sequences under conditions that are not subject to nuclear processing, we developed a plasmid-based reporter assay that allows coupled transcription and translation to occur in the cytoplasm of human cells (
To test whether the selected sequences were capable of internal translation initiation, we inserted the top 9 sequences from the monocistronic assay into a firefly reporter plasmid (F-luc-hp) containing a stable stem-loop structure (ΔG=−58 kcal/mol) to prevent ribosomal scanning (
Many well-characterized IRESs contain AUG triplets in their 5′ UTR that are expected to impede ribosomal scanning (2). Likewise, the human in vitro selected sequences identified in round 6 also have an abundance of AUG triplets. How is it then that a given AUG codon is selected as a start site when multiple options are present? One might expect a priori that AUGs in good sequence context would lead to more efficient translation initiation; however, only 1 out of 657 AUG codons observed in the 229 sequences contains a Kozak motif (Fig. S2) (27). To investigate this question, we selected ten sequences with a range of monocistronic activity and AUG triplet patterns, and examined their relative translation initiation efficiency and start site usage in vitro. This analysis revealed a number of striking observations (
To determine what role, if any, upstream AUG codons could play in ribosomal recruitment, we removed the in-frame, out-of-frame, and all AUG triplets from a high activity sequence (HGL6.877). Mutation of the AUG triplets had a strong negative impact on the translation initiation efficiency of all HGL6.877 variants (
Our results represent the first example of mRNA display as a genome-wide tool for identifying cap-independent translation initiation elements in the human genome. The in vitro selected IRESs characterized here represent novel regulatory elements that were previously hidden in the human genome. The general scheme used to identify these sequences is readily adaptable to other organisms and translation initiation mechanisms, and the versatility of the in vitro protocol makes it possible to explore ribosomal translation under a variety of conditions. We suggest that further discovery of additional cis-regulatory elements will advance our understanding of genome structure and function, and the biological role that IRESs play in the human genome.
Library Assembly and mRNA Display Selection
The human DNA library was provided by the Szostak laboratory (18). This library was modified by PCR to add the genetic information necessary for performing mRNA display (31). For each round of selection, the dsDNA library was transcribed with T7 RNA polymerase, photo-ligated to a psoralen-DNA-puromycin linker (5′-psoralen-TAGCCGGTG-(PEG9)2-A15-ACC-puromycin) (SEQ ID NO:6), and translated in vitro by incubating the library (1 nmol) with rabbit reticulocyte lysate and 35S-methionine for 1 hour at 30° C. Fusion formation was promoted by incubating the mixture overnight at −20° C. in the presence of KCl (600 mM) and MgCl2 (75 mM). The mRNA-peptide fusion molecules were reverse transcribed and purified by a two-step procedure on oligo (dT)-cellulose beads (NEB) and Ni-NTA agarose affinity resin (Qiagen). Functional TEEs were recovered by eluting the column with imidazole, dialyzing the sample into water, and amplifying the cDNA by PCR. The selection progress was monitored by determining the fraction of 35S-labeled mRNA-peptide fusions that remained on the oligo (dT) and Ni-NTA affinity columns. After 6 rounds of selection and amplification, the dsDNA library was cloned and sequenced.
A monocistronic luciferase reporter vector (pT7_v_<TEE>_FLuc) that contains both a T7 and a vaccinia virus synthetic late promoter was constructed by modifying a pT3-R-luc<IRES>F-luc(pA)62 luciferase reporter plasmid provided by the Doudna laboratory (Gilbert et al., 2007) (32). HeLa and HEK-293 cells were seeded at a density of 15,000 cells per well in white 96-well plates 18 hours prior to transfection. Cells were transfected with a complex of the reporter plasmid (200 ng) and Lipofectamine 2000 (0.5 μl) in Opti-MEM (Invitrogen), and immediately infected with the Copenhagen strain (VC-2) of WT vaccinia virus at a multiplicity of infection (m.o.i.) of 5 PFU/cell (
A portion of the cells used in the transfect-infect study was separately lysed to evaluate the quality of the cellular RNA. Isolated RNA was reverse transcribed with Superscript II (Invitrogen), and real-time PCR was used to determine the mRNA levels of luciferase relative to the housekeeping gene hypoxanthine-guanine phosphoribosyltransferase (HPRT). Using the ΔΔCt method, the amount of luciferase mRNA was normalized to HPRT mRNA levels. In addition, the length of luciferase mRNA was determined using PCR to analyze the relative proportion of the 5′- and 3′-ends of representative cDNA molecules.
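The ΔΔCt normalization mentioned above can be illustrated with a minimal sketch. It uses the common 2^(−ΔΔCt) form, which assumes that both amplicons double each cycle (100% amplification efficiency); the text does not state which efficiency model or calibrator sample was used, so the function name, argument names, and the choice of calibrator are assumptions made for illustration only.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Fold change of a target mRNA (e.g. luciferase) relative to a calibrator,
    normalized to a reference gene (e.g. HPRT), by the 2^(-ddCt) method.

    Assumes 100% PCR efficiency for both amplicons (an assumption, not a
    detail taken from the text).
    """
    # dCt: target Ct minus reference Ct within the same RNA sample
    d_ct_sample = ct_target_sample - ct_ref_sample
    d_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    # ddCt: difference between the test sample and the calibrator
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2.0 ** (-dd_ct)


if __name__ == "__main__":
    # Hypothetical Ct values, chosen only to show the arithmetic.
    fold = relative_expression(20.0, 18.0, 22.0, 18.0)
    print(f"luciferase mRNA relative to calibrator: {fold:.1f}-fold")
```

Normalizing to a housekeeping gene in this way separates differences in reporter mRNA abundance from differences in translation efficiency, which is the reason for running the control alongside the luciferase assay.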
The 13-nucleotide core motif was assayed for activity by constructing five luciferase reporter constructs in which the 13-mer motif was either added to the 5′ end of a low activity TEE (clones 499, 646 and 347) or deleted from the 5′ end of a high activity TEE (clones 1092 and 1347). HGL sequences 1092 and 1347 were regenerated with the 13-nucleotide deletion by Klenow DNA polymerase extension followed by a restriction enzyme digest with BamHI and NcoI. The digested fragments were then ligated into the luciferase reporter plasmid pT7_v_<TEE>_FLuc. The insertion constructs were generated by overlap PCR, and then digested and ligated into the reporter plasmid. Translation enhancement of the modified sequences was assessed using the transfect/infect assay in HeLa cells. Sequences 1347 and 499 were additionally characterized in BSC40, RK13, BHK and 129SV cells.
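The design of the insertion and deletion variants can be sketched in silico as follows. This is an illustration of the sequence logic only, not of the Klenow extension or overlap PCR steps. The motif string used here is a placeholder and should be replaced with the actual 13-nucleotide core motif; the helper names and the example sequence are hypothetical.

```python
# Placeholder 13-mer; an assumption for illustration, not confirmed by the text
# to be the core motif under test.
MOTIF = "AAATCAATAAATG"


def add_motif_5prime(tee_seq, motif=MOTIF):
    """Return the TEE sequence with the motif prepended to its 5' end."""
    return motif + tee_seq


def delete_motif_5prime(tee_seq, motif=MOTIF):
    """Return the TEE sequence with the motif removed from its 5' end.

    Raises ValueError if the sequence does not begin with the motif, so that
    a deletion construct is never designed against the wrong clone.
    """
    if not tee_seq.startswith(motif):
        raise ValueError("sequence does not start with the motif")
    return tee_seq[len(motif):]


if __name__ == "__main__":
    low_activity = "GGCCTTAAGGCC"  # hypothetical low-activity TEE fragment
    gain = add_motif_5prime(low_activity)
    print(gain)
    print(delete_motif_5prime(gain) == low_activity)
```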
Bioinformatic analysis was performed on 143 sequences from the naïve library and 709 sequences isolated after six rounds of in vitro selection. The genomic locations of all non-redundant sequences were determined using the BLAT webtool to map each sequence to the human reference genome (hg18) (21). This analysis revealed that 75 sequences from the naïve pool and 227 sequences from the round 6 pool matched with perfect sequence identity to the human reference genome. The program RepeatMasker was used to classify the selected sequences into specific repeat families (33). By randomly selecting 10,000 genomic locations, we generated the null expectation for the fraction of sequence motifs of length 200 nucleotides that overlap a repeat family. This fraction was 45.7% and was not significantly different from that observed for Round-0 sequences. However, the null hypothesis for TEEs is rejected at P < 10^-6, indicating that TEEs are significantly enriched in their involvement with repeat families.
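The random-sampling null model and the enrichment test described above can be sketched as follows. The text does not name the statistical test, so the one-sided exact binomial tail used here is an assumption, as are the function names, the single-sequence toy "genome", and the representation of repeat annotations as sorted, non-overlapping half-open intervals.

```python
import bisect
import math
import random


def overlaps_repeat(start, length, repeat_starts, repeat_ends):
    """True if the window [start, start + length) overlaps any repeat.

    Repeats are sorted, non-overlapping half-open intervals [s, e), given as
    two parallel lists of starts and ends.
    """
    end = start + length
    # Index of the first repeat whose end lies beyond the window start.
    i = bisect.bisect_right(repeat_ends, start)
    return i < len(repeat_starts) and repeat_starts[i] < end


def null_overlap_fraction(genome_length, repeats, n_samples=10_000,
                          window=200, seed=0):
    """Fraction of randomly placed windows that overlap a repeat."""
    rng = random.Random(seed)
    starts = [s for s, _ in repeats]
    ends = [e for _, e in repeats]
    hits = 0
    for _ in range(n_samples):
        pos = rng.randrange(genome_length - window + 1)
        hits += overlaps_repeat(pos, window, starts, ends)
    return hits / n_samples


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))


if __name__ == "__main__":
    # Toy annotation: hypothetical repeat intervals on a 100 kb sequence.
    toy_repeats = [(1_000, 5_000), (20_000, 42_000), (60_000, 75_000)]
    p_null = null_overlap_fraction(100_000, toy_repeats)
    # Hypothetical observation: 180 of 227 selected sequences overlap a repeat.
    print(p_null, binomial_upper_tail(180, 227, p_null))
```

In a real analysis the windows would be drawn across all chromosomes in proportion to their length, and the repeat intervals would come from the RepeatMasker annotation of the same genome build.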
AAATCAATAAATG
TAATTCAGCATATAAACAGAACCAAAGACAAAAACCACAT
This application claims priority to U.S. Provisional Patent Application Ser. No. 61/365,133 filed Jul. 16, 2010, incorporated by reference herein in its entirety.
This work was funded in part by NIH Eureka Award GM085530. The U.S. government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/US11/44198 | 7/15/2011 | WO | 00 | 3/29/2013 |
Number | Date | Country | |
---|---|---|---|
61365133 | Jul 2010 | US |