The present application is being filed along with a Sequence Listing in electronic format. The Sequence Listing is provided as a file entitled UCSD089.001C1 Substitute .TXT, created Jun. 8, 2017, which is 11 Kb in size. The information in the electronic format of the Sequence Listing is incorporated herein by reference in its entirety.
Methods and compositions for identifying RNAs which interact with one another in a cell are provided.
Currently, there are no efficient methods that can directly assay substantially all RNA-RNA interactions in a cell type at once. Two kinds of methods exist that partially achieve this goal, both with weaknesses. Technologies like HITS-CLIP and CLASH can detect targets of many miRNAs. However, both methods concentrate on miRNAs, which comprise only a small portion of RNAs. Thus, these technologies are not able to reveal the majority of RNA-RNA interactions. Furthermore, each technology has additional weaknesses. For example, direct pairing of a miRNA to its target mRNAs cannot be directly deduced from HITS-CLIP. In other words, HITS-CLIP does not directly inform which miRNA regulates which mRNAs (no one-to-one information).
A recent method called CLASH (cross-linking, ligation, and sequencing of hybrids) could allow direct observation of miRNA-target pairs. However, the number of interactions is still small as compared to the number of sequencing reads: only 2% of sequenced reads are chimeric, while 98% are still single reads. This requires much deeper sequencing coverage or preparation of multiple samples to obtain enough coverage of miRNA-mRNA interactions.
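The effect of a 2% chimeric fraction on sequencing depth can be illustrated with a short calculation. The following is a minimal sketch only; the function name `reads_needed` and the target of 1,000,000 chimeric reads are hypothetical values chosen for illustration and are not taken from the present disclosure.

```python
import math
from fractions import Fraction

def reads_needed(target_chimeric_reads: int, chimeric_fraction: Fraction) -> int:
    """Total reads to sequence so that the expected number of chimeric
    reads reaches the target, given the fraction of reads that are chimeric."""
    if not 0 < chimeric_fraction <= 1:
        raise ValueError("chimeric_fraction must be in (0, 1]")
    # Exact rational arithmetic avoids floating-point rounding surprises.
    return math.ceil(Fraction(target_chimeric_reads) / chimeric_fraction)

# Hypothetical target: 1,000,000 chimeric reads at the 2% fraction noted above.
total_reads = reads_needed(1_000_000, Fraction(2, 100))
print(total_reads)
```

Under these assumptions, fifty reads must be sequenced for every chimeric read obtained, which is why a low chimeric fraction translates directly into deeper sequencing or additional samples.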
Some embodiments of the present invention are provided in the following numbered paragraphs:
1. A method for generating chimeric RNAs comprising RNAs which interact with one another in a cell comprising cross-linking RNA to protein and ligating RNAs cross-linked to the same protein molecule together to form a chimeric RNA.
2. The method of Paragraph 1, wherein said cross-linking of RNA to protein is performed on an intact cell or in a cell lysate.
3. The method of any one of Paragraphs 1 or 2 wherein said cross-linking comprises UV cross-linking.
4. The method of any one of Paragraphs 1-3, further comprising associating said protein with an agent which facilitates immobilization of said protein on a surface.
5. The method of Paragraph 4, wherein said agent which facilitates immobilization comprises biotin.
6. The method of any one of Paragraphs 1-5, further comprising fragmenting said RNAs cross-linked to the same protein molecule.
7. The method of Paragraph 6, wherein said fragmenting comprises contacting said RNAs cross-linked to the same protein molecule with an RNAse under conditions which facilitate partial digestion of said RNAs.
8. The method of any one of Paragraphs 1-7, further comprising linking said RNAs cross-linked to the same protein molecule to an agent which facilitates recovery of said RNAs.
9. The method of Paragraph 8, wherein said linking comprises ligating the ends of said RNAs to said agent.
10. The method of Paragraph 9, wherein said agent which facilitates recovery of said RNAs comprises a nucleic acid.
11. The method of Paragraph 10, wherein said nucleic acid comprises a nucleic acid having biotin thereon.
12. The method of Paragraph 11, wherein said linking of said nucleic acid having biotin thereon to said ends of said RNAs comprises ligating said nucleic acid having biotin thereon to the 5′ ends of said RNAs prior to ligating said RNAs cross-linked to the same protein molecule together to form a chimeric RNA.
13. The method of Paragraph 12, further comprising removing said biotin from the 5′ region of said chimeric RNA.
14. The method of any one of Paragraphs 1-13, further comprising recovering said chimeric RNAs.
15. The method of any one of Paragraphs 1-14, further comprising fragmenting said chimeric RNAs.
16. The method of any one of Paragraphs 1-15, wherein said fragmenting of said chimeric RNAs comprises contacting said chimeric RNAs with an RNAse under conditions which facilitate partial digestion of said RNAs.
17. The method of any one of Paragraphs 1-16, further comprising reverse transcribing said chimeric RNAs to generate a chimeric cDNA.
18. The method of any one of Paragraphs 1-17, further comprising determining at least a portion of the sequences in said chimeric RNAs or chimeric cDNAs which originate from each of the RNAs in said chimeric RNAs or chimeric cDNAs.
19. The method of any one of Paragraphs 1-17, further comprising identifying the RNAs present in said chimeric RNAs, thereby identifying RNAs which interact with one another in a cell.
20. The method of Paragraph 19, wherein at least 100, at least 500, at least 1000 or more than 1000 RNA-RNA interactions in the cell are identified.
21. The method of Paragraph 19, wherein substantially all of the RNAs which interact with one another in a cell are identified.
22. The method of Paragraph 21, wherein at least 70%, at least 80%, at least 90% or more than 90% of the direct RNA-RNA interactions in the cell are identified.
23. The method of any one of Paragraphs 19-22, wherein the identification of the RNAs which interact with one another in a cell comprises performing sequence reads on said chimeric RNAs using an automated sequencing device.
24. The method of Paragraph 23, wherein said identification of the RNAs which interact with one another in a cell comprises identifying the chimeric sequences from all the sequence reads.
25. The method of any one of Paragraphs 19-24, further comprising transforming the chimeric RNAs into annotated RNA clusters using a computer.
26. The method of Paragraph 25, further comprising identifying direct interactions among said RNA clusters using a statistical test performed by a computer.
27. An isolated complex comprising a chimeric RNA cross-linked to a protein, wherein said chimeric RNA comprises RNAs which interact with one another in a cell.
28. A method for identifying a candidate therapeutic agent comprising:
29. The method of Paragraph 28, wherein said agent comprises a nucleic acid.
30. The method of Paragraph 28, wherein said agent comprises a chemical compound.
31. A method of making a pharmaceutical comprising formulating an agent identified using the method of any one of Paragraphs 28-30 in a pharmaceutically acceptable carrier.
32. A pharmaceutical made using the method of Paragraph 31.
33. A method for generating chimeric RNAs comprising RNAs which interact with one another in a cell comprising cross-linking RNA to protein intermediates and/or a protein complex and ligating RNAs cross-linked to protein intermediates and/or the protein complex together to form a chimeric RNA, and wherein the protein complex comprises two or more interacting proteins.
34. The method of Paragraph 33, wherein said cross-linking of RNA to the protein intermediates and/or the protein complex is performed on an intact cell or in a cell lysate.
35. The method of any one of Paragraphs 33 or 34 wherein said cross-linking comprises UV cross-linking.
36. The method of any one of Paragraphs 33-35, further comprising associating said protein intermediates and/or the protein complex with an agent which facilitates immobilization of said protein intermediates and/or the protein complex on a surface.
37. The method of Paragraph 36, wherein said agent which facilitates immobilization comprises biotin.
38. The method of any one of Paragraphs 33-37, further comprising fragmenting said RNAs cross-linked to the at least one protein molecule.
39. The method of Paragraph 38, wherein said fragmenting comprises contacting said RNAs cross-linked to the protein intermediates and/or the protein complex with an RNAse under conditions which facilitate partial digestion of said RNAs.
40. The method of any one of Paragraphs 33-39, further comprising linking said RNAs cross-linked to the protein intermediates and/or the protein complex to an agent which facilitates recovery of said RNAs.
41. The method of Paragraph 40, wherein said linking comprises ligating the ends of said RNAs to said agent.
42. The method of Paragraph 41, wherein said agent which facilitates recovery of said RNAs comprises a nucleic acid.
43. The method of Paragraph 42, wherein said nucleic acid comprises a nucleic acid having biotin thereon.
44. The method of Paragraph 43, wherein said linking of said nucleic acid having biotin thereon to said ends of said RNAs comprises ligating said nucleic acid having biotin thereon to the 5′ ends of said RNAs prior to ligating said RNAs cross-linked to the protein intermediates and/or the protein complex together to form a chimeric RNA.
45. The method of Paragraph 44, further comprising removing said biotin from the 5′ region of said chimeric RNA.
46. The method of any one of Paragraphs 33-45, further comprising recovering said chimeric RNAs.
47. The method of any one of Paragraphs 33-46, further comprising fragmenting said chimeric RNAs.
48. The method of any one of Paragraphs 33-47, wherein said fragmenting of said chimeric RNAs comprises contacting said chimeric RNAs with an RNAse under conditions which facilitate partial digestion of said RNAs.
49. The method of any one of Paragraphs 33-48, further comprising reverse transcribing said chimeric RNAs to generate a chimeric cDNA.
50. The method of any one of Paragraphs 33-49, further comprising determining at least a portion of the sequences in said chimeric RNAs or chimeric cDNAs which originate from each of the RNAs in said chimeric RNAs or chimeric cDNAs.
51. The method of any one of Paragraphs 33-49, further comprising identifying the RNAs present in said chimeric RNAs, thereby identifying RNAs which interact with one another in a cell.
52. The method of Paragraph 51, wherein at least 100, at least 500, at least 1000 or more than 1000 RNA-RNA interactions in the cell are identified.
53. The method of Paragraph 51, wherein substantially all of the RNAs which interact with one another in a cell are identified.
54. The method of Paragraph 53, wherein at least 70%, at least 80%, at least 90% or more than 90% of the direct RNA-RNA interactions in the cell are identified.
55. The method of any one of Paragraphs 51-54, wherein the identification of the RNAs which interact with one another in a cell comprises performing sequence reads on said chimeric RNAs using an automated sequencing device.
56. The method of Paragraph 55, wherein said identification of the RNAs which interact with one another in a cell comprises identifying the chimeric sequences from all the sequence reads.
57. The method of any one of Paragraphs 51-56, further comprising transforming the chimeric RNAs into annotated RNA clusters using a computer.
58. The method of Paragraph 57, further comprising identifying direct interactions among said RNA clusters using a statistical test performed by a computer.
59. The method of any one of Paragraphs 33-58, wherein said RNAs which interact with each other in the cell are cross-linked to different proteins in said protein intermediate or protein complex.
60. An isolated complex comprising a chimeric RNA cross-linked to protein intermediates and/or a protein complex, wherein said chimeric RNA comprises RNAs which interact with one another in a cell, wherein the protein complex comprises two or more interacting proteins.
61. The isolated complex of Paragraph 60, wherein said chimeric RNA comprises RNAs which are cross-linked to different proteins in said protein intermediate or protein complex.
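As an illustration of the kind of computer-performed statistical test recited in Paragraphs 26 and 58, the following is a minimal sketch that assumes a one-sided hypergeometric test on chimeric read counts. The choice of test, the function name `interaction_p_value`, and the example counts are assumptions made for illustration; the numbered paragraphs above do not specify a particular test.

```python
from math import comb

def interaction_p_value(k: int, n_total: int, n_a: int, n_b: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    n_total: total number of chimeric reads (hypothetical input)
    n_a:     chimeric reads with an end in RNA cluster A
    n_b:     chimeric reads with an end in RNA cluster B
    k:       chimeric reads joining cluster A to cluster B
    """
    upper = min(n_a, n_b)
    if not (0 <= k <= upper and max(n_a, n_b) <= n_total):
        raise ValueError("inconsistent counts")
    total_ways = comb(n_total, n_b)
    # Sum the probability mass for k, k+1, ..., min(n_a, n_b) joint reads.
    tail = sum(comb(n_a, i) * comb(n_total - n_a, n_b - i)
               for i in range(k, upper + 1))
    return tail / total_ways

# Hypothetical counts: 8 joint reads, where about 0.2 would be expected by chance.
p = interaction_p_value(k=8, n_total=10_000, n_a=40, n_b=50)
print(f"p = {p:.3e}")
```

A small p-value under this assumed model suggests that two clusters are joined more often than independent co-occurrence would predict; in practice a correction for multiple comparisons would typically also be applied.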
In the description that follows, a number of terms are used extensively. The following definitions are provided to facilitate understanding of the present alternatives.
As used herein, “a” or “an” may mean one or more than one.
As used herein, the term “about” indicates that a value includes the inherent variation of error for the method being employed to determine a value, or the variation that exists among experiments.
“Ribonucleic acid” or “RNA,” as described herein, refers to a polymeric nucleic acid molecule that is implicated in the coding, decoding, regulation, and expression of genes. In some embodiments described herein, the RNA can play an active role within cells by catalyzing biological reactions, controlling gene expression, or sensing and communicating responses to cellular signals. There are several types of RNA. Without being limiting, RNA can include, for example, messenger RNA (mRNA), lincRNA, transposon RNA, pseudoRNA, regulatory RNA, small nuclear RNA (snRNA), small nucleolar RNAs (snoRNA), double stranded RNA, long non coding RNA (long ncRNA or lncRNA), microRNA (miRNAs), short interfering RNAs (siRNAs), Piwi-interacting RNAs (piRNAs), and other types of short RNAs. In some embodiments, a method for generating chimeric RNAs comprising RNAs which interact with one another in a cell is provided. The method can include cross-linking RNA to protein and ligating RNAs cross-linked to the same protein molecule together to form a chimeric RNA. In some embodiments, the RNA is messenger RNA (mRNA), regulatory RNA, small nuclear RNA (snRNA), small nucleolar RNAs (snoRNA), double stranded RNA, long non coding RNA (long ncRNA or lncRNA), microRNA (miRNAs), short interfering RNAs (siRNAs), Piwi-interacting RNAs (piRNAs), or other types of short RNAs known to those skilled in the art.
“Chimeric RNA” as described herein, refers to an RNA complex comprising RNAs that are cross-linked to the same protein molecule and are ligated to one another to form the chimeric RNA. In some embodiments, a method for generating chimeric RNAs comprising RNAs which interact with one another in a cell is provided. The method can include cross-linking RNA to protein and ligating RNAs cross-linked to the same protein molecule together to form a chimeric RNA. In some embodiments, the RNA is messenger RNA (mRNA), regulatory RNA, small nuclear RNA (snRNA), double stranded RNA, long non coding RNA (long ncRNA or lncRNA), microRNA (miRNAs), short interfering RNAs (siRNAs), Piwi-interacting RNAs (piRNAs), small nucleolar RNAs (snoRNAs) or other types of short RNAs known to those skilled in the art. In some embodiments, an isolated complex is provided, wherein the isolated complex comprises a chimeric RNA cross-linked to a protein, wherein said chimeric RNA comprises RNAs which interact with one another in a cell.
“Cross-linking,” or “Cross-linked” as described herein, refers to a bond that can link one polymer to another polymer. The cross-linking can occur through covalent or ionic bonds. In some embodiments, RNA is cross-linked to protein by UV induced cross-linking. Irradiation of protein-nucleic acid complexes (a complex comprising protein and nucleic acid, intermediate proteins and nucleic acid or a protein complex and nucleic acid) with ultraviolet light can cause covalent bonds to form between the nucleic acid and proteins that are in close contact with the nucleic acid. In some embodiments herein, RNA is cross-linked to protein by UV radiation.
Cross-linking can also be performed by using a linker as well as other cross-linking methods known to those skilled in the art. In some embodiments, cross-linking can occur by using a probe to link proteins together as well as other cross-linking methods known to those skilled in the art. Cross-linking can be used in synthetic polymer chemistry as well as in the biological sciences. Cross-links can be formed by chemical reactions that are initiated by a variety of conditions. Without being limiting, cross-linking can be initiated, for example by heating, change in pressure, change in pH, UV light, electron beam exposure, gamma radiation and/or other types of radiation known to one skilled in the art. Additionally, cross-linking can also be induced by cross-linking reagents resulting in a chemical reaction that leads to cross-links between two polymers. In some embodiments described herein, the cross-linking is initiated by heat, change in pressure, change in pH, UV light, electron beam exposure, gamma radiation and/or other types of radiation known to those skilled in the art.
Cross-linking reagents can include but are not limited to Amine-to-Amine Cross-linkers, Sulfhydryl-to-Sulfhydryl Cross-linkers, Amine-to-Sulfhydryl Cross-linkers, Sulfhydryl-to-Carbohydrate Cross-linkers, Photoreactive Cross-linkers, Chemoselective Ligation Cross-linking Reagents, In vivo cross-linking reagents and Carboxyl-to-Amine Cross-linkers. In some embodiments, the cross-linking reagent comprises formaldehyde, DSG (disuccinimidyl glutarate), DSS (disuccinimidyl suberate), BS3 (bis(sulfosuccinimidyl)suberate), TSAT (tris-(succinimidyl)aminotriacetate), BS(PEG)5 (PEGylated bis(sulfosuccinimidyl)suberate), BS(PEG)9 (PEGylated bis(sulfosuccinimidyl)suberate), DSP (dithiobis(succinimidyl propionate)), DTSSP (3,3′-dithiobis(sulfosuccinimidyl propionate)), DST (disuccinimidyl tartrate), BSOCOES (bis(2-(succinimidooxycarbonyloxy)ethyl)sulfone), EGS (ethylene glycol bis(succinimidyl succinate)), Sulfo-EGS (ethylene glycol bis(sulfosuccinimidyl succinate)), DMA (dimethyl adipimidate), DMP (dimethyl pimelimidate), DMS (dimethyl suberimidate), DTBP (Wang and Richard's Reagent), DFDNB (1,5-difluoro-2,4-dinitrobenzene), BMOE (bismaleimidoethane), BMB (1,4-bismaleimidobutane), BMH (bismaleimidohexane), TMEA (tris(2-maleimidoethyl)amine), BM(PEG)2 (1,8-bismaleimido-diethyleneglycol), BM(PEG)3 (1,11-bismaleimido-triethyleneglycol), DTME (dithiobismaleimidoethane), SIA (succinimidyl iodoacetate), SBAP (succinimidyl 3-(bromoacetamido)propionate), SIAB (succinimidyl (4-iodoacetyl)aminobenzoate), Sulfo-SIAB (sulfosuccinimidyl (4-iodoacetyl)aminobenzoate), AMAS (N-α-maleimidoacet-oxysuccinimide ester), BMPS (N-β-maleimidopropyl-oxysuccinimide ester), GMBS (N-γ-maleimidobutyryl-oxysuccinimide ester), Sulfo-GMBS (N-γ-maleimidobutyryl-oxysulfosuccinimide ester), MBS (m-maleimidobenzoyl-N-hydroxysuccinimide ester), Sulfo-MBS (m-maleimidobenzoyl-N-hydroxysulfosuccinimide ester), SMCC (succinimidyl 4-(N-maleimidomethyl)cyclohexane-1-carboxylate), Sulfo-SMCC (sulfosuccinimidyl 4-(N-maleimidomethyl)cyclohexane-1-carboxylate), EMCS (N-ε-maleimidocaproyl-oxysuccinimide ester), Sulfo-EMCS (N-ε-maleimidocaproyl-oxysulfosuccinimide ester), SMPB (succinimidyl 4-(p-maleimidophenyl)butyrate), Sulfo-SMPB (sulfosuccinimidyl 4-(N-maleimidophenyl)butyrate), SMPH (Succinimidyl 6-((beta-maleimidopropionamido)hexanoate)), LC-SMCC (succinimidyl 4-(N-maleimidomethyl)cyclohexane-1-carboxy-(6-amidocaproate)), Sulfo-KMUS (N-κ-maleimidoundecanoyl-oxysulfosuccinimide ester), SPDP (succinimidyl 3-(2-pyridyldithio)propionate), LC-SPDP (succinimidyl 6-(3(2-pyridyldithio)propionamido)hexanoate), Sulfo-LC-SPDP (sulfosuccinimidyl 6-(3′-(2-pyridyldithio)propionamido)hexanoate), SMPT (4-succinimidyloxycarbonyl-alpha-methyl-α(2-pyridyldithio)toluene), PEG4-SPDP (PEGylated, long-chain SPDP cross-linker), PEG12-SPDP (PEGylated, long-chain SPDP cross-linker), SM(PEG)2 (PEGylated SMCC cross-linker), SM(PEG)4 (PEGylated SMCC cross-linker), SM(PEG)6 (PEGylated, long-chain SMCC cross-linker), SM(PEG)8 (PEGylated, long-chain SMCC cross-linker), SM(PEG)12 (PEGylated, long-chain SMCC cross-linker), SM(PEG)24 (PEGylated, long-chain SMCC cross-linker), Succinimidyl 3-(2-Pyridyldithio)Propionate (SPDP), SMCC, Succinimidyl trans-4-(maleimidylmethyl)cyclohexane-1-Carboxylate, BMPH (N-β-maleimidopropionic acid hydrazide), EMCH (N-ε-maleimidocaproic acid hydrazide), MPBH (4-(4-N-maleimidophenyl)butyric acid hydrazide), KMUH (N-κ-maleimidoundecanoic acid hydrazide), PDPH (3-(2-pyridyldithio)propionyl hydrazide), ANB-NOS (N-5-azido-2-nitrobenzoyloxysuccinimide), Sulfo-SANPAH (sulfosuccinimidyl 6-(4′-azido-2′-nitrophenylamino)hexanoate), SDA (NHS-Diazirine) (succinimidyl 4,4′-azipentanoate), Sulfo-SDA (Sulfo-NHS-Diazirine) (sulfosuccinimidyl 4,4′-azipentanoate), LC-SDA (NHS-LC-Diazirine) (succinimidyl 6-(4,4′-azipentanamido)hexanoate), Sulfo-LC-SDA (Sulfo-NHS-LC-Diazirine) (sulfosuccinimidyl 6-(4,4′-azipentanamido)hexanoate), SDAD (NHS-SS-Diazirine) (succinimidyl 2-((4,4′-azipentanamido)ethyl)-1,3′-dithiopropionate), Sulfo-SDAD (Sulfo-NHS-SS-Diazirine) (sulfosuccinimidyl 2-((4,4′-azipentanamido)ethyl)-1,3′-dithiopropionate), ATFB, SE, 4-Azido-2,3,5,6-Tetrafluorobenzoic Acid, Succinimidyl Ester, SPB (succinimidyl-[4-(psoralen-8-yloxy)]-butyrate), L-Photo-Leucine, L-Photo-Methionine, ManNAz (N-azidoacetylmannosamine, tetraacylated), GalNAz (N-azidoacetylgalactosamine, tetraacylated), DCC (dicyclohexylcarbodiimide), DyLight™ 550-Phosphine, DyLight™ 650-Phosphine, EZ-Link™ Phosphine-PEG3-Biotin, EZ-Link™ Phosphine-PEG4-Desthiobiotin, EDC (1-ethyl-3-(3-dimethylaminopropyl)carbodiimide hydrochloride), NHS (N-hydroxysuccinimide) or Sulfo-NHS (N-hydroxysulfosuccinimide).
“Immobilization” as described herein, refers to the capturing of a molecule, wherein the capturing is performed by a first molecule that is specific for a specific molecule or a label. In some embodiments, the immobilization is performed by attachment of a capture molecule onto a solid support. The solid support can be a bead or a column. In some embodiments, the solid support comprises a streptavidin molecule for capturing a molecule such as biotin or a portion thereof. In some embodiments, the protein is biotinylated at a cysteine residue.
“Fragmenting” as described herein, can refer to digesting or breaking apart of a nucleic acid. In some embodiments of the methods described herein, an RNA is fragmented by an enzyme. RNA degradation can be performed by many types of nucleases. For example, ribonuclease (RNAse) is a type of nuclease that can catalyze the degradation of RNA into smaller components. RNAses can be divided into endoribonucleases and exoribonucleases. In some embodiments, a method for generating chimeric RNAs comprising RNAs which interact with one another in a cell is provided, wherein the method comprises cross-linking RNA to protein and ligating RNAs cross-linked to the same protein molecule together to form a chimeric RNA. In some embodiments, cross-linking of RNA to protein is performed on an intact cell or in a cell lysate. In some embodiments, cross-linking comprises UV cross-linking. In some embodiments, the method further comprises associating said protein with an agent which facilitates immobilization of said protein on a surface. In some embodiments, said agent which facilitates immobilization comprises biotin. In some embodiments, the protein is biotinylated at a cysteine residue. In some embodiments, the method further comprises fragmenting said RNAs cross-linked to the same protein molecule. In some embodiments, said fragmenting comprises contacting said RNAs cross-linked to the same protein molecule with an RNAse under conditions which facilitate partial digestion of said RNAs.
“Biotin” as described herein, refers to a water soluble B vitamin that is also known as vitamin H or coenzyme R. In several embodiments described herein, biotin can be used to label RNA for capture by a streptavidin molecule on a solid support, such as a bead. In some embodiments, a method for generating chimeric RNAs comprising RNAs which interact with one another in a cell is provided, wherein the method comprises cross-linking RNA to protein and ligating RNAs cross-linked to the same protein molecule together to form a chimeric RNA. In some embodiments, cross-linking of RNA to protein is performed on an intact cell or in a cell lysate. In some embodiments, cross-linking comprises UV cross-linking. In some embodiments, the method further comprises associating said protein with an agent which facilitates immobilization of said protein on a surface. In some embodiments, said agent which facilitates immobilization, comprises biotin. In some embodiments, the protein is biotinylated at a cysteine residue. In some embodiments, the method further comprises fragmenting said RNAs cross-linked to the same protein molecule. In some embodiments, said fragmenting comprises contacting said RNAs cross-linked to the same protein molecule with an RNAse under conditions which facilitate partial digestion of said RNAs. In some embodiments, the method further comprises linking said RNAs cross-linked to the same protein molecule to an agent which facilitates recovery of said RNAs. In some embodiments, said linking comprises ligating the ends of said RNAs to said agent. In some embodiments, said agent which facilitates recovery of said RNAs comprises a nucleic acid. In some embodiments, said nucleic acid comprises a nucleic acid having biotin thereon. 
In some embodiments, said linking of said nucleic acid having biotin thereon to said ends of said RNAs comprises ligating said nucleic acid having biotin thereon to the 5′ ends of said RNAs prior to ligating said RNAs cross-linked to the same protein molecule together to form a chimeric RNA. In some embodiments, the method further comprises removing said biotin from the 5′ region of said chimeric RNA. In some embodiments, the method further comprises recovering said chimeric RNAs. In some embodiments, the method further comprises fragmenting said chimeric RNAs.
“Protein” as described herein refers to a macromolecule comprising one or more polypeptide chains. A protein can therefore comprise peptides, which are chains of amino acid monomers linked by peptide (amide) bonds, formed by any one or more of the amino acids. A protein or peptide can contain at least two amino acids, and no limitation is placed on the maximum number of amino acids that can comprise the protein or peptide sequence. Without being limiting, the amino acids are, for example, arginine, histidine, lysine, aspartic acid, glutamic acid, serine, threonine, asparagine, glutamine, cysteine, cystine, glycine, proline, alanine, valine, hydroxyproline, isoleucine, leucine, pyrrolysine, methionine, phenylalanine, tyrosine, tryptophan, ornithine, S-adenosylmethionine, and selenocysteine. A protein can also comprise non-peptide components, such as carbohydrate groups. Carbohydrates and other non-peptide substituents can be added to a protein by the cell in which the protein is produced, and will vary with the type of cell. Without being limiting, proteins can function within organisms by catalyzing metabolic reactions, DNA replication, responding to stimuli, and transporting molecules from one location to another. For example, the protein can be an enzyme, a transmembrane protein, an antibody, a small biomolecule for transport, a receptor or a hormone. In some embodiments, a method for generating chimeric RNAs comprising RNAs which interact with one another in a cell is provided, wherein the method comprises cross-linking RNA to protein and ligating RNAs cross-linked to the same protein molecule together to form a chimeric RNA. In some embodiments, the protein is an enzyme. In some embodiments, the protein is involved in transport, or in catalysis of metabolic reactions.
“Interactome” as described herein, refers to a whole set of molecular interactions in a particular cell. The term specifically refers to physical interactions among molecules (such as those among proteins, also known as protein-protein interactions) but can also describe sets of indirect interactions among genes (genetic interactions) such as RNA-RNA interactions or interactions between one or more RNA and a protein molecule. In some examples, the interactomes can be displayed as graphs. In some embodiments, the present methods and compositions map substantially all protein-assisted RNA-RNA interactions in one assay. In some embodiments described herein the methods have been applied to produce the first global map of an RNA interactome. In some embodiments, an interactome is produced from a specific cell. In some embodiments, the cell is from a human. In some embodiments, the cell is a cancer cell, a tumor cell, a lymphocyte or an immune cell. In some embodiments, the interactome can be used to determine or predict a disease pathway.
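Because an interactome can be displayed as a graph, identified RNA pairs can be held in a simple weighted adjacency structure. The following is a minimal sketch only; the function name `build_interactome` and the RNA identifiers in the example are hypothetical and are not taken from the present disclosure.

```python
from collections import defaultdict

def build_interactome(pairs):
    """Build an undirected, weighted graph from (rna1, rna2) pairs.

    Each node is an RNA identifier; each edge weight counts how many
    chimeric reads support the interaction between the two RNAs.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for rna1, rna2 in pairs:
        if rna1 == rna2:
            # Pairs within one RNA are intra-molecule and are not edges here.
            continue
        graph[rna1][rna2] += 1
        graph[rna2][rna1] += 1
    return graph

# Hypothetical identified pairs, one per chimeric read.
pairs = [("RNA_a", "RNA_b"), ("RNA_b", "RNA_a"),
         ("RNA_a", "RNA_c"), ("RNA_a", "RNA_a")]
interactome = build_interactome(pairs)
for rna, partners in sorted(interactome.items()):
    print(rna, dict(partners))
```

Storing the read count as the edge weight keeps the supporting evidence for each interaction available when the graph is later filtered or displayed.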
A “protein complex” as defined herein, refers to a group of two or more associated proteins or polypeptide chains and can also be referred to as a “multiprotein complex”. In some embodiments, a complex comprising a nucleic acid(s) bound to a protein complex is provided. In some embodiments, the nucleic acid(s) is RNA.
“Protein intermediates” as defined herein refer to proteins that can bind to one another off and on during a process or a specific pathway, and can also be referred to as “protein binding intermediates.” Without being limiting, examples in which protein intermediates can be seen binding can include processes such as transcription, translation and metabolic pathways. Without being limiting, examples of protein binding intermediates can include polymerases, nucleic acid binding proteins, RNA recognition motif proteins, heterogeneous ribonucleoprotein particles, and other protein binding intermediates known to those skilled in the art. In some embodiments, a complex comprising a nucleic acid(s) bound to protein intermediate(s) is provided. In some embodiments, the nucleic acid(s) is RNA. In some embodiments, the protein intermediates interact with other protein intermediates, thus forming a protein complex, wherein the protein complex comprises protein intermediates.
Disclosed herein are methods and compositions for identifying direct RNA-RNA interactions in a cell. In some embodiments, the methods and compositions can be used to identify at least about 100, at least about 500, at least about 1000 or more than about 1000 RNA-RNA interactions in the cell. In some embodiments, the methods and compositions can be used to identify about 100, about 200, about 300, about 400, about 500, about 600, about 700, about 800, about 900, about 1000, about 2000, about 3000, about 4000, about 5000, about 6000, about 7000, about 8000, about 9000 or about 10,000 RNA-RNA interactions or any other number of RNA-RNA interactions between any two of these aforementioned values. In other embodiments, the methods and compositions can be used to identify substantially all of the direct RNA-RNA interactions in the cell. For example, the methods and compositions can be used to identify at least about 70%, at least about 80%, at least about 90% or more than about 90% of the direct RNA-RNA interactions in the cell. In some embodiments, the methods and compositions can be used to identify at least about 70%, at least about 80%, at least about 90% or about 100% of the direct RNA-RNA interactions in the cell, or any other percent between any two of the aforementioned values described. This method does not rely on knowledge of any specific RNA sequence, and one of its benefits is the identification of previously unknown RNA-RNA interactions.
Only about 5% of the genome codes for RNA that is translated into a protein. About 50% of the genome is transcribed into RNA, including non-coding RNA (ncRNA) such as microRNA and long ncRNA (longer than 200 nt). ncRNA often interacts with other RNA, via protein-associated interactions. Accordingly, direct RNA-RNA interactions can be identified using a protein-based capture method. In some embodiments, the direct RNA-RNA interactions can be identified using a protein-based capture method.
Although RNA-RNA interactions are essential for RNA's regulatory functions, there is yet no technology to globally survey them. The available technologies including HITS-CLIP (Nature 460, 479-486) and CLASH (Cell 153, 654-665) can only map the RNAs attached to a selected protein. Such one-protein-at-a-time approaches cannot map the entire RNA interactome.
In some embodiments, the present methods and compositions map substantially all protein-assisted RNA-RNA interactions in one assay. In some embodiments described herein the methods have been applied to produce the first global map of an RNA interactome. In some embodiments, the present methods and compositions circumvent the requirement for a protein-specific antibody or the need to express a tagged protein. This allows for an unbiased mapping of the RNA interactome. To our knowledge, other methods can only work with one RNA-binding protein at a time. The embodiments described herein lead to a surprising outcome in which RNA-RNA interactions can be determined for multiple RNA binding proteins.
In some embodiments, the present methods and compositions analyze the endogenous cellular condition without introducing any exogenous nucleotides or protein-coding genes (CLASH) prior to cross-linking. Rather than requiring a transformed cell line (CLASH), some embodiments are generally applicable to analyze any cell type or tissue.
In some embodiments, the present methods and compositions overcome an important drawback of HITS-CLIP. HITS-CLIP inferred RNA-RNA interactions did not necessarily occur in the cells analyzed. This is because any two RNAs that co-appeared in HITS-CLIP could have resulted from the independent attachment of either RNA to different copies of the targeted protein. However, in some embodiments, the present methods and compositions reliably represent the physical interactions of RNAs.
The RNA interactome in mouse embryonic stem (ES) cells has been mapped, and the new findings herein show:
Although some embodiments of the present methods and compositions can be used for mapping inter-molecule interactions, they can also reveal unique information concerning RNA structure. The intra-molecule reads of RNA Hi-C provided spatial proximity information for various segments of an RNA. As such, this is the first time that such information has become available in a high-throughput manner. Additionally, the single stranded regions of every RNA were obtained during the same assay as a byproduct. In an exemplary embodiment, an RNA was bent by a protein, and such quaternary structure was captured by intra-molecule reads of RNA Hi-C.
In some embodiments, the method comprises: (1) cross-linking RNA1 and RNA2 to a protein (or to a protein intermediate or a protein complex) to form a complex, (2) labelling protein (e.g. Biotin), (3) fragmenting RNA, (4) capturing labelled protein (e.g. biotin-streptavidin-bead), (5) ligating a biotin-tagged RNA linker to the 5′ end of RNA1 and RNA2, (6) performing proximity ligation to ligate RNA1-linker-RNA2 forming a chimera, (7) protease treating the complex to release RNA1-linker-RNA2 chimera (DNAse treat), (8) hybridizing with DNA probe complementary to biotin-tagged RNA linker and treating with T7 exonuclease to remove non-ligated biotin-tagged RNA linker, (9) fragmenting nucleic acids to about 150 nt to assist with ultimate sequencing, (10) capturing RNA1-linker-RNA2 chimera using streptavidin bead, (11) converting RNA1-linker-RNA2 to cDNA and sequencing at least a portion of the cDNA. In some embodiments, bioinformatics is used to identify RNA1 and RNA2.
The present methods and compositions find application in a variety of contexts, including use by RNA therapeutic companies searching for new therapeutic targets, use by researchers to investigate RNA-RNA interactions and development by device and reagent companies for research and discovery devices.
Non-coding RNAs (ncRNAs) are involved in a wide range of cellular processes, including the regulation of gene expression. MicroRNAs (miRNAs) and long ncRNAs (lncRNAs) are two classes of ncRNAs with known regulatory functions. The ability of these ncRNAs to modulate gene expression at the post-transcriptional or epigenetic level provides new opportunities for ncRNA-based therapeutics. Identification of direct interactions among ncRNAs and messenger RNAs (mRNAs) is a necessary step toward understanding the regulatory roles of ncRNAs. miRNA and lincRNA targeting account for only a small portion of the interactions that can be detected by the technology described in the embodiments herein, which is also designed to discover the potential regulatory functions of other ncRNAs. Nevertheless, the market for diagnostics and therapeutics driven by these two classes of ncRNAs alone is already expected to be significant.
MiRNAs are a group of non-coding ribonucleic acids that serve as key regulators of gene expression. Recent studies have further revealed the importance of miRNAs in diseases, especially in cancer, cardiovascular, and neurological diseases. Large-scale cloning efforts have revealed the abundance and variety of miRNAs. The human genome has been estimated to encode up to 1000 miRNAs and these are predicted to regulate a third of all genes. In neurological processes, miRNAs are key mediators of both central nervous system (CNS) development and plasticity. Increasing evidence indicates that miRNAs are involved in neurological disorders as diverse as traumatic spinal cord injury, traumatic brain injury, Alzheimer's disease, Parkinson's disease and Huntington's disease. A potent feature of miRNA-based regulation is the ability of single miRNAs to regulate multiple functionally related mRNAs, as exemplified by the liver-specific miR-122, which regulates multiple metabolic genes. On average, a given miRNA can regulate several hundred transcripts whose effector molecules function at various sites within cellular pathways and networks. Because of this, miRNAs are able to switch instantly between cellular programs and are therefore often viewed as master regulators of the human genome.
It was only 10 years ago that the first human miRNA was discovered, and yet a miRNA-based therapeutic has already entered Phase 2 clinical trials (miR-122 antagonist, SPC3649, developed by Santaris, is administered to HCV patients to block replication of the virus). This rapid progress from discovery to development reflects the importance of miRNAs as critical regulators in human disease, and holds the promise of yielding a new class of therapeutics that could represent an attractive addition to the current drug pipeline.
The principles that apply to developing miRNA-based therapies remain the same as for other targeted therapies that take the path from drug target to drug. For instance, target identification and validation are key to selecting miRNAs that are causally involved in the disease process. Furthermore, diligent drug development is necessary to assure satisfactory efficacy, specificity and lack of toxicity. However, since miRNAs constitute a class of drug targets unrelated to any others, new ancillary technologies and methods are also required. A critical missing piece in harnessing the therapeutic potentials of miRNAs is an assay to identify the target mRNAs of miRNAs. In some embodiments, the present methods and compositions can be used to develop therapeutic strategies and compositions.
The market for cancer therapy is currently close to 100 billion and is predicted to expand exponentially in the next five years. microRNA-based therapeutics have become the leading edge of this field and, according to some analysts, are predicted to occupy a market space worth $7.5 billion, based on a $150 million market per therapeutic miRNA and assuming that 50 miRNAs with therapeutic potential are approved for use.
In some embodiments, the present compositions and methods provide a missing piece that cannot be circumvented in any miRNA-driven therapeutic applications. Other applications of the present methods and compositions include therapeutic applications in neurological disorders and research labs.
lincRNAs are non-protein-coding transcripts longer than 200 nt that can mediate interactions between epigenetic remodeling complexes and chromatin. A deeper understanding of lncRNA function in human cancer will not only expand the number of potential target cancer genes, but can also facilitate development of novel anti-cancer therapies, such as gene regulation mediated by antisense RNAs or targeting lncRNA-protein interactions. With a deeper understanding of the roles of lncRNA in normal and disease states, it is believed that lncRNAs can also be used as diagnostic or predictive biomarkers. For example, the lncRNA HOTAIR is increased in expression in primary breast tumors and metastases, and its expression level in primary tumors is a powerful predictor of eventual metastasis and death. Moving closer to the clinic, a lncRNA called prostate cancer antigen 3 (PCA3), which is highly overexpressed in prostate cancer, is found in urine, making for easy testing. A commercial kit, called the Progensa PCA3 test, which is the first urine-based molecular test to help determine a need for repeat prostate biopsies, has recently been approved by the FDA for clinical application. The disease-regulating importance of lncRNAs is not limited to cancer. They also play important roles in heritable conditions, notes Gibb, in which lncRNA deregulation has been associated with brachydactyly and HELLP syndrome. Another lncRNA was shown to stabilize the mRNA for a crucial enzyme in the Alzheimer's disease pathway. Increasing evidence suggests lncRNAs are closely associated with major human diseases, and can have better performance in disease diagnosis and prognosis compared with protein-coding RNAs. Furthermore, the majority of currently available drugs and tool compounds exhibit an inhibitory mechanism of action and there is a relative lack of pharmaceutical agents that are capable of increasing the activity of effectors or pathways for therapeutic benefit.
Indeed, the upregulation of many genes, including tumor suppressors, growth factors, transcription factors and genes that are deficient in various genetic diseases, would be desired in specific situations. Many reports suggest that lncRNAs can often be suppressed by RNAi triggers. Targeting lncRNAs by RNAi that silence other genes can activate gene expression. In some embodiments, the methods and compositions can be used to detect the presence or absence of upregulated genes in cells of interest. In some embodiments the cells comprise tumor cells, cancer cells or immune cells. In some embodiments, the methods can be used to identify or predict disease or disease outcome by evaluation of a transcriptome comprising the information of genes upregulated.
Thus, in some embodiments, the present methods and compositions can be utilized by companies in the miRNA therapeutics market that use miRNA mimics to normalize gene regulatory networks in cancerous cells, or to treat cardiovascular and muscle disease. In an exemplary embodiment, the present methods and compositions can be utilized to validate candidate products and also to search for new targets.
In some embodiments, the present methods and compositions can be used for manufacturing RNA Hi-C kits. In other embodiments, the present methods and compositions can be used to provide oligonucleotides for research. For example, the present methods and compositions can be utilized in the context of large lncRNA-targeting RNAi trigger libraries. In some embodiments, the present methods and compositions are used to identify potential lncRNA candidates for RNAi targeting.
One embodiment provides a technology to map out RNA-RNA interactions in cells. In one embodiment, the methods and compositions unbiasedly map out substantially all RNA-RNA interactions in one experiment, and provide one-to-one resolution (which RNA interacts with which RNA). Some embodiments include a novel experimental component and a new computational strategy. Starting from the cells of a certain cell type, some embodiments map out a list of directly interacting RNAs of this cell type. The present methods and compositions have been applied to mouse embryonic stem cells and identified 4049 RNA-RNA interactions using one experiment. In one embodiment, the experimental component takes these cells as input, transforms substantially all direct RNA-RNA interactions into chimeric RNA molecules, and sequences these chimeric RNAs using pair-end sequencing. Some embodiments comprise (1) immobilization of all protein-RNA complexes (a complex comprising protein and nucleic acid, intermediate proteins and nucleic acid or a protein complex and nucleic acid) to magnetic beads; (2) proximity-based ligation of interacting RNAs; (3) selective purification of chimeric RNA molecules; (4) high-throughput sequencing of chimeric transcript. In an embodiment described herein, the method can further comprise using a bioinformatic program to take these sequencing data as input, and produce a list of high-confidence RNA-RNA interactions.
Currently, there are no efficient methods that can directly assay substantially all RNA-RNA interactions in a cell type at once. Two kinds of methods exist that partially achieve this goal, both with weaknesses. First, experimentally characterizing the targets of only one miRNA/lincRNA in vivo is considered a pioneering technology [Lal et al., 2011; Baigude et al., 2012; Kretz et al., 2013]. Second, other technologies like HITS-CLIP and CLASH that can detect targets of many miRNAs also have restrictions. One major common restriction is that they both concentrate on miRNAs, which comprise only a small portion of RNAs. Thus, these technologies are not able to reveal the majority of RNA-RNA interactions. Furthermore, each technology has its own specific weakness.
High-throughput sequencing of RNA isolated by cross-linking immunoprecipitation (HITS-CLIP) is currently the most reliable method for genome-wide analysis of miRNA targets [Chi et al., 2009]. HITS-CLIP allows the identification of the total collection of miRNAs present in a tissue, as well as the total collection of mRNAs regulated by miRNAs. However, the pairing of a miRNA to its target mRNAs cannot be directly deduced from HITS-CLIP. In other words, HITS-CLIP does not directly inform which miRNA regulates which mRNAs (no one-to-one information).
A recent method called CLASH (cross-linking, ligation, and sequencing of hybrids) could allow direct observation of miRNA-target pairs. However, the number of interactions is still small compared to the number of sequencing reads: only 2% of sequenced reads are chimeric, and 98% are still single reads. This requires much deeper sequencing coverage or preparation of multiple samples to obtain enough coverage of miRNA-mRNA interactions.
In some embodiments, the present methods and compositions include experimental and computational components to make and enrich RNA chimeras so that all RNA-RNA interactions can be mapped in an unbiased, genome-wide, direct assay.
In some embodiments, the present methods and compositions provide:
In some embodiments, the present methods and compositions are able to:
As previously noted, some technologies characterize the targets of only one miRNA/lincRNA in vivo (for example, Lal et al., 2011; Baigude et al., 2012; RNA interactome analysis).
As previously noted, some technologies can detect targets of many miRNAs, but are restricted to miRNA (for example, HITS-CLIP and PAR-CLIP, which also lack direct one-to-one information, and CLASH, which provides only a small portion of chimeric RNAs). As such, the present embodiments described herein provide an advantage over the previous methods by not restricting the RNAs to a small subset such as miRNA.
One exemplary embodiment is illustrated in
One exemplary embodiment of the bioinformatics analysis of the sequenced cDNAs is illustrated in (
Next, a hypergeometric test is developed to identify strong interactions between clusters within the R1 and R2 pools based on the number of ligated chimeras (R1-linker-R2). Different types of strong interactions are determined by genomic annotations of clusters in the R1 and R2 pools. (
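In a non-limiting illustration, the hypergeometric test described above can be sketched as follows. The sketch assumes a null model in which the R1 and R2 fragments of the chimeras are paired at random; the function name, the argument names, and the choice of the upper-tail p-value are illustrative assumptions and are not taken from the RNA-HiC-tools software.

```python
from math import comb


def hypergeom_pvalue(n_total, n_r1, n_r2, n_both):
    """Upper-tail hypergeometric p-value P(X >= n_both).

    n_total: total number of chimeric read pairs (assumed population size)
    n_r1:    pairs whose R1 fragment maps to cluster A
    n_r2:    pairs whose R2 fragment maps to cluster B
    n_both:  pairs that link cluster A (R1) to cluster B (R2)
    """
    upper = min(n_r1, n_r2)
    denom = comb(n_total, n_r2)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(n_r1, k) * comb(n_total - n_r1, n_r2 - k)
        for k in range(n_both, upper + 1)
    )
    return tail / denom


# Hypothetical counts, for illustration only.
p = hypergeom_pvalue(n_total=1000, n_r1=20, n_r2=30, n_both=10)
print(f"p-value: {p:.3g}")
```

A small p-value indicates that the two clusters are ligated to each other more often than expected from their individual abundances in the R1 and R2 pools.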
Two independent experiments using mouse embryonic stem (ES) cells have been conducted. These two experiments produced comparable results. The cDNAs ranged from 75 to 200 nts (
This proof of principle experiment with our technology produced a list of 4049 pairs of interacting RNAs. The top 10 interactions, based on p-values and number of supporting read-pairs, are provided in Table 1.
Many biological processes are regulated by RNA-RNA interactions (Kretz, M. et al. Control of somatic tissue differentiation by the long non-coding RNA TINCR. Nature 493, 231-235, doi:10.1038/nature11661 (2013)), nonetheless it remains formidable to analyze the entire RNA interactome. In an exemplary embodiment, a method, RNA Hi-C, was developed to map protein-assisted RNA-RNA interactions in vivo. By circumventing the selection for a specific RNA-binding protein (Hafner, M. et al. Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell 141, 129-141, doi:10.1016/j.cell.2010.03.009 (2010); Chi, S. W., Zang, J. B., Mele, A. & Darnell, R. B. Argonaute HITS-CLIP decodes microRNA-mRNA interaction maps. Nature 460, 479-486, doi:10.1038/nature08170 (2009); Helwak, A., Kudla, G., Dudnakova, T. & Tollervey, D. Mapping the human miRNA interactome by CLASH reveals frequent noncanonical binding. Cell 153, 654-665, doi:10.1016/j.cell.2013.03.043 (2013); Kudla, G., Granneman, S., Hahn, D., Beggs, J. D. & Tollervey, D. Cross-linking, ligation, and sequencing of hybrids reveals RNA-RNA interactions in yeast. Proceedings of the National Academy of Sciences of the United States of America 108, 10010-10015, doi:10.1073/pnas.1017386108 (2011)), the approach vastly expanded the identifiable portion of the RNA interactome. Use of this technology, allowed mapping of the RNA interactome in mouse embryonic stem cells, which was composed of 46,780 RNA-RNA interactions. The RNA interactome was a scale-free network, with several lincRNAs and mRNAs emerging as hubs. An interaction was validated between two hubs, Malat1 and Slc2a3, using single molecule RNA fluorescence in situ hybridization. Base pairing was observed at the interaction sites of long RNAs, and was particularly strong in transposon RNA-mRNA and lincRNA-mRNA interactions. This revealed a new type of regulatory sequences acting in trans. 
Consistent with their hypothesized roles, the RNA interaction sites were more evolutionarily conserved than other regions of the transcripts. RNA Hi-C also provided new information on RNA structures, by simultaneously revealing the footprint of single stranded regions and the spatially proximal sites of each RNA. Thus, the unbiased mapping of the protein-assisted RNA interactome with minimum perturbation of cell physiology is advantageous to previous methods and will greatly expand the capacity to investigate RNA functions.
Interactions between RNA molecules exert key regulatory roles and are often mediated by RNA binding proteins (Ray, D. et al. A compendium of RNA-binding motifs for decoding gene regulation. Nature 499, 172-177, doi:10.1038/nature12311 (2013)) such as ARGONAUTE proteins (AGO) (Meister, G. Argonaute proteins: functional insights and emerging roles. Nature reviews. Genetics 14, 447-459, doi:10.1038/nrg3462 (2013)), PUM2, QKI (Hafner, M. et al. Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell 141, 129-141, doi:10.1016/j.cell.2010.03.009 (2010)), and snoRNP proteins (Granneman, S., Kudla, G., Petfalski, E. & Tollervey, D. Identification of protein binding sites on U3 snoRNA and pre-rRNA by UV cross-linking and high-throughput analysis of cDNAs. Proceedings of the National Academy of Sciences of the United States of America 106, 9613-9618, doi:10.1073/pnas.0901997106 (2009)). Despite recent advances such as PAR-CLIP (Hafner, M. et al. Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell 141, 129-141, doi:10.1016/j.cell.2010.03.009 (2010)), HITS-CLIP (Chi, S. W., Zang, J. B., Mele, A. & Darnell, R. B. Argonaute HITS-CLIP decodes microRNA-mRNA interaction maps. Nature 460, 479-486, doi:10.1038/nature08170 (2009)), and CLASH (Helwak, A., Kudla, G., Dudnakova, T. & Tollervey, D. Mapping the human miRNA interactome by CLASH reveals frequent noncanonical binding. Cell 153, 654-665, doi:10.1016/j.cell.2013.03.043 (2013); Kudla, G., Granneman, S., Hahn, D., Beggs, J. D. & Tollervey, D. Cross-linking, ligation, and sequencing of hybrids reveals RNA-RNA interactions in yeast. Proceedings of the National Academy of Sciences of the United States of America 108, 10010-10015, doi:10.1073/pnas.1017386108 (2011)), it remains a formidable challenge to map all protein-assisted RNA-RNA interactions.
In each of these three approaches, only the interactions mediated by one RNA-binding protein can be analyzed per experiment. Additionally, each experiment requires either a protein-specific antibody (HITS-CLIP or PAR-CLIP) or stable expression of a tagged protein in transformed cell lines (CLASH). Furthermore, any two RNAs that co-appeared in either HITS-CLIP or PAR-CLIP could have resulted from the independent attachment of either RNA to different copies of the targeted protein. For example, suppose 10 AGO proteins were present in a cell, each of which was bound by a different RNA; these 10 RNAs would be identified as interacting from AGO HITS-CLIP. Therefore, HITS-CLIP and PAR-CLIP inferred RNA-RNA interactions did not necessarily occur in the cells analyzed.
In an exemplary embodiment described herein, an RNA Hi-C method was developed to detect protein-assisted RNA-RNA interactions in vivo. In this procedure, RNA is cross-linked with its bound proteins then ligated to a biotinylated RNA linker such that the RNAs, RNA1 and RNA2, are co-bound by the same protein forming a chimeric RNA of the form RNA1-Linker-RNA2. These linker-containing chimeric RNAs are isolated using streptavidin coated magnetic beads and subjected to pair-end sequencing (Methods,
The RNA Hi-C method offers several advantages for mapping RNA-RNA interactions. First, only the RNAs brought together by the same protein molecule are captured, overcoming the drawback in HITS-CLIP where different RNAs would be considered as interacting when they are independently bound to different copies of the same protein. Second, the use of a biotinylated linker as a selection marker circumvents the requirement for a protein-specific antibody or the need to express a tagged protein. This allows for an unbiased mapping of the RNA interactome. As described in the art, other methods can only work with one RNA-binding protein at a time. Thus this method leads to the surprising effect of working efficiently with more than one RNA-binding protein at a time. Third, false positives that result from RNAs ligating randomly to other nearby RNAs are minimized by performing the RNA ligation step on streptavidin beads in extremely dilute conditions. Fourth, the RNA linker provides a clear boundary delineating sequencing reads that span across the ligation site, thus avoiding ambiguities in mapping the sequencing reads. Fifth, RNA Hi-C directly analyzes the endogenous cellular condition without introducing any exogenous nucleotides (Hafner, M. et al. Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell 141, 129-141, doi:10.1016/j.cell.2010.03.009 (2010); Lal, A. et al. Capture of microRNA-bound mRNAs identifies the tumor suppressor miR-34a as a regulator of growth factor signaling. PLoS genetics 7, e1002363, doi:10.1371/journal.pgen.1002363 (2011); Baigude, H., Ahsanullah, Li, Z., Zhou, Y. & Rana, T. M. miR-TRAP: a benchtop chemical biology strategy to identify microRNA targets. Angew Chem Int Ed Engl 51, 5880-5883, doi:10.1002/anie.201201512 (2012)) or protein-coding genes (Helwak, A., Kudla, G., Dudnakova, T. & Tollervey, D. Mapping the human miRNA interactome by CLASH reveals frequent noncanonical binding. 
Cell 153, 654-665, doi:10.1016/j.cell.2013.03.043 (2013)), prior to cross-linking. Sixth, potential PCR amplification biases are removed by attaching a random 6 nucleotide barcode to each chimeric RNA before PCR amplification and subsequently counting completely overlapping sequencing reads with identical barcodes only once (Chi, S. W., Zang, J. B., Mele, A. & Darnell, R. B. Argonaute HITS-CLIP decodes microRNA-mRNA interaction maps. Nature 460, 479-486, doi:10.1038/nature08170 (2009); Loeb, G. B. et al. Transcriptome-wide miR-155 binding map reveals widespread noncanonical microRNA targeting. Molecular cell 48, 760-770, doi:10.1016/j.molcel.2012.10.002 (2012); Wang, Z. et al. iCLIP predicts the dual splicing effects of TIA-RNA interactions. PLoS biology 8, e1000530, doi:10.1371/journal.pbio.1000530 (2010); Konig, J. et al. iCLIP reveals the function of hnRNP particles in splicing at individual nucleotide resolution. Nature structural & molecular biology 17, 909-915, doi:10.1038/nsmb.1838 (2010)).
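In a non-limiting illustration, the barcode-based removal of PCR duplicates described in the sixth point can be sketched as follows. A random 6-nucleotide barcode allows 4^6 = 4096 distinct sequences. The record layout (`ReadPair` and its field names) is a hypothetical representation chosen for this sketch and is not the format used by the RNA-HiC-tools software.

```python
from typing import List, NamedTuple


class ReadPair(NamedTuple):
    # Hypothetical record: mapped coordinates of both fragments plus barcode.
    chrom1: str
    start1: int
    end1: int
    chrom2: str
    start2: int
    end2: int
    barcode: str  # random 6-nt barcode attached before PCR


def remove_pcr_duplicates(pairs: List[ReadPair]) -> List[ReadPair]:
    """Count completely overlapping read pairs with identical barcodes once."""
    seen = set()
    unique = []
    for pair in pairs:
        # A NamedTuple is hashable, so the full record serves as the key:
        # identical coordinates AND identical barcode mark a PCR duplicate.
        if pair not in seen:
            seen.add(pair)
            unique.append(pair)
    return unique


pairs = [
    ReadPair("chr1", 100, 140, "chr5", 900, 950, "ACGTAC"),
    ReadPair("chr1", 100, 140, "chr5", 900, 950, "ACGTAC"),  # PCR duplicate
    ReadPair("chr1", 100, 140, "chr5", 900, 950, "TTGACA"),  # distinct molecule
]
print(len(remove_pcr_duplicates(pairs)))
```

Read pairs with the same coordinates but different barcodes are retained, because they are taken to originate from distinct chimeric RNA molecules.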
In an exemplary embodiment, two independent RNA Hi-C assays were carried out on mouse embryonic stem (ES) cells with minor technical differences (
A set of bioinformatic tools was created (RNA-HiC-tools) to analyze and visualize RNA Hi-C data (
The four RNA Hi-C libraries were compared. ES-1 and ES-2 were most similar judged by correlations of FPKMs (separately calculated for the read fragments on the left and the right sides of the linker), followed by ES-indirect, and then MEF (
It was then asked whether other RNAs could undergo a process similar to miRNA biogenesis and also interact with mRNAs. To do so, the RNA Hi-C identified interacting RNAs were intersected with those found by small RNA sequencing (smallRNA-seq) and those bound to the AGO protein (HITS-CLIP) in ES cells (S. W. Chi, J. B. Zang, A. Mele, R. B. Darnell, Argonaute HITS-CLIP decodes microRNA-mRNA interaction maps. Nature 460, 479 (Jul. 23, 2009)). The smallRNA-seq selectively sequenced “miRNAs and other small RNAs that have a 3′ hydroxyl group resulting from enzymatic cleavage by Dicer or other RNA processing enzymes” (Illumina, “TruSeq® Small RNA Sample Preparation Guide” (2014)). Besides miRNA, other RNA types including snoRNA, pseudogene RNA, and mRNA UTRs also contributed to the small RNA pool, and were attached to AGO (
To elucidate what types of non-miRNA genes were most likely to undergo miRNA-like biogenesis, the RNA Hi-C identified RNA-RNA interactions were subjected to the following filters:
A total of 302 RNA-RNA interactions passed these filters. The majority (79%) of the source RNAs in these interactions were snoRNAs (Table 2). The snoRNAs were therefore prioritized for functional analysis.
It was hypothesized that a large number of snoRNAs were enzymatically processed into miRNA-like short RNAs and interact with mRNAs. This hypothesis was supported by 919 RNA Hi-C identified snoRNA-mRNA interactions where both the mRNA and the snoRNA were bound by AGO. Furthermore, AGO bound snoRNAs and their interacting mRNAs exhibited anti-correlated expression changes during guided differentiation of ES cells toward mesendoderm (P. Yu et al., Spatiotemporal clustering of the epigenome reveals rules of dynamic gene regulation. Genome research 23, 352 (February, 2013)) (
The ES-1 and ES-2 libraries were merged to infer the RNA interactome in ES cells. This data included 4.54 million non-duplicated pair-end reads that were unambiguously split into two RNA fragments with both fragments uniquely mapping to the genome (mm9). 46,780 inter-RNA interactions were identified (FDR<0.05, Fisher's exact test) (
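In a non-limiting illustration, the application of a false discovery rate (FDR) threshold of 0.05 to per-interaction p-values can be sketched as follows. The source states the threshold and the test (Fisher's exact test) but not the FDR procedure; the Benjamini-Hochberg adjustment used here is an assumption, and the p-values shown are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four candidate RNA-RNA interactions.
pvals = [0.01, 0.04, 0.03, 0.5]
qvals = benjamini_hochberg(pvals)
significant = [i for i, q in enumerate(qvals) if q < 0.05]
print(qvals, significant)
```

Only candidate interactions whose adjusted p-value falls below 0.05 would be retained under this assumed procedure.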
1.1 Data synthesis. In order to estimate the sensitivity and specificity of RNA Hi-C, including its experimental and computational procedures, a simulation analysis was carried out. 1,000,000 pair-end reads were simulated by computationally mimicking the data generation process. The parameters used for the simulation were derived from real data. The simulated data generation process is as follows.
For each pair-end read (2×100 bases):
Steps 1-5 simulated a cDNA sequence according to the experimental procedure, and steps 6-8 simulated a pair-end read based on this cDNA sequence. The simulated interacting RNA pairs, as well as the cDNA type and the length of each part (RNA1, linker, and RNA2, if applicable) were kept for comparison with the computational predictions.
1.2. Evaluation of intermediate and final results. The synthetic data was used to evaluate the sensitivities and specificities of two intermediate analysis steps, as well as the final predictions.
First, the predicted cDNA lengths (output of Step 3 of RNA-HiC-Tools) were compared to the actual lengths (Table 3). This step, “3. Recovering the cDNAs in the sequencing library”, assigns each cDNA to one of four types with respect to its length, namely Type 1 (<100 bp); Type 2 (100-200 bp); Type 3 (>200 bp); Type 4 (unknown) (
When the predicted length was shorter than 200 bp (Types 1 and 2), the exact length could be predicted. In these cases, the predicted lengths often precisely matched the lengths of the simulated cDNAs (
Next, the predicted chimeric configuration of each cDNA (output of Step 4 of RNA-HiC-Tools) was compared to the synthesized configuration. In Step “4. Parsing the chimeric cDNAs”, the algorithm assigned the cDNAs to five categories, based on the presence of the linker sequence. The algorithm reached 99.89% sensitivity and 95.82% specificity for the cDNAs in the “RNA1-linker-RNA2” form (Table 4).
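In a non-limiting illustration, parsing a cDNA by the presence of the linker can be sketched as follows. The source does not enumerate the five categories, so the category labels below are illustrative assumptions, as are the linker sequence and the use of exact string matching (a practical implementation would likely tolerate sequencing errors and enforce minimum fragment lengths).

```python
# Hypothetical linker sequence, for illustration only.
LINKER = "CTAGTAGCCCATGCAATGCGAGGA"


def parse_chimera(cdna, linker=LINKER):
    """Split a cDNA at the first exact linker match.

    Returns (label, left_fragment, right_fragment). The labels are
    illustrative and may not match the categories of RNA-HiC-tools.
    """
    idx = cdna.find(linker)
    if idx == -1:
        return "no_linker", cdna, ""
    left = cdna[:idx]
    right = cdna[idx + len(linker):]
    if left and right:
        label = "rna1_linker_rna2"
    elif left:
        label = "rna1_linker"
    elif right:
        label = "linker_rna2"
    else:
        label = "linker_only"
    return label, left, right


print(parse_chimera("GGATCCAAGT" + LINKER + "TTGCACCGTA"))
```

Only cDNAs labeled `rna1_linker_rna2` in this sketch carry a fragment on each side of the linker and would therefore be informative about an RNA-RNA interaction.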
Lastly, the predicted and the simulated RNA-RNA interactions were compared. The simulated dataset contained 200,200 chimeric RNA pairs, among which 131,571 pairs of RNAs were detected (sensitivity=65.72%, specificity=92.57%,
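In a non-limiting illustration, the reported sensitivity can be reproduced from the counts given above (131,571 detected pairs out of 200,200 simulated chimeric pairs). The function names are illustrative. The counts underlying the reported specificity are not given in the text, so no specificity value is recomputed here.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of real interactions that were detected."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Fraction of non-interactions that were correctly rejected."""
    return true_negatives / (true_negatives + false_positives)


simulated_pairs = 200_200   # chimeric RNA pairs in the simulated dataset
detected_pairs = 131_571    # simulated pairs recovered by the analysis

sens = sensitivity(detected_pairs, simulated_pairs - detected_pairs)
print(f"sensitivity = {sens:.2%}")  # prints "sensitivity = 65.72%"
```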
The number of interacting partners per RNA was strongly unbalanced. The ES cell RNA interactome was a scale-free network, with a degree distribution that conformed to a power law ($P(k) \sim k^{-\gamma}$, $\gamma = 3$) (
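In a non-limiting illustration, the degree distribution of an interaction network and a power-law exponent can be estimated as follows. The source does not state how the exponent was fitted; the least-squares fit on log-log axes used here is a simple assumption that is known to be less robust than maximum-likelihood fitting, and the function names and example edge list are hypothetical.

```python
from collections import Counter
from math import log


def degree_histogram(edges):
    """Map each degree k to the number of RNAs with k interaction partners."""
    degree = Counter()
    for rna_a, rna_b in edges:
        degree[rna_a] += 1
        degree[rna_b] += 1
    return Counter(degree.values())


def fit_power_law_exponent(hist):
    """Estimate gamma in P(k) ~ k^(-gamma) by log-log linear regression.

    Requires at least two distinct degrees in the histogram.
    """
    total = sum(hist.values())
    points = [(log(k), log(count / total)) for k, count in hist.items()]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in points)
        / sum((x - mean_x) ** 2 for x, _ in points)
    )
    return -slope


# Hypothetical edge list: one hub RNA and several low-degree partners.
edges = [("Malat1", "geneA"), ("Malat1", "geneB"), ("Malat1", "geneC"),
         ("geneA", "geneD")]
print(degree_histogram(edges))
```

In a scale-free network most RNAs have few partners while a small number of hubs have many, which appears as an approximately straight line on log-log axes.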
The majority (83.05%) of the interacting RNAs exhibited overlapping RNA Hi-C reads (
It was asked whether base complementarity is utilized by different types of RNA-RNA interactions. The hybridization energy of a pair of interacting RNAs was estimated by the average hybridization energy of the pairs of ligated fragments (RNA1, RNA2) (Bellaousov, S., Reuter, J. S., Seetin, M. G. & Mathews, D. H. RNAstructure: web servers for RNA secondary structure prediction and analysis. Nucleic Acids Res 41, W471-W474, doi:10.1093/nar/gkt290 (2013)), and was compared to the hybridization energy of control RNAs generated by random shuffling of the bases. Complementary bases were preferred in nearly all types of RNA-RNA interactions, and the preference was most pronounced in transposonRNA-mRNA, mRNA-mRNA, pseudogeneRNA-mRNA, lincRNA-mRNA, and miRNA-mRNA interactions (p-values <2.4−18), but was not observed in LTR-pseudogeneRNA interactions (
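In a non-limiting illustration, the comparison against shuffled control RNAs can be sketched as follows. The analysis described above used thermodynamic hybridization energies computed with RNAstructure; the sketch instead substitutes a crude count of Watson-Crick and G-U pairs in the best ungapped antiparallel alignment, purely to show the shuffling logic. The scoring function, function names, and example sequences are assumptions for illustration.

```python
import random

# Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pairing_score(rna1, rna2):
    """Most paired bases over all ungapped antiparallel alignments.

    A crude stand-in for hybridization energy, not a thermodynamic model.
    """
    rev = rna2[::-1]
    best = 0
    for offset in range(-(len(rev) - 1), len(rna1)):
        score = sum(
            1
            for j, base in enumerate(rev)
            if 0 <= offset + j < len(rna1) and (rna1[offset + j], base) in PAIRS
        )
        best = max(best, score)
    return best


def shuffle_pvalue(rna1, rna2, n_shuffles=1000, seed=0):
    """Empirical p-value of the observed score against base-shuffled controls."""
    rng = random.Random(seed)
    observed = pairing_score(rna1, rna2)
    hits = 0
    for _ in range(n_shuffles):
        s1, s2 = list(rna1), list(rna2)
        rng.shuffle(s1)  # shuffling preserves base composition
        rng.shuffle(s2)
        if pairing_score("".join(s1), "".join(s2)) >= observed:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)


print(shuffle_pvalue("GGGAAACCCUUUGGG", "CCCAAAGGGUUUCCC", n_shuffles=200))
```

Because shuffling keeps the base composition of each fragment, a low empirical p-value points to complementarity that depends on the order of the bases rather than on composition alone.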
If these RNA-RNA interactions are sequence-specific, the RNA interaction sites should be under selective pressure. It was found that the interspecies conservation levels (Cooper, G. M. et al. Distribution and intensity of constraint in mammalian genomic sequence. Genome research 15, 901-913, doi:10.1101/gr.3577405 (2005)) are strongly increased at the interaction sites, and the peak of conservation precisely pinpointed the junction of the two RNA fragments (
Although RNA Hi-C was originally designed for mapping inter-molecule interactions, it was found that RNA Hi-C revealed RNA secondary and tertiary structures. All the analyses above were based on inter-molecular reads. By looking at intra-molecular reads, several things can be learned about RNA structure. First, the footprint of single stranded regions of an RNA was identified by the density of RNase I digestion sites (RNase I digestion was applied before ligation, see Step 2 in
The key to mapping RNA interactions is selection. The introduction of a selectable linker in RNA Hi-C enabled an unbiased selection of interacting RNAs, making it possible to globally map an RNA interactome. The number of interacting partners per RNA in ES cells was strongly unbalanced, resulting in a scale-free RNA network. Interactions between long RNAs frequently used a small fraction of the transcripts. In analogy to protein interaction domains, the notion of RNA interaction sites was proposed. RNA interaction sites utilized base pairing to facilitate interactions of long RNAs, suggesting a new type of trans regulatory sequences. These trans regulatory sequences are more evolutionarily conserved than other parts of transcripts. RNA structure could be mapped by RNA Hi-C as well. Provided herein is an exemplary embodiment, where an RNA was bent by a protein, and such tertiary structure was revealed by the intra-molecule reads of RNA Hi-C. As such, this method and data should greatly facilitate future investigations of RNA functions and regulatory roles.
The RNA-HiC-tools software is available at http://systemsbio.ucsd.edu/RNA-Hi-C, the disclosure of which is incorporated herein by reference in its entirety.
Undifferentiated mouse E14 ES cells were cultured under feeder-free conditions. ES cells were seeded on gelatin-coated dishes and were cultured in Dulbecco's modified Eagle medium (DMEM; GIBCO) supplemented with 15% fetal bovine serum (FBS; Gemini Gemcell), 0.055 mM 2-mercaptoethanol (Sigma), 2 mM Glutamax (GIBCO), 0.1 mM MEM nonessential amino acid (GIBCO), 5,000 U/ml penicillin/streptomycin (GIBCO) and 1,000 U/ml of LIF (Millipore). The cells were maintained in an incubator at 37° C. and 5% CO2.
Mouse embryonic fibroblasts (MEFs) were cultivated in 15-cm dishes in DMEM (GIBCO) supplemented with 15% fetal bovine serum (FBS; Gemini Gemcell), 0.055 mM 2-mercaptoethanol (Sigma), 2 mM Glutamax (GIBCO), 0.1 mM MEM nonessential amino acid (GIBCO), 5,000 U/ml penicillin/streptomycin (GIBCO). MEFs were also maintained in an incubator at 37° C. and 5% CO2.
Drosophila S2 cells (Invitrogen) were maintained in 15-cm plates in Schneider's Drosophila Medium (GIBCO) supplemented with 10% heat-inactivated fetal bovine serum (FBS; Gemini Gemcell), and 5 ml 1:100 Penicillin-Streptomycin (GIBCO) in an incubator at 28° C. without CO2.
Mouse handling was approved by the Institutional Animal Care and Use Committee of the University of California San Diego. An adult female mouse (C57BL/6J background) was sacrificed by cervical dislocation and the whole brain was immediately collected, rinsed with ice-cold PBS three times and snap frozen. Frozen whole mouse brain tissue was ground into fine powder in liquid nitrogen using a mortar and pestle. The tissue powder was quickly transferred into a Petri dish on a bed of dry ice and irradiated on dry ice three times at 400 mJ/cm2 in a UV cross-linker (254 nm) with gentle swirling between each irradiation. Cross-linked powdered tissue was immediately lysed and subjected to the RNA Hi-C procedure as described.
RNA Hi-C was designed to: (i) capture interacting RNAs in vivo in an unbiased manner without genetically or transiently introducing exogenous molecules; (ii) allow stringent removal of non-physiologic associations that form after cell lysis (S. Mili, J. A. Steitz, RNA 10, 1692 (2004)); (iii) select the proximity-ligated chimeric RNAs; (iv) allow unambiguous bioinformatic identification of interacting RNAs. These objectives can be achieved by: (i) cross-linking and immobilization of all RNA-protein complexes (a complex comprising protein and nucleic acid, intermediate proteins with nucleic acid or a protein complex bound to nucleic acid, wherein the nucleic acid is RNA) in streptavidin beads and removal of non-specific binding by denaturing conditions; (ii) attaching a biotin-tagged RNA linker to facilitate selective enrichment of chimeric RNA constructs; (iii) using the linker sequence to unambiguously split the interacting RNAs from a sequencing read pair.
UV irradiation was used to form covalent bonds between photoreactive nucleotide bases and amino acids. UV irradiation generates highly reactive, short-lived states of the nucleotide bases within the RNA, inducing covalent bond formation only with amino acids at their contact points without additional elements that might cause conformational perturbation (I. G. Pashev, S. I. Dimitrov, D. Angelov, Trends in Biochemical Sciences 16, 323 (1991)). UV irradiation at 254 nm does not promote protein-protein cross-linking due to the different wavelengths absorbed by amino acids. Specifically, cells were washed twice in ice-cold PBS and irradiated with UV-C (254 nm) at 400 mJ/cm2 in ice-cold PBS on ice. Cells were harvested by scraping and pelleted by centrifugation at 1,000×g for 5 min at 4° C. Cell pellets were snap-frozen in liquid nitrogen and stored at −80° C.
An RNA Hi-C library (ES-indirect) was generated in which protein-protein complexes were cross-linked as well. This was to capture the RNAs that were brought together by protein interactions. An in vivo dual cross-linking method was applied with previously validated parameters (Illumina, “TruSeq® Small RNA Sample Preparation Guide” (2014); P. Yu et al., Spatiotemporal clustering of the epigenome reveals rules of dynamic gene regulation. Genome research 23, 352 (February, 2013); N. J. Loman et al., Performance comparison of benchtop high-throughput sequencing platforms. Nature biotechnology 30, 434 (May, 2012)). Briefly, cells were first rinsed with room temperature PBS and treated with 1.5 mM EthylGlycol bis(SuccinimidylSuccinate) (EGS, Pierce Protein Research Products, Rockford, Ill.) freshly-prepared in PBS for 45 minutes at room temperature on a shaker. Cells were further treated with formaldehyde (Pierce Protein Research Products, Rockford, Ill.) to a final concentration of 1% and incubated for 20 minutes at room temperature with rocking. Glycine was added to a final concentration of 250 mM and incubated for 10 minutes at room temperature to quench the cross-linking reaction. Cells were then washed once with PBS at room temperature, scraped off, pelleted at 1,000×g for 5 min at 4° C., snap-frozen in liquid nitrogen and stored at −80° C.
A control experiment (ES-indirect) was conducted in which protein-protein complexes were cross-linked as well. This controls for the RNAs that were brought together by protein interactions. Thus, an in vivo dual cross-linking method was applied with previously validated parameters (S. K. Kurdistani, M. Grunstein, Methods 31, 90 (2003); D. E. Nowak, B. Tian, A. R. Brasier, BioTechniques 39, 715 (2005); J. Zhang et al., Methods 58, 289 (2012)). Briefly, cells were first rinsed with room temperature PBS and treated with 1.5 mM EthylGlycol bis(SuccinimidylSuccinate) (EGS, Pierce Protein Research Products, Rockford, Ill.) freshly-prepared in PBS for 45 minutes at room temperature on a shaker. Cells were further treated with formaldehyde (Pierce Protein Research Products, Rockford, Ill.) to a final concentration of 1% and incubated for 20 minutes at room temperature with rocking. Glycine was added to a final concentration of 250 mM and incubated for 10 minutes at room temperature to quench the cross-linking reaction. Cells were then washed once with PBS at room temperature, scraped off, pelleted at 1,000×g for 5 min at 4° C., snap-frozen in liquid nitrogen and stored at −80° C.
Approximately 6×10⁸ cross-linked cells stored at −80° C. were thawed on ice and resuspended in ˜3 volumes of lysis buffer (50 mM Tris-HCl pH 7.5, 100 mM NaCl, 0.1% SDS, 1% IGEPAL CA-630, 0.5% sodium deoxycholate, 1 mM EDTA supplemented with 1:20 volume of EDTA-free complete protease inhibitor cocktail (Roche)). Lysis was performed on ice for 20 minutes. Cell debris and insoluble chromatin were removed by centrifugation at 20,000×g for 10 min at 4° C. The supernatant was collected and treated with TURBO DNase (Invitrogen) at a concentration of 10 μl TURBO DNase per ml lysate for 20 minutes at 37° C. RNAs were digested into ˜1000-2000 nt (ES-1) or ˜1000 nt (ES-2) fragments by adding 10 μl of 1:100 diluted RNase I (NEB) per ml of lysate and incubating at 37° C. for 3 minutes. Following RNase I treatment, the lysate was immediately transferred to ice for at least 5 minutes. Both RNase I and sonication based fragmentation leave 5′-OH and 3′-P ends, which are incompatible with RNA ligation and therefore suppress undesirable RNA ligations. To stop DNase digestion, EDTA (Ambion) was added to a 25 mM final concentration and the mixture was incubated at 4° C. for 15 minutes with rotation. The fragmented dual cross-linked (ES-indirect) lysate was prepared as follows: after the lysis on ice for 20 minutes the suspension was directly subjected to fragmentation by sonication (Covaris E220) under the following settings: 20 min with 5% duty cycle, 140 Watts peak incident power and 200 cycles per burst at 4° C.
For the cross-species experiment (Fly-Mm), approximately 3×10⁸ E14 mES cells and 3×10⁸ Drosophila S2 cells were lysed separately and then mixed before protein biotinylation.
To dissociate loosely bound proteins, NaCl was added to a 500 mM final concentration and the solution was incubated at 4° C. for 10 minutes with rotation. To further dissociate protein complexes and non-cross-linked RNAs and halt the activities of RNase I, SDS was added to a 0.3% final concentration and the mixture was incubated with shaking at 750 r.p.m. for 15 minutes at 65° C. After letting the solution mixture cool down to room temperature, the cysteine residues were biotinylated by adding to the lysate 1:5 volume of 25 mM (13.56 mg/ml) EZ-Link Iodoacetyl-PEG2-Biotin (IPB) (Pierce Protein Research Products) and rotating the mixture in the dark for 90 minutes at room temperature. The biotinylation reaction was quenched by adding DTT to a 5 mM concentration and incubating at room temperature for 15 minutes. To neutralize SDS, Triton X-100 (Sigma) was added to a 2% final concentration and incubated at 37° C. for 15 minutes. The lysate sample was dialyzed in a 20 kD cutoff Slide-A-Lyzer Dialysis Cassette (Pierce Protein Research Products, Rockford, Ill.) at room temperature in 2 liters of dialysis buffer (20 mM Tris-HCl pH 7.5, 1 mM EDTA) to remove excess biotin. The dialysis buffer was changed at least thrice, once every 2 hours. Following dialysis, the lysate was transferred to a 15 ml tube.
The protein-RNA complexes were immobilized at low bead-surface density on streptavidin-coated beads (800 μl MyOne Streptavidin T1 beads, which is equivalent to 200 cm2 surface area). The advantages of immobilization on a solid surface include: (i) reduction of random intermolecular ligations between non-cross-linked oligonucleotides (R. Kalhor, H. Tjong, N. Jayathilaka, F. Alber, L. Chen, Nat Biotech 30, 90 (2012)), (ii) efficient buffer exchange, and (iii) removal of non-physiologic interactions by stringent washes.
800 μl MyOne T1 beads were washed thrice with PBST (PBS with 0.1% Tween-20), resuspended in 800 μl of the same buffer and transferred into the biotinylated lysate. The bead-lysate suspension was rotated at room temperature for 45 minutes. During this incubation, 200 μl of neutralized 25 mM IPB was prepared by adding equal molarity of DTT and incubating at room temperature for at least 30 minutes. The beads were immobilized using a magnetic stand and most of the supernatant was aspirated out, leaving behind 4 ml of the supernatant. The beads were resuspended in the left-over solution followed by the addition of 200 μl of neutralized IPB. IPB was used to saturate excess unbound streptavidin after immobilization, which could otherwise interfere with the subsequent step involving the biotin-tagged RNA linker. To remove the undesired RNAs non-covalently attached to proteins or via nonspecific protein-protein interactions (S. C. Kwon et al., Nat Struct Mol Biol 20, 1122 (2013); A. Castello et al., Nat. Protocols 8, 491 (2013)), the beads were washed three times with ice-cold denaturing washing buffer I (50 mM Tris-HCl pH 7.5, 0.5% lithium dodecyl sulfate, 500 mM lithium chloride, 7 mM EDTA, 3 mM EGTA, 5 mM DTT) with rotation at 4° C. for 5 minutes in every wash. Then the beads were washed with ice-cold high-salt wash buffer II (50 mM Tris-HCl pH 7.5, 1 M NaCl, 0.1% SDS, 1% IGEPAL CA-630, 1% sodium deoxycholate, 5 mM EDTA, 2.5 mM EGTA, 5 mM DTT), wash buffer III (1×PBS, 1% Triton X-100, 1 mM EDTA, 1 mM DTT), and PNK wash buffer (20 mM Tris-HCl pH 7.5, 10 mM MgCl2, 0.2% Tween-20, 1 mM DTT); each buffer two times with rotation for 5 minutes at 4° C. during the second wash.
Next, a biotin-tagged RNA linker (5′-rCrUrArG/iBiodT/rArGrCrCrCrArUrGrCrArArUrGrCrGrArGrGrA) (SEQ ID NO: 1) was attached to the RNA's 5′ end. The biotin-tagged linker serves as a selection marker to enrich for the ligated RNAs; it also delineates a clear boundary to unambiguously split any sequencing read that covered a ligation junction. The 5′-end of the RNA linker was temporarily “blocked” from ligation to avoid linker circularization or concatenation. This was achieved by synthesizing the linker with a 5′-OH group, which is incompatible with ligation but can be “re-activated” by phosphorylation. However, RNase I leaves a 5′-OH end on the RNA, which is incompatible with linker ligation; thus the RNA 5′ end was first phosphorylated with T4 Polynucleotide Kinase (PNK), 3′ phosphatase minus (NEB). The wild-type T4 PNK was not used due to its additional 3′ phosphatase activities, which modify the 3′-ends of RNAs from 3′-P into 3′-OH, making them susceptible to self-ligation.
This was achieved by removing wash buffer and subsequently resuspending the beads in 100 μl of PNK reaction mixture (73 μl of RNase-free water, 10 μl of 10×PNK buffer, 10 μl of 10 mM ATP, 5 μl of 10 U/μl T4 PNK (3′ phosphatase minus) (NEB), 2 μl of RNAsin Plus (Promega)) and incubating for 1 hour at 37° C. with intermittent shaking at 1,200 r.p.m. for 5 seconds every 2 minutes. The beads were washed with wash buffer I, II, III and PNK, each buffer two times with rotation for 5 minutes at 4° C. in the second wash. The ice-cold washes were used to eliminate any left-over PNK, which can phosphorylate the RNA linker and thereby allow it to be ligated to the 3′-end of RNAs. After the wash buffer was removed, the biotin-tagged RNA linker was ligated to RNA 5′-ends by adding 160 μl RNA ligation reaction mixture which contained 2 μl RNAsin Plus (Promega), 16 μl of 10 mM ATP, 16 μl of 10×RNA ligase buffer, 16 μl of 1 mg/ml BSA, 30 μl of 20 μM biotin-labelled linker, 64 μl of 50% PEG8000 (NEB), 16 μl of 10 U/μl T4 RNA ligase 1 (NEB). Ligation was carried out at 37° C. for 1 hour and at 16° C. overnight with intermittent shaking at 1,200 r.p.m. for 15 seconds every 2 minutes. BSA was added to enhance the activities of T4 RNA ligase and prevent bead aggregation. PEG was used to enhance intermolecular ligation by increasing the concentrations of the donor and the acceptor ends (D. B. Munafó, G. B. Robb, RNA 16, 2537 (2010)).
Next, the beads were washed twice with ice-cold wash buffer II, once with ice-cold wash buffer III, and once with PNK wash buffer. To prepare for proximity ligation, the RNA 3′-end was first dephosphorylated using the 3′ phosphatase activities of T4 PNK, leaving a 3′-hydroxyl group (I. Huppertz et al., Methods 65, 274 (2014)). After discarding wash buffer, the beads were mixed with 73 μl of RNase-free water, 20 μl of 5×PNK buffer pH 6.5 (350 mM Tris-HCl pH 6.5, 50 mM MgCl2, 10 mM DTT), 5 μl of 10 U/μl T4 PNK (3′ phosphatase minus) (NEB), 2 μl of RNAsin Plus (Promega) and incubated for 20 minutes at 37° C. with intermittent shaking at 1,200 r.p.m. for 5 seconds every 2 minutes. The beads were washed once with PNK wash buffer and the 5′-end of the biotin-labelled linker was phosphorylated in 100 μl of PNK reaction mixture (73 μl of RNase-free water, 10 μl of 10×PNK buffer, 10 μl of 10 mM ATP, 5 μl of 10 U/μl T4 PNK (3′ phosphatase minus) (NEB), 2 μl of RNAsin Plus (Promega)) for 1 hour at 37° C. with intermittent shaking. Following phosphorylation, the beads were washed twice in PNK wash buffer and proximity ligation was then performed under extremely diluted conditions in a 15 ml total volume reaction (8.9 ml of RNase-free water, 1.5 ml of 10 mM ATP, 1.5 ml of 10×RNA ligase buffer, 75 μl of 20 mg/ml BSA (NEB), 25 μl of 1 M DTT, 2.25 ml of 100% DMSO, 0.75 ml of 10 U/μl T4 RNA ligase 1 (NEB)) to minimize inter-complex ligations. The proximity ligation was carried out at 37° C. for 1 hour and at 16° C. overnight with continuous rotation. Dimethylsulfoxide (DMSO) was added to a 15% (v/v) final concentration to stimulate ligation of highly structured RNAs.
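As a quick arithmetic check of the proximity ligation recipe, the listed component volumes can be summed to confirm the 15 ml total and the 15% (v/v) DMSO. The following is only an illustrative sketch; the dictionary keys are descriptive labels chosen here, not identifiers from the protocol.

```python
# Proximity-ligation reaction components in millilitres, as listed in the text.
components_ml = {
    "RNase-free water": 8.9,
    "10 mM ATP": 1.5,
    "10x RNA ligase buffer": 1.5,
    "20 mg/ml BSA": 0.075,
    "1 M DTT": 0.025,
    "100% DMSO": 2.25,
    "10 U/ul T4 RNA ligase 1": 0.75,
}

total_ml = sum(components_ml.values())

# Final concentrations follow from simple dilution: stock x (volume / total).
dmso_percent = 100.0 * components_ml["100% DMSO"] / total_ml
atp_final_mM = 10.0 * components_ml["10 mM ATP"] / total_ml

print(f"total volume: {total_ml:.2f} ml")
print(f"DMSO: {dmso_percent:.1f}% (v/v)")
print(f"ATP: {atp_final_mM:.1f} mM")
```

Under these volumes the stated total and DMSO percentage are internally consistent, and ATP is diluted tenfold from its stock.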
The following day, ligation was stopped by adding EDTA to a final concentration of 25 mM and rotating for 15 minutes at 4° C. to prevent inter-molecular ligation from happening as the beads were collected on the wall of the tube. The beads were washed once in PBST. The protein-RNA complexes were next eluted from streptavidin beads twice in 100 μl of Elution Buffer (100 mM Tris-HCl pH 7.5, 50 mM NaCl, 10 mM EDTA, 1% SDS, 10 mM DTT, 2.5 mM D-biotin (Invitrogen)) by heating to 95° C. for 5 minutes. The resulting solutions were combined, mixed with 50 μl of 800 U/ml Proteinase (NEB) and incubated at 55° C. for 2 hours. The mixture was then topped-up with RNase-free water to the final volume of 400 μl. RNAs were extracted by adding 400 μl of phenol:chloroform:isoamyl alcohol (125:24:1, pH 4.5) (Ambion) and incubating at 37° C. for 20 minutes with shaking at 1000 r.p.m. The mixture was transferred into a 2 ml MaXtract high density phase lock gel tube (Qiagen) and centrifuged at 16,000×g for 5 minutes at room temperature. Residual phenol was removed by adding 400 μl of chloroform to the same MaXtract tube and centrifuging at 16,000×g for 5 minutes at room temperature. Following centrifugation, the aqueous phase was transferred into a new tube and RNAs were precipitated by adding 1:9 volume of 3 M sodium acetate pH 5.2, 1.5 μl of glycoblue (Ambion) together with 1 ml of 1:1 ethanol:isopropanol and incubating at −20° C. overnight. The precipitated RNA was pelleted by centrifugation at 21,000×g for 30 minutes at 4° C. After discarding the supernatant, the pellet was washed twice with 80% ethanol and air-dried until the ethanol had completely evaporated. The purified RNAs at this stage were a mixture of RNAs without linkers (RNA1 or RNA2), RNAs ligated with linkers but not proximity-ligated with other RNAs (5′-linker-RNA2), and the desirable chimeric constructs in the form of 5′-RNA1-linker-RNA2. RNA1 can be depleted by selection of the biotin tagged linker.
The non-informative 5′-linker-RNA2 was therefore depleted as well, in the next reaction with T7 exonuclease.
6.1. Removing biotin from terminal linkers (5′-linker-RNA2). This was based on the RNase H activity of T7 exonuclease, which not only removes 5′ mononucleotides from duplex DNA but also exerts exonucleolytic activity on the RNA strand of an RNA-DNA hybrid (K. Shinozaki, O. Tuneko, Nucleic Acids Research 5, 4245 (1978)). A complementary DNA oligonucleotide (5′-T*C*G*C*ATTGCATGGGCTACTAGCAT (SEQ ID NO: 2), where * denotes the phosphorothioate bond to block its digestion by T7 exonuclease (T. T. Nikiforov, R. B. Rendle, M. L. Kotewicz, Y. H. Rogers, Genome Research 3, 285 (1994))) was annealed to the RNA linker, creating a double stranded DNA-RNA hybrid between the RNA linker and the complementary DNA strand. The complementary DNA strand was designed so that, after annealing, the 5′-end of the RNA linker was recessed while the 3′-end of the DNA strand was protruding. The annealed products were then treated with T7 exonuclease.
The RNA pellet was resuspended in 17 μl of RNase-free water, 4 μl of 10×NEBuffer4, 7 μl of 100 μM complementary DNA oligo. Annealing was performed by denaturing at 70° C. for 5 minutes and then slowly ramping down the temperature (at −0.1° C./s) to 60° C., incubating at 60° C. for another 5 minutes before slowly cooling down (−0.1° C./s) to 37° C. and incubating at 37° C. for 15 minutes. The annealed mixture was then mixed with 8 μl of 10 U/μl T7 exonuclease (NEB), 4 μl of 1 mg/ml BSA and incubated at 37° C. for 30 minutes and another 30 minutes at 30° C. The DNA oligonucleotides, as well as any contaminating genomic DNA, were removed using rigorous TURBO DNase treatment: 44 μl of RNase-free water, 10 μl of 10×TURBO DNase buffer, 6 μl of TURBO DNase (Invitrogen) were added and the resulting mixture was incubated at 37° C. for 1 hour. DNase-treated RNA was purified by phenol:chloroform extraction and ethanol precipitation as described above.
6.2. Removal of rRNAs by antibody-based depletion of RNA-DNA hybrid (GeneRead rRNA Depletion Kit (Qiagen)) in the ES-2 and MEF samples. rRNA was removed according to the manufacturer's instructions with the following modifications. Instead of cleaning up depleted RNA by RNeasy MinElute spin columns, which would remove RNAs shorter than 200 nucleotides, excess rRNA capture probes were removed by rigorous DNase-treatment. DNase-treated RNA was also purified by phenol:chloroform extraction and ethanol precipitation as described above.
6.3. RNA shearing. Following ethanol precipitation, RNA was fragmented into a size range of 150-400 bp, optimal for sequencing by Illumina HiSeq, by using the RNase III fragmentation kit according to the manufacturer's protocol. Fragmented RNA was purified by 2.2×SPRISelect beads (Beckman Coulter Genomics) and ethanol precipitated as described above.
6.4. Ligation with reverse transcription adapter. Next, the RNAs were ligated with a 3′ reverse transcription (RT) adapter (/5rApp/AGATCGGAAGAGCGGTTCAG/3ddC/ (SEQ ID NO: 3)) that served as a primer binding site for the RT reaction. Following ethanol precipitation, the RNA pellet was resuspended in 20 μl of ligation reaction mixture: 1 μl RNAsin Plus (Promega), 2 μl of 10×RNA ligase buffer, 7 μl of 20 μM pre-adenylated L3-App adapter, 8 μl of 50% PEG8000 (NEB), 2 μl of 200 U/μl T4 RNA ligase 2, truncated KQ (NEB). The reaction was incubated overnight at 16° C.
6.5. Reverse transcription. Following ligation, RNA was purified by 2× SPRISelect beads (Beckman Coulter Genomics) and eluted in RNase-free water. The following RT reaction is described for 2 μg of RNA and was scaled up accordingly for higher amounts of RNA. For each experiment or replicate, a different RT primer containing an individual experimental barcode sequence was used. Each RT primer has the form of 5′-/5Phos/NNXXXXNNNNAGATCGGAAGAGCGTCGTGgatcCTGAACCGCTCTTCCGATCT (SEQ ID NO: 4). According to this scheme, the first read of every sequencing read pair contains a barcode that takes the configuration of NNNNXXXXNN (SEQ ID NO: 5) (reverse complement of that from the RT primer), where the Ns are a random 6 nt barcode for removing PCR duplicates (G. B. Loeb et al., Molecular cell 48, 760 (Dec. 14, 2012); Z. Wang et al., PLoS Biol 8, e1000530 (2010); J. Konig et al., Nature structural & molecular biology 17, 909 (July, 2010); S. W. Chi, J. B. Zang, A. Mele, R. B. Darnell, Nature 460, 479 (Jul. 23, 2009)). Any two pair-end reads with identical mapped locations and random barcodes would be counted as only one. The XXXX is a fixed 4 nt sample barcode for multiplexed sequencing (AGGT for ES-1, CGCC for ES-2, CATT for ES-indirect, CGCC for MEF). Any two distinct 4 nt sample barcodes differ by three nucleotides to avoid potential confusion from mutations or sequencing errors.
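The barcode scheme above can be illustrated with a short Python sketch. It checks the pairwise distance between the listed sample barcodes, splits the NNNNXXXXNN tag of a forward read into its random and sample parts, and collapses reads that share a mapped location and random barcode. The function names and the (location, read) record layout are hypothetical choices made for this illustration and are not taken from RNA-HiC-tools.

```python
from itertools import combinations

# Distinct sample barcodes listed in the text (ES-2 and MEF share CGCC).
SAMPLE_BARCODES = {"ES-1": "AGGT", "ES-2/MEF": "CGCC", "ES-indirect": "CATT"}


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def parse_barcode(read1):
    """Split the first 10 nt (NNNNXXXXNN) into (random barcode, sample barcode)."""
    tag = read1[:10]
    return tag[:4] + tag[8:10], tag[4:8]


def dedupe(records):
    """Keep one record per (mapped location, random barcode) combination.

    Each record is a hypothetical (location, read1_sequence) tuple.
    """
    seen = set()
    kept = []
    for location, read1 in records:
        key = (location, parse_barcode(read1)[0])
        if key not in seen:
            seen.add(key)
            kept.append((location, read1))
    return kept


min_distance = min(
    hamming(a, b) for a, b in combinations(SAMPLE_BARCODES.values(), 2)
)
print("minimum pairwise barcode distance:", min_distance)
```

With a minimum distance of three, a single substitution in the sample barcode cannot convert one valid barcode into another.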
For cDNA synthesis, 9 μl of RNA was mixed with 1 μl 10 mM dNTPs and 1 μl of 50 μM RT primer. The mixture was heated at 65° C. for 5 minutes and snap-cooled in ice for at least 2 minutes. 4 μl of 5× First-Strand buffer (Invitrogen), 1 μl DTT 0.1 M, 1 μl RNasin Plus, 1 μl of 10 mg/ml T4 gene 32 protein (NEB) were added. The resulting mixture was incubated at 50° C. for 2 minutes before adding reverse transcriptase enzyme to minimize mispriming. Then 2 μl of 200 U/μl Superscript III reverse transcriptase (Invitrogen) was added to the solution. The RT reaction mixture was then incubated at 50° C. for 45 minutes, 55° C. for 20 minutes followed by 4° C. hold. Here, the heat-inactivation of reverse transcriptase enzyme was omitted in order to preserve the RNA-cDNA hybrids.
Streptavidin-biotin affinity purification was used to enrich for chimeric RNA-DNA hybrids. This pull-down was carried out after the second RNA fragmentation and reverse transcription in order to allow a substantial fraction of the sequencing read pairs to cover the RNA-linker or linker-RNA junctions, in one end of the read pair.
Specifically, 50 μl of MyOne C1 beads (Invitrogen) was prepared by washing twice with 1× Tween B&W buffer (5 mM Tris-HCl pH 8.0, 0.5 mM EDTA, 1 M NaCl, 0.05% Tween) and once with 1×B&W buffer (5 mM Tris-HCl pH 8.0, 0.5 mM EDTA, 1 M NaCl). The beads were then resuspended with 100 μl of 2×B&W buffer (10 mM Tris-HCl pH 8.0, 1 mM EDTA, 2 M NaCl). The reverse transcription mixture was topped up with RNase-free water to the final volume of 100 μl before being combined with 100 μl C1 bead suspension and incubated at room temperature for 30 minutes with rotation. The beads were reclaimed and washed thrice with 1×B&W buffer before being transferred into a new tube, followed by washing once with TE buffer pH 8.0. Next, the cDNA strand was released from streptavidin beads by completely digesting the RNA strand in 50 μl RNase H elution mixture (39.5 μl of RNase-free water, 5 μl 10×RNase H reaction buffer, 0.5 μl 10% Tween-20, 5 μl 5 U/μl RNase H (NEB)) for 1 hour at 37° C. The beads were collected on the tube wall using a magnetic concentrator and the supernatant was collected in a new tube for subsequent manipulations. RNase H was inactivated by heating at 70° C. for 20 minutes. cDNA was purified by 2.2× SPRISelect beads (Beckman Coulter Genomics) (v/v).
Considering the UV-induced cross-link site sometimes stalls reverse transcription, resulting in truncated cDNAs that lack the 5′ adapter (Y. Sugimoto et al., Genome Biology 13, R67 (2012)), a circularization strategy was adopted that allowed for constructing sequencing libraries even from truncated cDNAs (I. Huppertz et al., Methods 65, 274 (2014)) (
8.1. Circularization. cDNA was circularized by CircLigase II (Epicentre). Briefly, cDNA was eluted from SPRISelect beads in 20 μl CircLigase reaction mixture (12 μl of sterile water, 2 μl of CircLigase II 10× reaction buffer, 1 μl of 50 mM MnCl2, 4 μl of 5M Betaine, 1 μl of 100 U/μl CircLigase II (Epicentre)) and incubated for 2 hours at 60° C. CircLigase II was inactivated by incubating the reaction at 80° C. for 10 minutes.
8.2. Relinearization. A complementary DNA oligo was annealed to the RT primer, generating a short double-stranded region suitable for BamHI restriction. This strategy also prevents BamHI activity on other endogenous BamHI restriction sites. Next, BamHI was applied, creating linear cDNAs with adapters at both 5′ and 3′ ends to prime subsequent PCR amplification. Specifically, oligo annealing mixture (43 μl water, 6 μl 10× FastDigest Buffer (Fermentas), 5 μl 20 μM Cut_oligo (5′-GTTCAGGATCCACGACGCTCTTCAAAA/3InvdT/) (SEQ ID NO: 8)) was added into the CircLigase II reaction. Annealing was carried out by heating to 95° C. for 2 minutes, followed by 71 cycles of 20 seconds each, starting from 95° C. and decreasing the temperature by 1° C. after every cycle down to 25° C. and holding at 25° C. Then 6 μl of FastDigest BamHI (Fermentas) was added and incubated at 37° C. for 30 minutes. Re-linearized cDNA was purified by 2×SPRISelect beads (Beckman Coulter Genomics) (v/v) and eluted in nuclease free water.
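The step-down annealing program described above can be written out explicitly to confirm the cycle count. This is a minimal sketch; the function name and the (temperature, seconds) tuple layout are assumptions made for illustration and do not correspond to any particular thermal cycler's programming interface.

```python
def annealing_ramp(start_c=95, end_c=25, step_c=1, seconds_per_cycle=20):
    """Return a list of (temperature in deg C, hold time in s) steps,
    stepping down from start_c to end_c inclusive."""
    temperatures = range(start_c, end_c - 1, -step_c)
    return [(t, seconds_per_cycle) for t in temperatures]


schedule = annealing_ramp()
total_seconds = sum(hold for _, hold in schedule)

print("number of cycles:", len(schedule))
print("first step:", schedule[0], "last step:", schedule[-1])
print("ramp duration (min):", round(total_seconds / 60, 1))
```

Stepping from 95° C. to 25° C. inclusive in 1° C. decrements gives the 71 cycles stated in the text.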
8.3. First PCR pre-amplification and size selection. Single-stranded cDNA was first pre-amplified by PCR using a truncated version of PCR primers (forward primer DP5, 5′-CACGACGCTCTTCCGATCT (SEQ ID NO: 9); reverse primer DP3, 5′-CTGAACCGCTCTTCCGATCT (SEQ ID NO: 10)) with a small number of cycles (6 cycles). It was found that the final libraries were less prone to be contaminated with undesirable smaller size fragments (primer-dimers, products which contain only the barcode and/or RNA linker) by doing size selection at this stage.
Six cycles of PCR were performed in a 40 μl reaction which contained 20 μl of NEBNext High-Fidelity 2×PCR Master Mix (NEB), 0.625 μM of each DP5/DP3 primer using the following temperatures: 1 cycle of initial denaturation at 98° C. for 30 seconds; 6 cycles of amplification with 98° C. for 10 seconds, 65° C. for 30 seconds, 72° C. for 30 seconds; followed by final extension at 72° C. for 5 minutes; and hold at 4° C. The PCR product was purified by 1.8×SPRISelect beads (v/v) and size-selected using E-gel EX 2% Agarose gels (Invitrogen). The DNA fragments between 150 bp and 350 bp were excised from the gel and purified using MinElute gel extraction kit (Qiagen).
8.4. rRNA removal by duplex-specific nuclease (DSN) approach (H. Yi et al., Nucleic Acids Research 39, e140 (2011)) (ES-1, ES-indirect). To reduce rRNA cDNAs in the ES-1 and ES-indirect libraries, ss-cDNA was also pre-amplified using the truncated PCR primers DP5/DP3. However, the PCR cycle number was increased until 80-100 ng of cDNA could be obtained after purification by 1.8×SPRISelect beads (Beckman Coulter Genomics) (v/v). The size selection by agarose gel was skipped as this would largely reduce the amount of DNA. The eluted DNA from SPRISelect beads was mixed with 4.5 μl hybridization buffer (2 M NaCl, 200 mM HEPES, pH 8.0) and sterile water (if necessary) to a final volume of 18 μl. The resulting mixture was denatured at 98° C. for 2 minutes and re-annealed at 68° C. for 5 hours on a thermal cycler. While the reaction mix tube was still in the thermal cycler, 20 μl of 68° C.-preheated 2×DSN buffer (Axxora) was added to the reaction mix, mixed well by pipetting up and down 10 times, and the reaction was incubated for 10 minutes at 68° C. Then 2 μl of 1 U/μl DSN enzyme (Axxora) was added, mixed, and incubated at 68° C. for 25 more minutes. The reaction was stopped by adding 40 μl of 2×DSN stop solution (Axxora) to the reaction mix tube, mixing well and transferring the tube to ice. The reaction mixture was then purified using 1.8×SPRISelect beads.
8.5. Final PCR amplification. PCR amplification was performed on the DNA produced from previous steps using full-length PCR primers PE 1.0 and 2.0 (Illumina). The number of PCR cycles was carefully titrated by running pilot PCRs with small aliquots of DNA to avoid over-amplification. The PCR products were purified by 1.8×SPRISelect beads (v/v) and size-selected for fragments of 250-550 bp (120-420 bp insert plus ˜130 bp, the combined length of Illumina PE 1.0/2.0). Final libraries were quantified by Qubit (Invitrogen) and qPCR, quality-checked by Bioanalyzer (Agilent Technologies) and submitted for paired-end sequencing on the Illumina HiSeq platform.
The custom-designed RNA and DNA oligonucleotides used in the procedure are:
Biotinylated RNA linker (RNase-free HPLC-purified from IDT):
Complementary DNA strand with RNA linker (RNase-free HPLC-purified from Sigma):
Pre-adenylated RT adapter (RNase-free HPLC-purified from IDT):
RT primers (adapted from (I. Huppertz et al., Methods 65, 274 (2014))) (RNase-free HPLC-purified from Sigma):
RT Primer for the ES-1 sample:
RT Primer for the ES-2 and MEF samples (sequenced on different lanes):
RT Primer for the ES-indirect sample:
Cut_oligo (HPLC-purified from IDT)
BamHI restriction site is underlined and in bold print.
Truncated PCR Forward Primer DP5 (HPLC-purified from IDT):
Truncated PCR Reverse Primer DP3 (HPLC-purified from IDT):
Illumina PE PCR Forward Primer 1.0 (PAGE-purified from Sigma):
Illumina PE PCR Reverse Primer 2.0 (PAGE-purified from Sigma):
RNA-HiC-tools is a package of command-line tools for analyses of RNA Hi-C data. It is written in Python and R and is version controlled by GitHub. The full documentation is at http://systemsbio.ucsd.edu/RNA-Hi-C. The pipeline takes pair-end sequencing reads as input (
The forward read (Read1 in
2. Assigning Multiplexed Sequencing Reads into Corresponding Experimental Samples
The tool ‘split_library_pairend.py’ assigns each pair-end read into a sample by matching the sample barcode in each read with those in the list of sample barcodes (a user input text file), generates a fastq/fasta file for the reads assigned to each sample, as well as a fastq/fasta file for the unassigned reads.
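The idea behind this assignment step can be sketched in a few lines of Python. This is a simplified in-memory illustration and not the actual implementation or command-line interface of ‘split_library_pairend.py’; the function name, the use of plain sequence tuples instead of fastq records, and the barcode offset (positions 5-8 of the forward read, following the NNNNXXXXNN layout described above) are assumptions.

```python
from collections import defaultdict


def assign_samples(read_pairs, barcode_to_sample, start=4, length=4):
    """Group read pairs by the sample barcode found in the forward read.

    read_pairs: iterable of (read1, read2) sequence strings.
    barcode_to_sample: dict mapping a 4 nt barcode to a sample name.
    Pairs whose barcode is not in the dict are stored under 'unassigned'.
    """
    bins = defaultdict(list)
    for read1, read2 in read_pairs:
        barcode = read1[start:start + length]
        sample = barcode_to_sample.get(barcode, "unassigned")
        bins[sample].append((read1, read2))
    return dict(bins)


# Toy example with made-up reads; barcodes are those listed for ES-1 and ES-indirect.
barcodes = {"AGGT": "ES-1", "CATT": "ES-indirect"}
pairs = [
    ("ACGTAGGTTTGGGGAAAA", "TTTTCCCC"),
    ("GGGGCATTAACCCCTTTT", "AAAAGGGG"),
    ("GGGGTTTTAACCCCTTTT", "AAAAGGGG"),
]
assigned = assign_samples(pairs, barcodes)
for sample, members in assigned.items():
    print(sample, len(members))
```

A production tool would additionally need to handle quality strings and file output, and possibly tolerate barcode mismatches; none of that is modeled here.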
3. Recovering the cDNAs in the Sequencing Library
This step identifies the overlapping regions of the two ends of every read pair, if any. It also recovers the entire sequences of the cDNAs in the sequencing library, whenever possible.
If an overlap existed, this read pair was sequenced from a cDNA between 100 bp and 200 bp (not counting the lengths of P5 and P7) (Type 2,
If the cDNA was shorter than 100 bp, the presence of the P5 and the P7 primers at the two ends of the cDNA was verified (Type 1). Those that did not contain P5 or P7 were discarded (Type 4).
Without an overlap, the read pair was sequenced from a cDNA longer than 200 bp, whose sequence can only be partially recovered (Type 3,
This function is achieved by ‘recoverFragment.py’, which uses local alignment to identify the overlapping regions. When the overlap was small (15 bp or less) compared to read length (100 bp on each end), local alignment could be insensitive. To overcome this insensitivity, ‘recoverFragment.py’ collects the read pairs without identifiable overlaps after the first alignment (ALIGN1,
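As a non-limiting illustration of the overlap-based recovery described in this step, a simplified sketch is given below. It substitutes exact suffix/prefix matching for the local alignment used by ‘recoverFragment.py’, so it does not tolerate sequencing errors; the function names and the minimum overlap of 10 bases are assumptions of the sketch.

```python
import random

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def recover_fragment(read1, read2, min_overlap=10):
    """Return the merged cDNA if the two reads overlap, otherwise None.

    Exact matching stands in for local alignment (a simplification).
    """
    mate = reverse_complement(read2)  # bring read2 onto the forward strand
    longest = min(len(read1), len(mate))
    for k in range(longest, min_overlap - 1, -1):  # prefer the longest overlap
        if read1[-k:] == mate[:k]:
            return read1 + mate[k:]
    return None

# A synthetic 150 bp cDNA sequenced with 2 x 100 bp reads (Type 2 case).
rng = random.Random(0)
cdna = "".join(rng.choice("ACGT") for _ in range(150))
read1 = cdna[:100]
read2 = reverse_complement(cdna[-100:])
recovered = recover_fragment(read1, read2)
```

A read pair for which the function returns None would correspond to a cDNA longer than twice the read length, or to an overlap too short to be detected.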
4. Parsing the Chimeric cDNAs
This step categorizes the cDNAs based on their configurations (
Four linker-containing categories, including:
Hereafter, all analyses were based on the RNA1-Linker-RNA2 type of read pairs. First, any cDNA containing fewer than 15 bp on either the RNA1 or RNA2 side of the linker was discarded, because a sequence of 15 bp or less is unlikely to map uniquely to the genome in the mapping step. Then the two RNA fragments on each side of the linker (RNA1 and RNA2) were separately mapped to the mouse genome mm9/NCBI37 using Bowtie version 0.12.7 (B. Langmead, C. Trapnell, M. Pop, S. L. Salzberg, Genome Biology 10, (2009)), and parameters -f -n 1 -l 15 -e 200 -p 9 -S. This step, implemented in ‘Stitch-seq_Aligner.py’, outputs the read pairs where both RNA1 and RNA2 were uniquely mapped to the genome.
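For illustration only, the splitting of a chimeric cDNA at the linker and the 15 bp length filter may be sketched as follows. The linker sequence shown is a hypothetical placeholder, and exact string matching is assumed in place of whatever linker detection the actual tools perform.

```python
def split_on_linker(cdna, linker, min_len=15):
    """Split a cDNA into (RNA1, RNA2) around the first linker occurrence.

    Returns None when no linker is found or when either side is shorter
    than min_len bases.
    """
    start = cdna.find(linker)
    if start == -1:
        return None
    rna1 = cdna[:start]
    rna2 = cdna[start + len(linker):]
    if len(rna1) < min_len or len(rna2) < min_len:
        return None
    return rna1, rna2

HYPOTHETICAL_LINKER = "GGTTCCAAGG"  # placeholder, not the real linker
kept = split_on_linker("A" * 20 + HYPOTHETICAL_LINKER + "C" * 18,
                       HYPOTHETICAL_LINKER)
dropped = split_on_linker("A" * 10 + HYPOTHETICAL_LINKER + "C" * 18,
                          HYPOTHETICAL_LINKER)
```

The two fragments returned for a retained cDNA would then be mapped separately, as described above.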
A potentially more sensitive mapping method was tested using the “--sensitive-local” mode of Bowtie2 (B. Langmead, S. L. Salzberg, Nat Methods 9, 357 (April, 2012)), with parameters “-D 15 -R 2 -N 0 -L 20 -i S,1,0.75”. This “multiseed alignment” used 20 bp seeds, allowing for 0 mismatches in any seed, 9 bp intervals between seeds (ceil(1+0.75×√100)=9 for 100 bp reads), up to 15 consecutive seed extension attempts, and up to 2 rounds of “re-seeding”. This alternative strategy identified slightly fewer unique alignments than Bowtie 0.12.7. The Bowtie 0.12.7 results were therefore passed into the next steps.
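The seed interval stated above can be checked with a short calculation. The sketch below evaluates the square-root interval function for a given read length; rounding up with a ceiling follows the expression given in the text, and the function name is an illustrative assumption.

```python
import math

def seed_interval(read_length, constant=1.0, coefficient=0.75):
    """Seed interval for an 'S,<constant>,<coefficient>' setting,
    i.e. constant + coefficient * sqrt(read_length), rounded up."""
    return math.ceil(constant + coefficient * math.sqrt(read_length))

interval_100bp = seed_interval(100)  # 1 + 0.75 * 10 = 8.5, rounded up to 9
print(interval_100bp)
```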
The annotations were retrieved from Ensembl (release 67, mouse NCBIM37), including the genes of mRNAs, lincRNAs, rRNAs, snRNAs, snoRNAs, miRNAs, misc_RNAs, tRNAs, and transposons. The different genomic copies of the same transposon were considered as different genes in this analysis. The reads mapped to rRNAs were removed from further analysis. The number of uniquely aligned reads (from either RNA1 or RNA2 of the RNA1-Linker-RNA2 type) was counted on every gene. Any gene with a read count less than 5 was filtered out. Next, the association between any two genes was tested with Fisher's exact test. The null hypothesis was that gene A and gene B independently contributed to the sequencing reads. The alternative hypothesis was that their contributions to read counts were associated. cA and cB were denoted as the read counts for gene A and gene B, respectively, and IA,B as the read count of co-appearance, where the two genes co-appeared on the same read pair. A Fisher's exact test was carried out on each gene pair, with IA,B, cA, cB, cA, cB,
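As a hedged illustration of such an association test, the sketch below computes a one-sided Fisher's exact test p-value from the hypergeometric distribution using only the Python standard library. The construction of the 2×2 table from the co-appearance count, the two marginal counts, and a total count n_total, as well as the choice of a one-sided (enrichment) alternative, are assumptions of the sketch and are not asserted to match RNA-HiC-tools.

```python
from math import comb

def fisher_exact_greater(i_ab, c_a, c_b, n_total):
    """One-sided Fisher's exact test p-value for co-appearance enrichment.

    i_ab    : observations containing both A and B
    c_a     : observations containing A
    c_b     : observations containing B
    n_total : total observations (assumed parameter of this sketch)
    Returns P(X >= i_ab) for X ~ Hypergeometric(n_total, c_a, c_b).
    """
    upper = min(c_a, c_b)
    tail = sum(comb(c_a, k) * comb(n_total - c_a, c_b - k)
               for k in range(i_ab, upper + 1))
    return tail / comb(n_total, c_b)

# Toy example: A and B each seen 5 times among 10 observations,
# always together. Only one table is this extreme, so p = 1/252.
p_toy = fisher_exact_greater(5, 5, 5, 10)
```

The resulting p-values for all tested pairs would then be subjected to a multiple-testing correction.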
The RNA interaction site was defined as a continuous RNA segment that frequently contributed to RNA-RNA interactions. RNA interaction sites were inferred from RNA Hi-C data as continuous RNA segments with multiple overlapping reads and frequent co-appearance (proximity ligation) with other RNAs. First, any continuous RNA segment covered by 5 or more uniquely aligned reads was identified as a candidate interaction site. Second, the association between any two candidate sites was tested with Fisher's exact test. The null hypothesis was that candidate sites A and B independently contributed to the sequencing reads. The alternative hypothesis was that their contributions to read counts were associated. cA and cB were denoted as the read counts for candidate sites A and B, respectively, and IA,B as the read count of co-appearance, where the two sites co-appeared on the same read pair. A Fisher's exact test was carried out on each site pair, with IA,B, cA, cB, cA, cB,
The tool ‘Plot_interaction.py’ was developed for visualizing RNA interaction sites and the ligation events of these sites (
The tool ‘Plot_Circos.R’ provides a global view of the RNA-RNA interactome (
The binding energies between two RNA interaction sites were calculated by the DuplexFold program from RNAstructure version 5.6 (S. Bellaousov, J. S. Reuter, M. G. Seetin, D. H. Mathews, Nucleic Acids Res 41, W471 (July, 2013)). The base pairing between two interaction sites was determined by MiRanda version 3.3a (D. Betel, A. Koppal, P. Agius, C. Sander, C. Leslie, Genome Biol 11, (2010)).
For every read pair in the RNA1-Linker-RNA2 category (output of Step 4), the PhyloP conservation scores (G. M. Cooper et al., Genome Res 15, 901 (July, 2005)) were obtained for two 1000 bp genomic regions, one centered at the ligation junction of RNA1-Linker and the other centered at the ligation junction of Linker-RNA2. The average PhyloP scores of all the RNA1-Linker-RNA2 type read pairs were plotted. As a control, average PhyloP scores from the same number of random genomic regions of the same lengths were obtained.
The identified RNA-RNA interactions (output of Step 6) were converted to tabular format and imported into Cytoscape 3.1.0 (R. Saito et al., Nat Methods 9, 1069 (November, 2012)) for visualization. Each node represents a gene and is color-coded by the gene type. The degree of each node was calculated by Cytoscape.
Detecting Read Pairs Generated from Intra-Molecule Cutting and Ligation
Starting from the RNA1-Linker-RNA2 type of read pairs (output of Step 6), the following filters were applied to identify the pair-end reads generated from self-interacting RNAs:
Structural information of the RNAs with known or generally accepted structures was downloaded from fRNAdb database v3.4 (T. Mituyama et al., Nucleic Acids Research 37, D89 (January, 2009)) in DOT format (graph description language). Figures were drawn from the DOT files using the command line version of VARNA Applet version 3.9 (K. Darty, A. Denise, Y. Ponty, Bioinformatics 25, 1974 (Aug. 1, 2009)). For the RNAs without structural information in fRNAdb, their secondary structures were predicted based on the sequence using the “Fold” program in RNAstructure version 5.6 (S. Bellaousov, J. S. Reuter, M. G. Seetin, D. H. Mathews, Nucleic Acids Res 41, W471 (July, 2013)).
The first control experiment skipped the cross-linking step in the procedure. The second control experiment skipped the protein biotinylation step. The third control experiment carried out the entire procedure on the mixed cell lysate of mouse ES cells and Drosophila S2 cells.
A non-cross-linking control with approximately 3×10^8 mouse ES cells was first carried out. The RNAs immobilized with proteins on streptavidin beads were purified by protein digestion as previously described. The purified RNAs were subjected to quantification by Qubit RNA HS assay (Invitrogen). The RNAs were below the detection limit of the assay (250 pg/μl). The sample volume was 20 μl (the same as previously described), which suggests that the RNA abundance was no more than 5 ng. At this point, the experiment was stopped because this amount was insufficient for linker selection and library construction. In previously described experiments, the purified RNAs would be in the μg range at this step.
Second, another control was performed by omitting protein biotinylation (while keeping cross-linking) with 3×10^8 mouse ES cells. The RNAs purified from the beads were again below the detection limit of the Qubit RNA HS assay.
Third, the experiment was started with 3×10^8 Drosophila S2 cells and 3×10^8 mouse ES cells (cross-species control). The cells were cross-linked and lysed. The lysates from the two cell lines were mixed before protein biotinylation and proximity ligation. The mixture was subjected to the rest of the experimental procedure to produce a sequencing library (Fly-Mm). Fly-Mm contained 27,748,688 read pairs. After removing duplicate reads and splitting by the linker, there were 16,881,326 RNA1-RNA2 pairs. Each RNA part (either RNA1 or RNA2) was mapped to the fly genome (dm6) and to the mouse genome (mm9). A total of 7,188,769 pairs had at least one part (either RNA1 or RNA2) that was not mappable to either the mouse or the fly genome. The remaining 9,692,557 RNA1-RNA2 pairs had both parts mapped to the genomes, among which 8,484,807 pairs had each RNA part uniquely mapped to only one genome. The distribution of these mapped RNA pairs is as follows (Table 6). The proportion of RNA pairs mapped to two species is 0.52% (44,229/8,484,807).
Furthermore, it was inquired what would happen if the ES-1 library (pure mouse sample) were to be subjected to the analysis above. It turned out that 0.55% of the RNA1-RNA2 pairs would have one RNA part mapped uniquely to the mouse genome and the other part mapped uniquely to the fly genome. Therefore, the “contamination rate” for Fly-Mm sample (0.52%) was even smaller than that of the ES-1 sample (0.55%), suggesting that the experimental contamination (supposedly due to random ligation) was so low that it fell into the error range of the informatics procedure.
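The arithmetic behind the cross-species figures reported above can be reproduced as follows. All counts are taken from the text; only the variable names are illustrative.

```python
# Counts reported for the Fly-Mm library.
total_pairs = 16_881_326        # RNA1-RNA2 pairs after deduplication
unmappable = 7_188_769          # at least one part mapped to neither genome
uniquely_assigned = 8_484_807   # each part mapped uniquely to one genome
cross_species = 44_229          # one part mouse, the other part fly

both_mapped = total_pairs - unmappable
cross_species_percent = 100 * cross_species / uniquely_assigned

print(both_mapped)                       # 9692557
print(round(cross_species_percent, 2))   # 0.52
```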
FA-DSG dual cross-linking was compared to psoralen cross-linking and formaldehyde (FA) cross-linking in RAP-sequencing (J. M. Engreitz et al., RNA-RNA interactions enable specific targeting of noncoding RNAs to nascent Pre-mRNAs and chromatin sites. Cell 159, 188 (Sep. 25, 2014)). After cross-linking, Engreitz et al. used antisense oligonucleotides to purify nuclear Malat1 RNA, and sequenced the RNAs that were purified together with Malat1. Engreitz et al. found little overlap of the Malat1 targets between dual cross-linking and the other two cross-linking methods. Except for one RNA, the hundreds of RNAs co-purified with Malat1 in the dual cross-linking were all unique (Supplementary Table 3 of Engreitz et al.). Engreitz et al. attributed this to the idea that dual cross-linking could “efficiently capture RNAs linked indirectly through multiple protein intermediates.” UV cross-linking (our method) was less effective than psoralen in nucleic acid to nucleic acid cross-linking, and was less effective than FA overall. Based on the published data, the RNA pairs detected by UV cross-linking and by dual cross-linking were not expected to overlap strongly.
More specifically, snoRNAs are short (˜150 nt) and are likely wrapped around or within the snoRNP protein complex when interacting with mRNA. Dual cross-linking is expected to retain the entire snoRNP complex. The snoRNP complex is expected to hinder RNase I from cutting snoRNA and also hinder RNA ligation. Therefore, large differences in the detected interactions involving snoRNA were expected.
Other RNAs with miRNA-Like Interactions.
It was inquired whether other RNAs could experience a process similar to miRNA biogenesis and also interact with mRNAs. The RNA Hi-C identified interacting RNAs were compared with those found by small RNA sequencing (smallRNA-seq) and those bound to the AGO protein (HITS-CLIP) in ES cells. The smallRNA-seq selectively sequenced “miRNAs and other small RNAs that have a 3′ hydroxyl group resulting from enzymatic cleavage by Dicer or other RNA processing enzymes”. Besides miRNA, other RNA types including snoRNA, pseudogeneRNA, and mRNA UTRs also contributed to the small RNA pool, and were attached to AGO (
To elucidate what types of non-miRNA genes were most likely to undergo miRNA-like biogenesis, the RNA Hi-C identified RNA-RNA interactions were subjected to the following filters:
1. the interaction involves one mRNA (dubbed target) and one other RNA (source RNA);
2. the source RNA is processed into small RNA by enzymatic cleavage (FPKM>0 in smallRNA-seq);
3. both the target and the source RNAs appear in AGO HITS-CLIP (FPKM>0 for both RNAs);
4. the RNA Hi-C identified interaction sites on the source and the target RNAs exhibit strong base pairing (p-value <0.05, Wilcoxon signed-rank test comparing the binding energies between the RNA1 and RNA2 sequences of every pair-end read to the binding energies of randomly shuffled nucleotide sequences).
A total of 302 RNA-RNA interactions passed these filters. The majority (79%) of the source RNAs in these interactions were snoRNAs (Table ST2). The snoRNAs were prioritized for functional analysis.
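For illustration only, the four filters listed above may be expressed as a single predicate. The record layout and field names below are hypothetical; the example values are invented for demonstration and are not data from the experiments.

```python
from dataclasses import dataclass

@dataclass
class Interaction:
    source_type: str             # RNA type of the non-mRNA partner (source RNA)
    target_type: str             # expected to be "mRNA" (target)
    source_smallrna_fpkm: float  # smallRNA-seq FPKM of the source RNA
    source_ago_fpkm: float       # AGO HITS-CLIP FPKM of the source RNA
    target_ago_fpkm: float       # AGO HITS-CLIP FPKM of the target RNA
    pairing_p_value: float       # Wilcoxon signed-rank p-value for base pairing

def passes_filters(x, alpha=0.05):
    """Apply filters 1 to 4 to one interaction record."""
    return (x.target_type == "mRNA"                 # filter 1
            and x.source_smallrna_fpkm > 0          # filter 2
            and x.source_ago_fpkm > 0               # filter 3 (source)
            and x.target_ago_fpkm > 0               # filter 3 (target)
            and x.pairing_p_value < alpha)          # filter 4

# Invented example records.
candidates = [
    Interaction("snoRNA", "mRNA", 3.2, 1.1, 0.8, 0.01),
    Interaction("lincRNA", "mRNA", 0.0, 2.0, 1.5, 0.01),
    Interaction("snoRNA", "mRNA", 4.0, 1.0, 2.0, 0.20),
]
selected = [c for c in candidates if passes_filters(c)]
```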
It was hypothesized that a large number of snoRNAs were enzymatically processed into miRNA-like short RNAs and interact with mRNAs. This hypothesis was supported by 919 RNA Hi-C identified snoRNA-mRNA interactions where both the mRNA and the snoRNA were bound by AGO. Furthermore, AGO bound snoRNAs and their interacting mRNAs exhibited anti-correlated expression changes during guided differentiation of ES cells toward mesendoderm (P. Yu et al., Spatiotemporal clustering of the epigenome reveals rules of dynamic gene regulation. Genome research 23, 352 (February, 2013))(
Mapping RNA-RNA Interactome and RNA Structures In Vivo without Perturbation
It remains formidable to analyze the entire RNA-RNA interactome. The RNA Hi-C technology was developed to map RNA-RNA interactions embraced by any single protein in vivo, without any perturbation. The RNA-RNA interactome was systematically mapped in embryonic stem cells, revealing 46,780 interactions. Seven interactions were validated using RAP-seq. In this interactome the majority of miRNAs and lincRNAs each specifically interacted with one mRNA, which contradicts the current dogma of “promiscuous” RNA interactions. Base pairing was observed at the interacting regions between long RNAs, suggesting a class of regulatory sequences acting in trans. In addition, RNA Hi-C provided new information on RNA structures, by simultaneously revealing the footprint of single stranded regions and the spatially proximal sites of each RNA. This technology vastly expands the identifiable portion of an RNA-RNA interactome, without perturbing the endogenous level of RNA expression.
Simulation analysis of RNA Hi-C.
Data Synthesis.
In order to estimate the sensitivity and specificity of RNA Hi-C, including its experimental and computational procedures, a simulation analysis was carried out. 1,000,000 pair-end reads were simulated by computationally mimicking the data generation process. The parameters used for the simulation were derived from real data. The simulated data generation process is as follows.
For each pair-end read (2×100 bases):
1. Choose a sample barcode from the four sample barcodes with equal probabilities and concatenate it with a 6 nt random barcode (as in
2. Assign this pair-end read to a type of cDNAs from the list of [linkerOnly, NoLinker, RNA1-linker, linker-RNA2, RNA1-linker-RNA2] with probability [0.1, 0.3, 0.1, 0.3, 0.2], respectively (as in
3. If this read-pair was assigned to a linker-containing type, randomly choose 1 or 2 linkers with equal probability. It is noted that a small percentage of linker-containing read-pairs contained 2 linkers; the use of equal probability was a conservative choice for estimating worst cases.
4. Generate the sequences for the RNA1 and the RNA2 parts, according to the cDNA type determined in Step 2. For both RNA1 and RNA2,
5. Concatenate the barcodes, linker, and RNA fragments generated from Steps 1, 3, 4, producing a synthetic cDNA sequence.
6. If the synthetic cDNA in Step 5 is 100 bp or longer, take the 100 bases from the two ends of the synthetic cDNA in forward and reverse strands respectively.
7. If the synthetic cDNA in Step 5 is shorter than 100 bp, assign its forward and reverse strands as the forward and the reverse reads, and concatenate P5 and P7 primer sequences to the two reads.
8. Simulate sequencing errors with a rate of 0.01 on each base (N. J. Loman et al., Performance comparison of benchtop high-throughput sequencing platforms. Nature biotechnology 30, 434 (May, 2012)).
Steps 1-5 simulated a cDNA sequence according to the experimental procedure, and steps 6-8 simulated a pair-end read based on this cDNA sequence. The simulated interacting RNA pairs, as well as the cDNA type and the length of each part (RNA1, linker, and RNA2, if applicable) were kept for comparison with the computational predictions.
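A minimal sketch of Steps 1-8 is given below for illustration. The cDNA type probabilities, the read length, and the error rate are taken from the steps above. The barcode, linker, and primer sequences are hypothetical placeholders; the RNA fragments are random sequences of assumed length (10 to 120 bases) rather than fragments drawn as in Step 4; and the placement of the primer sequences on short cDNAs is likewise an assumption.

```python
import random

SAMPLE_BARCODES = ["ACCT", "CCGG", "GGCG", "TTAA"]  # hypothetical
LINKER = "GGTTCCAAGGTTCCAAGGTTCCAA"                 # hypothetical
P5, P7 = "AGATCGGAAG", "GATCGTCGGA"                 # hypothetical
CDNA_TYPES = ["linkerOnly", "NoLinker", "RNA1-linker",
              "linker-RNA2", "RNA1-linker-RNA2"]
TYPE_PROBS = [0.1, 0.3, 0.1, 0.3, 0.2]
READ_LEN = 100

def random_seq(rng, n):
    return "".join(rng.choice("ACGT") for _ in range(n))

def revcomp(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def add_errors(rng, read, rate=0.01):
    """Step 8: substitute each base with probability `rate`."""
    out = []
    for base in read:
        if rng.random() < rate:
            base = rng.choice([b for b in "ACGT" if b != base])
        out.append(base)
    return "".join(out)

def simulate_pair(rng):
    barcode = rng.choice(SAMPLE_BARCODES) + random_seq(rng, 6)    # Step 1
    cdna_type = rng.choices(CDNA_TYPES, weights=TYPE_PROBS)[0]    # Step 2
    linker = ""
    if cdna_type != "NoLinker":                                   # Step 3
        linker = LINKER * rng.choice([1, 2])
    has_rna1 = cdna_type in ("NoLinker", "RNA1-linker", "RNA1-linker-RNA2")
    has_rna2 = cdna_type in ("linker-RNA2", "RNA1-linker-RNA2")
    # Step 4 (assumed): random fragments of arbitrary length.
    rna1 = random_seq(rng, rng.randint(10, 120)) if has_rna1 else ""
    rna2 = random_seq(rng, rng.randint(10, 120)) if has_rna2 else ""
    cdna = barcode + rna1 + linker + rna2                         # Step 5
    if len(cdna) >= READ_LEN:                                     # Step 6
        read1, read2 = cdna[:READ_LEN], revcomp(cdna)[:READ_LEN]
    else:                                                         # Step 7
        read1, read2 = cdna + P7, revcomp(cdna) + P5
    return cdna_type, cdna, add_errors(rng, read1), add_errors(rng, read2)

rng = random.Random(0)
simulated = [simulate_pair(rng) for _ in range(1000)]
```

Keeping the cDNA type and the underlying cDNA alongside each simulated read pair allows the later comparison with the program-identified types and lengths.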
The synthetic data was used to evaluate the sensitivities and specificities of two intermediate analysis steps, as well as the final predictions.
First, the program-identified cDNA lengths (output of Step 3 of RNA-HiC-Tools) were compared to the actual (synthesized) lengths (Table 8). This step “3. Recovering the cDNAs in the sequencing library” assigns each cDNA into four types with respect to their lengths, namely Type 1 (<100 bp); Type 2 (100-200 bp); Type 3 (>200 bp); Type 4 (unknown). The algorithm achieved high sensitivity and specificity for identifying each type. Only very few (0.58%) of the cDNAs shorter than 200 bp were identified as longer than 200 bp. These errors were due to a small overlap (typically between 0 and 5 bp) of the forward and the reverse reads, which was not detected by the local alignment.
When the program identified length was shorter than 200 bp (Types 1 and 2), the exact length could be computed. In these cases, the program identified lengths often precisely matched the lengths of the simulated cDNAs (
Next, the program-identified chimeric configuration of each cDNA (output of Step 4 of RNA-HiC-Tools) was compared with the synthesized configuration. In Step “4. Parsing the chimeric cDNAs”, the algorithm assigned the cDNAs into five categories, based on the presence of the linker sequence. The algorithm reached 99.89% sensitivity and 95.82% specificity for the cDNAs in the “RNA1-linker-RNA2” form (Table 9).
Lastly, the program-identified and the simulated RNA-RNA interactions were compared. The simulated dataset contained 200,200 chimeric RNA pairs, among which 131,571 pairs of RNAs were detected (sensitivity=65.72%, specificity=92.57%). The sensitivity and specificity for interactions of each type of RNAs were also separately calculated (
A Malat1 RAP-sequencing experiment on mouse ES cells was carried out. After cross-linking, five antisense oligonucleotides were used to pull down Malat1, and the other RNAs that were purified together with Malat1 were then sequenced. Actin RAP-sequencing was performed as the control. Malat1 RNA itself exhibited a 5.81 fold increase in Malat1 RAP-seq relative to Actin RAP-seq, confirming the validity of the purification. RNA Hi-C reported Malat1 as a “hub” lincRNA which interacted with Tfrc, Slc2a3, Eif4a2, and 0610007P14Rik RNA. These RNAs showed 14.6 (0610007P14Rik), 4.53 (Slc2a3), 3.38 (Eif4a2), and 2.39 (Tfrc) fold increases in Malat1 RAP-seq relative to Actin RAP-seq (the largest Chi-square test p-value <0.0003). This suggests a strong overlap of Malat1 targets from RNA Hi-C and Malat1 RAP-seq.
For another validation, a Tfrc RAP-seq experiment was performed. Tfrc was identified as a Malat1 interacting RNA from RNA Hi-C (
It was then checked whether the other RNAs interacting with Tfrc, as identified by RNA Hi-C, could be validated by Tfrc RAP-seq as well. RNA Hi-C data identified a total of five RNAs as interacting with Tfrc. Besides Malat1, the other four were all snoRNAs, namely Snord13, SNORA3, Snord52, SNORA74. Three of these 4 snoRNAs exhibited fold increases (1.4 fold for Snord13, 13.6 fold for SNORA3, 8.7 fold for SNORA74) in Tfrc RAP-seq as compared to Actin RAP-seq, confirming these interactions (Chi-square test p value <0.00002). In summary, RAP-seq confirmed nearly all RNA Hi-C identified interactions. With the two types of experiments (RNA Hi-C and RAP-seq), a few RNA interactions (mentioned above) were nominated as “real” in mouse ES cells.
Comparison of snoRNA-mRNA Interactions with mRNA Pseudouridines.
The pseudouridylation sequencing data (Ψ-seq) were compared with the RNA-interaction sites. Schwartz et al. carried out Ψ-seq in yeast and in mouse bone-marrow-derived dendritic cells (BMDDC). BMDDC Ψ-seq data were retrieved (CMC treated GSM1464234 and control GSM1464235), and pseudouridines (Ψ-sites) were called using the bioinformatic procedure described in the paper. Briefly, Ψ-sites were determined as having more than 5 CMC-treated reads next to a ‘U’ on the correct strand and direction and having a Ψ-fc value greater than 3. This yielded 386 Ψ-sites out of a total of 8,194,131 ‘U’ positions (0.00471% of ‘U’s were Ψ-sites).
Next, these 386 Ψ-sites were compared to the RNA Hi-C identified RNA interaction sites. It was acknowledged that Ψ-seq and RNA Hi-C were done in different cell types. Nevertheless, within the RNA interaction sites, 93 were Ψ-sites out of a total of 551,634 ‘U’s (0.0169%). Therefore, RNA interaction sites determined by RNA Hi-C were enriched with Ψ-sites (odds ratio=4.4, Chi-square test p-value=7.70×10^−95).
Furthermore, it was asked whether the Ψ-sites were enriched in the snoRNA-mRNA interaction sites detected by RNA Hi-C. Within snoRNA participating interaction sites, there were 57 Ψ-sites out of a total of 136,535 ‘U’s (0.0417%). Compared to the entire transcriptome, RNA Hi-C detected snoRNA-participated interaction sites were greatly enriched with Ψ-sites (odds ratio=10.2, Chi-square test p-value <1×10^−100). Although snoRNA was known to contribute to RNA pseudouridylation, these data indicate which snoRNAs may be specifically responsible (Table 10).
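The reported odds ratios can be reproduced from the counts given above, as sketched below. The sketch assumes that the comparison group consists of the ‘U’ positions outside the respective interaction sites; under that assumption both reported odds ratios are recovered. The function name is illustrative.

```python
def odds_ratio(hits_in, total_in, hits_all, total_all):
    """Odds of being a Ψ-site inside a region versus outside it.

    hits_in / total_in   : Ψ-sites and 'U' positions inside the region
    hits_all / total_all : Ψ-sites and 'U' positions in the whole set
    """
    hits_out = hits_all - hits_in
    total_out = total_all - total_in
    odds_in = hits_in / (total_in - hits_in)
    odds_out = hits_out / (total_out - hits_out)
    return odds_in / odds_out

# All RNA interaction sites: 93 Ψ-sites among 551,634 'U's.
or_all_sites = odds_ratio(93, 551_634, 386, 8_194_131)
# snoRNA-participating sites: 57 Ψ-sites among 136,535 'U's.
or_snorna_sites = odds_ratio(57, 136_535, 386, 8_194_131)

print(round(or_all_sites, 1))     # 4.4
print(round(or_snorna_sites, 1))  # 10.2
```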
Interactions between RNA molecules exert key regulatory roles and are often mediated by RNA binding proteins (Ray, D. et al. A compendium of RNA-binding motifs for decoding gene regulation. Nature 499, 172-177, doi:10.1038/nature12311 (2013)) such as ARGONAUTE proteins (AGO), PUM2, QKI, and snoRNP proteins (Meister, G. Argonaute proteins: functional insights and emerging roles. Nat Rev Genet 14, 447-459, doi:10.1038/nrg3462 (2013); Hafner, M. et al. Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell 141, 129-141, doi:10.1016/j.cell.2010.03.009 (2010); Granneman, S., Kudla, G., Petfalski, E. & Tollervey, D. Identification of protein binding sites on U3 snoRNA and pre-rRNA by UV cross-linking and high-throughput analysis of cDNAs. Proceedings of the National Academy of Sciences of the United States of America 106, 9613-9618, doi:10.1073/pnas.0901997106 (2009)). Despite recent advances, such as PAR-CLIP, HITS-CLIP, and CLASH, it remains a formidable challenge to map all protein-assisted RNA-RNA interactions (Hafner, M. et al. Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell 141, 129-141, doi:10.1016/j.cell.2010.03.009 (2010); Chi, S. W., Zang, J. B., Mele, A. & Darnell, R. B. Argonaute HITS-CLIP decodes microRNA-mRNA interaction maps. Nature 460, 479-486, doi:10.1038/nature08170 (2009); Helwak, A., Kudla, G., Dudnakova, T. & Tollervey, D. Mapping the human miRNA interactome by CLASH reveals frequent noncanonical binding. Cell 153, 654-665, doi:10.1016/j.cell.2013.03.043 (2013); Kudla, G., Granneman, S., Hahn, D., Beggs, J. D. & Tollervey, D. Cross-linking, ligation, and sequencing of hybrids reveals RNA-RNA interactions in yeast. Proc Natl Acad Sci USA 108, 10010-10015, doi:10.1073/pnas.1017386108 (2011)). In each of these three approaches, only the interactions mediated by one RNA-binding protein can be analyzed per experiment.
HITS-CLIP and PAR-CLIP cannot directly map the interacting RNA pairs. Additionally, each experiment requires either a protein-specific antibody (HITS-CLIP or PAR-CLIP) or stable expression of a tagged protein in transformed cell lines (CLASH).
Earlier approaches often require ectopic expression of one or several components of the proposed interactions. Such methods include luciferase reporter assays and the use of synthetic RNA mimics for target capturing (Nicolas, F. E. Experimental validation of microRNA targets using a luciferase reporter system. Methods in molecular biology 732, 139-152, doi:10.1007/978-1-61779-083-6_11 (2011); Lal, A. et al. Capture of microRNA-bound mRNAs identifies the tumor suppressor miR-34a as a regulator of growth factor signaling. PLoS Genet 7, e1002363, doi:10.1371/journal.pgen.1002363 (2011)). Because ectopic expression rarely reproduces the endogenous expression levels, it is prudent to interpret the results from these methods as potential interactions rather than in vivo interactions. It is noted that the premise that miRNAs tend to “promiscuously” interact with many mRNAs was primarily derived from data using ectopic expression (Du, T. & Zamore, P. D. Beginning to understand microRNA function. Cell Res 17, 661-663, doi:10.1038/cr.2007.67 (2007)).
The RNA Hi-C method was developed to detect protein-assisted RNA-RNA interactions in vivo. In this procedure, RNA molecules are cross-linked with their bound proteins then ligated to a biotinylated RNA linker such that RNA molecules co-bound by the same protein form a chimeric RNA of the form RNA1-Linker-RNA2. These linker-containing chimeric RNAs are isolated using streptavidin coated magnetic beads and subjected to pair-end sequencing (Methods,
The RNA Hi-C method offers several advantages for mapping RNA-RNA interactions. First, RNA Hi-C directly analyzes the endogenous cellular features without introducing any exogenous nucleotides or protein-coding genes prior to cross-linking (Hafner, M. et al. Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell 141, 129-141, doi:10.1016/j.cell.2010.03.009 (2010); Helwak, A., Kudla, G., Dudnakova, T. & Tollervey, D. Mapping the human miRNA interactome by CLASH reveals frequent noncanonical binding. Cell 153, 654-665, doi:10.1016/j.cell.2013.03.043 (2013); Lal, A. et al. Capture of microRNA-bound mRNAs identifies the tumor suppressor miR-34a as a regulator of growth factor signaling. PLoS Genet 7, e1002363, doi:10.1371/journal.pgen.1002363 (2011); Baigude, H., Ahsanullah, Li, Z., Zhou, Y. & Rana, T. M. miR-TRAP: a benchtop chemical biology strategy to identify microRNA targets. Angew Chem Int Ed Engl 51, 5880-5883, doi:10.1002/anie.201201512 (2012)). This eliminates the uncertainty of reporting spurious interactions produced by changing the RNA or protein expression levels. Moreover, it makes RNA Hi-C well suited for assaying tissue samples. Second, the use of a biotinylated linker as a selection marker circumvents the requirement for a protein-specific antibody or the need to express a tagged protein. This allows for an unbiased mapping of the RNA-RNA interactome. As described in the literature, other methods can only work with one RNA-binding protein at a time. Third, only RNAs brought together by the same single protein molecule are captured, avoiding capture of independent RNA molecules that are individually bound to different copies of the same protein (potentially leading to reporting spurious interactions) (Hafner, M. et al. Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell 141, 129-141, doi:10.1016/j.cell.2010.03.009 (2010); Chi, S. W., Zang, J. B., Mele, A. & Darnell, R. B. Argonaute HITS-CLIP decodes microRNA-mRNA interaction maps. Nature 460, 479-486, doi:10.1038/nature08170 (2009)). Fourth, false positives that result from RNAs ligating randomly to other nearby RNAs are minimized by performing the RNA ligation step on streptavidin beads in extremely dilute conditions. Fifth, the RNA linker provides a clear boundary delineating sequencing reads that span across the ligation site, thus avoiding ambiguities in mapping the sequencing reads. Sixth, potential PCR amplification biases are removed by attaching a random 6 nucleotide barcode to each chimeric RNA before PCR amplification and subsequently counting completely overlapping sequencing reads with identical barcodes only once (Chi, S. W., Zang, J. B., Mele, A. & Darnell, R. B. Argonaute HITS-CLIP decodes microRNA-mRNA interaction maps. Nature 460, 479-486, doi:10.1038/nature08170 (2009); Loeb, G. B. et al. Transcriptome-wide miR-155 binding map reveals widespread noncanonical microRNA targeting. Mol Cell 48, 760-770, doi:10.1016/j.molcel.2012.10.002 (2012); Wang, Z. et al. iCLIP predicts the dual splicing effects of TIA-RNA interactions. PLoS Biol 8, e1000530, doi:10.1371/journal.pbio.1000530 (2010); Konig, J. et al. iCLIP reveals the function of hnRNP particles in splicing at individual nucleotide resolution. Nat Struct Mol Biol 17, 909-915, doi:10.1038/nsmb.1838 (2010)).
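By way of a non-limiting sketch, counting reads with identical random barcodes only once may be implemented as shown below. Exact identity of both read sequences is used here as a stand-in for "completely overlapping" reads, which is an assumption of the sketch; the record layout and the example sequences are hypothetical.

```python
def remove_pcr_duplicates(records):
    """Keep the first record for each (random barcode, read1, read2) key.

    records is an iterable of (barcode, read1, read2) tuples.
    """
    seen = set()
    unique = []
    for barcode, read1, read2 in records:
        key = (barcode, read1, read2)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique

# Invented example: the second record is a PCR duplicate of the first,
# while the third has the same reads but a different random barcode.
records = [
    ("ACGTAC", "GGATTC", "CCTTAA"),
    ("ACGTAC", "GGATTC", "CCTTAA"),
    ("TTGACA", "GGATTC", "CCTTAA"),
]
deduplicated = remove_pcr_duplicates(records)
```

Reads with identical sequences but different random barcodes are retained, since they are taken to originate from distinct chimeric RNA molecules.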
Two independent RNA Hi-C assays were carried out on mouse embryonic stem (ES) cells with minor technical differences (Table 5,
A suite of bioinformatics tools was created (RNA-HiC-tools) to analyze and visualize RNA Hi-C data (
The five RNA Hi-C libraries were compared. ES-1 and ES-2 were most similar judged by correlations of FPKMs (separately calculated for the read fragments on the left and the right sides of the linker), followed by ES-indirect, and then MEF and brain tissue (
The ES-1 and ES-2 libraries were merged to infer the RNA-RNA interactome in ES cells. This data included 4.54 million non-duplicated pair-end reads that were unambiguously split into two RNA fragments with both fragments uniquely mapping to the genome (mm9). 46,780 inter-RNA interactions were identified (FDR<0.05, Fisher's exact test with Benjamini-Hochberg correction) (
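For illustration, the Benjamini-Hochberg adjustment applied to a list of p-values may be sketched as follows. This is a generic implementation of the standard procedure and is not asserted to be the code used in RNA-HiC-tools; the example p-values are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values are monotone in the rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

raw = [0.01, 0.04, 0.03, 0.005]          # invented p-values
fdr = benjamini_hochberg(raw)            # approximately [0.02, 0.04, 0.04, 0.02]
significant = [q < 0.05 for q in fdr]
```

An interaction would be retained when its adjusted p-value falls below the chosen FDR threshold, which is 0.05 in the analysis described above.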
In order to confirm interactions at a larger scale, RNA antisense oligonucleotide purification sequencing (RAP-seq) was carried out (Engreitz, J. M. et al. RNA-RNA interactions enable specific targeting of noncoding RNAs to nascent Pre-mRNAs and chromatin sites. Cell 159, 188-199, doi:10.1016/j.cell.2014.08.018 (2014)). First, Malat1 RAP-seq and Actb RAP-seq (control) were performed to test the interactions involving Malat1 (Comparison of snoRNA-mRNA interactions with mRNA pseudouridines). Malat1 RNA itself exhibited a 5.81 fold increase in Malat1 RAP-seq over Actb RAP-seq, confirming the validity of the purification. The RNA Hi-C reported Malat1 interacting RNAs (
RNA-RNA interactions have been reported as “surprisingly promiscuous” (Du, T. & Zamore, P. D. Beginning to understand microRNA function. Cell Res 17, 661-663, doi:10.1038/cr.2007.67 (2007)). It was suggested that each miRNA interacts with 300 to 1,000 mRNAs in one cell type, and a similar picture was proposed for lincRNAs (Chi, S. W., Zang, J. B., Mele, A. & Darnell, R. B. Argonaute HITS-CLIP decodes microRNA-mRNA interaction maps. Nature 460, 479-486, doi:10.1038/nature08170 (2009); Guttman, M. et al. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 458, 223-227, doi:10.1038/nature07672 (2009)). However, the observed RNA-RNA interactome (46,780 interactions) is a scale-free network, with a degree distribution conforming to power law (
The majority (83.05%) of the interacting RNAs exhibited overlapping RNA Hi-C reads (
It was asked whether base complementation is utilized by different types of RNA-RNA interactions. The hybridization energy of a pair of interacting RNAs was estimated by the average hybridization energy of the pairs of ligated fragments (RNA1, RNA2), and compared to the hybridization energy of control RNAs generated by random shuffling of the bases (Ray, D. et al. A compendium of RNA-binding motifs for decoding gene regulation. Nature 499, 172-177, doi:10.1038/nature12311 (2013); Bellaousov, S., Reuter, J. S., Seetin, M. G. & Mathews, D. H. RNAstructure: web servers for RNA secondary structure prediction and analysis. Nucleic Acids Research 41, W471-W474, doi:10.1093/nar/gkt290 (2013)). Complementary bases were preferred in nearly all types of RNA-RNA interactions, and this preference was most pronounced in transposonRNA-mRNA, mRNA-mRNA, pseudogeneRNA-mRNA, lincRNA-mRNA, and miRNA-mRNA interactions (p-values <2.4×10^−18), but was not observed in LTR-pseudogeneRNA interactions (
If these RNA-RNA interactions are sequence-specific, the RNA interaction sites should be under selective pressure (Gong, C. & Maquat, L. E. lncRNAs transactivate STAU1-mediated mRNA decay by duplexing with 3′ UTRs via Alu elements. Nature 470, 284-288, doi:10.1038/nature09701 (2011)). It was found that the interspecies conservation levels are strongly increased at the interaction sites, and the peak of conservation precisely pinpointed the junction of the two RNA fragments (
Although RNA Hi-C was originally designed for mapping inter-molecular interactions, it was found that RNA Hi-C also revealed RNA secondary and tertiary structures. All the analyses above were based on inter-molecular reads. By looking at intra-molecular reads, two characteristics of RNA structure were learned. First, the footprint of single-stranded regions of an RNA was identified by the density of RNase I digestion sites (RNase I digestion was applied before ligation, see Step 2 in
The key to mapping RNA interactions is selection. The introduction of a selectable linker in RNA Hi-C enabled an unbiased selection of interacting RNAs, making it possible to globally map an RNA-RNA interactome. The number of interacting partners per RNA in ES cells was strongly unbalanced, resulting in a scale-free RNA network. Interactions between long RNAs frequently used a small fraction of the transcripts. Analogous to protein interaction domains, the notion of RNA interaction sites was proposed. RNA interaction sites utilized base pairing to facilitate interactions of long RNAs, suggesting a new type of trans regulatory sequence. These trans regulatory sequences are more evolutionarily conserved than other parts of transcripts. RNA structure could be mapped by RNA Hi-C as well. Here an example is provided where an RNA was bent by a protein, and such tertiary structure was revealed by the intra-molecular reads of RNA Hi-C. This method and data should greatly facilitate future investigations of RNA functions and regulatory roles.
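By way of illustration only, the unbalanced number of interacting partners per RNA described above can be examined by tabulating a degree distribution from a list of interacting RNA pairs. The following Python sketch is not part of the RNA-HiC-tools software; the function names and the toy interaction list are hypothetical. The log-log least-squares slope is a rough descriptive estimate and is not necessarily the fitting procedure used to obtain the results reported herein.

```python
from collections import Counter
import math


def degree_counts(interactions):
    """Return {rna: number of distinct partners} from (rna_a, rna_b) pairs."""
    partners = {}
    for a, b in interactions:
        if a == b:
            continue  # intra-molecular pairs do not add a partner
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
    return {rna: len(p) for rna, p in partners.items()}


def degree_distribution(degrees):
    """Return {k: fraction of RNAs that have exactly k partners}."""
    n = len(degrees)
    hist = Counter(degrees.values())
    return {k: count / n for k, count in sorted(hist.items())}


def loglog_slope(dist):
    """Least-squares slope of log P(k) against log k.

    For a power law P(k) ~ k**(-gamma), the slope approximates -gamma.
    """
    xs = [math.log(k) for k in dist]
    ys = [math.log(p) for p in dist.values()]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two distinct degrees")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical toy network: one hub RNA with many partners.
toy_pairs = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d"), ("a", "b")]
toy_degrees = degree_counts(toy_pairs)
toy_dist = degree_distribution(toy_degrees)
print(toy_degrees)
print(toy_dist, loglog_slope(toy_dist))
```

In a scale-free network, most RNAs have few partners while a small number of hub RNAs have many, so the fitted slope is negative and the distribution is approximately linear on log-log axes.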
Software Access
The RNA-HiC-tools software is available at http://systemsbio.ucsd.edu/RNA-Hi-C.
From the foregoing, it will be appreciated that various embodiments of the present disclosure have been described herein for purposes of illustration, and that various modifications can be made without departing from the scope and spirit of the present disclosure. Accordingly, the various embodiments disclosed herein are not intended to be limiting, with the true scope and spirit being indicated by the following claims.
In some embodiments, a method for generating chimeric RNAs comprising RNAs which interact with one another in a cell is provided, wherein the method comprises cross-linking RNA to protein and ligating RNAs cross-linked to the same protein molecule together to form a chimeric RNA. In some embodiments, said cross-linking of RNA to protein is performed on an intact cell or in a cell lysate. In some embodiments, said cross-linking comprises UV cross-linking. In some embodiments, the method further comprises associating said protein with an agent which facilitates immobilization of said protein on a surface. In some embodiments, said agent which facilitates immobilization comprises biotin. In some embodiments, the protein is biotinylated at at least one cysteine. In some embodiments, the method further comprises fragmenting said RNAs cross-linked to the same protein molecule. In some embodiments, said fragmenting comprises contacting said RNAs cross-linked to the same protein molecule with an RNAse under conditions which facilitate partial digestion of said RNAs. In some embodiments, the method further comprises linking said RNAs cross-linked to the same protein molecule to an agent which facilitates recovery of said RNAs. In some embodiments, said linking comprises ligating the ends of said RNAs to said agent. In some embodiments, the RNA is ligated with a biotin-tagged RNA linker. In some embodiments, the biotin-tagged RNA linker is 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 or 30 nucleotides long or any length between any aforementioned values. In some embodiments, said agent which facilitates recovery of said RNAs comprises a nucleic acid. In some embodiments, said nucleic acid comprises a nucleic acid having biotin thereon.
In some embodiments, said linking of said nucleic acid having biotin thereon to said ends of said RNAs comprises ligating said nucleic acid having biotin thereon to the 5′ ends of said RNAs prior to ligating said RNAs cross-linked to the same protein molecule together to form a chimeric RNA. In some embodiments, the method further comprises removing said biotin from the 5′ region of said chimeric RNA. In some embodiments, the method further comprises recovering said chimeric RNAs. In some embodiments, the method further comprises fragmenting said chimeric RNAs. In some embodiments, the method further comprises DNAse treatment to eliminate DNA contamination. In some embodiments, said fragmenting of said chimeric RNAs comprises contacting said chimeric RNAs with an RNAse under conditions which facilitate partial digestion of said RNAs. In some embodiments, the method further comprises reverse transcribing said chimeric RNAs to generate a chimeric cDNA. In some embodiments, the method further comprises determining at least a portion of the sequences in said chimeric RNAs or chimeric cDNAs which originate from each of the RNAs in said chimeric RNAs or chimeric cDNAs. In some embodiments, the method further comprises identifying the RNAs present in said chimeric RNAs, thereby identifying RNAs which interact with one another in a cell. In some embodiments, at least 100, at least 500, at least 1000 or more than 1000 RNA-RNA interactions in the cell are identified. In some embodiments, substantially all of the RNAs which interact with one another in a cell are identified. In some embodiments, at least 70%, at least 80%, at least 90% or more than 90% of the direct RNA-RNA interactions in the cell are identified. In some embodiments, the identification of the RNAs which interact with one another in a cell comprises performing sequence reads on said chimeric RNAs using an automated sequencing device.
In some embodiments, said identification of the RNAs which interact with one another in a cell comprises identifying the chimeric sequences from all the sequence reads. In some embodiments, the method further comprises transforming the chimeric RNAs into annotated RNA clusters using a computer. In some embodiments, the method further comprises identifying direct interactions among said RNA clusters using a statistical test performed by a computer.
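By way of illustration only, a statistical test of the kind referred to above can be sketched as follows. The embodiments above do not specify a particular test; this sketch assumes a one-sided hypergeometric test on chimeric read counts, and the function names and toy cluster labels are hypothetical. Each chimeric read is represented as a pair (cluster containing the RNA1 fragment, cluster containing the RNA2 fragment).

```python
from collections import Counter
from math import comb


def hypergeom_sf(k, total, n_a, n_b):
    """P(X >= k) for X ~ Hypergeometric(total, n_a, n_b).

    total: number of chimeric reads
    n_a:   reads whose first fragment falls in cluster A
    n_b:   reads whose second fragment falls in cluster B
    k:     reads linking cluster A to cluster B
    """
    start = max(k, 0)
    stop = min(n_a, n_b)
    numerator = sum(
        comb(n_a, i) * comb(total - n_a, n_b - i) for i in range(start, stop + 1)
    )
    return numerator / comb(total, n_b)


def score_cluster_pairs(read_pairs):
    """Return {(cluster_a, cluster_b): p_value} for every observed pair."""
    total = len(read_pairs)
    first = Counter(a for a, _ in read_pairs)
    second = Counter(b for _, b in read_pairs)
    joint = Counter(read_pairs)
    return {
        (a, b): hypergeom_sf(k, total, first[a], second[b])
        for (a, b), k in joint.items()
    }


# Hypothetical toy input: ten chimeric reads over four clusters.
toy_reads = [("A", "B")] * 3 + [("A", "C")] + [("D", "B")] + [("D", "C")] * 5
for pair, p in sorted(score_cluster_pairs(toy_reads).items()):
    print(pair, round(p, 4))
```

When many cluster pairs are tested, the resulting p-values would ordinarily be adjusted for multiple testing before interactions are called; that step is omitted from this sketch.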
In some embodiments, an isolated complex is provided. The isolated complex can comprise a chimeric RNA cross-linked to a protein, wherein said chimeric RNA comprises RNAs which interact with one another in a cell. In some embodiments, an isolated complex comprises a protein and nucleic acid, intermediate proteins and nucleic acid, or a protein complex and nucleic acid, wherein the nucleic acid is RNA.
In some embodiments, a method for identifying a candidate therapeutic agent is provided, wherein the method comprises identifying RNAs which interact with one another in a cell using the method of any of the embodiments described herein and evaluating the ability of an agent to reduce or increase the interaction of said RNAs, wherein said agent is a candidate therapeutic agent if said agent is able to reduce or increase said interaction of said RNAs. In some embodiments the method for identifying RNAs which interact with one another in a cell comprises cross-linking RNA to protein and ligating RNAs cross-linked to the same protein molecule together to form a chimeric RNA. In some embodiments, said cross-linking of RNA to protein is performed on an intact cell or in a cell lysate. In some embodiments, said cross-linking comprises UV cross-linking. In some embodiments, the method further comprises associating said protein with an agent which facilitates immobilization of said protein on a surface. In some embodiments, said agent which facilitates immobilization comprises biotin. In some embodiments, the method further comprises fragmenting said RNAs cross-linked to the same protein molecule. In some embodiments, said fragmenting comprises contacting said RNAs cross-linked to the same protein molecule with an RNAse under conditions which facilitate partial digestion of said RNAs. In some embodiments, the method further comprises linking said RNAs cross-linked to the same protein molecule to an agent which facilitates recovery of said RNAs. In some embodiments, said linking comprises ligating the ends of said RNAs to said agent. In some embodiments, said agent which facilitates recovery of said RNAs comprises a nucleic acid. In some embodiments, said nucleic acid comprises a nucleic acid having biotin thereon. 
In some embodiments, said linking of said nucleic acid having biotin thereon to said ends of said RNAs comprises ligating said nucleic acid having biotin thereon to the 5′ ends of said RNAs prior to ligating said RNAs cross-linked to the same protein molecule together to form a chimeric RNA. In some embodiments, the method further comprises removing said biotin from the 5′ region of said chimeric RNA. In some embodiments, the method further comprises recovering said chimeric RNAs. In some embodiments, the method further comprises fragmenting said chimeric RNAs. In some embodiments, said fragmenting of said chimeric RNAs comprises contacting said chimeric RNAs with an RNAse under conditions which facilitate partial digestion of said RNAs. In some embodiments, the method further comprises reverse transcribing said chimeric RNAs to generate a chimeric cDNA. In some embodiments, the method further comprises determining at least a portion of the sequences in said chimeric RNAs or chimeric cDNAs which originate from each of the RNAs in said chimeric RNAs or chimeric cDNAs. In some embodiments, the method further comprises identifying the RNAs present in said chimeric RNAs, thereby identifying RNAs which interact with one another in a cell. In some embodiments, at least 100, at least 500, at least 1000 or more than 1000 RNA-RNA interactions in the cell are identified. In some embodiments, substantially all of the RNAs which interact with one another in a cell are identified. In some embodiments, at least 70%, at least 80%, at least 90% or more than 90% of the direct RNA-RNA interactions in the cell are identified. In some embodiments, the identification of the RNAs which interact with one another in a cell comprises performing sequence reads on said chimeric RNAs using an automated sequencing device.
In some embodiments, said identification of the RNAs which interact with one another in a cell comprises identifying the chimeric sequences from all the sequence reads. In some embodiments, the method further comprises transforming the chimeric RNAs into annotated RNA clusters using a computer. In some embodiments, the method further comprises identifying direct interactions among said RNA clusters using a statistical test performed by a computer. In some embodiments, said agent comprises a nucleic acid. In some embodiments, said agent comprises a chemical compound.
In some embodiments, a method of making a pharmaceutical is provided, wherein the method comprises formulating an agent identified using the method of any of the embodiments described herein, in a pharmaceutically acceptable carrier. In some embodiments, formulating an agent identified is performed by a method for identifying a candidate therapeutic agent, wherein the method comprises identifying RNAs which interact with one another in a cell using the method of any of the embodiments described herein and evaluating the ability of an agent to reduce or increase the interaction of said RNAs, wherein said agent is a candidate therapeutic agent if said agent is able to reduce or increase said interaction of said RNAs. In some embodiments the method for identifying RNAs which interact with one another in a cell comprises cross-linking RNA to protein and ligating RNAs cross-linked to the same protein molecule together to form a chimeric RNA. In some embodiments, said cross-linking of RNA to protein is performed on an intact cell or in a cell lysate. In some embodiments, said cross-linking comprises UV cross-linking. In some embodiments, the method further comprises associating said protein with an agent which facilitates immobilization of said protein on a surface. In some embodiments, said agent which facilitates immobilization comprises biotin. In some embodiments, the method further comprises fragmenting said RNAs cross-linked to the same protein molecule. In some embodiments, said fragmenting comprises contacting said RNAs cross-linked to the same protein molecule with an RNAse under conditions which facilitate partial digestion of said RNAs. In some embodiments, the method further comprises linking said RNAs cross-linked to the same protein molecule to an agent which facilitates recovery of said RNAs. In some embodiments, said linking comprises ligating the ends of said RNAs to said agent. 
In some embodiments, said agent which facilitates recovery of said RNAs comprises a nucleic acid. In some embodiments, said nucleic acid comprises a nucleic acid having biotin thereon. In some embodiments, said linking of said nucleic acid having biotin thereon to said ends of said RNAs comprises ligating said nucleic acid having biotin thereon to the 5′ ends of said RNAs prior to ligating said RNAs cross-linked to the same protein molecule together to form a chimeric RNA. In some embodiments, the method further comprises removing said biotin from the 5′ region of said chimeric RNA. In some embodiments, the method further comprises recovering said chimeric RNAs. In some embodiments, the method further comprises fragmenting said chimeric RNAs. In some embodiments, said fragmenting of said chimeric RNAs comprises contacting said chimeric RNAs with an RNAse under conditions which facilitate partial digestion of said RNAs. In some embodiments, the method further comprises reverse transcribing said chimeric RNAs to generate a chimeric cDNA. In some embodiments, the method further comprises determining at least a portion of the sequences in said chimeric RNAs or chimeric cDNAs which originate from each of the RNAs in said chimeric RNAs or chimeric cDNAs. In some embodiments, the method further comprises identifying the RNAs present in said chimeric RNAs, thereby identifying RNAs which interact with one another in a cell. In some embodiments, at least 100, at least 500, at least 1000 or more than 1000 RNA-RNA interactions in the cell are identified. In some embodiments, substantially all of the RNAs which interact with one another in a cell are identified. In some embodiments, at least 70%, at least 80%, at least 90% or more than 90% of the direct RNA-RNA interactions in the cell are identified.
In some embodiments, the identification of the RNAs which interact with one another in a cell comprises performing sequence reads on said chimeric RNAs using an automated sequencing device. In some embodiments, said identification of the RNAs which interact with one another in a cell comprises identifying the chimeric sequences from all the sequence reads. In some embodiments, the method further comprises transforming the chimeric RNAs into annotated RNA clusters using a computer. In some embodiments, the method further comprises identifying direct interactions among said RNA clusters using a statistical test performed by a computer. In some embodiments, said agent comprises a nucleic acid. In some embodiments, said agent comprises a chemical compound.
In some embodiments a pharmaceutical is provided, wherein the pharmaceutical is made using the method of any of the embodiments described herein. In some embodiments, the method comprises formulating an agent identified using the method of any of the embodiments described herein, in a pharmaceutically acceptable carrier. In some embodiments, formulating an agent identified is performed by a method for identifying a candidate therapeutic agent, wherein the method comprises identifying RNAs which interact with one another in a cell using the method of any of the embodiments described herein and evaluating the ability of an agent to reduce or increase the interaction of said RNAs, wherein said agent is a candidate therapeutic agent if said agent is able to reduce or increase said interaction of said RNAs. In some embodiments the method for identifying RNAs which interact with one another in a cell comprises cross-linking RNA to protein and ligating RNAs cross-linked to the same protein molecule together to form a chimeric RNA. In some embodiments, said cross-linking of RNA to protein is performed on an intact cell or in a cell lysate. In some embodiments, said cross-linking comprises UV cross-linking. In some embodiments, the method further comprises associating said protein with an agent which facilitates immobilization of said protein on a surface. In some embodiments, said agent which facilitates immobilization comprises biotin. In some embodiments, the method further comprises fragmenting said RNAs cross-linked to the same protein molecule. In some embodiments, said fragmenting comprises contacting said RNAs cross-linked to the same protein molecule with an RNAse under conditions which facilitate partial digestion of said RNAs. In some embodiments, the method further comprises linking said RNAs cross-linked to the same protein molecule to an agent which facilitates recovery of said RNAs. 
In some embodiments, said linking comprises ligating the ends of said RNAs to said agent. In some embodiments, said agent which facilitates recovery of said RNAs comprises a nucleic acid. In some embodiments, said nucleic acid comprises a nucleic acid having biotin thereon. In some embodiments, said linking of said nucleic acid having biotin thereon to said ends of said RNAs comprises ligating said nucleic acid having biotin thereon to the 5′ ends of said RNAs prior to ligating said RNAs cross-linked to the same protein molecule together to form a chimeric RNA. In some embodiments, the method further comprises removing said biotin from the 5′ region of said chimeric RNA. In some embodiments, the method further comprises recovering said chimeric RNAs. In some embodiments, the method further comprises fragmenting said chimeric RNAs. In some embodiments, said fragmenting of said chimeric RNAs comprises contacting said chimeric RNAs with an RNAse under conditions which facilitate partial digestion of said RNAs. In some embodiments, the method further comprises reverse transcribing said chimeric RNAs to generate a chimeric cDNA. In some embodiments, the method further comprises determining at least a portion of the sequences in said chimeric RNAs or chimeric cDNAs which originate from each of the RNAs in said chimeric RNAs or chimeric cDNAs. In some embodiments, the method further comprises identifying the RNAs present in said chimeric RNAs, thereby identifying RNAs which interact with one another in a cell. In some embodiments, at least 100, at least 500, at least 1000 or more than 1000 RNA-RNA interactions in the cell are identified. In some embodiments, substantially all of the RNAs which interact with one another in a cell are identified. In some embodiments, at least 70%, at least 80%, at least 90% or more than 90% of the direct RNA-RNA interactions in the cell are identified.
In some embodiments, the identification of the RNAs which interact with one another in a cell comprises performing sequence reads on said chimeric RNAs using an automated sequencing device. In some embodiments, said identification of the RNAs which interact with one another in a cell comprises identifying the chimeric sequences from all the sequence reads. In some embodiments, the method further comprises transforming the chimeric RNAs into annotated RNA clusters using a computer. In some embodiments, the method further comprises identifying direct interactions among said RNA clusters using a statistical test performed by a computer. In some embodiments, said agent comprises a nucleic acid. In some embodiments, said agent comprises a chemical compound.
In some embodiments, a method for generating chimeric RNAs comprising RNAs which interact with one another in a cell is provided, wherein the method comprises cross-linking RNA to protein intermediates and/or a protein complex and ligating RNAs cross-linked to protein intermediates and/or the protein complex together to form a chimeric RNA, and wherein the protein complex comprises two or more interacting proteins. In some embodiments, said cross-linking of RNA to the protein intermediates and/or the protein complex is performed on an intact cell or in a cell lysate. In some embodiments, said cross-linking comprises UV cross-linking. In some embodiments, the method further comprises associating said protein intermediates and/or the protein complex with an agent which facilitates immobilization of said protein intermediates and/or the protein complex on a surface. In some embodiments, said agent which facilitates immobilization comprises biotin. In some embodiments, the method further comprises fragmenting said RNAs cross-linked to the at least one protein molecule. In some embodiments, fragmenting comprises contacting said RNAs cross-linked to the protein intermediates and/or the protein complex with an RNAse under conditions which facilitate partial digestion of said RNAs. In some embodiments, the method further comprises linking said RNAs cross-linked to the protein intermediates and/or the protein complex to an agent which facilitates recovery of said RNAs. In some embodiments, said linking comprises ligating the ends of said RNAs to said agent. In some embodiments, said agent which facilitates recovery of said RNAs comprises a nucleic acid. In some embodiments, said nucleic acid comprises a nucleic acid having biotin thereon. 
In some embodiments, said linking of said nucleic acid having biotin thereon to said ends of said RNAs comprises ligating said nucleic acid having biotin thereon to the 5′ ends of said RNAs prior to ligating said RNAs cross-linked to the protein intermediates and/or the protein complex together to form a chimeric RNA. In some embodiments, the method further comprises removing said biotin from the 5′ region of said chimeric RNA. In some embodiments, the method further comprises recovering said chimeric RNAs. In some embodiments, the method further comprises fragmenting said chimeric RNAs. In some embodiments, said fragmenting of said chimeric RNAs comprises contacting said chimeric RNAs with an RNAse under conditions which facilitate partial digestion of said RNAs. In some embodiments, the method further comprises reverse transcribing said chimeric RNAs to generate a chimeric cDNA. In some embodiments, the method further comprises identifying the RNAs present in said chimeric RNAs, thereby identifying RNAs which interact with one another in a cell. In some embodiments, at least 100, at least 500, at least 1000 or more than 1000 RNA-RNA interactions in the cell are identified. In some embodiments, substantially all of the RNAs which interact with one another in a cell are identified. In some embodiments, at least 70%, at least 80%, at least 90% or more than 90% of the direct RNA-RNA interactions in the cell are identified. In some embodiments, the identification of the RNAs which interact with one another in a cell comprises performing sequence reads on said chimeric RNAs using an automated sequencing device. In some embodiments, said identification of the RNAs which interact with one another in a cell comprises identifying the chimeric sequences from all the sequence reads. In some embodiments, the method further comprises transforming the chimeric RNAs into annotated RNA clusters using a computer. 
In some embodiments, the method further comprises identifying direct interactions among said RNA clusters using a statistical test performed by a computer. In some embodiments, said RNAs which interact with each other in the cell are cross-linked to different proteins in said protein intermediate or protein complex.
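By way of illustration only, the identification of chimeric sequences from sequence reads referred to in the embodiments above can be pictured as locating the linker within each read and keeping the two flanking fragments. The following Python sketch assumes that a chimeric read has the form RNA1-linker-RNA2 and uses exact string matching; the linker sequence shown is a hypothetical placeholder and not the linker of any embodiment, and the function name and minimum fragment length are likewise assumptions. A practical pipeline would typically tolerate sequencing errors in the linker.

```python
def split_chimeric_read(read, linker, min_len=8):
    """Return (rna1, rna2) if the read looks chimeric, otherwise None.

    A read is treated as chimeric when the linker occurs exactly once and
    both flanking fragments are at least min_len bases long.
    """
    idx = read.find(linker)
    if idx == -1:
        return None  # no linker: a single, non-chimeric read
    rna1 = read[:idx]
    rna2 = read[idx + len(linker):]
    if linker in rna2:
        return None  # more than one linker copy: ambiguous read
    if len(rna1) < min_len or len(rna2) < min_len:
        return None  # a flanking fragment is too short to map reliably
    return rna1, rna2


# Hypothetical placeholder linker and toy reads (cDNA alphabet).
LINKER = "ACGTTGCA"
toy_reads = [
    "GGGGGGAAAA" + LINKER + "CCCCTTTTCC",  # chimeric
    "GGGGGGAAAACCCCTTTTCC",                # no linker
    "GGG" + LINKER + "CCCCTTTTCC",         # first fragment too short
]
for read in toy_reads:
    print(split_chimeric_read(read, LINKER))
```

The two fragments returned for each chimeric read would then be mapped separately so that each can be assigned to an annotated RNA cluster.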
In some embodiments, an isolated complex comprising a chimeric RNA cross-linked to protein intermediates and/or a protein complex is provided, wherein said chimeric RNA comprises RNAs which interact with one another in a cell, wherein the protein complex comprises two or more interacting proteins. In some embodiments, said chimeric RNA comprises RNAs which are cross-linked to different proteins in said protein intermediate or protein complex.
Each reference listed herein is incorporated herein by reference in its entirety.
The present application is a continuation of PCT/US2015/051075 filed Sep. 18, 2015, which claims the benefit of priority to U.S. Provisional Patent Application No. 62/053,615, filed on Sep. 22, 2014. The entire disclosures of the aforementioned applications are expressly incorporated herein by reference in their entirety.
This invention was made with government support under grant number NIH DP2-OD007417 awarded by the National Institutes of Health. The government has certain rights in the invention.
Number | Date | Country
---|---|---
62053615 | Sep 2014 | US

Relation | Number | Date | Country
---|---|---|---
Parent | PCT/US2015/051075 | Sep 2015 | US
Child | 15462680 | | US