The present invention belongs to the technical field of biology, and particularly relates to a whole transcriptome level RNA structure probing method and use thereof.
RNA serves diverse functions, for example as a messenger that conveys genetic information or as a ribozyme that catalyzes reactions. RNA molecules are precisely regulated throughout their entire life cycle and at different subcellular locations. Their complex and flexible structures are the core of the functional diversity and fine regulation of RNA molecules. Misfolding of RNA structures can interfere with processes such as alternative splicing, translation, RNA modification and editing, and RNA-protein interactions, thereby leading to disease.
RNA structure probing methods utilize chemical reagents that specifically modify single-stranded nucleotides. The modification sites can interfere with reverse transcription (RT), resulting in RT stops or mutations; therefore, the modification sites can be detected by sequencing and bioinformatic analyses, and RNA structural information is thus obtained. Most reagents can only probe structural information for a subset of the four bases; for example, dimethyl sulfate (DMS) modifies single-stranded cytosines and adenines, glyoxal modifies single-stranded guanines, cytosines and adenines, and kethoxal modifies single-stranded guanines. Selective 2′-hydroxyl acylation analyzed by primer extension (SHAPE) reagents can modify the 2′ OH group of ribose within single-stranded regions and provide structural information for all four nucleotides.
Global RNA structure probing studies have revealed that structural differences often exist at functional RNA sites, such as protein and miRNA binding sites, and studies have shown that RNA structures can be involved in regulating the splicing, translation and degradation processes of RNA. Notably, several studies have shown that RNA sequences can form different structures in vivo and in vitro, at different subcellular compartments, and at different stages of embryogenesis. Indeed, many factors in cells can affect RNA structures, including pH, cation concentrations, endogenous RNA modifications (e.g., methylation, acetylation), and interactions with proteins and/or other RNAs. Therefore, studying RNA structures in their most relevant natural environments is crucial for revealing RNA functions and regulatory mechanisms.
However, current state-of-the-art RNA structure probing methods typically require a large amount of RNA as input, which limits their practical uses. For example, the construction of RNA libraries for icSHAPE and Structure-seq2 requires approximately 10⁷ cells, which is difficult to achieve for biological studies of rare primary cells and many tissue samples. Therefore, apart from some studies on zebrafish early embryos and Drosophila ovaries, which are experimentally easy to collect, RNA structure probing studies have so far been limited to cultured cell lines. However, the cellular environments in cell lines and the RNA structures generated therefrom may deviate significantly from the primary sample, such that the results cannot truly reflect the functional states of the cells.
To overcome this obstacle, we developed smartSHAPE (small amount random RT icSHAPE), a novel secondary structure probing method for low amounts of input RNA, which is an improvement over the icSHAPE method.
In a first aspect of the present invention, an RNA structure probing method is provided, wherein the method comprises:
1. obtaining an RNA-containing sample; 2. preparing a smartSHAPE library; and 3. RNA structure probing and analysis, wherein in step 2, preparing the smartSHAPE library comprises: (1) RNA modification and preparation; (2) RNA reverse transcription, removal of background reverse transcription stop signals caused by non-modification sites (premature RT stops), and cDNA enrichment.
Preferably, step 2 of the RNA structure probing method further comprises (3) adapter ligation, second strand synthesis, and amplification. More preferably, the adapter ligation includes 3′ adapter ligation and 5′ adapter ligation.
Preferably, the background reverse transcription stop signals are caused by non-RNA modification sites. More preferably, the background reverse transcription stop signals may be derived from endogenous modifications (e.g., m1A modifications), local structures (e.g., G-quadruplexes), or random dissociation of reverse transcriptase from the template.
More preferably, the background reverse transcription stop signals are removed by ribonuclease (RNase) digestion. More preferably, the background reverse transcription stop signals are removed by RNase I digestion.
Preferably, a primer for the reverse transcription (RT) has the sequence of 5′-NNNNNN-3′, 5′-NNWNNWNN-3′, or 5′-TTTTTTTTVN-3′. Preferably, the RNA is modified with a labeling reagent; more preferably, the labeling reagent is a cell membrane penetrating reagent; more preferably, the labeling reagent is dimethyl sulfate (DMS), 1-methyl-7-nitroisatoic anhydride (1M7), 2-methylnicotinic acid imidazolide-azide (NAI-N3) or kethoxal; more preferably, the labeling reagent is 2-methylnicotinic acid imidazolide-azide (NAI-N3).
Preferably, the cDNA enrichment is enrichment with magnetic beads; more preferably, the magnetic beads are streptavidin magnetic beads, such as MyOne C1 magnetic beads.
Preferably, the RNA structure is an RNA secondary structure.
Preferably, the RNA is a full-length RNA; further, the RNA is a transcriptome RNA. It may be a long-chain RNA, such as an mRNA, lncRNA or rRNA, or it may comprise many small RNAs, such as small RNAs smaller than 200 nt, protein-bound RNAs, or RNAs serving as Dicer substrates.
Preferably, the RNA may be derived from any cell, virus, etc.; preferably, the cell includes, but is not limited to, cell lines cultured in laboratories, living cells, primary cells, mammalian early embryos, bacteria, fungi, and various infected cells, such as cells infected by viruses, bacteria, fungi, etc.; more preferably, the living cells may be any somatic cell or germ cell, such as epithelial cells, dermal cells, glandular cells, blood-derived cells, bone cells, immune cells (T cells, B cells, NK cells, macrophages, etc.), or fertilized eggs.
The RNA structure probing method further comprises a processing step of calculating smartSHAPE scores using a computational pipeline. The calculation processing step comprises: 1) removing a 3′ adapter; 2) removing duplicate reads; 3) removing a molecular label; 4) aligning clean reads to rRNA standard sequences; 5) aligning reads that are not aligned to rRNA sequences to a genome; 6) converting SAM files into .tab files using icSHAPE-pipe sam2tab; and 7) calculating smartSHAPE scores using icSHAPE-pipe calcSHAPENoCont.
Preferably, in step 7), the smartSHAPE scores are calculated by normalization and winsorization of RT stop counts across all exons in a sliding window fashion, and the scores for bases with coverage below 100 are defined as NULL.
More preferably, parameters in step 7) are: -N NAI_rep1.tab, NAI_rep2.tab; -size chrNameLength.txt; -out reactivity.gTab; -ijf sjdbList.fromGTF.out.tab.
Preferably, the probing method does not comprise a gel recovery step before library amplification.
Preferably, in the library construction of the computational pipeline, no control group is required to remove background signals.
Preferably, in the RNA structure probing method, RNA structure probing can be performed with an RNA input of as little as 1 ng (10⁴ to 10⁵ cells).
The present invention further provides use of the RNA structure probing method described above, the use comprising assessing functional states of cells, studying the effect of RNA on early development and the development and progression of cancer, etc., according to the result of the probing method described above.
Preferably, the functional states include various physiological and abnormal states, such as cellular inflammation, injury, ischemia, immune stress state, early developmental process, infection, and cancer proliferation. More preferably, the infection is caused by viruses, bacteria, fungi, etc.
Preferably, the cells are derived from any tissue organ, such as the cutaneous system, the blood lymphatic system, the immune system, the cardiovascular system, the digestive system, the respiratory system, the urinary system, the skeletal system, the reproductive system, or the nervous system.
Preferably, the cells include immune cells, such as B cells, T cells, NK cells, and macrophages.
Preferably, the use is not a diagnosis or treatment method for a disease.
The present invention further provides a method for assessing a functional state of a cell, the assessing method comprising probing RNA structures of the cell by any probing method described above, and assessing the functional state of the cell according to the probing result.
Preferably, the functional state of the cell is cellular inflammation, injury, ischemia, immune stress state, early developmental process, infection, cancer proliferation, etc.; more preferably, the infection is caused by viruses, bacteria, fungi, etc.
More preferably, the functional state of the cell is an immune stress state of the cell. An example is an immune stress state of an immune cell. Still further preferably, the immune cell includes, for example, B cells, T cells, NK cells, and macrophages.
The present invention has the following beneficial technical effects:
1. The present invention removes the background reverse transcription stop signals, reducing false positive signals caused by the background reverse transcription stop signals in the structure score calculation, thereby improving the accuracy of the probing method.
2. The present invention adopts a different library construction strategy, wherein we combine random RT with on-bead single-stranded DNA library construction, greatly reducing the losses caused by multiple purification steps.
3. SmartSHAPE requires an RNA input of as little as 1 ng (10⁴ to 10⁵ cells), enabling RNA structure analysis of in vivo cells at a very low sample amount. The method can be applied to any cell, such as rare primary cells, mammalian early embryos, and patient biopsy samples.
4. We used smartSHAPE to describe the whole transcriptome RNA secondary structure of intestinal macrophages from bacterial infection model mice, wherein only 100 ng of total RNA was used as input for each sample. We revealed differences in RNA structure between two populations of macrophages after immune stress; the differentially structured transcripts are enriched in immune response-associated genes, and we provided evidence for regulation of immune response through RNA structure.
5. The smartSHAPE of the present invention is an efficient, accurate and robust method for studying whole transcriptome RNA secondary structures in vivo that requires only a very small amount of RNA as input. Our method integrates random reverse transcription, RNase I digestion, and on-bead library construction to increase the efficiency of library construction and to generate accurate RNA structural data. The results of the present invention show that smartSHAPE successfully removes background reverse transcription stop signals by RNase I digestion followed by magnetic bead enrichment, and achieves better accuracy than icSHAPE even without a DMSO group as a control.
6. In view of the minimal requirements of the method of the present invention for RNA initial material, it is very promising to apply smartSHAPE to the study of the widespread roles of RNA structure in many other potential biological environments. For example, maternal RNA degradation is essential for early development, and several studies have reported that RNA structure plays a regulatory role in maternal RNA degradation during early embryogenesis of zebrafish. The RNA structurome in mammalian early embryos has not been studied due to the limited sample amount in the prior art, but can be approached by smartSHAPE of the present invention. In addition, dysregulation of RBP binding is known to be involved in the development and progression of many cancers. SmartSHAPE may provide a viable means to study these dysregulations from the perspective of RNA structure by using rare biopsy samples from the clinic. In addition, when used in combination with enrichment (e.g., by antisense oligonucleotides or protein antibodies), smartSHAPE is expected to help discover and functionally validate regulatory effects based on RNA structure; these RNAs include RNAs expressed at low levels (such as many lncRNAs), RNA species in stress granules, and RNA fragments bound by RBPs, etc.
The foregoing is merely a summary of some aspects of the present invention, and is not, and should not be construed as, limiting the present invention in any way.
Unless otherwise specified, the practice of the present invention will adopt traditional techniques of cell biology, cell culture, molecular biology, immunology, and the like. These techniques are explained in detail in the following documents. For example:
All patents and publications mentioned in this specification are herein incorporated by reference in their entirety. Those skilled in the art should recognize that certain changes may be made to the present invention without departing from the conception or scope of the present invention. The following examples further illustrate the present invention in detail and should not be construed as limiting the scope of the present invention or the specific methods described herein.
The present invention is further described with reference to the following specific examples, and the advantages and features of the present invention will be clearer as the description proceeds. These examples are illustrative only and do not limit the scope of the present invention in any way. It should be understood by those skilled in the art that modifications and replacements can be made to the details and form of the technical solutions of the present invention without departing from the spirit and scope of the present invention and that all these modifications and replacements fall within the scope of the present invention.
In icSHAPE, NAI-N3 was used to modify RNAs in vivo in single-stranded regions. The RNAs were then fragmented, ligated to a 3′ adapter, and converted into double-stranded DNA libraries by reverse transcription, circligation, and amplification. Notably, icSHAPE library construction employs multiple gel extraction and column purification steps, which lead to RNA sample loss, making it difficult or impossible to analyze samples with a small amount of input RNA. Even with high recovery rates of 80% and 50% per step for column and gel purification, respectively, we typically obtained only about a 5% yield after seven column purification steps and two gel size selection steps.
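The roughly 5% figure follows from multiplying the per-step recoveries. A minimal sketch of that arithmetic, assuming each step recovers a fixed fraction of its input independently of the others (the variable names are illustrative only):

```python
# Cumulative recovery across sequential purification steps (illustrative only).
column_recovery = 0.80   # per-step recovery for column purification (from the text)
gel_recovery = 0.50      # per-step recovery for gel purification (from the text)
n_column_steps = 7       # column purification steps in icSHAPE (from the text)
n_gel_steps = 2          # gel size selection steps in icSHAPE (from the text)

# Assumption: losses are independent, so per-step recoveries multiply.
overall_yield = column_recovery ** n_column_steps * gel_recovery ** n_gel_steps
print(f"Expected overall yield: {overall_yield:.1%}")
```

Under this assumption the expected yield is 0.8^7 × 0.5^2, i.e. slightly above 5%, which is consistent with the yield reported above.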
To minimize the loss of input material, we developed smartSHAPE, which combines random-primed reverse transcription, on-beads reactions, and single-stranded DNA library construction (see
The subsequent single-stranded DNA library construction was performed with most steps on magnetic beads, and the original gel extraction and column purification steps can be replaced by simple magnetic bead washing, such that the efficiency of library construction was greatly improved, and the process was simplified. Specifically, biotinylated adapters were ligated to the 3′ end of cDNA fragments by CircLigase or T4 DNA ligase, enabling their immobilization with streptavidin beads (see
The specific procedures are as follows:
I. Cell Culturing
HEK293T cells were maintained in a DMEM medium with high glucose (Gibco) supplemented with 10% fetal bovine serum (FBS) and 1% penicillin-streptomycin.
II. smartSHAPE Library Preparation
1. Modification by labeling reagent NAI-N3 and RNA preparation.
RNA was modified in vivo by NAI-N3. Briefly, cells were rinsed and scraped in 1×PBS at room temperature. Cells were then pelleted and resuspended in 450 μL of 1×PBS, and the suspension was mixed with 50 μL of 1 M NAI-N3 or 50 μL of DMSO (as an untreated group). Reactants were incubated for 5 min at 37° C. with rotation and the reaction was then terminated after centrifugation at 2500 g for 1 min at 4° C. Cells were resuspended and lysed with 500 μL of Trizol (Invitrogen), and total RNAs were separated by isopropanol precipitation. Poly(A)+ RNA was isolated by poly-A selection (Ambion), or rRNA was depleted with RiboErase (KAPA). RNA samples were incubated with 1 μL of RiboLock and 2 μL of 185 mM Dibo-Biotin for 2 h at 37° C. at 1000 r.p.m. in a mixer (Eppendorf). A Zymo RNA Clean & Concentrator-5 column was used for purification.
2. Reverse transcription, RNase digestion, enrichment, and 3′ adapter ligation.
3.5 μL of RT primer mixture (50 μM 5′-NNNNNN-3′, 50 μM 5′-NNWNNWNN-3′, and 6 μM 5′-TTTTTTTTVN-3′) and 3 μL of 5× first strand buffer (Life Technologies) were added to 8.5 μL of biotinylated RNA sample. The samples were heated to 85° C. for 5 min and then slowly cooled to 4° C. (0.1° C. per s) for primer annealing and weak fragmentation. To RNAs with primers, 0.75 μL of RiboLock, 1 μL of 100 mM DTT, 1 μL of 5× first strand buffer, and 1.25 μL of SuperScript III (Life Technologies) were added for random RT. cDNA extension was performed at 4° C. for 2 min, 15° C. for 3 min, 25° C. for 10 min, 42° C. for 45 min, and 50° C. for 25 min. 5 μL of RNase I (Thermo Fisher Scientific), 3 μL of 10×TNF buffer, and 2 μL of H2O were added to RT products, and the mixture was incubated for 30 min at 37° C. After cDNA extension, samples should be kept at or below 37° C. to avoid denaturing conditions.
MyOne C1 magnetic beads (Invitrogen) (20 μL/sample) were prepared by washing three times with 1 mL of bead binding buffer (100 mM Tris-HCl pH 7.0, 1 M NaCl, 10 mM EDTA) and resuspending in 10 μL of bead binding buffer supplemented with 1 μL of RiboLock. The product of RNase I digestion was mixed with pre-washed beads and incubated for 45 min at room temperature with rotation. After five washes with 500 μL of wash buffer (100 mM Tris pH 7.0, 4 M NaCl, 10 mM EDTA and 0.2% Tween-20) and two washes with 500 μL of 1×PBS, the magnetic beads bound to the cDNA samples were resuspended with 40 μL of H2O. cDNAs were eluted by adding 5 μL of 1 M NaOH and incubated for 15 min at 70° C. at 1000 r.p.m. in a mixer to fully digest RNAs. Samples were immediately placed on a magnet, 45 μL of cDNA eluate was moved to a new tube, and 5 μL of 1 M HCl was added. The eluate was then purified on a Zymo DNA Clean & Concentrator-5 column. After RNase I digestion, DMSO groups were incubated directly and purified with NaOH. The purified samples were mixed with 1 μL (1 U) of FastAP (Thermo Fisher Scientific), 3 μL of 10×CircLigase II (Epicentre), and 1.5 μL of MnCl2, and incubated for 10 min at 37° C. and for 2 min at 95° C. for end repair. A ligation mixture consisting of 12 μL of 50% PEG-4000 (Sigma), 1.5 μL of CircLigase II (Epicentre), and 1 μL of 10 μM 3′ adapter (see Table 1) was added and mixed by intense vortexing. Reactants were incubated for 2 h at 60° C. and then cooled down to 4° C.
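The RT primers in the mixture are written with IUPAC degenerate codes (N = A/C/G/T, W = A/T, V = A/C/G). As a hedged illustration of what these codes imply, the following sketch counts and enumerates the concrete sequences each primer represents; the helper names `degeneracy` and `expand` are hypothetical and are not part of the described method.

```python
from itertools import product

# Standard IUPAC nucleotide codes that occur in the three RT primers.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "W": "AT", "V": "ACG"}

def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count

def expand(primer):
    """Enumerate every concrete sequence encoded by a degenerate primer."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in primer))]

for primer in ("NNNNNN", "NNWNNWNN", "TTTTTTTTVN"):
    print(primer, degeneracy(primer))
```

The random hexamer and the W-containing octamer cover thousands of sequences each, whereas the anchored oligo(dT) primer encodes only twelve, since only its last two positions are degenerate.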
The C at the 3′ end of SEQ ID No. 3 was preferably modified by dd; the TCAC at the 3′ end of SEQ ID No. 4 was optionally subjected to thio-modification; an index sequence was optionally inserted between the GAGAT and GTGAC in SEQ ID No. 6.
3. 3′ Adapter Ligation and Second Strand Synthesis
MyOne C1 magnetic beads (Invitrogen) (20 μL/sample) were prepared by washing twice with 500 μL of binding buffer (10 mM Tris-HCl pH 8.0, 1 M NaCl, 1 mM EDTA, 0.05% Tween-20, 0.5% SDS) and resuspending in 250 μL of binding buffer. The ligation products were heated for 2 min at 95° C., then immediately transferred onto ice for at least 1 min, and incubated with pre-washed magnetic beads for 20 min at room temperature with rotation. The beads were then washed once with 200 μL of wash buffer A (10 mM Tris-HCl pH 8.0, 100 mM NaCl, 1 mM EDTA, 0.05% Tween-20, 0.5% SDS) and once with 200 μL of wash buffer B (10 mM Tris-HCl pH 8.0, 100 mM NaCl, 1 mM EDTA, 0.05% Tween-20).
The magnetic beads were resuspended with 47 μL of a master mix consisting of 40.5 μL of H2O, 5 μL of 10× isothermal amplification buffer (NEB), 0.5 μL of 25 mM dNTP (Thermo Fisher Scientific), and 1 μL of 100 μM extension primer. The mixture was incubated for 2 min at 65° C. in a mixer at 1000 r.p.m., cooled on ice for 1 min and transferred to a pre-cooled 15° C. mixer, and then 3 μL of Bst 2.0 DNA polymerase (NEB) was added. Extension reactants were incubated from 15° C. to 37° C. (1° C./min) and held at 37° C. for 5 min (15 s of mixing per min) at 1500 r.p.m. in a mixer. The magnetic beads were washed once with 200 μL of wash buffer A, once with 50 μL of stringency wash buffer (0.1×SSC buffer, 0.1% SDS) at 55° C. at 1500 r.p.m. in a mixer (15 s of mixing per min), and once with 200 μL of wash buffer B. The magnetic beads were resuspended in 99 μL of a master mix consisting of 86.1 μL of H2O, 10 μL of 10× Tango buffer (Thermo Fisher Scientific), 2.5 μL of 1% Tween-20 and 0.4 μL of 25 mM dNTP and 1 μL of T4 DNA polymerase (Thermo Fisher Scientific). Reactants were incubated for 15 min at 25° C. at 1500 r.p.m. in a mixer (15 s of mixing per min). The beads were washed three times as described above.
4. 5′ Adapter Ligation and Amplification
The magnetic beads were resuspended with 98 μL of a master mix consisting of 73.5 μL of H2O, 10 μL of 10× T4 DNA ligase buffer (Thermo Fisher Scientific), 10 μL of 50% PEG-4000 (Thermo Fisher Scientific), 2.5 μL of 1% Tween-20, and 2 μL of 100 μM double-stranded adapter (DSA) (see Table 1). The DSA was annealed by heating two complementary oligonucleotides for 10 sec at 95° C. and slowly cooling to 14° C. (0.1° C./s). After the addition of 2 μL (10 U) of T4 DNA ligase (Thermo Fisher Scientific), the ligation reactants were incubated for 1 h at 25° C. at 1500 r.p.m. in a mixer (15 s of mixing per min). The beads were washed three times as described above, then resuspended in 25 μL of elution buffer (10 mM Tris-HCl pH 8.0, 0.05% Tween-20), and incubated for 10 min at 95° C. The supernatant was collected for amplification.
Samples were amplified in 40 μL of qPCR reactants (12 μL of cDNA, 20 μL of 2× Phusion HF master mix, 0.75 μL of 10 μM P7 index primer (see Table 1), 0.75 μL of 10 μM P5 primer (see Table 1), 0.4 μL of 25× SybrGreen). The qPCR instrument was programmed as follows: 98° C. for 1 min, 98° C. for 15 s, 65° C. for 30 s, and 72° C. for 45 s. After the qPCR amplification, the samples were size-selected (>150 bp) with 6% native PAGE gel. Deep sequencing was run on HiSeq X Ten (Illumina) after quantification with Qubit (Invitrogen).
III. Computational Pipeline for smartSHAPE Score Calculation
Since most insert sequences were shorter than 100 nt, we used only read mate 1 for subsequent processing. The smartSHAPE sequencing data was processed using icSHAPE-pipe. The processing steps were as follows: 1) The 3′ adapter was removed by Cutadapt; 2) Duplicate reads were removed; 3) The first 10 nt were removed using Trimmomatic; 4) Clean reads were mapped to human rRNA with Bowtie2; 5) The un-mapped reads were then mapped to the human (hg38) or mouse (mm10) genome using STAR; 6) SAM files were converted into .tab files using icSHAPE-pipe sam2tab; 7) The smartSHAPE score was calculated using icSHAPE-pipe calcSHAPENoCont with parameters: -N NAI_rep1.tab, NAI_rep2.tab; -size chrNameLength.txt; -out reactivity.gTab; -ijf sjdbList.fromGTF.out.tab. The sjdbList.fromGTF.out.tab and chrNameLength.txt files were generated by STAR during genome index generation.
Basically, icSHAPE-pipe calculated genome-wide smartSHAPE scores based on a sliding window scheme with a default window size of 200 nt and a step size of 5 nt, which skipped non-coding regions and concatenated exons when defining windows. Each nucleotide was therefore scored 40 times (200 nt / 5 nt), and only nearby nucleotides were considered during the calculations to avoid bias caused by uneven coverage of different regions in each transcript. When the 5′ end of a read was aligned to the 3′ adjacent site (+1 position) of a nucleotide, the reverse transcription stop signal of that nucleotide was increased by one. Reverse transcription stop signals were normalized within each window and 90% winsorization was performed to get final scores ranging from 0 to 1. The final smartSHAPE score of each base was the average score of all windows containing the base. The smartSHAPE scores were defined as NULL if the coverage was lower than 100, which means failure to probe the structure at these sites.
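Steps 2) and 3) can be pictured with a small sketch. It assumes that duplicates are identified as reads with an identical sequence, leading molecular tag included, and that the tag occupies the first 10 nt of each read as step 3) implies; the function name `collapse_and_trim` is hypothetical, and the sketch does not reproduce what icSHAPE-pipe or Trimmomatic do internally.

```python
TAG_LENGTH = 10  # number of leading nucleotides trimmed in step 3)

def collapse_and_trim(reads, tag_length=TAG_LENGTH):
    """Drop exact duplicate reads (tag included), then trim the leading tag."""
    seen = set()
    cleaned = []
    for read in reads:
        if read in seen:
            continue  # identical tag + insert: treated as a PCR duplicate
        seen.add(read)
        insert = read[tag_length:]
        if insert:  # skip reads that are empty after trimming
            cleaned.append(insert)
    return cleaned

reads = [
    "AAAAAAAAAA" + "GATTACA",  # original molecule
    "AAAAAAAAAA" + "GATTACA",  # PCR duplicate: same tag, same insert
    "CCCCCCCCCC" + "GATTACA",  # same insert, different tag: kept
]
print(collapse_and_trim(reads))
```

Deduplicating before trimming matters: two reads with the same insert but different tags derive from distinct cDNA molecules and should both count as RT stops.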
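The scoring scheme described above can be sketched as follows. This is a simplified illustration, not the icSHAPE-pipe implementation: it assumes that 90% winsorization means clipping each window at its 5th and 95th percentiles and rescaling to the range 0 to 1, and it operates on one already-concatenated exon sequence. All function names are hypothetical.

```python
def count_rt_stops(read_starts, length):
    """A read whose 5' end maps at 0-based position p marks an RT stop at p - 1."""
    stops = [0] * length
    for p in read_starts:
        if 1 <= p < length:
            stops[p - 1] += 1
    return stops

def percentile(sorted_values, q):
    """Linear-interpolation percentile of a pre-sorted list, q in [0, 1]."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac

def window_scores(values, lower_q=0.05, upper_q=0.95):
    """Clip to the [5th, 95th] percentile range (assumed) and rescale to 0..1."""
    ordered = sorted(values)
    lo, hi = percentile(ordered, lower_q), percentile(ordered, upper_q)
    if hi == lo:
        return [0.0] * len(values)  # flat window carries no signal
    return [(min(max(v, lo), hi) - lo) / (hi - lo) for v in values]

def sliding_window_scores(stops, coverage, window=200, step=5, min_cov=100):
    """Average per-window scores for each base; None (NULL) if coverage < min_cov."""
    n = len(stops)
    totals, counts = [0.0] * n, [0] * n
    last_start = max(n - window, 0)
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # make sure the 3' end is covered
    for start in starts:
        scored = window_scores(stops[start:start + window])
        for offset, score in enumerate(scored):
            totals[start + offset] += score
            counts[start + offset] += 1
    return [None if coverage[i] < min_cov else totals[i] / counts[i]
            for i in range(n)]
```

For example, `sliding_window_scores(stops, coverage)` returns one value per base, with `None` standing in for NULL at positions whose coverage is below 100. Scoring within local windows rather than over the whole transcript is what limits the influence of uneven coverage along a transcript.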
IV. RNA Structure Analysis
The receiver operating characteristic (ROC) curve was generated with the Python package sklearn. In summary, given a secondary structure and a list of SHAPE scores (0-1), single-stranded bases were regarded as positive samples, and double-stranded bases were regarded as negative samples. The false positive rate (FPR) and true positive rate (TPR) could be calculated if a cutoff of SHAPE scores was used to divide all bases into positive samples and negative samples. Therefore, the ROC curve could be generated by gradually adjusting the cutoff from 0 to 1. AUC is the area under the ROC curve.
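The cutoff sweep described above can be written out explicitly. The following sketch is a pure-Python illustration of the same idea rather than the sklearn call used in the analysis; the function names `roc_points` and `auc` are hypothetical, and bases with NULL scores are assumed to have been removed beforehand.

```python
def roc_points(scores, is_single_stranded):
    """Sweep a cutoff over the scores; bases scoring >= cutoff are called positive."""
    pos = sum(is_single_stranded)
    neg = len(is_single_stranded) - pos
    points = [(0.0, 0.0)]  # cutoff above every score: nothing called positive
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, is_single_stranded) if s >= cutoff and y)
        fp = sum(1 for s, y in zip(scores, is_single_stranded) if s >= cutoff and not y)
        points.append((fp / neg, tp / pos))  # (FPR, TPR)
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Toy example: True = single-stranded in the reference structure.
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
labels = [True, True, False, True, False, False]
print(round(auc(roc_points(scores, labels)), 3))
```

An AUC of 1 means every single-stranded base scores higher than every double-stranded base, while 0.5 corresponds to no discrimination.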
RNA structure modeling: The RNA secondary structure was modeled using the Fold program in the RNAstructure package. The smartSHAPE scores could be used as constraints, with the default slope and intercept parameters.
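For orientation, SHAPE-type constraints in RNAstructure enter the folding calculation as a per-nucleotide pseudo-free energy term of the form slope × ln(reactivity + 1) + intercept. The sketch below evaluates that term. The default values used here (slope 1.8 kcal/mol, intercept −0.6 kcal/mol) reflect our understanding of the RNAstructure defaults and should be treated as an assumption to be checked against the installed version; the function name is hypothetical.

```python
import math

DEFAULT_SLOPE = 1.8       # kcal/mol; assumed RNAstructure default
DEFAULT_INTERCEPT = -0.6  # kcal/mol; assumed RNAstructure default

def shape_pseudo_energy(reactivity, slope=DEFAULT_SLOPE, intercept=DEFAULT_INTERCEPT):
    """Pseudo-free energy added for pairing a nucleotide with the given reactivity."""
    return slope * math.log(reactivity + 1.0) + intercept

for score in (0.0, 0.5, 1.0):
    print(score, round(shape_pseudo_energy(score), 3))
```

With these parameters, a low score yields a negative (stabilizing) term for base pairing and a high score yields a positive (penalizing) term, which is how reactive nucleotides are steered toward single-stranded regions in the predicted structure.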
Biotinylated total RNAs of HEK293T modified with NAI-N3 were mixed with 3.5 μL of specific RT primer and 3 μL of 5× first strand buffer. The mixture was heated to 65° C. for 5 min and incubated on ice for 2 min. The annealed samples were mixed with 0.75 μL of RiboLock, 1 μL of 100 mM DTT, 1 μL of 5× first strand buffer, and 1.25 μL of SuperScript III (Life Technologies) and incubated for 30 min at 55° C. The RT products were divided into 5 parts, wherein one group omitted both RNase I digestion and magnetic bead enrichment and one group directly performed magnetic bead enrichment. Other groups were incubated with 10 μL, 5 μL, or 2.5 μL of RNase I, respectively in a 30 μL reaction system. Sample enrichment was performed with MyOne C1 magnetic beads, and the samples were incubated with NaOH for elution as described above. Finally, all the samples were purified with Zymo DNA Clean & Concentrator-5 column and separated by 7 M urea PAGE.
NAI-N3 in icSHAPE and smartSHAPE modifies single-stranded nucleotides and causes reverse transcription (RT) stops. However, reverse transcriptase also stops at some sites of endogenous modifications such as m1A, local structures such as the G-quadruplexes, or simply unmodified sites by chance. These background reverse transcription stop signals will cause false positive signals in the structure score calculation. Therefore, in previous RNA structure probing methods, a DMSO control group was added to remove background signals. In smartSHAPE, however, we introduced an RNase I digestion step after reverse transcription to remove the stop signals at non-modified sites. As shown in
To verify that the RNase I digestion step functions as expected to remove the background reverse transcription stop signals, we designed RT primers upstream of a known m1A modification site in human ribosomal RNA 28S (
To further assess the removal of the background signals in smartSHAPE sequencing data, we constructed libraries from HEK293T cells treated with NAI-N3 and DMSO (see
To assess the performance of smartSHAPE with different amounts of input RNA, we constructed smartSHAPE libraries by using 1 ng, 5 ng, 25 ng and 125 ng of RNA (after rRNA removal) as input to probe whole transcriptome RNA secondary structures in HEK293T cells. All smartSHAPE libraries showed good reproducibility both between libraries of different inputs (see the example in
To assess the complexity of each library at different sequencing depths, we randomly sampled the same number of reads from the total raw sequencing data of each library (Table 2) and calculated smartSHAPE scores accordingly. As shown in
We further compared the proportion of usable sequencing reads in each library. Both icSHAPE and smartSHAPE used random sequence molecular tags adjacent to the 3′ adapter to mark PCR duplication. PCR duplicate reads, reads that were too short to be aligned to the genome, and reads that were aligned to rRNAs were useless for calculating RNA structure scores and needed to be discarded. The remaining reads (those aligned to the genome) were defined as usable reads. We observed that more than 60% of the total sequencing reads were usable in the 5 ng, 25 ng and 125 ng libraries. In contrast, only about 40% of the reads in the icSHAPE library generated with 500 ng of RNA as input were usable, showing that the 5 ng, 25 ng and 125 ng smartSHAPE libraries had many more reads that could be aligned to the genome than the icSHAPE library (see
To assess the accuracy of smartSHAPE, we plotted ROC curves for the modifiable bases in 18S and 28S rRNAs by using the calculated smartSHAPE values. For smartSHAPE libraries of all input amounts, the AUCs exceeded 0.8 for 18S and 0.7 for 28S, indicating good concordance between the smartSHAPE data and the known structure models and a significantly higher accuracy than that of icSHAPE (see
We also examined other quality control parameters of the smartSHAPE library. Similar to the previous findings, the smartSHAPE data revealed structural features at translation initiation and termination sites, as well as the 3-nucleotide periodicity in CDS regions (see
In summary, smartSHAPE can accurately and reliably probe RNA structures in different amounts of input samples, while requiring only a small fraction of the amount of input RNA required by other state-of-the-art in vivo RNA structure probing methods, and smartSHAPE can still accurately probe RNA structures when using a small amount, e.g., 1 ng, of RNA as input.
Therefore, smartSHAPE should be fairly suitable for many biomedical applications where the acquisition of large amounts of sample materials is extremely challenging.
We developed a new analysis pipeline for the calculation of RNA structure scores based solely on NAI-N3 libraries (see Example 1). Briefly, smartSHAPE values were calculated by normalization and winsorization of RT stop signals in a sliding window fashion across all exons (default window size=200 nt, step size=5 nt), and the smartSHAPE values for bases with coverage below 100 were defined as NULL. We assessed the performance of the new pipeline by using a known structure model of human ribosomal RNA 18S (see Example 1). By plotting a receiver operating characteristic (ROC) curve, we observed that the smartSHAPE scores calculated with the new pipeline were better than the published icSHAPE data, and the area under the curve (AUC) of the smartSHAPE values was significantly higher than that of the icSHAPE values (see
Citrobacter rodentium was grown overnight in LB broth with shaking at 37° C. C57BL/6J mice (6-8 weeks) were infected by gavage with 2×10⁹ CFUs of Citrobacter rodentium in a total volume of 200 μL and sacrificed on day 5 post-infection. Intestinal tissue was collected and placed in ice-cold Hank's balanced salt solution (HBSS) free of calcium and magnesium. The intestine was cut open longitudinally and cut into 1.5 cm pieces and incubated twice at 37° C. for 20 min in HBSS containing 10 mM HEPES, 10 mM EDTA (Promega) and 1 mM dithiothreitol (DTT, Fermentas) to remove epithelial cells and mucus. Then the tissue was washed with HBSS containing 10 mM HEPES and digested with slow rotation at 37° C. for 75 min in RPMI 1640 (containing calcium and magnesium) containing 5% heat-inactivated fetal bovine serum (FBS), 1 mg/mL collagenase IV (Sigma), 1 mg/mL dispase (Roche), and 100 μg/mL DNase I (Sigma). The digested tissue was homogenized by vigorous shaking, passed through a 70 μm cell strainer and resuspended in 40% Percoll (GE Healthcare) solution, and the suspension was then gradient-density centrifuged at 2,500 rpm for 20 min at room temperature. Red blood cells were then lysed with ACK lysis buffer. After staining, Ly6C+ and Ly6C− colonic macrophages were sorted on FACSAria4 laser (BD).
Innate immunity is precisely regulated to effectively eliminate pathogens while avoiding tissue damage caused by excessive immune responses. The mediators of these immune responses generally show transient expression to induce and subsequently eliminate inflammation. Post-transcriptional regulation is crucial for the rapid inhibition of protein expression of key inflammatory mediators, in which RNA structures play an important role in the regulation of RNA degradation and translation. For example, the GAIT element (the only riboswitch in mammalian cells) blocks the translation of the Vegfa gene in macrophages by recruiting GAIT complex when switching into a hairpin conformation.
To identify new post-transcriptional regulatory RNA structure elements in immune cells, we used smartSHAPE to probe whole transcriptome RNA secondary structures in intestinal macrophages isolated from mice infected with Citrobacter rodentium (see
The intestinal macrophages are essential for maintaining a balance between immune responses and antigen tolerance in the intestines. Specifically, monocytes recruited from blood differentiate into Ly6Clo tissue resident macrophages, which maintain intestinal homeostasis by producing anti-inflammatory cytokines such as Interleukin (IL)-10. However, during intestinal inflammation, circulating monocytes differentiate into Ly6Chi pro-inflammatory macrophages, which trigger inflammation by producing pro-inflammatory cytokines such as IL6, IL1b, and IL12. To explore the potential differences in the RNA structure between tissue resident and pro-inflammatory macrophages, we used about 100 ng of total RNA to perform smartSHAPE library construction for Ly6Clo and Ly6Chi macrophages. From the smartSHAPE data of Ly6Clo and Ly6Chi macrophages, we obtained the structural information of more than 3,000 and more than 2,000 transcripts with high coverage, respectively (see
It can be seen that the results of the RNA structure probing method of the present invention can be used to assess the functional states of cells, for example, immune stress responses. Similarly, the results of the RNA structure probing method can be used to assess other functional states of cells, for example, to study the effect of RNA on early development, and the occurrence and progression of cancer.
The preferred embodiments of the present invention are described in detail above, which, however, are not intended to limit the present invention. Within the scope of the technical concept of the present invention, various simple modifications can be made to the technical solution of the present invention, all of which will fall within the protection scope of the present invention.
In addition, it should be noted that the various specific technical features described in the above specific embodiments can be combined in any suitable manner without contradiction. In order to avoid unnecessary repetition, such combinations will not be illustrated separately.
Various embodiments of the present invention can also be combined arbitrarily, and should also be regarded as the disclosure of the present invention, as long as they do not violate the idea of the present invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2020/126766 | 11/5/2020 | WO |